arXivDaily: arXiv daily academic digest, updated Monday through Friday
All subject categories: 2862
2603.06924 2026-03-10 cs.RO

LIPP: Load-Aware Informative Path Planning with Physical Sampling

Hojune Kim, Guangyao Shi, Gaurav S. Sukhatme


In classical Informative Path Planning (C-IPP), robots are typically modeled as mobile sensors that acquire digital measurements such as images or radiation levels. In this model, since making a measurement leaves the robot's physical state unchanged, traversal costs are determined solely by the path taken. This is a natural assumption for many missions, but does not extend to settings involving physical sample collection, where each collected sample adds mass and increases the energy cost of all subsequent motion. As a result, IPP formulations that ignore this coupling between information gain and load-dependent traversal cost can produce plans that are distance-efficient but energy-suboptimal, collecting fewer samples and less data than the energy budget would permit. In this paper, we introduce Load-aware Informative Path Planning (LIPP), a generalization of C-IPP that explicitly models this coupling and the resulting order-dependent traversal costs. We formulate LIPP as a Mixed-Integer Quadratic Program (MIQP) that jointly optimizes routing, visitation order, and per-location sampling count under an energy budget. We show that LIPP strictly generalizes C-IPP: as sample unit mass $\lambda \to 0$, the load-dependent energy model reduces exactly to the classical distance budget constraint, recovering C-IPP as a special case. We further derive theoretical bounds on the path-length increase of LIPP relative to C-IPP, characterizing the trade-off for improved energy efficiency. Finally, through extensive simulations across 2000 diverse mission scenarios, we demonstrate that LIPP matches the behavior of C-IPP at zero sample mass and progressively achieves higher uncertainty reduction per unit energy as sample mass increases.
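
The coupling between sampling and traversal cost described in this abstract can be illustrated with a toy energy model. The sketch below is not the paper's MIQP. It assumes, purely for illustration, that each leg costs its length times `1 + unit_mass * samples_carried`; the function name `path_energy` and this linear form are hypothetical.

```python
import math


def path_energy(points, samples, unit_mass):
    """Energy of visiting `points` in order, taking samples[i] samples at points[i].

    Assumed toy model (not taken from the paper): each leg costs
    length * (1 + unit_mass * samples_carried_so_far).
    """
    energy = 0.0
    carried = 0
    for i in range(len(points) - 1):
        carried += samples[i]  # samples picked up before leaving point i
        leg = math.dist(points[i], points[i + 1])
        energy += leg * (1.0 + unit_mass * carried)
    return energy


# Closed route with leg lengths 5, 4, and 3 (total distance 12).
route = [(0, 0), (3, 4), (3, 0), (0, 0)]
early = [0, 2, 0, 0]  # collect two samples at the second stop
late = [0, 0, 2, 0]   # collect two samples at the third stop

for unit_mass in (0.0, 0.5):
    print(unit_mass,
          path_energy(route, early, unit_mass),
          path_energy(route, late, unit_mass))
```

With `unit_mass = 0.0` both plans cost exactly the path length, which mirrors the $\lambda \to 0$ reduction stated in the abstract. With a positive unit mass, collecting the same samples later on the same route is cheaper, which is the order dependence the abstract refers to.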

2603.06923 2026-03-10 cs.CL

Reforming the Mechanism: Editing Reasoning Patterns in LLMs with Circuit Reshaping

Zhenyu Lei, Qiong Wu, Jianxiong Dong, Yinhan He, Emily Dodwell, Yushun Dong, Jundong Li


Large language models (LLMs) often exhibit flawed reasoning ability that undermines reliability. Existing approaches to improving reasoning typically treat it as a general and monolithic skill, applying broad training which is inefficient and unable to target specific reasoning errors. We introduce Reasoning Editing, a paradigm for selectively modifying specific reasoning patterns in LLMs while preserving other reasoning pathways. This task presents a fundamental trade-off between Generality, the ability of an edit to generalize across different tasks sharing the same reasoning pattern, and Locality, the ability to preserve other reasoning capabilities. Through systematic investigation, we uncover the Circuit-Interference Law: Edit interference between reasoning patterns is proportional to the overlap of their neural circuits. Guided by this principle, we propose REdit, the first framework to actively reshape neural circuits before editing, thereby modulating interference between reasoning patterns and mitigating the trade-off. REdit integrates three components: (i) Contrastive Circuit Reshaping, which directly addresses the generality-locality trade-off by disentangling overlapping circuits; (ii) Meta-Contrastive Learning, which extends transferability to novel reasoning patterns; and (iii) Dual-Level Protection, which preserves preexisting abilities by constraining reshaping update directions and regularizing task-level predictions. Extensive experiments with Qwen-2.5-3B on propositional logic reasoning tasks across three difficulty levels demonstrate that REdit consistently achieves superior generality and locality compared to baselines, with additional validation in mathematics showing broader potential. Our code is available at https://github.com/LzyFischer/REdit.

2603.06921 2026-03-10 cs.RO cs.LG cs.SY eess.SY

CN-CBF: Composite Neural Control Barrier Function for Safe Robot Navigation in Dynamic Environments

Bojan Derajić, Sebastian Bernhard, Wolfgang Hönig


Safe navigation of autonomous robots remains one of the core challenges in the field, especially in dynamic and uncertain environments. One of the prevalent approaches is safety filtering based on control barrier functions (CBFs), which are easy to deploy but difficult to design. Motivated by the shortcomings of existing learning- and model-based methods, we propose a simple yet effective neural CBF design method for safe robot navigation in dynamic environments. We employ the idea of a composite CBF, where multiple neural CBFs are combined into a single CBF. The individual CBFs are trained via the Hamilton-Jacobi reachability framework to approximate the optimal safe set for single moving obstacles. Additionally, we use the residual neural architecture, which guarantees that the estimated safe set does not intersect with the corresponding failure set. The method is extensively evaluated in simulation experiments for a ground robot and a quadrotor, comparing it against several baseline methods. The results show improved success rates of up to 18% compared to the best baseline, without increasing the conservativeness of the motion. Also, the method is demonstrated in hardware experiments for both types of robots.
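
For readers unfamiliar with CBF-based safety filtering, a minimal sketch of the general idea follows. It uses a hand-written barrier for a 1D single-integrator system (x' = u) with the safe set x <= x_max, not the neural or composite CBFs proposed in the paper. The names `cbf_filter` and `simulate`, the linear class-K gain `alpha`, and the Euler integration are all assumptions made for illustration.

```python
def cbf_filter(u_nominal, x, x_max, alpha=1.0):
    """Minimally modify u_nominal so that h(x) = x_max - x stays nonnegative.

    For x' = u the CBF condition dh/dt >= -alpha * h becomes
    u <= alpha * (x_max - x), so the closest safe input is a simple clamp.
    """
    h = x_max - x
    return min(u_nominal, alpha * h)


def simulate(x0, u_nominal, x_max, dt=0.01, steps=2000):
    """Euler-integrate x' = u with the filtered input and return the final state."""
    x = x0
    for _ in range(steps):
        x += dt * cbf_filter(u_nominal, x, x_max)
    return x


# The nominal controller pushes hard toward the boundary at x_max = 1.0.
print(simulate(x0=0.0, u_nominal=5.0, x_max=1.0))
```

The filter leaves the nominal input untouched while it satisfies the barrier condition and clamps it otherwise, so the state approaches the boundary without crossing it.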

2603.06920 2026-03-10 cs.CV

DLRMamba: Distilling Low-Rank Mamba for Edge Multispectral Fusion Object Detection

Qianqian Zhang, Leon Tabaro, Ahmed M. Abdelmoniem, Junshe An

Comments Has been submitted to the IEEE TGRS journal


Multispectral fusion object detection is a critical task for edge-based maritime surveillance and remote sensing, demanding both high inference efficiency and robust feature representation for high-resolution inputs. However, current State Space Models (SSMs) like Mamba suffer from significant parameter redundancy in their standard 2D Selective Scan (SS2D) blocks, which hinders deployment on resource-constrained hardware and leads to the loss of fine-grained structural information during conventional compression. To address these challenges, we propose the Low-Rank Two-Dimensional Selective Structured State Space Model (Low-Rank SS2D), which reformulates state transitions via matrix factorization to exploit intrinsic feature sparsity. Furthermore, we introduce a Structure-Aware Distillation strategy that aligns the internal latent state dynamics of the student with a full-rank teacher model to compensate for potential representation degradation. This approach substantially reduces computational complexity and memory footprint while preserving the high-fidelity spatial modeling required for object recognition. Extensive experiments on five benchmark datasets and real-world edge platforms, such as Raspberry Pi 5, demonstrate that our method achieves a superior efficiency-accuracy trade-off, significantly outperforming existing lightweight architectures in practical deployment scenarios.
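
The abstract does not spell out the factorization used in Low-Rank SS2D, so the following is only a generic sketch of why low-rank matrix factorization saves parameters. It assumes a dense `d_out x d_in` weight is replaced by a product `U V` with `U` of shape `d_out x rank` and `V` of shape `rank x d_in`; all function names here are hypothetical.

```python
def dense_params(d_out, d_in):
    """Parameter count of a full d_out x d_in weight matrix."""
    return d_out * d_in


def low_rank_params(d_out, d_in, rank):
    """Parameter count of the factor pair U (d_out x rank) and V (rank x d_in)."""
    return rank * (d_out + d_in)


def matvec(m, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def matmul(a, b):
    """Plain matrix-matrix product on nested lists."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


# Applying V then U gives the same result as applying the full product U V,
# without ever materializing the d_out x d_in matrix.
U = [[1, 0], [0, 1], [1, 1]]          # 3 x 2
V = [[1, 2, 0, 1], [0, 1, 1, 0]]      # 2 x 4
x = [1, 1, 1, 1]
print(matvec(U, matvec(V, x)), matvec(matmul(U, V), x))

# Illustrative sizes only: a 256 x 256 layer at rank 16.
print(dense_params(256, 256), low_rank_params(256, 256, 16))
```

The saving only appears when the rank is well below `d_out * d_in / (d_out + d_in)`; the tiny 3 x 4 matrices above are there to check correctness, not to show compression.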

2603.06919 2026-03-10 cs.RO

SurgSync: Time-Synchronized Multi-Modal Data Collection Framework and Dataset for Surgical Robotics

Haoying Zhou, Chang Liu, Yimeng Wu, Junlin Wu, Zijian Wu, Yu Chung Lee, Sara Martuscelli, Septimiu E. Salcudean, Gregory S. Fischer, Peter Kazanzides

Comments Accepted by the International Conference on Robotics and Automation (ICRA), IEEE, 2026. More details can be found at https://surgsync.github.io/


Most existing robotic surgery systems adopt a human-in-the-loop paradigm, often with the surgeon directly teleoperating the robotic system. Adding intelligence to these robots would enable higher-level control, such as supervised autonomy or even full autonomy. However, artificial intelligence (AI) requires large amounts of training data, which is currently lacking. This work proposes SurgSync, a multi-modal data collection framework with offline and online synchronization to support training and real-time inference, respectively. The framework is implemented on a da Vinci Research Kit (dVRK) and introduces (1) dual-mode (online/offline-matching) synchronized recorders, (2) a modern stereo endoscope to achieve image quality on par with clinical systems, and (3) additional sensors such as a side-view camera and a novel capacitive contact sensor to provide ground truth contact data. The framework also incorporates a post-processing toolbox for tasks such as depth estimation, optical flow, and a practical kinematic reprojection method using Gaussian heatmaps. User studies with participants of varying skill levels are performed with ex-vivo tissue to provide clinically realistic data, and a network for surgical skill assessment is employed to demonstrate use of the collected data. Through the user study experiments, we obtained a dataset of 214 validated instances across multiple canonical training tasks. All software and data are available at surgsync.github.io.

2603.06918 2026-03-10 cs.RO

T2Nav: Algebraic Topology-Aware Temporal Graph Memory and Loop Detection for Zero-Shot Visual Navigation

Quang-Anh N. D., Duc Pham, Minh-Anh Nguyen, Tung Doan, Tuan Dang

Journal ref IEEE International Conference on Robotics & Automation 2026 (ICRA)


Deploying autonomous agents in real-world environments is challenging, particularly for navigation, where systems must adapt to situations they have not encountered before. Traditional learning approaches require substantial amounts of data, constant tuning, and, sometimes, starting over for each new task. That makes them hard to scale and not very flexible. Recent breakthroughs in foundation models, such as large language models and vision-language models, enable systems to attempt new navigation tasks without requiring additional training. However, many of these methods only work with specific input types, employ relatively basic reasoning, and fail to fully exploit the details they observe or the structure of the spaces. Here, we introduce T2Nav, a zero-shot navigation system that integrates heterogeneous data and employs graph-based reasoning. By directly incorporating visual information into the graph and matching it to the environment, our approach enables the system to strike a good balance between exploration and goal attainment. This strategy allows robust obstacle avoidance, reliable loop closure detection, and efficient path planning while eliminating redundant exploration patterns. The system demonstrates flexibility by handling goals specified using reference images of target object instances, making it particularly suitable for scenarios in which agents must navigate to visually similar yet spatially distinct instances. Experiments demonstrate that our approach is efficient and adapts well to unknown environments, moving toward practical zero-shot instance-image navigation capabilities.

2603.06914 2026-03-10 cs.RO

SysNav: Multi-Level Systematic Cooperation Enables Real-World, Cross-Embodiment Object Navigation

Haokun Zhu, Zongtai Li, Zihan Liu, Kevin Guo, Zhengzhi Lin, Yuxin Cai, Guofei Chen, Chen Lv, Wenshan Wang, Jean Oh, Ji Zhang


Object navigation (ObjectNav) in real-world environments is a complex problem that requires simultaneously addressing multiple challenges, including complex spatial structure, long-horizon planning and semantic understanding. Recent advances in Vision-Language Models (VLMs) offer promising capabilities for semantic understanding, yet effectively integrating them into real-world navigation systems remains a non-trivial challenge. In this work, we formulate real-world ObjectNav as a system-level problem and introduce SysNav, a three-level ObjectNav system designed for real-world cross-embodiment deployment. SysNav decouples semantic reasoning, navigation planning and motion control to ensure robustness and generalizability. At the high level, we summarize the environment into a structured scene representation and leverage VLMs to provide semantic-grounded navigation guidance. At the mid level, we introduce a hierarchical room-based navigation strategy that reserves VLM guidance for room-level decisions, which effectively utilizes its reasoning ability while ensuring system efficiency. At the low level, planned waypoints are executed through different embodiment-specific motion control modules. We deploy our system on three embodiments, a custom-built wheeled robot, the Unitree Go2 quadruped and the Unitree G1 humanoid, and conduct 190 real-world experiments. Our system achieves substantial improvements in both success rate and navigation efficiency. To the best of our knowledge, SysNav is the first system capable of reliably and efficiently completing building-scale long-range object navigation in complex real-world environments. Furthermore, extensive experiments on four simulation benchmarks demonstrate state-of-the-art performance. Project page is available at: https://cmu-vln.github.io/.

2603.06905 2026-03-10 cs.CL

MedInjection-FR: Exploring the Role of Native, Synthetic, and Translated Data in Biomedical Instruction Tuning

Ikram Belmadani, Oumaima El Khettari, Pacôme Constant dit Beaufils, Benoit Favre, Richard Dufour

Comments Accepted in LREC-2026


Instruction tuning has become essential for adapting large language models (LLMs) to follow domain-specific prompts. Yet, in specialized fields such as medicine, the scarcity of high-quality French instruction data limits effective supervision. To address this gap, we introduce MedInjection-FR, a large-scale French biomedical instruction dataset comprising 571K instruction-response pairs drawn from three complementary sources: native, synthetic, and translated data. We design a controlled experimental framework to systematically assess how data provenance affects instruction tuning, using Qwen-4B-Instruct fine-tuned across seven configurations combining these sources. Results show that native data yield the strongest performance, while mixed setups, particularly native and translated, provide complementary benefits. Synthetic data alone remains less effective but contributes positively when balanced with native supervision. Evaluation on open-ended QA combines automatic metrics, LLM-as-a-judge assessment, and human expert review; although LLM-based judgments correlate best with human ratings, they show sensitivity to verbosity. These findings highlight that data authenticity and diversity jointly shape downstream adaptation and that heterogeneous supervision can mitigate the scarcity of native French medical instructions.

2603.06904 2026-03-10 cs.LG

XGenBoost: Synthesizing Small and Large Tabular Datasets with XGBoost

Jim Achterberg, Marcel Haas, Bram van Dijk, Marco Spruit


Tree ensembles such as XGBoost are often preferred for discriminative tasks in mixed-type tabular data, due to their inductive biases, minimal hyperparameter tuning, and training efficiency. We argue that these qualities, when leveraged correctly, can make for better generative models as well. As such, we present XGenBoost, a pair of generative models based on XGBoost: i) a Denoising Diffusion Implicit Model (DDIM) with XGBoost as score-estimator suited for smaller datasets, and ii) a hierarchical autoregressive model whose conditionals are learned via XGBoost classifiers, suited for large-scale tabular synthesis. The architectures follow from the natural constraints imposed by tree-based learners, e.g., in the diffusion model, combining Gaussian and multinomial diffusion to leverage native categorical splits and avoid one-hot encoding while accurately modelling mixed data types. In the autoregressive model, we use a fixed-order factorization, a hierarchical classifier to impose ordinal inductive biases when modelling numerical features, and de-quantization based on empirical quantile functions to model the non-continuous nature of most real-world tabular datasets. Through two benchmarks, one containing smaller and the other larger datasets, we show that our proposed architectures outperform previous neural- and tree-based generative models for mixed-type tabular synthesis at lower training cost.

2603.06902 2026-03-10 cs.AI

Empowering Locally Deployable Medical Agent via State Enhanced Logical Skills for FHIR-based Clinical Tasks

Wanrong Yang, Zhengliang Liu, Yuan Li, Bingjie Yan, Lingfang Li, Mingguang He, Dominik Wojtczak, Yalin Zheng, Danli Shi

Comments 10 pages, 5 figures


While Large Language Models demonstrate immense potential as proactive Medical Agents, their real-world deployment is severely bottlenecked by data scarcity under privacy constraints. To overcome this, we propose State-Enhanced Logical-Skill Memory (SELSM), a training-free framework that distills simulated clinical trajectories into entity-agnostic operational rules within an abstract skill space. During inference, a Query-Anchored Two-Stage Retrieval mechanism dynamically fetches these entity-agnostic logical priors to guide the agent's step-by-step reasoning, effectively resolving the state polysemy problem. Evaluated on MedAgentBench, the only authoritative high-fidelity virtual EHR sandbox benchmarked with real clinical data, SELSM substantially elevates the zero-shot capabilities of locally deployable foundation models (30B-32B parameters). Notably, on the Qwen3-30B-A3B backbone, our framework completely eliminates task chain breakdowns to achieve a 100% completion rate, boosting the overall success rate by an absolute 22.67% and significantly outperforming existing memory-augmented baselines. This study demonstrates that equipping models with a dynamically updatable, state-enhanced cognitive scaffold is a privacy-preserving and computationally efficient pathway for local adaptation of AI agents to clinical information systems. While currently validated on FHIR-based EHR interactions as an initial step, the entity-agnostic design of SELSM provides a principled foundation toward broader clinical deployment.

2603.06898 2026-03-10 cs.RO

Collaborative Planning with Concurrent Synchronization for Operationally Constrained UAV-UGV Teams

Zihao Deng, Qianhuang Li, Peng Gao, Maggie Wigness, John Rogers, Donghyun Kim, Hao Zhang


Collaborative planning under operational constraints is an essential capability for heterogeneous robot teams tackling complex large-scale real-world tasks. Unmanned Aerial Vehicles (UAVs) offer rapid environmental coverage, but flight time is often limited by energy constraints, whereas Unmanned Ground Vehicles (UGVs) have greater energy capacity to support long-duration missions, but movement is constrained by traversable terrain. Individually, neither can complete tasks such as environmental monitoring. Effective UAV-UGV collaboration therefore requires energy-constrained multi-UAV task planning, traversability-constrained multi-UGV path planning, and crucially, synchronized concurrent co-planning to ensure timely in-mission recharging. To enable these capabilities, we propose Collaborative Planning with Concurrent Synchronization (CoPCS), a learning-based approach that integrates a heterogeneous graph transformer for operationally constrained task encoding with a transformer decoder for joint, synchronized co-planning that enables UAVs and UGVs to act concurrently in a coordinated manner. CoPCS is trained end-to-end under a unified imitation learning paradigm. We conducted extensive experiments to evaluate CoPCS in both robotic simulations and physical robot teams. Experimental results demonstrate that our method provides the novel multi-robot capability of synchronized concurrent co-planning and substantially improves team performance. More details of this work are available on the project website: https://hcrlab.gitlab.io/project/CoPCS.

2603.06894 2026-03-10 cs.LG cs.CV

Learning From Design Procedure To Generate CAD Programs for Data Augmentation

Yan-Ying Chen, Dule Shu, Matthew Hong, Andrew Taber, Jonathan Li, Matthew Klenk

Comments Accepted by NeurIPS 2025 Workshop: Deep Learning for Code in the Agentic Era


Large Language Models (LLMs) have demonstrated impressive capabilities in a wide range of code generation tasks. However, generating code for certain domains remains challenging. One such domain is Computer-Aided Design (CAD) programs, where the goal is to produce scripted parametric models that define object geometry for precise design and manufacturing applications. A key challenge in LLM-based CAD program generation is the limited geometric complexity of generated shapes compared to those found in real-world industrial designs. This shortfall is in part due to the lack of diversity in the available CAD program training data. To address this, we propose a novel data augmentation paradigm that prompts an LLM to generate CAD programs conditioned on a reference surface program and a modeling procedure, an idea inspired by practices in industrial design. By varying the reference surface using a collection of organic shapes, our method enriches the geometric distribution of generated CAD models. In particular, it introduces edges and faces defined by spline-based curvature, which are typically missing or underrepresented in existing open-source CAD program datasets. Experiments show that our method produces CAD samples with significantly greater geometric diversity and a higher resemblance to industry-grade CAD designs in terms of the proportion of organic shape primitives. This enhancement makes our CAD data augmentation approach a useful tool for training LLMs and other deep learning models in CAD generation.

2603.06889 2026-03-10 cs.LG

Single-pass Possibilistic Clustering with Damped Window Footprints

Jeffrey Dale, James Keller, Aquila Galusha


Streaming clustering is a domain that has become extremely relevant in the age of big data, such as in network traffic analysis or in processing continuously running sensor data. Furthermore, possibilistic models offer unique benefits over approaches from the literature, especially with the introduction of a "fuzzifier" parameter that controls how quickly typicality degrades as one gets further from cluster centers. We propose a single-pass possibilistic clustering (SPC) algorithm that is effective and easy to apply to new datasets. Key contributions of SPC include the ability to model non-spherical clusters, closed-form footprint updates over arbitrarily sized damped windows, and the employment of covariance union from the multiple hypothesis tracking literature to merge two cluster mean and covariance estimates. SPC is validated against five other streaming clustering algorithms on the basis of cluster purity and normalized mutual information.
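
The abstract does not give SPC's footprint equations, so the sketch below shows only the generic idea of a closed-form damped-window update: an exponentially weighted running mean maintained in a single pass. The class name `DampedMean`, the `decay` parameter, and the batch reference function are assumptions for illustration, not the paper's formulation.

```python
class DampedMean:
    """Single-pass mean where a point seen `a` steps ago has weight decay**a."""

    def __init__(self, decay):
        self.decay = decay
        self.weight = 0.0  # running sum of damped weights
        self.mean = 0.0

    def update(self, x):
        # Age all previous weights by one step, then add the new point with weight 1.
        self.weight = self.decay * self.weight + 1.0
        self.mean += (x - self.mean) / self.weight
        return self.mean


def batch_damped_mean(xs, decay):
    """Reference computation that stores the whole stream (for checking only)."""
    n = len(xs)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


stream = [1.0, 2.0, 4.0, 8.0]
tracker = DampedMean(decay=0.5)
for value in stream:
    tracker.update(value)
print(tracker.mean, batch_damped_mean(stream, 0.5))
```

The incremental update needs only two stored numbers per tracked quantity, which is what makes damped windows attractive for streaming settings where the data cannot be revisited.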

2603.06888 2026-03-10 cs.AI

Enhancing the Detection of Coronary Artery Disease Using Machine Learning

Karan Kumar Singh, Nikita Gajbhiye, Gouri Sankar Mishra

Comments 20 pages, 11 figures, 5 tables. This paper proposes a hybrid Bi-LSTM and GRU based machine learning framework for improved detection of Coronary Artery Disease using CCTA imaging data and clinical features


Coronary Artery Disease (CAD) remains a leading cause of morbidity and mortality worldwide. Early detection is critical to improve patient outcomes and decrease healthcare costs. In recent years, machine learning (ML) advancements have shown significant potential in enhancing the accuracy of CAD diagnosis. This study investigates the application of ML algorithms to improve the detection of CAD by analyzing patient data, including clinical features, imaging, and biomarker profiles. Bi-directional Long Short-Term Memory (Bi-LSTM), Gated Recurrent Units (GRU), and a hybrid of Bi-LSTM+GRU were trained on large datasets to predict the presence of CAD. Results demonstrated that these ML models outperformed traditional diagnostic methods in sensitivity and specificity, offering a robust tool for clinicians to make more informed decisions. The experimental results show that the hybrid model achieved an accuracy of 97.07%. By integrating advanced data preprocessing techniques and feature selection, this study ensures optimal learning and model performance, setting a benchmark for the application of ML in CAD diagnosis. The integration of ML into CAD detection presents a promising avenue for personalized healthcare and could play a pivotal role in the future of cardiovascular disease management.

2603.06884 2026-03-10 cs.AI

Distributed Legal Infrastructure for a Trustworthy Agentic Web

Tomer Jordi Chaffer, Victor Jiawei Zhang, Sante Dino Facchini, Botao Amber Hu, Helena Rong, Zihan Guo, Xisen Wang, Carlos Santana, Giovanni De Gasperis


The agentic web marks a structural transition from a human-centered information network to a digital environment populated by artificial intelligence (AI) agents that perceive, decide, and act autonomously. As delegated action unfolds at machine speed, exceeds discrete moments of human judgment, and distributes decision-making across non-human actors, existing legal frameworks face growing strain, creating an urgent need for new mechanisms capable of sustaining legality in this emerging order. A trustworthy agentic web therefore depends on the infrastructuring of legality through interoperable protocols that organize identity, delegation, and accountability across systems, enabling coherent governance beyond isolated platforms. Towards this end, this article advances a distributed legal infrastructure (DLI), a governance paradigm composed of five interlocking layers: (1) self-sovereign, soulbound agent identities; (2) cognitive AI logic and constraint systems; (3) decentralized adjudication mechanisms for dispute resolution; (4) bottom-up agentic market regulation to mitigate information asymmetries and network effects, including insurance-based models; and (5) portable institutional frameworks that enable legal interoperability while preserving plural sources of authority. This reference framework contributes to emerging research on embedding legality within agentic web infrastructure, aligning distributed technical systems with accountability, contestability, and rule-of-law principles.

2603.06879 2026-03-10 cs.RO

Material Driven HRI Design: Aesthetics as Explainability

Natalie Friedman, Kevin Weatherwax, Chengchao Zhu

Comments 4 pages, 1 table, 2026 ACM/IEEE Human-Robot Interaction Conference Workshop on Articulating the Value of Design Research for HRI


Aesthetics, often treated as secondary to function, guides how people interpret robots' roles. Many robot designs, both real and fictitious, use sleek industrial aesthetics. These feature hard glossy plastics, hiding as much of the underlying mechanical and electrical components as possible, resembling something akin to a nude humanoid figure. This leaves robots as something of a blank slate to which end-users apply coverings, often based on media of fiction and non-fiction alike. We argue that designers can take cues from fashion to design interaction and set appropriate expectations. Rather than viewing appearance as decoration, we propose that color, texture, and material choices function as interaction signals. These signals can invite or discourage touch, clarify a robot's role, and help align user expectations with a robot's actual capabilities. When done thoughtfully, such cues can create familiarity and legibility; when done poorly, they can lead to wrong expectations. This preliminary paper proposes a framework describing how materials can create explainability by signaling expectations for interaction, task, and environment. We use this framework to do a content analysis of 6 robots.

2603.06878 2026-03-10 cs.AI

Not Too Short, Not Too Long: How LLM Response Length Shapes People's Critical Thinking in Error Detection

Natalie Friedman, Adelaide Nyanyo, Kevin Weatherwax, Lifei Wang, Chengchao Zhu, Zeshu Zhu, S. Joy Mountford

Comments 6 pages, 1 table, 2 figures

Journal ref In MIRAGE 2026 workshop held at 31st International Conference on Intelligent User Interfaces. ACM, New York, NY, USA, 6 pages


Large language models (LLMs) have become common decision-support tools across educational and professional contexts, raising questions about how their outputs shape human critical thinking. Prior work suggests that the amount of AI assistance can influence cognitive engagement, yet little is known about how specific properties of LLM outputs (e.g., response length) impact users' critical evaluation of information. In this study, we examine whether the length of LLM responses shapes users' accuracy in evaluating LLM-generated reasoning on critical thinking tasks, particularly in interaction with the correctness of the LLM's reasoning. To begin evaluating this, we conducted a within-subjects experiment with 24 participants who completed 15 modified Watson-Glaser critical thinking items, each accompanied by an LLM-generated explanation that varied in length and correctness. Mixed-effects logistic regression revealed a strong and statistically reliable effect of LLM output correctness on participant accuracy, with participants more likely to answer correctly when the LLM's explanation was correct. Response length appeared to moderate this effect: when the LLM output was incorrect, medium-length explanations were associated with higher participant accuracy than either shorter or longer explanations, whereas accuracy remained high across lengths when the LLM output was correct. Together, these findings suggest that response length alone may be insufficient to support critical thinking, and that how reasoning is presented, including a potential advantage of mid-length explanations under some conditions, points to design opportunities for LLM-based decision-support systems that emphasize transparent reasoning and calibrated expressions of certainty.

2603.06874 2026-03-10 cs.AI cs.CL

LieCraft: A Multi-Agent Framework for Evaluating Deceptive Capabilities in Language Models

Matthew Lyle Olson, Neale Ratzlaff, Musashi Hinck, Tri Nguyen, Vasudev Lal, Joseph Campbell, Simon Stepputtis, Shao-Yen Tseng

Comments AAAI 2026 Alignment track. Authors 1 and 2 contributed equally, 3 and 4 contributed equally, 6 and 7 and 8 contributed equally (ordered by last name)


Large Language Models (LLMs) exhibit impressive general-purpose capabilities but also introduce serious safety risks, particularly the potential for deception as models acquire increased agency and human oversight diminishes. In this work, we present LieCraft: a novel evaluation framework and sandbox for measuring LLM deception that addresses key limitations of prior game-based evaluations. At its core, LieCraft is a novel multiplayer hidden-role game in which players select an ethical alignment and execute strategies over a long time-horizon to accomplish missions. Cooperators work together to solve event challenges and expose bad actors, while Defectors evade suspicion while secretly sabotaging missions. To enable real-world relevance, we develop 10 grounded scenarios such as childcare, hospital resource allocation, and loan underwriting that recontextualize the underlying mechanics in ethically significant, high-stakes domains. We ensure balanced gameplay in LieCraft through careful design of game mechanics and reward structures that incentivize meaningful strategic choices while eliminating degenerate strategies. Beyond the framework itself, we report results from 12 state-of-the-art LLMs across three behavioral axes: propensity to defect, deception skill, and accusation accuracy. Our findings reveal that despite differences in competence and overall alignment, all models are willing to act unethically, conceal their intentions, and outright lie to pursue their goals.

2603.06873 2026-03-10 cs.CV

PICS: Pairwise Image Compositing with Spatial Interactions

Hang Zhou, Xinxin Zuo, Sen Wang, Li Cheng

Comments ICLR 2026. Project page: https://ryanhangzhou.github.io/pics/ , code: https://github.com/RyanHangZhou/PICS

Abstract

Despite strong single-turn performance, diffusion-based image compositing often struggles to preserve coherent spatial relations in pairwise or sequential edits, where subsequent insertions may overwrite previously generated content and disrupt physical consistency. We introduce PICS, a self-supervised composition-by-decomposition paradigm that composes objects in parallel while explicitly modeling the compositional interactions among (fully-/partially-)visible objects and background. At its core, an Interaction Transformer employs mask-guided Mixture-of-Experts to route background, exclusive, and overlap regions to dedicated experts, with an adaptive α-blending strategy that infers a compatibility-aware fusion of overlapping objects while preserving boundary fidelity. To further enhance robustness to geometric variations, we incorporate geometry-aware augmentations covering both out-of-plane and in-plane pose changes of objects. Our method delivers superior pairwise compositing quality and substantially improved stability, with extensive evaluations across virtual try-on, indoor, and street scene settings showing consistent gains over state-of-the-art baselines. Code and data are available at https://github.com/RyanHangZhou/PICS

2603.06869 2026-03-10 cs.AI cs.CL cs.LG

Symmetry-Constrained Language-Guided Program Synthesis for Discovering Governing Equations from Noisy and Partial Observations

Mirza Samad Ahmed Baig, Syeda Anshrah Gillani

Comments 12 pages, 4 figures, 5 tables

Abstract

Discovering compact governing equations from experimental observations is one of the defining objectives of quantitative science, yet practical discovery pipelines routinely fail when measurements are noisy, relevant state variables are unobserved, or multiple symbolic structures explain the data equally well within statistical uncertainty. Here we introduce SymLang (Symmetry-constrained Language-guided equation discovery), a unified framework that brings together three previously separate ideas: (i) typed symmetry-constrained grammars that encode dimensional analysis, group-theoretic invariance, and parity constraints as hard production rules, eliminating on average 71.3% of candidate expression trees before any fitting; (ii) language-model-guided program synthesis in which a fine-tuned 7B-parameter proposer, conditioned on interpretable data descriptors, efficiently navigates the constrained search space; and (iii) MDL-regularized Bayesian model selection coupled with block-bootstrap stability analysis that quantifies structural uncertainty rather than committing to a single best equation. Across 133 dynamical systems spanning classical mechanics, electrodynamics, thermodynamics, population dynamics, and nonlinear oscillators, SymLang achieves an exact structural recovery rate of 83.7% under 10% observational noise - a 22.4 percentage-point improvement over the next-best baseline - while reducing out-of-distribution extrapolation error by 61% and near-eliminating conservation-law violations ($3.1 \times 10^{-3}$ vs. $187.3 \times 10^{-3}$ physical drift for the closest competitor). In all tested regimes the framework correctly identifies structural degeneracy, reporting it explicitly rather than returning a confidently wrong single equation. The framework is fully open-source and reproducible, providing a principled pathway from raw data to interpretable, physically auditable symbolic laws.

2603.06864 2026-03-10 cs.RO

Robodimm: A Physics-Grounded Framework for Automated Actuator Sizing in Scalable Modular Robots

J. L. Torres, M. Munoz, J. D. Alvarez, J. L. Blanco, A. Gimenez

Comments 8 pages, 3 figures. Preprint version submitted to arXiv

Abstract

Selecting an appropriate motor-gearbox combination is a critical design task in robotics because it directly affects cost, mass, and dynamic performance. This process is especially challenging in modular robots with closed kinematic chains, where joint torques are coupled and actuator inertia propagates through the mechanism. We present Robodimm, a software framework for automated actuator sizing in scalable robot architectures. By leveraging Pinocchio for dynamics and Pink for inverse kinematics, Robodimm uses a Karush-Kuhn-Tucker (KKT) formulation for constrained inverse dynamics. The platform supports parametric scaling, interactive trajectory programming through jog modes, and a two-round validation workflow that addresses actuator self-weight effects.

2603.06863 2026-03-10 cs.CV cs.AI

A prior information informed learning architecture for flying trajectory prediction

Xianda Huang, Zidong Han, Ruibo Jin, Zhenyu Wang, Wenyu Li, Xiaoyang Li, Yi Gong

Abstract

Trajectory prediction for flying objects is critical in domains ranging from sports analytics to aerospace. However, traditional methods struggle with complex physical modeling, computational inefficiencies, and high hardware demands, often neglecting critical trajectory events like landing points. This paper introduces a novel, hardware-efficient trajectory prediction framework that integrates environmental priors with a Dual-Transformer-Cascaded (DTC) architecture. We demonstrate this approach by predicting the landing points of tennis balls in real-world outdoor courts. Using a single industrial camera and YOLO-based detection, we extract high-speed flight coordinates. These coordinates, fused with structural environmental priors (e.g., court boundaries), form a comprehensive dataset fed into our proposed DTC model. A first-level Transformer classifies the trajectory, while a second-level Transformer synthesizes these features to precisely predict the landing point. Extensive ablation and comparative experiments demonstrate that integrating environmental priors within the DTC architecture significantly outperforms existing trajectory prediction frameworks.

2603.06860 2026-03-10 cs.CV

ColonSplat: Reconstruction of Peristaltic Motion in Colonoscopy with Dynamic Gaussian Splatting

Weronika Smolak-Dyżewska, Joanna Kaleta, Diego Dall'Alba, Przemysław Spurek

Abstract

Accurate 3D reconstruction of colonoscopy data, accounting for complex peristaltic movements, is crucial for advanced surgical navigation and retrospective diagnostics. While recent novel view synthesis and 3D reconstruction methods have demonstrated remarkable success in general endoscopic scenarios, they struggle in the highly constrained environment of the colon. Due to the limited field of view of a camera moving through an actively deforming tubular structure, existing endoscopic methods reconstruct the colon appearance only for the initial camera trajectory. However, the underlying anatomy remains largely static; instead of updating Gaussians' spatial coordinates (xyz), these methods encode deformation through rotation, scale, or opacity adjustments. In this paper, we first present a benchmark analysis of state-of-the-art dynamic endoscopic methods for realistic colonoscopic scenes, showing that they fail to model true anatomical motion. To enable rigorous evaluation of global reconstruction quality, we introduce DynamicColon, a synthetic dataset with ground-truth point clouds at every timestep. Building on these insights, we propose ColonSplat, a dynamic Gaussian Splatting framework that captures peristaltic-like motion while preserving global geometric consistency, achieving superior geometric fidelity on the C3VDv2 and DynamicColon datasets. Project page: https://wmito.github.io/ColonSplat

2603.06854 2026-03-10 cs.SD cs.AI

Are Audio-Language Models Listening? Audio-Specialist Heads for Adaptive Audio Steering

Neta Glazer, Lenny Aharon, Ethan Fetaya

Abstract

Multimodal large language models can exhibit text dominance, over-relying on linguistic priors instead of grounding predictions in non-text inputs. One example is large audio-language models (LALMs) where decisive audio evidence can be under-utilized even when it contains important information. To address this issue we use mechanistic interpretability to identify a small set of audio-specialist attention heads whose audio attention yields a "listening" signal. We show that this signal increases when audio evidence affects the model's output, providing an indicator of audio engagement under standard prompting. Leveraging this localization, we construct an audio-silence steering direction and apply an inference-time activation intervention to the final representation, amplifying the model's audio effect. To demonstrate the utility of this intervention, we show on MMAU that this improves accuracy by up to +8.0 percentage points on two Qwen-based LALMs, without any parameter updates.
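
The steering step described in this abstract can be illustrated with a minimal sketch. It assumes the direction is a difference of mean activations between audio and silence inputs and that the intervention is a simple additive shift; the abstract does not confirm either detail, and the function names and the scale `alpha` are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_direction(audio_acts, silence_acts):
    """Assumed construction: mean(audio activations) - mean(silence activations)."""
    mu_audio = mean_vector(audio_acts)
    mu_silence = mean_vector(silence_acts)
    return [a - s for a, s in zip(mu_audio, mu_silence)]


def steer(hidden, direction, alpha):
    """Assumed intervention: shift the final representation along the direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations (made up for illustration only).
audio_acts = [[1.0, 2.0], [3.0, 4.0]]
silence_acts = [[0.0, 1.0], [0.0, 1.0]]
direction = steering_direction(audio_acts, silence_acts)
steered = steer([1.0, 1.0], direction, alpha=0.5)
```

No weights change in this sketch; only the representation is shifted at inference time, which matches the abstract's claim of an intervention without parameter updates.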

2603.06853 2026-03-10 cs.CV math.AT

An Extended Topological Model For High-Contrast Optical Flow

Brad Turow, Jose A. Perea

Comments 28 pages, 31 figures

Abstract

In this paper, we identify low-dimensional models for dense core subsets in the space of $3\times 3$ high-contrast optical flow patches sampled from the Sintel dataset. In particular, we leverage the theory of approximate and discrete circle bundles to identify a 3-manifold whose boundary is a previously proposed optical flow torus, together with disjoint circles corresponding to pairs of binary step-edge range image patches. The 3-manifold model we introduce provides an explanation for why the previously-proposed torus model could not be verified with direct methods (e.g., a straightforward persistent homology computation). We also demonstrate that nearly all optical flow patches in the top 1 percent by contrast norm are found near the family of binary step-edge circles described above, rather than the optical flow torus, and that these frequently occurring patches are concentrated near motion boundaries (which are of particular importance for computer vision tasks such as object segmentation and tracking). Our findings offer insights on the subtle interplay between topology and geometry in inference for visual data.

2603.06852 2026-03-10 cs.CV

Active View Selection with Perturbed Gaussian Ensemble for Tomographic Reconstruction

Yulun Wu, Ruyi Zha, Wei Cao, Yingying Li, Yuanhao Cai, Yaoyao Liu

Abstract

Sparse-view computed tomography (CT) is critical for reducing radiation exposure to patients. Recent advances in radiative 3D Gaussian Splatting (3DGS) have enabled fast and accurate sparse-view CT reconstruction. Despite these algorithmic advancements, practical reconstruction fidelity remains fundamentally bounded by the quality of the captured data, raising the crucial yet underexplored problem of X-ray active view selection. Existing active view selection methods are primarily designed for natural-light scenes and fail to capture the unique geometric ambiguities and physical attenuation properties inherent in X-ray imaging. In this paper, we present Perturbed Gaussian Ensemble, an active view selection framework that integrates uncertainty modeling with sequential decision-making, tailored for X-ray Gaussian Splatting. Specifically, we identify low-density Gaussian primitives that are likely to be uncertain and apply stochastic density scaling to construct an ensemble of plausible Gaussian density fields. For each candidate projection, we measure the structural variance of the ensemble predictions and select the one with the highest variance as the next best view. Extensive experimental results on arbitrary-trajectory CT benchmarks demonstrate that our density-guided perturbation strategy effectively eliminates geometric artifacts and consistently outperforms existing baselines in progressive tomographic reconstruction under unified view selection protocols.
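
The selection rule in this abstract (score each candidate projection by the disagreement of the ensemble, then pick the most uncertain one) can be sketched as follows. This is an illustration under stated assumptions: plain per-pixel population variance stands in for the paper's "structural variance", whose definition the abstract does not give, and the data layout and function names are hypothetical.

```python
from statistics import pvariance


def view_score(member_images):
    """Mean per-pixel variance across ensemble members.

    member_images: list of flattened projections, one per ensemble member.
    """
    per_pixel_values = zip(*member_images)
    variances = [pvariance(values) for values in per_pixel_values]
    return sum(variances) / len(variances)


def select_next_view(ensemble_predictions):
    """Return the candidate view whose ensemble predictions disagree the most.

    ensemble_predictions: dict mapping a view id to its list of member images.
    """
    return max(ensemble_predictions, key=lambda v: view_score(ensemble_predictions[v]))


# Toy example: two candidate views, two ensemble members, two pixels each.
candidates = {
    "view_a": [[0.0, 0.0], [0.0, 0.0]],  # members agree
    "view_b": [[0.0, 1.0], [1.0, 0.0]],  # members disagree
}
next_view = select_next_view(candidates)
```

In a sequential setting the chosen view would be acquired, the reconstruction updated, and the scoring repeated over the remaining candidates.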

2603.06850 2026-03-10 cs.RO

Nonlinear Performance Degradation of Vision-Based Teleoperation under Network Latency

Aws Khalil, Jaerock Kwon

Abstract

Teleoperation is increasingly being adopted as a critical fallback for autonomous vehicles. However, the impact of network latency on vision-based, perception-driven control remains insufficiently studied. The present work investigates the nonlinear degradation of closed-loop stability in camera-based lane keeping under varying network delays. To conduct this study, we developed the Latency-Aware Vision Teleoperation testbed (LAVT), a research-oriented ROS 2 framework that enables precise, distributed one-way latency measurement and reproducible delay injection. Using LAVT, we performed 180 closed-loop experiments in simulation across diverse road geometries. Our findings reveal a sharp collapse in stability between 150 ms and 225 ms of one-way perception latency, where route completion rates drop from 100% to below 50% as oscillatory instability and phase-lag effects emerge. We further demonstrate that additional control-channel delay compounds these effects, significantly accelerating system failure even under constant visual latency. By combining this systematic empirical characterization with the LAVT testbed, this work provides quantitative insights into perception-driven instability and establishes a reproducible baseline for future latency-compensation and predictive control strategies. Project page, supplementary video, and code are available at https://bimilab.github.io/paper-LAVT

2603.06846 2026-03-10 cs.CV cs.RO

MotionBits: Video Segmentation through Motion-Level Analysis of Rigid Bodies

Howard H. Qian, Kejia Ren, Yu Xiang, Vicente Ordonez, Kaiyu Hang

Comments 23 pages, 18 figures

Abstract

Rigid bodies constitute the smallest manipulable elements in the real world, and understanding how they physically interact is fundamental to embodied reasoning and robotic manipulation. Thus, accurate detection, segmentation, and tracking of moving rigid bodies is essential for enabling reasoning modules to interpret and act in diverse environments. However, current segmentation models trained on semantic grouping are limited in their ability to provide meaningful interaction-level cues for completing embodied tasks. To address this gap, we introduce MotionBit, a novel concept that, unlike prior formulations, defines the smallest unit in motion-based segmentation through kinematic spatial twist equivalence, independent of semantics. In this paper, we contribute (1) the MotionBit concept and definition, (2) a hand-labeled benchmark, called MoRiBo, for evaluating moving rigid-body segmentation across robotic manipulation and human-in-the-wild videos, and (3) a learning-free graph-based MotionBits segmentation method that outperforms state-of-the-art embodied perception methods by 37.3% in macro-averaged mIoU on the MoRiBo benchmark. Finally, we demonstrate the effectiveness of MotionBits segmentation for downstream embodied reasoning and manipulation tasks, highlighting its importance as a fundamental primitive for understanding physical interactions.

2603.06842 2026-03-10 cs.RO

RoboCritics: Enabling Reliable End-to-End LLM Robot Programming through Expert-Informed Critics

Callie Y. Kim, Nathan Thomas White, Evan He, Frederic Sala, Bilge Mutlu

Comments 10 pages, 5 figures, Proceedings of the 21st ACM/IEEE International Conference on Human Robot Interaction (HRI 2026)

Abstract

End-user robot programming grants users the flexibility to re-task robots in situ, yet it remains challenging for novices due to the need for specialized robotics knowledge. Large Language Models (LLMs) hold the potential to lower the barrier to robot programming by enabling task specification through natural language. However, current LLM-based approaches generate opaque, "black-box" code that is difficult to verify or debug, creating tangible safety and reliability risks in physical systems. We present RoboCritics, an approach that augments LLM-based robot programming with expert-informed motion-level critics. These critics encode robotics expertise to analyze motion-level execution traces for issues such as joint speed violations, collisions, and unsafe end-effector poses. When violations are detected, critics surface transparent feedback and offer one-click fixes that forward structured messages back to the LLM, enabling iterative refinement while keeping users in the loop. We instantiated RoboCritics in a web-based interface connected to a UR3e robot and evaluated it in a between-subjects user study (n=18). Compared to a baseline LLM interface, RoboCritics reduced safety violations, improved execution quality, and shaped how participants verified and refined their programs. Our findings demonstrate that RoboCritics enables more reliable and user-centered end-to-end robot programming with LLMs.

2603.06836 2026-03-10 cs.CL cs.GL

Validation of a Small Language Model for DSM-5 Substance Category Classification in Child Welfare Records

Brian E. Perron, Dragan Stoll, Bryan G. Victor, Zia Qia, Andreas Jud, Joseph P. Ryan

Abstract

Background: Recent studies have demonstrated that large language models (LLMs) can perform binary classification tasks on child welfare narratives, detecting the presence or absence of constructs such as substance-related problems, domestic violence, and firearms involvement. Whether smaller, locally deployable models can move beyond binary detection to classify specific substance types from these narratives remains untested. Objective: To validate a locally hosted LLM classifier for identifying specific substance types aligned with DSM-5 categories in child welfare investigation narratives. Methods: A locally hosted 20-billion-parameter LLM classified child maltreatment investigation narratives from a Midwestern U.S. state. Records previously identified as containing substance-related problems were passed to a second classification stage targeting seven DSM-5 substance categories. Expert human review of 900 stratified cases assessed classification precision, recall, and inter-method reliability (Cohen's kappa). Test-retest stability was evaluated using approximately 15,000 independently classified records. Results: Five substance categories achieved almost perfect inter-method agreement (kappa = 0.94-1.00): alcohol, cannabis, opioid, stimulant, and sedative/hypnotic/anxiolytic. Classification precision ranged from 92% to 100% for these categories. Two low-prevalence categories (hallucinogen, inhalant) performed poorly. Test-retest agreement ranged from 92.1% to 99.1% across the seven categories. Conclusions: A small, locally hosted LLM can reliably classify substance types from child welfare administrative text, extending prior work on binary classification to multi-label substance identification.
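
For readers unfamiliar with the inter-method reliability statistic reported here, the following is a minimal sketch of Cohen's kappa for two label sequences (for example, model output versus expert review of one substance category). The toy labels are made up for illustration and are not data from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each rater's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical binary labels: 1 = category present, 0 = absent.
model_labels = [1, 1, 0, 0, 1, 0, 1, 0]
expert_labels = [1, 1, 0, 0, 1, 0, 0, 0]
kappa = cohens_kappa(model_labels, expert_labels)
```

Here the raters agree on 7 of 8 items (0.875) while chance agreement is 0.5, so kappa is 0.75. Values above roughly 0.80 are conventionally read as almost perfect agreement, which is the band the abstract reports for five categories.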