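arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.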

2604.04843 2026-04-07 cs.CV cs.AI

InfBaGel: Human-Object-Scene Interaction Generation with Dynamic Perception and Iterative Refinement

Yude Zou, Junji Gong, Xing Gao, Zixuan Li, Tianxing Chen, Guanjie Zheng

Comments ICLR 2026

Abstract

Human-object-scene interaction (HOSI) generation has broad applications in embodied AI, simulation, and animation. Unlike human-object interaction (HOI) and human-scene interaction (HSI), HOSI generation requires reasoning over dynamic object-scene changes, yet suffers from limited annotated data. To address these issues, we propose a coarse-to-fine instruction-conditioned interaction generation framework that is explicitly aligned with the iterative denoising process of a consistency model. In particular, we adopt a dynamic perception strategy that leverages trajectories from the preceding refinement to update scene context and condition subsequent refinement at each denoising step of the consistency model, yielding consistent interactions. To further reduce physical artifacts, we introduce a bump-aware guidance that mitigates collisions and penetrations during sampling without requiring fine-grained scene geometry, enabling real-time generation. To overcome data scarcity, we design a hybrid training strategy that synthesizes pseudo-HOSI samples by injecting voxelized scene occupancy into HOI datasets and jointly trains with high-fidelity HSI data, allowing interaction learning while preserving realistic scene awareness. Extensive experiments demonstrate that our method achieves state-of-the-art performance in both HOSI and HOI generation, and strong generalization to unseen scenes. Project page: https://yudezou.github.io/InfBaGel-page/
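
As a rough illustration of the "voxelized scene occupancy" mentioned in the abstract, the sketch below marks which cells of a regular grid contain scene points. The helper name `voxelize`, the grid origin, the voxel size, and the grid dimensions are all assumptions made for this example; the abstract does not say how the paper builds or injects its occupancy grids.

```python
def voxelize(points, origin, voxel_size, dims):
    """Return the set of occupied (i, j, k) voxel indices for 3D points.

    Hypothetical helper for illustration; points outside the grid are ignored.
    """
    occupied = set()
    for x, y, z in points:
        idx = (
            int((x - origin[0]) // voxel_size),
            int((y - origin[1]) // voxel_size),
            int((z - origin[2]) // voxel_size),
        )
        # Keep the voxel only if it lies inside the grid bounds.
        if all(0 <= i < d for i, d in zip(idx, dims)):
            occupied.add(idx)
    return occupied


# Toy scene: two nearby points, one far corner point, one point outside the grid.
scene_points = [(0.1, 0.1, 0.1), (0.15, 0.12, 0.05), (0.9, 0.9, 0.9), (2.0, 0.0, 0.0)]
grid = voxelize(scene_points, origin=(0.0, 0.0, 0.0), voxel_size=0.5, dims=(2, 2, 2))
print(sorted(grid))  # [(0, 0, 0), (1, 1, 1)]
```

A coarse grid like this needs no mesh or fine-grained geometry, which is the property the abstract highlights for its collision guidance.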

2604.04842 2026-04-07 cs.CL

Do No Harm: Exposing Hidden Vulnerabilities of LLMs via Persona-based Client Simulation Attack in Psychological Counseling

Qingyang Xu, Yaling Shen, Stephanie Fong, Zimu Wang, Yiwen Jiang, Xiangyu Zhao, Jiahe Liu, Zhongxing Xu, Vincent Lee, Zongyuan Ge

2604.04841 2026-04-07 cs.SD eess.AS eess.SP

Joint Fullband-Subband Modeling for High-Resolution SingFake Detection

Xuanjun Chen, Chia-Yu Hu, Sung-Feng Huang, Haibin Wu, Hung-yi Lee, Jyh-Shing Roger Jang

Comments Submitted to INTERSPEECH 2026

2604.04839 2026-04-07 cs.CL

MERIT: Multilingual Expert-Reward Informed Tuning for Chinese-Centric Low-Resource Machine Translation

Zhixiang Lu, Chong Zhang, Chenyu Xue, Angelos Stefanidis, Chong Li, Jionglong Su, Zhengyong Jiang

2604.04834 2026-04-07 cs.CV cs.MM cs.RO eess.IV

E-VLA: Event-Augmented Vision-Language-Action Model for Dark and Blurred Scenes

Jiajun Zhai, Hao Shi, Shangwei Guo, Kailun Yang, Kaiwei Wang

Comments Code and dataset will be available at https://github.com/JJayzee/E-VLA

2604.04826 2026-04-07 cs.RO

Efficient Multi-Objective Planning with Weighted Maximization Using Large Neighbourhood Search

Krishna Kalavadia, Shamak Dutta, Yash Vardhan Pant, Stephen L. Smith

2604.04820 2026-04-07 cs.AI cs.CL

ANX: Protocol-First Design for AI Agent Interaction with a Supporting 3EX Decoupled Architecture

Xu Mingze

Comments This open-source AI agent interaction protocol (ANX) is benchmarked against existing protocols (MCP, A2A, ANP, OpenCLI, SkillWeaver, CHEQ, COLLAB-LLM) across four dimensions: tooling, discovery, security, and multi-agent SOP collaboration. Code: https://github.com/mountorc/anx-protocol

2604.04811 2026-04-07 cs.RO cs.CV cs.HC

AnyUser: Translating Sketched User Intent into Domestic Robots

Songyuan Yang, Huibin Tan, Kailun Yang, Wenjing Yang, Shaowu Yang

Comments Accepted to IEEE Transactions on Robotics (T-RO)

2604.04808 2026-04-07 cs.LG cs.AI

Selecting Decision-Relevant Concepts in Reinforcement Learning

Naveen Raman, Stephanie Milani, Fei Fang

Comments 16 pages, 13 figures

2604.04800 2026-04-07 cs.LG cs.CR

Forgetting to Witness: Efficient Federated Unlearning and Its Visible Evaluation

Houzhe Wang, Xiaojie Zhu, Chi Chen

2604.04797 2026-04-07 cs.CV cs.LG

Multi-Modal Sensor Fusion using Hybrid Attention for Autonomous Driving

Mayank Mayank, Bharanidhar Duraisamy, Florian Geiß, Abhinav Valada

Comments 9 pages, 8 figures

2604.04791 2026-04-07 cs.CL

How Far Are We? Systematic Evaluation of LLMs vs. Human Experts in Mathematical Contest in Modeling

Yuhang Liu, Heyan Huang, Yizhe Yang, Hongyan Zhao, Zhizhuo Zeng, Yang Gao

2604.04790 2026-04-07 cs.CL cs.LG

HUKUKBERT: Domain-Specific Language Model for Turkish Law

Mehmet Utku Öztürk, Tansu Türkoğlu, Buse Buz-Yalug

Comments 15 pages

2604.04780 2026-04-07 cs.CV

CLEAR: Unlocking Generative Potential for Degraded Image Understanding in Unified Multimodal Models

Xiangzhao Hao, Zefeng Zhang, Zhenyu Zhang, Linhao Yu, Yao Chen, Yiqian Zhang, Haiyun Guo, Shuohuan Wang, Yu Sun

2604.04767 2026-04-07 cs.LG cs.AI cs.CL

Cog-DRIFT: Exploration on Adaptively Reformulated Instances Enables Learning from Hard Reasoning Problems

Justin Chih-Yao Chen, Archiki Prasad, Zaid Khan, Joykirat Singh, Runchu Tian, Elias Stengel-Eskin, Mohit Bansal

Comments 22 pages, 4 figures. Code: https://github.com/dinobby/Cog-DRIFT

Abstract

Reinforcement learning from verifiable rewards (RLVR) has improved the reasoning abilities of LLMs, yet a fundamental limitation remains: models cannot learn from problems that are too difficult to solve under their current policy, as these yield no meaningful reward signal. We propose a simple yet effective solution based on task reformulation. We transform challenging open-ended problems into cognitively simpler variants -- such as multiple-choice and cloze formats -- that preserve the original answer while reducing the effective search space and providing denser learning signals. These reformulations span a spectrum from discriminative to generative tasks, which we exploit to bootstrap learning: models first learn from structured, easier formats, and this knowledge transfers back to improve performance on the original open-ended problems. Building on this insight, we introduce Cog-DRIFT, a framework that constructs reformulated variants and organizes them into an adaptive curriculum based on difficulty. Training progresses from easier to harder formats, enabling the model to learn from problems that previously yielded zero signal under standard RL post-training. Cog-DRIFT not only improves on the originally unsolvable hard problems (absolute +10.11% for Qwen and +8.64% for Llama) but also generalizes well to other held-out datasets. Across 2 models and 6 reasoning benchmarks, our method consistently outperforms standard GRPO and strong guided-exploration baselines. On average, Cog-DRIFT shows +4.72% (Qwen) and +3.23% (Llama) improvements over the second-best baseline. We further show that Cog-DRIFT improves pass@k at test time, and the curriculum improves sample efficiency. Overall, our results highlight task reformulation and curriculum learning as an effective paradigm for overcoming the exploration barrier in LLM post-training.
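
The reformulation idea can be sketched in a few lines: the same question is wrapped as a multiple-choice or cloze variant whose gold answer is unchanged, and training moves to a harder format once the easier one is being solved. Everything named here (`to_multiple_choice`, `to_cloze`, `next_format`, the format order, and the 0.5 solve-rate threshold) is a hypothetical stand-in; the abstract does not specify how Cog-DRIFT builds its variants or schedules its curriculum.

```python
import random

# Assumed easy-to-hard ordering of formats (discriminative to generative).
CURRICULUM = ["multiple_choice", "cloze", "open_ended"]


def to_multiple_choice(question, answer, distractors, rng):
    """Wrap an open-ended question as multiple-choice; the gold answer is preserved."""
    options = distractors + [answer]
    rng.shuffle(options)
    letters = "ABCDEFGH"
    lines = [f"{letters[i]}. {opt}" for i, opt in enumerate(options)]
    return {
        "format": "multiple_choice",
        "prompt": question + "\n" + "\n".join(lines),
        "gold": letters[options.index(answer)],
        "answer": answer,
    }


def to_cloze(question, answer, n_hidden=1):
    """Reveal all but the last n_hidden characters of the answer as a hint."""
    hint = answer[:-n_hidden] + "_" * n_hidden
    return {
        "format": "cloze",
        "prompt": f"{question}\nFill in the blank: {hint}",
        "gold": answer,
        "answer": answer,
    }


def next_format(current, solve_rate, threshold=0.5):
    """Advance to the next harder format once the solve rate reaches the threshold."""
    i = CURRICULUM.index(current)
    if solve_rate >= threshold and i < len(CURRICULUM) - 1:
        return CURRICULUM[i + 1]
    return current


rng = random.Random(0)
mc = to_multiple_choice("What is 17 * 24?", "408", ["398", "418", "428"], rng)
cz = to_cloze("What is 17 * 24?", "408")
print(mc["prompt"])
print(cz["prompt"])
print(next_format("multiple_choice", solve_rate=0.7))
```

With four options, a random guess is correct a quarter of the time, so even a policy that never solves the open-ended version can receive some reward signal on the multiple-choice variant.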

2604.04749 2026-04-07 cs.AI

AI Trust OS -- A Continuous Governance Framework for Autonomous AI Observability and Zero-Trust Compliance in Enterprise Environments

Eranga Bandara, Asanga Gunaratna, Ross Gore, Abdul Rahman, Ravi Mukkamala, Sachin Shetty, Sachini Rajapakse, Isurunima Kularathna, Peter Foytik, Safdar H. Bouk, Xueping Liang, Amin Hass, Ng Wee Keong, Kasun De Zoysa

2604.04743 2026-04-07 cs.CL cs.AI cs.SY eess.SY

Hallucination Basins: A Dynamic Framework for Understanding and Controlling LLM Hallucinations

Kalyan Cherukuri, Lav R. Varshney

2604.04736 2026-04-07 cs.LG cs.AI cs.DC

Sampling Parallelism for Fast and Efficient Bayesian Learning

Asena Karolin Özdemir, Lars H. Heyen, Arvid Weyrauch, Achim Streit, Markus Götz, Charlotte Debus

Comments 12 pages, 10 figures, 1 table

Abstract

Machine learning models, and deep neural networks in particular, are increasingly deployed in risk-sensitive domains such as healthcare, environmental forecasting, and finance, where reliable quantification of predictive uncertainty is essential. However, many uncertainty quantification (UQ) methods remain difficult to apply due to their substantial computational cost. Sampling-based Bayesian learning approaches, such as Bayesian neural networks (BNNs), are particularly expensive since drawing and evaluating multiple parameter samples rapidly exhausts memory and compute resources. These constraints have limited the accessibility and exploration of Bayesian techniques thus far. To address these challenges, we introduce sampling parallelism, a simple yet powerful parallelization strategy that targets the primary bottleneck of sampling-based Bayesian learning: the samples themselves. By distributing sample evaluations across multiple GPUs, our method reduces memory pressure and training time without requiring architectural changes or extensive hyperparameter tuning. We detail the methodology and evaluate its performance on a few example tasks and architectures, comparing against distributed data parallelism (DDP) as a baseline. We further demonstrate that sampling parallelism is complementary to existing strategies by implementing a hybrid approach that combines sample and data parallelism. Our experiments show near-perfect scaling when the sample number is scaled proportionally to the computational resources, confirming that sample evaluations parallelize cleanly. Although DDP achieves better raw speedups under scaling with constant workload, sampling parallelism has a notable advantage: by applying independent stochastic augmentations to the same batch on each GPU, it increases augmentation diversity and thus reduces the number of epochs required for convergence.
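
To make the idea of parallelizing over samples concrete, here is a toy sketch: the parameter samples of a Monte Carlo predictive mean are split across workers, each worker sums the predictions for its own samples, and the partial sums are combined. The 1-D linear model, the Gaussian posterior, and all function names are assumptions for illustration. The workers run sequentially here, whereas the paper distributes them across GPUs, and its actual implementation is not described in the abstract.

```python
import random


def sample_weight(seed):
    """Draw one parameter sample w ~ N(2.0, 0.1^2) for a toy 1-D linear model."""
    return random.Random(seed).gauss(2.0, 0.1)


def predict(w, x):
    """Toy model: y = w * x."""
    return w * x


def shard(sample_ids, num_workers):
    """Round-robin assignment of sample indices to workers."""
    return [sample_ids[r::num_workers] for r in range(num_workers)]


def worker_partial_sum(ids, x):
    """What a single worker would compute on its own share of the samples."""
    return sum(predict(sample_weight(s), x) for s in ids)


def predictive_mean(x, num_samples, num_workers):
    """Monte Carlo predictive mean with the samples sharded across workers."""
    shards = shard(list(range(num_samples)), num_workers)
    # In a real setup each partial sum would run concurrently on its own device.
    partials = [worker_partial_sum(ids, x) for ids in shards]
    # Summing the partials stands in for an all-reduce across devices.
    return sum(partials) / num_samples


serial = predictive_mean(3.0, num_samples=8, num_workers=1)
sharded = predictive_mean(3.0, num_samples=8, num_workers=4)
print(serial, sharded)
```

Because each worker holds only its own samples, per-device memory scales with the number of samples divided by the number of workers, while the combined estimate matches the serial one up to floating-point summation order.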

2604.04735 2026-04-07 cs.CL

Lighting Up or Dimming Down? Exploring Dark Patterns of LLMs in Co-Creativity

Zhu Li, Jiaming Qu, Yuan Chang

2604.04732 2026-04-07 cs.CL cs.AI

Metaphors We Compute By: A Computational Audit of Cultural Translation vs. Thinking in LLMs

Yuan Chang, Jiaming Qu, Zhu Li

2604.04723 2026-04-07 cs.CL cs.AI

Individual and Combined Effects of English as a Second Language and Typos on LLM Performance

Serena Liu, Yutong Yang, Prisha Sheth, Weixuan Dong, Mingjiao Diao, Xinru Zhu, Nikhil Banga, Oscar Melendez, Arnav Sharma, Minda Zhao, Marina Lin, Mengyu Wang

2604.04722 2026-04-07 cs.CV

Don't Waste Bits! Adaptive KV-Cache Quantization for Lightweight On-Device LLMs

Sayed Pedram Haeri Boroujeni, Niloufar Mehrabi, Patrick Woods, Gabriel Hillesheim, Abolfazl Razi

Comments Accepted by the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

Abstract

Large Language Models (LLMs) have achieved remarkable progress across reasoning, generation, and decision-making tasks, yet deploying them on mobile, embedded, and edge devices remains particularly challenging. On-device LLM inference is heavily constrained by the memory and bandwidth overhead of the key-value (KV) cache, which grows linearly with context length and often dominates decoding cost. Existing KV-cache quantization schemes typically rely on fixed precision or hand-crafted heuristics, thereby wasting bits on low-impact tokens while over-compressing informative ones, leading to avoidable accuracy degradation. Inspired by Huffman coding's principle of variable-length allocation, we propose adaptive KV-cache quantization, a learned policy that assigns bit-width proportional to token importance, minimizing expected memory and latency without sacrificing competitive accuracy. Our framework extracts lightweight token-level features, including token frequency, quality score, attention variance, and entropy-based uncertainty, and feeds them into a compact data-driven controller that dynamically selects KV precision from {2-bit, 4-bit, 8-bit, FP16} during decoding. This adaptive precision policy reduces KV memory footprint and latency while improving accuracy compared to static KV quantization and rule-based baselines, and maintaining competitive accuracy close to FP16 inference across standard LLM benchmarks. Extensive experiments across multiple commonsense reasoning benchmarks using SmolLM-135M, SmolLM-360M, and SmolLM-1.7B demonstrate that our controller consistently improves the accuracy-latency trade-off. For instance, with SmolLM-360M on HellaSwag, our method reduces decoding latency (ms/token) by 17.75% relative to static KV quantization, improves accuracy by 7.60 points, and remains within only 0.30 points of FP16 inference.
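
The precision-selection step can be sketched as follows: each token's KV vector gets a bit-width based on an importance score, and is then quantized at that width. Note the swap: the paper describes a learned, data-driven controller, while `choose_bits` below is a fixed threshold rule used only as a placeholder. The thresholds, the symmetric uniform quantizer, and the function names are all assumptions for this example.

```python
def choose_bits(importance):
    """Map an importance score in [0, 1] to a bit-width.

    Placeholder threshold rule; the paper uses a learned controller instead.
    """
    if importance < 0.25:
        return 2
    if importance < 0.5:
        return 4
    if importance < 0.75:
        return 8
    return 16  # stands in for keeping the vector in FP16


def quantize(vec, bits):
    """Uniform symmetric fake-quantization; 16 bits is treated as pass-through."""
    if bits >= 16:
        return list(vec)
    levels = 2 ** (bits - 1) - 1  # 1 for 2-bit, 7 for 4-bit, 127 for 8-bit
    scale = max(abs(v) for v in vec) / levels or 1.0  # avoid a zero scale
    return [round(v / scale) * scale for v in vec]


# Toy KV cache: (vector, importance score) per token.
kv_cache = [
    ([0.4, -1.0, 0.2], 0.1),  # low-impact token
    ([0.4, -1.0, 0.2], 0.6),  # moderately important token
    ([0.4, -1.0, 0.2], 0.9),  # informative token
]
bits_per_token = [choose_bits(score) for _, score in kv_cache]
quantized = [quantize(vec, b) for (vec, _), b in zip(kv_cache, bits_per_token)]
total_bits = sum(b * len(vec) for (vec, _), b in zip(kv_cache, bits_per_token))
print(bits_per_token, total_bits)
```

In this toy cache the mixed assignment stores 78 bits instead of the 144 bits needed to keep all three vectors at 16 bits, with the coarsest rounding confined to the token marked as least important.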

2604.04720 2026-04-07 cs.CL cs.AI

What Makes Good Multilingual Reasoning? Disentangling Reasoning Traces with Measurable Features

Dayeon Ki, Kevin Duh, Marine Carpuat

Comments 31 pages, 7 figures

2604.04717 2026-04-07 cs.LG cond-mat.mtrl-sci cs.AI stat.ML

The Infinite-Dimensional Nature of Spectroscopy and Why Models Succeed, Fail, and Mislead

Umberto Michelucci, Francesca Venturini

2604.04708 2026-04-07 cs.CL cs.AI

BiST: A Gold Standard Bangla-English Bilingual Corpus for Sentence Structure and Tense Classification with Inter-Annotator Agreement

Abdullah Al Shafi, Swapnil Kundu Argha, M. A. Moyeen, Abdul Muntakim, Shoumik Barman Polok

2604.04704 2026-04-07 cs.CL

IDIOLEX: Unified and Continuous Representations for Idiolectal and Stylistic Variation

Anjali Kantharuban, Aarohi Srivastava, Fahim Faisal, Orevaoghene Ahia, Antonios Anastasopoulos, David Chiang, Yulia Tsvetkov, Graham Neubig

2604.04701 2026-04-07 cs.LG cs.AI

MUXQ: Mixed-to-Uniform Precision MatriX Quantization via Low-Rank Outlier Decomposition

Seoungsub Lee, In Seo Kim, Seon Wook Kim

2604.04698 2026-04-07 cs.LG cs.CV

Explainable Machine Learning for Sepsis Outcome Prediction Using a Novel Romanian Electronic Health Record Dataset

Andrei-Alexandru Bunea, Ovidiu Ghibea, Dan-Matei Popovici, Ion Daniel, Octavian Andronic

2604.04693 2026-04-07 cs.CV

3D Gaussian Splatting for Annular Dark Field Scanning Transmission Electron Microscopy Tomography Reconstruction

Beiyuan Zhang, Hesong Li, Ruiwen Shao, Ying Fu

2604.04690 2026-04-07 cs.RO cs.AI

Pickalo: Leveraging 6D Pose Estimation for Low-Cost Industrial Bin Picking

Alessandro Tarsi, Matteo Mastrogiuseppe, Saverio Taliani, Simone Cortinovis, Ugo Pattacini