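arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.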

2604.21148 2026-04-27 cs.CL

"This Wasn't Made for Me": Recentering User Experience and Emotional Impact in the Evaluation of ASR Bias

Siyu Liang, Alicia Beckford Wassink

Abstract

Studies on bias in Automatic Speech Recognition (ASR) tend to focus on reporting error rates for speakers of underrepresented dialects, yet less research examines the human side of system bias: how do system failures shape users' lived experiences, how do users feel about and react to them, and what emotional toll do these repeated failures exact? We conducted user experience studies across four U.S. locations (Atlanta, Gulf Coast, Miami Beach, and Tucson) representing distinct English dialect communities. Our findings reveal that most participants report technologies fail to consider their cultural backgrounds and require constant adjustment to achieve basic functionality. Despite these experiences, participants maintain high expectations for ASR performance and express strong willingness to contribute to model improvement. Qualitative analysis of open-ended narratives exposes the deeper costs of these failures. Participants report frustration, annoyance, and feelings of inadequacy, yet the emotional impact extends beyond momentary reactions. Participants recognize that systems were not designed for them, yet often internalize failures as personal inadequacy despite this critical awareness. They perform extensive invisible labor, including code-switching, hyper-articulation, and emotional management, to make failing systems functional. Meanwhile, their linguistic and cultural knowledge remains unrecognized by technologies that encode particular varieties as standard while rendering others marginal. These findings demonstrate that algorithmic fairness assessments based on accuracy metrics alone miss critical dimensions of harm: the emotional labor of managing repeated technological rejection, the cognitive burden of constant self-monitoring, and the psychological toll of feeling inadequate in one's native language variety.

2604.21108 2026-04-27 cs.CL cs.LG

Machine learning and emoji prediction: How much accuracy can MARBERT achieve?

Mohammed Q. Shormani, Ibrahim Abdulmalik Hassan Muneef Y. Alshawsh

Comments 15 pages, 4 Figures, 3 Tables

2604.21026 2026-04-27 cs.LG

MCAP: Deployment-Time Layer Profiling for Memory-Constrained LLM Inference

Anurita Das

Comments Code available at https://github.com/genovationtech/nve

2604.20133 2026-04-27 cs.AI

EvoAgent: An Evolvable Agent Framework with Skill Learning and Multi-Agent Delegation

Aimin Zhang, Jiajing Guo, Fuwei Jia, Chen Lv, Boyu Wang, Fangzheng Li

2604.20046 2026-04-27 cs.CV

Gaussians on a Diet: High-Quality Memory-Bounded 3D Gaussian Splatting Training

Yangming Zhang, Jian Xu, Chaojian Li, Kunxiong Zhu, Wei Niu, Gagan Agrawal, Yang Katie Zhao, Jian Wang, Yingyan Celine Lin, Miao Yin

2604.19858 2026-04-27 cs.CV

Wan-Image: Pushing the Boundaries of Generative Visual Intelligence

Chaojie Mao, Chen-Wei Xie, Chongyang Zhong, Haoyou Deng, Jiaxing Zhao, Jie Xiao, Jinbo Xing, Jingfeng Zhang, Jingren Zhou, Jingyi Zhang, Jun Dan, Kai Zhu, Kang Zhao, Keyu Yan, Minghui Chen, Pandeng Li, Shuangle Chen, Tong Shen, Yu Liu, Yue Jiang, Yulin Pan, Yuxiang Tuo, Zeyinzi Jiang, Zhen Han, Ang Wang, Bang Zhang, Baole Ai, Bin Wen, Boang Feng, Feiwu Yu, Gang Wang, Haiming Zhao, He Kang, Jianjing Xiang, Jianyuan Zeng, Jinkai Wang, Junjie Zhou, Ke Sun, Linqian Wu, Pei Gong, Pingyu Wu, Ruiwen Wu, Tongtong Su, Wenmeng Zhou, Wenting Shen, Wenyuan Yu, Xianjun Xu, Xiaoming Huang, Xiejie Shen, Xin Xu, Yan Kou, Yangyu Lv, Yifan Zhai, Yitong Huang, Yun Zheng, Yuntao Hong, Zhe Zhang, Zhicheng Zhang

2604.18919 2026-04-27 cs.CL

Proposing Topic Models and Evaluation Frameworks for Analyzing Associations with External Outcomes: An Application to Leadership Analysis Using Large-Scale Corporate Review Data

Yura Yoshida, Masato Kanai, Masataka Nakayama, Haruki Ohsawa, Yukiko Uchida, Arata Yuminaga, Gakuse Hoshina, Nobuo Sayama

2604.18130 2026-04-27 cs.LG cs.CE stat.AP

An 'Inverse' Experimental Framework to Estimate Market Efficiency

Thomas Asikis, Heinrich H. Nax

Comments Minor fix: added co-author middle name for clarity

Abstract

Digital marketplaces processing billions of dollars annually represent critical infrastructure in sociotechnical ecosystems, yet their performance optimization lacks principled measurement frameworks that can inform algorithmic governance decisions regarding market efficiency and fairness from complex market data. By looking at orderbook data from double auction markets alone, because bids and asks do not represent true maximum willingnesses to buy and true minimum willingnesses to sell, there is little an economist can say about the market's actual performance in terms of allocative efficiency. We turn to experimental data to address this issue, 'inverting' the standard induced value approach of double auction experiments. Our aim is to predict key market features relevant to market efficiency, particularly allocative efficiency, using orderbook data only -- specifically bids, asks and price realizations, but not the induced reservation values -- as early as possible. Since there is no established model of strategically optimal behavior in these markets, and because orderbook data is highly unstructured, non-stationary and non-linear, we propose quantile-based normalization techniques that help us build general predictive models. We develop and train several models, including linear regressions and gradient boosting trees, leveraging quantile-based input from the underlying supply-demand model. Our models can predict allocative efficiency with reasonable accuracy from the earliest bids and asks, and these predictions improve with additional realized price data. The performance of the prediction techniques varies by target and market type. Our framework holds significant potential for application to real-world market data, offering valuable insights into market efficiency and performance, even prior to any trade realizations.
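
Editor's sketch (not from the paper): the abstract's prediction target, allocative efficiency, is conventionally defined in induced-value double auction experiments as realized surplus divided by the maximum attainable surplus. The Python below illustrates that textbook definition for single-unit traders. The function name, argument layout, and example numbers are hypothetical, and the code shows why the quantity cannot be computed from bids and asks alone: it needs the induced reservation values.

```python
def allocative_efficiency(buyer_values, seller_costs, trades):
    """Realized surplus divided by the maximum attainable surplus.

    buyer_values, seller_costs: induced reservation values, one unit per trader
        (assumed single-unit traders; this is an illustrative simplification).
    trades: list of (buyer_index, seller_index) pairs that actually traded.
    """
    # Maximum surplus: pair the highest-value buyers with the lowest-cost
    # sellers for as long as a pairing still creates positive surplus.
    values = sorted(buyer_values, reverse=True)
    costs = sorted(seller_costs)
    max_surplus = sum(v - c for v, c in zip(values, costs) if v > c)

    # Realized surplus depends only on who traded with whom, not on prices.
    realized = sum(buyer_values[b] - seller_costs[s] for b, s in trades)

    return realized / max_surplus if max_surplus > 0 else float("nan")


if __name__ == "__main__":
    buyer_values = [10, 8, 6, 4]   # hypothetical induced values
    seller_costs = [3, 5, 7, 9]    # hypothetical induced costs
    # Both high-value buyers trade with both low-cost sellers.
    print(allocative_efficiency(buyer_values, seller_costs, [(0, 1), (1, 0)]))
    # A low-value buyer displaces a high-value one, so some surplus is lost.
    print(allocative_efficiency(buyer_values, seller_costs, [(0, 0), (2, 1)]))
```

Transaction prices cancel out of the surplus sum, which is why the measure depends only on the induced values and the matching of traders.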

2604.17801 2026-04-27 cs.CV

View-Consistent 3D Scene Editing via Dual-Path Structural Correspondence and Semantic Continuity

Pufan Li, Bi'an Du, Shenghe Zheng, Junyi Yao, Wei Hu

Comments Preprint. 10 pages, 7 figures

2604.17706 2026-04-27 cs.RO

OmniVLA-RL: A Vision-Language-Action Model with Spatial Understanding and Online RL

Haoxiang Jie, Yaoyuan Yan, Xiangyu Wei, Kailin Wang, Hongjie Yan, Zhiyou Heng, Daocheng Chen

2604.17578 2026-04-27 cs.LG math.ST stat.TH

Recovery Guarantees for Continual Learning of Dependent Tasks: Memory, Data-Dependent Regularization, and Data-Dependent Weights

Liangzu Peng, Uday Kiran Reddy Tadipatri, Ziqing Xu, Eric Eaton, René Vidal

2604.16989 2026-04-27 cs.CL cs.AI cs.LG cs.LO

Bolzano: Case Studies in LLM-Assisted Mathematical Research

Martin Balko, Jan Grebík, Pavel Hubáček, Martin Koutecký, Matěj Kripner, Václav Rozhoň, Robert Šámal, Adrián Zámečník

Comments 33 pages, 1 figure. Project page: https://bolzano.app

2604.16915 2026-04-27 cs.CV

KIRA: Knowledge-Intensive Image Retrieval and Reasoning Architecture for Specialized Visual Domains

Parthaw Goswami, Jaynto Goswami Deep

Journal ref: CVPR 2026 2nd Workshop on Knowledge-Intensive Multimodal Reasoning

Abstract

Retrieval-augmented generation (RAG) has transformed text-based question answering, yet its extension to visual domains remains hindered by fundamental challenges: bridging the modality gap between image queries and text-heavy knowledge bases, constructing semantically meaningful visual knowledge bases, performing multi-hop reasoning over retrieved images, and verifying that generated answers are faithfully grounded in visual evidence. We present KIRA (Knowledge-Intensive Image Retrieval and Reasoning Architecture), a unified five-stage framework that addresses ten core problems in visual RAG for specialized domains. KIRA introduces: (1) hierarchical semantic chunking with DINO-based region detection for multi-granularity knowledge base construction, (2) domain-adaptive contrastive encoders with few-shot adaptation for rare visual concepts, (3) dual-path cross-modal retrieval with chain-of-thought query expansion, (4) chain-of-retrieval for multi-hop visual reasoning with temporal and multi-view support, and (5) evidence-conditioned grounded generation with post-hoc hallucination verification. We also propose DOMAINVQAR, a benchmark suite that evaluates visual RAG along three axes (retrieval precision, reasoning faithfulness, and domain correctness), going beyond standard recall metrics. Experiments across four specialized domains (medical X-ray, circuit diagrams, satellite imagery, and histopathology) with a progressive six-variant ablation demonstrate that KIRA achieves 0.97 retrieval precision, 1.0 grounding scores, and 0.707 domain correctness averaged across domains, while the ablation reveals actionable insights about when each component helps and when components introduce precision-diversity trade-offs that must be managed. Code will be released upon acceptance.

2604.16377 2026-04-27 cs.CL cs.CY

GoCoMA: Hyperbolic Multimodal Representation Fusion for Large Language Model-Generated Code Attribution

Nitin Choudhury, Bikrant Bikram Pratap Maurya, Bhavinkumar Vinodbhai Kuwar, Arun Balaji Buduru

Comments Accepted to the International Conference on Multimedia & Expo (ICME) 2026

2604.14651 2026-04-27 cs.CL

CURA: Clinical Uncertainty Risk Alignment for Language Model-Based Risk Prediction

Sizhe Wang, Ziqi Xu, Claire Najjuuko, Charles Alba, Chenyang Lu

Comments Accepted at ACL 2026 Main Conference

2604.14306 2026-04-27 cs.CL cs.AI

EuropeMedQA Study Protocol: A Multilingual, Multimodal Medical Examination Dataset for Language Model Evaluation

Francesco Andrea Causio, Vittorio De Vita, Olivia Riccomi, Michele Ferramola, Federico Felizzi, Alessandro Tosi, Antonio Cristiano, Lorenzo De Mori, Chiara Battipaglia, Melissa Sawaya, Luigi De Angelis, Marcello Di Pumpo, Alessandra Piscitelli, Pietro Eric Risuleo, Alessia Longo, Giulia Vojvodic, Mariapia Vassalli, Bianca Destro Castaniti, Nicolò Scarsi, Manuel Del Medico

2604.13847 2026-04-27 cs.LG cs.AI

SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention

Hongtao Xu, Jianchao Tan, Yuxuan Hu, Pengju Lu, Hongyu Wang, Pingwei Sun, Yerui Sun, Yuchen Xie, Xunliang Cai, Mingzhen Li, Weile Jia

2604.12293 2026-04-27 cs.RO

Defining an Evaluation Method for External Human-Machine Interfaces

Jose Gonzalez-Belmonte, Jaerock Kwon

Comments 62 pages, 8 figures, 26 tables

2604.11406 2026-04-27 cs.RO

Using Unwrapped Full Color Space Recording to Measure the Exposedness of Vehicle Exterior Parts for External Human Machine Interfaces

Jose Gonzalez-Belmonte, Jaerock Kwon

Comments 10 pages, 13 figures

2604.11103 2026-04-27 cs.SD cs.AI

ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing

Xi Chen, Wei Xue, Yike Guo

2604.10649 2026-04-27 cs.LG cs.CL

SpectralLoRA: Is Low-Frequency Structure Sufficient for LoRA Adaptation? A Spectral Analysis of Weight Updates

Rajveer Singh

Comments v2: Added SVD-DCT correlation analysis (Pearson r=0.906, p<1e-9) connecting the empirical ~33% spectral constant to the Dyson Brownian Motion framework of Olsen et al. (2025); updated Section 7 and References. 11 pages, 6 figures, 7 tables. Indian Institute of Technology Roorkee

2604.10079 2026-04-27 cs.CL

Why Supervised Fine-Tuning Fails to Learn: A Systematic Study of Incomplete Learning in Large Language Models

Chao Xue, Yao Wang, Mengqiao Liu, Di Liang, Xingsheng Han, Peiyang Liu, Xianjie Wu, Chenyao Lu, Lei Jiang, Yu Lu, Haibo Shi, Shuang Liang, Minlong Peng, Flora D. Salim

Comments Accepted by ACL 2026 Main

2604.07599 2026-04-27 cs.RO

SANDO: Safe Autonomous Trajectory Planning for Dynamic Unknown Environments

Kota Kondo, Jesús Tordesillas, Jonathan P. How

Comments 20 pages, 17 figures

Abstract

SANDO is a safe trajectory planner for 3D dynamic unknown environments, where obstacle locations and motions are unknown a priori and a collision-free plan can become unsafe at any moment, requiring fast replanning. Existing soft-constraint planners are fast but cannot guarantee collision-free paths, while hard-constraint methods ensure safety at the cost of longer computation. SANDO addresses this trade-off through three contributions. First, a heat map-based A* global planner steers paths away from high-risk regions using soft costs, and a spatiotemporal safe flight corridor (STSFC) generator produces time-layered polytopes that inflate obstacles only by their worst-case reachable set at each time layer, rather than by the worst case over the entire horizon. Second, trajectory optimization is formulated as a Mixed-Integer Quadratic Program (MIQP) with hard collision-avoidance constraints, and a variable elimination technique reduces the number of decision variables, enabling fast computation. Third, a formal safety analysis establishes collision-free guarantees under explicit velocity-bound and estimation-error assumptions. Ablation studies show that variable elimination yields up to 7.4x speedup in optimization time, and that STSFCs are critical for feasibility in dense dynamic environments. Benchmark simulations against state-of-the-art methods across standardized static benchmarks, obstacle-rich static forests, and dynamic environments show that SANDO consistently achieves the highest success rate with no constraint violations across all difficulty levels; perception-only experiments without ground truth obstacle information confirm robust performance under realistic sensing. Hardware experiments on a UAV with fully onboard planning, perception, and localization demonstrate six safe flights in static environments and ten safe flights among dynamic obstacles.
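
Editor's sketch (not the authors' implementation): the abstract contrasts inflating an obstacle by its worst-case reachable set at each time layer with inflating it by the worst case over the whole planning horizon. The Python below illustrates that difference under simplifying assumptions that are ours, not the paper's: a spherical obstacle, a known speed bound `v_max`, a fixed position-estimation error bound `est_error`, and uniformly spaced time layers. The paper itself builds time-layered polytopes; all names and numbers here are hypothetical.

```python
def layered_inflation(radius, v_max, est_error, horizon, n_layers):
    """Inflated obstacle radius for each of n_layers uniform time layers.

    Layer k covers the interval [k * dt, (k + 1) * dt]. Within that layer an
    obstacle moving at up to v_max can have travelled at most
    v_max * (k + 1) * dt, so that distance bounds its reachable set there.
    """
    dt = horizon / n_layers
    return [radius + est_error + v_max * (k + 1) * dt for k in range(n_layers)]


def horizon_inflation(radius, v_max, est_error, horizon):
    """Single inflated radius using the worst case over the entire horizon."""
    return radius + est_error + v_max * horizon


if __name__ == "__main__":
    # Hypothetical numbers: 0.5 m obstacle, 1 m/s speed bound, 0.1 m
    # estimation error, 2 s horizon split into 4 layers.
    per_layer = layered_inflation(0.5, 1.0, 0.1, 2.0, 4)
    whole = horizon_inflation(0.5, 1.0, 0.1, 2.0)
    for k, r in enumerate(per_layer):
        print(f"layer {k}: inflate to {r:.2f} m (whole-horizon bound {whole:.2f} m)")
```

Under these assumptions the early layers use a much smaller inflation than the whole-horizon bound, and only the final layer matches it, which leaves more free space for near-term planning.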

2604.01907 2026-04-27 cs.CV cs.AI

Lifting Unlabeled Internet-level Data for 3D Scene Understanding

Yixin Chen, Yaowei Zhang, Huangyue Yu, Junchao He, Yan Wang, Jiangyong Huang, Hongyu Shen, Junfeng Ni, Shaofei Wang, Baoxiong Jia, Song-Chun Zhu, Siyuan Huang

Comments CVPR 2026. Project page: https://sv-pp.github.io/

2604.01608 2026-04-27 cs.AI

From Multi-Agent to Single-Agent: When Is Skill Distillation Beneficial?

Binyan Xu, Dong Fang, Haitao Li, Kehuan Zhang

Comments 35 pages, 15 figures, 10 tables

2603.29171 2026-04-27 cs.CV cs.LG

Segmentation of Gray Matters and White Matters from Brain MRI data

Chang Sun, Rui Shi, Tsukasa Koike, Tetsuro Sekine, Akio Morita, Tetsuya Sakai

2603.28584 2026-04-27 cs.CV

ORSIFlow: Saliency-Guided Rectified Flow for Optical Remote Sensing Salient Object Detection

Haojing Chen, Zhihang Liu, Yutong Li, Tao Tan, Haoyu Bian, Qiuju Ma

2603.28258 2026-04-27 cs.CL cs.AI

Categorical Perception in Large Language Model Hidden States: Structural Warping at Digit-Count Boundaries

Jon-Paul Cacioli

Comments 25 pages, 5 figures, 7 tables. Pre-registered on OSF (osf.io/qrxf3). Code at https://github.com/synthiumjp/weber

2603.26041 2026-04-27 cs.CV

Rethinking Token Pruning for Historical Screenshots in GUI Visual Agents: Semantic, Spatial, and Temporal Perspectives

Daiqiang Li, Zihao Pan, Zeyu Zhang, Ronghao Chen, Huacan Wang, Honggang Chen, Haiyun Jiang

2603.19500 2026-04-27 cs.AI cs.CV cs.GR cs.LG

Teaching an Agent to Sketch One Part at a Time

Xiaodan Du, Ruize Xu, David Yunis, Yael Vinker, Greg Shakhnarovich