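arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering & systems.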

2603.27838 2026-03-31 cs.CL

ProText: A benchmark dataset for measuring (mis)gendering in long-form texts

Hadas Kotek, Margit Bowler, Patrick Sonnenberg, Yu'an Yang

Comments 13 pages, 10 figures, 6 tables

Abstract

We introduce ProText, a dataset for measuring gendering and misgendering in stylistically diverse long-form English texts. ProText spans three dimensions: Theme nouns (names, occupations, titles, kinship terms), Theme category (stereotypically male, stereotypically female, gender-neutral/non-gendered), and Pronoun category (masculine, feminine, gender-neutral, none). The dataset is designed to probe (mis)gendering in text transformations such as summarization and rewrites using state-of-the-art Large Language Models, extending beyond traditional pronoun resolution benchmarks and beyond the gender binary. We validated ProText through a mini case study, showing that even with just two prompts and two models, we can draw nuanced insights regarding gender bias, stereotyping, misgendering, and gendering. We reveal systematic gender bias, particularly when inputs contain no explicit gender cues or when models default to heteronormative assumptions.

2603.27832 2026-03-31 cs.CV cs.LG

3-D Representations for Hyperspectral Flame Tomography

Nicolas Tricard, Zituo Chen, Sili Deng

Comments 7 pages, 2 figures, 1 table

2603.27819 2026-03-31 cs.LG cs.AI cs.CL

KVSculpt: KV Cache Compression as Distillation

Bo Jiang, Sian Jin

2603.27818 2026-03-31 cs.CV cs.RO

Benchmarking Multi-View BEV Object Detection with Mixed Pinhole and Fisheye Cameras

Xiangzhong Liu, Hao Shen

Comments 8 pages, 5 figures, IEEE International Conference on Robotics and Automation (ICRA), Vienna, Austria, 1-5 June 2026

2603.27814 2026-03-31 cs.LG stat.ML

RG-TTA: Regime-Guided Meta-Control for Test-Time Adaptation in Streaming Time Series

Indar Kumar, Akanksha Tiwari, Sai Krishna Jasti, Ankit Hemant Lade

Comments 18 pages, 8 figures

Abstract

Test-time adaptation (TTA) enables neural forecasters to adapt to distribution shifts in streaming time series, but existing methods apply the same adaptation intensity regardless of the nature of the shift. We propose Regime-Guided Test-Time Adaptation (RG-TTA), a meta-controller that continuously modulates adaptation intensity based on distributional similarity to previously-seen regimes. Using an ensemble of Kolmogorov-Smirnov, Wasserstein-1, feature-distance, and variance-ratio metrics, RG-TTA computes a similarity score for each incoming batch and uses it to (i) smoothly scale the learning rate -- more aggressive for novel distributions, conservative for familiar ones -- and (ii) control gradient effort via loss-driven early stopping rather than fixed budgets, allowing the system to allocate exactly the effort each batch requires. As a supplementary mechanism, RG-TTA gates checkpoint reuse from a regime memory, loading stored specialist models only when they demonstrably outperform the current model (loss improvement >= 30%). RG-TTA is model-agnostic and strategy-composable: it wraps any forecaster exposing train/predict/save/load interfaces and enhances any gradient-based TTA method. We demonstrate three compositions -- RG-TTA, RG-EWC, and RG-DynaTTA -- and evaluate 6 update policies (3 baselines + 3 regime-guided variants) across 4 compact architectures (GRU, iTransformer, PatchTST, DLinear), 14 datasets (6 real-world multivariate benchmarks + 8 synthetic regime scenarios), and 4 forecast horizons (96, 192, 336, 720) under a streaming evaluation protocol with 3 random seeds (672 experiments total). Regime-guided policies achieve the lowest MSE in 156 of 224 seed-averaged experiments (69.6%), with RG-EWC winning 30.4% and RG-TTA winning 29.0%. Overall, RG-TTA reduces MSE by 5.7% vs TTA while running 5.5% faster; RG-EWC reduces MSE by 14.1% vs standalone EWC.
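The control logic described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names an ensemble of four metrics, while this sketch uses only a Kolmogorov-Smirnov statistic, and the function names, the linear learning-rate mapping, and the `lr_min`/`lr_max` defaults are all hypothetical. Only the 30% loss-improvement gate comes from the abstract.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def similarity(batch, regime):
    # Assumption: similarity is 1 minus the KS statistic, so it lies in [0, 1].
    # The paper combines several metrics; a single one is used here for brevity.
    return 1.0 - ks_statistic(batch, regime)


def scaled_lr(sim, lr_min=1e-4, lr_max=1e-2):
    # Assumption: linear interpolation between hypothetical bounds.
    # Familiar data (sim near 1) gets lr_min; novel data (sim near 0) gets lr_max.
    return lr_max - sim * (lr_max - lr_min)


def should_load_checkpoint(current_loss, checkpoint_loss, min_improvement=0.30):
    # Reuse a stored specialist only if its relative loss improvement
    # meets the threshold (30% per the abstract).
    if current_loss <= 0:
        return False
    return (current_loss - checkpoint_loss) / current_loss >= min_improvement
```

Under these assumptions, a batch identical to a stored regime yields a similarity of 1 and the smallest learning rate, while a batch with no overlap yields a similarity of 0 and the largest.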

2603.27813 2026-03-31 cs.CV

MuSEAgent: A Multimodal Reasoning Agent with Stateful Experiences

Shijian Wang, Jiarui Jin, Runhao Fu, Zexuan Yan, Xingjian Wang, Mengkang Hu, Eric Wang, Xiaoxi Li, Kangning Zhang, Li Yao, Wenxiang Jiao, Xuelian Cheng, Yuan Lu, Zongyuan Ge

2603.27811 2026-03-31 cs.CV cs.LG cs.NI

Tracking without Seeing: Geospatial Inference using Encrypted Traffic from Distributed Nodes

Sadik Yagiz Yetim, Gaofeng Dong, Isaac-Neil Zanoria, Ronit Barman, Maggie Wigness, Tarek Abdelzaher, Mani Srivastava, Suhas Diggavi

2603.27809 2026-03-31 cs.CL

Conversational Agents and the Understanding of Human Language: Reflections on AI, LLMs, and Cognitive Science

Andrei Popescu-Belis

Comments 7 pages

2603.27808 2026-03-31 cs.RO

Probe-to-Grasp Manipulation Using Self-Sensing Pneumatic Variable-Stiffness Joints

Ngoc Duy Tran, Yeman Fan, Feng Dai, Khang Nguyen, Anh Nguyen, Hoang Hiep Ly, Tung D. Ta, Shigeru Chiba

2603.27806 2026-03-31 cs.CL cs.CY

Understanding Teacher Revisions of Large Language Model-Generated Feedback

Conrad Borchers, Luiz Rodrigues, Newarney Torrezão da Costa, Cleon Xavier, Rafael Ferreira Mello

Comments Accepted as full paper to the 27th International Conference on Artificial Intelligence in Education (AIED 2026)

2603.27803 2026-03-31 cs.LG cs.MA cs.SY eess.SY math.OC

Distributed Online Submodular Maximization under Communication Delays: A Simultaneous Decision-Making Approach

Zirui Xu, Vasileios Tzoumas

Comments Accepted to ACC 2026

2603.27800 2026-03-31 cs.CV

Diversity Matters: Dataset Diversification and Dual-Branch Network for Generalized AI-Generated Image Detection

Nusrat Tasnim, Kutub Uddin, Khalid Malik

2603.27798 2026-03-31 cs.CV cs.AI cs.ET cs.HC eess.IV

Towards Emotion Recognition with 3D Pointclouds Obtained from Facial Expression Images

Laura Rayón Ropero, Jasper De Laet, Filip Lemic, Pau Sabater Nácher, Nabeel Nisar Bhat, Sergi Abadal, Jeroen Famaey, Eduard Alarcón, Xavier Costa-Pérez

Comments 18 pages, 12 figures, 2 tables. Accepted for publication at IEEE Transactions on Affective Computing

DOI: 10.1109/TAFFC.2026.3679039

Abstract

Facial Emotion Recognition (FER) is a critical research area within Affective Computing due to its wide-ranging applications in Human-Computer Interaction, mental health assessment, and fatigue monitoring. Current FER methods predominantly rely on Deep Learning techniques trained on 2D image data, which pose significant privacy concerns and are unsuitable for continuous, real-time monitoring. As an alternative, we propose High-Frequency Wireless Sensing (HFWS) as an enabler of continuous, privacy-aware FER, through the generation of detailed 3D facial pointclouds via on-person sensors embedded in wearables. We present arguments supporting the privacy advantages of HFWS over traditional 2D imaging, particularly under increasingly stringent data protection regulations. A major barrier to adopting HFWS for FER is the scarcity of labeled 3D FER datasets. Towards addressing this issue, we introduce a FLAME-based method to generate 3D facial pointclouds from existing public 2D datasets. Using this approach, we create AffectNet3D, a 3D version of the AffectNet database. To evaluate the quality and usability of the generated data, we design a pointcloud refinement pipeline focused on isolating the facial region, and train the popular PointNet++ model on the refined pointclouds. Fine-tuning the model on a small subset of the unseen 3D FER dataset BU-3DFE yields a classification accuracy exceeding 70%, comparable to oracle-level performance. To further investigate the potential of HFWS-based FER for continuous monitoring, we simulate wearable sensing conditions by masking portions of the generated pointclouds. Experimental results show that models trained on AffectNet3D and fine-tuned with just 25% of BU-3DFE outperform those trained solely on BU-3DFE. These findings highlight the viability of our pipeline and support the feasibility of continuous, privacy-aware FER via wearable HFWS systems.

2603.27797 2026-03-31 cs.RO

Which Reconstruction Model Should a Robot Use? Routing Image-to-3D Models for Cost-Aware Robotic Manipulation

Akash Anand, Aditya Agarwal, Leslie Pack Kaelbling

Comments 8 pages, 7 tables, 3 figures. Supplementary material included. Project page: https://scout-model-routing.github.io

2603.27796 2026-03-31 cs.RO

Spectral Decomposition of Inverse Dynamics for Fast Exploration in Model-Based Manipulation

Solvin Sigurdson, Benjamin Riviere, Joel Burdick

Comments 8 pages, 8 figures, accepted to the 2026 IEEE International Conference on Robotics and Automation

2603.27792 2026-03-31 cs.LG cs.AI stat.ML

What-If Explanations Over Time: Counterfactuals for Time Series Classification

Udo Schlegel, Thomas Seidl

Comments 24 pages, 1 figure, 3 tables, accepted at XAI 2026

2603.27790 2026-03-31 cs.CV

Inference-time Trajectory Optimization for Manga Image Editing

Ryosuke Furuta

2603.27781 2026-03-31 cs.CV

GS3LAM: Gaussian Semantic Splatting SLAM

Linfei Li, Lin Zhang, Zhong Wang, Ying Shen

Comments Accepted by ACM MM 2024

DOI: 10.1145/3664647.3680739

Abstract

Recently, the multi-modal fusion of RGB, depth, and semantics has shown great potential in dense Simultaneous Localization and Mapping (SLAM). However, a prerequisite for generating consistent semantic maps is the availability of dense, efficient, and scalable scene representations. Existing semantic SLAM systems based on explicit representations are often limited by resolution and an inability to predict unknown areas. Conversely, implicit representations typically rely on time-consuming ray tracing, failing to meet real-time requirements. Fortunately, 3D Gaussian Splatting (3DGS) has emerged as a promising representation that combines the efficiency of point-based methods with the continuity of geometric structures. To this end, we propose GS3LAM, a Gaussian Semantic Splatting SLAM framework that processes multimodal data to render consistent, dense semantic maps in real-time. GS3LAM models the scene as a Semantic Gaussian Field (SG-Field) and jointly optimizes camera poses and the field via multimodal error constraints. Furthermore, a Depth-adaptive Scale Regularization (DSR) scheme is introduced to resolve misalignments between scale-invariant Gaussians and geometric surfaces. To mitigate catastrophic forgetting, we propose a Random Sampling-based Keyframe Mapping (RSKM) strategy, which demonstrates superior performance over common local covisibility optimization methods. Extensive experiments on benchmark datasets show that GS3LAM achieves increased tracking robustness, superior rendering quality, and enhanced semantic precision compared to state-of-the-art methods. Source code is available at https://github.com/lif314/GS3LAM.

2603.27773 2026-03-31 cs.CV

RINO: Rotation-Invariant Non-Rigid Correspondences

Maolin Gao, Shao Jie Hu-Chen, Congyue Deng, Riccardo Marin, Leonidas Guibas, Daniel Cremers

Comments 17 pages, 36 figures, Computer Vision and Pattern Recognition (CVPR) 2026

2603.27770 2026-03-31 cs.RO

Transferability Through Cooperative Competitions

Rodrigo Serra, Carlos Azevedo, André Silva, Kevin Alcedo, Quentin Rouxel, Peter So, Alejandro Suarez, Alin Albu-Schäeffer, Pedro U. Lima

Comments Description of the cooperative competition concept, with a case study in EU project euROBIN, held in Nancy, November 2024

2603.26542 2026-03-31 cs.RO cs.AI cs.MA math.OC

The Multi-AMR Buffer Storage, Retrieval, and Reshuffling Problem: Exact and Heuristic Approaches

Max Disselnmeyer, Thomas Bömer, Laura Dörr, Bastian Amberg, Anne Meyer

Comments 52 pages, 15 figures and tables

2603.26476 2026-03-31 cs.LG

Shapley meets Rawls: an integrated framework for measuring and explaining unfairness

Fadoua Amri-Jouidel, Emmanuel Kemel, Stéphane Mussard

2603.26425 2026-03-31 cs.CV cs.AI

CPUBone: Efficient Vision Backbone Design for Devices with Low Parallelization Capabilities

Moritz Nottebaum, Matteo Dunnhofer, Christian Micheloni

Comments Accepted at CVPR Findings 2026

2603.26285 2026-03-31 cs.CV cs.AI

PhysVid: Physics Aware Local Conditioning for Generative Video Models

Saurabh Pathak, Elahe Arani, Mykola Pechenizkiy, Bahram Zonooz

Comments Accepted for publication in CVPR 2026

2603.26136 2026-03-31 cs.LG

PEANUT: Perturbations by Eigenvector Alignment for Attacking Graph Neural Networks Under Topology-Driven Message Passing

Bhavya Kohli, Biplab Sikdar

Comments This work is a preprint. 8 content pages, 12 total pages including references

2603.26068 2026-03-31 cs.CV

PAD-Hand: Physics-Aware Diffusion for Hand Motion Recovery

Elkhan Ismayilzada, Yufei Zhang, Zijun Cui

Comments Accepted to CVPR 2026

2603.25750 2026-03-31 cs.SD cs.AI eess.AS

Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models

Kyudan Jung, Jihwan Kim, Soyoon Kim, Jeonghoon Kim, Jaegul Choo, Cheonbok Park

Comments 34 pages, 7 figures, 11 tables

2603.25741 2026-03-31 cs.CV cs.AI cs.RO

Vega: Learning to Drive with Natural Language Instructions

Sicheng Zuo, Yuxuan Li, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu

Comments Code is available at https://github.com/zuosc19/Vega

2603.25706 2026-03-31 cs.CV

Wan-Weaver: Interleaved Multi-modal Generation via Decoupled Training

Jinbo Xing, Zeyinzi Jiang, Yuxiang Tuo, Chaojie Mao, Xiaotang Gai, Xi Chen, Jingfeng Zhang, Yulin Pan, Zhen Han, Jie Xiao, Keyu Yan, Chenwei Xie, Chongyang Zhong, Kai Zhu, Tong Shen, Lianghua Huang, Yu Liu, Yujiu Yang

Comments CVPR 2026 Camera-ready, Webpage: https://doubiiu.github.io/projects/WanWeaver

2603.25008 2026-03-31 cs.CV cs.AI

Few TensoRF: Enhance the Few-shot on Tensorial Radiance Fields

Thanh-Hai Le, Hoang-Hau Tran, Trong-Nghia Vu

Comments 11 pages, 8 figures