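arXivDaily daily academic digest: synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.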

2603.22629 2026-03-25 cs.CL cs.AI

LGSE: Lexically Grounded Subword Embedding Initialization for Low-Resource Language Adaptation

Hailay Teklehaymanot, Dren Fazlija, Wolfgang Nejdl

Comments 12 pages, 1 figure, 1 table

Abstract

Adapting pretrained language models to low-resource, morphologically rich languages remains a significant challenge. Existing vocabulary expansion methods typically rely on arbitrarily segmented subword units, resulting in fragmented lexical representations and loss of critical morphological information. To address this limitation, we propose the Lexically Grounded Subword Embedding Initialization (LGSE) framework, which introduces morphologically informed segmentation for initializing embeddings of novel tokens. Instead of using random vectors or arbitrary subwords, LGSE decomposes words into their constituent morphemes and constructs semantically coherent embeddings by averaging pretrained subword or FastText-based morpheme representations. When a token cannot be segmented into meaningful morphemes, its embedding is constructed using character n-gram representations to capture structural information. During Language-Adaptive Pretraining, we apply a regularization term that penalizes large deviations of newly introduced embeddings from their initialized values, preserving alignment with the original pretrained embedding space while enabling adaptation to the target language. To isolate the effect of initialization, we retain the original pre-trained model vocabulary and tokenizer and update only the new embeddings during adaptation. We evaluate LGSE on three NLP tasks: Question Answering, Named Entity Recognition, and Text Classification, in two morphologically rich, low-resource languages: Amharic and Tigrinya, where morphological segmentation resources are available. Experimental results show that LGSE consistently outperforms baseline methods across all tasks, demonstrating the effectiveness of morphologically grounded embedding initialization for improving representation quality in underrepresented languages. Project resources are available in the GitHub link.
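
The initialization and regularization described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`char_ngrams`, `init_embedding`, `drift_penalty`), the lookup tables `morpheme_vecs` and `ngram_vecs`, the n-gram size, the zero-vector fallback, the squared-L2 form of the penalty, and the weight `lam` are all assumptions made for illustration.

```python
import numpy as np

def char_ngrams(token, n=3):
    """Return character n-grams of a token padded with boundary markers."""
    padded = f"<{token}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def init_embedding(token, morphemes, morpheme_vecs, ngram_vecs, dim):
    """Average morpheme vectors; fall back to character n-gram vectors.

    morpheme_vecs and ngram_vecs are hypothetical dicts mapping strings
    to NumPy vectors of length dim.
    """
    vecs = [morpheme_vecs[m] for m in morphemes if m in morpheme_vecs]
    if not vecs:
        # No usable morphemes: use character n-gram vectors instead.
        vecs = [ngram_vecs[g] for g in char_ngrams(token) if g in ngram_vecs]
    if not vecs:
        # Assumption: default to a zero vector when nothing is known.
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

def drift_penalty(current, initial, lam=0.1):
    """Assumed squared-L2 penalty on deviation from the initialized values."""
    return lam * float(np.sum((current - initial) ** 2))
```

In a training loop, a term like `drift_penalty` would be added to the task loss for the newly introduced embedding rows only, which matches the abstract's statement that only the new embeddings are updated.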

2603.22626 2026-03-25 cs.CV

PIVM: Diffusion-Based Prior-Integrated Variation Modeling for Anatomically Precise Abdominal CT Synthesis

Dinglun He, Baoming Zhang, Xu Wang, Yao Hao, Deshan Yang, Ye Duan

Comments Accepted at the IEEE International Symposium on Biomedical Imaging (ISBI) 2026 (Oral). Equal contribution by the first three authors

2603.22624 2026-03-25 cs.CV cs.AI

Toward Faithful Segmentation Attribution via Benchmarking and Dual-Evidence Fusion

Abu Noman Md Sakib, OFM Riaz Rahman Aranya, Kevin Desai, Zijie Zhang

Journal ref CVPR 2026

2603.22623 2026-03-25 cs.CV cs.AI

To Agree or To Be Right? The Grounding-Sycophancy Tradeoff in Medical Vision-Language Models

OFM Riaz Rahman Aranya, Kevin Desai

2603.22622 2026-03-25 cs.CV

A Vision Language Model for Generating Procedural Plant Architecture Representations from Simulated Images

Heesup Yun, Isaac Kazuo Uyehara, Ioannis Droutsas, Earl Ranario, Christine H. Diepenbrock, Brian N. Bailey, J. Mason Earles

Abstract

Three-dimensional (3D) procedural plant architecture models have emerged as an important tool for simulation-based studies of plant structure and function, for extracting plant architectural parameters from field measurements, and for generating realistic plants in computer graphics. However, measuring the architectural parameters and nested structures for these models at field scale remains prohibitively labor-intensive. We present a novel algorithm that generates a 3D plant architecture from an image, creating a functional structural plant model that reflects organ-level geometric and topological parameters and provides a more comprehensive representation of the plant's architecture. Instead of using 3D sensors or processing multi-view images with computer vision to obtain the 3D structure of plants, we propose a method that generates token sequences that encode a procedural definition of plant architecture. This work used only synthetic images for training and testing, with exact architectural parameters known, allowing us to test the hypothesis that organ-level architectural parameters can be extracted from image data using a vision-language model (VLM). A synthetic dataset of cowpea plant images was generated using the Helios 3D plant simulator, with the detailed plant architecture encoded in XML files. We developed a plant architecture tokenizer for the XML file defining plant architecture, converting it into a token sequence that a language model can predict. The model achieved a token F1 score of 0.73 during teacher-forced training. The model was evaluated through autoregressive generation, achieving a BLEU-4 score of 94.00% and a ROUGE-L score of 0.5182. We conclude that plant architecture model generation and parameter extraction are possible from synthetic images; future work will extend the approach to real imagery.

2603.22621 2026-03-25 cs.LG

Transfer learning via interpolating structures

T. A. Dardeno, A. J. Hughes, L. A. Bull, R. S. Mills, N. Dervilis, K. Worden

Comments preprint submitted to Mechanical Systems and Signal Processing

2603.22619 2026-03-25 cs.AI

Bridging the Know-Act Gap via Task-Level Autoregressive Reasoning

Jihyun Janice Ahn, Ryo Kamoi, Berk Atil, Renze Lou, WonWoo Kang, Heehyun Park, Sarkar Snigdha Sarathi Das, Zhuoyang Zou, Xiaoxin Lu, Yusen Zhang, Asfahan Shah, Ridwanul Hasan Tanvir, Lingxiao Zhao, Hongxi Huang, Vignesh Venkatesh, Dianjun Lin, Hamid Shah, Wentao Wang, Zhanpeng Song, Joshua Reed Bassin, Dax Patel, Ishan Appareddy Agrahar, Sahil Pardasani, Xin Dong, Fatemeh Rahbari, Benjamin David Rishel, Soochan Andrew Lee, Yuv Boghani, Ali B. AlNaseeb, Pranav Suby, Seokhyeon Bae, Shreya Buddharaju, Damien Kula, Soumyadeep Das, Hanyang Frank Liu, Faye Mo, Wenpeng Yin

Comments 12 pages

2603.22606 2026-03-25 cs.CV

TrajLoom: Dense Future Trajectory Generation from Video

Zewei Zhang, Jia Jun Cheng Xian, Kaiwen Liu, Ming Liang, Hang Chu, Jun Chen, Renjie Liao

Comments Project page, code, model checkpoints, and datasets: https://trajloom.github.io/

2603.22604 2026-03-25 cs.RO

Trajectory Generation for Underactuated Soft Robot Manipulators using Discrete Elastic Rod Dynamics

Beibei Liu, Akua K. Dickson, Ran Jing, Andrew P. Sabelhaus

2603.22590 2026-03-25 cs.LG cs.CR eess.AS

Precision-Varying Prediction (PVP): Robustifying ASR systems against adversarial attacks

Matías Pizarro, Raghavan Narasimhan, Asja Fischer

2603.22589 2026-03-25 cs.SD eess.AS eess.SP

Velocity Potential Neural Field for Efficient Ambisonics Impulse Response Modeling

Yoshiki Masuyama, Francois G. Germain, Gordon Wichern, Chiori Hori, Jonathan Le Roux

Comments Accepted to ICASSP 2026

2603.22583 2026-03-25 cs.CV cs.RO

A vision-language model and platform for temporally mapping surgery from video

Dani Kiyasseh

2603.22582 2026-03-25 cs.CL cs.AI

Lie to Me: How Faithful Is Chain-of-Thought Reasoning in Reasoning Models?

Richard J. Young

Comments 27 pages, 7 figures, 12 tables

2603.22580 2026-03-25 cs.RO

Task-Agnostic Exoskeleton Control Supports Elderly Joint Energetics during Hip-Intensive Tasks

Jiefu Zhang, Nikhil V. Divekar, Chandramouli Krishnan, Robert D. Gregg

2603.22576 2026-03-25 cs.CL

CAPITU: A Benchmark for Evaluating Instruction-Following in Brazilian Portuguese with Literary Context

Giovana Kerche Bonás, Roseval Malaquias Junior, Marcos Piau, Thiago Laitz, Thales Sales Almeida, Hugo Abonizio, Celio Larcher, Ramon Pires, Rodrigo Nogueira

2603.22574 2026-03-25 cs.RO

GIFT: Generalizing Intent for Flexible Test-Time Rewards

Fin Amin, Nathaniel Dennler, Andreea Bobu

Comments To appear at IEEE ICRA '26

2603.22572 2026-03-25 cs.CV

FullCircle: Effortless 3D Reconstruction from Casual 360° Captures

Yalda Foroutan, Ipek Oztas, Daniel Rebain, Aysegul Dundar, Kwang Moo Yi, Lily Goli, Andrea Tagliasacchi

2603.22566 2026-03-25 cs.CL

Reddit After Roe: A Computational Analysis of Abortion Narratives and Barriers in the Wake of Dobbs

Aria Pessianzadeh, Alex H. Poole, Rezvaneh Rezapour

2603.22561 2026-03-25 cs.AI

AI Mental Models: Learned Intuition and Deliberation in a Bounded Neural Architecture

Laurence Anthony

2603.22539 2026-03-25 cs.CV

Generalized multi-object classification and tracking with sparse feature resonator networks

Lazar Supic, Alec Mullen, E. Paxon Frady

Comments 6 pages, 2 figures, NICE 2026

Abstract

In visual scene understanding tasks, it is essential to capture both invariant and equivariant structure. While neural networks are frequently trained to achieve invariance to transformations such as translation, this often comes at the cost of losing access to equivariant information, e.g., the precise location of an object. Moreover, invariance is not naturally guaranteed through supervised learning alone, and many architectures generalize poorly to input transformations not encountered during training. Here, we take an approach based on analysis-by-synthesis and factoring using resonator networks. A generative model describes the construction of simple scenes containing MNIST digits and their transformations, such as color and position. The resonator network inverts the generative model and provides both invariant and equivariant information about particular objects. Sparse features learned from training data act as a basis set to provide flexibility in representing variable shapes of objects, allowing the resonator network to handle previously unseen digit shapes from the test set. The modular structure provides a shape module that contains information about the object shape with translation factored out, allowing a simple classifier to operate on centered digits. The classification layer is trained solely on centered data, requiring much less training data, and the network as a whole can identify objects with arbitrary translations without data augmentation. The natural attention-like mechanism of the resonator network also allows for analysis of scenes with multiple objects, where the network dynamics select and center only one object at a time. Further, the specific position information of a particular object can be extracted from the translation module, and we show that the resonator can be designed to track multiple moving objects with a precision of a few pixels.

2603.22531 2026-03-25 cs.CV

UrbanVGGT: Scalable Sidewalk Width Estimation from Street View Images

Kaizhen Tan, Fan Zhang

2603.22529 2026-03-25 cs.CV cs.AI cs.CL

Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos

Shoubin Yu, Lei Shu, Antoine Yang, Yao Fu, Srinivas Sunkara, Maria Wang, Jindong Chen, Mohit Bansal, Boqing Gong

Comments CVPR 2026. Project page: https://ego2web.github.io/

Abstract

Multimodal AI agents are increasingly automating complex real-world workflows that involve online web execution. However, current web-agent benchmarks suffer from a critical limitation: they focus entirely on web-based interaction and perception, lacking grounding in the user's real-world physical surroundings. This limitation prevents evaluation in crucial scenarios, such as when an agent must use egocentric visual perception (e.g., via AR glasses) to recognize an object in the user's surroundings and then complete a related task online. To address this gap, we introduce Ego2Web, the first benchmark designed to bridge egocentric video perception and web agent execution. Ego2Web pairs real-world first-person video recordings with web tasks that require visual understanding, web task planning, and interaction in an online environment for successful completion. We use an automatic data-generation pipeline combined with human verification and refinement to curate well-constructed, high-quality video-task pairs across diverse web task types, including e-commerce, media retrieval, and knowledge lookup. To facilitate accurate and scalable evaluation for our benchmark, we also develop a novel LLM-as-a-Judge automatic evaluation method, Ego2WebJudge, which achieves approximately 84% agreement with human judgment, substantially higher than existing evaluation methods. Experiments with diverse SoTA agents on Ego2Web show that their performance is weak, with substantial headroom across all task categories. We also conduct a comprehensive ablation study on task design, highlighting the necessity of accurate video understanding in the proposed task and the limitations of current agents. We hope Ego2Web can be a critical new resource for developing truly capable AI assistants that can seamlessly see, understand, and act across the physical and digital worlds.

2603.22527 2026-03-25 cs.RO cs.CV

Learning Sidewalk Autopilot from Multi-Scale Imitation with Corrective Behavior Expansion

Honglin He, Yukai Ma, Brad Squicciarini, Wayne Wu, Bolei Zhou

2603.22525 2026-03-25 cs.LG cs.CR

Adversarial Vulnerabilities in Neural Operator Digital Twins: Gradient-Free Attacks on Nuclear Thermal-Hydraulic Surrogates

Samrendra Roy, Kazuma Kobayashi, Souvik Chakraborty, Rizwan-uddin, Syed Bahauddin Alam

Comments 56 pages, 14 figures, 22 tables

Abstract

Operator learning models are rapidly emerging as the predictive core of digital twins for nuclear and energy systems, promising real-time field reconstruction from sparse sensor measurements. Yet their robustness to adversarial perturbations remains uncharacterized, a critical gap for deployment in safety-critical systems. Here we show that neural operators are acutely vulnerable to extremely sparse (fewer than 1% of inputs), physically plausible perturbations that exploit their sensitivity to boundary conditions. Using gradient-free differential evolution across four operator architectures, we demonstrate that minimal modifications trigger catastrophic prediction failures, increasing relative $L_2$ error from $\sim$1.5% (validated accuracy) to 37-63% while remaining completely undetectable by standard validation metrics. Notably, 100% of successful single-point attacks pass z-score anomaly detection. We introduce the effective perturbation dimension $d_{\text{eff}}$, a Jacobian-based diagnostic that, together with sensitivity magnitude, yields a two-factor vulnerability model explaining why architectures with extreme sensitivity concentration (POD-DeepONet, $d_{\text{eff}} \approx 1$) are not necessarily the most exploitable, since low-rank output projections cap maximum error, while moderate concentration with sufficient amplification (S-DeepONet, $d_{\text{eff}} \approx 4$) produces the highest attack success. Gradient-free search outperforms gradient-based alternatives (PGD) on architectures with gradient pathologies, while random perturbations of equal magnitude achieve near-zero success rates, confirming that the discovered vulnerabilities are structural. Our findings expose a previously overlooked attack surface in operator learning models and establish that these models require robustness guarantees beyond standard validation before deployment.
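
Two quantities named in this abstract, the relative $L_2$ error and a z-score anomaly check, can be written down in a few lines. The sketch below is illustrative only: the function names, the per-sensor `mean` and `std` statistics, and the threshold of 3 are assumptions, since the abstract does not specify how the authors configure their detector.

```python
import numpy as np

def relative_l2_error(pred, truth):
    """Relative L2 error: ||pred - truth||_2 / ||truth||_2."""
    return float(np.linalg.norm(pred - truth) / np.linalg.norm(truth))

def passes_zscore_check(x, mean, std, threshold=3.0):
    """Return True if every input value lies within `threshold` standard
    deviations of its reference mean (threshold is an assumed value)."""
    z = np.abs((x - mean) / std)
    return bool(np.all(z <= threshold))
```

A perturbed input that still returns `True` from a check of this kind, while the surrogate's relative $L_2$ error rises sharply, is the situation the abstract describes as undetectable by standard validation.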

2603.22518 2026-03-25 cs.CV cs.AI

High Resolution Flood Extent Detection Using Deep Learning with Random Forest Derived Training Labels

Azizbek Nuriddinov, Ebrahim Ahmadisharaf, Mohammad Reza Alizadeh

Comments Accepted to IGARSS 2026

2603.22509 2026-03-25 cs.CV

Sketch2CT: Multimodal Diffusion for Structure-Aware 3D Medical Volume Generation

Delin An, Chaoli Wang

2603.22507 2026-03-25 cs.RO cs.MA

Energy-Aware Collaborative Exploration for a UAV-UGV Team

Cahit Ikbal Er, Saikiran Juttu, Yasin Yazicioglu

2603.22502 2026-03-25 cs.RO

MapForest: A Modular Field Robotics System for Forest Mapping and Invasive Species Localization

Sandeep Zachariah, Francisco Yandun, Sachet Korada, Abhisesh Silwal

Comments 8 pages, 9 figures. Under review

2603.22497 2026-03-25 cs.CL

Rashid: A Cipher-Based Framework for Exploring In-Context Language Learning

Niyati Bafna, Ryan Soh-Eun Shim, Barbara Plank, David Yarowsky, Hale Sirin

2603.22472 2026-03-25 cs.RO cs.LG cs.MA

Wake Up to the Past: Using Memory to Model Fluid Wake Effects on Robots

Luca Vendruscolo, Eduardo Sebastián, Amanda Prorok, Ajay Shankar

Comments 8 pages, 7 figures. Submitted to IROS 2026. Project website: https://sites.google.com/view/wake-up-to-the-past