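arXivDaily is a daily academic digest synced with the full arXiv dataset. It provides AI-generated summaries and translations and covers artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.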



2601.03156 2026-01-28 cs.LG cs.AI cs.CL cs.CY

Prompt-Counterfactual Explanations for Generative AI System Behavior

Sofie Goethals, Foster Provost, João Sedoc

Abstract

As generative AI systems become integrated into real-world applications, organizations increasingly need to be able to understand and interpret their behavior. In particular, decision-makers need to understand what causes generative AI systems to exhibit specific output characteristics. Within this general topic, this paper examines a key question: what is it about the input -- the prompt -- that causes an LLM-based generative AI system to produce output that exhibits specific characteristics, such as toxicity, negative sentiment, or political bias. To examine this question, we adapt a common technique from the Explainable AI literature: counterfactual explanations. We explain why traditional counterfactual explanations cannot be applied directly to generative AI systems, due to several differences in how generative AI systems function. We then propose a flexible framework that adapts counterfactual explanations to non-deterministic, generative AI systems in scenarios where downstream classifiers can reveal key characteristics of their outputs. Based on this framework, we introduce an algorithm for generating prompt-counterfactual explanations (PCEs). Finally, we demonstrate the production of counterfactual explanations for generative AI systems with three case studies, examining different output characteristics (viz., political leaning, toxicity, and sentiment). The case studies further show that PCEs can streamline prompt engineering to suppress undesirable output characteristics and can enhance red-teaming efforts to uncover additional prompts that elicit undesirable outputs. Ultimately, this work lays a foundation for prompt-focused interpretability in generative AI: a capability that will become indispensable as these models are entrusted with higher-stakes tasks and subject to emerging regulatory requirements for transparency and accountability.
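The abstract above does not spell out the PCE algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: search for the smallest set of prompt words whose removal drops the rate at which a downstream classifier flags sampled outputs below a threshold. Every name here (`flagged_rate`, `prompt_counterfactual`, `toy_generate`, `toy_classify`), the word-removal search, and the threshold and sample counts are hypothetical illustrations, not the authors' method.

```python
import random
from itertools import combinations


def flagged_rate(words, generate, classify, n_samples, rng):
    """Fraction of sampled outputs that the downstream classifier flags."""
    hits = sum(bool(classify(generate(words, rng))) for _ in range(n_samples))
    return hits / n_samples


def prompt_counterfactual(prompt, generate, classify,
                          threshold=0.5, n_samples=40, max_removed=2, seed=0):
    """Return (removed_words, edited_prompt) for the smallest removal found, else None.

    Hypothetical brute-force search: try removing 1 word, then 2, and so on.
    Because the generator is non-deterministic, each candidate prompt is
    scored by sampling several outputs rather than a single one.
    """
    words = prompt.split()
    rng = random.Random(seed)
    for k in range(1, max_removed + 1):
        for idx in combinations(range(len(words)), k):
            kept = [w for i, w in enumerate(words) if i not in idx]
            if flagged_rate(kept, generate, classify, n_samples, rng) <= threshold:
                return [words[i] for i in idx], " ".join(kept)
    return None


# Toy stand-ins (assumptions): a real setup would call an LLM and a
# toxicity, sentiment, or political-leaning classifier instead.
def toy_generate(words, rng):
    """Emit a 'bad' output with high probability when the prompt contains 'rant'."""
    p_bad = 0.95 if "rant" in words else 0.02
    return "awful" if rng.random() < p_bad else "fine"


def toy_classify(text):
    """Flag the output characteristic of interest."""
    return text == "awful"


result = prompt_counterfactual("write a rant about traffic", toy_generate, toy_classify)
print(result)
```

In this toy setup the search reports the single word whose removal suppresses the flagged characteristic. Exhaustive search over word subsets grows combinatorially, so a practical version would need a smarter search strategy and a budget on generator calls.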


2601.01857 2026-01-28 cs.AI

Jenius Agent: Towards Experience-Driven Accuracy Optimization in Real-World Scenarios

Defei Xia, Bingfeng Pi, Shenbin Zhang, Song Hua, Yunfei Wei, Lei Zuo

2512.14961 2026-01-28 cs.CV cs.SD eess.AS eess.IV

Adaptive Multimodal Person Recognition: A Robust Framework for Handling Missing Modalities

Aref Farhadipour, Teodora Vukovic, Volker Dellwo, Petr Motlicek, Srikanth Madikeri

Comments 9 pages and 8 tables

2512.11194 2026-01-28 cs.LG cs.CV

Beyond Memorization: Selective Learning for Copyright-Safe Diffusion Model Training

Divya Kothandaraman, Jaclyn Pytlarz

2512.08931 2026-01-28 cs.CV cs.AI cs.LG

Astra: General Interactive World Model with Autoregressive Denoising

Yixuan Zhu, Jiaqi Feng, Wenzhao Zheng, Yuan Gao, Xin Tao, Pengfei Wan, Jie Zhou, Jiwen Lu

Comments Accepted in ICLR 2026. Code is available at: https://github.com/EternalEvan/Astra

2512.06652 2026-01-28 cs.LG cs.AI

Adaptive Test-Time Training for Predicting Need for Invasive Mechanical Ventilation in Multi-Center Cohorts

Xiaolei Lu, Shamim Nemati

Comments ICLR 2026

2512.06201 2026-01-28 cs.LG

K2-V2: A 360-Open, Reasoning-Enhanced LLM

K2 Team, Zhengzhong Liu, Liping Tang, Linghao Jin, Haonan Li, Nikhil Ranjan, Desai Fan, Shaurya Rohatgi, Richard Fan, Omkar Pangarkar, Huijuan Wang, Zhoujun Cheng, Suqi Sun, Seungwook Han, Bowen Tan, Gurpreet Gosal, Xudong Han, Varad Pimpalkhute, Shibo Hao, Ming Shan Hee, Joel Hestness, Haolong Jia, Liqun Ma, Aaryamonvikram Singh, Daria Soboleva, Natalia Vassilieva, Renxi Wang, Yingquan Wu, Yuekai Sun, Taylor Killian, Alexander Moreno, John Maggs, Hector Ren, Guowei He, Hongyi Wang, Xuezhe Ma, Yuqi Wang, Mikhail Yurochkin, Eric P. Xing

2511.18305 2026-01-28 cs.CV

DiVE-k: Differential Visual Reasoning for Fine-grained Image Recognition

Raja Kumar, Arka Sadhu, Ram Nevatia

Comments ICLR 2026

2511.12760 2026-01-28 cs.LG stat.ML

Conformal Online Learning of Deep Koopman Linear Embeddings

Ben Gao, Jordan Patracone, Stéphane Chrétien, Olivier Alata

Comments NeurIPS 2025

2511.12725 2026-01-28 cs.LG

Convolutional Model Trees

William Ward Armstrong, Hongyi Li, Jun Xu

Comments 11 pages. 2 figures. This article was extensively revised. Drawings were added. Co-authors were added responsible for cited experimental results and their description: Hongyi Li and Jun Xu. Attention is on distilling a deep net into a model tree with convolutions done on hyperplane and leaf-function coefficients. Distortions of images are treated by similar changes to coefficient locations

2511.12103 2026-01-28 cs.CV

BdSL-SPOTER: A Transformer-Based Framework for Bengali Sign Language Recognition with Cultural Adaptation

Sayad Ibna Azad, Md. Atiqur Rahman

Comments Accepted to 20th International Symposium on Visual Computing (ISVC 2025)

2511.10441 2026-01-28 cs.CL

Analogical Structure, Minimal Contextual Cues and Contrastive Distractors: Input Design for Sample-Efficient Linguistic Rule Induction

Chunyang Jiang, Paola Merlo

Comments Accepted by EACL 2026 main conference

2511.06374 2026-01-28 cs.LG stat.ML

Adaptive Regularization for Large-Scale Sparse Feature Embedding Models

Mang Li, Wei Lyu

2511.04375 2026-01-28 cs.RO

Studying the Effect of Explicit Interaction Representations on Learning Scene-level Distributions of Human Trajectories

Anna Mészáros, Javier Alonso-Mora, Jens Kober

2511.04109 2026-01-28 cs.RO

CBMC-V3: A CNS-inspired Control Framework Towards Agile Manipulation with SNN

Yanbo Pang, Qingkai Li, Mingguo Zhao

2511.01990 2026-01-28 cs.CV

Assessing the value of Geo-Foundational Models for Flood Inundation Mapping: Benchmarking models for Sentinel-1, Sentinel-2, and Planetscope for end-users

Saurabh Kaushik, Lalit Maurya, Elizabeth Tellman, ZhiJie Zhang

Journal ref IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (2026)


DOI: 10.1109/JSTARS.2026.3656855

Abstract

Geo-Foundational Models (GFMs) enable fast and reliable extraction of spatiotemporal information from satellite imagery, improving flood inundation mapping by leveraging location and time embeddings. Despite their potential, it remains unclear whether GFMs outperform traditional models like U-Net. A systematic comparison across sensors and data availability scenarios is still lacking, which is an essential step to guide end-users in model selection. To address this, we evaluate three GFMs, Prithvi 2.0, Clay V1.5, DOFA, and UViT (a Prithvi variant), against TransNorm, U-Net, and Attention U-Net using PlanetScope, Sentinel-1, and Sentinel-2. We observe competitive performance among all GFMs, with only 2-5% variation between the best and worst models across sensors. Clay outperforms others on PlanetScope (0.79 mIoU) and Sentinel-2 (0.70), while Prithvi leads on Sentinel-1 (0.57). In leave-one-region-out cross-validation across five regions, Clay shows slightly better performance across all sensors (mIoU: 0.72(0.04), 0.66(0.07), 0.51(0.08)) compared to Prithvi (0.70(0.05), 0.64(0.09), 0.49(0.13)) and DOFA (0.67(0.07), 0.64(0.04), 0.49(0.09)) for PlanetScope, Sentinel-2, and Sentinel-1, respectively. Across all 19 sites, leave-one-region-out cross-validation reveals a 4% improvement by Clay compared to U-Net. Visual inspection highlights Clay's superior ability to retain fine details. Few-shot experiments show Clay achieves 0.64 mIoU on PlanetScope with just five training images, outperforming Prithvi (0.24) and DOFA (0.35). In terms of computational time, Clay is a better choice due to its smaller model size (26M parameters), making it ~3x faster than Prithvi (650M) and 2x faster than DOFA (410M). Contrary to previous findings, our results suggest GFMs offer small to moderate improvements in flood mapping accuracy at lower computational cost and labeling effort compared to traditional U-Net.
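The scores above are reported as mIoU (mean intersection over union). As a rough illustration of how that metric is commonly computed for segmentation masks, here is a minimal sketch. The function name `mean_iou` and the choice to average over classes present in either mask are assumptions; the abstract does not say how the paper averages across classes, images, or sites.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes that appear in pred or target.

    pred and target are equally sized 2D lists of integer class labels
    (for flood mapping, e.g. 0 = non-water, 1 = water).
    """
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                in_p, in_t = p == c, t == c
                inter += in_p and in_t   # pixel is class c in both masks
                union += in_p or in_t    # pixel is class c in at least one mask
        if union:                        # skip classes absent from both masks
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class from range(num_classes) appears in either mask")
    return sum(ious) / len(ious)


# Tiny worked example: 2x3 masks with two classes.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(mean_iou(pred, target, 2))  # each class has intersection 2 and union 4 -> 0.5
```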


2510.24940 2026-01-28 cs.CL

SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens

Yinhan He, Wendy Zheng, Yaochen Zhu, Zaiyi Zheng, Lin Su, Sriram Vasudevan, Qi Guo, Liangjie Hong, Jundong Li

Abstract

The verbosity of Chain-of-Thought (CoT) reasoning hinders its mass deployment in efficiency-critical applications. Recently, implicit CoT approaches have emerged, which encode reasoning steps within LLM's hidden embeddings (termed "implicit reasoning") rather than explicit tokens. This approach accelerates CoT by reducing the reasoning length and bypassing some LLM components. However, existing implicit CoT methods face two significant challenges: (1) they fail to preserve the semantic alignment between the implicit reasoning (when transformed to natural language) and the ground-truth reasoning, resulting in a significant CoT performance degradation, and (2) they focus on reducing the length of the implicit reasoning; however, they neglect the considerable time cost for an LLM to generate one individual implicit reasoning token. To tackle these challenges, we propose a novel semantically-aligned implicit CoT framework termed SemCoT. In particular, for the first challenge, we design a contrastively trained sentence transformer that evaluates semantic alignment between implicit and explicit reasoning, which is used to enforce semantic preservation during implicit reasoning optimization. To address the second challenge, we introduce an efficient implicit reasoning generator by finetuning a lightweight language model using knowledge distillation. This generator is guided by our sentence transformer to distill ground-truth reasoning into semantically aligned implicit reasoning, while also optimizing for accuracy. SemCoT is the first approach that enhances CoT efficiency by jointly optimizing token-level generation speed and preserving semantic alignment with ground-truth reasoning. Extensive experiments demonstrate the superior performance of SemCoT compared to state-of-the-art methods in both efficiency and effectiveness. Our code can be found at https://github.com/YinhanHe123/SemCoT/.


2510.21850 2026-01-28 cs.CV cs.CL

SCoPE VLM: Selective Context Processing for Efficient Document Navigation in Vision-Language Models

Gyubeum Lim, Yemo Koo, Vijay Krishna Madisetti

2510.20691 2026-01-28 cs.AI

Plan Then Retrieve: Reinforcement Learning-Guided Complex Reasoning over Knowledge Graphs

Yanlin Song, Ben Liu, Víctor Gutiérrez-Basulto, Zhiwei Hu, Qianqian Xie, Min Peng, Sophia Ananiadou, Jeff Z. Pan

2510.20504 2026-01-28 cs.SD

Speaking Clearly: A Simplified Whisper-Based Codec for Low-Bitrate Speech Coding

Xin Zhang, Lin Li, Xiangni Lu, Jianquan Liu, Kong Aik Lee

Comments Accepted by ICASSP 2026

2510.17171 2026-01-28 cs.CV

Generation then Reconstruction: Accelerating Masked Autoregressive Models via Two-Stage Sampling

Feihong Yan, Peiru Wang, Yao Zhu, Kaiyu Pang, Qingyan Wei, Huiqi Li, Linfeng Zhang

Comments 12 pages, 6 figures

2510.15600 2026-01-28 cs.AI cs.CL

Unleashing Scientific Reasoning for Bio-experimental Protocol Generation via Structured Component-based Reward Mechanism

Haoran Sun, Yankai Jiang, Zhenyu Tang, Yaning Pan, Shuang Gu, Zekai Lin, Lilong Wang, Wenjie Lou, Lei Liu, Lei Bai, Xiaosong Wang

2510.14459 2026-01-28 cs.LG cs.AI

Holdout-Loss-Based Data Selection for LLM Finetuning via In-Context Learning

Ling Zhang, Xianliang Yang, Juwon Yu, Park Cheonyoung, Miran Lee, Lei Song, Jiang Bian

2510.13356 2026-01-28 cs.RO

MODUR: A Modular Dual-reconfigurable Robot

Jie Gu, Tin Lun Lam, Chunxu Tian, Zhihao Xia, Yongheng Xing, Dan Zhang

Journal ref Proceedings of the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 10212-10219

2510.12953 2026-01-28 cs.CV cs.AI cs.IR cs.MM

Epistemic-aware Vision-Language Foundation Model for Fetal Ultrasound Interpretation

Xiao He, Huangxuan Zhao, Guojia Wan, Wei Zhou, Yanxing Liu, Juhua Liu, Yongchao Xu, Yong Luo, Dacheng Tao, Bo Du

Comments This paper contains fundamental errors and will not be replaced

2510.11358 2026-01-28 cs.CL cs.AI cs.IR

LLM-Specific Utility: A New Perspective for Retrieval-Augmented Generation

Hengran Zhang, Keping Bi, Jiafeng Guo, Jiaming Zhang, Shuaiqiang Wang, Dawei Yin, Xueqi Cheng

Comments 13 pages, 9 figures

2510.11254 2026-01-28 cs.CL

Do Psychometric Tests Work for Large Language Models? Evaluation of Tests on Sexism, Racism, and Morality

Jana Jung, Marlene Lutz, Indira Sen, Markus Strohmaier

2510.11027 2026-01-28 cs.CV

Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning

Ganlin Yang, Tianyi Zhang, Haoran Hao, Weiyun Wang, Yibin Liu, Dehui Wang, Guanzhou Chen, Zijian Cai, Junting Chen, Weijie Su, Wengang Zhou, Yu Qiao, Jifeng Dai, Jiangmiao Pang, Gen Luo, Wenhai Wang, Yao Mu, Zhi Hou

2510.11001 2026-01-28 cs.CL cs.AI

DND: Boosting Large Language Models with Dynamic Nested Depth

Tieyuan Chen, Xiaodong Chen, Haoxing Chen, Zhenzhong Lan, Weiyao Lin, Jianguo Li

Comments Accepted by ICLR 2026

2510.10802 2026-01-28 cs.CV cs.AI cs.LG

MSCloudCAM: Multi-Scale Context Adaptation with Convolutional Cross-Attention for Multispectral Cloud Segmentation

Md Abdullah Al Mazid, Liangdong Deng, Naphtali Rishe

Comments 6 pages, 3 Figures
