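arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.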

2512.00672 2026-02-24 cs.LG cs.AI

ML-Tool-Bench: Tool-Augmented Planning for ML Tasks

Yaswanth Chittepu, Raghavendra Addanki, Tung Mai, Anup Rao, Branislav Kveton

Abstract

The development of autonomous machine learning (ML) agents capable of end-to-end data science workflows represents a significant frontier in artificial intelligence. These agents must orchestrate complex sequences of data analysis, feature engineering, model selection, and hyperparameter optimization, tasks that require sophisticated planning and iteration. While recent work on building ML agents has explored using large language models (LLMs) for direct code generation, tool-augmented approaches offer greater modularity and reliability. However, existing tool-use benchmarks focus primarily on task-specific tool selection or argument extraction for tool invocation, failing to evaluate the sophisticated planning capabilities required for ML Agents. In this work, we introduce a comprehensive benchmark for evaluating tool-augmented ML agents using a curated set of 61 specialized tools and 15 tabular ML challenges from Kaggle. Our benchmark goes beyond traditional tool-use evaluation by incorporating an in-memory named object management, allowing agents to flexibly name, save, and retrieve intermediate results throughout the workflows. We demonstrate that standard ReAct-style approaches struggle to generate valid tool sequences for complex ML pipelines, and that tree search methods with LLM-based evaluation underperform due to inconsistent state scoring. To address these limitations, we propose two simple approaches: 1) using shaped deterministic rewards with structured textual feedback, and 2) decomposing the original problem into a sequence of sub-tasks, which significantly improves trajectory validity and task performance. Using GPT-4o, our approach improves over ReAct by 16.52 percentile positions, taking the median across all Kaggle challenges. We believe our work provides a foundation for developing more capable tool-augmented planning ML agents.

2511.18729 2026-02-24 cs.CV

GuideFlow: Constraint-Guided Flow Matching for Planning in End-to-End Autonomous Driving

Lin Liu, Caiyan Jia, Guanyi Yu, Ziying Song, JunQiao Li, Feiyang Jia, Peiliang Wu, Xiaoshuai Hao, Yadan Luo

2511.08094 2026-02-24 cs.LG

Stuart-Landau Oscillatory Graph Neural Network

Kaicheng Zhang, David N. Reynolds, Piero Deidda, Francesco Tudisco

2511.07730 2026-02-24 cs.LG cs.RO

Multistep Quasimetric Learning for Scalable Goal-conditioned Reinforcement Learning

Bill Chunyuan Zheng, Vivek Myers, Benjamin Eysenbach, Sergey Levine

Journal ref ICLR (2026)

2511.05640 2026-02-24 cs.LG cs.GT stat.ML

Blind Inverse Game Theory: Jointly Decoding Rewards and Rationality in Entropy-Regularized Competitive Games

Hamza Virk, Sandro Amaglobeli, Zuhayr Syed

2511.04847 2026-02-24 cs.LG

Test-Time Adaptation for LLM Agents via Environment Interaction

Arthur Chen, Zuxin Liu, Jianguo Zhang, Akshara Prabhakar, Zhiwei Liu, Shelby Heinecke, Silvio Savarese, Victor Zhong, Caiming Xiong

Comments Our code is available here: https://github.com/r2llab/GTTA

2510.27246 2026-02-24 cs.CL cs.AI cs.IR

Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs

Mohammad Tavakoli, Alireza Salemi, Carrie Ye, Mohamed Abdalla, Hamed Zamani, J Ross Mitchell

2510.26961 2026-02-24 cs.CV eess.IV

SYNAPSE-Net: A Unified Framework with Lesion-Aware Hierarchical Gating for Robust Segmentation of Heterogeneous Brain Lesions

Md. Mehedi Hassan, Shafqat Alam, Shahriar Ahmed Seam, Maruf Ahmed

Comments 18 pages, 10 figures, 8 tables

2510.24856 2026-02-24 cs.CL

Do Large Language Models Grasp The Grammar? Evidence from Grammar-Book-Guided Probing in Luxembourgish

Lujun Li, Yewei Song, Lama Sleem, Yiqun Wang, Yangjie Xu, Cedric Lothritz, Niccolo Gentile, Radu State, Tegawende F. Bissyande, Jacques Klein

Comments This paper has been accepted for publication in the proceedings of the 15th biennial Language Resources and Evaluation Conference (LREC 2026)

2510.23038 2026-02-24 cs.CL cs.AI cs.LG

Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning

Ran Xu, Jingjing Chen, Jiayu Ye, Yu Wu, Jun Yan, Carl Yang, Hongkun Yu

Comments ICLR 2026

Journal ref ICLR 2026

2510.19557 2026-02-24 cs.CV

The Intricate Dance of Prompt Complexity, Quality, Diversity, and Consistency in T2I Models

Zhang Xiaofeng, Aaron Courville, Michal Drozdzal, Adriana Romero-Soriano

Comments accepted to ICLR 2026

Abstract

Text-to-image (T2I) models offer great potential for creating virtually limitless synthetic data, a valuable resource compared to fixed and finite real datasets. Previous works evaluate the utility of synthetic data from T2I models on three key desiderata: quality, diversity, and consistency. While prompt engineering is the primary means of interacting with T2I models, the systematic impact of prompt complexity on these critical utility axes remains underexplored. In this paper, we first conduct synthetic experiments to motivate the difficulty of generalization with regard to prompt complexity and explain the observed difficulty with theoretical derivations. Then, we introduce a new evaluation framework that can compare the utility of real data and synthetic data, and present a comprehensive analysis of how prompt complexity influences the utility of synthetic data generated by commonly used T2I models. We conduct our study across diverse datasets, including CC12M, ImageNet-1k, and DCI, and evaluate different inference-time intervention methods. Our synthetic experiments show that generalizing to more general conditions is harder than the other way round, since the former needs an estimated likelihood that is not learned by diffusion models. Our large-scale empirical experiments reveal that increasing prompt complexity results in lower conditional diversity and prompt consistency, while reducing the synthetic-to-real distribution shift, which aligns with the synthetic experiments. Moreover, current inference-time interventions can augment the diversity of the generations at the expense of moving outside the support of real data. Among those interventions, prompt expansion, by deliberately using a pre-trained language model as a likelihood estimator, consistently achieves the highest performance in both image diversity and aesthetics, even higher than that of real data.

2510.16985 2026-02-24 cs.CL cs.AI cs.LG

Parameter-Efficient Fine-Tuning for Low-Resource Languages: A Comparative Study of LLMs for Bengali Hate Speech Detection

Akif Islam, Mohd Ruhul Ameen

Comments Accepted to IEEE COMPAS 2025. 6 pages, 3 figures, 6 tables

Journal ref 2025 IEEE International Conference on Computing, Applications and Systems (COMPAS), 2025

2510.15940 2026-02-24 cs.LG cs.AI

Lean Finder: Semantic Search for Mathlib That Understands User Intents

Jialin Lu, Kye Emond, Kaiyu Yang, Swarat Chaudhuri, Weiran Sun, Wuyang Chen

2510.15862 2026-02-24 cs.AI

Rethinking the Design of Reinforcement Learning-Based Deep Research Agents

Yi Wan, Jiuqi Wang, Liam Li, Jinsong Liu, Ruihao Zhu, Zheqing Zhu

2510.14979 2026-02-24 cs.CV cs.AI

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Haiwen Diao, Mingxuan Li, Silei Wu, Linjun Dai, Xiaohua Wang, Hanming Deng, Lewei Lu, Dahua Lin, Ziwei Liu

Comments 21 pages, 8 figures, Accepted by ICLR2026

2510.11390 2026-02-24 cs.LG cs.AI

Medical Interpretability and Knowledge Maps of Large Language Models

Razvan Marinescu, Victoria-Elisabeth Gruber, Diego Fajardo

Comments 29 pages, 34 figures, 5 tables

2510.11103 2026-02-24 cs.RO cs.AI

A Primer on SO(3) Action Representations in Deep Reinforcement Learning

Martin Schuck, Sherif Samy, Angela P. Schoellig

2510.10682 2026-02-24 cs.CV

Action-Dynamics Modeling and Cross-Temporal Interaction for Online Action Understanding

Xinyu Yang, Zheheng Jiang, Feixiang Zhou, Yihang Zhu, Na Lv, Nan Xing, Nishan Canagarajah, Huiyu Zhou

Comments 10 pages, 9 figures

2510.09312 2026-02-24 cs.CL cs.AI cs.LG

Verifying Chain-of-Thought Reasoning via Its Computational Graph

Zheng Zhao, Yeskendir Koishekenov, Xianjun Yang, Naila Murray, Nicola Cancedda

Comments Accepted to ICLR 2026 (Oral)

2510.05761 2026-02-24 cs.AI cs.CL

Early Multimodal Prediction of Cross-Lingual Meme Virality on Reddit: A Time-Window Analysis

Sedat Dogan, Nina Dethlefs, Debarati Chakraborty

Comments Accepted to ACM WebSci 2026. 10 pages, 9 figures and 8 tables

Abstract

Memes are a central part of online culture, yet their virality remains difficult to predict, especially in cross-lingual settings. We present a large-scale, time-series dataset of 46,578 Reddit memes collected from 25 meme-centric subreddits across eight language groups, with more than one million engagement tracking points. We propose a data-driven definition of virality based on a Hybrid Score that normalises engagement by community size and integrates dynamic features such as velocity and acceleration. This approach directly addresses the field's reliance on static, simple volume-based thresholds with arbitrary cut-offs. Building on this target, we construct a multimodal feature set that combines Visual, Textual, Contextual, Network, and Temporal signals, including structured annotations from a multimodal LLM to scale cross-lingual content labelling in a consistent way. We benchmark interpretable baselines (XGBoost, MLP) against end-to-end deep models (BERT, InceptionV3, CLIP) across early observation windows from 30 to 420 minutes. Our best model, a multimodal XGBoost classifier, achieves a PR AUC of 0.43 at 30 minutes and 0.80 at 420 minutes, indicating that early prediction of meme virality is feasible even under strong class imbalance. The results reveal a clear Content Ceiling, where content-only and deep multimodal baselines plateau at low PR AUC, while structural Network and Temporal features are necessary to surpass this limit. A SHAP-based temporal analysis further uncovers an evidentiary transition, where early predictions are dominated by network priors (author and community context), and later predictions increasingly rely on temporal dynamics (velocity, acceleration) as engagement accumulates. Overall, we reframe meme virality as a dynamic, path-dependent process governed by exposure and early interaction patterns rather than by intrinsic content alone.

2510.04373 2026-02-24 cs.AI

JEF-Hinter: Leveraging Offline Knowledge for Improving Web Agents Adaptation

Hadi Nekoei, Aman Jaiswal, Patrice Bechard, Oleh Shliazhko, Orlando Marquez Ayala, Mathieu Reymond, Massimo Caccia, Alexandre Drouin, Sarath Chandar, Alexandre Lacoste

2510.02240 2026-02-24 cs.CV cs.AI

RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning

Sicheng Feng, Kaiwen Tuo, Song Wang, Lingdong Kong, Jianke Zhu, Huan Wang

Comments ICLR 2026, website: https://fscdc.github.io/RewardMap/

2510.00553 2026-02-24 cs.LG cs.AI

On Predictability of Reinforcement Learning Dynamics for Large Language Models

Yuchen Cai, Ding Cao, Xin Xu, Zijun Yao, Yuqing Huang, Zhenyu Tan, Benyi Zhang, Guangzhong Sun, Guiquan Liu, Junfeng Fang

Comments 48 pages, 28 figures

2509.26209 2026-02-24 cs.AI

Diversity-Incentivized Exploration for Versatile Reasoning

Zican Hu, Shilin Zhang, Yafu Li, Jianhao Yan, Xuyang Hu, Leyang Cui, Xiaoye Qu, Chunlin Chen, Yu Cheng, Zhi Wang

Comments 26 pages, 10 figures

Journal ref ICLR 2026

2509.25411 2026-02-24 cs.AI cs.LG

Boolean Satisfiability via Imitation Learning

Zewei Zhang, Huan Liu, Yuanhao Yu, Jun Chen, Xiangyu Xu

Comments Accepted to ICLR 2026. Code: https://github.com/zewei-zhang/ImitSAT

2509.25162 2026-02-24 cs.CV

AlignTok: Aligning Visual Foundation Encoders to Tokenizers for Diffusion Models

Bowei Chen, Sai Bi, Hao Tan, He Zhang, Tianyuan Zhang, Zhengqi Li, Yuanjun Xiong, Jianming Zhang, Kai Zhang

Comments ICLR 2026, Project Page: https://aligntok.github.io/

2509.24526 2026-02-24 cs.CV cs.AI cs.LG

CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models

Zheyuan Hu, Chieh-Hsin Lai, Yuki Mitsufuji, Stefano Ermon

Comments ICLR 2026. Code available at https://github.com/sony/cmt

2509.24243 2026-02-24 cs.RO cs.AI

SafeFlowMatcher: Safe and Fast Planning using Flow Matching with Control Barrier Functions

Jeongyong Yang, Seunghwan Jang, SooJean Han

Comments ICLR 2026 (poster)

2509.23575 2026-02-24 cs.RO

Generalizable Coarse-to-Fine Robot Manipulation via Language-Aligned 3D Keypoints

Jianshu Hu, Lidi Wang, Shujia Li, Yunpeng Jiang, Xiao Li, Paul Weng, Yutong Ban

Comments Published in ICLR 2026

2509.21655 2026-02-24 cs.LG stat.ML

DriftLite: Lightweight Drift Control for Inference-Time Scaling of Diffusion Models

Yinuo Ren, Wenhao Gao, Lexing Ying, Grant M. Rotskoff, Jiequn Han

Comments Published at ICLR 2026 (https://openreview.net/forum?id=l01eG3Qikl)
