arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

2510.12476 2026-05-01 cs.CL cs.AI

When Personalization Tricks Detectors: The Feature-Inversion Trap in Machine-Generated Text Detection

Lang Gao, Xuhui Li, Chenxi Wang, Mingzhe Li, Wei Liu, Zirui Song, Jinghui Zhang, Rui Yan, Preslav Nakov, Xiuying Chen

Comments Oral of ACL 2026

详情

英文摘要

Large language models (LLMs) have grown more powerful in language generation, producing fluent text and even imitating personal style. Yet, this ability also heightens the risk of identity impersonation. To the best of our knowledge, no prior work has examined personalized machine-generated text (MGT) detection. In this paper, we introduce \dataset, the first benchmark for evaluating detector robustness in personalized settings, built from literary and blog texts paired with their LLM-generated imitations. Our experimental results demonstrate large performance gaps across detectors in personalized settings: some state-of-the-art models suffer significant drops. We attribute this limitation to the \textit{feature-inversion trap}, where features that are discriminative in general domains become inverted and misleading when applied to personalized text. Based on this finding, we propose \method, a simple and reliable way to predict detector performance changes in personalized settings. \method identifies latent directions corresponding to inverted features and constructs probe datasets that differ primarily along these features to evaluate detector dependence. Our experiments show that \method can accurately predict both the direction and the magnitude of post-transfer changes, showing 85\% correlation with the actual performance gaps. We hope that this work will encourage further research on personalized text detection.

URL PDF HTML ☆

赞 0 踩 0

2510.04978 2026-05-01 cs.AI

Aligning Perception, Reasoning, Modeling and Interaction: A Survey on Physical AI

Kun Xiang, Terry Jingchen Zhang, Yinya Huang, Jixi He, Zirong Liu, Yueling Tang, Ruizhe Zhou, Lijing Luo, Youpeng Wen, Xiuwei Chen, Bingqian Lin, Jianhua Han, Hang Xu, Hanhui Li, Bin Dong, Xiaodan Liang

2509.25845 2026-05-01 cs.CV cs.AI

Training-Free Reward-Guided Image Editing via Trajectory Optimal Control

Jinho Chang, Jaemin Kim, Jong Chul Ye

Comments Poster in ICLR 2026; 22 pages, 9 figures. The code is available at https://github.com/jinhojsk515/ITOC

2509.22017 2026-05-01 cs.LG

AEGIS: Authentic Edge Growth In Sparsity for Link Prediction in Edge-Sparse Bipartite Knowledge Graphs

Hugh Xuechen Liu, Kıvanç Tatar

2509.18308 2026-05-01 cs.CV

Rethinking Pulmonary Embolism Segmentation: A Study of Current Approaches and Challenges with an Open Weight Model

Yixin Zhang, Ryan Chamberlain, Lawrence Ngo, Kevin Kramer, Maciej A. Mazurowski

Comments Published in Journal of Imaging Informatics in Medicine. Model weights available at: https://github.com/mazurowski-lab/PulmonaryEmbolismSegmentation

详情

DOI: 10.1007/s10278-026-01958-4

英文摘要

Pulmonary Embolism (PE) is a life-threatening condition for which accurate and timely detection is critical to patient care. However, our systematic study of PE segmentation algorithms reveals concerning limitations in the current state of research. Challenges such as small and inconsistent datasets, a lack of reproducible baselines, and limited comparative evaluation across models are hindering progress in the field. In this study, we curated a densely annotated dataset comprising 490 CTPA scans, each from a unique patient (430 for training and 60 for testing). We evaluated nine widely used segmentation architectures, including both CNN- and ViT-based models, in 2D and 3D configurations, using mean Dice Similarity Coefficient (mDSC) and Average Symmetric Surface Distance (ASSD) as evaluation metrics. Furthermore, the highest-performing model was evaluated on a public dataset without fine-tuning and achieved reasonable generalization performance. Our results show that: (1) a 3D U-Net with ResNet encoding blocks remains a highly effective architecture for PE segmentation; (2) 3D models consistently outperform their 2D counterparts; (3) across all architectures, when trained and evaluated on the same datasets, model error patterns are highly consistent; and (4) distal emboli remain particularly challenging due to both task complexity and the scarcity of high-quality datasets, highlighting the need for datasets with more comprehensive and consistent distal PE coverage. To promote research reproducibility, the architecture and pretrained weights of our best-performing model are publicly available at https://github.com/mazurowski-lab/PulmonaryEmbolismSegmentation

URL PDF HTML ☆

赞 0 踩 0

2509.15549 2026-05-01 cs.CL

M-DaQ: Retrieving Samples with Multilingual Diversity and Quality for Instruction Fine-Tuning Datasets

Chunguang Zhao, Yilun Liu, Pufan Zeng, Yuanchang Luo, Shimin Tao, Minggui He, Weibin Meng, Song Xu, Chen Liu, Hongxia Ma, Li Zhang, Boxing Chen, Daimeng Wei

Comments Accepted by SIGIR 2026 Short

2509.06033 2026-05-01 cs.CV

Analysis of Blood Report Images Using General Purpose Vision-Language Models

Nadia Bakhsheshi, Hamid Beigy

Comments 4 pages , 3 figures , This paper has been submitted to the IEEE-affiliated ICBME Conference (Iran), 2025, and is currently under review. DOR number: [20.1001.2.0425023682.1404.10.1.440.7]

2509.05291 2026-05-01 cs.CL cs.AI cs.LG

Crosscoding Through Time: Tracking Emergence & Consolidation Of Linguistic Representations Throughout LLM Pretraining

Deniz Bayazit, Aaron Mueller, Antoine Bosselut

Comments Accepted to ACL 2026

2508.21787 2026-05-01 cs.CL cs.AI

PiCSAR: Probabilistic Confidence Selection And Ranking for Reasoning Chains

Joshua Ong Jun Leang, Zheng Zhao, Aryo Pradipta Gema, Sohee Yang, Wai-Chung Kwan, Xuanli He, Wenda Li, Pasquale Minervini, Eleonora Giunchiglia, Shay B. Cohen

2508.13316 2026-05-01 cs.LG

Constraint-Aware Flow Matching via Randomized Exploration

Zhengyan Huan, Jacob Boerma, Li-Ping Liu, Shuchin Aeron

详情

Journal ref: Transactions on Machine Learning Research (TMLR), 2026

英文摘要

We consider the problem of designing constraint-aware flow matching (FM) models that address the issue of constraint violations commonly observed in vanilla generative models. We consider two scenarios, viz.: (a) when a differentiable distance function to the constraint set is given, and (b) when the constraint set is only available via queries to a membership oracle. For case (a), we propose a simple adaptation of the FM objective with an additional term that penalizes the distance between the constraint set and the generated samples. For case (b), we propose to employ randomization and learn a mean flow that is numerically shown to have a high likelihood of satisfying the constraints. This approach deviates significantly from existing works that require simple convex constraints, knowledge of a barrier function, or a reflection mechanism to constrain the probability flow. Furthermore, in the proposed setting we show that a two-stage approach, where both stages approximate the same original flow but with only the second stage probing the constraints via randomization, is more computationally efficient than the corresponding one-stage approach. Through several synthetic cases of constrained generation, we numerically show that the proposed approaches achieve significant gains in terms of constraint satisfaction while matching the target distributions. As a showcase for a practical oracle-based constraint, we show how our approach can be used for training an adversarial example generator, using queries to a hard-label black-box classifier. We conclude with several future research directions. Our code is available at https://github.com/ZhengyanHuan/FM-RE.

URL PDF HTML ☆

赞 0 踩 0

2508.13024 2026-05-01 cs.CL

WebMall -- A Multi-Shop Benchmark for Evaluating Web Agents

Ralph Peeters, Aaron Steiner, Luca Schwarz, Julian Yuya Caspary, Christian Bizer

Comments Accepted for publication at SIGIR 2026

2508.12022 2026-05-01 cs.AI

AI Models for Depressive Disorder Detection and Diagnosis: A Review

Dorsa Macky Aleagha, Payam Zohari, Mostafa Haghir Chehreghani

2508.05469 2026-05-01 cs.LG cs.IT math.IT

Let's Measure Information Step-by-Step: AI-Based Evaluation Beyond Vibes

Zachary Robertson, Sanmi Koyejo

Comments Accepted to TMLR (2026)

2507.14719 2026-05-01 cs.AI

Policy-Grounded Safety Evaluation of 20 Large Language Models

Juan Manuel Contreras

2507.13486 2026-05-01 cs.CV

Uncertainty Quantification Framework for Aerial and UAV Photogrammetry through Error Propagation

Debao Huang, Rongjun Qin

Comments 27 pages, 12 figures, this manuscript has been accepted to ISPRS Journal of Photogrammetry and Remote Sensing

详情

英文摘要

Uncertainty quantification of the photogrammetry process is essential for providing per-point accuracy credentials of the point clouds. Unlike airborne LiDAR, whose accuracy generally remains consistent with objects with varying geometric complexity, the accuracy of photogrammetric point clouds is rather object/scene-dependent, as it relies on algorithm-derived measurements. Generally, errors of the photogrammetric point clouds propagate through a two-step process: Structure-from-Motion (SfM) with Bundle adjustment (BA), followed by Multi-view Stereo (MVS). While uncertainty estimation in the SfM stage has been well studied using the first-order statistics of the reprojection error function, that in the MVS stage remains largely unsolved and non-standardized, primarily due to its non-differentiable and multi-modal nature (i.e., from pixel values to geometry). In this paper, we present an uncertainty quantification framework closing this gap by associating an error covariance matrix per point accounting for this two-step photogrammetry process. Specifically, to estimate the uncertainty in the MVS stage, we propose a novel, self-calibrating method by taking reliable n-view points (n>=6) per-view to regress the disparity uncertainty using highly relevant cues (such as matching cost values) from the MVS stage. Compared to existing approaches, our method uses self-contained, reliable 3D points extracted directly from the MVS process, with the benefit of being self-supervised and naturally adhering to error propagation path of the photogrammetry process, thereby providing a robust and certifiable uncertainty quantification across diverse scenes. We evaluate the framework using a variety of publicly available airborne and UAV imagery datasets. Results demonstrate that our method outperforms existing approaches by achieving high bounding rates without overestimating uncertainty.

URL PDF HTML ☆

赞 0 踩 0

2507.12414 2026-05-01 cs.CV cs.AI cs.LG cs.RO

AutoVDC: Automated Vision Data Cleaning Using Vision-Language Models

Santosh Vasa, Aditi Ramadwar, Jnana Rama Krishna Darabattula, Md Zafar Anwar, Stanislaw Antol, Andrei Vatavu, Thomas Monninger, Sihao Ding

Comments Accepted to IV 2026 Drive-X Foundation Models for Autonomous Driving (Oral presentation)

2507.07986 2026-05-01 cs.LG cs.AI

EXPO: Stable Reinforcement Learning with Expressive Policies

Perry Dong, Qiyang Li, Dorsa Sadigh, Chelsea Finn

Comments changed formatting, added code

2506.22500 2026-05-01 cs.CV cs.AI

OR-VSKC: Resolving Visual-Semantic Knowledge Conflicts in Operating Rooms with Synthetic Data-Guided Alignment

Weiyi Zhao, Xiaoyu Tan, Liang Liu, Sijia Li, Youwei Song, Xihe Qiu

Comments 13 pages, 5 figures. The dataset and appendix are available at https://github.com/zgg2577/VS-KC

2506.08332 2026-05-01 cs.AI

ORFS-agent: Tool-Using Agents for Chip Design Optimization

Amur Ghose, Andrew B. Kahng, Sayak Kundu, Zhiang Wang

详情

英文摘要

Machine learning has been widely used to optimize complex engineering workflows across numerous domains. In integrated circuit design, modern flows (e.g., register-transfer level to physical layout) involve extensive configuration via thousands of parameters, and small changes can have large downstream impacts on design performance, power, and area. Recent advances in Large Language Models (LLMs) offer new opportunities for learning and reasoning within such high-dimensional optimization tasks. In this work, we introduce ORFS-agent, an LLM-based iterative optimization agent that automates parameter tuning in an open-source hardware design flow. ORFS-agent adaptively explores parameter configurations, demonstrating improvements over standard Bayesian optimization approaches in terms of resource efficiency and final design metrics. Across six benchmarks on ASAP7 and SKY130HD, thinking-model backends (Sonnet 4.6 [69] and Kimi K2.5 [28]) improve the geometric-mean normalized wirelength, effective clock period, and co-optimization objectives by up to 1.0%, 1.3%, and 2.7% over OR-AutoTuner while using 40% fewer iterations; the open-weight Kimi K2.5 remains within 0.24% of Sonnet 4.6, enabling private deployment. Relative to the earlier Sonnet 3.5 backend, these thinking models improve the same objectives by up to 7.5%, 3.1%, and 4.0%. Optional retrieval tools accelerate early convergence but do not improve final endpoints. By following natural language objectives to trade off certain metrics for others, ORFS-agent demonstrates a flexible and interpretable framework for multi-objective and constrained optimization. Crucially, ORFS-agent is modular and model-agnostic, and can be plugged into any frontier LLM without any further fine-tuning. We also report checkpoint-aligned trajectories and reasoning summaries that document the agent's decision process.

URL PDF HTML ☆

赞 0 踩 0

2506.07179 2026-05-01 cs.LG cs.AI

Efficient Traffic Forecasting on Large-Scale Road Network by Regularized Adaptive Graph Convolution

Kaiqi Wu, Weiyang Kong, Sen Zhang, Zitong Chen, Yubao Liu

Comments Accepted by ICDE 2026

2506.02671 2026-05-01 cs.CV

Test-Time Distillation for Continual Model Adaptation

Xiao Chen, Jiazhen Huang, Zhiming Liu, Qinting Jiang, Fanding Huang, Jingyan Jiang, Zhi Wang

Comments Accepted by CVPR 2026 Findings

2506.02515 2026-05-01 cs.CL cs.AI cs.LG

FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning

Zhuohan Xie, Daniil Orel, Rushil Thareja, Dhruv Sahnan, Hachem Madmoun, Fan Zhang, Debopriyo Banerjee, Georgi Georgiev, Xueqing Peng, Lingfei Qian, Jimin Huang, Jinyan Su, Aaryamonvikram Singh, Rui Xing, Rania Elbadry, Chen Xu, Haonan Li, Fajri Koto, Ivan Koychev, Tanmoy Chakraborty, Yuxia Wang, Salem Lahlou, Veselin Stoyanov, Sophia Ananiadou, Preslav Nakov

Comments 24 pages, includes 12 figures and 9 tables; introduces the FinChain benchmark and ChainEval metric

2505.24848 2026-05-01 cs.CV cs.LG

Reading Recognition in the Wild

Charig Yang, Samiul Alam, Shakhrul Iman Siam, Michael J. Proulx, Lambert Mathias, Kiran Somasundaram, Luis Pesqueira, James Fort, Sheroze Sheriffdeen, Omkar Parkhi, Carl Ren, Mi Zhang, Yuning Chai, Richard Newcombe, Hyo Jin Kim

Comments NeurIPS 2025. Project Page: https://www.projectaria.com/datasets/reading-in-the-wild/

2505.22798 2026-05-01 cs.LG cs.AI cs.CR

Efficient Preimage Approximation for Neural Network Certification

Anton Björklund, Mykola Zaitsev, Paolo Morettin, Marta Kwiatkowska

Comments Code available at https://github.com/Anton-Bjorklund/Premap2

2505.22701 2026-05-01 cs.CV

Frequency-Adaptive Discrete Cosine-ViT-ResNet Architecture for Sparse-Data Vision

Ziyue Kang, Weichuan Zhang

Comments This preprint has been withdrawn by the primary author after identifying issues in the dataset validation process, including noisy and potentially out-of-domain samples. These issues may affect the reliability of the reported experimental results. The author therefore considers withdrawal to be the most responsible course of action and appreciates the readers' understanding

2505.17056 2026-05-01 cs.CL cs.AI

From Test-taking to Cognitive Scaffolding: A Pedagogical Diagnostic Benchmark for LLMs on English Standardized Tests

Luoxi Tang, Tharunya Sundar, Yuqiao Meng, Shuai Yang, Ankita Patra, Lakshmi Manohar Chippada, Jiqian Zhao, Yi Li, Weicheng Ma, Zhaohan Xi

2504.12612 2026-05-01 cs.AI cs.CR cs.MA

Chronology of Multi-Agent Interactions for Provenance of Evolving Information

Ching-Chun Chang, Isao Echizen

2503.22676 2026-05-01 cs.CV

TranSplat: Instant Object Relighting in Gaussian Splatting via Spherical Harmonic Radiance Transfer

Boyang Tony Yu, Yanlin Jin, Yun He, Akshat Dave, Ravi Ramamoorthi, Guha Balakrishnan

2503.12844 2026-05-01 cs.CV

GuideDog: A Real-World Egocentric Multimodal Dataset for Blind and Low-Vision Accessibility-Aware Guidance

Junhyeok Kim, Jaewoo Park, Junhee Park, Sangeyl Lee, Jiwan Chung, Jisung Kim, Ji Hoon Joung, Youngjae Yu

Comments ACL 2026 Main. Project page: https://jun297.github.io/GuideDog/

2503.07878 2026-05-01 cs.CV cs.AI

A Woman with a Knife or A Knife with a Woman? Measuring Directional Bias Amplification in Image Captions

Rahul Nair, Bhanu Tokas, Hannah Kerner

Comments Accepted at WACV 2026. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2026