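arXivDaily: a daily academic digest that syncs the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.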

2505.01462 2026-03-05 cs.AI cs.CY

Synthetic emotions and consciousness: exploring architectural boundaries

Hermann Borotschnig

Comments 34 pages, 2 tables, 1 figure. AI & Soc (2026)

Details

DOI: 10.1007/s00146-026-02896-z

English abstract

As artificial agents display increasingly sophisticated emotion-like behaviors, frameworks for assessing whether such systems risk instantiating consciousness remain limited. This contribution asks whether synthetic emotion-like control can be implemented while deliberately excluding architectural features that major theories associate with access-like consciousness. We propose architectural principles (A1-A8) for a hierarchical, dual-source implementation in which (i) immediate needs generate motivational signals and (ii) episodic memory provides affective guidance from similar past situations; the two sources converge to modulate action selection. To operationalize consciousness-related risk, we distill predictions from major theories into four engineering risk-reduction constraints: (R1) no content-general, workspace-like global broadcast, (R2) no metarepresentation, (R3) no autobiographical consolidation, and (R4) bounded learning. We address three questions: (Q1) Can emotion-like control satisfy R1-R4? We present a concrete architecture as an existence proof. (Q2) Can the architecture be extended without introducing access-enabling features? We identify stable modifications that preserve compliance. (Q3) Can we trace graded paths that plausibly increase access risk? We map gradual transitions that progressively violate the constraints. Our contribution operates at three levels: on the engineering side, we present a modular, biologically motivated control architecture; on the theoretical side, we propose a control model of emotions and a methodological template for converting consciousness-related questions into auditable architectural tests; on the safety side, we sketch preliminary audit indicators that may inform future governance frameworks. The architecture functions independently as an emotion-like controller, while the risk-reduction criteria may extend to other AI systems.

2504.20505 2026-03-05 cs.AI

MuRAL: A Multi-Resident Ambient Sensor Dataset Annotated with Natural Language for Activities of Daily Living

Xi Chen, Julien Cumin, Fano Ramparany, Dominique Vaufreydaz

2504.20376 2026-03-05 cs.CV cs.CR

When Memory Becomes a Vulnerability: Towards Multi-turn Jailbreak Attacks against Text-to-Image Generation Systems

Shiqian Zhao, Jiayang Liu, Yiming Li, Runyi Hu, Xiaojun Jia, Wenshu Fan, Xiao Bao, Xinfeng Li, Jie Zhang, Wei Dong, Tianwei Zhang, Luu Anh Tuan

Comments This work proposes a multi-turn jailbreak attack against real-world chat-based T2I generation systems that integrate memory mechanisms. It also constructs a simulation system that considers three industrial-grade memory mechanisms and 7 kinds of safety filters (both input and output). To appear at USENIX 2026

Details

English abstract

Modern text-to-image (T2I) generation systems (e.g., DALL·E 3) exploit the memory mechanism, which captures key information in multi-turn interactions for faithful generation. Despite its practicality, the security analyses of this mechanism have fallen far behind. In this paper, we reveal that it can exacerbate the risk of jailbreak attacks. Previous attacks fuse the unsafe target prompt into one ultimate adversarial prompt, which can be easily detected or lead to the generation of non-unsafe images due to under- or over-detoxification. In contrast, we propose embedding the malice at the inception of the chat session in memory, addressing the above limitations. Specifically, we propose Inception, the first multi-turn jailbreak attack against real-world text-to-image generation systems that explicitly exploits their memory mechanisms. Inception is composed of two key modules: segmentation and recursion. We introduce Segmentation, a semantic-preserving method that generates multi-round prompts. By leveraging NLP analysis techniques, we design policies to decompose a prompt, together with its malicious intent, according to sentence structure, thereby evading safety filters. Recursion further addresses the challenge posed by unsafe sub-prompts that cannot be separated through simple segmentation. It firstly expands the sub-prompt, then invokes segmentation recursively. To facilitate multi-turn adversarial prompts crafting, we build VisionFlow, an emulation T2I system that integrates two-stage safety filters and industrial-grade memory mechanisms. The experiment results show that Inception successfully allures unsafe image generation, surpassing the SOTA by a 20.0% margin in attack success rate. We also conduct experiments on the real-world commercial T2I generation platforms, further validating the threats of Inception in practice.

2503.18349 2026-03-05 cs.CV

Human-Object Interaction via Automatically Designed VLM-Guided Motion Policy

Zekai Deng, Ye Shi, Kaiyang Ji, Lan Xu, Shaoli Huang, Jingya Wang

Comments ICLR camera ready

2503.17526 2026-03-05 cs.CV

Beyond the Encoder: Joint Encoder-Decoder Contrastive Pre-Training Improves Dense Prediction

Sébastien Quetin, Tapotosh Ghosh, Farhad Maleki

Details

Journal ref: https://openaccess.thecvf.com/content/WACV2026/html/Quetin_Beyond_the_Encoder_Joint_Encoder-Decoder_Contrastive_Pre-Training_Improves_Dense_Prediction_WACV_2026_paper.html

English abstract

Contrastive learning methods in self-supervised settings have primarily focused on pre-training encoders, while decoders are typically introduced and trained separately for downstream dense prediction tasks. However, this conventional approach overlooks the potential benefits of jointly pre-training both encoder and decoder. In this paper, we propose DeCon, an efficient encoder-decoder self-supervised learning (SSL) framework that supports joint contrastive pre-training. We first extend existing SSL architectures to accommodate diverse decoders and their corresponding contrastive losses. Then, we introduce a weighted encoder-decoder contrastive loss with non-competing objectives to enable the joint pre-training of encoder-decoder architectures. By adapting a contrastive SSL framework for dense prediction, DeCon establishes consistent state-of-the-art performance on most of the evaluated tasks when pre-trained on Imagenet-1K, COCO and COCO+. Notably, when pre-training a ResNet-50 encoder on COCO dataset, DeCon improves COCO object detection and instance segmentation compared to the baseline framework by +0.37 AP and +0.32 AP, respectively, and boosts semantic segmentation by +1.42 mIoU on Pascal VOC and by +0.50 mIoU on Cityscapes. These improvements generalize across recent backbones, decoders, datasets, and dense tasks beyond segmentation and object detection, and persist in out-of-domain scenarios, including limited-data settings, demonstrating that joint pre-training significantly enhances representation quality for dense prediction. Code is available at https://github.com/sebquetin/DeCon.git.

2503.17110 2026-03-05 cs.CV cs.LG

Beyond Accuracy: What Matters in Designing Well-Behaved Image Classification Models?

Robin Hesse, Doğukan Bağcı, Bernt Schiele, Simone Schaub-Meyer, Stefan Roth

Comments Published in TMLR (01/2026) | OpenReview: https://openreview.net/forum?id=E7HDtLCoT6 | Project page: https://visinf.github.io/beyond-accuracy/

2503.07638 2026-03-05 cs.LG cs.AI

Leveraging Taxonomy Similarity for Next Activity Prediction in Patient Treatment

Martin Kuhn, Joscha Grüger, Tobias Geyer, Ralph Bergmann

2502.17244 2026-03-05 cs.CV cs.LG

A dataset of high-resolution plantar pressures for gait analysis across varying footwear and walking speeds

Robyn Larracy, Angkoon Phinyomark, Ala Salehi, Eve MacDonald, Saeed Kazemi, Shikder Shafiul Bashar, Aaron Tabor, Erik Scheme

2502.17034 2026-03-05 cs.RO cs.NE

Evolution 6.0: Robot Evolution through Generative Design

Muhammad Haris Khan, Artyom Myshlyaev, Artem Lykov, Miguel Altamirano Cabrera, Dzmitry Tsetserukou

Comments Accepted to HRI

2502.14142 2026-03-05 cs.CV

Token Adaptation via Side Graph Convolution for Efficient Fine-tuning of 3D Point Cloud Transformers

Takahiko Furuya

Comments Accepted to the journal Machine Vision and Applications

2502.10550 2026-03-05 cs.LG cs.AI cs.RO

Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning

Egor Cherepanov, Nikita Kachaev, Alexey K. Kovalev, Aleksandr I. Panov

Comments 57 pages, 29 figures, 11 tables

2502.01534 2026-03-05 cs.LG cs.AI cs.CL

Preference Leakage: A Contamination Problem in LLM-as-a-judge

Dawei Li, Renliang Sun, Yue Huang, Ming Zhong, Bohan Jiang, Jiawei Han, Xiangliang Zhang, Wei Wang, Huan Liu

Comments Accepted by ICLR 2026

2501.04336 2026-03-05 cs.CV

Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs

Zeyi Huang, Yuyang Ji, Xiaofang Wang, Nikhil Mehta, Tong Xiao, Donghyun Lee, Sigmund Vanvalkenburgh, Shengxin Zha, Bolin Lai, Yiqiu Ren, Licheng Yu, Ning Zhang, Yong Jae Lee, Miao Liu

2501.01317 2026-03-05 cs.LG cs.AI

Difficult Examples Hurt Unsupervised Contrastive Learning: A Theoretical Perspective

Yi-Ge Zhang, Jingyi Cui, Qiran Li, Yisen Wang

Comments Accepted to ICLR 2026 as an Oral Presentation

2412.13091 2026-03-05 cs.CL cs.AI

LMUnit: Fine-grained Evaluation with Natural Language Unit Tests

Jon Saad-Falcon, Rajan Vivek, William Berrios, Nandita Shankar Naik, Matija Franklin, Bertie Vidgen, Amanpreet Singh, Douwe Kiela, Shikib Mehri

2412.06531 2026-03-05 cs.LG cs.AI

Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation

Egor Cherepanov, Nikita Kachaev, Artem Zholus, Alexey K. Kovalev, Aleksandr I. Panov

Comments 20 pages, 6 figures, 9 tables

2412.01654 2026-03-05 cs.LG

FSMLP: Modelling Channel Dependencies With Simplex Theory Based Multi-Layer Perceptions In Frequency Domain

Zhengnan Li, Haoxuan Li, Hao Wang, Jun Fang, Yuting Tan, Xilong Cheng, Yunxiao Qin

2411.19888 2026-03-05 cs.CV cs.LG

FlowCLAS: Enhancing Normalizing Flow Via Contrastive Learning For Anomaly Segmentation

Chang Won Lee, Selina Leveugle, Svetlana Stolpner, Chris Langley, Paul Grouchy, Jonathan Kelly, Steven L. Waslander

Comments WACV 2026 Camera Ready

2411.15272 2026-03-05 cs.LG cs.AI

Curriculum-enhanced GroupDRO: Challenging the Norm of Avoiding Curriculum Learning in Subpopulation Shift Setups

Antonio Barbalau

Comments Accepted as a conference paper at ICAIRC 2024

2410.19450 2026-03-05 cs.AI

Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration

Hai Zhong, Xun Wang, Zhuoran Li, Longbo Huang

Comments Include detailed hyperparameter configurations

2410.09879 2026-03-05 cs.CV

TextMaster: A Unified Framework for Realistic Text Editing via Glyph-Style Dual-Control

Zhenyu Yan, Jian Wang, Aoqiang Wang, Yuhan Li, Wenxiang Shang, Ran Lin

Comments Accepted to ICCV 2025

2410.08184 2026-03-05 cs.CV

Scaling Laws For Diffusion Transformers

Zhengyang Liang, Hao He, Ceyuan Yang, Bo Dai

2409.19289 2026-03-05 cs.CV

FINE: Factorizing Knowledge for Initialization of Variable-sized Diffusion Models

Yucheng Xie, Fu Feng, Ruixiao Shi, Jianlu Shen, Jing Wang, Yong Rui, Xin Geng

2409.06912 2026-03-05 cs.RO cs.AI

A Bayesian Framework for Active Tactile Object Recognition, Pose Estimation and Shape Transfer Learning

Haodong Zheng, Andrei Jalba, Raymond H. Cuijpers, Wijnand IJsselsteijn, Sanne Schoenmakers

2408.06958 2026-03-05 cs.LG stat.ML

AuToMATo: An Out-Of-The-Box Persistence-Based Clustering Algorithm

Marius Huber, Sara Kalisnik, Patrick Schnider

Comments Code: https://doi.org/10.5281/zenodo.17279740

2407.21546 2026-03-05 cs.LG

Black Box Meta-Learning Intrinsic Rewards

Octavio Pappalardo, Rodrigo Ramele, Juan Miguel Santos

Comments Improved exposition; no technical changes to content

2406.06512 2026-03-05 cs.CV cs.AI

Merlin: A Computed Tomography Vision-Language Foundation Model and Dataset

Louis Blankemeier, Ashwin Kumar, Joseph Paul Cohen, Jiaming Liu, Longchao Liu, Dave Van Veen, Syed Jamal Safdar Gardezi, Hongkun Yu, Magdalini Paschali, Zhihong Chen, Jean-Benoit Delbrouck, Eduardo Reis, Robbie Holland, Cesar Truyts, Christian Bluethgen, Yufu Wu, Long Lian, Malte Engmann Kjeldskov Jensen, Sophie Ostmeier, Maya Varma, Jeya Maria Jose Valanarasu, Zhongnan Fang, Zepeng Huo, Zaid Nabulsi, Diego Ardila, Wei-Hung Weng, Edson Amaro Junior, Neera Ahuja, Jason Fries, Nigam H. Shah, Greg Zaharchuk, Marc Willis, Adam Yala, Andrew Johnston, Robert D. Boutin, Andrew Wentland, Curtis P. Langlotz, Jason Hom, Sergios Gatidis, Akshay S. Chaudhari

Comments Nature (2026)

Details

DOI: 10.1038/s41586-026-10181-8

English abstract

The large volume of abdominal computed tomography (CT) scans coupled with the shortage of radiologists have intensified the need for automated medical image analysis tools. Previous state-of-the-art approaches for automated analysis leverage vision-language models (VLMs) that jointly model images and radiology reports. However, current medical VLMs are generally limited to 2D images and short reports. Here to overcome these shortcomings for abdominal CT interpretation, we introduce Merlin, a 3D VLM that learns from volumetric CT scans, electronic health record data and radiology reports. This approach is enabled by a multistage pretraining framework that does not require additional manual annotations. We trained Merlin using a high-quality clinical dataset of paired CT scans (>6 million images from 15,331 CT scans), diagnosis codes (>1.8 million codes) and radiology reports (>6 million tokens). We comprehensively evaluated Merlin on 6 task types and 752 individual tasks that covered diagnostic, prognostic and quality-related tasks. The non-adapted (off-the-shelf) tasks included zero-shot classification of findings (30 findings), phenotype classification (692 phenotypes) and zero-shot cross-modal retrieval (image-to-findings and image-to-impression). The model-adapted tasks included 5-year chronic disease prediction (6 diseases), radiology report generation and 3D semantic segmentation (20 organs). We validated Merlin at scale, with internal testing on 5,137 CT scans and external testing on 44,098 CT scans from 3 independent sites and 2 public datasets. The results demonstrated high generalization across institutions and anatomies. Merlin outperformed 2D VLMs, CT foundation models and off-the-shelf radiology models. We also release our trained models, code, and dataset, available at: https://github.com/StanfordMIMI/Merlin.

2405.15198 2026-03-05 cs.CL

RAEE: A Robust Retrieval-Augmented Early Exit Framework for Efficient Inference

Lianming Huang, Shangyu Wu, Yufei Cui, Ying Xiong, Haibo Hu, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue

Comments Accepted at ICLR 2026

2405.01440 2026-03-05 cs.RO cs.AI cs.LG

A Review of Reward Functions for Reinforcement Learning in the context of Autonomous Driving

Ahmed Abouelazm, Jonas Michel, J. Marius Zoellner

Comments Accepted at the 35th IEEE Intelligent Vehicles Symposium (IV 2024)

2404.01249 2026-03-05 cs.CV

FireANTs: Adaptive Riemannian Optimization for Multi-Scale Diffeomorphic Matching

Rohit Jena, Pratik Chaudhari, James C. Gee

Comments Accepted at Nature Communications