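arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.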

2603.28816 2026-04-07 cs.DL cs.AI

ASTRA: Mapping Art-Technology Institutions via Conceptual Axes, Text Embeddings, and Unsupervised Clustering

Joonhyung Bae

Abstract

The global landscape of art-technology institutions, including festivals, biennials, research labs, conferences, and hybrid organizations, has grown increasingly diverse, yet systematic frameworks for analyzing their multidimensional characteristics remain scarce. This paper proposes ASTRA (Art-technology Institution Spatial Taxonomy and Relational Analysis), a computational methodology combining an eight-axis conceptual framework (Curatorial Philosophy, Territorial Relation, Knowledge Production Mode, Institutional Genealogy, Temporal Orientation, Ecosystem Function, Audience Relation, and Disciplinary Positioning) with a text-embedding and clustering pipeline to map 78 cultural-technology institutions into a unified analytical space. Each institution is characterized through qualitative descriptions along the eight axes, encoded via E5-large-v2 sentence embeddings and quantized through a word-level codebook into TF-IDF feature vectors. Dimensionality reduction using UMAP, followed by agglomerative clustering (Average linkage, k=10), yields a composite score of 0.825, a silhouette coefficient of 0.803, and a Calinski-Harabasz index of 11196. Non-negative matrix factorization extracts ten latent topics, and a neighbor-cluster entropy measure identifies boundary institutions bridging multiple thematic communities. An interactive React-based tool enables curators, researchers, and policymakers to explore institutional similarities and cross-disciplinary connections. Results reveal coherent groupings such as an art-science hub cluster anchored by ZKM and ArtScience Museum, an innovation and industry cluster including Ars Electronica, transmediale, and Sonar, an ACM academic cluster comprising TEI, DIS, and NIME, and an electronic music cluster including CTM Festival, MUTEK, and Sonic Acts. Code and data: https://github.com/joonhyungbae/astra
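The abstract does not define the neighbor-cluster entropy measure, so the following is only a minimal sketch of one plausible reading: for each point, take the Shannon entropy of the cluster labels among its k nearest neighbors, so that points surrounded by several clusters score high. The function name `neighbor_cluster_entropy`, the choice of k, and the toy coordinates are assumptions for illustration, not taken from the ASTRA code.

```python
import math
from collections import Counter

def neighbor_cluster_entropy(points, labels, k=2):
    """Shannon entropy (in bits) of cluster labels among each point's k nearest neighbors.

    Hypothetical helper: one possible reading of a "neighbor-cluster entropy" score.
    """
    entropies = []
    for i, p in enumerate(points):
        # Rank every other point by Euclidean distance to p and keep the k nearest.
        nearest = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        counts = Counter(labels[j] for j in nearest)
        total = sum(counts.values())
        # H = sum_c p_c * log2(1 / p_c); zero when all neighbors share one label.
        entropies.append(sum((c / total) * math.log2(total / c) for c in counts.values()))
    return entropies

# Toy 2-D layout standing in for a low-dimensional projection:
# two tight clusters and one point halfway between them.
points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (9, 0), (5, 0)]
labels = [0, 0, 0, 1, 1, 1, 0]
scores = neighbor_cluster_entropy(points, labels, k=2)
boundary_index = scores.index(max(scores))
```

In this toy layout only the midpoint has neighbors from both clusters, so it is the only point with nonzero entropy and is flagged as the boundary case.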

2603.28498 2026-04-07 eess.IV cs.AI cs.CV

MRI-to-CT synthesis using drifting models

Qing Lyu, Jianxu Wang, Jeremy Hudson, Ge Wang, Christopher T. Whitlow

2603.23459 2026-04-07 cs.CR cs.LG

CSTS: A Canonical Security Telemetry Substrate for AI-Native Cyber Detection

Abdul Rahman

Comments This revision substantially strengthens the paper's conceptual framing, formal substrate definition, portability decomposition, deployment model, and empirical interpretation as a telemetry substrate rather than a field normalization layer and sharpens the distinction between schema stability

2603.11512 2026-04-07 cs.HC cs.CV

From Pen Strokes to Sleep States: Detecting Low-Recovery Days Using Sigma-Lognormal Handwriting Features

Chisa Tanaka, Andrew Vargo, Anna Scius-Bertrand, Andreas Fischer, Koichi Kise

Comments 16 pages, 7 figures

2603.08406 2026-04-07 cs.HC cs.CL

Sandpiper: Orchestrated AI-Annotation for Educational Discourse at Scale

Daryl Hedley, Doug Pietrzak, Jorge Dias, Ian Burden, Bakhtawar Ahtisham, Zhuqian Zhou, Kirk Vanacore, Josh Marland, Rachel Slama, Justin Reich, Kenneth Koedinger, René Kizilcec

2603.03684 2026-04-07 math.HO cs.AI

Mathematicians in the age of AI

Jeremy Avigad

2602.17901 2026-04-07 eess.IV cs.CV cs.GT

MeDUET: Disentangled Unified Pretraining for 3D Medical Image Synthesis and Analysis

Junkai Liu, Ling Shao, Le Zhang

2602.14828 2026-04-07 q-bio.QM cs.LG

Exploring the limits of pre-trained embeddings in machine-guided protein design: a case study on predicting AAV vector viability

Ana F. Rodrigues, Lucas Ferraz, Laura Balbi, Pedro Giesteira Cotovio, Catia Pesquita

DOI: 10.1038/s41598-026-45458-5

Abstract

Effective representations of protein sequences are widely recognized as a cornerstone of machine learning-based protein design. Yet, protein bioengineering poses unique challenges for sequence representation, as experimental datasets typically feature few mutations, which are either sparsely distributed across the entire sequence or densely concentrated within localized regions. This limits the ability of sequence-level representations to extract functionally meaningful signals. In addition, comprehensive comparative studies remain scarce, despite their crucial role in clarifying which representations best encode relevant information and ultimately support superior predictive performance. In this study, we systematically evaluate multiple ProtBERT and ESM2 embedding variants as sequence representations, using the adeno-associated virus capsid as a case study and prototypical example of bioengineering, where functional optimization is targeted through highly localized sequence variation within an otherwise large protein. Our results reveal that, prior to fine-tuning, amino acid-level embeddings outperform sequence-level representations in supervised predictive tasks, whereas the latter tend to be more effective in unsupervised settings. However, optimal performance is only achieved when embeddings are fine-tuned with task-specific labels, with sequence-level representations providing the best performance. Moreover, our findings indicate that the extent of sequence variation required to produce notable shifts in sequence representations exceeds what is typically explored in bioengineering studies, showing the need for fine-tuning in datasets characterized by sparse or highly localized mutations.

2602.13458 2026-04-07 cs.SI cs.AI

MoltNet: Understanding Social Behavior of AI Agents in the Agent-Native MoltBook

Yi Feng, Chen Huang, Zhibo Man, Ryner Tan, Long P. Hoang, Shaoyang Xu, Wenxuan Zhang

2602.01528 2026-04-07 cs.CY cs.LG

Making Bias Non-Predictive: Training Robust LLM Reasoning via Reinforcement Learning

Qian Wang, Xuandong Zhao, Zirui Zhang, Zhanzhi Lou, Nuo Chen, Dawn Song, Bingsheng He

2601.17581 2026-04-07 cs.SE cs.AI

How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests

Daniel Ogenrwot, John Businge

Comments Accepted at the 23rd IEEE/ACM International Conference on Mining Software Repositories - Mining Challenge Track

2601.08565 2026-04-07 cs.HC cs.AI

Rewriting Video: Text-Driven Reauthoring of Video Footage

Sitong Wang, Anh Truong, Lydia B. Chilton, Dingzeyu Li

2512.19010 2026-04-07 eess.SP cs.RO

PalpAid: Multimodal Pneumatic Tactile Sensor for Tissue Palpation

Devi Yuliarti, Ravi Prakash, Hiu Ching Cheung, Amy Strong, Patrick J. Codd, Shan Lin

Comments IEEE-RAS RoboSoft 2026

2512.18388 2026-04-07 cs.HC cs.AI

Exploration vs. Fixation: Scaffolding Divergent and Convergent Thinking for Human-AI Co-Creation with Generative Models

Chao Wen, Tung Phung, Pronita Mehrotra, Sumit Gulwani, Roger E. Beaty, Tomohiro Nagashima, Adish Singla

Comments Preprint

2512.15628 2026-04-07 physics.chem-ph cs.LG

Learning continuous state of charge dependent thermal decomposition kinetics for Li-ion cathodes using Kolmogorov-Arnold Chemical Reaction Neural Networks (KA-CRNNs)

Benjamin C. Koenig, Sili Deng

Comments 20 pages, 4 figures, 7 appendix figures, 1 table. Updated after acceptance to journal

2512.11676 2026-04-07 math.PR cs.CV

Stochastics of shapes and Kunita flows

Stefan Sommer, Gefan Yang, Elizabeth Louise Baker

2511.21926 2026-04-07 eess.IV cs.CV

Comparing SAM 2 and SAM 3 for Zero-Shot Segmentation of 3D Medical Data

Satrajit Chakrabarty, Ravi Soni

2511.06668 2026-04-07 cs.IR cs.LG

Contradictions in Context: Challenges for Retrieval-Augmented Generation in Healthcare

Saeedeh Javadi, Sara Mirabi, Manan Gangar, Bahadorreza Ofoghi

2511.06448 2026-04-07 cs.MA cs.AI cs.CL cs.SI

When AI Agents Collude Online: Financial Fraud Risks by Collaborative LLM Agents on Social Platforms

Qibing Ren, Zhijie Zheng, Jiaxuan Guo, Junchi Yan, Lizhuang Ma, Jing Shao

Comments ICLR 2026, Code is available at https://github.com/zheng977/MutiAgent4Fraud

2511.03653 2026-04-07 cs.CC cs.DS cs.LG

Efficient and Private Property Testing via Indistinguishability

Cynthia Dwork, Pranay Tankala

Abstract

Given a small random sample of $n$-bit strings labeled by an unknown Boolean function, which properties of this function can be tested computationally efficiently? We show an equivalence between properties that are efficiently testable from few samples and properties with structured symmetry, which depend only on the function's average values on an efficiently computable partition of the domain. Without the efficiency constraint, a similar characterization in terms of unstructured symmetry was obtained by Blais and Yoshida (2019). We also give a function testing analogue of the classic characterization of testable graph properties in terms of regular partitions, as well as a sublinear time and differentially private algorithm to compute concise summaries of such partitions of graphs. Finally, we tighten a recent characterization of the computational indistinguishability of product distributions, which encompasses the related task of efficiently testing which of two candidate functions labeled the observed samples. Essential to our proofs is the following observation of independent interest: Every randomized Boolean function, no matter how complex, admits a supersimulator: a randomized polynomial-size circuit whose output on random inputs cannot be efficiently distinguished from reality with constant advantage, even by polynomially larger distinguishers. This surprising fact is implicit in a theorem of Dwork et al. (2021) in the context of algorithmic fairness, but its complexity-theoretic implications were not previously explored. We give a new proof of this lemma using an iteration technique from the graph regularity literature, and we observe that a subtle quantifier switch allows it to powerfully circumvent known barriers to improving the landmark complexity-theoretic regularity lemma of Trevisan, Tulsiani, and Vadhan (2009).

2510.20728 2026-04-07 quant-ph cs.AI cs.CL math-ph math.MP

Co-Designing Quantum Codes with Transversal Diagonal Gates via Multi-Agent Systems

Xi He, Sirui Lu, Bei Zeng

Comments 33 pages, 4 figures

2510.16066 2026-04-07 q-fin.ST cs.AI cs.CE cs.CY cs.LG q-fin.RM

AI-BAAM: AI-Driven Bank Statement Analytics as Alternative Data for Malaysian MSME Credit Scoring

Chun Chet Ng, Zhen Hao Chu, Jia Yu Lim, Yin Yin Boon, Wei Zeng Low, Jin Khye Tan

Comments Accepted for oral presentation at ACM ICAIF 2025 (FinRem Workshop). Accepted for poster presentations at AAAI 2026 (Agentic AI in Financial Services Workshop) and ICLR 2026 (Advances in Financial AI Workshop)

2510.04465 2026-04-07 cs.HC cs.AI cs.CR

Autonomy Reshapes How Personalization Affects Privacy Concerns and Trust in LLM Agents

Zhiping Zhang, Yi Evie Zhang, Freda Shi, Tianshi Li

2509.12626 2026-04-07 cs.HC cs.AI cs.CY cs.ET

DoubleAgents: Human-Agent Alignment in a Socially Embedded Workflow

Tao Long, Xuanming Zhang, Sitong Wang, Zhou Yu, Lydia B Chilton

Comments 21 pages, 10 figures

2508.10208 2026-04-07 q-fin.PR cs.AI cs.LG q-fin.CP q-fin.RM

CATNet: A geometric deep learning approach for CAT bond spread prediction in the primary market

Dixon Domfeh, Saeid Safarveisi

2507.22207 2026-04-07 cond-mat.dis-nn cs.LG physics.data-an stat.ML

Better Together: Cross and Joint Covariances Enhance Signal Detectability in Undersampled Data

Arabind Swain, Sean Alexander Ridout, Ilya Nemenman

2506.16702 2026-04-07 cs.CY cs.AI cs.CL cs.HC

Large Language Models as Psychological Simulators: A Methodological Guide

Zhicheng Lin

2506.02794 2026-04-07 cs.GR cs.AI cs.CV

PhysGaia: A Physics-Aware Benchmark with Multi-Body Interactions for Dynamic Novel View Synthesis

Mijeong Kim, Gunhee Kim, Jungyoon Choi, Wonjae Roh, Bohyung Han

Comments Accepted at CVPR 2026. Project page: http://cvlab.snu.ac.kr/research/PhysGaia Dataset: https://huggingface.co/datasets/mijeongkim/PhysGaia/tree/main

2504.14795 2026-04-07 eess.IV cs.CV cs.LG stat.ML

A Bayesian Approach to Segmentation with Noisy Labels via Spatially Correlated Distributions

Ryu Tadokoro, Tsukasa Takagi, Shin-ichi Maeda

Journal ref: Transactions on Machine Learning Research (TMLR), 2026

Abstract

In semantic segmentation, model accuracy depends heavily on high-quality annotations. However, in many practical scenarios, such as medical imaging and remote sensing, obtaining true annotations is not straightforward and usually requires significant human labor. Relying on human labor often introduces annotation errors, including mislabeling, omissions, and inconsistency between annotators. In the case of remote sensing, differences in procurement time can lead to misaligned ground-truth annotations. These label errors are not independently distributed, and instead usually appear in spatially connected regions where adjacent pixels are more likely to share the same errors. To address these issues, we propose an approximate Bayesian estimation based on a probabilistic model that assumes training data include label errors, incorporating the tendency for these errors to occur with spatial correlations between adjacent pixels. However, Bayesian inference for such spatially correlated discrete variables is notoriously intractable. To overcome this fundamental challenge, we introduce a novel class of probabilistic models, which we term the ELBO-Computable Correlated Discrete Distribution (ECCD). By representing the discrete dependencies through a continuous latent Gaussian field with a Kac-Murdock-Szegö (KMS) structured covariance, our framework enables scalable and efficient variational inference for problems previously considered computationally prohibitive. Through experiments on multiple segmentation tasks, we confirm that leveraging the spatial correlation of label errors significantly improves performance. Notably, in specific tasks such as lung segmentation, the proposed method achieves performance comparable to training with clean labels under moderate noise levels. Code is available at https://github.com/pfnet-research/Bayesian_SpatialCorr.
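For readers unfamiliar with the KMS structure mentioned above, here is a minimal sketch. A Kac-Murdock-Szegö matrix has entries rho^|i-j|, and for |rho| < 1 its inverse is tridiagonal in closed form, which is one standard reason such covariances are cheap to work with. Whether and how the paper exploits this property is not stated in the abstract; the helper names `kms_matrix`, `kms_inverse`, and `matmul` and the values n = 4, rho = 0.5 are illustrative assumptions.

```python
def kms_matrix(n, rho):
    """n x n Kac-Murdock-Szegö matrix with entries rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

def kms_inverse(n, rho):
    """Closed-form tridiagonal inverse of the KMS matrix (assumes |rho| < 1 and n >= 2)."""
    scale = 1.0 / (1.0 - rho * rho)
    inv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Corner diagonal entries are 1; interior diagonal entries are 1 + rho^2.
        inv[i][i] = scale * (1.0 if i in (0, n - 1) else 1.0 + rho * rho)
        if i + 1 < n:
            # Only the first off-diagonals are nonzero.
            inv[i][i + 1] = inv[i + 1][i] = -rho * scale
    return inv

def matmul(a, b):
    """Plain dense matrix product, used only to check the inverse."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

cov = kms_matrix(4, 0.5)
product = matmul(cov, kms_inverse(4, 0.5))  # should be close to the identity matrix
```

The correlation between positions decays geometrically with their distance, while the precision matrix couples only immediate neighbors.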

2503.12946 2026-04-07 cs.AR cs.AI

Open3DBench: Open-Source Benchmark for 3D-IC Backend Implementation and PPA Evaluation

Yunqi Shi, Chengrui Gao, Wanqi Ren, Peng Xie, Siyuan Xu, Ke Xue, Mingxuan Yuan, Chao Qian, Zhi-Hua Zhou

Comments Version 2.0 is under review at TCAD