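arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.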

2604.00023 2026-04-02 cs.CL

Phonological Fossils: Machine Learning Detection of Non-Mainstream Vocabulary in Sulawesi Basic Lexicon

Mukhlis Amien, Go Frendi Gunawan

Comments 31 pages, 4 figures, 5 tables. Submitted to Oceanic Linguistics

Abstract

Basic vocabulary in many Sulawesi Austronesian languages includes forms resisting reconstruction to any proto-form with phonological patterns inconsistent with inherited roots, but whether this non-conforming vocabulary represents pre-Austronesian substrate or independent innovation has not been tested computationally. We combine rule-based cognate subtraction with a machine learning classifier trained on phonological features. Using 1,357 forms from six Sulawesi languages in the Austronesian Basic Vocabulary Database, we identify 438 candidate substrate forms (26.5%) through cognate subtraction and Proto-Austronesian cross-checking. An XGBoost classifier trained on 26 phonological features distinguishes inherited from non-mainstream forms with AUC=0.763, revealing a phonological fingerprint: longer forms, more consonant clusters, higher glottal stop rates, and fewer Austronesian prefixes. Cross-method consensus (Cohen's kappa=0.61) identifies 266 high-confidence non-mainstream candidates. However, clustering yields no coherent word families (silhouette=0.114; cross-linguistic cognate test p=0.569), providing no evidence for a single pre-Austronesian language layer. Application to 16 additional languages confirms geographic patterning: Sulawesi languages show higher predicted non-mainstream rates (mean P_sub=0.606) than Western Indonesian languages (0.393). This study demonstrates that phonological machine learning can complement traditional comparative methods in detecting non-mainstream lexical layers, while cautioning against interpreting phonological non-conformity as evidence for a shared substrate language.

2604.00022 2026-04-02 cs.CL cs.AI

Criterion Validity of LLM-as-Judge for Business Outcomes in Conversational Commerce

Liang Chen, Qi Liu, Wenhuan Lin, Feng Liang

2604.00021 2026-04-02 cs.CL cs.AI cs.CY

How Do Language Models Process Ethical Instructions? Deliberation, Consistency, and Other-Recognition Across Four Models

Hiroki Fukui

Comments 34 pages, 7 figures, 4 tables. Preprint. OSF pre-registration: osf.io/4n5uf. Companion paper: arXiv:2603.04904

2604.00020 2026-04-02 cs.CL

Detecting Abnormal User Feedback Patterns through Temporal Sentiment Aggregation

Yalun Qi, Sichen Zhao, Zhiming Xue, Xianling Zeng, Zihan Yu

2604.00019 2026-04-02 cs.CL cs.AI

The Chronicles of RiDiC: Generating Datasets with Controlled Popularity Distribution for Long-form Factuality Evaluation

Pavel Braslavski, Dmitrii Iarosh, Nikita Sushko, Andrey Sakhovskiy, Vasily Konovalov, Elena Tutubalina, Alexander Panchenko

Comments Accepted to LREC 2026

2604.00018 2026-04-02 cs.CL cs.AI

Think Twice Before You Write -- an Entropy-based Decoding Strategy to Enhance LLM Reasoning

Jiashu He, Meizhu Liu, Olaitan P Olaleye, Amit Agarwal, M. Avendi, Yassi Abbasi, Matthew Rowe, Hitesh Laxmichand Patel, Paul Li, Tao Sheng, Sujith Ravi, Dan Roth

2604.00017 2026-04-02 cs.CL

Semantic Shifts of Psychological Concepts in Scientific and Popular Media Discourse: A Distributional Semantics Analysis of Russian-Language Corpora

Orlova Anastasia

2604.00016 2026-04-02 cs.CL cs.AI

Are they human? Detecting large language models by probing human memory constraints

Simon Schug, Brenden M. Lake

Comments Code available at https://github.com/smonsays/llm-humanness

2604.00015 2026-04-02 cs.CL

ASCAT: An Arabic Scientific Corpus and Benchmark for Advanced Translation Evaluation

Serry Sibaee, Khloud Al Jallad, Zineb Yousfi, Israa Elsayed Elhosiny, Yousra El-Ghawi, Batool Balah, Omer Nacar

2604.00014 2026-04-02 cs.CL cs.HC

Disentangling Prompt Element Level Risk Factors for Hallucinations and Omissions in Mental Health LLM Responses

Congning Ni, Sarvech Qadir, Bryan Steitz, Mihir Sachin Vaidya, Qingyuan Song, Lantian Xia, Shelagh Mulvaney, Siru Liu, Hyeyoung Ryu, Leah Hecht, Amy Bucher, Christopher Symons, Laurie Novak, Susannah L. Rose, Murat Kantarcioglu, Bradley Malin, Zhijun Yin

Comments Submitted to AMIA 2026 Annual Symposium (under review)

2604.00012 2026-04-02 cs.CL cs.AI

Finding and Reactivating Post-Trained LLMs' Hidden Safety Mechanisms

Mingjie Li, Wai Man Si, Michael Backes, Yang Zhang, Yisen Wang

2604.00010 2026-04-02 cs.CL cs.AI

Can LLMs Perceive Time? An Empirical Investigation

Aniketh Garikaparthi

Comments ICLR 2026 I Can't Believe It's Not Better Workshop

2604.00009 2026-04-02 cs.CL cs.AI

Eyla: Toward an Identity-Anchored LLM Architecture with Integrated Biological Priors -- Vision, Implementation Attempt, and Lessons from AI-Assisted Development

Arif Aditto

Comments 8 pages, 3 tables, 25 references. Preprint under review for workshop submission

2604.00008 2026-04-02 cs.CL cs.AI

How Trustworthy Are LLM-as-Judge Ratings for Interpretive Responses? Implications for Qualitative Research Workflows

Songhee Han, Jueun Shin, Jiyoon Han, Bung-Woo Jun, Hilal Ayan Karabatman

Abstract

As qualitative researchers show growing interest in using automated tools to support interpretive analysis, a large language model (LLM) is often introduced into an analytic workflow as is, without systematic evaluation of interpretive quality or comparison across models. This practice leaves model selection largely unexamined despite its potential influence on interpretive outcomes. To address this gap, this study examines whether LLM-as-judge evaluations meaningfully align with human judgments of interpretive quality and can inform model-level decision making. Using 712 conversational excerpts from semi-structured interviews with K-12 mathematics teachers, we generated one-sentence interpretive responses using five widely adopted inference models: Command R+ (Cohere), Gemini 2.5 Pro (Google), GPT-5.1 (OpenAI), Llama 4 Scout-17B Instruct (Meta), and Qwen 3-32B Dense (Alibaba). Automated evaluations were conducted using AWS Bedrock's LLM-as-judge framework across five metrics, and a stratified subset of responses was independently rated by trained human evaluators on interpretive accuracy, nuance preservation, and interpretive coherence. Results show that LLM-as-judge scores capture broad directional trends in human evaluations at the model level but diverge substantially in score magnitude. Among automated metrics, Coherence showed the strongest alignment with aggregated human ratings, whereas Faithfulness and Correctness revealed systematic misalignment at the excerpt level, particularly for non-literal and nuanced interpretations. Safety-related metrics were largely irrelevant to interpretive quality. These findings suggest that LLM-as-judge methods are better suited for screening or eliminating underperforming models than for replacing human judgment, offering practical guidance for systematic comparison and selection of LLMs in qualitative research workflows.

2604.00007 2026-04-02 cs.CL cs.AI

Dynin-Omni: Omnimodal Unified Large Diffusion Language Model

Jaeik Kim, Woojin Kim, Jihwan Hong, Yejoon Lee, Sieun Hyeon, Mintaek Lim, Yunseok Han, Dogeun Kim, Hoeun Lee, Hyunggeun Kim, Jaeyoung Do

Comments Project Page: https://dynin.ai/omni/

2604.00006 2026-04-02 cs.CL cs.CY cs.IR cs.LG

Scalable Identification and Prioritization of Requisition-Specific Personal Competencies Using Large Language Models

Wanxin Li, Denver McNeney, Nivedita Prabhu, Charlene Zhang, Renee Barr, Matthew Kitching, Khanh Dao Duc, Anthony S. Boyce

2604.00005 2026-04-02 cs.AI cs.CL

How Emotion Shapes the Behavior of LLMs and Agents: A Mechanistic Study

Moran Sun, Tianlin Li, Yuwei Zheng, Zhenhong Zhou, Aishan Liu, Xianglong Liu, Yang Liu

Comments 15 pages, 11 figures

2604.00004 2026-04-02 cs.CL cs.AI

LinearARD: Linear-Memory Attention Distillation for RoPE Restoration

Ning Yang, Hengyu Zhong, Wentao Wang, Baoliang Tian, Haijun Zhang, Jun Wang

2604.00002 2026-04-02 cs.CL cs.AI

Benchmark for Assessing Olfactory Perception of Large Language Models

Eftychia Makri, Nikolaos Nakis, Laura Sisson, Gigi Minsky, Leandros Tassiulas, Vahid Satarifard, Nicholas A. Christakis

2603.27048 2026-04-02 cs.CV

MOOZY: A Patient-First Foundation Model for Computational Pathology

Yousef Kotp, Vincent Quoc-Huy Trinh, Christopher Pal, Mahdi S. Hosseini

2603.24639 2026-04-02 cs.LG cs.AI

Experiential Reflective Learning for Self-Improving LLM Agents

Marc-Antoine Allard, Arnaud Teinturier, Victor Xing, Gautier Viaud

Comments Published as a conference paper at the ICLR 2026 MemAgents Workshop

2603.20895 2026-04-02 cs.CL cs.LG

LLM Router: Rethinking Routing with Prefill Activations

Tanay Varshney, Annie Surla, Michelle Xu, Gomathy Venkata Krishnan, Maximilian Jeblick, David Austin, Neal Vaidya, Davide Onofrio

2603.15929 2026-04-02 cs.AI math.AP math.LO

Semi-Autonomous Formalization of the Vlasov-Maxwell-Landau Equilibrium

Vasily Ilin

Comments 11 figures

2603.10035 2026-04-02 cs.CL

TriageSim: A Conversational Emergency Triage Simulation Framework from Structured Electronic Health Records

Dipankar Srirag, Quoc Dung Nguyen, Aditya Joshi, Padmanesan Narasimhan, Salil Kanhere

Comments 6 pages, 3 figures, 2 tables

2603.05537 2026-04-02 cs.CV eess.IV

Sketch It Out: Exploring Label-Free Structural Cues for Multimodal Gait Recognition

Chao Zhang, Zhuang Zheng, Ruixin Li, Zhanyong Mei

Comments 10 pages, 3 figures

2512.19283 2026-04-02 cs.CV

OmniEgoCap: Camera-Agnostic Sequence-Level Egocentric Motion Reconstruction

Kyungwon Cho, Hanbyul Joo

Comments Project Page: https://kyungwoncho.github.io/OmniEgoCap/

2512.04485 2026-04-02 cs.CV

Not All Birds Look The Same: Identity-Preserving Generation For Birds

Aaron Sun, Oindrila Saha, Subhransu Maji

2511.22690 2026-04-02 cs.CV

Ar2Can: An Architect and an Artist Leveraging a Canvas for Multi-Human Generation

Shubhankar Borse, Phuc Pham, Farzad Farhadzadeh, Seokeon Choi, Phong Ha Nguyen, Anh Tuan Tran, Sungrack Yun, Munawar Hayat, Fatih Porikli

Comments Accepted to CVPR 2026

2510.01399 2026-04-02 cs.CV

Resolving the Identity Crisis in Text-to-Image Generation

Shubhankar Borse, Farzad Farhadzadeh, Munawar Hayat, Fatih Porikli

Comments Accepted to CVPR 2026

2509.16483 2026-04-02 cs.CV

Octree Diffusion for Semantic Scene Generation and Completion

Xujia Zhang, Brendan Crowe, Christoffer Heckman

Comments Accepted to ICRA 2026. Revised version with updated paragraphs

Journal ref: Proceedings of the 2026 IEEE International Conference on Robotics and Automation (ICRA)

Abstract

The completion, extension, and generation of 3D semantic scenes are an interrelated set of capabilities that are useful for robotic navigation and exploration. Existing approaches seek to decouple these problems and solve them one-off. Additionally, these approaches are often domain-specific, requiring separate models for different data distributions, e.g. indoor vs. outdoor scenes. To unify these techniques and provide cross-domain compatibility, we develop a single framework that can perform scene completion, extension, and generation in both indoor and outdoor scenes, which we term Octree Latent Semantic Diffusion. Our approach operates directly on an efficient dual octree graph latent representation: a hierarchical, sparse, and memory-efficient occupancy structure. This technique disentangles synthesis into two stages: (i) structure diffusion, which predicts binary split signals to construct a coarse occupancy octree, and (ii) latent semantic diffusion, which generates semantic embeddings decoded by a graph VAE into voxel-level semantic labels. To perform semantic scene completion or extension, our model leverages inference-time latent inpainting, or outpainting respectively. These inference-time methods use partial LiDAR scans or maps to condition generation, without the need for retraining or finetuning. We demonstrate high-quality structure, coherent semantics, and robust completion from single LiDAR scans, as well as zero-shot generalization to out-of-distribution LiDAR data. These results indicate that completion-through-generation in a dual octree graph latent space is a practical and scalable alternative to regression-based pipelines for real-world robotic perception tasks.
