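arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.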

2511.05681 2026-03-09 cs.CV

Culture in Action: Evaluating Text-to-Image Models through Social Activities

Sina Malakouti, Boqing Gong, Adriana Kovashka

Abstract

Text-to-image (T2I) diffusion models achieve impressive photorealism by training on large-scale web data, but models inherit cultural biases and fail to depict underrepresented regions faithfully. Existing cultural benchmarks focus mainly on object-centric categories (e.g., food, attire, and architecture), overlooking the social and daily activities that more clearly reflect cultural norms. Few metrics exist for measuring cultural faithfulness. We introduce CULTIVate, a benchmark for evaluating T2I models on cross-cultural activities (e.g., greetings, dining, games, traditional dances, and cultural celebrations). CULTIVate spans 16 countries with 576 prompts and more than 19,000 images, and provides an explainable descriptor-based evaluation framework across multiple cultural dimensions, including background, attire, objects, and interactions. We propose four metrics to measure cultural alignment, hallucination, exaggerated elements, and diversity. Our findings reveal systematic disparities: models perform better for global north countries than for the global south, with distinct failure modes across T2I systems. Human studies confirm that our metrics correlate more strongly with human judgments than existing text-image metrics.

2510.22212 2026-03-09 cs.CL

DETECT: Determining Ease and Textual Clarity of German Text Simplifications

Maria Korobeynikova, Alessia Battisti, Lukas Fischer, Yingqiang Gao

2510.20886 2026-03-09 cs.CL cs.AI

Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People

Gabriel Grand, Valerio Pepe, Jacob Andreas, Joshua B. Tenenbaum

Comments ICLR 2026

2510.13669 2026-03-09 cs.CV cs.AI cs.LG

CanvasMAR: Improving Masked Autoregressive Video Prediction With Canvas

Zian Li, Muhan Zhang

2510.08730 2026-03-09 cs.CL cs.LG

How Reliable is Language Model Micro-Benchmarking?

Gregory Yauney, Shahzaib Saqib Warraich, Swabha Swayamdipta

Comments Published at ICLR 2026

2510.01082 2026-03-09 cs.SD cs.CR

HVAC-EAR: Eavesdropping Human Speech Using HVAC Systems

Tarikul Islam Tamiti, Biraj Joshi, Rida Hasan, Anomadarshi Barua

2510.01041 2026-03-09 cs.RO cs.SY eess.SY

ROSplane 2.0: A Fixed-Wing Autopilot for Research

Ian Reid, Joseph Ritchie, Jacob Moore, Brandon Sutherland, Gabe Snow, Phillip Tokumaru, Tim McLain

Comments Submitted to the 2026 International Conference on Unmanned Aerial Systems

2510.00995 2026-03-09 cs.RO cs.SY eess.SY

ROSflight 2.0: Lean ROS 2-Based Autopilot for Unmanned Aerial Vehicles

Jacob Moore, Phil Tokumaru, Ian Reid, Brandon Sutherland, Joseph Ritchie, Gabe Snow, Tim McLain

Comments Submitted to the 2026 International Conference on Unmanned Aerial Systems

2509.21281 2026-03-09 cs.RO cs.LG

Taxonomy-aware Dynamic Motion Generation on Hyperbolic Manifolds

Luis Augenstein, Noémie Jaquier, Tamim Asfour, Leonel Rozo

Comments Accepted for publication in IEEE Conference on Robotics and Automation (ICRA), 8 pages, 6 figures, 1 table

2509.17349 2026-03-09 cs.CL cs.AI

Better Late Than Never: Meta-Evaluation of Latency Metrics for Simultaneous Speech-to-Text Translation

Peter Polák, Sara Papi, Luisa Bentivogli, Ondřej Bojar

Comments Changes: - small change in the name (Evaluation -> Meta-Evaluation); - added reference to the implementation; - excluded two test sets (IWSLT22 En-Zh, En-Ja) because of incorrect and missing segmentation; - main results unchanged; - added Degenerate Policy Test; - added sensitivity of the metrics to change in the metric value

2509.05188 2026-03-09 cs.CV

SSL-SLR: Self-Supervised Representation Learning for Sign Language Recognition

Ariel Basso Madjoukeng, Jérôme Fink, Pierre Poitier, Edith Belise Kenmogne, Benoit Frenay

2509.02123 2026-03-09 cs.CL

CMRAG: Co-modality-based visual document retrieval and question answering

Wang Chen, Wenhan Yu, Guanqiang Qi, Weikang Li, Yang Li, Lei Sha, Deguo Xia, Jizhou Huang

Comments Published at ICLR 2026 Workshop on Multimodal Intelligence

2508.18093 2026-03-09 cs.CL

Agri-Query: A Case Study on RAG vs. Long-Context LLMs for Cross-Lingual Technical Question Answering

Julius Gun, Timo Oksanen

2507.01654 2026-03-09 cs.CV cs.LG

SPoT: Subpixel Placement of Tokens in Vision Transformers

Martine Hjelkrem-Tan, Marius Aasan, Gabriel Y. Arteaga, Adín Ramírez Rivera

Comments Appeared in Workshop on Efficient Computing under Limited Resources: Visual Computing (ICCV 2025). Code available at https://github.com/dsb-ifi/SPoT

2506.23138 2026-03-09 cs.CV

VisualPrompter: Semantic-Aware Prompt Optimization with Visual Feedback for Text-to-Image Synthesis

Shiyu Wu, Mingzhen Sun, Weining Wang, Yequan Wang, Jing Liu

Comments ICLR 2026 Camera Ready

2506.15751 2026-03-09 cs.AI cs.CL cs.LG

Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts

Kartik Sharma, Yiqiao Jin, Vineeth Rakesh, Yingtong Dou, Menghai Pan, Mahashweta Das, Srijan Kumar

Comments ICLR 2026. Code available at https://github.com/Ksartik/sysformer

2506.15735 2026-03-09 cs.AI cs.LG stat.ML

ContextBench: Modifying Contexts for Targeted Latent Activation

Robert Graham, Edward Stevinson, Leo Richter, Alexander Chia, Joseph Miller, Joseph Isaac Bloom

Comments Published at ICLR 2026

2506.08706 2026-03-09 cs.RO cs.SE

ROS-related Robotic Systems Development with V-model-based Application of MeROS Metamodel

Tomasz Winiarski, Jan Kaniuka, Daniel Giełdowski, Jakub Ostrysz, Krystian Radlak, Dmytro Kushnir

Comments 22 pages

2503.11787 2026-03-09 cs.CV eess.IV

ECLARE: Efficient cross-planar learning for anisotropic resolution enhancement

Samuel W. Remedios, Shuwen Wei, Shuo Han, Jinwei Zhang, Aaron Carass, Kurt G. Schilling, Dzung L. Pham, Jerry L. Prince, Blake E. Dewey

2503.04613 2026-03-09 cs.RO cs.SY eess.SY

Whole-Body Model-Predictive Control of Legged Robots with MuJoCo

John Z. Zhang, Taylor A. Howell, Zeji Yi, Chaoyi Pan, Guanya Shi, Guannan Qu, Tom Erez, Yuval Tassa, Zachary Manchester

Comments to appear at ICRA 2026

2503.01650 2026-03-09 cs.LG cs.RO

CAPS: Context-Aware Priority Sampling for Enhanced Imitation Learning in Autonomous Driving

Hamidreza Mirkhani, Behzad Khamidehi, Ehsan Ahmadi, Mohammed Elmahgiubi, Weize Zhang, Fazel Arasteh, Umar Rajguru, Kasra Rezaee, Dongfeng Bai

Comments Accepted at IEEE International Conference on Robotics & Automation (ICRA 2026)

2603.06403 2026-03-09 cs.LG

Adapter-Augmented Bandits for Online Multi-Constrained Multi-Modal Inference Scheduling

Xianzhi Zhang, Yue Xu, Yinlin Zhu, Di Wu, Yipeng Zhou, Miao Hu, Guocong Quan

2603.06399 2026-03-09 cs.CV

DiffInf: Influence-Guided Diffusion for Supervision Alignment in Facial Attribute Learning

Basudha Pal, Rama Chellappa

2603.06394 2026-03-09 cs.AI cs.LG cs.MA

Talk Freely, Execute Strictly: Schema-Gated Agentic AI for Flexible and Reproducible Scientific Workflows

Joel Strickland, Arjun Vijeta, Chris Moores, Oliwia Bodek, Bogdan Nenchev, Thomas Whitehead, Charles Phillips, Karl Tassenberg, Gareth Conduit, Ben Pellegrini

Abstract

Large language models (LLMs) can now translate a researcher's plain-language goal into executable computation, yet scientific workflows demand determinism, provenance, and governance that are difficult to guarantee when an LLM decides what runs. Semi-structured interviews with 18 experts across 10 industrial R&D stakeholders surface 2 competing requirements--deterministic, constrained execution and conversational flexibility without workflow rigidity--together with boundary properties (human-in-the-loop control and transparency) that any resolution must satisfy. We propose schema-gated orchestration as the resolving principle: the schema becomes a mandatory execution boundary at the composed-workflow level, so that nothing runs unless the complete action--including cross-step dependencies--validates against a machine-checkable specification. We operationalize the 2 requirements as execution determinism (ED) and conversational flexibility (CF), and use these axes to review 20 systems spanning 5 architectural groups along a validation-scope spectrum. Scores are assigned via a multi-model protocol--15 independent sessions across 3 LLM families--yielding substantial-to-near-perfect inter-model agreement (Krippendorff α=0.80 for ED and α=0.98 for CF), demonstrating that multi-model LLM scoring can serve as a reusable alternative to human expert panels for architectural assessment. The resulting landscape reveals an empirical Pareto front--no reviewed system achieves both high flexibility and high determinism--but a convergence zone emerges between the generative and workflow-centric extremes. We argue that a schema-gated architecture, separating conversational from execution authority, is positioned to decouple this trade-off, and distill 3 operational principles--clarification-before-execution, constrained plan-act orchestration, and tool-to-workflow-level gating--to guide adoption.

2603.06389 2026-03-09 cs.CV

Solving Jigsaw Puzzles in the Wild: Human-Guided Reconstruction of Cultural Heritage Fragments

Omidreza Safaei, Sinem Aslan, Sebastiano Vascon, Luca Palmieri, Marina Khoroshiltseva, Marcello Pelillo

Comments 6 pages, 3 figures. Presented at the 2025 IEEE 35th International Workshop on Machine Learning for Signal Processing (MLSP). This is the author-accepted version of the paper. The final version is available via IEEE Xplore: https://doi.org/10.1109/MLSP62443.2025.11204324

2603.06386 2026-03-09 cs.CV

REACT++: Efficient Cross-Attention for Real-Time Scene Graph Generation

Maëlic Neau, Zoe Falomir

2603.06384 2026-03-09 cs.CV cs.AI

Prompt Group-Aware Training for Robust Text-Guided Nuclei Segmentation

Yonghuang Wu, Zhenyang Liang, Wenwen Zeng, Xuan Xie, Jinhua Yu

2603.06382 2026-03-09 cs.CV

CHMv2: Improvements in Global Canopy Height Mapping using DINOv3

John Brandt, Seungeun Yi, Jamie Tolan, Xinyuan Li, Peter Potapov, Jessica Ertel, Justine Spore, Huy V. Vo, Michaël Ramamonjisoa, Patrick Labatut, Piotr Bojanowski, Camille Couprie

Comments Submitted to Nature Scientific Data

2603.06378 2026-03-09 cs.CV

MoEMambaMIL: Structure-Aware Selective State Space Modeling for Whole-Slide Image Analysis

Dongqing Xie, Yonghuang Wu

Comments 15 pages, 6 figures, 6 tables

2603.06374 2026-03-09 cs.CV

Rewis3d: Reconstruction Improves Weakly-Supervised Semantic Segmentation

Jonas Ernst, Wolfgang Boettcher, Lukas Hoyer, Jan Eric Lenssen, Bernt Schiele