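arXivDaily daily academic digest: mirrors the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.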

2601.21984 2026-03-27 cs.LG cs.AR

PowerGenie: Analytically-Guided Evolutionary Discovery of Superior Reconfigurable Power Converters

Jian Gao, Yiwei Zou, Abhishek Pradhan, Wenhao Huang, Yumin Su, Kaiyuan Yang, Xuan Zhang

Abstract

Discovering superior circuit topologies requires navigating an exponentially large design space, a challenge traditionally reserved for human experts. Existing AI methods either select from predefined templates or generate novel topologies at a limited scale without rigorous verification, leaving large-scale performance-driven discovery underexplored. We present PowerGenie, a framework for automated discovery of higher-performance reconfigurable power converters at scale. PowerGenie introduces: (1) an automated analytical framework that determines converter functionality and theoretical performance limits without component sizing or SPICE simulation, and (2) an evolutionary finetuning method that co-evolves a generative model with its training distribution through fitness selection and uniqueness verification. Unlike existing methods that suffer from mode collapse and overfitting, our approach achieves higher syntax validity, function validity, novelty rate, and figure-of-merit (FoM). PowerGenie discovers a novel 8-mode reconfigurable converter with 23% higher FoM than the best training topology. SPICE simulations confirm average absolute efficiency gains of 10% across 8 modes and up to 17% at a single mode. Code will be released upon publication.
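
The abstract describes the evolutionary finetuning loop only at a high level: a generator proposes topologies, candidates pass a uniqueness check, and the fittest are fed back into the training set. The Python sketch below is a toy illustration of that selection loop under heavy assumptions, not the authors' method. A "topology" is a hypothetical tuple of integers, the generative model is replaced by random mutation of training examples, `fitness` and `is_valid` are user-supplied placeholders standing in for the analytical FoM and functionality checks, and uniqueness is approximated by a hypothetical rotation-invariant `canonical` form.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical toy encoding: a "topology" is a tuple of small integers
# (e.g. indices of switch connections). Real converters would be netlists.
Topology = Tuple[int, ...]


def canonical(t: Topology) -> Topology:
    """Hypothetical canonical form for uniqueness checks: topologies that are
    rotations of one another are treated as the same design."""
    return min(t[i:] + t[:i] for i in range(len(t)))


def evolve(
    seed_data: List[Topology],
    fitness: Callable[[Topology], float],
    is_valid: Callable[[Topology], bool],
    generations: int = 5,
    samples_per_gen: int = 50,
    keep_top: int = 5,
    rng: Optional[random.Random] = None,
) -> List[Topology]:
    rng = rng or random.Random(0)
    train = list(seed_data)                  # current training distribution
    seen = {canonical(t) for t in train}     # designs already known
    for _ in range(generations):
        # Stand-in for sampling a generative model finetuned on `train`:
        # copy a random training example and mutate one position.
        candidates = []
        for _ in range(samples_per_gen):
            child = list(rng.choice(train))
            child[rng.randrange(len(child))] = rng.randrange(8)
            candidates.append(tuple(child))
        # Validity + uniqueness verification: keep only valid, unseen designs.
        fresh = []
        for c in candidates:
            key = canonical(c)
            if is_valid(c) and key not in seen:
                seen.add(key)
                fresh.append(c)
        # Fitness selection: only the best new designs join the training set,
        # so the pool the next generation samples from shifts toward them.
        fresh.sort(key=fitness, reverse=True)
        train.extend(fresh[:keep_top])
    return sorted(train, key=fitness, reverse=True)


best = evolve([(0, 1, 2, 3), (1, 1, 2, 0)], fitness=sum,
              is_valid=lambda t: len(set(t)) > 1)
print(best[0], sum(best[0]))
```

Because only novel, high-fitness samples are added, the pool the toy generator draws from drifts toward better designs each generation; this is one plausible reading of "co-evolving a generative model with its training distribution", where in the real setting a finetuning step on the enlarged set would replace the mutation.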

2601.08881 2026-03-27 cs.CV cs.AI

TAG-MoE: Task-Aware Gating for Unified Generative Mixture-of-Experts

Yu Xu, Hongbin Yan, Juan Cao, Yiji Cheng, Tiankai Hang, Runze He, Zijin Yin, Shiyi Zhang, Yuxin Zhang, Jintao Li, Chunyu Wang, Qinglin Lu, Tong-Yee Lee, Fan Tang

Comments Accepted by CVPR 2026. Project page: https://yuci-gpt.github.io/TAG-MoE/

2601.03824 2026-03-27 cs.CV cs.AI

IDESplat: Iterative Depth Probability Estimation for Generalizable 3D Gaussian Splatting

Wei Long, Haifeng Wu, Shiyin Jiang, Jinhua Zhang, Xinchun Ji, Shuhang Gu

2601.00393 2026-03-27 cs.CV

NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos

Yuxue Yang, Lue Fan, Ziqi Shi, Junran Peng, Feng Wang, Zhaoxiang Zhang

Comments CVPR 2026; Project Page: https://neoverse-4d.github.io

2512.13454 2026-03-27 cs.CV

Test-Time Modification: Inverse Domain Transformation for Robust Perception

Arpit Jadon, Joshua Niemeijer, Yuki M. Asano

Comments Preprint

2512.13303 2026-03-27 cs.CV

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Zhihang Liu, Xiaoyi Bao, Pandeng Li, Junjie Zhou, Zhaohe Liao, Yefei He, Kaixun Jiang, Chen-Wei Xie, Yun Zheng, Hongtao Xie

Comments Accepted to CVPR 2026, project page: https://lntzm.github.io/showtable-page/

2512.10411 2026-03-27 cs.CL cs.AI

SWAA: Sliding Window Attention Adaptation for Efficient and Quality Preserving Long Context Processing

Yijiong Yu, Jiale Liu, Qingyun Wu, Huazheng Wang, Ji Pei

2512.10152 2026-03-27 cs.LG

Rethinking Bivariate Causal Discovery Through the Lens of Exchangeability

Tiago Brogueira, Mário Figueiredo

Comments 35 pages, 5 figures

2512.07237 2026-03-27 cs.CV

Unified Camera Positional Encoding for Controlled Video Generation

Cheng Zhang, Boying Li, Meng Wei, Yan-Pei Cao, Camilo Cruz Gambardella, Dinh Phung, Jianfei Cai

Comments Camera-ready version for CVPR 2026. Project Page: https://chengzhag.github.io/publication/ucpe/ Code: https://github.com/chengzhag/UCPE

2511.22989 2026-03-27 cs.CV

MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation

Yuta Oshima, Daiki Miyake, Kohsei Matsutani, Yusuke Iwasawa, Masahiro Suzuki, Yutaka Matsuo, Hiroki Furuta

Comments Accepted to CVPR2026

2511.20525 2026-03-27 cs.CV

Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos

Yayuan Li, Aadit Jain, Filippos Bellos, Jason J. Corso

Comments 12 pages, 5 figures, 7 tables. Accepted to CVPR 2026

2511.15956 2026-03-27 cs.RO

The Role of Consequential and Functional Sound in Human-Robot Interaction: Toward Audio Augmented Reality Interfaces

Aliyah Smith, Monroe Kennedy

Comments 29 pages, 11 figures

2511.14961 2026-03-27 cs.LG cs.CV

Graph Memory: A Structured and Interpretable Framework for Modality-Agnostic Embedding-Based Inference

Artur A. Oliveira, Mateus Espadoto, Roberto M. Cesar, Roberto Hirata

Comments This version expands the published conference paper (VISAPP 2026) with additional methodological details, experiments, and analysis that were omitted due to page limits. The final published version is available via DOI: 10.5220/0014578800004084

2511.10822 2026-03-27 cs.RO

MIGHTY: Hermite Spline-based Efficient Trajectory Planning

Kota Kondo, Yuwei Wu, Vijay Kumar, Jonathan P. How

Comments 10 pages, 12 figures

2511.03370 2026-03-27 cs.CL

EQ-Negotiator: Dynamic Emotional Personas Empower Small Language Models for Edge-Deployable Credit Negotiation

Yunbo Long, Yuhan Liu, Alexandra Brintrup

2511.03255 2026-03-27 cs.CV cs.AI

Generative deep learning for foundational video translation in ultrasound

Nikolina Tomic, Roshni Bhatnagar, Sarthak Jain, Connor Lau, Tien-Yu Liu, Laura Gambini, Rima Arnaout

2510.24821 2026-03-27 cs.CV cs.AI

Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation

Inclusion AI: Bowen Ma, Cheng Zou, ChengKun Du, Canxiang Yan, Chunxiang Jin, Chunjie Shen, Chenyu Lian, Chengxiang Fan, Dandan Zheng, Fudong Wang, Furong Xu, Guangming Yao, Haohao Liu, Han Peng, Jun Zhou, Junluan Xia, Jingdong Chen, Jianing Li, Jianxin Sun, Jianjiang Zhu, Jianping Jiang, Jinpeng Ou, Jun Peng, Jin Peng, Kaixiang Ji, Li Tang, Libin Wang, Lixiang Ru, Longhua Tan, Lu Ma, Lan Wang, Mochen Bai, Minghong Cai, Mingxue Yang, Ning Gao, Qingpei Guo, Qinglong Zhang, Qiang Xu, Qin Zhao, Rui Liu, Ruijie Xiong, Ruobing Zheng, Sirui Gao, Shaoxiong Lin, Tao Zhang, Tianqi Li, Tinghao Liu, Tongli Wang, Taoye Huang, Weilong Chai, Xiaomei Wang, Xiaolong Wang, Xiaojian Liu, Xiao Lu, Xiaoyu Li, Xingning Dong, Xuzheng Yu, Xuezhi Wang, Yi Yuan, Yuting Gao, Yuting Xiao, Yunxiao Sun, Yipeng Chen, Yifan Mao, Yifei Wu, Yongjie Lyu, Yingying Zhang, YuQian Li, Ziping Ma, Zhiqiang Fang, Zhihao Qiu, Ziyuan Huang, Zizheng Yang, Zhengyu He

Comments 18 pages, 5 figures

2510.18087 2026-03-27 cs.AI

Planned Diffusion

Daniel Israel, Tian Jin, Ellie Cheng, Guy Van den Broeck, Aditya Grover, Suvinay Subramanian, Michael Carbin

Comments 10 pages, 7 figures

2510.06790 2026-03-27 cs.LG

Get RICH or Die Scaling: Profitably Trading Inference Compute for Robustness

Tavish McDonald, Bo Lei, Stanislav Fort, Bhavya Kailkhura, Brian Bartoldson

Comments 23 pages

Journal ref: ICLR 2026

Abstract

Test-time reasoning has raised benchmark performances and even shown promise in addressing the historically intractable problem of making models robust to adversarially out-of-distribution (OOD) data. Indeed, recent work used reasoning to aid satisfaction of model specifications designed to thwart attacks, finding a striking correlation between LLM reasoning effort and robustness to jailbreaks. However, this benefit fades when stronger (e.g., gradient-based or multimodal) attacks are used. This may be expected as models often can't follow instructions on the adversarially OOD data created by such attacks, and instruction following is needed to act in accordance with the attacker-thwarting spec. Thus, we hypothesize that the test-time robustness benefits of specs are unlocked by initial robustness sufficient to follow instructions on OOD data. Namely, we posit the Robustness from Inference Compute Hypothesis (RICH): inference-compute defenses profit as the model's training data better reflects the components of attacked data. Guided by the RICH, we test models of varying initial-robustness levels, finding that inference compute adds robustness even to white-box multimodal attacks, provided the model has sufficient initial robustness. Further evidencing a rich-get-richer dynamic, InternVL 3.5 gpt-oss 20B gains little robustness when its test compute is scaled, but such scaling adds significant robustness if we first robustify its vision encoder (creating the first adversarially robust reasoning VLM in the process). Robustifying models makes attacked components of data more in-distribution (ID), and the RICH suggests this fuels compositional generalization (understanding OOD data via its ID components) to following spec instructions on adversarial data. Consistently, we find test-time defenses both build and depend on train-time data and defenses.

2509.21385 2026-03-27 cs.CV cs.LG

Debugging Concept Bottleneck Models through Removal and Retraining

Eric Enouen, Sainyam Galhotra

Comments Accepted to ICLR 2026

2509.20318 2026-03-27 cs.CV

A Comprehensive Evaluation of YOLO-based Deer Detection Performance on Edge Devices

Bishal Adhikari, Jiajia Li, Eric S. Michel, Jacob Dykes, Te-Ming Paul Tseng, Mary Love Tagert, Dong Chen

Comments 13 pages, 7 figures

DOI: 10.3390/electronics15051026

Abstract

The escalating economic losses in agriculture due to deer intrusion, estimated to be in the hundreds of millions of dollars annually in the U.S., highlight the inadequacy of traditional mitigation strategies such as hunting, fencing, use of repellents, and scare tactics. This underscores a critical need for intelligent, autonomous solutions capable of real-time deer detection and deterrence. However, progress in this field is impeded by a significant gap in the literature, namely the lack of a practical, domain-specific dataset and limited study of the viability of deer detection systems on edge devices. To address this gap, this study presents a comprehensive evaluation of state-of-the-art deep learning models for deer detection in challenging real-world scenarios. We introduce a curated, publicly available dataset of 3,095 images with bounding-box annotations of deer. Then, we provide an extensive comparative analysis of 12 model variants across four recent YOLO architectures (v8 to v11). Finally, we evaluate their performance on two representative edge computing platforms, the CPU-based Raspberry Pi 5 and the GPU-accelerated NVIDIA Jetson AGX Xavier, to assess feasibility for real-world field deployment. Results show that real-time detection is not feasible on Raspberry Pi without hardware-specific model optimization, while NVIDIA Jetson exceeds 30 frames per second (FPS) with 's' and 'n' series models. This study also reveals that smaller, architecturally advanced models such as YOLOv11n, YOLOv8s, and YOLOv9s offer the optimal balance of high accuracy (Average Precision (AP) > 0.85) and computational efficiency (Inference Time < 34 milliseconds).
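
The abstract states real-time performance as a frame rate (more than 30 FPS) and efficiency as a per-image inference time (under 34 ms). Assuming frames are processed one at a time and inference latency is the only per-frame cost (an assumption; the abstract does not say whether the reported latency includes capture, pre-processing, or post-processing), the two quantities are reciprocals. The Python sketch below makes that conversion explicit; the function names are illustrative, not from the paper.

```python
def fps_from_latency(latency_ms: float) -> float:
    """Upper-bound throughput for sequential, single-stream inference,
    assuming per-frame latency is the only cost (no batching/pipelining)."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def latency_budget_ms(target_fps: float) -> float:
    """Maximum per-frame latency that still sustains `target_fps`."""
    if target_fps <= 0:
        raise ValueError("target FPS must be positive")
    return 1000.0 / target_fps


print(round(fps_from_latency(34.0), 1))   # 29.4
print(round(latency_budget_ms(30.0), 1))  # 33.3
```

Under this simplification a model at exactly 34 ms tops out at about 29.4 FPS, while sustaining 30 FPS needs at most about 33.3 ms per frame, so the two thresholds are close but not interchangeable.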

2509.16889 2026-03-27 cs.CL

Can GRPO Boost Complex Multimodal Table Understanding?

Xiaoqiang Kang, Shengen Wu, Zimu Wang, Yilin Liu, Xiaobo Jin, Kaizhu Huang, Wei Wang, Yutao Yue, Xiaowei Huang, Qiufeng Wang

Comments EMNLP 2025

2509.15256 2026-03-27 cs.LG cs.AI

A Multi-Scale Graph Neural Process with Cross-Drug Co-Attention for Drug-Drug Interactions Prediction

Zimo Yan, Jie Zhang, Zheng Xie, Yiping Song, Hao Li

2509.13313 2026-03-27 cs.CL

ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization

Xixi Wu, Kuan Li, Yida Zhao, Liwen Zhang, Litu Ou, Huifeng Yin, Zhongwang Zhang, Xinmiao Yu, Dingchu Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Minhao Cheng, Shuai Wang, Hong Cheng, Jingren Zhou

2509.10000 2026-03-27 cs.LG cond-mat.other

Neural Scaling Laws for Deep Regression

Tilen Cadez, Kyoung-Min Kim

Comments Supplementary Information will be provided with the published manuscript

2509.06644 2026-03-27 cs.RO

T-araVLN: Translator for Agricultural Robotic Agents on Vision-and-Language Navigation

Xiaobei Zhao, Xingqi Lyu, Xin Chen, Xiang Li

2508.14185 2026-03-27 cs.RO cs.SY eess.SY math.OC

Lightweight Tracking Control for Computationally Constrained Aerial Systems with the Newton-Raphson Method

Evanns Morales-Cuadrado, Luke Baird, Yorai Wardi, Samuel Coogan

2508.09223 2026-03-27 cs.LG cs.AI

Hierarchical Adaptive networks with Task vectors for Test-Time Adaptation

Sameer Ambekar, Marta Hasny, Laura Daza, Daniel M. Lang, Julia A. Schnabel

Comments WACV 2026

2508.03983 2026-03-27 cs.SD eess.AS

MiDashengLM: Efficient Audio Understanding with General Audio Captions

Heinrich Dinkel, Gang Li, Jizhong Liu, Jian Luan, Yadong Niu, Xingwei Sun, Tianzi Wang, Qiyang Xiao, Junbo Zhang, Jiahao Zhou

Comments Added ACAVCaps reference (ICASSP 2026)

2507.02803 2026-03-27 cs.CV cs.GR

HyperGaussians: High-Dimensional Gaussian Splatting for High-Fidelity Animatable Face Avatars

Gent Serifi, Marcel C. Buehler

Comments CVPR 2026, Project page: https://gserifi.github.io/HyperGaussians, Code: https://github.com/gserifi/HyperGaussians