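arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.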

2603.05944 2026-03-09 cs.RO

How to Model Your Crazyflie Brushless

Alexander Gräfe, Christoph Scherer, Wolfgang Hönig, Sebastian Trimpe

Details

English abstract

The Crazyflie quadcopter is widely recognized as a leading platform for nano-quadcopter research. In early 2025, the Crazyflie Brushless was introduced, featuring brushless motors that provide around 50% more thrust compared to the brushed motors of its predecessor, the Crazyflie 2.1. This advancement has opened new opportunities for research in agile nano-quadcopter control. To support researchers utilizing this new platform, this work presents a dynamics model of the Crazyflie Brushless and identifies its key parameters. Through simulations and hardware analyses, we assess the accuracy of our model. We furthermore demonstrate its suitability for reinforcement learning applications by training an end-to-end neural network position controller and learning a backflip controller capable of executing two complete rotations with a vertical movement of just 1.8 meters. This showcases the model's ability to facilitate the learning of controllers and acrobatic maneuvers that successfully transfer from simulation to hardware. Utilizing this application, we investigate the impact of domain randomization on control performance, offering valuable insights into bridging the sim-to-real gap with the presented model. We have open-sourced the entire project, enabling users of the Crazyflie Brushless to swiftly implement and test their own controllers on an accurate simulation platform.

2603.05942 2026-03-09 cs.CV

Adaptive Radial Projection on Fourier Magnitude Spectrum for Document Image Skew Estimation

Luan Pham, Phu Hao Hoang, Xuan Toan Mai, Tuan Anh Tran

Comments This paper has been accepted to ICIP 2022

2603.05940 2026-03-09 cs.CV

SLER-IR: Spherical Layer-wise Expert Routing for All-in-One Image Restoration

Peng Shurui, Xin Lin, Shi Luo, Jincen Ou, Dizhe Zhang, Lu Qi, Truong Nguyen, Chao Ren

2603.05937 2026-03-09 cs.CV cs.AI

Facial Expression Recognition Using Residual Masking Network

Luan Pham, The Huynh Vu, Tuan Anh Tran

2603.05936 2026-03-09 cs.CV

OD-RASE: Ontology-Driven Risk Assessment and Safety Enhancement for Autonomous Driving

Kota Shimomura, Masaki Nambata, Atsuya Ishikawa, Ryota Mimura, Takayuki Kawabuchi, Takayoshi Yamashita, Koki Inoue

Comments Accepted to ICCV 2025

2603.05935 2026-03-09 cs.RO

Swooper: Learning High-Speed Aerial Grasping With a Simple Gripper

Ziken Huang, Xinze Niu, Bowen Chai, Renbiao Jin, Danping Zou

Details

DOI: 10.1109/LRA.2025.3643298
Journal ref: IEEE Robotics and Automation Letters (Volume: 11, Issue: 2, February 2026)

English abstract

High-speed aerial grasping presents significant challenges due to the high demands on precise, responsive flight control and coordinated gripper manipulation. In this work, we propose Swooper, a deep reinforcement learning (DRL) based approach that achieves both precise flight control and active gripper control using a single lightweight neural network policy. Training such a policy directly via DRL is nontrivial due to the complexity of coordinating flight and grasping. To address this, we adopt a two-stage learning strategy: we first pre-train a flight control policy, and then fine-tune it to acquire grasping skills. With the carefully designed reward functions and training framework, the entire training process completes in under 60 minutes on a standard desktop with an Nvidia RTX 3060 GPU. To validate the trained policy in the real world, we develop a lightweight quadrotor grasping platform equipped with a simple off-the-shelf gripper, and deploy the policy in a zero-shot manner on the onboard Raspberry Pi 4B computer, where each inference takes only about 1.0 ms. In 25 real-world trials, our policy achieves an 84% grasp success rate and grasping speeds of up to 1.5 m/s without any fine-tuning. This matches the robustness and agility of state-of-the-art classical systems with sophisticated grippers, highlighting the capability of DRL for learning a robust control policy that seamlessly integrates high-speed flight and grasping. The supplementary video is available for more results. Video: https://zikenhuang.github.io/Swooper/.

2603.05932 2026-03-09 cs.CV cs.RO

FTSplat: Feed-forward Triangle Splatting Network

Xiong Jinlin, Li Can, Shen Jiawei, Qi Zhigang, Sun Lei, Zhao Dongyang

2603.05929 2026-03-09 cs.CV

Beyond Static Frames: Temporal Aggregate-and-Restore Vision Transformer for Human Pose Estimation

Hongwei Fang, Jiahang Cai, Xun Wang, Wenwu Yang

2603.05928 2026-03-09 cs.CL cs.AI cs.HC cs.LG

Addressing the Ecological Fallacy in Larger LMs with Human Context

Nikita Soni, Dhruv Vijay Kunjadiya, Pratham Piyush Shah, Dikshya Mohanty, H. Andrew Schwartz, Niranjan Balasubramanian

2603.05926 2026-03-09 cs.CV

Towards Driver Behavior Understanding: Weakly-Supervised Risk Perception in Driving Scenes

Nakul Agarwal, Yi-Ting Chen, Behzad Dariush

Comments Accepted to IV 2026

2603.05925 2026-03-09 cs.CV cs.AI

RAC: Rectified Flow Auto Coder

Sen Fang, Yalin Feng, Yanxin Zhang, Dimitris N. Metaxas

Comments 11 Figures, 4 Tables. Project Page at https://world-snapshot.github.io/RAC/

2603.05924 2026-03-09 cs.LG

Weak-SIGReg: Covariance Regularization for Stable Deep Learning

Habibullah Akbar

Comments Accepted at GRaM workshop (ICLR 2026). Code & supplementary: https://github.com/kreasof-ai/sigreg

2603.05923 2026-03-09 cs.CL cs.HC

Learning Next Action Predictors from Human-Computer Interaction

Omar Shaikh, Valentin Teutschbein, Kanishk Gandhi, Yikun Chi, Nick Haber, Thomas Robinson, Nilam Ram, Byron Reeves, Sherry Yang, Michael S. Bernstein, Diyi Yang

Comments 32 pages, 10 figures, see https://generalusermodels.github.io/nap

Details

English abstract

Truly proactive AI systems must anticipate what we will do next. This foresight demands far richer information than the sparse signals we type into our prompts -- it demands reasoning over the entire context of what we see and do. We formalize this as next action prediction (NAP): given a sequence of a user's multimodal interactions with a computer (screenshots, clicks, sensor data), predict that user's next action. Progress on this task requires both new data and modeling approaches. To scale data, we annotate longitudinal, naturalistic computer use with vision-language models. We release an open-source pipeline for performing this labeling on private infrastructure, and label over 360K actions across one month of continuous phone usage from 20 users, amounting to 1,800 hours of screen time. We then introduce LongNAP, a user model that combines parametric and in-context learning to reason over long interaction histories. LongNAP is trained via policy gradient methods to generate user-specific reasoning traces given some context; retrieve relevant traces from a library of past traces; and then apply retrieved traces in-context to predict future actions. Using an LLM-as-judge evaluation metric (0-1 similarity to ground truth), LongNAP significantly outperforms supervised finetuning and prompted baselines on held-out data (by 79% and 39% respectively). Additionally, LongNAP generalizes to held out users when trained across individuals. The space of next actions a user might take at any moment is unbounded, spanning thousands of possible outcomes. Despite this, 17.1% of LongNAP's predicted trajectories are well-aligned with what a user does next (LLM-judge score $\geq$ 0.5). This rises to 26% when we filter to highly confident predictions. In sum, we argue that learning from the full context of user behavior to anticipate user needs is now a viable task with substantial opportunity.
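The alignment figures in the abstract above (share of predictions with an LLM-judge score of at least 0.5, overall and after filtering to confident predictions) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function name `aligned_fraction`, the per-prediction confidence values, the confidence cutoff, and the toy data. The abstract does not say how confidence is measured or thresholded.

```python
from typing import Optional, Sequence


def aligned_fraction(
    judge_scores: Sequence[float],
    confidences: Optional[Sequence[float]] = None,
    score_threshold: float = 0.5,
    min_confidence: float = 0.0,
) -> float:
    """Return the fraction of predictions with judge score >= score_threshold.

    If confidences are given, only predictions whose confidence is
    >= min_confidence are counted (hypothetical filtering rule).
    """
    if confidences is None:
        kept = list(judge_scores)
    else:
        if len(confidences) != len(judge_scores):
            raise ValueError("judge_scores and confidences must have equal length")
        kept = [s for s, c in zip(judge_scores, confidences) if c >= min_confidence]
    if not kept:
        return 0.0
    return sum(s >= score_threshold for s in kept) / len(kept)


# Toy data, not taken from the paper: judge scores in [0, 1] and
# hypothetical model confidences for eight predictions.
scores = [0.9, 0.2, 0.5, 0.1, 0.7, 0.3, 0.0, 0.4]
confs = [0.95, 0.4, 0.8, 0.3, 0.9, 0.85, 0.2, 0.5]

overall = aligned_fraction(scores)
confident = aligned_fraction(scores, confs, min_confidence=0.8)
print(overall, confident)  # 0.375 0.75
```

On this toy data the filtered share is higher than the overall share, which mirrors the direction of the reported change (17.1% to 26%) without reproducing the paper's numbers.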

2603.05921 2026-03-09 cs.CV cs.AI

BlackMirror: Black-Box Backdoor Detection for Text-to-Image Models via Instruction-Response Deviation

Feiran Li, Qianqian Xu, Shilong Bao, Zhiyong Yang, Xilin Zhao, Xiaochun Cao, Qingming Huang

Comments This paper is accepted by CVPR 2026

2603.05916 2026-03-09 cs.RO

Iterative Convex Optimization with Control Barrier Functions for Obstacle Avoidance among Polytopes

Shuo Liu, Zhe Huang, Calin A. Belta

Comments 9 pages, 4 figures

2603.05911 2026-03-09 cs.CV cs.AI

CORE-Seg: Reasoning-Driven Segmentation for Complex Lesions via Reinforcement Learning

Yuxin Xie, Yuming Chen, Yishan Yang, Yi Zhou, Tao Zhou, Zhen Zhao, Jiacheng Liu, Huazhu Fu

Comments Under Review with Computational Visual Media

2603.05909 2026-03-09 cs.CL

InfoGatherer: Principled Information Seeking via Evidence Retrieval and Strategic Questioning

Maksym Taranukhin, Shuyue Stella Li, Evangelos Milios, Geoff Pleiss, Yulia Tsvetkov, Vered Shwartz

Comments Under review

2603.05908 2026-03-09 cs.CV

Pano3DComposer: Feed-Forward Compositional 3D Scene Generation from Single Panoramic Image

Zidian Qiu, Ancong Wu

Comments Accepted to CVPR 2026. Project page: https://qiuzidian.github.io/pano3dcomposer-page/

2603.05906 2026-03-09 cs.CV

Beyond Geometry: Artistic Disparity Synthesis for Immersive 2D-to-3D

Ping Chen, Zezhou Chen, Xingpeng Zhang, Yanlin Qian, Huan Hu, Xiang Liu, Zipeng Wang, Xin Wang, Zhaoxiang Liu, Kai Wang, Shiguo Lian

Comments Accepted by CVPR 2026 (10 pages, 4 figures)

2603.05905 2026-03-09 cs.CV

CollabOD: Collaborative Multi-Backbone with Cross-scale Vision for UAV Small Object Detection

Xuecheng Bai, Yuxiang Wang, Chuanzhi Xu, Boyu Hu, Kang Han, Ruijie Pan, Xiaowei Niu, Xiaotian Guan, Liqiang Fu, Pengfei Ye

2603.05902 2026-03-09 cs.RO cs.SY eess.SY

Improved hopping control on slopes for small robots using spring mass modeling

Heston Roberts, Pronoy Sarker, Sm Ashikul Islam, Min Gyu Kim

2603.05900 2026-03-09 cs.LG cs.AI

Reference-guided Policy Optimization for Molecular Optimization via LLM Reasoning

Xuan Li, Zhanke Zhou, Zongze Li, Jiangchao Yao, Yu Rong, Lu Zhang, Bo Han

2603.05899 2026-03-09 cs.CV cs.LG

Mitigating Bias in Concept Bottleneck Models for Fair and Interpretable Image Classification

Schrasing Tong, Antoine Salaun, Vincent Yuan, Annabel Adeyeri, Lalana Kagal

2603.05898 2026-03-09 cs.CV

InnoAds-Composer: Efficient Condition Composition for E-Commerce Poster Generation

Yuxin Qin, Ke Cao, Haowei Liu, Ao Ma, Fengheng Li, Honghe Zhu, Zheng Zhang, Run Ling, Wei Feng, Xuanhua He, Zhanjie Zhang, Zhen Guo, Haoyi Bian, Jingjing Lv, Junjie Shen, Ching Law

Comments Accepted by CVPR 2026

2603.05895 2026-03-09 cs.CL

Building an Ensemble LLM Semantic Tagger for UN Security Council Resolutions

Hussein Ghaly

2603.05890 2026-03-09 cs.CL cs.AI

Lost in Stories: Consistency Bugs in Long Story Generation by LLMs

Junjie Li, Xinrui Guo, Yuhao Wu, Roy Ka-Wei Lee, Hongzhi Li, Yutao Xie

2603.05888 2026-03-09 cs.CV cs.GR cs.LG

PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction

Xiang Zhang, Sohyun Yoo, Hongrui Wu, Chuan Li, Jianwen Xie, Zhuowen Tu

Comments CVPR 2026. Project Page: https://mlpc-ucsd.github.io/PixARMesh

2603.05883 2026-03-09 cs.CL

VerChol -- Grammar-First Tokenization for Agglutinative Languages

Prabhu Raja

Comments 13 pages. A Morphological Alternative to Statistical Subword Tokenization

2603.05882 2026-03-09 cs.CV

CylinderSplat: 3D Gaussian Splatting with Cylindrical Triplanes for Panoramic Novel View Synthesis

Qiwei Wang, Xianghui Ze, Jingyi Yu, Yujiao Shi

2603.05878 2026-03-09 cs.CL cs.LG

ROSE: Reordered SparseGPT for More Accurate One-Shot Large Language Models Pruning

Mingluo Su, Huan Wang

Comments CPAL 2026 oral