arXivDaily arXiv每日学术速递 周一至周五更新
2601.13681 2026-01-21 cs.CR

ORCA - An Automated Threat Analysis Pipeline for O-RAN Continuous Development

Felix Klement, Alessandro Brighente, Michele Polese, Mauro Conti, Stefan Katzenbeisser

详情
英文摘要

The Open-Radio Access Network (O-RAN) integrates numerous software components in a cloud-like deployment, opening the radio access network to previously unconsidered security threats. With the ever-evolving threat landscape, integrating security practices through a DevSecOps approach is essential for fast and secure releases. Current vulnerability assessment practices often rely on manual, labor-intensive, and subjective investigations, leading to inconsistencies in the threat analysis. To mitigate these issues, we establish an automated pipeline that leverages Natural Language Processing (NLP) to minimize human intervention and associated biases. By mapping real-world vulnerabilities to predefined threat lists with a standardized input format, our approach is the first to enable iterative, quantitative, and efficient assessments, generating reliable threat scores for both individual vulnerabilities and entire system components within O-RAN. We illustrate the effectiveness of our framework through an example implementation for O-RAN, showcasing how continuous security testing can integrate into automated testing pipelines to address the unique security challenges of this paradigm shift in telecommunications.

2510.22007 2026-01-21 math.ST cs.CL cs.CR cs.LG stat.ML stat.TH

Optimal Detection for Language Watermarks with Pseudorandom Collision

T. Tony Cai, Xiang Li, Qi Long, Weijie J. Su, Garrett G. Wen

详情
英文摘要

Text watermarking plays a crucial role in ensuring the traceability and accountability of large language model (LLM) outputs and mitigating misuse. While promising, most existing methods assume perfect pseudorandomness. In practice, repetition in generated text induces collisions that create structured dependence, compromising Type I error control and invalidating standard analyses. We introduce a statistical framework that captures this structure through a hierarchical two-layer partition. At its core is the concept of minimal units -- the smallest groups treatable as independent across units while permitting dependence within. Using minimal units, we define a non-asymptotic efficiency measure and cast watermark detection as a minimax hypothesis testing problem. Applied to Gumbel-max and inverse-transform watermarks, our framework produces closed-form optimal rules. It explains why discarding repeated statistics often improves performance and shows that within-unit dependence must be addressed unless degenerate. Both theory and experiments confirm improved detection power with rigorous Type I error control. These results provide the first principled foundation for watermark detection under imperfect pseudorandomness, offering both theoretical insight and practical guidance for reliable tracing of model outputs.

2412.13274 2026-01-21 quant-ph cs.DS

Assessing fault-tolerant quantum advantage for $k$-SAT with structure

Martijn Brehm, Jordi Weggemans

Comments 40 pages, 17 tables, 1 figure

Journal ref Quantum 10, 1975 (2026)

详情
英文摘要

For many problems, quantum algorithms promise speedups over their classical counterparts. However, these results predominantly rely on asymptotic worst-case analysis, which overlooks significant overheads due to error correction and the fact that real-world instances often contain exploitable structure. In this work, we employ the hybrid benchmarking method to evaluate the potential of quantum Backtracking and Grover's algorithm against the 2023 SAT competition main track winner in solving random $k$-SAT instances with tunable structure, designed to represent industry-like scenarios, using both $T$-depth and $T$-count as cost metrics to estimate quantum run times. Our findings reproduce the results of Campbell, Khurana, and Montanaro (Quantum '19) in the unstructured case using hybrid benchmarking. However, we offer a more sobering perspective in practically relevant regimes: almost all quantum speedups vanish, even asymptotically, when minimal structure is introduced or when $T$-count is considered instead of $T$-depth. Moreover, when the requirement is for the algorithm to find a solution within a single day, we find that only Grover's algorithm has the potential to outperform classical algorithms, but only in a very limited regime and only when using $T$-depth. We also discuss how more sophisticated heuristics could restore the asymptotic scaling advantage for quantum backtracking, but our findings suggest that the potential for practical quantum speedups in more structured $k$-SAT solving will remain limited.

2403.02968 2026-01-21 quant-ph cs.CC cs.DS cs.IT cs.LG math.IT

Hamiltonian Property Testing

Andreas Bluhm, Matthias C. Caro, Aadil Oufkir

Comments 70 pages, 3 figures; minor improvements. Version accepted for publication in Quantum

Journal ref Quantum 10, 1979 (2026)

详情
英文摘要

Locality is a fundamental feature of many physical time evolutions. Assumptions on locality and related structural properties also underlie recently proposed procedures for learning an unknown Hamiltonian from access to the induced time evolution. However, no protocols to rigorously test whether an unknown Hamiltonian is local were known. We investigate Hamiltonian locality testing as a property testing problem, where the task is to determine whether an unknown $n$-qubit Hamiltonian $H$ is $k$-local or $\varepsilon$-far from all $k$-local Hamiltonians, given access to the time evolution along $H$. First, we emphasize the importance of the chosen distance measure: With respect to the operator norm, a worst-case distance measure, incoherent quantum locality testers require $\tildeΩ(2^n)$ many time evolution queries and an expected total evolution time of $\tildeΩ(2^n / \varepsilon)$, and even coherent testers need $Ω(2^{n/2})$ many queries and $Ω(2^{n/2}/\varepsilon)$ total evolution time. In contrast, when distances are measured according to the normalized Frobenius norm, corresponding to an average-case distance, we give a sample-, time-, and computationally efficient incoherent Hamiltonian locality testing algorithm based on randomized measurements. In fact, our procedure can be used to simultaneously test a wide class of Hamiltonian properties beyond locality. Finally, we prove that learning a general Hamiltonian remains exponentially hard with this average-case distance, thereby establishing an exponential separation between Hamiltonian testing and learning. Our work initiates the study of property testing for quantum Hamiltonians, demonstrating that a broad class of Hamiltonian properties is efficiently testable even with limited quantum capabilities, and positioning Hamiltonian testing as an independent area of research alongside Hamiltonian learning.

2402.08606 2026-01-21 quant-ph cs.LG

Arbitrary Polynomial Separations in Trainable Quantum Machine Learning

Eric R. Anschuetz, Xun Gao

Comments 30 pages, 3 figures, version accepted to Quantum

Journal ref Quantum 10, 1976 (2026)

详情
英文摘要

Recent theoretical results in quantum machine learning have demonstrated a general trade-off between the expressive power of quantum neural networks (QNNs) and their trainability; as a corollary of these results, practical exponential separations in expressive power over classical machine learning models are believed to be infeasible as such QNNs take a time to train that is exponential in the model size. We here circumvent these negative results by constructing a hierarchy of efficiently trainable QNNs that exhibit unconditionally provable, polynomial memory separations of arbitrary constant degree over classical neural networks -- including state-of-the-art models, such as Transformers -- in performing a classical sequence modeling task. This construction is also computationally efficient, as each unit cell of the introduced class of QNNs only has constant gate complexity. We show that contextuality -- informally, a quantitative notion of semantic ambiguity -- is the source of the expressivity separation, suggesting that other learning tasks with this property may be a natural setting for the use of quantum learning algorithms.

2310.18900 2026-01-21 quant-ph cs.NA math.NA

Quantum algorithms for linear and non-linear fractional reaction-diffusion equations

Dong An, Konstantina Trivisa

Journal ref Quantum 10, 1969 (2026)

详情
英文摘要

High-dimensional fractional reaction-diffusion equations have numerous applications in the fields of biology, chemistry, and physics, and exhibit a range of rich phenomena. While classical algorithms have an exponential complexity in the spatial dimension, a quantum computer can produce a quantum state that encodes the solution with only polynomial complexity, provided that suitable input access is available. In this work, we investigate efficient quantum algorithms for linear and nonlinear fractional reaction-diffusion equations with periodic boundary conditions. For linear equations, we analyze and compare the complexity of various methods, including the second-order Trotter formula, time-marching method, and truncated Dyson series method. We also present a novel algorithm that combines the linear combination of Hamiltonian simulation technique with the interaction picture formalism, resulting in optimal scaling in the spatial dimension. For nonlinear equations, we employ the Carleman linearization method and propose a block-encoding version that is appropriate for the dense matrices that arise from the spatial discretization of fractional reaction-diffusion equations.

2601.14256 2026-01-21 cs.CV

Implicit Neural Representation Facilitates Unified Universal Vision Encoding

Matthew Gwilliam, Xiao Wang, Xuefeng Hu, Zhenheng Yang

Comments 18 pages, 16 tables, 4 figures

详情
英文摘要

Models for image representation learning are typically designed for either recognition or generation. Various forms of contrastive learning help models learn to convert images to embeddings that are useful for classification, detection, and segmentation. On the other hand, models can be trained to reconstruct images with pixel-wise, perceptual, and adversarial losses in order to learn a latent space that is useful for image generation. We seek to unify these two directions with a first-of-its-kind model that learns representations which are simultaneously useful for recognition and generation. We train our model as a hyper-network for implicit neural representation, which learns to map images to model weights for fast, accurate reconstruction. We further integrate our INR hyper-network with knowledge distillation to improve its generalization and performance. Beyond the novel training design, the model also learns an unprecedented compressed embedding space with outstanding performance for various visual tasks. The complete model competes with state-of-the-art results for image representation learning, while also enabling generative capabilities with its high-quality tiny embeddings. The code is available at https://github.com/tiktok/huvr.

2601.14255 2026-01-21 cs.CV cs.AI

VideoMaMa: Mask-Guided Video Matting via Generative Prior

Sangbeom Lim, Seoung Wug Oh, Jiahui Huang, Heeji Yoon, Seungryong Kim, Joon-Young Lee

Comments Project page: https://cvlab-kaist.github.io/VideoMaMa/

详情
英文摘要

Generalizing video matting models to real-world videos remains a significant challenge due to the scarcity of labeled data. To address this, we present Video Mask-to-Matte Model (VideoMaMa) that converts coarse segmentation masks into pixel accurate alpha mattes, by leveraging pretrained video diffusion models. VideoMaMa demonstrates strong zero-shot generalization to real-world footage, even though it is trained solely on synthetic data. Building on this capability, we develop a scalable pseudo-labeling pipeline for large-scale video matting and construct the Matting Anything in Video (MA-V) dataset, which offers high-quality matting annotations for more than 50K real-world videos spanning diverse scenes and motions. To validate the effectiveness of this dataset, we fine-tune the SAM2 model on MA-V to obtain SAM2-Matte, which outperforms the same model trained on existing matting datasets in terms of robustness on in-the-wild videos. These findings emphasize the importance of large-scale pseudo-labeled video matting and showcase how generative priors and accessible segmentation cues can drive scalable progress in video matting research.

2601.14253 2026-01-21 cs.CV

Motion 3-to-4: 3D Motion Reconstruction for 4D Synthesis

Hongyuan Chen, Xingyu Chen, Youjia Zhang, Zexiang Xu, Anpei Chen

Comments Project page: https://motion3-to-4.github.io/. Code: https://github.com/Inception3D/Motion324

详情
英文摘要

We present Motion 3-to-4, a feed-forward framework for synthesising high-quality 4D dynamic objects from a single monocular video and an optional 3D reference mesh. While recent advances have significantly improved 2D, video, and 3D content generation, 4D synthesis remains difficult due to limited training data and the inherent ambiguity of recovering geometry and motion from a monocular viewpoint. Motion 3-to-4 addresses these challenges by decomposing 4D synthesis into static 3D shape generation and motion reconstruction. Using a canonical reference mesh, our model learns a compact motion latent representation and predicts per-frame vertex trajectories to recover complete, temporally coherent geometry. A scalable frame-wise transformer further enables robustness to varying sequence lengths. Evaluations on both standard benchmarks and a new dataset with accurate ground-truth geometry show that Motion 3-to-4 delivers superior fidelity and spatial consistency compared to prior work. Project page is available at https://motion3-to-4.github.io/.

2601.14251 2026-01-21 cs.CV

LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR

Said Taghadouini, Adrien Cavaillès, Baptiste Aubertin

详情
英文摘要

We present \textbf{LightOnOCR-2-1B}, a 1B-parameter end-to-end multilingual vision--language model that converts document images (e.g., PDFs) into clean, naturally ordered text without brittle OCR pipelines. Trained on a large-scale, high-quality distillation mix with strong coverage of scans, French documents, and scientific PDFs, LightOnOCR-2 achieves state-of-the-art results on OlmOCR-Bench while being 9$\times$ smaller and substantially faster than prior best-performing models. We further extend the output format to predict normalized bounding boxes for embedded images, introducing localization during pretraining via a resume strategy and refining it with RLVR using IoU-based rewards. Finally, we improve robustness with checkpoint averaging and task-arithmetic merging. We release model checkpoints under Apache 2.0, and publicly release the dataset and \textbf{LightOnOCR-bbox-bench} evaluation under their respective licenses.

2601.14250 2026-01-21 cs.CV

OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer

Pengze Zhang, Yanze Wu, Mengtian Li, Xu Bai, Songtao Zhao, Fulong Ye, Chong Mou, Xinghui Li, Zhuowei Chen, Qian He, Mingyuan Gao

Comments Github Page: https://pangzecheung.github.io/OmniTransfer/

详情
英文摘要

Videos convey richer information than images or text, capturing both spatial and temporal dynamics. However, most existing video customization methods rely on reference images or task-specific temporal priors, failing to fully exploit the rich spatio-temporal information inherent in videos, thereby limiting flexibility and generalization in video generation. To address these limitations, we propose OmniTransfer, a unified framework for spatio-temporal video transfer. It leverages multi-view information across frames to enhance appearance consistency and exploits temporal cues to enable fine-grained temporal control. To unify various video transfer tasks, OmniTransfer incorporates three key designs: Task-aware Positional Bias that adaptively leverages reference video information to improve temporal alignment or appearance consistency; Reference-decoupled Causal Learning separating reference and target branches to enable precise reference transfer while improving efficiency; and Task-adaptive Multimodal Alignment using multimodal semantic guidance to dynamically distinguish and tackle different tasks. Extensive experiments show that OmniTransfer outperforms existing methods in appearance (ID and style) and temporal transfer (camera movement and video effects), while matching pose-guided methods in motion transfer without using pose, establishing a new paradigm for flexible, high-fidelity video generation.

2601.14246 2026-01-21 cs.CV

Soft Tail-dropping for Adaptive Visual Tokenization

Zeyuan Chen, Kai Zhang, Zhuowen Tu, Yuanjun Xiong

详情
英文摘要

We present Soft Tail-dropping Adaptive Tokenizer (STAT), a 1D discrete visual tokenizer that adaptively chooses the number of output tokens per image according to its structural complexity and level of detail. STAT encodes an image into a sequence of discrete codes together with per-token keep probabilities. Beyond standard autoencoder objectives, we regularize these keep probabilities to be monotonically decreasing along the sequence and explicitly align their distribution with an image-level complexity measure. As a result, STAT produces length-adaptive 1D visual tokens that are naturally compatible with causal 1D autoregressive (AR) visual generative models. On ImageNet-1k, equipping vanilla causal AR models with STAT yields competitive or superior visual generation quality compared to other probabilistic model families, while also exhibiting favorable scaling behavior that has been elusive in prior vanilla AR visual generation attempts.

2601.14238 2026-01-21 cs.LG

Spatiotemporal Wildfire Prediction and Reinforcement Learning for Helitack Suppression

Shaurya Mathur, Shreyas Bellary Manjunath, Nitin Kulkarni, Alina Vereshchaka

Comments 6 pages, 5 figures (two of them in tables), Conference: IEEE International Conference on Machine Learning and Applications 2025 (ICMLA 2025): https://www.icmla-conference.org/icmla25/

详情
英文摘要

Wildfires are growing in frequency and intensity, devastating ecosystems and communities while causing billions of dollars in suppression costs and economic damage annually in the U.S. Traditional wildfire management is mostly reactive, addressing fires only after they are detected. We introduce \textit{FireCastRL}, a proactive artificial intelligence (AI) framework that combines wildfire forecasting with intelligent suppression strategies. Our framework first uses a deep spatiotemporal model to predict wildfire ignition. For high-risk predictions, we deploy a pre-trained reinforcement learning (RL) agent to execute real-time suppression tactics with helitack units inside a physics-informed 3D simulation. The framework generates a threat assessment report to help emergency responders optimize resource allocation and planning. In addition, we are publicly releasing a large-scale, spatiotemporal dataset containing $\mathbf{9.5}$ million samples of environmental variables for wildfire prediction. Our work demonstrates how deep learning and RL can be combined to support both forecasting and tactical wildfire response. More details can be found at https://sites.google.com/view/firecastrl.

2601.14236 2026-01-21 cs.IT math.IT

Stabilizer-Assisted Inactivation Decoding of Quantum Error-Correcting Codes with Erasures

Giulio Pech, Mert Gökduman, Hanwen Yao, Henry D. Pfister

Comments Presented as poster "Quantum Peeling with Guessing: Fast Stabilizer-Assisted Decoding for Quantum Erasures" at QIP 2026 and submitted to ISIT 2026

详情
英文摘要

In this work, we develop a reduced complexity maximum likelihood (ML) decoder for quantum low-density parity-check (QLDPC) codes over erasures. Our decoder combines classical inactivation decoding, which integrates peeling with symbolic guessing, with a new dual peeling procedure. In the dual peeling stage, we perform row operations on the stabilizer matrix to efficiently reveal stabilizer generators and their linear combinations whose support lies entirely on the erased set. Each such stabilizer identified allows us to freely fix a bit in its support without affecting the logical state of the decoded result. This removes one degree of freedom that would otherwise require a symbolic guess, reducing the number of inactivated variables and decreasing the size of the final linear system that must be solved. We further show that dual peeling combined with standard peeling alone, without inactivation, is sufficient to achieve ML for erasure decoding of surface codes. Simulations across several QLDPC code families confirm that our decoder matches ML logical failure performance while significantly reducing the complexity of inactivation decoding, including more than a 20% reduction in symbolic guesses for the B1 lifted product code at high erasure rates.

2601.14232 2026-01-21 cs.LG cs.AI cs.CV

KAGE-Bench: Fast Known-Axis Visual Generalization Evaluation for Reinforcement Learning

Egor Cherepanov, Daniil Zelezetsky, Alexey K. Kovalev, Aleksandr I. Panov

Comments 38 pages, 44 figures, 3 tables

详情
英文摘要

Pixel-based reinforcement learning agents often fail under purely visual distribution shift even when latent dynamics and rewards are unchanged, but existing benchmarks entangle multiple sources of shift and hinder systematic analysis. We introduce KAGE-Env, a JAX-native 2D platformer that factorizes the observation process into independently controllable visual axes while keeping the underlying control problem fixed. By construction, varying a visual axis affects performance only through the induced state-conditional action distribution of a pixel policy, providing a clean abstraction for visual generalization. Building on this environment, we define KAGE-Bench, a benchmark of six known-axis suites comprising 34 train-evaluation configuration pairs that isolate individual visual shifts. Using a standard PPO-CNN baseline, we observe strong axis-dependent failures, with background and photometric shifts often collapsing success, while agent-appearance shifts are comparatively benign. Several shifts preserve forward motion while breaking task completion, showing that return alone can obscure generalization failures. Finally, the fully vectorized JAX implementation enables up to 33M environment steps per second on a single GPU, enabling fast and reproducible sweeps over visual factors. Code: https://avanturist322.github.io/KAGEBench/.

2601.14228 2026-01-21 cs.LG

Attention-Based Offline Reinforcement Learning and Clustering for Interpretable Sepsis Treatment

Punit Kumar, Vaibhav Saran, Divyesh Patel, Nitin Kulkarni, Alina Vereshchaka

Comments 8 pages, 6 figures, Conference: IEEE International Conference on Machine Learning and Applications 2025 (ICMLA 2025): https://www.icmla-conference.org/icmla25/

详情
英文摘要

Sepsis remains one of the leading causes of mortality in intensive care units, where timely and accurate treatment decisions can significantly impact patient outcomes. In this work, we propose an interpretable decision support framework. Our system integrates four core components: (1) a clustering-based stratification module that categorizes patients into low, intermediate, and high-risk groups upon ICU admission, using clustering with statistical validation; (2) a synthetic data augmentation pipeline leveraging variational autoencoders (VAE) and diffusion models to enrich underrepresented trajectories such as fluid or vasopressor administration; (3) an offline reinforcement learning (RL) agent trained using Advantage Weighted Regression (AWR) with a lightweight attention encoder and supported by an ensemble models for conservative, safety-aware treatment recommendations; and (4) a rationale generation module powered by a multi-modal large language model (LLM), which produces natural-language justifications grounded in clinical context and retrieved expert knowledge. Evaluated on the MIMIC-III and eICU datasets, our approach achieves high treatment accuracy while providing clinicians with interpretable and robust policy recommendations.

2601.14227 2026-01-21 cs.SD

Transformer Architectures for Respiratory Sound Analysis and Multimodal Diagnosis

Theodore Aptekarev, Vladimir Sokolovsky, Gregory Furman

Comments 7 pages, 4 figures

详情
英文摘要

Respiratory sound analysis is a crucial tool for screening asthma and other pulmonary pathologies, yet traditional auscultation remains subjective and experience-dependent. Our prior research established a CNN baseline using DenseNet201, which demonstrated high sensitivity in classifying respiratory sounds. In this work, we (i) adapt the Audio Spectrogram Transformer (AST) for respiratory sound analysis and (ii) evaluate a multimodal Vision-Language Model (VLM) that integrates spectrograms with structured patient metadata. AST is initialized from publicly available weights and fine-tuned on a medical dataset containing hundreds of recordings per diagnosis. The VLM experiment uses a compact Moondream-type model that processes spectrogram images alongside a structured text prompt (sex, age, recording site) to output a JSON-formatted diagnosis. Results indicate that AST achieves approximately 97% accuracy with an F1-score around 97% and ROC AUC of 0.98 for asthma detection, significantly outperforming both the internal CNN baseline and typical external benchmarks. The VLM reaches 86-87% accuracy, performing comparably to the CNN baseline while demonstrating the capability to integrate clinical context into the inference process. These results confirm the effectiveness of self-attention for acoustic screening and highlight the potential of multimodal architectures for holistic diagnostic tools.

2601.14226 2026-01-21 quant-ph cs.LG

Deep Learning Approaches to Quantum Error Mitigation

Leonardo Placidi, Ifan Williams, Enrico Rinaldi, Daniel Mills, Cristina Cîrstoiu, Vanya Eccles, Ross Duncan

Comments 48 pages

详情
英文摘要

We present a systematic investigation of deep learning methods applied to quantum error mitigation of noisy output probability distributions from measured quantum circuits. We compare different architectures, from fully connected neural networks to transformers, and we test different design/training modalities, identifying sequence-to-sequence, attention-based models as the most effective on our datasets. These models consistently produce mitigated distributions that are closer to the ideal outputs when tested on both simulated and real device data obtained from IBM superconducting quantum processing units (QPU) up to five qubits. Across several different circuit depths, our approach outperforms other baseline error mitigation techniques. We perform a series of ablation studies to examine: how different input features (circuit, device properties, noisy output statistics) affect performance; cross-dataset generalization across circuit families; and transfer learning to a different IBM QPU. We observe that generalization performance across similar devices with the same architecture works effectively, without needing to fully retrain models.

2601.14221 2026-01-21 cs.SI

Beyond Polarization: Opinion Mixing and Social Influence in Deliberation

Mohak Goyal, Lodewijk Gelauff, Naman Gupta, Ashish Goel, Kamesh Munagala

详情
英文摘要

Deliberative processes are often discussed as increasing or decreasing polarization. This approach misses a different, and arguably more diagnostic, dimension of opinion change: whether deliberation reshuffles who agrees with whom, or simply moves everyone in parallel while preserving the pre-deliberation rank ordering. We introduce \opinion mixing, measured by Kendall's rank correlation (τ) between pre- and post-deliberation responses, as a complement to variance-based polarization metrics. Across two large online deliberative polls spanning 32 countries (MCF-2022: n=6,342; MCF-2023: n=1,529), deliberation increases opinion mixing relative to survey-only controls: treatment groups exhibit lower rank correlation on (97%) and (93%) of opinion questions, respectively. Polarization measures based on variance tell a more heterogeneous story: controls consistently converge, while treated groups sometimes converge and sometimes diverge depending on the issue. To probe mechanisms, we link transcripts and surveys in a third event (SOF: (n=617), 116 groups) and use LLM-assisted coding of 6,232 discussion statements. Expressed support in discussion statements strongly predicts subsequent group-level opinion shifts; this correlation is amplified by justification quality in the statements but not by argument novelty. To our knowledge, we are the first to observe how different notions of argument quality have different associations with the outcome of deliberation. This suggests that opinion change after deliberation is related to selective uptake of well-reasoned arguments, producing complex patterns of opinion reorganization that standard polarization metrics may miss.

2601.14212 2026-01-21 cs.NE cs.CL

Generalization and Completeness of Stochastic Local Search Algorithms

Daniel Loscos, Narciso Marti-Oliet, Ismael Rodriguez

Comments This paper was published in Swarm and Evolutionary Computation. The present version is the author's accepted manuscript

详情
英文摘要

We generalize Stochastic Local Search (SLS) heuristics into a unique formal model. This model has two key components: a common structure designed to be as large as possible and a parametric structure intended to be as small as possible. Each heuristic is obtained by instantiating the parametric part in a different way. Particular instances for Genetic Algorithms (GA), Ant Colony Optimization (ACO), and Particle Swarm Optimization (PSO) are presented. Then, we use our model to prove the Turing-completeness of SLS algorithms in general. The proof uses our framework to construct a GA able to simulate any Turing machine. This Turing-completeness implies that determining any non-trivial property concerning the relationship between the inputs and the computed outputs is undecidable for GA and, by extension, for the general set of SLS methods (although not necessarily for each particular method). Similar proofs are more informally presented for PSO and ACO.

2601.14209 2026-01-21 cs.LG cs.AI cs.CL

InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning

Matthew Y. R. Yang, Hao Bai, Ian Wu, Gene Yang, Amrith Setlur, Aviral Kumar

详情
英文摘要

Outcome-reward reinforcement learning (RL) has proven effective at improving the reasoning capabilities of large language models (LLMs). However, standard RL assigns credit only at the level of the final answer, penalizing entire reasoning traces when the outcome is incorrect and uniformly reinforcing all steps when it is correct. As a result, correct intermediate steps may be discouraged in failed traces, while spurious steps may be reinforced in successful ones. We refer to this failure mode as the problem of credit assignment. While a natural remedy is to train a process reward model, accurately optimizing such models to identify corrective reasoning steps remains challenging. We introduce Intervention Training (InT), a training paradigm in which the model performs fine-grained credit assignment on its own reasoning traces by proposing short, targeted corrections that steer trajectories toward higher reward. Using reference solutions commonly available in mathematical reasoning datasets and exploiting the fact that verifying a model-generated solution is easier than generating a correct one from scratch, the model identifies the first error in its reasoning and proposes a single-step intervention to redirect the trajectory toward the correct solution. We then apply supervised fine-tuning (SFT) to the on-policy rollout up to the point of error concatenated with the intervention, localizing error to the specific step that caused failure. We show that the resulting model serves as a far better initialization for RL training. After running InT and subsequent fine-tuning with RL, we improve accuracy by nearly 14% over a 4B-parameter base model on IMO-AnswerBench, outperforming larger open-source models such as gpt-oss-20b.

2601.14208 2026-01-21 cs.CV cs.GR cs.LG

Rig-Aware 3D Reconstruction of Vehicle Undercarriages using Gaussian Splatting

Nitin Kulkarni, Akhil Devarashetti, Charlie Cluss, Livio Forte, Dan Buckmaster, Philip Schneider, Chunming Qiao, Alina Vereshchaka

Comments 8 pages, 9 figures, Conference: IEEE International Conference on Machine Learning and Applications 2025 (ICMLA 2025): https://www.icmla-conference.org/icmla25/

详情
英文摘要

Inspecting the undercarriage of used vehicles is a labor-intensive task that requires inspectors to crouch or crawl underneath each vehicle to thoroughly examine it. Additionally, online buyers rarely see undercarriage photos. We present an end-to-end pipeline that utilizes a three-camera rig to capture videos of the undercarriage as the vehicle drives over it, and produces an interactive 3D model of the undercarriage. The 3D model enables inspectors and customers to rotate, zoom, and slice through the undercarriage, allowing them to detect rust, leaks, or impact damage in seconds, thereby improving both workplace safety and buyer confidence. Our primary contribution is a rig-aware Structure-from-Motion (SfM) pipeline specifically designed to overcome the challenges of wide-angle lens distortion and low-parallax scenes. Our method overcomes the challenges of wide-angle lens distortion and low-parallax scenes by integrating precise camera calibration, synchronized video streams, and strong geometric priors from the camera rig. We use a constrained matching strategy with learned components, the DISK feature extractor, and the attention-based LightGlue matcher to generate high-quality sparse point clouds that are often unattainable with standard SfM pipelines. These point clouds seed the Gaussian splatting process to generate photorealistic undercarriage models that render in real-time. Our experiments and ablation studies demonstrate that our design choices are essential to achieve state-of-the-art quality.

2601.14202 2026-01-21 cs.IT cs.CR cs.NI eess.SP math.IT

Storage-Rate Trade-off in A-XPIR

Mohamed Nomeir, Sennur Ulukus

详情
英文摘要

We consider the storage problem in an asymmetric $X$-secure private information retrieval (A-XPIR) setting. The A-XPIR setting considers the $X$-secure PIR problem (XPIR) when a given arbitrary set of servers is communicating. We focus on the trade-off region between the average storage at the servers and the average download cost. In the case of $N=4$ servers and two non-overlapping sets of communicating servers with $K=2$ messages, we characterize the achievable region and show that the three main inequalities compared to the no-security case collapse to two inequalities in the asymmetric security case. In the general case, we derive bounds that need to be satisfied for the general achievable region for an arbitrary number of servers and messages. In addition, we provide the storage and retrieval scheme for the case of $N=4$ servers with $K=2$ messages and two non-overlapping sets of communicating servers, such that the messages are not replicated (in the sense of a coded version of each symbol) and at the same time achieve the optimal achievable rate for the case of replication. Finally, we derive the exact capacity for the case of asymmetric security and asymmetric collusion for $N=4$ servers, with the communication links $\{1,2\}$ and $\{3,4\}$, which splits the servers into two groups, i.e., $g=2$, and with the collusion links $\{1,3\}$, $\{2,4\}$, as $C=\frac{1}{3}$. More generally, we derive a capacity result for a certain family of asymmetric collusion and asymmetric security cases.

2601.14201 2026-01-21 math.NA cs.NA

Convergence analysis and a novel Lagrange multiplier partitioned method for fluid-poroelastic interaction

Amy de Castro, Hyesuk Lee

Comments 5 figures

详情
英文摘要

We propose a partitioned method for the monolithic formulation of the Stokes-Biot system that incorporates Lagrange multipliers enforcing the interface conditions. The monolithic system is discretized using finite elements, and we establish convergence of the resulting approximation. A Schur complement based algorithm is developed together with an efficient preconditioner, enabling the fluid and poroelastic structure subproblems to be decoupled and solved independently at each time step. The Lagrange multipliers approximate the interface fluxes and act as Neumann boundary conditions for the subproblems, yielding parallel solution of the Stokes and Biot equations. Numerical experiments demonstrate the effectiveness of the proposed algorithm and validate the theoretical error estimate.

2601.14198 2026-01-21 math.NA cs.NA

Local electrical impedance tomography via projections

A. Jääskeläinen, A. Vavilov, J. Toivanen, A. Hänninen, V. Kolehmainen, N. Hyvönen

Comments 20 pages, 9 figures

详情
英文摘要

This paper introduces a method for approximately eliminating the effect that conductivity changes outside the region of interest have in electrical impedance tomography, allowing to form a local reconstruction in the region of interest only. The method considers the Jacobian matrix of the forward map, i.e., of the map that sends the discretized conductivity to the electrode measurements, at an initial guess for the conductivity. The Jacobian matrix is divided columnwise into two parts: one corresponding to the region of interest and a nuisance Jacobian corresponding to the rest of the domain. The leading idea is to project both the electrode measurements and the forward map onto the orthogonal complement of the span of a number of left-hand singular vectors for a suitably weighted nuisance Jacobian. The weighting can, e.g., account for the element sizes in a finite element discretization or to prior information on the conductivity outside the region of interest. The inverse problem is then solved by considering the projected relation between the measurements and the forward map, only reconstructing the conductivity in the region of interest. The functionality of the method is demonstrated by applying a reconstruction algorithm that combines lagged diffusivity iteration and total variation regularization to experimental data. In particular, data from a head-shaped water tank is considered, with the conductivity change in the region of interest mimicking growth of a hemorrhagic stroke and the changes outside the region of interest imitating physiological variations in the conductivity of the scalp.

2601.14196 2026-01-21 cs.LG

Differentiated Pickup Point Offering for Emission Reduction in Last-Mile Delivery

Albina Galiullina, Wouter van Heeswijk, Tom van Woensel

详情
英文摘要

Pickup points are widely recognized as a sustainable alternative to home delivery, as consolidating orders at pickup locations can shorten delivery routes and improve first-attempt success rates. However, these benefits may be negated when customers drive to pick up their orders. This study proposes a Differentiated Pickup Point Offering (DPO) policy that aims to jointly reduce emissions from delivery truck routes and customer travel. Under DPO, each arriving customer is offered a single recommended pickup point, rather than an unrestricted choice among all locations, while retaining the option of home delivery. We study this problem in a dynamic and stochastic setting, where the pickup point offered to each customer depends on previously realized customer locations and delivery choices. To design effective DPO policies, we adopt a reinforcement learning-based approach that accounts for spatial relationships between customers and pickup points and their implications for future route consolidation. Computational experiments show that differentiated pickup point offerings can substantially reduce total carbon emissions. The proposed policies reduce total emissions by up to 9% relative to home-only delivery and by 2% on average compared with alternative policies, including unrestricted pickup point choice and nearest pickup point assignment. Differentiated offerings are particularly effective in dense urban settings with many pickup points and short inter-location distances. Moreover, explicitly accounting for the dynamic nature of customer arrivals and choices is especially important when customers are less inclined to choose pickup point delivery over home delivery.

2601.14195 2026-01-21 cs.GT cs.DS

A Minimax Perspective on Almost-Stable Matchings

Frederik Glitzner, David Manlove

Comments Preliminary version to appear at AAMAS 2026

详情
英文摘要

Stability is crucial in matching markets, yet in many real-world settings - from hospital residency allocations to roommate assignments - full stability is either impossible to achieve or can come at the cost of leaving many agents unmatched. When stability cannot be achieved, algorithmicists and market designers face a critical question: how should instability be measured and distributed among participants? Existing approaches to "almost-stable" matchings focus on aggregate measures, minimising either the total number of blocking pairs or the count of agents involved in blocking pairs. However, such aggregate objectives can result in concentrated instability on a few individual agents, raising concerns about fairness and incentives to deviate. We introduce a fairness-oriented approach to approximate stability based on the minimax principle: we seek matchings that minimise the maximum number of blocking pairs any agent is in. Equivalently, we minimise the maximum number of agents that anyone has justified envy towards. This distributional objective protects the worst-off agents from a disproportionate amount of instability. We characterise the computational complexity of this notion across fundamental matching settings. Surprisingly, even very modest guarantees prove computationally intractable: we show that it is NP-complete to decide whether a matching exists in which no agent is in more than one blocking pair, even when preference lists have constant-bounded length. This hardness applies to both Stable Roommates and maximum-cardinality Stable Marriage. On the positive side, we provide polynomial-time algorithms when agents rank at most two others, and present approximation algorithms and integer programs. Our results map the algorithmic landscape and reveal fundamental trade-offs between distributional guarantees and computational feasibility.

2601.14192 2026-01-21 cs.AI cs.CL

Toward Efficient Agents: Memory, Tool learning, and Planning

Xiaofang Yang, Lijun Li, Heng Zhou, Tong Zhu, Xiaoye Qu, Yuchen Fan, Qianshan Wei, Rui Ye, Li Kang, Yiran Qin, Zhiqiang Kou, Daizong Liu, Qi Li, Ning Ding, Siheng Chen, Jing Shao

Comments 35 pages, 200 references

详情
英文摘要

Recent years have witnessed increasing interest in extending large language models into agentic systems. While the effectiveness of agents has continued to improve, efficiency, which is crucial for real-world deployment, has often been overlooked. This paper therefore investigates efficiency from three core components of agents: memory, tool learning, and planning, considering costs such as latency, tokens, steps, etc. Aimed at conducting comprehensive research addressing the efficiency of the agentic system itself, we review a broad range of recent approaches that differ in implementation yet frequently converge on shared high-level principles including but not limited to bounding context via compression and management, designing reinforcement learning rewards to minimize tool invocation, and employing controlled search mechanisms to enhance efficiency, which we discuss in detail. Accordingly, we characterize efficiency in two complementary ways: comparing effectiveness under a fixed cost budget, and comparing cost at a comparable level of effectiveness. This trade-off can also be viewed through the Pareto frontier between effectiveness and cost. From this perspective, we also examine efficiency oriented benchmarks by summarizing evaluation protocols for these components and consolidating commonly reported efficiency metrics from both benchmark and methodological studies. Moreover, we discuss the key challenges and future directions, with the goal of providing promising insights.

2601.14190 2026-01-21 cs.CY

Analyzing Far-Right Telegram Channels as Constituents of Information Autocracy in Russia

Polina Smirnova, Mykola Makhortykh

详情
英文摘要

This study examines how Russian far-right communities on Telegram shape perceptions of political figures through memes and visual narratives. Far from passive spectators, these actors co-produce propaganda, blending state-aligned messages with their own extremist framings. In Russia, such groups are central because they articulate the ideological foundations of the war against Ukraine and reflect the regime's gradual drift toward ultranationalist rhetoric. Drawing on a dataset of 200,000 images from expert-selected far-right Telegram channels, the study employs computer vision and unsupervised clustering to identify memes featuring Russian (Putin, Shoigu) and foreign politicians (Zelensky, Biden, Trump) and to reveal recurrent visual patterns in their representation. By leveraging the large-scale and temporal depth of this dataset, the analysis uncovers differential patterns of legitimation and delegitimation across actors and over time. These insights are not attainable in smaller-scale studies. Preliminary findings show that far-right memes function as instruments of propaganda co-production. These communities do not simply echo official messages but generate bottom-up narratives of legitimation and delegitimation that align with state ideology. By framing leaders as heroic and opponents as corrupt or weak, far-right actors act as informal co-creators of authoritarian legitimacy within Russia's informational autocracy.

2601.14189 2026-01-21 math.NA cs.NA

From big q-Jacobi and Chebyshev polynomials to exponential-reproducing subdivision: new identities

Leonard Peter Bos, Lucia Romani, Alberto Viscardi

详情
英文摘要

In this paper we derive new identities satisfied by Chebyshev polynomials of the first kind and big q-Jacobi polynomials. An immediate benefit of the derived identities is the achievement of closed-form expressions for the Laurent polynomials that identify minimum-support interpolating subdivision schemes reproducing finite sets of integer powers of exponentials.