- EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments · cs.CL
Jundong Xu, Qingchuan Li, Jiaying Wu, Yihuai Lan · Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agents to continually…
- Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning · cs.CL
Zilin Xiao, Qi Ma, Chun-cheng Jason Chen, Xintao Chen · Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex reasoning tasks: a…
- InterleaveThinker: Reinforcing Agentic Interleaved Generation · cs.CV
Dian Zheng, Harry Lee, Manyuan Zhang, Kaituo Feng · Recent image generators have demonstrated impressive photorealism and instruction-following capabilities in single-image generation and editing. However, constrained by their architectures, they cannot achieve interleaved generation…
- Mana: Dexterous Manipulation of Articulated Tools · cs.RO
Zhao-Heng Yin, Guanya Shi, Pieter Abbeel, C. Karen Liu · Articulated tool manipulation remains a major challenge in dexterous robotics due to the need to coordinate internal degrees of freedom and contact-rich interactions. While prior work has largely focused on rigid objects, articulated tool…
- Modality Forcing for Scalable Spatial Generation · cs.CV
Bardienus Pieter Duisterhof, Deva Ramanan, Jeffrey Ichnowski, Justin Johnson · Text-to-image (T2I) models contain rich spatial priors. Synthesizing photorealistic, cluttered scenes requires an understanding of geometry, including perspective and relative scale. Prior works adapt T2I models to leverage this prior for…
- RepWAM: World Action Modeling with Representation Visual-Action Tokenizers · cs.CV
Junke Wang, Qihang Zhang, Shuai Yang, Yiming Luo · This work presents RepWAM, a representation-centric world action model (WAM) built on representation visual-action tokenizers. Existing WAMs typically inherit reconstruction-oriented video tokenizers from pretrained video generation…
- SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning · cs.CV
Seokju Cho, Ryo Hachiuma, Abhishek Badki, Hang Su · Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fundamental challenge for vision-language models (VLMs). Tool-augmented agents attempt to address this by augmenting VLMs…
- Understanding Truncated Positional Encodings for Graph Neural Networks · cs.LG
James Flora, Mitchell Black, Weng-Keen Wong, Amir Nayyeri · Positional encodings (PEs) enhance the power of graph neural networks (GNNs), both theoretically and empirically. Two of the most popular families of PEs - spectral (e.g., Laplacian eigenspaces, effective resistance) and walk-based…
- Automated reproducibility assessments in the social and behavioral sciences using large language models · cs.AI
Tobias Holtdirk, Pietro Marcolongo, Anna Steinberg Schulten, Felix Henninger · Reproducibility in the social and behavioral sciences is typically evaluated by independent researchers who reanalyze the original data to assess whether the published findings can be recovered. However, such approaches are…
- Agents-K1: Towards Agent-native Knowledge Orchestration · cs.AI
Zongsheng Cao, Bihao Zhan, Jinxin Shi, Jiong Wang · Current LLM-based research agents have advanced through agent orchestration, yet largely overlook scientific knowledge orchestration. Existing works often reduce papers to abstracts, surface mentions, and flat "cites" edges,…
- Influcoder: Distilling Decoders' Gradient Influence Rankings into an Encoder for Data Attribution · cs.CL
Dimitri Kachler, Damien Sileo, Pascal Denis · With the growth of LLMs' (Large Language Models) capabilities, there has been an increasing push to curate high quality datasets by filtering samples in the training data. In general, Data Attribution (DA) methods aim to estimate how…
- HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents · cs.CL
Yaxin Du, Yifan Zhou, Yujie Ge, Jiajun Wang · Tool-augmented LLM agents commonly rely on step-wise atomic tool calls, where each invocation, observation, and value transfer is exposed in the main reasoning trace. This creates an execution-granularity mismatch: locally…
- EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery · cs.AI
Amy Xin, Jiening Siow, Junjie Wang, Zijun Yao · LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they can propose, validate, and iterate scientific solutions, and have produced results that…
- Specifying Hardware Communication as Programs · cs.PL
Ernest Ng, Nikil Shyamsunder, Francis Pham, Adrian Sampson · To test and debug hardware modules, it is common to write two programs: a driver, which translates high-level transactions into interactions on the module's input and output signals, and a monitor, which analyzes a signal-level execution…
- Before You Think: System 0, AI-Mediated Cognition and Cognitive Colonization · cs.AI
Marianna Bergamaschi Ganapini, Massimo Chiriatti, Enrico Panai, Giuseppe Riva · This paper examines three recent frameworks for understanding the cognitive and epistemic consequences of artificial intelligence: Tri-System Theory, Thinkframes, and System 0. It argues that while the first two capture important…
- Dense Supervision, Sparse Updates: On the Sparsity and Geometry of On-Policy Distillation · cs.LG
Guo Yu, Wenlin Liu, Yulan Hu, Hao-Xuan Ma · On-policy distillation (OPD) has recently become a prominent post-training recipe as it combines two desirable ingredients: on-policy student trajectories and dense teacher supervision, yet how this hybrid changes a model's…
- Flex4DHuman: Flexible Multi-view Video Diffusion for 4D Human Reconstruction · cs.CV
Jen-Hao Cheng, Yipeng Wang, Hao Zhang, Gengshan Yang · We present Flex4DHuman, a multi-view video diffusion model that transforms a monocular or sparse multi-view video of a dynamic subject into synchronized dense multi-view videos using only relative camera-pose conditioning. Unlike prior…
- World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible · cs.CV
Hao Zhang, Mohamed El Banani, Jen-Hao Cheng, Paul Zhang · Image-to-3D methods often trade off faithfulness and completeness: depth estimators are anchored to input pixels but stop at the visible surface, while image-to-3D models generate complete shapes that are often misaligned with the input.…
- Operadic consistency: a label-free signal for compositional reasoning failures in LLMs · cs.CL
Nathaniel Bottman, Yinhong Liu, Kyle Richardson · Detecting LLM reasoning failures at inference time without ground-truth labels has motivated a wide range of confidence baselines, including self-consistency, semantic entropy, and P(True), built on within-question sampling and…
- SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation · cs.CL
Marek Šuppa, Andrej Ridzik, Daniel Hládek, Natália Kňažeková · We introduce SkMTEB, the first comprehensive MTEB-style text embedding benchmark for Slovak, a low-resource West Slavic language, comprising 31 datasets across 7 task types, nearly 4× the depth of existing multilingual benchmark…
- Surflo: Consistent 3D Surface Flow Model with Global State · cs.CV
Antoine Guédon, Shu Nakamura, Nicolas Dufour, Jiahui Lei · Geometry is invariant to viewpoint, which makes any collection of images a redundant encoding of a single 3D state. Existing feed-forward reconstruction models fail to exploit this: per-view methods emit overlapping, unaligned pointmaps…
- Recursive Agent Harnesses · cs.CL
Elias Lumer, Sahil Sen, Kevin Paul, Vamse Kumar Subbiah · Recursive language models (RLMs) showed that recursion over model calls is an effective strategy for long-context reasoning, and production coding agents have begun to write code that spawns subagents at scale, most recently in Anthropic's…
- Tuning Agent-Based Predator-Prey Models Toward Lotka-Volterra Dynamics · cs.MA
Corinna Mandl, Siddharth Chaturvedi, Marcel van Gerven · Recent growth in compute power has made it increasingly feasible to use large-scale agent-based models to simulate complex adaptive systems. A central difficulty is that such models contain many local rules and parameters, where small…
- The Stable Recovery Manifold: Geometric Principles Governing Recoverability in Continual Learning · cs.LG
Ayushman Trivedi, Bhavika Melwani · Catastrophic forgetting is often viewed as the destruction of previously learned knowledge during sequential learning. Building on the Accessibility Collapse framework, we investigate the geometric structure of recoverability in continual…
- Operads for compositional reasoning in LLMs · cs.CL
Nathaniel Bottman, Kyle Richardson · Question decomposition, i.e. breaking a complex query into simpler sub-queries whose answers are composed to produce a final answer, is a widely used strategy for improving LLM reasoning, yet it currently lacks a rigorous mathematical…
- Aerial Wildfire Suppression Planning with a Hybrid CNN-Cellular Automata Fire Model · eess.SY
Ion Matei, Maksym Zhenirovskyy, Takuya Kurihana, Rohit Vupala · Aerial wildfire suppression requires not only predicting fire spread, but also designing effective intervention strategies under operational and environmental uncertainty. We present a modeling and optimization framework for aerial…
- Beyond Virtual Delay: Improving Packet Delay Bound in Network Calculus · cs.PF
Yuming Jiang · In network calculus, a fundamental result is the classical delay bound given by the horizontal deviation between the arrival and service curves. While widely used, the classical bound is derived from the notion of virtual delay. In this…
- From Tokens to Faces: Investigating Discrete Speech Representations for 3D Facial Animation · cs.CL
Pedro Correa, Olivier Perrotin, Samir Sadok, Paula Costa · The choice of speech representation is critical in speech-driven 3D facial animation. Representations differ in what they encode: SSL features emphasize segmental and semantic cues, neural codecs yield latents optimized for acoustic…
- Valid Inference with Synthetic Data via Task Exchangeability · stat.ME
Lezhi Tan, Tijana Zrnic · There is a proliferation of work arguing for the use of synthetic data in scientific research. For example, social scientists are arguing for the use of LLM-generated "silicon samples" in pilot studies; AI evaluations increasingly rely on…
- Generative Modeling of Bach-Style Symbolic Music: A Comparative Study of Autoregressive, Latent-Variable, and Adversarial Approaches · cs.SD
Kyuil Lee, Dezhi Yu, Yongkang Huang · We study generative modeling of Bach-style symbolic piano music using a shared MIDI corpus and three model families: autoregressive LSTMs with attention, latent-variable models including recurrent VAEs and vector-quantized VAEs, and…