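📊 Synthesis (with sources and rationale)
1. LLM agents are moving toward persistent memory and higher-level tool orchestration, enabling them to learn on their own in dynamic environments, run scientific experiments, and assess reproducibility automatically. [Source: arXiv/1] [Source: arXiv/12] [Source: arXiv/10] [Source: arXiv/13] [Source: arXiv/9]
2. Interleaved generation and retrieval-augmented reinforcement fine-tuning significantly improve agents' reasoning and planning, with more robust performance on multi-step tasks in particular. [Source: arXiv/3] [Source: arXiv/2]
3. Spatial reasoning and 3D generation are advancing quickly: models built on new visual-action tokenizers, modality forcing, and pixel-aligned geometry markedly improve the quality of robot manipulation, multi-view human reconstruction, and complete scene synthesis. [Source: arXiv/5] [Source: arXiv/6] [Source: arXiv/7] [Source: arXiv/17] [Source: arXiv/18] [Source: arXiv/21]
4. Theoretical work on compositional reasoning and graph neural networks is deepening: operadic consistency signals, an operad-based compositional framework, and truncated positional encodings offer new tools for model reliability and interpretability. [Source: arXiv/19] [Source: arXiv/25] [Source: arXiv/8]
5. Research is spreading into specialized domains, including articulated tool manipulation, aerial wildfire suppression planning, inference with synthetic data, and Bach-style music generation, showing how learning-based models can be adapted to specialized tasks. [Source: arXiv/4] [Source: arXiv/26] [Source: arXiv/29] [Source: arXiv/30]
Rationale: After clustering title keywords, the papers in this batch concentrate in a few directions: LLM agent memory and tool use, interleaved generation and reinforcement learning, spatial and 3D perception, compositional-reasoning theory, and domain-specific applications. These clusters are the basis for the trend summary above.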
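The title-keyword grouping mentioned in the rationale can be sketched roughly as follows. This is a minimal illustration only: the digest does not describe its clustering method, so the `THEMES` keyword sets, the `group_titles` helper, and the overlap-count rule are all assumptions, not the procedure actually used.

```python
import re
from collections import defaultdict

# Hypothetical theme keywords; the digest does not publish its keyword lists.
THEMES = {
    "agents_memory_tools": {"agent", "agents", "agentic", "memory", "tool", "tools"},
    "spatial_3d": {"spatial", "3d", "4d", "geometry", "surface"},
    "compositional_theory": {"operads", "operadic", "compositional", "positional"},
}


def tokenize(title):
    """Lowercase a title and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def group_titles(titles, themes=THEMES):
    """Assign each title to the theme sharing the most keywords with it.

    Titles that match no keyword go to "other"; ties keep the theme
    that appears first in the themes dict.
    """
    groups = defaultdict(list)
    for title in titles:
        words = tokenize(title)
        best, best_overlap = "other", 0
        for name, keywords in themes.items():
            overlap = len(words & keywords)
            if overlap > best_overlap:
                best, best_overlap = name, overlap
        groups[best].append(title)
    return dict(groups)


titles = [
    "EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments",
    "Modality Forcing for Scalable Spatial Generation",
    "Operads for compositional reasoning in LLMs",
    "Beyond Virtual Delay: Improving Packet Delay Bound in Network Calculus",
]
groups = group_titles(titles)
for theme, members in groups.items():
    print(theme, len(members))
```

A keyword-overlap rule like this only looks at titles, so any grouping it produces reflects topic vocabulary rather than the papers' reported results.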
- 1. EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments · cs.CL
Jundong Xu, Qingchuan Li, Jiaying Wu, Yihuai Lan · Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agents to continually…
- 2. Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning · cs.CL
Zilin Xiao, Qi Ma, Chun-cheng Jason Chen, Xintao Chen · Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex reasoning tasks: a…
- 3. InterleaveThinker: Reinforcing Agentic Interleaved Generation · cs.CV
Dian Zheng, Harry Lee, Manyuan Zhang, Kaituo Feng · Recent image generators have demonstrated impressive photorealism and instruction-following capabilities in single-image generation and editing. However, constrained by their architectures, they cannot achieve interleaved generation…
- 4. Mana: Dexterous Manipulation of Articulated Tools · cs.RO
Zhao-Heng Yin, Guanya Shi, Pieter Abbeel, C. Karen Liu · Articulated tool manipulation remains a major challenge in dexterous robotics due to the need to coordinate internal degrees of freedom and contact-rich interactions. While prior work has largely focused on rigid objects, articulated tool…
- 5. Modality Forcing for Scalable Spatial Generation · cs.CV
Bardienus Pieter Duisterhof, Deva Ramanan, Jeffrey Ichnowski, Justin Johnson · Text-to-image (T2I) models contain rich spatial priors. Synthesizing photorealistic, cluttered scenes requires an understanding of geometry, including perspective and relative scale. Prior works adapt T2I models to leverage this prior for…
- 6. RepWAM: World Action Modeling with Representation Visual-Action Tokenizers · cs.CV
Junke Wang, Qihang Zhang, Shuai Yang, Yiming Luo · This work presents RepWAM, a representation-centric world action model (WAM) built on representation visual-action tokenizers. Existing WAMs typically inherit reconstruction-oriented video tokenizers from pretrained video generation…
- 7. SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning · cs.CV
Seokju Cho, Ryo Hachiuma, Abhishek Badki, Hang Su · Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fundamental challenge for vision-language models (VLMs). Tool-augmented agents attempt to address this by augmenting VLMs…
- 8. Understanding Truncated Positional Encodings for Graph Neural Networks · cs.LG
James Flora, Mitchell Black, Weng-Keen Wong, Amir Nayyeri · Positional encodings (PEs) enhance the power of graph neural networks (GNNs), both theoretically and empirically. Two of the most popular families of PEs - spectral (e.g., Laplacian eigenspaces, effective resistance) and walk-based…
- 9. Automated reproducibility assessments in the social and behavioral sciences using large language models · cs.AI
Tobias Holtdirk, Pietro Marcolongo, Anna Steinberg Schulten, Felix Henninger · Reproducibility in the social and behavioral sciences is typically evaluated by independent researchers who reanalyze the original data to assess whether the published findings can be recovered. However, such approaches are…
- 10. Agents-K1: Towards Agent-native Knowledge Orchestration · cs.AI
Zongsheng Cao, Bihao Zhan, Jinxin Shi, Jiong Wang · Current LLM-based research agents have advanced through agent orchestration, yet largely overlook scientific knowledge orchestration. Existing works often reduce papers to abstracts, surface mentions, and flat "cites" edges,…
- 11. Influcoder: Distilling Decoders' Gradient Influence Rankings into an Encoder for Data Attribution · cs.CL
Dimitri Kachler, Damien Sileo, Pascal Denis · With the growth of LLMs' (Large Language Models) capabilities, there has been an increasing push to curate high quality datasets by filtering samples in the training data. In general, Data Attribution (DA) methods aim to estimate how…
- 12. HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents · cs.CL
Yaxin Du, Yifan Zhou, Yujie Ge, Jiajun Wang · Tool-augmented LLM agents commonly rely on step-wise atomic tool calls, where each invocation, observation, and value transfer is exposed in the main reasoning trace. This creates an execution-granularity mismatch: locally…
- 13. EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery · cs.AI
Amy Xin, Jiening Siow, Junjie Wang, Zijun Yao · LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they can propose, validate, and iterate scientific solutions, and have produced results that…
- 14. Specifying Hardware Communication as Programs · cs.PL
Ernest Ng, Nikil Shyamsunder, Francis Pham, Adrian Sampson · To test and debug hardware modules, it is common to write two programs: a driver, which translates high-level transactions into interactions on the module's input and output signals, and a monitor, which analyzes a signal-level execution…
- 15. Before You Think: System 0, AI-Mediated Cognition and Cognitive Colonization · cs.AI
Marianna Bergamaschi Ganapini, Massimo Chiriatti, Enrico Panai, Giuseppe Riva · This paper examines three recent frameworks for understanding the cognitive and epistemic consequences of artificial intelligence: Tri-System Theory, Thinkframes, and System 0. It argues that while the first two capture important…
- 16. Dense Supervision, Sparse Updates: On the Sparsity and Geometry of On-Policy Distillation · cs.LG
Guo Yu, Wenlin Liu, Yulan Hu, Hao-Xuan Ma · On-policy distillation (OPD) has recently become a prominent post-training recipe as it combines two desirable ingredients: on-policy student trajectories and dense teacher supervision, yet how this hybrid changes a model's…
- 17. Flex4DHuman: Flexible Multi-view Video Diffusion for 4D Human Reconstruction · cs.CV
Jen-Hao Cheng, Yipeng Wang, Hao Zhang, Gengshan Yang · We present Flex4DHuman, a multi-view video diffusion model that transforms a monocular or sparse multi-view video of a dynamic subject into synchronized dense multi-view videos using only relative camera-pose conditioning. Unlike prior…
- 18. World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible · cs.CV
Hao Zhang, Mohamed El Banani, Jen-Hao Cheng, Paul Zhang · Image-to-3D methods often trade off faithfulness and completeness: depth estimators are anchored to input pixels but stop at the visible surface, while image-to-3D models generate complete shapes that are often misaligned with the input.…
- 19. Operadic consistency: a label-free signal for compositional reasoning failures in LLMs · cs.CL
Nathaniel Bottman, Yinhong Liu, Kyle Richardson · Detecting LLM reasoning failures at inference time without ground-truth labels has motivated a wide range of confidence baselines, including self-consistency, semantic entropy, and P(True), built on within-question sampling and…
- 20. SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation · cs.CL
Marek Šuppa, Andrej Ridzik, Daniel Hládek, Natália Kňažeková · We introduce SkMTEB, the first comprehensive MTEB-style text embedding benchmark for Slovak, a low-resource West Slavic language, comprising 31 datasets across 7 task types, nearly 4× the depth of existing multilingual benchmark…
- 21. Surflo: Consistent 3D Surface Flow Model with Global State · cs.CV
Antoine Guédon, Shu Nakamura, Nicolas Dufour, Jiahui Lei · Geometry is invariant to viewpoint, which makes any collection of images a redundant encoding of a single 3D state. Existing feed-forward reconstruction models fail to exploit this: per-view methods emit overlapping, unaligned pointmaps…
- 22. Recursive Agent Harnesses · cs.CL
Elias Lumer, Sahil Sen, Kevin Paul, Vamse Kumar Subbiah · Recursive language models (RLMs) showed that recursion over model calls is an effective strategy for long-context reasoning, and production coding agents have begun to write code that spawns subagents at scale, most recently in Anthropic's…
- 23. Tuning Agent-Based Predator-Prey Models Toward Lotka-Volterra Dynamics · cs.MA
Corinna Mandl, Siddharth Chaturvedi, Marcel van Gerven · Recent growth in compute power has made it increasingly feasible to use large-scale agent-based models to simulate complex adaptive systems. A central difficulty is that such models contain many local rules and parameters, where small…
- 24. The Stable Recovery Manifold: Geometric Principles Governing Recoverability in Continual Learning · cs.LG
Ayushman Trivedi, Bhavika Melwani · Catastrophic forgetting is often viewed as the destruction of previously learned knowledge during sequential learning. Building on the Accessibility Collapse framework, we investigate the geometric structure of recoverability in continual…
- 25. Operads for compositional reasoning in LLMs · cs.CL
Nathaniel Bottman, Kyle Richardson · Question decomposition, i.e. breaking a complex query into simpler sub-queries whose answers are composed to produce a final answer, is a widely used strategy for improving LLM reasoning, yet it currently lacks a rigorous mathematical…
- 26. Aerial Wildfire Suppression Planning with a Hybrid CNN-Cellular Automata Fire Model · eess.SY
Ion Matei, Maksym Zhenirovskyy, Takuya Kurihana, Rohit Vupala · Aerial wildfire suppression requires not only predicting fire spread, but also designing effective intervention strategies under operational and environmental uncertainty. We present a modeling and optimization framework for aerial…
- 27. Beyond Virtual Delay: Improving Packet Delay Bound in Network Calculus · cs.PF
Yuming Jiang · In network calculus, a fundamental result is the classical delay bound given by the horizontal deviation between the arrival and service curves. While widely used, the classical bound is derived from the notion of virtual delay. In this…
- 28. From Tokens to Faces: Investigating Discrete Speech Representations for 3D Facial Animation · cs.CL
Pedro Correa, Olivier Perrotin, Samir Sadok, Paula Costa · The choice of speech representation is critical in speech-driven 3D facial animation. Representations differ in what they encode: SSL features emphasize segmental and semantic cues, neural codecs yield latents optimized for acoustic…
- 29. Valid Inference with Synthetic Data via Task Exchangeability · stat.ME
Lezhi Tan, Tijana Zrnic · There is a proliferation of work arguing for the use of synthetic data in scientific research. For example, social scientists are arguing for the use of LLM-generated "silicon samples" in pilot studies; AI evaluations increasingly rely on…
- 30. Generative Modeling of Bach-Style Symbolic Music: A Comparative Study of Autoregressive, Latent-Variable, and Adversarial Approaches · cs.SD
Kyuil Lee, Dezhi Yu, Yongkang Huang · We study generative modeling of Bach-style symbolic piano music using a shared MIDI corpus and three model families: autoregressive LSTMs with attention, latent-variable models including recurrent VAEs and vector-quantized VAEs, and…