Papers
- arXiv · 2604.11806 · score 9.0
Detecting Safety Violations Across Many Agent Traces
The abstract directly addresses the critical intersection of safety auditing and evaluation for agentic systems by proposing a novel method to detect complex, multi-trace violations that existing benchmarks miss.
- arXiv · 2604.11791 · score 9.0
A Mechanistic Analysis of Looped Reasoning Language Models
The abstract provides a deep mechanistic analysis of a novel LLM architecture (looped reasoning) and its inference dynamics, directly addressing architectural design and optimization strategies.
- arXiv · 2604.11790 · score 9.0
ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection
The abstract directly addresses a critical safety vulnerability (indirect prompt injection) specific to agentic systems and proposes a novel runtime framework that enforces security at the tool-call boundary without altering the underlying LLM architecture.
- arXiv · 2604.11784 · score 9.0
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
The abstract details a comprehensive framework for GUI agents covering RL training, standardized evaluation, persistent memory, and deployment across devices, directly addressing core topics in agentic systems and training methods.
- arXiv · 2604.11759 · score 9.0
Retrieval Is Not Enough: Why Organizational AI Needs Epistemic Infrastructure
The abstract directly addresses RAG limitations by proposing a novel epistemic memory structure and a new evaluation methodology for agentic systems, though it focuses on knowledge representation rather than core LLM architecture or inference optimization.
- arXiv · 2604.11753 · score 9.0
Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks
The abstract directly addresses parallel test-time scaling for agentic tasks, proposing a novel aggregation mechanism that optimizes inference efficiency and coordinates multiple trajectories, fitting the core themes of agentic systems and inference optimization.
- arXiv · 2604.11716 · score 9.0
SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context
The abstract directly addresses the core challenges of agentic systems in software engineering by proposing a novel memory management strategy (sliding window and reasoning digests) to optimize inference efficiency and context handling.
- arXiv · 2305.14314 · score 9.0
QLoRA: Efficient Finetuning of Quantized LLMs
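Landmark paper on memory-efficient LLM finetuning: it backpropagates through a frozen 4-bit quantized model into low-rank adapters (LoRA), which is highly relevant to training methods.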
- arXiv · 2604.11721 · score 8.0
Evaluating Cooperation in LLM Social Groups through Elected Leadership
The abstract directly addresses multi-agent coordination and agentic systems by proposing and evaluating leadership mechanisms to improve social welfare in LLM simulations.
- arXiv · 2604.11699 · score 8.0
Legal2LogicICL: Improving Generalization in Transforming Legal Cases to Logical Formulas via Diverse Few-Shot Learning
The paper directly addresses RAG for few-shot learning, proposes a novel LLM architecture for legal reasoning, introduces a new evaluation dataset, and discusses training-free methods, though it lacks focus on agentic systems, memory, or safety.
- arXiv · 2604.11805 · score 7.0
Solving Physics Olympiad via Reinforcement Learning on Physics Simulators
The paper proposes a novel training methodology that uses physics simulators to enhance LLM reasoning, directly addressing training methods.
- arXiv · 2604.11801 · score 7.0
CLSGen: A Dual-Head Fine-Tuning Framework for Joint Probabilistic Classification and Verbalized Explanation
The abstract proposes a novel dual-head fine-tuning framework for LLMs that improves probability estimation while preserving explanation capabilities.
- arXiv · 2604.11748 · score 7.0
LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling
The paper introduces a novel continuous diffusion architecture for language modeling with specific training innovations, directly addressing LLM architecture and training methods while offering implications for inference optimization.
- arXiv · 2604.11741 · score 7.0
Collaborative Multi-Agent Scripts Generation for Enhancing Imperfect-Information Reasoning in Murder Mystery Games
The paper directly addresses multi-agent coordination, training methods (GRPO, fine-tuning), and evaluation of VLMs in complex reasoning scenarios, though it focuses less on RAG, memory, or safety.
- arXiv · 2604.11703 · score 7.0
DreamKG: A KG-Augmented Conversational System for People Experiencing Homelessness
The abstract describes a hybrid RAG system integrating knowledge graphs with LLMs for reliable, context-aware inference, directly addressing architecture and optimization for specific query types.
- arXiv · 2604.11666 · score 7.0
Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind
The paper introduces a novel safety-focused evaluation benchmark for theory-of-mind and adversarial deception, utilizing reinforcement learning to train models, but does not address architecture, agentic systems, RAG, memory, inference optimization, or multi-agent coordination.