📰 Reads

110,784 articles · Updated every 3 hours

ArXiv cs.AI 📄 Paper 10h ago
User Simulation in the Era of Generative AI: User Modeling, Synthetic Data Generation, and System Evaluation
arXiv:2501.04410v2 Announce Type: replace Abstract: User simulation is an emerging interdisciplinary topic with multiple critical applications in the era of Gen
ArXiv cs.AI 📄 Paper 10h ago
Epistemic Skills: Reasoning about Knowledge and Oblivion
arXiv:2504.01733v4 Announce Type: replace Abstract: This paper presents a class of epistemic logics that captures the dynamics of acquiring knowledge and descen
ArXiv cs.AI 📄 Paper 10h ago
Memory Assignment for Finite-Memory Strategies in Adversarial Patrolling Games
arXiv:2505.14137v2 Announce Type: replace Abstract: Adversarial Patrolling games form a subclass of Security games where a Defender moves between locations, gua
ArXiv cs.AI 📄 Paper 10h ago
MRS: Multi-Resolution Skills for HRL Agents
arXiv:2505.21410v2 Announce Type: replace Abstract: Hierarchical reinforcement learning (HRL) decomposes the policy into a manager and a worker, enabling long-h
ArXiv cs.AI 📄 Paper 10h ago
SEAT: Sparse Entity-Aware Tuning for Knowledge Adaptation while Preserving Epistemic Abstention
arXiv:2506.14387v3 Announce Type: replace Abstract: Adapting LLMs with new knowledge is increasingly important, but standard fine-tuning often erodes aligned ep
ArXiv cs.AI 📄 Paper 10h ago
GRAIL: Learning to Interact with Large Knowledge Graphs for Retrieval Augmented Reasoning
arXiv:2508.05498v2 Announce Type: replace Abstract: Large Language Models (LLMs) integrated with Retrieval-Augmented Generation (RAG) techniques have exhibited
ArXiv cs.AI 📄 Paper 10h ago
GeoLaux: A Benchmark for Evaluating MLLMs' Geometry Performance on Long-Step Problems Requiring Auxiliary Lines
arXiv:2508.06226v2 Announce Type: replace Abstract: Geometry problem solving (GPS) poses significant challenges for Multimodal Large Language Models (MLLMs) in
ArXiv cs.AI 📄 Paper 10h ago
VideoAgent: Personalized Synthesis of Scientific Videos
arXiv:2509.11253v2 Announce Type: replace Abstract: The technical complexity of research papers often limits their reach, necessitating more accessible formats
ArXiv cs.AI 📄 Paper 10h ago
RepIt: Steering Language Models with Concept-Specific Refusal Vectors
arXiv:2509.13281v5 Announce Type: replace Abstract: Current safety evaluations of language models rely on benchmark-based assessments that may miss localized vu
ArXiv cs.AI 📄 Paper 10h ago
How to Teach Large Multimodal Models New Skills
arXiv:2510.08564v2 Announce Type: replace Abstract: How can we teach large multimodal models (LMMs) new skills without erasing prior abilities? We study sequent
ArXiv cs.AI 📄 Paper 10h ago
StepFly: Agentic Troubleshooting Guide Automation for Incident Diagnosis
arXiv:2510.10074v2 Announce Type: replace Abstract: Effective incident management in large-scale IT systems relies on troubleshooting guides (TSGs), but their m
ArXiv cs.AI 📄 Paper 10h ago
Chain-of-Thought as a Lens: Evaluating Structured Reasoning Alignment between Human Preferences and Large Language Models
arXiv:2511.06168v3 Announce Type: replace Abstract: This paper primarily demonstrates a method to quantitatively assess the alignment between multi-step, struct
ArXiv cs.AI 📄 Paper 10h ago
TROJail: Trajectory-Level Optimization for Multi-Turn Large Language Model Jailbreaks with Process Rewards
arXiv:2512.07761v3 Announce Type: replace Abstract: Large language models have seen widespread adoption, yet they remain vulnerable to multi-turn jailbreak atta
ArXiv cs.AI 📄 Paper 10h ago
Beyond Itinerary Planning: A Real-World Benchmark for Multi-Turn and Tool-Using Travel Tasks
arXiv:2512.22673v3 Announce Type: replace Abstract: Travel planning is a natural real-world task to test large language models' (LLMs) planning and tool-use abi
ArXiv cs.AI 📄 Paper 10h ago
SAGE-32B: Agentic Reasoning via Iterative Distillation
arXiv:2601.04237v2 Announce Type: replace Abstract: We demonstrate SAGE-32B, a 32 billion parameter language model that focuses on agentic reasoning and long ra
ArXiv cs.AI 📄 Paper 10h ago
Reasoning Over Space: Enabling Geographic Reasoning for LLM-Based Generative Next POI Recommendation
arXiv:2601.04562v2 Announce Type: replace Abstract: Generative recommendation with large language models (LLMs) reframes prediction as sequence generation, yet
ArXiv cs.AI 📄 Paper 10h ago
ViDoRe V3: A Comprehensive Evaluation of Retrieval Augmented Generation in Complex Real-World Scenarios
arXiv:2601.08620v2 Announce Type: replace Abstract: Retrieval-Augmented Generation (RAG) pipelines must address challenges beyond simple single-document retriev
ArXiv cs.AI 📄 Paper 10h ago
BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search
arXiv:2601.11037v2 Announce Type: replace Abstract: RL-based agentic search enables LLMs to solve complex questions via dynamic planning and external search. Wh
ArXiv cs.AI 📄 Paper 10h ago
Failure Modes in Multi-Hop QA: The Weakest Link Effect and the Recognition Bottleneck
arXiv:2601.12499v2 Announce Type: replace Abstract: Despite scaling to massive context windows, Large Language Models (LLMs) struggle with multi-hop reasoning d
ArXiv cs.AI 📄 Paper 10h ago
Sentipolis: Emotion-Aware Agents for Social Simulations
arXiv:2601.18027v2 Announce Type: replace Abstract: LLM agents are increasingly used for social simulation, yet emotion is often treated as a transient cue, cau