⚡ AI-Lesson Articles
5,330 articles · Updated every 3 hours
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
AI-Supervisor: Autonomous AI Research Supervision via a Persistent Research World Model
arXiv:2603.24402v2 Announce Type: new Abstract: Existing automated research systems operate as stateless, linear pipelines -- generating outputs without maintai
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Multi-Agent Reasoning with Consistency Verification Improves Uncertainty Calibration in Medical MCQA
arXiv:2603.24481v1 Announce Type: new Abstract: Miscalibrated confidence scores are a practical obstacle to deploying AI in clinical settings. A model that is a
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
From Liar Paradox to Incongruent Sets: A Normal Form for Self-Reference
arXiv:2603.24527v1 Announce Type: new Abstract: We introduce incongruent normal form (INF), a structural representation for self-referential semantic sentences.
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Completeness of Unbounded Best-First Minimax and Descent Minimax
arXiv:2603.24572v1 Announce Type: new Abstract: In this article, we focus on search algorithms for two-player perfect information games, whose objective is to d
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
The Stochastic Gap: A Markovian Framework for Pre-Deployment Reliability and Oversight-Cost Auditing in Agentic Artificial Intelligence
arXiv:2603.24582v1 Announce Type: new Abstract: Agentic artificial intelligence (AI) in organizations is a sequential decision problem constrained by reliabilit
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct
arXiv:2410.02064v3 Announce Type: cross Abstract: It has been reported that LLMs can recognize their own writing. As this has potential implications for AI safe
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Mitigating Many-Shot Jailbreaking
arXiv:2504.09604v3 Announce Type: cross Abstract: Many-shot jailbreaking (MSJ) is an adversarial technique that exploits the long context windows of modern LLMs
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Evidence for Limited Metacognition in LLMs
arXiv:2509.21545v2 Announce Type: cross Abstract: The possibility of LLM self-awareness and even sentience is gaining increasing public attention and has major
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Leveraging Computerized Adaptive Testing for Cost-effective Evaluation of Large Language Models in Medical Benchmarking
arXiv:2603.23506v1 Announce Type: cross Abstract: The rapid proliferation of large language models (LLMs) in healthcare creates an urgent need for scalable and
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes
arXiv:2603.23507v1 Announce Type: cross Abstract: While Masked Diffusion Language Models (MDLMs) relying on token masking and unmasking have shown promise in la
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Internal Safety Collapse in Frontier Large Language Models
arXiv:2603.23509v1 Announce Type: cross Abstract: This work identifies a critical failure mode in frontier large language models (LLMs), which we term Internal
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Visuospatial Perspective Taking in Multimodal Language Models
arXiv:2603.23510v1 Announce Type: cross Abstract: As multimodal language models (MLMs) are increasingly used in social and collaborative settings, it is crucial
ArXiv cs.AI
📄 Paper
⚡ AI Lesson
1w ago
DISCO: Document Intelligence Suite for COmparative Evaluation
arXiv:2603.23511v1 Announce Type: cross Abstract: Document intelligence requires accurate text extraction and reliable reasoning over document content. We intro
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
S-Path-RAG: Semantic-Aware Shortest-Path Retrieval Augmented Generation for Multi-Hop Knowledge Graph Question Answering
arXiv:2603.23512v1 Announce Type: cross Abstract: We present S-Path-RAG, a semantic-aware shortest-path Retrieval-Augmented Generation framework designed to imp
ArXiv cs.AI
📄 Paper
⚡ AI Lesson
1w ago
Berta: an open-source, modular tool for AI-enabled clinical documentation
arXiv:2603.23513v1 Announce Type: cross Abstract: Commercial AI scribes cost $99-600 per physician per month, operate as opaque systems, and do not return data
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
DepthCharge: A Domain-Agnostic Framework for Measuring Depth-Dependent Knowledge in Large Language Models
arXiv:2603.23514v1 Announce Type: cross Abstract: Large Language Models appear competent when answering general questions but often fail when pushed into domain
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Training a Large Language Model for Medical Coding Using Privacy-Preserving Synthetic Clinical Data
arXiv:2603.23515v1 Announce Type: cross Abstract: Improving the accuracy and reliability of medical coding reduces clinician burnout and supports revenue cycle
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens
arXiv:2603.23516v1 Announce Type: cross Abstract: Long-term memory is a cornerstone of human intelligence. Enabling AI to process lifetime-scale information rem
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Beyond Accuracy: Introducing a Symbolic-Mechanistic Approach to Interpretable Evaluation
arXiv:2603.23517v1 Announce Type: cross Abstract: Accuracy-based evaluation cannot reliably distinguish genuine generalization from shortcuts like memorization,
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Cluster-R1: Large Reasoning Models Are Instruction-following Clustering Agents
arXiv:2603.23518v1 Announce Type: cross Abstract: General-purpose embedding models excel at recognizing semantic similarities but fail to capture the characteri
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
MedMT-Bench: Can LLMs Memorize and Understand Long Multi-Turn Conversations in Medical Scenarios?
arXiv:2603.23519v1 Announce Type: cross Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities across various specialist domains and h
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
From Physician Expertise to Clinical Agents: Preserving, Standardizing, and Scaling Physicians' Medical Expertise with Lightweight LLM
arXiv:2603.23520v1 Announce Type: cross Abstract: Medicine is an empirical discipline refined through long-term observation and the messy, high-variance reality
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Chitrakshara: A Large Multilingual Multimodal Dataset for Indian languages
arXiv:2603.23521v1 Announce Type: cross Abstract: Multimodal research has predominantly focused on single-image reasoning, with limited exploration of multi-ima
ArXiv cs.AI
🧠 Large Language Models
📄 Paper
⚡ AI Lesson
1w ago
Qworld: Question-Specific Evaluation Criteria for LLMs
arXiv:2603.23522v1 Announce Type: cross Abstract: Evaluating large language models (LLMs) on open-ended questions is difficult because response quality depends
DeepCamp AI