⚡ AI-Lesson Articles

5,330 articles · Updated every 3 hours · View all news

arXiv:2603.24402v2 Announce Type: new Abstract: Existing automated research systems operate as stateless, linear pipelines -- generating outputs without maintai

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Multi-Agent Reasoning with Consistency Verification Improves Uncertainty Calibration in Medical MCQA

arXiv:2603.24481v1 Announce Type: new Abstract: Miscalibrated confidence scores are a practical obstacle to deploying AI in clinical settings. A model that is a

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

From Liar Paradox to Incongruent Sets: A Normal Form for Self-Reference

arXiv:2603.24527v1 Announce Type: new Abstract: We introduce incongruent normal form (INF), a structural representation for self-referential semantic sentences.

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Completeness of Unbounded Best-First Minimax and Descent Minimax

arXiv:2603.24572v1 Announce Type: new Abstract: In this article, we focus on search algorithms for two-player perfect information games, whose objective is to d

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

The Stochastic Gap: A Markovian Framework for Pre-Deployment Reliability and Oversight-Cost Auditing in Agentic Artificial Intelligence

arXiv:2603.24582v1 Announce Type: new Abstract: Agentic artificial intelligence (AI) in organizations is a sequential decision problem constrained by reliabilit

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct

arXiv:2410.02064v3 Announce Type: cross Abstract: It has been reported that LLMs can recognize their own writing. As this has potential implications for AI safe

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Mitigating Many-Shot Jailbreaking

arXiv:2504.09604v3 Announce Type: cross Abstract: Many-shot jailbreaking (MSJ) is an adversarial technique that exploits the long context windows of modern LLMs

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Evidence for Limited Metacognition in LLMs

arXiv:2509.21545v2 Announce Type: cross Abstract: The possibility of LLM self-awareness and even sentience is gaining increasing public attention and has major

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Leveraging Computerized Adaptive Testing for Cost-effective Evaluation of Large Language Models in Medical Benchmarking

arXiv:2603.23506v1 Announce Type: cross Abstract: The rapid proliferation of large language models (LLMs) in healthcare creates an urgent need for scalable and

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes

arXiv:2603.23507v1 Announce Type: cross Abstract: While Masked Diffusion Language Models (MDLMs) relying on token masking and unmasking have shown promise in la

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Internal Safety Collapse in Frontier Large Language Models

arXiv:2603.23509v1 Announce Type: cross Abstract: This work identifies a critical failure mode in frontier large language models (LLMs), which we term Internal

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Visuospatial Perspective Taking in Multimodal Language Models

arXiv:2603.23510v1 Announce Type: cross Abstract: As multimodal language models (MLMs) are increasingly used in social and collaborative settings, it is crucial

ArXiv cs.AI 📄 Paper ⚡ AI Lesson 1w ago

DISCO: Document Intelligence Suite for COmparative Evaluation

arXiv:2603.23511v1 Announce Type: cross Abstract: Document intelligence requires accurate text extraction and reliable reasoning over document content. We intro

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

S-Path-RAG: Semantic-Aware Shortest-Path Retrieval Augmented Generation for Multi-Hop Knowledge Graph Question Answering

arXiv:2603.23512v1 Announce Type: cross Abstract: We present S-Path-RAG, a semantic-aware shortest-path Retrieval-Augmented Generation framework designed to imp

ArXiv cs.AI 📄 Paper ⚡ AI Lesson 1w ago

Berta: an open-source, modular tool for AI-enabled clinical documentation

arXiv:2603.23513v1 Announce Type: cross Abstract: Commercial AI scribes cost \$99-600 per physician per month, operate as opaque systems, and do not return data

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

DepthCharge: A Domain-Agnostic Framework for Measuring Depth-Dependent Knowledge in Large Language Models

arXiv:2603.23514v1 Announce Type: cross Abstract: Large Language Models appear competent when answering general questions but often fail when pushed into domain

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Training a Large Language Model for Medical Coding Using Privacy-Preserving Synthetic Clinical Data

arXiv:2603.23515v1 Announce Type: cross Abstract: Improving the accuracy and reliability of medical coding reduces clinician burnout and supports revenue cycle

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens

arXiv:2603.23516v1 Announce Type: cross Abstract: Long-term memory is a cornerstone of human intelligence. Enabling AI to process lifetime-scale information rem

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Beyond Accuracy: Introducing a Symbolic-Mechanistic Approach to Interpretable Evaluation

arXiv:2603.23517v1 Announce Type: cross Abstract: Accuracy-based evaluation cannot reliably distinguish genuine generalization from shortcuts like memorization,

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Cluster-R1: Large Reasoning Models Are Instruction-following Clustering Agents

arXiv:2603.23518v1 Announce Type: cross Abstract: General-purpose embedding models excel at recognizing semantic similarities but fail to capture the characteri

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

MedMT-Bench: Can LLMs Memorize and Understand Long Multi-Turn Conversations in Medical Scenarios?

arXiv:2603.23519v1 Announce Type: cross Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities across various specialist domains and h

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

From Physician Expertise to Clinical Agents: Preserving, Standardizing, and Scaling Physicians' Medical Expertise with Lightweight LLM

arXiv:2603.23520v1 Announce Type: cross Abstract: Medicine is an empirical discipline refined through long-term observation and the messy, high-variance reality

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Chitrakshara: A Large Multilingual Multimodal Dataset for Indian languages

arXiv:2603.23521v1 Announce Type: cross Abstract: Multimodal research has predominantly focused on single-image reasoning, with limited exploration of multi-ima

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1w ago

Qworld: Question-Specific Evaluation Criteria for LLMs

arXiv:2603.23522v1 Announce Type: cross Abstract: Evaluating large language models (LLMs) on open-ended questions is difficult because response quality depends