AI News — Latest Developments & Breakthroughs

ArXiv cs.AI 📄 Paper 1d ago

Compositional Image Synthesis with Inference-Time Scaling

arXiv:2510.24133v2 Announce Type: replace-cross Abstract: Despite their impressive realism, modern text-to-image models still struggle with compositionality, of

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1d ago

GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding

arXiv:2511.00810v3 Announce Type: replace-cross Abstract: Graphical user interface (GUI) grounding is a key capability for computer-use agents, mapping natural-

ArXiv cs.AI 📄 Paper 1d ago

Causal Graph Neural Networks for Healthcare

arXiv:2511.02531v5 Announce Type: replace-cross Abstract: Healthcare artificial intelligence systems often degrade in performance when deployed across instituti

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1d ago

Route Experts by Sequence, not by Token

arXiv:2511.06494v2 Announce Type: replace-cross Abstract: Mixture-of-Experts (MoE) architectures scale large language models (LLMs) by activating only a subset

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago

Binary Verification for Zero-Shot Vision

arXiv:2511.10983v2 Announce Type: replace-cross Abstract: We propose a training-free, binary verification workflow for zero-shot vision with off-the-shelf VLMs.

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1d ago

Any4D: Open-Prompt 4D Generation from Natural Language and Images

arXiv:2511.18746v2 Announce Type: replace-cross Abstract: While video-generation-based embodied world models have gained increasing attention, their reliance on

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1d ago

Aligning LLMs with Biomedical Knowledge using Balanced Fine-Tuning

arXiv:2511.21075v2 Announce Type: replace-cross Abstract: Aligning Large Language Models (LLMs) with biomedical knowledge requires understanding both concepts a

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1d ago

StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos

arXiv:2512.01707v2 Announce Type: replace-cross Abstract: Streaming video understanding requires models not only to process temporally incoming frames, but also

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1d ago

WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning

arXiv:2512.02425v2 Announce Type: replace-cross Abstract: Recent advances in video large language models have demonstrated strong capabilities in understanding

ArXiv cs.AI 📄 Paper ⚡ AI Lesson 1d ago

Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages

arXiv:2512.08777v2 Announce Type: replace-cross Abstract: We propose a post-training method for lower-resource languages that preserves the fluency of language

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago

Particulate: Feed-Forward 3D Object Articulation

arXiv:2512.11798v2 Announce Type: replace-cross Abstract: We introduce Particulate, a feed-forward model that, given a 3D mesh of an object, infers its articula

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1d ago

Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

arXiv:2512.13607v2 Announce Type: replace-cross Abstract: Building general-purpose reasoning models with reinforcement learning (RL) entails substantial cross-d

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1d ago

SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations

arXiv:2512.14080v2 Announce Type: replace-cross Abstract: Mixture of Experts (MoE) models have emerged as the de facto architecture for scaling up language mode

ArXiv cs.AI 📄 Paper ⚡ AI Lesson 1d ago

PathFinder: Advancing Path Loss Prediction for Single-to-Multi-Transmitter Scenario

arXiv:2512.14150v3 Announce Type: replace-cross Abstract: Radio path loss prediction (RPP) is critical for optimizing 5G networks and enabling IoT, smart city,

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1d ago

Dual-objective Language Models: Training Efficiency Without Overfitting

arXiv:2512.14549v3 Announce Type: replace-cross Abstract: This paper combines autoregressive and masked-diffusion training objectives without any architectural

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1d ago

MRG-R1: Reinforcement Learning for Clinically Aligned Medical Report Generation

arXiv:2512.16145v2 Announce Type: replace-cross Abstract: Medical report generation aims to automatically produce radiology-style reports from medical images, s

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1d ago

Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs

arXiv:2512.16378v3 Announce Type: replace-cross Abstract: As Large Language Models (LLMs) expand beyond text, integrating speech as a native modality has given

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1d ago

The Dual-State Architecture for Reliable LLM Agents

arXiv:2512.20660v2 Announce Type: replace-cross Abstract: Large Language Models deployed as code generation agents exhibit stochastic behavior incompatible with

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago

RoAD Benchmark: How LiDAR Models Fail under Coupled Domain Shifts and Label Evolution

arXiv:2601.07855v2 Announce Type: replace-cross Abstract: For 3D perception systems to operate reliably in real-world environments, they must remain robust to e

ArXiv cs.AI 📄 Paper ⚡ AI Lesson 1d ago

Incorporating Q&A Nuggets into Retrieval-Augmented Generation

arXiv:2601.13222v2 Announce Type: replace-cross Abstract: RAGE systems integrate ideas from automatic evaluation (E) into Retrieval-augmented Generation (RAG).

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1d ago

Insider Knowledge: How Much Can RAG Systems Gain from Evaluation Secrets?

arXiv:2601.13227v2 Announce Type: replace-cross Abstract: RAG systems are increasingly evaluated and optimized using LLM judges, an approach that is rapidly bec

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 1d ago

CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models

arXiv:2601.13622v3 Announce Type: replace-cross Abstract: Large vision-language models (LVLMs) are typically trained using autoregressive language modeling obje

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1d ago

NRR-Phi: Text-to-State Mapping for Ambiguity Preservation in LLM Inference

arXiv:2601.19933v5 Announce Type: replace-cross Abstract: Large language models exhibit a systematic tendency toward early semantic commitment: given ambiguous

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 1d ago

AI and My Values: User Perceptions of LLMs' Ability to Extract, Embody, and Explain Human Values from Casual Conversations

arXiv:2601.22440v2 Announce Type: replace-cross Abstract: Does AI understand human values? While this remains an open philosophical question, we take a pragmati
