
AI Safety & Ethics

Alignment, interpretability, AI risks, and building safe AI systems

7,277 lessons
Skills in this topic
AI Alignment Basics (beginner): Explain the alignment problem
AI Ethics & Policy (beginner): Identify types of bias in ML systems
AI Safety Engineering (intermediate): Implement input and output guardrails

Showing 616 reads from curated sources

Forbes Innovation 🛡️ AI Safety & Ethics ⚡ AI Lesson 1mo ago
Deepfakes Could Break Overworked, Underfunded Public Defenders
In the deepfake era, proving digital evidence is real requires experts public defenders cannot afford. Their indigent clients are the ones who will pay.
OpenAI News 🛡️ AI Safety & Ethics ⚡ AI Lesson 1mo ago
Announcing the OpenAI Safety Fellowship
A pilot program to support independent safety and alignment research and develop the next generation of talent
Forbes Innovation 🛡️ AI Safety & Ethics ⚡ AI Lesson 1mo ago
Analyzing The Statistical Prevalence Of Lawyers Getting Snagged By AI Hallucinations In Their Court Filings
Attorneys are in trouble for including AI-hallucinated legal citations in their court filings. How prevalent is this? I provide a numeric analysis. An AI Inside
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
Feature Attribution Stability Suite: How Stable Are Post-Hoc Attributions?
arXiv:2604.02532v1 Announce Type: cross Abstract: Post-hoc feature attribution methods are widely deployed in safety-critical vision systems, yet their stabilit
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
DocShield: Towards AI Document Safety via Evidence-Grounded Agentic Reasoning
arXiv:2604.02694v1 Announce Type: cross Abstract: The rapid progress of generative AI has enabled increasingly realistic text-centric image forgeries, posing ma
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
Corporations Constitute Intelligence
arXiv:2604.02912v1 Announce Type: cross Abstract: In January 2026, Anthropic published a 79-page "constitution" for its AI model Claude, the most comprehensive
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
Analyzing Healthcare Interoperability Vulnerabilities: Formal Modeling and Graph-Theoretic Approach
arXiv:2604.03043v1 Announce Type: cross Abstract: In a healthcare environment, the healthcare interoperability platforms based on HL7 FHIR allow concurrent, asy
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
When AI Gets it Wrong: Reliability and Risk in AI-Assisted Medication Decision Systems
arXiv:2604.01449v2 Announce Type: replace Abstract: Artificial intelligence (AI) systems are increasingly integrated into healthcare and pharmacy workflows, sup
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
Assessing High-Risk AI Systems under the EU AI Act: From Legal Requirements to Technical Verification
arXiv:2512.13907v3 Announce Type: replace-cross Abstract: The implementation of the AI Act requires practical mechanisms to verify compliance with legal obligat
The Verge 🛡️ AI Safety & Ethics ⚡ AI Lesson 1mo ago
New York lawmakers want 3D-printer companies to block the creation of ‘ghost guns’
Governor Kathy Hochul and other New York state lawmakers want 3D-printer companies to block the printing of components used to create "ghost guns" - firearms wi
Forbes Innovation 🛡️ AI Safety & Ethics ⚡ AI Lesson 1mo ago
Why U.S. Gatling Guns Are Not Stopping Iran’s Shahed Drones
Gatling-type weapons like the U.S. C-RAM can be presented as an invincible shield against drones. But some Shaheds are getting through due to the weapon's limit
Forbes Innovation 🛡️ AI Safety & Ethics ⚡ AI Lesson 1mo ago
AI Favors Self-Preservation And Now Seeks ‘Peer Preservation’ Of Fellow AI In Sneaky Deceitful Ways
AI already favors self-preservation. New research shows that AI favors peer preservation too. This is troubling. AI safety issues arise. An AI Insider scoop.
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
Empirical Validation of the Classification-Verification Dichotomy for AI Safety Gates
arXiv:2604.00072v1 Announce Type: cross Abstract: Can classifier-based safety gates maintain reliable oversight as AI systems improve over hundreds of iteration
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
The Persistent Vulnerability of Aligned AI Systems
arXiv:2604.00324v1 Announce Type: cross Abstract: Autonomous AI agents are being deployed with filesystem access, email control, and multi-step planning. This t
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
VibeGuard: A Security Gate Framework for AI-Generated Code
arXiv:2604.01052v1 Announce Type: cross Abstract: "Vibe coding," in which developers delegate code generation to AI assistants and accept the output with little
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
A Divide-and-Conquer Strategy for Hard-Label Extraction of Deep Neural Networks via Side-Channel Attacks
arXiv:2411.10174v2 Announce Type: replace-cross Abstract: During the past decade, Deep Neural Networks (DNNs) proved their value on a large variety of subjects.
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
The data heat island effect: quantifying the impact of AI data centers in a warming world
arXiv:2603.20897v2 Announce Type: replace-cross Abstract: The strong and continuous increase of AI-based services leads to the steady proliferation of AI data c
Hackernoon 🛡️ AI Safety & Ethics ⚡ AI Lesson 1mo ago
The Ethics Theater of AI: Why Switching From ChatGPT to Claude Changes Less Than You Think
When a tech company draws a moral line, follow the money first — and ask questions later. Because the uncomfortable truth is that every major AI company today s
Stratechery 🛡️ AI Safety & Ethics ⚡ AI Lesson 1mo ago
Axios Supply Chain Attack, Claude Code Code Leaked, AI and Security
AI is going to be bad for security in the short-term, but much better than humans in the long-term.
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
Smartphone-Based Identification of Unknown Liquids via Active Vibration Sensing
arXiv:2603.28787v1 Announce Type: cross Abstract: Traditional liquid identification instruments are often unavailable to the general public. This paper shows th
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper 1mo ago
Design and Development of an ML/DL Attack Resistance of RC-Based PUF for IoT Security
arXiv:2603.28798v1 Announce Type: cross Abstract: Physically Unclonable Functions (PUFs) provide promising hardware security for IoT authentication, leveraging
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
SNEAKDOOR: Stealthy Backdoor Attacks against Distribution Matching-based Dataset Condensation
arXiv:2603.28824v1 Announce Type: cross Abstract: Dataset condensation aims to synthesize compact yet informative datasets that retain the training efficacy of
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
CivicShield: A Cross-Domain Defense-in-Depth Framework for Securing Government-Facing AI Chatbots Against Multi-Turn Adversarial Attacks
arXiv:2603.29062v1 Announce Type: cross Abstract: LLM-based chatbots in government services face critical security gaps. Multi-turn adversarial attacks achieve
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
TSHA: A Benchmark for Visual Language Models in Trustworthy Safety Hazard Assessment Scenarios
arXiv:2603.29759v1 Announce Type: cross Abstract: Recent advances in vision-language models (VLMs) have accelerated their application to indoor safety hazards a
InfoQ AI/ML 🛡️ AI Safety & Ethics ⚡ AI Lesson 1mo ago
PyPI Supply Chain Attack Compromises LiteLLM, Enabling the Exfiltration of Sensitive Information
Discovered by FutureSearch researcher Callum McMahon, a supply chain attack against LiteLLM on PyPI resulted in over 40 thousand downloads of a compromised vers
Forbes Innovation 🛡️ AI Safety & Ethics ⚡ AI Lesson 1mo ago
AI Sandboxes Are Crucial Regulatory Safety Nets For Advancing AI And Saving Humanity From Calamity
Regulatory AI sandboxes are gaining popularity. Here's what they are, plus their tradeoffs. An AI Insider scoop.
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
Evaluating Human-AI Safety: A Framework for Measuring Harmful Capability Uplift
arXiv:2603.26676v1 Announce Type: cross Abstract: Current frontier AI safety evaluations emphasize static benchmarks, third-party annotations, and red-teaming.
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
On the Carbon Footprint of Economic Research in the Age of Generative AI
arXiv:2603.26712v1 Announce Type: cross Abstract: Generative artificial intelligence (AI) is increasingly used to write and refactor research code, expanding co
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
Capability Safety as Datalog: A Foundational Equivalence
arXiv:2603.26725v1 Announce Type: cross Abstract: We prove that capability safety admits an exact representation as propositional Datalog evaluation (Datalogpro
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
Gender-Based Heterogeneity in Youth Privacy-Protective Behavior for Smart Voice Assistants: Evidence from Multigroup PLS-SEM
arXiv:2603.27117v1 Announce Type: cross Abstract: This paper investigates how gender shapes privacy decision-making in youth smart voice assistant (SVA) ecosyst
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
AI-Powered Facial Mask Removal Is Not Suitable For Biometric Identification
arXiv:2603.27747v1 Announce Type: cross Abstract: Recently, crowd-sourced online criminal investigations have used generative-AI to enhance low-quality visual e
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
Detection of Adversarial Attacks in Robotic Perception
arXiv:2603.28594v1 Announce Type: cross Abstract: Deep Neural Networks (DNNs) achieve strong performance in semantic segmentation for robotic perception but rem
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
Information-Theoretic Limits of Safety Verification for Self-Improving Systems
arXiv:2603.28650v1 Announce Type: cross Abstract: Can a safety gate permit unbounded beneficial self-modification while maintaining bounded cumulative risk? We
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
Why Aggregate Accuracy is Inadequate for Evaluating Fairness in Law Enforcement Facial Recognition Systems
arXiv:2603.28675v1 Announce Type: cross Abstract: Facial recognition systems are increasingly deployed in law enforcement and security contexts, where algorithm
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions
arXiv:2504.11967v4 Announce Type: replace-cross Abstract: Unmanned Aerial Vehicles (UAVs) are indispensable for infrastructure inspection, surveillance, and rel
ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 1mo ago
FlowPure: Continuous Normalizing Flows for Adversarial Purification
arXiv:2505.13280v2 Announce Type: replace-cross Abstract: Despite significant advances in the area, adversarial robustness remains a critical challenge in syste
Hacker News (AI) 🛡️ AI Safety & Ethics ⚡ AI Lesson 1mo ago
FTC action against Match and OkCupid for deceiving users, sharing personal data
Wired AI 🛡️ AI Safety & Ethics ⚡ AI Lesson 1mo ago
New Bernie Sanders AI Safety Bill Would Halt Data Center Construction
The US senator said on Tuesday that a moratorium would give lawmakers time to "ensure that AI is safe." Alexandria Ocasio-Cortez will introduce a similar bill i
Hugging Face Blog 🛡️ AI Safety & Ethics ⚡ AI Lesson 7mo ago
Democratizing AI Safety with RiskRubric.ai
OpenAI News 🛡️ AI Safety & Ethics ⚡ AI Lesson 2y ago
Disrupting malicious uses of AI by state-affiliated threat actors
We terminated accounts associated with state-affiliated threat actors. Our findings show our models offer only limited, incremental capabilities for malicious c