Observability Engineering: Metrics, Logs, and Traces
This program explores how observability enables engineers to understand, monitor, and troubleshoot modern distributed systems by using metrics, logs, and traces. You’ll begin by learning the foundational principles of observability, understanding how it differs from traditional monitoring, and exploring the three pillars of observability. Through hands-on demonstrations with Prometheus and Node Exporter, you will learn how system telemetry is collected and how metrics provide visibility into infrastructure and application behavior.
You’ll then design reliability-focused metrics strategies using concepts such as Golden Signals, Service-Level Indicators (SLIs), Service-Level Objectives (SLOs), and error budgets. Practical demonstrations show how to collect application metrics, write PromQL queries, and analyze latency and error patterns. You will also explore metrics visualization and alerting by building Grafana dashboards, configuring thresholds, and creating alert rules with Prometheus and Alertmanager to detect operational incidents quickly.
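As a rough illustration of how an SLI, an SLO, and an error budget relate, here is a minimal Python sketch for a request-based availability objective. It is not taken from the course material: the function name `error_budget`, its parameters, and the sample numbers are all hypothetical, and it assumes the SLI is simply the fraction of successful requests over a fixed window.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error-budget usage for a request-based availability SLO.

    Assumption: the SLI is the fraction of successful requests in the window.
    All names here are illustrative, not part of any library.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0 or failed_requests < 0:
        raise ValueError("total_requests must be positive and failed_requests non-negative")

    # SLI: measured success ratio over the window.
    sli = 1.0 - failed_requests / total_requests
    # Error budget: number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    # Fraction of the budget already spent (values above 1.0 mean it is exhausted).
    budget_consumed = failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_consumed": budget_consumed,
        "slo_met": sli >= slo_target,
    }


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 250 failures, 99.9% target.
    print(error_budget(0.999, 1_000_000, 250))
```

In practice the request counts would come from a metrics backend such as Prometheus rather than being passed in by hand; the sketch only shows the arithmetic.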
Next, you’ll examine centralized logging and distributed tracing, learning how logs and traces provide deeper insight into system behavior. Using Loki, Fluent Bit, OpenTelemetry, and Jaeger, you will explore how logs are aggregated, how requests are traced across microservices, and how engineers analyze service dependencies and request latency. You will also learn how modern observability platforms use AI-powered anomaly detection in Grafana to identify unusual system behavior and support proactive monitoring.
By the end of this program, you will be able to:
- Explain the principles of observability and differentiate it from monitoring.
- Collect and analyze system metrics using Prometheus and PromQL.
- Design dashboards and visualizations using Grafana.
- Configure alerts and incident notifications using Prometheus and Alertmanager.
- Implement centralized logging pipelines using Loki and Fluent Bit.
- Instrument distributed sy