Disaster Recovery in GKE
Key Takeaways
Implementing Disaster Recovery in GKE to protect critical workloads
Full Transcript
Hello everyone. In this video, we'll discuss disaster recovery for GKE, available on Google Cloud. The content creators for this video are Tanvi Desai and Kriti Gupta, who are technical account managers for Google Cloud. They have curated the information for you to have a deeper understanding of how to protect your GKE workloads in Google Cloud.

Let's talk about why it is important to discuss disaster recovery and what it looks like for GKE. We will review how to enable Backup for GKE and see how some of our customers are finding this feature useful. Finally, we will share some resources for you to follow up with. So let's dive right in.

Why do you need to plan for disaster recovery? Disasters come in forms that can affect your critical workloads and business. It could be a natural disaster like flooding, fires, or earthquakes. It could be a technological disaster such as power outages, cyber attacks, and data breaches. Or the disaster recovery scenarios can include human-caused disasters such as explosions, terrorist attacks, or huge production errors. Disasters can occur anytime and in any form, and they are often unavoidable. The key is to always be prepared with a recovery plan. No matter what type of disaster strikes, a disaster recovery plan can help you protect your data, restore your IT infrastructure, and recover your business operations.

Now more than ever, significant infrastructure downtime can have a devastating impact on an enterprise business. As shown here, analysts have estimated the average cost of data center outages at around nine thousand dollars a minute. That's about 13 million dollars per day, a potentially huge impact. And the impact isn't just monetary. In many cases, the impact to a business's reputation can be even more catastrophic and take longer to recover from, if ever. As a result, enterprises are increasingly mandating the implementation of robust backup and disaster recovery solutions to provide them with insurance against the unexpected. This can help you to avoid
costly downtime, lost productivity, and even business failure. By implementing backup and disaster recovery in Google Cloud, you can help to protect your critical cloud workflows from a variety of threats. You can ensure that your business continues to operate even in the event of a disaster. This can give you peace of mind and help you to focus on running your business. Every workload is unique, and Google Cloud offers a variety of protection strategies to choose from depending on your business and industry. It enables on-demand resource allocation for unprepared events to minimize the cost of support and help you focus on backing up your critical workloads.

So how do you design strategies for disaster recovery? Think about what the most important things your organization does are. These are the functions that you need to make sure can be recovered quickly in the event of a disaster. Recovery time objective (RTO) is the goal your organization sets for the maximum length of time it should take to restore normal operations following an outage or data loss. Recovery point objective (RPO) is your goal for the maximum amount of data the organization can tolerate losing. You will need to determine how much data you need to protect and how quickly you need to be able to recover when the disaster happens. The lower the RTO and RPO requirements are, the higher the cost of disaster recovery gets.

What are the types of disasters that are most likely to impact your organization? Once you know what risks you face, you can start to develop strategies to mitigate them. Your recovery plan should include steps for recovering each of your critical business functions. It should also include a timeline for recovery and a budget. It's important to review your recovery plan regularly. Run tabletop exercises and simulate disaster scenarios to make sure your disaster recovery plan is effective and up to date. Consider the additional costs and capacity needed for running these exercises. Make sure the key stakeholders in
your organization know their role and understand your disaster recovery plan. While planning for disaster recovery, be flexible. No two disasters are the same, so your recovery plan should be flexible enough to adapt to different situations. Get help from experts. If you don't have the expertise in-house to develop a disaster recovery plan, consider working with Google Cloud experts.

Let's talk about Google Cloud's high-level approach to disaster recovery on GKE by discussing its key components.

GKE node auto-repair repairs unhealthy nodes in a GKE cluster. When enabled, GKE will periodically check the health of each node in the cluster. If a node fails consecutive health checks over an extended time period, GKE will initiate a repair process for that node. The repair process for an unhealthy node may involve one or more of the following steps. GKE will start draining the node by stopping scheduling new pods on the node. It will then replace the node and migrate pods from the unhealthy node to the new node. Lastly, GKE will scale the cluster by creating additional nodes in the cluster to compensate for the loss of the unhealthy node. This keeps your cluster healthy, reduces manual intervention, improves availability, and reduces costs.

GKE allows you to specify a liveness check that will run periodically to ensure your pod is running successfully. This mechanism ensures that a container is running and healthy. When you configure a liveness probe in your pod spec, the kubelet checks the health of the pod every 5 seconds. If the pod fails the liveness probe three times in a row, it will be restarted.

GKE safeguards your storage availability by mapping to persistent disks through the persistent volume abstraction. Persistent volumes are a Kubernetes resource that provides a way to store data that persists even if the pods that use the data are deleted. They are backed by physical storage such as Google Compute Engine persistent disks. This can be useful for storing data that needs to be available even if
a pod fails or is restarted.

The multi-cluster Gateway supports internal and external load balancing, weight-based traffic splitting, traffic capacity-based load balancing, and traffic mirroring between your clusters. Multi-cluster Ingress lets you configure shared load balancing of resources across multiple GKE clusters in different regions. It improves application availability, reduces operational complexity, increases scalability, provides a single entry point for users to access applications, and manages traffic across multiple clusters.

Spreading the Kubernetes control plane and its nodes across different zones or regions for the workloads is very important to achieve high availability. One option is choosing to deploy your Kubernetes workload in a regional or a zonal cluster. Second, within zonal, GKE offers two types of node pools: single-zone and multi-zonal. Single-zone clusters have one control plane machine and worker nodes in the same zone. Multi-zonal clusters are similar to zonal clusters, but they span nodes across multiple zones. A regional or multi-zonal cluster will provide a highly available cluster. Regional clusters are better suited for high availability, as they have multiple control planes across multiple compute zones in a region, while zonal clusters have one control plane in a single compute zone. However, if cost is a factor or your workload is not as critical, a multi-zonal cluster could be a better choice.

In regional clusters, the control plane remains available during cluster maintenance like rotating IPs, upgrading control plane VMs, or resizing clusters or node pools. When upgrading a regional cluster, two out of three control plane VMs are always running during the rolling upgrade, so the Kubernetes API is still available. Similarly, a single-zone outage won't cause any downtime in the regional control plane.

Backup for GKE is a fully managed service that helps you protect, manage, and restore your Kubernetes workloads and data in a simple, scalable, and secure
way.

Let's hear what one of our customers is saying about Backup for GKE. Jose Chavez, SaaS platform and delivery engineer at Broadcom, says: "Backup for GKE makes it easier for us to protect our stateful workloads in GKE, and it makes restoring those stateful workloads much simpler and faster. We see integrated backup as another sign of GKE's maturity for stateful workloads, and we look forward to using it to serve our worldwide internal customers at Broadcom."

Backup for GKE is a simple way to protect, manage, and restore your containerized applications and data. With Backup for GKE, you can meet your service level objectives and automate common backup and recovery tasks. You can protect Kubernetes resources, such as namespaced resources, cluster-wide resources, and persistent volumes, and application data, including databases, file systems, and logs. You can seamlessly restore your backups to a new or existing Kubernetes cluster.

Backup for GKE is configurable, flexible, and easier to adopt for enterprises. Backup options allow you to select preferred backup destinations or select and skip certain resources. You can configure backup not to include secrets, so the data is not visible via persistent disk control plane. By enabling time-lock backups, you can disable manual or automated deletion of backups to protect from malicious attacks. All data is encrypted by default, with the option of using customer-managed encryption keys (CMEK). The restore options let you restore a cluster into a new cluster or a region, with the flexibility of parameterizing the restore options to different storage classes. The subscope feature lets you restore a specific namespace or application if it's accidentally deleted or an upgrade fails. By delegating an admin, cluster admins can give access to app admins to do ad hoc backups before critical application upgrades.

This diagram shows the relationship between the different components for backup in GKE. The resource-based, REST API-based service serves as the control plane for Backup
for GKE and includes Google Cloud console UI elements that interact with the API. The agent runs in every cluster where backups or restores are performed, and performs backup and restore operations by interacting with the Backup for GKE API.

Let's demonstrate how you can enable Backup for GKE through the GCP console. In the Google Cloud console, navigate to Google Kubernetes Engine and click Backup. If this is the first time you are creating a backup, enable the Backup for GKE API. Click Enable Backup next to the cluster that requires backup and restore. Add the required details to create a plan for the backup. Now you're all set for more resilient GKE workloads on Google Cloud.

We hope you found this video helpful. Check out the links in the description to learn more about Backup for GKE and to try some hands-on labs. Thanks for watching.
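The downtime figures and the RTO/RPO definitions given in the transcript can be made concrete with a little arithmetic. The following is a minimal Python sketch, not anything shown in the video: the function and class names are hypothetical, the 9,000 dollars per minute figure is the analyst estimate quoted by the speakers, and measuring RPO as "minutes since the last backup" is an assumption made for illustration (the transcript defines RPO as the amount of data you can tolerate losing).

```python
from dataclasses import dataclass

# Analyst estimate quoted in the video (USD per minute of outage).
COST_PER_MINUTE_USD = 9_000


def downtime_cost(minutes, cost_per_minute=COST_PER_MINUTE_USD):
    """Estimated cost of an outage lasting the given number of minutes."""
    return minutes * cost_per_minute


@dataclass
class RecoveryObjectives:
    """Hypothetical container for the two targets described in the video."""
    rto_minutes: float  # maximum acceptable time to restore operations
    rpo_minutes: float  # maximum acceptable data loss, expressed as time (assumption)


def meets_objectives(objectives, outage_minutes, minutes_since_last_backup):
    """True if a simulated incident stays within both RTO and RPO."""
    within_rto = outage_minutes <= objectives.rto_minutes
    within_rpo = minutes_since_last_backup <= objectives.rpo_minutes
    return within_rto and within_rpo


if __name__ == "__main__":
    full_day = downtime_cost(24 * 60)
    print(f"Cost of a 24-hour outage: ${full_day:,.0f}")

    targets = RecoveryObjectives(rto_minutes=60, rpo_minutes=15)
    print(meets_objectives(targets, outage_minutes=45, minutes_since_last_backup=10))
    print(meets_objectives(targets, outage_minutes=45, minutes_since_last_backup=30))
```

A full day at the quoted rate comes to 12,960,000 dollars, which is the "13 million dollars per day" figure in the video after rounding. A check like `meets_objectives` is the kind of thing a tabletop exercise could record for each simulated scenario.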
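The liveness probe behaviour described in the transcript (a check every 5 seconds, and a restart after three failures in a row) can be sketched as a small simulation. This is an illustrative Python model only, not kubelet code: the function name is hypothetical, and the 5-second period and threshold of three are the values quoted in the video. In a real pod spec these correspond to the probe's `periodSeconds` and `failureThreshold` fields, which are configurable.

```python
from typing import Iterable, List

# Values quoted in the video; configurable in a real pod spec.
PERIOD_SECONDS = 5
FAILURE_THRESHOLD = 3


def restart_times(
    probe_results: Iterable[bool],
    period: int = PERIOD_SECONDS,
    threshold: int = FAILURE_THRESHOLD,
) -> List[int]:
    """Return the times (in seconds) at which a restart would be triggered.

    probe_results holds one boolean per probe, True for a healthy response.
    The first probe is assumed to run at t = period (a simplification).
    """
    restarts: List[int] = []
    consecutive_failures = 0
    for probe_number, healthy in enumerate(probe_results, start=1):
        if healthy:
            # Any success resets the failure streak.
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= threshold:
            restarts.append(probe_number * period)
            # The restarted container begins with a clean streak.
            consecutive_failures = 0
    return restarts


if __name__ == "__main__":
    # Two failures, a recovery, then three failures in a row.
    results = [True, False, False, True, False, False, False, True]
    print(restart_times(results))  # [35]
```

In this run the two early failures do not trigger a restart because a healthy probe resets the streak. The restart happens only at the seventh probe, 35 seconds in, when the third consecutive failure is observed.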
Original Description
Want to know how to protect your critical workloads in Google Kubernetes Engine (GKE)? Watch this video to learn more about Disaster Recovery, why it is critical that you have a disaster recovery plan, and how you can enable backups in GKE. This video also provides a step-by-step guide for creating a backup plan for your GKE cluster.
Additional resources:
Get started with GKE → https://goo.gle/3Do63Yl
Blog Post: Google Cloud launches Backup for GKE → https://goo.gle/44YVnLD
Take a hands-on lab on Google Cloud Skills Boost → https://goo.gle/470cWN1
Video: Protect containerized applications with backup for GKE and Anthos → https://goo.gle/3q19Ckb
Documentation on Backups for GKE → https://goo.gle/3OyKta9
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech