BigQuery Migration Service: Validation and optimization
Key Takeaways
The video covers what to do after transferring data from a source data warehouse or data lake to Google Cloud with the BigQuery Migration Service: validate that the data arrived intact, confirm governance and access controls, and optimize workloads and costs using BigQuery best practices and recommendations.
Full Transcript
So, you've transferred data from a source data warehouse or data lake into Google Cloud and want to know if everything made it or what happens next. This video is for you.
One of the easiest ways to make sure your data has been transferred successfully is with automated validations. The BigQuery Migration Service's validation checks for structural mismatches to discover missing data, content mismatches to discover mutated or incorrect data, and type fidelity to verify schema accuracy. The results of a validation are stored in storage buckets and will contain the sample data, generated queries, and validation summaries so you can easily pinpoint and fix missing records. Check the description for more information on the validation services.
Once you know all the data has been transferred successfully, you can move on to checking that all governance and access controls have also been migrated correctly. Google Cloud has great tools to help you with data governance, granular access control, data quality, and metadata. If you're thinking of creating AI applications, this is a good time to start planning the foundations for clean and accurate data as you migrate your existing data governance from, for example, Snowflake's Polaris or Databricks Unity Catalog. You will see how the available encryption techniques and data loss prevention features become very useful in this process too.
With a secure foundation in place, the next step is to validate adjacent workloads like ETL pipelines, business applications, and reporting layers. This is a good opportunity to start engaging business users and getting them excited with all the unlocked potential for innovation. As the workloads stabilize, you will find aspects to fine-tune and further optimize in your workloads. The best optimizations will be incremental and based on the capabilities and architecture of the new infrastructure. Trust me, this is not the time to rebuild your workloads from scratch.
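To make the three validation checks mentioned in this part of the talk more concrete, here is a minimal Python sketch that runs them on small in-memory samples. It is an illustration only and not how the BigQuery Migration Service is implemented: the function names, the report keys, and the sample rows and schemas are all assumptions made up for this example.

```python
import hashlib
from typing import Mapping, Sequence

Row = Mapping[str, object]


def row_fingerprint(row: Row) -> str:
    """Hash a row's values in column-name order so identical rows match."""
    canonical = "|".join(f"{col}={row[col]!r}" for col in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate_transfer(
    source: Sequence[Row],
    target: Sequence[Row],
    source_schema: Mapping[str, str],
    target_schema: Mapping[str, str],
    key: str,
) -> dict:
    """Compare source and target samples (hypothetical helper)."""
    source_by_key = {row[key]: row for row in source}
    target_by_key = {row[key]: row for row in target}

    # Structural mismatch: keys present in the source but absent in the target.
    missing = sorted(set(source_by_key) - set(target_by_key))

    # Content mismatch: same key on both sides, but the row values differ.
    changed = sorted(
        k
        for k in source_by_key.keys() & target_by_key.keys()
        if row_fingerprint(source_by_key[k]) != row_fingerprint(target_by_key[k])
    )

    # Type fidelity: columns whose declared type changed or disappeared.
    type_mismatches = sorted(
        col for col, col_type in source_schema.items()
        if target_schema.get(col) != col_type
    )

    return {
        "structural": missing,
        "content": changed,
        "type_fidelity": type_mismatches,
    }


# Made-up sample data: row 3 is missing, row 2 was altered, "amount" changed type.
source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
target = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]
source_schema = {"id": "INT64", "amount": "NUMERIC"}
target_schema = {"id": "INT64", "amount": "FLOAT64"}

report = validate_transfer(source, target, source_schema, target_schema, key="id")
print(report)
```

Keeping the three checks as separate lists in the report mirrors the way the talk describes them, so a missing record can be told apart from a record that arrived but changed along the way.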
Start with those changes that may be easier to implement. For example, clustering and partitioning are an easy win and also big factors in reducing the amount of scanned records, thus improving performance and lowering costs. Similarly, reducing joins in BigQuery is usually a good idea. So, denormalized tables using nested and repeated fields are a good pattern to apply when possible. You will find many tips like this one in the links in the description.
While you're integrating into your new data warehouse or data lake, avoid a common pitfall and consider if your current patterns are still adequate for your needs and the new infrastructure. For example, is ETL still better than ELT? Another common pitfall is overestimating the need for real-time data. Will your users still need streaming data or is batch enough? If your integrations were set up some time ago, there may now be new premises and tools as well as more potential to unlock. You probably have some insight into this from the assessment service reports.
Speaking of modernization, if this isn't the current practice, consider the path to CI/CD pipelines and AI-assisted coding for your developers. Version control and automated testing for your SQL data pipelines and workflows can be implemented slowly and can go a long way.
Last but not least, remember we spoke about cost of ownership and execution in our first video. Make sure you stay under budget and apply some guardrails and limits to how your data is processed and queried. Your Google Cloud expert team can help you choose the right BigQuery edition and allocate reservations across projects for your initial estimates. Things that can influence your costs are ingestion mechanisms and using different types of storage wisely. I also recommend having logging, monitoring, and alerts to see how usage patterns evolve and keeping an eye out for further optimizations. BigQuery has a recommendation feature for that.
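The partitioning, clustering, and nested-field advice in this part of the talk can be sketched as a small Python helper that builds a BigQuery `CREATE TABLE` statement. This is only an illustration: the helper name, the `my_dataset.orders` table, and its columns are hypothetical, and the sketch assumes the partition column is a `TIMESTAMP` partitioned by day.

```python
from typing import Sequence, Tuple


def build_partitioned_table_ddl(
    table: str,
    columns: Sequence[Tuple[str, str]],
    partition_column: str,
    cluster_columns: Sequence[str],
) -> str:
    """Return a CREATE TABLE statement with daily partitioning and clustering."""
    if not 1 <= len(cluster_columns) <= 4:
        # BigQuery accepts at most four clustering columns per table.
        raise ValueError("expected between one and four clustering columns")
    column_lines = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE `{table}` (\n{column_lines}\n)\n"
        f"PARTITION BY DATE({partition_column})\n"
        f"CLUSTER BY {', '.join(cluster_columns)}"
    )


# Hypothetical orders table: line items are stored as a nested, repeated field
# (an ARRAY of STRUCTs) instead of a separate table that would need a join.
ddl = build_partitioned_table_ddl(
    table="my_dataset.orders",
    columns=[
        ("order_id", "STRING"),
        ("customer_id", "STRING"),
        ("created_at", "TIMESTAMP"),
        ("items", "ARRAY<STRUCT<sku STRING, quantity INT64>>"),
    ],
    partition_column="created_at",
    cluster_columns=["customer_id"],
)
print(ddl)
```

Queries that filter on `created_at` and `customer_id` can then skip partitions and blocks they do not need, which is where the reduction in scanned data comes from.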
I hope you feel ready for your migration, and remember there are many experts at Google Cloud ready to help you out. There's also more information in the description of this video. You got this.
Original Description
Learn more about BigQuery Migration Service → https://goo.gle/BigQuery-Migration-Service
BigQuery (Best practices) → https://goo.gle/bigquery-optimizations
BigQuery (Recommendations) → https://goo.gle/bigquery-recommendations
You’ve moved your data to Google Cloud. Now it is time to make sure it’s accurate, secure, and cost effective. This video concludes our migration series by focusing on the critical steps following data transfer from Databricks, Teradata, Snowflake, Cloudera and many other platforms.
We also dive into post-migration optimization strategies, helping you avoid common pitfalls like over-engineering workloads and pipelines. Discover how to improve performance and lower costs using BigQuery-native patterns like clustering, partitioning, and denormalization, and how to empower your data engineers to create a powerful foundation for scalable AI.
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech
#GoogleCloud #BigQuery #DataMigration
Speakers: Lucia Subatin
Products Mentioned: BigQuery, BigQuery Migration Service, Cloud Storage