Building distributed data processing workloads with AWS Step Functions

AWS Developers · Intermediate ·🔧 Backend Engineering ·2y ago

Key Takeaways

This video demonstrates building distributed data processing workloads with AWS Step Functions, covering topics such as horizontal scaling, extensibility, and cost benefits. It showcases the use of AWS Step Functions, AWS Lambda, S3, and other AWS services to process large datasets.

Full Transcript

Hello everyone, thanks for joining me today for this session, Building Distributed Data Processing Workloads with AWS Step Functions. My name is Uma Ramadoss. I am a Specialist Solutions Architect at AWS. My job is helping customers design and operate well-architected applications using AWS serverless services such as Step Functions, Lambda, and EventBridge. Prior to my job here at AWS, I was a developer and architect for many years.

Today we are talking about how to quickly build a distributed data processing application using Step Functions and AWS Lambda. In the session, I briefly explore why we need distributed data processing and how serverless can help. Then I talk about how Step Functions simplifies building large-scale data processing. Then I'll show you a quick demo of how I process nearly 500,000 files in under two minutes using Step Functions. At last, I'll go over some of the best practices to implement, along with resources to quickly get you started when building it yourself.

Imagine you work for a supply chain company. You get invoices from vendors as PDF files in S3. You need to convert each PDF file to text, extract sales data, run some calculations, load the data into the database, and perhaps trigger a reporting job at the end. Let's assume you get 500,000 invoices and you have to complete the processing in two hours.

Given that problem, as application programmers, you will jump in to lay out the design, build a flow diagram, and perhaps build the POC and even successfully run it for a few hundred files. You add some parallelism with threads and vertically scale the system to handle more load. You soon realize it takes hours or even a day to complete the 500,000 files. This is unacceptable for our SLA of two hours.

When you think about the solution, you know the invoice processing is the same across the 500,000 files. All you need to do is write the logic for one file and run it as a separate process. Instead of vertically scaling, you need to horizontally scale and distribute
the work across the processes. So the mental model of the solution would look something like this. These green icons are the compute nodes where your invoice processing solution runs. The coordinator component is responsible for distributing the 500,000 files across the green nodes. We can speed up the processing by adding more compute nodes and distributing the invoice files across them.

Let's quickly see the benefits of this approach. As you saw earlier, the processing will be faster with distributed processing. Instead of taking weeks to complete the processing, you can process in hours, maybe minutes, which in turn improves your customer experience and many other aspects of the business. If your business expands tomorrow and you have to process 1 million records, you don't have to re-architect; you can easily attach additional nodes to the system. It brings extensibility out of the box. You are avoiding single points of failure, as each node processes fewer files. The coordinator component can be built intelligently to handle partial failures: if one node fails, the coordinator component can stand up an additional node and continue the processing. Additionally, you also reap cost benefits. The cost benefit comes in two ways. Your work is done in smaller chunks on many nodes, compared to one huge dedicated machine. Instead of running the processing for days, you are done in hours, saving cost on infrastructure and maintenance and improving productivity across multiple dimensions.

These benefits sound really attractive, but they introduce a new type of challenge. Let's take a look at a few of them. Traditionally, the processing nodes run in a serverful environment, so you will have to set up, provision, and manage the clusters and scale them based on utilization. You have to build the coordinator component yourself. If you're a team of application developers, you may even have to learn new technology such as distributed processing frameworks. With serverful environments come more operational responsibilities. Sometimes you
need to share the responsibilities with teams such as infrastructure and networking engineers to build and manage the clusters, so it becomes challenging to balance cost, security, and speed.

What if there is a serverless way of doing the distributed processing? What if there is a magical component that can iterate on the invoices from S3, distribute them to those green processing nodes, and manage state and parallelism for you?

Before we answer the question: why serverless? Firstly, managing cluster provisioning, setup, and patching is undifferentiated work. You can offload that heavy lifting to AWS serverless. Secondly, serverless applications are built for high availability. You don't have to worry about uptime or monitoring of any infrastructure. By offering a broad array of security controls and shifting many of the security responsibilities to AWS, it helps developers deploy and publish workloads confidently and reduce time to market. Thirdly, serverless applications automatically scale, and you pay by unit of consumption, not by server units. That's a very important distinction, because when you build applications, you don't know how many will move to production, or sometimes the scale you need in production, which means you can experiment more and innovate faster.

Before we introduce that magic feature, allow me to introduce the service that has the feature. When your application is powered by multiple connected services, how do you build that connection, track issues or inspect for errors, and visualize what's happening across them? This can be really challenging as your application grows. I believe this is where Step Functions, as a workflow service, comes to the rescue. This little video you see on the screen shows you the experience of how you build workflows through Step Functions Workflow Studio. You can drag and drop serverless services such as AWS Lambda, Amazon DynamoDB, Amazon SQS, and Fargate, and introduce decision logic in the flow. You can run steps in parallel, or you can
iterate on arrays of items. You can even click on an action and configure additional information such as payloads, retries, and input and output handling. In my opinion, this drag-and-drop interface is one of the nicest interfaces in the console we have. Of course, you can build your workflow as code using a JSON-like language called Amazon States Language (ASL). You can include ASL in your infrastructure as code as well, to have a nice, scalable way of building this workflow. Here is a magic bit: Step Functions integrates with over 10,000 AWS APIs across 200-plus AWS services directly. You can drag and drop those APIs and build workflows quickly.

These little boxes that you see in the workflow are called states. Step Functions has several types of states: task state, activity state, parallel state, etc. Map state, or dynamic parallelism in Step Functions, allows you to dynamically iterate on arrays of items. There are two types of map: inline map and distributed map. You pass an array to it, and you can run it one at a time, two at a time, or maybe three at a time, up to a max concurrency of 40 for inline map. What we found is customers wanted more than the concurrency of 40. They also wanted native integration to iterate on objects from S3. At re:Invent 2022, we announced distributed map. You can now go from a max concurrency of 40 to 10,000.
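The difference between the two map types can be sketched as a small Python helper that emits a minimal Map state in Amazon States Language. This is an illustration under stated assumptions, not the speaker's definition: the helper name `map_state`, the placeholder `ProcessItem` Pass state, and the choice of `STANDARD` child executions are all hypothetical, and the concurrency ceilings are simply the figures quoted in the talk.

```python
import json

# Concurrency ceilings as quoted in the talk; confirm against current service quotas.
CONCURRENCY_LIMITS = {"INLINE": 40, "DISTRIBUTED": 10_000}


def map_state(mode: str, max_concurrency: int) -> dict:
    """Return a minimal ASL Map state (as a dict) for the given processing mode."""
    if mode not in CONCURRENCY_LIMITS:
        raise ValueError(f"unknown mode: {mode}")
    limit = CONCURRENCY_LIMITS[mode]
    if not 1 <= max_concurrency <= limit:
        raise ValueError(f"{mode} map supports a concurrency of 1 to {limit}")

    processor_config = {"Mode": mode}
    if mode == "DISTRIBUTED":
        # Distributed child workflows run as STANDARD or EXPRESS executions;
        # STANDARD is an arbitrary choice for this sketch.
        processor_config["ExecutionType"] = "STANDARD"

    return {
        "Type": "Map",
        "MaxConcurrency": max_concurrency,
        "ItemProcessor": {
            "ProcessorConfig": processor_config,
            "StartAt": "ProcessItem",
            # Placeholder child workflow: a single Pass state.
            "States": {"ProcessItem": {"Type": "Pass", "End": True}},
        },
        "End": True,
    }


if __name__ == "__main__":
    print(json.dumps(map_state("DISTRIBUTED", 1_000), indent=2))
```

Asking for `map_state("INLINE", 1_000)` raises a `ValueError`, which mirrors the reason customers asked for the distributed mode in the first place.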
Essentially, distributed map runs as a sub-workflow with a Lambda function, AWS Fargate, AWS SDKs, or any combination of those inside the workflow. It also plays nicely with S3. You can give as input not only arrays but also an S3 object in CSV or JSON format, or an S3 prefix if you're processing multiple files. Step Functions provides a native iterator service for S3 objects, distributes the work across the worker nodes in a highly scalable fashion, aggregates output just like the MapReduce pattern, handles failures, and so on. This allows you to focus on just the business value.

You only pay for the compute time you use, and the service automatically scales to handle the demands of your application. You can control the scale at which you want to distribute the work through a simple change. This means it gives you all the flexibility and extensibility to go from small to large-scale workloads without huge changes or re-architectures. Data processing jobs are done either at intervals or event-driven. With the pay-per-use pricing model, you can reduce your infrastructure cost by running serverless workflow orchestration and compute services. Additionally, organizations can save cost on training developers to learn distributed processing frameworks and related tools.

Step Functions distributed map with AWS Lambda can be a really powerful combination. With the unparalleled concurrency and scaling of both services, you can run your distributed data processing quicker than ever before. Developers can use their familiar programming languages to build the data processing workload using Lambda functions. With discrete tasks and a distributed architecture, coordinated through an orchestrator, individuals can work independently on tasks. Additionally, AWS Step Functions provides a complete view of what's happening with the workflow visually, along with metrics and logs sent to Amazon CloudWatch, so the time spent on debugging and monitoring problems is greatly reduced.

You can use distributed
map to solve problems across industries and use cases. Here are a few of them. Processing unstructured files such as video, audio, and text files is a classic Step Functions use case. For instance, with distributed map, a security partner of yours can scan millions of documents for security vulnerabilities when they onboard a new customer. You can reprocess millions of documents as part of backfilling use cases. Data modeling and simulations, like Monte Carlo simulation, require running the same logic multiple times with different inputs. These are great use cases for distributed map. Migrating data from one database to another, or transforming data from one format to another at large scale, is also common. So are workflows that require parallelism and a human in the loop, for example end-of-day financial transactions that might require human approval for certain transactions.

With that introduction, we can now imagine how to go about building our invoice processing use case. We can use the native integration with S3 to read the invoices from S3 and fan out using its large-scale concurrency with AWS Lambda, and after all the invoices are processed, we can kick off the reporting process.

When you build your distributed map workflow, you set the Map state to distributed. As it runs as a sub-workflow, you can choose to run the workflow as Standard or Express. Standard and Express are two different flavors of Step Functions workflow; you will see the differences in the coming slides. Then you specify where you are reading the items from. Since you are processing 500,000 files, you specify the bucket and prefix where they are located in S3. If you want to process millions of records from a single S3 object, you specify the S3 key and configure the resource as S3 GetObject. Then you go on to define the batch size and concurrency. Batching lets you send batches of items to the sub-workflow, or child workflow. It is a great cost optimization technique, as it can
considerably decrease the number of state transitions. Concurrency defines how many parallel sub-workflows you want to run. It generally depends on how much the downstream service can scale. Then you go on to define the sub-workflow itself. This can be a Lambda function, one of the 10,000-plus AWS APIs, optimized integrations, or any combination of those. Distributed map will invoke the sub-workflow in parallel at the concurrency you define until all the items are processed. You will then optionally configure where you want to send the output of your distributed map. This is called the result writing process. It aggregates all the successful sub-workflow responses into one file and the failed sub-workflow responses into another file. When you choose to write to the result writer, the output of the distributed map can go above the 256-kilobyte limit, well into the gigabytes.

All right, it is demo time now. In this demo, I run a simulation process for a fictitious company that processes home loans. The process is going to simulate how many loan applications will fail if inflation goes above two percent, using 500,000 loan application files. I have two buckets in my account for this project. The loan app simulation source bucket is where all the loan applications are stored, under the prefix data. I have nearly 500,000 files in this location. If I go back, I have another bucket, the loan app simulation destination bucket. This stores the results of the distributed map; this is where the result writer writes the results.

Here is my state machine that's going to process these files, loan app simulation data processing. Let me edit this and go to the Workflow Studio I talked about earlier. I'm going to close this one and zoom in a bit so you can view it. All right, let's zoom out. The first step in my workflow is a distributed map. It has a Lambda function inside. The logic of the Lambda function is to accept one or more loan applications and find out if each loan application will fail or
not by running the simulation. This Lambda function is inside the workflow, so in the distributed map, all the sub-workflows have this Lambda function inside. The Lambda function returns the loan applications with the failure or success information, and then the results are stored in the destination bucket. Remember, the output of all of the Lambda functions is aggregated and stored in the destination bucket. After the distributed map is done processing all the 500,000 files, I have this step called calculate failure ratio. It accepts the result writer output and finds out the percentage of loans that failed, and then it stores the information in the DynamoDB table.

As you can remember from the presentation, inline map will process at a concurrency of 40, while distributed map has a concurrency of 10,000. Distributed map can process data from data sources such as S3. In our demo, I've used the S3 bucket that I showed you earlier, loan app simulation source, to read the data from. I have hard-coded it here; you can choose to get it at runtime from the state input. Item batching enables sending a batch of files to the child workflow instead of one at a time, so I have set this to 400. I'm also sending a batch input, metadata in addition to the 400 items, to the Lambda function, and this is really handy when you want to pass parameters from the previous states. I have the concurrency set to 1,000, which means I'm running 1,000 parallel sub-workflows. Again, we're using Lambda functions, which can scale to a thousand quickly. If your downstream services have limited scaling, you would set this limit to match. So I'm running 1,000 parallel workflows with 400 items batched in each child workflow, which means 1,000 multiplied by 400, or 400,000 files, are processed with just one iteration. Prior to this feature, you would have to use the ListObjectsV2 API yourself. It returns 1,000 objects at a time, so you process a thousand files at a time, repeating it
for 500,000. With one iteration you can only process a thousand files. Distributed map lists all the object metadata of the 500,000 files at the beginning itself, giving you the scale and the speed you need for running the process with greater parallelism.

The last bit of information I have is how I am storing the results of all the child workflows: the destination bucket and where the results are stored. If I want to add additional steps in the workflow, I can drag and drop anytime. For example, if I want to publish success information through EventBridge, I could just search for EventBridge here and drag and drop EventBridge PutEvents in, and before I apply and exit, I would configure the event bus name and a few other details here and then save the workflow. OK, I'm not going to do that now. I'm going to start the execution, and so this starts the processing. Since this is a sub-workflow, I'm going to open this in a separate window, and I can see that I have this concurrency of a thousand (I can even edit it as it runs) and 400 items as the batch. I can set the tolerated failure; I'm going to set this to 10.
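The batching arithmetic from the demo can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the function name `plan_map_run` is hypothetical, and "waves" is a simplification, since in practice new child workflows start as earlier ones finish rather than in lockstep rounds.

```python
import math


def plan_map_run(total_items: int, batch_size: int, concurrency: int) -> dict:
    """Estimate how a distributed map run divides its items."""
    batches = math.ceil(total_items / batch_size)  # one child workflow per batch
    return {
        "child_workflows": batches,
        # Items in flight once `concurrency` child workflows are running.
        "items_in_first_wave": min(total_items, batch_size * concurrency),
        # Simplified lockstep model of how many rounds are needed.
        "waves": math.ceil(batches / concurrency),
    }


# Numbers from the demo: 500,000 files, batches of 400, concurrency of 1,000.
plan = plan_map_run(500_000, 400, 1_000)
print(plan)

# For comparison: paging through the listing yourself at 1,000 objects per call.
list_calls = math.ceil(500_000 / 1_000)
print(list_calls)
```

With the demo's settings this gives 1,250 child workflows, 400,000 items in the first wave, and 2 waves, against 500 sequential list calls for the hand-rolled approach.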
I'm going to talk about this in the coming slides. Confirm. As I said earlier, distributed map reads all the object metadata of the 500,000 files the very first time and then starts the sub-workflows, so right now it's reading. Once it has read it, it starts the sub-workflows, and you can see that the workflows start running. So now all 400,000 start running, and you can see the sub-workflows right here at the bottom. I'm going to open one of these workflows and view the input and output. As you can see, the batch input I had in the beginning is also sent along with all the items, and the Lambda function responded with the list of loan applications and whether the loan applications will fail or not. Now if I go back here, all of those 499,999 files are processed well within two minutes. If I go back to my parent workflow, it's also now completed. So, the calculate failure ratio step: you can see that the input is the result writer, that's where the output of the distributed map was stored, and based on this information, the calculate failure ratio Lambda function calculated how many loan applications failed, how many succeeded, and the failure percentage. The information is also stored in the DynamoDB table, and it's a direct call to DynamoDB, so I can now go to the DynamoDB table and explore the items. Yeah, I can see that the items are already available. So in this demo, I showed you how I process nearly 500,000 files using Step Functions and Lambda.

Max items limits the number of items to be processed in the workflow. For example, when you are processing a CSV data set of a hundred thousand, if you set max items to 500, distributed map will only process 500 items from the data set. It's really handy during testing. With max items, you can process fewer items and identify the batch size and duration of your workflow during testing. It saves you from running all the data through the workflow every time you test, giving you the confidence
to process millions of files or millions of objects, and at the same time reducing the cost of testing.

When you process large volumes of data, batching is a great technique to implement in order to reduce cost. By batching more items per sub-workflow, you process more items in fewer iterations, thereby reducing the number of state transitions. Standard workflows are priced by state transitions, so more state transitions mean more cost. If you're not sure how many items you can batch, you can use the max items feature we saw earlier.

More concurrency generally results in faster processing. While planning out concurrency, look at the services and APIs in your map and set the concurrency low enough that you do not have to worry about those services and APIs. For instance, in this example, the Rekognition DetectLabels call has a default quota of 50 TPS in our larger regions, so you would set the concurrency below that to start with. If you wanted to do higher-throughput processing, you would request the quota to be increased.

Another important fine-tuning you can make in the workflow is choosing the type of sub-workflow. Standard workflows can run up to a year, while Express workflows are limited to five minutes. While you choose Standard workflows for long-running, durable workloads, Express workflows are most suitable for high-volume, bursty workloads due to their higher burst rate. Another important distinction is how they are priced. While Standard workflows have a simple pricing model based on state transitions, they can be expensive compared to Express if there are many steps in the workflow. In such cases, if your sub-workflow can be run in five minutes, prefer Express workflows. Remember, this five minutes is per sub-workflow, not the entire processing of the distributed map.

Another important configuration with distributed map is failure toleration. We all agree that in data processing, data quality is questionable. You don't want to either stop the entire processing because just one data item was
incorrect, nor continue processing when most of your data is incorrect. So distributed map offers you a nice way to set what percentage or how many items can fail before failing the entire workflow. Of course, you can catch your failures and direct them to a failure handling step that can handle the failures programmatically.

We have come to the end of the presentation. Before I exit, I want to leave a customer testimonial and a few resources to get started with this awesome feature. CyberGRX helps customers and third parties with cyber risk management. They predict with high confidence how a third-party company will respond to a risk assessment questionnaire. To do this, they have to run the predictive model on every company in their platform. They faced the challenge of running their algorithm for 225,000 companies in a timely manner with as few hands-on resources as possible. They implemented the solution using distributed map and reduced the processing time from 8 days to 56 minutes.

The workflows collection is a great place to start discovering different distributed map use cases. It is available at serverlessland.com. It has well over 100 patterns. You can browse the workflows, infrastructure-as-code templates, and even ASL definitions. Outside of workflows, Serverless Land is a great place to visit for blogs, workshops, and patterns to build with serverless services. Another useful resource is the Step Functions workshop. If you're new to Step Functions, I would highly recommend you start with the basic module. It also has hands-on instructions on how to build large-scale parallelization with distributed map.

Well, this is Uma Ramadoss. Thank you so much for joining me. You all have a great rest of the day.
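Putting the configuration steps from the talk together, the demo's state machine might look roughly like the following Amazon States Language definition, built here as a Python dict. This is a sketch under assumptions, not the speaker's actual definition: the bucket names, function ARNs, state names, `results` prefix, `inflationRate` batch input, and the choice of Express child workflows are all hypothetical; the tolerated failure of 10 is assumed to be a percentage; and `MaxItems` is assumed to apply to S3 object listings as it does to CSV or JSON input, which is worth verifying in the Step Functions documentation.

```python
import json
from typing import Optional

# Hypothetical names: replace with your own buckets and function ARNs.
SOURCE_BUCKET = "loan-app-simulation-source"
DEST_BUCKET = "loan-app-simulation-destination"
SIMULATE_FN = "arn:aws:lambda:us-east-1:123456789012:function:simulate-loans"
RATIO_FN = "arn:aws:lambda:us-east-1:123456789012:function:calculate-failure-ratio"


def lambda_task(function_arn: str, **extra) -> dict:
    """Task state that invokes a Lambda function with the state input as payload."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_arn, "Payload.$": "$"},
        "OutputPath": "$.Payload",
        **extra,
    }


def build_definition(max_items: Optional[int] = None) -> dict:
    """Build the parent workflow: distributed map, then a summary step."""
    item_reader = {
        # Native S3 iteration: list every object under the prefix.
        "Resource": "arn:aws:states:::s3:listObjectsV2",
        "Parameters": {"Bucket": SOURCE_BUCKET, "Prefix": "data/"},
    }
    if max_items is not None:
        # Handy for test runs; assumed to be honored for object listings.
        item_reader["ReaderConfig"] = {"MaxItems": max_items}

    distributed_map = {
        "Type": "Map",
        "ItemReader": item_reader,
        # 400 items per child workflow, plus shared metadata for every batch.
        "ItemBatcher": {"MaxItemsPerBatch": 400, "BatchInput": {"inflationRate": 2}},
        "MaxConcurrency": 1000,
        "ToleratedFailurePercentage": 10,
        "ItemProcessor": {
            "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "EXPRESS"},
            "StartAt": "SimulateLoans",
            "States": {"SimulateLoans": lambda_task(SIMULATE_FN, End=True)},
        },
        # Aggregate child workflow results into objects in the destination bucket.
        "ResultWriter": {
            "Resource": "arn:aws:states:::s3:putObject",
            "Parameters": {"Bucket": DEST_BUCKET, "Prefix": "results"},
        },
        "Next": "CalculateFailureRatio",
    }
    return {
        "StartAt": "ProcessLoans",
        "States": {
            "ProcessLoans": distributed_map,
            "CalculateFailureRatio": lambda_task(RATIO_FN, End=True),
        },
    }


if __name__ == "__main__":
    # A small MaxItems keeps test runs cheap, as suggested in the talk.
    print(json.dumps(build_definition(max_items=500), indent=2))
```

Dropping `max_items` yields the full-scale definition, so the same builder can serve both the test run and the production run.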

Original Description

Data processing is fundamental for organizations to meet business goals and unlock new business value such as operational efficiency and process optimization. When building data processing solutions, app developers face not only the challenges inherent to data, such as data sanity, integration, security, and governance, but also a skill gap with technology and tooling. Serverless services help developers build and deploy solutions using their familiar programming language without worrying about servers. The session introduces you to the challenges of building distributed data processing workloads, explores how the distributed map feature of Step Functions and AWS Lambda can solve those challenges with faster and more efficient data processing, and shares use cases, best practices, and resources to help you accelerate your data processing journey. #AWS


This video teaches viewers how to build distributed data processing workloads with AWS Step Functions, covering topics such as horizontal scaling, extensibility, and cost benefits. Viewers will learn how to design and implement scalable systems, optimize system performance, and ensure secure data processing workflows.

Key Takeaways
  1. Build workflows through Step Functions Workflow Studio
  2. Drag and drop serverless services and APIs
  3. Configure additional information such as payloads, retries, and input and output handling
  4. Run steps in parallel or iterate on arrays of items
  5. Use decision logic in the flow
  6. Configure distributed map with concurrency and batch size
  7. Define sub workflow with Lambda function
  8. Send batch of files to child workflow
  9. Write output to result writer
  10. Store results in DynamoDB table
💡 Using AWS Step Functions and serverless services can help reduce infrastructure costs and improve system scalability and performance.
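
The child workflow step from takeaways 7 and 8 can be sketched as the core of a Lambda function that receives one batch. With item batching, each child workflow gets an `Items` array plus the shared `BatchInput`; for an S3 listing, each item carries the object `Key`. Everything else here is hypothetical: the `process_batch` and `load_loan` names, the loan fields, and the failure rule are made up for illustration and are not the logic used in the demo.

```python
from typing import Callable


def process_batch(event: dict, load_loan: Callable[[str], dict]) -> dict:
    """Evaluate every loan in one batch and report which ones would fail."""
    # Shared metadata sent alongside the batch (hypothetical field name).
    rate = event.get("BatchInput", {}).get("inflationRate", 0.0)
    results = []
    for item in event.get("Items", []):
        loan = load_loan(item["Key"])  # e.g. read and parse the object from S3
        # Hypothetical rule: the loan fails if the inflated payment
        # exceeds 40% of monthly income.
        payment = loan["monthlyPayment"] * (1 + rate / 100)
        results.append(
            {"key": item["Key"], "willFail": payment > 0.4 * loan["monthlyIncome"]}
        )
    return {"results": results}


if __name__ == "__main__":
    # Stand-in for S3: an in-memory lookup keyed by object key.
    fake_store = {
        "data/a.json": {"monthlyPayment": 1000, "monthlyIncome": 3000},
        "data/b.json": {"monthlyPayment": 1190, "monthlyIncome": 2900},
    }
    event = {
        "BatchInput": {"inflationRate": 2},
        "Items": [{"Key": "data/a.json"}, {"Key": "data/b.json"}],
    }
    print(process_batch(event, fake_store.__getitem__))
```

Passing the loader in as a parameter keeps the batch logic testable without AWS access; in a real function, a thin `handler(event, context)` would call `process_batch` with a loader that reads from S3.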
