Introducing job queuing to scale your AWS Glue workloads

Data is a key driver for your business. Data volume can increase significantly over time, and it often requires concurrent consumption of large compute resources. Data integration workloads can become increasingly concurrent as more and more applications demand access to data at the same time. In AWS, hundreds of thousands of customers use AWS Glue, a serverless data integration service, for integrating data across multiple data sources at scale. AWS Glue jobs can be triggered asynchronously via a schedule or event, or started synchronously, on demand.

Your AWS account has quotas, also referred to as limits, which are the maximum number of service resources in your AWS account. AWS Glue quotas help guarantee the availability of AWS Glue resources and prevent accidental overprovisioning of resources. However, with large or spiky workloads, it can be challenging to manage job run concurrency or Data Processing Units (DPUs) to stay under the service quotas. Traditionally, when you hit the quota of concurrent AWS Glue job runs, your jobs fail immediately.

Today, we’re pleased to announce the general availability of AWS Glue job queuing. Job queuing increases scalability and improves the customer experience of managing AWS Glue jobs. With this new capability, you no longer need to manage the concurrency of your AWS Glue job runs and attempt retries just to avoid job failures due to high concurrency. You can simply start your jobs, and when the job runs are in the Waiting state, the AWS Glue job queuing feature staggers jobs automatically whenever possible. This increases your job success rates and improves the experience for high-concurrency workloads.

This post demonstrates how job queuing helps you scale your AWS Glue workloads and how job queuing works.

Use cases and benefits for job queuing

The following are common data integration use cases where many concurrent job runs are needed:

  • Many different data sources need to be read in parallel
  • Multiple large datasets need to be processed concurrently
  • Data is processed in an event-driven fashion, and many events occur at the same time

AWS Glue has the following service quotas per Region and account related to concurrent usage:

  • Max concurrent job runs per account
  • Max concurrent job runs per job
  • Max task DPUs per account

You can also configure maximum concurrency for individual jobs.

In the aforementioned typical use cases, when you run a job through the StartJobRun API or the AWS Glue console, you may hit the upper limit defined at any of the places mentioned. If this happens, your job fails immediately due to errors like ConcurrentRunsExceededException returned by the AWS Glue API endpoint.
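As a rough illustration of the manual retry handling that callers have had to write around this error, the following sketch wraps StartJobRun in a backoff loop. It assumes a boto3-style AWS Glue client whose exceptions expose the service error code under `response["Error"]["Code"]`; the function name, attempt count, and delays are hypothetical values chosen for illustration, not AWS recommendations.

```python
import time


def start_with_manual_retry(glue, job_name, max_attempts=5, base_delay=2.0,
                            sleep=time.sleep):
    """Start a Glue job run, retrying when the concurrency limit is hit.

    `glue` is assumed to be a boto3-style Glue client. `sleep` is injectable
    so the backoff can be exercised without actually waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return glue.start_job_run(JobName=job_name)["JobRunId"]
        except Exception as exc:
            # boto3's ClientError carries the service error code in exc.response.
            code = getattr(exc, "response", {}).get("Error", {}).get("Code")
            if code != "ConcurrentRunsExceededException" or attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))


# Usage (requires boto3, AWS credentials, and an existing job):
#   import boto3
#   run_id = start_with_manual_retry(boto3.client("glue"), "my-example-job")
```

Every caller has to carry a loop like this, and the chosen delays add latency even when capacity frees up sooner.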

Job queuing supports these typical use cases without forcing you to manage concurrency across all your job runs. You no longer need to retry manually when you get ConcurrentRunsExceededException. Job queuing enqueues job runs when you hit the limit and automatically reattempts them when resources free up. It simplifies your daily operations and reduces the latency of retries. It also allows you to scale further with AWS Glue jobs.

In the next section, we describe how job queuing is configured.

Configure job queuing for AWS Glue jobs

To enable job queuing on the AWS Glue Studio console, complete the following steps:

  1. Open the AWS Glue console.
  2. Choose Jobs.
  3. Choose your job.
  4. Choose the Job details tab.
  5. For Job Run Queuing, select Enable job runs to be queued to run later when they cannot run immediately due to service quotas.
  6. Choose Save.
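The same setting can also be supplied programmatically. The following minimal sketch assumes that the boto3 AWS Glue client accepts a `JobRunQueuingEnabled` flag on `start_job_run`; check the current API reference before relying on it. The helper name and the job name are hypothetical.

```python
def build_start_job_run_request(job_name, queuing_enabled=True, arguments=None):
    """Build keyword arguments for a boto3 Glue `start_job_run` call.

    Assumption: `JobRunQueuingEnabled` is the per-run queuing flag.
    """
    request = {"JobName": job_name, "JobRunQueuingEnabled": queuing_enabled}
    if arguments:
        # Copy so the caller's dict is not shared with the request.
        request["Arguments"] = dict(arguments)
    return request


# Usage (requires boto3, AWS credentials, and an existing job):
#   import boto3
#   glue = boto3.client("glue")
#   run = glue.start_job_run(**build_start_job_run_request("my-example-job"))
```

Keeping the request construction separate from the API call makes the setting easy to inspect and to unit test without AWS access.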

In the next section, we describe how job queuing works.

How AWS Glue jobs work with job queuing

In the current job run lifecycle, the job-level and account-level limits are checked when a job starts, and the job moves to a Failed state when these limits are reached. With job queuing, your job run goes into a Waiting state to be reattempted instead of Failed. The Waiting state means that the job run is queued for retry because limits were exceeded or resources were unavailable. Job queuing is another retry mechanism in addition to the customer-specified max retry.

AWS Glue job queuing improves the success rates of job runs and reduces failures due to limits, but it doesn’t guarantee job run success. Limits may still be exceeded, or resources may still be unavailable, by the time the reattempt starts.
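To observe this lifecycle from client code, you can poll the run state. The following sketch assumes a boto3-style AWS Glue client and reads `JobRunState` from `get_job_run`, where a queued run is reported as `WAITING`. The function name, the polling interval, and the set of states treated as terminal are illustrative assumptions.

```python
import time

# States treated as terminal in this sketch (an assumption for illustration).
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def wait_for_job_run(glue, job_name, run_id, poll_seconds=30, sleep=time.sleep):
    """Poll a job run until it reaches a terminal state.

    Returns the sequence of distinct states observed, for example
    WAITING, then RUNNING, then SUCCEEDED for a run that was queued first.
    """
    seen = []
    while True:
        response = glue.get_job_run(JobName=job_name, RunId=run_id)
        state = response["JobRun"]["JobRunState"]
        if not seen or seen[-1] != state:
            seen.append(state)  # record only state changes
        if state in TERMINAL_STATES:
            return seen
        sleep(poll_seconds)


# Usage (requires boto3, AWS credentials, and an existing run):
#   import boto3
#   history = wait_for_job_run(boto3.client("glue"), "my-example-job", run_id)
```

Because a queued run can still end in a failed state, callers should inspect the final entry of the returned history rather than assume success.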

The following screenshot shows two job runs in the Waiting state:

The following limits are covered by job queuing:

  • Max concurrent job runs per account exceeded
  • Max concurrent job runs per job exceeded (which includes the account-level service quota as well as the configured parameter on the job)
  • Max concurrent DPUs exceeded
  • Resource unavailable due to IP address exhaustion in VPCs

The retry mechanism is configured to retry for a maximum of 15 minutes or 10 attempts, whichever comes first.
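The "whichever comes first" rule can be expressed as a simple check. The following sketch only models the two stated limits. How AWS Glue spaces the reattempts internally is not described here, so the function takes the elapsed time and attempt count as inputs, and its name is hypothetical.

```python
MAX_WAIT_SECONDS = 15 * 60  # 15-minute retry window
MAX_ATTEMPTS = 10           # maximum number of reattempts


def can_reattempt(attempts_made, seconds_waiting):
    """Return True while both the attempt budget and the time window remain."""
    return attempts_made < MAX_ATTEMPTS and seconds_waiting < MAX_WAIT_SECONDS


# A run that has been queued for 5 minutes with 3 reattempts is still eligible:
#   can_reattempt(3, 5 * 60)
# A run that has used all 10 attempts is not, regardless of elapsed time:
#   can_reattempt(10, 60)
```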

Here’s the state transition diagram for job runs when job queuing is enabled.

Considerations

Keep in mind the following considerations:

  • AWS Glue Flex jobs are not supported
  • With job queuing enabled, the parameter MaxRetries isn’t configurable for the same job
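These two considerations can be checked before enabling the feature. The following sketch assumes a job definition shaped like the `Job` structure returned by the AWS Glue `get_job` API, with `ExecutionClass` and `MaxRetries` keys. Treating any nonzero `MaxRetries` as a conflict is an interpretation of the second consideration, and the function name is hypothetical.

```python
def queuing_conflicts(job):
    """Return reasons why a job definition conflicts with job queuing.

    `job` is a dict assumed to follow the Glue `Job` structure.
    An empty list means no conflict was found by this sketch.
    """
    problems = []
    if job.get("ExecutionClass") == "FLEX":
        problems.append("Flex jobs are not supported with job queuing")
    if job.get("MaxRetries", 0) > 0:
        # Interpretation: a configured retry count conflicts with queuing.
        problems.append("MaxRetries cannot be set when job queuing is enabled")
    return problems


# Usage (requires boto3, AWS credentials, and an existing job):
#   import boto3
#   job = boto3.client("glue").get_job(JobName="my-example-job")["Job"]
#   for problem in queuing_conflicts(job):
#       print(problem)
```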

Conclusion

In this post, we described how the new job queuing capability helps you scale your AWS Glue job workloads. You can start using job queuing for your new or existing jobs today. We’re looking forward to hearing your feedback.


About the authors

Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He is based in Tokyo, Japan. He is responsible for building software artifacts to help customers. In his spare time, he enjoys cycling on his road bike.

Gyan Radhakrishnan is a Software Development Engineer on the AWS Glue team. He is working on designing and building end-to-end solutions for data-intensive applications.

Simon Kern is a Software Development Engineer on the AWS Glue team. He is passionate about serverless technologies, data engineering, and building great services.

Dana Adylova is a Software Development Engineer on the AWS Glue team. She is working on building software for supporting data-intensive applications. In her spare time, she enjoys knitting and reading sci-fi.

Matt Su is a Senior Product Manager on the AWS Glue team. He enjoys helping customers uncover insights and make better decisions using their data with AWS Analytics services. In his spare time, he enjoys snowboarding and gardening.
