Accelerate data integration with Salesforce and AWS using AWS Glue

The rapid adoption of software as a service (SaaS) solutions has led to data silos across various platforms, presenting challenges in consolidating insights from diverse sources. Effective data analytics relies on seamlessly integrating data from disparate systems by identifying, gathering, cleansing, and combining relevant data into a unified format. AWS Glue, a serverless data integration service, simplifies this process by offering scalable, efficient, and cost-effective solutions for integrating data from various sources. With AWS Glue, you can streamline data integration, reduce data silos and complexities, and gain agility in managing data pipelines, ultimately unlocking the true potential of your data assets for analytics, data-driven decision-making, and innovation.

This post explores the new Salesforce connector for AWS Glue and demonstrates how to build a modern extract, transform, and load (ETL) pipeline with AWS Glue ETL scripts.

Introducing the Salesforce connector for AWS Glue

To meet the demands of diverse data integration use cases, AWS Glue now supports SaaS connectivity for Salesforce. This enables users to quickly preview and transfer their customer relationship management (CRM) data, fetch the schema dynamically on request, and query the data. With the AWS Glue Salesforce connector, you can ingest and transform your CRM data to any of the AWS Glue supported destinations, including Amazon Simple Storage Service (Amazon S3), in your preferred format, including Apache Iceberg, Apache Hudi, and Linux Foundation Delta Lake; data warehouses such as Amazon Redshift and Snowflake; and many more. Reverse ETL use cases are also supported, allowing you to write data back to Salesforce.

The following are key benefits of the Salesforce connector for AWS Glue:

  •  You can use AWS Glue native capabilities
  •  It is well tested with AWS Glue capabilities and is production ready for any data integration workload
  •  It works seamlessly on top of AWS Glue and Apache Spark in a distributed fashion for efficient data processing

Solution overview

For our use case, we want to retrieve the full load of a Salesforce account object into a data lake on Amazon S3 and capture the incremental changes. This solution also allows you to update certain fields of the account object in the data lake and push the changes back to Salesforce. To achieve this, you create two ETL jobs using AWS Glue with the Salesforce connector, and create a transactional data lake on Amazon S3 using Apache Iceberg.

In the first job, you configure AWS Glue to ingest the account object from Salesforce and save it into a transactional data lake on Amazon S3 in Apache Iceberg format. You then update the account object data that was extracted by the first job in the transactional data lake on Amazon S3. Finally, you run the second job to send that change back to Salesforce.

Prerequisites

Complete the following prerequisite steps:

  1. Create an S3 bucket to store the results.
  2. Sign up for a Salesforce account, if you don't already have one.
  3. Create an AWS Identity and Access Management (IAM) role for the AWS Glue ETL job to use. The role must grant access to all resources used by the job, including Amazon S3 and AWS Secrets Manager. For this post, we name the role AWSGlueServiceRole-SalesforceConnectorJob. Use the following policies:
    • AWS managed policies:
    • Inline policy:
      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Sid": "VisualEditor0",
                  "Effect": "Allow",
                  "Action": [
                      "s3:PutObject",
                      "s3:GetObjectAcl",
                      "s3:GetObject",
                      "s3:GetObjectAttributes",
                      "s3:ListBucket",
                      "s3:DeleteObject",
                      "s3:PutObjectAcl"
                  ],
                  "Resource": [
                      "arn:aws:s3:::<S3-BUCKET-NAME>",
                      "arn:aws:s3:::<S3-BUCKET-NAME>/*"
                  ]
              }
          ]
      }

  4. Create the AWS Glue connection for Salesforce:
    1. The Salesforce connector supports two OAuth2 grant types: JWT_BEARER and AUTHORIZATION_CODE. For this post, we use the AUTHORIZATION_CODE grant type.
    2. On the Secrets Manager console, create a new secret. Add two keys, ACCESS_TOKEN and REFRESH_TOKEN, and keep their values blank (you can also create this secret programmatically, as shown in the sketch after this list). The values will be populated after you enter your Salesforce credentials.
    3. Configure the Salesforce connection in AWS Glue. Use AWSGlueServiceRole-SalesforceConnectorJob while creating the Salesforce connection. For this post, we name the connection Salesforce_Connection.
    4. In the Authorization section, choose Authorization Code and the secret you created in the previous step.
    5. Provide your Salesforce credentials when prompted. The ACCESS_TOKEN and REFRESH_TOKEN keys will be populated after you enter your Salesforce credentials.
  5. Create an AWS Glue database. For this post, we name it glue_etl_salesforce_db.
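
If you prefer to create the secret from step 4b programmatically rather than on the Secrets Manager console, the following minimal sketch does so with boto3. The secret name glue/salesforce-connection-secret is only an example; use any name you like and select it when you configure the connection.

# Minimal sketch: create the Secrets Manager secret with blank token keys.
# The secret name is illustrative; AWS Glue populates the token values after
# you authorize the Salesforce connection.
import json
import boto3

secretsmanager = boto3.client("secretsmanager")

secretsmanager.create_secret(
    Name="glue/salesforce-connection-secret",
    SecretString=json.dumps({"ACCESS_TOKEN": "", "REFRESH_TOKEN": ""}),
)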

Create an ETL job to ingest the account object from Salesforce

Complete the following steps to create a new ETL job in AWS Glue Studio to transfer data from Salesforce to Amazon S3:

  1. On the AWS Glue console, create a new job (with the Script editor option). For this post, we name the job Salesforce_to_S3_Account_Ingestion.
  2. On the Script tab, enter the Salesforce_to_S3_Account_Ingestion script.

Make sure that the name you used to create the Salesforce connection is passed as the connectionName parameter value in the script, as shown in the following code example:

# Script generated for node Salesforce

input_Salesforce_Dyf = glueContext.create_dynamic_frame.from_options(
    connection_type="salesforce",
    connection_options={
        "entityName": "Account",
        "apiVersion": "v60.0",
        "connectionName": "Salesforce_Connection",
    },
    transformation_ctx="inputSalesforceDyf",
)

The script fetches records from the Salesforce account object. It then checks whether the account table already exists in the transactional data lake. If the table doesn't exist, it creates a new table and inserts the records. If the table exists, it performs an upsert operation.
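
The following is a minimal sketch of that create-or-upsert logic, not the exact generated script. It assumes the glueContext and input_Salesforce_Dyf objects from the preceding snippet and the glue_catalog Iceberg catalog configured through the --conf job parameter in the next step.

# Illustrative create-or-upsert flow for the Iceberg account table (sketch only)
spark = glueContext.spark_session
df = input_Salesforce_Dyf.toDF()
table = "glue_catalog.glue_etl_salesforce_db.account"

# Check whether the account table already exists in the Iceberg catalog
exists = spark.sql("SHOW TABLES IN glue_catalog.glue_etl_salesforce_db") \
              .filter("tableName = 'account'").count() > 0

if not exists:
    # First run: create the table and load the full extract
    df.writeTo(table).using("iceberg").create()
else:
    # Subsequent runs: upsert on the Salesforce record Id
    df.createOrReplaceTempView("account_updates")
    spark.sql(f"""
        MERGE INTO {table} AS target
        USING account_updates AS source
        ON target.id = source.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)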

  3. On the Job details tab, for IAM Role, choose AWSGlueServiceRole-SalesforceConnectorJob.
  4. Under Advanced properties, for Additional network connection, choose the Salesforce connection.
  5. Set up the job parameters:
    1. --conf: spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO --conf spark.sql.catalog.glue_catalog.warehouse=file:///tmp/spark-warehouse
    2. --datalake-formats: iceberg
    3. --db_name: glue_etl_salesforce_db
    4. --s3_bucket_name: your S3 bucket
    5. --table_name: account

  6. Save the job and run it. You can also start the job programmatically, as shown in the sketch that follows.
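
The following boto3 sketch starts a run of the ingestion job with the job names used in this post; the --conf and --datalake-formats values set on the Job details tab apply as default arguments.

# Sketch: start the ingestion job programmatically (names as used in this post)
import boto3

glue = boto3.client("glue")

response = glue.start_job_run(
    JobName="Salesforce_to_S3_Account_Ingestion",
    Arguments={
        "--db_name": "glue_etl_salesforce_db",
        "--s3_bucket_name": "<S3-BUCKET-NAME>",
        "--table_name": "account",
    },
)
print("Started job run:", response["JobRunId"])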

Depending on the size of the data in your account object in Salesforce, the job will take a few minutes to complete. After a successful job run, a new table called account is created and populated with Salesforce account information.

  7. You can use Amazon Athena to query the data:
    SELECT id, name, type, active__c, upsellopportunity__c, lastmodifieddate
    FROM "glue_etl_salesforce_db"."account"

Validate transactional capabilities

You can validate the transactional capabilities supported by Apache Iceberg. For testing, try three operations: insert, update, and delete:

  1. Create a new account object in Salesforce, rerun the AWS Glue job, then run the query in Athena to validate that the new account appears.
  2. Delete an account in Salesforce, rerun the AWS Glue job, and validate the deletion using Athena.
  3. Update an account in Salesforce, rerun the AWS Glue job, and validate the update operation using Athena.

Create an ETL job to send updates back to Salesforce

AWS Glue also allows you to write data back to Salesforce. Complete the following steps to create an ETL job in AWS Glue that gets updates from the transactional data lake and writes them to Salesforce. In this scenario, you update an account record and push it back to Salesforce.

  1. On the AWS Glue console, create a new job (with the Script editor option). For this post, we name the job S3_to_Salesforce_Account_Writeback.
  2. On the Script tab, enter the S3_to_Salesforce_Account_Writeback script.

Make sure that the name you used to create the Salesforce connection is passed as the connectionName parameter value in the script:

# Script generated for node Salesforce

Salesforce_node = glueContext.write_dynamic_frame.from_options(
    frame=SelectFields_dyf,
    connection_type="salesforce",
    connection_options={
        "apiVersion": "v60.0",
        "connectionName": "Salesforce_Connection",
        "entityName": "Account",
        "writeOperation": "UPDATE",
        "idFieldNames": "Id",
    },
    transformation_ctx="Salesforce_node",
)
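
SelectFields_dyf in the preceding snippet is the DynamicFrame holding the records to push back. A minimal sketch of how it could be prepared follows, assuming the Iceberg account table created by the first job; the column renames back to the Salesforce API field names are illustrative.

# Sketch: build SelectFields_dyf from the Iceberg account table (illustrative)
from awsglue.dynamicframe import DynamicFrame

spark = glueContext.spark_session
account_df = spark.table("glue_catalog.glue_etl_salesforce_db.account")

# Keep only the record Id and the field updated in the data lake, and map the
# columns back to the Salesforce API field names
updates_df = (account_df
              .select("id", "upsellopportunity__c")
              .withColumnRenamed("id", "Id")
              .withColumnRenamed("upsellopportunity__c", "UpsellOpportunity__c"))

SelectFields_dyf = DynamicFrame.fromDF(updates_df, glueContext, "SelectFields_dyf")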

  3. On the Job details tab, for IAM Role, choose AWSGlueServiceRole-SalesforceConnectorJob.
  4. Configure the job parameters:
    1. --conf:
      spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO --conf spark.sql.catalog.glue_catalog.warehouse=file:///tmp/spark-warehouse
    2. --datalake-formats: iceberg
    3. --db_name: glue_etl_salesforce_db
    4. --table_name: account

  5. Run the update query in Athena to change the value of UpsellOpportunity__c for a Salesforce account to "Yes":
    update "glue_etl_salesforce_db"."account"
    set upsellopportunity__c = 'Yes'
    where name = '<SF Account>'

  6. Run the S3_to_Salesforce_Account_Writeback AWS Glue job.

Depending on the size of the data in your account object in Salesforce, the job will take a few minutes to complete.

  7. Validate the object in Salesforce. The value of UpsellOpportunity should change.

You have now successfully validated the Salesforce connector.

Considerations

You can set up AWS Glue job triggers to run the ETL jobs on a schedule, so that the data is regularly synchronized between Salesforce and Amazon S3. You can also integrate the ETL jobs with other AWS services, such as AWS Step Functions, Amazon Managed Workflows for Apache Airflow (Amazon MWAA), AWS Lambda, or Amazon EventBridge, to create a more advanced data processing pipeline.
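
For example, the following minimal sketch creates a scheduled AWS Glue trigger with boto3 that runs the ingestion job every hour; the trigger name and cron expression are illustrative.

# Sketch: schedule the ingestion job to run hourly with an AWS Glue trigger
import boto3

glue = boto3.client("glue")

glue.create_trigger(
    Name="salesforce-account-hourly-sync",   # illustrative name
    Type="SCHEDULED",
    Schedule="cron(0 * * * ? *)",            # every hour, on the hour (UTC)
    Actions=[{"JobName": "Salesforce_to_S3_Account_Ingestion"}],
    StartOnCreation=True,
)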

By default, the Salesforce connector doesn't import deleted records from Salesforce objects. However, you can set the IMPORT_DELETED_RECORDS option to "true" to import all records, including the deleted ones. Refer to Salesforce connection options for the different Salesforce connection options.

# Script generated for node Salesforce

input_Salesforce_Dyf = glueContext.create_dynamic_frame.from_options(
    connection_type="salesforce",
    connection_options={
        "entityName": "Account",
        "apiVersion": "v60.0",
        "connectionName": "Salesforce_Connection",
        "IMPORT_DELETED_RECORDS": "true",
    },
    transformation_ctx="inputSalesforceDyf",
)

Clean up

To avoid incurring charges, clean up the resources used in this post from your AWS account, including the AWS Glue jobs, Salesforce connection, Secrets Manager secret, IAM role, and S3 bucket.

Conclusion

The AWS Glue connector for Salesforce simplifies the analytics pipeline, reduces time to insights, and facilitates data-driven decision-making. It empowers organizations to streamline data integration and analytics. The serverless nature of AWS Glue means there is no infrastructure to manage, and you pay only for the resources consumed while your jobs are running. As organizations increasingly rely on data for decision-making, this Salesforce connector provides an efficient, cost-effective, and agile solution to swiftly meet data analytics needs.

To learn more about the AWS Glue connector for Salesforce, refer to Connecting to Salesforce in AWS Glue Studio. In this user guide, we walk through the entire process, from setting up the connection to running the data transfer flow. For more information on AWS Glue, visit AWS Glue.


About the authors

Ramakant Joshi is an AWS Solutions Architect, specializing in the analytics and serverless domain. He has a background in software development and hybrid architectures, and is passionate about helping customers modernize their cloud architecture.

Kamen Sharlandjiev is a Sr. Big Data and ETL Solutions Architect, Amazon MWAA and AWS Glue ETL expert. He's on a mission to make life easier for customers who are facing complex data integration and orchestration challenges. His secret weapon? Fully managed AWS services that can get the job done with minimal effort. Follow Kamen on LinkedIn to keep up to date with the latest Amazon MWAA and AWS Glue features and news!

Debaprasun Chakraborty is an AWS Solutions Architect, specializing in the analytics domain. He has around 20 years of software development and architecture experience. He's passionate about helping customers with cloud adoption, migration and strategy.
