Question AWS Glue Knowledge Catalog views utilizing Amazon Athena and Amazon Redshift

At this time’s information lakes are increasing throughout traces of enterprise working in numerous landscapes and utilizing varied engines to course of and analyze information. Historically, SQL views have been used to outline and share filtered information units that meet the necessities of those traces of enterprise for simpler consumption. Nevertheless, with prospects utilizing completely different processing engines of their information lakes, every with its personal model of views, they’re creating separate views per engine, including to upkeep overhead. Moreover, accessing these engine-defined views requires prospects to have elevated entry ranges, granting them entry to each the SQL view itself and the underlying databases and tables referenced within the view’s SQL definition. This strategy impedes granting constant entry to a subset of knowledge utilizing SQL views, hampering productiveness and rising administration overhead.

Glue Knowledge Catalog views is a brand new function of the AWS Glue Knowledge Catalog that prospects can use to create a typical view schema and single metadata container that may maintain view-definitions in several dialects that can be utilized throughout engines akin to Amazon Redshift and Amazon Athena. By defining a single view object that may be queried from a number of engines, Knowledge Catalog views allow prospects to handle permissions on a single view schema persistently utilizing AWS Lake Formation. A view could be shared throughout completely different AWS accounts as properly. For querying these views, customers want entry to the view object solely and don’t want entry to the referenced databases and tables within the view definition. Additional, all requests towards the Knowledge Catalog views, akin to requests for entry credentials on underlying assets, might be logged as AWS CloudTrail administration occasions for auditing functions.

On this weblog submit, we’ll present how one can outline and question a Knowledge Catalog view on high of open supply desk codecs akin to Iceberg throughout Athena and Amazon Redshift. We may also present you the configurations wanted to limit entry to the underlying database and tables. To comply with alongside, we’ve got supplied an AWS CloudFormation template.

Use case

An Instance Corp has two enterprise items: Gross sales and Advertising and marketing. The Gross sales enterprise unit owns buyer datasets, together with buyer particulars and buyer addresses. The Advertising and marketing enterprise unit needs to conduct a focused advertising marketing campaign based mostly on a most popular buyer checklist and has requested information from the Gross sales enterprise unit. The Gross sales enterprise unit’s information steward (AWS Identification and Entry Administration (IAM) function: product_owner_role), who owns the shopper and buyer deal with datasets, plans to create and share non-sensitive particulars of most popular prospects with the Advertising and marketing unit’s information analyst (business_analyst_role) for his or her marketing campaign use case. The Advertising and marketing staff analyst plans to make use of Athena for interactive evaluation for the advertising marketing campaign and later, use Amazon Redshift to generate the marketing campaign report.

On this resolution, we show how you should use Knowledge Catalog views to share a subset of buyer particulars saved in Iceberg format filtered by the most popular flag. This view could be seamlessly queried utilizing Athena and Amazon Redshift Spectrum, with information entry centrally managed by means of AWS Lake Formation.

Conditions

For the answer on this weblog submit, you want the next:

  • An AWS account. In case you don’t have an account, you’ll be able to create one.
  • You’ve created a knowledge lake administrator Be aware of this function’s Amazon Useful resource Title (ARN) to make use of later. For simplicity’s sake, this submit will use IAM Admin function because the Datalake Admin and Redshift Admin however ensure that in your atmosphere you comply with the precept of least privilege.
  • Beneath Knowledge Catalog settings, have the default settings in place. Each of the next choices ought to be chosen:
    • Use solely IAM entry management for brand spanking new databases
    • Use solely IAM entry management for brand spanking new tables in new databases

Get began

To comply with the steps on this submit, register to the AWS Administration Console because the IAM Admin and deploy the next CloudFormation stack to create the required assets:

  1. Select to deploy the CloudFormation template.
    Launch Cloudformation Stack
  2. Present an IAM function that you’ve already configured as a Lake Formation administrator.
  3. Full the steps to deploy the template. Go away all settings as default.
  4. Choose I acknowledge that AWS CloudFormation may create IAM assets, then select Submit.

The CloudFormation stack creates the next assets. Make a remark of those values—you’ll use them later.

  • Amazon Easy Storage Service (Amazon S3) buckets that retailer the desk information and Athena question outcome
  • IAM roles: product_owner_role and business_analyst_role
  • Digital personal cloud (VPC) with the required community configuration, which might be used for compute
  • AWS Glue database: customerdb, which accommodates the buyer and customer_address tables in Iceberg format
  • Glue database: customerviewdb, which can comprise the Knowledge Catalog views
  • Redshift Serverless cluster

The CloudFormation stack additionally registers the info lake bucket with Lake Formation in Lake Formation entry mode. You may confirm this by navigating to the Lake Formation console and choosing Knowledge lake areas below Administration.

Resolution overview

The next determine exhibits the structure of the answer.

Question AWS Glue Knowledge Catalog views utilizing Amazon Athena and Amazon Redshift

As a requirement to create a Knowledge Catalog view, the info lake S3 areas for the tables (buyer and customer_address) must be registered with Lake Formation and granted full permission to product_owner_role.

The Gross sales product proprietor: product_owner_role can also be granted permission to create views below customerviewdb utilizing Lake Formation.

After the Glue Knowledge Catalog View (customer_view) is created on the shopper dataset with the required subset of buyer info, the view is shared with the Advertising and marketing analyst (business_analyst_role), who can then question the popular buyer’s non delicate info as outlined by the view with out gaining access to underlying buyer tables.

  1. Allow Lake Formation permission mode on the customerdbdatabase and its tables.
  2. Grant the database (customerdb) and tables (buyer and customer_address) full permission to product_owner_role utilizing Lake Formation.
  3. Allow Lake Formation permission mode on the database (customerviewdb) the place the a number of dialect Knowledge Catalog view might be created.
  4. Grant full database permission to product_owner_role utilizing Lake Formation.
  5. Create Knowledge Catalog views as product_owner_role utilizing Athena and Amazon Redshift so as to add engine dialects.
  6. Share the database and Knowledge Catalog views learn permission to business_analyst_role utilizing Lake Formation.
  7. Question the Knowledge Catalog view utilizing business_analyst_role from Athena and Amazon Redshift engine.

With the conditions in place and an understanding of the general resolution, you’re able to arrange the answer.

Arrange Lake Formation permissions for product_owner_role

Check in to the LakeFormation console as a knowledge lake administrator. For the examples on this submit, we use the IAM Admin function, Admin as the info lake admin.

Allow Lake Formation permission mode on customerdb and its tables

  1. Within the Lake Formation console, below Knowledge Catalog within the navigation pane, select Databases.
  2. Select customerdb and select Edit.
  3. Beneath Default permissions for newly created tables, clear Use solely IAM entry management for brand spanking new tables on this database.
  4. Select Save.
  5. Beneath Knowledge Catalog within the navigation pane, select Databases.
  6. Choose customerdb and below Motion, choose View
  7. Choose the IAMAllowedPrincipal from the checklist and select Revoke.
  8. Repeat the identical for all tables below the database customerdb.

Grant the product_owner_role entry to customerdb and its tables

Grant product_owner_role all permissions to the customerdb database.

  1. On the Lake Formation console, below Permissions within the navigation pane, select Knowledge lake permissions.
  2. Select Grant.
  3. Beneath Principals, choose IAM customers and roles.
  4. Choose product_owner_role.
  5. Beneath LF-Tags or catalog assets, choose Named Knowledge Catalog assetsand choose customerdb for Databases.
  6. Choose SUPER for Database permissions.
  7. Select Grant to use the permissions.

Grant product_owner_role all permissions to the buyer and customer_address tables.

  1. On the Lake Formation console, below Permissions within the navigation pane, select Knowledge lake permission
  2. Select Grant.
  3. Beneath Principals, choose IAM customers and roles.
  4. Select the product_owner_role.
  5. Beneath LF-Tags or catalog assets, select Named Knowledge Catalog assetsand choose customerdb for databases and buyer and customer_address for tables.
  6. Select SUPER for Desk permissions.
  7. Select Grant to use the permissions.

Allow Lake Formation permission mode

Allow Lake Formation permission mode on the database the place the Knowledge Catalog view might be created.

  1. Within the Lake Formation console, below Knowledge Catalog within the navigation pane, select Databases.
  2. Choose customerviewdb and select Edit.
  3. Beneath Default permissions for newly created tables, clear Use solely IAM entry management for brand spanking new tables on this database.
  4. Select Save.
  5. Select Databases from Knowledge Catalog within the navigation pane.
  6. Choose customerviewdb and below Motion choose View.
  7. Choose the IAMAllowedPrincipal from the checklist and select Revoke.

Grant the product_owner_role entry to customerviewdb utilizing Lake Formation mode

Grant product_owner_role all permissions to the customerviewdb database.

  1. On the Lake Formation console, below Permissions within the navigation pane, select Knowledge lake permissions.
  2. Select Grant
  3. Beneath Principals, choose IAM customers and roles.
  4. Select product_owner_role
  5. Beneath LF-Tags or catalog assets, select Named Knowledge Catalog assetsand choose customerviewdb for Databases.
  6. Choose SUPER for Database permissions.
  7. Select Grant to use the permissions.

Create Glue Knowledge Catalog views as product_owner_role

Now that you’ve Lake Formation permissions set on the databases and tables, you’ll use the product_owner_role to create Knowledge Catalog views utilizing Athena and Amazon Redshift. This may also add the engine dialects for Athena and Amazon Redshift.

Add the Athena dialect

  1. Within the AWS console, both register utilizing product_owner_role or, in case you’re already signed in as an Admin, swap to product_owner_role.
  2. Launch question editor and choose the workgroup athena_glueview from the higher proper aspect of the console. You’ll create a view that mixes information from the buyer and customer_address tables, particularly for patrons who’re marked as most popular. The tables embody private details about the shopper, akin to their identify, date of start, nation of start, and e-mail deal with.
  3. Run the next within the question editor to create the customer_view view below the customerviewdb database.
    create protected multi dialect view customerviewdb.customer_view
    safety definer
    as
    choose c_customer_id, c_first_name, c_last_name, c_birth_day, c_birth_month,
    c_birth_year, c_birth_country, c_email_address,
    ca_country,ca_zip
    from customerdb.buyer, customerdb.customer_address
    the place c_current_addr_sk = ca_address_sk and c_preferred_cust_flag='Y';

  4. Run the next question to preview the view you simply created.
    choose * from customerviewdb.customer_view restrict 10;

  5. Run following question to search out the highest three start years with the very best buyer counts from the customer_view view and show the start yr and corresponding buyer rely for every.
    choose c_birth_year,
    	rely(*) as rely
    from "customerviewdb"."customer_view"
    group by c_birth_year
    order by rely desc
    restrict 3

Output:

  1. To validate that the view is created, go to the navigation pane and select Views below Knowledge catalog on the Lake Formation console
  2. Choose customer_view and go to the SQL definition part to validate the Athena engine dialect.

While you created the view in Athena, it added the dialect for Athena engine. Subsequent, to help the use case described earlier, the advertising marketing campaign report must be generated utilizing Amazon Redshift. For this, it’s essential add the Redshift dialect to the view so you’ll be able to question it utilizing Amazon Redshift as an engine.

Add the Amazon Redshift dialect

  1. Check in to the AWS console as an Admin, navigate to Amazon Redshift console and register to Redshift Qurey editor v2.
  2. Hook up with the Serverless cluster as Admin (federated consumer) and run the next statements to grant permission on the Glue automount database (awsdatacatalog) entry to product_owner_role and business_analyst_role.
    create consumer  "IAMR:product_owner_role" password disable;
    create consumer  "IAMR:business_analyst_role" password disable;
    
    grant utilization on database awsdatacatalog to "IAMR:product_owner_role";
    grant utilization on database awsdatacatalog to "IAMR:business_analyst_role";

  3. Check in to the Amazon Redshift console as product_owner_role and register to the QEv2 editor utilizing product_owner_role (as a federated consumer). You’ll use the next ALTER VIEW question so as to add the Amazon Redshift engine dialect to the view created beforehand utilizing Athena.
  4. Run the next within the question editor:
    alter exterior view awsdatacatalog.customerviewdb.customer_view AS
    choose c_customer_id, c_first_name, c_last_name, c_birth_day, c_birth_month,
    c_birth_year, c_birth_country, c_email_address,
    ca_country, ca_zip
    from awsdatacatalog.customerdb.buyer, awsdatacatalog.customerdb.customer_address
    the place c_current_addr_sk = ca_address_sk and c_preferred_cust_flag='Y'

  5. Run following question to preview the view.
    choose * from awsdatacatalog.customerviewdb.customer_view restrict 10;

  6. Run the identical question that you just ran in Athena to search out the highest three start years with the very best buyer counts from the customer_view view and show the start yr and corresponding buyer rely for every.
    choose c_birth_year,
    	rely(*) as rely
    from awsdatacatalog.customerviewdb.customer_view
    group by c_birth_year
    order by rely desc
    restrict 3

By querying the identical view and working the identical question in Redshift, you obtained the identical outcome set as you noticed in Athena.

Validate the dialects added

Now that you’ve added all of the dialects, navigate to the Lake Formation console to see how the dialects are saved.

  1. On the Lake Formation console, below Knowledge catalog within the navigation pane, select Views.
  2. Choose customer_view and go to SQL definitions part to validate that the Athena and Amazon Redshift dialects have been added.

Alternatively, you may as well create the view utilizing Redshift so as to add Redshift dialect and replace in Athena so as to add the Athena dialect.

Subsequent, you will note how the business_analyst_role can question the view with out gaining access to question the underlying tables and the Amazon S3 location the place the info exists.

Arrange Lake Formation permissions for business_analyst_role

Check in to the Lake Formation console because the DataLake administrator (For this weblog, we use the IAM Admin function, Admin, because the Datalake admin).

Grant business_analyst_role entry to the database and view utilizing Lake Formation

  1. On the Lake Formation console, below Permissions within the navigation pane, select Knowledge lake permissions.
  2. Select Grant
  3. Beneath Principals, choose IAM customers and roles.
  4. Choose business_analyst_role.
  5. Beneath LF-Tags or catalog assets, choose Named Knowledge Catalog assets and choose customerviewdb for Databases.
  6. Choose DESCRIBE for Database permissions.
  7. Select Grant to use the permissions.

Grant the business_analyst_role SELECT and DESCRIBE permissions to customer_view

  1. On the Lake Formation console, below Permissions within the navigation pane, select Knowledge lake permission.
  2. Select Grant.
  3. Beneath Principals, choose IAM customers and roles.
  4. Choose  business_analyst_role.
  5. Beneath LF-Tags or catalog assets, select Named Knowledge Catalog assets and choose customerviewdb for Databases and customer_view for Views.
  6. Select SELECT and DESCRIBE for View permissions.
  7. Select Grant to use the permissions.

Question the Knowledge Catalog views utilizing business_analyst_role

Now that you’ve arrange the answer, take a look at it by querying the info utilizing Athena and Amazon Redshift.

Utilizing Athena

  1. Check in to the Athena console as business_analyst_role.
  2. Launch question editor and choose the workgroup athena_glueview. Choose database customerviewdb from the dropdown on the left and it is best to be capable of see the view created beforehand utilizing product_owner_role. Additionally, discover that no tables are proven as a result of business_analyst_role doesn’t have entry granted for the bottom tables.
  3. Run the next within the question editor to question the view question.
    choose * from customerviewdb.customer_view restrict 10

As you’ll be able to see within the previous determine, business_analyst_role can question the view with out gaining access to the underlying tables.

  1. Subsequent, question the desk buyer on which the view is created. It ought to give an error.
    SELECT * FROM customerdb.buyer restrict 10

Utilizing Amazon Redshift

  1. Navigate to the Amazon Redshift console and register to Amazon Redshift question editor v2. Hook up with the Serverless cluster as business_analyst_role (federated consumer) and run the next within the question editor to question the view.
  2. Choose the customerviewdb on the left aspect of the console. It’s best to see the view customer_view. Additionally, observe that you just can not see the tables from which the view is created. Run the next within the question editor to question the view.
    SELECT * FROM "awsdatacatalog"."customerviewdb"."customer_view";

The enterprise analyst consumer can run the evaluation on the Knowledge Catalog view without having entry to the underlying databases and tables on from which the view is created.

Glue Knowledge Catalog views provide options for varied information entry and governance eventualities. Organizations can use this function to outline granular entry controls on delicate information—akin to personally identifiable info (PII) or monetary information—to assist them adjust to information privateness rules. Moreover, you should use Knowledge Catalog views to implement row-level, column-level, and even cell-level filtering based mostly on the precise privileges assigned to completely different consumer roles or personas, permitting for fine-grained information entry management. Moreover, Knowledge Catalog views can be utilized in information mesh patterns, enabling safe, domain-specific information sharing throughout the group for self-service analytics, whereas permitting customers to make use of most popular analytics engines like Athena or Amazon Redshift on the identical views for governance and constant information entry.

Clear up

To keep away from incurring future fees, delete the CloudFormation stack. For directions, see Deleting a stack on the AWS CloudFormation console. Be sure that the next assets created for this weblog submit are eliminated:

  • S3 buckets
  • IAM roles
  • VPC with community elements
  • Knowledge Catalog database, tables and views
  • Amazon Redshift Serverless cluster
  • Athena workgroup

Conclusion

On this submit, we demonstrated the way to use AWS Glue Knowledge Catalog views throughout a number of engines akin to Athena and Redshift. You may share Knowledge Catalog views in order that completely different personas can question them. For extra details about this new function, see Utilizing AWS Glue Knowledge Catalog views.


In regards to the Authors

Pathik Shah is a Sr. Analytics Architect on Amazon Athena. He joined AWS in 2015 and has been focusing within the massive information analytics house since then, serving to prospects construct scalable and sturdy options utilizing AWS analytics companies.

Srividya Parthasarathy is a Senior Massive Knowledge Architect on the AWS Lake Formation staff. She enjoys constructing information mesh options and sharing them with the group.

Paul Villena is a Senior Analytics Options Architect in AWS with experience in constructing fashionable information and analytics options to drive enterprise worth. He works with prospects to assist them harness the ability of the cloud. His areas of pursuits are infrastructure as code, serverless applied sciences, and coding in Python.

Derek Liu is a Senior Options Architect based mostly out of Vancouver, BC. He enjoys serving to prospects resolve massive information challenges by means of AWS analytic companies.

Leave a Reply

Your email address will not be published. Required fields are marked *