Tags: python sql lambda function aws redshift
Amazon Redshift is fully managed warehouse service provided by AWS. The Amazon Redshift engine is a SQL-compliant, massively-parallel, query processing and database management system designed to support analytics workload.
An Amazon Redshift cluster is comprised of a leader node and one or more compute nodes. We also support a single-node design where leader and compute work is shared on a single node. The leader node accepts connections from client programs, parses requests, generates & compiles query plans for execution on the compute nodes, performs final aggregation of results when required, and coordinates serialization and state of transactions. The compute node(s) perform the heavy lifting inherent in both query processing and data manipulation against local data.[^redshift]
AWS GUI provides Query Editor to execute sql command. However it’s better to leverage APIs by using AWS Lambda function.
Lets start with creating a flat file which would contain all our SQL commands that we want to execute in the Redshift Cluster. We will upload the flat file to the s3 bucket. The content for the file is
Materialized views are useful in warehouses to increase the speed of queries on very large databases. They are a database object that stores the results of a query.
Lets start by creating a lambda function handler
which would be invoked when the lambda function run.
Its not a good practice to hard code cluster information or password in the code. We will this information from the AWS Secret Manger
. I’ll assume that secret is already created. We will write a method to retrieve those values using secret_name
passed as an Environment variable. Other environment variables are
region e.g us-east-1
sql_bucket is the name of s3 bucket
sql_prefix is the prefix for the object stored on s3
Also we need to read the sql commands stored in s3 location.
Lastly we will write our method exec_redshift that will do the following
Retrieve cluster information from AWS Secret Manager
Read content of the sql file stored on s3
Execute sql commands sequentially in redshift cluster