Integrating Git with Snowflake: A Guide for Collaboration
Have you considered using Git with Snowflake? You may be aware that Snowflake now offers direct support for Git integration. This powerful feature can streamline your Continuous Integration/Continuous Deployment (CI/CD) processes and enhance collaboration among teams. In this article, we’ll explore how to implement this integration and provide practical use cases.
Collaborate Using Git Integration in Snowflake
One of the exciting features currently in public preview is the Git integration, which simplifies the construction of CI/CD pipelines directly within the Snowflake environment. This integration can revolutionize how data teams manage their projects on Snowflake. Below, we provide a practical guide to setting up this collaboration and highlight several use cases.
Key Features of Snowflake’s Git Integration
The Git integration in Snowflake allows users to synchronize their remote repositories (GitHub, GitLab, Azure DevOps, and BitBucket) with their Snowflake account using a special object known as a Repository Stage. This object acts as a local clone within Snowflake, giving full access to all branches, tags, and files without leaving the Snowflake environment. Additionally, users can schedule and deploy code releases directly from Snowflake, simplifying code management processes.
How It Works
Integrating GitHub repositories with Snowflake is straightforward. Users create a special stage object called Repository Stage, which functions as a clone of a remote repository. Once created, users can navigate through branches and directories using the LIST command, view code within files using the SELECT command, or execute code stored in files using the EXECUTE IMMEDIATE FROM
command.
Note: In the current version of this feature, pushing commits from Snowflake to the source repository is not supported. Additionally, stage objects do not refresh automatically in Snowflake; users must pull code changes by executing the FETCH command.
Implementation Steps
1. Create an API Integration
First, establish a connection between your Snowflake account and your Git repository. This crucial step must be performed by an ACCOUNTADMIN.
create or replace secret github_secret
type = password
username = 'your_username'
password = 'your_personnal_access_token';
create or replace api integration git_integration
api_provider = git_https_api
api_allowed_prefixes = ('https://github.com/kriswal111')
allowed_authentication_secrets = (github_secret)
enabled = true;
2. Create a Git Repository
Next, create a Git repository that will use your integration object.
create or replace git repository tutorial
api_integration = git_integration
git_credentials = github_secret
origin = 'https://github.com/kriswal111/tutorial.git';
3. Use LIST, SELECT, and EXECUTE IMMEDIATE Commands
To display files in the repository:
-- List files
ls @tutorial/branches/main;
To view the code of an SQL script contained in the Git repository:
-- Show code in file
select $1 from @tutorial/branches/main/models/src/src_hosts.sql;
To execute an SQL script contained in the Git repository:
-- Creating our source models, materialized as views in our brz schema.
execute immediate from @tutorial/branches/main/models/src/src_hosts.sql;
Use Cases
1. Create a Table by Executing an SQL Script
Assume you have raw data stored in an AWS S3 bucket and want to execute a transformation pipeline developed by a collaborator and hosted on GitHub. To run an SQL file from your Git repository, first refresh your Git repository with the command:
-- Fetch git repository updates.
alter git repository airbnb.git.tutorial fetch;
Then execute your SQL scripts with the command:
-- Creating our source models, materialized as views in our brz schema.
execute immediate from @tutorial/branches/main/models/src/src_hosts.sql;
2. Create a Snowpark Procedure
Using Snowpark, you can create a procedure to filter values in the IS_FULL_MOON
field, eliminating any rows where the value differs from “full moon”. Import your Python script to define the procedure:
CREATE OR REPLACE PROCEDURE filter_by_is_full_moon(tableName VARCHAR, is_full_moon VARCHAR)
RETURNS TABLE(listing_id BIGINT, review_date datetime, reviewer_name VARCHAR,
review_text VARCHAR, review_sentiment VARCHAR, is_full_moon VARCHAR)
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python')
IMPORTS = ('@tutorial/branches/main/python-handlers/filter.py')
HANDLER = 'filter.filter_by_is_full_moon';
Then execute the procedure:
CALL filter_by_is_full_moon('AIRBNB.GLD.FULL_MOON_REVIEWS', 'full moon');
3. Create a Streamlit Application
To create your Streamlit application, import your Python script to set up the app:
create or replace streamlit streamlit_airbnb_application
root_location = @airbnb.git.tutorial/branches/main/streamlit_app
main_file = '/app.py'
query_warehouse = 'developer';
You can then navigate to your application through the Snowflake user interface:
Project > Streamlit > streamlit_airbnb_application
Example code for creating pie charts for ‘full moon’ and ‘not full moon’:
# Create pie chart for 'full moon'
chart_full_moon = alt.Chart(df_full_moon_grouped).mark_arc().encode(
theta=alt.Theta(field='count', type='quantitative'),
color=alt.Color(field='REVIEW_SENTIMENT', type='nominal',
scale=alt.Scale(domain=['positive', 'neutral', 'negative'],
range=['lawngreen', 'lightgrey', 'crimson'])),
tooltip=['REVIEW_SENTIMENT', 'count']
).properties(
title='Customer Sentiments - Full Moon Days'
)
# Create pie chart for 'not full moon'
chart_not_full_moon = alt.Chart(df_not_full_moon_grouped).mark_arc().encode(
theta=alt.Theta(field='count', type='quantitative'),
color=alt.Color(field='REVIEW_SENTIMENT', type='nominal',
scale=alt.Scale(domain=['positive', 'neutral', 'negative'],
range=['lawngreen', 'lightgrey', 'crimson'])),
tooltip=['REVIEW_SENTIMENT', 'count']
).properties(
title='Customer Sentiments - Non Full Moon Days'
)
# Display charts side by side in Streamlit
st.altair_chart(chart_full_moon | chart_not_full_moon, use_container_width=True)
Conclusion
Integrating Git with Snowflake enables teams to manage their SQL scripts, Snowpark procedures, and Streamlit applications in a centralized and collaborative manner. This approach not only enhances efficiency but also improves version control and code quality. Embrace this integration to streamline your data management workflows and elevate your team’s collaboration efforts!