By building with open source, developers can innovate faster with powerful services. At Snowflake, we are grateful for the community’s efforts, which propelled the software and data revolution. Our engineers regularly contribute to open source projects to accelerate the innovation that our customers and the industry benefit from.
Key Projects We Maintain
Streamlit is a Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science.
Snowpark for Python Client API
Snowpark for Python is a client-side library that provides DataFrame-style APIs for querying and processing data in Snowflake. It lets you build and deploy data pipelines, ML workflows, and applications from any IDE that can run a Python kernel.
Terraform Provider: Snowflake
Terraform is an infrastructure-as-code tool that lets you build, change, and version resources. Our partners at the Chan Zuckerberg Initiative developed a Terraform provider for Snowflake that we now maintain.
SansShell is primarily a gRPC server with a variety of options for localhost debugging and management. Its goal is to replace the need to use an interactive shell for emergency debugging and recovery with a much safer interface.
schemachange is a Python-based tool to manage Snowflake objects. It follows an imperative-style approach to database change management (DCM).
Key Projects We Support
Snowflake supports several open-source projects through project committee roles, sponsorships, contributions, and community engagement.
Anaconda is a distribution of the Python and R programming languages for scientific computing, including data science, machine learning applications, large-scale data processing, and predictive analytics. It aims to simplify package management and deployment.
Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and runners for executing them on distributed processing backends.
Apache Iceberg is a table format for storing large, slow-moving tabular data. It is designed to improve on the standard table layout built into Hive, Trino, and Spark.
Apache Flume is a distributed service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows.
dbt is a command-line tool that enables analytics engineers to transform data in their warehouses by writing select statements. dbt turns those select statements into tables and views and transforms data without extracting or loading it.
Feast is a feature store that speeds up operationalizing analytic data for model training and online inference. It simplifies sharing and reuse of features, and makes it easier to serve features to online systems.
FoundationDB is a distributed database designed to handle large volumes of structured data across clusters of commodity servers. It organizes data as an ordered key-value store and employs ACID transactions for all operations.
Sequelize is an easy-to-use, promise-based Node.js ORM tool for Postgres, MySQL, MariaDB, SQLite, DB2, Microsoft SQL Server, Snowflake, and IBM. It features solid transaction support, relations, eager and lazy loading, read replication, and more.
Snowflake Labs hosts projects that were developed by our community, customers, and people at Snowflake. Snowflake does not officially maintain these projects. We invite everyone to contribute code, report bugs, and help improve the documentation.
Meet Some of Our Contributors
Contributor, Apache Beam
PMC Member, Apache Airflow
Author & Contributor
How to Get Involved
Do you have an open source project we should support? Do you want to contribute to projects we maintain? Get in touch.