Efficiently query BigQuery and Snowflake from PostgreSQL

Efficiently query BigQuery and Snowflake from PostgreSQL

When do you want to query data in your data warehouse from PostgreSQL?

A few scenarios where you want to query your data warehouse from PostgreSQL include:

  1. You have PostgreSQL stored procs for business logic and you want to extend them to data stored in your data warehouse. You want don't want to de-duplicate data and inflate storage costs.

  2. You want to pre-aggregate data in your data warehouse and create materialized views on your PostgreSQL database.

  3. You want to add a new feature to your application written for PostgreSQL and this feature relies on data stored in your data warehouse (say BigQuery). You want to keep your app code simple by just querying PostgreSQL and not introducing a new data store.

Limitations with current approaches

Currently, there is no easy way to query BigQuery and Snowflake tables from your PostgreSQL database. You can write a foreign data wrapper in Postgres to make this happen. However, writing an efficient and a resilient foreign data wrapper is not trivial.

There are a few extension(s) that claim to do this. However, they can get very inefficient for a few common query types and in-fact query types where BigQuery and Snowflake excel. Their query push-down capabilities are not very mature -

  1. Aggregate (GROUP BY) push-down is not supported.

  2. JOIN, ORDER BY, LIMIT, FUNCTION, UNNEST etc. push down can be finicky and unreliable.

postgres_fdw + peerdb for efficiently querying BigQuery and Snowflake from PostgreSQL

PeerDB provides a Postgres-compatible interface to query tables in BigQuery and Snowflake. It lives outside your PostgreSQL database. PeerDB parses and translates the incoming SQL query to make it compatible with the appropriate Peer (BigQuery or Snowflake). It pushes down most SQL constructs, including filters, JOINs, aggregates, sorts, limits, function/procedure calls, etc., to the attached Peer.

You can also query tables in BigQuery or Snowflake from your existing Postgres database by connecting PeerDB as a postgres_fdw FOREIGN SERVER and creating FOREIGN TABLEs pointing to BigQuery/Snowflake tables. Below are a few benefits you get by following this approach:

  1. Blazing fast queries - You can utilize postgres_fdw’s advance push-down capabilities to maximize the performance of your queries involving BigQuery and Snowflake tables. For queries where push-down is important, you can expect 100x performance gains compared to other foreign data wrappers.

  2. Works for both hosted and on-prem deployments of PostgreSQL - This approach doesn't involve installing a custom extension to your PostgreSQL database. And just relies on postgres_fdw, which is shipped with community PostgreSQL and is supported on every hosted provider incl. AWS RDS, Azure Flexible Server, Cloud SQL, Crunchy Bridge, Supabase etc.

  3. Highly resilient - As we are not installing any software (custom extensions) to your PostgreSQL database and just relying on the native postgres_fdw, crashes and instabilities are rare.

Detailed documentation on setting this up is available here.

Supported data-stores

Currently, this approach works for querying BigQuery (production) and Snowflake (in Beta) from PostgreSQL. We are working on adding support for Databricks (Delta Lake tables), Clickhouse and few others.

If you wanted support for other data stores, join our slack channel and shoot us a request. OR send an email to founders@peerdb.io. Our engineering team operates fast and has built connectors just in a couple of days!

Hope you enjoyed reading the blog. Check out our github repo and get started testing PeerDB