At PeerDB, we are building a fast and cost-effective way to replicate data from Postgres to data warehouses and queues. Today we are releasing our Azure Event Hubs connector. It gives you a fast, simple, and reliable way to run Change Data Capture (CDC) from PostgreSQL to Azure Event Hubs, so downstream apps can consume a raw feed of changes from your PostgreSQL database in real time. This enables use cases such as real-time alerting for fraud or anomaly detection in banking and IoT, operational analytics, and more.
In this blog, we look at existing approaches to replicating Postgres to Event Hubs and their challenges, and at how PeerDB addresses those challenges to provide an enterprise-grade experience!
Status Quo
Debezium is hard to use and is not built for Azure Event Hubs
A common way to replicate data from Postgres to Event Hubs is to use open source tools such as Debezium. Below are a few challenges we've heard from customers who tried Debezium with Azure Event Hubs.
Limited Configurability: Debezium offers limited customization for Azure Event Hubs. Among other limitations, it cannot perform advanced mapping between tables and topics, does not support custom partitioning schemes per topic, and cannot flatten nested JSON.
High Setup and Maintenance Costs: A common concern we hear from customers is that setting up and managing Debezium at production grade is challenging. A full implementation often takes a data engineering team several months.
Not Native to Azure Event Hubs: Debezium writes to Event Hubs through the Kafka protocol layered on top of Event Hubs, rather than through the native Event Hubs APIs. That Kafka layer is not as mature as the native APIs Event Hubs provides.
PeerDB for Change Data Capture (CDC) from Postgres to Azure Event Hubs
Over the past six months, we have invested heavily in making replication from Postgres to Azure Event Hubs as robust as possible, implementing the usability, security, and performance features that enterprise customers require. Below are a few highlights.
Simple to Use - SQL Layer that makes life very easy!
Along with a simple UI, PeerDB provides a Postgres-compatible SQL layer to manage replication from Postgres to Azure Event Hubs. You only need a couple of SQL commands to set up a highly reliable CDC pipeline: CREATE PEER to make PeerDB aware of the Postgres source and the Event Hubs target (the peers), and CREATE MIRROR to kick off the replication job.
The Postgres-compatible SQL layer comes in very handy when managing replication to Azure Event Hubs from a fleet of Postgres databases across different tenants or microservices. You can script your pipelines in Python or any other language and manage them with any CI tool.
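As a sketch of what scripting a fleet of pipelines might look like, the Python snippet below generates one CREATE PEER and one CREATE MIRROR statement per tenant. The Tenant class, the peer and mirror naming scheme, and the target peer name eventhubs_peer are hypothetical. The WITH options and the TABLE MAPPING form are assumptions about PeerDB's SQL syntax (credentials and other options are omitted, and an Event Hubs target may expect a different destination form than the same-named pairs used here), so check the PeerDB documentation before relying on them. Because the SQL layer is Postgres-compatible, you could send each generated statement with any Postgres client.

```python
import re
from dataclasses import dataclass

# Conservative patterns so values can be embedded in SQL text safely.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")
_TABLE = re.compile(r"[a-z_][a-z0-9_]*\.[a-z_][a-z0-9_]*")  # schema.table
_LITERAL = re.compile(r"[A-Za-z0-9_.\-]+")  # hostnames, database names


def _check(pattern: re.Pattern, value: str) -> str:
    if not pattern.fullmatch(value):
        raise ValueError(f"refusing to embed {value!r} in SQL")
    return value


@dataclass(frozen=True)
class Tenant:
    """Hypothetical description of one tenant's Postgres database."""
    name: str
    host: str
    database: str
    tables: tuple[str, ...]  # schema-qualified, e.g. "public.orders"


def tenant_statements(tenant: Tenant, target_peer: str) -> list[str]:
    """Build CREATE PEER / CREATE MIRROR statements for one tenant.

    The WITH options and TABLE MAPPING form are assumptions about PeerDB's
    SQL layer; credentials and other options are omitted for brevity.
    """
    name = _check(_IDENT, tenant.name)
    source_peer = f"{name}_pg"
    tables = [_check(_TABLE, t) for t in tenant.tables]
    # Each source table maps to a same-named destination (assumed naming scheme).
    mapping = ", ".join(f"{t}:{t}" for t in tables)
    return [
        f"CREATE PEER {source_peer} FROM POSTGRES WITH ("
        f"host = '{_check(_LITERAL, tenant.host)}', "
        f"database = '{_check(_LITERAL, tenant.database)}');",
        f"CREATE MIRROR {name}_cdc FROM {source_peer} "
        f"TO {_check(_IDENT, target_peer)} WITH TABLE MAPPING ({mapping});",
    ]


if __name__ == "__main__":
    tenants = [
        Tenant("acme", "acme-db.internal", "app", ("public.orders",)),
        Tenant("globex", "globex-db.internal", "app", ("public.orders", "public.payments")),
    ]
    for tenant in tenants:
        for statement in tenant_statements(tenant, "eventhubs_peer"):
            print(statement)
```

Generating statements from data (rather than hand-writing them per tenant) is what makes it practical to review pipeline changes in a CI job before applying them.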
The following demo shows PeerDB in action, replicating data to Azure Event Hubs from a Postgres database that backs a multi-tenant SaaS app.
Blazing-fast performance with sub-second latency
Use cases that replicate from Postgres to Azure Event Hubs are often highly latency-sensitive. For instance, an IoT app that publishes raw changes to Event Hubs for alerting can only react as quickly as those changes arrive. PeerDB implements multiple optimizations to provide sub-second latency at high throughput (10K+ TPS). A few of the optimizations include:
Always consuming the logical replication slot
Parallel apply for Azure Event Hubs
Using native APIs (not the Kafka layer) to ingest into Azure Event Hubs
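To illustrate the idea behind parallel apply, here is a minimal sketch, not PeerDB's implementation: changes read from the replication slot are grouped by partition, and the groups are sent concurrently. Because every change for a given key hashes to the same group and keeps its stream order, per-key ordering survives the parallelism. The Change type, the partition_for hash, and the fake_send stand-in for an Event Hubs producer are assumptions for illustration; Event Hubs applies its own internal hashing when you send with a partition key.

```python
import hashlib
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class Change:
    lsn: int  # position in the logical replication stream
    key: str  # partition key value, e.g. a primary key
    payload: dict = field(default_factory=dict)


def partition_for(key: str, partition_count: int) -> int:
    # Illustrative stable hash. Event Hubs does its own internal hashing when
    # you send with a partition key; this is NOT that algorithm.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def parallel_apply(
    changes: list[Change],
    partition_count: int,
    send_batch: Callable[[int, list[Change]], None],
    max_workers: int = 4,
) -> Optional[int]:
    """Send one batch per partition concurrently; return the LSN safe to confirm.

    Changes are assumed to arrive in LSN order. Grouping by partition keeps
    changes for the same key in order while partitions proceed in parallel.
    """
    if not changes:
        return None
    groups: dict[int, list[Change]] = defaultdict(list)
    for change in changes:
        groups[partition_for(change.key, partition_count)].append(change)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(send_batch, p, batch) for p, batch in groups.items()]
        for future in futures:
            future.result()  # surface any send failure before confirming
    # Every batch was sent, so the whole window can be confirmed to the slot.
    return changes[-1].lsn


# Usage with a stand-in for an Event Hubs producer.
sent: dict[int, list[int]] = {}
lock = threading.Lock()


def fake_send(partition: int, batch: list[Change]) -> None:
    with lock:
        sent[partition] = [c.lsn for c in batch]


changes = [Change(lsn=i, key=f"user-{i % 3}") for i in range(1, 10)]
confirmed = parallel_apply(changes, partition_count=4, send_batch=fake_send)
print(confirmed, sent)
```

The design choice in this sketch is to confirm progress only after every partition's batch has succeeded, which is one way to keep consuming the slot continuously without acknowledging changes that have not yet landed.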
Highly Configurable - do almost anything you want!
PeerDB provides many knobs to control CDC behavior. While replicating data from Postgres to Azure Event Hubs, you can control data formats and transformations, security and isolation, and performance. A few of these options include:
Topics can be spread across namespaces and subscriptions: You can replicate data from multiple Postgres tables to Event Hubs spread across different namespaces and even Azure subscriptions. This guarantees isolation across topics, which can be critical in multi-tenant SaaS apps.
Define custom partition keys and partition counts per topic: To tune performance for each topic, you can define its own partition key and partition count.
Flatten JSON and JSONB columns: PeerDB can deep-flatten JSON and JSONB columns in Postgres into separate key-value pairs on Azure Event Hubs.
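To make JSON flattening concrete, the sketch below deep-flattens a JSONB value into a single level of key-value pairs. The key-naming scheme (nested keys joined with an underscore, array elements keyed by their index, the column name used as a prefix) is an assumption for illustration and may differ from what PeerDB produces.

```python
import json
from typing import Any


def flatten(value: Any, prefix: str = "", sep: str = "_") -> dict[str, Any]:
    """Deep-flatten nested JSON objects and arrays into one flat dict.

    Nested keys are joined with `sep` and array elements use their index.
    Empty objects and arrays produce no keys. The naming scheme is an
    assumption, not necessarily PeerDB's.
    """
    if isinstance(value, dict):
        items = [(str(k), v) for k, v in value.items()]
    elif isinstance(value, list):
        items = [(str(i), v) for i, v in enumerate(value)]
    else:
        return {prefix: value}  # scalar leaf: str, int, float, bool or None
    flat: dict[str, Any] = {}
    for key, child in items:
        path = f"{prefix}{sep}{key}" if prefix else key
        flat.update(flatten(child, path, sep))
    return flat


# A row whose JSONB column "attrs" arrives as a JSON string.
row = {"id": 42, "attrs": '{"device": {"os": "ios", "ver": 17}, "tags": ["a", "b"]}'}
message = {"id": row["id"], **flatten(json.loads(row["attrs"]), prefix="attrs")}
print(json.dumps(message))
# {"id": 42, "attrs_device_os": "ios", "attrs_device_ver": 17, "attrs_tags_0": "a", "attrs_tags_1": "b"}
```

Flattening like this lets downstream consumers filter or route on a nested field (here attrs_device_os) without parsing the JSON themselves.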
Enterprise-grade Security and Isolation
We designed the Azure Event Hubs connector specifically for enterprise customers. Below are a few of the security features PeerDB provides.
Guaranteed isolation across Azure Event Hubs topics: PeerDB can replicate data from multiple Postgres tables to separate topics spread across different namespaces and Azure subscriptions. This guarantees isolation across topics, which can be critical in multi-tenant SaaS apps where you provide a raw database feed to your customers.
PeerDB Enterprise Offering: For enterprise customers, PeerDB provides a self-hosted offering that comes with production-ready Helm charts and enterprise-grade support. This lets you deploy PeerDB on Azure Kubernetes Service (AKS) within your own VNet.
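One way to reason about the isolation guarantee is as a rule on the table-to-destination routing: no two tenants should write into the same Event Hubs namespace. The sketch below checks that rule before any pipeline is created. The Destination type, the routes layout, and the subscription and namespace names are all hypothetical, and PeerDB's own configuration may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Destination:
    """Hypothetical routing target for one replicated Postgres table."""
    subscription_id: str
    namespace: str
    event_hub: str


def check_isolation(routes: dict[str, dict[str, Destination]]) -> None:
    """Raise ValueError if two tenants would share an Event Hubs namespace.

    `routes` maps tenant -> {source table -> destination}.
    """
    owner: dict[tuple[str, str], str] = {}
    for tenant, tables in routes.items():
        for table, dest in tables.items():
            claimed_by = owner.setdefault((dest.subscription_id, dest.namespace), tenant)
            if claimed_by != tenant:
                raise ValueError(
                    f"{tenant}/{table} would share namespace "
                    f"{dest.namespace!r} with tenant {claimed_by!r}"
                )


routes = {
    "acme": {"public.orders": Destination("sub-acme", "acme-cdc", "orders")},
    "globex": {
        "public.orders": Destination("sub-globex", "globex-cdc", "orders"),
        "public.payments": Destination("sub-globex", "globex-cdc", "payments"),
    },
}
check_isolation(routes)  # passes: each tenant owns its own namespace
```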
Production-ready Observability
PeerDB UI: PeerDB comes with a comprehensive UI for monitoring replication jobs. You can monitor performance (throughput and latency), logs, and Postgres-native metrics such as replication slot size. You can also create alerts on these metrics and send them to channels such as email and Slack.
Integration with Azure Monitor: PeerDB Enterprise can run on Azure Kubernetes Service (AKS), which integrates out of the box with Azure Monitor for metrics, logs, and alerts.
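Replication slot size matters because a logical slot that is not being consumed forces Postgres to retain WAL, which can eventually fill the disk. As a sketch of the kind of check behind such an alert, the snippet below pairs a real PostgreSQL catalog query (the pg_replication_slots view with pg_wal_lsn_diff, PostgreSQL 10+) with a hypothetical threshold function. The slot names and the 10 GiB threshold are made up, and the query is not executed here; you would run it against the primary with any Postgres client and feed the rows to slot_alerts.

```python
from typing import Iterable, NamedTuple, Optional

# WAL each logical slot is holding back, in bytes (PostgreSQL 10+; run on the primary).
SLOT_SIZE_QUERY = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';
"""


class SlotStatus(NamedTuple):
    slot_name: str
    active: bool
    retained_bytes: Optional[int]  # NULL when restart_lsn is unset


def slot_alerts(rows: Iterable[SlotStatus], max_bytes: int) -> list[str]:
    """Return alert messages for slots that are inactive or retain too much WAL."""
    alerts = []
    for row in rows:
        if not row.active:
            alerts.append(f"{row.slot_name}: inactive, no consumer attached")
        if row.retained_bytes is not None and row.retained_bytes > max_bytes:
            gib = row.retained_bytes / 2**30
            alerts.append(f"{row.slot_name}: retaining {gib:.1f} GiB of WAL")
    return alerts


# Rows as they might come back from SLOT_SIZE_QUERY (hypothetical slot names).
rows = [
    SlotStatus("peerdb_slot_a", True, 512 * 2**20),
    SlotStatus("peerdb_slot_b", False, 12 * 2**30),
]
for alert in slot_alerts(rows, max_bytes=10 * 2**30):  # arbitrary 10 GiB threshold
    print(alert)
```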
Conclusion
We hope you enjoyed reading this blog! The Azure Event Hubs connector is already running in production for a few large-scale Azure customers using Postgres. If you are interested in trying it out, please reach out to us through the Contact Us form on our website.
We are actively working to extend similar support to other queues, including Kafka and Google Cloud Pub/Sub. If you are interested in previewing PeerDB for these queues, reach out to us through the Contact Us form. We also offer a 30-day free trial of PeerDB Cloud.