Native Replication of Postgres Geospatial Data

Native Replication of Postgres Geospatial Data

PostGIS extension enables you to efficiently store and query geospatial data in Postgres. It introduces spatial data types such as GEOMETRY and GEOGRAPHY, which lets you store points, polygons, lines and so on. PostGIS also supports a large set of SQL functions and spatial indexes to effectively manipulate spatial data.

The reason you chose PostGIS is that you want your geospatial workload to be treated as a first-class citizen. You should expect no less when it comes to replicating your geospatial database, instead of geospatial being an afterthought.

In this blog, we will explore what it takes to natively replicate PostGIS data from Postgres to other data stores.

Status Quo

We've overwhelmingly heard from customers that existing data movement tools are not reliable in replicating geospatial data from Postgres. Common issues include -

  1. Suboptimal Data Type Conversion - converting spatial types to STRING or JSON rather than using native spatial types supported by the target.

  2. Data Loss - skipping whole rows when encountering incompatible types leading to loss of data.

Replicating Geospatial data from Postgres

At PeerDB, we are building a specialized data-movement tool for Postgres. With that spirit, we took a step forward to add extensive geospatial support while replicating data from Postgres to Snowflake. This includes -

Demo of replicating Postgres Geospatial data incl. POINT, LINE, POLYGON etc of GEOMETRY and GEOGRAPHY types to Snowflake.

  1. Support for GEOMETRY and GEOGRAPHY data types: These types are stored as GEOMETRY and GEOGRAPHY types on Snowflake.

  2. Replicating Points, Lines and Polygons: A GEOMETRY or GEOGRAPHY in Postgres can be any of a Point, Line or Polygon. PeerDB handles replicating all these data types to Snowflake.
    Let's see how to ingest Point,Line and Polygon to a Postgres table:

          CREATE TABLE locations (
              id serial PRIMARY KEY,
              location_name VARCHAR(255),
              location_geometry GEOMETRY,
              location_geography GEOGRAPHY
          );
          -- Inserting a point with generic geometry and geography
          INSERT INTO locations VALUES (1,'Point A', 
              ST_GeomFromText('POINT(1 2)'), 
              ST_GeogFromText('POINT(1 2)'));
          -- Inserting a line with generic geometry and geography
          INSERT INTO locations VALUES (2, 'Line AB', 
              ST_GeomFromText('LINESTRING(1 2, 3 4, 5 6)'), 
              ST_GeogFromText('LINESTRING(1 2, 3 4, 5 6)'));
          -- Inserting a polygon with generic geometry and geography
          INSERT INTO locations VALUES (3, 'Polygon ABCD', 
              ST_GeomFromText('POLYGON((1 1, 2 1, 2 2, 1 2, 1 1))'), 
              ST_GeogFromText('POLYGON((1 1, 2 1, 2 2, 1 2, 1 1))'));
    
  3. Replicating GEOMETRYCOLLECTION: A geometry can represent a collection of Lines, Points and Polygons. PeerDB handles replicating GEOMETRYCOLLECTION from Postgres to Snowflake.
    Let's see how to insert a GEOMETRYCOLLECTION to a Postgres table:

          /*
          Inserting a GEOMETRYCOLLECTION into the generic 
          geometry column
          */
          INSERT INTO locations (location_name, 
          location_geometry, 
          location_geography)
          VALUES (
              'GeometryCollection',
              ST_GeomFromText('GEOMETRYCOLLECTION(POINT(1 2), 
                  LINESTRING(1 2, 3 4))'),
              ST_GeogFromText('GEOMETRYCOLLECTION(POINT(1 2), 
                  LINESTRING(1 2, 3 4))')
          );
    
  4. Error handling for invalid Geometries: A difference between Postgres and Snowflake is that Postgres lets you store invalid Geometries and Snowflake doesn't. For example, a line with a single point can be stored in Postgres but not in Snowflake. In such a scenario, while replicating data from Postgres, PeerDB captures the invalid data in logs and stores null for that column value on Snowflake.
    Let's see how an invalid GEOMETRY looks like in Postgres

          source=> SELECT ST_IsValid(ST_MakeLine('POINT(1 2)'::geometry));
          NOTICE:  Too few points in geometry component 
          at or near point 1 2
           st_isvalid
          ------------
           f
          (1 row)
    

Conclusion

We added extensive support to replicate geospatial data from Postgres to Snowflake. We will be expanding this to other data stores such as BigQuery and Clickhouse. Please reach out to us if you want this support for other data stores.

If you want to give PeerDB a try for replicating geospatial data from Postgres, these links should prove useful: :)

  1. Quickstart

  2. PeerDB's Github repo

  3. Join PeerDB's Slack community

  4. PeerDB docs