Updated on August 21, 2025

I moved the handlers to a dedicated package.

I added a publicly available token-creation endpoint with a 2-second rate limiter.

Updated on August 20, 2025

I moved all database interactions to a separate package. Each resource now has its own repository.

Updated on August 19, 2025

Refactoring is in progress. I'm moving database calls to a separate Go package.

Updated on August 18, 2025

I deployed the docs pages!

Additionally, I purchased a domain. The API is now available at api.worldlines.dev.

Updated on August 17, 2025

In a week, I’ll be leaving for a long holiday. Until then, I need to wrap up a few loose ends. I expect to finish a beta version of the GADM API before I go.

Remaining tasks:

  • Buy a domain and link it to the server
  • Create a static website for documentation
  • Add an endpoint for access token creation

Today, I started working on the documentation site. I’ll be using Hugo and GitHub Pages — a simple and fast setup that fits the project’s needs for now.

Updated on August 16, 2025

The main goal of the refactoring has been achieved — all GADM endpoints now rely on the new Postgres tables. Along the way, I removed a significant amount of code tied to the old tables. The next step in improving the codebase will be to further separate database logic from the main package, making the structure cleaner and easier to maintain.

Updated on August 15, 2025

I'm continuing to work on leveraging the adm and adm_geometries tables for all endpoints.

Updated on August 14, 2025

I found, fixed, and deployed a fix for the reverse-geocode SQL query bug.

I started refactoring the code – the first step is to extract database “stuff” from the main package. I'll probably take inspiration from the repository pattern from DDD.

I started working on leveraging the adm and adm_geometries tables for all existing endpoints. This is a required step before I dedupe the GADM information stored in the database.

Updated on August 11, 2025

Reverse geocode endpoint is deployed to production 🙂

Updated on August 10, 2025

I implemented the reverse geocode endpoint.

Once I deploy the new endpoint, I'll take some time to refactor the code. I need to better organize the packages and get rid of the duplicated GADM data.

Updated on August 9, 2025

Today I completed the setup for reverse geocoding. The structure is built around two tables:
  • adm – contains all GADM levels, but without their geometries.
  • adm_geometries – holds all unique GADM geometries.

The two tables are linked via a geometry hash. Original GADM properties are stored in the adm.metadata column as JSONB. The adm_geometries table also includes extra columns — area_sq_m and bbox — to speed up spatial queries.

The Query

Reverse geocoding will be implemented using ST_Contains.

My initial query looked like this:

EXPLAIN (ANALYZE, VERBOSE, BUFFERS, TIMING, WAL)
WITH input_point AS (
  SELECT ST_SetSRID(ST_MakePoint(2.3216141804549526, 48.85511355252443), 4326)::geometry(Point,4326) AS pt
),
ag AS (
  SELECT g.geom_hash, g.area_sq_m
  FROM gadm.adm_geometries AS g
  CROSS JOIN input_point ip
  WHERE ST_Contains(g.geom, ip.pt)
  ORDER BY g.area_sq_m ASC
  LIMIT 1
)
SELECT
  a.lv,
  a.metadata,
  a.geom_hash,
  ag.area_sq_m
FROM gadm.adm AS a
JOIN ag ON a.geom_hash = ag.geom_hash
ORDER BY a.lv DESC
LIMIT 1;

After running EXPLAIN ANALYZE, I found the query took about 170 ms — far too slow. The bottleneck was ST_Contains being called on geometries from all GADM levels. I had expected a point to match only three to five boundaries, depending on the number of levels in a given country. However, the analysis revealed that for my test point in Paris, ST_Contains was executed 15 times.

I needed a way to reduce the number of ST_Contains calls. But first, I wanted to measure how performance varied across different GADM levels. My assumption was that the country level would be the slowest, with each subsequent, smaller level executing faster.

The assumption proved correct — the country-level check was the slowest, at around 150 ms. Surprisingly, all levels below the country performed similarly, taking only about 2.5–5 ms each – not too bad!

This confirmed my initial plan: first, find all geometries whose bounding box contains the point. Then, order them by area and select the smallest one whose geometry actually contains the point.

WITH input_point AS (
  SELECT ST_SetSRID(ST_MakePoint(2.3216141804549526, 48.85511355252443), 4326)::geometry(Point,4326) AS pt
),
candidates AS (
  SELECT g.geom_hash, g.geom, g.area_sq_m
  FROM gadm.adm_geometries AS g
  CROSS JOIN input_point ip
  WHERE ST_Contains(g.bbox, ip.pt)          
  ORDER BY g.area_sq_m ASC
)
SELECT c.geom_hash, c.area_sq_m
FROM candidates c
CROSS JOIN input_point ip
WHERE ST_Contains(c.geom, ip.pt)
ORDER BY c.area_sq_m ASC
LIMIT 1;    

This change cut the execution time down to 5–10 ms, which is an acceptable result for the initial implementation. It’s a >10x improvement!

 Limit  (cost=2548.72..7448.94 rows=1 width=33) (actual time=3.869..3.930 rows=1 loops=1)
   Output: g.geom_hash, g.area_sq_m
   Buffers: shared hit=69
   CTE input_point
     ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.088..0.092 rows=1 loops=1)
           Output: '0101000020E6100000F4DE9774AA920240BA3A635C746D4840'::geometry(Point,4326)
   ->  Nested Loop  (cost=2548.71..7448.93 rows=1 width=33) (actual time=3.769..3.771 rows=1 loops=1)
         Output: g.geom_hash, g.area_sq_m
         Join Filter: st_contains(g.geom, ip.pt)
         Rows Removed by Join Filter: 2
         Buffers: shared hit=69
         ->  Sort  (cost=2548.71..2549.69 rows=391 width=11234) (actual time=3.432..3.434 rows=3 loops=1)
               Output: g.geom_hash, g.geom, g.area_sq_m
               Sort Key: g.area_sq_m
               Sort Method: quicksort  Memory: 32kB
               Buffers: shared hit=69
               ->  Nested Loop  (cost=0.28..648.88 rows=391 width=11234) (actual time=2.291..2.674 rows=15 loops=1)
                     Output: g.geom_hash, g.geom, g.area_sq_m
                     Buffers: shared hit=66
                     ->  CTE Scan on input_point ip_1  (cost=0.00..0.02 rows=1 width=32) (actual time=0.412..0.413 rows=1 loops=1)
                           Output: ip_1.pt
                     ->  Index Scan using idx_adm_geometries_bbox on gadm.adm_geometries g  (cost=0.28..648.47 rows=39 width=11354) (actual time=1.690..2.069 rows=15 loops=1)
                           Output: g.geom_hash, g.geom, g.bbox, g.area_sq_m
                           Index Cond: (g.bbox ~ ip_1.pt)
                           Filter: st_contains(g.bbox, ip_1.pt)
                           Buffers: shared hit=66
         ->  CTE Scan on input_point ip  (cost=0.00..0.02 rows=1 width=32) (actual time=0.002..0.002 rows=1 loops=3)
               Output: ip.pt
 Planning:
   Buffers: shared hit=250
 Planning Time: 17.373 ms
 Execution Time: 5.370 ms
(32 rows)

Edge cases

ST_Contains returns false for points lying exactly on a boundary. This is an issue I’ll need to address later. For now, I’m keeping the implementation simple by ensuring the query returns at most one result.

If I wanted to include boundary points, I’d need to use ST_Covers. However, this requires a different optimization approach, since the query could return multiple jurisdictions—for example, where three countries meet.

Other Optimization Ideas

Further improvements are possible by subdividing geometries into chunks of up to 256 points using the dedicated function ST_Subdivide.

Updated on August 8, 2025

I analyzed geometry duplication across GADM levels and found over 9,000 duplicate geometries. This duplication often occurs in small countries, such as the Vatican, or on islands. To verify whether geometries were truly duplicated, I compared their hashes, point counts, and areas — an approach that should rule out hash collisions.

-- =======================================================================
-- QUERY 3: Check for potential MD5 collisions by comparing geometry equality
-- =======================================================================
WITH all_geometries AS (
    SELECT 
        md5_geom_binary_hash as hash,
        'adm_0' as table_name,
        geom,
        gid_0 as gid,
        country as name
    FROM adm_0
    
    UNION ALL
    
    SELECT 
        md5_geom_binary_hash as hash,
        'adm_1' as table_name,
        geom,
        gid_1 as gid,
        name_1 as name
    FROM adm_1
    
    UNION ALL
    
    SELECT 
        md5_geom_binary_hash as hash,
        'adm_2' as table_name,
        geom,
        gid_2 as gid,
        name_2 as name
    FROM adm_2
    
    UNION ALL
    
    SELECT 
        md5_geom_binary_hash as hash,
        'adm_3' as table_name,
        geom,
        gid_3 as gid,
        name_3 as name
    FROM adm_3

    UNION ALL
    
    SELECT 
        md5_geom_binary_hash as hash,
        'adm_4' as table_name,
        geom,
        gid_4 as gid,
        name_4 as name
    FROM adm_4

    UNION ALL
    
    SELECT 
        md5_geom_binary_hash as hash,
        'adm_5' as table_name,
        geom,
        gid_5 as gid,
        name_5 as name
    FROM adm_5
    -- Add other tables as needed for comprehensive check
),
hash_groups AS (
    SELECT 
        hash,
        COUNT(*) as count,
        array_agg(DISTINCT table_name) as tables,
        array_agg(name) as names
    FROM all_geometries
    GROUP BY hash
    HAVING COUNT(*) > 1
),
collision_check AS (
    SELECT 
        hg.hash,
        hg.count,
        hg.tables,
        hg.names,
        -- Check if all geometries with same hash are actually equal
        COUNT(DISTINCT ST_AsText(ag.geom)) as unique_geometries_count
    FROM hash_groups hg
    JOIN all_geometries ag ON hg.hash = ag.hash
    GROUP BY hg.hash, hg.count, hg.tables, hg.names
)
SELECT 
    hash,
    count as duplicate_count,
    tables,
    names,
    unique_geometries_count,
    CASE 
        WHEN unique_geometries_count = 1 THEN 'TRUE DUPLICATE'
        ELSE 'POTENTIAL MD5 COLLISION!'
    END as analysis
FROM collision_check
ORDER BY unique_geometries_count DESC, duplicate_count DESC;

An unexpected benefit of this situation is that I can store all geometries in one place, remove duplicates, and save space. One option is to maintain a single geometries table with a one-to-many relationship to the original GADM tables. However, this makes it harder to map back from adm_geometries to the lowest-level adm_<nr> tables.

My next experiment is to merge all adm_ tables into a single table, with metadata stored in a JSON/JSONB column.

table: adm
- lv: int
- metadata: JSON
- geometry_hash: text // md5 hash

table: adm_geometries
- geometry_hash: text // md5 hash
- geom: GEOMETRY(MULTIPOLYGON)

That’s it for today!

Updated on August 6, 2025

I started working on the reverse geocoding endpoint. I decided to store the geometries in a single table, adm_geometries. This should help optimize the ST_Within query across GADM levels. The id of a geometry will be calculated as the md5 hash of its shape.

                      Table "gadm.adm_geometries"
 Column |            Type             | Collation | Nullable | Default 
--------+-----------------------------+-----------+----------+---------
 id     | text                        |           | not null | 
 geom   | geometry(MultiPolygon,4326) |           | not null | 
Indexes:
    "adm_geometries_pkey" PRIMARY KEY, btree (id)
    "idx_adm_geometries_geom" gist (geom)
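For illustration, the md5-based id could also be computed client-side from the geometry's binary representation. A minimal Go sketch, with placeholder bytes standing in for real WKB (in practice the hash might just as well be computed in SQL, e.g. from ST_AsBinary(geom)):

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// GeomHash returns the hex-encoded md5 hash of a geometry's binary
// representation, matching the text primary key of adm_geometries.
// The wkb argument is assumed to hold the geometry's WKB bytes.
func GeomHash(wkb []byte) string {
	sum := md5.Sum(wkb)
	return hex.EncodeToString(sum[:])
}

func main() {
	// Placeholder bytes standing in for real WKB output.
	a := GeomHash([]byte("geometry-a"))
	b := GeomHash([]byte("geometry-a"))
	c := GeomHash([]byte("geometry-b"))
	fmt.Println(len(a), a == b, a == c) // 32 true false
}
```

The hash is deterministic, so identical shapes collapse to the same id – which is exactly how the duplicates below surfaced.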

However, I encountered a problem – there are duplicate md5 hashes across GADM levels. This is unexpected! Does it mean that certain states fully overlap with their countries?

I will figure it out tomorrow … hopefully

Updated on August 5, 2025

I polished and refined the Docker Compose files. Additionally, I tested that “up” migrations work as intended, and I added the capability to run “down” migrations.

Updated on August 4, 2025

Hooray!

I finally deployed the first version of my GADM API service. It's very rough at the moment, and the deployment process is semi-automated: I can initialize a new Ubuntu VM, launch the dockerized server and database, ingest the GADM geopackage, and run migrations with just a few manual interventions.

The bash scripts I wrote will help me automate the process even further with a GitHub CI/CD pipeline.

At the moment, I'm happy that I can ping my server from my local machine and confirm that pagination and filtering work as intended for all GADM levels!