Maintained · Jan 2026 – Apr 2026

Dhaka Restaurant Directory

Mapping every restaurant in Dhaka I could find — pulled from open geographic data, cleaned, and shipped as a live directory.

Python
SQL
APIs
Data Analytics
Web Deployment

01 — Problem

Why this project

Dhaka has no central, searchable directory of restaurants. Google Maps is incomplete; Foodpanda only shows places it delivers from; restaurant-discovery apps are paywalled or limited to the wealthier neighbourhoods.

I wanted to know what was actually out there — by area, by cuisine, by price band. The directory had to be free, transparent about its sources, and fast.

02 — Approach

How I tackled it

01
Pull all `amenity=restaurant`, `amenity=cafe`, and `amenity=fast_food` nodes from OpenStreetMap inside a bounding box covering Greater Dhaka.
02
Deduplicate aggressively — OSM contributors often submit the same place under multiple node IDs, sometimes years apart, sometimes with different spellings of the same name in English and Bengali.
03
Normalise cuisine tags into a clean taxonomy (Bengali, Chinese, Continental, Thai, etc.) using a hand-curated mapping.
04
Enrich with neighbourhood and thana lookups using a separate OSM administrative-boundary query.
05
Ship as a static directory with search, filter, and a Folium-rendered map. Hosted on Vercel; rebuilds nightly.
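
Step 04 comes down to a point-in-polygon test. The sketch below uses ray casting in pure Python, assuming each thana boundary is a simple polygon given as a list of (lon, lat) vertices. The `find_thana` helper and the square test polygons are hypothetical; a real build would more likely lean on a geometry library.

```python
from typing import Optional

Point = tuple[float, float]   # (lon, lat)
Polygon = list[Point]


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray casting: count the edges crossed by a ray going east from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_thana(point: Point, thanas: dict[str, Polygon]) -> Optional[str]:
    """Return the first polygon name containing the point, or None."""
    for name, polygon in thanas.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Made-up unit squares, not real administrative boundaries
thanas = {
    "A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
print(find_thana((0.5, 0.5), thanas))
print(find_thana((1.5, 0.5), thanas))
print(find_thana((3.0, 3.0), thanas))
```

Points that fall outside every polygon return `None`, which is a useful signal that the bounding box reaches past the admin boundaries.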

03 — Data sources

Where the data came from

Source	Via	Rows
OpenStreetMap nodes	Overpass API	{{TODO: real count}}
OSM admin boundaries (Dhaka thanas)	Overpass API	~140 polygons
Cuisine taxonomy	Hand-curated CSV (50+ tags → 14 categories)	—
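
A minimal sketch of how a mapping CSV like this could be applied. OSM `cuisine` values are semicolon-separated, so each raw tag is mapped on its own. The column names (`raw_tag`, `category`), the sample rows, and the `Other` fallback are assumptions for illustration, not the real file.

```python
import csv
import io

# Illustrative rows only; the real CSV maps 50+ tags to 14 categories.
MAPPING_CSV = """raw_tag,category
bengali,Bengali
bangladeshi,Bengali
chinese,Chinese
thai,Thai
"""


def load_mapping(text: str) -> dict[str, str]:
    """Read raw_tag -> category pairs from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["raw_tag"].strip().lower(): row["category"].strip() for row in reader}


def normalise_cuisine(raw: str, mapping: dict[str, str], default: str = "Other") -> list[str]:
    """Split a semicolon-separated cuisine value and map each tag to a category."""
    categories: list[str] = []
    for tag in raw.split(";"):
        key = tag.strip().lower().replace(" ", "_")
        if not key:
            continue
        category = mapping.get(key, default)
        if category not in categories:  # keep first-seen order, drop repeats
            categories.append(category)
    return categories or [default]


mapping = load_mapping(MAPPING_CSV)
print(normalise_cuisine("Bengali; bangladeshi;thai", mapping))
print(normalise_cuisine("", mapping))
```

Logging the tags that hit the fallback is a cheap way to find gaps in the curated mapping.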

04 — Pipeline

End-to-end flow

01
Overpass query
amenity=restaurant|cafe|fast_food in Dhaka bbox
02
Raw JSON dump
stored locally for reproducibility
03
Deduplicate
Edit-distance name similarity + geo proximity within a 25 m radius
04
Normalise cuisine tags
via curated mapping CSV
05
Reverse-geocode thanas
point-in-polygon against admin boundaries
06
SQLite + GeoJSON
single artefact per build
07
Static site build
Folium map, Jinja2 list pages
08
Deploy to Vercel
nightly rebuild via cron
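
Step 06 could look roughly like the sketch below, which writes the same cleaned records to SQLite and to a GeoJSON FeatureCollection using only the standard library. The table name, field names, and the sample record are hypothetical.

```python
import json
import sqlite3

# Hypothetical cleaned record; field names are assumptions.
places = [
    {"osm_id": 1, "name": "Example Cafe", "cuisine": "Continental",
     "thana": "Example Thana", "lat": 23.78, "lon": 90.41},
]


def write_sqlite(conn, rows):
    """Create the table if needed and upsert rows keyed on osm_id."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS places (
               osm_id INTEGER PRIMARY KEY,
               name TEXT, cuisine TEXT, thana TEXT,
               lat REAL, lon REAL)"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO places VALUES "
        "(:osm_id, :name, :cuisine, :thana, :lat, :lon)",
        rows,
    )
    conn.commit()


def to_geojson(rows):
    """Build a FeatureCollection with one Point feature per row."""
    features = []
    for row in rows:
        props = {k: v for k, v in row.items() if k not in ("lat", "lon")}
        features.append({
            "type": "Feature",
            # GeoJSON coordinate order is [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [row["lon"], row["lat"]]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}


conn = sqlite3.connect(":memory:")  # a file path in a real build
write_sqlite(conn, places)
geojson = to_geojson(places)
print(conn.execute("SELECT COUNT(*) FROM places").fetchone()[0])
print(json.dumps(geojson)[:60])
```

Keying the table on the OSM node ID makes a nightly rebuild idempotent: re-running it replaces rows instead of duplicating them.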

05 — Code

A key snippet

Overpass query and dedupe heuristic

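import overpy
from geopy.distance import distance
from rapidfuzz import fuzz

api = overpy.Overpass()

# Selects by named area. A bounding-box filter (south,west,north,east) in
# place of (area.dhaka) would match the approach described above.
# The regex is anchored so values such as "internet_cafe" are not matched.
QUERY = """
[out:json][timeout:60];
area["name"="Dhaka"]->.dhaka;
(
  node["amenity"~"^(restaurant|cafe|fast_food)$"](area.dhaka);
);
out body;
"""

def is_duplicate(a, b, name_thresh=88, distance_m=25):
    """True when two nodes have similar names and sit within distance_m metres."""
    name_a, name_b = a.tags.get("name", ""), b.tags.get("name", "")
    if not name_a or not name_b:
        return False  # two empty names can score as a perfect match
    name_ok = fuzz.ratio(name_a, name_b) >= name_thresh
    # overpy exposes lat/lon as Decimal; cast to float before measuring
    geo_ok = distance((float(a.lat), float(a.lon)),
                      (float(b.lat), float(b.lon))).m <= distance_m
    return name_ok and geo_ok

nodes = api.query(QUERY).nodes
print(f"raw count: {len(nodes)}")
# ... pairwise dedupe (kdtree-bucketed in the real script) ...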
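
The pairwise step is elided above. As a rough illustration of the bucketing idea, the sketch below swaps the k-d tree for a uniform grid and swaps rapidfuzz and geopy for the standard library (`difflib`, a haversine formula), so it runs without dependencies. It is not the real script; `find_duplicate_pairs` and the sample rows are hypothetical.

```python
import math
from collections import defaultdict
from difflib import SequenceMatcher

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def name_score(a, b):
    """0-100 similarity; difflib stands in for rapidfuzz here."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_duplicate_pairs(places, name_thresh=88, distance_m=25):
    """places: list of (id, name, lat, lon). Returns sorted duplicate id pairs."""
    if not places:
        return []
    # Cells are at least distance_m wide, so any match lies in the same or an
    # adjacent cell. Not intended for data near the poles.
    cell_lat = distance_m / 111_000
    max_lat = max(abs(p[2]) for p in places)
    cell_lon = cell_lat / math.cos(math.radians(max_lat))

    grid = defaultdict(list)
    for p in places:
        grid[(math.floor(p[2] / cell_lat), math.floor(p[3] / cell_lon))].append(p)

    pairs = set()
    for (ci, cj), bucket in grid.items():
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                for a in bucket:
                    for b in grid.get((ci + di, cj + dj), ()):
                        # Compare each pair once; skip unnamed places
                        if a[0] >= b[0] or not a[1] or not b[1]:
                            continue
                        if (haversine_m(a[2], a[3], b[2], b[3]) <= distance_m
                                and name_score(a[1], b[1]) >= name_thresh):
                            pairs.add((a[0], b[0]))
    return sorted(pairs)


# Made-up rows: 1 and 2 are near-identical, 3 is far away, 4 has a different name
places = [
    (1, "Example Kabab House", 23.7500, 90.3900),
    (2, "Example Kebab House", 23.7501, 90.3900),
    (3, "Example Kabab House", 23.7600, 90.3900),
    (4, "Other Cafe", 23.7500, 90.3900),
]
print(find_duplicate_pairs(places))
```

Bucketing keeps the comparison count close to linear in the number of places, since each place is only compared against its own cell and the eight cells around it.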

06 — Results

What it shipped

Metric	Value
Restaurants in directory	{{TODO}}
Duplicate rate before dedupe	{{TODO}}%
Cuisine categories	14
Thanas covered	{{TODO}}
Build time end-to-end	{{TODO}} sec

Caveat: Coverage is uneven. Wealthier areas (Gulshan, Banani, Dhanmondi) are over-represented because OSM contribution density tracks tech adoption. Coverage of Old Dhaka is improving but still patchy.

07 — Lessons

What I learned

Eighty percent of the work in any open-data project is deduplication. The interesting part shows up only after that.
Bengali ↔ English name reconciliation needs a transliteration step, not just string-matching.
OSM is a fantastic free starting point, but relying on it means verifying a sample against ground truth. I spot-checked 50 entries on foot.
Shipping nightly was the right call — keeps the directory fresh without me touching it.
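
For the spot check, a seeded draw keeps the sample reproducible, so the same entries can be revisited later. This is a sketch under assumptions: `draw_spot_check`, the seed, and the ID range are hypothetical, and only the sample size of 50 comes from the text above.

```python
import random


def draw_spot_check(ids, k=50, seed=2026):
    """Reproducible simple random sample of up to k ids."""
    ordered = sorted(ids)        # fixed order, so the seed alone decides the sample
    rng = random.Random(seed)    # local generator; leaves global random state alone
    return rng.sample(ordered, min(k, len(ordered)))


sample = draw_spot_check(range(1, 1001))
print(len(sample))
```

A simple random sample inherits the directory's coverage skew, so sampling per thana instead would be one way to make sure sparse areas get checked too.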

08 — Links

References

Source on GitHub
