Adapting Large Language Models for Emergency Dispatch in Togo Using Local Infrastructure Data
yeye is a framework for constructing geospatially grounded instruction-tuning datasets from real infrastructure data and using them to fine-tune LLMs for emergency dispatch decision support in low-resource settings.
Submitted to Deep Learning Indaba 2026 — AI for Social Impact and Sustainable Systems track.
⚠️ Research prototype only. yeye must not be deployed in any operational emergency dispatch context. See Limitations.
Understaffed emergency call centers in countries like Togo cannot guarantee a trained human operator for every incoming call. Off-the-shelf LLMs like Mistral-7B have no knowledge of Togo's health facilities, road networks, or resource constraints — this information is too sparsely documented online for models to acquire from pretraining alone.
yeye addresses this by:
- Extracting infrastructure data from OpenStreetMap and UN OCHA Humanitarian Data Exchange
- Programmatically generating 1,000 instruction-tuning pairs spanning 12 emergency types across 15 Togo cities
- Fine-tuning Mistral-7B via LoRA (QLoRA 4-bit) on a single T4 GPU
- Evaluating on a blind test set where all infrastructure context is withheld from prompts
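The generation and blind-evaluation steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's generator: the facility records, the emergency-to-category mapping, the prompt wording, and the assumption that training prompts carry a context line while test prompts do not are all hypothetical.

```python
import random

# Hypothetical placeholder records; the real dataset has 1,903 facilities.
FACILITIES = [
    {"name": "Example Hospital A", "category": "hospital", "city": "Lomé"},
    {"name": "Example Fire Station B", "category": "fire_station", "city": "Kara"},
    {"name": "Example Police Post C", "category": "police", "city": "Sokodé"},
]

# Hypothetical mapping from emergency type to the resource category it needs.
REQUIRED_CATEGORY = {
    "cardiac arrest": "hospital",
    "gas leak": "fire_station",
    "armed robbery": "police",
}


def make_pair(rng, include_context):
    """Build one instruction/response pair grounded in a matching facility."""
    emergency = rng.choice(sorted(REQUIRED_CATEGORY))
    candidates = [f for f in FACILITIES
                  if f["category"] == REQUIRED_CATEGORY[emergency]]
    facility = rng.choice(candidates)
    instruction = (f"Emergency call: {emergency} reported in "
                   f"{facility['city']}. Recommend a dispatch action.")
    context = f"Nearby facility: {facility['name']} ({facility['category']})."
    # Blind prompts omit the infrastructure context entirely.
    prompt = f"{context}\n{instruction}" if include_context else instruction
    response = f"Dispatch from {facility['name']} in {facility['city']}."
    return {"prompt": prompt, "response": response}


def build_splits(seed=0, sizes=(800, 100, 100)):
    """Generate train/validation/test sets; only the test set is blind."""
    rng = random.Random(seed)
    n_train, n_val, n_test = sizes
    train = [make_pair(rng, include_context=True) for _ in range(n_train)]
    val = [make_pair(rng, include_context=True) for _ in range(n_val)]
    test = [make_pair(rng, include_context=False) for _ in range(n_test)]
    return train, val, test
```

Choosing the facility by required category keeps the reference responses from pairing an emergency with the wrong resource type.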
| Metric | Base Mistral-7B | yeye | Delta |
|---|---|---|---|
| Combined score | 73.8% | 79.6% | +5.8pp |
| Facility grounding | 67.5% | 88.5% | +21.0pp |
| Exact facility recall | 38.0% | 87.0% | +49.0pp |
| Win rate | 34/100 | 60/100 | McNemar p=0.007 |
The headline result: exact facility name recall improves from 38% to 87% under blind evaluation — a 49 percentage-point gain (95% CI: +37.4 to +60.6 percentage points).
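The method behind the interval is not stated here. As a rough check, a Wald interval for the difference of two independent proportions (an assumption about the method) reproduces the reported bounds from the 38/100 and 87/100 counts:

```python
import math


def diff_ci(k_base, k_tuned, n, z=1.96):
    """Wald CI for the difference of two proportions, each measured on n items."""
    p0, p1 = k_base / n, k_tuned / n
    se = math.sqrt(p0 * (1 - p0) / n + p1 * (1 - p1) / n)
    diff = p1 - p0
    return diff, diff - z * se, diff + z * se


diff, lo, hi = diff_ci(38, 87, 100)
print(f"{diff * 100:+.1f}pp (95% CI: {lo * 100:+.1f} to {hi * 100:+.1f})")
```

Because both models answer the same 100 test prompts, a paired interval would be a reasonable alternative; the unpaired form is used here only because it matches the published figures.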
The infrastructure dataset contains 1,903 facilities extracted from Togo:
| Category | Count | Named |
|---|---|---|
| Hospitals and clinics | 695 | 648 (93%) |
| Fuel stations | 589 | 446 (76%) |
| Pharmacies | 382 | 340 (89%) |
| Police stations | 222 | 176 (79%) |
| Fire stations | 15 | 13 (87%) |
Data sources:
- OpenStreetMap via Overpass API (bounding box: 6.08°N–11.15°N, 0.15°W–1.80°E)
- UN OCHA Humanitarian Data Exchange (validated health facility dataset)
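For the OpenStreetMap side, a query for the bounding box above might be built along these lines. The amenity tag list is an assumption about how the five facility categories map to OSM tags, and the notebook's actual query may differ; the public Overpass endpoint shown is the commonly used one.

```python
import json
import urllib.parse
import urllib.request

# Bounding box from the text, in Overpass order: south, west, north, east.
TOGO_BBOX = (6.08, -0.15, 11.15, 1.80)

# Assumed OSM amenity values for the five facility categories.
AMENITIES = ["hospital", "clinic", "fuel", "pharmacy", "police", "fire_station"]

OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_overpass_query(bbox=TOGO_BBOX, amenities=AMENITIES, timeout=180):
    """Return an Overpass QL query for nodes, ways, and relations in the bbox."""
    south, west, north, east = bbox
    box = f"({south},{west},{north},{east})"
    pattern = "|".join(amenities)
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'nwr["amenity"~"^({pattern})$"]{box};\n'
        "out center;"
    )


def fetch_facilities(query):
    """POST the query to the Overpass API and return the decoded elements."""
    data = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(OVERPASS_URL, data=data) as resp:
        return json.load(resp)["elements"]
```

`out center;` asks Overpass to attach a single coordinate to ways and relations, which simplifies distance-based processing later on.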
Deduplication: OSM and HDX records that lie within 50 m of each other and share the same category are merged.
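A minimal sketch of this merge rule, assuming each record is a dict with `lat`, `lon`, `category`, and an optional `name` (hypothetical field names). Which source's fields win on a merge is also an assumption; here the OSM record is kept and a missing name is filled from HDX.

```python
import math

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def merge_sources(osm, hdx, radius_m=50.0):
    """Merge HDX records into OSM records of the same category within radius_m."""
    merged = [dict(r, sources=["osm"]) for r in osm]
    for h in hdx:
        match = next(
            (m for m in merged
             if m["category"] == h["category"]
             and haversine_m(m["lat"], m["lon"], h["lat"], h["lon"]) <= radius_m),
            None,
        )
        if match is None:
            merged.append(dict(h, sources=["hdx"]))
        else:
            match["sources"].append("hdx")
            # Keep the OSM name if present, otherwise take the HDX name.
            match["name"] = match.get("name") or h.get("name")
    return merged


osm = [{"lat": 6.1300, "lon": 1.2200, "category": "hospital", "name": None}]
hdx = [
    {"lat": 6.1302, "lon": 1.2201, "category": "hospital", "name": "Example Clinic"},
    {"lat": 6.1400, "lon": 1.2200, "category": "hospital", "name": "Example Hospital"},
]
merged = merge_sources(osm, hdx)
```

The linear scan is quadratic in the number of records, which is acceptable at this dataset's size; a spatial index would be the usual choice for larger extracts.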
    OpenStreetMap ───┐
                     ├──→ Pair Generator ──→ LoRA Fine-Tuning ──→ Blind Evaluation ──→ Results
    UN OCHA HDX ─────┤    1,000 pairs        r=16, α=32           Context withheld
                     │    800/100/100        QLoRA 4-bit          n=100
    Scenario Params ─┘    12 types           T4 GPU
- Base model: Mistral-7B-Instruct-v0.2
- Method: QLoRA (LoRA adapters trained on a 4-bit, NF4-quantized base model)
- LoRA targets: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Rank: r=16, α=32, dropout=0.05
- Trainable parameters: ~0.4% of full model
- Optimizer: AdamW, lr=2×10⁻⁴, cosine decay, 5% warmup
- Hardware: Single NVIDIA T4 GPU (Google Colab Pro)
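The hyperparameters above map onto the `peft` and `transformers` configuration objects roughly as follows. This is a sketch, not the notebook's code: values marked as assumptions (compute dtype, bias handling, optimizer name, output directory) are not stated in this README, and epochs and batch size are left at library defaults because they are not given.

```python
# Keyword arguments for peft.LoraConfig, taken from the list above.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "bias": "none",           # assumption: bias terms are not trained
    "task_type": "CAUSAL_LM",
}

# Keyword arguments for transformers.BitsAndBytesConfig (4-bit NF4).
BNB_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_compute_dtype": "float16",  # assumption: T4 lacks bfloat16
}

# Keyword arguments for transformers.TrainingArguments.
TRAINING_KWARGS = {
    "output_dir": "yeye-lora",     # hypothetical path
    "learning_rate": 2e-4,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,
    "optim": "adamw_torch",        # assumption: plain AdamW variant
    "fp16": True,                  # assumption: mixed precision on T4
}


def build_configs():
    """Instantiate the config objects; requires peft and transformers."""
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig, TrainingArguments

    return (
        LoraConfig(**LORA_KWARGS),
        BitsAndBytesConfig(**BNB_KWARGS),
        TrainingArguments(**TRAINING_KWARGS),
    )
```

Keeping the values in plain dictionaries makes them easy to log alongside results; `build_configs()` is only needed inside the training environment.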
Scoring uses three weighted components:
- Facility grounding (50%): Exact name match from 1,903-entry dataset → 1.0; city/location token only → 0.5; no reference → 0.0
- Distance/ETA plausibility (30%): Values checked against ±50% of ground truth
- Rubric completeness (20%): Severity acknowledgment, dispatch action, constraint reporting
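One way the three components could combine into a single score is sketched below. The weights and the 1.0/0.5/0.0 grounding levels come from the list above; the substring matching, the binary distance check, and the equal weighting of rubric items are assumptions, and extracting the stated distance from the response text is not shown.

```python
WEIGHTS = {"grounding": 0.5, "distance": 0.3, "rubric": 0.2}


def grounding_score(response, facility_names, city_names):
    """1.0 for a known facility name, 0.5 for a city mention only, else 0.0."""
    text = response.lower()
    if any(name.lower() in text for name in facility_names):
        return 1.0
    if any(city.lower() in text for city in city_names):
        return 0.5
    return 0.0


def distance_score(stated, truth):
    """1.0 if the stated value is within ±50% of ground truth, else 0.0."""
    if stated is None:
        return 0.0
    return 1.0 if abs(stated - truth) <= 0.5 * truth else 0.0


def rubric_score(checks):
    """Fraction of rubric items (severity, dispatch action, constraints) met."""
    return sum(checks) / len(checks)


def combined_score(grounding, distance, rubric):
    """Weighted sum of the three component scores."""
    return (WEIGHTS["grounding"] * grounding
            + WEIGHTS["distance"] * distance
            + WEIGHTS["rubric"] * rubric)


response = "Dispatch an ambulance from Example Hospital A, about 6 km away."
g = grounding_score(response, ["Example Hospital A"], ["Lomé"])
d = distance_score(stated=6.0, truth=5.0)
r = rubric_score([True, True, False])
score = combined_score(g, d, r)
```

In this hypothetical case the response names a known facility and states a plausible distance but misses one rubric item, so it scores about 0.93.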
Statistical significance is tested with McNemar's test on the win/loss counts.
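The exact variant of the test is not specified here. As an illustration, treating the 34 base-model wins and 60 yeye wins as the discordant pairs (the remaining 6 prompts being ties) and using the chi-square form without continuity correction gives p ≈ 0.007; both choices are assumptions.

```python
import math


def mcnemar_p(b, c):
    """Uncorrected McNemar chi-square statistic and its two-sided p-value."""
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = mcnemar_p(34, 60)
print(f"chi2={chi2:.2f}, p={p:.3f}")
```

With the continuity correction or the exact binomial form the p-value is slightly larger, so the variant used is worth stating alongside the result.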
- Python 3.10+
- Google Colab Pro (T4 or A100 GPU) or equivalent
- HuggingFace account with Mistral-7B access
    pip install transformers peft bitsandbytes datasets accelerate

Open yeye.ipynb in Google Colab and run all cells sequentially. The notebook handles:
- Infrastructure data extraction from OSM/HDX
- Instruction-tuning pair generation
- LoRA fine-tuning on Mistral-7B
- Blind evaluation and scoring
- Statistical testing and visualization
- Evaluation circularity: Training and test sets are generated by the same program — the model may have learned generator-specific patterns rather than geographic knowledge.
- No human evaluation: Scoring dimensions are automated proxies; correspondence to expert dispatcher judgment is unknown.
- Synthetic data only: No real Togo emergency call transcripts were used.
- Language gap: All prompts are in English; operational dispatch in Togo uses French, Ewe, Kabiyè, and other local languages.
- Resource-type confusion: The model sometimes dispatches the wrong resource type (e.g., police to a gas leak) — a safety-critical failure mode.
- No RAG baseline: Retrieval-augmented generation was not compared.
- Single country, single model: Generalizability is assumed but not demonstrated.
| Source | License |
|---|---|
| OpenStreetMap | ODbL (Open Database License) |
| UN OCHA HDX | CC BY (Creative Commons Attribution) |
No personal data, emergency call recordings, or protected health information was used.
    @inproceedings{yeye2026,
      title     = {yeye: Adapting Large Language Models for Emergency Dispatch in Togo Using Local Infrastructure Data},
      author    = {[Author]},
      booktitle = {Deep Learning Indaba 2026},
      year      = {2026}
    }

This project is released for research purposes. See LICENSE for details.