# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Traverse is a semantic search tool for OpenStreetMap tags. It converts French natural-language queries ("où manger", *where to eat*; "parking vélo", *bike parking*) into OSM tags (`amenity=restaurant`, `amenity=bicycle_parking`).
## Commands
Build the search indexes (requires a GPU, selected via `switcherooctl`):

```bash
switcherooctl launch uv run create-index.py
```

Run interactive search:

```bash
switcherooctl launch uv run search.py
```

Run the evaluation (98.4% recall on 100 test cases):

```bash
switcherooctl launch uv run test/evaluate.py
```
## Architecture
Two-stage retrieval pipeline built from pure, interchangeable functions:

```python
from utils.prepare import prepare
from utils.embedding_search import search
from utils.rerank_with_crossencoder import rerank
candidates, search_settings, rerank_settings = prepare()
results = search(query, candidates, search_settings)  # list[Candidate] → list[Candidate]
reranked = rerank(query, results, rerank_settings)    # list[Candidate] → list[Candidate]
```
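Because every stage has the same shape, (query, candidates, settings) → candidates, stages can be chained or swapped without touching each other. The sketch below illustrates that idea under stated assumptions: `Candidate` is a simplified stand-in with only a few fields, settings are plain dicts, and `keyword_search`, `identity_rerank`, and `run_pipeline` are hypothetical names, not functions from this repository. The toy word-overlap scorer only stands in for the real embedding search.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    # Simplified, hypothetical subset of the real Candidate fields
    tag: str
    description_fr: str
    score: float = 0.0


# A stage maps (query, candidates, settings) to a new list of candidates.
Stage = Callable[[str, list[Candidate], dict], list[Candidate]]


def keyword_search(query: str, candidates: list[Candidate], settings: dict) -> list[Candidate]:
    # Hypothetical stand-in for embedding search: score = number of shared words.
    words = set(query.lower().split())
    scored = [
        replace(c, score=float(len(words & set(c.description_fr.lower().split()))))
        for c in candidates
    ]
    scored.sort(key=lambda c: c.score, reverse=True)
    return scored[: settings.get("top_k", 10)]


def identity_rerank(query: str, candidates: list[Candidate], settings: dict) -> list[Candidate]:
    # Hypothetical no-op reranker; returns a new list and leaves the input untouched.
    return list(candidates)


def run_pipeline(query: str, candidates: list[Candidate], stages: list[tuple[Stage, dict]]) -> list[Candidate]:
    # Feed each stage's output into the next stage.
    results = candidates
    for stage, settings in stages:
        results = stage(query, results, settings)
    return results


catalog = [
    Candidate("amenity=restaurant", "lieu pour manger un repas"),
    Candidate("amenity=bicycle_parking", "parking pour vélo"),
]
top = run_pipeline("parking vélo", catalog, [(keyword_search, {"top_k": 1}), (identity_rerank, {})])
print(top[0].tag)  # amenity=bicycle_parking
```

Since stages never mutate their inputs (`dataclasses.replace` builds new objects), swapping `identity_rerank` for a real reranker cannot corrupt the candidate list that the search stage produced.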
- Embedding search (`utils/embedding_search.py`): uses `intfloat/multilingual-e5-base` with `query:`/`passage:` prefixes. Searches both the POI and the attribute FAISS indexes and returns the top candidates.
- Cross-encoder reranking (`utils/rerank_with_crossencoder.py`): uses `Qwen/Qwen3-Reranker-0.6B` (an LLM-based yes/no reranker) on CUDA. Splits the results into popular tags (usage >= 10k) and niche tags, and returns the top 5 of each.
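The popular/niche split in the reranking stage can be sketched as follows. This assumes the Qwen3 reranker has already written its relevance scores into `score` (the model call itself is omitted) and uses the `usage >= 10k` threshold and top-5 limit described above. `split_popular_niche` is a hypothetical helper, and returning the two groups as a tuple is an assumption; how the real `rerank()` combines them is not specified here. The usage counts in the example are illustrative, not real OSM statistics.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical subset of the real Candidate fields
    tag: str
    usage_count: int
    score: float  # assumed to already hold the reranker's relevance score


POPULAR_THRESHOLD = 10_000  # "usage >= 10k"
TOP_N = 5  # top 5 of each group


def split_popular_niche(candidates: list[Candidate]) -> tuple[list[Candidate], list[Candidate]]:
    # Sort once by reranker score (highest first), then partition by tag usage.
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    popular = [c for c in ranked if c.usage_count >= POPULAR_THRESHOLD]
    niche = [c for c in ranked if c.usage_count < POPULAR_THRESHOLD]
    return popular[:TOP_N], niche[:TOP_N]


# Illustrative values only, not real OSM usage counts or model scores
cands = [
    Candidate("amenity=restaurant", 1_500_000, 0.91),
    Candidate("amenity=fast_food", 600_000, 0.75),
    Candidate("cuisine=crepe", 8_000, 0.62),
]
popular, niche = split_popular_niche(cands)
print([c.tag for c in popular])  # ['amenity=restaurant', 'amenity=fast_food']
print([c.tag for c in niche])    # ['cuisine=crepe']
```

Keeping a separate niche bucket means rare but relevant tags are not crowded out by very common tags that score slightly higher.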
## Core Types
The `Candidate` dataclass (`utils/types/__init__.py`) is used everywhere. Its fields: `tag`, `description_fr`, `description_natural`, `category`, `usage_count`, `score`.
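A sketch of what `Candidate` might look like. The field names come from the list above; the types, the `score` default, and the `"poi"` category value are assumptions. The example also shows how a stage can return an updated copy with `dataclasses.replace` instead of mutating its input, which fits the pure-function pipeline.

```python
from dataclasses import dataclass, replace


@dataclass
class Candidate:
    tag: str                  # OSM tag, e.g. "amenity=restaurant"
    description_fr: str       # French description from the OSM wiki data
    description_natural: str  # natural French description (generated by Mistral Large)
    category: str             # assumed values: "poi" or "attribute"
    usage_count: int          # tag usage in OSM (type assumed)
    score: float = 0.0        # relevance score set by search/rerank (default assumed)


c = Candidate(
    tag="amenity=bicycle_parking",
    description_fr="Parking pour vélos",
    description_natural="Un endroit où garer son vélo",
    category="poi",
    usage_count=0,  # placeholder, not a real count
)
# A stage returns a new object with an updated score; the original is unchanged.
rescored = replace(c, score=0.87)
print(c.score, rescored.score)  # 0.0 0.87
```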
## Data Flow
- `data/osm_wiki_tags_cleaned.json`: source data with OSM tags, French descriptions, and enriched descriptions
- `data/osm_wiki_tags_natural_desc.json`: natural French descriptions generated by Mistral Large
- `create-index.py`: generates separate FAISS indexes for the POI and attribute categories
- `data/poi.index`, `data/attributes.index`: FAISS vector indexes
- `utils/prepare.py`: startup functions `load_candidates()`, `load_search_settings()`, `load_rerank_settings()`, `prepare()`
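Indexing keeps one index per category, while search queries all of them with e5's asymmetric `query:`/`passage:` prefixes and merges the hits. The sketch below mirrors only that shape in plain Python. It is not how `create-index.py` works internally: a hypothetical bag-of-words `embed()` stands in for `intfloat/multilingual-e5-base`, a brute-force cosine scan stands in for FAISS, and `build_indexes`/`search_all` are illustrative names.

```python
import math
from collections import Counter

QUERY_PREFIX = "query: "      # e5 convention for queries
PASSAGE_PREFIX = "passage: "  # e5 convention for indexed documents


def embed(text: str) -> Counter:
    # Hypothetical stand-in for the e5 encoder: bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_indexes(tags: dict[str, tuple[str, str]]) -> dict[str, list[tuple[str, Counter]]]:
    # tags maps tag -> (category, description); one brute-force "index" per category.
    indexes: dict[str, list[tuple[str, Counter]]] = {"poi": [], "attribute": []}
    for tag, (category, description) in tags.items():
        indexes[category].append((tag, embed(PASSAGE_PREFIX + description)))
    return indexes


def search_all(query: str, indexes: dict[str, list[tuple[str, Counter]]], k: int = 3) -> list[tuple[str, float]]:
    # Query every category index and merge the hits by score.
    q = embed(QUERY_PREFIX + query)
    hits = [(tag, cosine(q, vec)) for index in indexes.values() for tag, vec in index]
    return sorted(hits, key=lambda h: h[1], reverse=True)[:k]


tags = {
    "amenity=restaurant": ("poi", "restaurant où manger un repas"),
    "amenity=bicycle_parking": ("poi", "parking pour vélo"),
    "wheelchair=yes": ("attribute", "accessible en fauteuil roulant"),
}
indexes = build_indexes(tags)
print(search_all("parking vélo", indexes, k=1))
```

Separate indexes let each category be rebuilt or searched on its own, while merging by score keeps a single ranked list for the reranker.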
## Tag Categories
- POI: Points of interest (restaurants, shops, etc.)
- Attributes: Characteristics (cuisine type, wheelchair access, etc.)
## Key Files
- `utils/types/__init__.py`: `Candidate` dataclass
- `utils/prepare.py`: data and model loading functions
- `utils/embedding_search.py`: `search()` (embedding search)
- `utils/rerank_with_crossencoder.py`: `rerank()` (cross-encoder reranking)
- `test/evaluate.py`: evaluation script with recall/MRR metrics
- `data/search_cases.json`: test cases for evaluation
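A minimal sketch of the two metrics that `test/evaluate.py` reports. It assumes each test case pairs a set of expected tags with the pipeline's ranked list of returned tags; the case format, the cutoff `k`, and the helper names are hypothetical, and whether the real script computes recall per case or per expected tag is not stated here.

```python
def recall_at_k(expected: set[str], ranked: list[str], k: int) -> float:
    # Fraction of the expected tags that appear in the top-k results.
    if not expected:
        return 0.0
    return len(expected & set(ranked[:k])) / len(expected)


def reciprocal_rank(expected: set[str], ranked: list[str]) -> float:
    # 1 / (rank of the first expected tag), or 0 if none is returned.
    for rank, tag in enumerate(ranked, start=1):
        if tag in expected:
            return 1.0 / rank
    return 0.0


def evaluate(cases: list[tuple[set[str], list[str]]], k: int = 10) -> tuple[float, float]:
    # Average recall@k and mean reciprocal rank (MRR) over all cases.
    n = len(cases)
    recall = sum(recall_at_k(e, r, k) for e, r in cases) / n
    mrr = sum(reciprocal_rank(e, r) for e, r in cases) / n
    return recall, mrr


cases = [
    ({"amenity=restaurant"}, ["amenity=fast_food", "amenity=restaurant"]),
    ({"amenity=bicycle_parking"}, ["amenity=bicycle_parking"]),
]
print(evaluate(cases))  # (1.0, 0.75)
```

Recall only checks whether the right tags appear at all, while MRR also rewards ranking them first, which is why the two can diverge.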
## Future Work
- REST API (FastAPI): planned
- Automatic POI/attribute detection: tested; heuristics performed best (87%)
- Query expansion with an LLM: implemented, but adds latency without improving recall