CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Traverse is a semantic search tool for OpenStreetMap tags. It converts French natural language queries ("où manger", "parking vélo") into OSM tags (amenity=restaurant, amenity=bicycle_parking).

Commands

Build search indexes (requires GPU via switcherooctl):

switcherooctl launch uv run create-index.py

Run interactive search:

switcherooctl launch uv run search.py

Run evaluation (98.4% recall on 100 test cases):

switcherooctl launch uv run test/evaluate.py

Architecture

Two-stage retrieval pipeline with pure, interchangeable functions:

candidates, search_settings, rerank_settings = prepare()
results = search(query, candidates, search_settings)    # list[Candidate] → list[Candidate]
reranked = rerank(query, results, rerank_settings)       # list[Candidate] → list[Candidate]
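
Because both stages share the shape (query, candidates, settings) → list[Candidate], they can be chained or swapped freely. The sketch below illustrates that idea with toy stand-ins rather than the real models. `Stage`, `run_pipeline`, `keep_matching`, and `sort_by_score` are hypothetical names, and the trimmed `Candidate` carries only two of the real fields.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Candidate:
    # Trimmed stand-in for the real dataclass in utils/types/__init__.py.
    tag: str
    score: float = 0.0


# Any function with this shape can serve as a pipeline stage (assumed alias).
Stage = Callable[[str, list[Candidate], Any], list[Candidate]]


def run_pipeline(query, candidates, stages):
    """Apply each (stage, settings) pair in order, feeding results forward."""
    results = candidates
    for stage, settings in stages:
        results = stage(query, results, settings)
    return results


def keep_matching(query, candidates, settings):
    # Toy "search": keep candidates whose tag contains the query text.
    return [c for c in candidates if query in c.tag]


def sort_by_score(query, candidates, settings):
    # Toy "rerank": highest score first, truncated to settings["top_k"].
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    return ranked[: settings["top_k"]]


pool = [
    Candidate("amenity=restaurant", 0.9),
    Candidate("amenity=bicycle_parking", 0.4),
    Candidate("shop=bakery", 0.7),
]
top = run_pipeline(
    "amenity", pool, [(keep_matching, None), (sort_by_score, {"top_k": 1})]
)
print([c.tag for c in top])  # ['amenity=restaurant']
```

Neither toy stage mutates its input list, which is what lets a stage be replaced or reordered without side effects on the others.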

Embedding Search (utils/embedding_search.py): Uses intfloat/multilingual-e5-base with "query:"/"passage:" prefixes. Searches both POI and attribute FAISS indexes, returns top candidates.
Cross-Encoder Reranking (utils/rerank_with_crossencoder.py): Uses Qwen/Qwen3-Reranker-0.6B (LLM-based yes/no reranker) on CUDA. Splits results into popular (usage_count >= 10,000) and niche tags, and returns the top 5 of each.
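
The popular/niche split in the reranking stage can be sketched as follows. This is an illustration, not the repository's code: `split_popular_niche`, `POPULAR_THRESHOLD`, and `TOP_PER_GROUP` are hypothetical names, the threshold is assumed to apply to `usage_count`, the input is assumed to be sorted by reranker score already, and the usage counts are made-up values.

```python
from dataclasses import dataclass

POPULAR_THRESHOLD = 10_000  # the 10k usage cutoff described above
TOP_PER_GROUP = 5           # top 5 of each group


@dataclass
class Candidate:
    # Trimmed stand-in for the real dataclass.
    tag: str
    usage_count: int
    score: float


def split_popular_niche(ranked):
    """Split score-sorted candidates into (popular, niche), keeping the top 5 of each."""
    popular = [c for c in ranked if c.usage_count >= POPULAR_THRESHOLD]
    niche = [c for c in ranked if c.usage_count < POPULAR_THRESHOLD]
    return popular[:TOP_PER_GROUP], niche[:TOP_PER_GROUP]


# Illustrative values only, not real OSM usage figures.
ranked = [
    Candidate("amenity=restaurant", 1_200_000, 0.97),
    Candidate("amenity=food_court", 9_000, 0.91),
    Candidate("amenity=fast_food", 450_000, 0.88),
]
popular, niche = split_popular_niche(ranked)
print([c.tag for c in popular])  # ['amenity=restaurant', 'amenity=fast_food']
print([c.tag for c in niche])    # ['amenity=food_court']
```

Splitting before truncating keeps rarely used tags from being crowded out by high-usage ones that happen to score well.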

Core Types

Candidate dataclass (utils/types/__init__.py), used by both pipeline stages:

tag, description_fr, description_natural, category, usage_count, score
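
A sketch of what this dataclass might look like. The field names come from the list above; the types, the category values, and the default for `score` are assumptions rather than details taken from the repository.

```python
from dataclasses import dataclass, fields


@dataclass
class Candidate:
    tag: str                  # e.g. "amenity=restaurant"
    description_fr: str       # French description (assumed str)
    description_natural: str  # natural-language French description (assumed str)
    category: str             # assumed values: "poi" or "attribute"
    usage_count: int          # assumed int
    score: float = 0.0        # assumed to be filled in by search() / rerank()


c = Candidate(
    tag="amenity=bicycle_parking",
    description_fr="Parking à vélos",
    description_natural="Un endroit pour garer son vélo",
    category="poi",
    usage_count=0,  # placeholder, not a real OSM figure
)
print([f.name for f in fields(c)])
```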

Data Flow

data/osm_wiki_tags_cleaned.json: Source data with OSM tags, French descriptions, and enriched descriptions
data/osm_wiki_tags_natural_desc.json: Natural French descriptions generated by Mistral Large
create-index.py: Generates separate FAISS indexes for POI and attribute categories
data/poi.index, data/attributes.index: FAISS vector indexes
utils/prepare.py: Startup functions — load_candidates(), load_search_settings(), load_rerank_settings(), prepare()

Tag Categories

POI: Points of interest (restaurants, shops, etc.)
Attributes: Characteristics (cuisine type, wheelchair access, etc.)

Key Files

utils/types/__init__.py: Candidate dataclass
utils/prepare.py: Data/model loading functions
utils/embedding_search.py: search() - embedding search
utils/rerank_with_crossencoder.py: rerank() - cross-encoder reranking
test/evaluate.py: Evaluation script with recall/MRR metrics
data/search_cases.json: Test cases for evaluation
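
For reference, recall and MRR can be computed as in the sketch below. It assumes each test case pairs a ranked list of returned tags with a set of expected tags. The actual schema of data/search_cases.json and the exact metric definitions used by test/evaluate.py are not shown in this file, so `recall` and `reciprocal_rank` are hypothetical helpers.

```python
def recall(returned, expected):
    """Fraction of expected tags that appear anywhere in the returned list."""
    if not expected:
        return 0.0
    return len(set(returned) & set(expected)) / len(expected)


def reciprocal_rank(returned, expected):
    """1 / rank of the first expected tag in the returned list, or 0.0 if none is found."""
    for rank, tag in enumerate(returned, start=1):
        if tag in expected:
            return 1.0 / rank
    return 0.0


# Toy cases: (returned tags in rank order, expected tags).
cases = [
    (["amenity=restaurant", "amenity=fast_food"], {"amenity=restaurant"}),
    (["amenity=parking", "amenity=bicycle_parking"], {"amenity=bicycle_parking"}),
]
mean_recall = sum(recall(r, e) for r, e in cases) / len(cases)
mrr = sum(reciprocal_rank(r, e) for r, e in cases) / len(cases)
print(mean_recall, mrr)  # 1.0 0.75
```

Recall only asks whether the expected tags were returned at all, while MRR also rewards ranking them near the top, so the two can diverge as in the second toy case.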

Future Work

REST API (FastAPI): planned
Automatic POI/attribute detection (tested; heuristics performed best, at 87%)
Query expansion with an LLM (implemented, but it adds latency without improving recall)
