traverse/CLAUDE.md
2026-02-09 01:39:47 +01:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Traverse is a semantic search tool for OpenStreetMap tags. It converts French natural language queries ("où manger", "parking vélo") into OSM tags (amenity=restaurant, amenity=bicycle_parking).

Commands

Build search indexes (requires GPU via switcherooctl):

switcherooctl launch uv run create-index.py

Run interactive search:

switcherooctl launch uv run search.py

Run evaluation (98.4% recall on 100 test cases):

switcherooctl launch uv run test/evaluate.py

Architecture

Two-stage retrieval pipeline with pure, interchangeable functions:

candidates, search_settings, rerank_settings = prepare()
results = search(query, candidates, search_settings)    # list[Candidate] → list[Candidate]
reranked = rerank(query, results, rerank_settings)       # list[Candidate] → list[Candidate]

  1. Embedding Search (utils/embedding_search.py): Uses intfloat/multilingual-e5-base with "query:"/"passage:" prefixes. Searches both POI and attribute FAISS indexes, returns top candidates.

  2. Cross-Encoder Reranking (utils/rerank_with_crossencoder.py): Uses Qwen/Qwen3-Reranker-0.6B (LLM-based yes/no reranker) on CUDA. Splits results into popular (usage_count >= 10,000) and niche, returns top 5 of each.
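
The popular/niche split at the end of stage 2 can be sketched in plain Python. This is a minimal illustration, not the code in utils/rerank_with_crossencoder.py: the function name split_popular_niche, the trimmed-down Candidate, the example usage counts and scores, and the choice to return popular results before niche ones are all assumptions.

```python
from dataclasses import dataclass

POPULAR_THRESHOLD = 10_000  # popularity cutoff stated above (10k uses)
TOP_N = 5                   # keep the top 5 of each group


@dataclass
class Candidate:
    # Trimmed-down stand-in for the real Candidate (only the fields used here).
    tag: str
    usage_count: int
    score: float


def split_popular_niche(reranked, threshold=POPULAR_THRESHOLD, top_n=TOP_N):
    """Return the top_n popular candidates followed by the top_n niche ones.

    Hypothetical helper: the ordering of the two groups is an assumption.
    """
    ordered = sorted(reranked, key=lambda c: c.score, reverse=True)
    popular = [c for c in ordered if c.usage_count >= threshold][:top_n]
    niche = [c for c in ordered if c.usage_count < threshold][:top_n]
    return popular + niche


# Made-up usage counts and scores, for illustration only.
results = [
    Candidate("amenity=food_court", 9_000, 0.88),
    Candidate("amenity=restaurant", 1_000_000, 0.97),
    Candidate("amenity=fast_food", 400_000, 0.91),
]
final = split_popular_niche(results)
print([c.tag for c in final])
```

Returning a single flat list keeps the list[Candidate] → list[Candidate] shape shown for rerank() above.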

Core Types

Candidate dataclass (utils/types/__init__.py) — used everywhere:

  • tag, description_fr, description_natural, category, usage_count, score
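
As a rough picture of that dataclass, the sketch below declares the six listed fields. The field types, the default for score, the category values, and the example values are assumptions; the actual definition in utils/types/__init__.py may differ.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    tag: str                  # OSM tag, e.g. "amenity=restaurant"
    description_fr: str       # French description of the tag
    description_natural: str  # natural-language French description
    category: str             # assumed values: "poi" or "attribute"
    usage_count: int          # assumed: how often the tag is used in OSM
    score: float = 0.0        # assumed default; filled in by search/rerank


# Illustrative values only, not taken from the project's data files.
example = Candidate(
    tag="amenity=bicycle_parking",
    description_fr="Parking à vélos",
    description_natural="Un endroit où garer son vélo",
    category="poi",
    usage_count=500_000,
)
print(example.tag, example.score)
```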

Data Flow

  • data/osm_wiki_tags_cleaned.json: Source data with OSM tags, French descriptions, and enriched descriptions
  • data/osm_wiki_tags_natural_desc.json: Natural French descriptions generated by Mistral Large
  • create-index.py: Generates separate FAISS indexes for POI and attribute categories
  • data/poi.index, data/attributes.index: FAISS vector indexes
  • utils/prepare.py: Startup functions — load_candidates(), load_search_settings(), load_rerank_settings(), prepare()

Tag Categories

  • POI: Points of interest (restaurants, shops, etc.)
  • Attributes: Characteristics (cuisine type, wheelchair access, etc.)
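
Because each category has its own index, a search over both has to combine two hit lists. One way this might look, assuming scores from the two indexes are directly comparable (same embedding model, same similarity measure), is below. merge_hits and the example scores are hypothetical, and the project may merge results differently.

```python
import heapq


def merge_hits(poi_hits, attribute_hits, k=10):
    """Merge (score, tag) hits from both indexes and keep the k best by score."""
    return heapq.nlargest(k, poi_hits + attribute_hits, key=lambda hit: hit[0])


# Made-up scores, for illustration only.
poi_hits = [(0.91, "amenity=restaurant"), (0.84, "amenity=fast_food")]
attribute_hits = [(0.88, "cuisine=pizza"), (0.52, "wheelchair=yes")]

top = merge_hits(poi_hits, attribute_hits, k=3)
print([tag for _, tag in top])
```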

Key Files

  • utils/types/__init__.py: Candidate dataclass
  • utils/prepare.py: Data/model loading functions
  • utils/embedding_search.py: search() - embedding search
  • utils/rerank_with_crossencoder.py: rerank() - cross-encoder reranking
  • test/evaluate.py: Evaluation script with recall/MRR metrics
  • data/search_cases.json: Test cases for evaluation
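
For reference, recall and MRR as reported by the evaluation script can be computed along these lines. This sketch assumes one expected tag per test case and a ranked list of returned tags; the actual format of data/search_cases.json and the exact metric definitions in test/evaluate.py are not shown here and may differ.

```python
def recall_and_mrr(cases):
    """Compute (recall, MRR) over (expected_tag, ranked_tags) pairs.

    Recall: fraction of cases whose expected tag appears in the results.
    MRR: mean of 1/rank of the expected tag, counting 0 when it is absent.
    """
    if not cases:
        raise ValueError("at least one test case is required")
    hits = 0
    reciprocal_ranks = []
    for expected, ranked in cases:
        if expected in ranked:
            hits += 1
            reciprocal_ranks.append(1.0 / (ranked.index(expected) + 1))
        else:
            reciprocal_ranks.append(0.0)
    return hits / len(cases), sum(reciprocal_ranks) / len(cases)


# Hypothetical cases: found at rank 1, found at rank 2, and missed.
cases = [
    ("amenity=restaurant", ["amenity=restaurant", "amenity=fast_food"]),
    ("amenity=bicycle_parking", ["amenity=parking", "amenity=bicycle_parking"]),
    ("shop=bakery", ["amenity=cafe"]),
]
recall, mrr = recall_and_mrr(cases)
print(f"recall={recall:.3f} mrr={mrr:.3f}")
```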

Future Work

  • REST API (FastAPI) - planned
  • Automatic POI/attribute detection (tested; heuristics performed best, at 87%)
  • Query expansion with an LLM (implemented, but it adds latency without improving recall)