mirror of
https://github.com/tjiho/traverse.git
synced 2026-02-16 20:57:31 +01:00
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Traverse is a semantic search tool for OpenStreetMap tags. It converts French natural language queries ("où manger", "parking vélo") into OSM tags (`amenity=restaurant`, `amenity=bicycle_parking`).

## Commands

Build search indexes (requires GPU via switcherooctl):
```bash
switcherooctl launch uv run create-index.py
```

Run interactive search:
```bash
switcherooctl launch uv run search.py
```

Run evaluation (98.4% recall on 100 test cases):
```bash
switcherooctl launch uv run test/evaluate.py
```

## Architecture

Two-stage retrieval pipeline with pure, interchangeable functions:

```python
candidates, search_settings, rerank_settings = prepare()
results = search(query, candidates, search_settings)      # list[Candidate] → list[Candidate]
reranked = rerank(query, results, rerank_settings)        # list[Candidate] → list[Candidate]
```

1. **Embedding Search** (`utils/embedding_search.py`): Uses `intfloat/multilingual-e5-base` with "query:"/"passage:" prefixes. Searches both POI and attribute FAISS indexes, returns top candidates.

2. **Cross-Encoder Reranking** (`utils/rerank_with_crossencoder.py`): Uses `Qwen/Qwen3-Reranker-0.6B` (LLM-based yes/no reranker) on CUDA. Splits results into popular (usage >= 10k) and niche, returns top 5 of each.

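The popular/niche split in step 2 can be sketched as follows. This is a minimal illustration, not the repository's code: the `Hit` tuple and `split_popular_niche` are hypothetical names, and it assumes that "usage" refers to the candidate's `usage_count` and that each group is ranked by descending `score`.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical stand-in for a reranked candidate."""
    tag: str
    usage_count: int
    score: float


def split_popular_niche(hits, threshold=10_000, top_k=5):
    """Return (popular, niche): the top_k best-scored hits on each side of the threshold."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    popular = [h for h in ranked if h.usage_count >= threshold][:top_k]
    niche = [h for h in ranked if h.usage_count < threshold][:top_k]
    return popular, niche


hits = [
    Hit("amenity=restaurant", 1_200_000, 0.97),  # usage counts and scores are made-up values
    Hit("amenity=fast_food", 400_000, 0.91),
    Hit("amenity=food_court", 9_000, 0.88),
]
popular, niche = split_popular_niche(hits)
print([h.tag for h in popular])
print([h.tag for h in niche])
```

Keeping the two groups separate means a rarely used but highly relevant tag is not crowded out by widely used ones.
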
### Core Types

`Candidate` dataclass (`utils/types/__init__.py`), used everywhere:
- `tag`, `description_fr`, `description_natural`, `category`, `usage_count`, `score`

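For orientation, a dataclass with these fields might look like the sketch below. Only the field names come from this file; the types, the default, and the example values are assumptions, and the real definition in `utils/types/__init__.py` may differ.

```python
from dataclasses import dataclass, replace


@dataclass
class Candidate:
    tag: str                   # e.g. "amenity=restaurant"
    description_fr: str        # French description (assumed str)
    description_natural: str   # natural-language French description (assumed str)
    category: str              # assumed to be a label such as "poi" or "attribute"
    usage_count: int           # assumed int
    score: float = 0.0         # assumed to be filled in by search() / rerank()


# Example values are illustrative, not taken from the dataset.
c = Candidate(
    tag="amenity=restaurant",
    description_fr="Restaurant",
    description_natural="Un endroit où manger",
    category="poi",
    usage_count=0,
)
scored = replace(c, score=0.5)  # copy with a new score instead of mutating in place
print(scored.tag, scored.score)
```
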
### Data Flow

- `data/osm_wiki_tags_cleaned.json`: Source data with OSM tags, French descriptions, and enriched descriptions
- `data/osm_wiki_tags_natural_desc.json`: Natural French descriptions generated by Mistral Large
- `create-index.py`: Generates separate FAISS indexes for POI and attribute categories
- `data/poi.index`, `data/attributes.index`: FAISS vector indexes
- `utils/prepare.py`: Startup functions (`load_candidates()`, `load_search_settings()`, `load_rerank_settings()`, `prepare()`)

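One way `prepare()` could compose the three loaders is sketched below. The loader bodies are placeholders and the settings are plain dicts purely for illustration; the actual return types and what each loader reads are assumptions about `utils/prepare.py`, not a copy of it.

```python
def load_candidates():
    # Placeholder: the real function presumably reads the JSON files under data/.
    return []


def load_search_settings():
    # Placeholder: assumed to hold the embedding model and the two FAISS indexes.
    return {"model": None, "poi_index": None, "attributes_index": None}


def load_rerank_settings():
    # Placeholder: assumed to hold the reranker model and tokenizer.
    return {"model": None, "tokenizer": None}


def prepare():
    """Load everything once at startup and hand it to the pure search/rerank functions."""
    return load_candidates(), load_search_settings(), load_rerank_settings()


candidates, search_settings, rerank_settings = prepare()
```

Loading once and passing the results explicitly is what lets `search()` and `rerank()` stay pure: they depend only on their arguments.
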
### Tag Categories

- **POI**: Points of interest (restaurants, shops, etc.)
- **Attributes**: Characteristics (cuisine type, wheelchair access, etc.)

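Because POI and attribute tags live in separate indexes, a search has to query both and combine the hits. The sketch below shows the E5 prefix convention and a score-based merge in plain Python. `as_query`, `as_passage`, and `merge_hits` are hypothetical helpers, the hit lists are hard-coded stand-ins for FAISS results, and merging by descending score is an assumption about how the two result sets are combined.

```python
import heapq


def as_query(text):
    """E5 models expect search queries to carry a 'query: ' prefix."""
    return f"query: {text}"


def as_passage(text):
    """Indexed descriptions carry a 'passage: ' prefix instead."""
    return f"passage: {text}"


def merge_hits(poi_hits, attribute_hits, top_k=3):
    """Merge (tag, score) pairs from both indexes, best score first."""
    return heapq.nlargest(top_k, poi_hits + attribute_hits, key=lambda hit: hit[1])


# Made-up similarity scores standing in for results from the two indexes.
poi_hits = [("amenity=restaurant", 0.89), ("amenity=fast_food", 0.84)]
attribute_hits = [("cuisine=pizza", 0.86), ("wheelchair=yes", 0.41)]

print(as_query("où manger"))
print(merge_hits(poi_hits, attribute_hits))
```
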
## Key Files

- `utils/types/__init__.py`: `Candidate` dataclass
- `utils/prepare.py`: Data/model loading functions
- `utils/embedding_search.py`: `search()` - embedding search
- `utils/rerank_with_crossencoder.py`: `rerank()` - cross-encoder reranking
- `test/evaluate.py`: Evaluation script with recall/MRR metrics
- `data/search_cases.json`: Test cases for evaluation

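The recall and MRR metrics mentioned for `test/evaluate.py` are standard retrieval measures. A minimal sketch follows; the case format (a set of expected tags plus a ranked list of returned tags) and the function names are assumptions, not the layout of `data/search_cases.json` or the script's actual code.

```python
def recall(expected, returned):
    """Fraction of expected tags that appear anywhere in the returned list."""
    if not expected:
        return 0.0
    return len(set(expected) & set(returned)) / len(expected)


def reciprocal_rank(expected, returned):
    """1 / rank of the first returned tag that is expected, or 0.0 if none is."""
    for rank, tag in enumerate(returned, start=1):
        if tag in expected:
            return 1.0 / rank
    return 0.0


def evaluate(cases):
    """Average both metrics over (expected, returned) pairs."""
    n = len(cases)
    mean_recall = sum(recall(e, r) for e, r in cases) / n
    mrr = sum(reciprocal_rank(e, r) for e, r in cases) / n
    return mean_recall, mrr


# Hand-written toy cases, not real evaluation data.
cases = [
    ({"amenity=restaurant"}, ["amenity=restaurant", "amenity=fast_food"]),
    ({"amenity=bicycle_parking"}, ["amenity=parking", "amenity=bicycle_parking"]),
]
mean_recall, mrr = evaluate(cases)
print(f"recall={mean_recall:.2f} MRR={mrr:.2f}")
```
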
## Future Work

- REST API (FastAPI) - planned
- Automatic POI/attribute detection (tested, heuristics best at 87%)
- Query expansion with LLM (implemented but adds latency without improving recall)