starling.search.search_engine
FAISS Search Engine
High-performance similarity search with flexible filtering, length gating, and exact reranking.
Overview
The SearchEngine provides fast ANN (Approximate Nearest Neighbor) search with:
Multi-level Filtering: Embedding distance, sequence length, exact matches, identity
Length Gating: Pre-filter candidates by sequence length using indexed lookups
Reranking: Exact rescoring of the top-k candidates using the full encoder
Batch Processing: Efficient handling of multiple queries
Flexible Metrics: Cosine similarity or L2 distance
Basic Usage
>>> from starling.search import SearchEngine
>>> import torch
>>> engine = SearchEngine.load("my_index.faiss", metric="cosine")
>>> queries = torch.randn(10, 768)
>>> queries = torch.nn.functional.normalize(queries, dim=1)
>>> results = engine.search(queries=queries, k=100, nprobe=128, return_similarity=True)
>>> for qi, hits in enumerate(results):
...     for score, gid, header, length in hits[:5]:
...         print(qi, score, gid, length)
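The example above normalizes the queries before searching with metric="cosine". A minimal sketch of why that step matters, written with NumPy rather than torch so it stays self-contained: for unit-length vectors the plain inner product equals cosine similarity, and squared L2 distance becomes a monotone function of it. The helper name l2_normalize is hypothetical and not part of starling.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm (NumPy analogue of normalize(x, dim=1))."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 768))
b = rng.standard_normal((4, 768))

# Cosine similarity computed directly from the raw vectors.
cosine = np.sum(a * b, axis=1) / (
    np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
)

# After normalization, the plain inner product gives the same values.
a_n, b_n = l2_normalize(a), l2_normalize(b)
inner = np.sum(a_n * b_n, axis=1)

# Squared L2 distance between unit vectors equals 2 - 2 * cosine.
sq_l2 = np.sum((a_n - b_n) ** 2, axis=1)
```

If queries are not normalized, an inner-product search ranks by a mix of direction and magnitude, so scores would no longer be comparable to a cosine threshold.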
Advanced Usage
Filtering by Length:
>>> engine.search(queries, k=100, nprobe=128, length_min=50, length_max=500)
Excluding Exact Matches:
>>> engine.search(queries, query_sequences=["MKTLLIL..."], k=100, exclude_exact=True)
Exact Reranking:
>>> engine.search(queries, k=100, nprobe=128, rerank=True, rerank_device="cuda:0")
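The length filter and exact reranking shown above can be pictured as two post-processing stages applied to a coarse candidate list. The following is only a sketch under stated assumptions, not the SearchEngine implementation: the function names are hypothetical, a sorted length array stands in for the "indexed lookups", and exact cosine similarity against stored vectors stands in for rescoring with the full encoder.

```python
import numpy as np

def build_length_index(lengths):
    """Sort ids by length once so range queries become two binary searches."""
    order = np.argsort(lengths, kind="stable")
    return order, lengths[order]

def ids_in_length_range(index, length_min, length_max):
    """Return ids whose length lies in [length_min, length_max], inclusive."""
    order, sorted_lengths = index
    lo = np.searchsorted(sorted_lengths, length_min, side="left")
    hi = np.searchsorted(sorted_lengths, length_max, side="right")
    return order[lo:hi]

def rerank_exact(query, candidate_ids, exact_vectors):
    """Rescore candidates by exact cosine similarity and sort best-first."""
    cands = exact_vectors[candidate_ids]
    scores = cands @ query / (
        np.linalg.norm(cands, axis=1) * np.linalg.norm(query)
    )
    best_first = np.argsort(-scores)
    return candidate_ids[best_first], scores[best_first]

rng = np.random.default_rng(0)
lengths = rng.integers(20, 1000, size=2000)   # toy sequence lengths
vectors = rng.standard_normal((2000, 64))     # toy "exact" embeddings
query = rng.standard_normal(64)

# Stand-in for the coarse ANN stage: an arbitrary, unsorted candidate list.
coarse_ids = rng.choice(2000, size=300, replace=False)

# Length gate: keep only candidates with 50 <= length <= 500.
allowed = ids_in_length_range(build_length_index(lengths), 50, 500)
gated_ids = coarse_ids[np.isin(coarse_ids, allowed)]

# Exact rerank of the surviving candidates.
ranked_ids, ranked_scores = rerank_exact(query, gated_ids, vectors)
```

Gating before reranking keeps the expensive exact scoring limited to candidates that can still appear in the final result.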
Search Parameters
See search() docstring for the full parameter list.
Common Patterns
Pattern 1: Near duplicates (filter exact + very similar)
>>> engine.search(queries, k=100, nprobe=256, exclude_exact=True, max_cosine_similarity=0.99)
Pattern 2: Length-focused neighborhood
>>> L = 200
>>> engine.search(queries, k=1000, nprobe=128, length_min=L-50, length_max=L+50)
Pattern 3: Diverse similar sequences
>>> engine.search(queries, k=500, nprobe=128, max_cosine_similarity=0.80, length_min=50, length_max=500)
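Patterns 1 and 3 both rely on a similarity cap, and Pattern 1 adds exact-match exclusion. The sketch below shows how such filters might act on a hit list. It assumes hits are (score, gid, header, length) tuples as in Basic Usage, and that max_cosine_similarity drops hits scoring above the threshold, which is inferred from the parameter name rather than confirmed by the source. The filter_hits function, the sequences mapping, and the toy data are all hypothetical.

```python
def filter_hits(hits, sequences, query_sequence=None,
                exclude_exact=False, max_cosine_similarity=None):
    """Drop hits that match the query exactly or exceed a similarity cap.

    hits: list of (score, gid, header, length) tuples, best first.
    sequences: hypothetical mapping from gid to the stored sequence string.
    """
    kept = []
    for score, gid, header, length in hits:
        if (exclude_exact and query_sequence is not None
                and sequences[gid] == query_sequence):
            continue  # identical sequence: skip
        if max_cosine_similarity is not None and score > max_cosine_similarity:
            continue  # above the similarity cap: skip
        kept.append((score, gid, header, length))
    return kept

# Toy data for illustration only.
sequences = {0: "MKTAYIAK", 1: "MKTAYIAR", 2: "MSSHEGGK", 3: "MKVLAAGI"}
hits = [
    (1.000, 0, "seq0", 8),  # exact match to the query
    (0.995, 1, "seq1", 8),  # near duplicate
    (0.900, 2, "seq2", 8),
    (0.700, 3, "seq3", 8),
]

# Pattern 1 style: remove the exact match and anything above 0.99.
near_dup_free = filter_hits(hits, sequences, query_sequence="MKTAYIAK",
                            exclude_exact=True, max_cosine_similarity=0.99)

# Pattern 3 style: keep only hits at or below 0.80 similarity.
diverse = filter_hits(hits, sequences, max_cosine_similarity=0.80)
```

Because post-filters can discard many hits, the patterns above request a fairly large k so that enough candidates survive.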
Notes
Normalize queries for cosine: torch.nn.functional.normalize(q, dim=1)
Higher nprobe improves recall at the cost of latency
Use rerank for improved precision after the coarse ANN stage
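The nprobe trade-off can be illustrated with a toy inverted-file index written in NumPy. This is a sketch of the general technique, not FAISS or the SearchEngine internals: centroids are picked at random instead of by k-means, and all names (ivf_search, recall_at) are hypothetical. Probing more lists scans more vectors, which costs time but can only bring results closer to the brute-force answer.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, nlist, k = 5000, 32, 50, 10
data = rng.standard_normal((n, d)).astype(np.float32)
query = rng.standard_normal(d).astype(np.float32)

# Toy coarse quantizer: random data points as centroids
# (real indexes typically train centroids with k-means).
centroids = data[rng.choice(n, size=nlist, replace=False)]

def sq_dists(points, q):
    """Squared L2 distance from q to each row of points."""
    return np.sum((points - q) ** 2, axis=1)

# Assign every vector to its nearest centroid, i.e. its inverted list.
pairwise = (
    np.sum(data**2, axis=1, keepdims=True)
    - 2 * data @ centroids.T
    + np.sum(centroids**2, axis=1)
)
assign = np.argmin(pairwise, axis=1)

def ivf_search(q, nprobe):
    """Scan only the nprobe lists whose centroids are closest to q."""
    probe = np.argsort(sq_dists(centroids, q))[:nprobe]
    cand = np.nonzero(np.isin(assign, probe))[0]
    order = np.argsort(sq_dists(data[cand], q))[:k]
    return cand[order], len(cand)

# Brute-force ground truth for the same query.
exact = np.argsort(sq_dists(data, query))[:k]

def recall_at(nprobe):
    """Return (recall@k against brute force, number of vectors scanned)."""
    found, scanned = ivf_search(query, nprobe)
    overlap = len(set(found.tolist()) & set(exact.tolist()))
    return overlap / k, scanned

recall_1, scanned_1 = recall_at(1)
recall_all, scanned_all = recall_at(nlist)
```

With nprobe equal to nlist every list is scanned, so the search degenerates to brute force; smaller values scan fewer vectors and may miss some true neighbors, which is where reranking and a larger k help.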
Classes
SearchEngine: FAISS-backed similarity search with rich post-filtering.