starling.search.search_engine

FAISS Search Engine

High-performance similarity search with flexible filtering, length gating, and exact reranking.

Overview

The SearchEngine provides fast ANN (Approximate Nearest Neighbor) search with:

  • Multi-level Filtering: Embedding distance, sequence length, exact matches, identity

  • Length Gating: Pre-filter candidates by sequence length using indexed lookups

  • Reranking: Exact rescoring of the top-k candidates using the full encoder

  • Batch Processing: Efficient handling of multiple queries

  • Flexible Metrics: Cosine similarity or L2 distance
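As a rough illustration of the length-gating idea in the list above, the sketch below keeps item IDs sorted by sequence length and uses binary search to select the IDs whose length falls inside a range. It is an assumption about how an "indexed lookup" could work, not the SearchEngine implementation; LengthIndex, ids_in_range, and the inclusive bounds are all hypothetical.

```python
from bisect import bisect_left, bisect_right


class LengthIndex:
    """Hypothetical helper: maps a length range to candidate item IDs."""

    def __init__(self, lengths):
        # lengths[i] is the sequence length of item i.
        order = sorted(range(len(lengths)), key=lambda i: lengths[i])
        self._ids = order
        self._sorted_lengths = [lengths[i] for i in order]

    def ids_in_range(self, length_min, length_max):
        # Binary search for the window [length_min, length_max]
        # (inclusive bounds are an assumption of this sketch).
        lo = bisect_left(self._sorted_lengths, length_min)
        hi = bisect_right(self._sorted_lengths, length_max)
        return set(self._ids[lo:hi])


index = LengthIndex([120, 45, 300, 510, 50])
allowed = index.ids_in_range(50, 500)
print(sorted(allowed))
```

The resulting ID set could then be used to restrict which candidates the ANN stage is allowed to return, so that out-of-range sequences never reach the scoring step.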

Basic Usage

>>> from starling.search import SearchEngine
>>> import torch
>>> engine = SearchEngine.load("my_index.faiss", metric="cosine")
>>> queries = torch.randn(10, 768)
>>> queries = torch.nn.functional.normalize(queries, dim=1)
>>> results = engine.search(queries=queries, k=100, nprobe=128, return_similarity=True)
>>> for qi, hits in enumerate(results):
...     for score, gid, header, length in hits[:5]:
...         print(qi, score, gid, length)
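The loop above unpacks each hit as a (score, gid, header, length) tuple. Assuming that shape holds, a small helper like the following can flatten the per-query lists into rows for logging or export. The name top_hits and the sample values are made up for illustration and are not part of the library.

```python
def top_hits(results, n=5):
    """Flatten per-query hits into (query_index, score, gid, header, length) rows."""
    rows = []
    for qi, hits in enumerate(results):
        # Keep only the first n hits of each query, as in the loop above.
        for score, gid, header, length in hits[:n]:
            rows.append((qi, score, gid, header, length))
    return rows


# Stand-in data shaped like the search output (values are invented).
fake_results = [
    [(0.97, 11, "seq_a", 210), (0.91, 42, "seq_b", 198)],
    [(0.88, 7, "seq_c", 305)],
]
rows = top_hits(fake_results, n=1)
for row in rows:
    print(row)
```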

Advanced Usage

Filtering by Length:

>>> engine.search(queries, k=100, nprobe=128, length_min=50, length_max=500)

Excluding Exact Matches:

>>> engine.search(queries, query_sequences=["MKTLLIL…"], k=100, exclude_exact=True)

Exact Reranking:

>>> engine.search(queries, k=100, nprobe=128, rerank=True, rerank_device="cuda:0")
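To make the reranking idea concrete, here is a minimal sketch of a two-stage search: candidates from a coarse stage are re-scored with an exact cosine similarity and re-sorted. It assumes the exact vectors are available in a dictionary, which stands in for whatever the full encoder would produce; rerank, cosine, and exact_vectors are hypothetical names and this is not how SearchEngine is known to implement rerank=True.

```python
import math


def cosine(a, b):
    """Exact cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rerank(query_vec, candidates, exact_vectors, k):
    """Re-score (approx_score, gid) candidates exactly and keep the best k.

    exact_vectors is a hypothetical gid -> vector store standing in for
    the output of a full encoder.
    """
    rescored = [(cosine(query_vec, exact_vectors[gid]), gid) for _, gid in candidates]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return rescored[:k]


query = [1.0, 0.0]
vectors = {1: [1.0, 0.1], 2: [0.5, 0.5], 3: [0.0, 1.0]}
# The approximate scores deliberately rank gid 2 above gid 1.
coarse = [(0.95, 2), (0.90, 1), (0.10, 3)]
reranked = rerank(query, coarse, vectors, k=2)
print([gid for _, gid in reranked])
```

The point of the second stage is that approximate scores only need to be good enough to get the right items into the candidate list; the final order comes from the exact scores.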

Search Parameters

See the search() docstring for the full parameter list.

Common Patterns

Pattern 1: Near duplicates (filter exact + very similar)

>>> engine.search(queries, k=100, nprobe=256, exclude_exact=True, max_cosine_similarity=0.99)

Pattern 2: Length-focused neighborhood

>>> L = 200
>>> engine.search(queries, k=1000, nprobe=128, length_min=L-50, length_max=L+50)

Pattern 3: Diverse similar sequences

>>> engine.search(queries, k=500, nprobe=128, max_cosine_similarity=0.80, length_min=50, length_max=500)
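The patterns above combine a similarity cap with length bounds. One plausible reading of those parameters, written as a client-side filter over (score, gid, header, length) hits, is sketched below. The function filter_hits is hypothetical, and whether the library treats the bounds as inclusive is an assumption of this sketch.

```python
def filter_hits(hits, max_cosine_similarity=None, length_min=None, length_max=None):
    """Keep hits at or below the similarity cap and inside the length window."""
    kept = []
    for score, gid, header, length in hits:
        # Drop hits that are too similar (near duplicates of the query).
        if max_cosine_similarity is not None and score > max_cosine_similarity:
            continue
        # Drop hits outside the length window (inclusive bounds assumed).
        if length_min is not None and length < length_min:
            continue
        if length_max is not None and length > length_max:
            continue
        kept.append((score, gid, header, length))
    return kept


# Invented hits for illustration only.
hits = [
    (0.995, 1, "a", 200),  # above the similarity cap
    (0.79, 2, "b", 120),   # kept
    (0.60, 3, "c", 40),    # too short
    (0.75, 4, "d", 900),   # too long
]
diverse = filter_hits(hits, max_cosine_similarity=0.80, length_min=50, length_max=500)
print(diverse)
```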

Notes

  • Normalize queries for cosine: torch.nn.functional.normalize(q, dim=1)

  • Higher nprobe improves recall at the cost of higher latency

  • Use rerank=True to improve precision after the coarse ANN stage
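The normalization note matters because, for unit-length vectors, the inner product equals the cosine similarity, and the squared L2 distance equals 2 - 2 * cosine. The short check below verifies both identities in plain Python (no torch needed); l2_normalize and dot are illustrative helper names.

```python
import math


def l2_normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


a = [3.0, 4.0]
b = [4.0, 3.0]

# Cosine similarity computed directly from the raw vectors.
cosine = dot(a, b) / (math.hypot(*a) * math.hypot(*b))

# After normalization, the plain inner product gives the same value.
a_hat, b_hat = l2_normalize(a), l2_normalize(b)
inner = dot(a_hat, b_hat)

# Squared L2 distance between unit vectors is 2 - 2 * cosine.
sq_l2 = sum((x - y) ** 2 for x, y in zip(a_hat, b_hat))

print(cosine, inner, sq_l2)
```

A consequence is that, on normalized vectors, ranking by inner product and ranking by L2 distance produce the same neighbor order.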

Classes

SearchEngine

FAISS-backed similarity search with rich post-filtering.