starling.search.store.SequenceStore
- class SequenceStore
Bases: object
SQLite-backed per-gid sequence metadata store.
Table:
sequences(
    gid INTEGER PRIMARY KEY,
    len INTEGER NOT NULL,
    hash8 INTEGER,
    seq BLOB NOT NULL,   -- 1 byte flag + payload (0=plain UTF-8, 1=zstd)
    shard INTEGER,
    local_idx INTEGER,
    header BLOB          -- 1 byte flag + payload (0=plain UTF-8, 1=zstd, NULL if missing)
)
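As a rough illustration, the table above could be created with Python's standard `sqlite3` module as follows. This is a sketch, not the library's actual DDL: the index name `idx_sequences_len` is hypothetical (the documentation only says that an index on `len` is used).

```python
import sqlite3

# Schema as documented above; the index name is an assumption.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sequences (
    gid       INTEGER PRIMARY KEY,
    len       INTEGER NOT NULL,
    hash8     INTEGER,
    seq       BLOB NOT NULL,  -- 1 byte flag + payload (0=plain UTF-8, 1=zstd)
    shard     INTEGER,
    local_idx INTEGER,
    header    BLOB            -- 1 byte flag + payload, NULL if missing
);
CREATE INDEX IF NOT EXISTS idx_sequences_len ON sequences(len);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the column names SQLite actually created.
columns = [row[1] for row in conn.execute("PRAGMA table_info(sequences)")]
print(columns)
```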
Methods
__init__
Close the database connection without publishing (cleanup only).
Writers only: commit, optimize, close, then atomically replace the live DB.
Decode BLOB to header string, handling compression.
Decode BLOB to sequence string, handling compression.
Encode header to BLOB with optional zstd compression.
Encode sequence to BLOB with optional zstd compression.
Return all gids whose sequence length is within [min_len, max_len].
Fetch header and length by global ID.
Batched fetch of gid, header, len, and hash8.
Fetch sequence string by global ID.
Compute 8-byte SHA1 hash of sequence for deduplication.
Fast batched insert.
Open an immutable, read-only connection that never locks or blocks.
Create a writer that builds into a UNIQUE tmp file using an IMMEDIATE transaction and write-optimized PRAGMAs, then later publishes atomically via close_publish().
- classmethod open_writer(live_db_path: str) → SequenceStore
Create a writer that builds into a UNIQUE tmp file using an IMMEDIATE transaction and write-optimized PRAGMAs, then later publishes atomically via close_publish().
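The build-into-a-temporary-file pattern described here might look roughly like the sketch below. `open_tmp_writer` is a hypothetical helper, and the two PRAGMAs are assumptions: the documentation does not say which write-optimized settings the library actually uses.

```python
import os
import sqlite3
import tempfile


def open_tmp_writer(live_db_path: str):
    """Hypothetical helper: open a write connection on a unique tmp file.

    The tmp file is created in the same directory as the live DB so that a
    later rename stays on one filesystem.
    """
    live = os.path.abspath(live_db_path)
    fd, tmp_path = tempfile.mkstemp(
        prefix=os.path.basename(live) + ".",
        suffix=".tmp",
        dir=os.path.dirname(live),
    )
    os.close(fd)  # SQLite treats the empty file as a new database

    # isolation_level=None: no implicit transactions, we issue BEGIN ourselves.
    conn = sqlite3.connect(tmp_path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=MEMORY")  # assumed write-optimized setting
    conn.execute("PRAGMA synchronous=OFF")      # assumed write-optimized setting
    conn.execute("BEGIN IMMEDIATE")             # take the write lock up front
    return conn, tmp_path
```

Relaxed durability settings are tolerable in a design like this because a crash only loses the disposable tmp file, never the live database.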
- classmethod open_reader(live_db_path: str) → SequenceStore
Open an immutable, read-only connection that never locks or blocks.
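One way to obtain such a connection with the standard library is SQLite's URI filename syntax with `mode=ro` and `immutable=1`. The sketch below assumes that mechanism; the documentation does not confirm how `open_reader` is implemented, and `open_immutable` is a hypothetical name.

```python
import sqlite3
from pathlib import Path


def open_immutable(db_path: str) -> sqlite3.Connection:
    """Hypothetical helper: open db_path read-only and immutable.

    immutable=1 tells SQLite the file will not change, so it skips file
    locking and change detection entirely.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro&immutable=1"
    return sqlite3.connect(uri, uri=True)
```

Because SQLite assumes an immutable file never changes, this only makes sense when writers replace the file as a whole rather than modifying it in place.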
- close_publish() → None
Writers only: commit, optimize, close, then atomically replace the live DB.
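The publish sequence described above (commit, optimize, close, atomic replace) can be sketched as follows. The function name `publish` is hypothetical, and `PRAGMA optimize` is an assumption about what the "optimize" step does.

```python
import os
import sqlite3


def publish(conn: sqlite3.Connection, tmp_path: str, live_db_path: str) -> None:
    """Hypothetical sketch of a commit / optimize / close / replace sequence."""
    conn.commit()                     # flush any pending transaction
    conn.execute("PRAGMA optimize")   # assumed optimization step
    conn.close()                      # release the file before renaming
    # os.replace overwrites the destination in one step; on POSIX this is
    # atomic when both paths are on the same filesystem.
    os.replace(tmp_path, live_db_path)
```

Readers that open the live path see either the old file or the new one in full, never a partially written database.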
- insert_rows(rows: Sequence[Tuple[int, int, int, bytes, int, int, bytes | None]]) → None
Fast batched insert. Each row: (gid, len, hash8, seq_blob, shard, local_idx, header_blob)
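A batched insert of rows in this shape is typically done with `executemany`. The sketch below uses plain `sqlite3` against an in-memory table rather than `SequenceStore` itself; the sample values are made up, and the blobs use the documented flag byte 0 (plain UTF-8).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences (gid INTEGER PRIMARY KEY, len INTEGER NOT NULL, "
    "hash8 INTEGER, seq BLOB NOT NULL, shard INTEGER, local_idx INTEGER, "
    "header BLOB)"
)

# Each row: (gid, len, hash8, seq_blob, shard, local_idx, header_blob).
# Values are illustrative only; header_blob may be None.
rows = [
    (0, 4, 111, b"\x00ACGT", 0, 0, b"\x00seq0"),
    (1, 6, 222, b"\x00ACGTAC", 0, 1, None),
]

with conn:  # one transaction for the whole batch
    conn.executemany(
        "INSERT INTO sequences (gid, len, hash8, seq, shard, local_idx, header) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        rows,
    )

count = conn.execute("SELECT COUNT(*) FROM sequences").fetchone()[0]
```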
- get_header_len(gid: int) → Tuple[str | None, int | None]
Fetch header and length by global ID.
- get_many_meta(gids: Iterable[int]) → List[Tuple[int, str | None, int | None, int | None]]
Batched fetch of gid, header, len, and hash8. Returns a list of (gid, header, length, hash8) tuples.
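A batched lookup like this is commonly written as chunked `IN (...)` queries, so that the number of bound parameters stays below SQLite's limit. The sketch below is one possible approach, not the library's code: `fetch_many_meta` and `chunk_size` are hypothetical, headers are assumed to be stored uncompressed (flag byte 0), and how missing gids are reported is not covered.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

Meta = Tuple[int, Optional[str], Optional[int], Optional[int]]


def fetch_many_meta(
    conn: sqlite3.Connection, gids: Iterable[int], chunk_size: int = 500
) -> List[Meta]:
    """Hypothetical sketch: fetch (gid, header, len, hash8) in chunks."""
    gid_list = list(gids)
    out: List[Meta] = []
    for start in range(0, len(gid_list), chunk_size):
        chunk = gid_list[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))  # e.g. "?,?,?"
        sql = (
            "SELECT gid, header, len, hash8 FROM sequences "
            f"WHERE gid IN ({placeholders})"
        )
        for gid, header_blob, length, hash8 in conn.execute(sql, chunk):
            # Assumes flag byte 0 (plain UTF-8); zstd headers are not handled.
            header = None if header_blob is None else header_blob[1:].decode("utf-8")
            out.append((gid, header, length, hash8))
    return out
```

Row order within a chunk is not guaranteed by SQLite, so callers that need a specific order should sort the result.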
- get_gids_by_length_range(min_len: int | None, max_len: int | None) → List[int]
Return all gids whose sequence length is within [min_len, max_len]. Uses SQLite index on len for speed.
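An inclusive range query of this kind can be sketched as below. Treating `None` as "no bound on that side" is an assumption based on the type hints, not something the documentation states; `gids_in_length_range` is a hypothetical name.

```python
import sqlite3
from typing import List, Optional


def gids_in_length_range(
    conn: sqlite3.Connection, min_len: Optional[int], max_len: Optional[int]
) -> List[int]:
    """Hypothetical sketch: gids with min_len <= len <= max_len.

    Assumes a bound of None means that side is unbounded.
    """
    clauses, params = [], []
    if min_len is not None:
        clauses.append("len >= ?")
        params.append(min_len)
    if max_len is not None:
        clauses.append("len <= ?")
        params.append(max_len)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = f"SELECT gid FROM sequences{where} ORDER BY gid"
    return [row[0] for row in conn.execute(sql, params)]
```

With an index on `len`, SQLite can satisfy the range condition by scanning only the matching part of the index instead of the whole table.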
- static encode_seq(seq: str, use_zstd: bool) → bytes
Encode sequence to BLOB with optional zstd compression.
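The "1 byte flag + payload" layout from the table definition can be illustrated with the sketch below. It is a guess at the format based on the schema comments, not the library's code: `encode_blob` and `decode_blob` are hypothetical names, and the use of the third-party `zstandard` package is an assumption.

```python
try:
    import zstandard  # third-party package; assumed zstd binding
except ImportError:
    zstandard = None

FLAG_PLAIN = 0  # payload is plain UTF-8
FLAG_ZSTD = 1   # payload is zstd-compressed UTF-8


def encode_blob(text: str, use_zstd: bool) -> bytes:
    """Hypothetical sketch: prefix the payload with a one-byte format flag."""
    raw = text.encode("utf-8")
    if not use_zstd:
        return bytes([FLAG_PLAIN]) + raw
    if zstandard is None:
        raise RuntimeError("zstandard is not installed")
    return bytes([FLAG_ZSTD]) + zstandard.ZstdCompressor().compress(raw)


def decode_blob(blob: bytes) -> str:
    """Inverse of encode_blob: dispatch on the leading flag byte."""
    flag, payload = blob[0], blob[1:]
    if flag == FLAG_PLAIN:
        return payload.decode("utf-8")
    if flag == FLAG_ZSTD:
        if zstandard is None:
            raise RuntimeError("zstandard is not installed")
        return zstandard.ZstdDecompressor().decompress(payload).decode("utf-8")
    raise ValueError(f"unknown flag byte: {flag}")
```

Storing the flag in the blob itself lets compressed and uncompressed rows coexist in one table, since each value is self-describing.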