starling.search.store.SequenceStore

class SequenceStore

Bases: object

SQLite-backed per-gid sequence metadata store.

Table:

sequences(
    gid       INTEGER PRIMARY KEY,
    len       INTEGER NOT NULL,
    hash8     INTEGER,
    seq       BLOB NOT NULL,   -- 1 byte flag + payload (0=plain UTF-8, 1=zstd)
    shard     INTEGER,
    local_idx INTEGER,
    header    BLOB              -- 1 byte flag + payload (0=plain UTF-8, 1=zstd, NULL if missing)
)
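As a rough illustration of the table layout above, the following sketch creates it in an in-memory SQLite database with Python's standard sqlite3 module. The index name idx_sequences_len is hypothetical: the length-range lookup mentions an index on len, but its actual name and definition are not documented here.

```python
import sqlite3

# DDL mirroring the documented columns; the index name is an assumption.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sequences(
    gid       INTEGER PRIMARY KEY,
    len       INTEGER NOT NULL,
    hash8     INTEGER,
    seq       BLOB NOT NULL,
    shard     INTEGER,
    local_idx INTEGER,
    header    BLOB
);
CREATE INDEX IF NOT EXISTS idx_sequences_len ON sequences(len);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the sequences table and a len index if they do not exist."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = [row[1] for row in conn.execute("PRAGMA table_info(sequences)")]
conn.close()
```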

Methods

__init__

close

Close the database connection without publishing (cleanup only).

close_publish

Writers only: commit, optimize, close, then atomically replace the live DB.

decode_header

Decode BLOB to header string, handling compression.

decode_seq

Decode BLOB to sequence string, handling compression.

encode_header

Encode header to BLOB with optional zstd compression.

encode_seq

Encode sequence to BLOB with optional zstd compression.

get_gids_by_length_range

Return all gids whose sequence length is within [min_len, max_len].

get_header_len

Fetch header and length by global ID.

get_many_header_len

Batched fetch of gid, header, and len.

get_many_meta

Batched fetch of gid, header, len, and hash8.

get_seq

Fetch sequence string by global ID.

hash8

Compute an 8-byte hash of the sequence, derived from SHA-1, for deduplication.

insert_rows

Fast batched insert.

open_reader

Open an immutable, read-only connection that never locks or blocks.

open_writer

Create a writer that builds into a UNIQUE tmp file using an IMMEDIATE transaction and write-optimized PRAGMAs, then later publishes atomically via close_publish().

classmethod open_writer(live_db_path: str) → SequenceStore

Create a writer that builds into a UNIQUE tmp file using an IMMEDIATE transaction and write-optimized PRAGMAs, then later publishes atomically via close_publish().
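A minimal sketch of the writer setup described above, using only the standard library. The helper name open_tmp_writer, the tmp file naming, and the two PRAGMAs chosen here are assumptions for illustration; the PRAGMAs that SequenceStore actually applies are not listed in this documentation.

```python
import os
import sqlite3
import tempfile
from typing import Tuple


def open_tmp_writer(live_db_path: str) -> Tuple[sqlite3.Connection, str]:
    """Open a connection on a unique tmp file in the live DB's directory."""
    directory = os.path.dirname(os.path.abspath(live_db_path))
    # mkstemp guarantees a unique name; staying in the same directory keeps
    # a later rename on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(
        prefix=os.path.basename(live_db_path) + ".", suffix=".tmp", dir=directory
    )
    os.close(fd)
    # isolation_level=None disables sqlite3's implicit BEGIN, so the
    # transaction below is started explicitly.
    conn = sqlite3.connect(tmp_path, isolation_level=None)
    # Assumed write-optimized settings; safe only because the file is private
    # to this writer until it is published.
    conn.execute("PRAGMA journal_mode=OFF")
    conn.execute("PRAGMA synchronous=OFF")
    conn.execute("BEGIN IMMEDIATE")
    return conn, tmp_path


with tempfile.TemporaryDirectory() as d:
    conn, tmp_path = open_tmp_writer(os.path.join(d, "seq.db"))
    in_txn = conn.in_transaction
    conn.execute("CREATE TABLE t(x INTEGER)")
    conn.execute("COMMIT")
    conn.close()
```

Building into a private file means durability settings can be relaxed during the build: if the process dies, the tmp file is discarded and the live database is untouched.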

classmethod open_reader(live_db_path: str) → SequenceStore

Open an immutable, read-only connection that never locks or blocks.
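One way to obtain such a connection with the standard sqlite3 module is a URI filename with mode=ro and immutable=1, which tells SQLite the file cannot change and lets it skip locking. Whether open_reader uses exactly these parameters is an assumption; open_immutable below is a hypothetical helper.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_immutable(path: str) -> sqlite3.Connection:
    """Open a read-only connection that treats the file as unchanging."""
    uri = Path(path).resolve().as_uri() + "?mode=ro&immutable=1"
    return sqlite3.connect(uri, uri=True)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "seq.db")
    # Build a small database with an ordinary read-write connection first.
    w = sqlite3.connect(path)
    w.execute("CREATE TABLE t(x INTEGER)")
    w.execute("INSERT INTO t VALUES (7)")
    w.commit()
    w.close()

    r = open_immutable(path)
    value = r.execute("SELECT x FROM t").fetchone()[0]
    try:
        r.execute("INSERT INTO t VALUES (8)")
        write_rejected = False
    except sqlite3.OperationalError:
        # Writes fail on a read-only connection.
        write_rejected = True
    r.close()
```

The immutable flag is only safe when the file really is never modified in place, which fits a design where new versions are published by replacing the file.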

close_publish() → None

Writers only: commit, optimize, close, then atomically replace the live DB.
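The publish sequence can be sketched with the standard library as follows. The function name publish and the use of PRAGMA optimize for the "optimize" step are assumptions; os.replace performs an atomic rename on POSIX when both paths are on the same filesystem.

```python
import os
import sqlite3
import tempfile


def publish(conn: sqlite3.Connection, tmp_path: str, live_db_path: str) -> None:
    """Commit, optimize, close, then swap the tmp file into place."""
    conn.commit()
    conn.execute("PRAGMA optimize")  # assumed meaning of the "optimize" step
    conn.close()
    # Readers see either the old file or the new one, never a partial build.
    os.replace(tmp_path, live_db_path)


with tempfile.TemporaryDirectory() as d:
    live = os.path.join(d, "seq.db")
    tmp = os.path.join(d, "seq.db.build.tmp")
    conn = sqlite3.connect(tmp)
    conn.execute("CREATE TABLE t(x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
    publish(conn, tmp, live)

    tmp_gone = not os.path.exists(tmp)
    check = sqlite3.connect(live)
    published = check.execute("SELECT x FROM t").fetchall()
    check.close()
```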

close() → None

Close the database connection without publishing (cleanup only).

insert_rows(rows: Sequence[Tuple[int, int, int, bytes, int, int, bytes | None]]) → None

Fast batched insert. Each row: (gid, len, hash8, seq_blob, shard, local_idx, header_blob)
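A batched insert of rows in this shape might look like the sketch below, which uses sqlite3's executemany inside a single transaction. This is an illustration against a standalone in-memory table, not the library's implementation; the sample values are made up.

```python
import sqlite3
from typing import Optional, Sequence, Tuple

# (gid, len, hash8, seq_blob, shard, local_idx, header_blob)
Row = Tuple[int, int, int, bytes, int, int, Optional[bytes]]


def insert_rows(conn: sqlite3.Connection, rows: Sequence[Row]) -> None:
    """Insert many rows with one prepared statement."""
    conn.executemany(
        "INSERT INTO sequences(gid, len, hash8, seq, shard, local_idx, header) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        rows,
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences(gid INTEGER PRIMARY KEY, len INTEGER NOT NULL, "
    "hash8 INTEGER, seq BLOB NOT NULL, shard INTEGER, local_idx INTEGER, header BLOB)"
)
rows = [
    # Blobs start with flag byte 0 (plain UTF-8), per the table description.
    (1, 4, 1234, b"\x00ACGT", 0, 0, b"\x00seq1"),
    (2, 3, 5678, b"\x00ACG", 0, 1, None),  # header may be NULL
]
insert_rows(conn, rows)
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM sequences").fetchone()[0]
conn.close()
```

Note that hash8 values must fit in a signed 64-bit integer to be stored in an SQLite INTEGER column.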

get_seq(gid: int) → str | None

Fetch sequence string by global ID.

Parameters:

gid (int) – Global sequence identifier.

Returns:

Decoded sequence string, or None if the gid is not found.

Return type:

str or None

get_header_len(gid: int) → Tuple[str | None, int | None]

Fetch header and length by global ID.

Parameters:

gid (int) – Global sequence identifier.

Returns:

(header, length) tuple, or (None, None) if the gid is not found.

Return type:

tuple of (str or None, int or None)
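The return convention can be illustrated with a standalone sketch against an in-memory table. It distinguishes a missing gid, which gives (None, None), from a present gid with a NULL header, which gives (None, length). For brevity the sketch only decodes plain blobs (flag byte 0); that restriction is a simplification, not the library's behavior.

```python
import sqlite3
from typing import Optional, Tuple


def get_header_len(conn: sqlite3.Connection, gid: int) -> Tuple[Optional[str], Optional[int]]:
    """Look up (header, length) for one gid; (None, None) if absent."""
    row = conn.execute(
        "SELECT header, len FROM sequences WHERE gid = ?", (gid,)
    ).fetchone()
    if row is None:
        return None, None
    blob, length = row
    if blob is None:
        return None, length
    if blob[0] != 0:
        raise ValueError("this sketch only handles plain (flag 0) headers")
    return bytes(blob[1:]).decode("utf-8"), length


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences(gid INTEGER PRIMARY KEY, len INTEGER NOT NULL, "
    "hash8 INTEGER, seq BLOB NOT NULL, shard INTEGER, local_idx INTEGER, header BLOB)"
)
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 4, None, b"\x00ACGT", 0, 0, b"\x00seq1"),
        (2, 3, None, b"\x00ACG", 0, 1, None),
    ],
)
found = get_header_len(conn, 1)
no_header = get_header_len(conn, 2)
missing = get_header_len(conn, 99)
conn.close()
```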

get_many_header_len(gids: Iterable[int]) → List[Tuple[int, str | None, int | None]]

Batched fetch of gid, header, and len. Returns a list of (gid, header, length) tuples.

get_many_meta(gids: Iterable[int]) → List[Tuple[int, str | None, int | None, int | None]]

Batched fetch of gid, header, len, and hash8. Returns a list of (gid, header, length, hash8) tuples.
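Batched lookups of this kind are commonly implemented by splitting the gids into chunks and issuing one IN (...) query per chunk, so the number of bound parameters stays under SQLite's limit. The sketch below shows that pattern; the chunk size, the function name, and the choice to return raw header blobs and omit gids that are not found are all assumptions, and the real method may differ on each point.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

CHUNK = 900  # hypothetical; below SQLite's historical 999-parameter default


def get_many_meta_raw(
    conn: sqlite3.Connection, gids: Iterable[int]
) -> List[Tuple[int, Optional[bytes], int, Optional[int]]]:
    """Fetch (gid, header_blob, len, hash8) for many gids in chunks."""
    gid_list = list(gids)
    out: list = []
    for start in range(0, len(gid_list), CHUNK):
        chunk = gid_list[start:start + CHUNK]
        placeholders = ",".join("?" * len(chunk))
        sql = (
            "SELECT gid, header, len, hash8 FROM sequences "
            f"WHERE gid IN ({placeholders})"
        )
        out.extend(conn.execute(sql, chunk).fetchall())
    return out


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences(gid INTEGER PRIMARY KEY, len INTEGER NOT NULL, "
    "hash8 INTEGER, seq BLOB NOT NULL, shard INTEGER, local_idx INTEGER, header BLOB)"
)
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(g, 4, g, b"\x00ACGT", 0, g, None) for g in range(2000)],
)
# 1000 even gids: more than one chunk, so the loop runs twice.
result = get_many_meta_raw(conn, range(0, 2000, 2))
conn.close()
```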

get_gids_by_length_range(min_len: int | None, max_len: int | None) → List[int]

Return all gids whose sequence length is within the inclusive range [min_len, max_len]. Uses a SQLite index on len for speed.
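A range query with optional bounds can be sketched as below. Treating None as "no bound on that side" is an assumption suggested by the int | None parameter types; the index name and the ORDER BY gid are likewise illustrative choices.

```python
import sqlite3
from typing import List, Optional


def gids_by_length_range(
    conn: sqlite3.Connection, min_len: Optional[int], max_len: Optional[int]
) -> List[int]:
    """Return gids with min_len <= len <= max_len; None leaves a side open."""
    clauses, params = [], []
    if min_len is not None:
        clauses.append("len >= ?")
        params.append(min_len)
    if max_len is not None:
        clauses.append("len <= ?")
        params.append(max_len)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    rows = conn.execute("SELECT gid FROM sequences" + where + " ORDER BY gid", params)
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences(gid INTEGER PRIMARY KEY, len INTEGER NOT NULL, "
    "hash8 INTEGER, seq BLOB NOT NULL, shard INTEGER, local_idx INTEGER, header BLOB)"
)
# Hypothetical index name; lets SQLite answer range predicates on len.
conn.execute("CREATE INDEX idx_sequences_len ON sequences(len)")
conn.executemany(
    "INSERT INTO sequences(gid, len, seq) VALUES (?, ?, ?)",
    [(1, 10, b"\x00A"), (2, 20, b"\x00A"), (3, 30, b"\x00A")],
)
both = gids_by_length_range(conn, 15, 30)
upper_only = gids_by_length_range(conn, None, 10)
unbounded = gids_by_length_range(conn, None, None)
conn.close()
```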

static hash8(seq: str) → int

Compute an 8-byte hash of the sequence, derived from SHA-1, for deduplication.
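One plausible way to derive such a value is to take the first 8 bytes of the SHA-1 digest and read them as an integer. The byte selection, byte order, and signedness below are assumptions (signed is chosen so the result fits an SQLite INTEGER column); the library's exact derivation is not documented here, so values from this sketch should not be expected to match stored hash8 values.

```python
import hashlib


def hash8(seq: str) -> int:
    """Truncate SHA-1 of the UTF-8 sequence to a signed 64-bit integer."""
    digest = hashlib.sha1(seq.encode("utf-8")).digest()  # 20 bytes
    return int.from_bytes(digest[:8], "big", signed=True)


h1 = hash8("ACGT")
h2 = hash8("ACGT")
h3 = hash8("ACGA")
```

Equal sequences always hash to the same value, so matching hash8 and len is a cheap pre-filter before comparing full sequences.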

static encode_seq(seq: str, use_zstd: bool) → bytes

Encode sequence to BLOB with optional zstd compression.

static decode_seq(blob: bytes) → str

Decode BLOB to sequence string, handling compression.

static encode_header(header: str | None, use_zstd: bool) → bytes | None

Encode header to BLOB with optional zstd compression.

static decode_header(blob: bytes | None) → str | None

Decode BLOB to header string, handling compression.
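The "1 byte flag + payload" layout from the table description can be sketched as follows. The function names are hypothetical, and the zstd path relies on the third-party zstandard package, which is imported optionally; which zstd binding and compression settings the library uses is an assumption.

```python
from typing import Optional

try:
    import zstandard  # third-party, optional for this sketch
except ImportError:
    zstandard = None

FLAG_PLAIN = 0  # payload is raw UTF-8
FLAG_ZSTD = 1   # payload is zstd-compressed UTF-8


def encode_text(text: str, use_zstd: bool) -> bytes:
    """Prefix the payload with a flag byte describing its encoding."""
    raw = text.encode("utf-8")
    if not use_zstd:
        return bytes([FLAG_PLAIN]) + raw
    if zstandard is None:
        raise RuntimeError("zstandard package is not installed")
    return bytes([FLAG_ZSTD]) + zstandard.ZstdCompressor().compress(raw)


def decode_text(blob: bytes) -> str:
    """Inspect the flag byte and undo the matching encoding."""
    flag, payload = blob[0], bytes(blob[1:])
    if flag == FLAG_PLAIN:
        return payload.decode("utf-8")
    if flag == FLAG_ZSTD:
        if zstandard is None:
            raise RuntimeError("zstandard package is not installed")
        return zstandard.ZstdDecompressor().decompress(payload).decode("utf-8")
    raise ValueError(f"unknown flag byte: {flag}")


def encode_optional(header: Optional[str], use_zstd: bool) -> Optional[bytes]:
    """Headers may be missing; None maps to a NULL column."""
    return None if header is None else encode_text(header, use_zstd)


def decode_optional(blob: Optional[bytes]) -> Optional[str]:
    return None if blob is None else decode_text(blob)


plain_blob = encode_text("ACGT", use_zstd=False)
round_trip = decode_text(plain_blob)
```

Because the flag is stored per value, plain and compressed rows can coexist in the same table and decoding needs no external configuration.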