Use when writing Python that processes biological sequences (DNA/RNA/protein) with the seqpro package — encoding, one-hot, k-mer shuffling, reverse complement, GC content, variable-length sequence batches, or anything involving seqpro's `Ragged` array. Covers the seqpro API surface and the conventions you need to use it correctly.
`npx skill4agent add ml4gland/seqpro seqpro`

`src/kshuffle.rs` `Ragged` `import seqpro as sp` `sp.Ragged` `sp.bed` `sp.gtf` `python/seqpro/__init__.py` `SeqType` `python/seqpro/_utils.py` `ndarray` `str_` `object_` `bytes_` `uint8` `sp.cast_seqs(...)` `|S1` `uint8` `|S1` `uint8` `length_axis` `ohe_axis` `check_axes()` `_numba.py` `for` `sp.DNA` `sp.RNA` `sp.AA` `sp.NucleotideAlphabet` `sp.AminoAlphabet` `python/seqpro/alphabets/_alphabets.py` `python/seqpro/transforms/`

| Task | Call | Notes |
|---|---|---|
| Normalize input | | → |
| One-hot encode | | last axis added for OHE dim |
| Decode OHE | | |
| Tokenize / detokenize | | integer ids |
| Pad | | |
| Reverse complement | | works on str/bytes/OHE |
| K-mer shuffle | | calls Rust |
| Jitter | | |
| Random sequences | | |
| GC content | | |
| Nucleotide content | | |
| Coverage binning | | |
| BED / GTF I/O | | polars/pyranges-backed |
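
To make a few rows of the table above concrete, here is a minimal plain-Python sketch of what reverse complement, GC content, and one-hot encoding compute. It deliberately avoids seqpro so it runs anywhere; the helper names, the `ACGT` alphabet order, and the choice to map unknown bases to all-zero rows are assumptions for illustration and may not match seqpro's behavior.

```python
# Plain-Python sketch of three operations from the table above.
# These helper names are hypothetical; they are NOT seqpro functions.

_COMPLEMENT = bytes.maketrans(b"ACGTacgt", b"TGCAtgca")
DNA_ALPHABET = "ACGT"  # assumed alphabet order for the one-hot sketch


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.encode("ascii").translate(_COMPLEMENT)[::-1].decode("ascii")


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(seq)


def one_hot(seq: str, alphabet: str = DNA_ALPHABET) -> list:
    """Return a (length, len(alphabet)) nested list.

    The encoding dimension is the last axis. Bases outside the alphabet
    become all-zero rows (an assumption made for this sketch).
    """
    index = {base: i for i, base in enumerate(alphabet)}
    rows = []
    for base in seq.upper():
        row = [0] * len(alphabet)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows
```

In real code you would call the corresponding seqpro function on whole batches rather than looping in Python; the sketch only pins down the semantics.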
`sp.<fn>?` `Ragged` `sp.Ragged` `python/seqpro/rag/_array.py` `ak.Array` `Ragged` `data` `NDArray` `(total_elements, *fixed_trailing_dims)` `rag.data` `offsets` `int64` `(N+1,)` `(2, N)` `rag.offsets` `shape` `(batch, None, ohe_dim)` `None` `rag.rag_dim` `rag.lengths` `ndarray`

    import numpy as np
    import seqpro as sp

    # From lengths (most common: you have a flat data buffer and per-segment lengths)
    data = np.frombuffer(b"ACGTACGTACG", dtype="S1")
    lengths = np.array([4, 3, 4])
    rag = sp.rag.Ragged.from_lengths(data, lengths)  # shape (3, None)

    # From explicit offsets
    offsets = np.array([0, 4, 7, 11], dtype=np.int64)
    rag = sp.rag.Ragged.from_offsets(data, shape=(3, None), offsets=offsets)

    # Empty with known shape
    rag = sp.rag.Ragged.empty((10, None, 4), dtype=np.uint8)  # batch of 10 OHE seqs

`Ragged.empty(shape, dtype)` `None` `shape` `None` `Ragged`

| Task | Do | Don't |
|---|---|---|
| Bulk numeric op on the flat data | | Iterate |
| Apply a | Just call it: | Manually unpack and rebuild |
| Reinterpret bytes/dtype | | |
| Reshape non-ragged axes | | Touch |
| Drop a size-1 axis | | |
| Densify to NumPy | | Loop and stack |
| Strip to plain awkward | | |
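
The "Do / Don't" advice above follows from the flat-buffer-plus-offsets layout. The sketch below illustrates that layout in plain Python, without seqpro or NumPy: one flat buffer, per-sequence lengths, and contiguous offsets with N+1 entries. The variable names and the `"n"` pad value are illustrative assumptions, not part of seqpro.

```python
from itertools import accumulate

# Flat buffer holding three sequences back to back, plus per-sequence lengths.
data = list("ACGTACGTACG")
lengths = [4, 3, 4]

# Contiguous offsets have N+1 entries: offsets[i] is where sequence i starts
# and offsets[i + 1] is where it ends.
offsets = [0, *accumulate(lengths)]

# Lengths are recovered as differences of consecutive offsets.
recovered = [stop - start for start, stop in zip(offsets, offsets[1:])]

# A bulk element-wise op touches the flat buffer once; offsets are unchanged.
lowered = [base.lower() for base in data]

# Individual sequences are just slices of the flat buffer.
segments = ["".join(lowered[a:b]) for a, b in zip(offsets, offsets[1:])]

# Densify: pad every sequence to the longest length ("n" is an arbitrary pad).
width = max(lengths)
dense = [seg.ljust(width, "n") for seg in segments]
```

Because element-wise work never needs the segment boundaries, operating on the flat buffer and reusing the offsets is cheaper than iterating over sequences and rebuilding the structure.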
`Ragged` `ak.zip` `Ragged` `Ragged` `ak.behavior`

    import awkward as ak
    import numpy as np
    import seqpro as sp

    # Illustrative inputs (assumed): one flat buffer per field, sharing the same lengths
    seq_flat = np.frombuffer(b"ACGTACGTACG", dtype="S1")
    score_flat = np.arange(11, dtype="f4")
    lengths = np.array([4, 3, 4])

    seq_rag = sp.rag.Ragged.from_lengths(seq_flat, lengths)      # |S1
    score_rag = sp.rag.Ragged.from_lengths(score_flat, lengths)  # f4
    batch = ak.zip({"seq": seq_rag, "score": score_rag})         # → Ragged (record layout)
    assert isinstance(batch, sp.rag.Ragged)
    batch["score"].data[:] *= 2.0  # zero-copy mutation of the flat score buffer

`lengths` `offsets` `ak.Array` `rag.dtype` `[("seq","S1"),("score","f4")]` `rag.data` `rag.parts` `rag["field"].data` `view` `apply` `to_numpy` `Ragged` `ak.behavior` `np.add` `np.exp` `Ragged` `Ragged` `ak.zip` `ak.fields` `Ragged` `ak.Array` `ak.to_packed(rag)` `|S1` `python/seqpro/rag/_array.py` `_gufuncs.py` `_utils.py` `rag.offsets` `(2, N)` `(N+1,)` `rag.is_contiguous` `ak.to_packed(rag)` `(N+1,)` `rag.data` `rag.data.shape` `isinstance(rag.data, dict)` `rag.parts` `Ragged` `None` `shape` `__init__` `from_lengths` `from_offsets` `uint8` `sp.k_shuffle` `seqpro._k_shuffle`

| Need | File |
|---|---|
| Public surface | |
| Input casting / axis helpers | |
| OHE / tokens / padding | |
| Augmentations | |
| Stats | |
| Alphabets | |
| Ragged | |
| Transforms (pipeline objects) | |
| BED/GTF | |
| Rust k-shuffle | |
| Tests as usage examples | |
| Rendered docs | |
`for` `_numba.py` `length_axis` `ohe_axis` `Ragged` `_parts` `__init__` `data` `offsets` `parts` `from_lengths` `from_offsets` `empty` `Ragged` `|S1`
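
The offsets shapes `(2, N)` and `(N+1,)` mentioned earlier can be pictured with the plain-Python sketch below. It assumes that a `(2, N)` array stores separate starts and stops into the flat buffer and that packing copies the selected segments into a fresh contiguous buffer; both readings are assumptions for illustration, and the helpers `is_contiguous` and `pack` here are hypothetical stand-ins, not seqpro or awkward functions.

```python
from itertools import accumulate

# Flat buffer and a reordered view of it: two segments given as separate
# starts and stops (an assumed reading of a (2, N) offsets array).
data = list("ACGTACGTACG")
starts = [7, 0]
stops = [11, 4]


def is_contiguous(starts, stops):
    """True when segments begin at 0 and each starts where the previous stops."""
    if not starts:
        return True
    return starts[0] == 0 and all(s == e for s, e in zip(starts[1:], stops))


def pack(data, starts, stops):
    """Copy the selected segments into a new buffer with (N+1,) offsets."""
    packed = [x for a, b in zip(starts, stops) for x in data[a:b]]
    offsets = [0, *accumulate(b - a for a, b in zip(starts, stops))]
    return packed, offsets


packed, offsets = pack(data, starts, stops)
```

After packing, consecutive offsets describe every segment, so a single `(N+1,)` array is enough and the flat buffer contains only the elements that are actually referenced.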