Collected sources and patterns will appear here. Add from search, explore, or the patterns library.
DNASequence -> List<CodonToken>
Segment genomic sequences into non-overlapping 3-mer (codon) chunks to preserve translational reading frames.
Problem it solves
Standard subword or character tokenizers ignore natural biological codon boundaries.
Consumes
Emits
The real projects this mechanism was found in. Attribution is the point — this is how the best teams actually do it.