Exons
- class fac.CodingExon(gene_idx: int, acceptor: int, donor: int, prev_donor: int, next_acceptor: int, phase_start: int)
Represents a coding exon in a gene. Can be constitutive or alternative.
To load these, see `load_long_canonical_internal_coding_exons`.
- Parameters:
gene_idx – Index of the gene in the validation set.
acceptor – Position of the acceptor site. This is the start of the exon.
donor – Position of the donor site. Note: this is not the same as the “end” of the exon, because the end is exclusive.
prev_donor – Position of the previous donor site.
next_acceptor – Position of the next acceptor site.
phase_start – The phase at the start of the exon: 0, 1, or 2. The phase is relative to the coding frame; for example, a phase of 1 indicates that the exon starts with a 2-mer followed by a full codon.
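As an illustration of the phase convention, the hypothetical helper below computes where the first complete codon begins inside the exon. It takes the stated case (phase 1 means the exon opens with a 2-mer) and assumes by analogy that phase 2 means the exon opens with a single base; `first_full_codon_offset` is not part of `fac`.

```python
def first_full_codon_offset(phase_start: int) -> int:
    """Hypothetical helper: index within the exon where the first full codon starts.

    Assumed convention: phase 0 -> exon starts with a codon,
    phase 1 -> starts with a 2-mer, phase 2 -> starts with a single base.
    """
    if phase_start not in (0, 1, 2):
        raise ValueError("phase_start must be 0, 1, or 2")
    # Skip the partial codon (if any) that completes a codon from the previous exon.
    return (3 - phase_start) % 3


print([first_full_codon_offset(p) for p in (0, 1, 2)])  # [0, 2, 1]
```

Under this assumption, the returned value is a natural candidate for the `off` argument of `sequence_to_codons` when reading an exon in its own coding frame.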
- property all_locations: Tuple[int, int, int, int]
Returns the exon's splice-site locations as a (prev_donor, acceptor, donor, next_acceptor) tuple.
- Returns:
Tuple[int, int, int, int]. The locations of the exon.
- property length: int
Return the length of the exon. This is the number of bases in the exon.
- Returns:
int. The length of the exon.
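A minimal sketch of what `all_locations` and `length` describe, using a hypothetical stand-in dataclass rather than the real `fac.CodingExon`. It assumes that `donor` is the last base of the exon, so the exclusive end is `donor + 1`; this reading follows the note on `donor` but is not confirmed by the source.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ExonSketch:
    """Hypothetical stand-in for fac.CodingExon (not the real class)."""

    gene_idx: int
    acceptor: int
    donor: int
    prev_donor: int
    next_acceptor: int
    phase_start: int

    @property
    def all_locations(self) -> Tuple[int, int, int, int]:
        # Splice sites in the order documented for CodingExon.all_locations.
        return (self.prev_donor, self.acceptor, self.donor, self.next_acceptor)

    @property
    def length(self) -> int:
        # Assumption: donor is inclusive, so the exclusive end is donor + 1.
        return self.donor + 1 - self.acceptor


exon = ExonSketch(gene_idx=0, acceptor=100, donor=159, prev_donor=40, next_acceptor=300, phase_start=0)
print(exon.all_locations)  # (40, 100, 159, 300)
print(exon.length)  # 60
```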
- property text
Compute the text of the exon. This is the sequence of the exon in the gene.
- Returns:
np.ndarray. The sequence of the exon. Will be of shape (L, 4) where L is the length of the exon.
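The shape (L, 4) suggests the text is a slice of a one-hot encoded gene. The sketch below shows that idea with a hypothetical `exon_text` function; it assumes the gene is stored as a (G, 4) array and, as above, that `acceptor` and `donor` are the first and last exon bases (both inclusive).

```python
import numpy as np


def exon_text(gene_onehot: np.ndarray, acceptor: int, donor: int) -> np.ndarray:
    """Hypothetical sketch: slice an exon out of a one-hot gene of shape (G, 4).

    Assumes acceptor is the first exon base and donor the last (inclusive).
    """
    return gene_onehot[acceptor : donor + 1]


# Build a toy one-hot gene from integer codes 0..3 by indexing rows of the identity.
gene = np.eye(4, dtype=np.float32)[np.array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1])]
text = exon_text(gene, acceptor=2, donor=6)
print(text.shape)  # (5, 4)
print(text.argmax(-1))  # [2 3 0 1 2]
```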
- to_dict()
Convert the exon to a dictionary. This is useful for saving the exon to a file.
Invariant: `CodingExon(**exon.to_dict()) == exon`
- Returns:
Dict[str, Any]. The dictionary representation of the exon.
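The invariant holds whenever the dictionary keys match the constructor arguments. The sketch below illustrates this with a hypothetical frozen dataclass (`ExonRecord`, not the real class) and `dataclasses.asdict`, including a JSON round trip to mimic saving to a file.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class ExonRecord:
    """Hypothetical stand-in for CodingExon, used only to illustrate the invariant."""

    gene_idx: int
    acceptor: int
    donor: int
    prev_donor: int
    next_acceptor: int
    phase_start: int

    def to_dict(self) -> Dict[str, Any]:
        # Keys are the constructor argument names, so ** unpacking round-trips.
        return asdict(self)


exon = ExonRecord(3, 100, 159, 40, 300, 1)
d = exon.to_dict()
assert ExonRecord(**d) == exon

# All fields are plain ints, so the dict survives serialization to JSON.
restored = ExonRecord(**json.loads(json.dumps(d)))
assert restored == exon
```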
- fac.sequence_to_codons(sequence, off=0)
Convert a sequence to codons. If the sequence is one-hot encoded, it is first converted to integers in [0, 1, 2, 3] by taking the argmax along the last axis. The sequence is then sliced from off to its end and truncated to the largest multiple of 3.
- Parameters:
sequence – np.ndarray. The sequence to convert to codons. Should be of shape (N, 4) or (N,).
off – int. The offset at which the reading frame starts. Positions before this offset are ignored.
- Returns:
np.ndarray. The sequence as codons. Will be of shape ((N - off) // 3, 3).
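The steps described above (argmax for one-hot input, slice from `off`, truncate, reshape) can be sketched in a few lines of NumPy. `sequence_to_codons_sketch` is an illustrative reimplementation of the documented behaviour, not the library's code.

```python
import numpy as np


def sequence_to_codons_sketch(sequence: np.ndarray, off: int = 0) -> np.ndarray:
    """Sketch of the behaviour described for fac.sequence_to_codons."""
    sequence = np.asarray(sequence)
    if sequence.ndim == 2:
        # One-hot (N, 4) -> integer codes (N,) via argmax over the last axis.
        sequence = sequence.argmax(axis=-1)
    sequence = sequence[off:]
    # Drop trailing bases that do not fill a whole codon.
    n_codons = len(sequence) // 3
    return sequence[: 3 * n_codons].reshape(n_codons, 3)


seq = np.array([0, 3, 2, 1, 1, 1, 3, 0, 0, 2])
print(sequence_to_codons_sketch(seq, off=1))
# [[3 2 1]
#  [1 1 3]
#  [0 0 2]]
```

Passing the one-hot form, `np.eye(4)[seq]`, gives the same codons, and the result has shape ((N - off) // 3, 3) as documented.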
- fac.is_stop(codons)
Compute, for each codon, whether it is a stop codon.
- Parameters:
codons – np.ndarray. The codons to check. Should be of shape (N, 3).
- Returns:
np.ndarray. A boolean array of shape (N,) where True indicates a stop codon.
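A vectorized sketch of a stop-codon check, compared against the standard stops TAA, TAG and TGA. It assumes the integer codes follow A=0, C=1, G=2, T=3; the source does not state the base ordering, and `is_stop_sketch` is a hypothetical name.

```python
import numpy as np

# Assumption: integer codes follow A=0, C=1, G=2, T=3.
STOP_CODONS = np.array([
    [3, 0, 0],  # TAA
    [3, 0, 2],  # TAG
    [3, 2, 0],  # TGA
])


def is_stop_sketch(codons: np.ndarray) -> np.ndarray:
    """Return a boolean array of shape (N,) marking stop codons."""
    codons = np.asarray(codons)
    # Broadcast (N, 1, 3) against (1, 3, 3): one row per codon, one column per stop.
    matches = (codons[:, None, :] == STOP_CODONS[None, :, :]).all(axis=-1)
    return matches.any(axis=-1)


codons = np.array([[3, 0, 0], [0, 3, 2], [3, 2, 0], [3, 0, 2]])
print(is_stop_sketch(codons))  # [ True False  True  True]
```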
- fac.all_frames_closed(exon_sequences)
Compute whether all frames are closed for a set of exon sequences.
- Parameters:
exon_sequences – List[np.ndarray]. The exon sequences to check. Each sequence should be of shape (L, 4), where L is the length of that sequence.
- Returns:
np.ndarray. A boolean array with one entry per exon sequence, where True indicates that all frames are closed.
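One plausible reading of "all frames closed" is that each of the three reading frames (offsets 0, 1 and 2) contains at least one stop codon, so no frame stays open across the whole sequence. The sketch below implements that interpretation; it is an assumption, as are the A=0, C=1, G=2, T=3 encoding and the hypothetical `one_hot` and `all_frames_closed_sketch` helpers.

```python
from typing import List

import numpy as np

# Assumption: integer codes follow A=0, C=1, G=2, T=3.
STOP_CODONS = np.array([[3, 0, 0], [3, 0, 2], [3, 2, 0]])  # TAA, TAG, TGA


def _has_stop(codes: np.ndarray, off: int) -> bool:
    # Split into codons starting at `off`, dropping an incomplete tail.
    frame = codes[off:]
    n = len(frame) // 3
    codons = frame[: 3 * n].reshape(n, 3)
    return bool((codons[:, None, :] == STOP_CODONS[None]).all(-1).any())


def all_frames_closed_sketch(exon_sequences: List[np.ndarray]) -> np.ndarray:
    """One boolean per sequence: True if every frame (0, 1, 2) contains a stop codon."""
    result = []
    for seq in exon_sequences:
        codes = np.asarray(seq).argmax(axis=-1)  # one-hot (L, 4) -> (L,)
        result.append(all(_has_stop(codes, off) for off in range(3)))
    return np.array(result, dtype=bool)


def one_hot(s: str) -> np.ndarray:
    # Hypothetical helper: encode an ACGT string as a one-hot (L, 4) array.
    return np.eye(4)[["ACGT".index(c) for c in s]]


result = all_frames_closed_sketch([one_hot("TAAATAAATAA"), one_hot("ATGATGATG")])
print(result)  # [ True False]
```

In the first sequence, TAA appears in frames 0, 1 and 2; the second has no stop codon in frames 0 or 2, so it is not closed.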