Exons

class fac.CodingExon(gene_idx: int, acceptor: int, donor: int, prev_donor: int, next_acceptor: int, phase_start: int)

Represents a coding exon in a gene. Can be constitutive or alternative.

To load these, see `load_long_canonical_internal_coding_exons`.

Parameters:
  • gene_idx – Index of the gene in the validation set

  • acceptor – Position of the acceptor site, i.e. the start of the exon

  • donor – Position of the donor site. Note that this position is inclusive, so it is not the same as the “end” of the exon, which is exclusive

  • prev_donor – Position of the previous donor site

  • next_acceptor – Position of the next acceptor site

  • phase_start – The phase of the start of the exon: 0, 1, or 2. This is relative to the coding frame, i.e., a phase of 1 indicates that the exon starts with a 2-mer followed by a complete codon.
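The phase convention can be made concrete with a small sketch. The helper `first_codon_offset` is hypothetical (it is not part of `fac`) and relies only on the statement above that a phase of 1 leaves two bases before the first complete codon. The value for phase 2 is an extrapolation of that rule.

```python
def first_codon_offset(phase_start: int) -> int:
    """Number of leading bases to skip before the first complete codon.

    ASSUMPTION: phase 1 means two bases of a partial codon come first (as the
    parameter description states); phase 2 is extrapolated to mean one base.
    """
    if phase_start not in (0, 1, 2):
        raise ValueError("phase_start must be 0, 1, or 2")
    return (3 - phase_start) % 3


# Offset for each possible phase.
offsets = {phase: first_codon_offset(phase) for phase in (0, 1, 2)}
print(offsets)
```

An offset computed this way could plausibly be passed as the `off` argument of `sequence_to_codons` to read an exon in its coding frame, although the documentation does not say so explicitly.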

property all_locations: Tuple[int, int, int, int]

Returns the splice-site positions associated with the exon as the tuple (prev_donor, acceptor, donor, next_acceptor).

Returns:

Tuple[int, int, int, int]. The locations of the exon.

property length: int

Return the length of the exon. This is the number of bases in the exon.

Returns:

int. The length of the exon.
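Since `donor` is described as distinct from the exclusive "end" of the exon, the length presumably works out to `donor - acceptor + 1`. The sketch below illustrates that arithmetic; `exon_length` is a hypothetical helper, and the inclusive-donor reading is an assumption drawn from the parameter notes rather than from the implementation.

```python
def exon_length(acceptor: int, donor: int) -> int:
    """Length in bases, ASSUMING `donor` is the last base inside the exon."""
    end = donor + 1  # exclusive end, assumed to sit one past the donor position
    return end - acceptor


# An exon whose first base is at 100 and whose donor is at 199.
example_length = exon_length(acceptor=100, donor=199)
print(example_length)
```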

property text

Compute the text of the exon. This is the sequence of the exon in the gene.

Returns:

np.ndarray. The sequence of the exon. Will be of shape (L, 4) where L is the length of the exon.

to_dict()

Convert the exon to a dictionary. This is useful for saving the exon to a file.

Invariant: `CodingExon(**exon.to_dict()) == exon`

Returns:

Dict[str, Any]. The dictionary representation of the exon.
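The round-trip invariant can be illustrated with a stand-in class. `CodingExonSketch` is hypothetical: it mirrors the constructor fields listed above as a plain dataclass, and is not how `fac.CodingExon` is necessarily implemented.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class CodingExonSketch:
    """Hypothetical stand-in with the same constructor fields as CodingExon."""

    gene_idx: int
    acceptor: int
    donor: int
    prev_donor: int
    next_acceptor: int
    phase_start: int

    def to_dict(self) -> Dict[str, Any]:
        # Keys match the constructor arguments, so the dict can be splatted back.
        return asdict(self)


exon = CodingExonSketch(
    gene_idx=0, acceptor=100, donor=199, prev_donor=50, next_acceptor=300, phase_start=1
)
restored = CodingExonSketch(**exon.to_dict())
print(restored == exon)
```

Because the dictionary contains only plain integers in this sketch, it can be written out with a serializer such as `json.dumps` and reloaded later.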

fac.sequence_to_codons(sequence, off=0)

Convert a sequence to codons. If the sequence is a one-hot encoding, it will be converted to integers [0, 1, 2, 3] by taking the argmax along the last axis. The sequence will be sliced from off to the end of the sequence, and then truncated to the nearest multiple of 3.

Parameters:
  • sequence – np.ndarray. The sequence to convert to codons. Should be of shape (N, 4) or (N,).

  • off – int. The offset to start the frame from. Things before this offset will be ignored.

Returns:

np.ndarray. The sequence as codons. Will be of shape ((N - off) // 3, 3).
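The steps described above (argmax, slice from `off`, truncate, group into triplets) can be sketched in NumPy as follows. `sequence_to_codons_sketch` is an illustrative reimplementation written from the description, not the library's source.

```python
import numpy as np


def sequence_to_codons_sketch(sequence: np.ndarray, off: int = 0) -> np.ndarray:
    """Group a sequence into codons, following the documented behavior."""
    sequence = np.asarray(sequence)
    if sequence.ndim == 2:
        # One-hot (N, 4) input: convert to integer tokens in [0, 1, 2, 3].
        sequence = sequence.argmax(-1)
    sequence = sequence[off:]  # ignore everything before the offset
    usable = len(sequence) - len(sequence) % 3  # drop the trailing partial codon
    return sequence[:usable].reshape(-1, 3)


# Eight one-hot encoded bases with tokens 0, 3, 2, 3, 0, 0, 1, 1.
one_hot = np.eye(4, dtype=int)[[0, 3, 2, 3, 0, 0, 1, 1]]
codons_off0 = sequence_to_codons_sketch(one_hot)
codons_off1 = sequence_to_codons_sketch(one_hot, off=1)
print(codons_off0.shape, codons_off1.shape)
```

With 8 bases, both offsets 0 and 1 leave room for two complete codons, so both results have shape `(2, 3)`.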

fac.is_stop(codons)

Compute, for each codon in a sequence of codons, whether it is a stop codon.

Parameters:
  • codons – np.ndarray. The codons to check. Should be of shape (N, 3).

Returns:

np.ndarray. A boolean array of shape (N,) where True indicates a stop codon.
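A minimal sketch of such a check is shown below. It assumes the integer encoding A=0, C=1, G=2, T=3, which this documentation does not state, and the standard stop codons TAA, TAG, and TGA. `is_stop_sketch` and `STOP_CODONS` are hypothetical names.

```python
import numpy as np

# ASSUMPTION: bases are encoded as A=0, C=1, G=2, T=3.
STOP_CODONS = np.array([[3, 0, 0], [3, 0, 2], [3, 2, 0]])  # TAA, TAG, TGA


def is_stop_sketch(codons: np.ndarray) -> np.ndarray:
    """Return a boolean mask of shape (N,) marking stop codons."""
    codons = np.asarray(codons)
    # Compare every codon (N, 1, 3) against every stop codon (1, 3, 3).
    matches = (codons[:, None, :] == STOP_CODONS[None, :, :]).all(-1)
    # A codon is a stop if it equals any one of the three stop codons.
    return matches.any(-1)


# ATG, TAA, TGA, TGG under the assumed encoding.
stop_mask = is_stop_sketch(np.array([[0, 3, 2], [3, 0, 0], [3, 2, 0], [3, 2, 2]]))
print(stop_mask)
```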

fac.all_frames_closed(exon_sequences)

Compute whether all frames are closed for a set of exon sequences.

Parameters:
  • exon_sequences – List[np.ndarray]. The exon sequences to check. Each sequence should be of shape (N, 4).

Returns:

np.ndarray. A boolean array of shape (N,) where True indicates that all frames are closed.
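The documentation does not define "closed", so the sketch below rests on two assumptions: a reading frame counts as closed when it contains at least one stop codon, and the result holds one boolean per input sequence. It also assumes the encoding A=0, C=1, G=2, T=3. All names here are hypothetical and the code is not the library's implementation.

```python
import numpy as np

# ASSUMPTION: bases are encoded as A=0, C=1, G=2, T=3.
STOP_CODONS = np.array([[3, 0, 0], [3, 0, 2], [3, 2, 0]])  # TAA, TAG, TGA


def frame_has_stop(tokens: np.ndarray, off: int) -> bool:
    """True if the frame starting at `off` contains at least one stop codon."""
    tokens = tokens[off:]
    usable = len(tokens) - len(tokens) % 3
    codons = tokens[:usable].reshape(-1, 3)
    return bool((codons[:, None, :] == STOP_CODONS[None, :, :]).all(-1).any())


def all_frames_closed_sketch(exon_sequences) -> np.ndarray:
    """One boolean per sequence: do all three frames contain a stop codon?"""
    return np.array(
        [
            all(frame_has_stop(np.asarray(seq).argmax(-1), off) for off in range(3))
            for seq in exon_sequences
        ],
        dtype=bool,
    )


def one_hot(tokens):
    """Encode integer tokens as a one-hot array of shape (N, 4)."""
    return np.eye(4, dtype=int)[tokens]


# TAA A TAG A TGA: a stop codon in each of the three frames.
closed_seq = one_hot([3, 0, 0, 0, 3, 0, 2, 0, 3, 2, 0])
# Nine A's: no stop codon in any frame.
open_seq = one_hot([0] * 9)
result = all_frames_closed_sketch([closed_seq, open_seq])
print(result)
```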