Deletion Experiments
- fac.deletion.experiment(mod: ModelToAnalyze, *, repair_spec={'type': 'NoRepair'}, distance_out, binary_metric=True, mod_for_base=None, limit=None) DeletionAccuracyDeltaResult
Run a deletion experiment on the given model and return the accuracy delta for each deletion.
- Parameters:
mod – The model to run the deletion experiment on.
repair_spec – The repair strategy to use.
distance_out – The distance out to use for deletions.
binary_metric – Whether to use a binary metric for the predictions.
- Returns:
The accuracy delta for each deletion, as a DeletionAccuracyDeltaResult.
- fac.deletion.experiments(mods: Dict[str, List[ModelToAnalyze]], *, repair_spec={'type': 'NoRepair'}, distance_out, binary_metric=True)
A wrapper around fac.deletion.experiment that takes a dictionary of model series and returns a dictionary of results. The keys of the input dictionary are used as the keys of the output dictionary.
See fac.deletion.experiment for more details.
- class fac.deletion.DeletionAccuracyDeltaResult(raw_data: ndarray)
Contains the raw output of the deletion experiment, as well as several methods for summarization.
- Parameters:
raw_data – The raw data from the deletion experiment. Indexed as raw_data[seed, exon_id, deletion - 1, deletion_location, affected_splice_site].
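The axis order above can be illustrated with a synthetic array. This is a minimal sketch, not library code: the sizes and random values are made up, and the assumption that the last two axes each have 4 entries is taken from the (4, 4) shape documented for mean_effect_matrix.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
num_seeds, num_exons, max_deletions = 2, 5, 9
num_locations, num_sites = 4, 4  # assumed from the (4, 4) mean-effect matrix

rng = np.random.default_rng(0)
# Synthetic stand-in for raw_data; real values come from a deletion experiment.
raw_data = rng.normal(
    size=(num_seeds, num_exons, max_deletions, num_locations, num_sites)
)

# Effect of 3 deletions: note the "deletion - 1" offset on the third axis.
seed, exon_id, deletion = 0, 4, 3
deletion_location, affected_splice_site = 1, 2
value = raw_data[seed, exon_id, deletion - 1, deletion_location, affected_splice_site]

# Every deletion count for one exon, location and site, across all seeds.
per_count = raw_data[:, exon_id, :, deletion_location, affected_splice_site]
print(value, per_count.shape)
```

The only point of the sketch is the indexing convention: the third axis is zero-based while deletion counts start at 1.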
- mean_effect_masked(mask=None, mutation_locations_to_use: Tuple[str] = ("d.s. of 3'SS", "u.s. of 5'SS"), affected_splice_sites_to_use: Tuple[str] = ("3'SS", "5'SS")) ndarray
Compute the mean effect of deletions on the given deletion locations and splice sites, with a mask.
- Parameters:
mask – A mask to apply to the data. If provided, the mask should be of the shape mask[exon_id, num_deletions - 1, deletion_location out of deletion_locations].
mutation_locations_to_use – The deletion locations to consider. Each must be one of fac.deletion.mutation_locations.
affected_splice_sites_to_use – The affected splice sites to consider. Each must be one of fac.deletion.affected_splice_sites.
- Returns:
The mean effect of deletions at the given locations on the given sites. Shape (num_seeds, num_deletions). Only averaged over sites where the mask is on.
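The masked mean can be approximated with plain NumPy. The sketch below is an illustration under several assumptions, not the library's implementation: the location and site orderings are copied from the row and column order given for plot_matrix_at_site, the mask's last axis is assumed to index the selected locations, and np.nanmean is assumed as the way masked-out entries are dropped. The function name mean_effect_masked_sketch is hypothetical.

```python
import numpy as np

# Assumed orderings, taken from the row/column order listed for plot_matrix_at_site.
mutation_locations = ("u.s. of 3'SS", "d.s. of 3'SS", "u.s. of 5'SS", "d.s. of 5'SS")
affected_splice_sites = ("P5'SS", "3'SS", "5'SS", "N3'SS")


def mean_effect_masked_sketch(
    raw_data,
    mask=None,
    mutation_locations_to_use=("d.s. of 3'SS", "u.s. of 5'SS"),
    affected_splice_sites_to_use=("3'SS", "5'SS"),
):
    """Illustrative masked mean; not the library implementation."""
    loc_idx = [mutation_locations.index(name) for name in mutation_locations_to_use]
    site_idx = [affected_splice_sites.index(name) for name in affected_splice_sites_to_use]
    # Keep the chosen locations and sites: (seed, exon, deletion, loc, site).
    selected = raw_data[:, :, :, loc_idx][..., site_idx]
    if mask is not None:
        # mask is (exon, deletion, loc); add seed and site axes so it broadcasts.
        selected = np.where(mask[None, :, :, :, None], selected, np.nan)
    # Average over exons, locations and sites, leaving (num_seeds, num_deletions).
    return np.nanmean(selected, axis=(1, 3, 4))


rng = np.random.default_rng(0)
raw_data = rng.normal(size=(2, 5, 9, 4, 4))  # synthetic stand-in
mask = np.zeros((5, 9, 2), dtype=bool)
mask[:3] = True  # keep only the first three exons
masked = mean_effect_masked_sketch(raw_data, mask=mask)
print(masked.shape)
```

Replacing masked-out entries with NaN before averaging keeps the output shape fixed no matter how many entries the mask removes.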
- mean_effect_matrix(num_deletions: int) ndarray
Returns a matrix representing the mean effect of num_deletions deletions at each location on each splice site.
- Parameters:
num_deletions – The number of deletions to consider.
- Returns:
The mean effect matrix. Shape (4, 4). The rows represent deletion locations, and the columns represent affected splice sites. NaN entries (e.g. splice sites outside a prediction window) are skipped; a cell is NaN only if every contributing value is NaN.
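The NaN handling described above matches what np.nanmean does, so the reduction can be sketched as follows. This is an assumption about the behavior, not the library's code; mean_effect_matrix_sketch is a hypothetical name, and the input array is synthetic.

```python
import warnings

import numpy as np


def mean_effect_matrix_sketch(raw_data, num_deletions):
    """Illustrative (location, site) mean for one deletion count."""
    # Slice out one deletion count: (seed, exon, location, site).
    at_count = raw_data[:, :, num_deletions - 1]
    with warnings.catch_warnings():
        # np.nanmean warns when a cell has no finite values; the result is still NaN.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(at_count, axis=(0, 1))


raw_data = np.ones((2, 3, 5, 4, 4))  # synthetic: 2 seeds, 3 exons, 5 deletion counts
raw_data[0, 0, 1, 0, 0] = np.nan     # a single missing value in cell (0, 0)
raw_data[:, :, 1, 3, 3] = np.nan     # every value missing in cell (3, 3)
matrix = mean_effect_matrix_sketch(raw_data, num_deletions=2)
print(matrix)
```

In this example cell (0, 0) is still finite because other seeds and exons contribute to it, while cell (3, 3) is NaN because nothing finite remains.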
- mean_effect_series(mutation_location: str, affected_splice_site: str, mean=True) ndarray
Returns the mean effect of deletions at the given location on the given splice site.
- Parameters:
mutation_location – The location of the deletion. One of fac.deletion.mutation_locations.
affected_splice_site – The affected splice site. One of fac.deletion.affected_splice_sites.
mean – Whether to take the mean across all exons.
- Returns:
The mean effect by deletion location. This is not aggregated over deletions or seeds. Shape: (num_seeds, num_deletions). If mean is False, the shape is (num_seeds, num_exons, num_deletions). When mean is True, NaN entries are skipped (mean over the finite exons).
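The two output shapes can be reproduced on a synthetic array. The sketch below assumes the location and site orderings listed for plot_matrix_at_site and uses np.nanmean for the NaN-skipping mean; mean_effect_series_sketch is a hypothetical name, not part of fac.deletion.

```python
import numpy as np

# Assumed orderings, taken from the row/column order listed for plot_matrix_at_site.
mutation_locations = ("u.s. of 3'SS", "d.s. of 3'SS", "u.s. of 5'SS", "d.s. of 5'SS")
affected_splice_sites = ("P5'SS", "3'SS", "5'SS", "N3'SS")


def mean_effect_series_sketch(raw_data, mutation_location, affected_splice_site, mean=True):
    """Illustrative per-deletion-count series for one location and site."""
    loc = mutation_locations.index(mutation_location)
    site = affected_splice_sites.index(affected_splice_site)
    series = raw_data[:, :, :, loc, site]  # (seed, exon, deletion)
    if mean:
        # Average over exons only, skipping NaN entries.
        return np.nanmean(series, axis=1)
    return series


rng = np.random.default_rng(0)
raw_data = rng.normal(size=(2, 5, 9, 4, 4))  # synthetic stand-in
raw_data[0, 2, 4, 1, 1] = np.nan             # one exon missing for one deletion count

averaged = mean_effect_series_sketch(raw_data, "d.s. of 3'SS", "3'SS")
per_exon = mean_effect_series_sketch(raw_data, "d.s. of 3'SS", "3'SS", mean=False)
print(averaged.shape, per_exon.shape)
```

With mean=False the NaN is passed through untouched, which leaves the caller free to decide how to handle exons with missing values.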
- fac.deletion.plot_by_deletion_loc_and_affected_site(deltas_by_model: Dict[str, DeletionAccuracyDeltaResult], distance_out: int)
Plot the effect of deletions on the accuracy of the model. This is plotted for 3’SS and 5’SS separately, for all 4 deletion locations, and for all deletion lengths.
- Parameters:
deltas_by_model – The deltas by model.
distance_out – The distance out.
- fac.deletion.plot_exon_effects_by_orf(deltas_by_model: Dict[str, DeletionAccuracyDeltaResult], distance_out: int, *, axs=None)
Plot the effect of deletions on the accuracy of the model. The 3’SS and 5’SS effects are combined into a single effect over both exonic deletion locations (all 4 location/site combinations, averaged).
- Parameters:
deltas_by_model – The deltas by model.
distance_out – The distance out.
axs – The axes to plot on.
- fac.deletion.plot_matrix_at_site(deltas: Dict[str, DeletionAccuracyDeltaResult], distance_out: int, num_deletions: int, height=4)
Plot a matrix of effects for each model. This is a 4x4 matrix where the rows are the deletions in each region (u.s. of 3’SS, d.s. of 3’SS, u.s. of 5’SS, d.s. of 5’SS) and the columns are the affected splice sites (P5’SS, 3’SS, 5’SS, N3’SS).
Each value is the drop in accuracy at the given splice site when deleting in the given region, in percentage points.
- Parameters:
deltas – The deltas by model.
distance_out – The distance out.
num_deletions – The number of deletions.
height – The height of the figure, in inches.
Adjacent Deletions Experiment
- fac.deletion.adjacent_coding_exons() List[Tuple[CodingExon, CodingExon]]
All pairs of consecutive coding exons in the dataset we are using for evaluation. This is a list of tuples, where each tuple is a pair of CodingExon objects. The first element is the first exon, and the second element is the second exon. These are guaranteed to be consecutive in the sense that the first one’s next_acceptor is the second one’s acceptor.
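The consecutiveness guarantee can be sketched with a small stand-in class. Everything here is hypothetical: ExonSketch is not CodingExon, it only carries the acceptor, donor and next_acceptor fields named in this reference, and plain integer coordinates are assumed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ExonSketch:
    """Hypothetical stand-in for CodingExon; only the fields used here."""

    acceptor: int
    donor: int
    next_acceptor: Optional[int]  # acceptor of the following exon, if any


def adjacent_pairs(exons: List[ExonSketch]) -> List[Tuple[ExonSketch, ExonSketch]]:
    """Pair each exon with the exon whose acceptor equals its next_acceptor."""
    by_acceptor = {exon.acceptor: exon for exon in exons}
    pairs = []
    for first in exons:
        second = by_acceptor.get(first.next_acceptor)
        if second is not None:
            pairs.append((first, second))
    return pairs


exons = [
    ExonSketch(acceptor=100, donor=200, next_acceptor=500),
    ExonSketch(acceptor=500, donor=650, next_acceptor=900),  # exon at 900 not in the list
    ExonSketch(acceptor=2000, donor=2100, next_acceptor=None),
]
pairs = adjacent_pairs(exons)
print(pairs)
```

An exon whose successor is absent from the list simply contributes no pair.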
- fac.deletion.close_consecutive_coding_exons() List[Tuple[CodingExon, CodingExon]]
All pairs of consecutive coding exons in the dataset we are using for evaluation, with the additional constraint that the distance between the first exon and the second exon (the intron in between) is less than 1000 bp, and the total length of the two exons is less than 4000 bp.
(The second constraint, given the first, is that the length of the two exons sums to less than 3000 bp, which is not particularly strict.)
- fac.deletion.conditions
Each condition reflects the number of deletions being made to the first exon and the second exon. Each of these conditions is run when evaluating the model, and indices into this list can be used to select the condition to use.
- fac.deletion.run_on_all_adjacent_deletions(model: ModelToAnalyze, *, limit=None, outside=False) ndarray
Run the model on all adjacent deletions, producing a table of results.
- Parameters:
model – The model to run. This should be a ModelToAnalyze object.
limit – The maximum number of pairs of adjacent deletions to run on. If None, run on all pairs. This is useful for debugging.
outside – If True, deletions are made on the outer edges of the exons. This only exists for checking the asymmetry results.
- Returns:
An array of results, of shape (N, C, 4), where N is the number of pairs of adjacent deletions, and C is the number of conditions. The last dimension corresponds to (first.acceptor, first.donor, second.acceptor, second.donor). The array is boolean: each entry indicates whether the prediction is above or below the classification threshold.
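An array of this shape can be summarized per condition and per site. The sketch below uses synthetic data, and its details are assumptions: the conditions list is hypothetical (the real values live in fac.deletion.conditions), and True is assumed to mean "above the classification threshold".

```python
import numpy as np

# Hypothetical conditions: (deletions in first exon, deletions in second exon).
conditions = [(0, 0), (1, 0), (0, 1), (1, 1)]
site_names = ("first.acceptor", "first.donor", "second.acceptor", "second.donor")

rng = np.random.default_rng(0)
# Synthetic stand-in for the returned array: (N pairs, C conditions, 4 sites).
results = rng.random((50, len(conditions), 4)) > 0.3

# Fraction of pairs called above threshold, per condition and site: shape (C, 4).
call_rate = results.mean(axis=0)

# Change relative to the no-deletion condition, selected by its index in `conditions`.
baseline = conditions.index((0, 0))
change = call_rate - call_rate[baseline]
for condition, row in zip(conditions, change):
    print(condition, dict(zip(site_names, row.round(2))))
```

Averaging a boolean array over the pair axis gives a rate directly, so conditions can be compared by their index in the conditions list.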
- fac.deletion.run_on_all_adjacent_deletions_for_multiple_series(mods: Dict[str, List[ModelToAnalyze]], outside=False) Dict[str, ndarray]
Like run_on_all_adjacent_deletions, but for multiple model series.
- fac.deletion.plot_adjacent_deletion_results(results: Dict[str, ndarray], h=3, w=3)
Plots the results of the adjacent deletion experiment, with each subplot corresponding to a different model series.
- Parameters:
results – A dictionary of results, output of run_on_all_adjacent_deletions_for_multiple_series.