API: annotation (peak calling)
-
blockify.annotation.annotate(input_file, regions_bed, background_file, measure='enrichment', intermediate=None, alpha=None, correction=None, p_value=None, distance=None, min_size=None, max_size=None, pseudocount=1, tight=False, summit=False)
Core annotation and peak calling method.
- Parameters
input_file (BedTool object) – BedTool object (instantiated from pybedtools) for input data
regions_bed (BedTool object) – BedTool object (instantiated from pybedtools) for regions over which we are annotating/calling peaks
background_file (BedTool object) – BedTool object (instantiated from pybedtools) used to parameterize the background model
measure (str) – Either “enrichment” or “depletion” to indicate which direction of effect to test for
intermediate (bool) – Whether or not to return intermediate calculations during peak calling
alpha (float or None) – Multiple-hypothesis adjusted threshold for calling significance
correction (str or None) – Multiple hypothesis correction to perform (see statsmodels.stats.multitest for valid values)
p_value (float or None) – Straight p-value cutoff (unadjusted) for calling significance
distance (int or None) – Merge significant features within specified distance cutoff
min_size (int or None) – Minimum size cutoff for peaks
max_size (int or None) – Maximum size cutoff for peaks
pseudocount (float) – Pseudocount added to adjust background model
tight (bool) – Whether to tighten the regions in regions_bed
summit (bool) – Whether to return peak summits instead of full peaks
- Returns
out_bed (BedTool object) – Set of peaks in BED6 format
df (pandas DataFrame or None) – If intermediate is specified, DataFrame containing intermediate calculations during peak calling
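To make the difference between alpha (a threshold on adjusted p-values) and p_value (a threshold on raw p-values) concrete, here is a minimal pure-Python sketch of one common correction, Benjamini-Hochberg (the method statsmodels names "fdr_bh"). It is an illustration only, not blockify's implementation; the helper name bh_adjust and the example p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p-values.

    Illustrative helper only; not part of blockify.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-region p-values
p_values = [0.001, 0.02, 0.04, 0.8]
adjusted = bh_adjust(p_values)

# Raw cutoff, analogous to passing p_value=0.05
raw_hits = [p <= 0.05 for p in p_values]
# Adjusted cutoff, analogous to passing alpha=0.05 with a BH correction
adj_hits = [q <= 0.05 for q in adjusted]
```

In this toy input the third region passes the raw cutoff but not the adjusted one, which is the practical reason to choose between the two thresholds deliberately.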
-
blockify.annotation.annotate_from_command_line(args)
Wrapper function for the command line function blockify call
- Parameters
args (argparse.Namespace object) – Input from command line
- Returns
out_bed (BedTool object) – Set of peaks in BED6 format
df (pandas DataFrame or None) – If intermediate is specified, DataFrame containing intermediate calculations during peak calling
-
blockify.annotation.getPeakSummits(df, metric='pValue')
From a list of peaks, get a set of peak summits
- Parameters
df (pandas DataFrame) – Set of peaks from annotate as a DataFrame
metric (str) – Metric to use when filtering for summits. One of “pValue” or “density”
- Returns
summits – Set of peak summits as a DataFrame
- Return type
pandas DataFrame
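One plausible reading of summit selection is sketched below in plain Python. It assumes, without confirmation from the source, that a summit is the single best block within a group of adjacent peaks: the smallest value for "pValue" or the largest for "density". The helper pick_summit and the dictionary keys chrom, start, and end are hypothetical.

```python
def pick_summit(group, metric="pValue"):
    """Pick one summit from a group of adjacent peaks.

    Hypothetical helper; not part of blockify.
    """
    if metric == "pValue":
        # Assumption: the most significant block has the smallest p-value
        return min(group, key=lambda peak: peak["pValue"])
    if metric == "density":
        # Assumption: the summit by density is the block with the largest value
        return max(group, key=lambda peak: peak["density"])
    raise ValueError("metric must be 'pValue' or 'density'")


# Hypothetical group of three adjacent peaks on chr1
group = [
    {"chrom": "chr1", "start": 100, "end": 200, "pValue": 1e-3, "density": 0.05},
    {"chrom": "chr1", "start": 200, "end": 260, "pValue": 1e-6, "density": 0.20},
    {"chrom": "chr1", "start": 260, "end": 400, "pValue": 1e-4, "density": 0.30},
]

summit_by_p = pick_summit(group, metric="pValue")
summit_by_density = pick_summit(group, metric="density")
```

The two metrics can disagree, as they do here, so the choice of metric changes which block is reported.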
-
blockify.annotation.parcelConsecutiveBlocks(df)
Concatenates consecutive blocks into a DataFrame. If there are multiple non-contiguous sets of consecutive blocks, creates one DataFrame per set.
- Parameters
df (pandas DataFrame) – Input set of blocks as a DataFrame
- Returns
outlist – List of DataFrames, each of which is a set of consecutive blocks
- Return type
list of pandas DataFrames
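The grouping described above can be sketched with plain tuples instead of DataFrames. This is a minimal illustration, assuming that "consecutive" means two blocks on the same chromosome where one ends exactly where the next begins; the helper parcel_consecutive and the example coordinates are hypothetical.

```python
def parcel_consecutive(blocks):
    """Group sorted (chrom, start, end) blocks into runs of consecutive blocks.

    Hypothetical helper; not part of blockify.
    """
    runs = []
    for block in blocks:
        chrom, start, _ = block
        if runs:
            prev_chrom, _, prev_end = runs[-1][-1]
            # Assumption: consecutive = same chromosome and touching boundaries
            if chrom == prev_chrom and start == prev_end:
                runs[-1].append(block)
                continue
        # Otherwise this block opens a new run
        runs.append([block])
    return runs


blocks = [
    ("chr1", 0, 100),
    ("chr1", 100, 250),
    ("chr1", 400, 500),  # gap after 250: starts a new run
    ("chr2", 500, 600),  # different chromosome: starts a new run
]
runs = parcel_consecutive(blocks)
```

Each inner list plays the role of one output DataFrame in the function documented above.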
-
blockify.annotation.sizeFilter(bed, min_size, max_size)
Filter peaks by size.
- Parameters
bed (BedTool object) – Input data file
min_size (int) – Lower bound for peak size
max_size (int) – Upper bound for peak size
- Returns
filtered_peaks – Peaks after size selection
- Return type
BedTool object
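As an illustration of size selection, the sketch below filters plain (chrom, start, end) tuples rather than a BedTool. It assumes both bounds are inclusive and that None disables a bound; neither detail is confirmed by the source, and the helper name size_filter is hypothetical.

```python
def size_filter(intervals, min_size=None, max_size=None):
    """Keep intervals whose length lies within [min_size, max_size].

    Hypothetical helper; not part of blockify.
    """
    kept = []
    for chrom, start, end in intervals:
        # BED coordinates are half-open, so the length is end - start
        size = end - start
        if min_size is not None and size < min_size:
            continue
        if max_size is not None and size > max_size:
            continue
        kept.append((chrom, start, end))
    return kept


peaks = [
    ("chr1", 100, 150),    # length 50
    ("chr1", 500, 1500),   # length 1000
    ("chr2", 0, 10000),    # length 10000
]
filtered = size_filter(peaks, min_size=100, max_size=5000)
only_min = size_filter(peaks, min_size=100)
```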
-
blockify.annotation.tighten(data)
Tightens block boundaries in a BedTool file. This function modifies block boundaries so that they coincide with data points.
- Parameters
data (BedTool object) – Input file of block boundaries
- Returns
refined – BedTool of tightened blocks
- Return type
BedTool object
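The idea of tightening can be sketched as shrinking a block to the span of the data points it contains. The code below is an assumption-laden illustration, not blockify's implementation: it represents data points as (chrom, start, end) tuples, considers only points that fall fully inside the block, and leaves a block without points unchanged. The helper name tighten_block is hypothetical.

```python
def tighten_block(block, points):
    """Shrink a block so its boundaries coincide with the data points inside it.

    Hypothetical helper; not part of blockify.
    """
    chrom, start, end = block
    # Collect the points that fall entirely within the block
    inside = [
        (p_start, p_end)
        for p_chrom, p_start, p_end in points
        if p_chrom == chrom and p_start >= start and p_end <= end
    ]
    if not inside:
        # Assumption: a block with no data points is returned unchanged
        return block
    new_start = min(p_start for p_start, _ in inside)
    new_end = max(p_end for _, p_end in inside)
    return (chrom, new_start, new_end)


block = ("chr1", 1000, 2000)
points = [
    ("chr1", 1100, 1101),
    ("chr1", 1500, 1501),
    ("chr1", 1900, 1901),
    ("chr1", 2500, 2501),  # outside the block, ignored
]
tightened = tighten_block(block, points)
```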
-
blockify.annotation.validateAnnotationArguments(input_file, regions_bed, background_file, measure, alpha, correction, p_value, distance, min_size, max_size, pseudocount)
Validates parameters passed via the command line.
- Parameters
input_file (BedTool object) – BedTool object (instantiated from pybedtools) for input data
regions_bed (BedTool object) – BedTool object (instantiated from pybedtools) for regions over which we are annotating/calling peaks
background_file (BedTool object) – BedTool object (instantiated from pybedtools) used to parameterize the background model
measure (str) – Either “enrichment” or “depletion” to indicate which direction of effect to test for
alpha (float or None) – Multiple-hypothesis adjusted threshold for calling significance
correction (str or None) – Multiple hypothesis correction to perform (see statsmodels.stats.multitest for valid values)
p_value (float or None) – Straight p-value cutoff (unadjusted) for calling significance
distance (int or None) – Merge significant features within specified distance cutoff
min_size (int or None) – Minimum size cutoff for peaks
max_size (int or None) – Maximum size cutoff for peaks
pseudocount (float) – Pseudocount added to adjust background model
- Returns
None
- Return type
None