API: annotation (peak calling)

blockify.annotation.annotate(input_file, regions_bed, background_file, measure='enrichment', intermediate=None, alpha=None, correction=None, p_value=None, distance=None, min_size=None, max_size=None, pseudocount=1, tight=False, summit=False)

Core annotation and peak calling method.

Parameters

input_file (BedTool object) – BedTool object (instantiated from pybedtools) for input data
regions_bed (BedTool object) – BedTool object (instantiated from pybedtools) for regions over which we are annotating or calling peaks
background_file (BedTool object) – BedTool object (instantiated from pybedtools) used to parameterize the background model
measure (str) – Either “enrichment” or “depletion” to indicate which direction of effect to test for
intermediate (bool or None) – Whether or not to return intermediate calculations during peak calling
alpha (float or None) – Multiple-hypothesis adjusted threshold for calling significance
correction (str or None) – Multiple hypothesis correction to perform (see statsmodels.stats.multitest for valid values)
p_value (float or None) – Straight p-value cutoff (unadjusted) for calling significance
distance (int or None) – Merge significant features within specified distance cutoff
min_size (int or None) – Minimum size cutoff for peaks
max_size (int or None) – Maximum size cutoff for peaks
pseudocount (float) – Pseudocount added to adjust background model
tight (bool) – Whether to tighten the regions in regions_bed
summit (bool) – Whether to return peak summits instead of full peaks

Returns

out_bed (BedTool object) – Set of peaks in BED6 format
df (pandas DataFrame or None) – If intermediate specified, DataFrame containing intermediate calculations during peak calling
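A minimal usage sketch, assuming blockify and pybedtools are installed and that the three paths point to existing BED-like files. The helper name call_peaks is hypothetical, and "fdr_bh" is one of the method names accepted by statsmodels.stats.multitest.multipletests; whether annotate accepts every such name is an assumption. The result is returned as-is rather than unpacked, since the exact return shape depends on the installed version.

```python
def call_peaks(input_path, regions_path, background_path):
    """Hypothetical helper: call enriched peaks and return annotate's result."""
    # Imported lazily so that defining this helper does not require
    # blockify or pybedtools to be importable.
    import pybedtools
    from blockify.annotation import annotate

    return annotate(
        pybedtools.BedTool(input_path),       # input data
        pybedtools.BedTool(regions_path),     # regions to test
        pybedtools.BedTool(background_path),  # background model data
        measure="enrichment",                 # test for enrichment
        alpha=0.05,                           # adjusted significance threshold
        correction="fdr_bh",                  # assumed valid correction name
        intermediate=True,                    # also request intermediate values
    )
```

Passing p_value instead of alpha and correction would apply an unadjusted cutoff, per the parameter descriptions above.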

blockify.annotation.annotate_from_command_line(args)

Wrapper function for the command-line command blockify call.

Parameters

args (argparse.Namespace object) – Input from command line

Returns

out_bed (BedTool object) – Set of peaks in BED6 format
df (pandas DataFrame or None) – If intermediate specified, DataFrame containing intermediate calculations during peak calling

blockify.annotation.getPeakSummits(df, metric='pValue')

From a list of peaks, get a set of peak summits

Parameters

df (pandas DataFrame) – Set of peaks from annotate as a DataFrame
metric (str) – Metric to use when filtering for summits. One of “pValue” or “density”

Returns

summits – Set of peak summits as a DataFrame

Return type

pandas DataFrame
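To illustrate what the metric choice could mean, here is a plain-Python sketch that picks one summit block from the blocks making up a single peak. It is not blockify's implementation: the function pick_summit, the dictionary keys, and the rule (smallest p-value, or highest density) are all assumptions made for illustration.

```python
def pick_summit(blocks, metric="pValue"):
    """Return the block that best represents the peak under the given metric."""
    if metric == "pValue":
        # Assumed rule: the most significant block is the summit.
        return min(blocks, key=lambda b: b["pValue"])
    if metric == "density":
        # Assumed rule: the densest block is the summit.
        return max(blocks, key=lambda b: b["density"])
    raise ValueError("metric must be 'pValue' or 'density'")


# Hypothetical blocks belonging to one peak.
peak_blocks = [
    {"start": 0, "end": 100, "pValue": 1e-3, "density": 0.5},
    {"start": 100, "end": 180, "pValue": 1e-8, "density": 0.4},
    {"start": 180, "end": 300, "pValue": 1e-5, "density": 0.9},
]

summit_by_p = pick_summit(peak_blocks, metric="pValue")
summit_by_density = pick_summit(peak_blocks, metric="density")
```

In this made-up data the two metrics select different blocks, which is why the choice of metric can matter.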

blockify.annotation.parcelConsecutiveBlocks(df)

Concatenates consecutive blocks into a DataFrame. If there are multiple non-contiguous sets of consecutive blocks, creates one DataFrame per set.

Parameters

df (pandas DataFrame) – Input set of blocks as a DataFrame

Returns

outlist – List of DataFrames, each of which is a set of consecutive blocks

Return type

list of pandas DataFrames
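The grouping idea can be sketched with plain tuples instead of DataFrames. This is an illustration, not the library's code: the function parcel_consecutive is hypothetical, and it assumes blocks are sorted and that "consecutive" means the next block starts on the same chromosome exactly where the previous one ends.

```python
def parcel_consecutive(blocks):
    """Group sorted (chrom, start, end) blocks into runs of abutting blocks."""
    groups = []
    for block in blocks:
        chrom, start, _ = block
        if groups:
            prev_chrom, _, prev_end = groups[-1][-1]
            # Assumed definition: same chromosome and no gap between blocks.
            if prev_chrom == chrom and prev_end == start:
                groups[-1].append(block)
                continue
        # Otherwise start a new group.
        groups.append([block])
    return groups


blocks = [
    ("chr1", 0, 100),
    ("chr1", 100, 250),
    ("chr1", 400, 500),
    ("chr2", 500, 600),
]
groups = parcel_consecutive(blocks)
```

Each inner list plays the role of one DataFrame in the documented return value.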

blockify.annotation.sizeFilter(bed, min_size, max_size)

Filter peaks by size.

Parameters

bed (BedTool object) – Input data file
min_size (int) – Lower bound for peak size
max_size (int) – Upper bound for peak size

Returns

filtered_peaks – Peaks after size selection

Return type

BedTool object
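A minimal sketch of size selection on plain (chrom, start, end) tuples. The function size_filter is hypothetical; treating the bounds as inclusive and treating None as "no bound" are assumptions, not documented behavior of sizeFilter.

```python
def size_filter(intervals, min_size=None, max_size=None):
    """Keep intervals whose length lies within [min_size, max_size]."""
    kept = []
    for chrom, start, end in intervals:
        length = end - start  # BED intervals are half-open
        if min_size is not None and length < min_size:
            continue
        if max_size is not None and length > max_size:
            continue
        kept.append((chrom, start, end))
    return kept


peaks = [("chr1", 100, 150), ("chr1", 400, 1400), ("chr2", 0, 5000)]
filtered = size_filter(peaks, min_size=100, max_size=2000)
```

Here the 50 bp and 5000 bp intervals fall outside the bounds, so only the 1000 bp interval remains.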

blockify.annotation.tighten(data)

Tightens block boundaries in a BedTool file. This function modifies block boundaries so that they coincide with data points.

Parameters

data (BedTool object) – Input file of block boundaries

Returns

refined – BedTool of tightened blocks

Return type

BedTool object
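One way to picture tightening is to shrink a block to the span of the data points it contains. The sketch below is an assumption-laden illustration: tighten_block is a hypothetical helper that takes the data points as a separate argument, unlike the documented tighten(data) signature, and it leaves empty blocks unchanged.

```python
def tighten_block(block, events):
    """Shrink a (chrom, start, end) block to the span of events inside it."""
    chrom, start, end = block
    inside = [
        (s, e)
        for c, s, e in events
        if c == chrom and s >= start and e <= end
    ]
    if not inside:
        # Assumed behavior: a block with no data points is left as-is.
        return block
    new_start = min(s for s, _ in inside)
    new_end = max(e for _, e in inside)
    return (chrom, new_start, new_end)


block = ("chr1", 1000, 2000)
events = [
    ("chr1", 1100, 1101),
    ("chr1", 1500, 1501),
    ("chr1", 1850, 1851),
    ("chr1", 2500, 2501),  # outside the block, ignored
]
tightened = tighten_block(block, events)
```

After tightening, the block runs from the first contained event's start to the last contained event's end.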

blockify.annotation.validateAnnotationArguments(input_file, regions_bed, background_file, measure, alpha, correction, p_value, distance, min_size, max_size, pseudocount)

Validates parameters passed via the command line.

Parameters

input_file (BedTool object) – BedTool object (instantiated from pybedtools) for input data
regions_bed (BedTool object) – BedTool object (instantiated from pybedtools) for regions over which we are annotating or calling peaks
background_file (BedTool object) – BedTool object (instantiated from pybedtools) used to parameterize the background model
measure (str) – Either “enrichment” or “depletion” to indicate which direction of effect to test for
alpha (float or None) – Multiple-hypothesis adjusted threshold for calling significance
correction (str or None) – Multiple hypothesis correction to perform (see statsmodels.stats.multitest for valid values)
p_value (float or None) – Straight p-value cutoff (unadjusted) for calling significance
distance (int or None) – Merge significant features within specified distance cutoff
min_size (int or None) – Minimum size cutoff for peaks
max_size (int or None) – Maximum size cutoff for peaks
pseudocount (float) – Pseudocount added to adjust background model

Returns

None

Return type

None
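The kind of checks such a validator might perform can be sketched as follows. The specific rules, the ValueError exceptions, and the function name check_arguments are assumptions for illustration; the checks blockify actually performs may differ.

```python
def check_arguments(measure="enrichment", alpha=None, p_value=None,
                    distance=None, min_size=None, max_size=None,
                    pseudocount=1):
    """Raise ValueError on implausible arguments; return None otherwise."""
    if measure not in ("enrichment", "depletion"):
        raise ValueError("measure must be 'enrichment' or 'depletion'")
    # Thresholds, when given, are assumed to be probabilities in (0, 1].
    for name, value in (("alpha", alpha), ("p_value", p_value)):
        if value is not None and not 0 < value <= 1:
            raise ValueError(name + " must be in (0, 1]")
    # Distances and sizes, when given, are assumed to be non-negative.
    for name, value in (("distance", distance), ("min_size", min_size),
                        ("max_size", max_size)):
        if value is not None and value < 0:
            raise ValueError(name + " must be non-negative")
    if min_size is not None and max_size is not None and min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    if pseudocount < 0:
        raise ValueError("pseudocount must be non-negative")
    return None


result = check_arguments(measure="depletion", alpha=0.05,
                         min_size=100, max_size=2000)
```

Like the documented function, the sketch returns None on success and signals problems by raising.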