API: annotation (peak calling)
-
blockify.annotation.annotate(input_file, regions_bed, background_file, measure='enrichment', intermediate=None, alpha=None, correction=None, p_value=None, distance=None, min_size=None, max_size=None, pseudocount=1, tight=False, summit=False)
Core annotation and peak calling method.
- Parameters
input_file (BedTool object) – BedTool object (instantiated from pybedtools) for input data
regions_bed (BedTool object) – BedTool object (instantiated from pybedtools) for regions over which we are annotating/calling peaks
background_file (BedTool object) – BedTool object (instantiated from pybedtools) used to parameterize the background model
measure (str) – Either “enrichment” or “depletion” to indicate which direction of effect to test for
intermediate (bool) – Whether or not to return intermediate calculations during peak calling
alpha (float or None) – Multiple-hypothesis adjusted threshold for calling significance
correction (str or None) – Multiple hypothesis correction to perform (see statsmodels.stats.multitest for valid values)
p_value (float or None) – Straight p-value cutoff (unadjusted) for calling significance
distance (int or None) – Merge significant features within specified distance cutoff
min_size (int or None) – Minimum size cutoff for peaks
max_size (int or None) – Maximum size cutoff for peaks
pseudocount (float) – Pseudocount added to adjust background model
tight (bool) – Whether to tighten the regions in regions_bed
summit (bool) – Whether to return peak summits instead of full peaks
- Returns
out_bed (BedTool object) – Set of peaks in BED6 format
df (pandas DataFrame or None) – If intermediate is specified, a DataFrame containing intermediate calculations during peak calling
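To make the distance parameter concrete, the sketch below merges significant features that lie within a given gap of each other. It is a plain-Python illustration of the idea, not blockify's implementation: the helper name merge_within_distance, the tuple layout (chrom, start, end), and the rule that a gap equal to distance still merges are all assumptions.

```python
from typing import List, Tuple

# Hypothetical interval layout: (chrom, start, end), BED-style half-open.
Interval = Tuple[str, int, int]


def merge_within_distance(intervals: List[Interval], distance: int) -> List[Interval]:
    """Merge same-chromosome intervals whose gap is at most `distance` bases."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        # Compare against the last merged interval on the same chromosome.
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= distance:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


significant = [
    ("chr1", 100, 200),
    ("chr1", 250, 300),    # 50 bp after the previous feature: merged
    ("chr1", 1000, 1100),  # 700 bp after the previous feature: kept separate
    ("chr2", 50, 80),
]
merged_peaks = merge_within_distance(significant, distance=100)
```

With distance=100, the first two chr1 features collapse into one interval while the distant chr1 feature and the chr2 feature stay separate.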
-
blockify.annotation.annotate_from_command_line(args)
Wrapper function for the command line function blockify call
- Parameters
args (argparse.Namespace object) – Input from command line
- Returns
out_bed (BedTool object) – Set of peaks in BED6 format
df (pandas DataFrame or None) – If intermediate is specified, a DataFrame containing intermediate calculations during peak calling
-
blockify.annotation.getPeakSummits(df, metric='pValue')
From a list of peaks, get a set of peak summits
- Parameters
df (pandas DataFrame) – Set of peaks from annotate as a DataFrame
metric (str) – Metric to use when filtering for summits. One of “pValue” or “density”
- Returns
summits – Set of peak summits as a DataFrame
- Return type
pandas DataFrame
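One plausible reading of summit selection is sketched below in plain Python: within each peak, keep the block with the smallest p-value or the largest density. This is an illustration only. The helper pick_summits, the peak_id grouping key, and the start/end field names are hypothetical, and whether the library selects summits exactly this way is an assumption.

```python
from typing import Dict, List


def pick_summits(blocks: List[Dict], metric: str = "pValue") -> List[Dict]:
    """Keep one block per peak: lowest pValue or highest density (assumed rule)."""
    if metric not in ("pValue", "density"):
        raise ValueError("metric must be 'pValue' or 'density'")
    best: Dict[int, Dict] = {}
    for block in blocks:
        key = block["peak_id"]  # hypothetical grouping column
        current = best.get(key)
        if current is None:
            best[key] = block
        elif metric == "pValue" and block["pValue"] < current["pValue"]:
            best[key] = block
        elif metric == "density" and block["density"] > current["density"]:
            best[key] = block
    return list(best.values())


blocks = [
    {"peak_id": 1, "start": 0, "end": 10, "pValue": 0.01, "density": 2.0},
    {"peak_id": 1, "start": 10, "end": 20, "pValue": 0.001, "density": 1.5},
    {"peak_id": 2, "start": 50, "end": 60, "pValue": 0.02, "density": 3.0},
]
summits_by_p = pick_summits(blocks, metric="pValue")
summits_by_density = pick_summits(blocks, metric="density")
```

The two metrics can disagree: in this toy input, peak 1 gets a different summit depending on whether significance or density is used.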
-
blockify.annotation.parcelConsecutiveBlocks(df)
Concatenates consecutive blocks into a DataFrame. If there are multiple non-contiguous sets of consecutive blocks, creates one DataFrame per set.
- Parameters
df (pandas DataFrame) – Input set of blocks as a DataFrame
- Returns
outlist – List of DataFrames, each of which is a set of consecutive blocks
- Return type
list of pandas DataFrames
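The grouping described above can be sketched without pandas. The example assumes that "consecutive" means two blocks on the same chromosome where one block's end equals the next block's start, and that the input is already sorted; both are assumptions, as is the helper name parcel_consecutive. Lists of tuples stand in for the DataFrames.

```python
from typing import List, Tuple

Block = Tuple[str, int, int]  # hypothetical layout: (chrom, start, end)


def parcel_consecutive(blocks: List[Block]) -> List[List[Block]]:
    """Split sorted blocks into runs where each block starts where the last ended."""
    runs: List[List[Block]] = []
    for block in blocks:
        chrom, start, _end = block
        if runs:
            last_chrom, _last_start, last_end = runs[-1][-1]
            if last_chrom == chrom and last_end == start:
                runs[-1].append(block)  # extends the current run
                continue
        runs.append([block])  # a gap or a new chromosome starts a new run
    return runs


blocks = [
    ("chr1", 0, 10),
    ("chr1", 10, 25),
    ("chr1", 40, 50),
    ("chr1", 50, 60),
    ("chr2", 60, 70),
]
runs = parcel_consecutive(blocks)
```

Here the five blocks fall into three runs: two pairs of adjacent chr1 blocks and a single chr2 block.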
-
blockify.annotation.sizeFilter(bed, min_size, max_size)
Filter peaks by size.
- Parameters
bed (BedTool object) – Input data file
min_size (int) – Lower bound for peak size
max_size (int) – Upper bound for peak size
- Returns
filtered_peaks – Peaks after size selection
- Return type
BedTool object
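As a rough illustration of size selection, the sketch below filters plain (chrom, start, end) tuples by length. It does not use BedTool objects, and the helper name size_filter, the inclusive bounds, and the treatment of None as "no bound" are assumptions rather than documented behavior.

```python
from typing import List, Optional, Tuple

Peak = Tuple[str, int, int]  # hypothetical layout: (chrom, start, end)


def size_filter(peaks: List[Peak],
                min_size: Optional[int] = None,
                max_size: Optional[int] = None) -> List[Peak]:
    """Keep peaks whose length lies within [min_size, max_size] (inclusive)."""
    kept: List[Peak] = []
    for chrom, start, end in peaks:
        size = end - start  # BED intervals are half-open, so length is end - start
        if min_size is not None and size < min_size:
            continue
        if max_size is not None and size > max_size:
            continue
        kept.append((chrom, start, end))
    return kept


peaks = [("chr1", 0, 50), ("chr1", 100, 400), ("chr1", 1000, 6000)]
filtered = size_filter(peaks, min_size=100, max_size=1000)
```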
-
blockify.annotation.tighten(data)
Tightens block boundaries in a BedTool file. This function modifies block boundaries so that they coincide with data points.
- Parameters
data (BedTool object) – Input file of block boundaries
- Returns
refined – BedTool of tightened blocks
- Return type
BedTool object
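The idea of tightening can be sketched as follows: shrink a block so that it starts at its first data point and ends just after its last one. This is a simplified illustration, not the library's code. The helper tighten_block, the representation of data points as single base positions on the same chromosome, and the choice to leave empty blocks unchanged are all assumptions.

```python
from typing import List, Tuple

Block = Tuple[str, int, int]  # hypothetical layout: (chrom, start, end)


def tighten_block(block: Block, points: List[int]) -> Block:
    """Shrink a half-open block to span only the data points it contains."""
    chrom, start, end = block
    inside = [p for p in points if start <= p < end]
    if not inside:
        return block  # assumption: blocks without data points are left as-is
    # End is exclusive, so add 1 to include the last data point.
    return (chrom, min(inside), max(inside) + 1)


points = [120, 150, 180, 250]
tightened = tighten_block(("chr1", 100, 200), points)
untouched = tighten_block(("chr1", 300, 400), points)
```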
-
blockify.annotation.validateAnnotationArguments(input_file, regions_bed, background_file, measure, alpha, correction, p_value, distance, min_size, max_size, pseudocount)
Validates parameters passed via the command line.
- Parameters
input_file (BedTool object) – BedTool object (instantiated from pybedtools) for input data
regions_bed (BedTool object) – BedTool object (instantiated from pybedtools) for regions over which we are annotating/calling peaks
background_file (BedTool object) – BedTool object (instantiated from pybedtools) used to parameterize the background model
measure (str) – Either “enrichment” or “depletion” to indicate which direction of effect to test for
alpha (float or None) – Multiple-hypothesis adjusted threshold for calling significance
correction (str or None) – Multiple hypothesis correction to perform (see statsmodels.stats.multitest for valid values)
p_value (float or None) – Straight p-value cutoff (unadjusted) for calling significance
distance (int or None) – Merge significant features within specified distance cutoff
min_size (int or None) – Minimum size cutoff for peaks
max_size (int or None) – Maximum size cutoff for peaks
pseudocount (float) – Pseudocount added to adjust background model
- Returns
None
- Return type
None
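The kind of checks such a validator might perform can be sketched as below. Only the allowed values of measure come from the parameter descriptions above; every other rule (thresholds in (0, 1], min_size not exceeding max_size, non-negative pseudocount), the helper name check_arguments, and the use of ValueError are hypothetical and may differ from what the library enforces.

```python
from typing import Optional


def check_arguments(measure: str,
                    alpha: Optional[float] = None,
                    p_value: Optional[float] = None,
                    min_size: Optional[int] = None,
                    max_size: Optional[int] = None,
                    pseudocount: float = 1) -> None:
    """Raise ValueError on inconsistent arguments; return None otherwise."""
    if measure not in ("enrichment", "depletion"):
        raise ValueError("measure must be 'enrichment' or 'depletion'")
    # Hypothetical rule: significance thresholds must be probabilities.
    for name, value in (("alpha", alpha), ("p_value", p_value)):
        if value is not None and not 0 < value <= 1:
            raise ValueError(f"{name} must be in (0, 1]")
    # Hypothetical rule: size bounds must not cross.
    if min_size is not None and max_size is not None and min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    # Hypothetical rule: pseudocounts cannot be negative.
    if pseudocount < 0:
        raise ValueError("pseudocount must be non-negative")


result = check_arguments("enrichment", alpha=0.05, min_size=100, max_size=1000)
```

Like the documented function, the sketch returns None on success and signals problems only by raising.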