API: annotation (peak calling)

blockify.annotation.annotate(input_file, regions_bed, background_file, measure='enrichment', intermediate=None, alpha=None, correction=None, p_value=None, distance=None, min_size=None, max_size=None, pseudocount=1, tight=False, summit=False)

Core annotation and peak calling method.

Parameters
  • input_file (BedTool object) – BedTool object (instantiated from pybedtools) for input data

  • regions_bed (BedTool object) – BedTool object (instantiated from pybedtools) for regions over which we are annotating/calling peaks

  • background_file (BedTool object) – BedTool object (instantiated from pybedtools) used to parameterize the background model

  • measure (str) – Either “enrichment” or “depletion” to indicate which direction of effect to test for

  • intermediate (bool) – Whether or not to return intermediate calculations during peak calling

  • alpha (float or None) – Multiple-hypothesis adjusted threshold for calling significance

  • correction (str or None) – Multiple hypothesis correction to perform (see statsmodels.stats.multitest for valid values)

  • p_value (float or None) – Straight p-value cutoff (unadjusted) for calling significance

  • distance (int or None) – Merge significant features within specified distance cutoff

  • min_size (int or None) – Minimum size cutoff for peaks

  • max_size (int or None) – Maximum size cutoff for peaks

  • pseudocount (float) – Pseudocount added to adjust background model

  • tight (bool) – Whether to tighten the regions in regions_bed

  • summit (bool) – Whether to return peak summits instead of full peaks
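
The distance parameter merges significant features that lie within a given cutoff of each other. The sketch below illustrates that kind of merge in plain Python. It assumes intervals are (chrom, start, end) tuples and that a gap equal to the cutoff still merges; merge_within_distance is a hypothetical helper and is not part of blockify, which operates on BedTool objects.

```python
def merge_within_distance(intervals, distance):
    """Merge (chrom, start, end) intervals whose gap is <= distance.

    Hypothetical helper for illustration only; not blockify's implementation.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        same_chrom = bool(merged) and merged[-1][0] == chrom
        if same_chrom and start - merged[-1][2] <= distance:
            # Close enough to the previous interval: extend it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


peaks = [
    ("chr1", 100, 200),
    ("chr1", 250, 300),
    ("chr1", 1000, 1100),
    ("chr2", 50, 80),
]
result = merge_within_distance(peaks, distance=100)
```

Here the first two chr1 features are 50 bp apart and collapse into one, while the third is 700 bp away and stays separate.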

Returns

  • out_bed (BedTool object) – Set of peaks in BED6 format

  • df (pandas DataFrame or None) – If intermediate is set, a DataFrame containing the intermediate calculations from peak calling; otherwise None
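
The parameters alpha and correction apply a multiple-hypothesis adjusted threshold, whereas p_value is an unadjusted cutoff. The following plain-Python sketch shows how the two can disagree, using the Benjamini-Hochberg adjustment as one example of a correction. It is an illustration only: benjamini_hochberg is a hypothetical helper, and the method blockify actually applies depends on the value passed as correction (the documentation points to statsmodels.stats.multitest for valid values).

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Hypothetical helper for illustration only; not blockify's implementation.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


p_values = [0.001, 0.02, 0.04, 0.5]
adjusted = benjamini_hochberg(p_values)

# An unadjusted cutoff of 0.05 keeps three regions ...
raw_calls = [p <= 0.05 for p in p_values]
# ... while the adjusted threshold at the same level keeps only two.
adjusted_calls = [q <= 0.05 for q in adjusted]
```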

blockify.annotation.annotate_from_command_line(args)

Wrapper function for the command-line command blockify call.

Parameters

args (argparse.Namespace object) – Input from command line

Returns

  • out_bed (BedTool object) – Set of peaks in BED6 format

  • df (pandas DataFrame or None) – If intermediate is set, a DataFrame containing the intermediate calculations from peak calling; otherwise None

blockify.annotation.getPeakSummits(df, metric='pValue')

From a set of peaks, get the corresponding set of peak summits.

Parameters
  • df (pandas DataFrame) – Set of peaks from annotate as a DataFrame

  • metric (str) – Metric to use when filtering for summits. One of “pValue” or “density”

Returns

summits – Set of peak summits as a DataFrame

Return type

pandas DataFrame
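
One plausible reading of the metric parameter is that a summit is the most extreme block within a peak: the smallest p-value or the highest density. The sketch below illustrates that selection on plain dictionaries. The field names pValue and density are borrowed from the metric values above and are assumed to be column names; pick_summit is a hypothetical helper and may differ from how getPeakSummits works internally.

```python
def pick_summit(blocks, metric="pValue"):
    """Return the most extreme block of one peak under the chosen metric.

    Hypothetical helper for illustration only; not blockify's implementation.
    """
    if metric == "pValue":
        # Most significant block: smallest p-value.
        return min(blocks, key=lambda b: b["pValue"])
    if metric == "density":
        # Densest block: largest density.
        return max(blocks, key=lambda b: b["density"])
    raise ValueError("metric must be 'pValue' or 'density'")


peak = [
    {"start": 100, "end": 150, "pValue": 1e-3, "density": 0.2},
    {"start": 150, "end": 180, "pValue": 1e-8, "density": 0.5},
    {"start": 180, "end": 260, "pValue": 1e-5, "density": 0.9},
]
summit_by_p = pick_summit(peak)
summit_by_density = pick_summit(peak, metric="density")
```

The two metrics need not agree: in this made-up peak the most significant block and the densest block are different.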

blockify.annotation.parcelConsecutiveBlocks(df)

Concatenates consecutive blocks into a DataFrame. If there are multiple non-contiguous sets of consecutive blocks, creates one DataFrame per set.

Parameters

df (pandas DataFrame) – Input set of blocks as a DataFrame

Returns

outlist – List of DataFrames, each of which is a set of consecutive blocks

Return type

list of pandas DataFrames
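
The grouping described above can be sketched in plain Python. This version assumes blocks are sorted (chrom, start, end) tuples and treats two blocks as consecutive when one ends exactly where the next begins on the same chromosome; parcel_consecutive is a hypothetical helper, and the real function works on DataFrames and may define adjacency differently.

```python
def parcel_consecutive(blocks):
    """Split sorted (chrom, start, end) blocks into runs of touching blocks.

    Hypothetical helper for illustration only; not blockify's implementation.
    """
    runs = []
    for block in blocks:
        chrom, start, _ = block
        if runs and runs[-1][-1][0] == chrom and runs[-1][-1][2] == start:
            # Block starts where the previous one ended: same run.
            runs[-1].append(block)
        else:
            runs.append([block])
    return runs


blocks = [
    ("chr1", 0, 100),
    ("chr1", 100, 250),
    ("chr1", 400, 500),
    ("chr1", 500, 520),
]
runs = parcel_consecutive(blocks)
```

The gap between positions 250 and 400 splits the input into two runs, mirroring the one-DataFrame-per-set behavior described for outlist.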

blockify.annotation.sizeFilter(bed, min_size, max_size)

Filter peaks by size.

Parameters
  • bed (BedTool object) – Input data file

  • min_size (int) – Lower bound for peak size

  • max_size (int) – Upper bound for peak size

Returns

filtered_peaks – Peaks after size selection

Return type

BedTool object
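
As a rough illustration of size selection, the sketch below filters (chrom, start, end) tuples by length. It assumes BED-style half-open coordinates (size = end - start), inclusive bounds, and that None disables a bound; size_filter is a hypothetical helper, not the library function, which takes and returns BedTool objects.

```python
def size_filter(intervals, min_size=None, max_size=None):
    """Keep intervals whose length lies within [min_size, max_size].

    Hypothetical helper for illustration only; not blockify's implementation.
    """
    kept = []
    for chrom, start, end in intervals:
        size = end - start  # BED intervals are half-open
        if min_size is not None and size < min_size:
            continue
        if max_size is not None and size > max_size:
            continue
        kept.append((chrom, start, end))
    return kept


peaks = [("chr1", 0, 50), ("chr1", 100, 400), ("chr1", 500, 5500)]
filtered = size_filter(peaks, min_size=100, max_size=1000)
```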

blockify.annotation.tighten(data)

Tightens block boundaries in a BedTool file. This function modifies block boundaries so that they coincide with data points.

Parameters

data (BedTool object) – Input file of block boundaries

Returns

refined – BedTool of tightened blocks

Return type

BedTool object
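
The idea of making block boundaries coincide with data points can be sketched for a single block. This version assumes each data point is a single 0-based position and that blocks are half-open (start, end) pairs; tighten_block is a hypothetical helper, and the library function operates on BedTool objects instead.

```python
def tighten_block(block, points):
    """Shrink a (start, end) block to span only the data points it contains.

    Hypothetical helper for illustration only; not blockify's implementation.
    """
    start, end = block
    inside = [p for p in points if start <= p < end]
    if not inside:
        # No data points fall in the block, so leave it unchanged.
        return block
    # Half-open interval: the end is one past the last data point.
    return (min(inside), max(inside) + 1)


points = [90, 130, 220, 410, 600]
tightened = tighten_block((100, 500), points)
```

The block (100, 500) shrinks to start at the first contained point (130) and end just past the last one (410); points outside the block are ignored.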

blockify.annotation.validateAnnotationArguments(input_file, regions_bed, background_file, measure, alpha, correction, p_value, distance, min_size, max_size, pseudocount)

Validates parameters passed via the command line.

Parameters
  • input_file (BedTool object) – BedTool object (instantiated from pybedtools) for input data

  • regions_bed (BedTool object) – BedTool object (instantiated from pybedtools) for regions over which we are annotating/calling peaks

  • background_file (BedTool object) – BedTool object (instantiated from pybedtools) used to parameterize the background model

  • measure (str) – Either “enrichment” or “depletion” to indicate which direction of effect to test for

  • alpha (float or None) – Multiple-hypothesis adjusted threshold for calling significance

  • correction (str or None) – Multiple hypothesis correction to perform (see statsmodels.stats.multitest for valid values)

  • p_value (float or None) – Straight p-value cutoff (unadjusted) for calling significance

  • distance (int or None) – Merge significant features within specified distance cutoff

  • min_size (int or None) – Minimum size cutoff for peaks

  • max_size (int or None) – Maximum size cutoff for peaks

  • pseudocount (float) – Pseudocount added to adjust background model

Returns

None

Return type

None
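
The documentation does not list the individual checks, so the following is only a sketch of the kind of validation the parameter descriptions suggest. Every rule here is an assumption inferred from those descriptions, and validate_thresholds is a hypothetical helper rather than the library function. Like the documented function, it returns None and signals problems by raising.

```python
def validate_thresholds(measure, alpha=None, p_value=None, distance=None,
                        min_size=None, max_size=None, pseudocount=1):
    """Raise ValueError on implausible arguments; return None otherwise.

    Hypothetical helper for illustration only; the rules are assumptions.
    """
    if measure not in ("enrichment", "depletion"):
        raise ValueError("measure must be 'enrichment' or 'depletion'")
    for name, value in (("alpha", alpha), ("p_value", p_value)):
        # Both thresholds are probabilities when given.
        if value is not None and not 0 < value <= 1:
            raise ValueError(f"{name} must be in (0, 1]")
    if distance is not None and distance < 0:
        raise ValueError("distance must be non-negative")
    if min_size is not None and max_size is not None and min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    if pseudocount < 0:
        raise ValueError("pseudocount must be non-negative")


outcome = validate_thresholds("enrichment", alpha=0.05, min_size=50, max_size=5000)
```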