Introduction

blockify is a fast and optimal genomic peak caller for one-dimensional data (e.g. BED, qBED, CCF).

The package is built around the Bayesian blocks algorithm [SNJC13], which finds the optimal change points in time series data assuming a Poisson counting process. We also implement a dynamic pruning strategy which achieves linear runtime performance [KFE12]. An interactive notebook demonstrating Bayesian blocks can be found here.
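
To make the idea concrete, here is a minimal sketch of the dynamic program behind Bayesian blocks for event data, using the block fitness N(log N - log T) from [SNJC13]. It is not blockify's implementation: the function name `bayesian_blocks` and the default `ncp_prior=4.0` (the penalty paid per block) are assumptions chosen for illustration, the input is assumed to be at least two distinct positions, and the pruning step from [KFE12] is left out, so this version runs in quadratic rather than linear time.

```python
import math


def bayesian_blocks(positions, ncp_prior=4.0):
    """Return optimal block edges for a list of distinct event positions.

    Illustrative sketch only: expects at least two distinct positions and
    uses the unpruned O(n^2) dynamic program.
    """
    t = sorted(positions)
    n = len(t)
    # Candidate edges: the first point, midpoints between neighbours, the last point.
    edges = [t[0]] + [(a + b) / 2 for a, b in zip(t, t[1:])] + [t[-1]]

    best = [0.0] * n  # best[r]: optimal total fitness for cells 0..r
    last = [0] * n    # last[r]: first cell of the final block in that optimum

    for r in range(n):
        best_val, best_start = -math.inf, 0
        for s in range(r + 1):
            count = r - s + 1                  # events in the block s..r
            width = edges[r + 1] - edges[s]    # length of the block s..r
            fit = count * (math.log(count) - math.log(width)) - ncp_prior
            total = fit + (best[s - 1] if s > 0 else 0.0)
            if total > best_val:
                best_val, best_start = total, s
        best[r], last[r] = best_val, best_start

    # Walk backwards through `last` to recover where each block starts.
    starts = []
    r = n
    while r > 0:
        r = last[r - 1]
        starts.append(r)
    starts.reverse()
    return [edges[i] for i in starts] + [edges[-1]]


# Sparse background with a dense cluster of events around 1000-1020.
positions = list(range(0, 1000, 100)) + list(range(1000, 1021)) + list(range(1100, 2001, 100))
edges = bayesian_blocks(positions)
print(edges)
```

Raising `ncp_prior` makes the segmentation more conservative (fewer blocks); lowering it allows more change points.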

While Bayesian blocks was originally developed in the astrophysics community for photon-counting experiments, we find that it also has applications in genomics. In particular, we use it to analyze transposon calling cards experiments. Calling cards uses a transposase fused to a transcription factor (TF) to deposit transposons near TF binding sites. Bayesian blocks partitions the genome based on the local density of insertions, and the resulting blocks are then used to identify peaks and candidate TF binding sites. We have also had success using this algorithm to perform general-purpose genome segmentation.
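
As a rough illustration of how a segmentation can be turned into candidate peaks, the sketch below counts insertions in each block and keeps blocks whose density is well above the region-wide average. This is not blockify's peak-calling procedure: the function name `dense_blocks`, the `fold` threshold, and the example edges are all hypothetical, chosen only to show the idea of ranking blocks by local insertion density.

```python
from bisect import bisect_left, bisect_right


def dense_blocks(positions, edges, fold=5.0):
    """Return (start, end, count, density) for blocks denser than fold x the overall rate.

    Hypothetical helper for illustration; `fold` is an arbitrary threshold.
    """
    pts = sorted(positions)
    background = len(pts) / (edges[-1] - edges[0])  # insertions per base overall
    hits = []
    for i, (start, end) in enumerate(zip(edges, edges[1:])):
        lo = bisect_left(pts, start)
        # Blocks are half-open [start, end), except the final block, which is
        # closed on the right so the last insertion is counted.
        is_final = i == len(edges) - 2
        hi = bisect_right(pts, end) if is_final else bisect_left(pts, end)
        count = hi - lo
        density = count / (end - start)
        if density > fold * background:
            hits.append((start, end, count, density))
    return hits


# Insertion positions: sparse background plus a dense cluster near 1000-1020.
positions = list(range(0, 1000, 100)) + list(range(1000, 1021)) + list(range(1100, 2001, 100))
# Example block edges, assumed to come from a prior segmentation step.
edges = [0, 1000.5, 1019.5, 2000]

print(dense_blocks(positions, edges))
# [(1000.5, 1019.5, 19, 1.0)]
```

Only the middle block, with one insertion per base, clears the threshold; the flanking blocks sit near the background rate and are discarded.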

Note

Recent papers using calling cards include [SLC+19] and [LSM20]. For examples of Bayesian blocks in practice, see [CMC+20] and [MWC+20].

blockify is designed primarily for qBED files [MLH+20], although it also works with BED files.

To get started, please see our Installation guide and Tutorial.

References

CMC+20

Alexander J. Cammack, Arnav Moudgil, Jiayang Chen, Michael J. Vasek, Mark Shabsovich, Katherine McCullough, Allen Yen, Tomas Lagunas, Susan E. Maloney, June He, Xuhua Chen, Misha Hooda, Michael N. Wilkinson, Timothy M. Miller, Robi D. Mitra, and Joseph D. Dougherty. A viral toolkit for recording transcription factor–DNA interactions in live mouse tissues. Proceedings of the National Academy of Sciences, pages 201918241, April 2020. URL: http://www.pnas.org/lookup/doi/10.1073/pnas.1918241117, doi:10.1073/pnas.1918241117.

KFE12

R Killick, P Fearnhead, and I A Eckley. Optimal Detection of Changepoints With a Linear Computational Cost. Journal of the American Statistical Association, 107(500):1590–1598, October 2012. URL: https://www.tandfonline.com/doi/full/10.1080/01621459.2012.737745, doi:10.1080/01621459.2012.737745.

LSM20

Jiayue Liu, Christian A Shively, and Robi D Mitra. Quantitative analysis of transcription factor binding and expression using calling cards reporter arrays. Nucleic Acids Research, 48(9):e50–e50, May 2020. URL: https://academic.oup.com/nar/article/48/9/e50/5781211, doi:10.1093/nar/gkaa141.

MLH+20

Arnav Moudgil, Daofeng Li, Silas Hsu, Deepak Purushotham, Ting Wang, and Robi David Mitra. The qBED track: a novel genome browser visualization for point processes. preprint, bioRxiv, April 2020. URL: http://biorxiv.org/lookup/doi/10.1101/2020.04.27.060061, doi:10.1101/2020.04.27.060061.

MWC+20

Arnav Moudgil, Michael N. Wilkinson, Xuhua Chen, June He, Alexander J. Cammack, Michael J. Vasek, Tomás Lagunas, Zongtai Qi, Matthew A. Lalli, Chuner Guo, Samantha A. Morris, Joseph D. Dougherty, and Robi D. Mitra. Self-Reporting Transposons Enable Simultaneous Readout of Gene Expression and Transcription Factor Binding in Single Cells. Cell, pages S009286742030814X, July 2020. shortDOI:d4wx. URL: https://linkinghub.elsevier.com/retrieve/pii/S009286742030814X, doi:10.1016/j.cell.2020.06.037.

SNJC13

Jeffrey D Scargle, Jay P Norris, Brad Jackson, and James Chiang. Studies in Astronomical Time Series Analysis. VI. Bayesian Block Representations. The Astrophysical Journal, 764(2):167, February 2013. URL: http://stacks.iop.org/0004-637X/764/i=2/a=167?key=crossref.0539dc6f37f29e250567031865ebbe9a, doi:10.1088/0004-637X/764/2/167.

SLC+19

Christian A. Shively, Jiayue Liu, Xuhua Chen, Kaiser Loell, and Robi D. Mitra. Homotypic cooperativity and collective binding are determinants of bHLH specificity and function. Proceedings of the National Academy of Sciences, 116(32):16143–16152, August 2019. URL: http://www.pnas.org/lookup/doi/10.1073/pnas.1818015116, doi:10.1073/pnas.1818015116.