TUTORIAL - Track 1

High Throughput Sequencing: The Microscope in the Big Data Era

David Tse and Sreeram Kannan

Sunday June 29, 2014
09:00 - 12:00
Room: 316A


Extraordinary advances in sequencing technology in the past decade have revolutionized biology and medicine. Capitalizing on these advances, many high-throughput sequencing based assays (experimental methods) have recently been designed to make various biological measurements of interest, including genetic variations, 3-D structures, transcription, translation, protein binding, etc.

In each of these assays, the biological measurement of interest is reduced to a DNA sequence measurement through biochemistry. The DNA measurement is then carried out using high-throughput shotgun sequencers that read short substrings (called reads) from the corresponding DNA. Inference algorithms are then used to extract the biological measurement from the read data, whose size can range from tens of gigabytes to over a terabyte.
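As a rough illustration of the read model described above, the following Python sketch draws short substrings from a toy DNA string. It is only a minimal sketch under simplifying assumptions: reads start at uniformly random positions, all have the same length, and contain no sequencing errors. The names sample_reads and coverage_depth, the toy sequence, and the parameter values are hypothetical and are not taken from the tutorial.

```python
import random


def sample_reads(genome, read_length, num_reads, seed=0):
    """Draw reads as substrings of `genome` at uniformly random start positions.

    Assumption: error-free, fixed-length reads; real sequencers are noisier.
    """
    if read_length > len(genome):
        raise ValueError("read_length must not exceed the genome length")
    rng = random.Random(seed)
    last_start = len(genome) - read_length
    starts = [rng.randint(0, last_start) for _ in range(num_reads)]
    return [genome[s:s + read_length] for s in starts]


def coverage_depth(genome_length, read_length, num_reads):
    """Average number of reads covering each position of the genome."""
    return num_reads * read_length / genome_length


# Hypothetical toy data, chosen only for illustration.
toy_genome = "ACGTTGCAAGCTTAGGCTAACGTTAGC"
reads = sample_reads(toy_genome, read_length=5, num_reads=10)
depth = coverage_depth(len(toy_genome), read_length=5, num_reads=10)
```

The inference problems listed next all start from a collection like reads: the sequencer reports the substrings but not where in the DNA they came from.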

The goal of the tutorial is to introduce some basic inference problems that arise in this context and to highlight the role of information theoretic thinking in solving them. In particular, we will look at three problems:

1) DNA assembly: the problem of assembling the underlying genome from reads obtained from the DNA.

2) variant calling: the problem of inferring from the reads the variations of the underlying genome from a reference genome.

3) RNA assembly: the problem of assembling the RNA transcripts using reads obtained from the RNA. This includes the reconstruction of tens of thousands of RNA transcripts as well as quantifying their abundances.

Traditionally, these inference problems have been approached primarily as software engineering projects, in which time and memory requirements are the primary concerns and the algorithms themselves are designed from heuristic considerations with no optimality guarantee. In this tutorial, we discuss how information theoretic principles can lead to a more systematic approach to designing better algorithms. We stress the importance of using real genomics data to drive the theory and algorithm development, as well as the challenges of converting the theory and algorithms into robust, scalable software tools that can be used by the biological community.



Sreeram Kannan is currently a postdoctoral researcher at the University of California, Berkeley. He received his Ph.D. in Electrical Engineering and M.S. in Mathematics from the University of Illinois at Urbana-Champaign. He is a co-recipient of the Van Valkenburg research award from UIUC, the Qualcomm Roberto Padovani Scholarship for outstanding interns, the Qualcomm Cognitive Radio Contest first prize, the S.V.C. Aiya medal from the Indian Institute of Science, and the Intel India Student Research Contest first prize. His research interests include applications of information theory and approximation algorithms to wireless networks and computational biology.

David Tse received the B.A.Sc. degree in systems design engineering from the University of Waterloo in 1989, and the M.S. and Ph.D. degrees in electrical engineering from the Massachusetts Institute of Technology in 1991 and 1994, respectively. From 1994 to 1995, he was a postdoctoral member of technical staff at AT&T Bell Laboratories. From 1995 to 2013, he was on the faculty of the University of California at Berkeley. He is currently a professor at Stanford University. He received a 1967 NSERC graduate fellowship from the government of Canada in 1989, an NSF CAREER award in 1998, the Best Paper Awards at the Infocom 1998 and Infocom 2001 conferences, the Erlang Prize in 2000 from the INFORMS Applied Probability Society, the IEEE Communications and Information Theory Society Joint Paper Awards in 2001 and 2013, the Information Theory Society Paper Award in 2003, the 2009 Frederick Emmons Terman Award from the American Society for Engineering Education, a Gilbreth Lectureship from the National Academy of Engineering in 2012, the Signal Processing Society Best Paper Award in 2012, and the Stephen O. Rice Paper Award in 2013. He is a coauthor, with Pramod Viswanath, of the text "Fundamentals of Wireless Communication", which has been used at over 60 institutions around the world.