Pre-doctoral school in QUANTITATIVE BIOLOGY

July 8th-19th, 2019 | IFOM, Milan, Italy

Computational genomics

Course description

-First week module-

All the cells in the human body contain almost identical genomes. Yet they differentiate into approximately 200 cell types, which perform distinct functions and make up specific organs and tissues. This is possible because, although the genome contains the full set of instructions governing the organism, each cell type uses only a specific subset of them. Genome function is controlled primarily by epigenetic and transcriptional regulatory mechanisms. In this context, the three-dimensional (3D) organization of chromatin within the cell nucleus has gained increasing attention over the years as an additional, crucial layer of regulation of genome function.

The completion of the Human Genome Project, along with several technological advances, enabled the development and widespread adoption of a wide range of genome-wide experimental techniques. In particular, methods based on high-throughput sequencing make it possible to characterize transcriptional and epigenetic regulation at unprecedented resolution and throughput. This unanticipated explosion of sequencing data has made genomics one of the most data-intensive sciences, driving innovation in related quantitative fields as well, including computer science, statistics and biophysics.

In this module we will gain hands-on experience with computational data analysis techniques for extracting information from functional genomics data obtained with high-throughput sequencing methods. We will focus especially on the 3D folding of chromatin and its relation to transcriptional and epigenetic regulation, in order to understand how these multiple layers control genome function.


Francesco Ferrari

IFOM, The FIRC Institute of Molecular Oncology Foundation

Guest Lecturer:

Guido Tiana is an associate professor in the Department of Physics at the University of Milan. The main goal of his research is to understand the behaviour of biological systems from a physical perspective. Proteins, DNA and cells are finite, complex systems that typically display collective phenomena. They are complex because the interactions among their parts are so heterogeneous that the resulting free-energy landscape is usually quite rugged. They are also small, so the thermodynamic limit does not apply. To avoid being driven by random fluctuations, the various parts of these small systems cooperate to stabilize each other, giving rise to collective phenomena. In particular, the biophysical problems he has studied in detail are:
1) Protein folding. Proteins are chains built from twenty kinds of amino acids that fold into a unique equilibrium conformation. In a nutshell, his main results concern the importance of a few selected regions of each protein in determining its ability to fold.
2) Design of folding-inhibitor drugs. Peptides with the same sequence as key segments of harmful proteins are used to inhibit them. This approach has been used successfully to inhibit the protease of HIV-1, the virus responsible for AIDS.
3) Protein aggregation. A dangerous side path of protein folding is aggregation into large objects whose stabilization energy is orders of magnitude larger than that of soluble proteins. This is the case, for example, of the Aβ peptide, a short peptide implicated in Alzheimer’s disease, and of the prion protein, responsible for Creutzfeldt-Jakob disease in humans and for the related bovine spongiform encephalopathy (so-called mad-cow disease) in cattle.
4) Molecular evolution. The evolution of protein sequences over evolutionary time can be described in terms of Markov processes, with the goal of identifying the parts of a protein that are less prone to mutation.
5) Regulatory networks. Proteins and DNA form complex networks that can be described by differential equations governing the concentrations of the various species in the cell. In particular, he has studied the temporal oscillations of cancer-related transcripts, relating these oscillations to a delay in the associated signaling.
6) Thermodynamics of transcriptional control. The nodes of regulatory networks are often proteins called transcription factors, which bind specific DNA segments to activate transcription. The mechanism of transcriptional activation is non-trivial: it involves cooperative binding and mutual exclusion of the bound species, giving rise to complex thermodynamic behaviour.
7) The brain's spatial representation system. Populations of neurons in the cerebellum are used to build a representation of the 3D space around us and to develop strategies for moving within it. He studies simple physical models that describe how this can be achieved.
8) Structure of chromatin. Based on experimental data, he developed a model that yields the structural properties of chromatin at the kilobase scale, in particular highlighting conformational fluctuations and their biological relevance.

The techniques used in this research are those of statistical mechanics, both analytical and computational. They involve designing simple models that capture the main physical ingredients controlling these systems, developing new algorithms for thermodynamic sampling and for generating high-probability molecular trajectories, and writing efficient C code to implement these algorithms.