Artificial Intelligence & Systems Biology

Our research combines artificial intelligence and systems biology techniques to decode how biological networks are rewired in a disease like cancer, where cancer cells evolve and interact within a complex and dynamic microenvironment.

We search for gene networks and ‘signatures’ that enable us to understand how normal cells rewire their molecular circuits to become cancer cells, and how they respond to characteristics of the microenvironment, such as low oxygen levels (hypoxia) or signalling from non-cancer cells. This helps us to understand cancer evolution and heterogeneity, and to predict the most appropriate therapeutic strategies.

More specifically, in cancer and other complex diseases, the development of drug resistance is a major reason why targeted therapies have a limited effect on overall patient survival. Combination therapy with multiple drugs is an attractive option, but experimental strategies for evaluating potential combinations must contend with an ever-increasing number of drugs. Computational approaches have been instrumental in addressing this challenge, but their effectiveness has been limited by the lack of a framework able to model the interactions of the three major factors affecting therapeutic resistance: a) selection of resistant clones, b) adaptability of gene signalling networks over time, and c) a protective and hypoxic tumour microenvironment. To address this, our programme focuses on four main tasks:

1. Modelling the complexity of the tumour microenvironment

Funded by a European Research Council award, we have developed a framework that combines Agent-Based Modelling (ABM), a powerful computational technique, with the modelling of gene networks (GN). The framework lends itself naturally to modelling heterogeneous populations of cells acting and evolving in a dynamic microenvironment. These models enable us to study cancer initiation, clonal competition, interactions of cancer cells with the host microenvironment, and response to different drugs and drug combinations.
A detailed description can be found at: process.innovation.ox.ac.uk.
The website with the first deposited version of the software implementing the methodology is: merlin.oncology.ox.ac.uk.

The ABMGN framework (components are shown in the Figure) was developed on the assumption that the phenotype of a cell is the set of observable characteristics resulting from interaction of the cell genotype with the surrounding environment, which together determine cell behaviour after specific perturbations (such as drug treatment).

Ultimately, our goal is to use these types of models to identify likely resistance mechanisms and to predict the most appropriate treatment for each patient.

Traditional analysis of molecular pathways and gene networks has provided an invaluable tool for understanding the basis of cell behaviour; however, it typically does not consider the cell microenvironment, which is a key determinant of phenotype. In contrast, ABMGN allows the determinants of single-cell behaviour to be identified, while uncovering emergent properties of multi-cellular growth and evolution in a given environmental context. The framework allows us to model co-occurring intrinsic (e.g. mutation-dependent) and extrinsic (e.g. drug-dependent) perturbations, and their interactions, and lends itself naturally to the study of complex heterogeneous populations of cells acting and evolving in a dynamic microenvironment.
To train these models, multi-omics and phenotypic data are needed. We are integrating multi-omics data from cell lines and clinical cohorts with data from CRISPR screens, produced in our own and in collaborating labs.
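
As a rough illustration of what coupling agent-based modelling with a gene network can look like, the sketch below gives each cell agent a tiny Boolean gene network whose inputs are the local oxygen level and the presence of a drug. It is a toy under stated assumptions and not the ABMGN implementation: the node names (`HYPOXIA_RESPONSE`, `PROLIFERATION`), the update rules, the oxygen gradient, the threshold and every function name are hypothetical.

```python
import random
from dataclasses import dataclass, field

HYPOXIA_THRESHOLD = 0.3  # assumed oxygen level below which a cell is hypoxic


@dataclass
class Cell:
    """One agent: a grid position, a genotype flag and a tiny Boolean gene network."""
    position: int
    resistant: bool = False  # hypothetical resistance mutation (intrinsic perturbation)
    genes: dict = field(
        default_factory=lambda: {"HYPOXIA_RESPONSE": False, "PROLIFERATION": True}
    )

    def update_genes(self, oxygen, drug):
        # Hypothetical rules: low oxygen switches on the hypoxia node, which
        # makes the cell quiescent; the drug (extrinsic perturbation) blocks
        # proliferation unless the cell carries the resistance mutation.
        self.genes["HYPOXIA_RESPONSE"] = oxygen < HYPOXIA_THRESHOLD
        self.genes["PROLIFERATION"] = (not self.genes["HYPOXIA_RESPONSE"]) and (
            (not drug) or self.resistant
        )


def oxygen_at(position, grid_size):
    """Assumed microenvironment: oxygen falls linearly away from a vessel at position 0."""
    return 1.0 - position / (grid_size - 1)


def step(cells, grid_size, drug, rng):
    """Advance one time step: update each network, then let proliferating cells divide."""
    occupied = {cell.position for cell in cells}
    newborn = []
    for cell in cells:
        cell.update_genes(oxygen_at(cell.position, grid_size), drug)
        if not cell.genes["PROLIFERATION"]:
            continue
        free = [p for p in (cell.position - 1, cell.position + 1)
                if 0 <= p < grid_size and p not in occupied]
        if free:
            target = rng.choice(free)
            occupied.add(target)
            # The daughter cell inherits the parent's genotype.
            newborn.append(Cell(target, resistant=cell.resistant))
    return cells + newborn


def simulate(drug, steps=20, grid_size=20, seed=0):
    """Start from one sensitive and one resistant cell; return the final population."""
    rng = random.Random(seed)
    cells = [Cell(5, resistant=False), Cell(6, resistant=True)]
    for _ in range(steps):
        cells = step(cells, grid_size, drug, rng)
    return cells


for drug in (False, True):
    population = simulate(drug)
    n_resistant = sum(cell.resistant for cell in population)
    print(f"drug={drug}: {len(population) - n_resistant} sensitive, {n_resistant} resistant")
```

In this toy setting the same genotype behaves differently depending on where the cell sits and whether the drug is present: under treatment the sensitive clone stops dividing, while the resistant clone keeps expanding until it reaches the hypoxic region.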

2. Using Artificial Intelligence to integrate multiple omics

Multiple factors and their interactions underlie cancer progression and response to treatment. We have longstanding expertise in exploiting machine learning techniques to integrate the multiple omic layers measured by us and our collaborators in human cancer samples, and to derive generalizable and robust signatures of specific biological phenotypes and clinical endpoints. By leveraging and fostering the synergies between AI research in the computing sciences department at Bocconi, where Francesca Buffa is full professor, and biomedical research at IFOM, we aim to boost this area of research to answer basic cancer biology questions and generate useful clinical tools.

Integration of mutation, amplification, transcriptional, methylation and miRNA sequencing data across 15 cancer types and 7738 cancer/normal clinical samples, used to build a pan-cancer network of the associations between miRNAs (grey nodes) and cancer hallmarks (colour-coded nodes). Dhawan et al, Nature Comm, 9, 5228 (2018).

A validated miRNA prognostic signature in breast cancer estrogen receptor (ER) positive and negative cohorts. The signature was developed using penalized linear regression with nested leave-one-out and cross-validation. The miRNAs identified have subsequently been characterised in a number of follow-up studies. Cancer Res. 71:5635-45 (2011). Featured in: Key Paper Evaluation. Expert Rev Anticancer Ther 2012 Mar;12(3):323-30.

Integrated analysis of mutation, amplification and gene expression data in 10 cancer types and 6538 tumour/normal samples revealed common amplification and over-expression, but infrequent mutation, of metabolic genes. Top candidate drivers of metabolic disruption are shown across cancer types (see manuscript for abbreviations); the shade of blue indicates an increasing fraction of samples with amplification and over-expression. Haider S et al, Genome Biol. 17(1):140 (2016).

3. Exploiting multi-omics technologies to reconstruct gene networks

We have been developing and applying gene network analysis approaches to infer cell signalling and transcription factor networks using transcriptomic data in clinical samples from hundreds to thousands of individuals. This offers a tool to infer gene function ‘in context’, which complements information obtained from in-vitro (‘out of context’) perturbation experiments. To reduce the number of false associations, we have been exploiting previously acquired knowledge to inform the network derivation. A particularly successful technique has been to ‘seed’ networks by starting from well-validated genes. Further genes are then recruited based on the association of their expression with the expression of the initial seeds, and the network is expanded accordingly.
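
The seeding idea can be sketched in a few lines. The example below is a simplified illustration rather than the published procedure: it assumes Pearson correlation as the measure of association, a fixed correlation cut-off and a minimum number of supporting seeds, all of which are hypothetical choices, and the gene names and expression values are made-up placeholders.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of expression values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def expand_from_seeds(expression, seeds, min_corr=0.8, min_seeds=2):
    """Recruit non-seed genes whose expression correlates with enough seeds.

    expression: dict mapping gene name -> expression values, one per sample.
    Returns a dict mapping each recruited gene to the seeds that support it.
    """
    recruited = {}
    for gene, values in expression.items():
        if gene in seeds:
            continue
        supporting = [s for s in seeds if pearson(values, expression[s]) >= min_corr]
        if len(supporting) >= min_seeds:
            recruited[gene] = supporting
    return recruited


# Toy expression matrix (six samples); names and values are placeholders.
expression = {
    "SEED_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "SEED_B": [2.0, 3.0, 5.0, 6.0, 8.0, 9.0],
    "GENE_X": [1.1, 2.3, 2.9, 4.2, 5.1, 5.8],  # tracks both seeds
    "GENE_Y": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],  # unrelated to the seeds
}
recruited = expand_from_seeds(expression, seeds=["SEED_A", "SEED_B"])
print(recruited)
```

Requiring support from more than one seed is one simple way of using prior knowledge to limit false associations; a real analysis would also need to account for multiple testing and for confounding between samples.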

We have previously demonstrated that robust signatures can be extracted from these networks by selecting hub genes. These can be used to estimate the network activity in human clinical samples, and have provided biomarkers that have been validated in prospective studies.
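
A minimal sketch of this step follows, assuming that a hub is simply a gene with many strong correlations in the network and that network activity is summarised as the median expression of the hub genes in each sample. Both are illustrative assumptions, and the function names, cut-off and toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation between two equal-length lists of expression values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def hub_genes(expression, min_corr=0.8, top_k=1):
    """Rank genes by their number of strong co-expression links and keep the top_k."""
    degree = {gene: 0 for gene in expression}
    for a, b in combinations(expression, 2):
        if pearson(expression[a], expression[b]) >= min_corr:
            degree[a] += 1
            degree[b] += 1
    # Highest degree first; ties are broken alphabetically for reproducibility.
    ranked = sorted(degree, key=lambda gene: (-degree[gene], gene))
    return ranked[:top_k]


def signature_scores(expression, signature):
    """Per-sample activity estimate: median expression of the signature genes."""
    n_samples = len(next(iter(expression.values())))
    return [median(expression[g][i] for g in signature) for i in range(n_samples)]


# Toy network (six samples): GENE_A is linked to GENE_B and GENE_C,
# which are only weakly related to each other; GENE_D is unconnected.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "GENE_B": [1.7, 0.6, 3.7, 4.7, 3.6, 6.7],
    "GENE_C": [0.3, 3.4, 2.3, 3.3, 6.4, 5.3],
    "GENE_D": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}
hubs = hub_genes(expression, top_k=1)
scores = signature_scores(expression, hubs)
print(hubs, scores)
```

The median is used here because it is less sensitive than the mean to a single outlying gene; other summaries are equally plausible.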

Derivation of an angiogenesis signature and discovery of ELTD1, a major player in both developmental and cancer angiogenesis. Masiero M. et al, Cancer Cell 24:229-41 (2013). Featured in: Highlights, American Association for Cancer Research, Cancer Res 73; 5299 (2013).

Derivation of a hypoxia-related network (seeds in yellow, recruited genes in blue) conserved across multiple cancer types. Buffa FM et al, Br J Cancer 102:428-35 (2010) & Winter SC-Buffa FM [joint] et al, Cancer Res 67:3441-9 (2007). The hypoxia signatures extracted from these networks have been used in several clinical studies, including currently in the S:CORT collaborative network.

4. Using deposited knowledge to infer the biological phenotype of clinical samples

To help bridge the gap between cancer cell line models and cancer patients, we have been developing ways of exploiting gene signatures to generate and validate biological hypotheses in human clinical samples. A gene signature consists of a set of genes whose collective expression is associated with a known phenotype or cancer hallmark. For example, we have used hypoxia, metabolism and angiogenesis signatures to study the temporal changes and treatment response of individual cancer samples to drugs targeting metabolism and angiogenesis (e.g. recently Lord SR et al, Cell Metabolism 28:679-688, 2018). Whilst this task is challenging, the increasing availability of gene signatures is a resource that needs to be evaluated and, if suitable, exploited to this end. To facilitate this, we are developing protocols for evaluating the applicability of previously developed gene signatures to newly acquired datasets.
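
To give a flavour of what such an applicability check might involve, the sketch below computes two simple indicators for each signature gene in a new dataset: how often the gene is detected, and how variable it is. It is an illustrative assumption about what can be checked, not a reproduction of any published protocol; the function name, the detection limit, the gene names and the values are all hypothetical.

```python
from statistics import mean, pstdev


def signature_applicability(expression, signature, detection_limit=1.0):
    """Summarise how well a signature is represented in a new dataset.

    expression: dict mapping gene name -> expression values, one per sample.
    Returns the signature genes missing from the dataset and, for each gene
    that is present, the fraction of samples in which it is detected and its
    coefficient of variation.
    """
    report = {
        "missing": [gene for gene in signature if gene not in expression],
        "genes": {},
    }
    for gene in signature:
        if gene not in expression:
            continue
        values = expression[gene]
        average = mean(values)
        report["genes"][gene] = {
            "fraction_detected": sum(v >= detection_limit for v in values) / len(values),
            "coefficient_of_variation": pstdev(values) / average if average else 0.0,
        }
    return report


# Toy dataset (four samples); names and values are placeholders.
new_dataset = {
    "GENE_A": [5.0, 7.0, 9.0, 11.0],  # detected in every sample
    "GENE_B": [0.0, 0.0, 2.0, 0.0],   # mostly below the detection limit
}
report = signature_applicability(new_dataset, ["GENE_A", "GENE_B", "GENE_C"])
print(report)
```

A signature with many missing, rarely detected or near-constant genes is unlikely to carry the intended information in the new dataset, which is a reason to inspect indicators like these before computing any score.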

SigQC: a procedural approach for standardising the evaluation of gene signatures. The main elements of the protocol include indicators of gene expression, variability and data structure, and the evaluation of different metrics to represent the signature information content. Dhawan A et al, Nature Protocols (2019).
