Systems Biology at Harvard Medical School


Fontana Laboratory — Research

  "Executable Knowledge" and models of molecular information processing


We aim to integrate bioinformatics with a recent modeling practice in order to explore dynamical processes in protein-protein interaction (PPI) networks at a level of detail that reflects the available mechanistic knowledge. In short, the objective is to make dynamical models and knowledge representation twin sides of biological reasoning: a model should be a transparent, formal and executable representation of the facts it rests upon. These facts describe the interaction capabilities of proteins in terms of their biophysical and biochemical features, and they can be expressed as abstract rules in a formal language that hides the microscopic details.

Many diverse experimental techniques now generate data about mechanistic aspects of protein-protein interaction. Depending on how they were acquired, such data can exhibit varying degrees of causal resolution and contextual reference: "SH2 domains bind phosphotyrosines" or "In frog-egg extracts, Axin-1 binds a region in the armadillo repeat of beta-catenin, if beta-catenin is unphosphorylated at S45." The challenge is to use PPI data to understand the system-level processes that shape the development, maintenance, and evolution of biological organizations in terms of the biophysical and biochemical interaction capabilities of individual molecular agents.

Models are required to enable a computational inspection of the dynamic nature and behavior of protein-protein interaction networks. Our emphasis is on models that represent these networks "as we know them" by formalizing empirical statements about interaction mechanisms, while leaving an auditable trail of evidence linking them to the sources from which they were derived. We call models of this kind "Executable Knowledge". They differ from traditional models in form and function, because they must accommodate the potential combinatorial explosion of states that results from multiple, partially independent binding events combined with post-translational modifications. Allowing for this combinatorial complexity rules out models that are based on the explicit specification of all possible molecular species, such as kinetic differential equations or chemical master equations governing joint probability distributions. These approaches are infeasible unless the system of interest is dramatically simplified at the outset, which compromises the representation of knowledge and risks passing off elaborate prejudice as understanding. The ambition here is to permit the construction of initial models that represent knowledge, thus making simplification an optional subsequent goal rather than a necessity imposed by the lack of alternatives.
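To make the combinatorial argument concrete, here is a minimal sketch under purely illustrative assumptions: a hypothetical receptor with n independent tyrosine sites, each of which is unphosphorylated ("u"), phosphorylated and free ("p"), or phosphorylated and bound by an adaptor ("b"), governed by four site-local rules per site. The function `expand` (a hypothetical name) enumerates every molecular species and every species-level reaction those rules imply, which is what a species-based model has to write down explicitly.

```python
from itertools import product

# Hypothetical receptor: each tyrosine site is in one of three states.
SITE_STATES = ("u", "p", "b")  # unphosphorylated, phosphorylated+free, phosphorylated+bound

# Site-local rules: (state required at one site, state after the rule fires).
# Each rule mentions only the site it acts on; all other sites are left unspecified.
RULES = {
    "phosphorylate": ("u", "p"),
    "dephosphorylate": ("p", "u"),  # only a free phosphosite can be dephosphorylated
    "bind_adaptor": ("p", "b"),
    "unbind_adaptor": ("b", "p"),
}


def expand(n_sites):
    """Return (rules, species, reactions) counts for a receptor with n_sites sites."""
    # Every combination of site states is a distinct molecular species.
    species = list(product(SITE_STATES, repeat=n_sites))
    reactions = []
    for s in species:
        for i, state in enumerate(s):
            for name, (before, after) in RULES.items():
                if state == before:
                    # One concrete reaction per (species, site, applicable rule).
                    reactions.append((s, s[:i] + (after,) + s[i + 1:], name))
    return len(RULES) * n_sites, len(species), len(reactions)


for n in (1, 2, 4, 6, 8):
    n_rules, n_species, n_reactions = expand(n)
    print(f"{n} sites: {n_rules} rules -> {n_species} species, {n_reactions} reactions")
```

The rule count grows as 4n, while the species count grows as 3^n and the reaction count as 4n·3^(n-1). For six sites, 24 rules expand to 729 species (one differential equation each in a species-based ODE model) and 5832 reactions, and this toy ignores receptor dimerization, localization, and any other source of state.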

Achieving this class of extended models hinges on:

  • The definition of a formal language with a clear operational semantics for representing biochemical and biophysical aspects of interaction at a pragmatic level of abstraction;
  • The implementation of mathematically sound and scalable tools for analyzing and executing arbitrary collections of interaction statements;
  • Computational protocols for constructing a corpus of formalized interaction statements from the content of extant databases and the literature;
  • The deployment of such corpora, models, and supporting tools as web services;
  • Identifying and addressing biological questions that would be very difficult or impossible to address otherwise.
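The second item in the list above, executing a collection of interaction statements, can be illustrated by a minimal sketch that simulates rules directly on a population of agents, without ever enumerating species. It uses Gillespie's direct method, with each rule's propensity equal to its rate constant times its number of matches. This is a simplified illustration under stated assumptions, not the algorithm of KaSim or any other particular tool: the rules, the rate constants, and the function name `simulate` are hypothetical, and adaptor binding is treated as pseudo-first-order (adaptor assumed in large excess and not tracked).

```python
import random
from collections import Counter

# Hypothetical site-local rules: (state required at one site, new state, rate constant).
# Rate constants are illustrative only.
RULES = [
    ("u", "p", 1.0),  # phosphorylation
    ("p", "u", 0.5),  # dephosphorylation of a free phosphosite
    ("p", "b", 2.0),  # adaptor binding (pseudo-first-order, adaptor in excess)
    ("b", "p", 0.3),  # adaptor unbinding
]


def simulate(n_agents, n_sites, t_end, seed=0):
    """Gillespie direct method over rule matches; returns site-state counts at t_end."""
    rng = random.Random(seed)
    # The state is the agent population itself: one list of site states per agent.
    agents = [["u"] * n_sites for _ in range(n_agents)]
    t = 0.0
    while True:
        # Matches of each rule: every (agent, site) pair in the rule's required state.
        # Rebuilt each step for clarity; practical simulators update them incrementally.
        matches = [
            [(a, i) for a, sites in enumerate(agents)
             for i, s in enumerate(sites) if s == before]
            for before, _, _ in RULES
        ]
        propensities = [k * len(m) for (_, _, k), m in zip(RULES, matches)]
        total = sum(propensities)
        if total == 0:
            break  # no rule can fire
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        j = rng.choices(range(len(RULES)), weights=propensities)[0]  # which rule fires
        a, i = rng.choice(matches[j])  # where it fires, chosen uniformly among matches
        agents[a][i] = RULES[j][1]
    return Counter(s for sites in agents for s in sites)


print(simulate(n_agents=50, n_sites=6, t_end=2.0))
```

The simulation cost depends on the number of agents and rule matches rather than on the number of possible species, which is why this style of execution tolerates the combinatorial state spaces discussed earlier.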
We pursue this research program in partnership with Vincent Danos (CNRS, Ecole Normale Superieure Paris, University of Edinburgh), Jean Krivine (Paris Diderot), Jérôme Feret (INRIA, Ecole Normale Superieure Paris), and Russ Harmer (CNRS, Ecole Normale Superieure Lyon). Our approach revolves around an agent-based (or rule-based) view of biological molecules and their actions. This approach is similar in spirit to how reactions are represented in organic chemistry, but better attuned to the needs of molecular biologists. In chemistry, the composition of molecules is expressed in a formal language, and chemical reactions are codified as rules describing how functional groups engage in specific transformations regardless of the full molecular context in which they are embedded. A rule describes only the part of the context required for an interaction, leaving the rest unspecified.

We use a formal language, Kappa (originally proposed by Vincent Danos and Cosimo Laneve), to express proteins in terms of "sites" that represent interaction capabilities. Such capabilities carry "state", such as binding (as in complex formation), any number of post-translational modifications, or information about localization. Rules formally express empirically obtained facts about protein-protein interactions, specifying the state of sites only to the extent necessary to stipulate the conditions for an interaction. Similar approaches have been developed independently elsewhere, most notably BioNetGen, developed by James Faeder, Michael Blinov and Bill Hlavacek.
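The idea that a rule specifies sites "only to the extent necessary" can be sketched as pattern matching on agents with named sites. The encoding below is hypothetical and deliberately does not imitate Kappa's concrete syntax; the site names "bcat" and "arm", the extra phosphosite "S33" used as a site the rule ignores, and the helper functions are illustrative assumptions. The rule paraphrases the Axin-1/beta-catenin statement quoted earlier (the frog-egg extract context is not modeled): it tests beta-catenin's S45 and armadillo-repeat site, and nothing else.

```python
# Hypothetical site-graph encoding; not Kappa syntax and not any tool's API.

def make_agent(kind, **sites):
    """Create an agent whose sites map to an internal state ("u"/"p") or None
    (no internal state). Every site starts unbound."""
    return {"kind": kind, "state": dict(sites), "bond": {s: None for s in sites}}


def matches(agent, kind, pattern):
    """Check an agent against a pattern mapping site -> (state, must_be_free).
    A state of None means "any state"; sites absent from the pattern are never
    inspected, which lets one rule cover many molecular contexts."""
    if agent["kind"] != kind:
        return False
    for site, (state, must_be_free) in pattern.items():
        if state is not None and agent["state"][site] != state:
            return False
        if must_be_free and agent["bond"][site] is not None:
            return False
    return True


# Rule: Axin-1 binds the armadillo repeat of beta-catenin if S45 is unphosphorylated.
AXIN_PATTERN = {"bcat": (None, True)}                       # binding site must be free
BCAT_PATTERN = {"arm": (None, True), "S45": ("u", False)}   # free arm, S45 unphosphorylated


def bind_axin_bcat(axin, bcat, label):
    """Apply the binding rule if both agents match; record the bond under `label`."""
    if matches(axin, "Axin-1", AXIN_PATTERN) and matches(bcat, "beta-catenin", BCAT_PATTERN):
        axin["bond"]["bcat"] = label
        bcat["bond"]["arm"] = label
        return True
    return False


axins = [make_agent("Axin-1", bcat=None) for _ in range(3)]
bcats = [
    make_agent("beta-catenin", S33="u", S45="u", arm=None),
    make_agent("beta-catenin", S33="p", S45="u", arm=None),  # differs only at an ignored site
    make_agent("beta-catenin", S33="u", S45="p", arm=None),  # violates the S45 condition
]
results = [bind_axin_bcat(a, b, label) for label, (a, b) in enumerate(zip(axins, bcats), start=1)]
print(results)  # [True, True, False]
```

A single rule thus applies to both unphosphorylated-S45 variants regardless of their other sites, which is the property that keeps rule sets small while the underlying species space grows combinatorially.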