Systems Biology at Harvard Medical School


Fontana Laboratory — Research


  Models of molecular information processing


   

To understand biology and cure disease, we must study the complex dynamical systems created by interactions between gene products, which collectively generate phenotype. We need to reason about the dynamical processes by which cells transform environmental cues into internal signals and convert them into actions that determine cell fate.

Two challenges stand out:

  • Biological processes underlying cellular decisions are combinatorial. Post-translational modification and complex formation give rise to astronomical numbers of possible chemical species, which makes a description in terms of differential equations infeasible and meaningless unless it is given at a suitable level of aggregation. But exactly what is suitable? Is the same aggregation suitable in all circumstances? And how would we find out without first exploring a system based on its full complement of possibilities?
  • Empirical knowledge about protein-protein interactions is evolving rapidly, yet it is scattered across different research communities. Models and the facts they rest upon must therefore become self-documenting. They need to be embedded in an "operating environment" designed to help biologists cope with incomplete, inconsistent, and continually changing information. Modeling in biology is also a way of taking inventory of knowledge.
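
The combinatorial growth described in the first point can be made concrete with a small count. The sketch below assumes that sites vary independently; the receptor with 9 phosphorylation sites and the scaffold with 3 docking sites are hypothetical examples, not measurements of any real protein.

```python
from math import prod

def count_states(site_state_counts):
    """Number of internal states of one protein whose sites vary independently.

    site_state_counts[i] is the number of states available to site i.
    """
    return prod(site_state_counts)

# Hypothetical receptor: 9 sites, each either unmodified or phosphorylated.
receptor_states = count_states([2] * 9)

# Hypothetical scaffold: 3 docking sites, each empty or holding one of 4 partner forms.
scaffold_states = count_states([1 + 4] * 3)

# Unordered dimers of two such receptors (both halves may be in the same state).
dimer_states = receptor_states * (receptor_states + 1) // 2

print(receptor_states)  # 512
print(scaffold_states)  # 125
print(dimer_states)     # 131328
```

Each of these states would need its own variable in a differential-equation description, and the count multiplies again with every additional binding partner.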

The research program we pursue in partnership with Vincent Danos (Edinburgh) revolves around an agent-based (or rule-based) view of biological molecules and their actions (*). This approach is analogous in spirit to how reactions are represented in organic chemistry, but more attuned to the needs of molecular biologists. In chemistry, the internal structure of molecules is expressed in a formal language. Chemical reactions are then codified in terms of rules describing how functional groups engage in specific transformations. A rule describes only the structural context required for an interaction, leaving the rest unspecified (for example, the residue of an amino acid).

We use a formal language, Kappa, originally proposed by Vincent Danos and Cosimo Laneve, to express proteins in terms of "sites" that represent interaction capabilities. Such sites carry "state", such as binding (as in complex formation), any number of post-translational modifications, or information about localization. Rules then formally express empirically obtained facts about protein-protein interactions. They specify the state of sites only to the extent necessary for stipulating the conditions for interaction.

Similar approaches have been taken before, most notably in BioNetGen, with the intent of using rules as an aid in automatically generating large systems of differential equations. Our philosophy differs in that we specify a system as a set of rules, analyze that set directly (deploying techniques from abstract interpretation), and use it to drive a stochastic simulation without ever writing an equation. The system of rules replaces the system of equations as the formal object to be analyzed. Indeed, preliminary representations of EGF signaling in terms of 300+ interaction rules would, if expanded into equations, yield a number of differential equations exceeding Avogadro's number!
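
The idea that a rule mentions only the sites it needs can be sketched in a few lines of Python. This is neither Kappa syntax nor the implementation described here; the `Agent` and `Rule` classes, the site names (`kinase`, `Y1`, `Y2`), and the state labels are hypothetical. The sketch also covers only single-agent state changes, not binding.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    sites: dict = field(default_factory=dict)  # site name -> current state

@dataclass
class Rule:
    label: str
    agent_name: str
    requires: dict  # only the sites the rule cares about
    effect: dict    # sites whose state the rule changes

    def matches(self, agent):
        # Sites not listed in `requires` are left unspecified and never checked.
        return agent.name == self.agent_name and all(
            agent.sites.get(site) == state for site, state in self.requires.items()
        )

    def apply(self, agent):
        if not self.matches(agent):
            return False
        agent.sites.update(self.effect)
        return True

# Hypothetical rule: a kinase-bound receptor R gets phosphorylated on Y1.
# The rule says nothing about Y2, so it covers every state of Y2 at once.
rule = Rule("phosphorylate Y1", "R", {"kinase": "bound", "Y1": "u"}, {"Y1": "p"})

mixture = [
    Agent("R", {"kinase": "bound", "Y1": "u", "Y2": "u"}),
    Agent("R", {"kinase": "bound", "Y1": "u", "Y2": "p"}),
    Agent("R", {"kinase": "free", "Y1": "u", "Y2": "p"}),
]
applied = [rule.apply(agent) for agent in mixture]
print(applied)  # [True, True, False]
```

One rule here stands in for what would be two separate reactions in a species-based description (one per state of `Y2`); with more unspecified sites, the saving grows exponentially.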

The current implementation of the Kappa tools includes:

  • A scalable generic stochastic simulator for rule-based models. (Scalable means that the cost of an update step of the simulation is independent of the number of potential molecular species, independent of the number of molecular instances in the system, and only logarithmically dependent on the number of rules.)
  • A tool for discovering and displaying the causal interdependencies among rules.
  • A tool for computing and displaying the contact map (the protein-protein interaction map) implied by the rules.
  • A tool for event-based dependency analysis. This tool determines the events that are necessary for generating, in a given system, any user-specified observable. The tool also determines and displays the precedence relations that hold among these events ("event structure"), formalizing the concept of "pathway".
  • A compressor that minimizes rules in a particular context. This is extremely useful for debugging models, because the compressor detects rules that are never applicable.
  • An enumeration of all possible "local views" of the proteins in the system. A local view specifies the state of the sites of a protein as well as the identity of its binding partners.
  • A full-fledged graphical interface for rule definition, along with a simple system for managing rule databases, building models, and graphically displaying simulation trajectories.
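
As an illustration of the contact map item above, the following sketch derives a site-level and a protein-level interaction map from a list of binding rules. It is a simplified stand-in, not the tool itself: the rule format (a 4-tuple of agent, site, agent, site) and the agents `Ligand`, `Receptor`, and `Adaptor` with their site names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical binding rules: (agent, site, agent, site).
binding_rules = [
    ("Ligand", "r", "Receptor", "l"),
    ("Receptor", "d", "Receptor", "d"),      # receptor dimerization
    ("Receptor", "Y1", "Adaptor", "sh2"),
    ("Receptor", "Y2", "Adaptor", "sh2"),
]

def contact_map(rules):
    """Map each (agent, site) to the set of (agent, site) it can bind."""
    edges = defaultdict(set)
    for a, site_a, b, site_b in rules:
        edges[(a, site_a)].add((b, site_b))
        edges[(b, site_b)].add((a, site_a))
    return dict(edges)

def protein_level(cmap):
    """Collapse sites to obtain the protein-protein interaction map."""
    pairs = set()
    for (a, _), partners in cmap.items():
        for b, _ in partners:
            pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)

cmap = contact_map(binding_rules)
print(sorted(cmap[("Adaptor", "sh2")]))  # [('Receptor', 'Y1'), ('Receptor', 'Y2')]
print(protein_level(cmap))
# [('Adaptor', 'Receptor'), ('Ligand', 'Receptor'), ('Receptor', 'Receptor')]
```

The map is read off the rules alone, without enumerating any molecular species, which is the sense in which it is "implied by the rules".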

We apply these tools in the study of large and combinatorially complex signaling systems (such as EGF or mTOR), and exploit them in more theoretical studies aimed at understanding principles of molecular information processing.
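
How a set of rules can drive a stochastic simulation without any equations can be shown with a toy Gillespie-style loop. This is only a sketch under strong assumptions: agents are plain dictionaries, rules are hypothetical `(rate, pattern, effect)` triples acting on one agent at a time, and every step rescans all agents, so it has none of the scalability properties of the simulator described above.

```python
import random

def count_matching(agents, pattern):
    """Number of agents whose specified sites agree with the pattern."""
    return sum(all(a[s] == v for s, v in pattern.items()) for a in agents)

def simulate(agents, rules, observable, t_end, seed=0):
    """Gillespie-style simulation driven directly by rules.

    agents:     list of dicts mapping site -> state (mutated in place)
    rules:      list of (rate, pattern, effect) triples
    observable: pattern whose number of matches is recorded over time
    """
    rng = random.Random(seed)
    t = 0.0
    trajectory = [(t, count_matching(agents, observable))]
    while True:
        # Instances of a rule are the agents that match its pattern.
        instances = [
            [a for a in agents if all(a[s] == v for s, v in pattern.items())]
            for _, pattern, _ in rules
        ]
        activities = [rate * len(inst) for (rate, _, _), inst in zip(rules, instances)]
        total = sum(activities)
        if total == 0:
            break  # no rule is applicable any more
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        i = rng.choices(range(len(rules)), weights=activities)[0]
        rng.choice(instances[i]).update(rules[i][2])  # apply rule to one instance
        trajectory.append((t, count_matching(agents, observable)))
    return trajectory

# Hypothetical system: 40 proteins, half bound to a kinase (site K).
agents = [{"K": "bound" if i % 2 == 0 else "free", "Y": "u"} for i in range(40)]
rules = [
    (1.0, {"K": "bound", "Y": "u"}, {"Y": "p"}),  # phosphorylation needs the kinase
    (0.5, {"Y": "p"}, {"Y": "u"}),                # dephosphorylation needs no context
]
trajectory = simulate(agents, rules, {"Y": "p"}, t_end=5.0, seed=1)
print(len(trajectory), trajectory[-1])
```

The state of the system is the collection of agents itself; species and their concentrations never appear as explicit variables.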

This framework is being further developed (particularly in the hands of Jérôme Feret and Jean Krivine). Its robust implementation was made possible by bringing together a world-class team of computer scientists - Vincent Danos, Jérôme Feret, Jean Krivine, Russ Harmer, and many others - in the context of a venture-backed company, Plectix BioSystems, that Walter Fontana (WF) founded in 2005.


(*)
The main conceptual significance of an agent-based view is the ability to track "agent lineages". For example, in an agent-based view, an enzyme-substrate complex is an entity that is explicitly represented as consisting of an enzyme agent and a substrate agent. In contrast, within the framework of differential equations, the compositional structure of an enzyme-substrate complex is not represented at all. The kinetic equations refer to the complex simply in terms of a variable (holding a concentration value) whose name is arbitrary and of no formal significance. If variable names have any structure, it serves only as a mnemonic device. The practical significance of agent-based models lies in (i) breaking through the combinatorial barrier and (ii) codifying empirical facts as executable (because formal) rules that describe protein behaviors.
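
The contrast between the two views can be sketched as follows. The `Agent` class, its `history` field, and the helper functions are hypothetical names chosen for illustration; the dictionary of concentrations stands in for the variables of a kinetic model.

```python
from dataclasses import dataclass, field

# Differential-equation view: "ES" is just a label for a number.
# Nothing in the model knows that it is composed of an E and an S.
concentrations = {"E": 1.0, "S": 10.0, "ES": 0.0, "P": 0.0}

# Agent-based view: the complex is two identifiable agents linked by a bond.
@dataclass
class Agent:
    uid: int
    kind: str
    state: str = "u"
    partner: "Agent | None" = None
    history: list = field(default_factory=list)  # the agent's lineage

def bind(a, b):
    a.partner, b.partner = b, a
    a.history.append(("bind", b.uid))
    b.history.append(("bind", a.uid))

def modify(a, new_state):
    a.state = new_state
    a.history.append(("modify", new_state))

def unbind(a):
    b = a.partner
    a.partner = b.partner = None
    a.history.append(("unbind", b.uid))
    b.history.append(("unbind", a.uid))

enzyme = Agent(0, "E")
substrate = Agent(1, "S")

bind(enzyme, substrate)   # the enzyme-substrate complex, with explicit composition
modify(substrate, "p")    # catalysis changes the state of this particular substrate
unbind(substrate)

print(substrate.history)  # [('bind', 0), ('modify', 'p'), ('unbind', 0)]
```

The substrate keeps its identity through binding, modification, and release, so its lineage can be read back afterwards; the concentration variables offer no counterpart to this record.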