2017-12-12

Inferring who infected whom

Transmission tree, transmission chain

  • transmission tree: directed acyclic graph representing transmission events of an outbreak
  • transmission chain: path from a case to one of its descendants (a case it infected directly or indirectly)

Why does it matter?

Different transmission contexts need different responses.

A difficult problem

The number of possible transmission trees grows very fast:

  • for 10 cases with unique onsets, \(\sim 3,500,000\) trees

  • for 60 cases with unique onsets, \(\sim 8 \times 10^{81}\) trees (more or less the estimated number of atoms in the universe)
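As a rough check on how fast such counts grow, one plausible counting rule is a factorial, n! for n cases. This is an assumption made for illustration only: the slides do not state the rule behind their figures, and `n_orderings` is a hypothetical helper name. A minimal Python sketch:

```python
import math

def n_orderings(n_cases: int) -> int:
    """n! for n cases, used here as an ASSUMED proxy for the number of trees."""
    return math.factorial(n_cases)

# Print the order of magnitude for the two outbreak sizes discussed above.
for n in (10, 60):
    print(f"{n} cases: {n_orderings(n):.2e}")
```

Under this assumption, 10 cases already give about 3.6 million configurations, so exhaustive enumeration becomes impractical after a few dozen cases.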

Some data can be informative

  • mutations accumulate in the pathogen genome along the transmission chains

  • can be used to reconstruct transmission trees

But even genomic data have limits


For most diseases, whole genome sequences alone are not sufficient for reconstructing transmission trees.

An evidence synthesis approach


  • data (e.g. dates of symptom onset) restrict possible trees

  • combine different types of data to identify a small set of plausible trees

Outbreak reconstruction using outbreaker2

Original idea of outbreaker

The original outbreaker model used the serial interval and genetic data to reconstruct transmission events.


outbreaker2: a modular platform for outbreak reconstruction


Temporal likelihood

Combines generation time, incubation period, and dates of onset.

Temporal likelihood

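Notation: dates of symptom onset (\(t_i\)), dates of infection (\(T_i^{inf}\)), most recent sampled ancestor of case \(i\) (\(\alpha_i\)), generation time distribution (\(w()\)), incubation period distribution (\(f()\)), number of generations between \(i\) and \(\alpha_i\) (\(\kappa_i\)). \(w^{(\kappa_i)}\) denotes the \(\kappa_i\)-fold convolution of \(w\) with itself.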

\[ p(t_i | T_i^{inf}) \times p(T_i^{inf} | T_{\alpha_i}^{inf}, \kappa_i) = f(t_i - T_i^{inf}) \times w^{(\kappa_i)}(T_i^{inf} - T_{\alpha_i}^{inf}) \]
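The temporal term can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the outbreaker2 implementation: time is discretised in days, \(w\) and \(f\) are given as lists of probabilities indexed by delay, \(w^{(\kappa_i)}\) is read as the \(\kappa_i\)-fold convolution of \(w\) with itself, and all function names and numbers are hypothetical.

```python
import math

def convolve(p, q):
    """Discrete convolution of two mass functions (lists indexed by delay in days)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out

def kfold(w, kappa):
    """kappa-fold convolution of w: delay distribution across kappa generations."""
    out = list(w)
    for _ in range(kappa - 1):
        out = convolve(out, w)
    return out

def pmf_at(p, delay):
    """Probability of a given delay; zero outside the support of p."""
    return p[delay] if 0 <= delay < len(p) else 0.0

def temporal_loglik(t_onset, t_inf, t_inf_ancestor, kappa, w, f):
    """log[ f(t_i - T_i) * w^(kappa)(T_i - T_alpha_i) ] for a single case."""
    p_onset = pmf_at(f, t_onset - t_inf)
    p_inf = pmf_at(kfold(w, kappa), t_inf - t_inf_ancestor)
    if p_onset == 0.0 or p_inf == 0.0:
        return -math.inf  # impossible timing under the given distributions
    return math.log(p_onset) + math.log(p_inf)

# Hypothetical distributions: P(delay = 0, 1, 2, ... days)
w = [0.0, 0.5, 0.3, 0.2]  # generation time
f = [0.0, 0.6, 0.4]       # incubation period

ll = temporal_loglik(t_onset=6, t_inf=4, t_inf_ancestor=2, kappa=1, w=w, f=f)
print(ll)
```

Working on the log scale turns the product into a sum, which avoids numerical underflow when the terms of many cases are combined.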

Reporting likelihood

  • \(\kappa_i\): number of generations between case \(i\) and its most recent sampled "ancestor"; \(\pi\): case reporting probability

Geometric distribution:

\[ p(\kappa_i | \pi) = (1 - \pi)^{\kappa_i - 1} \times \pi \]
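A minimal Python sketch of this geometric term, on the log scale; the function name `reporting_loglik` is hypothetical and the sketch is not taken from outbreaker2:

```python
import math

def reporting_loglik(kappa: int, pi: float) -> float:
    """log of (1 - pi)^(kappa - 1) * pi, for kappa >= 1 and 0 < pi <= 1."""
    if kappa < 1 or not (0.0 < pi <= 1.0):
        return -math.inf
    if pi == 1.0:
        # With perfect reporting, only direct transmission (kappa = 1) is possible.
        return 0.0 if kappa == 1 else -math.inf
    return (kappa - 1) * math.log(1.0 - pi) + math.log(pi)

# Probability of 0, 1 or 2 unobserved intermediate cases when 80% of cases are reported
for kappa in (1, 2, 3):
    print(kappa, math.exp(reporting_loglik(kappa, 0.8)))
```

With a high reporting probability, most of the mass sits on \(\kappa_i = 1\), meaning direct transmission from the sampled ancestor.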

Genetic likelihood

  • \(d()\): number of mutations between 2 sequences; \(s_i\), \(s_{\alpha_i}\): sequences of the infectee and their infector; \(L\): genome length; \(\mu\): mutation rate per site and per generation of infection

\[ p(s_i | s_{\alpha_i}, \mu) = \mu^{d(s_i, s_{\alpha_i})} \times (1 - \mu)^{(\kappa_i L - d(s_i, s_{\alpha_i}))} \]
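The genetic term translates directly into code. In the sketch below, \(d()\) is taken to be the Hamming distance between two aligned sequences of equal length; this is an assumption for illustration (the slides do not specify how ambiguous or missing sites are handled), and the function names and toy sequences are hypothetical.

```python
import math

def hamming(seq_a: str, seq_b: str) -> int:
    """Number of differing positions between two aligned sequences (assumed d())."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))

def genetic_loglik(seq_i: str, seq_ancestor: str, kappa: int, mu: float) -> float:
    """log of mu^d * (1 - mu)^(kappa * L - d), with 0 < mu < 1."""
    if not (0.0 < mu < 1.0):
        raise ValueError("mu must lie strictly between 0 and 1")
    d = hamming(seq_i, seq_ancestor)
    genome_length = len(seq_i)
    return d * math.log(mu) + (kappa * genome_length - d) * math.log(1.0 - mu)

# Toy example: 8-site genomes differing at one position
print(genetic_loglik("ACGTACGT", "ACGTACGA", kappa=1, mu=0.1))
```

For realistic genomes, with \(L\) in the thousands or more and a very small \(\mu\), the log scale is essential: the raw probability would underflow to zero.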

Contact likelihood

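Relies on the contact reporting probability (\(\epsilon\)) and the probability of contact between non-transmission pairs (\(\lambda\)). Here \(c_{i,j} = 1\) if a contact is reported between cases \(i\) and \(j\), and \(c_{i,j} = 0\) otherwise.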

\(\alpha_i = j\): \(p(c_{i,j} = 1 ) = \epsilon\) ; \(p(c_{i,j} = 0) = 1 - \epsilon\)

\(\alpha_i \neq j\): \(p(c_{i,j} = 1) = \lambda \epsilon\) ; \(p(c_{i,j} = 0) = (1 - \lambda) + \lambda (1 - \epsilon)\)
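The four cases above can be written as one small function. This is a minimal Python sketch with hypothetical names (`contact_loglik`, `eps`, `lam`), not the outbreaker2 code:

```python
import math

def contact_loglik(contact_reported: bool, is_transmission_pair: bool,
                   eps: float, lam: float) -> float:
    """Log-probability of the contact status c_ij for one pair of cases.

    eps: contact reporting probability
    lam: probability of contact between non-transmission pairs
    """
    if is_transmission_pair:  # alpha_i == j
        p = eps if contact_reported else 1.0 - eps
    else:                     # alpha_i != j
        p = lam * eps if contact_reported else (1.0 - lam) + lam * (1.0 - eps)
    return math.log(p) if p > 0.0 else -math.inf

# A reported contact is much more likely within a transmission pair
# when contacts between unrelated cases are rare (small lam).
print(math.exp(contact_loglik(True, True, eps=0.8, lam=0.05)))
print(math.exp(contact_loglik(True, False, eps=0.8, lam=0.05)))
```

In both branches the probabilities of \(c_{i,j} = 1\) and \(c_{i,j} = 0\) sum to one, which is a quick way to check the formulas.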