Skip to content

TreeSlicer: Example 4

Louis edited this page Dec 12, 2018 · 9 revisions

Estimating change-point times of rates as dates

In this example we are interested in estimating the time of a major transition. That is, we are estimating a single change-point time between the origin of the tree and the present. We also wish to put a prior on the change-point time, restricting it to be between 1960 and 1980.

Estimating the time of a major transition

Why do this?

This is useful for testing hypotheses about times of major transitions, such as:

  • Did the reproductive number change after the introduction of harm-reduction measures?
  • Is the infectious period shorter after the introduction of treatment?
  • Did the speciation rate increase after the K-T extinction?

Without TreeSlicer

Example XML: dengue4_bdsky_csc.xml

It is possible to estimate the time of a single parameter shift using BDSKY. First, add the change-point times to the state:

<parameter id="reproductiveNumberChangeTime" name="stateNode" value="0 24"/>

Note that this vector needs to be of length 2 for estimating 1 change-point time (BDSKY requires a 0 in the change-point times vector for a reason I don't quite understand). We initialise the change-point time to 1970 (the most recent sampling time is 1994, 1994-1970=24).

Next, we add this parameter to the birth-death skyline tree-prior as the birthRateChangeTimes:

<distribution id="BirthDeathSkySerial" 
              spec="beast.evolution.speciation.BirthDeathSkylineModel" 
              tree="@Tree"
              reproductiveNumber="@reproductiveNumber" 
              becomeUninfectiousRate="@becomeUninfectiousRate" 
              samplingProportion="@samplingProportion" 
              origin="@origin">
    <birthRateChangeTimes idref="reproductiveNumberChangeTime"/>
    <reverseTimeArrays spec="beast.core.parameter.BooleanParameter" value="true false false false false false"/>
</distribution>

We also need to indicate to BDSKY that the reproductive number (or birth-rate) change-point times should be reversed, meaning that change-point times are measured as time between the present and the change-point time (heights on the tree), instead of between the origin and the change-point time (default behaviour for BDSKY). Measuring the change-point time from the origin would lead to possibly poor convergence, because the change-point time will change every time the origin time is changed. We do this using the reverseTimeArrays array. Reproductive number (or birth-rate) change-point times are the first in the array (see BDSKY code/documentation), which is why the first element is set to true.

To set a prior on the change-point time we first have to select only the second element in the change-point times vector and then place a prior on it alone. This is because the prior probability is evaluated at every element of a vector, and we don't necessarily want to include 0 in our prior on the change-point height (meaning that the present, 1994 in this case, is included in the prior for the change-point date). To do this we can use beast.core.util.Slice or ExcludablePrior (can be found in multitypetree.distributions or beast.math.distributions). I prefer to use Slice, since it is part of the BEAST2 core and it involves less typing. First declare the slice somewhere in the XML file (not inside the MCMC object):

<function spec="beast.core.util.Slice" id="reproductiveNumberChangeTimeSlice" 
          arg="@reproductiveNumberChangeTime" index="1" count="1"/> 

This tells BEAST2 to select 1 element from reproductiveNumberChangeTime, starting at element 1 (Slice starts number at 0). Now we can place a prior on the change-point time, in this case we add a uniform prior between 1960 and 1980. Since the most recent sample is from 1994, the uniform prior is between 14 and 34 (note that the upper and lower limits reverse when switching from dates to heights):

<prior id="reproductiveNumberChangeTimePrior" name="distribution" x="@reproductiveNumberChangeTimeSlice">
    <Uniform name="distr" upper="34" lower="14"/>
</prior>

Finally, it is possible to log the change-point time as a date using an RPNCalculator:

<RPNcalculator id="ReChangeDateRPN" spec="beast.math.statistic.RPNcalculator" 
               expression="1994 reproductiveNumberChangeTime -">
    <parameter idref="reproductiveNumberChangeTime"/>
</RPNcalculator>

With TreeSlicer

Example XML: dengue4_bdsky_csc_treeslicer.xml

With TreeSlicer it is possible to directly estimate the change-point time as a date, and place a prior on the date (without needing to select parts of the vector). First, add the change-point date to the state (initialised to 1970):

<parameter id="reproductiveNumberChangeDate" name="stateNode" value="1970"/>

Then change birthRateChangeTimes to a TreeSlicer object:

<distribution id="BirthDeathSkySerial" 
              spec="beast.evolution.speciation.BirthDeathSkylineModel" 
              tree="@Tree"
              reproductiveNumber="@reproductiveNumber" 
              becomeUninfectiousRate="@becomeUninfectiousRate" 
              samplingProportion="@samplingProportion" 
              origin="@origin">
    <birthRateChangeTimes spec="TreeSlicer" id="ReTreeSlice" tree="@Tree" 
                          dates="@reproductiveNumberChangeDate" type="dates"/>
    <reverseTimeArrays spec="beast.core.parameter.BooleanParameter" value="true false false false false false"/>
</distribution>

Now add a prior on the change-point date:

<prior id="reproductiveNumberChangeDatePrior" name="distribution" x="@reproductiveNumberChangeDate">
    <Uniform name="distr" upper="1980" lower="1960"/>
</prior>

TODO: Make it possible to estimate multiple change-point times. Since there is no ordered double vector in BEAST2, simply adding multiple change-point times won't work well, since the scale operators can change the order (it will work eventually, but it will mix poorly). It is also very complex to put different priors on different change-point times using excludable prior or slice.

So why use TreeSlicer?

  • Security: With TreeSlicer, the change-point time is a part of the model only, and is completely independent of the data. That means the sequencing data in the XML file can be changed, without needing to change the model specification, which makes the code more secure and general. Suppose we add a new sequence sampled at 1996. Without TreeSlicer, we would need to recompute the times between the most recent sample and the limits for the change-point time prior. With TreeSlicer, no changes need to be made, which means there are less opportunities to introduce errors.

  • Ease of use: With TreeSlicer there is no need to calculate the time between the most recent sample and the change-point time prior parameters or the estimated time. It's easy enough in this example (1994 - 1980 = 14 etc.), but it can easily get more complex. In particular it will be easy to introduce errors in post-processing, because the estimated time always needs to be converted to a date, which requires knowing the (correct) date of the most recent sample.

Clone this wiki locally