Recurrent Event Analysis • aion

library(aion)
library(survival)
library(card)

Recurrent events are an extension of general survival models. These functions are mostly well captured within the survival package, except for the data preparation. This will walk through the approach in R using using a sample data set from the card package.

First we will set up the data. There are several events, and several event types, including a final censoring event. This is in a relatively standard data format, with each row fully describing each subject and the corresponding events.

data("stress")
head(stress)
#> # A tibble: 6 × 11
#>      id start      stop       head_ache_date_1 head_ache…¹ head_ach…² heart_ac…³
#>   <dbl> <date>     <date>     <date>           <date>      <date>     <date>    
#> 1   123 2014-08-26 2016-12-08 2016-12-08       NA          NA         NA        
#> 2   117 2014-08-12 2016-10-30 2016-10-30       NA          NA         NA        
#> 3   145 2014-10-22 2016-07-31 2016-06-01       2016-07-31  NA         NA        
#> 4   144 2014-10-21 2017-03-24 2015-02-12       NA          NA         2017-02-18
#> 5   204 2015-03-16 2015-04-03 NA               NA          NA         NA        
#> 6   283 2015-10-12 2016-01-19 NA               NA          NA         NA        
#> # … with 4 more variables: heart_ache_date_2 <date>, heart_ache_date_3 <date>,
#> #   death <dbl>, broken_heart <dbl>, and abbreviated variable names
#> #   ¹head_ache_date_2, ²head_ache_date_3, ³heart_ache_date_1

# For short cuts
events <- c(paste0("head_ache_date_", 1:3), paste0("heart_ache_date_", 1:3))

Traditional model

This is a traditional survival analysis. It requires almost no formating, but is available for thoroughness.

trad_data <-
  recur(
    stress,
    model_type = "trad",
    id = "id",
    left = "start",
    right = "stop",
    censor = "death"
  )

head(trad_data)
#> # A tibble: 6 × 7
#>      id status start  stop strata   date       events
#>   <dbl>  <dbl> <dbl> <dbl> <chr>    <date>      <int>
#> 1     1      0     0  1914 strata_0 2017-10-23      0
#> 2     2      0     0  1797 strata_0 2017-07-17      0
#> 3     3      0     0     2 strata_0 2012-08-29      0
#> 4     4      0     0  1776 strata_0 2017-07-10      0
#> 5     5      0     0  1780 strata_0 2017-07-20      0
#> 6     6      0     0  1753 strata_0 2017-07-10      0

# Now add back a covariate for analysis
df <- merge(trad_data, stress[c("id", "broken_heart")], by = "id", all.x = TRUE)

coxph(Surv(stop, status) ~ broken_heart, data = df) |>
  summary()
#> Call:
#> coxph(formula = Surv(stop, status) ~ broken_heart, data = df)
#> 
#>   n= 98, number of events= 6 
#>    (2 observations deleted due to missingness)
#> 
#>                 coef exp(coef) se(coef)      z Pr(>|z|)
#> broken_heart -0.1294    0.8786   0.8166 -0.158    0.874
#> 
#>              exp(coef) exp(-coef) lower .95 upper .95
#> broken_heart    0.8786      1.138    0.1773     4.354
#> 
#> Concordance= 0.508  (se = 0.102 )
#> Likelihood ratio test= 0.03  on 1 df,   p=0.9
#> Wald test            = 0.03  on 1 df,   p=0.9
#> Score (logrank) test = 0.03  on 1 df,   p=0.9

Andersen-Gill model

The Andersen-Gill model is a generalization of the Cox model. There is a common baseline hazard for all events, and assumes constant hazard throughout time. It highlights the estimate on the intensity of the reucrrent events.

ag_data <-
  recur(
    stress,
    model_type = "ag",
    id = "id",
    left = "start",
    right = "stop",
    censor = "death",
    event_dates = events
  )

head(ag_data)
#> # A tibble: 6 × 7
#>      id status start  stop strata   date       events
#>   <dbl>  <dbl> <dbl> <dbl> <chr>    <date>      <int>
#> 1     1      0     0  1914 strata_0 2017-10-23      0
#> 2     2      0     0  1797 strata_0 2017-07-17      0
#> 3     3      0     0     2 strata_0 2012-08-29      0
#> 4     4      0     0  1776 strata_0 2017-07-10      0
#> 5     5      0     0  1780 strata_0 2017-07-20      0
#> 6     6      0     0  1753 strata_0 2017-07-10      0

As you can see, the number of rows has increased, as there are multiple events now for each subject. Of note, the events are given equal weight.

# Now add back a covariate for analysis
df <- merge(ag_data, stress[c("id", "broken_heart")], by = "id", all.x = TRUE)

coxph(
  Surv(start, stop, status) ~ broken_heart,
  method = "breslow",
  data = df
) |>
  summary()
#> Warning in Surv(start, stop, status): Stop time must be > start time, NA created
#> Call:
#> coxph(formula = Surv(start, stop, status) ~ broken_heart, data = df, 
#>     method = "breslow")
#> 
#>   n= 173, number of events= 87 
#>    (18 observations deleted due to missingness)
#> 
#>                 coef exp(coef) se(coef)      z Pr(>|z|)    
#> broken_heart -1.0597    0.3466   0.2370 -4.471 7.77e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#>              exp(coef) exp(-coef) lower .95 upper .95
#> broken_heart    0.3466      2.885    0.2178    0.5515
#> 
#> Concordance= 0.632  (se = 0.028 )
#> Likelihood ratio test= 22.33  on 1 df,   p=2e-06
#> Wald test            = 19.99  on 1 df,   p=8e-06
#> Score (logrank) test = 21.93  on 1 df,   p=3e-06

Wei-Lin-Weissfield or marginal model

This model does not specifiy dependence structures amongst recurrent event times within a subject. There is also not a baseline hazard assumption. However, the max number of events are specified in advance. We tend to use this when the dependence structure is complex or unknown (and may not matter).

marg_data <-
  recur(
    stress,
    model_type = "marginal",
    id = "id",
    left = "start",
    right = "stop",
    censor = "death",
    event_dates = events
  )

head(marg_data)
#> # A tibble: 6 × 7
#>      id status start  stop strata   date       events
#>   <dbl>  <dbl> <dbl> <dbl> <chr>    <date>      <int>
#> 1     1      0     0  1914 strata_0 2017-10-23      0
#> 2     2      0     0  1797 strata_0 2017-07-17      0
#> 3     3      0     0     2 strata_0 2012-08-29      0
#> 4     4      0     0  1776 strata_0 2017-07-10      0
#> 5     5      0     0  1780 strata_0 2017-07-20      0
#> 6     6      0     0  1753 strata_0 2017-07-10      0

unique(marg_data$strata)
#> [1] "strata_0" "strata_1" "strata_2" "strata_3" "strata_4" "strata_5"

There are strata now for recurring events.

# Now add back a covariate for analysis
df <- merge(marg_data, stress[c("id", "broken_heart")], by = "id", all.x = TRUE)

coxph(
  Surv(start, stop, status) ~ broken_heart + cluster(id),
  robust = TRUE,
  method = "breslow",
  data = df
) |>
  summary()
#> Warning in Surv(start, stop, status): Stop time must be > start time, NA created
#> Call:
#> coxph(formula = Surv(start, stop, status) ~ broken_heart, data = df, 
#>     robust = TRUE, method = "breslow", cluster = id)
#> 
#>   n= 180, number of events= 90 
#>    (11 observations deleted due to missingness)
#> 
#>                 coef exp(coef) se(coef) robust se      z Pr(>|z|)   
#> broken_heart -0.6795    0.5069   0.2306    0.2476 -2.744  0.00607 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#>              exp(coef) exp(-coef) lower .95 upper .95
#> broken_heart    0.5069      1.973     0.312    0.8235
#> 
#> Concordance= 0.572  (se = 0.03 )
#> Likelihood ratio test= 9.35  on 1 df,   p=0.002
#> Wald test            = 7.53  on 1 df,   p=0.006
#> Score (logrank) test = 9.02  on 1 df,   p=0.003,   Robust = 9.93  p=0.002
#> 
#>   (Note: the likelihood ratio and score tests assume independence of
#>      observations within a cluster, the Wald and robust score tests do not).

Prentice, Williams, Peterson (PWP) model

The PWP orders events into strata, and counts all subjects at being risk for the first event. But, it only includes risk for subjects that have already had an event for subsequent strata. The two methods are to use the total time or the time between events as the measure of time.

total_data <-
  recur(
    stress,
    model_type = "pwptt",
    id = "id",
    left = "start",
    right = "stop",
    censor = "death",
    event_dates = events
  )

tail(total_data)
#> # A tibble: 6 × 7
#>      id status start  stop strata   date       events
#>   <dbl>  <dbl> <dbl> <dbl> <chr>    <date>      <int>
#> 1   283      1     0    99 strata_0 2016-01-19      0
#> 2   284      0     0   796 strata_0 2017-12-19      0
#> 3   287      1     0   131 strata_1 2016-02-27      1
#> 4   287      0   131   865 strata_0 2018-03-02      1
#> 5   311      0     0   728 strata_0 2018-01-23      0
#> 6   321      0     0   680 strata_0 2018-01-12      0

gap_data <-
  recur(
    stress,
    model_type = "pwpgt",
    id = "id",
    left = "start",
    right = "stop",
    censor = "death",
    event_dates = events
  )

tail(gap_data)
#> # A tibble: 6 × 7
#>      id status start  stop strata   date       events
#>   <dbl>  <dbl> <dbl> <dbl> <chr>    <date>      <int>
#> 1   283      1     0    99 strata_0 2016-01-19      0
#> 2   284      0     0   796 strata_0 2017-12-19      0
#> 3   287      1     0   131 strata_1 2016-02-27      1
#> 4   287      0     0   734 strata_0 2018-03-02      1
#> 5   311      0     0   728 strata_0 2018-01-23      0
#> 6   321      0     0   680 strata_0 2018-01-12      0

The strata for events can effect the start time for risk for an event in this model type. This show cases the findings with total time as part of the risk.

# Now add back a covariate for analysis
df <- merge(total_data, stress[c("id", "broken_heart")], by = "id", all.x = TRUE)

coxph(
  Surv(start, stop, status) ~ broken_heart + cluster(id) + strata(strata),
  robust = TRUE,
  method = "breslow",
  data = df
) |>
  summary()
#> Warning in Surv(start, stop, status): Stop time must be > start time, NA created
#> Call:
#> coxph(formula = Surv(start, stop, status) ~ broken_heart + strata(strata), 
#>     data = df, robust = TRUE, method = "breslow", cluster = id)
#> 
#>   n= 173, number of events= 87 
#>    (18 observations deleted due to missingness)
#> 
#>                 coef exp(coef) se(coef) robust se      z Pr(>|z|)
#> broken_heart -0.2337    0.7916   0.2705    0.2402 -0.973    0.331
#> 
#>              exp(coef) exp(-coef) lower .95 upper .95
#> broken_heart    0.7916      1.263    0.4943     1.268
#> 
#> Concordance= 0.51  (se = 0.034 )
#> Likelihood ratio test= 0.77  on 1 df,   p=0.4
#> Wald test            = 0.95  on 1 df,   p=0.3
#> Score (logrank) test = 0.75  on 1 df,   p=0.4,   Robust = 0.87  p=0.4
#> 
#>   (Note: the likelihood ratio and score tests assume independence of
#>      observations within a cluster, the Wald and robust score tests do not).

Here we use gap time.

# Now add back a covariate for analysis
df <- merge(gap_data, stress[c("id", "broken_heart")], by = "id", all.x = TRUE)

coxph(
  Surv(start, stop, status) ~ broken_heart + cluster(id) + strata(strata),
  robust = TRUE,
  method = "breslow",
  data = df
) |>
  summary()
#> Warning in Surv(start, stop, status): Stop time must be > start time, NA created
#> Call:
#> coxph(formula = Surv(start, stop, status) ~ broken_heart + strata(strata), 
#>     data = df, robust = TRUE, method = "breslow", cluster = id)
#> 
#>   n= 173, number of events= 87 
#>    (18 observations deleted due to missingness)
#> 
#>                 coef exp(coef) se(coef) robust se     z Pr(>|z|)
#> broken_heart 0.01515   1.01526  0.25229   0.23920 0.063     0.95
#> 
#>              exp(coef) exp(-coef) lower .95 upper .95
#> broken_heart     1.015      0.985    0.6353     1.623
#> 
#> Concordance= 0.501  (se = 0.035 )
#> Likelihood ratio test= 0  on 1 df,   p=1
#> Wald test            = 0  on 1 df,   p=0.9
#> Score (logrank) test = 0  on 1 df,   p=1,   Robust = 0  p=0.9
#> 
#>   (Note: the likelihood ratio and score tests assume independence of
#>      observations within a cluster, the Wald and robust score tests do not).

Additional methods

Multi-state models
Frailty models

Will expand this in the future.