pyears               package:survival               R Documentation

_P_e_r_s_o_n _Y_e_a_r_s

_D_e_s_c_r_i_p_t_i_o_n:

     This function computes the person-years of follow-up time
     contributed by a cohort of subjects, stratified into subgroups.
     It also computes the number of subjects who contribute to each
     cell of the output table, and optionally the number of events
     and/or the expected number of events in each cell.

_U_s_a_g_e:

     pyears(formula, data, weights, subset, na.action,  
            ratetable=survexp.us, scale=365.25, expect=c('event', 'pyears'),
            model=FALSE, x=FALSE, y=FALSE, data.frame=FALSE)

_A_r_g_u_m_e_n_t_s:

 formula: a formula object.  The response variable will be a vector of
          follow-up times for each subject, or a 'Surv' object
          containing the survival time and an event indicator.  The
          predictors consist of optional grouping variables separated
          by + operators (exactly as in 'survfit'), time-dependent
          grouping variables such as age (specified with 'tcut'), and
          optionally a 'ratetable' term.  The latter matches each
          subject to his or her expected cohort.

    data: a data frame in which to interpret the variables named in
          the 'formula', or in the 'subset' and 'weights' arguments.

 weights: case weights.  

  subset: an expression indicating that only a subset of the rows of
          the data should be used.

na.action: a missing-data filter function, applied to the model.frame,
          after any  'subset' argument has been used.  Default is
          'options()$na.action'.  

ratetable: a table of event rates, such as 'survexp.uswhite'.  

   scale: a scaling for the results.  As most rate tables are in
          units/day, the  default value of 365.25 causes the output to
          be reported in years.  

  expect: whether the output table should contain the expected number
          of events ('event') or the expected number of person-years
          of observation ('pyears').  This is only valid with a rate
          table.

data.frame: return a data frame rather than a set of arrays.

model, x, y: If any of these is true, then the model frame, the model
          matrix, and/or the vector of response times will be returned
          as components of the final result. 
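
     As a rough illustration of the 'scale' argument, the sketch
     below tabulates the same follow-up times twice, once with the
     default scale and once with 'scale = 1'.  The data frame 'toy'
     and its columns are hypothetical, made up for this sketch only.

```r
library(survival)

# Hypothetical toy cohort: follow-up recorded in days.
toy <- data.frame(
  futime = c(365.25, 730.5, 1461, 182.625),
  arm    = factor(c("a", "a", "b", "b"))
)

# Default scale = 365.25: days are divided by 365.25, giving years.
in_years <- pyears(futime ~ arm, data = toy)

# scale = 1: results stay in the units of the response (days here).
in_days <- pyears(futime ~ arm, data = toy, scale = 1)

in_years$pyears   # person-years per arm
in_days$pyears    # person-days per arm
```

     If the response were already in years, 'scale = 1' would be the
     natural choice, since the default would divide by 365.25 again.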

_D_e_t_a_i_l_s:

     Because 'pyears' may have several time variables, it is necessary
     that all of them be in the same units.  For instance, in the call

          py <- pyears(futime ~ rx + ratetable(age=age, sex=sex,
                                               year=entry.dt))

     with a ratetable whose natural unit is days, it is important that
     'futime', 'age' and 'entry.dt' all be in days.  Given the wide
     range of possible inputs, it is difficult for the routine to do
     sanity checks of this aspect.

     A special function 'tcut' is needed to specify time-dependent
     cutpoints.  For instance, assume that age is in years, and that
     the desired final arrays have as one of their margins the age
     groups 0-2, 2-10, 10-25, and 25+.  A subject who enters the study
     at age 4 and remains under observation for 10 years will
     contribute follow-up time to both the 2-10 and 10-25 subsets.  If
     'cut(age, c(0,2,10,25,100))' were used in the formula, the
     subject would be classified according to starting age only.
     The 'tcut' function has the same arguments as 'cut', but produces
     a different output object which allows the 'pyears' function to
     correctly track the subject.
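
     The difference between 'cut' and 'tcut' described above can be
     sketched with a small made-up cohort.  The data frame 'toy' is
     hypothetical; its first subject is the one from the text (entry
     at age 4, followed for 10 years), who by simple arithmetic should
     contribute 6 years to the 2-10 group and 4 years to the 10-25
     group under 'tcut'.

```r
library(survival)

# Hypothetical toy cohort; age at entry and follow-up are both in years.
toy <- data.frame(age = c(4, 1, 30, 15), futime = c(10, 3, 5, 2))
breaks <- c(0, 2, 10, 25, 100)

# cut(): each subject is assigned once, according to age at entry.
by_entry_age <- pyears(futime ~ cut(age, breaks), data = toy, scale = 1)

# tcut(): follow-up time is split as the subject ages past a cutpoint.
by_current_age <- pyears(futime ~ tcut(age, breaks), data = toy, scale = 1)

by_entry_age$pyears     # all 10 years of subject 1 land in the 2-10 group
by_current_age$pyears   # subject 1 is split between 2-10 and 10-25
```

     Both calls use 'scale = 1' because the follow-up times are
     already in years.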

     The results of 'pyears' are normally used as input to further
     calculations.  The 'print' routine, therefore, is designed to give
     only a summary of the  table. 

     The example below is from a study of hip fracture rates from 1930
     to 1990 in Rochester, Minnesota.  Survival after hip fracture has
     increased over that time, but so has the survival of elderly
     subjects in the population at large.  A model of relative
     survival helps to clarify what has happened: Poisson regression
     is used, but with exposure time replaced by expected exposure
     (for an age- and sex-matched control).  Death rates change with
     age, of course, so the result is carved into 1-year increments of
     time.  Males and females were done separately.

_V_a_l_u_e:

     a list with components: 

  pyears: an array containing the person-years of exposure (or other
          units, depending on the rate table and the scale).  The
          dimension and dimnames of the array correspond to the
          variables on the right hand side of the model equation.

       n: an array containing the number of subjects who contribute
          time to each cell  of the 'pyears' array.  

   event: an array containing the observed number of events.  This will
          be present only  if the response variable is a 'Surv' object.            

expected: an array containing the expected number of events (or person
          years if 'expect = "pyears"').  This will be present only if
          there was a 'ratetable' term.

    data: if the 'data.frame' option was set, a data frame containing
          the variables 'n', 'event', 'pyears' and 'expected' that
          supplants the four arrays listed above, along with variables
          corresponding to each dimension.  There will be one row for
          each cell in the arrays.

offtable: the number of person-years of exposure in the cohort that was
          not part of any cell in the 'pyears' array.  This is often
          useful as an error check; if there is a mismatch of units
          between two variables, nearly all the person-years may be
          off table.

 summary: a summary of the rate-table matching. This is also useful as
          an error  check.  

    call: an image of the call to the function.  

na.action: the 'na.action' attribute contributed by an 'na.action'
          routine, if any.  
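
     The use of 'offtable' as an error check can be sketched as
     follows.  The data frame 'toy' and the column 'futime_days' are
     hypothetical; the second call deliberately mixes units (follow-up
     in days, age cutpoints in years) to show the symptom.

```r
library(survival)

# Hypothetical toy cohort; age at entry and follow-up are both in years.
toy <- data.frame(age = c(4, 1, 30, 15), futime = c(10, 3, 5, 2))
toy$futime_days <- toy$futime * 365.25
breaks <- c(0, 2, 10, 25, 100)

# Consistent units: every subject stays inside the 0-100 age range.
ok <- pyears(futime ~ tcut(age, breaks), data = toy, scale = 1)

# Deliberate mistake: follow-up in days, age and cutpoints in years.
# Subjects appear to "age" far beyond 100, so most time falls off table.
bad <- pyears(futime_days ~ tcut(age, breaks), data = toy, scale = 1)

ok$offtable
off_fraction <- bad$offtable / sum(toy$futime_days)
off_fraction   # share of follow-up that matched no cell
```

     A large off-table share like this is usually a sign that the
     time variables should be checked for a unit mismatch.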

_S_e_e _A_l_s_o:

     'ratetable',  'survexp',  'Surv'.

_E_x_a_m_p_l_e_s:

     # Look at progression rates jointly by calendar date and age
     # 
     temp.yr  <- tcut(mgus$dxyr, 55:92, labels=as.character(55:91)) 
     temp.age <- tcut(mgus$age, 34:101, labels=as.character(34:100))
     ptime <- ifelse(is.na(mgus$pctime), mgus$futime, mgus$pctime)
     pstat <- ifelse(is.na(mgus$pctime), 0, 1)
     # Follow-up time and the tcut cutpoints are both in years, so scale = 1
     pfit <- pyears(Surv(ptime/365.25, pstat) ~ temp.yr + temp.age + sex, mgus,
                    data.frame=TRUE, scale=1)
     # Turn the factor back into numerics for regression
     tdata <- pfit$data
     tdata$age <- as.numeric(as.character(tdata$temp.age))
     tdata$year<- as.numeric(as.character(tdata$temp.yr))
     fit1 <- glm(event ~ year + age + sex + offset(log(pyears)),
                 data=tdata, family=poisson)
     ## Not run: 
     # fit a gam model (needs a package providing gam() and s(), e.g. 'gam')
     gfit.m <- gam(event ~ s(age) + s(year) + offset(log(pyears)),
                   family = poisson, data = tdata)
     ## End(Not run)

     # Example #2  Create the hearta data frame: 
     hearta <- by(heart, heart$id,  
                  function(x)x[x$stop == max(x$stop),]) 
     hearta <- do.call("rbind", hearta) 
     # Produce pyears tables of death rates by age group and surgery status
     #  The first is by age at randomization, the second by current age
     fit1 <- pyears(Surv(stop/365.25, event) ~ cut(age + 48, c(0,50,60,70,100)) + 
            surgery, data = hearta, scale = 1)
     fit2 <- pyears(Surv(stop/365.25, event) ~ tcut(age + 48, c(0,50,60,70,100)) + 
            surgery, data = hearta, scale = 1)
     fit1$event/fit1$pyears  # death rates by age at randomization and surgery
     fit2$event/fit2$pyears  # death rates by current age and surgery

