coxph                package:survival                R Documentation

_F_i_t _P_r_o_p_o_r_t_i_o_n_a_l _H_a_z_a_r_d_s _R_e_g_r_e_s_s_i_o_n _M_o_d_e_l

_D_e_s_c_r_i_p_t_i_o_n:

     Fits a Cox proportional hazards regression model.  Time-dependent
     variables, time-dependent strata, multiple events per subject,
     and other extensions are incorporated using the counting process
     formulation of Andersen and Gill.

_U_s_a_g_e:

     coxph(formula, data=, weights, subset, 
           na.action, init, control, 
           method=c("efron","breslow","exact"), 
           singular.ok=TRUE, robust=FALSE, 
           model=FALSE, x=FALSE, y=TRUE, ...)

_A_r_g_u_m_e_n_t_s:

 formula: a formula object, with the response on the left of a '~'
          operator, and the terms on the right.  The response must be
          a survival object as returned by the 'Surv' function.

    data: a data.frame in which to interpret the variables named in
          the 'formula', or in the 'subset' and 'weights' arguments.

 weights: vector of case weights.  If 'weights' is a vector of
          integers, then the estimated coefficients are equivalent to
          estimating the model from data with the individual cases
          replicated as many times as indicated by 'weights'.

  subset: expression indicating which subset of the rows of data should
          be used in the fit.  All observations are included by
          default.

na.action: a missing-data filter function.  This is applied to the
          model.frame after any subset argument has been used.
          Default is 'options()$na.action'.

    init: vector of initial values of the iteration.  Default initial 
          value is zero for all variables.  

 control: Object of class 'coxph.control' specifying iteration limit
          and other control options. Default is 'coxph.control(...)'. 

  method: a character string specifying the method for tie handling.
          If there are no tied death times, all the methods are
          equivalent.  Nearly all Cox regression programs use the
          Breslow method by default, but not this one.  The Efron
          approximation is used as the default here, as it is much more
          accurate when dealing with tied death times, and is as
          efficient computationally.  The exact method computes the
          exact partial likelihood, which is equivalent to a
          conditional logistic model.  If there are a large number of
          ties, the computational time will be excessive.

singular.ok: logical value indicating how to handle collinearity in the
          model matrix.  If 'TRUE', the program will automatically skip
          over columns of the X matrix that are linear combinations of
          earlier columns.  In this case the coefficients for such
          columns will be NA, and the variance matrix will contain
          zeros.  For ancillary calculations, such as the linear
          predictor, the missing coefficients are treated as zeros.

  robust: if 'TRUE', a robust variance estimate is returned.  Default is
          'TRUE' if the model includes a 'cluster' term, 'FALSE'
          otherwise.

   model: logical value: if 'TRUE', the model frame is returned in
          component 'model'.  

       x: logical value: if 'TRUE', the x matrix is returned in
          component 'x'.

       y: logical value: if 'TRUE', the response vector is returned in
          component 'y'.

     ...: other arguments will be passed to 'coxph.control'.
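     The sketch below exercises several of the arguments above on a
     small made-up data set, assuming the survival package is attached.
     It checks that integer 'weights' match physically replicated rows
     (using method = "breslow", because under the default Efron
     approximation a replicated event becomes a set of tied events and
     the two fits may then differ slightly), that the three tie
     methods agree when there are no ties, that a collinear column
     gets an NA coefficient, and which components 'model', 'x' and 'y'
     add.  The object names ('d', 'w', 'd_noties', 'fit_*') are
     hypothetical.

```r
library(survival)

# Hypothetical toy data in the spirit of 'test1' (no strata).
d <- data.frame(time   = c(4, 3, 1, 1, 2, 2, 3),
                status = c(1, 1, 1, 0, 1, 1, 0),
                x      = c(0, 2, 1, 1, 1, 0, 0))

## weights: integer case weights versus physically replicated rows.
## Breslow is used so that replicated events are treated exactly like
## a single weighted event.
w <- c(1, 2, 1, 3, 1, 1, 2)
fit_w   <- coxph(Surv(time, status) ~ x, data = d, weights = w,
                 method = "breslow")
fit_rep <- coxph(Surv(time, status) ~ x,
                 data = d[rep(seq_len(nrow(d)), times = w), ],
                 method = "breslow")
all.equal(coef(fit_w), coef(fit_rep))

## method: with no tied event times the three methods coincide.
d_noties <- data.frame(time   = c(5, 8, 2, 9, 4, 7, 6),
                       status = c(1, 0, 1, 1, 1, 0, 1),
                       x      = c(1, 0, 1, 0, 1, 1, 0))
fits <- lapply(c(efron = "efron", breslow = "breslow", exact = "exact"),
               function(m) coxph(Surv(time, status) ~ x, data = d_noties,
                                 method = m))
sapply(fits, coef)

## With ties (two events at time 2 in 'd') Efron and Breslow differ.
c(efron   = coef(coxph(Surv(time, status) ~ x, data = d)),
  breslow = coef(coxph(Surv(time, status) ~ x, data = d,
                       method = "breslow")))

## singular.ok: an exactly collinear column gets an NA coefficient.
d$x2 <- 2 * d$x
fit_sing <- coxph(Surv(time, status) ~ x + x2, data = d)
coef(fit_sing)

## model / x / y: optional components stored on the fitted object.
fit_xy <- coxph(Surv(time, status) ~ x, data = d, model = TRUE, x = TRUE)
class(fit_xy$x); class(fit_xy$y); class(fit_xy$model)
```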

_D_e_t_a_i_l_s:

     The proportional hazards model is usually expressed in terms of a
     single survival time value for each person, with possible
     censoring.  Andersen and Gill reformulated the same problem as a
     counting process; as time marches onward we observe the events
     for a subject, rather like watching a Geiger counter.  The data
     for a subject are presented as multiple rows or "observations",
     each of which applies to an interval of observation (start,
     stop].
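     A minimal sketch of the two data layouts, using made-up values and
     assuming the survival package is attached.  The same subjects are
     fitted once with a single time per row, once as (0, time]
     intervals, and once with subject 1 split into two consecutive
     intervals carrying the same covariate value.  Because the risk
     sets are unchanged, the coefficient should be identical in all
     three fits.  The names 'd', 'd_split' and 'fit_*' are
     hypothetical.

```r
library(survival)

# Hypothetical toy data, one row per subject; 'start' is 0 for everyone.
d <- data.frame(start  = 0,
                time   = c(4, 3, 1, 1, 2, 2, 3),
                status = c(1, 1, 1, 0, 1, 1, 0),
                x      = c(0, 2, 1, 1, 1, 0, 0))

# Conventional form: a single (possibly censored) time per subject.
fit_single <- coxph(Surv(time, status) ~ x, data = d)

# Counting-process form: each row is the interval (start, stop].
fit_cp <- coxph(Surv(start, time, status) ~ x, data = d)

# Split subject 1 (event at t = 4) into (0, 2] with no event and
# (2, 4] ending in the event; the covariate is unchanged.
d_split <- rbind(
  data.frame(start = 0, stop = 2, status = 0, x = 0),
  data.frame(start = 2, stop = 4, status = 1, x = 0),
  data.frame(start = 0, stop = d$time[-1], status = d$status[-1],
             x = d$x[-1])
)
fit_split <- coxph(Surv(start, stop, status) ~ x, data = d_split)

rbind(single   = coef(fit_single),
      counting = coef(fit_cp),
      split    = coef(fit_split))
```

     Only the risk sets matter for the coefficients, so the fit does
     not need to know that the two split rows belong to one subject;
     a robust variance, by contrast, would need a 'cluster' term
     identifying the subject.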

_V_a_l_u_e:

     an object of class '"coxph"' representing the fit.  See
     'coxph.object' for details.

_S_i_d_e _E_f_f_e_c_t_s:

     Depending on the call, the 'predict', 'residuals', and 'survfit'
     routines may need to reconstruct the x matrix created by 'coxph'.
     Differences in the environment, such as which data frames are
     attached or the value of 'options()$contrasts', may cause this
     computation to fail or, worse, to be incorrect.  See the survival
     overview document for details.

_S_p_e_c_i_a_l _t_e_r_m_s:

     There are two special terms that may be used in the model
     equation.  A 'strata' term identifies a stratified Cox model;
     separate baseline hazard functions are fit for each stratum.  The
     'cluster' term is used to compute a robust variance for the model.
     The term '+ cluster(id)', where each value of 'id' is unique, is
     equivalent to specifying the 'robust=TRUE' argument, and produces
     an approximate jackknife estimate of the variance.  If the 'id'
     variable is not unique, but instead identifies clusters of
     correlated observations, then the variance estimate is based on a
     grouped jackknife.
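     The equivalence between '+ cluster(id)' with a unique 'id' and
     'robust=TRUE' can be checked directly.  This is a sketch on
     made-up data, assuming the survival package is attached; 'd',
     'id' and the 'fit_*' names are hypothetical.  Both fits should
     report the same robust variance in 'var', while the model-based
     variance is kept in 'naive.var'.

```r
library(survival)

# Hypothetical toy data with one row per subject.
d <- data.frame(time   = c(4, 3, 1, 1, 2, 2, 3),
                status = c(1, 1, 1, 0, 1, 1, 0),
                x      = c(0, 2, 1, 1, 1, 0, 0))
d$id <- seq_len(nrow(d))   # hypothetical id: one distinct value per row

fit_robust  <- coxph(Surv(time, status) ~ x, data = d, robust = TRUE)
fit_cluster <- coxph(Surv(time, status) ~ x + cluster(id), data = d)

# Robust (approximate jackknife) standard errors from both fits, and
# the ordinary model-based standard error for comparison.
rbind(robust  = sqrt(diag(fit_robust$var)),
      cluster = sqrt(diag(fit_cluster$var)),
      naive   = sqrt(diag(fit_cluster$naive.var)))
```

     Replacing 'id' with a variable that repeats within groups of
     correlated rows (as 'id' does in the 'bladder1' example) turns
     this into the grouped jackknife.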

_C_o_n_v_e_r_g_e_n_c_e:

     In certain data cases the actual MLE of a coefficient is
     infinite, e.g., a dichotomous variable where one of the groups
     has no events.  When this happens the associated coefficient
     grows at a steady pace and a race condition will exist in the
     fitting routine: either the log likelihood converges, the
     information matrix becomes effectively singular, an argument to
     exp becomes too large for the computer hardware, or the maximum
     number of iterations is exceeded.  (Nearly always the first
     occurs.)  The routine attempts to detect when this has happened,
     not always successfully.  The primary consequence for the user is
     that the Wald statistic, coefficient/se(coefficient), is not valid
     in this case and should be ignored; the likelihood ratio and score
     tests remain valid, however.
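     A sketch of the situation described above, on made-up data and
     assuming the survival package is attached: nobody in the group
     'grp == 1' has an event, so the coefficient drifts toward minus
     infinity.  The fit may warn that the coefficient may be infinite
     (suppressed here).  The Wald ratio is shown only to be ignored;
     the likelihood ratio statistic and the score statistic stored in
     'fit$score' remain usable.  The names 'd', 'fit', 'beta', 'wald_z',
     'lrt' and 'score' are hypothetical.

```r
library(survival)

# Hypothetical data: all events occur in grp == 0, so the MLE of the
# 'grp' coefficient is -Inf.
d <- data.frame(time   = c(2, 3, 4, 5, 6, 7, 8, 9),
                status = c(1, 1, 1, 1, 0, 0, 0, 0),
                grp    = c(0, 0, 0, 0, 1, 1, 1, 1))

fit <- suppressWarnings(coxph(Surv(time, status) ~ grp, data = d))

beta   <- coef(fit)                       # large and negative
wald_z <- beta / sqrt(diag(vcov(fit)))    # not valid here: ignore it
lrt    <- 2 * diff(fit$loglik)            # likelihood ratio statistic
score  <- fit$score                       # score test statistic

c(beta = unname(beta), wald_z = unname(wald_z), lrt = lrt, score = score)
```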

_P_E_N_A_L_I_S_E_D _R_E_G_R_E_S_S_I_O_N:

     'coxph' can now maximise a penalised partial likelihood with an
     arbitrary user-defined penalty.  Supplied penalty functions
     include ridge regression ('ridge'), smoothing splines ('pspline'),
     and frailty models ('frailty').
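     Each supplied penalty is used as a term in the model formula.  The
     sketch below assumes the survival package is attached and uses
     its bundled 'lung' and 'kidney' data sets; the penalty settings
     ('theta = 1', 'df = 4') are arbitrary choices for illustration,
     not recommendations, and the 'fit_*' names are hypothetical.

```r
library(survival)

# Ridge penalty on two covariates (rows with missing ph.ecog are
# dropped by the default na.action).
fit_ridge <- coxph(Surv(time, status) ~ ridge(age, ph.ecog, theta = 1),
                   data = lung)

# Penalised smoothing spline for age plus an ordinary linear term.
fit_ps <- coxph(Surv(time, status) ~ pspline(age, df = 4) + sex,
                data = lung)

# Frailty term (gamma by default) shared by the two rows of each patient.
fit_fr <- coxph(Surv(time, status) ~ age + sex + frailty(id),
                data = kidney)

coef(fit_ridge)
fit_ps
fit_fr
```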

_R_e_f_e_r_e_n_c_e_s:

     Andersen, P. and Gill, R. (1982).  Cox's regression model for
     counting processes, a large sample study.  _Annals of Statistics_
     *10*, 1100-1120.

     Therneau, T. and Grambsch, P. (2000).  _Modeling Survival Data:
     Extending the Cox Model_.  Springer-Verlag.

_S_e_e _A_l_s_o:

     'cluster', 'strata', 'Surv', 'survfit', 'pspline', 'frailty',
     'ridge'.

_E_x_a_m_p_l_e_s:

     # Create the simplest test data set 
     test1 <- list(time=c(4,3,1,1,2,2,3), 
                   status=c(1,1,1,0,1,1,0), 
                   x=c(0,2,1,1,1,0,0), 
                   sex=c(0,0,0,0,1,1,1)) 
     # Fit a stratified model 
     coxph(Surv(time, status) ~ x + strata(sex), test1) 
     # Create a simple data set for a time-dependent model 
     test2 <- list(start=c(1,2,5,2,1,7,3,4,8,8), 
                   stop=c(2,3,6,7,8,9,9,9,14,17), 
                   event=c(1,1,1,1,1,1,1,0,0,0), 
                   x=c(1,0,0,1,0,1,1,1,0,0)) 
     summary(coxph(Surv(start, stop, event) ~ x, test2)) 

     # Fit a stratified model, clustered on patients 

     bladder1 <- bladder[bladder$enum < 5, ] 
     coxph(Surv(stop, event) ~ (rx + size + number) * strata(enum) + 
           cluster(id), bladder1) 

