\name{svytable}
\alias{svreptable}
\alias{svytable}
\alias{svytable.svyrep.design}
\alias{svytable.survey.design}
\alias{svychisq}
\alias{svychisq.survey.design}
\alias{svychisq.svyrep.design}
\alias{summary.svytable}
\alias{print.summary.svytable}
\alias{summary.svreptable}
\alias{degf}
\alias{degf.svyrep.design}
\alias{degf.survey.design2}
\alias{degf.twophase}
%- Also NEED an '\alias' for EACH other topic documented here.
\title{Contingency tables for survey data}
\description{
  Contingency tables and chisquared tests of association for survey data.
}
\usage{
\method{svytable}{survey.design}(formula, design, Ntotal = NULL, round = FALSE,...)
\method{svytable}{svyrep.design}(formula, design, Ntotal = sum(weights(design, "sampling")), round = FALSE,...)
\method{svychisq}{survey.design}(formula, design,
   statistic = c("F", "Chisq","Wald","adjWald","lincom","saddlepoint"),na.rm=TRUE,...)
\method{svychisq}{svyrep.design}(formula, design,
   statistic = c("F", "Chisq","Wald","adjWald","lincom","saddlepoint"),na.rm=TRUE,...)
\method{summary}{svytable}(object,
   statistic = c("F","Chisq","Wald","adjWald","lincom","saddlepoint"),...)
degf(design, ...)
\method{degf}{survey.design2}(design, ...)
\method{degf}{svyrep.design}(design, tol=1e-5,...)
}
%- maybe also 'usage' for other objects documented here.
\arguments{
  \item{formula}{Model formula specifying margins for the table (using \code{+} only)}
  \item{design}{survey object}
  \item{statistic}{See Details below}
  \item{Ntotal}{A population total or set of population stratum totals
    to normalise to.}
  \item{round}{Should the table entries be rounded to the nearest
    integer?}
  \item{na.rm}{Remove missing values}
  \item{object}{Output from \code{svytable}}
  \item{...}{For \code{svytable} these are passed to \code{xtabs}.
    Use
    \code{exclude=NULL}, \code{na.action=na.pass} to include \code{NA}s
    in the table}
  \item{tol}{Tolerance for \code{\link{qr}} in computing the matrix rank}
 }
\details{

The \code{svytable} function computes a weighted crosstabulation. This
is especially useful for producing graphics. It is sometimes easier
to use \code{\link{svytotal}} or \code{\link{svymean}}, which also
produce standard errors, design effects, etc.

The frequencies in the table can be normalised to some convenient total
such as 100 or 1.0 by specifying the \code{Ntotal} argument. If the
formula has a left-hand side, the mean or sum of this variable rather
than the frequency is tabulated.

The \code{Ntotal} argument can be either a single number or a data
frame whose first column gives the (first-stage) sampling strata and
whose second column gives the population size in each stratum. In this
second case the \code{svytable} command performs `post-stratification':
tabulating and scaling to the population within strata and then adding
up the strata.

As with other \code{xtabs} objects, the output of \code{svytable} can be
processed by \code{ftable} for more attractive display. The
\code{summary} method for \code{svytable} objects calls \code{svychisq}
for a test of independence.

\code{svychisq} computes first- and second-order Rao-Scott corrections to
the Pearson chisquared test, and two Wald-type tests.

The default (\code{statistic="F"}) is the Rao-Scott second-order
correction. The p-values are computed with a Satterthwaite
approximation to the distribution and with denominator degrees of
freedom as recommended by Thomas and Rao (1990). The alternative
\code{statistic="Chisq"} adjusts the Pearson chisquared statistic by a
design effect estimate and then compares it to the chisquared
distribution it would have under simple random sampling.
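
As a minimal sketch of the normalisation and of the two Rao-Scott
statistics described above, assuming the \code{api} example data
supplied with the package and the one-stage cluster design used in the
Examples (the object names \code{dclus1} and \code{pct} are only
illustrative):

\preformatted{
library(survey)
data(api)
## one-stage cluster design, as in the Examples section
dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
## weighted cell entries rescaled so that the table sums to 100
pct <- svytable(~sch.wide+stype, dclus1, Ntotal=100)
## Rao-Scott second-order correction (the default statistic)
svychisq(~sch.wide+stype, dclus1, statistic="F")
## Pearson statistic adjusted by a design effect estimate
svychisq(~sch.wide+stype, dclus1, statistic="Chisq")
}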

The \code{statistic="Wald"} test is that proposed by Koch et al. (1975)
and used by the SUDAAN software package. It is a Wald test based on the
differences between the observed cell counts and those expected under
independence. The adjustment given by \code{statistic="adjWald"} reduces
the statistic when the number of PSUs is small compared to the number of
degrees of freedom of the test. Thomas and Rao (1987) compare these
tests and find the adjustment beneficial.

\code{statistic="lincom"} replaces the numerator of the Rao-Scott F with
the exact asymptotic distribution, which is a linear combination of
chi-squared variables (see \code{\link{pchisqsum}}), and
\code{statistic="saddlepoint"} uses a saddlepoint approximation to this
distribution. The \code{CompQuadForm} package is needed for
\code{statistic="lincom"} but not for
\code{statistic="saddlepoint"}. The saddlepoint approximation is
especially useful when the p-value is very small (as in large-scale
multiple testing problems).

For designs using replicate weights the code is essentially the same as
for designs with sampling structure, since the necessary variance
computations are done by the appropriate methods of
\code{\link{svytotal}} and \code{\link{svymean}}. The exception is that
the degrees of freedom are computed as one less than the rank of the
matrix of replicate weights (by \code{degf}).


At the moment, \code{svychisq} works only for 2-dimensional tables.
}
\value{
  The table commands return an \code{xtabs} object; \code{svychisq}
  returns an \code{htest} object.
}
\references{
Davies RB (1973). "Numerical inversion of a characteristic function"
Biometrika 60:415-7

P. Duchesne, P.
Lafaye de Micheaux (2010) "Computing the distribution of
quadratic forms: Further comparisons between the Liu-Tang-Zhang
approximation and exact methods", Computational Statistics and Data
Analysis, Volume 54, 858-862

Koch, GG, Freeman, DH, Freeman, JL (1975) "Strategies in the
multivariate analysis of data from complex surveys" International
Statistical Review 43: 59-78

Rao, JNK, Scott, AJ (1984) "On Chi-squared Tests For Multiway
Contingency Tables with Proportions Estimated From Survey Data" Annals
of Statistics 12:46-60.

Sribney WM (1998) "Two-way contingency tables for survey or clustered
data" Stata Technical Bulletin 45:33-49.

Thomas, DR, Rao, JNK (1987) "Small-sample comparison of level and power
for simple goodness-of-fit statistics under cluster sampling" JASA 82:630-636

}

\note{Rao and Scott (1984) leave open one computational issue. In
  computing `generalised design effects' for these tests, should the
  variance under simple random sampling be estimated using the observed
  proportions or the predicted proportions under the null
  hypothesis? \code{svychisq} uses the observed proportions, following
  simulations by Sribney (1998) and the choices made in Stata.}


\seealso{\code{\link{svytotal}} and \code{\link{svymean}} report totals
  and proportions by category for factor variables.

  See \code{\link{svyby}} and \code{\link{ftable.svystat}} to construct
  more complex tables of summary statistics.

  See \code{\link{svyloglin}} for loglinear models.

  See \code{\link{regTermTest}} for Rao-Scott tests in regression models.

See \url{https://notstatschat.rbind.io/2019/06/08/design-degrees-of-freedom-brief-note/} for an explanation of the design degrees of freedom with replicate weights.

}
\examples{
  data(api)
  ## unweighted table from the full population file, for comparison
  xtabs(~sch.wide+stype, data=apipop)

  ## cluster sample design
  dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
  summary(dclus1)

  ## weighted table estimated from the sample
  (tbl <- svytable(~sch.wide+stype, dclus1))
  plot(tbl)
  fourfoldplot(svytable(~sch.wide+comp.imp+stype,design=dclus1,round=TRUE), conf.level=0)

  ## tests of independence: default (F), Chisq and adjusted Wald
  svychisq(~sch.wide+stype, dclus1)
  summary(tbl, statistic="Chisq")
  svychisq(~sch.wide+stype, dclus1, statistic="adjWald")

  ## the same table and tests for a replicate-weight design
  rclus1 <- as.svrepdesign(dclus1)
  summary(svytable(~sch.wide+stype, rclus1))
  svychisq(~sch.wide+stype, rclus1, statistic="adjWald")

}
\keyword{survey}% at least one, from doc/KEYWORDS
\keyword{category}% __ONLY ONE__ keyword per line
\keyword{htest}% __ONLY ONE__ keyword per line