Genome-wide association studies have often been carried out by meta-analysis rather than by pooling individual-level data. For one-dimensional parameter estimates and the corresponding tests of association, these meta-analyses lead to essentially no loss of information relative to pooling individual data. The situation is different for multi-parameter tests, such as the omnidirectional rare-variant tests being used in resequencing studies. In this paper we consider one popular rare-variant test, a version of the sequence kernel association test. We show that meta-analyses based on the $p$-value or test statistic from each contributing study are substantially less efficient than an analysis pooling individual data, but that a more sophisticated meta-analysis retains full efficiency. The meta-analysis is based on a reformulation of the test that links it to tests used in survey analysis.
Thomas Lumley, Jennifer Brody, Josee Dupuis, Adrienne Cupples
Rank tests are widely used for exploratory and formal inference in the health and social sciences. With the increasing use of data from complex survey samples in medical research, there is increasing demand for versions of rank tests that account for the sampling design. In the absence of design-based rank tests, naive unweighted rank tests are being used in survey analyses even by researchers who otherwise use inferential methods appropriate for the sampling design. We propose a general approach to constructing design-based rank tests when comparing groups within a complex sample and when using a national survey as a reference distribution, and illustrate both scenarios with examples. We show that the tests have asymptotically correct level and that the relative power of different rank tests is not greatly affected by complex sampling.
Thomas Lumley, Alastair Scott
Uniform central limit theorems (‘Donsker theorems’) have been widely useful in semiparametric statistics, both under iid sampling and for stationary sequences and random fields. Only limited results have been available under complex sampling, especially multistage sampling. In this note we derive a complex-sampling analogue of Ossiander’s bracketing-entropy conditions for a uniform central limit theorem, under the assumption that certain design effects are uniformly bounded. We discuss the plausibility of this assumption in realistic surveys.
Targeted resequencing of DNA at specific genes or other genomic loci is now feasible for hundreds or thousands of samples, and costs for larger-scale resequencing are decreasing rapidly. For at least the next few years, resequencing will need to be confined to small subsets of the large samples on which genome-wide association studies have recently been performed. This paper describes some strategies for subsampling an existing cohort for resequencing, and flexibly analysing the resulting data. We illustrate these strategies by describing the actual design and planned analyses for the example that motivated our research, the CHARGE-S resequencing study carried out by the CHARGE (Cohorts in Heart and Aging Research in Genomic Epidemiology) Consortium.
Thomas Lumley, Josee Dupuis, Kenneth M. Rice, Maja Barbalic, Joshua C. Bis, L. Adrienne Cupples, Bruce M. Psaty, Christopher J. O’Donnell, Eric Boerwinkle
We develop an analogue of the likelihood ratio test for Cox proportional hazards models fitted to sample survey data. We look at methods for computing the asymptotic distribution and at ways of improving the small sample performance. The methods are illustrated with examples using data from the National Health and Nutrition Examination Survey (NHANES) and from a stratified case-cohort study.
Thomas Lumley, Alastair Scott
Now published: read at Statistics in Medicine