SKAT (SNP Set/Sequence Association Test)
(1) Association tests between a set of common and rare SNPs and continuous or dichotomous (case-control) phenotypes, using kernel machine methods, for data from GWAS and genome-wide sequencing association studies.
(2) Sample size and power calculations for sequencing association studies.
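To make the kind of test statistic concrete, here is a minimal sketch in plain Python, not the SKAT package's own code or API. It assumes an intercept-only null model for a continuous phenotype (no covariates), genotypes coded as 0/1/2 copies of the minor allele with no missing values, and Beta(1, 25) weights on the minor allele frequency. The function names are hypothetical. It computes the unscaled statistics Q_SKAT = sum_j (w_j S_j)^2 and Q_burden = (sum_j w_j S_j)^2, where S_j is the score for variant j; the p-value calculation is omitted.

```python
def beta_1_25_weight(maf):
    """Beta(1, 25) density at the minor allele frequency: 25 * (1 - maf)^24."""
    return 25.0 * (1.0 - maf) ** 24


def score_per_variant(genotypes, residuals):
    """S_j = sum_i G_ij * r_i for each variant j (columns of the genotype matrix)."""
    n_variants = len(genotypes[0])
    return [
        sum(row[j] * r for row, r in zip(genotypes, residuals))
        for j in range(n_variants)
    ]


def skat_and_burden(genotypes, phenotype):
    """Return (Q_SKAT, Q_burden) under an intercept-only null model (assumption)."""
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    residuals = [y - mean_y for y in phenotype]  # y minus the fitted null mean
    n_variants = len(genotypes[0])
    # Allele frequency per variant; assumes the counted allele is the minor one.
    mafs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(n_variants)]
    weights = [beta_1_25_weight(m) for m in mafs]
    scores = score_per_variant(genotypes, residuals)
    q_skat = sum((w * s) ** 2 for w, s in zip(weights, scores))
    q_burden = sum(w * s for w, s in zip(weights, scores)) ** 2
    return q_skat, q_burden
```

With a single variant the two statistics coincide. They differ once several variants with effects in different directions are combined, because the burden statistic sums the weighted scores before squaring.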
References:
- Lee, Seunggeun, et al. (2012). Optimal Unified Approach for Rare-Variant Association Testing with Application to Small-Sample Case-Control Whole-Exome Sequencing Studies. The American Journal of Human Genetics, 91.2, 224-237.
- Lee, S., Wu, M.C. and Lin, X. (2012). Optimal tests for rare variant effects in sequencing association studies. Biostatistics, 13.4, 762-775. Supplementary Materials.
- Wu, M. C., Lee, S., Cai, T., Li, Y., Boehnke, M. and Lin, X. (2011) Rare Variant Association Testing for Sequencing Data Using the Sequence Kernel Association Test (SKAT). American Journal of Human Genetics, 89.1, 82-93.
- Wu, M. C., Kraft, P., Epstein, M. P., Taylor, D. M., Chanock, S. J., Hunter, D. J., and Lin, X. (2010) Powerful SNP Set Analysis for Case-Control Genome-Wide Association Studies. American Journal of Human Genetics, 86, 929-942.
MetaSKAT (Meta-analysis for multiple markers)
MetaSKAT is an R package for multiple marker meta-analysis across studies. It can carry out meta-analysis of SKAT, SKAT-O and burden tests with individual-level genotype data or gene-level summary statistics.
References:
- Lee, S., Teslovich, T.M., Boehnke, M. and Lin, X. (2013) General framework for meta-analysis of rare variants in sequencing association studies, American Journal of Human Genetics, in press.
CEPSKAT (Continuous Extreme Phenotype SKAT)
CEPSKAT extends the SKAT framework to the setting of continuous extreme phenotype samples. You can download the R package for CEPSKAT here. For Windows, download the compiled binary version instead. Consult the help files in the package for instructions and examples of usage.
References:
- Barnett, I., Lee, S., Lin, X. (2012) Detecting Rare Variant Effects Using Extreme Phenotype Sampling in Sequencing Association Studies. Genetic Epidemiology. In press.
coxKM (Cox Kernel Machine)
coxKM is an R package for conducting SNP-set association tests for right-censored survival outcomes based on the kernel machine Cox regression framework. It is meant for common genetic variants only: it tests for association between a SNP-set made up of common variants and a right-censored survival outcome. Software download, manual download.
References:
- Lin X, Cai T, Wu M, Zhou Q, Liu G, Christiani D and Lin X. 2011. Survival Kernel Machine SNP-set Analysis for Genome-wide Association Studies. Genetic Epidemiology 35:620-31. doi: 10.1002/gepi.20610
- Cai T, Tonini G and Lin X. 2011. Kernel machine approach to testing the significance of multiple genetic markers for risk prediction. Biometrics, 67:975-86. doi:10.1111/j.1541-0420.2010.01544.x
gSKAT (family based association test)
gskat is an R package that implements a family-based association test via a GEE Kernel Machine (KM) score test. It has functions to perform both the burden test and SKAT with family members as well as unrelated individuals. The package allows for both continuous and discrete traits in the association test. Software download
User groups: Feel free to join the group to ask questions about, discuss, or comment on the package on the forum.
References:
- Wang X, Lee S, Zhu X, Redline S, and Lin X. (2013) GEE-Based SNP Set Association Test for Continuous and Discrete Traits in Family-Based Association Studies. Genet Epidemiol. 37:778-86.
iSKAT/GESAT
Software download, Manual download.
References:
- Lin, X., Lee, S., Wu, M., Wang, C., Chen, H., Li, Z. and Lin, X. Test for rare variants by environment interactions in sequencing association studies. Biometrics, in press.
- Lin, X., Lee, S., Christiani, D. C., and Lin, X. (2013). Test for the Interaction between a Genetic Marker Set and Environment in Generalized Linear Models. Biostatistics, 14: 667-681. doi:10.1093/biostatistics/kxt006.
GMMAT (Generalized linear Mixed Model Association Test)
GMMAT is an R package for performing genetic association tests in genome-wide association studies (GWAS) and sequencing association studies, for outcomes with distribution in the exponential family (e.g. binary outcomes) based on generalized linear mixed models (GLMMs). It can be used to analyze genetic data from individuals with population structure and relatedness. GMMAT fits a GLMM with covariate adjustment and random effects to account for population structure and familial or cryptic relatedness. For GWAS, GMMAT performs score tests for each genetic variant. For candidate gene studies, GMMAT can also perform Wald tests to get the effect size estimate for each genetic variant. For rare variant analysis from sequencing association studies, GMMAT performs the variant Set Mixed Model Association Tests (SMMAT), including the burden test, the sequence kernel association test (SKAT), SKAT-O and an efficient hybrid test of the burden test and SKAT, based on user-defined variant sets. See user manual here.
References:
- Breslow NE and Clayton DG. (1993) Approximate Inference in Generalized Linear Mixed Models. Journal of the American Statistical Association 88: 9-25.
- Chen H, Wang C, Conomos MP, Stilp AM, Li Z, Sofer T, Szpiro AA, Chen W, Brehm JM, Celedon JC, Redline S, Papanicolaou GJ, Thornton TA, Laurie CC, Rice K and Lin X. (2016) Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies Using Logistic Mixed Models. The American Journal of Human Genetics 98(4): 653-666.
- Han Chen, Jennifer E. Huffman, Jennifer A. Brody, Chaolong Wang, Seunggeun Lee, Zilin Li, Stephanie M. Gogarten, Tamar Sofer, Lawrence F. Bielak, Joshua C. Bis, John Blangero, Russell P. Bowler, Brian E. Cade, Michael H. Cho, Adolfo Correa, Joanne E. Curran, Paul S. de Vries, David C. Glahn, Xiuqing Guo, Andrew D. Johnson, Sharon Kardia, Charles Kooperberg, Joshua P. Lewis, Xiaoming Liu, Rasika A. Mathias, Braxton D. Mitchell, Jeffrey R. O'Connell, Patricia A. Peyser, Wendy S. Post, Alex P. Reiner, Stephen S. Rich, Jerome I. Rotter, Edwin K. Silverman, Jennifer A. Smith, Ramachandran S. Vasan, James G. Wilson, Lisa R. Yanek, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, TOPMed Hematology and Hemostasis Working Group, Susan Redline, Nicholas L. Smith, Eric Boerwinkle, Ingrid B. Borecki, L. Adrienne Cupples, Cathy C. Laurie, Alanna C. Morrison, Kenneth M. Rice, Xihong Lin. (2018) Efficient variant set mixed model association tests for continuous and binary traits in large-scale whole genome sequencing studies. Submitted.
MPAT (Multiple Phenotype Association Test)
MPAT is an R package for performing multiple phenotype association tests based on univariate GWAS summary statistics. It provides a toolkit of testing procedures to aggregate association evidence across multiple phenotypes for a given genetic variant. All the p-values can be efficiently computed. See user manual here.
References:
- Liu Z and Lin X. (2016) Multiple Phenotype Association Test using Summary statistics in Genome-wide Association Studies. Submitted.
- Liu Z and Lin X. (2016) A Geometric Perspective on the Powers of Principal Component Association Tests in Multiple Phenotype Studies. To be submitted.
SCANG (SCAN the Genome)
SCANG is an R package for performing a flexible and computationally efficient scan statistic procedure (SCANG) that uses the p-value of a variant set-based test as the scan statistic of each moving window, to detect rare variant association regions for both continuous and dichotomous traits. The goal of SCANG is to detect whether any rare-variant association regions exist across the genome and, if any do exist, to identify their locations and sizes. Specifically, SCANG first fits the null linear or logistic model that includes covariates, e.g., age, sex and ancestry PCs, but no genetic variants. Second, SCANG applies set-based tests to all possible candidate moving windows of different sizes within a pre-specified window range of practical interest. Three tests are included in the SCANG framework: the burden test (SCANG-B), SKAT (SCANG-S) and an efficient omnibus test that aggregates information from the burden test and SKAT under different choices of weights using the ACAT method (SCANG-O). Third, SCANG generates an empirical threshold calculated by Monte Carlo simulation, to control the Genome-wise/Family-wise Type I Error Rate (GWER/FWER) at a given level, e.g., 0.05. The windows with p-values smaller than this threshold are detected as genome-wise significant association regions. Both individual-window p-values and the genome-wise/family-wise p-values of these genome-wise significant windows are given. See user manual here.
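The scan over candidate windows and the empirical threshold can be illustrated with a small sketch. This is plain Python written for illustration, not the SCANG package's implementation: it assumes window size is measured in number of variants, it takes the per-window test as a user-supplied callable (`window_pvalue`, a hypothetical name), and it assumes the minimum p-values from Monte Carlo null replicates have already been simulated. It does not resolve overlapping significant windows.

```python
import math


def candidate_windows(n_variants, min_size, max_size):
    """Yield (start, end) index pairs, end exclusive, for every window
    whose variant count lies between min_size and max_size."""
    for size in range(min_size, max_size + 1):
        for start in range(0, n_variants - size + 1):
            yield (start, start + size)


def empirical_threshold(null_min_pvalues, alpha=0.05):
    """Pick a threshold t from null replicates so that at most a fraction
    alpha of them have a minimum window p-value strictly below t."""
    ordered = sorted(null_min_pvalues)
    k = int(math.floor(alpha * len(ordered)))
    return ordered[k]


def scan(n_variants, min_size, max_size, window_pvalue, threshold):
    """Return (start, end, p) for windows with p below the threshold,
    sorted from most to least significant."""
    hits = []
    for start, end in candidate_windows(n_variants, min_size, max_size):
        p = window_pvalue(start, end)
        if p < threshold:
            hits.append((start, end, p))
    return sorted(hits, key=lambda hit: hit[2])
```

Because the threshold is compared with the smallest p-value over all windows in each null replicate, it accounts for the multiplicity of overlapping windows without a Bonferroni-style correction.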
References:
- Zilin Li, Xihao Li, Yaowu Liu, Jincheng Shen, Han Chen, Alanna C. Morrison, Eric Boerwinkle and Xihong Lin. (2019) Dynamic Scan Procedure for Detecting Rare-Variant Association Regions in Whole Genome Sequencing Studies. The American Journal of Human Genetics 104(5): 802-814.
STAAR (variant-Set Test for Association using Annotation infoRmation)
STAAR is an R package for performing the variant-Set Test for Association using Annotation infoRmation (STAAR) procedure in whole genome sequencing studies. STAAR is a general framework that incorporates both qualitative functional categories and quantitative complementary functional annotation scores using an omnibus multi-dimensional weighting scheme. See user manual here.
References:
- Li X*, Li Z*, Zhou H, Gaynor SM, Liu Y, Chen H, Sun R, Dey R, Arnett DK, Aslibekyan S, Ballantyne CM, Bielak LF, Blangero J, Boerwinkle E, Bowden DW, Broome JG, Conomos MP, Correa A, Cupples LA, Curran JE, Freedman BI, Guo X, Hindy G, Irvin MR, Kardia SLR, Kathiresan S, Khan AT, Kooperberg CL, Laurie CC, Liu XS, Mahaney MC, Manichaikul AW, Martin LW, Mathias RA, McGarvey ST, Mitchell BD, Montasser ME, Moore JE, Morrison AC, O'Connell JR, Palmer ND, Pampana A, Peralta JM, Peyser PA, Psaty BM, Redline S, Rice KM, Rich SS, Smith JA, Tiwari HK, Tsai MY, Vasan RS, Wang FF, Weeks DE, Weng Z, Wilson JG, Yanek LR, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, TOPMed Lipids Working Group, Neale BM, Sunyaev SR, Abecasis GR, Rotter JI, Willer CJ, Peloso GM, Natarajan P, & Lin X. (2020). Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nature Genetics, 52(9), 969-983
STAARpipeline
STAARpipeline is an R package for phenotype-genotype association analyses of WGS/WES data, including single variant analysis and variant set analysis. The single variant analysis in STAARpipeline provides valid individual *P* values of variants given a MAF or MAC cut-off. The variant set analysis in STAARpipeline includes gene-centric analysis and non-gene-centric analysis of rare variants. The gene-centric coding analysis provides five genetic categories: putative loss of function (pLoF), missense, disruptive missense, pLoF and disruptive missense, and synonymous. The gene-centric noncoding analysis provides eight genetic categories: promoter or enhancer overlaid with CAGE or DHS sites, UTR, upstream, downstream, and noncoding RNA genes. The non-gene-centric analysis includes sliding window analysis with fixed sizes and dynamic window analysis with data-adaptive sizes. STAARpipeline also provides analytical follow-up of dissecting association signals independent of known variants via conditional analysis using STAARpipelineSummary. See user manual here.
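As an illustration of what a MAF or MAC cut-off means in practice, the sketch below computes the minor allele count and frequency of a variant and splits variants into those analyzed individually and those left for set-based analysis. It is plain Python and not part of STAARpipeline. It assumes diploid genotypes coded as 0/1/2 copies of the alternate allele with no missing values; the function names and the cut-off value are hypothetical.

```python
def minor_allele_count(dosages):
    """MAC: copies of the less frequent allele among 2 * n chromosomes."""
    alt = sum(dosages)
    return min(alt, 2 * len(dosages) - alt)


def minor_allele_frequency(dosages):
    """MAF: minor allele count divided by the number of chromosomes."""
    return minor_allele_count(dosages) / (2 * len(dosages))


def split_by_mac(genotypes_by_variant, mac_cutoff):
    """Split variant ids into (single_variant, rare) lists.

    genotypes_by_variant maps a variant id to its list of dosages.
    Variants with MAC >= mac_cutoff go to single variant analysis;
    the rest are kept for variant set analysis.
    """
    single_variant, rare = [], []
    for variant_id, dosages in genotypes_by_variant.items():
        if minor_allele_count(dosages) >= mac_cutoff:
            single_variant.append(variant_id)
        else:
            rare.append(variant_id)
    return single_variant, rare
```

For example, `split_by_mac({"v1": [0, 1, 2, 0], "v2": [2, 2, 2, 1]}, mac_cutoff=2)` places `v1` (MAC 3) in the first list and `v2` (MAC 1) in the second.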
References:
- Li Z*#, Li X*, Zhou H, Gaynor SM, Selvaraj MS, Arapoglou T, Quick C, Liu Y, Chen H, Sun R, Dey R, Arnett DK, Auer PL, Bielak LF, Bis JC, Blackwell TW, Blangero J, Boerwinkle E, Bowden DW, Brody JA, Cade BE, Conomos MP, Correa A, Cupples LA, Curran JE, de Vries PS, Duggirala R, Franceschini N, Freedman BI, Göring HHH, Guo X, Kalyani RR, Kooperberg C, Kral BG, Lange LA, Lin BM, Manichaikul A, Manning AK, Martin LW, Mathias RA, Meigs JB, Mitchell BD, Montasser ME, Morrison AC, Naseri T, O'Connell JR, Palmer ND, Peyser PA, Psaty BM, Raffield LM, Redline S, Reiner AP, Reupena MS, Rice KM, Rich SS, Smith JA, Taylor KD, Taub MA, Vasan RS, Weeks DE, Wilson JG, Yanek LR, Zhao W, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, TOPMed Lipids Working Group, Rotter JI, Willer CJ, Natarajan P, Peloso GM, & Lin X.# (2022). A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies. Nature Methods, 19(12), 1599-1611.
STAARpipelineSummary
STAARpipelineSummary is an R package for summarizing and visualizing association analysis results generated by STAARpipeline. See user manual here.
References:
- Li Z*#, Li X*, Zhou H, Gaynor SM, Selvaraj MS, Arapoglou T, Quick C, Liu Y, Chen H, Sun R, Dey R, Arnett DK, Auer PL, Bielak LF, Bis JC, Blackwell TW, Blangero J, Boerwinkle E, Bowden DW, Brody JA, Cade BE, Conomos MP, Correa A, Cupples LA, Curran JE, de Vries PS, Duggirala R, Franceschini N, Freedman BI, Göring HHH, Guo X, Kalyani RR, Kooperberg C, Kral BG, Lange LA, Lin BM, Manichaikul A, Manning AK, Martin LW, Mathias RA, Meigs JB, Mitchell BD, Montasser ME, Morrison AC, Naseri T, O'Connell JR, Palmer ND, Peyser PA, Psaty BM, Raffield LM, Redline S, Reiner AP, Reupena MS, Rice KM, Rich SS, Smith JA, Taylor KD, Taub MA, Vasan RS, Weeks DE, Wilson JG, Yanek LR, Zhao W, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, TOPMed Lipids Working Group, Rotter JI, Willer CJ, Natarajan P, Peloso GM, & Lin X.# (2022). A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies. Nature Methods, 19(12), 1599-1611.
MetaSTAAR (Meta-analysis of variant-Set Test for Association using Annotation infoRmation)
MetaSTAAR is an R package for performing the Meta-analysis of variant-Set Test for Association using Annotation infoRmation (MetaSTAAR) procedure in whole-genome sequencing (WGS) studies. See user manual here.
References:
- Li X, Quick C, Zhou H, Gaynor SM, Liu Y, Chen H, Selvaraj MS, Sun R, Dey R, Arnett DK, Bielak LF, Bis JC, Blangero J, Boerwinkle E, Bowden DW, Brody JA, Cade BE, Correa A, Cupples LA, Curran JE, de Vries PS, Duggirala R, Freedman BI, Göring HHH, Guo X, Haessler J, Kalyani RR, Kooperberg C, Kral BG, Lange LA, Manichaikul A, Manning AK, Martin LW, McGarvey ST, Mitchell BD, Montasser ME, Morrison AC, Naseri T, O'Connell JR, Palmer ND, Peyser PA, Psaty BM, Raffield LM, Redline S, Reiner AP, Reupena MS, Rice KM, Rich SS, Sitlani CM, Smith JA, Taylor KD, Vasan RS, Willer CJ, Wilson JG, Yanek LR, Zhao W, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, TOPMed Lipids Working Group, Rotter JI, Natarajan P, Peloso GM, Li Z#, & Lin X#. (2023). Powerful, scalable and resource-efficient meta-analysis of rare variant associations in large whole genome sequencing studies. Nature Genetics, 55(1), 154-164.
ACAT (Aggregated Cauchy Association Test)
ACAT is an R package that implements a generic method for combining p-values. For example, if ACAT is used to combine variant-level (or SNV-level) p-values, it is a set-based test that is particularly powerful when only a small proportion of variants are causal. ACAT can also be used as an omnibus testing procedure to combine multiple set-level p-values, e.g., the p-values of SKAT or burden tests. The p-value of ACAT is approximated by a Cauchy distribution without the need to know the dependency among the p-values being combined, which makes the computation of ACAT very fast. This approximation is particularly accurate in the tail of the null distribution.
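The Cauchy combination behind this description is short enough to sketch. The code below is a plain-Python illustration, not the ACAT package's code or interface. Each p-value is transformed to tan((0.5 - p) * pi), the transformed values are averaged with weights that sum to one, and the combined p-value is the upper tail of a standard Cauchy distribution at that average. The function name, the equal default weights, and the 1e-15 switch-over for very small p-values are assumptions made for this sketch.

```python
import math


def acat(pvalues, weights=None):
    """Combine p-values with the Cauchy combination rule (illustrative sketch)."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * len(pvalues)  # equal weights (assumption)
    if len(weights) != len(pvalues):
        raise ValueError("weights and pvalues must have the same length")
    if any(p <= 0.0 or p >= 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")

    total = float(sum(weights))
    stat = 0.0
    for p, w in zip(pvalues, weights):
        if p < 1e-15:
            # tan((0.5 - p) * pi) = cot(p * pi), close to 1 / (p * pi) for tiny p
            component = 1.0 / (p * math.pi)
        else:
            component = math.tan((0.5 - p) * math.pi)
        stat += (w / total) * component

    # Upper tail of the standard Cauchy: 0.5 - atan(stat) / pi.
    # For stat > 0 this equals atan(1 / stat) / pi, which keeps precision
    # when stat is large.
    if stat > 0:
        return math.atan(1.0 / stat) / math.pi
    return 0.5 - math.atan(stat) / math.pi
```

A single p-value is returned unchanged, and one very small p-value among many unremarkable ones dominates the average, which is why the method does well when few variants in a set carry signal.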
References:
- Liu, Y., Chen, S., Li, Z., Morrison, A. C., Boerwinkle, E., & Lin, X. (2019). ACAT: A Fast and Powerful p Value Combination Method for Rare-Variant Analysis in Sequencing Studies. The American Journal of Human Genetics, 104(3), 410-421.
SMAT (Scaled Multiple-phenotype Association Test)
The current version of the R package is 0.98. Please download the source .tar.gz file or the .zip file for installation. Please download the manual PDF here. Some example files are also available for download.
References:
- Schifano, E.D., Li, L., Christiani, D.C., and Lin, X. (2012) Genome-wide Association Analysis for Multiple Continuous Secondary Phenotypes. (in revision)
- Roy, J., Lin, X., and Ryan, L. (2003). Scaled Marginal Models For Multiple Continuous Outcomes. Biostatistics, 4, 371-384.
TEtest
References:
- Huang YT, VanderWeele TJ and Lin X (2014). Joint analysis of SNP and gene expression data in genetic association studies of complex diseases. Annals of Applied Statistics 8:352-376.
iGWAS
References:
- Huang YT, Liang L, Cookson WOCM, Moffatt M and Lin X (2015). iGWAS: integrative genome-wide association studies using mediation analysis. Genetic Epidemiology 39:347-356.
Sparse PCA
References:
- Lee, S., Epstein, M.P., Duncan, R. and Lin, X. (2012) Sparse principal component analysis for identifying ancestry-informative markers in genome-wide association studies. Genetic Epidemiology, 36.4, 293-302.
Pathway Analysis
sLDA Pathway Test
Logistic Kernel Machine
Least Square Kernel Machine
References:
- Wu, M. C., Zhang, L., Wang, Z., Christiani, D. C. and Lin, X. (2009) Sparse linear discriminant analysis for simultaneous gene set/pathway significance test and gene selection. Bioinformatics, 25, 1145-1151.
- Liu, D., Ghosh, D. and Lin, X. (2008) Estimation and Testing for the Effect of a Genetic Pathway on a Disease Outcome Using Logistic Kernel Machine Regression via Logistic Mixed Models. BMC Bioinformatics, 9, 292.
- Liu, D., Lin, X. and Ghosh, D. (2007) Semiparametric Regression of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines and Linear Mixed Models. Biometrics, 63, 1079-1088.
Nonparametric Regression
SAS Macro SPMM
SAS Macro Spline_Mixed
SAS Macro GAMM1
References:
- Zhang D., Lin X., Raz J., and Sowers M. (1998). Semiparametric stochastic mixed models for longitudinal data, Journal of the American Statistical Association, 93, 710-719.
- Lin X. and Zhang D. (1999). Inference in generalized additive mixed models using smoothing splines, Journal of the Royal Statistical Society, Series B, 61, 381-400.
- Zhang D., Lin X. and Sowers M. (2000). Periodic semiparametric regression for longitudinal hormone data from multiple menstrual cycles. Biometrics, 56, 31-39.