SKAT (SNP Set/Sequence Association Test)

SKAT is an R package for performing:

(1) Association tests between a set of common and rare SNPs and continuous and dichotomous (case-control) phenotypes using kernel machine methods for data from GWAS and genome-wide sequencing association studies

(2) Sample size and power calculations for sequencing association studies.
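As a rough illustration of how a kernel (variance-component) test differs from a burden test, the sketch below computes both statistics from per-variant score statistics. It is written in Python for illustration only and is not the SKAT package interface. The function names, the intercept-only null model, and the unit weights are assumptions made for this example. The statistics are shown up to scaling constants, and the p-value calculation is omitted.

```python
# Illustrative sketch only; function names are hypothetical, not the SKAT package API.

def score_vector(genotypes, phenotypes):
    """Per-variant score S_j = sum_i g_ij * (y_i - mean(y)).

    Assumes an intercept-only null model (no covariates) for simplicity.
    genotypes: list of rows, one per subject, each a list of allele counts.
    """
    n = len(phenotypes)
    y_bar = sum(phenotypes) / n
    residuals = [y - y_bar for y in phenotypes]
    n_variants = len(genotypes[0])
    return [
        sum(genotypes[i][j] * residuals[i] for i in range(n))
        for j in range(n_variants)
    ]


def burden_statistic(scores, weights):
    """Burden-type statistic: square of the weighted sum of scores."""
    return sum(w * s for w, s in zip(weights, scores)) ** 2


def skat_statistic(scores, weights):
    """SKAT-type statistic: weighted sum of squared scores."""
    return sum((w * s) ** 2 for w, s in zip(weights, scores))


# Toy data: 4 subjects, 2 variants whose effects point in opposite directions.
genotypes = [[0, 1], [1, 0], [2, 0], [0, 0]]
phenotypes = [1.0, 2.0, 3.0, 2.0]
scores = score_vector(genotypes, phenotypes)
print(scores)
print(burden_statistic(scores, [1.0, 1.0]))
print(skat_statistic(scores, [1.0, 1.0]))
```

In the toy data the two variants have scores of opposite sign, so they partly cancel in the burden statistic but both contribute to the SKAT-type statistic.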


MetaSKAT (Meta-analysis for multiple markers)

MetaSKAT is an R package for multiple marker meta-analysis across studies. It can carry out meta-analysis of SKAT, SKAT-O and burden tests with individual-level genotype data or gene-level summary statistics.


  • Lee, S., Teslovich, T.M., Boehnke, M. and Lin, X. (2013) General framework for meta-analysis of rare variants in sequencing association studies, American Journal of Human Genetics, in press.

CEPSKAT (Continuous Extreme Phenotype SKAT)

CEPSKAT extends the SKAT framework to the setting of continuous extreme phenotype samples. You can download the R package for CEPSKAT here. For Windows, download the compiled binary version instead. Consult the help files in the package for instructions and usage examples.


coxKM (Cox Kernel Machine)

coxKM (Cox Kernel Machine) is an R package for conducting SNP-set association tests for right-censored survival outcomes based on the kernel machine Cox regression framework. It tests for association between a SNP-set and a right-censored survival outcome, and is meant for common genetic variants only. Software download, manual download.


  • Lin X, Cai T, Wu M, Zhou Q, Liu G, Christiani D and Lin X. 2011. Survival Kernel Machine SNP-set Analysis for Genome-wide Association Studies. Genetic Epidemiology 35:620-31. doi: 10.1002/gepi.20610
  • Cai T, Tonini G and Lin X. 2011. Kernel machine approach to testing the significance of multiple genetic markers for risk prediction. Biometrics, 67:975-86. doi:10.1111/j.1541-0420.2010.01544.x

gSKAT (family based association test)

gskat is an R package that implements a family-based association test via a GEE Kernel Machine (KM) score test. It has functions to perform both the burden test and SKAT with family members as well as unrelated individuals. The package allows for both continuous and discrete traits in the association test. Software download

User groups: Feel free to join the group to ask questions about, discuss, or comment on the package on the forum.


  • Wang X, Lee S, Zhu X, Redline S, and Lin X. (2013) GEE-Based SNP Set Association Test for Continuous and Discrete Traits in Family-Based Association Studies. Genet Epidemiol. 37:778-86.


Software download, manual download.


GMMAT (Generalized linear Mixed Model Association Test)

GMMAT is an R package for performing genetic association tests in genome-wide association studies (GWAS) and sequencing association studies, for outcomes with distribution in the exponential family (e.g. binary outcomes) based on generalized linear mixed models (GLMMs). It can be used to analyze genetic data from individuals with population structure and relatedness. GMMAT fits a GLMM with covariate adjustment and random effects to account for population structure and familial or cryptic relatedness. For GWAS, GMMAT performs score tests for each genetic variant. For candidate gene studies, GMMAT can also perform Wald tests to get the effect size estimate for each genetic variant. For rare variant analysis from sequencing association studies, GMMAT performs the variant Set Mixed Model Association Tests (SMMAT), including the burden test, the sequence kernel association test (SKAT), SKAT-O and an efficient hybrid test of the burden test and SKAT, based on user-defined variant sets. See user manual here.


  • Breslow NE and Clayton DG. (1993) Approximate Inference in Generalized Linear Mixed Models. Journal of the American Statistical Association 88: 9-25.
  • Chen H, Wang C, Conomos MP, Stilp AM, Li Z, Sofer T, Szpiro AA, Chen W, Brehm JM, Celedon JC, Redline S, Papanicolaou GJ, Thornton TA, Laurie CC, Rice K and Lin X. (2016) Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies Using Logistic Mixed Models. The American Journal of Human Genetics 98(4): 653-666.
  • Han Chen, Jennifer E. Huffman, Jennifer A. Brody, Chaolong Wang, Seunggeun Lee, Zilin Li, Stephanie M. Gogarten, Tamar Sofer, Lawrence F. Bielak, Joshua C. Bis, John Blangero, Russell P. Bowler, Brian E. Cade, Michael H. Cho, Adolfo Correa, Joanne E. Curran, Paul S. de Vries, David C. Glahn, Xiuqing Guo, Andrew D. Johnson, Sharon Kardia, Charles Kooperberg, Joshua P. Lewis, Xiaoming Liu, Rasika A. Mathias, Braxton D. Mitchell, Jeffrey R. O'Connell, Patricia A. Peyser, Wendy S. Post, Alex P. Reiner, Stephen S. Rich, Jerome I. Rotter, Edwin K. Silverman, Jennifer A. Smith, Ramachandran S. Vasan, James G. Wilson, Lisa R. Yanek, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, TOPMed Hematology and Hemostasis Working Group, Susan Redline, Nicholas L. Smith, Eric Boerwinkle, Ingrid B. Borecki, L. Adrienne Cupples, Cathy C. Laurie, Alanna C. Morrison, Kenneth M. Rice, Xihong Lin. (2018) Efficient variant set mixed model association tests for continuous and binary traits in large-scale whole genome sequencing studies. Submitted.

MPAT (Multiple Phenotype Association Test)

MPAT is an R package for performing multiple phenotype association tests based on univariate GWAS summary statistics. It provides a toolkit of testing procedures to aggregate association evidence across multiple phenotypes for a given genetic variant. All the p-values can be efficiently computed. See user manual here.


  • Liu Z and Lin X. (2016) Multiple Phenotype Association Test using Summary statistics in Genome-wide Association Studies. Submitted.
  • Liu Z and Lin X. (2016) A Geometric Perspective on the Powers of Principal Component Association Tests in Multiple Phenotype Studies. To be submitted.

SCANG (SCAN the Genome)

SCANG is an R package for performing a flexible and computationally efficient scan statistic procedure (SCANG) that uses the p-value of a variant set-based test as the scan statistic of each moving window, to detect rare variant association regions for both continuous and dichotomous traits. The goal of SCANG is to detect whether any rare-variant association regions exist across the genome and, if so, to identify the locations and sizes of these regions. First, SCANG fits the null linear or logistic model that includes covariates, e.g., age, sex and ancestry PCs, but no genetic variants. Second, SCANG applies set-based tests to all possible candidate moving windows of different sizes within a pre-specified window range of practical interest. Three tests are included in the SCANG framework: the burden test (SCANG-B), SKAT (SCANG-S) and an efficient omnibus test that uses the ACAT method to aggregate information across the burden test, SKAT and different choices of weights (SCANG-O). Third, SCANG generates an empirical threshold calculated by Monte Carlo simulation, to control the Genome-wise/Family-wise Type I Error Rate (GWER/FWER) at a given level, e.g., 0.05. Windows with p-values smaller than this threshold are detected as genome-wise significant association regions. Both the individual-window p-values and the genome-wise/family-wise p-values of these genome-wise significant windows are given. See user manual here.
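The window enumeration and empirical thresholding steps described above can be sketched as follows. This is a simplified Python illustration, not the SCANG package code. The function names are hypothetical, window sizes are counted in number of variants, and the null minimum p-values are assumed to have already been produced by some Monte Carlo procedure (one value per simulated replicate). The actual simulation scheme used by SCANG is not reproduced here.

```python
# Illustrative sketch only; names and the threshold rule are assumptions,
# not the SCANG package implementation.

def candidate_windows(n_variants, min_size, max_size):
    """Enumerate all moving windows as half-open index ranges [start, end).

    Window sizes run from min_size to max_size (in number of variants).
    """
    windows = []
    for size in range(min_size, max_size + 1):
        for start in range(0, n_variants - size + 1):
            windows.append((start, start + size))
    return windows


def empirical_threshold(null_min_pvalues, alpha):
    """Return an empirical p-value threshold from null replicates.

    null_min_pvalues: the smallest window p-value in each simulated null
    replicate. The threshold is the k-th smallest of these values with
    k = floor(alpha * B), at least 1, so that fewer than a fraction alpha
    of the null replicates contain a window strictly below the threshold.
    """
    ordered = sorted(null_min_pvalues)
    k = max(1, int(alpha * len(ordered)))
    return ordered[k - 1]


def significant_windows(window_pvalues, threshold):
    """Keep windows whose p-value is smaller than the threshold."""
    return [w for w, p in window_pvalues.items() if p < threshold]


windows = candidate_windows(n_variants=4, min_size=2, max_size=3)
print(windows)
```

Here `window_pvalues` would be a dictionary mapping each window to the p-value of whichever set-based test is applied to it.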


  • Zilin Li, Xihao Li, Yaowu Liu, Jincheng Shen, Han Chen, Alanna C. Morrison, Eric Boerwinkle and Xihong Lin. (2019) Dynamic Scan Procedure for Detecting Rare-Variant Association Regions in Whole Genome Sequencing Studies. The American Journal of Human Genetics 104(5): 802-814.

STAAR (variant-Set Test for Association using Annotation infoRmation)

STAAR is an R package for performing the variant-Set Test for Association using Annotation infoRmation (STAAR) procedure in whole genome sequencing studies. STAAR is a general framework that incorporates both qualitative functional categories and quantitative complementary functional annotation scores using an omnibus multi-dimensional weighting scheme. See user manual here.


  • Xihao Li, Zilin Li, Hufeng Zhou, Sheila M. Gaynor, Yaowu Liu, Han Chen, Ryan Sun, Rounak Dey, Donna K. Arnett, Stella Aslibekyan, Christie M. Ballantyne, Lawrence F. Bielak, John Blangero, Eric Boerwinkle, Donald W. Bowden, Jai G. Broome, Matthew P. Conomos, Adolfo Correa, Joanne E. Curran, L. Adrienne Cupples, Barry I. Freedman, Xiuqing Guo, Sharon L.R. Kardia, Sekar Kathiresan, Alyna T. Khan, Charles L. Kooperberg, Marguerite R. Irvin, Cathy C. Laurie, Ani W. Manichaikul, Michael C. Mahaney, Rasika A. Mathias, Alanna C. Morrison, Lisa W. Martin, Stephen T. McGarvey, Braxton D. Mitchell, May E. Montasser, Jill Moore, Jeffrey R. O'Connell, Nicholette D. Palmer, Akhil Pampana, Juan M. Peralta, Patricia A. Peyser, Bruce M. Psaty, Susan Redline, Kenneth M. Rice, Stephen S. Rich, Jennifer A. Smith, Hemant K. Tiwari, Michael Tsai, Ramachandran S. Vasan, Fei Fei Wang, Daniel E. Weeks, Zhiping Weng, James G. Wilson, Lisa R. Yanek, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, TOPMed Lipids Working Group, Benjamin M. Neale, Shamil R. Sunyaev, Goncalo R. Abecasis, Jerome I. Rotter, Cristen J. Willer, Gina M. Peloso, Pradeep Natarajan and Xihong Lin (2020) Dynamic incorporation of multiple in-silico functional annotations empowers rare variant association analysis of large whole genome sequencing studies at scale. Nature Genetics In press

ACAT (Aggregated Cauchy Association Test)

ACAT is an R package that implements a generic method for combining p-values. For example, if ACAT is used to combine variant-level (or SNV-level) p-values, it is a set-based test that is particularly powerful when only a small proportion of variants are causal. ACAT can also be used as an omnibus testing procedure to combine multiple set-level p-values, e.g., the p-values of SKAT or burden tests. The p-value of ACAT is approximated by a Cauchy distribution without the need to know the dependency among the p-values being combined, which makes the computation of ACAT extremely fast. This approximation is particularly accurate in the tail of the null distribution.
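The Cauchy combination idea can be sketched in a few lines. The Python function below is an illustration, not the ACAT package interface. Its name and the default of equal weights are assumptions, and it omits the numerical safeguards a production implementation would need for p-values extremely close to 0 or 1. Each p-value is transformed to a Cauchy variable, the transformed values are averaged with weights, and the combined p-value is read off the standard Cauchy tail.

```python
import math

# Illustrative sketch only; the function name and defaults are assumptions,
# not the ACAT package API.

def cauchy_combine(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule.

    Each p-value must lie strictly between 0 and 1.
    Weights are non-negative and are normalized to sum to 1.
    """
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    # Weighted average of Cauchy-transformed p-values.
    statistic = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, p_values)
    )
    # Upper tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


print(cauchy_combine([0.01, 0.2, 0.8]))
```

Because the statistic is dominated by the smallest p-values, a few strong signals are enough to produce a small combined p-value, which matches the sparse-signal setting described above.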


  • Liu, Y., Chen, S., Li, Z., Morrison, A. C., Boerwinkle, E., & Lin, X. (2019). ACAT: A Fast and Powerful p Value Combination Method for Rare-Variant Analysis in Sequencing Studies. The American Journal of Human Genetics, 104(3), 410-421.
  • Liu, Y., & Xie, J. (2018). Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. Journal of the American Statistical Association. To appear.

SMAT (Scaled Multiple-phenotype Association Test)

SMAT is an R package for performing the Scaled Multiple-phenotype Association Test in cohort or case-control designs to assess the common effect of a single nucleotide polymorphism (SNP) on multiple (positively correlated) continuous outcomes measuring the same underlying trait.

The current version of the R package is 0.98. Please download the source .tar.gz file or the .zip file for installation. Please download the manual PDF here. Some example files are also available for download.


  • Schifano, E.D., Li, L., Christiani, D.C., and Lin, X. (2012) Genome-wide Association Analysis for Multiple Continuous Secondary Phenotypes. (in revision)
  • Roy, J., Lin, X., and Ryan, L. (2003). Scaled Marginal Models For Multiple Continuous Outcomes. Biostatistics, 4, 371-384.


TEtest is an R package for conducting integrated analyses of a set of SNPs together with gene expression. The program tests the overall effect, regardless of whether it comes from the SNPs or from gene expression. The testing procedure accommodates various candidate models: the SNP-only model, the main effect model, and the main effect plus interaction model.

Software download here.


  • Huang YT, VanderWeele TJ and Lin X (2014). Joint analysis of SNP and gene expression data in genetic association studies of complex diseases. Annals of Applied Statistics 2014; 8:352-376.


iGWAS is an R package for conducting mediation analyses for an eQTL SNP set and gene expression. The testing procedure examines the effect of eQTL SNPs on a dichotomous outcome mediated through gene expression (indirect effect) and the effect independent of gene expression (direct effect). The method accommodates models with and without SNP-by-gene expression interaction, and includes an omnibus test to select the optimal model using perturbation. The procedure also accommodates family designs in which subjects are not independent.

Software download here.


  • Huang YT, Liang L, Cookson W OCM, Moffatt M and Lin X (2015). iGWAS: integrative genome-wide association studies using mediation analysis. Genetic Epidemiology 2015; 39:347-356.

Sparse PCA

R functions for sparse PCA and some examples.


  • Lee, S., Epstein, M.P., Duncan, R. and Lin, X. (2012) Sparse principal component analysis for identifying ancestry-informative markers in genome-wide association studies. Genetic Epidemiology, 36(4), 293-302.

Pathway Analysis

sLDA Pathway Test

An R function for testing for differential expression of a gene set/pathway based on the sparse linear discriminant analysis approach.

Logistic Kernel Machine

A SAS Macro for estimating and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. A SAS Macro for doing semiparametric regression of multi-dimensional genetic pathway data, using least squares kernel machines and linear mixed models.


  • Wu, M.C., Zhang, L., Wang, Z., Christiani, D.C. and Lin, X. (2009) Sparse linear discriminant analysis for simultaneous gene set/pathway significance test and gene selection. Bioinformatics, 25, 1145-1151.
  • Liu, D., Ghosh, D. and Lin, X. (2008) Estimation and Testing for the Effect of a Genetic Pathway on a Disease Outcome Using Logistic Kernel Machine Regression via Logistic Mixed Models. BMC Bioinformatics, 9, 292.
  • Liu, D., Lin, X. and Ghosh, D. (2007) Semiparametric Regression of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines and Linear Mixed Models. Biometrics, 63, 1079-1088.

Nonparametric Regression


A SAS Macro to fit smoothing splines mixed models, with documentation.

SAS Macro Spline_Mixed

A SAS Macro for calculating a cubic smoothing spline using PROC MIXED.


A SAS Macro to fit generalized additive mixed models using smoothing splines.


  • Zhang D., Lin X., Raz J., and Sowers M. (1998). Semiparametric stochastic mixed models for longitudinal data, Journal of the American Statistical Association, 93, 710-719.
  • Lin X. and Zhang D. (1999). Inference in generalized additive mixed models using smoothing splines, Journal of the Royal Statistical Society, Series B, 61, 381-400.
  • Zhang D., Lin X. and Sowers M. (2000). Periodic semiparametric regression for longitudinal hormone data from multiple menstrual cycles. Biometrics, 56, 31-39.
Copyright © Xihong Lin, 2010-2016