Department of Biostatistics
Quantitative Issues in Cancer Research Working Seminar
2021 - 2022
ABSTRACT: Cross-study replicability is a powerful model evaluation criterion that emphasizes generalizability of predictions. Recent work in multi-study learning investigated two approaches for training replicable prediction models: (1) merging all the datasets and training a single model and (2) cross-study ensembling, which involves training a separate model on each dataset and ensembling the resulting predictions. We study boosting in a multi-study setting and compare merging with cross-study ensembling in the presence of potential heterogeneity in predictor-outcome relationships across datasets. We provide theoretical guidelines for determining whether it is more beneficial to merge or to ensemble when boosting with linear base-learners. We analytically characterize and confirm via simulations a transition point beyond which ensembling outperforms merging.
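The merge-versus-ensemble comparison can be sketched numerically. The simulation below is a hypothetical illustration only: ordinary least squares stands in for boosting with linear base-learners (which it approximates at convergence), and the two studies, their coefficients, and sample sizes are all invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical studies whose true coefficients differ
# (between-study heterogeneity in predictor-outcome relationships).
def simulate_study(beta, n=200, sigma=1.0):
    X = rng.normal(size=(n, len(beta)))
    y = X @ beta + sigma * rng.normal(size=n)
    return X, y

beta1 = np.array([1.0, -0.5])   # study 1 coefficients (assumed)
beta2 = np.array([2.0, 0.5])    # study 2 differs: heterogeneity

(X1, y1), (X2, y2) = simulate_study(beta1), simulate_study(beta2)

def ols(X, y):
    # least squares as a stand-in for boosting with linear base-learners
    return np.linalg.lstsq(X, y, rcond=None)[0]

# (1) Merging: pool both datasets and fit a single model
beta_merge = ols(np.vstack([X1, X2]), np.concatenate([y1, y2]))

# (2) Ensembling: fit one model per study, average predictions
#     (equal weights; weighted/stacked ensembles are also possible)
beta_ens = 0.5 * (ols(X1, y1) + ols(X2, y2))

def mse(beta_hat):
    # average test MSE over fresh draws from each study
    tot = 0.0
    for beta in (beta1, beta2):
        Xt, yt = simulate_study(beta, n=1000)
        tot += np.mean((yt - Xt @ beta_hat) ** 2)
    return tot / 2

m_merge, m_ens = mse(beta_merge), mse(beta_ens)
```

Varying the gap between `beta1` and `beta2` (the degree of heterogeneity) while tracking the two MSEs is one way to observe a transition point empirically.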
ABSTRACT: The semi-competing risks framework is characterized by a non-terminal event subject to a terminal event within a single subject, where the terminal event acts as a competing risk. While semi-competing risks have been examined for scenarios such as nested case-control studies and full cohort studies, there is a lack of methodology in the published literature for case-cohort studies. The case-cohort design was introduced to reduce costs and increase efficiency of analysis, making it an attractive sampling strategy for studies. We will consider the unique aspects of case-cohort studies and discuss methods for parameter and variance estimation for an illness-death model under the case-cohort study design.
ABSTRACT: As a consequence of exposure to various mutagens, cells accumulate somatic alterations over their lifetime. These alterations become more distinct in tumor cells once they start growing at a much faster pace. Here we evaluate whole-genome sequencing data from patients diagnosed with precursor conditions of symptomatic newly diagnosed myeloma, newly diagnosed myeloma, and their paired relapse samples after the first line of treatment. We show how somatic mutational patterns in tumors at a given time predict the risk of disease progression, how they are affected by treatment, and how clonal and subclonal cells show different patterns.
ABSTRACT: In this talk, I will first describe our characterization of nullomers, short (11-18 nt) DNA sequences that are absent from a genome. We identify all possible nullomers and nullpeptides in the genomes and proteomes of thirty eukaryotes and demonstrate that a significant proportion of these sequences are under negative selection. We next characterize all possible single base pair mutations that can lead to the appearance of a nullomer in the human genome, observing a significantly higher number of mutations than expected by chance for specific nullomer sequences in transposable elements, likely due to their suppression. We also annotate nullomers that appear due to naturally occurring variants and show that a subset of them can be used to distinguish between different human populations. Moreover, we demonstrate that nullomers can also be created due to somatic mutations in cancer. We refer to the subset of nullomers that are found recurrently in one cancer type as neomers. We show that we can distinguish twenty-one different tumor types with higher accuracy than state-of-the-art methods using a neomer-based classifier. Refinement of this classifier via supervised learning identified additional cancer features with even greater precision. We also demonstrate that neomers can precisely diagnose cancer from cfDNA in liquid biopsy samples. Finally, we show that neomers can be used to detect cancer-associated non-coding mutations affecting gene regulatory activity.
ABSTRACT: The limited representation of minorities and disadvantaged populations in large-scale clinical and genomics research has become a barrier to translating precision medicine research into practice. Due to heterogeneity across populations, risk prediction models often underperform in these underrepresented populations and may therefore further exacerbate known health disparities.
In this paper, we propose a two-way data integration strategy that integrates heterogeneous data from diverse populations and from multiple healthcare institutions via a federated transfer learning approach. The proposed method can handle the challenging setting where sample sizes from different populations are highly unbalanced. With only a small number of communications across participating sites, the proposed method can achieve performance comparable to the pooled analysis where individual-level data are directly pooled together. We show that the proposed method improves the estimation and prediction accuracy in underrepresented populations, and reduces the gap in model performance across populations. Our theoretical analysis reveals how estimation accuracy is influenced by communication budgets, privacy restrictions, and heterogeneity across populations. We demonstrate the feasibility and validity of our methods through numerical experiments and a real application to a multi-center study, in which we construct genetic risk prediction models for Type II diabetes in an African-ancestry population.
ABSTRACT: In the nursing home setting, costs and healthcare utilization are two common outcomes of interest. However, cost data typically follow a semi-continuous distribution, with a large concentration of zero values and a right-skewed distribution of positive values. First, I will discuss the logistic-lognormal two-part model commonly used to analyze these data. Furthermore, I will talk about the Bayesian semiparametric framework for the random effects we are proposing to extend this model's flexibility. Second, metrics often used to compare semi-continuous data do not consider that the data arise from two distinct stochastic processes: one that governs the occurrence of zeros and another that determines the observed value conditional on a non-zero response. I will discuss two-dimensional metrics we are developing that jointly assess performance in terms of whether more people than expected accrue non-zero costs and whether those who do accrue non-zero costs accrue higher costs than expected.
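The two-part structure can be illustrated with a minimal sketch, assuming invented parameter values: part 1 models the probability of any cost (here an intercept-only logistic part, whose MLE is the observed proportion), and part 2 models log(cost) among the positives.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical semi-continuous cost data: a point mass at zero plus a
# lognormal distribution for positive costs (all parameters assumed).
n = 5000
p_nonzero = 0.4            # true probability of accruing any cost
mu, sigma = 6.0, 1.2       # lognormal parameters for positive costs

nonzero = rng.random(n) < p_nonzero
costs = np.where(nonzero, rng.lognormal(mu, sigma, n), 0.0)

# Part 1 (intercept-only logistic): MLE of P(cost > 0) is the proportion
p_hat = np.mean(costs > 0)

# Part 2 (lognormal on the positives): moments of log(cost)
logpos = np.log(costs[costs > 0])
mu_hat, sigma_hat = logpos.mean(), logpos.std(ddof=1)

# The marginal mean combines both parts: E[cost] = p * exp(mu + sigma^2/2)
mean_hat = p_hat * np.exp(mu_hat + sigma_hat**2 / 2)
```

In the model discussed in the talk, both parts would carry covariates and (Bayesian semiparametric) random effects; the sketch only shows why the two processes are estimated separately and then recombined.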
ABSTRACT: Recent developments in collecting individual-level phenotypic data in free-living settings through wearable devices and smartphones have afforded researchers the ability to sample high-fidelity data that concern human behavior and health. Such advancements are paving the way for researchers interested in studying social, behavioral, and cognitive phenotypes that have proven to have a temporal and contextual dependence. While digital phenotyping provides a robust and efficient mechanism for collecting temporally dense data on populations of interest, challenges remain in developing tools that utilize these data for identifying dynamic behavioral trends among heterogeneous subjects. In this talk, I will explore approaches towards advancing classical hidden Markov models to address these challenges (i.e., high-dimensionality, state representation, heterogeneity, temporal resolution) in an effort to reveal clinically meaningful modes of differentiation in subject behavior.
ABSTRACT: We outline the clinical trial landscape of Glioblastoma Multiforme, with suggestions on how to best use the clinical information provided by this population. We review the advantages, disadvantages, and barriers of various clinical trial approaches in the context of this disease. We further propose shared-control-arm approaches to experimentation and provide initial simulation data to demonstrate their benefits and trade-offs.
ABSTRACT: Panel germline testing allows for the efficient detection of multiple pathogenic variants in an individual. However, because the associations and clinical guidelines for harmful mutations and heritable diseases are not always well-established, it may not be beneficial to make panels arbitrarily large. We propose a multi-gene, multi-disease aggregate utility formula that allows the user to consider the addition or removal of each gene based on its own merits, using both quantitative measures and individualized utility costs. Our approach includes credible intervals to reflect the quality of the parameter estimates used to calculate the utility. We calculate the utilities to evaluate ATM, BRCA1, BRCA2, CHEK2, and PALB2 for possible inclusion in an opportunistic breast cancer panel. We further explore the behavior of our approach under different scenarios for a range of parameter values. Our findings suggest that rare, highly penetrant pathogenic variants tend to contribute positive net utilities, for a wide variety of user-specified utility costs and even when accounting for uncertainty in parameter estimation.
ABSTRACT: Cytometry by time of flight, or CyTOF, is a powerful alternative to flow cytometry for quantifying targets on the surface and interior of cells. CyTOF data require considerable cleaning because many observations are debris, doublets, or calibration beads. As with any technology, the data analysis is only as good as the data itself, so careful data cleaning is essential. One of the biggest data cleaning challenges is dealing with doublets, because it is difficult to distinguish between large cells and doublets. I will introduce an R package, cleanCytof, that uses a modeling and labeling approach to data cleaning that allows for more careful and customized cleaning of CyTOF data.
ABSTRACT: A challenge of public health surveillance is tracking indicators in real-time when there are reporting delays. State-level mortality data in the United States is subject to reporting delays of up to 18 weeks, causing gaps between reported and true mortality in the short-term. Existing methods for correcting gaps from reporting delays do not appropriately account for seasonality or time trends in prior lags. I am using state-level CDC and DPH data from January 2017-December 2021 to develop a model that accurately predicts the true death count on a weekly basis, thereby reconciling the gap between reported and true deaths. Specifically, I will introduce non-parametric models that flexibly account for seasonality and trends to obtain unbiased estimates of gaps with appropriate measures of uncertainty.
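The core of any delay correction can be sketched as follows. This is a deliberately simplified, hypothetical illustration (not the talk's non-parametric model): estimate, from fully observed historical weeks, the fraction of deaths reported within each lag, then divide recent partial counts by that estimated completeness. Seasonality and trend adjustments, which the talk addresses, are omitted.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical reporting process: weekly death counts, with a fixed
# fraction reported by each lag (monotonicity ignored for simplicity).
max_lag = 4
true_counts = rng.poisson(1000, size=52)             # 52 weeks of deaths
report_frac = np.array([0.5, 0.8, 0.95, 0.99, 1.0])  # cumulative by lag

# Reported counts at each lag (rows) for each week (columns)
reported = np.array([rng.binomial(true_counts, f) for f in report_frac])

# Completeness at each lag, estimated from the first 40 (settled) weeks:
# reported-so-far divided by the eventual totals
completeness = reported[:, :40].sum(axis=1) / reported[-1, :40].sum()

# Nowcast the most recent week, currently observed only at lag 0
nowcast = reported[0, -1] / completeness[0]
```

Dividing by completeness is an inverse-probability-of-reporting correction; replacing the constant `completeness` with a smooth function of season and calendar time is one route to the flexible models described above.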
ABSTRACT: Methods that leverage cross-phenotype associations, or pleiotropy, in risk prediction have been shown to achieve improved performance compared to single-phenotype analyses. Before jointly analyzing multiple phenotypes in risk prediction models, we hope to obtain a list of candidate variants that are associated with at least one of the phenotypes of interest. With data from multiple biobanks, we propose a novel federated algorithm for testing SNP-level associations across multiple mixed-type phenotypes, termed mixWAS.
ABSTRACT: Many single-cell RNA-seq experiments aim to identify cell types that are transcriptionally different between two or more biological conditions. Existing computational approaches to this problem are sensitive to bias induced by pseudoreplication, non-independence of cells belonging to the same sample or patient. We introduce pcDiffPop, a statistical method that uses linear mixed-effects models to find significantly perturbed cell types while controlling for the sample-level variability present in single-cell RNA-seq data. pcDiffPop operates by estimating the distance between the group means in a low-dimensional embedding space. Using both real and simulated single-cell datasets, we show that pcDiffPop is accurate and, unlike competing methods, robust in the presence of pseudoreplication bias. pcDiffPop is also computationally efficient (scalable to datasets with millions of cells) and capable of controlling for other possible confounders such as age or batch. We demonstrate pcDiffPop by using it to compare cell types between responders and non-responders to immunotherapy. On melanoma samples, we identify a macrophage signature associated with poor response to checkpoint inhibitors. We also demonstrate how pcDiffPop can be used to formally test whether two cell clusters are distinct. Our examples highlight the utility of pcDiffPop as a tool for the exploratory analysis of single-cell RNA-seq data.
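The key ideas (a distance between group means in a low-dimensional embedding, tested against sample-level rather than cell-level permutations) can be sketched on toy data. This is not the pcDiffPop implementation; the data-generating parameters and the choice of a plain PCA embedding are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy single-cell data: 8 samples (4 per condition), 50 cells each,
# with a per-sample random shift inducing within-sample correlation
# (the source of pseudoreplication bias) plus a condition effect.
n_samples, cells_per, genes = 8, 50, 100
sample_cond = np.repeat([0, 1], n_samples // 2)
sample_shift = rng.normal(0, 0.5, size=(n_samples, genes))

X, samp = [], []
for s in range(n_samples):
    mean = sample_shift[s] + sample_cond[s] * 1.0    # condition effect
    X.append(mean + rng.normal(size=(cells_per, genes)))
    samp += [s] * cells_per
X, samp = np.vstack(X), np.array(samp)

# Low-dimensional embedding: top 10 principal components
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
emb = Xc @ Vt[:10].T

def group_dist(labels_by_sample):
    # distance between condition means in the embedding
    cell_lab = labels_by_sample[samp]
    return np.linalg.norm(emb[cell_lab == 1].mean(0)
                          - emb[cell_lab == 0].mean(0))

# Permute SAMPLE labels, not cell labels, so that cells from one
# sample always move together -- this respects the non-independence.
obs = group_dist(sample_cond)
perm = [group_dist(rng.permutation(sample_cond)) for _ in range(200)]
pval = (1 + sum(d >= obs for d in perm)) / 201
```

Permuting cell labels instead would treat each cell as independent and badly inflate significance; the sample-level permutation above is the minimal guard against that bias.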
ABSTRACT: Using principles of Mendelian genetics, probability theory, and mutation-specific knowledge, Mendelian models identify those at high risk for carrying a heritable cancer-susceptibility mutation and assess future risk of cancer, based on family history. These quantitative risk measures can be used for research and to tailor personalized prevention programs. Our previously proposed PanelPRO model is a generalizable, computationally efficient Mendelian risk prediction framework that incorporates an arbitrary number of gene-cancer associations. However, there are pragmatic challenges in the implementation of such a comprehensive model. There may be uncertainty in estimating the necessary population-level model parameters for rare genes and cancers. Obtaining detailed patient family history information for a large number of cancers may also be impractical. Moreover, family history information is often incomplete or poorly gathered, or gathered from only a minority of patients. Motivated by the clinical context of pre-screening for a test of any cancer, we investigate a Mendelian model that aggregates information across genes and cancers, reducing the amount of patient information that needs to be collected and avoiding the need for more robust parameter estimation for rare genes and syndromes. This aggregate approach is evaluated through simulations and applied to two clinical cohorts.
ABSTRACT: One of the central tenets of biology is that our genetics—our genotype—influences the physical characteristics we manifest—our phenotype. But with more than 25,000 human genes and more than 6,000,000 common genetic variants mapped in our genome, finding associations between our genotype and phenotype is an ongoing challenge. Indeed, genome-wide association studies have found thousands of small effect size genetic variants that are associated with phenotypic traits and disease. The simplest explanation is that genes and genetic variants work together in complex regulatory networks that help define phenotypes and mediate phenotypic transitions. We have found that the networks, and their structure, provide unique insight into how genetic elements interact with each other and the structure of the network has predictive power for identifying critical processes in health and disease and for identifying potential therapeutic targets. I will touch on multiple examples illustrating the importance of network models, drawing on my work in cancer, in chronic obstructive pulmonary disease, and in the analysis of data from thirty-eight tissues provided by the Genotype-Tissue Expression (GTEx) project. We will use these to explore the development and progression of disease and new ways to identify therapeutics.
ABSTRACT: Endometrial cancer (EC) is the most commonly diagnosed gynecologic cancer, affecting 1 in 37 women each year. Over the past few decades, the incidence and mortality of EC have been increasing for all racial-ethnic groups; however, there are apparent disparities by racial-ethnic group and socioeconomic status (SES). African American women have on average a 55% higher 5-year mortality risk than white women and, like other minority groups, are vulnerable to receiving suboptimal care due to differences in the cultural and socioeconomic environments in which they reside. Previous research has used factors such as educational attainment, household income, or occupation as proxies for SES; however, SES as a social determinant of health embodies multiple factors that in combination better explain inequities in health. We aim to take a multifactorial approach to examining racial/ethnic and socioeconomic factors leading to bias and disparities in the receipt of optimal care for EC patients. Using census tract aggregate-level data from the Massachusetts Cancer Registry, we will apply a Multivariate Beta Mixture Model to cluster several social determinants of health to better understand the social dimension of EC care and treatment in Massachusetts.
ABSTRACT: Flexible estimation of heterogeneous treatment effects is central to precision medicine. While efforts in systematic data sharing and data curation initiatives have increased access to multiple datasets, existing methods for estimating heterogeneous treatment effects are largely rooted in theory based on a single study. We propose a general class of two-step algorithms for treatment effect estimation in multiple studies. The approach is easy to use and allows for flexible modeling with machine learning techniques in both steps. It is an extension of the R-learner and provides a unifying framework for multi-study heterogeneous treatment effect estimation.
ABSTRACT: The case-cohort study design is well-known as a cost-effective outcome-dependent sampling scheme for large observational studies. However, when interest lies in semi-competing risks, a setting where a non-terminal event and a terminal event (usually death) are investigated simultaneously, there are currently no statistical methods for the analysis of data arising from a case-cohort design. We propose a novel statistical method for analyzing such data and an innovative simulation-based framework for designing such studies in resource-limited settings.
ABSTRACT: Prostate cancer has one of the highest estimates of heritability of any malignancy, and genome-wide association studies have identified >260 single nucleotide polymorphisms associated with prostate cancer risk that validate in multiethnic populations. This talk will provide a background on the evidence from family history, twin studies, and genetic epidemiology studies of prostate cancer to date. It will discuss the translation of this multiethnic polygenic risk score for both prevention and early detection, and discuss future directions including the integration of germline variation in DNA repair pathways.
ABSTRACT: Racial inequities in clinical performance diminish overall system performance; however, quality assessments have rarely incorporated reliable measures of racial inequities. We studied care for over 1 million Medicare fee-for-service beneficiaries with cancer to assess the feasibility of calculating reliable practice-level measures of racial inequities in chemotherapy-associated emergency department (ED) visits and hospitalizations. Specifically, we used hierarchical models to estimate adjusted practice-level Black-White differences in these events and described differences across practices. As a second goal, we assessed how practice-level measures of these Black-White differences changed after adjustment for socioeconomic variables of patients treated in the practices.
ABSTRACT: In the nursing home setting, costs and healthcare utilization are two common outcomes of interest. However, cost data typically follow a semi-continuous distribution, with a large concentration of zero values and a right-skewed distribution of positive values. I will talk about existing methods that are used to model semi-continuous data before exploring ideas to expand to the competing risks setting. I will focus on a recent paper by Nevo et al. (2020) that models semi-competing risks data as a longitudinal bivariate process as a starting point. Additionally, I will talk about the Bayesian semiparametric framework for the random effects we are proposing to extend this model's flexibility.
ABSTRACT: Many fields of research, in particular psychology and psychiatry, study latent-state constructs that form the internal models governing human behavior over time. For most phenomena of interest, there is a natural time scale whereby a person transitions from one internal state to another. Generally, not only are these states unobserved, but the exact times at which between-state transitions occur are unknown. By using a combination of surveys (active data) and sensors (passive data), we can uncover the latent-state structure and estimate transition times of the latent state process. However, both active and passive data consume resources: frequent surveys can burden participants, and excessive data collection on their personal digital devices may interfere with their use of those devices in their daily lives. An important part of study design is therefore to determine how much and how frequently to collect each type of data and how to best summarize such data. In this work, we capture these latent state processes using state space models and study the optimal allocation of data collection resources (i.e., the sampling rate for both active and passive data) necessary to accurately uncover the underlying latent state process given the natural time scale of the behavioral phenomenon.
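How the sampling rate trades off against tracking accuracy can be seen in a minimal sketch, assuming a scalar linear-Gaussian state space model (an AR(1) latent state) in place of the richer models above; all parameter values are invented. The Kalman filter's error variance recursion makes the cost of sparser sampling explicit without needing any data.

```python
import numpy as np

# Latent state: x_t = phi * x_{t-1} + w_t, w_t ~ N(0, q)
# Observation:  y_t = x_t + v_t,       v_t ~ N(0, r), taken every k steps.
# The filter's error variance P depends only on (phi, q, r, k), so we can
# study the effect of the sampling rate k directly.
def filter_var(k, phi=0.95, q=0.1, r=0.5, T=2000):
    P, history = 1.0, []
    for t in range(T):
        P = phi**2 * P + q          # predict: variance grows between obs
        if t % k == 0:              # observe only every k-th step
            K = P / (P + r)         # Kalman gain
            P = (1 - K) * P         # update: observation shrinks variance
        history.append(P)
    return np.mean(history[T // 2:])  # average steady-state variance

dense, sparse = filter_var(k=1), filter_var(k=10)
```

Sparser sampling (larger `k`) lets the predicted variance grow longer between corrections, so `sparse > dense`; weighing that accuracy loss against participant burden is the allocation problem the talk formalizes.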
ABSTRACT: Modern clinical medicine has many scenarios where there are multiple treatments for the same indication. When these treatments can be delivered together, the large number of potential treatment combinations means that it can be difficult to learn about them all. Often the primary quantity of interest is the best treatment combination for a given patient population. To address this evaluation gap, we introduce a Bayes-adaptive trial design for the factorial setting. While traditional factorial designs balance the assignment probabilities to each arm, our design uses a decision theoretic framework to adjust the probabilities with which patients are randomized to treatment combination arms. Treatment assignment is carried out with the aim of maximizing expected utility at the end of the trial, according to some pre-defined utility function. We model the data with a Bayesian model and define a map from the model to the randomization probability. We discuss potential choices for utility functions and their resulting trials, as well as the computational approximations required to carry out the trial. We further discuss the asymptotic allocation of such a design, and apply our design to both artificial scenarios and data taken from real-world trials.
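One concrete map from a Bayesian model to randomization probabilities can be sketched for a 2x2 factorial. This is a hypothetical simplification: Beta-Binomial posteriors per combination arm with Thompson-sampling-style assignment (allocating in proportion to the posterior probability that each combination is best), whereas the design above optimizes a general pre-defined utility; the response rates are invented.

```python
import numpy as np

rng = np.random.default_rng(5)

# 2x2 factorial: each arm is a combination of two binary treatments.
# True response probabilities are assumed for illustration only.
true_p = {(0, 0): 0.30, (0, 1): 0.40, (1, 0): 0.45, (1, 1): 0.60}
arms = list(true_p)
succ = {a: 1 for a in arms}   # Beta(1, 1) priors on each arm's
fail = {a: 1 for a in arms}   # response probability

for _ in range(400):          # 400 sequentially randomized patients
    # Draw one sample from each arm's posterior; assign the patient to
    # the arm whose draw is largest (Thompson sampling). This allocates
    # each arm with its posterior probability of being best.
    draws = {a: rng.beta(succ[a], fail[a]) for a in arms}
    arm = max(draws, key=draws.get)
    if rng.random() < true_p[arm]:
        succ[arm] += 1
    else:
        fail[arm] += 1

post_mean = {a: succ[a] / (succ[a] + fail[a]) for a in arms}
n_alloc = {a: succ[a] + fail[a] - 2 for a in arms}
```

Replacing "probability of being best" with the expected utility of the end-of-trial decision recovers the decision-theoretic flavor of the proposed design; the asymptotic allocation then depends on the chosen utility function.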
Last Update: April 27, 2022