Speakers

Jing Qin, NIH

Biography: Jing Qin (Chinese: 秦进) is a mathematical statistician in the Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases. He obtained his PhD in Statistics from the University of Waterloo in 1992. His research interests include biased sampling and over-identified problems, empirical likelihood, causal inference, missing-at-random and missing-not-at-random problems, genetic mixture models, and infectious diseases.

 

Title: Biased sampling, over-identified parameter problems and applications

Abstract: Biased sampling problems appear in many areas of research, including medicine, epidemiology and public health, the social sciences, and economics. When proper randomization cannot be achieved, the observed sample will not be representative of the population of interest. This biased sampling problem appears frequently because, in the real world, truly random sampling is rarely achievable or practically feasible. As pointed out by Professor James Heckman (1979), "Sample selection bias may arise in practice for two reasons. First, there may be self selection by the individuals or data units being investigated. Second, sample selection decisions by analysts or data processors operate in much the same fashion as self selection". It is worth mentioning that biased sampling problems may occur even if the sampling is unbiased. In fact, if we model only the density ratios but leave the baseline density arbitrary in multiple-sample problems, we end up with a biased sampling problem: each population other than the baseline can be treated as a biased version of the baseline population, with the density ratio acting as the selection bias function.
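The density-ratio remark can be written out as a short sketch. The notation is assumed here for illustration and is not taken from the abstract: $f_0$ is the unspecified baseline density, $f_1,\dots,f_K$ are the densities of the other samples, and $w_k$ is the modelled ratio. The exponential-tilt form in the last line is one common parametric choice, not necessarily the one used in the talk.

```latex
% Assumed notation: f_0 = baseline density (left arbitrary),
% f_k = density of sample k, w_k = modelled density ratio.
\begin{align*}
  f_k(x) &= w_k(x;\theta)\, f_0(x),
    \qquad w_k(x;\theta) = \frac{f_k(x)}{f_0(x)},
    \qquad k = 1,\dots,K, \\
  \int w_k(x;\theta)\, f_0(x)\, dx &= 1
    \qquad \text{(each $f_k$ must integrate to one)}, \\
  w_k(x;\theta) &= \exp\!\left(\alpha_k + \beta_k^{\top} x\right)
    \qquad \text{(illustrative exponential-tilt choice)}.
\end{align*}
```

Read this way, sample $k$ looks like a draw from $f_0$ in which a unit with value $x$ is selected with relative weight $w_k(x;\theta)$, which is the form of a biased sampling problem.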

When a model is defined through more estimating functions than free parameters, it becomes an over-identified parameter problem. This problem occurs naturally when auxiliary information exists. For example, in survey sampling, summarized information is available from published reports. Meta-analysis is an exciting area that combines similar studies to achieve a more precise analysis.
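A minimal illustration of over-identification, in assumed notation that does not come from the abstract: $g$ collects $r$ estimating functions for a parameter $\theta$ of dimension $p$, and the example supposes that the mean $\mu_0$ of a covariate $X$ is known from a published report while the target is $\theta = E(Y)$.

```latex
% Assumed notation: g = vector of r estimating functions,
% theta = p-dimensional parameter, mu_0 = known auxiliary mean of X.
\begin{align*}
  E\{\, g(Z;\theta) \,\} &= 0,
    \qquad g(\cdot\,;\theta) \in \mathbb{R}^{r},
    \quad \theta \in \mathbb{R}^{p},
    \quad r > p, \\
  g(x, y;\theta) &=
    \begin{pmatrix} y - \theta \\ x - \mu_0 \end{pmatrix}
    \qquad (r = 2,\; p = 1).
\end{align*}
```

With two equations and one unknown, the sample versions generally cannot both be set exactly to zero, so the estimating functions have to be combined, for instance through empirical likelihood or the generalized method of moments. When $X$ and $Y$ are correlated, using the second equation can reduce the variance of the estimate of $\theta$ relative to the plain sample mean of $Y$.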

In this talk, I will use many examples to demonstrate that inference techniques developed for biased sampling and over-identified parameter problems play very important roles in cancer and genetic epidemiology studies, in the study of the COVID-19 incubation period, in causal inference, in data missing at random or not at random, in mixture models, and elsewhere. I will also show that the latest developments in the divide-and-conquer paradigm, or parallel and distributed inference, in statistics and computer science are a scalable application of over-identified parameter problems. The concept of covariate shift or label shift in machine learning is an essential extension of biased sampling problems to classification and conformal inference.