Research article: "Bias for gene-environment interactions in stratified cohort" (specialty: Medical Technologies)

CC BY


Section: MATHEMATICS

UDC 316.323.2

BIAS FOR GENE-ENVIRONMENT INTERACTIONS IN STRATIFIED COHORT.

© E. V. Viktorova1,2*, Ya. T. Sultanaev1

1Bashkir State University, 32 Zaki Validi Street, 450074 Ufa, Republic of Bashkortostan, Russia
2Georg-August-University of Göttingen, 32 Humboldtallee, 37073 Göttingen, Germany

E-mail: elena.viktorova@med.uni-goettingen.de

The goal was to investigate population stratification bias for Gene-Environment (GxE) interaction in Case-Control studies and to compare methods of GxE interaction effect estimation in terms of their robustness to this bias. We study the bias of the GxE interaction term that is due to population stratification. A simple formula was derived to measure the stratification bias of the interaction term in the Case-Control design. A simulation study over a range of realistic scenarios was performed, measuring the bias by the introduced formula, CIRcc (Confounding Interaction Ratio for the CC estimator). Five methods of GxE interaction parameter estimation were analyzed. For this purpose, we compared the coefficients of a mis-specified logistic regression model, which omits the information about population stratification, with the corresponding terms of the true model, which includes the subgroups. We ran simulations admixing similar or more divergent subpopulations. The CC estimator of GxE interaction appears to be considerably more robust to the presence of population stratification than the Case-Only (CO) estimator. Lewinger’s Hierarchical Bayes (LHB) estimator was compared against four others. In all scenarios CC, Mukherjee’s Empirical Bayes (EB) and LHB outperform CO and Murcray’s two-step approach (2ST).

Keywords: Bias, Genome-wide association, G x E Interactions, Population stratification.

1 Introduction

Most common multifactorial diseases are characterized by a complex interplay between genetic and environmental factors; therefore useful predictions can be made only by taking existing interaction effects into account. The recent significant reduction in genotyping costs has made sequencing of large samples consisting of thousands of individuals possible. Many statistical methods have been proposed to estimate the GxE interaction effect in large Case-Control or cohort studies; however, not all of them are robust to hidden substructure in the study sample, such as population stratification, and they can therefore yield biased effect estimates.

The extent of population stratification bias depends on specific characteristics of the study sample, particularly on the number of admixed ethnicities, the differences in genotype and exposure frequencies, and the differences in disease risks. We decided to investigate the magnitude of bias due to population stratification for GxE interaction in Case-Control studies and to compare methods of GxE interaction effect estimation in terms of their robustness to this bias.

First, we investigated the degree of bias for GxE interaction that arises from the presence of population stratification. A simple formula was derived to measure the population stratification bias of the interaction term in the Case-Control design.

A simulation study over a range of realistic situations was performed to measure population stratification bias by the introduced formula for CIRcc (Confounding Interaction Ratio for the Case-Control estimator of GxE interaction). We compared our results with previous studies of population stratification bias in the Case-Only design [1].

Second, we compared five methods of GxE interaction parameter estimation in terms of their robustness to the presence of population stratification in the study sample. For this purpose we compared the coefficients of two logistic regression models: a mis-specified model that does not account for population stratification, and the model with the subgroups introduced.

We ran simulations for a set of eight different scenarios, in each case admixing similar or more divergent subpopulations.

In the current study, Lewinger’s Hierarchical Bayes type estimator (LHB) [2] for GxE interaction was compared for the first time to the classical Case-Control estimator (CC), the Case-Only estimator (CO) [3], Murcray’s two-step approach estimator (2st) [4] and Mukherjee’s Empirical Bayes type shrinkage estimator (MEB) [5].

2 Part I

Theoretical bias in Case-Control studies (CIRcc)

In recent years the population stratification bias of GxE interaction has been extensively studied [6, 7, 8, 1]. In 2008 Wang and Lee introduced a simple formula to measure the magnitude of the population stratification bias in the Case-Only design and compared the results to the observed bias in Case-Control studies of Gene-Gene (GxG) interactions. However, a corresponding formula to measure the degree of bias due to population stratification of GxE interaction in Case-Control studies has not been presented until now. To fill this gap we derived a formula to measure the bias of the GxE interaction parameter in a stratified cohort of cases and controls. The mathematical form of the introduced formula for the Case-Control design is in close agreement with the one for the Case-Only design.

* Corresponding author

Statistical Methods

A general measure of confounding is Confounding Risk Ratio (CRR) [9]. In the same manner we can define Confounding Interaction Ratio in Case-Control Studies (CIRcc).

To derive the formula for CIRcc we followed the method and used the notation described in [1].

Assume that the study population consists of $J$ subpopulations ($j = 1, \ldots, J$). We let $E$ ($\bar{E}$) denote presence (absence) of the exposure of interest and $G$ ($\bar{G}$) presence (absence) of the susceptible genotype. Next we define $p_j$ to be the prevalence of the exposure ($E$), $q_j$ to be the frequency of the genotype of interest, and $b_j$ to be the baseline disease risk (the baseline disease risk is the risk for non-carriers of the risk allele copy). Then

$$e_j = \frac{p_j}{1 - p_j}$$

are the exposure prevalence odds and

$$g_j = \frac{q_j}{1 - q_j}$$

are the genotype frequency odds. Let $m_j$ denote the number of subjects in the $j$th subgroup (total person-time of observation in the $j$th stratum). Let $RR_{GE}$ denote the relative rate of disease for subjects with $(G, E)$ as compared to $(\bar{G}, \bar{E})$ subjects.

Similarly, $RR_G$ is defined to denote the relative risk of disease for individuals with $(G, \bar{E})$ compared to the same reference group $(\bar{G}, \bar{E})$, and $RR_E$ is the relative risk of disease for individuals with $(\bar{G}, E)$. From here on we assume $RR_{GE}$, $RR_G$, $RR_E$ to be constant and the same for each of the $J$ strata. In other words, the stratification in the population only segregates people [1].

According to the definition, GxE interaction on the multiplicative scale is given by

$$RR_{int} = \frac{RR_{GE}}{RR_G\, RR_E};$$

one says that there is an interaction effect when $RR_{int} \neq 1$. In the traditional Case-Control design $RR_{int}$ can be estimated in the following way. For Case-Control studies relative risks are not readily available, therefore we should calculate odds ratios (OR) as good approximations of the risks when the sample size is large. Then GxE interaction on the same multiplicative scale can be presented as follows:

$$OR_{int} = \frac{OR_{GE}}{OR_G\, OR_E}.$$

Assume that the study was able to collect all diseased individuals (cases), as well as all non-diseased individuals (controls), from the considered subpopulation. Let $I_{1G}$ denote the risk (disease rate) for a person who carries the susceptible genotype and $I_{0G}$ for one who does not; $OR_G$ is then expressed through $I_{1G}$ and $I_{0G}$ [explicit expressions illegible in the source]. In the same manner let $I_{1E}$ denote the risk of the disease for a person who has been exposed to the particular environment and $I_{0E}$ for one who was not; $OR_E$ is expressed through $I_{1E}$ and $I_{0E}$ [explicit expressions illegible in the source]. Finally, the joint risk in the presence of genotype and environmental exposure can be defined through $I_{1GE}$ and $I_{0GE}$, giving $OR_{GE}$, where $I_{1GE}$ involves the sum $\sum_{j=1}^{J} m_j q_j p_j b_j$ [remaining terms illegible in the source].

Thus CIRcc is defined as the ratio of the biased relative rate to the true relative rate:

$$CIR_{cc} = \frac{OR^{*}_{int}}{OR_{int}}.$$

A detailed derivation of CIRcc can be found in Appendix A. The mathematical form of the CIRcc derived in this paper is in close agreement with the CIR and CRR measures [1, 10].

Let $w_j$ denote the stratum weights [definition illegible in the source], and let

$$\varphi_e = \sum_{j=1}^{J} w_j e_j, \qquad \varphi_g = \sum_{j=1}^{J} w_j g_j, \qquad \varphi_{eg} = \sum_{j=1}^{J} w_j e_j g_j;$$

then CIRcc can be written in terms of these weighted sums [explicit expression illegible in the source]. Alternatively, we can define separate weights for cases and controls: a "weight" for controls, $w^{co}_j$ [expression illegible in the source], and a "weight" for cases,

$$w^{ca}_j = \frac{m_j (1 - p_j)(1 - q_j)\, b_j}{\sum_{k=1}^{J} m_k (1 - p_k)(1 - q_k)\, b_k};$$

then

$$CIR_{cc} = \frac{r^{ca}_{EG}\, CV^{ca}_E\, CV^{ca}_G + 1}{r^{co}_{EG}\, CV^{co}_E\, CV^{co}_G + 1},$$

where $CV_E$ and $CV_G$ are the coefficients of variation of the exposure prevalence odds and the genotype frequency odds, respectively, and $r_{EG}$ is the correlation coefficient between the exposure prevalence odds and the genotype frequency odds; the superscripts $ca$ and $co$ indicate that the quantity is computed with the weights for cases or for controls.

Both formulas lead to the same results. It can easily be seen from the formula above that there is no population stratification bias if, between subgroups, the exposure prevalence odds and the genotype frequency odds are uncorrelated, or there is no variation in the exposure prevalence odds, or there is no variation in the genotype frequency odds. For CIRcc, overestimation of the parameter (CIRcc > 1) occurs when genotype and exposure are negatively correlated, and underestimation (CIRcc < 1) occurs when the prevalence odds of exposure and the frequency of genotype are positively correlated.
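The structure of such a measure can be illustrated numerically. The following is a minimal sketch, not the derivation of Appendix A: it relies on the identity $\sum w_j e_j g_j / (\sum w_j e_j \sum w_j g_j) = 1 + r\,CV_e\,CV_g$ for normalized weights and takes the ratio of that factor under case weights to the same factor under control weights. The function names are hypothetical, the baseline risks `b` are treated as absolute risks between 0 and 1, and the control weight $m_j(1-p_j)(1-q_j)(1-b_j)$ is an assumption made only for this illustration.

```python
def cross_ratio(p, q, w):
    """Return sum(w*e*g) / (sum(w*e) * sum(w*g)) for normalized weights w.

    Algebraically this equals 1 + r * CV_e * CV_g, where e and g are the
    exposure prevalence odds and the genotype frequency odds.
    """
    total = sum(w)
    w = [x / total for x in w]
    e = [pj / (1.0 - pj) for pj in p]
    g = [qj / (1.0 - qj) for qj in q]
    phi_e = sum(wj * ej for wj, ej in zip(w, e))
    phi_g = sum(wj * gj for wj, gj in zip(w, g))
    phi_eg = sum(wj * ej * gj for wj, ej, gj in zip(w, e, g))
    return phi_eg / (phi_e * phi_g)


def cir_cc(m, p, q, b):
    """Ratio of the case-weighted factor to the control-weighted factor.

    m: stratum sizes, p: exposure prevalences, q: genotype frequencies,
    b: absolute baseline disease risks (0 < b < 1).
    """
    w_case = [mj * (1 - pj) * (1 - qj) * bj
              for mj, pj, qj, bj in zip(m, p, q, b)]
    # ASSUMPTION: control weights use (1 - b_j); the source's own control
    # weight is not legible, so this choice is illustrative only.
    w_ctrl = [mj * (1 - pj) * (1 - qj) * (1 - bj)
              for mj, pj, qj, bj in zip(m, p, q, b)]
    return cross_ratio(p, q, w_case) / cross_ratio(p, q, w_ctrl)


if __name__ == "__main__":
    # Two equally sized strata with positively correlated p and q.
    print(cir_cc([1000, 1000], [0.1, 0.3], [0.1, 0.4], [0.02, 0.10]))
```

Under these assumptions the ratio equals 1 whenever the genotype frequency (or the exposure prevalence) does not vary across strata, and also whenever the baseline risk is the same in every stratum, because the case and control weights then coincide after normalization.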

Simulation Settings

To investigate CIRcc we performed a simulation study under a range of realistic scenarios. Generally we followed the procedures described by Wacholder in 2000 [6] and by Wang and Lee in 2008 [1]. However, we investigated the bias for samples including 2, 3, 5 or 8 subpopulations. For each scenario we assumed that there are $J$ = 2, 3, 5 or 8 strata, each of equal size. We allowed for different genotype and exposure frequencies and a different baseline disease risk in each stratum. Genotype and exposure frequencies were set to one of the three intervals 0.01-0.3, 0.1-0.4 and 0.3-0.6. The baseline risk of the disease was chosen from the intervals 1.1-1.5 and 1.1-3.0.

In each of the corresponding intervals the values of the genotype frequency, exposure frequency and baseline disease risk for each of the 8 strata were set to be equally distant on the log scale. That is, we picked 8 values of the corresponding parameter in each interval so that the chosen values are equidistant on the logarithmic scale; such a choice of the parameters is unique for each setting. For $J$ = 2 subpopulations we considered all possible combinations of pairs of genotype frequencies, exposure frequencies and disease risks out of the 8 possible values. For $J$ = 3 subpopulations we considered all possible combinations of triples from the eight values of genotype frequencies, exposure frequencies and background disease risks. To investigate both possible situations, when genotype and exposure are positively and when they are negatively correlated, we fixed the disease risks and randomly permuted the values of the genotype and exposure frequencies within triples. For $J$ = 5 subpopulations we repeated the procedure described for $J$ = 3 for combinations of five parameter values. For $J$ = 8 subpopulations we fixed the background disease risks and randomly permuted the eight values of the genotype frequencies and the eight values of the exposure prevalence. We repeated this procedure 100 000 times. We obtained the distribution of CIRcc for each of the 18 scenarios and present the minimum, the 25th, 50th and 75th percentiles, and the maximum of its distribution.
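The two mechanical steps of this procedure, a grid that is equidistant on the log scale and repeated random permutation of the grid values across strata, can be sketched as follows. This is an illustration under stated assumptions rather than the original simulation code: the function names are hypothetical, the strata have equal size, and the summarized statistic is only the case-weighted factor $\sum w_j e_j g_j / (\sum w_j e_j \sum w_j g_j)$ with weights proportional to $(1-p_j)(1-q_j)b_j$, used as a stand-in for the full CIRcc.

```python
import math
import random


def log_spaced(lo, hi, n=8):
    """Return n values from lo to hi that are equidistant on the log scale."""
    step = (math.log(hi) - math.log(lo)) / (n - 1)
    return [math.exp(math.log(lo) + k * step) for k in range(n)]


def case_weighted_factor(p, q, b):
    """sum(w*e*g) / (sum(w*e) * sum(w*g)), weights ~ (1-p)(1-q)b, equal sizes."""
    w = [(1.0 - pj) * (1.0 - qj) * bj for pj, qj, bj in zip(p, q, b)]
    total = sum(w)
    w = [x / total for x in w]
    e = [pj / (1.0 - pj) for pj in p]
    g = [qj / (1.0 - qj) for qj in q]
    phi_e = sum(x * y for x, y in zip(w, e))
    phi_g = sum(x * y for x, y in zip(w, g))
    phi_eg = sum(x * y * z for x, y, z in zip(w, e, g))
    return phi_eg / (phi_e * phi_g)


def permutation_summary(p_grid, q_grid, b_grid, n_rep=10000, seed=1):
    """Permute exposure and genotype values across strata, keep risks fixed.

    Returns [min, 25th, 50th, 75th percentile, max] of the statistic.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_rep):
        p = rng.sample(p_grid, len(p_grid))  # random permutation of the grid
        q = rng.sample(q_grid, len(q_grid))
        values.append(case_weighted_factor(p, q, b_grid))
    values.sort()
    last = len(values) - 1
    return [values[round(f * last)] for f in (0.0, 0.25, 0.5, 0.75, 1.0)]


if __name__ == "__main__":
    freq = log_spaced(0.01, 0.3)
    risk = log_spaced(1.1, 1.5)
    print(permutation_summary(freq, freq, risk, n_rep=1000))
```

Because the weights are normalized, the baseline risks may be supplied on a relative scale (for example 1.1 to 1.5) without changing the statistic.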

Table 1 summarizes the observed results. Detailed results for $J$ = 3 and $J$ = 5 admixed strata are presented in Appendix B.

As expected, the bias of the parameter estimate in the Case-Control study is considerably smaller in all scenarios than that of the Case-Only estimator (Table 2) when the admixed subpopulations are of equal size.

The bias of the interaction term is generally bigger than the bias in main effects studied by Wacholder et al. (2000). In contrast to the Case-Only design, the magnitude of the variation in background disease risks affects the magnitude of the bias; the bias is larger for bigger variation in disease risks (Table 1).

Table 1

Confounding Interaction Risk Ratios (CIRcc) for Case-Control Design calculated for different values of risk ratio, exposure and allele frequencies

Columns 1-3: parameters*. Columns 4-5: CIRcc for all possible subsets of size 2 from 8 groups**. Columns 6-10: CIRcc for 100 000 simulations of random permutation of 8 values***.

| Risk ratio | Exposure frequencies | Genotype frequencies | Most extreme negative, min§ | Most extreme positive, max | Most extreme negative, min | 25th percentile | 50th percentile | 75th percentile | Most extreme positive, max§§ |
|---|---|---|---|---|---|---|---|---|---|
| 1.1-1.5 | 0.01-0.3 | 0.01-0.3 | 0.84 | 1.09 | 0.89 | 0.98 | 1.00 | 1.03 | 1.12 |
| 1.1-1.5 | 0.01-0.3 | 0.10-0.4 | 0.89 | 1.10 | 0.94 | 0.99 | 1.00 | 1.01 | 1.07 |
| 1.1-1.5 | 0.01-0.3 | 0.30-0.6 | 0.92 | 1.08 | 0.95 | 0.99 | 1.00 | 1.01 | 1.04 |
| 1.1-1.5 | 0.10-0.4 | 0.01-0.3 | 0.89 | 1.02 | 0.94 | 0.99 | 1.00 | 1.01 | 1.07 |
| 1.1-1.5 | 0.10-0.4 | 0.10-0.4 | 0.93 | 1.02 | 0.97 | 0.99 | 1.00 | 1.01 | 1.04 |
| 1.1-1.5 | 0.10-0.4 | 0.30-0.6 | 0.96 | 1.03 | 0.98 | 0.99 | 1.00 | 1.01 | 1.03 |
| 1.1-1.5 | 0.30-0.6 | 0.01-0.3 | 0.92 | 1.00 | 0.95 | 0.99 | 1.00 | 1.01 | 1.05 |
| 1.1-1.5 | 0.30-0.6 | 0.10-0.4 | 0.96 | 1.00 | 0.97 | 0.99 | 1.00 | 1.01 | 1.02 |
| 1.1-1.5 | 0.30-0.6 | 0.30-0.6 | 0.99 | 1.01 | 0.98 | 1.00 | 1.00 | 1.00 | 1.02 |
| 1.1-3.0 | 0.01-0.3 | 0.01-0.3 | 0.61 | 1.43 | 0.68 | 0.93 | 1.00 | 1.09 | 1.46 |
| 1.1-3.0 | 0.01-0.3 | 0.10-0.4 | 0.70 | 1.41 | 0.81 | 0.96 | 1.00 | 1.04 | 1.23 |
| 1.1-3.0 | 0.01-0.3 | 0.30-0.6 | 0.76 | 1.31 | 0.87 | 0.97 | 1.00 | 1.03 | 1.15 |
| 1.1-3.0 | 0.10-0.4 | 0.01-0.3 | 0.70 | 1.11 | 0.80 | 0.96 | 1.00 | 1.05 | 1.23 |
| 1.1-3.0 | 0.10-0.4 | 0.10-0.4 | 0.79 | 1.13 | 0.89 | 0.98 | 1.00 | 1.02 | 1.11 |
| 1.1-3.0 | 0.10-0.4 | 0.30-0.6 | 0.86 | 1.15 | 0.92 | 0.98 | 1.00 | 1.02 | 1.08 |
| 1.1-3.0 | 0.30-0.6 | 0.01-0.3 | 0.76 | 1.03 | 0.87 | 0.97 | 1.00 | 1.03 | 1.17 |
| 1.1-3.0 | 0.30-0.6 | 0.10-0.4 | 0.86 | 1.04 | 0.92 | 0.98 | 1.00 | 1.02 | 1.08 |
| 1.1-3.0 | 0.30-0.6 | 0.30-0.6 | 0.92 | 1.07 | 0.95 | 0.99 | 1.00 | 1.01 | 1.06 |

* risk ratio, genotype frequency and exposure frequency ranges are spaced to be equidistant on the logarithmic scale

** study cohort consists of two discrete, admixed populations

*** study cohort consists of eight discrete, admixed populations

§ The strongest negative association between GxE interaction and disease

§§ The strongest positive association between GxE interaction and disease

Table 2

Confounding Interaction Risk Ratios (CIRco) for Case-Only Design calculated for different values of risk ratio, exposure and allele frequencies

Columns 1-3: parameters*. Columns 4-5: CIRco for all possible subsets of size 2 from 8 groups**. Columns 6-10: CIRco for 100 000 simulations of random permutation of 8 values***.

| Risk ratio | Exposure frequencies | Genotype frequencies | Most extreme negative, min§ | Most extreme positive, max | Most extreme negative, min | 25th percentile | 50th percentile | 75th percentile | Most extreme positive, max§§ |
|---|---|---|---|---|---|---|---|---|---|
| 1.1-1.5 | 0.01-0.3 | 0.01-0.3 | 0.09 | 2.67 | 0.30 | 0.67 | 0.93 | 1.30 | 2.61 |
| 1.1-1.5 | 0.01-0.3 | 0.10-0.4 | 0.33 | 2.18 | 0.56 | 0.83 | 0.98 | 1.19 | 1.75 |
| 1.1-1.5 | 0.01-0.3 | 0.30-0.6 | 0.50 | 1.93 | 0.67 | 0.88 | 0.99 | 1.13 | 1.52 |
| 1.1-1.5 | 0.10-0.4 | 0.01-0.3 | 0.30 | 2.18 | 0.56 | 0.83 | 0.98 | 1.19 | 1.74 |
| 1.1-1.5 | 0.10-0.4 | 0.10-0.4 | 0.49 | 1.81 | 0.73 | 0.91 | 1.00 | 1.09 | 1.38 |
| 1.1-1.5 | 0.10-0.4 | 0.30-0.6 | 0.61 | 1.61 | 0.80 | 0.94 | 1.00 | 1.06 | 1.25 |
| 1.1-1.5 | 0.30-0.6 | 0.01-0.3 | 0.46 | 1.93 | 0.68 | 0.88 | 0.99 | 1.13 | 1.52 |
| 1.1-1.5 | 0.30-0.6 | 0.10-0.4 | 0.60 | 1.61 | 0.80 | 0.94 | 1.00 | 1.06 | 1.26 |
| 1.1-1.5 | 0.30-0.6 | 0.30-0.6 | 0.69 | 1.44 | 0.85 | 0.96 | 1.00 | 1.04 | 1.18 |
| 1.1-3.0 | 0.01-0.3 | 0.01-0.3 | 0.09 | 2.53 | 0.27 | 0.66 | 0.92 | 1.32 | 2.84 |
| 1.1-3.0 | 0.01-0.3 | 0.10-0.4 | 0.34 | 2.10 | 0.51 | 0.83 | 0.99 | 1.19 | 1.90 |
| 1.1-3.0 | 0.01-0.3 | 0.30-0.6 | 0.51 | 1.88 | 0.64 | 0.88 | 0.99 | 1.13 | 1.58 |
| 1.1-3.0 | 0.10-0.4 | 0.01-0.3 | 0.28 | 2.10 | 0.51 | 0.82 | 0.98 | 1.18 | 1.90 |
| 1.1-3.0 | 0.10-0.4 | 0.10-0.4 | 0.49 | 1.77 | 0.71 | 0.91 | 0.99 | 1.09 | 1.40 |
| 1.1-3.0 | 0.10-0.4 | 0.30-0.6 | 0.62 | 1.60 | 0.77 | 0.94 | 1.00 | 1.07 | 1.29 |
| 1.1-3.0 | 0.30-0.6 | 0.01-0.3 | 0.41 | 1.88 | 0.64 | 0.88 | 0.99 | 1.13 | 1.57 |
| 1.1-3.0 | 0.30-0.6 | 0.10-0.4 | 0.60 | 1.60 | 0.79 | 0.94 | 1.00 | 1.07 | 1.30 |
| 1.1-3.0 | 0.30-0.6 | 0.30-0.6 | 0.69 | 1.44 | 0.84 | 0.96 | 1.00 | 1.05 | 1.20 |

* risk ratio, genotype frequency and exposure frequency ranges are spaced to be equidistant on the logarithmic scale

** study cohort consists of two discrete, admixed populations

*** study cohort consists of eight discrete, admixed populations

§ The strongest negative association between GxE interaction and disease

§§ The strongest positive association between GxE interaction and disease

Results

We studied bias due to population stratification in the Case-Control design. A formula to measure the magnitude of the bias was derived for the Case-Control estimator of GxE interaction. It was shown that the Case-Control estimator of the GxE interaction parameter is more robust to the presence of population stratification than the Case-Only estimator under all study conditions considered here. On average, the bias of the interaction term in Case-Control experiments is around 2-3% and can be considered negligible; however, in some extreme settings the bias can reach 30-40% or even more. The most extreme bias was observed when the sample consists of two different subgroups, and the bias approaches zero (CIRcc ≈ 1) as the number of strata in the population increases (Table 1, compare $J$ = 2 and $J$ = 8). Therefore researchers should always be aware of possible hidden substructure in the study sample, as it may lead to unreliable estimates of the effects under consideration.

3 Part II

Methods comparison

We found that the biggest bias occurs when only two subgroups are admixed; therefore we decided to investigate the magnitude of the bias for $J$ = 2 strata in the cohort for five different types of GxE interaction parameter estimators. We included in our study the classical Case-Control estimator (CC), the Case-Only estimator (CO), Murcray’s two-step approach estimator (2st), Mukherjee’s Empirical Bayes type shrinkage estimator (MEB) and Lewinger’s Hierarchical Bayes type estimator (LHB) of GxE interaction.

The goal was to compare the methods in terms of their robustness to population stratification and, if possible, to identify the most robust estimator. For the first time the performance of the MEB and LHB estimators was studied in the presence of population stratification in the sample.

Statistical Methods

We followed the idea of Wang et al. (2006) [8], who studied the bias of GxG and GxE interaction in Case-Control studies using logistic regression models. In addition to the Case-Control estimator, we also considered the Case-Only, 2st, MEB and LHB estimators.

For the analysis let $G = 1$ ($G = 0$) be a binary variable that denotes presence (absence) of the genotype of interest and $E = 1$ ($E = 0$) be a binary variable that represents exposure, where $E = 1$ means presence and $E = 0$ absence of the environment of interest. Let $Y$ denote the binary phenotype, where $Y = 1$ means the individual belongs to the cases and $Y = 0$ means the individual belongs to the controls. Assume that the study sample consists of two ($j = 1, 2$) strata represented by $S_1$ and $S_2$, where $S_j$ is an indicator variable such that $S_j = 1$ if the person belongs to subgroup $j$ and zero otherwise. The association between disease and $G$, $E$ can then be modeled by logistic regression in a Case-Control study as

$$\mathrm{logit}(P(Y \mid g, e)) = \alpha_1 + \alpha_2 S_2 + \beta_1 G + \beta_2 E + \beta_{CC}\, G \times E,$$

where the regression coefficient $\beta_1$ is a measure of the genetic main effect, $\beta_2$ is a measure of the environmental main effect, and $\beta_{CC}$ is the interaction effect between $G$ and $E$. Without loss of generality let $\alpha_1$ specify the log odds of the disease (i.e. the logit function of the baseline disease risk) in the lowest-risk ethnicity $S_1$, and let $\alpha_2 \geq 0$, where $\alpha_2$ specifies the log odds ratio of the disease risk comparing ethnicity $S_2$ versus $S_1$. To evaluate the bias we omit from the model the term that reflects the ethnic status of the individual and define the mis-specified model for the Case-Control study as

$$\mathrm{logit}(P(Y \mid g, e)) = \alpha^{*} + \beta^{*}_1 G + \beta^{*}_2 E + \beta^{*}_{CC}\, G \times E.$$

The bias of a parameter estimate is the difference between the corresponding parameters in the mis-specified model and the true model. So the population stratification bias of GxE interaction is $bias_{CC} = \beta^{*}_{CC} - \beta_{CC}$. We did not consider any issues concerning the variance or precision of the estimates, assuming large samples for each estimator (CC, CO, 2st, MEB, LHB).

In a similar manner, for Case-Only studies the true model is given by

$$\mathrm{logit}(P(E \mid Y = 1, g)) = \alpha_1 + \alpha_2 S_2 + \beta_{CO}\, G$$

and the mis-specified model is given by

$$\mathrm{logit}(P(E \mid Y = 1, g)) = \alpha^{*} + \beta^{*}_{CO}\, G,$$

therefore $bias_{CO} = \beta^{*}_{CO} - \beta_{CO}$. For the subgroup of controls, needed in further steps of the other estimation methods, the true model is given by

$$\mathrm{logit}(P(E \mid Y = 0, g)) = \alpha_1 + \alpha_2 S_2 + \beta_{CL}\, G$$

and the mis-specified model is given by

$$\mathrm{logit}(P(E \mid Y = 0, g)) = \alpha^{*} + \beta^{*}_{CL}\, G.$$
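The Case-Control bias defined above, $bias_{CC} = \beta^{*}_{CC} - \beta_{CC}$, can be illustrated in the large-sample limit without fitting any model. The sketch below is an assumption-laden illustration, not the simulation code of this study: it takes a logistic true model with a stratum-specific intercept $\mathrm{logit}(b_j)$, assumes that $G$ and $E$ are independent within each stratum and that the full cohort of cases and controls is observed, and uses the fact that for the saturated mis-specified model in $G$, $E$ and $G \times E$ the interaction coefficient equals the log of the cross-product ratio in cases minus the same quantity in controls. The function name and the parameter values are hypothetical.

```python
import math


def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


def pooled_interaction_bias(strata, beta_g, beta_e, beta_ge):
    """Large-sample bias of the G x E coefficient when strata are ignored.

    strata: list of (m, p, q, b) with stratum size, exposure prevalence,
    genotype frequency and absolute baseline risk (0 < b < 1).
    Returns beta*_CC - beta_CC, where beta_CC = beta_ge is the true value.
    """
    cells = [(g, e) for g in (0, 1) for e in (0, 1)]
    cases = {c: 0.0 for c in cells}
    controls = {c: 0.0 for c in cells}
    for m, p, q, b in strata:
        alpha = math.log(b / (1.0 - b))  # stratum-specific intercept
        for g, e in cells:
            # Expected number of subjects in cell (g, e); G and E are
            # assumed independent within a stratum.
            n = m * (q if g else 1.0 - q) * (p if e else 1.0 - p)
            risk = expit(alpha + beta_g * g + beta_e * e + beta_ge * g * e)
            cases[(g, e)] += n * risk
            controls[(g, e)] += n * (1.0 - risk)

    def log_cross_ratio(t):
        return math.log(t[(1, 1)] * t[(0, 0)] / (t[(1, 0)] * t[(0, 1)]))

    beta_star = log_cross_ratio(cases) - log_cross_ratio(controls)
    return beta_star - beta_ge


if __name__ == "__main__":
    strata = [(1000, 0.1, 0.1, 0.02), (1000, 0.3, 0.4, 0.10)]
    print(pooled_interaction_bias(strata, 0.0, 0.0, math.log(2.0)))
```

With a single stratum, or with identical strata, the pooled cross-product ratio reproduces the true interaction coefficient and the bias is zero up to rounding error.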

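For the first step of Murcray’s two-step procedure we have the following parameter in the true model,

$$\mathrm{logit}(P(E = 1 \mid g)) = \alpha_1 + \alpha_2 S_2 + \beta_{2st}\, G,$$

and in the mis-specified model,

$$\mathrm{logit}(P(E = 1 \mid g)) = \alpha^{*} + \beta^{*}_{2st}\, G,$$

with the following estimate of the bias: $bias_{2st} = \beta^{*}_{2st} - \beta_{2st}$.

Mukherjee’s Empirical Bayes type shrinkage estimator is defined as [5]

$$\beta_{MEB} = \frac{\sigma^{2}_{CC}}{\beta^{2}_{CL} + \sigma^{2}_{CC}}\, \beta_{CO} + \frac{\beta^{2}_{CL}}{\beta^{2}_{CL} + \sigma^{2}_{CC}}\, \beta_{CC}.$$

Therefore

$$\beta^{*}_{MEB} = \frac{\sigma^{*2}_{CC}}{\beta^{*2}_{CL} + \sigma^{*2}_{CC}}\, \beta^{*}_{CO} + \frac{\beta^{*2}_{CL}}{\beta^{*2}_{CL} + \sigma^{*2}_{CC}}\, \beta^{*}_{CC},$$

where $\sigma^{*2}_{CC}$ is the variance of the estimate from the mis-specified Case-Control model.

The bias is defined as $bias_{MEB} = \beta^{*}_{MEB} - \beta_{MEB}$.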
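The shrinkage idea behind an empirical Bayes estimator of this type can be sketched in a few lines. The sketch assumes the commonly cited weighted-average form, in which the Case-Only estimate receives the weight $\sigma^2_{CC} / (\beta^2_{CL} + \sigma^2_{CC})$ and the Case-Control estimate receives the remainder; the function name and the numeric inputs are hypothetical and serve only as an illustration.

```python
def meb_estimate(beta_cc, beta_co, beta_cl, var_cc):
    """Weighted average of the Case-Only and Case-Control estimates.

    beta_cl is the G-E log odds ratio among controls and var_cc the
    variance of the Case-Control estimate (assumed to be positive).
    """
    if var_cc <= 0.0:
        raise ValueError("var_cc must be positive")
    weight_co = var_cc / (beta_cl ** 2 + var_cc)
    return weight_co * beta_co + (1.0 - weight_co) * beta_cc


def meb_bias(misspecified, true):
    """bias_MEB = beta*_MEB - beta_MEB; both arguments are 4-tuples
    (beta_cc, beta_co, beta_cl, var_cc)."""
    return meb_estimate(*misspecified) - meb_estimate(*true)


if __name__ == "__main__":
    # Hypothetical inputs: estimates from a mis-specified and a true model.
    print(meb_bias((0.9, 1.3, 0.4, 0.05), (0.7, 0.7, 0.0, 0.05)))
```

When the G-E association among controls is zero the estimator coincides with the Case-Only estimate, and as that association grows the estimator moves toward the Case-Control estimate.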

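For Lewinger’s Hierarchical Bayes estimator we have $bias_{LHB} = \beta^{*}_{LHB} - \beta_{LHB}$, where $\beta^{*}_{LHB} = \beta^{*}_{CO}\, E^{*}_{post} \times \mathrm{sign}(\cdot)$ [the argument of the sign function is illegible in the source] and $\beta_{LHB}$ is defined analogously through $\beta_{CO}$.

Here $E^{*}_{post}$ denotes the estimates of the posterior probabilities from the mis-specified hierarchical model.

We compared the five methods of GxE interaction estimation described above in terms of the magnitude of the bias produced in the presence of population stratification.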

Simulation Settings

We wrote program code in R to create a stratified sample that is an admixture of two subpopulations. We ran our simulations under 8 scenarios representing an extreme level of admixture, a moderate level of admixture and a matched case-control sample. We generated data sets consisting of 1000 cases and an equal number of controls sampled from the admixed population with different proportions from each subpopulation.

In all simulations we created genotype data at 1000 random SNPs. We simulated four different types of SNPs: one SNP with a GxE interaction effect that is associated with the trait, 100 SNPs that have a G-E association but are not related to the disease of interest, and 100 SNPs that we called differentiated SNPs, following the notation in [11]. We called the rest of the SNPs dummy SNPs because they were set to be unrelated to the disease. Hardy-Weinberg equilibrium was assumed for each subpopulation. In each stratum, allele frequencies at biallelic marker loci for dummy SNPs and SNPs with G-E association were generated independently from a beta distribution following the Balding-Nichols model, as in Devlin and Roeder (1999) [12], with two parameters $p(1 - F_{st})/F_{st}$ and $(1 - p)(1 - F_{st})/F_{st}$, with $F_{st} = 0.01$ and $p$ the ancestral population allele frequency drawn from the uniform[0.1, 0.9] distribution.

$F_{st}$ is Wright’s fixation index, a measure of genetic divergence among subgroups [13]; $F_{st} = 0.01$ is a typical value for European populations. For the first category of SNPs, the interacting SNP, the allele frequency in the low-risk subpopulation was fixed at 0.1 and varied in the high-risk subpopulation from 0.4 to 0.8. For the second category of SNPs, differentiated SNPs with no association to the disease, we assumed large variation in allele frequency by setting the corresponding values [quantity illegible in the source] equal to 0.04. For the third category of SNPs, SNPs with G-E association, we assumed a moderate level of association, and therefore log(OR) for those SNPs was drawn from a [parameters illegible in the source] distribution. For the simulation of the causal SNP we assumed a multiplicative trait model and fixed the relative risk at 2 for the causal allele. Exposure frequencies and baseline disease risks were fixed for all scenarios. In the low-risk ethnicity we set the prevalence of the environment to 0.1 and the background disease risk to 2%. In the high-risk ethnicity the corresponding values were set to 0.3 or 0.5 and to 10% or 5%, respectively.
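The allele-frequency step for the dummy SNPs can be sketched as follows. This is a minimal illustration in Python rather than the original R code, and the function names are hypothetical; it draws an ancestral frequency $p$ from uniform[0.1, 0.9], draws each subpopulation frequency from a beta distribution with parameters $p(1-F_{st})/F_{st}$ and $(1-p)(1-F_{st})/F_{st}$, and samples genotypes under Hardy-Weinberg equilibrium as two independent allele draws.

```python
import random


def balding_nichols_freqs(n_snps, n_pops=2, fst=0.01, seed=0):
    """Return a list of per-SNP lists of subpopulation allele frequencies."""
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_snps):
        p = rng.uniform(0.1, 0.9)            # ancestral allele frequency
        a = p * (1.0 - fst) / fst            # first beta parameter
        b = (1.0 - p) * (1.0 - fst) / fst    # second beta parameter
        freqs.append([rng.betavariate(a, b) for _ in range(n_pops)])
    return freqs


def hwe_genotype(freq, rng):
    """Number of copies (0, 1 or 2) of the allele under Hardy-Weinberg
    equilibrium: two independent draws with probability freq."""
    return int(rng.random() < freq) + int(rng.random() < freq)


if __name__ == "__main__":
    rng = random.Random(42)
    freqs = balding_nichols_freqs(5)
    for f_low, f_high in freqs:
        print(round(f_low, 3), round(f_high, 3), hwe_genotype(f_low, rng))
```

With $F_{st} = 0.01$ the two beta parameters sum to 99, so the subpopulation frequencies stay close to the ancestral value, which corresponds to closely related subpopulations.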

For our study we considered a cohort with two underlying discrete subpopulations that could be represented among cases and controls in different proportions. We chose the case-control sampling fractions at three levels representing extreme, moderate and matched differences between cases and controls in the “low-risk” and “high-risk” ethnicities. For the extreme scenario we used a sampling ratio of 0.2, representing the proportion of cases from the low-risk ethnicity. This means that we sampled 20% of the cases from the low-risk ethnicity and 80% of the cases from the “high-risk” ethnicity, and the opposite proportions for the controls. For the moderate scenario the sampling ratio was set to 0.4 and for the matched design to 0.5. Note that no confounding is expected in the matched Case-Control design.

Table 3 summarizes the settings for each of the 8 simulated scenarios.

We replicated each scenario 1000 times and obtained the distributions of the bias and of the mean squared error, $MSE = bias^2 + var$, along with their average values for the five methods.
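The relation between bias, variance and mean squared error over replicates can be checked with a short sketch. It assumes that the replicate estimates are compared against a single reference value and that the variance is the population variance of the replicates; under these assumptions $bias^2 + var$ equals the average squared deviation from the reference value. The function name and the numbers are hypothetical.

```python
from statistics import fmean, pvariance


def bias_and_mse(estimates, reference):
    """Return (bias, MSE) of replicate estimates against a reference value,
    with MSE computed as bias**2 + variance."""
    bias = fmean(estimates) - reference
    variance = pvariance(estimates)  # population variance of the replicates
    return bias, bias ** 2 + variance


if __name__ == "__main__":
    replicates = [0.1, 0.3, 0.2, 0.4]  # hypothetical replicate estimates
    print(bias_and_mse(replicates, 0.2))
```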

Table 3

| Scenario* | ra | p01 | p02 | pe1 | pe2 | pg1 | pg2 |
|---|---|---|---|---|---|---|---|
| 1 | 0.5 | 0.02 | 0.1 | 0.1 | 0.3 | 0.1 | 0.4 |
| 2 | 0.5 | 0.02 | 0.1 | 0.1 | 0.3 | 0.1 | 0.8 |
| 3 | 0.4 | 0.02 | 0.1 | 0.1 | 0.3 | 0.1 | 0.4 |
| 4 | 0.4 | 0.02 | 0.1 | 0.1 | 0.3 | 0.1 | 0.8 |
| 5 | 0.4 | 0.02 | 0.05 | 0.1 | 0.5 | 0.1 | 0.4 |
| 6 | 0.2 | 0.02 | 0.1 | 0.1 | 0.3 | 0.1 | 0.8 |
| 7 | 0.2 | 0.02 | 0.1 | 0.1 | 0.3 | 0.1 | 0.4 |
| 8 | 0.2 | 0.02 | 0.05 | 0.1 | 0.5 | 0.1 | 0.8 |

* Scenarios employed in analysis

ra sampling ratio: the proportion of cases sampled from the “low-risk” ethnicity; (1-ra) is the proportion of controls sampled from it, and the proportions are reversed for the “at-risk” ethnicity

p01 baseline disease risk in “low-risk” ethnicity

p02 baseline disease risk in “at-risk” ethnicity

pe1 prevalence of environmental exposure in “low-risk” ethnicity

pe2 prevalence of environmental exposure in “at-risk” ethnicity

pg1 susceptible genotype frequency in “low-risk” ethnicity

pg2 susceptible genotype frequency in “at-risk” ethnicity

Results

Table 4 and Table 5 summarize the results of our study. The classical Case-Control estimator of GxE interaction, the recently introduced LHB estimator and the EB shrinkage estimator tend to outperform the others, producing smaller bias in all of the scenarios.

For moderate admixture or the matched Case-Control design the population stratification bias of the Case-Control estimator is negligible; however, that is not the case for the Case-Only estimator or the two-step approach. One should note that for the matched Case-Control design including two admixed subpopulations the Case-Control estimator tends to give better results than any other estimator, including the LHB estimator, which shows bigger bias. An explanation can be found by considering the main idea behind both methods. The Case-Control method estimates the parameter by comparing odds in cases and controls and therefore tends to overcome the lack of homogeneity in both groups. It does not take other SNPs into account and analyzes each SNP separately. The LHB method, however, takes all other SNPs into consideration and may therefore have difficulty producing the right estimate when there is hidden substructure in the other SNPs, such as G-E association or the presence of differentiated SNPs whose frequencies vary significantly across subpopulations.

It is known that the bias of the interaction term is always bigger than the bias of the estimated main effects [8]. However, under most conditions the bias of the interaction term is small: the absolute value of the bias is less than 0.8, or even smaller, in the situations tested. Note that the bias and MSE of the LHB and EB approaches are generally the smallest compared to the other estimation methods. Therefore we suggest using the LHB method to estimate GxE interaction effects in samples that may have hidden substructure such as population stratification.

Table 4

Bias as observed difference of the estimates in two logistic regression models

| Scenario* | Bias CC | Bias CO | Bias 2st | Bias MEB | Bias LHB |
|---|---|---|---|---|---|
| 1 | 0.124 | 0.418 | 0.464 | 0.100 | 0.183 |
| 2 | 0.351 | 0.234 | 0.487 | 0.260 | 0.249 |
| 3 | 0.361 | 0.636 | 0.937 | 0.181 | 0.192 |
| 4 | 0.101 | 0.905 | 0.860 | 0.247 | 0.267 |
| 5 | 0.009 | 0.489 | 0.453 | 0.083 | 0.176 |
| 6 | 0.015 | 0.969 | 0.818 | 0.300 | 0.332 |
| 7 | 0.179 | 0.659 | 0.732 | 0.157 | 0.178 |
| 8 | 0.518 | 0.383 | 0.746 | 0.418 | 0.301 |

Table 5

MSE ($MSE = bias^2 + var$)

| Scenario* | MSE CC | MSE CO | MSE 2st | MSE MEB | MSE LHB |
|---|---|---|---|---|---|
| 1 | 0.071 | 0.201 | 0.230 | 0.073 | 0.087 |
| 2 | 0.208 | 0.076 | 0.252 | 0.148 | 0.134 |
| 3 | 0.210 | 0.441 | 0.903 | 0.117 | 0.106 |
| 4 | 0.066 | 0.861 | 0.764 | 0.148 | 0.159 |
| 5 | 0.053 | 0.269 | 0.221 | 0.067 | 0.089 |
| 6 | 0.055 | 0.986 | 0.695 | 0.184 | 0.208 |
| 7 | 0.078 | 0.460 | 0.549 | 0.082 | 0.082 |
| 8 | 0.337 | 0.166 | 0.571 | 0.246 | 0.159 |

* Scenarios employed in analysis (see Table 3 for specification)

CC - Case-Control estimator

CO - Case-Only estimator

2st - Murcray’s two-step approach estimator

MEB - Mukherjee’s Empirical Bayes type shrinkage estimator

LHB - Lewinger’s Hierarchical Bayes type estimator

4 Discussion

A fairly large body of research has studied the bias in main effects [6, 14, 7, 10, 15] as well as the bias of GxE interaction [8, 1] due to population stratification. We were interested in deriving a formula to measure the magnitude of the population stratification bias in Case-Control studies. The mathematical formula for the derived measure CIRcc is in close agreement with the CIR introduced by Wang and Lee in 2008 for the Case-Only design. However, the degree of bias differs considerably between the Case-Control and the Case-Only estimator. Our simulation results agree with the findings published in [8]. The magnitude of the population stratification bias is almost always smaller for the Case-Control estimator of GxE interaction than for the Case-Only estimator.

We also studied the robustness of five common methods of GxE interaction effect estimation to find out whether there is a method that outperforms the others under different types of experimental conditions. Based on the results of the second part of our study of population stratification bias, we conclude that the classical Case-Control estimator, the recently introduced Lewinger’s Hierarchical Bayes estimator of GxE interaction or the Empirical Bayes type shrinkage estimator should be preferred when there are reasons to think that the study cohort may include hidden substructure. It was shown in [8] and in our current simulation study that the bias of the GxE interaction effect due to stratification is usually small. However, it can still reach extreme values in real situations. Therefore researchers should always take the possible negative effect of population stratification into account and try to adjust for the substructure by matching cases and controls, as proposed in [15], or for example by performing principal component analysis (PCA) [16] or using other available methods.

5 Conclusions

We introduced a formula to measure population stratification bias in Case-Control studies of GxE interaction. The results of our simulation study of population stratification bias agreed with the results published by Wacholder et al. [6] for main effects. We compared our results with those of Wang et al. [8] and their study of bias in Case-Control studies.

We showed that in Case-Control studies the bias of the GxE interaction parameter estimate is generally no more than 2-3%, except for some extreme situations that sometimes cannot be easily avoided. Researchers should be aware that the bias can rise to 30-40% in Case-Control studies and to over 50% in Case-Only studies.

It was shown that the Case-Control estimator of GxE interaction is significantly more robust to the presence of population stratification than the Case-Only estimator.

Wang et al. (2008) mentioned that in Case-Control studies individuals with the disease and disease-free subjects have the same structure, and therefore the biases in GxE interaction in these two groups cancel each other out. We found the same effect in our study and conclude that a matched Case-Control design helps avoid bias in the estimate of GxE interaction. We investigated the magnitude of bias in both matched and unmatched Case-Control designs. For the unmatched Case-Control design, cases and controls were sampled in different proportions.
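This cancellation can be read off the closed form for CIRcc given at the end of Appendix A. As a sketch only, and assuming that "same structure" means the weighted correlation and coefficients of variation of the exposure and genotype odds coincide in cases (subscript 1) and controls (subscript 0):

```latex
\[
CIR_{CC}
  = \frac{r_{EG1}\, CV_{E1}\, CV_{G1} + 1}{r_{EG0}\, CV_{E0}\, CV_{G0} + 1}
  = 1
  \quad \text{whenever} \quad
  r_{EG1}\, CV_{E1}\, CV_{G1} = r_{EG0}\, CV_{E0}\, CV_{G0}.
\]
```

Under that assumption the case term and the control term are equal, so their ratio carries no stratification bias; CIRcc departs from 1 only when the two terms differ.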

Lewinger's Hierarchical Bayes (LHB) estimator of GxE interaction was compared with four other estimators (CC, CO, 2ST and MEB). In all scenarios the CC, MEB and LHB estimators outperform CO and 2ST in terms of the degree of bias. In unmatched designs the LHB method produces smaller bias and mean squared error (MSE) than the Case-Control estimator, and in matched designs its bias and MSE are comparable to those of the MEB method. However, in the matched Case-Control design the classical CC estimator of GxE interaction outperforms the other methods, because it shows no bias in its estimates. Therefore we conclude that the LHB estimator should be preferred for estimating the GxE interaction effect in unmatched samples suspected to have unobserved population stratification.

Appendix A

Derivation of $CIR_{CC}$

Assume $Y$ is a binary phenotype variable, such that $Y = 1$ means the individual has the disease and $Y = 0$ means the individual does not suffer from the disease. Let $E = 1$ / $E = 0$ indicate the presence/absence of the environmental exposure, and $G = 1$ / $G = 0$ the presence/absence of the genotype. The data can be represented in a 2x4 contingency table

| Status | G=1, E=1 | G=1, E=0 | G=0, E=1 | G=0, E=0 |
|---|---|---|---|---|
| Y=1 | $n_{111}$ | $n_{110}$ | $n_{101}$ | $n_{100}$ |
| Y=0 | $n_{011}$ | $n_{010}$ | $n_{001}$ | $n_{000}$ |

where $Y = 1$ / $Y = 0$ is an abbreviation for cases and controls, $E = 1$ / $E = 0$ for exposed/unexposed subjects, and $G = 1$ / $G = 0$ for carriers/non-carriers of the susceptible genotype.

We can consider the observed cell counts for cases $n_1 = (n_{100}, n_{101}, n_{110}, n_{111})$ and controls $n_0 = (n_{000}, n_{001}, n_{010}, n_{011})$ as multinomial distributions $n_1 \sim MN(N_{cases}, p_1)$ and $n_0 \sim MN(N_{controls}, p_0)$, where $p_1$ and $p_0$ are the cell probabilities of the underlying case/control population. For Case-Control studies relative risks are not readily available, therefore we calculate odds ratios (OR) as a good approximation of the risks.

Therefore the GxE interaction effect can be estimated by $\beta_{G \times E}$ given in the following form,

$$e^{\beta_{G \times E}} = \frac{OR_{EG}}{OR_G \, OR_E}, \qquad OR_G = \frac{p_{000}\, p_{110}}{p_{010}\, p_{100}}, \qquad OR_E = \frac{p_{000}\, p_{101}}{p_{001}\, p_{100}}, \qquad OR_{EG} = \frac{p_{000}\, p_{111}}{p_{011}\, p_{100}}.$$

From here on unexposed non-carriers are used as the reference group. We can rewrite the ratio in the following manner,

$$\frac{OR_{EG}}{OR_G \, OR_E} = \frac{p_{000}\, p_{111}\, p_{010}\, p_{100}\, p_{001}\, p_{100}}{p_{011}\, p_{100}\, p_{000}\, p_{110}\, p_{000}\, p_{101}} = \frac{p_{111}\, p_{010}\, p_{100}\, p_{001}}{p_{011}\, p_{110}\, p_{000}\, p_{101}}.$$

Let

$$r_1 = \frac{p_{111}\, p_{100}}{p_{110}\, p_{101}}, \qquad r_0 = \frac{p_{011}\, p_{000}}{p_{010}\, p_{001}}, \qquad \text{so that} \qquad \frac{OR_{EG}}{OR_E \, OR_G} = \frac{r_1}{r_0}.$$
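The ratio $r_1 / r_0$ above can be checked numerically. The following is a minimal Python sketch that uses observed counts in place of the cell probabilities; the function names and all counts are hypothetical and serve only to illustrate that the case/control form agrees with $OR_{EG} / (OR_G \, OR_E)$.

```python
def case_control_interaction_or(cases, controls):
    """Return (r1, r0, r1 / r0) from G-by-E counts keyed by (G, E)."""
    r1 = (cases[(1, 1)] * cases[(0, 0)]) / (cases[(1, 0)] * cases[(0, 1)])
    r0 = (controls[(1, 1)] * controls[(0, 0)]) / (
        controls[(1, 0)] * controls[(0, 1)]
    )
    return r1, r0, r1 / r0


def odds_ratio(cases, controls, cell):
    """Odds ratio of one (G, E) cell against the (0, 0) reference group."""
    ref = (0, 0)
    return (cases[cell] * controls[ref]) / (controls[cell] * cases[ref])


# Hypothetical counts, for illustration only (not data from the study).
cases = {(1, 1): 40, (1, 0): 60, (0, 1): 80, (0, 0): 120}
controls = {(1, 1): 20, (1, 0): 50, (0, 1): 70, (0, 0): 160}

r1, r0, psi = case_control_interaction_or(cases, controls)

# The same quantity computed as OR_EG / (OR_G * OR_E).
psi_check = odds_ratio(cases, controls, (1, 1)) / (
    odds_ratio(cases, controls, (1, 0)) * odds_ratio(cases, controls, (0, 1))
)
print(psi, psi_check)
```

Here $r_1$ is the G-E odds ratio among cases (the Case-Only quantity) and $r_0$ the same odds ratio among controls, which is why the Case-Control estimator divides one by the other.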

From here on the superscript $C$ in the upper right corner means confounding.

Let $R^C_{1G}$ and $R^C_{0G}$ be the risks of the carriers and of the non-carriers of the susceptible genotype, so that $R^C_G = R^C_{1G} / R^C_{0G}$.

Let $R^C_{1E}$ and $R^C_{0E}$ be the risks of the exposed and of the non-exposed individuals, so that $R^C_E = R^C_{1E} / R^C_{0E}$.

Let $R^C_{1GE}$ be the risk of the carriers of the susceptible genotype exposed to the environment and $R^C_{0GE}$ the risk of the non-exposed non-carriers of the susceptible genotype, so that $R^C_{GE} = R^C_{1GE} / R^C_{0GE}$.

Therefore the confounding interaction can be defined in the following form:

$$R^C_{int} = \frac{R^C_{GE}}{R^C_G \, R^C_E} = \frac{R^C_{1GE} / R^C_{0GE}}{\left( R^C_{1G} / R^C_{0G} \right) \left( R^C_{1E} / R^C_{0E} \right)}.$$

Each of these risks is a weighted sum over the strata $j$, which yields $CIR_{CC}$ as a ratio of stratum sums [the explicit sums are not recoverable from the source].

Let

$$\bar{\varphi}^0_E = \sum_j w_{j0}\, e_j, \qquad \bar{\varphi}^1_E = \sum_j w_{j1}\, e_j, \qquad \bar{\varphi}^0_G = \sum_j w_{j0}\, g_j, \qquad \bar{\varphi}^1_G = \sum_j w_{j1}\, g_j$$

denote means, where subscript 1 specifies cases and subscript 0 specifies controls, and let the SDs be the corresponding standard deviations for cases and controls.

If we define separate "weight" functions for cases and controls, $w_{j0}$ being the "weight" for controls and $w_{j1}$ the "weight" for cases [the explicit expressions are not recoverable from the source], then

$$CIR_{CC} = \frac{r_{EG1}\, CV_{E1}\, CV_{G1} + 1}{r_{EG0}\, CV_{E0}\, CV_{G0} + 1},$$

where $CV_E$ and $CV_G$ are the coefficients of variation of the exposure prevalence odds and the genotype frequency odds, respectively, in cases (subscript 1) and in controls (subscript 0), and $r_{EG}$ is the correlation coefficient between the exposure prevalence odds and the genotype frequency odds.
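The closing formula rests on a general identity for weighted moments: for weights that sum to one, $\sum_j w_j e_j g_j / (\sum_j w_j e_j \sum_j w_j g_j) = r_{EG}\, CV_E\, CV_G + 1$. A minimal Python sketch of that identity follows. It is an illustration under stated assumptions, not the authors' code: the function names and all numbers are hypothetical, the weights are taken as given inputs, and the same stratum odds $e_j$, $g_j$ are assumed to apply to cases and controls, with only the weights differing.

```python
from math import sqrt


def _weighted_moments(weights, e_odds, g_odds):
    """Weighted means, cross-moment and variances with normalized weights."""
    total = sum(weights)
    w = [x / total for x in weights]  # normalize so the weights sum to 1
    mean_e = sum(wi * e for wi, e in zip(w, e_odds))
    mean_g = sum(wi * g for wi, g in zip(w, g_odds))
    mean_eg = sum(wi * e * g for wi, e, g in zip(w, e_odds, g_odds))
    var_e = sum(wi * (e - mean_e) ** 2 for wi, e in zip(w, e_odds))
    var_g = sum(wi * (g - mean_g) ** 2 for wi, g in zip(w, g_odds))
    return mean_e, mean_g, mean_eg, var_e, var_g


def cir_direct(weights, e_odds, g_odds):
    """One-group term as the ratio E[eg] / (E[e] E[g])."""
    mean_e, mean_g, mean_eg, _, _ = _weighted_moments(weights, e_odds, g_odds)
    return mean_eg / (mean_e * mean_g)


def cir_from_cv(weights, e_odds, g_odds):
    """The same term written as r * CV_E * CV_G + 1."""
    mean_e, mean_g, mean_eg, var_e, var_g = _weighted_moments(
        weights, e_odds, g_odds
    )
    if var_e == 0 or var_g == 0:
        return 1.0  # no variation across strata, so the product term is zero
    cov = mean_eg - mean_e * mean_g
    r = cov / sqrt(var_e * var_g)
    cv_e = sqrt(var_e) / mean_e
    cv_g = sqrt(var_g) / mean_g
    return r * cv_e * cv_g + 1


def cir_cc(w_cases, w_controls, e_odds, g_odds):
    """Ratio of the case term to the control term."""
    return cir_from_cv(w_cases, e_odds, g_odds) / cir_from_cv(
        w_controls, e_odds, g_odds
    )


# Hypothetical three-stratum example (values chosen for illustration only).
e_odds = [0.05, 0.2, 0.5]
g_odds = [0.1, 0.3, 0.6]
w_cases = [0.2, 0.3, 0.5]
w_controls = [0.4, 0.35, 0.25]

print(cir_direct(w_cases, e_odds, g_odds), cir_from_cv(w_cases, e_odds, g_odds))
print(cir_cc(w_cases, w_controls, e_odds, g_odds))
```

When the case and control weights coincide, `cir_cc` returns 1, which is the cancellation described in the Conclusions.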

Appendix B

Supplementary Tables

Table 6

Confounding Interaction Risk Ratios ($CIR_{CC}$) for the Case-Control design, calculated for different values of risk ratio, exposure and allele frequencies. The first three columns give the parameters*; the remaining columns summarize $CIR_{CC}$ over 100 000 simulations of random permutation for all possible combinations of 3 values out of 8**.

| Risk ratio | Exposure frequencies | Genotype frequencies | Most extreme negative, min§ | 25th percentile | 50th percentile | 75th percentile | Most extreme positive, max§§ |
|---|---|---|---|---|---|---|---|
| 1.1-1.5 | 0.01-0.3 | 0.01-0.3 | 0.86 | 0.98 | 0.99 | 1.01 | 1.17 |
| 1.1-1.5 | 0.01-0.3 | 0.10-0.4 | 0.91 | 0.99 | 1.00 | 1.01 | 1.10 |
| 1.1-1.5 | 0.01-0.3 | 0.30-0.6 | 0.93 | 0.99 | 1.00 | 1.00 | 1.07 |
| 1.1-1.5 | 0.10-0.4 | 0.01-0.3 | 0.92 | 0.99 | 1.00 | 1.01 | 1.10 |
| 1.1-1.5 | 0.10-0.4 | 0.10-0.4 | 0.95 | 0.99 | 1.00 | 1.00 | 1.06 |
| 1.1-1.5 | 0.10-0.4 | 0.30-0.6 | 0.96 | 1.00 | 1.00 | 1.00 | 1.04 |
| 1.1-1.5 | 0.30-0.6 | 0.01-0.3 | 0.94 | 0.99 | 1.00 | 1.01 | 1.07 |
| 1.1-1.5 | 0.30-0.6 | 0.10-0.4 | 0.96 | 1.00 | 1.00 | 1.00 | 1.04 |
| 1.1-1.5 | 0.30-0.6 | 0.30-0.6 | 0.97 | 1.00 | 1.00 | 1.00 | 1.03 |
| 1.1-3.0 | 0.01-0.3 | 0.01-0.3 | 0.62 | 0.92 | 0.98 | 1.05 | 1.60 |
| 1.1-3.0 | 0.01-0.3 | 0.10-0.4 | 0.75 | 0.95 | 0.98 | 1.02 | 1.32 |
| 1.1-3.0 | 0.01-0.3 | 0.30-0.6 | 0.80 | 0.97 | 0.99 | 1.01 | 1.20 |
| 1.1-3.0 | 0.10-0.4 | 0.01-0.3 | 0.77 | 0.97 | 0.99 | 1.03 | 1.37 |
| 1.1-3.0 | 0.10-0.4 | 0.10-0.4 | 0.85 | 0.98 | 1.00 | 1.01 | 1.20 |
| 1.1-3.0 | 0.10-0.4 | 0.30-0.6 | 0.88 | 0.98 | 1.00 | 1.01 | 1.14 |
| 1.1-3.0 | 0.30-0.6 | 0.01-0.3 | 0.82 | 0.98 | 1.00 | 1.02 | 1.22 |
| 1.1-3.0 | 0.30-0.6 | 0.10-0.4 | 0.87 | 0.99 | 1.00 | 1.01 | 1.15 |
| 1.1-3.0 | 0.30-0.6 | 0.30-0.6 | 0.91 | 0.99 | 1.00 | 1.01 | 1.11 |

* Risk ratio, genotype frequency and exposure frequency ranges are spaced to be equidistant on the logarithmic scale.
** The study cohort consists of three discrete, admixed populations.
§ The strongest negative association between GxE interaction and disease.
§§ The strongest positive association between GxE interaction and disease.
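The footnotes above state that each parameter range is sampled at values equidistant on the logarithmic scale and that 3 values out of 8 are combined for the three subpopulations. A minimal Python sketch of such a grid follows. The helper name `log_spaced` is hypothetical, and how the authors actually permuted the values across strata is not stated here, so the enumeration is only one possible reading.

```python
from itertools import combinations
from math import exp, log


def log_spaced(low, high, n=8):
    """Return n values from low to high, equidistant on the log scale."""
    step = (log(high) - log(low)) / (n - 1)
    return [exp(log(low) + i * step) for i in range(n)]


# Exposure-frequency range taken from the first table row (0.01-0.3).
exposure_grid = log_spaced(0.01, 0.3)

# All ways to pick 3 of the 8 values, one per subpopulation (assumed reading).
triples = list(combinations(exposure_grid, 3))
print(len(triples))
```

Consecutive grid values share a constant ratio, which is what "equidistant on the logarithmic scale" amounts to.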

Table 7

Confounding Interaction Risk Ratios ($CIR_{CC}$) for the Case-Control design, calculated for different values of risk ratio, exposure and allele frequencies. The first three columns give the parameters*; the remaining columns summarize $CIR_{CC}$ over 100 000 simulations of random permutation for all possible combinations of 5 values out of 8**.

| Risk ratio | Exposure frequencies | Genotype frequencies | Most extreme negative, min§ | 25th percentile | 50th percentile | 75th percentile | Most extreme positive, max§§ |
|---|---|---|---|---|---|---|---|
| 1.1-1.5 | 0.01-0.3 | 0.01-0.3 | 0.85 | 0.97 | 0.99 | 1.02 | 1.17 |
| 1.1-1.5 | 0.01-0.3 | 0.10-0.4 | 0.91 | 0.98 | 0.99 | 1.01 | 1.08 |
| 1.1-1.5 | 0.01-0.3 | 0.30-0.6 | 0.94 | 0.99 | 1.00 | 1.01 | 1.05 |
| 1.1-1.5 | 0.10-0.4 | 0.01-0.3 | 0.93 | 0.99 | 1.00 | 1.01 | 1.09 |
| 1.1-1.5 | 0.10-0.4 | 0.10-0.4 | 0.95 | 0.99 | 1.00 | 1.01 | 1.05 |
| 1.1-1.5 | 0.10-0.4 | 0.30-0.6 | 0.97 | 0.99 | 1.00 | 1.00 | 1.04 |
| 1.1-1.5 | 0.30-0.6 | 0.01-0.3 | 0.94 | 0.99 | 1.00 | 1.01 | 1.06 |
| 1.1-1.5 | 0.30-0.6 | 0.10-0.4 | 0.96 | 0.99 | 1.00 | 1.01 | 1.04 |
| 1.1-1.5 | 0.30-0.6 | 0.30-0.6 | 0.97 | 1.00 | 1.00 | 1.00 | 1.03 |
| 1.1-3.0 | 0.01-0.3 | 0.01-0.3 | 0.63 | 0.91 | 0.97 | 1.06 | 1.68 |
| 1.1-3.0 | 0.01-0.3 | 0.10-0.4 | 0.75 | 0.94 | 0.98 | 1.03 | 1.28 |
| 1.1-3.0 | 0.01-0.3 | 0.30-0.6 | 0.81 | 0.96 | 0.98 | 1.02 | 1.23 |
| 1.1-3.0 | 0.10-0.4 | 0.01-0.3 | 0.77 | 0.96 | 1.00 | 1.04 | 1.30 |
| 1.1-3.0 | 0.10-0.4 | 0.10-0.4 | 0.85 | 0.97 | 1.00 | 1.02 | 1.17 |
| 1.1-3.0 | 0.10-0.4 | 0.30-0.6 | 0.89 | 0.98 | 1.00 | 1.02 | 1.13 |
| 1.1-3.0 | 0.30-0.6 | 0.01-0.3 | 0.83 | 0.97 | 1.00 | 1.03 | 1.24 |
| 1.1-3.0 | 0.30-0.6 | 0.10-0.4 | 0.90 | 0.98 | 1.00 | 1.02 | 1.10 |
| 1.1-3.0 | 0.30-0.6 | 0.30-0.6 | 0.91 | 0.99 | 1.00 | 1.01 | 1.08 |

* Risk ratio, genotype frequency and exposure frequency ranges are spaced to be equidistant on the logarithmic scale.
** The study cohort consists of five discrete, admixed populations.
§ The strongest negative association between GxE interaction and disease.
§§ The strongest positive association between GxE interaction and disease.

Table 8

Confounding Interaction Risk Ratios ($CIR_{CO}$) for the Case-Only design, calculated for different values of risk ratio, exposure and allele frequencies. The first three columns give the parameters*; the remaining columns summarize $CIR_{CO}$ over 100 000 simulations of random permutation for all possible combinations of 3 values out of 8**.

| Risk ratio | Exposure frequencies | Genotype frequencies | Most extreme negative, min§ | 25th percentile | 50th percentile | 75th percentile | Most extreme positive, max§§ |
|---|---|---|---|---|---|---|---|
| 1.1-1.5 | 0.01-0.3 | 0.01-0.3 | 0.12 | 0.62 | 0.91 | 1.41 | 4.36 |
| 1.1-1.5 | 0.01-0.3 | 0.10-0.4 | 0.32 | 0.79 | 0.97 | 1.21 | 2.78 |
| 1.1-1.5 | 0.01-0.3 | 0.30-0.6 | 0.46 | 0.86 | 0.98 | 1.16 | 2.20 |
| 1.1-1.5 | 0.10-0.4 | 0.01-0.3 | 0.32 | 0.79 | 0.97 | 1.21 | 2.80 |
| 1.1-1.5 | 0.10-0.4 | 0.10-0.4 | 0.55 | 0.89 | 0.99 | 1.12 | 1.89 |
| 1.1-1.5 | 0.10-0.4 | 0.30-0.6 | 0.65 | 0.92 | 0.99 | 1.08 | 1.58 |
| 1.1-1.5 | 0.30-0.6 | 0.01-0.3 | 0.46 | 0.86 | 0.99 | 1.16 | 2.21 |
| 1.1-1.5 | 0.30-0.6 | 0.10-0.4 | 0.64 | 0.92 | 1.00 | 1.08 | 1.59 |
| 1.1-1.5 | 0.30-0.6 | 0.30-0.6 | 0.74 | 0.94 | 1.00 | 1.06 | 1.39 |
| 1.1-3.0 | 0.01-0.3 | 0.01-0.3 | 0.12 | 0.62 | 0.90 | 1.37 | 6.31 |
| 1.1-3.0 | 0.01-0.3 | 0.10-0.4 | 0.32 | 0.79 | 0.96 | 1.20 | 3.05 |
| 1.1-3.0 | 0.01-0.3 | 0.30-0.6 | 0.44 | 0.86 | 0.98 | 1.14 | 2.22 |
| 1.1-3.0 | 0.10-0.4 | 0.01-0.3 | 0.30 | 0.80 | 0.97 | 1.20 | 3.16 |
| 1.1-3.0 | 0.10-0.4 | 0.10-0.4 | 0.55 | 0.89 | 0.99 | 1.11 | 1.88 |
| 1.1-3.0 | 0.10-0.4 | 0.30-0.6 | 0.64 | 0.92 | 0.99 | 1.08 | 1.62 |
| 1.1-3.0 | 0.30-0.6 | 0.01-0.3 | 0.49 | 0.86 | 0.99 | 1.15 | 2.29 |
| 1.1-3.0 | 0.30-0.6 | 0.10-0.4 | 0.64 | 0.92 | 0.99 | 1.08 | 1.60 |
| 1.1-3.0 | 0.30-0.6 | 0.30-0.6 | 0.73 | 0.94 | 1.00 | 1.06 | 1.40 |

* Risk ratio, genotype frequency and exposure frequency ranges are spaced to be equidistant on the logarithmic scale.
** The study cohort consists of three discrete, admixed populations.
§ The strongest negative association between GxE interaction and disease.
§§ The strongest positive association between GxE interaction and disease.

Table 9

Confounding Interaction Risk Ratios ($CIR_{CO}$) for the Case-Only design, calculated for different values of risk ratio, exposure and allele frequencies. The first three columns give the parameters*; the remaining columns summarize $CIR_{CO}$ over 100 000 simulations of random permutation for all possible combinations of 5 values out of 8**.

| Risk ratio | Exposure frequencies | Genotype frequencies | Most extreme negative, min§ | 25th percentile | 50th percentile | 75th percentile | Most extreme positive, max§§ |
|---|---|---|---|---|---|---|---|
| 1.1-1.5 | 0.01-0.3 | 0.01-0.3 | 0.18 | 0.61 | 0.90 | 1.36 | 4.60 |
| 1.1-1.5 | 0.01-0.3 | 0.10-0.4 | 0.42 | 0.79 | 0.97 | 1.21 | 2.36 |
| 1.1-1.5 | 0.01-0.3 | 0.30-0.6 | 0.55 | 0.86 | 0.98 | 1.15 | 1.86 |
| 1.1-1.5 | 0.10-0.4 | 0.01-0.3 | 0.42 | 0.80 | 0.97 | 1.21 | 2.41 |
| 1.1-1.5 | 0.10-0.4 | 0.10-0.4 | 0.63 | 0.89 | 0.99 | 1.11 | 1.61 |
| 1.1-1.5 | 0.10-0.4 | 0.30-0.6 | 0.72 | 0.92 | 1.00 | 1.08 | 1.41 |
| 1.1-1.5 | 0.30-0.6 | 0.01-0.3 | 0.56 | 0.86 | 0.99 | 1.16 | 1.88 |
| 1.1-1.5 | 0.30-0.6 | 0.10-0.4 | 0.72 | 0.93 | 1.00 | 1.08 | 1.41 |
| 1.1-1.5 | 0.30-0.6 | 0.30-0.6 | 0.80 | 0.95 | 1.00 | 1.05 | 1.27 |
| 1.1-3.0 | 0.01-0.3 | 0.01-0.3 | 0.16 | 0.61 | 0.89 | 1.34 | 5.13 |
| 1.1-3.0 | 0.01-0.3 | 0.10-0.4 | 0.41 | 0.79 | 0.96 | 1.19 | 2.50 |
| 1.1-3.0 | 0.01-0.3 | 0.30-0.6 | 0.54 | 0.85 | 0.98 | 1.14 | 1.96 |
| 1.1-3.0 | 0.10-0.4 | 0.01-0.3 | 0.40 | 0.80 | 0.97 | 1.20 | 2.35 |
| 1.1-3.0 | 0.10-0.4 | 0.10-0.4 | 0.62 | 0.89 | 0.99 | 1.11 | 1.64 |
| 1.1-3.0 | 0.10-0.4 | 0.30-0.6 | 0.71 | 0.92 | 0.99 | 1.08 | 1.45 |
| 1.1-3.0 | 0.30-0.6 | 0.01-0.3 | 0.55 | 0.86 | 0.99 | 1.15 | 1.94 |
| 1.1-3.0 | 0.30-0.6 | 0.10-0.4 | 0.73 | 0.93 | 1.00 | 1.08 | 1.45 |
| 1.1-3.0 | 0.30-0.6 | 0.30-0.6 | 0.78 | 0.95 | 1.00 | 1.05 | 1.29 |

* Risk ratio, genotype frequency and exposure frequency ranges are spaced to be equidistant on the logarithmic scale.
** The study cohort consists of five discrete, admixed populations.
§ The strongest negative association between GxE interaction and disease.
§§ The strongest positive association between GxE interaction and disease.

REFERENCES

1. Liang-Yi Wang and Wen-Chung Lee. Population stratification bias in the case-only study for gene-environment interactions. American Journal of Epidemiology, 168(2):197-201, 2008.

2. Juan Pablo Lewinger, David V. Conti, James W. Baurley, Timothy J. Triche, and Duncan C. Thomas. Hierarchical Bayes prioritization of marker associations from a genome-wide association scan for further investigation. Genetic Epidemiology, 31(8):871-882, 2007.

3. Walter W. Piegorsch, Clarice R. Weinberg, and Jack A. Taylor. Non-hierarchical logistic models and case-only designs for assessing susceptibility in population-based case-control studies. Statistics in Medicine, 13(2):153-162, 1994.

4. Cassandra E. Murcray, Juan Pablo Lewinger, and W. James Gauderman. Gene-environment interaction in genome-wide association studies. American Journal of Epidemiology, 169(2):219-226, 2009.

5. Bhramar Mukherjee and Nilanjan Chatterjee. Exploiting gene-environment independence for analysis of case-control studies: An empirical Bayes-type shrinkage estimator to trade-off between bias and efficiency. Biometrics, 64(3):685-694, 2008.

6. Sholom Wacholder, Nathaniel Rothman, and Neil Caporaso. Population stratification in epidemiologic studies of common genetic variants and cancer: Quantification of bias. Journal of the National Cancer Institute, 92(14):1151-1158, 2000.

7. Yiting Wang, Russell Localio, and Timothy R. Rebbeck. Evaluating bias due to population stratification in case-control association studies of admixed populations. Genetic Epidemiology, 27(1):14-20, 2004.

8. Yiting Wang, Russell Localio, and Timothy R. Rebbeck. Evaluating bias due to population stratification in epidemiologic studies of gene-gene or gene-environment interactions. Cancer Epidemiology Biomarkers & Prevention, 15(1):124-132, 2006.

9. Olli S. Miettinen. Components of the crude risk ratio. American Journal of Epidemiology, 96(2):168-172, 1972.

10. Wen-Chung Lee and Liang-Yi Wang. Simple formulas for gauging the potential impacts of population stratification bias. American Journal of Epidemiology, 167(1):86-89, 2008.

11. Qizhai Li and Kai Yu. Improved correction for population stratification in genome-wide association studies by identifying hidden population structures. Genetic Epidemiology, 32(3):215-226, 2008.

12. B. Devlin and K. Roeder. Genomic control for association studies. Biometrics, 55:997-1004, 1999.

13. Kent E. Holsinger and Bruce S. Weir. Genetics in geographically structured populations: defining, estimating and interpreting Fst. Nature Reviews Genetics, 10(9):639-650, 2009.

14. Sholom Wacholder, Nathaniel Rothman, and Neil Caporaso. Counterpoint: Bias from population stratification is not a major threat to the validity of conclusions from epidemiological studies of common polymorphisms and cancer. Cancer Epidemiology Biomarkers & Prevention, 11(6):513-520, 2002.

15. Wen-Chung Lee and Liang-Yi Wang. Reducing population stratification bias: stratum matching is better than exposure. Journal of Clinical Epidemiology, 62(1):62-66, 2009.

16. Alkes L. Price, Nick J. Patterson, Robert M. Plenge, Michael E. Weinblatt, Nancy A. Shadick, and David Reich. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics, 38:904-909, 2006.

Received 28 February 2012.
