Power of inclusion: Enhancing polygenic prediction with admixed individuals

Y. Tanigawa and M. Kellis. Am J Hum Genet. (2023).

Admixed individuals offer unique opportunities to address the limited transferability of polygenic scores (PGS), given the substantial trans-ancestry genetic correlation in many complex traits. However, they are rarely considered in PGS training because of the difficulty of representing admixed individuals with ancestry-matched linkage-disequilibrium (LD) reference panels. Here we present inclusive PGS (iPGS), which captures ancestry-shared genetic effects by finding the exact solution for penalized regression on individual-level data and is therefore naturally applicable to admixed individuals. We validate our approach in a simulation study across 33 configurations with varying heritability, polygenicity, and ancestry composition in the training set. Applied to n = 237,055 ancestry-diverse individuals in the UK Biobank, iPGS shows the greatest improvement in African individuals: 48.9% on average across 60 quantitative traits and up to 50-fold for some traits (neutrophil count, R² = 0.058), relative to the baseline model trained on the same number of European individuals. When iPGS is allowed to use n = 284,661 individuals, we observe average improvements of 60.8% for African, 11.6% for South Asian, 7.3% for non-British white, 4.8% for white British, and 17.8% for other individuals. We further develop iPGS+refit to jointly model ancestry-shared and ancestry-dependent genetic effects when heterogeneous genetic associations are present. For neutrophil count, for example, iPGS+refit shows the highest predictive performance in the African group (R² = 0.115), exceeding the best predictive performance for white British (R² = 0.090, iPGS model), even though only 1.49% of individuals used in iPGS training are of African ancestry. Our results indicate the power of including diverse individuals to develop more equitable PGS models.
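Because iPGS fits penalized regression directly on individual-level genotypes rather than on summary statistics, admixed individuals enter training as ordinary rows of the genotype matrix, and no LD reference panel has to be matched to them. The sketch below illustrates that setting generically and is not the authors' implementation. It simulates genotypes and a phenotype with a chosen heritability and polygenicity, then fits a lasso (one form of penalized regression, chosen here for illustration) by cyclic coordinate descent. All function names are hypothetical, and this page does not specify the exact penalty iPGS uses.

```python
import random


def simulate(n, p, h2, pi_causal, seed=0):
    """Simulate 0/1/2 genotypes and a quantitative phenotype.

    h2: target fraction of phenotypic variance explained by genotypes.
    pi_causal: expected fraction of variants with a nonzero effect (polygenicity).
    """
    rng = random.Random(seed)
    freqs = [rng.uniform(0.05, 0.5) for _ in range(p)]
    # Each genotype is the sum of two Bernoulli(allele frequency) draws.
    X = [[(rng.random() < f) + (rng.random() < f) for f in freqs] for _ in range(n)]
    beta = [rng.gauss(0.0, 1.0) if rng.random() < pi_causal else 0.0 for _ in range(p)]
    g = [sum(x * b for x, b in zip(row, beta)) for row in X]
    mean_g = sum(g) / n
    var_g = sum((v - mean_g) ** 2 for v in g) / n
    # Noise variance chosen so that var_g / (var_g + var_e) equals h2.
    var_e = var_g * (1 - h2) / h2 if var_g > 0 else 1.0
    y = [v + rng.gauss(0.0, var_e ** 0.5) for v in g]
    return X, y, beta


def standardize(X):
    """Return column-major genotype columns with mean 0 and variance 1."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        m = sum(col) / n
        sd = (sum((v - m) ** 2 for v in col) / n) ** 0.5
        cols.append([(v - m) / sd if sd > 0 else 0.0 for v in col])
    return cols


def soft_threshold(z, t):
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(cols, y, lam, max_iter=200, tol=1e-6):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes standardized columns and a centered response y.
    """
    n, p = len(y), len(cols)
    b = [0.0] * p
    r = list(y)  # current residual y - Xb
    for _ in range(max_iter):
        max_change = 0.0
        for j, xj in enumerate(cols):
            # (1/n) x_j^T (r + x_j b_j) reduces to this for standardized x_j.
            rho = sum(x * ri for x, ri in zip(xj, r)) / n + b[j]
            new = soft_threshold(rho, lam)
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= xj[i] * delta
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


X, y, beta = simulate(n=200, p=40, h2=0.5, pi_causal=0.1, seed=1)
cols = standardize(X)
y_mean = sum(y) / len(y)
yc = [v - y_mean for v in y]
# Smallest penalty at which every coefficient is exactly zero.
lam_max = max(abs(sum(x * v for x, v in zip(col, yc))) / len(yc) for col in cols)
b = lasso_cd(cols, yc, lam=0.1 * lam_max)
print("selected variants:", [j for j, v in enumerate(b) if v != 0.0])
print("causal variants:  ", [j for j, v in enumerate(beta) if v != 0.0])
```

Coordinate descent with soft-thresholding converges to the exact lasso solution for a given penalty. Genome-scale fits instead rely on optimized software and a sequence of penalties tuned on validation data, which this toy example omits.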

Inclusive PGS (iPGS) training with diverse ancestry enhances the transferability of polygenic scores in the UK Biobank


(A) Principal-component projection of the unrelated individuals in the UK Biobank and population assignment. (B) Relative average improvements of PGS model performance against the baseline model trained only with White British individuals. Error bars represent 95% confidence intervals of average improvements.
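One plausible reading of the relative average improvement in panel B is the per-trait relative change in R², (R²_model − R²_baseline) / R²_baseline, averaged over traits. The sketch below assumes that definition and a normal-approximation 95% confidence interval. The paper may compute its intervals differently (for example, by bootstrap), and the helper names and R² values are hypothetical.

```python
import math
import statistics


def relative_improvement(r2_model, r2_baseline):
    """Relative change in R^2 over the baseline (0.5 means a 50% improvement)."""
    return (r2_model - r2_baseline) / r2_baseline


def mean_with_ci(values, z=1.96):
    """Mean with a normal-approximation 95% CI: mean +/- z * SD / sqrt(n)."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m, m - z * se, m + z * se


# Hypothetical per-trait R^2 values for one ancestry group (not from the paper).
baseline = [0.020, 0.050, 0.010, 0.040]
model = [0.030, 0.060, 0.020, 0.042]
gains = [relative_improvement(m, b) for m, b in zip(model, baseline)]
mean_gain, lo, hi = mean_with_ci(gains)
print(f"average improvement {mean_gain:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```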

Browseable phenotypes

Here, we list the available inclusive PGS models in the UK Biobank, with the number of genetic variants in each model and the heritability of each trait.

Trait category | Trait | # variants | Heritability
Anthropometry | Waist circumference | 31406 | 0.178
Anthropometry | Hip circumference | 32756 | 0.189
Anthropometry | Sitting height | 44543 | 0.311
Anthropometry | Body fat % | 34374 | 0.197
Anthropometry | Whole body fat mass | 33555 | 0.200
Anthropometry | Whole body fat-free mass | 42786 | 0.269
Anthropometry | Whole body water mass | 43166 | 0.268
Anthropometry | Basal metabolic rate | 46752 | 0.254
Anthropometry | Impd. of whole body | 42265 | 0.237
Anthropometry | Impd. of leg R | 38218 | 0.222
Anthropometry | Impd. of leg L | 36063 | 0.222
Anthropometry | Impd. of arm R | 35455 | 0.207
Anthropometry | Impd. of arm L | 36052 | 0.211
Anthropometry | Leg fat % R | 32758 | 0.197
Anthropometry | Leg fat mass R | 30780 | 0.185
Anthropometry | Leg fat-free mass R | 38493 | 0.237
Anthropometry | Leg fat % L | 33293 | 0.197
Anthropometry | Leg fat mass L | 30794 | 0.185
Anthropometry | Leg fat-free mass L | 38764 | 0.235
Anthropometry | Arm fat % R | 32448 | 0.187
Anthropometry | Arm fat mass R | 29790 | 0.178
Anthropometry | Arm fat-free mass R | 38650 | 0.235
Anthropometry | Arm fat % L | 32450 | 0.191
Anthropometry | Arm fat mass L | 29392 | 0.176
Anthropometry | Arm fat-free mass L | 39276 | 0.234
Anthropometry | Trunk fat % | 32770 | 0.187
Anthropometry | Trunk fat mass | 34357 | 0.203
Anthropometry | Trunk fat-free mass | 42468 | 0.275
Blood assays | Leukocyte count | 17890 | 0.117
Blood assays | Erythrocyte count | 27293 | 0.190
Blood assays | Hemoglobin conc. | 21078 | 0.146
Blood assays | Hematocrit % | 20106 | 0.140
Blood assays | Mean corpuscular vol. | 21818 | 0.178
Blood assays | Mean corpuscular hemoglobin | 17127 | 0.149
Blood assays | Mean corpuscular hemoglobin conc. | 4468 | 0.044
Blood assays | Erythrocyte dist. width | 12557 | 0.121
Blood assays | Platelet count | 32944 | 0.225
Blood assays | Platelet crit | 27034 | 0.187
Blood assays | Mean platelet vol. | 31032 | 0.236
Blood assays | Platelet dist. width | 21899 | 0.171
Blood assays | Lymphocyte count | 7291 | 0.052
Blood assays | Monocyte count | 13415 | 0.098
Blood assays | Neutrophil count | 18612 | 0.131
Blood assays | Eosinophil count | 16859 | 0.139
Blood assays | Basophil count | 4184 | 0.037
Blood assays | Lymphocyte % | 20804 | 0.142
Blood assays | Monocyte % | 10717 | 0.095
Blood assays | Neutrophil % | 16931 | 0.130
Blood assays | Eosinophil % | 17227 | 0.143
Blood assays | Basophil % | 3472 | 0.034
Blood assays | Reticulocyte % | 7884 | 0.062
Blood assays | Reticulocyte count | 9558 | 0.072
Blood assays | Mean reticulocyte vol. | 18832 | 0.156
Blood assays | Mean sphered cell vol. | 21558 | 0.166
Blood assays | Immature reticulocyte frac. | 16307 | 0.124
Blood assays | High light scatter reticulocyte % | 11641 | 0.068
Blood assays | High light scatter reticulocyte count | 19565 | 0.154
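The model list can also be queried outside the web page. The minimal sketch below hard-codes a few rows copied from the table above and selects models with more than 30,000 variants, largest first. The `ModelInfo` type and `select_models` helper are hypothetical and are not part of any released iPGS tooling.

```python
from typing import NamedTuple


class ModelInfo(NamedTuple):
    category: str
    trait: str
    n_variants: int
    heritability: float


# A few rows copied from the table above; extend with the remaining rows as needed.
MODELS = [
    ModelInfo("Anthropometry", "Waist circumference", 31406, 0.178),
    ModelInfo("Anthropometry", "Sitting height", 44543, 0.311),
    ModelInfo("Blood assays", "Mean corpuscular hemoglobin conc.", 4468, 0.044),
    ModelInfo("Blood assays", "Platelet count", 32944, 0.225),
    ModelInfo("Blood assays", "Neutrophil count", 18612, 0.131),
]


def select_models(models, min_variants):
    """Return models with more than `min_variants` variants, largest first."""
    hits = [m for m in models if m.n_variants > min_variants]
    return sorted(hits, key=lambda m: m.n_variants, reverse=True)


for m in select_models(MODELS, 30000):
    print(f"{m.trait}: {m.n_variants} variants, heritability {m.heritability}")
```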

Predictive performance

You can also browse the predictive performance of the models on the held-out test set in the UK Biobank.
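Predictive performance is reported as R² within each ancestry group of the held-out test set. The minimal sketch below assumes that R² is the squared Pearson correlation between PGS and phenotype, computed separately within each group. The paper may instead use a covariate-adjusted or incremental R², and the function names and records are hypothetical.

```python
import math
from collections import defaultdict


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r2_by_group(groups, pgs, phenotype):
    """Squared PGS-phenotype correlation computed separately within each group."""
    by_group = defaultdict(lambda: ([], []))
    for g, s, y in zip(groups, pgs, phenotype):
        by_group[g][0].append(s)
        by_group[g][1].append(y)
    return {g: pearson_r(s, y) ** 2 for g, (s, y) in by_group.items()}


# Hypothetical held-out test-set records: ancestry group, PGS, phenotype.
groups = ["WB", "WB", "WB", "WB", "Afr", "Afr", "Afr", "Afr"]
pgs = [0.1, 0.4, 0.9, 0.3, 0.2, 0.5, 0.3, 0.8]
phenotype = [1.0, 2.0, 3.0, 1.2, 0.5, 1.5, 1.0, 1.1]
for group, r2 in r2_by_group(groups, pgs, phenotype).items():
    print(f"{group}: R^2 = {r2:.3f}")
```

Evaluating each ancestry group separately, rather than pooling, keeps differences in phenotype mean and variance between groups from inflating or masking the within-group predictive performance.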

Systematic predictive performance evaluation of inclusive PGS (iPGS) models and PRS-CSx across 60 anthropometric and hematological traits in the UK Biobank


(A) The predictive performance (R²) in the White British (WB), South Asian (SA), and African (Afr) groups in the UK Biobank is shown for four select models: (i) WB-only, (ii) inclusive, (iii) inclusive-FixN, and (vii) PRS-CSx. (B) The number of approximately LD-independent variants (r² < 0.2 in the African population in the UK Biobank) with heterogeneous GWAS associations. (C–G) The predictive performance of up to eight PGS models in the White British (WB) and African (Afr) populations in the UK Biobank is shown for five select traits. The refit models are trained only for neutrophil and leukocyte counts, where genetic variants with heterogeneous GWAS effects were observed. The predictive performance for other models and ancestry groups is shown in Figures S4 and S5. BMI: body mass index. Vol.: volume. Dist.: distribution. Impd.: impedance. Frac.: fraction. Conc.: concentration. %: percentage. R: right. L: left. Error bars represent 95% confidence intervals.
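Panel B counts variants whose GWAS associations differ across ancestry groups, but the caption does not name the heterogeneity test used. As one common option, assumed here rather than taken from the paper, the sketch below computes Cochran's Q from per-ancestry effect estimates and standard errors. It then converts Q to a p-value using a chi-square distribution with k − 1 degrees of freedom. The LD-independence filter (r² < 0.2) is not implemented, and the function names and estimates are hypothetical.

```python
import math


def cochran_q(betas, ses):
    """Cochran's Q for effect-size heterogeneity across k independent GWAS."""
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    pooled = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    return sum(wi * (b - pooled) ** 2 for wi, b in zip(w, betas))


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus a finite series in x.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


# Hypothetical per-ancestry GWAS estimates for one variant (beta, standard error).
betas = [0.10, -0.05]  # e.g., White British, African
ses = [0.01, 0.03]
q = cochran_q(betas, ses)
p = chi2_sf(q, df=len(betas) - 1)
print(f"Q = {q:.2f}, p = {p:.2e}")
```

With two groups, Q reduces to (β₁ − β₂)² / (SE₁² + SE₂²), the familiar two-sample z-test squared.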

Data download