Frequently Asked Questions

inclusive Polygenic Scores (iPGS)

What are inclusive Polygenic Scores (iPGS)?

Inclusive polygenic scores (iPGS) are a training strategy for PGS models that improves transferability across ancestry groups. We fit a penalized regression model directly on individual-level data from ancestry-diverse individuals using the BASIL algorithm implemented in the R snpnet package. In both simulation and an application to the UK Biobank dataset, we show that our iPGS approach improves transferability compared to a baseline model trained only on individuals of European ancestry.

What was the procedure used in the assignment of population groups?

We focused on N~406,000 unrelated individuals in our study. The sample-level quality control procedure is described in detail in our publication. We used a combination of genotype PCs and self-reported ethnicity to define the following population groups: white British, non-British white, South Asian, African, and the other remaining individuals. Specifically, we applied a Bayesian outlier detection algorithm, aberrant, to the first six genotype principal components (PCs) to define European, South Asian, and African individuals. We further subdivided the European set into white British and non-British white. We assigned the remaining heterogeneous individuals to the "other" group. Please read our publication for more information.

How does iPGS compare to other multi-ancestry PGS models (e.g. PRS-CSx)?

Our iPGS approach is complementary to other multi-ancestry PGS models. There is an increasing number of multi-ancestry PGS models that take GWAS summary statistics and ancestry-matched LD reference panels from multiple ancestry groups (reviewed, for example, in Kachuri et al. 2023 PMID: 37620596). Those methods are advantageous when (meta-analyzed) GWAS summary statistics from a large number of individuals are readily available. However, admixed individuals are rarely considered in PGS training, given the challenges in constructing ancestry-matched linkage-disequilibrium reference panels for them. Our iPGS approach operates directly on individual-level data and is therefore directly applicable to admixed individuals. In our manuscript, we performed a systematic comparison of our iPGS approach and the PRS-CSx method, a commonly used summary-statistics-based multi-ancestry PGS model, across 60 UK Biobank traits. When trained on a similar number of individuals, our iPGS model shows competitive or improved predictive performance compared with PRS-CSx. Please read our publication for more information.

Dataset

Where can I download the data?

You may download the coefficients of iPGS models.

60 iPGS models described in Tanigawa and Kellis. Am J Hum Genet (2023).
- We analyzed 60 anthropometric and hematological traits in UK Biobank. You may select the trait of interest from the list of browsable traits and download the coefficients of the iPGS models using the "download" button on each page.
- The coefficients of the PGS models analyzed in the study are available as Supplementary Data Files at figshare (doi: 10.6084/m9.figshare.22905368), suitable for bulk downloading of the iPGS models across multiple traits.
- The iPGS models will also be available in the PGS Catalog (PGP000502). Score IDs are listed in Table S1 of Y. Tanigawa and M. Kellis. Am J Hum Genet (2023).

What is the file format of the iPGS coefficients file?

We provide the coefficients (BETA, "effect_weight") of the inclusive PGS models as a bgzip-compressed table file. The file has the following columns:

rsID: dbSNP accession ID;
chr_name and chr_position: the location of the variants on the GRCh37/hg19 reference genome;
chr_ID: the unique variant identifier, given either as chr:pos:ref:alt or, for HLA alleles, in the standard notation;
effect_allele: the allele for which the coefficient (BETA, "effect_weight") is reported;
other_allele: the non-effect allele at the locus;
effect_weight: the coefficient (BETA, "effect_weight") of the effect allele;
Consequence: the predicted consequence of genetic variants from Ensembl Variant Effect Predictor (VEP);
Gene_symbol: gene symbol annotated by VEP; and
Gene_ID: Ensembl's stable gene identifier annotated by VEP.

We prepared the iPGS coefficients files so that they are compatible with the PGS Catalog's formatted files.
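As an illustration of the file format above, here is a minimal Python sketch for loading a coefficients file. It relies on the fact that bgzip output can be read with a standard gzip reader. The function names are hypothetical, and the sketch assumes that the table is tab-delimited with a header row and that any metadata lines start with "#"; please check these assumptions against the file you downloaded.

```python
import csv
import gzip


def parse_ipgs_weights(lines):
    """Parse the text lines of an iPGS coefficients table into a list of dicts."""
    # Assumption: metadata lines, if present, start with '#'.
    data_lines = (line for line in lines if not line.startswith("#"))
    rows = []
    # Assumption: tab-delimited columns with a header row naming them.
    for row in csv.DictReader(data_lines, delimiter="\t"):
        row["effect_weight"] = float(row["effect_weight"])
        rows.append(row)
    return rows


def read_ipgs_weights(path):
    """Read a bgzip-compressed coefficients file from disk."""
    # bgzip files are valid gzip files, so the gzip module can open them.
    with gzip.open(path, "rt") as handle:
        return parse_ipgs_weights(handle)
```

Each returned row is a dictionary keyed by the column names listed above, with effect_weight converted to a float.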

How can I use the iPGS coefficients file for my research?

You may compute polygenic scores for each individual using individual-level genetic data and an iPGS coefficients file. You may use plink2's --score command. Yosuke previously wrote a short blog post on the example usage of the plink2 command to compute polygenic scores. Alternatively, you may use the pgsc_calc tool from the PGS Catalog. Please cite our manuscript when you use our polygenic score models in your research.
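To make the computation concrete, a polygenic score for one individual is the sum, over variants, of the effect weight multiplied by the individual's dosage of the effect allele. The Python sketch below shows that arithmetic only. The function name and the dictionary-based inputs are hypothetical, and skipping variants without genotype data is an assumption of this sketch, not a description of how plink2 or pgsc_calc handle missing data. For real datasets, we recommend the tools mentioned above.

```python
def polygenic_score(weights, dosages):
    """Sum effect_weight x effect-allele dosage over variants found in both inputs.

    weights: dict mapping variant ID -> effect_weight
    dosages: dict mapping variant ID -> dosage of the effect allele (0 to 2)
    Returns the score and the number of variants that contributed to it.
    """
    score = 0.0
    n_used = 0
    for variant_id, weight in weights.items():
        dosage = dosages.get(variant_id)
        if dosage is None:
            # Assumption of this sketch: variants without genotype data are skipped.
            continue
        score += weight * dosage
        n_used += 1
    return score, n_used


# Toy example with made-up variant IDs and weights (not real iPGS coefficients).
toy_weights = {"1:100:A:G": 0.5, "1:200:C:T": -0.25, "1:300:G:A": 0.1}
toy_dosages = {"1:100:A:G": 2, "1:200:C:T": 1}
toy_score, toy_n_used = polygenic_score(toy_weights, toy_dosages)
```

Note that the dosage must count copies of the allele in the effect_allele column. If a genotype file counts the other allele instead, the dosage needs to be flipped (2 minus dosage) before it is multiplied by the weight.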

About iPGS browser