About iPGS browser

The inclusive polygenic score (iPGS) browser is a resource of pre-trained polygenic score models and evaluations of their predictive performance. We currently host weights for over 50 polygenic score models trained on individual-level data from UK Biobank. The results are provided on an "AS-IS" basis, without warranty of any kind. Use of the website and its content is at the user's sole risk.

We request that any use of data from the browser, including the inclusive polygenic score weights, cite the following manuscript.


Frequently Asked Questions

inclusive Polygenic Scores (iPGS)

What are inclusive Polygenic Scores (iPGS)?

Inclusive Polygenic Scores (iPGS) is a training strategy that produces PGS models with improved transferability across ancestry groups. We fit a penalized regression model directly on individual-level data from ancestry-diverse individuals using the BASIL algorithm implemented in the R snpnet package. In both simulations and an application to the UK Biobank dataset, we show that the iPGS approach improves transferability compared to a baseline model trained only on individuals of European ancestry.

What was the procedure used in the assignment of population groups?

We focused on N~406,000 unrelated individuals in our study. The sample-level quality control procedure is described in detail in our publication. We used a combination of genotype principal components (PCs) and self-reported ethnicity to define the following population groups: white British, non-British white, South Asian, African, and other (the remaining individuals). Specifically, we applied a Bayesian outlier detection algorithm, aberrant, to the first six genotype PCs to define European, South Asian, and African individuals. We further subdivided the European set into white British and non-British white. We assigned the remaining heterogeneous individuals to the other group. Please read our publication for more information.

How does iPGS compare to other multi-ancestry PGS models (e.g. PRS-CSx)?

Our iPGS approach is complementary to other multi-ancestry PGS models. A growing number of multi-ancestry PGS models take GWAS summary statistics and ancestry-matched linkage disequilibrium (LD) reference panels from multiple ancestry groups as input (reviewed, for example, in Kachuri et al. 2023 PMID: 37620596). Those methods are advantageous when (meta-analyzed) GWAS summary statistics from a large number of individuals are readily available. However, admixed individuals are rarely considered in PGS training, given the difficulty of constructing ancestry-matched LD reference panels for them. Our iPGS approach operates directly on individual-level data and is therefore directly applicable to admixed individuals. In our manuscript, we systematically compared our iPGS approach with PRS-CSx, a commonly used summary-statistics-based multi-ancestry PGS method, across 60 UK Biobank traits. When trained on a similar number of individuals, our iPGS model shows competitive or improved predictive performance relative to PRS-CSx. Please read our publication for more information.

Dataset

Where can I download the data?

You may download the coefficients of iPGS models.

What is the file format of the iPGS coefficients file?

We provide the coefficients (BETA, "effect_weight") of the inclusive PGS models as a bgzip-compressed table file. The file has the following columns:

  • rsID: dbSNP accession ID;
  • chr_name and chr_position: the location of the variant on the GRCh37/hg19 reference genome;
  • chr_ID: the unique variant identifier, either in chr:pos:ref:alt format or, for HLA alleles, in the standard notation;
  • effect_allele: the allele for which the coefficient (BETA, "effect_weight") is computed;
  • other_allele: the non-effect allele at the locus;
  • effect_weight: the coefficient (BETA) of the effect allele;
  • Consequence: the predicted consequence of genetic variants from Ensembl Variant Effect Predictor (VEP);
  • Gene_symbol: gene symbol annotated by VEP; and
  • Gene_ID: Ensembl's stable gene identifier annotated by VEP.
We prepared the iPGS coefficients files so that they are compatible with the PGS Catalog's scoring file format.
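
As a rough illustration of the format described above, the following Python sketch reads a coefficients file into memory. It is not part of the browser or of any official tooling. It assumes that the table is tab-delimited and that any metadata lines begin with "#" (as in PGS Catalog scoring files); neither detail is stated above, so please check a downloaded file first. The function name read_ipgs_weights is hypothetical. Python's gzip module can read bgzip-compressed files because bgzip output is gzip-compatible.

```python
import csv
import gzip


def read_ipgs_weights(path):
    """Read a (b)gzip-compressed iPGS coefficients table into a list of dicts.

    Assumptions (not confirmed by the source): the table is tab-delimited,
    and metadata lines, if any, start with '#'.
    """
    rows = []
    with gzip.open(path, "rt", newline="") as handle:
        # Drop '#'-prefixed metadata lines before handing the rest to csv.
        data_lines = (line for line in handle if not line.startswith("#"))
        reader = csv.DictReader(data_lines, delimiter="\t")
        for row in reader:
            # Convert the numeric columns; all other columns stay as strings.
            row["chr_position"] = int(row["chr_position"])
            row["effect_weight"] = float(row["effect_weight"])
            rows.append(row)
    return rows
```

Calling read_ipgs_weights on a downloaded file returns one dictionary per variant, keyed by the column names listed above (for example row["chr_ID"] and row["effect_weight"]).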

How can I use the iPGS coefficients file for my research?

You may compute a polygenic score for each individual from individual-level genetic data and an iPGS coefficients file, for example with plink2's --score command. Yosuke previously wrote a short blog post with example usage of the plink2 command for computing polygenic scores. Alternatively, you may use the pgsc_calc tool from the PGS Catalog. Please cite our manuscript when you use our polygenic score models in your research.
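
To make the computation concrete, here is a minimal Python sketch of the underlying arithmetic: a polygenic score is a weighted sum of effect-allele dosages. It is an illustration only, not a replacement for plink2 or pgsc_calc. The function name, the toy variant identifiers, and the choice to skip variants missing from the genotype data are all assumptions made for this example. Dedicated tools may also report the score on a different scale (for example, as an average rather than a sum), so please consult their documentation.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: dict mapping chr_ID -> dosage of the effect allele (0 to 2)
    weights: dict mapping chr_ID -> effect_weight
    Returns (score, number of variants used).
    """
    score = 0.0
    used = 0
    for variant_id, weight in weights.items():
        dosage = dosages.get(variant_id)
        if dosage is None:
            # Variant absent from the genotype data: skipped here (an assumption).
            continue
        score += weight * dosage
        used += 1
    return score, used


# Toy values for illustration only; these are not real iPGS weights.
weights = {"1:1000:A:G": 0.25, "2:2000:C:T": -0.5, "3:3000:G:A": 0.1}
dosages = {"1:1000:A:G": 1, "2:2000:C:T": 2}

score, used = polygenic_score(dosages, weights)
print(score, used)  # 0.25 * 1 + (-0.5) * 2 = -0.75, using 2 variants
```

The dosage must count copies of the allele in the effect_allele column. If your genotype data counts the other allele instead, the dosage needs to be flipped (2 minus the dosage) before scoring.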


Privacy policy

This website collects some personal data from its users. Specifically, we use Google Analytics, a web analytics service provided by Google LLC ("Google"), to help us understand resource usage. Google Analytics uses cookies to track your interactions with our website. The information generated by the cookies about your use of our website (including your IP address) will be transmitted to and stored by Google.