popgen-evomics

Population (pan)genomics

Welcome to the practical exercises for the population genomics section of the evomics2025 course. This page introduces the basics and explains how to run the tutorial, with step-by-step guidance on the concepts and tools used in the exercises.

The HPRC data set

For this tutorial we will use data produced by the Human Pangenome Reference Consortium (HPRC).

The HPRC is a project funded by the National Human Genome Research Institute to sequence and assemble genomes from individuals of diverse populations, in order to better represent the genomic landscape of human populations.

The data required for our exercises can be downloaded from the HPRC Year 1 Data Freeze, accessible here. A broader range of data and software developed by the HPRC is available here.

[Figure: hprcpop]

The HPRC dataset includes samples from diverse geographical origins. Several criteria guided the selection of the 47 samples: trio status in the Coriell biobank and low cell line passage count, both of which influenced the inclusion of European samples; a focus on genetic diversity, which led to a larger representation of Africans; and considerations of genetic drift, which affected the selection of Asian and American samples.

For this tutorial, we will specifically focus on two continental populations, grouping individuals from Africa (AFR) and the Americas (AMR), to illustrate key concepts.

Tutorial sections

| Section | Content | Link |
|---|---|---|
| 0 | Getting started | link |
| 1 | Allele frequency, site frequency spectrum | link |
| 2 | Fst and pangenome graph inspection | link |
| 3 | Principal component analysis | link |
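
As a rough preview of the quantities named in Section 1, the sketch below computes alternate-allele frequencies and a site frequency spectrum for two small groups. It is only an illustration under stated assumptions: the haplotype matrices `AFR` and `AMR` are invented toy data (not HPRC data), the function names are hypothetical, and the alternate allele is treated as a stand-in for the derived allele. The tutorial itself works from VCF files with the tools listed below.

```python
from collections import Counter

# Hypothetical toy data (NOT HPRC data): each row is one biallelic site,
# each entry is one haplotype; 0 = reference allele, 1 = alternate allele.
AFR = [
    [0, 1, 1, 0, 1, 0],
    [1, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
]
AMR = [
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]


def alt_allele_counts(sites):
    """Number of alternate-allele copies at each site."""
    return [sum(site) for site in sites]


def alt_allele_frequencies(sites):
    """Fraction of haplotypes carrying the alternate allele at each site."""
    return [sum(site) / len(site) for site in sites]


def site_frequency_spectrum(sites):
    """Return a list where entry k is the number of sites with k alternate copies.

    Assumes every site has the same number of haplotypes, n, so the list
    has n + 1 entries (k = 0 .. n).
    """
    n = len(sites[0])
    tally = Counter(alt_allele_counts(sites))
    return [tally.get(k, 0) for k in range(n + 1)]


if __name__ == "__main__":
    for name, pop in (("AFR", AFR), ("AMR", AMR)):
        print(name, "frequencies:", alt_allele_frequencies(pop))
        print(name, "SFS:", site_frequency_spectrum(pop))
```

Entries 0 and n of the spectrum count sites that are fixed within the group; analyses of segregating sites usually leave them out.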

Software we will use for this tutorial

plink https://www.cog-genomics.org/plink/2.0/

vcftools https://vcftools.github.io/documentation.html

bcftools https://samtools.github.io/bcftools/bcftools.html

odgi https://pangenome.github.io/odgi.github.io/

bandage https://rrwick.github.io/Bandage/

R: The exercises typically produce numerical results. To visualize them, you can use your preferred plotting tools or the R code in the rscripts folder. The R scripts can be run directly from the command line. If you prefer a more interactive approach, you can load your data into RStudio and either write your own code or adapt the code provided in the scripts.

Authors

Enza Colonna, Miloš Duchoslav, Erik Garrison, Silvia Buonaiuto, ChatGPT, Jose Costa, Madee Chase