Maggie Steiner
@maggiesteiner.bsky.social
190 followers 340 following 24 posts
PhD Candidate @ UChicago Human Genetics
Pinned
maggiesteiner.bsky.social
Excited to share a new preprint with @jnovembre.bsky.social! We use a combination of population genetic theory, simulation, and data analysis to ask: how does study design in genetic studies (including biobanks) impact the discovery of rare, deleterious variants?
Reposted by Maggie Steiner
jnovembre.bsky.social
Excited for our publication on how the geographic scale of a sample affects the discovery of rare, deleterious variants to be out this week. With a mix of theory, simulation, and data analysis, we show that the number of variants discovered and their frequencies change depending on whether samples are narrow or broad.
PNAS
Proceedings of the National Academy of Sciences (PNAS), a peer reviewed journal of the National Academy of Sciences (NAS) - an authoritative source of high-impact, original research that broadly spans...
www.pnas.org
Reposted by Maggie Steiner
jeffspence.github.io
What do GWAS and rare variant burden tests discover, and why?

Do these studies find the most IMPORTANT genes? If not, how DO they rank genes?

Here we present a surprising result: these studies actually test for SPECIFICITY! A 🧵 on what this means... (🧪🧬)

www.biorxiv.org/content/10.1...
Specificity, length, and luck: How genes are prioritized by rare and common variant association studies
Standard genome-wide association studies (GWAS) and rare variant burden tests are essential tools for identifying trait-relevant genes. Although these methods are conceptually similar, we show by anal...
www.biorxiv.org
maggiesteiner.bsky.social
Hi! Could I please be added? Thanks for setting this up!
maggiesteiner.bsky.social
I just figured out how to use feeds! So, sharing this with #popgen 🧪
maggiesteiner.bsky.social
Excited to share a new preprint with @jnovembre.bsky.social! We use a combination of population genetic theory, simulation, and data analysis to ask: how does study design in genetic studies (including biobanks) impact the discovery of rare, deleterious variants?
biorxiv-genetic.bsky.social
Study design and the sampling of deleterious rare variants in biobank-scale datasets https://www.biorxiv.org/content/10.1101/2024.12.02.626424v1
maggiesteiner.bsky.social
Thanks to co-lead Dan Rice & co-authors @aabiddanda.bsky.social, Marida Ianni-Ravn, and Chris Porras!
maggiesteiner.bsky.social
Overall - while our theoretical model is no doubt a simplification of the complex dispersal/evolutionary processes seen in natural populations, especially humans - we hope that this work will help improve our interpretation of existing genetic studies and provide guidance for the design of new ones.
maggiesteiner.bsky.social
Our results have implications for several applications of genetic data. Power to detect trait/disease associations (e.g., GWAS) is tied to allele frequency. The SFS is also used for inference of the distribution of fitness effects, which our results suggest may be biased by effects of study design.
maggiesteiner.bsky.social
However, when it comes to avg. allele frequency across all sites (incl. monomorphic ones) these effects can cancel - in our theoretical model we see unchanging avg. allele frequency with sampling design. In human data we see this for fine scale samples (within the UK) but not for broader samples.
maggiesteiner.bsky.social
We find evidence of these effects in re-sampling experiments using the UK Biobank. For example, our broadest re-sample with n=10,000 discovers ~98% more variant LoF sites than our narrowest sample, but allele frequency at those variant sites is on average ~41% lower.
maggiesteiner.bsky.social
Broad samples will sample a greater number of rare, deleterious variants than narrow samples (we call this “discovery”), but each will be sampled at lower average frequency (we call this “dilution”). These effects lead to substantial changes in some summary statistics, especially for large samples.
maggiesteiner.bsky.social
We develop a model for the evolution of carriers of rare deleterious variants, and use it to approximate the site frequency spectrum (SFS, the distribution of allele frequencies) in samples at various scales of geographic breadth. We find several key patterns as samples go from “narrow” to “broad”.
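As a rough illustration of the quantities named in this thread (not the preprint's model or code), the sketch below computes an SFS, the number of variant sites ("discovery"), the mean allele frequency at variant sites ("dilution"), and the mean frequency across all sites. The function names and the toy allele counts are hypothetical, chosen only so that the two samples carry the same total number of variant copies.

```python
from collections import Counter


def site_frequency_spectrum(allele_counts, n_chromosomes):
    """Map k -> number of sites where the variant allele appears k times (0 < k < n)."""
    sfs = Counter(c for c in allele_counts if 0 < c < n_chromosomes)
    return dict(sorted(sfs.items()))


def summarize(allele_counts, n_chromosomes):
    """Return (variant sites, mean freq at variant sites, mean freq over all sites)."""
    variant = [c for c in allele_counts if c > 0]
    n_variant = len(variant)
    mean_freq_variant = (
        sum(variant) / (n_variant * n_chromosomes) if n_variant else 0.0
    )
    mean_freq_all = sum(allele_counts) / (len(allele_counts) * n_chromosomes)
    return n_variant, mean_freq_variant, mean_freq_all


# Hypothetical toy data: variant-allele counts at 10 sites in 20 sampled chromosomes.
N_CHROMOSOMES = 20
narrow = [4, 4, 0, 0, 0, 0, 0, 0, 0, 0]  # few sites, each at higher frequency
broad = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]   # more sites, each at lower frequency

for name, counts in [("narrow", narrow), ("broad", broad)]:
    n_var, f_var, f_all = summarize(counts, N_CHROMOSOMES)
    print(name, site_frequency_spectrum(counts, N_CHROMOSOMES), n_var, f_var, f_all)
```

In this toy case the broad sample has more variant sites at lower per-site frequency, while the mean frequency across all sites (including monomorphic ones) is the same in both samples.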
maggiesteiner.bsky.social
We focus on rare, deleterious variants, which are expected to cluster in geographic space. Rare variants are also generally of interest since they tend to have large effects on traits (including disease traits), and can help improve understanding of biological mechanisms.
maggiesteiner.bsky.social
In particular, we are interested in geographic breadth: how broad a region individuals are sampled across. This is relevant to current discourse in human genetics about the Euro-centric bias of genetic datasets and the launch of new biobanks to improve representation globally.