Elizabeth Atkinson
egatkinson.bsky.social
Population and statistical genomicist working to make genomics fully representative. Views are my own. (she/her)
Thanks to all of our SMaHT colleagues and especially to @sedlazeck.bsky.social who led the hackathon which spawned the prototype of this pipeline!
December 9, 2025 at 6:06 PM
MosaicSim offers a realistic, scalable approach for assessing detection limits, with immediate applications to large sequencing efforts including those within the SMaHT Network, which was the springboard for this work.
December 9, 2025 at 6:06 PM
A key (surprising) result was that ultra-high coverage (300×–450×) yields diminishing returns for mosaic variant detection. In many settings, 150× coverage performs comparably or better, highlighting opportunities for cost-effective study design.
December 9, 2025 at 6:06 PM
Using MosaicSim, we benchmarked DRAGEN and found strong VAF- and depth-dependent performance limits. Sensitivity decreases sharply at low VAF, especially in complex genomic regions.
December 9, 2025 at 6:06 PM
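One way to build intuition for the VAF- and depth-dependence described in the post above is a pure sampling model: treat the number of variant-supporting reads as Binomial(depth, VAF) and ask how often at least a few such reads appear. This is only a sketch. The function name and the `min_alt_reads` threshold are assumptions for illustration, not anything taken from DRAGEN or MosaicSim, and the model ignores sequencing noise, mapping difficulty, and caller filters.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int = 3) -> float:
    """P(at least `min_alt_reads` alt reads) when alt reads ~ Binomial(depth, vaf).

    `min_alt_reads` is a hypothetical evidence threshold chosen for illustration.
    """
    # Probability of seeing fewer than the required number of alt reads.
    p_below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    for vaf in (0.01, 0.05, 0.10):
        row = ", ".join(
            f"{depth}x: {detection_probability(depth, vaf):.3f}"
            for depth in (30, 150, 300, 450)
        )
        print(f"VAF {vaf:.2f} -> {row}")
```

Under this simplification the probability rises with depth and flattens toward 1 once the expected number of alt reads is well above the threshold, which is one reason extra coverage can add little at moderate VAFs. It cannot explain cases where lower coverage performs better; that would depend on noise and caller behavior that this sketch does not model.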
Detecting mosaic variants is challenging due to low VAFs and real sequencing noise. MosaicSim layers user-defined variants directly onto empirical WGS data, preserving true read-level properties while providing a controlled ground-truth set for benchmarking.
December 9, 2025 at 6:06 PM
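The idea of layering a user-defined variant onto real reads, as described in the post above, can be sketched with a toy pileup. This is not MosaicSim's code: it works on a plain list of base calls rather than real alignments, and `spike_in_variant`, its parameters, and the example values are all hypothetical.

```python
import random


def spike_in_variant(read_bases, alt_base, target_vaf, seed=0):
    """Return a copy of `read_bases` with `alt_base` substituted at roughly `target_vaf`.

    Each read covering the site is switched to the alt allele independently
    with probability `target_vaf`; all other reads keep their original base.
    """
    rng = random.Random(seed)  # seeded so the simulated truth set is reproducible
    return [alt_base if rng.random() < target_vaf else base for base in read_bases]


# Toy pileup: 200 reads that all show the reference base 'A' at one site.
pileup = ["A"] * 200
mutated = spike_in_variant(pileup, "T", target_vaf=0.05, seed=42)

# The planted variant is the ground truth; the realized VAF varies by chance.
ground_truth = {"alt": "T", "target_vaf": 0.05}
observed_vaf = mutated.count("T") / len(mutated)
print(f"target VAF {ground_truth['target_vaf']:.3f}, observed VAF {observed_vaf:.3f}")
```

Because only the base at the chosen site changes, everything else about each read stays as sequenced, which is the property that makes spike-in benchmarks more realistic than fully synthetic reads.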
Since we only include >0.1% MAF variants in this article, we can't address ultra-rare variants, but check out Supp Fig 3: when comparing ancestry-specific AFs, many variants deviate from the 1:1 line. We plotted this on the log₁₀(AF) scale to help magnify the low-frequency range.
October 10, 2025 at 3:32 PM
To limit the noise from ultra-rare alleles we only looked at variants ≥0.1% MAF. Totally appreciate that's still quite low frequency, but even with that filter, we still saw the noted ancestry-specific frequency differences.
October 10, 2025 at 3:01 PM
Great point; we thought about that too! Pragati stratified by whether variants were monomorphic or not to capture at least that aspect, but you’re right that the impact depends on where a variant sits on the SFS. Rare ones can show big fold-changes but small absolute shifts.
October 10, 2025 at 2:58 PM
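The point in the post above about fold-change versus absolute shift is easy to see with numbers. The allele frequencies below are made up purely for illustration and are not taken from the paper; `compare_af` is a hypothetical helper.

```python
from math import log10


def compare_af(af_a: float, af_b: float) -> tuple[float, float]:
    """Return (fold change, absolute difference) between two allele frequencies."""
    return af_b / af_a, abs(af_b - af_a)


# Illustrative values only: one rare and one common variant in two ancestries.
rare_fold, rare_shift = compare_af(0.001, 0.004)
common_fold, common_shift = compare_af(0.20, 0.25)

print(f"rare:   fold {rare_fold:.2f}, shift {rare_shift:.3f}, log10 fold {log10(rare_fold):.2f}")
print(f"common: fold {common_fold:.2f}, shift {common_shift:.3f}, log10 fold {log10(common_fold):.2f}")
```

The rare variant has the larger fold-change but the smaller absolute shift. On a log₁₀(AF) axis the distance from the 1:1 line corresponds to the log fold-change, so rare-variant differences that would be invisible on a linear axis become visible.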
Thanks for the interest! The tutorial code is available to download as supplemental information of the paper, and has been deposited as a community workspace in the All of Us Researcher Workbench.
July 23, 2025 at 3:05 PM
In summary, we present a replicable training model that empowers early-career researchers, especially those new to computational genomics, to responsibly integrate large-scale biobank data into their research programs and teaching.
July 22, 2025 at 4:36 PM
Across years 1–3, outcomes that scholars reported as stemming directly from this training included:
📊 17 conference presentations
🔬 Multiple funded research grants
🎓 Numerous genomics modules added in undergrad courses
🤝 Sustained collaborations across institutions
July 22, 2025 at 4:36 PM
During the summit, scholars used real short-read WGS data to:
• Prepare phenotypes & covariates
• Run GWAS via Hail
• Visualize results with PCA, Manhattan & QQ plots
• Manage compute costs
All in ~4 hours with no prior coding required.
July 22, 2025 at 4:36 PM
Our training was part of the All of Us Biomedical Researcher Scholars Program through @bcmgenetics.bsky.social focused on mentoring early-stage faculty in genomic data science. The curriculum launches with an intensive Faculty Summit, where scholars get hands-on experience working with genomic data.
July 22, 2025 at 4:36 PM
Access to big genomic data is growing, but access to the skills needed to use it hasn’t kept pace.
We created an accessible, cloud-based genomic analysis training bootcamp using real All of Us data, Jupyter notebooks, and the Hail framework to lower the barrier for early-career researchers.
July 22, 2025 at 4:36 PM
Tractor-Mix builds on Tractor’s strengths to detect ancestry-enriched signals while adding power and robust false-positive control for relatedness via a GRM. By modeling both admixture and relatedness, it overcomes key GWAS barriers and enables more accurate, representative genomic discovery.
June 9, 2025 at 6:31 PM
Tractor-Mix uses ancestry-specific genotypes as predictors, outputting ancestry-specific effect sizes and P values. We benchmark our new tool in simulations and apply it to multiple admixed cohorts (including UK Biobank and the Mexico City Prospective Study), uncovering signals missed by standard GWAS.
June 9, 2025 at 6:31 PM
In this work, we introduce Tractor-Mix, a new GWAS method that extends Tractor to handle related admixed samples. It combines a mixed model framework (like GMMAT) with local ancestry-aware genotypes (like Tractor) in a 2 d.o.f. test.
June 9, 2025 at 6:31 PM
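To make the "2 d.o.f. test" in the post above concrete, here is a minimal sketch of a joint test on two ancestry-specific dosage columns. It is not the Tractor-Mix implementation: it uses ordinary least squares on simulated unrelated samples, so it has no GRM or mixed model and gives no protection against relatedness. The function name, the simulated frequencies and effect size, and the use of local ancestry count as a covariate are all assumptions of this sketch.

```python
import numpy as np


def two_dof_test(y, anc1_dosage, anc2_dosage, covariates):
    """Jointly test both ancestry-specific dosages (H0: both effects are zero).

    Plain OLS likelihood-ratio test; relatedness is NOT modeled here.
    """
    n = len(y)
    null_design = np.column_stack([np.ones(n), covariates])
    full_design = np.column_stack([null_design, anc1_dosage, anc2_dosage])

    def fit(design):
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        return float(resid @ resid), beta

    rss_null, _ = fit(null_design)
    rss_full, beta_full = fit(full_design)
    stat = n * np.log(rss_null / rss_full)  # asymptotically chi-square, 2 d.o.f.
    p_value = float(np.exp(-stat / 2.0))    # chi-square(2) survival function
    return float(beta_full[-2]), float(beta_full[-1]), float(stat), p_value


# Simulated two-way admixed sample with an effect on ancestry-1 haplotypes only.
rng = np.random.default_rng(1)
n = 2000
la = rng.binomial(2, 0.5, size=n)   # copies of ancestry 1 at the locus
x1 = rng.binomial(la, 0.3)          # alt alleles carried on ancestry-1 haplotypes
x2 = rng.binomial(2 - la, 0.1)      # alt alleles carried on ancestry-2 haplotypes
y = 0.5 * x1 + rng.normal(size=n)

beta1, beta2, stat, p_value = two_dof_test(y, x1, x2, covariates=la)
print(f"beta_anc1={beta1:.3f}, beta_anc2={beta2:.3f}, stat={stat:.1f}, p={p_value:.2e}")
```

Splitting the genotype by ancestral background yields one effect estimate per ancestry, and testing them jointly costs one extra degree of freedom compared with a standard single-dosage test.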
As biobanks and global cohorts grow, so does the inclusion of admixed individuals with close or cryptic relatedness. This introduces the statistical challenge of two interwoven sources of stratification: admixture and relatedness, which are rarely handled together.
June 9, 2025 at 6:31 PM