SMT
@pp0196.bsky.social
420 followers 670 following 340 posts
Sequences and consequences. Picture credit: Cellular landscape cross-section through a eukaryotic cell, by Evan Ingersoll
Reposted by SMT
egatkinson.bsky.social
Thrilled to share our new @natcomms.nature.com paper on local ancestry informed allele frequencies in gnomAD, which are live now on the browser! Check out my stellar PhD student @pragskore.bsky.social’s Bluetorial on how this brings finer detail to variant interpretation 🧬🖥️
pragskore.bsky.social
📃 We’re excited to share our latest work, now published in Nature Communications — a major update to the Genome Aggregation Database (gnomAD) that improves allele frequency resolution for two gnomAD-defined genetic ancestry groups using local ancestry inference (LAI).
Improved allele frequencies in gnomAD through local ancestry inference - Nature Communications
This study incorporates local ancestry into the Genome Aggregation Database (gnomAD) to improve allele frequency estimates for admixed populations, enhancing variant interpretation and enabling more accurate and equitable genomic research and clinical care.
www.nature.com
pp0196.bsky.social
The claim seems wildly overstated: in the end it is about choosing the partitioning so that as much computation as possible is pushed into the storage layer. Choosing the partitioning is mostly about access patterns, just like database indexes.
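To make the partitioning point concrete, here is a minimal sketch of Hive-style partition pruning, where the directory layout encodes the columns queries filter on most often. The file paths, the `ancestry` and `chrom` keys, and the helper names are all hypothetical and chosen only for illustration.

```python
from pathlib import PurePosixPath


def partition_values(path):
    """Parse Hive-style `key=value` directory segments from a file path."""
    parts = PurePosixPath(path).parts[:-1]  # drop the file name itself
    return dict(p.split("=", 1) for p in parts if "=" in p)


def prune(paths, **wanted):
    """Keep only the files whose partition values match every requested key."""
    return [
        p for p in paths
        if all(partition_values(p).get(k) == str(v) for k, v in wanted.items())
    ]


# Hypothetical layout: partitioned by the two most common filter columns.
files = [
    "data/ancestry=afr/chrom=1/part-0.parquet",
    "data/ancestry=afr/chrom=2/part-0.parquet",
    "data/ancestry=amr/chrom=1/part-0.parquet",
]

by_chrom = prune(files, chrom=1)
by_both = prune(files, ancestry="amr", chrom=1)
```

A filter on a partition key discards whole files before any data is read, which is the same trade-off as an index: it only helps queries that match the layout that was chosen.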
Reposted by SMT
bioconductor.bsky.social
🌍 Applications now open for our first Bioconductor course in West Africa!

📅 17–21 Nov 2025 | 📍 Abomey-Calavi, Benin
Free, in-person training on R, RStudio & RNA-seq workflows.

Apply by 15 Oct 👉 forms.gle/d32F6xJJbsFa...

More info 🔗 training.bioconductor.org/workshops/20...

#Bioconductor #RStats
Front view of the GBioS building at the University of Abomey-Calavi, Benin, surrounded by palm trees with a sign reading “Genetics, Biotechnology and Seed Science Unit.” Banner text above reads “Apply Now for the Bioconductor Benin 2025 Course!”
pp0196.bsky.social
As a non-Windows user, I wondered what the packageName-win.def files in #extendr-based #Rstats packages were about while making an R binding to some Go routines in a package. Now I know!
Savvy issue: github.com/yutannihilat...
goserveR def: github.com/sounkou-bioi... (if some Windows users could test the package)
pp0196.bsky.social
Planning to incorporate this into some plumber2 or Ambiorix API
At least it works for mtcars :D
pp0196.bsky.social
plumber2 supports redirects github.com/search?q=rep... and one can set cookies, so probably write the data to persistent storage and provide (if needed) signed URLs when the persistent data is requested
Tested out duckdb-wasm with vanilla JS here github.com/sounkou-bioi... without the plumber bit
pp0196.bsky.social
#plumber2 might give #ambiorix a run for its money, especially with the serializer registration and its flexibility (e.g. serializing some computation in R into Parquet and pushing the remaining computations to the client with duckdb-wasm) plumber2.posit.co/articles/ren...
#RStats #WebDev
Rendering Output
plumber2.posit.co
pp0196.bsky.social
Give us Swagger generators @john-coene.com @mwavu.com, so I can make my full-stack R/Ambiorix MVP while keeping non-R frontend devs busy with all the docs of the Ambiorix endpoints I am consuming
pp0196.bsky.social
- don't want to/cannot pay for MotherDuck for now

What's best:
Just add tables, store in Parquet/DuckDB files and try to do most of the work on the client side, or use Postgres?
pp0196.bsky.social
#Rstats #duckdb hivemind
Best strategy for this situation:
- very large number of individual files with the same structure (millions of rows)
- concurrent, individual-file-based (client-side) queries driven by attributes stored in different tables/files (basically search the attribute tables, then join)
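The "search the attribute tables, then join" access pattern can be sketched as follows. This uses Python's built-in `sqlite3` purely as a dependency-free stand-in for DuckDB or Postgres, and a single `measurements` table stands in for the many per-file datasets. Every table and column name here is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE attributes (file_id TEXT PRIMARY KEY, tissue TEXT, cohort TEXT);
    CREATE TABLE measurements (file_id TEXT, position INTEGER, value REAL);
    -- Index the columns the searches and joins actually use.
    CREATE INDEX idx_attr_tissue ON attributes (tissue);
    CREATE INDEX idx_meas_file ON measurements (file_id);
""")
con.executemany("INSERT INTO attributes VALUES (?, ?, ?)", [
    ("f1", "liver", "A"), ("f2", "brain", "A"), ("f3", "liver", "B"),
])
con.executemany("INSERT INTO measurements VALUES (?, ?, ?)", [
    ("f1", 100, 0.5), ("f2", 100, 0.9), ("f3", 100, 0.7), ("f3", 200, 0.1),
])


def rows_for(tissue):
    """Search the small attribute table first, then join to the bulk data."""
    return con.execute("""
        SELECT m.file_id, m.position, m.value
        FROM attributes AS a
        JOIN measurements AS m ON m.file_id = a.file_id
        WHERE a.tissue = ?
        ORDER BY m.file_id, m.position
    """, (tissue,)).fetchall()


liver_rows = rows_for("liver")
```

With files on disk instead of one table, the first step would return the list of matching files, and only those would be opened for the second step.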
Reposted by SMT
zeileis.org
📢 PSA: The R-Forge server is under attack from hackers and hence web access (e.g., for package installation) is currently down. #rstats

The team at WU Wien is working on it. I'll report here when it is back up again.
Reposted by SMT
gddiwan.bsky.social
Pleased to announce that our #preprint on the evolutionary history of gene functions is now online at #bioRxiv! We overlaid functional annotations on the evolutionary history of ~4.5M genes from 508 species across the tree of life and found some very cool stuff!

tinyurl.com/FuncEvol

🧵 1/10
Number of genes, domains and pathways gained at every node of a cladogram of 508 species
pp0196.bsky.social
#Rstats #Statgen #CNV
neumann-lab.bsky.social
**** Cumulative Copy Number Variation Analyses based on DNA methylation data ****

We present our new tool CCNV - now out in BMC Bioinformatics!

link.springer.com/article/10.1...
CCNV: a user-friendly R package enabling large-scale cumulative copy number variation analyses of DNA methylation data - BMC Bioinformatics
Background Copy number variation (CNV) analyses—often inferred from DNA-methylation data—depict alterations of DNA quantities across chromosomes and have improved tumour diagnostics and classification. For the analyses of larger case series, CNV-features of multiple samples have to be combined to reliably interpret tumour-type characteristics. Established workflows mainly focus on the analyses of singular samples and do not support scalability to high sample numbers. Additionally, only plots showing the frequency of the aberrations have been considered. Results We present the Cumulative CNV (CCNV) R package, which combines established segmentation methods and a newly implemented algorithm for thorough and fast CNV analysis at unprecedented accessibility. Our work is the first to supplement well-interpretable CNV frequency plots with their respective intensity plots, as well as showcasing the first application of penalised least-squares regression to DNA methylation data. CCNV exceeded existing tools concerning computing time and displayed high accuracy for all available array types on simulated and real-world data, verified by our newly developed benchmarking method. Conclusions CCNV is a user-friendly R package, which enables fast and accurate generation and analyses of cumulative copy number variation plots.
link.springer.com
Reposted by SMT
danielkaschta.bsky.social
✨ Delighted to share our open-access paper in Genome Medicine is finally fully published:
genomemedicine.biomedcentral.com/articles/10....
Evaluating genome sequencing strategies: trio, singleton, and standard testing in rare disease diagnosis - Genome Medicine
Background Short-read genome sequencing (GS) is among the most comprehensive genetic testing methods available, capable of detecting single-nucleotide variants, copy-number variants, mitochondrial variants, repeat expansions, and structural variants in a single assay. Despite its technical advantages, the full clinical utility of GS in real-world diagnostic settings remains to be fully established. Methods This study systematically compared singleton GS (sGS), trio GS (tGS), and exome sequencing-based standard-of-care (SoC) genetic testing in 416 patients with rare diseases in a blinded, prospective study. Three independent teams with divergent baseline expertise evaluated the diagnostic yield of GS as a unifying first-tier test and directly compared its variant detection capabilities, learning curve, and clinical feasibility. The SoC team had extensive prior experience in exome-based diagnostics, while the sGS and tGS teams were newly trained in GS interpretation. Diagnostic yield was assessed through both prospective and retrospective analyses. Results In our prospective analysis, tGS achieved the highest diagnostic yield for likely pathogenic/pathogenic variants at 36.1% in the newly trained team, surpassing the experienced SoC team at 35.1% and the newly trained sGS team at 28.8%. To evaluate which variants could technically be identified and account for differences in team experience, we conducted a retrospective analysis, achieving diagnostic yields of 36.7% for SoC, 39.1% for sGS, and 40.0% for tGS. The superior yield of GS was attributed to its ability to detect deep intronic, non-coding, and small copy-number variants missed by SoC. Notably, tGS identified three de novo variants classified as likely pathogenic based on recent GeneMatcher collaborations and newly published gene-disease association studies. 
Conclusions Our findings demonstrate that GS, particularly tGS, outperforms SoC in diagnosing rare diseases, with sGS providing a more cost-effective alternative. These results suggest that GS should be considered a first-tier genetic test, offering an efficient, single-step approach to reduce the diagnostic odyssey for patients with rare diseases. The trio approach proved especially valuable for less experienced teams, as inheritance data facilitated variant interpretation and maintained high diagnostic yield, while experienced teams achieved comparable results with singleton analysis alone.
genomemedicine.biomedcentral.com
Reposted by SMT
narjournal.bsky.social
🧬 New in NARGAB: authors compared human exome sequencing kits, showing how design differences impact capture efficiency and downstream analysis. Useful insights for choosing the right kit!

📖 Read: doi.org/10.1093/narg...

#ExomeSequencing #Genomics #Bioinformatics #NARGAB
pp0196.bsky.social
by adding some boilerplate to avoid nested columns; writing is a no-go (or maybe it is possible with `raw` blobs?)
pp0196.bsky.social
Really don't like that, in order to read/write arbitrary nested Parquet types in #RStats (as of now and to the best of my knowledge), you need the huge (in terms of compile time) duckdb or arrow dependencies. With the very good {nanoparquet}, you can avoid reader errors...
Reposted by SMT
coatless.bsky.social
R that travels light on #Linux: Portable R AppImages.

Now working everywhere: your Ubuntu, friend's Fedora, cousin's Arch setup (btw)

No sudo, no tears, just base R science ✨

(package support coming soon!)

#RStats #AppImage #DataScience
Terminal session showing the download and execution of an R AppImage from GitHub releases. The commands show wget downloading the ARM64 AppImage built on Ubuntu 24.04, making it executable with chmod +x, and then launching it to run R statistical analysis including a linear regression model with income vs age data, demonstrating the workflow from GitHub release to running statistical computations. VS Code showing the successful completion of an R AppImage build process. The terminal displays the build summary indicating a 76MB minimal R 4.5.1 AppImage for ARM64 architecture with package installation disabled (immutable). The output shows usage examples and confirms the build completed successfully, with R actually running at the bottom showing the standard R startup message and executing system environment commands, demonstrating the fully functional portable R installation. VS Code terminal showing R 4.5.1 running from an AppImage on ARM64 architecture. The file explorer shows the typical AppImage directory structure with usr/bin, usr/lib, and usr/share folders, demonstrating the self-contained portable nature of the R installation.
Reposted by SMT
jsantoyo.bsky.social
Augmenting cost-effectiveness in clinical diagnosis using extended whole-exome sequencing: SNVs, SVs, and beyond. #WES #SNVs #SVs #Genomics #Bioinformatics #JournalOfHumanGenetics 🧬 🖥️
www.nature.com/articles/s10...