David Steinberg
@david4096.bsky.social
210 followers 1.4K following 48 posts
I make stuff to help scientists focus on their research david4096.github.io
david4096.bsky.social
Catch you at the next one I hope ;)
Reposted by David Steinberg
ga4gh.org
Announcing GIF Project: Cloud-based BRCA Exchange variant analysis environment using GA4GH standards in Camber. The project aims to adapt and extend community-driven standards to support interoperable workflows, variant annotation, and metadata description. Learn more: www.ga4gh.org/what-we-do/g...
Cloud-Based BRCA Exchange Variant Analysis Environment Using GA4GH Standards in Camber
By integrating BRCA Exchange variant data with GA4GH standards, this GA4GH Implementation Forum (GIF) project creates open, platform-agnostic workflows and tools that can be used by anyone for scalabl...
www.ga4gh.org
david4096.bsky.social
Collected together @ga4gh.org Bluesky accounts here, lmk if you want to be added! go.bsky.app/8BDDMqM
david4096.bsky.social
Calling all @ga4gh.org Connect 2025 attendees online and in-person, let's connect here on bluesky! #ga4ghconnect2025 #ga4gh #bioinformatics #genomics
david4096.bsky.social
At the @mlcommons.org Croissant community meeting with, you guessed it
david4096.bsky.social
Another important direction is making immersive visual experiences that make data models accessible in a visual and humane way. I hope to experience this in person at a museum github.com/dbcls/dive
GitHub - dbcls/dive: Data Integration Visual Exploration (DIVE)
github.com
david4096.bsky.social
Toshiaki Katayama, original author of the wildly popular KEGG database, rounding out the keynotes @swat4hcls.bsky.social by showing us the past, present, and future of linked data in the life sciences. Lots of excitement for the possibilities of #graphgenome!!
david4096.bsky.social
Starter pack for #swat4hcls2025 conference go.bsky.app/PiZd2qR 🗣️ @swat4hcls.bsky.social
david4096.bsky.social
Embedding knowledge graphs in order to compare ontologies using learned features from Shervin Mehryar’s keynote
david4096.bsky.social
From Prof Anna Fensel’s keynote, a roundup of some of the connections between AI and semantic
david4096.bsky.social
One of the common themes of the conversations at #swat4hcls so far is that knowledge graphs are proving to be critical for reliability and interpretability of AI and LLMs in specific
david4096.bsky.social
Excited to attend #SWAT4HCLS in Barcelona next week, representing @cambercloud.bsky.social ! 🎉

At the hackathon, we’ll explore #CroissantML for seamless dataset & model access via @hf.co and @kaggle.com 🤓
david4096.bsky.social
Check out our first preprint from #biohackathon Fukushima 2024 and expect more on this work 🤓 files.osf.io/v1/resources...
files.osf.io
david4096.bsky.social
We found some low-hanging fruit for improvement and tested out bringing a bio dataset into Croissant. We think that continually increasing the use of ontologies and controlled vocabularies will be crucial for data harmonization and the new era of multimodal models!
david4096.bsky.social
It works by providing a controlled vocabulary for high-level dataset metadata as well as specific metadata for columnar data. That might seem like a small thing, but it is a huge step forward for bringing tools to data.
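To make that concrete, here is a rough sketch of the idea in Python, using only the standard library. It is a simplified illustration and not a valid Croissant file: a real one also carries a JSON-LD `@context`, file distribution entries, and a data source for each field. The helper functions, the `example_variants` dataset, and its columns are all made up for this example.

```python
import json


def describe_column(name, data_type, description):
    """Return a simplified Croissant-style entry for one column."""
    return {
        "@type": "cr:Field",
        "name": name,
        "description": description,
        "dataType": data_type,
    }


def describe_dataset(name, description, columns):
    """Combine dataset-level metadata with per-column metadata."""
    return {
        "@type": "sc:Dataset",
        "name": name,
        "description": description,
        "recordSet": [
            {
                "@type": "cr:RecordSet",
                "name": "default",
                "field": [describe_column(*col) for col in columns],
            }
        ],
    }


# Hypothetical toy dataset; names and descriptions are illustrative only.
metadata = describe_dataset(
    "example_variants",
    "Toy table of genetic variants (illustrative only).",
    [
        ("gene_symbol", "sc:Text", "Gene symbol"),
        ("position", "sc:Integer", "1-based genomic coordinate"),
    ],
)

print(json.dumps(metadata, indent=2))
```

Because every column is described with the same small set of keys, a generic loader can work out what each column holds without dataset-specific parsing code.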
david4096.bsky.social
@hf.co, @kaggle.com, OpenML, Dataverse, and others are all implementing all or part of the CroissantML spec, which interoperates with tooling like TensorFlow so you can load datasets directly into your AI training code
david4096.bsky.social
Biology datasets tend to be messy, require domain knowledge to parse, and are not immediately usable for training AI models. That’s part of why I became interested in @mlcommons.org CroissantML as a way to bring ML tools to biology data. We’re presenting a poster on this effort at #swat4hcls next week!