Anthony Gitter
@anthonygitter.bsky.social
69 followers
30 following
21 posts
Computational biologist; Associate Prof. at University of Wisconsin-Madison; Jeanne M. Rowe Chair at Morgridge Institute
Reposted by Anthony Gitter
Anthony Gitter
@anthonygitter.bsky.social
· Aug 22
Chemical Language Model Linker: Blending Text and Molecules with Modular Adapters
The development of large language models and multimodal models has enabled the appealing idea of generating novel molecules from text descriptions. Generative modeling would shift the paradigm from relying on large-scale chemical screening to find molecules with desired properties to directly generating those molecules. However, multimodal models combining text and molecules are often trained from scratch, without leveraging existing high-quality pretrained models. Training from scratch consumes more computational resources and prohibits model scaling. In contrast, we propose a lightweight adapter-based strategy named Chemical Language Model Linker (ChemLML). ChemLML blends the two single domain models and obtains conditional molecular generation from text descriptions while still operating in the specialized embedding spaces of the molecular domain. ChemLML can tailor diverse pretrained text models for molecule generation by training relatively few adapter parameters. We find that the choice of molecular representation used within ChemLML, SMILES versus SELFIES, has a strong influence on conditional molecular generation performance. SMILES is often preferable despite not guaranteeing valid molecules. We raise issues in using the entire PubChem data set of molecules and their associated descriptions for evaluating molecule generation and provide a filtered version of the data set as a generation test set. To demonstrate how ChemLML could be used in practice, we generate candidate protein inhibitors and use docking to assess their quality and also generate candidate membrane permeable molecules.
doi.org
Reposted by Anthony Gitter
Anthony Gitter
@anthonygitter.bsky.social
· Jul 18
Assay2Mol: large language model-based drug design using BioAssay context
Scientific databases aggregate vast amounts of quantitative data alongside descriptive text. In biochemistry, molecule screening assays evaluate the functional responses of candidate molecules against...
arxiv.org
Anthony Gitter
@anthonygitter.bsky.social
· Jul 18
Data mining of PubChem bioassay records reveals diverse OXPHOS inhibitory chemotypes as potential therapeutic agents against ovarian cancer - Journal of Cheminformatics
Focused screening on target-prioritized compound sets can be an efficient alternative to high throughput screening (HTS). For most biomolecular targets, compound prioritization models depend on prior ...
doi.org