Lightnews — Scholar-powered news

Erin Young

@erinyoung.bsky.social

3.4K followers 2.3K following 320 posts

Public Health #Bioinformatician. Wants to sequence ALL THE THINGS. Personal account with alternative spellings and grammar structures. She/her


Erin Young @erinyoung.bsky.social · 9d

It'd really shift my way of thinking if it didn't

Erin Young @erinyoung.bsky.social · 9d

And today's (useless) figure is that the mean depth observed in a bam file is linearly associated with the number of reads in the corresponding fastq files.

A scatter plot comparing the number of reads and the overall mean depth observed in a bam file, which appears to have a linear relationship.
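
The trend in this figure matches ordinary coverage arithmetic: for a fixed reference and read length, more reads means proportionally more depth. A minimal sketch of how the two quantities could be measured, assuming uncompressed four-line FASTQ records and the three-column output of `samtools depth -a` (the helper names and file names are hypothetical, not taken from the post):

```shell
# Count reads in an uncompressed FASTQ stream on stdin
# (assumes exactly four lines per record).
count_fastq_reads() {
  awk 'END { printf "%d\n", NR / 4 }'
}

# Average the third column (depth) of `samtools depth` output on stdin.
mean_depth() {
  awk '{ sum += $3 } END { if (NR > 0) printf "%.2f\n", sum / NR }'
}

# Hypothetical usage:
#   count_fastq_reads < sample_R1.fastq
#   samtools depth -a sample.bam | mean_depth
```

Plotting one value against the other per sample would give a scatter plot like the one described.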

Erin Young @erinyoung.bsky.social · 15d

I think you misunderstand. There is a lot of coverage for these samples. So much so that it is hard to see bubbles or other aberrations.

samtools coverage histogram of a SARS-CoV-2 sample with too many reads. The large grey box in the center should have more variation, but in general indicates that each and every base in the reference has a lot of read coverage.

Erin Young @erinyoung.bsky.social · 15d

How is this method simpler than what I attempted?

Erin Young @erinyoung.bsky.social · 15d

All 33 million+ reads were mapped with bbmap (all non-mapped reads were excluded prior to trimming the primers)

Erin Young @erinyoung.bsky.social · 15d

So, in summary, if high coverage samples aren't getting assigned lineages, I recommend subsampling them or adjusting the `samtools mpileup` command.
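
For the subsampling option, a minimal sketch of working out what fraction of reads to keep. The target read count, the choice of `seqkit sample`, and all file names are assumptions for illustration; the thread does not say which tool or target was used.

```shell
# Fraction of reads to keep so that roughly `target` reads remain.
# Prints 1.0000 when the sample is already at or below the target.
subsample_fraction() {
  total=$1
  target=$2
  awk -v total="$total" -v target="$target" 'BEGIN {
    f = (total > 0 && target < total) ? target / total : 1
    printf "%.4f\n", f
  }'
}

# Hypothetical usage (read counts are placeholders; the same seed is
# used for both files so that read pairs stay in sync):
#   frac=$(subsample_fraction 33000000 1000000)
#   seqkit sample -p "$frac" -s 11 sample_R1.fastq.gz -o sub_R1.fastq.gz
#   seqkit sample -p "$frac" -s 11 sample_R2.fastq.gz -o sub_R2.fastq.gz
```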

Erin Young @erinyoung.bsky.social · 15d

I really solved my dilemma (after too many hours troubleshooting it) by adding the `-d 0` flag to `samtools mpileup`, which uses a lot of memory but produced adequate consensus fasta files.
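
A minimal sketch of what that pipeline step might look like as a shell function. Only `-d 0` comes from the post; the other `samtools mpileup` flags (`-aa`, `-A`, `-Q 0`), the function name, and the file names are assumptions modeled on commonly used `ivar consensus` invocations.

```shell
# Build a consensus from a primer-trimmed, sorted BAM file.
# -d 0 removes the per-file depth cap in `samtools mpileup`, which is
# why memory use goes up on very deep samples.
# The remaining flags are assumptions, not taken from the thread.
make_consensus() {
  bam=$1
  prefix=$2
  samtools mpileup -aa -A -d 0 -Q 0 "$bam" | ivar consensus -p "$prefix"
}

# Hypothetical usage:
#   make_consensus sample.primertrim.sorted.bam sample
```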

/

Erin Young @erinyoung.bsky.social · 15d

Instead it turns out that `bbmap` was allowing for very large insert sizes, some of which spanned the majority of the genome. These, for whatever reason, were given priority for `samtools mpileup` (which gets piped into `ivar consensus`) /
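
One way to check whether a BAM file contains such long templates is to look at the TLEN field (column 9) of each SAM record. This is a diagnostic sketch only, not the fix used in the thread; the helper name and the threshold are hypothetical.

```shell
# Count SAM records on stdin whose absolute template length
# (column 9, TLEN) exceeds `max`. Header lines starting with "@" are skipped.
count_long_inserts() {
  awk -v max="$1" '
    /^@/ { next }
    { t = ($9 < 0) ? -$9 : $9; if (t > max) n++ }
    END { print n + 0 }
  '
}

# Hypothetical usage (the 1000 bp threshold is an arbitrary placeholder):
#   samtools view -h sample.bam | count_long_inserts 1000
```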

Erin Young @erinyoung.bsky.social · 15d

The most irritating part is that each of these 18 would generate a consensus that could be used for determining lineage if I subsampled them. I was worried about contamination, but these had all had human reads removed. /

Erin Young @erinyoung.bsky.social · 15d

I had 18 samples (33 million+ reads) of SARS-CoV-2 amplicon-based sequencing that would not create a decent consensus fasta after aligning with `bbmap` and trimming/consensus generation with `ivar`, but would if I tweaked the pipeline a bit.
/

Erin Young @erinyoung.bsky.social · Sep 8

Just finished updating my prokaryotic representative genome mash reference coinciding with refseq version 232

🧬🖥️🦠

Mash Sketch of RefSeq Bacterial Reference Genomes

The mash reference that can be downloaded from the mash documentation is for RefSeq version 70. I do not inherently have a problem with RefSeq version 70, but RefSeq is well past version 200 now. RefS...

Erin Young @erinyoung.bsky.social · Aug 28

Just remember to deactivate your environment when finished.

Popular bioinformatic tools have conda recipes, so conda helps make installing bioinformatic tools less painful.

Erin Young @erinyoung.bsky.social · Aug 28

6. Install stuff

I recommend installing tools in their own environment.

Something like

`conda create -n seqkit seqkit`

will install seqkit in a new conda environment named seqkit.

This environment can be turned on or activated with

`conda activate seqkit` /
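
The one-environment-per-tool habit can be sketched as a small helper that repeats the `conda create` command for several tools. The function name and the `-y` flag (skip the confirmation prompt) are additions for illustration.

```shell
# Create one conda environment per tool, each named after the tool it holds.
# Stops at the first failure.
make_tool_envs() {
  for tool in "$@"; do
    conda create -y -n "$tool" "$tool" || return 1
  done
}

# Hypothetical usage:
#   make_tool_envs seqkit samtools
#   conda activate seqkit
#   seqkit version
#   conda deactivate
```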

Erin Young @erinyoung.bsky.social · Aug 28

5. Add some extra channels to conda.

I recommend conda-forge and bioconda

Something like this
`conda config --add channels conda-forge`

or this
`conda config --add channels bioconda`

Channels are where different conda packages are stored. Adding channels helps conda find packages to install. /
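
A minimal sketch that adds both channels and then prints the resulting list. Because `--add` puts the new channel at the top of the list, the channel added last gets the highest priority; the order chosen here (conda-forge on top) and the function name are assumptions.

```shell
# Add bioconda and conda-forge, then show the configured channel list.
# `--add` prepends, so conda-forge ends up above bioconda.
add_bio_channels() {
  conda config --add channels bioconda &&
    conda config --add channels conda-forge &&
    conda config --show channels
}

# Hypothetical usage:
#   add_bio_channels
```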

Erin Young @erinyoung.bsky.social · Aug 28

There are some people who then recommend installing mamba (or similar) with something like `conda install mamba`, which I think is fine. These tools perform similar tasks to conda, but are often faster. I let "more advanced" users explore these options and tend to keep everything "conda". /

Erin Young @erinyoung.bsky.social · Aug 28

4. Activate conda

Something like `source ~/miniconda3/bin/activate`, or just log out and log back in

Now you have conda! /

Erin Young @erinyoung.bsky.social · Aug 28

3. Use bash (or whatever shell is being used) to run the downloaded script

`bash Miniconda3-latest-Linux-x86_64.sh`

Answer all the prompts with the relevant information, and, yes, you want a line in your ~/.bashrc file

/

Erin Young @erinyoung.bsky.social · Aug 28

1. Go to the Anaconda/Miniconda/Conda website to get the URL for the bash script it uses to install itself. It's something like repo.anaconda.com/miniconda/Mi...

2. Download this file
wget repo.anaconda.com/miniconda/Mi...
or
curl repo.anaconda.com/miniconda/Mi...

(I'm not picky)

/

Installing Miniconda - Anaconda

www.anaconda.com
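
Steps 1 to 4 of this thread could be wrapped in one function, roughly as below. The installer URL is passed in rather than hard-coded because the thread only shows it truncated; the function name and the fallback logic between `wget` and `curl` are illustrative assumptions.

```shell
# Download the Miniconda installer from `url` and run it.
# The URL must be copied from the Miniconda download page (step 1).
install_miniconda() {
  url=$1
  script=${2:-Miniconda3-latest-Linux-x86_64.sh}
  # Step 2: download with whichever tool is available.
  if command -v wget >/dev/null 2>&1; then
    wget -O "$script" "$url" || return 1
  elif command -v curl >/dev/null 2>&1; then
    curl -L -o "$script" "$url" || return 1
  else
    echo "need wget or curl" >&2
    return 1
  fi
  # Step 3: run the installer and answer its prompts.
  bash "$script"
}

# Step 4, once the installer has finished:
#   source ~/miniconda3/bin/activate   (or log out and log back in)
```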

Erin Young @erinyoung.bsky.social · Aug 28

Conda is an open-source command-line system for package and environment management. It's free, and generally does not need admin privileges.

Since I helped someone install conda today, I'll share the process here. /

Reposted by Erin Young

Kate Baker @ksbakes.bsky.social · Aug 27

Awesome to see the transparent, systematic, and evidence-based approach behind the WHO Bacterial Priority Pathogens List 2024 out in the Lancet Infectious Diseases www.thelancet.com/journals/lan...

The WHO Bacterial Priority Pathogens List 2024: a prioritisation study to guide research, development, and public health strategies against antimicrobial resistance

The 2024 WHO BPPL is a key tool for prioritising research and development investments and informing global public health policies to combat AMR. Gram-negative bacteria and rifampicin-resistant M tuber...

www.thelancet.com

Erin Young @erinyoung.bsky.social · Aug 19

We have a new APHL fellow working with us!

@robbysainsbury.bsky.social is going to do some great things!

Erin Young @erinyoung.bsky.social · Aug 12

I feel like a lot of people surpass me, but I'll let you be impressed.

34 are picture books,
4 are graphic novels,
1 is a kids chapter book,
and 1 book is for me.

Erin Young @erinyoung.bsky.social · Aug 12

Mine does this too! The library reminds me that my kids would bankrupt me if I didn't have a library nearby.

Text in image:

Total items: 9

You just saved $110.74 by using your library. You have saved $2,778.31 this past year and $7,108.25 since you began using the library!

Account summary
Items out: 40
Hold requests: 1
Items held: 0
Charges: $0.00

Thank You!

Erin Young @erinyoung.bsky.social · Aug 8

* Morbillivirus hominis (sorry for the typo!)

Erin Young @erinyoung.bsky.social · Aug 8

Sequencing measles (and other pathogens) helps track its spread, spot new introductions, and monitor changes in the virus. The more virus that we sequence, the clearer the picture.