Erin Young
@erinyoung.bsky.social
3.4K followers 2.3K following 320 posts
Public Health #Bioinformatician. Wants to sequence ALL THE THINGS. Personal account with alternative spellings and grammar structures. She/her
erinyoung.bsky.social
It'd really shift my way of thinking if it didn't
erinyoung.bsky.social
And today's (useless) figure is that the mean depth observed in a bam file is linearly associated with the number of reads in the corresponding fastq files.
A scatter plot comparing the number of reads and the overall mean depth observed in a bam file, which appears to have a linear relationship.
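That linear relationship is what the usual back-of-the-envelope (Lander-Waterman) estimate predicts: mean depth is roughly reads × read length ÷ genome length. A minimal sketch, assuming every read maps and all reads share one length; the 150 bp read length and the helper name `expected_mean_depth` are illustrative, not values taken from the plot.

```python
# Expected mean depth under the Lander-Waterman approximation:
#   mean depth ~= (number of reads * read length) / genome length
# Assumes every read maps to the reference and all reads share one length.

SARS_COV_2_GENOME_BP = 29_903  # length of the Wuhan-Hu-1 reference (MN908947.3)


def expected_mean_depth(n_reads: int, read_length: int,
                        genome_length: int = SARS_COV_2_GENOME_BP) -> float:
    """Return the expected mean depth for n_reads reads of read_length bases."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return n_reads * read_length / genome_length


if __name__ == "__main__":
    # Hypothetical 150 bp reads: depth grows in a straight line with read count.
    for n_reads in (1_000_000, 2_000_000, 4_000_000):
        print(n_reads, round(expected_mean_depth(n_reads, 150), 1))
```

Amplicon data is far from uniform, so per-base depth varies a lot, but the mean still scales with the read count.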
erinyoung.bsky.social
I think you misunderstand. There is a lot of coverage for these samples. So much so that it is hard to see bubbles or other aberrations.
samtools coverage histogram of a SARS-CoV-2 sample with too many reads. The large grey box in the center should have more variation, but in general indicates that each and every base in the reference has a lot of read coverage.
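`samtools coverage` already reports a `meandepth` column. If you want the per-base view instead, `samtools depth -a` prints one tab-separated line per reference position (contig, position, depth). A small sketch for summarising that output; the 10x breadth threshold and the `summarise_depth` name are assumptions for illustration.

```python
# Summarise per-base depth from `samtools depth -a` output, which prints one
# tab-separated line per reference position: contig, 1-based position, depth.
import io
from typing import Iterable, Tuple


def summarise_depth(lines: Iterable[str], min_depth: int = 10) -> Tuple[float, float]:
    """Return (mean depth, fraction of positions with depth >= min_depth)."""
    total = 0
    covered = 0
    positions = 0
    for line in lines:
        if not line.strip():
            continue
        # Only the first depth column is used if several BAMs were given.
        _contig, _pos, depth = line.rstrip("\n").split("\t")[:3]
        d = int(depth)
        total += d
        positions += 1
        if d >= min_depth:
            covered += 1
    if positions == 0:
        return 0.0, 0.0
    return total / positions, covered / positions


if __name__ == "__main__":
    # Tiny made-up example standing in for `samtools depth -a sample.bam`.
    example = io.StringIO("MN908947.3\t1\t0\nMN908947.3\t2\t20\nMN908947.3\t3\t40\n")
    print(summarise_depth(example))
```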
erinyoung.bsky.social
How is this method simpler than what I attempted?
erinyoung.bsky.social
All 33 million+ reads were mapped with bbmap (all non-mapped reads were excluded prior to trimming the primers)
erinyoung.bsky.social
So, in summary, if high coverage samples aren't getting assigned lineages, I recommend subsampling them or adjusting the `samtools mpileup` command.
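Dedicated tools such as `seqtk sample` handle subsampling for you; the sketch below only shows the idea for paired reads. It assumes plain (not gzipped) 4-line FASTQ records with R1 and R2 in the same order; `subsample_pairs`, the fraction and the seed are hypothetical choices, not part of the original pipeline.

```python
# Seeded, fraction-based subsampling of paired FASTQ files. The same random
# draw decides both mates, so R1 and R2 stay in sync. Assumes both files list
# the same reads in the same order with plain 4-line records.
import io
import random
from itertools import islice
from typing import Iterator, List, TextIO, Tuple


def fastq_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield 4-line FASTQ records as lists of lines."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(r1: TextIO, r2: TextIO, fraction: float,
                    seed: int = 42) -> Iterator[Tuple[List[str], List[str]]]:
    """Keep each read pair independently with probability `fraction`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    for rec1, rec2 in zip(fastq_records(r1), fastq_records(r2)):
        if rng.random() < fraction:
            yield rec1, rec2


if __name__ == "__main__":
    # Tiny made-up reads; real use would open the R1/R2 FASTQ files instead.
    r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nTTTT\n+\nIIII\n")
    r2 = io.StringIO("@a/2\nACGT\n+\nIIII\n@b/2\nAAAA\n+\nIIII\n")
    for rec1, rec2 in subsample_pairs(r1, r2, fraction=0.5):
        print(rec1[0].strip(), rec2[0].strip())
```

For gzipped files, open them with `gzip.open(path, "rt")` and pass the handles in unchanged.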
erinyoung.bsky.social
I really solved my dilemma (after too many hours troubleshooting it) by adding the `-d 0` flag to `samtools mpileup`, which uses a lot of memory but produced adequate consensus fasta files.

/
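The posts don't give the full command, so this is only a sketch of the mpileup-to-consensus step with `-d 0`. The flags follow the consensus example in the iVar documentation rather than the author's exact pipeline; `consensus_commands` and `run_consensus` are hypothetical helper names, and `samtools` and `ivar` are assumed to be on `PATH`.

```python
# Build and run `samtools mpileup ... | ivar consensus ...` from Python.
# The key flag here is `-d 0`, which lifts samtools' per-position depth cap.
import subprocess
from typing import List, Tuple


def consensus_commands(bam: str, prefix: str,
                       max_depth: int = 0) -> Tuple[List[str], List[str]]:
    """Return the argv lists for the mpileup and ivar consensus steps."""
    mpileup = [
        "samtools", "mpileup",
        "-aa",                 # report every reference position, even zero depth
        "-A",                  # count anomalous read pairs instead of skipping them
        "-d", str(max_depth),  # 0 = no depth limit (memory hungry at high depth)
        "-Q", "0",             # leave base-quality filtering to ivar consensus
        bam,
    ]
    consensus = ["ivar", "consensus", "-p", prefix]
    return mpileup, consensus


def run_consensus(bam: str, prefix: str, max_depth: int = 0) -> None:
    """Pipe mpileup into ivar consensus, raising if either step fails."""
    mpileup_cmd, consensus_cmd = consensus_commands(bam, prefix, max_depth)
    mpileup = subprocess.Popen(mpileup_cmd, stdout=subprocess.PIPE)
    consensus = subprocess.Popen(consensus_cmd, stdin=mpileup.stdout)
    mpileup.stdout.close()  # lets mpileup see a broken pipe if ivar exits early
    consensus_rc = consensus.wait()
    mpileup_rc = mpileup.wait()
    if mpileup_rc != 0:
        raise subprocess.CalledProcessError(mpileup_rc, mpileup_cmd)
    if consensus_rc != 0:
        raise subprocess.CalledProcessError(consensus_rc, consensus_cmd)
```

Keeping the argv construction separate from the subprocess call makes it easy to print or log the exact command before spending the memory that `-d 0` needs.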
erinyoung.bsky.social
Instead it turns out that `bbmap` was allowing for very large insert sizes, some of which spanned the majority of the genome. These, for whatever reason, were given priority for `samtools mpileup` (which gets piped into `ivar consensus`) /
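One way to screen out those genome-spanning pairs before consensus calling is to drop alignments with a very large template length (SAM column 9, TLEN). A sketch that works on SAM text, for example the output of `samtools view -h in.bam`, which can then be converted back with `samtools view -b -o out.bam -`. The cutoff and the `filter_long_inserts` name are hypothetical; a value a bit above the amplicon length is the natural starting point.

```python
# Drop alignments whose template length (SAM column 9, TLEN) is larger than a
# cutoff. Both mates of a pair normally carry the same |TLEN| with opposite
# signs, so pairs are removed together.
from typing import Iterable, Iterator


def filter_long_inserts(sam_lines: Iterable[str], max_insert: int) -> Iterator[str]:
    """Yield header lines and alignments with |TLEN| <= max_insert."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines pass through untouched
            continue
        fields = line.rstrip("\n").split("\t")
        if abs(int(fields[8])) <= max_insert:  # TLEN is 0 when not computed
            yield line
```

Because lines are streamed in their original order, a coordinate-sorted input stays sorted after filtering.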
erinyoung.bsky.social
The most irritating part is that each of these 18 would generate a consensus that could be used for determining lineage if I subsampled them. I was worried about contamination, but these had all had human reads removed. /
erinyoung.bsky.social
I had 18 samples (33 million+ reads) of SARS-CoV-2 amplicon-based sequencing that would not create a decent consensus fasta after aligning with `bbmap` and trimming/consensus generation with `ivar`, but would if I tweaked the pipeline a bit.
/
erinyoung.bsky.social
Just remember to deactivate your environment (`conda deactivate`) when finished.

Popular bioinformatic tools have conda recipes, so conda helps make installing bioinformatic tools less painful.
erinyoung.bsky.social
6. Install stuff

I recommend installing tools in their own environment.

Something like

`conda create -n seqkit seqkit`

will install seqkit in a new conda environment named seqkit.

This environment can be turned on or activated with

`conda activate seqkit` /
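The one-environment-per-tool pattern is easy to script when there are several tools to set up. A minimal sketch, assuming conda is on `PATH` and channels are already configured; `create_env_commands` and `create_envs` are hypothetical helpers, and `-y` just skips the confirmation prompt.

```python
# One conda environment per tool: build `conda create` commands for a list of
# tools, naming each environment after the tool it holds.
import subprocess
from typing import Iterable, List


def create_env_commands(tools: Iterable[str]) -> List[List[str]]:
    """Return one `conda create -y -n TOOL TOOL` argv list per tool."""
    return [["conda", "create", "-y", "-n", tool, tool] for tool in tools]


def create_envs(tools: Iterable[str]) -> None:
    """Run each command in turn, stopping at the first failure."""
    for cmd in create_env_commands(tools):
        subprocess.run(cmd, check=True)
```

For example, `create_envs(["seqkit", "samtools"])` followed by `conda activate samtools` in the shell.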
erinyoung.bsky.social
5. Add some extra channels to conda.

I recommend conda-forge and bioconda

Something like this
`conda config --add channels bioconda`

and then this
`conda config --add channels conda-forge`

Channels are where different conda packages are stored. Adding channels helps conda find packages to install. /
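Order matters here: `conda config --add channels NAME` puts NAME at the top of the channel list (moving it there if it is already listed), so the channel added last gets the highest priority. The bioconda documentation recommends adding bioconda first and conda-forge second so that conda-forge ends up on top. The function below is only a model of that list behaviour, not conda's own code, and the starting `["defaults"]` list is an assumption; `conda config --show channels` shows the real result.

```python
# Model of how `conda config --add channels NAME` changes channel priority:
# --add puts NAME at the top of the list, moving it there if it is already
# present, so the channel added last ends up with the highest priority.
from typing import List


def add_channel(channels: List[str], name: str) -> List[str]:
    """Return the channel list after `conda config --add channels NAME`."""
    return [name] + [c for c in channels if c != name]


if __name__ == "__main__":
    channels = ["defaults"]  # assumed starting list
    for name in ("bioconda", "conda-forge"):
        channels = add_channel(channels, name)
    print(channels)
```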
erinyoung.bsky.social
There are some people who then recommend installing mamba (or similar) with something like `conda install mamba`, which I think is fine. These tools perform similar tasks to conda, but are often faster. I let "more advanced" users explore these options and tend to keep everything "conda". /
erinyoung.bsky.social
4. Activate conda

Something like `source ~/miniconda3/bin/activate`, or just log out and log back in

Now you have conda! /
erinyoung.bsky.social
3. Use bash (or whatever shell is being used) to run the downloaded script

`bash Miniconda3-latest-Linux-x86_64.sh`

Answer all prompts with the relevant information, and, yes, you want a line in your `~/.bashrc` file

/
erinyoung.bsky.social
1. Go to the Anaconda/Miniconda/Conda website to get the URL for the bash script it uses to install itself. It's something like repo.anaconda.com/miniconda/Mi...

2. Download this file
`wget repo.anaconda.com/miniconda/Mi...`
or
`curl -O repo.anaconda.com/miniconda/Mi...`

(I'm not picky)

/
Installing Miniconda - Anaconda
www.anaconda.com
erinyoung.bsky.social
Conda is an open-source command-line system for package and environment management. It's free, and generally does not need admin privileges.

Since I helped someone install conda today, I'll share the process here. /
erinyoung.bsky.social
We have a new APHL fellow working with us!

@robbysainsbury.bsky.social is going to do some great things!
erinyoung.bsky.social
I feel like a lot of people surpass me, but I'll let you be impressed.

34 are picture books,
4 are graphic novels,
1 is a kids chapter book,
and 1 book is for me.
erinyoung.bsky.social
Mine does this too! The library reminds me that my kids would bankrupt me if I didn't have a library nearby.
Text in image:

Total items: 9

You just saved $110.74 by using your library. You have saved $2,778.31 this past year and $7,108.25 since you began using the library!

Account summary
        Items out: 40
        Hold requests: 1
        Items held: 0
        Charges: $0.00

Thank You!
erinyoung.bsky.social
* Morbillivirus hominis (sorry for the typo!)
erinyoung.bsky.social
Sequencing measles (and other pathogens) helps track its spread, spot new introductions, and monitor changes in the virus. The more virus that we sequence, the clearer the picture.