Ming Tommy Tang
@tommytang.bsky.social
Director of bioinformatics at AstraZeneca. Subscribe to my YouTube channel @chatomics. On my way to helping 1 million people learn bioinformatics. Educator, Biotech, single cell. Also talks about leadership. tommytang.bio.link
tommytang.bsky.social
I was featured in The Data Wire by Pure Storage.
Data challenge comes before the algorithm/AI challenge.
tommytang.bsky.social
11/
Key Takeaways
Intronic reads are common

Don't ignore them—understand them

Clean RNA prep helps

IR ≠ error; it can be biology

Always validate with orthogonal methods
tommytang.bsky.social
10/
Not Always Noise
Intronic reads ≠ garbage.
They may tell you about:
Transcriptional burst

Splicing kinetics

Retained introns with function

Context matters.
tommytang.bsky.social
9/
When To Worry
Metric           | Normal (PolyA) | Red Flag
Exonic Reads     | >60%           | <50%
Intergenic Reads | <10%           | >20%
Intronic Reads   | ~5–15%         | >30%
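The red-flag cutoffs in the table above can be turned into a quick check. This is a minimal sketch, assuming you already have read-distribution percentages from your QC tool of choice. The function name `flag_read_distribution` and the dictionary keys are hypothetical, and the cutoffs are simply the ones from the table, not universal rules.

```python
# Red-flag cutoffs copied from the table above (polyA libraries).
# Names and structure are illustrative assumptions, not a standard API.
RED_FLAGS = {
    "exonic": ("below", 50.0),      # <50% exonic is a red flag
    "intergenic": ("above", 20.0),  # >20% intergenic is a red flag
    "intronic": ("above", 30.0),    # >30% intronic is a red flag
}


def flag_read_distribution(percentages):
    """Return the metrics that cross a red-flag cutoff.

    `percentages` maps metric name -> percent of reads (0-100).
    Metrics missing from the input are skipped.
    """
    flagged = []
    for metric, (direction, cutoff) in RED_FLAGS.items():
        value = percentages.get(metric)
        if value is None:
            continue
        if direction == "below" and value < cutoff:
            flagged.append(metric)
        elif direction == "above" and value > cutoff:
            flagged.append(metric)
    return flagged


# Made-up sample values, for illustration only.
sample = {"exonic": 45.0, "intergenic": 12.0, "intronic": 35.0}
print(flag_read_distribution(sample))
```

A flagged metric is a prompt to look closer at library prep and biology, not an automatic reason to discard the sample.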
tommytang.bsky.social
7/
Aligner & Annotation Issues
Even splice-aware tools like STAR and HISAT2 struggle with:
Novel isoforms

Pseudogenes

Incomplete annotations.
tommytang.bsky.social
6/
Ribo-depletion Libraries
They pull in everything—including lncRNAs and pre-mRNAs.
That’s great for total transcriptome.
Not great if you expect clean mRNA.
tommytang.bsky.social
5/
Library Prep Matters
PolyA vs Total RNA makes a big difference.
Method    | Exonic | Intronic
PolyA     | 69%    | 7%
Total RNA | 56%    | 21%
tommytang.bsky.social
4/
Biological Intron Retention (IR)
This isn’t noise—it’s regulation.
IR plays a role in cell stress, cancer, and mRNA decay.
e.g., HNRNPD, TP53.
tommytang.bsky.social
3/
Genomic DNA Contamination
Bad RNA prep? DNA might still be there.
It mimics intronic reads.
You’ll see odd insert sizes or discordant read pairs.
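One rough way to look for discordant pairs is to tally SAM flags. The sketch below assumes you have already extracted the integer FLAG field for each alignment (for example from the second column of a SAM file). The function name `discordant_fraction` is hypothetical, and what counts as a "proper pair" is decided by the aligner, so treat the number as a hint rather than a verdict.

```python
# Bit values from the SAM specification's FLAG field.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def discordant_fraction(flags):
    """Fraction of mapped, paired, primary alignments not flagged as a proper pair.

    `flags` is an iterable of integer SAM FLAG values.
    Returns 0.0 when no alignment qualifies.
    """
    considered = 0
    discordant = 0
    skip_mask = UNMAPPED | MATE_UNMAPPED | SECONDARY | SUPPLEMENTARY
    for flag in flags:
        if not flag & PAIRED:
            continue  # single-end read, pairing is not defined
        if flag & skip_mask:
            continue  # keep only primary alignments with both mates mapped
        considered += 1
        if not flag & PROPER_PAIR:
            discordant += 1
    return discordant / considered if considered else 0.0


# Made-up flags: two proper-pair reads, two improper, one unmapped, one secondary.
toy_flags = [99, 147, 65, 129, 4, 355]
print(discordant_fraction(toy_flags))
```

Comparing this fraction across samples prepared the same way is usually more informative than reading a single value in isolation.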
tommytang.bsky.social
2/
Pre-mRNA Contamination
Poly(A) libraries still catch unspliced RNA.
Why? Because splicing isn’t instant.
Intron and exon counts often correlate at r = 0.7–0.9.
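To see where your own data sits relative to that range, you can correlate per-gene intron and exon counts. A minimal sketch, assuming you already have both counts per gene from your quantification step. The function names and the log2(x + 1) transform are illustrative choices made here, not something the thread prescribes, and the counts are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def intron_exon_correlation(counts):
    """Correlate log2(count + 1) exon vs intron counts across genes.

    `counts` maps gene name -> (exon_count, intron_count).
    """
    exon = [math.log2(e + 1) for e, _ in counts.values()]
    intron = [math.log2(i + 1) for _, i in counts.values()]
    return pearson_r(exon, intron)


# Hypothetical counts, for illustration only.
toy_counts = {
    "geneA": (1200, 90),
    "geneB": (300, 40),
    "geneC": (50, 2),
    "geneD": (8000, 400),
    "geneE": (15, 5),
}
print(round(intron_exon_correlation(toy_counts), 2))
```

A high correlation is consistent with intronic signal tracking transcription of the same genes, which is what you would expect from pre-mRNA.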
tommytang.bsky.social
1/
Your RNA-seq targets mature mRNA, right?
So why do you see intronic reads?
Here’s why—some technical, some biological.
tommytang.bsky.social
Why are there intronic reads in your bulk RNA-seq data?
You're not alone—it's common, and the reasons are more layered than you think.
Let’s break it down. 🧵
tommytang.bsky.social
14/
So next time you see only 5 DE genes…
Or only 100 peaks from your TF of interest…
Don’t give up.
Zoom out.
Ask better questions.
And adjust.
Biology won’t fit in your default settings.
Neither should your analysis.
tommytang.bsky.social
13/
Bioinformatics isn’t about being strict.
It’s about being smart.
Knowing when to trust the defaults, and when to question them.
tommytang.bsky.social
12/
Always go back to the genome browser.
Visualize your bigwig.
Zoom into the peaks.
Some truths are in the noise.
tommytang.bsky.social
11/
Use default thresholds—but don’t worship them.
Ask yourself:
Does this make sense biologically?

What am I missing with this filter?

Can visualization help?
tommytang.bsky.social
10/
Key takeaway:
Bioinformatics is not a vending machine.
You don’t just plug in data and get out truth.
It’s a craft.
tommytang.bsky.social
9/
Don’t forget QC.
High duplication rate?
Low FRiP score?
High mitochondrial reads?
All affect confidence—but don't dictate a hard stop.
Always combine data quality with domain knowledge.
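For reference, FRiP is the fraction of reads that fall inside called peaks. The sketch below is a simplified pure-Python version, assuming each read is reduced to a single position and that peaks on a chromosome do not overlap. The function name `frip_score` and the input shapes are hypothetical, and in practice you would count from BAM and peak files with dedicated tools.

```python
from bisect import bisect_right


def frip_score(read_positions, peaks):
    """Fraction of reads whose position falls inside a peak.

    `read_positions`: iterable of (chrom, pos) tuples.
    `peaks`: dict chrom -> list of (start, end), half-open [start, end),
             assumed non-overlapping within a chromosome.
    """
    sorted_peaks = {c: sorted(iv) for c, iv in peaks.items()}
    starts = {c: [s for s, _ in iv] for c, iv in sorted_peaks.items()}
    total = 0
    in_peaks = 0
    for chrom, pos in read_positions:
        total += 1
        if chrom not in starts:
            continue
        # Index of the last peak starting at or before pos, if any.
        idx = bisect_right(starts[chrom], pos) - 1
        if idx >= 0 and pos < sorted_peaks[chrom][idx][1]:
            in_peaks += 1
    return in_peaks / total if total else 0.0


# Made-up peaks and reads, for illustration only.
toy_peaks = {"chr1": [(100, 200), (500, 600)]}
toy_reads = [("chr1", 150), ("chr1", 250), ("chr1", 599), ("chr1", 600), ("chr2", 150)]
print(frip_score(toy_reads, toy_peaks))
```

What counts as a low FRiP depends on the factor and the assay, which is exactly why the number should be read alongside domain knowledge.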
tommytang.bsky.social
8/
This isn’t cherry-picking.
It’s being biology-aware.
A dataset can be messy and still reveal truth—if you know where and how to look.
tommytang.bsky.social
7/
Maybe the sample was under-ChIPed.
Maybe sequencing depth was low.
Or maybe your TF just binds loosely.
If the biology says this TF should bind thousands of sites—
You adjust the q-value threshold.
You follow the signal.
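Before changing a cutoff, it helps to see how the peak count responds to it. A minimal sketch, assuming a narrowPeak-style file where column 9 holds -log10(q-value), with -1 meaning no q-value was assigned. The function name `count_peaks_by_qvalue` and the toy lines are hypothetical. Note the assumption that the file contains peaks looser than the cutoffs you test: a file produced with a strict cutoff at calling time will not contain the weaker peaks, so you would need to re-call with a looser setting first.

```python
import math


def count_peaks_by_qvalue(narrowpeak_lines, q_cutoffs=(0.01, 0.05, 0.1)):
    """Count peaks passing each q-value cutoff.

    `narrowpeak_lines`: iterable of tab-separated narrowPeak lines,
    where column 9 is -log10(q-value) and -1 means "not assigned".
    Returns a dict mapping cutoff -> number of peaks with q <= cutoff.
    """
    counts = {q: 0 for q in q_cutoffs}
    for line in narrowpeak_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        neg_log10_q = float(fields[8])
        if neg_log10_q < 0:
            continue  # no q-value assigned for this peak
        for q in q_cutoffs:
            if neg_log10_q >= -math.log10(q):
                counts[q] += 1
    return counts


# Made-up peaks, for illustration only.
toy_lines = [
    "chr1\t100\t200\tpeak1\t300\t.\t8.5\t5.0\t3.0\t50",
    "chr1\t500\t650\tpeak2\t150\t.\t4.2\t2.8\t1.5\t70",
    "chr2\t900\t990\tpeak3\t110\t.\t3.1\t2.0\t1.1\t40",
    "chr2\t1500\t1600\tpeak4\t50\t.\t2.0\t1.2\t0.5\t30",
]
print(count_peaks_by_qvalue(toy_lines))
```

If the count jumps sharply between cutoffs, look at the newly admitted peaks in the genome browser before deciding which threshold to trust.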