Soledad Galli, PhD
@solegalli.bsky.social
Data scientist, best selling instructor, book author, Python 🐍 open-source developer (check out Feature-engine).

Find out more at Train in Data: https://www.trainindata.com/
Discover the latest thoughts on working with imbalanced data with our free booklet.

We discuss 3 recent articles that have changed the conversation on resampling and SMOTE👇

www.trainindata.com/p/7-takes-on...
October 27, 2025 at 12:30 PM
All our courses come with a 30-day money-back guarantee.

If you are unhappy for whatever reason, we'll give you your money back.

That's how confident we are that you'll ❤️ our courses.

#trainindata
October 24, 2025 at 11:28 PM
Next Monday on Data Bites: Six Cloud Platforms to Run Jupyter Notebooks for Free 🚀

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/bltkmoeitj

#machinelearning #datascience #jupyter #mlmodels #ML #mltools #notebooks #cloudplatforms
August 29, 2025 at 10:02 AM
👉MICE is a powerful method for datasets with missing data across multiple variables. 

Let this slide guide you through how it works. 

#machinelearning #MICE #mlmodels #datascience #dataengineering #imputation #featureengineering
August 27, 2025 at 4:02 PM
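The slide for the MICE post above did not survive in this feed, so here is a rough, hedged sketch of the chained-equations idea in plain Python. It assumes just two numeric columns and a deterministic linear regression, which is a simplification: full MICE handles many variables, draws imputations from a predictive distribution, and produces several imputed datasets. The names `fit_line` and `chained_impute` are hypothetical helpers written for this illustration only.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for ys ~ xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Constant predictor: fall back to predicting the mean.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def chained_impute(pairs, n_iter=10):
    """Impute None values in a list of (a, b) pairs by chained regressions.

    Simplified, deterministic illustration (assumption): no random draws
    and a single imputed dataset, unlike full MICE.
    """
    missing = [(a is None, b is None) for a, b in pairs]
    data = [[a, b] for a, b in pairs]

    # Step 1: start from a simple guess, the mean of the observed values.
    for col in (0, 1):
        observed = [row[col] for row in data if row[col] is not None]
        col_mean = sum(observed) / len(observed)
        for row in data:
            if row[col] is None:
                row[col] = col_mean

    # Step 2: cycle through the columns. Each column is regressed on the
    # other using rows where it was observed, then its missing entries
    # are replaced with the model's predictions.
    for _ in range(n_iter):
        for target in (0, 1):
            predictor = 1 - target
            train = [row for row, miss in zip(data, missing) if not miss[target]]
            slope, intercept = fit_line(
                [row[predictor] for row in train],
                [row[target] for row in train],
            )
            for row, miss in zip(data, missing):
                if miss[target]:
                    row[target] = slope * row[predictor] + intercept
    return data


pairs = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, None), (None, 10.0)]
imputed = chained_impute(pairs)
```

Observed values are never overwritten; only the entries that were missing at the start are updated on each pass.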
How to construct ensembles from a thousand models?

In this article, Caruana, a prominent figure in machine learning and ensemble methods, describes how to build ensembles from libraries of thousands of machine learning models.
📄 https://f.mtr.cool/fpaqqnqxms
August 26, 2025 at 4:02 PM
Clustering & Dimensionality Reduction: your toolkit for finding patterns, simplifying data, and solving real-world problems.

🔍 You’ll:
✅ Group data (K-means, DBSCAN & more)
✅ Reduce complexity (PCA, UMAP)
✅ Work on real cases like RNA profiling

📍 https://f.mtr.cool/hdjiwbbsbl
August 25, 2025 at 4:02 PM
Next Monday on Data Bites: Working with imbalanced data? Follow these 3 steps.

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/svpfklfpda

#machinelearning #datascience #mlmodels #ML
August 22, 2025 at 10:02 AM
Can we use statistical tests to select features? 🤔

Turns out, we can! 🎉

In the slides below, we’ll explore the most commonly used statistical tests for feature selection, along with their advantages and limitations. 👇

#machinelearning #datascience #featureselection
August 19, 2025 at 4:02 PM
🚨 It’s here! Our new course on Clustering & Dimensionality Reduction just dropped 🎉

Learn how to group data (K-Means, DBSCAN, Louvain) + simplify it with PCA & UMAP, no prior experience needed!

Hands-on & practical 👇
👉  https://f.mtr.cool/zshxexbrds

#MachineLearning #DataScience
August 18, 2025 at 4:02 PM
Next Monday on Data Bites: How to Write a Winning Data Science CV

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/nozrfuruar

#machinelearning #datascience #CV #mlmodels #ML #MLCareer #MLresume
August 15, 2025 at 10:02 AM
Deep learning has transformed our daily lives, but designing neural networks remains a challenge. 

Automated hyperparameter optimization (HPO) streamlines the process. This paper reviews key techniques & tools for improving model accuracy & efficiency.
📃https://f.mtr.cool/wowjcrmwjg
August 14, 2025 at 4:02 PM
August 13, 2025 at 4:02 PM
🚨 SMOTE has long been hailed as the go-to solution for imbalanced datasets, but it only works in specific scenarios. 

In this article, we explore when SMOTE is truly effective & why it’s remained popular. 

Check it out!
https://f.mtr.cool/medbbpfril
August 12, 2025 at 4:01 PM
🚨 Just launched: our new course on Clustering & Dimensionality Reduction is live at Train in Data!

Learn to group data, reduce complexity with PCA & UMAP, and tackle real-world projects (no experience needed!)

🎓 Join us: https://f.mtr.cool/wlhxbboqkl
August 11, 2025 at 4:02 PM
Next Monday on Data Bites: Everybody says “SMOTE does not work”.

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/pinchbaedf

#machinelearning #datascience #smote #mlmodels #ML
August 8, 2025 at 10:01 AM
🐍 Python libraries that implement model-agnostic global explainability methods 👇

#python #machinelearning #MLModel #datascience #dataengineering
August 6, 2025 at 4:02 PM
Most commonly used encoding techniques ⬇️

1. OneHotEncoder
2. OrdinalEncoder
3. TargetEncoder

When one-hot encoding produces too many columns and ordinal encoding imposes an order the categories don't have, target encoding is often a better choice. Learn more at the link below.

#targetencoder #ML
August 5, 2025 at 4:02 PM
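To make the target-encoding idea from the post above concrete, here is a minimal sketch in plain Python. It replaces each category with a smoothed mean of the target and falls back to the global mean for unseen categories. The function names and the `smoothing` parameter are assumptions made for illustration; this is not the implementation behind the `TargetEncoder` class named in the post, and library versions typically add further safeguards against target leakage.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets, smoothing=1.0):
    """Learn a category -> smoothed target mean mapping.

    Each category mean is shrunk towards the global mean, so rare
    categories do not get extreme encodings. `smoothing` acts like a
    number of pseudo-observations at the global mean (assumption).
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, target in zip(categories, targets):
        sums[cat] += target
        counts[cat] += 1
    mapping = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return mapping, global_mean


def apply_target_encoding(categories, mapping, global_mean):
    """Encode categories; unseen ones fall back to the global mean."""
    return [mapping.get(cat, global_mean) for cat in categories]


# Fit on training data only, then reuse the mapping on new data.
train_cats = ["a", "a", "b", "b", "b"]
train_y = [1, 0, 1, 1, 1]
mapping, global_mean = fit_target_encoding(train_cats, train_y)
encoded = apply_target_encoding(["a", "b", "c"], mapping, global_mean)
```

Unlike one-hot encoding, this yields a single numeric column regardless of how many categories there are, which is why it scales to high-cardinality variables.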
🚨 New Course - Clustering & Dimensionality Reduction at Train in Data

Learn to apply unsupervised ML in practice 👇
✅ K-Means, DBSCAN, HDBSCAN, Graph-based
✅ PCA & UMAP
✅ Real-world projects incl. RNA case study

Find out more: https://f.mtr.cool/cojxgkyhgq
August 4, 2025 at 4:02 PM
Next Monday on Data Bites: Probe Feature Selection

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/xefqrzzgeh

#machinelearning #datascience #mlmodels #ML
August 1, 2025 at 10:02 AM
🤔 Have you used missing category imputation in your projects? Check out this reel 👇

💡 Want to dive deeper into feature engineering and data imputation? Check out our course 
https://www.trainindata.com/p/feature-engineering-for-machine-learning

#machinelearning #featureengineering #dataimputation
July 29, 2025 at 4:03 PM
In #ML, the accuracy of a classifier’s predicted probabilities is crucial. If your model's probabilities are off, probability calibration can correct that.✔️

Learn why calibration matters & how to do it in Python with scikit-learn 👇 https://www.blog.trainindata.com/probability-calibration-in-machine-learning/
July 28, 2025 at 4:02 PM
Next Monday on Data Bites: Tired of spending hours on data preprocessing?

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/lyojjydmkp
July 25, 2025 at 10:01 AM
Machine Learning is transforming insurance, but black-box models hurt trust and compliance. 🧐

Interpretability helps us:
✅ Spot biases
✅ Explain decisions
✅ Improve models

Understanding decisions = fairer, more transparent insurance. 💡

#MachineLearning #Insurance #AI
July 24, 2025 at 4:03 PM
📊 ROC-AUC is a widely used metric for binary classification. It measures how well a model separates the two classes, even on imbalanced datasets.

Check out this blog that breaks down its key concepts and shows how to evaluate #ML model performance.👇
https://f.mtr.cool/ravvrkjudz
July 22, 2025 at 4:02 PM
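One way to read the ROC-AUC from the post above: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way in plain Python by comparing every positive/negative pair. The function name `roc_auc` is a hypothetical helper for illustration, and the pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Labels are assumed to be 0 or 1.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one example of each class")

    correct = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                correct += 1.0
            elif pos_score == neg_score:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))  # 3 of 4 pairs are ranked correctly -> 0.75
```

Because the value depends only on how positives rank against negatives, it does not change when the class ratio changes, which is the sense in which it stays informative on imbalanced data.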
🚀 Exciting news! Our new course on Clustering & Dimensionality Reduction is live at Train in Data! 🎉

Learn to group data & simplify datasets with hands-on projects—no experience needed. Let’s grow your ML skills together!

👉 https://f.mtr.cool/myghjmekoa
July 21, 2025 at 4:02 PM