Soledad Galli, PhD
@solegalli.bsky.social
Data scientist, best-selling instructor, book author, Python 🐍 open-source developer (check out Feature-engine).

Find out more at Train in Data: https://www.trainindata.com/
Moving averages have long been used as a benchmark forecasting model.

Did you know that you can also use moving averages as input features?

If not, check out this blog to find out more, together with Python implementations:

www.blog.trainindata.com/master-movin...
Moving Average Forecasting: What You Need to Know - Train in Data's Blog
Learn moving average forecasting with clear examples, practical applications, and accuracy tips for better time series predictions.
www.blog.trainindata.com
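A minimal sketch of the "moving average as input feature" idea, in plain Python. The function name and the toy sales numbers are made up for illustration and are not taken from the blog. The window covers only past values, so the feature for a given step never sees that step's own target.

```python
def trailing_mean_feature(series, window):
    """Mean of the `window` values strictly before each position.

    Positions without enough history get None, so the feature never
    leaks the value it is meant to help predict.
    """
    features = []
    for i in range(len(series)):
        if i < window:
            features.append(None)
        else:
            past = series[i - window:i]  # excludes series[i] itself
            features.append(sum(past) / window)
    return features


sales = [10, 12, 14, 16, 18, 20]  # hypothetical toy series
ma3 = trailing_mean_feature(sales, window=3)
print(ma3)  # [None, None, None, 12.0, 14.0, 16.0]
```

Used as a benchmark, the same trailing mean is the forecast itself. Used as a feature, it becomes one more column next to lags and calendar variables.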
November 3, 2025 at 12:30 PM
Discover the latest thoughts on working with imbalanced data with our free booklet.

We discuss 3 recent articles that have changed the conversation on resampling and SMOTE👇

www.trainindata.com/p/7-takes-on...
October 27, 2025 at 12:30 PM
All our courses come with a 30-day money-back guarantee.

If you are unhappy for whatever reason, we give you the money back.

That's how confident we are that you'll ❤️ our courses.

#trainindata
October 24, 2025 at 11:28 PM
Next Monday on Data Bites: Six Cloud Platforms to Run Jupyter Notebooks for Free 🚀

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/bltkmoeitj

#machinelearning #datascience #jupyter #mlmodels #ML #mltools #notebooks #cloudplatforms
August 29, 2025 at 10:02 AM
Imbalanced datasets can mess with your ML models. 😬
ADASYN (Adaptive Synthetic Sampling) to the rescue! 🚀

Learn how it works + when to use it in our latest blog 👇
https://f.mtr.cool/rqstrumpnx

#MachineLearning #DataScience #ImbalancedData #ADASYN
ADASYN: Adaptive Synthetic Sampling for Imbalanced Datasets - Train in Data's Blog
ADASYN can be used to handle data imbalance by creating synthetic samples of the minority class and improve model performance. Really?
f.mtr.cool
August 28, 2025 at 4:02 PM
👉 MICE (Multivariate Imputation by Chained Equations) is a powerful method for datasets with missing data across multiple variables.

Let this slide guide you through how it works. 

#machinelearning #MICE #mlmodels #datascience #dataengineering #imputation #featureengineering
August 27, 2025 at 4:02 PM
How to construct ensembles from a thousand models?

In this article, Caruana, a prominent figure in machine learning and ensemble methods, explains how they build ensembles from libraries of thousands of machine learning models.
📄 https://f.mtr.cool/fpaqqnqxms
August 26, 2025 at 4:02 PM
Clustering & Dimensionality Reduction: your toolkit for finding patterns, simplifying data, and solving real-world problems.

🔍 You’ll:
✅ Group data (K-means, DBSCAN & more)
✅ Reduce complexity (PCA, UMAP)
✅ Work on real cases like RNA profiling

📍 https://f.mtr.cool/hdjiwbbsbl
August 25, 2025 at 4:02 PM
Next Monday on Data Bites: Working with imbalanced data? Follow these 3 steps.

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/svpfklfpda

#machinelearning #datascience #CV #mlmodels #ML #MLCareer #MLresume
August 22, 2025 at 10:02 AM
Model performance matters! 🎯 

In this article, we break down essential evaluation metrics for classification models, starting with the Confusion Matrix. Perfect for anyone looking to build reliable #machinelearning systems!

Have a good read👇
Confusion Matrix, Precision, and Recall - Train in Data's Blog
Find out what the confusion matrix is and how it relates to other classification metrics like precision, recall and f1-score.
f.mtr.cool
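A small sketch of how precision, recall and F1 fall out of the four confusion matrix counts. The labels and predictions are invented toy data, and the helper name is hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # toy ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # toy model predictions

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # of everything flagged positive, how much was right
recall = tp / (tp + fn)     # of all real positives, how much was found
f1 = 2 * precision * recall / (precision + recall)

print(tp, fp, fn, tn)         # 3 1 1 3
print(precision, recall, f1)  # 0.75 0.75 0.75
```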
August 21, 2025 at 4:02 PM
ELI5 now supports scikit-learn 1.6.0! 🎉 It wasn’t working with the latest version of scikit-learn, but that’s a thing of the past.

ELI5 has released a new version with full support for scikit-learn >=1.6.0 and Python >3.10.

Check it out 👇
GitHub - eli5-org/eli5: A library for debugging/inspecting machine learning classifiers and explaining their predictions
f.mtr.cool
August 20, 2025 at 4:02 PM
Can we use statistical tests to select features? 🤔

Turns out, we can! 🎉

In the slides below, we’ll explore the most commonly used statistical tests for feature selection, along with their advantages and limitations. 👇

#machinelearning #datascience #featureselection
August 19, 2025 at 4:02 PM
🚨 It’s here! Our new course on Clustering & Dimensionality Reduction just dropped 🎉

Learn how to group data (K-Means, DBSCAN, Louvain) + simplify it with PCA & UMAP, no prior experience needed!

Hands-on & practical 👇
👉  https://f.mtr.cool/zshxexbrds

#MachineLearning #DataScience
August 18, 2025 at 4:02 PM
Next Monday on Data Bites: How to Write a Winning Data Science CV

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/nozrfuruar

#machinelearning #datascience #CV #mlmodels #ML #MLCareer #MLresume
August 15, 2025 at 10:02 AM
Deep learning has transformed our daily lives, but designing neural networks remains a challenge. 

Automated hyperparameter optimization (HPO) streamlines the process. This paper reviews key techniques & tools for improving model accuracy & efficiency.
📃https://f.mtr.cool/wowjcrmwjg
August 14, 2025 at 4:02 PM
August 13, 2025 at 4:02 PM
🚨 SMOTE has long been hailed as the go-to solution for imbalanced datasets, but it only works in specific scenarios. 

In this article, we explore when SMOTE is truly effective & why it’s remained popular. 

Check it out!
https://f.mtr.cool/medbbpfril
August 12, 2025 at 4:01 PM
🚨 Just launched: our new course on Clustering & Dimensionality Reduction is live at Train in Data!

Learn to group data, reduce complexity with PCA & UMAP, and tackle real-world projects (no experience needed!)

🎓 Join us: https://f.mtr.cool/wlhxbboqkl
August 11, 2025 at 4:02 PM
Next Monday on Data Bites: Everybody says “SMOTE does not work”.

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/pinchbaedf

#machinelearning #datascience #smote #mlmodels #ML
August 8, 2025 at 10:01 AM
In this video, I review hyperparameter optimization techniques like Grid Search, Random Search, & Bayesian methods.

Learn their pros, cons, and best applications for both low and high-dimensional spaces! 

What techniques do you use? 
📽️
f.mtr.cool
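A rough sketch of the difference between grid search and random search, in plain Python. The `validation_score` function is a hypothetical stand-in for "train a model and evaluate it", and the parameter names and ranges are assumptions chosen for illustration. Bayesian methods are not shown here.

```python
import itertools
import random


def validation_score(learning_rate, max_depth):
    # Hypothetical objective: pretend the best model sits at (0.1, 5).
    return -((learning_rate - 0.1) ** 2) - 0.01 * (max_depth - 5) ** 2


def grid_search(grid):
    """Evaluate every combination of the listed values."""
    names = list(grid)
    candidates = [
        dict(zip(names, values)) for values in itertools.product(*grid.values())
    ]
    best = max(candidates, key=lambda params: validation_score(**params))
    return best, len(candidates)


def random_search(n_iter, seed=0):
    """Evaluate n_iter combinations sampled from continuous/integer ranges."""
    rng = random.Random(seed)
    candidates = [
        {
            "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform in [0.001, 1]
            "max_depth": rng.randint(2, 10),
        }
        for _ in range(n_iter)
    ]
    best = max(candidates, key=lambda params: validation_score(**params))
    return best, len(candidates)


grid = {"learning_rate": [0.001, 0.01, 0.1, 1.0], "max_depth": [3, 5, 7]}
best_grid, n_grid = grid_search(grid)
best_random, n_random = random_search(n_iter=12)

print(best_grid, n_grid)
print(best_random, n_random)
```

With the same budget of 12 evaluations, the grid only ever tries 4 distinct learning rates, while random search tries 12. That is one reason random search tends to hold up better as the number of hyperparameters grows.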
August 7, 2025 at 4:02 PM
🐍 Python libraries that implement model-agnostic global explainability methods 👇

#python #machinelearning #MLModel #datascience #dataengineering
August 6, 2025 at 4:02 PM
Most commonly used encoding techniques ⬇️

1. OneHotEncoder
2. OrdinalEncoder
3. TargetEncoder

When one-hot encoding produces too many columns and ordinal encoding imposes an order the categories don't have, target encoding is often the better choice. Learn more at the link below.

#targetencoder #ML
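A bare-bones sketch of the idea behind target encoding: replace each category with the mean of the target observed for it in the training data. This is the plain, unsmoothed version with invented toy data and hypothetical function names. Library implementations such as scikit-learn's TargetEncoder add smoothing and cross-fitting to limit overfitting and target leakage.

```python
from collections import defaultdict


def fit_target_means(categories, targets):
    """Learn the mean target value per category from training data."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


def transform(categories, means, default):
    """Replace categories with learned means; unseen ones get the default."""
    return [means.get(category, default) for category in categories]


colors = ["red", "blue", "red", "green", "blue", "red"]  # toy feature
bought = [1, 0, 1, 0, 1, 0]                              # toy binary target

means = fit_target_means(colors, bought)
global_mean = sum(bought) / len(bought)  # fallback for unseen categories

encoded = transform(["red", "green", "purple"], means, default=global_mean)
print(encoded)
```

Here "red" maps to 2/3, "green" to 0.0, and the unseen "purple" falls back to the global mean of 0.5. The result is a single numeric column, however many categories there are.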
August 5, 2025 at 4:02 PM
🚨 New Course - Clustering & Dimensionality Reduction at Train in Data

Learn to apply unsupervised ML in practice 👇
✅ K-Means, DBSCAN, HDBSCAN, Graph-based
✅ PCA & UMAP
✅ Real-world projects incl. RNA case study

Find out more: https://f.mtr.cool/cojxgkyhgq
August 4, 2025 at 4:02 PM
Next Monday on Data Bites: Probe Feature Selection

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/xefqrzzgeh

#machinelearning #datascience #imbalanceddata #undersampling #mlmodels #ML
August 1, 2025 at 10:02 AM
The most crucial component of any machine learning project is data!
 
 ▶️ 90% of the time is spent on data preprocessing 
 ▶️ 10% of the time is spent on model building, tuning and evaluation.

#machinelearning #ML #MLmodels #preprocessing #modelbuilding #datascience
July 31, 2025 at 4:02 PM