Once containerized, there are many options 😃
See four liners at github.com/DataTalksClu...
Just copy the binary like this:
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/ 😎
uv is really, really fast. Get it with pip install uv 👌
- load and save models
- serve the model with FastAPI (sketch below)
- manage environments with uv
- package the prediction service in a Docker container
- deploy to the cloud
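A minimal sketch of the serving step, assuming a (DictVectorizer, model) pair pickled to model.bin — the file name and /predict route are made up:

# serve.py - minimal FastAPI prediction service (hypothetical names)
import pickle
from fastapi import FastAPI

app = FastAPI()

# load a (DictVectorizer, model) pair saved earlier with pickle
with open('model.bin', 'rb') as f_in:
    dv, model = pickle.load(f_in)

@app.post('/predict')
def predict(customer: dict):
    X = dv.transform([customer])         # dict -> feature matrix
    prob = model.predict_proba(X)[0, 1]  # probability of the positive class
    return {'churn_probability': float(prob)}

# run with: uvicorn serve:app --host 0.0.0.0 --port 9696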
ROC curve: evaluates performance at all classification thresholds
K-fold Cross-Validation: train and validate a model on a number of parts (folds) and loop through all fold combinations.
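A minimal K-fold sketch with scikit-learn; df_full_train, the churn column and the train()/predict() helpers are placeholders for your own pipeline:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score

kfold = KFold(n_splits=5, shuffle=True, random_state=1)
scores = []

for train_idx, val_idx in kfold.split(df_full_train):
    df_train = df_full_train.iloc[train_idx]
    df_val = df_full_train.iloc[val_idx]
    dv, model = train(df_train)            # placeholder training helper
    y_pred = predict(df_val, dv, model)    # placeholder prediction helper
    scores.append(roc_auc_score(df_val.churn.values, y_pred))

print('mean AUC: %.3f +- %.3f' % (np.mean(scores), np.std(scores)))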
Metric: a single number that describes model performance
Accuracy: fraction of correct answers
Precision & Recall are less misleading when classes are imbalanced
Even a trivial model can be very accurate when classes are imbalanced 👌
Get precision and recall too.
Bonus: calculate F1 score just in case!
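A quick sketch with sklearn.metrics, assuming y_val holds the true labels and y_pred the predicted probabilities:

from sklearn.metrics import precision_score, recall_score, f1_score

churn = y_pred >= 0.5  # threshold probabilities into hard 0/1 predictions

print('precision:', precision_score(y_val, churn))
print('recall:   ', recall_score(y_val, churn))
print('f1:       ', f1_score(y_val, churn))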
When predicting customer churn, consider:
No churn (Negative)
- Customer didn't churn - True Negative (TN)
- Customer churned - False Negative (FN)
Churn (Positive)
- Customer churned - True Positive (TP)
- Customer didn't churn - False Positive (FP)
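All four counts drop out of one sklearn call — a sketch, assuming y_val holds true labels and churn holds hard 0/1 predictions:

from sklearn.metrics import confusion_matrix

# for binary 0/1 labels the matrix is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_val, churn).ravel()
print('TN=%d FP=%d FN=%d TP=%d' % (tn, fp, fn, tp))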
sklearn.metrics: score functions, performance metrics, pairwise metrics and distance computations.
See scikit-learn.org/stable/modul...
Evaluate the model with the Area Under the ROC Curve (AUC). AUC equals 0.5 for a random baseline and 1.0 for an ideal model.
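A one-call sketch, assuming y_val are the true labels and y_pred the predicted probabilities:

from sklearn.metrics import roc_auc_score

auc = roc_auc_score(y_val, y_pred)
print('AUC: %.3f' % auc)  # 0.5 ~ random baseline, 1.0 ~ ideal model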
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

preprocessor = ColumnTransformer([
    ('num', StandardScaler(), numerical_features),              # scale numbers
    ('cat', OneHotEncoder(drop='first'), categorical_features)  # one-hot encode
])
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(solver='lbfgs')
model.fit(X_train, y_train)
...
model.intercept_[0]  # bias term
model.coef_[0]       # feature weights
Unlike linear regression, which predicts continuous values, logistic regression predicts the probability that an input belongs to a specific class. It's used for binary classification, with outputs like Yes/No or 0/1. The sigmoid function converts inputs into a probability between 0 and 1.
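The sigmoid itself is a one-liner:

import numpy as np

def sigmoid(z):
    # maps any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))   # 0.5  - the decision boundary
print(sigmoid(5))   # ~0.993 - confident "yes"
print(sigmoid(-5))  # ~0.007 - confident "no"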
Risk ratios, mutual information and Pearson's correlation are your little helpers 🧑‍🎓
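A small sketch of two of them; df_train, the churn column and the categorical/numerical column lists are assumptions:

from sklearn.metrics import mutual_info_score

def mi_churn(series):
    # mutual information between one categorical column and the target
    return mutual_info_score(series, df_train.churn)

mi = df_train[categorical].apply(mi_churn)           # mutual information
corr = df_train[numerical].corrwith(df_train.churn)  # Pearson correlation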
Do one-hot encoding with DictVectorizer from sklearn.feature_extraction 👌
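A minimal sketch (df_train and the column lists are assumptions):

from sklearn.feature_extraction import DictVectorizer

dicts = df_train[categorical + numerical].to_dict(orient='records')

dv = DictVectorizer(sparse=False)
X_train = dv.fit_transform(dicts)  # one-hot encodes the string columns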
- Categorical Variables: convert with one-hot encoding
- Numerical Variables: scale or normalize
- Missing Values: fix with a chosen strategy
- Data Split: 60% for training, 20% for validation, and the rest for testing
- Not all features are equal - feature analysis with risk ratios, mutual information and correlation matrices
- One-hot encoding for categorical features
- Classify with logistic regression
- Probabilities for categories and interpretation of weights (sketch below)
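A sketch of the probabilities-and-weights part, assuming the dv and model objects from the snippets above:

# soft predictions: probability of churn for each customer
y_pred = model.predict_proba(X_val)[:, 1]

# interpret weights: a larger positive weight pushes towards churn
weights = dict(zip(dv.get_feature_names_out(), model.coef_[0].round(3)))
print(weights)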
Shuffle data using pandas only 👍 The built-in sample() function is ready to help!
pandas.pydata.org/docs/referen...
df_shuffled = df.sample(frac=1, random_state=seed)
df_shuffled.reset_index(drop=True, inplace=True)
Shuffle your data before splitting it into train, validation, test subsets:
import numpy as np

idx = np.arange(len(df))
np.random.seed(42)  # seed value for reproducibility
np.random.shuffle(idx)
df_train = df.iloc[idx[:train_size]]
...
- Start with EDA
- Look at the target variable distribution
- Make train, val, test datasets
- Matrix manipulations for linear regression
- Code linear regression in a few lines (see the sketch below)
- RMSE
- Feature engineering helps
- Regularization is needed too
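A sketch of the matrix version with L2 regularization — one common way to code it, not necessarily the course's exact function:

import numpy as np

def train_linear_regression_reg(X, y, r=0.0):
    # add a bias column of ones
    ones = np.ones(X.shape[0])
    X = np.column_stack([ones, X])

    # normal equation with L2 regularization: w = inv(X.T X + r*I) X.T y
    XTX = X.T.dot(X)
    XTX = XTX + r * np.eye(XTX.shape[0])
    w = np.linalg.inv(XTX).dot(X.T).dot(y)

    return w[0], w[1:]  # bias, feature weights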
Easy!☝️
Have a grasp of values and their uniqueness for every column with:
for col in df.columns:
    print(col)
    print(df[col].unique()[:10])
    print(df[col].nunique())
Get number of missing values for every column with df.isnull().sum()
After loading the dataframe, have a quick look at it with df.T