annaeus.bsky.social
I don’t have an opinion about emojis in bios, but when I saw your “red flag” comment, I assumed you were making some kind of pun. And I’ll make another: I don’t see any red flags in that person’s bio.
December 30, 2025 at 8:34 PM
He gives an example of this: “In his will, a certain man ordered that a golden statue holding a spear be set up. The question is: must the statue holding the spear be golden, or must the spear be golden on a statue made of some other material?” (Ibid.)
October 27, 2025 at 11:59 PM
Now I see that it was written in Old Latin.
October 23, 2025 at 5:38 AM
I worked in Belfast for a while. One of the admins sent out a weekly trivia quiz. Shortly after I joined, he gave it an American theme. Coworkers asked me, “What do Americans call courgettes?” I said, “I don’t know. What’s a courgette?”
July 14, 2025 at 4:46 PM
I’ve actually heard some people argue that Nixon might have survived Watergate if he had managed the economy more responsibly.
April 7, 2025 at 11:50 PM
As I said before, the validation views for the latest version of the model immediately looked more sensible than those for previous versions. It feels good to think I solved a general, fundamental data science issue while also being able to show clear business benefits to my boss.
February 16, 2025 at 1:18 AM
And that means I could add the importance scores for the two new variables to get a single measure of importance for the original variable. Since SHAP values are also additive, I could do the same thing for all my validation views.
February 16, 2025 at 1:18 AM
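The SHAP step can be sketched like this, assuming the SHAP matrix is available as plain rows with one column per feature (for example, the feature columns of XGBoost's `Booster.predict(dmatrix, pred_contribs=True)` output, which appends a bias column at the end). The derived column names, the `origin` mapping, and the function name are hypothetical. Because each row's SHAP values sum with the base value to the model's raw prediction, merging columns by summation keeps that identity intact.

```python
from typing import Dict, List, Tuple


def merge_shap_columns(
    shap_rows: List[List[float]],
    feature_names: List[str],
    origin: Dict[str, str],
) -> Tuple[List[str], List[List[float]]]:
    """Sum SHAP columns that come from the same original variable.

    origin maps a derived feature to its original variable; features not
    listed map to themselves. Each merged row sums to the same total as the
    unmerged row, because SHAP values are additive.
    """
    # Assign each original variable a column, in first-seen order.
    merged_names: List[str] = []
    index: Dict[str, int] = {}
    for name in feature_names:
        target = origin.get(name, name)
        if target not in index:
            index[target] = len(merged_names)
            merged_names.append(target)

    # Add each derived column's value into its original variable's column.
    merged_rows: List[List[float]] = []
    for row in shap_rows:
        out = [0.0] * len(merged_names)
        for name, value in zip(feature_names, row):
            out[index[origin.get(name, name)]] += value
        merged_rows.append(out)
    return merged_names, merged_rows


# Hypothetical names and made-up SHAP values.
names = ["pay_ratio_num", "pay_ratio_cat", "income"]
origin = {"pay_ratio_num": "pay_ratio", "pay_ratio_cat": "pay_ratio"}
rows = [[0.25, -0.5, 0.125], [0.5, 0.25, -0.25]]
print(merge_shap_columns(rows, names, origin))
# (['pay_ratio', 'income'], [[-0.25, 0.125], [0.75, -0.25]])
```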
Another thing I did was to create a single importance score for these hybrid variables. For the variable selection phase, I used “total gain” in XGBoost instead of the default “average gain”. Totals can be meaningfully added together, unlike averages.
February 16, 2025 at 1:18 AM
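A minimal sketch of the combination step, assuming the scores arrive as a dict of feature name to total gain, which is the shape returned by XGBoost's `Booster.get_score(importance_type="total_gain")`. The feature names, the `origin` mapping, the function name, and the numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def combine_importance(scores: Dict[str, float], origin: Dict[str, str]) -> Dict[str, float]:
    """Sum total-gain importances of derived features by original variable.

    scores: feature -> total gain.
    origin: derived feature -> original variable; unlisted features map to
            themselves. Features never used in a split are simply absent
            from scores and contribute nothing.
    """
    combined: Dict[str, float] = defaultdict(float)
    for feature, gain in scores.items():
        combined[origin.get(feature, feature)] += gain
    return dict(combined)


# Illustrative numbers only.
scores = {"pay_ratio_num": 120.0, "pay_ratio_cat": 45.0, "income": 80.0}
origin = {"pay_ratio_num": "pay_ratio", "pay_ratio_cat": "pay_ratio"}
print(combine_importance(scores, origin))  # {'pay_ratio': 165.0, 'income': 80.0}
```

Why totals and not averages: with made-up numbers, if the numeric half earns a total gain of 100 over 4 splits (average 25) and the categorical half earns 60 over 2 splits (average 30), the combined average gain is 160 / 6 ≈ 26.7, not 25 + 30 = 55, while the totals simply add to 160.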
This isn’t entirely new, but it’s not built into most open-source data science packages. I wrote some code this week to successfully implement this as part of an XGBoost model, and I could immediately see the benefits when I generated validation views for my model.
February 16, 2025 at 1:18 AM
The model can use the numerical variable to understand how customers with calculated values behave and use the categorical variable to make any adjustments necessary to accurately represent the behavior of the other types of customers.
February 16, 2025 at 1:18 AM
The categorical variable will then record which category the customer falls in: “Regular value”, “True missing value”, “No loans”, “No loans in 12 months”, “No minimum payments”, etc.
February 16, 2025 at 1:18 AM
The numeric variable has the actual values for customers when it can be calculated and either a missing value or an imputed value for those customers for whom it cannot be calculated.
February 16, 2025 at 1:18 AM
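The two options for the numeric half can be sketched like this, assuming the non-calculable entries arrive as `None`. Median imputation is one common choice picked here for illustration; the post doesn't say which imputation, if any, was used. The function name `fill_numeric` is hypothetical.

```python
from statistics import median
from typing import List, Optional


def fill_numeric(values: List[Optional[float]], strategy: str = "median") -> List[Optional[float]]:
    """Handle non-calculable entries (None) in the numeric half of a hybrid variable.

    strategy="missing": leave them as None (once converted to NaN, XGBoost
    treats them as missing and learns a default branch for them).
    strategy="median": fill them with the median of the calculable values.
    Either way, the companion categorical column still records why a value
    was absent.
    """
    if strategy == "missing":
        return list(values)
    if strategy != "median":
        raise ValueError(f"unknown strategy: {strategy!r}")
    observed = [v for v in values if v is not None]
    fill = median(observed) if observed else None
    return [fill if v is None else v for v in values]


ratios = [1.0, None, 0.5, 1.25, None]
print(fill_numeric(ratios))             # [1.0, 1.0, 0.5, 1.25, 1.0]
print(fill_numeric(ratios, "missing"))  # [1.0, None, 0.5, 1.25, None]
```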
This week, I turned this single variable into two variables: a numeric variable and a categorical variable.
February 16, 2025 at 1:18 AM
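A minimal sketch of that split in Python, assuming a hypothetical `Customer` summary record. The field names, the order of the checks, and the use of `None` for the numeric half are illustrative choices, not the author's code; the labels follow the categories named in the thread.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Customer:
    # Hypothetical summary fields for one customer.
    has_loans: bool
    has_loans_12m: bool
    paid_12m: Optional[float]     # None if the data is genuinely missing
    minimum_12m: Optional[float]  # None if the data is genuinely missing


def split_payment_ratio(c: Customer) -> Tuple[Optional[float], str]:
    """Turn one hybrid variable into a (numeric value, category label) pair."""
    if not c.has_loans:
        return None, "No loans"
    if not c.has_loans_12m:
        return None, "No loans in 12 months"
    if c.paid_12m is None or c.minimum_12m is None:
        return None, "True missing value"
    if c.minimum_12m == 0:
        return None, "No minimum payments"
    # Only here is the ratio actually meaningful.
    return c.paid_12m / c.minimum_12m, "Regular value"


customers = [
    Customer(True, True, 250.0, 200.0),
    Customer(False, False, None, None),
    Customer(True, False, None, None),
    Customer(True, True, 100.0, 0.0),
    Customer(True, True, None, 200.0),
]
for c in customers:
    print(split_payment_ratio(c))
```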
This variable, which initially sounded numeric, has categories of customers for whom it cannot be calculated. It’s partly numeric and partly categorical. How do we handle this?
February 16, 2025 at 1:18 AM
For some customers, that is a straightforward calculation: 100% or 66.7% or 125%. But for other customers it makes no sense. Some might not have any loans, some might not have any in the past 12 months, some might not have a minimum payment amount.
February 16, 2025 at 1:18 AM
Take an example variable: the ratio of a customer’s actual payments to their minimum required payments on all their loans in the past 12 months.
February 16, 2025 at 1:18 AM
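A minimal sketch of the naive calculation in Python, assuming each customer is a list of hypothetical `Loan` records holding 12-month totals of payments made and minimum payments due (the field names are illustrative). It shows how a plain ratio handles the straightforward customers and how every problem case ends up as the same undefined value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Loan:
    # Hypothetical fields: one loan's totals over the past 12 months.
    paid_12m: float     # actual payments made
    minimum_12m: float  # minimum payments required


def naive_payment_ratio(loans: List[Loan]) -> Optional[float]:
    """Ratio of actual to minimum payments across all of a customer's loans.

    Every reason the ratio is undefined (no loans at all, nothing due in the
    window, no minimum payment) collapses into the same None, which is why a
    purely numeric column loses information here.
    """
    total_paid = sum(loan.paid_12m for loan in loans)
    total_minimum = sum(loan.minimum_12m for loan in loans)
    if total_minimum == 0:
        return None  # undefined: there is no minimum to compare against
    return total_paid / total_minimum


print(naive_payment_ratio([Loan(300.0, 300.0)]))  # 1.0   (100%)
print(naive_payment_ratio([Loan(375.0, 300.0)]))  # 1.25  (125%)
print(naive_payment_ratio([]))                    # None  (no loans)
print(naive_payment_ratio([Loan(0.0, 0.0)]))      # None  (no minimum due)
```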
All of this so far is stuff that you are likely to see in intro statistics and data science textbooks. One thing that rarely gets mentioned in textbooks is that some variables can be partially numeric and partially categorical. That’s what I was working on this week.
February 16, 2025 at 1:18 AM