Natalia
@nataliaelv.hf.co
1.5K followers 750 following 23 posts
Building Argilla @ Hugging Face 🤗. Linguist at heart. I sometimes write in Spanish.
Pinned
nataliaelv.hf.co
Hello everyone! 👋 Since this is growing quite a bit, I thought I'd introduce myself:

I'm Natalia, a computational linguist working at @huggingface.bsky.social as part of the team building Argilla.
nataliaelv.hf.co
New chapter in the Hugging Face NLP course! 🤗 🚀

We've added a new chapter about the very basics of Argilla to the Hugging Face NLP course. Learn how to set up an Argilla instance, load & annotate datasets, and export them to the Hub. 

Any feedback for improvements welcome!
Screenshot of the Introduction to Argilla in Chapter 10 of the Hugging Face NLP course
Reposted by Natalia
jfcalvo.hf.co
🚀 Argilla v2.6.0 is here! 🎉

Let me show you how EASY it is to export your annotated datasets from Argilla to the Hugging Face Hub. 🤩

Take a look at this quick demo 👇

💁‍♂️ More info about the release at github.com/argilla-io/a...

#AI #MachineLearning #OpenSource #DataScience #HuggingFace #Argilla
nataliaelv.hf.co
I'm taking a well-deserved break to celebrate Christmas 🎄 ☃️ but the FineWeb2 annotation sprint continues!

You can still contribute some annotations or start leading a language!
nataliaelv.hf.co
If you are still wondering how the FineWeb2 annotations are done, how to follow the guidelines or how Argilla works, this is your video!

I go through a few samples of the FineWeb2 dataset and classify them based on their educational content. Check it out!
FineWeb2 collaborative sprint: how to annotate
In this video you'll learn how you can go about annotating some records in the FineWeb2 collaborative annotation sprint launched by Hugging Face and Argilla....
buff.ly
nataliaelv.hf.co
The FineWeb2 collaborative annotation sprint is also a way of keeping many languages alive. I talk about it in this LinkedIn post: https://buff.ly/49DghmN
nataliaelv.hf.co
Sure! We do have multiple leads for some languages! You don't need to be a lead to collaborate, though. You can also contribute annotations once we launch the annotation space 🚀 If you'd still like to lead, send me a private message and I'll sign you up 🤗
nataliaelv.hf.co
Thanks @rasgaard.bsky.social ! Looking forward to this!
nataliaelv.hf.co
Thanks! 🤗 The best thing you can do is stay tuned and contribute some annotations in the Spanish split once we launch! 🚀
nataliaelv.hf.co
Next week we're launching a collaborative annotation effort to build a big multilingual dataset, so you can have high-quality data in your language.

We are really close to getting leads for 100 languages! Can you help us cover the remaining 200?
Screenshot of a dashboard showing the number of languages with a lead and languages without a lead
Reposted by Natalia
jfcalvo.hf.co
🙌 I just wanted to share a few thoughts about the latest Argilla release, 2.5.0, as it's a pretty big one!

Argilla now has full support for webhooks, which means you can do some pretty cool stuff, like model training on the fly as annotations are created. 🤯

#MachineLearning #NLP #DataLabeling
nataliaelv.hf.co
Just wanted to say that I'm sorry about my previous post. I was supporting a colleague who was sharing that his work was trending without being aware that it was harmful. I deleted the previous post a bit hastily to stop incoming insults. I'm sorry and will be more careful next time.
nataliaelv.hf.co
This is what you get on Bluesky when your feeds are Linguistics and otters 🦦😍
jonathanhsy.bsky.social
Linguistic trivia: OTTER is related to HYDRA #etymology #linguistics #protoindoeuropean #otters #serpents #mindblown
nataliaelv.hf.co
At @huggingface.bsky.social 🤗 we're preparing a collaborative annotation effort to build an open-source multilingual dataset.

If you'd like to get high-quality open data for your language, check if yours is listed in this form and sign up!
forms.gle/DHJdtvoSNxAA...
Language Lead sign-up
At Hugging Face 🤗, we're launching a big community initiative to improve LLM training for many languages. We're looking for Language Leads to help us cultivate specific languages during this initiativ...
forms.gle
nataliaelv.hf.co
We've updated the list and it should be there now! (Until we find a lead for the language of course!)
nataliaelv.hf.co
The list is updated and Japanese is in there!
Reposted by Natalia
kimshum.bsky.social
Periodic reminder: a lot of what makes AI "work" is exploited people doing the tasks, just hidden behind fancy websites.

It's good that a normie outlet like 60 Minutes is reporting on this.
Reposted by Natalia
davidberenstein.bsky.social
I created a collection with good models for dataset curation

- NSFW classifiers
- PII classifiers
- blazing fast embeddings by model2vec
- quality classifier
- educational value classifier
- domain classifier

Collection: huggingface.co/collections/...
Models for dataset curation - a Dataset-Tools Collection
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
nataliaelv.hf.co
I like posting about super-high-quality data curation for AI, languages (modern and ancient!) and linguistics.

If you'd like to follow my work on other platforms, you can find more links here: buff.ly/3OiuHPH
About me
Hi! 👋 I’m Natalia, a Computational Linguist from Madrid (Spain) working at Hugging Face 🤗. I’m passionate about languages and curating high-quality data for AI.
buff.ly
nataliaelv.hf.co
That's so cool!