Hellina Hailu Nigatu

@hellinanigatu.bsky.social

2.2K followers 250 following 130 posts

CS PhD candidate @UCBerkeley. Interested in multilingual and low-resourced language NLP + HCI. @SIGHPC CDS Fellow. Interned @MBZUAI. Current intern at DAIR. Website: https://hhnigatu.github.io

Pinned

Hellina Hailu Nigatu @hellinanigatu.bsky.social · Jul 22

I started a personal project a while back...Working on African languages, I meet people from all over the continent. I found it very interesting how similar we can be on some things and how drastically different we are on others. So I decided to read one book from each African country...🧵

7 8 37

Hellina Hailu Nigatu @hellinanigatu.bsky.social · 2d

Congrats!!

Reposted by Hellina Hailu Nigatu

Deb Raji @rajiinio.bsky.social · 13d

There is so much about navigating the Internet in a low resourced language that makes one unnecessarily vulnerable to malicious actors. It's not just a quality of experience difference, but literally the soft belly through which misinformation spreaders attack.

Hellina Hailu Nigatu @hellinanigatu.bsky.social · 13d

Very excited for our upcoming #AIES paper Into the Void: Understanding Online Health Information in Low-Web Data Languages.

Link: arxiv.org/pdf/2509.20245

1/n

6 10

Hellina Hailu Nigatu @hellinanigatu.bsky.social · 13d

Thank you friend ❤

Hellina Hailu Nigatu @hellinanigatu.bsky.social · 13d

This work was done with my wonderful collaborators Nuredin Ali, Fiker Tewelde, @schancellor.bsky.social and @iamdaricia.bsky.social

5/n

1 3

Hellina Hailu Nigatu @hellinanigatu.bsky.social · 13d

Based on our findings, we introduce the concept of Data Horizons: a critical boundary where algorithmic structures begin to degrade the relevance and reliability of search results.

4/n

1 2

Hellina Hailu Nigatu @hellinanigatu.bsky.social · 13d

We investigate online health information on #YouTube and #TikTok in two low-web data languages, Amharic and Tigrinya. We find that linguistic, technological, and socio-cultural constraints on information access and production lead to degraded information quality for low-web data languages.

3/n

1 4

Hellina Hailu Nigatu @hellinanigatu.bsky.social · 13d

While social media platforms are increasingly being used as sources of information for critical sectors like healthcare, the quality and quantity of information available is not always guaranteed, especially for languages with limited data available online.

2/n

1 2

Hellina Hailu Nigatu @hellinanigatu.bsky.social · 13d

Very excited for our upcoming #AIES paper Into the Void: Understanding Online Health Information in Low-Web Data Languages.

Link: arxiv.org/pdf/2509.20245

1/n

1 1 9

Hellina Hailu Nigatu @hellinanigatu.bsky.social · 14d

I will DM you!

Hellina Hailu Nigatu @hellinanigatu.bsky.social · 15d

Happy holidays to us both!
So far so good navigating the documentation! Will reach out if i need help or have questions 😊 thank you!

1 1

Hellina Hailu Nigatu @hellinanigatu.bsky.social · 15d

@meg48.bsky.social's Ethiopian New Year's gift to me is a new version of HornMorpho exactly as i am working on a project that requires a morphological analyzer for Amharic, Tigrinya, and Afan Oromo 💃💃

Hellina Hailu Nigatu @hellinanigatu.bsky.social · Sep 4

That explains a lot 😂😂

Hellina Hailu Nigatu @hellinanigatu.bsky.social · Sep 4

What are you up to Nina 👀

Reposted by Hellina Hailu Nigatu

Sarah E. Chasins @schasins.bsky.social · Aug 28

If you or your students are interested in visualization tools, may I suggest signing up for my student @parkie-doo.sh's study! We're learning *a lot* about how to build direct manipulation programming tools these days! Please pass the sign up link along to your labs!
docs.google.com/forms/d/e/1F...

1 4

Reposted by Hellina Hailu Nigatu

Milagros Miceli @milamiceli.bsky.social · Aug 28

I am thrilled to be recognized by TIME as one of the 100 most influential people worldwide in the field of artificial intelligence for my work with @dataworkersinquiry.bsky.social.

>> #TIME100AI time.com/time100ai

I want to take this opportunity to share a few reflections on this work 👇🧵

Portrait of Milagros Miceli in a frame that reads TIME100/AI 2025.

5 17 42

Hellina Hailu Nigatu @hellinanigatu.bsky.social · Aug 26

Oh no! I ran out of wall space for my tally!!!😌

Hellina Hailu Nigatu @hellinanigatu.bsky.social · Aug 26

I am gonna start a tally for every time i have to contend with publication policies at top tier conferences that implicitly stall Global South scholarship.

1 2

Hellina Hailu Nigatu @hellinanigatu.bsky.social · Aug 13

Came across a common Ethiopian name on one of the poems in this book as a dedication 😊

Hellina Hailu Nigatu @hellinanigatu.bsky.social · Aug 12

this is not to say all MT is bad or MT has no place in contribution...more on that as an output of my work @dairinstitute.bsky.social 😎

1 2

Hellina Hailu Nigatu @hellinanigatu.bsky.social · Aug 12

Lol here is an example:

A google translated Tigrinya article: ti.wikipedia.org/wiki/%E1%88%...

English version: en.wikipedia.org/wiki/Wedding...

I took the part that says "Ethiopia" from the English article and ran it through Google Translate...almost identical output save a few words.

1 2 1

Hellina Hailu Nigatu @hellinanigatu.bsky.social · Aug 12

Book #11
Missing in Action and Presumed Dead by Rashidah Ismaili from Benin

Got this from Thrift Books and by luck got a version with the author signature ☺️

It's a beautiful collection of poems and my fav one is Nomad, attached in the picture below

1 2

Hellina Hailu Nigatu @hellinanigatu.bsky.social · Aug 11

Omg our advisor @schasins.bsky.social got us beanbags for our lab space a while back and we loveee them

Hellina Hailu Nigatu @hellinanigatu.bsky.social · Aug 11

This is a good step IMO...but i think we conflate "Wikipedia" with "English Wikipedia" and "AI Generated" with "LLM generated"

We should also be having conversations on Machine Translated text in non-English Wikipedia...those are also "AI Generated"😐

Data & Society @datasociety.bsky.social · Aug 8

Wikipedia's policy for handling AI-generated articles could be "an important example for how to deal with the growing AI slop problem from a platform that has so far managed to withstand various forms of enshittification that have plagued the rest of the internet." www.404media.co/wikipedia-ed...

Wikipedia Editors Adopt ‘Speedy Deletion’ Policy for AI Slop Articles

“The ability to quickly generate a lot of bogus content is problematic if we don't have a way to delete it just as quickly.”

4 8

Hellina Hailu Nigatu @hellinanigatu.bsky.social · Jul 30

Was a pleasure to work with you Chinasa❤ here is to many more collaborations 🥂

1 1

Reposted by Hellina Hailu Nigatu

Dr. Chinasa T. Okolo @chinasa.bsky.social · Jul 30

My latest work, “Examining the Cultural Encoding of Gender Bias in LLMs for Low-Resourced African Languages,” co-authored with Abigail Oppong and Hellina Nigatu, is now published at the Workshop on Gender Bias in Natural Language Processing at #ACL2025!

aclanthology.org/2025.gebnlp-...

Screenshot of paper on the ACL website with the title (Examining the Cultural Encoding of Gender Bias in LLMs for Low-Resourced African Languages) and abstract that reads: "Abstract
Large Language Models (LLMs) are deployed in several aspects of everyday life. While the technology could have several benefits, like many socio-technical systems, it also encodes several biases. Trained on large, crawled datasets from the web, these models perpetuate stereotypes and regurgitate representational bias that is rampant in their training data. Languages encode gender in varying ways; some languages are grammatically gendered, while others do not. Bias in the languages themselves may also vary based on cultural, social, and religious contexts. In this paper, we investigate gender bias in LLMs by selecting two languages, Twi and Amharic. Twi is a non-gendered African language spoken in Ghana, while Amharic is a gendered language spoken in Ethiopia. Using these two languages on the two ends of the continent and their opposing grammatical gender system, we evaluate LLMs in three tasks: Machine Translation, Image Generation, and Sentence Completion. Our results give insights into the gender bias encoded in LLMs using two low-resourced languages and broaden the conversation on how culture and social structures play a role in disparate system performances."

1 3 7