Lightnews — Scholar-powered news

Dan Saattrup Smart

@saattrupdan.com

IT-Universitetet i København @itu.dk · May 14

As far as the eye can see, the shortage of IT and STEM graduates will only keep growing, a new analysis from @ida.dk shows.

That could prove costly for society, warns @forperson.ida.dk.

At ITU, we have to turn away many applicants at the door every year because of politically determined limits... 🤷

#uddpol #dkpol
www.berlingske.dk/virksomheder...

New analysis: Denmark will be short 20,400 engineers and IT graduates in 2040

Read more here.

www.berlingske.dk

May 14, 2025 at 7:51 PM

Reposted by Dan Saattrup Smart

NoDaLiDa

@nodalida.bsky.social

NoDaLiDa 2027 will be held at the Center of Language Technology at the University of Copenhagen!!

#nodalida #nlp

March 4, 2025 at 3:23 PM

Reposted by Dan Saattrup Smart

Dirk Hovy

@dirkhovy.bsky.social

Wanna keep up with our @milanlp.bsky.social lab? Here is a starter pack of current and former members:
bsky.app/starter-pack...

March 5, 2025 at 10:47 AM

Reposted by Dan Saattrup Smart

NoDaLiDa

@nodalida.bsky.social

NoDaLiDa x Baltic-HLT 2025 is a wrap!

Thank you all for joining for a fruitful conference! Safe trip home and see you in Copenhagen or Vilnius in 2027!!

#nlp #nodalida #baltichlt

March 5, 2025 at 3:11 PM

Reposted by Dan Saattrup Smart

Daniel van Strien

@danielvanstrien.bsky.social

WebFAQ: Massive Multilingual Q&A Dataset

- 96M QA pairs extracted from schema.org/FAQPage annotations
- 75 languages with standardized structured markup
- Leverages existing web publisher content intent
- No synthetic data generation needed

huggingface.co/datasets/PaD...

PaDaS-Lab/webfaq · Datasets at Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

March 6, 2025 at 9:18 AM

Reposted by Dan Saattrup Smart

NoDaLiDa

@nodalida.bsky.social

🚀 Thank you all for waiting! The full program of NoDaLiDa x Baltic-HLT is online:

www.nodalida-bhlt2025.eu/program

#nodalida #baltichlt #nlp #nlproc

NoDaLiDa/Baltic-HLT 2025 - Program

All times are local (GMT+2/UTC+2). See detailed program below.

www.nodalida-bhlt2025.eu

February 18, 2025 at 3:27 PM

Reposted by Dan Saattrup Smart

Margaret Mitchell

@mmitchell.bsky.social

⚫⚪ It's coming...SHADES. ⚪⚫
The first ever resource of multilingual, multicultural, and multigeographical stereotypes, built to support nuanced LLM evaluation and bias mitigation. We have been working on this around the world for almost **4 years** and I am thrilled to share it with you all soon.

Screenshot of 'SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models.'
SHADES is in multiple grey colors (shades).

February 10, 2025 at 8:28 AM

Dan Saattrup Smart

@saattrupdan.com

Some new evaluation results from the European evaluation benchmark ScandEval! This time of the new o3-mini model by OpenAI - how well does it compare to the existing gpt-4o model on English tasks?

(1/4)

#nlp #evaluation #reasoning #llm #o3

February 10, 2025 at 4:33 PM

Dan Saattrup Smart

@saattrupdan.com

Recently, we got a lot of new ScandEval evaluations of large LLMs, including the 405B Llama-3.1 model. So how well does it perform?

A 🧵 (1/n)

#llm #evaluation

January 20, 2025 at 2:01 PM

Reposted by Dan Saattrup Smart

Daniel van Strien

@danielvanstrien.bsky.social

Introducing Scandi-fine-web-cleaner, a decoder model trained to remove low-quality web from FineWeb 2 for Danish and Swedish

- Uses FineWeb-c community annotations
- 90%+ precision + minimal compute required
- Enables efficient filtering of 43M+ documents

huggingface.co/davanstrien/...

The image shows an illustration titled "Hygge Web Data" featuring three cartoon animals - a fox, an owl, and what appears to be a bear or similar animal - sitting at a table or surface reviewing various documents and papers. The style is cute and whimsical, with the animals drawn in a simple, friendly manner. Each animal is looking at different papers with sketched symbols, text, and designs on them. The illustration has a gentle, cozy feel to it, fitting with the "hygge" (Danish concept of coziness and comfort) mentioned in the title.

January 13, 2025 at 3:48 PM

Reposted by Dan Saattrup Smart

IT-Universitetet i København

@itu.dk

User-driven fact-checking can mean that minorities' interests are overlooked, warns ITU associate professor @lrossi.bsky.social.

Claims about, for example, Greenlandic matters risk escaping fact-checking simply because there are few Greenlandic users compared with other groups.
www.berlingske.dk/kultur/faceb...

Facebook does a U-turn: Expect more wild posts, and expect to get dumber, expert warns

Read more here.

www.berlingske.dk

January 9, 2025 at 1:12 PM

Dan Saattrup Smart

@saattrupdan.com

#dkai

Iris van Rooij 💭 @irisvanrooij.bsky.social · Dec 24

📣 Vacancy for Assistant Professor of Cognitive Science at Department of Linguistics, Cognitive Science and Semiotics, Aarhus University, Denmark. (Deadline January 6)

international.au.dk/about/profil...

Assistant Professor of Cognitive Science at the School of Communication and Culture - Vacancy at Aarhus University

Vacancy at School of Communication and Culture - Linguistics, Cognitive Science and Semiotics, Dept. of, Aarhus University

international.au.dk

December 28, 2024 at 1:14 PM

Reposted by Dan Saattrup Smart

European Commission

@ec.europa.eu

It’s time for THE charger.

Today, the USB-C becomes officially the common standard for charging new mobile electronic devices in the EU.

It means better-charging technology, reduced e-waste, and less fuss to find the chargers you need!

#DigitalEU

A minimalist illustration showing a packaged charger box labeled "one Union one Charger." The box features an image of a blue charger with the European Union flag symbol and a USB-C cable. The scene is set within a holiday theme, with decorative Christmas trees, ornaments, and gift boxes surrounding the charger box. In the top right corner, there is a small EU flag symbol.

December 28, 2024 at 7:09 AM

Reposted by Dan Saattrup Smart

Ketan Joshi

@ketanjoshi.co

"Each task consumed approximately 1,785 kWh of energy—about the same amount of electricity an average U.S. household uses in two months"

This is one per-task estimate from Salesforce's head of sustainability -->>

www.linkedin.com/posts/bgamaz...

OpenAI o3 (high compute tuned): 1 task = 684 kg CO₂e emissions = 5 full tanks of gas

December 28, 2024 at 8:45 AM

Reposted by Dan Saattrup Smart

Dusty Pomerleau

@dpom.bsky.social

I'm so impressed with the markview #Neovim plugin. Look at the preview you get out of the box:

github.com/OXY2DEV/mark...

A markdown preview within Neovim, showing syntax-highlighted code blocks, including gutter icons for each filetype, and custom rendering of headers, with unique colors for each level and a replacement of the hash syntax (###) with custom icons.

December 18, 2024 at 10:49 PM

Reposted by Dan Saattrup Smart

Sung Kim

@sungkim.bsky.social

TII UAE's Falcon 3

1B, 3B, 7B, 10B (Base + Instruct) & 7B Mamba, trained on 14 trillion tokens!

- 1B-Base surpasses SmolLM2-1.7B and matches gemma-2-2b
- 3B-Base outperforms larger models like Llama-3.1-8B and Minitron-4B-Base
- 7B-Base is on par with Qwen2.5-7B in the under-9B category

December 17, 2024 at 3:07 PM

Reposted by Dan Saattrup Smart

Rasmus Aagaard

@rasgaard.com

40.7% with help from 15 annotators! 🇩🇰😎🔥

We've come a long way but aren't quite at the finish line yet :) It really isn't many annotations that remain at this point.

I'm kind of dreaming that we can get a little final sprint in during the week! Help out here: data-is-better-together-fineweb-c.hf.space/dataset/5a58...

December 16, 2024 at 8:43 AM

Dan Saattrup Smart

@saattrupdan.com

Loving this Neovim plugin ❄️

Source: github.com/marcussimons...

December 13, 2024 at 5:32 PM

Reposted by Dan Saattrup Smart

Rasmus Aagaard

@rasgaard.com

Danish has gone from 0.1% -> 12.3% today! That corresponds to 123 texts having been annotated by 3 people.

Every annotation helps us toward the first goal of 1000 texts :)

Help annotate the dataset here: data-is-better-together-fineweb-c.hf.space/dataset/5a58... #dkai

December 12, 2024 at 11:10 AM

Reposted by Dan Saattrup Smart

Rasmus Aagaard

@rasgaard.com

Do you want to help improve the quality of Danish language models?

Join in and help with the annotation sprint! No experience is required - just go to the link and start annotating :)

huggingface.co/spaces/data-... #dkai #dktech

Longer post on LinkedIn: www.linkedin.com/posts/rasgaa...

December 10, 2024 at 12:11 PM

Reposted by Dan Saattrup Smart

Svenska Konton

@svenskakonton.bsky.social

Denmark Starter Pack for you in Malmö, the Öresund region, or just interested in Denmark and Danes.

News, newspapers, media, politics, organizations...

#danmark #danskar #köpenhamn #öresund #malmö #skåne #nyheter #tidningar #media #politik #starterpack

go.bsky.app/U2VkkfU

December 3, 2024 at 7:11 AM

Reposted by Dan Saattrup Smart

Guilherme Penedo

@guilherme.hf.co

Announcing 🥂 FineWeb2: A sparkling update with 1000s of 🗣️languages.

We applied the same data-driven approach that led to SOTA English performance in🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8TB of compressed text data and outperforms other datasets.

December 8, 2024 at 9:19 AM

Dan Saattrup Smart

@saattrupdan.com

*** New ScandEval evaluation ***

EuroLLM is a new series of European models, trained from scratch! They released both base and instruct models.

The base models can be used commercially, but the instruction models can't be, due to use of OpenAI outputs.

But how do they perform?

#nlp #evaluation

December 6, 2024 at 1:11 PM

Reposted by Dan Saattrup Smart

Sara Hooker

@sarahooker.bsky.social

Is MMLU Western-centric? 🤔

As part of a massive cross-institutional collaboration:
🗽Find MMLU is heavily overfit to western culture
🔍 Professional annotation of cultural sensitivity data
🌍 Release improved Global-MMLU 42 languages

📜 Paper: arxiv.org/pdf/2412.03304
📂 Data: hf.co/datasets/Coh...

December 5, 2024 at 4:31 PM
