It could prove costly for society, warns @forperson.ida.dk.
At ITU we have to turn away many applicants at the door every year because of politically determined limits... 🤷
#uddpol #dkpol
www.berlingske.dk/virksomheder...
bsky.app/starter-pack...
Thank you all for joining for a fruitful conference! Safe trip home and see you in Copenhagen or Vilnius in 2027!!
#nlp #nodalida #baltichlt
- 96M QA pairs extracted from schema.org/FAQPage annotations
- 75 languages with standardized structured markup
- Leverages existing web publisher content intent
- No synthetic data generation needed
huggingface.co/datasets/PaD...
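For readers curious what extracting QA pairs from schema.org/FAQPage markup can look like, here is a minimal sketch. It is not the dataset authors' pipeline: the function name `extract_qa_pairs` and the sample markup are hypothetical, and it assumes the page embeds a JSON-LD object whose `mainEntity` holds `Question` items with a `name` and an `acceptedAnswer.text`.

```python
import json


def extract_qa_pairs(jsonld_text):
    """Return (question, answer) tuples from a FAQPage JSON-LD string.

    Hypothetical helper for illustration; assumes plain string @type values.
    """
    data = json.loads(jsonld_text)
    # A JSON-LD script may hold a single object or a list of objects.
    items = data if isinstance(data, list) else [data]
    pairs = []
    for item in items:
        if not isinstance(item, dict) or item.get("@type") != "FAQPage":
            continue
        entities = item.get("mainEntity", [])
        if isinstance(entities, dict):
            entities = [entities]
        for question in entities:
            if question.get("@type") != "Question":
                continue
            answer = question.get("acceptedAnswer") or {}
            if isinstance(answer, list):
                answer = answer[0] if answer else {}
            question_text = question.get("name", "").strip()
            answer_text = answer.get("text", "").strip()
            # Keep only complete pairs.
            if question_text and answer_text:
                pairs.append((question_text, answer_text))
    return pairs


# Made-up sample markup, not taken from the dataset.
EXAMPLE = """
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Do you ship abroad?",
      "acceptedAnswer": {"@type": "Answer", "text": "Yes, to most countries."}
    },
    {
      "@type": "Question",
      "name": "Question without an answer"
    }
  ]
}
"""

if __name__ == "__main__":
    for question, answer in extract_qa_pairs(EXAMPLE):
        print(question, "->", answer)
```

Real pages vary a lot (nested graphs, list-valued `@type`, HTML inside answers), so a production extractor would need more normalization than this.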
www.nodalida-bhlt2025.eu/program
#nodalida #baltichlt #nlp #nlproc
The first ever resource of multilingual, multicultural, and multigeographical stereotypes, built to support nuanced LLM evaluation and bias mitigation. We have been working on this around the world for almost **4 years** and I am thrilled to share it with you all soon.
(1/4)
#nlp #evaluation #reasoning #llm #o3
A 🧵 (1/n)
#llm #evaluation
- Uses FineWeb-c community annotations
- 90%+ precision + minimal compute required
- Enables efficient filtering of 43M+ documents
huggingface.co/davanstrien/...
Claims about, for example, conditions in Greenland risk escaping fact-checking, simply because there are few Greenlandic users compared with other groups.
www.berlingske.dk/kultur/faceb...
international.au.dk/about/profil...
Today, USB-C officially becomes the common standard for charging new mobile electronic devices in the EU.
It means better charging technology, reduced e-waste, and less fuss finding the chargers you need!
#DigitalEU
This is one per-task estimate from Salesforce's head of sustainability -->>
www.linkedin.com/posts/bgamaz...
github.com/OXY2DEV/mark...
1B, 3B, 7B, 10B (Base + Instruct) & 7B Mamba, trained on 14 trillion tokens!
- 1B-Base surpasses SmolLM2-1.7B and matches gemma-2-2b
- 3B-Base outperforms larger models like Llama-3.1-8B and Minitron-4B-Base
- 7B-Base is on par with Qwen2.5-7B in the under-9B category
We have come a long way but are not quite at the finish line yet :) There really are not many annotations left to do at this point.
Kind of dreaming that we can manage a little final sprint during the week! Help out here: data-is-better-together-fineweb-c.hf.space/dataset/5a58...
Every annotation helps us toward the first goal of 1,000 texts :)
Help annotate the dataset here: data-is-better-together-fineweb-c.hf.space/dataset/5a58... #dkai
Join in and help with the annotation sprint! No experience required, just follow the link and start annotating :)
huggingface.co/spaces/data-... #dkai #dktech
Longer post on LinkedIn: www.linkedin.com/posts/rasgaa...
News, newspapers, media, politics, organisations...
#danmark #danskar #köpenhamn #öresund #malmö #skåne #nyheter #tidningar #media #politik #starterpack
go.bsky.app/U2VkkfU
We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.
🥂 FineWeb2 has 8TB of compressed text data and outperforms other datasets.
EuroLLM is a new series of European models, trained from scratch! They released both base and instruct models.
The base models can be used commercially, but the instruction models can't be, due to use of OpenAI outputs.
But how do they perform?
#nlp #evaluation
As part of a massive cross-institutional collaboration:
🗽 Find that MMLU is heavily overfit to Western culture
🔍 Professional annotation of cultural sensitivity data
🌍 Release improved Global-MMLU covering 42 languages
📜 Paper: arxiv.org/pdf/2412.03304
📂 Data: hf.co/datasets/Coh...