However, LabelStudio has limitations with tutorials/attention checks and when it comes to assigning annotation tasks to lay people (I believe they have to register to annotate?).
Pearmut was created out of frustration with setting up humeval with good defaults using existing tools.
In the paper, we have 5 researchers try to set up humeval using 5 different platforms and report on time, ease of use, and customizability.
Get in touch if you'd like to help with human evaluation for your paper/work! 🖐️
Pearmut is open-source and extensible with many exciting features coming. 🍏
github.com/zouharvi/pea...
pip install pearmut
# Download example campaign
wget raw.githubusercontent.com/zouharvi/pea...
# Load and start
pearmut add esa.json
pearmut run
- direct assessment (with custom sliders),
- ESA, MQM,
- contrastive evaluation, video/audio/image, attention checks, tutorials, statistically sound model comparison, etc.
Introducing Pearmut, Human Evaluation of Translation Made Trivial 🍐 arxiv.org/pdf/2601.02933
iwslt.org/2026/metrics
Effort led by @maikezufle.bsky.social, @marinecarpuat.bsky.social, @hjhan.bsky.social, @matteo-negri.bsky.social, and others. 🙂
At IWSLT 2026, we’re launching the first-ever ✨Speech Translation Metrics Shared Task ✨!
multilingual-multicultural-evaluation.github.io
For those curious to know more about my thesis:
- Web-optimized version: gsarti.com/phd-thesis/
- PDF: research.rug.nl/en/publicati...
- Steal my Quarto template: github.com/gsarti/phd-t...
Share it at the MME workshop 🕵️ co-located at EACL.
Direct submission deadline in 10 days (December 19th)!
multilingual-multicultural-evaluation.github.io
- Working on Typst template. DM'd.
github.com/acl-org/acl-...
ctan.math.illinois.edu/macros/latex...
Yes, you got 67 BLEU points, but is the resulting hair slaying? 💇
See the result on one datapoint (my head) at EMNLP.
- Estimating Machine Translation Difficulty arxiv.org/abs/2508.10175
- COMET-poly: Machine Translation Metric Grounded in Other Candidates arxiv.org/abs/2508.18549
- Efficient evaluation (Nov 5, 16:30, poster session 3)
- MT difficulty (Nov 7, 12:30, findings 3)
- COMET-poly (Nov 8, 11:00, WMT)
(DM to meet 🌿 )
You should apply to these fellowships, even if only for the exercise of periodically refining your research statement.
I'm open about having applied to 4 similar fellowships earlier in my PhD without success. Still, refining my research statement (part of the application) helped me tremendously in finding out the...
inf.ethz.ch/news-and-eve...