@bramvanroy.bsky.social
500 followers 190 following 47 posts
bramvanroy.bsky.social
Not sure if intentional or not, but it says "backTreasoning" in the dataset viewer screenshot slide.
bramvanroy.bsky.social
Open training code and evaluation code could be added to get to fully open source.
bramvanroy.bsky.social
Something that I've bumped my head against too often in `transformers` is how kwargs are sometimes handled. Functions just pop what they need from kwargs, so as a user it's unclear whether an argument you passed is silently unused (and code editors offer little help). Nice to see changes here!
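A minimal sketch of the pattern, assuming two hypothetical functions (`generate_loose`, `generate_explicit`) that are not part of the `transformers` API. The "pop what you need" style silently swallows a typo like `temprature`, whereas explicit keyword-only parameters make Python itself reject unknown arguments and let editors autocomplete them.

```python
def generate_loose(prompt, **kwargs):
    # Hypothetical function: pops only the keys it knows about.
    max_new_tokens = kwargs.pop("max_new_tokens", 20)
    temperature = kwargs.pop("temperature", 1.0)
    # Anything left in kwargs (e.g. a typo) is silently ignored.
    return {"prompt": prompt, "max_new_tokens": max_new_tokens, "temperature": temperature}


def generate_explicit(prompt, *, max_new_tokens=20, temperature=1.0):
    # Hypothetical function: keyword-only parameters are visible to editors,
    # and Python raises TypeError for any unexpected keyword argument.
    return {"prompt": prompt, "max_new_tokens": max_new_tokens, "temperature": temperature}


# The typo goes unnoticed: temperature silently stays at its default of 1.0.
loose_result = generate_loose("Hello", temprature=0.1)

# The same typo fails loudly with the explicit signature.
error_message = ""
try:
    generate_explicit("Hello", temprature=0.1)
except TypeError as err:
    error_message = str(err)
```

If `**kwargs` must be kept for forwarding, a middle ground is to raise (or warn) when anything is left over after popping the known keys.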
bramvanroy.bsky.social
Oh, and don't forget to register! ▶️ clin35.ccl.kuleuven.be/registration Hope to see you at CLIN in Leuven!
clin35-2025.bsky.social
🔊 #CLIN35 is three weeks away! The programme has been online for a while, but the keynote speaker remained a mystery. The wait is over! 🥳 Marie-Catherine de Marneffe will present "Consensus is a myth: Human label variation in Natural Language Inference". Info: clin35.ccl.kuleuven.be/programme
bramvanroy.bsky.social
All great suggestions, thanks!
bramvanroy.bsky.social
I'd like to grow my network on this site, as it is small compared to the one on that other site. My interests are quite broad, so drop some names to follow 👇 around these topics: #machinetranslation #MT, Universal Dependencies, #LLMs in all their aspects (technical, ethical, data), #Dutch NLP, #OSS!
Reposted
clin35-2025.bsky.social
🥳 Happy to open up the registrations for the CLIN conference! You can find more information here: clin35.ccl.kuleuven.be/registration The website has also been updated with more information for the presenters, with a programme, and with information about the venue. See you soon at #CLIN35!
Reposted
clin35-2025.bsky.social
🥳 We've sent out the notification of acceptance to all abstract submitters! Registration to CLIN will be opened up soon. We're really looking forward to seeing so many of you in Leuven on September 12th!!
bramvanroy.bsky.social
Did you upload the paper? I would not trust these services when they say "they have read something" unless you explicitly provide the material to them.
Reposted
ctt2025.bsky.social
As a wrap-up for this year's CTT edition, we invite you to take a peek at our workshop's draft proceedings, in which you'll find the full papers of our wonderful speakers. The final version will be published on the ACL Anthology, but in the meantime:
ctt2025.ccl.kuleuven.be/proceedings
CTT 2025 - Proceedings
The draft proceedings are available at this link: https://drive.google.com/file/d/1yCOHAHTpqsyFCL2c5jj7Khy6yyKJ7BbC/view?usp=drive_link Important note: these are just the drafts, provided for your vie...
ctt2025.ccl.kuleuven.be
bramvanroy.bsky.social
Really cool! Going to forward this to some people. Maybe you can add a line on how the "📜 Traditional OCR Output" is generated? Or is that what's in the optional XML?
bramvanroy.bsky.social
I could not be there this year and mostly worked behind the scenes, but I am very happy to see the workshop growing! Hope everyone had a nice time!!
ctt2025.bsky.social
The #CTT2025 workshop ended yesterday after an insightful day into new creative domains, new applications and tools, new projects and new avenues for research and action. Many thanks, once again, to our presenters, and to everyone who participated! Now, on to the Summit itself!
Reposted
ctt2025.bsky.social
#CTT2025 is now live at MT Summit 2025!
Reposted
stellaathena.bsky.social
EleutherAI is trying to build bridges with libraries and other cultural heritage institutions, but I would also love to talk with digital humanities students or researchers about how we can collaborate in mutually beneficial ways.
bramvanroy.bsky.social
Yay, so happy to host CLIN in Leuven this year! It'll take place on September 12th. The abstract submission deadline is June 13th!
clin35-2025.bsky.social
📅 Don't forget! The deadline for submitting your abstract to the #CLIN conference in Leuven is coming: 13th of June! Submitting is easy: name, title of your work, 500-word abstract, done! #nlp #nlproc #compling #llm #ai #dutch clin35.ccl.kuleuven.be
bramvanroy.bsky.social
More work needs to be done! Only 7 out of 100+ crawls have been processed so far. That's encouraging because it means there is a lot more Creative Commons data to be collected! If you have the compute available and wish to collaborate in an open and transparent manner, please get in touch!
bramvanroy.bsky.social
In total, this V1 release contains almost 150 thousand documents and 150 billion tokens. The data was neither quality-filtered nor deduplicated, so you can decide for yourself how much to keep or discard, but a dataset field indicates whether a doc is included in the FineWeb(-2) datasets.
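A minimal sketch of "decide for yourself" filtering, assuming toy in-memory records. The field names `text` and `in_fineweb` are hypothetical stand-ins, not the actual C5 schema, and the exact-hash deduplication is just one simple option among many.

```python
import hashlib

# Toy records; "text" and "in_fineweb" are hypothetical field names.
records = [
    {"text": "Een document onder CC BY.", "in_fineweb": True},
    {"text": "Een document onder CC BY.", "in_fineweb": True},
    {"text": "Another page, licensed CC BY-SA.", "in_fineweb": False},
]


def keep_fineweb_only(rows):
    # Keep only documents flagged as also appearing in FineWeb(-2).
    return [row for row in rows if row["in_fineweb"]]


def exact_dedup(rows):
    # Drop documents whose text is byte-identical to one seen earlier.
    seen = set()
    unique = []
    for row in rows:
        digest = hashlib.sha256(row["text"].encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(row)
    return unique


kept = exact_dedup(keep_fineweb_only(records))
```

Near-duplicate detection (e.g. MinHash) would be a more realistic choice at scale; exact hashing keeps the sketch short.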
bramvanroy.bsky.social
In this first version of C5, 8 languages are included (afr, deu, eng, fra, fry, ita, nld, spa). The language set was limited for 2 reasons: computational limitations and a collaboration with GPT-NL, who requested CC data for these languages to train a Dutch-focused, copyright-conscious #LLM.
bramvanroy.bsky.social
To build C5, HTML pages are scrutinized and links to CC licenses are collected. Additional data fields are included, such as "was the license found in the HTML `head`?" or "if multiple licenses were found, do they contradict each other?", which makes further filtering a breeze.
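A minimal sketch of the idea using only the standard library, not the actual C5 pipeline: it collects Creative Commons license links from `<a>` and `<link>` tags, records whether each was found inside `<head>`, and flags pages with more than one distinct license. The URL regex and the output field names are assumptions, and treating any two distinct license IDs as a disagreement is a simplification.

```python
import re
from html.parser import HTMLParser

# Assumed pattern, e.g. https://creativecommons.org/licenses/by-sa/4.0/
CC_PATTERN = re.compile(
    r"creativecommons\.org/(?:licenses|publicdomain)/([a-z\-]+)(?:/(\d+(?:\.\d+)?))?",
    re.IGNORECASE,
)


class LicenseLinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_head = False
        self.licenses = []  # list of (license_id, found_in_head)

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        if tag in ("a", "link"):
            href = dict(attrs).get("href") or ""
            match = CC_PATTERN.search(href)
            if match:
                license_id = f"{match.group(1).lower()} {match.group(2) or ''}".strip()
                self.licenses.append((license_id, self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def analyze(html):
    parser = LicenseLinkParser()
    parser.feed(html)
    parser.close()
    distinct = {license_id for license_id, _ in parser.licenses}
    return {
        "licenses": sorted(distinct),
        "license_in_head": any(in_head for _, in_head in parser.licenses),
        # Simplification: any two distinct license IDs count as a disagreement.
        "licenses_disagree": len(distinct) > 1,
    }


page = """<html><head><link rel="license" href="https://creativecommons.org/licenses/by/4.0/"></head>
<body><a href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC</a></body></html>"""
result = analyze(page)
```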
Reposted
at4ssl.bsky.social
💡 We extend the submission deadline for AT4SSL! If you were working on a submission but didn't manage to finish it, we can now offer a few more days so you can still submit it. The new deadline is ⏰ Thursday 20/3 AOE ⏰. All information at sites.google.com/tilburgunive.... @mtsummit2025.bsky.social
bramvanroy.bsky.social
Oooh, the LLM world definitely has A LOT to learn when it comes to evaluation best practices. Looking forward to seeing WMT diligence and years of experience and improvement being applied in this field too.
bramvanroy.bsky.social
Isn't implementing task routes an architectural change over a vanilla transformer? Basically task-specific heads, potentially with some MoE in there for routing?
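A toy sketch of what "task-specific heads with routing" could mean, in plain Python. It assumes `hidden` stands in for the output of a shared transformer trunk, and the class and parameter names are hypothetical. Hard routing picks a head by task name; soft routing mixes all heads with softmax gate weights. Real MoE layers usually gate per token over feed-forward experts with top-k selection, so this is only a simplification.

```python
import math


def linear(weights, x):
    # Matrix-vector product: one output per row of weights.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TaskRoutedHeads:
    """Hypothetical task heads on top of a shared hidden representation."""

    def __init__(self, heads, gate):
        self.heads = heads  # dict: task name -> weight matrix (same output size)
        self.gate = gate    # one gate row per head, in the order of self.heads

    def hard_route(self, hidden, task):
        # Explicit task routing: only the requested head is used.
        return linear(self.heads[task], hidden)

    def soft_route(self, hidden):
        # MoE-style: weight every head's output by a learned gate.
        gate_weights = softmax(linear(self.gate, hidden))
        outputs = [linear(self.heads[name], hidden) for name in self.heads]
        return [
            sum(g * out[i] for g, out in zip(gate_weights, outputs))
            for i in range(len(outputs[0]))
        ]


heads = {
    "classify": [[1.0, 0.0], [0.0, 1.0]],
    "regress": [[2.0, 2.0], [0.0, 0.0]],
}
model = TaskRoutedHeads(heads, gate=[[0.0, 0.0], [0.0, 0.0]])  # zero gate = equal mixing
hidden = [1.0, 2.0]
```

With a zero gate both heads get weight 0.5, so the soft route averages `[1.0, 2.0]` and `[6.0, 0.0]`.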
bramvanroy.bsky.social
Agreed. I also think that in the future the tasks (auto-regression vs diffusion vs pure token-level prediction) will somehow merge into a single architecture that can do whatever is optimal. At least, I imagine something has to happen architecturally to counter the scaling ceiling efficiently.