Institute of Formal and Applied Linguistics
ufal.mff.cuni.cz
@ufal.mff.cuni.cz
Computational linguistics • Natural language processing • Formal linguistics • Machine translation | at Faculty of Mathematics and Physics, Charles University
🎉 KORPUS TŘICETILETÝ (The Thirty-Year-Old Corpus): A new publication from the Department of the Czech National Corpus, FF UK, marking the 30th anniversary of the ČNK 🎉

🔗 www.nln.cz/knihy/korpus...

ÚFAL contributed to the book not only as an editor but also with three key contributions.

#NLP #Lingvistika #UFAL #PDT30 #CorpusLinguistics
February 3, 2026 at 4:49 PM
Ondřej Bojar on prompt dilution and the role of red teaming in deep learning models: both are essential for understanding why AI models sometimes bypass safety guardrails and how to improve their robustness.

Watch here (in Czech):
🔗 www.ceskatelevize.cz/porady/10969...
January 22, 2026 at 4:45 PM
🍻 We asked around the office what "UFAL" means to them. The answers: a mix of academic rigor, hard work, a family atmosphere, a great cup of coffee, and a unique team.

Whether we are discussing NLP over beer or collaborating on a multi-generational project, the spirit of UFAL is all about community. Happy 2026! 👇
January 7, 2026 at 1:02 PM
Kristýna Onderková reports a successful poster presentation on Table-to-Text Generation Evaluation (abstract: openreview.net/forum?id=CbD...) at the EurIPS 2025 workshop "AI for Tabular Data". Her co-authors O. Plátek, Z. Kasner and O. Dušek share the success, but did not taste the special EurIPA beer!
December 9, 2025 at 1:41 PM
At the workshop "Infoveillance: Prevention Against Infodemics", the results of the latest sociological survey on social media use and information quality among the Czech population will be presented. You will also hear about new ways of using technology to detect anomalies in networked digital media.
December 3, 2025 at 1:18 PM
🌍 Towards Adding Arabic to CorefUD
Dima Taji and Dan Zeman
aclanthology.org/2025.crac-1.6
Expanding the CorefUD universal coreference dataset to Arabic - taking important steps toward truly multilingual coreference resolution resources and better Arabic NLP.
November 11, 2025 at 2:37 PM
EMNLP 2025 is over... and Milan Straka is bringing home an award! 🏆
CorPipe triumphed in the prestigious CRAC25 Shared Task on multilingual coreference resolution.

Did Milan just CRACk it? We certainly think so! 😉

🔗 Find out more at arxiv.org/abs/2509.17858

#EMNLP2025 #CorPipe #CRAC25
November 11, 2025 at 1:49 PM
The EU's 🇪🇺 HPLT project, coordinated by @ufal.mff.cuni.cz, is at #EMNLP2025! The project is a silver sponsor of the conference and is disseminating HPLT results at our booth and through several papers. We'll continue to shape the future of multilingual datasets and models here and in @openeurollm.bsky.social!
November 7, 2025 at 9:03 PM
📚 SRS-Stories: Vocabulary-constrained multilingual story generation for language learning
Wiktor Kamzela, Mateusz Lango & @toonietuesday.bsky.social
aclanthology.org/2025.emnlp-i...
LLM-generated stories teach new vocabulary while reviewing already-learned words via spaced repetition, and they are more grammatical than standard generation.
November 7, 2025 at 8:54 PM
Excited to share our work at #EMNLP2025! Our team is presenting 12 papers across the main conference and workshops, covering multilingual NLG, LLM agents, coreference resolution, and machine translation.
A thread with highlights 🧵👇
November 7, 2025 at 8:54 PM
Join us for today's lecture of the Linguistic Association (Jazykovědné sdružení), given by prof. PhDr. Eva Hajičová, DrSc., starting at 17:30.

🔗 You can attend in person or watch on Zoom: lnkd.in/eQeST-uG

Lecture topic: Topic-Focus Articulation in the Era of Parallel Corpora (Aktuální členění v době paralelních korpusů)

📸 Photo: Vladimír Šigut, Charles University
October 23, 2025 at 9:02 AM
🚀 PROJECT LAUNCH: Infoveillance is live! Our AI tool monitors digital media to detect misinformation and to strengthen public trust and literacy. Fighting infodemics and polarization.

[https://ufal.mff.cuni.cz/grants/infoveillance]
#Infoveillance #AI #Misinformation #PublicTrust #UFAL
October 2, 2025 at 11:46 AM
Six colleagues led a three-day summer school in Luxembourg for the DGT (the European Commission's Directorate-General for Translation). They taught 40+ DGT staff members the latest methods of machine-assisted translation and quality assurance. The goal? To make the translation of EU legislation into the languages of all member states more efficient!
#DGT #UFAL #StrojovyPreklad #AI #EUTools
September 29, 2025 at 9:31 AM
@Dan Zeman has been invited as a keynote speaker at the ICLC 11 conference! iclc11.ff.cuni.cz/keynote-spea...

#UFAL #ICLC11 #UniversalDependencies #CharlesUniversity #Prague
September 24, 2025 at 7:48 AM
Take a look at the kick-off meeting of the project ✨HumanAId: Human-Centred AI for a Sustainable and Adaptable Society✨.

A project with strong participation: it is led by FF UK in collaboration with MFF UK, FSV UK, PF UP in Olomouc, FÚ AV ČR, prg.ai, and Kampus Hybernská.

#prgAI #HumanAId #OPJAK
1/2
September 23, 2025 at 2:58 PM
And another successfully defended thesis: 👉Dr.👈 Kira Droganova defended her thesis "Dependency Parsing beyond Simple Trees", which focused on enriching syntactic parsing with deeper semantic layers to better capture meaning across languages. Congratulations 🥳
September 23, 2025 at 11:07 AM
🎉 Congratulations to 👉Dr.👈 Tomáš Musil on successfully defending his PhD thesis! 🍻 His talk explored #LLMs, theories of meaning, and their role in LLM #interpretability, highlighting the unsupervised discovery of binary semantic features via ICA (independent component analysis) and the word intruder test.
September 22, 2025 at 10:09 AM
The workshop "Regulation, AI and the Legal Profession: Behind the Scenes of Legislation and Legal Innovation" presented OpenEuroLLM as a hope for European digital sovereignty and a necessity for Europe's competitiveness. Jan Hajič emphasized that Czechia is working to reduce bureaucracy in the field of AI.

#AI #AIregulation #FutureOfLaw
September 19, 2025 at 2:45 PM
Researchers' Night with @informatfyz.cuni.cz!
You can come to a live podcast recording and try out ELITR, a real-time automatic interpreting system. The event is on September 26th.

🔗 czechia.representation.ec.europa.eu/evropsky-den...

#ELITR #AI #Interpreting #MachineTranslation #LanguageTech
September 18, 2025 at 10:19 AM
Gold Data and Multiple Understanding of Discourse Relations
by Š. Zikánová, A. Nedoluzhko, J. Mírovský & E. Hajičová
TL;DR: Investigates how annotators interpret discourse relations differently, revealing important insights about subjectivity in linguistic annotation and its impact on NLP systems.
September 1, 2025 at 2:29 PM
Morphological Segmentation with Neural Networks: Performance Effects of Architecture, Data Size, and Cross-Lingual Transfer in Seven Languages
by M. Olbrich & Z. Žabokrtský
TL;DR: Analyzes how neural architecture, data size, and cross-lingual transfer affect morphological segmentation in 7 languages.
September 1, 2025 at 2:29 PM
Flexing in 73 Languages: A Single Small Model for Multilingual Inflection
by Tomáš Sourada & Jana Straková
TL;DR: Compact neural model successfully handles morphological inflection across 73 diverse languages, proving that small can be mighty in multilingual NLP.
September 1, 2025 at 2:29 PM
Refining Czech GEC: Insights from a Multi-Experiment Approach
by P. Pechman, @straka-milan.bsky.social , @janastrakova.bsky.social , J. Náplava
TL;DR: Better Czech grammatical error correction systems + insights for better automated writing assistance in Czech arxiv.org/abs/2506.22402
September 1, 2025 at 2:29 PM
Investigating the Effect of Parallel Data in the Cross-Lingual Transfer for Vision-Language Encoders
by @andrei-a-manea.bsky.social & @jlibovicky.bsky.social
TL;DR: Explores how parallel datasets improve cross-lingual transfer in vision-language models. arxiv.org/abs/2504.21681
September 1, 2025 at 2:29 PM
ParCzech 3.0: A Large Czech Speech Corpus with Rich Metadata
by M. Kopp, V. Stankov, J. O. Krůza, . Straňák & . Bojar
TL;DR: Czech parliamentary speeches from 2013-2021 with rich metadata incl. speaker identities, political affiliations, and automatic linguistic annotations in TEI format.
September 1, 2025 at 2:29 PM