Enzo Doyen
@edoyen.com
PhD Candidate in Natural Language Processing @unistra.fr; working on LLM gender bias mitigation.
Localization Specialist (EN → FR).

Interested in research; politics; technology; languages; literature; philosophy.
Website: https://edoyen.com/
Views my own.
It should be said that LLMs also generally perform on par with traditional NMT engines (see arxiv.org/html/2401.05... or aclanthology.org/2024.wmt-1.1...); but apart from that, I guess the whole "novelty" factor makes them a preferred choice for people wanting to implement machine l10n.
July 15, 2025 at 7:07 PM
Compared to traditional NMT engines, LLMs do have the advantage of making it easy to specify requirements for the translation (in terms of style or keywords; see aclanthology.org/2023.wmt-1.8... or arxiv.org/abs/2301.13294), even though I highly doubt this is widely used for machine l10n.
July 15, 2025 at 7:05 PM
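As a rough illustration of the constrained prompting mentioned in the post above, here is a minimal sketch of how style and terminology requirements could be encoded in an LLM translation prompt. All names, the glossary entries, and the prompt wording are hypothetical, invented for this example; they are not taken from any of the cited papers.

```python
# Sketch: assembling a translation prompt that encodes explicit style
# and keyword (terminology) constraints, as discussed above.
# Everything here is illustrative, not a real l10n pipeline.

def build_translation_prompt(source_text, target_lang, style, glossary):
    """Build an LLM prompt with style and fixed-term constraints."""
    terms = "; ".join(f'"{src}" -> "{tgt}"' for src, tgt in glossary.items())
    return (
        f"Translate the following text into {target_lang}.\n"
        f"Style: {style}.\n"
        f"Use these translations for key terms: {terms}.\n\n"
        f"Text: {source_text}"
    )

prompt = build_translation_prompt(
    source_text="Click Save to keep your changes.",
    target_lang="French",
    style="formal, UI documentation",
    glossary={"Save": "Enregistrer"},
)
print(prompt)
```

The resulting string would then be sent to whatever LLM endpoint is in use; the point is only that the requirements travel inside the prompt, which traditional NMT engines don't straightforwardly support.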
hmm that's nice, but does ACL allow changing style files like that?
May 29, 2025 at 12:51 PM
I never said that you were against benchmarking; rather that, in my opinion, such datasets can be used as a starting point to theoretically define the "default behaviors" of LLMs insofar as they reflect what we generally expect from them on a diverse range of tasks.
March 13, 2025 at 3:34 PM
To my knowledge, there is no research on the topic, but I intuitively believe that generic prompts are much more prevalent than one might first think. While many people do, I don't think *most* actually use pre-made prompt templates or necessarily take the time to describe their task at length.
March 12, 2025 at 9:10 PM
I think it makes sense to draw on these benchmarks for research on LLM behaviors, given that they're the standard for evaluating LLMs.

So the "golden" default behavior for each task could theoretically be found in standard LLM benchmarking datasets (and same for "generic prompts").
March 12, 2025 at 9:10 PM
Actually, I think we should talk about default behaviors (plural), where each default behavior is task-dependent. Main tasks can be determined from commonly used LLM benchmarks (that is, commonsense reasoning w/ ARC, language understanding/question answering w/ OpenBookQA…).
March 12, 2025 at 9:10 PM
vastai is the cheapest and the most reliable that I know
February 12, 2025 at 11:34 AM
aaah! Well that's definitely an interesting question. Very curious to know the answer too lol. Theoretically I guess it's possible but the performance may not be very good
February 1, 2025 at 12:23 PM
Is this even feasible or desirable? (I think it is.) And where to draw the line between inherently inappropriate content and disputed (but sound) content when doing pre-training filtering?
January 28, 2025 at 11:36 PM
This is obviously not specific to China — DeepSeek shows an example of it, but it could apply to any other country — and not even to diplomatic topics in general. The larger questions (and perhaps debate) are: How to best promote the development of globally fair and accurate models?
January 28, 2025 at 11:36 PM
"Open-source" generally implies more than just giving access to the code, though. Can an LLM really be called "open" if it purposely refuses to answer historical questions that may go against a certain political power's narrative? Or if it promotes the One China principle with propaganda?
January 28, 2025 at 11:36 PM
DeepSeek is incredible evidence that the number of local, open-source LLMs will keep growing and that these models can achieve performance similar to proprietary models.
January 28, 2025 at 11:36 PM