#BackTranslation
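The complete guide to data augmentation! A key technique for preventing overfitting and improving model performance. Image augmentation: Mixup vs CutMix vs AugMix compared, AutoAugment vs RandAugment performance/cost. NLP augmentation: Back Translation, the four EDA operations, and the latest LLM-based techniques. GAN-based augmentation improves sensitivity by 10%!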

#AugMix #AutoAugment #BackTranslation #CutMix #Cutout #DataAugmentation #EDA
doyouknow.kr/631/data-aug...
The Complete Guide to Data Augmentation: Magic for When AI Training Data Is Scarce! A Roundup of Mixup, CutMix, and AutoAugment
doyouknow.kr
December 5, 2025 at 12:26 PM
Felipe Fujita, Hideyuki Takada
Exploring Parameter-Efficient Fine-Tuning and Backtranslation for the WMT 25 General Translation Task
https://arxiv.org/abs/2511.12109
November 18, 2025 at 11:36 AM
Felipe Fujita, Hideyuki Takada: Exploring Parameter-Efficient Fine-Tuning and Backtranslation for the WMT 25 General Translation Task https://arxiv.org/abs/2511.12109 https://arxiv.org/pdf/2511.12109 https://arxiv.org/html/2511.12109
November 18, 2025 at 6:30 AM
Basically because in practice synthetic data is made through methods like backtranslation that are lossless or it's selected through various quality filters and classifiers to improve faster than the random sampling degrades it.
minihf.com/posts/2024-0...
The RetroInstruct Guide To Synthetic Text Data
minihf.com
November 7, 2025 at 6:19 PM
Łukasz Radliński, Mateusz Guściora, Jan Kocoń
Backtranslation and paraphrasing in the LLM era? Comparing data augmentation methods for emotion classification
https://arxiv.org/abs/2507.14590
July 22, 2025 at 10:05 AM
Łukasz Radliński, Mateusz Guściora, Jan Kocoń: Backtranslation and paraphrasing in the LLM era? Comparing data augmentation methods for emotion classification https://arxiv.org/abs/2507.14590 https://arxiv.org/pdf/2507.14590 https://arxiv.org/html/2507.14590
July 22, 2025 at 6:30 AM
Arwa Arif
The Saturation Point of Backtranslation in High Quality Low Resource English Gujarati Machine Translation
https://arxiv.org/abs/2506.21566
June 30, 2025 at 8:53 AM
Arwa Arif: The Saturation Point of Backtranslation in High Quality Low Resource English Gujarati Machine Translation https://arxiv.org/abs/2506.21566 https://arxiv.org/pdf/2506.21566 https://arxiv.org/html/2506.21566
June 30, 2025 at 6:30 AM
feedback that directly give users an assessment of translation quality using 1) error highlights and 2) LLM explanations, and implicit feedback that helps users compare MT inputs and outputs through 3) backtranslation and 4) question-answer (QA) [3/5 of https://arxiv.org/abs/2505.24683v1]
June 2, 2025 at 6:19 AM
Backtranslation (semi-supervised) - to translate Nepali into English. Using the former, we are able to achieve a devtest Set SacreBLEU score of 14.2, which improves the baseline fully supervised score reported by (Guzman et al., 2019) by 6.6 points. [3/4 of https://arxiv.org/abs/2505.14553v1]
May 21, 2025 at 6:16 AM
Ah good. It’s topical but: backtranslation techniques could actually help here.
May 10, 2025 at 8:46 PM
languages. Utilizing the Bharat Parallel Corpus Collection (BPCC) as the primary dataset, the model incorporates iterative backtranslation to generate synthetic parallel data, effectively augmenting the training dataset and enhancing the model's [2/5 of https://arxiv.org/abs/2504.05914v1]
April 9, 2025 at 6:00 AM
Besides, backtranslation is never a good mechanism for judging the quality of a translation. Source: my bachelor's degree and my years of experience in translation.
April 3, 2025 at 8:07 PM
For day to day shorter things EN->DE, I mostly went the other direction - Google Translate first, languagetools to fix the occasional error, then DeepL for backtranslation as that consistently has captured meaning better in texts where I know the subject well.
March 12, 2025 at 1:53 AM
It is true that backtranslation could mask errors in an original translation. That said, your specific example seems like it rather indicates that Google Translate here succeeds in capturing intent when going from German to English, so in and of itself would support using it in that direction.
March 12, 2025 at 1:53 AM
like gender-targeted hate speech is understudied because of class imbalance issues. This paper addresses this gap by comparing three data augmentation techniques for Indonesian gender-based hate speech detection. We evaluate backtranslation, [2/6 of https://arxiv.org/abs/2503.04279v1]
March 7, 2025 at 6:33 AM
RetroInstruct and weave-agent looking very good in this context.

1. blocktype: evaluation
2. MCTS(?)
3. "Break this problem into parts."
4. RetroInstruct being named after backtranslation, though the cognitive operation isn't actually taught in the dataset yet(?)
2/13 We identify 4 key cognitive behaviors that enable successful learning: Verification (checking work), Backtracking (trying new approaches), Subgoal Setting (breaking problems down) & Backward Chaining (working backwards from a goal). Qwen naturally exhibits these, while Llama mostly lacks them.
March 4, 2025 at 6:41 PM
backtranslation, that our approaches surpass the performance of fine-tuning with extensive multilingual datasets such as MMA on ProofNet with only 1/150th of the tokens. Taken together, our methods show a promising new [7/8 of https://arxiv.org/abs/2502.15795v1]
February 25, 2025 at 5:53 AM
distilled (offline) backtranslation with few-shot amplification, and (3) line-by-line proof analysis integrated with proof state information. Each variant is designed to optimize data quality over quantity, focusing on the [4/8 of https://arxiv.org/abs/2502.15795v1]
February 25, 2025 at 5:53 AM
capabilities of language models, particularly addressing the challenge posed by the scarcity of labeled data. Specifically, we evaluate three primary variations of this strategy: (1) on-the-fly (online) backtranslation, (2) [3/8 of https://arxiv.org/abs/2502.15795v1]
February 25, 2025 at 5:53 AM
I found the backtranslation funny, and it already prepares me for when someone complains that the term is bad because it doesn't mean "fiend" hahaha.
December 10, 2024 at 3:37 PM
Turns out that you can apply a backtranslation-like technique to improve reasoning in LLMs:
x.com/cyjustinchen...
December 2, 2024 at 8:13 PM
I’ll temporarily break my usual Klingon-only rule to provide an overly literal backtranslation for the benefit of anyone outside of my usual Klingon-speaking audience who might see this. It’s only the first verse which I did just for fun, if there’s any serious interest I can do the other verses:
November 25, 2024 at 3:59 PM
This is a mystery. Consistent pastoral theme but they did Heinäpellonpuisto (Hayfield park) into Hejnåkersparken, which… means nothing. Hej, nåker!? Would be Höåkersparken or Höängsparken. It is possible that the explanation is a local backtranslation of the Finnish name, but weird nonetheless.
November 8, 2024 at 9:25 AM
Backtranslation of human RNA biosignatures of tuberculosis disease risk into the preclinical pipeline is condition dependent https://www.biorxiv.org/content/10.1101/2024.06.21.600067v1
It is not clear whether human progression to active tuberculosis disease (TB) risk signatures are vi
www.biorxiv.org
June 22, 2024 at 5:16 AM