
There’s something uniquely beautiful about old books. The smell of weathered paper, the texture of the pages, and the stories that have survived generations. But if you’ve ever tried opening a piece of Classical Korean literature—like the Joseon Dynasty novel HongGildongJeon (홍길동전)—you’ll quickly realize that time leaves its own mark on language.
Between the lack of word spacing and obsolete letters like the dot vowel Arae-a (ㆍ) or the soft Yeorin-hieut (ㆆ), reading it feels less like browsing a novel and more like solving a beautiful, ancient puzzle. Even for native speakers, the linguistic gap is massive.
That's why I decided to create this tutorial: a digital bridge between the past and the present. Using Gemma 4 E2B (IT), I set out to build a humble translator that turns Classical Korean into smooth, modern Korean.
To keep things manageable, I ran this on a single NVIDIA T4 GPU (16GB) using Google Colab.
First, we pull in our favorite open-source tools: Hugging Face’s transformers, trl for the training loop, and peft so we can use LoRA (Low-Rank Adaptation) to fine-tune our model without needing a massive server cluster.
For our data, I used a public domain version of HongGildongJeon, paired with a beautiful modern translation by 직지프로 (licensed under Creative Commons).
To make Gemma feel at home, I structured the data into a conversation, guiding the model with a clear system prompt:
[
  {"role": "system", "content": "Translate Classical Korean into Modern Korean."},
  {"role": "user", "content": "됴션국셰둉ᄃᆡ왕즉위십오연의홍희문밧긔ᄒᆞᆫᄌᆡ상이잇스되"},
  {"role": "assistant", "content": "조선국 세종대왕 즉위 십오년에 홍회문 밖에 한 재상이 있으되,"}
]
(Translation note: This line introduces us to a prime minister living just outside the Honghoemun Gate during the 15th year of King Sejong's reign!)
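If your source text is already aligned sentence by sentence, wrapping each pair in this conversation format takes only a few lines. The sketch below is one way to do it, not the exact code from the tutorial: the helper names `to_conversation` and `build_dataset` are hypothetical, and storing each conversation under a `"messages"` key is an assumption based on the conversational layout that trl commonly works with.

```python
SYSTEM_PROMPT = "Translate Classical Korean into Modern Korean."


def to_conversation(classical: str, modern: str) -> list[dict[str, str]]:
    """Wrap one aligned sentence pair in the system/user/assistant format."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": classical.strip()},
        {"role": "assistant", "content": modern.strip()},
    ]


def build_dataset(pairs):
    """Turn (classical, modern) pairs into rows, skipping pairs with an empty side."""
    rows = []
    for classical, modern in pairs:
        if classical.strip() and modern.strip():
            # "messages" is an assumed key name, not confirmed by the post.
            rows.append({"messages": to_conversation(classical, modern)})
    return rows


# The single pair shown above; a real run would load the whole aligned text.
pairs = [
    (
        "됴션국셰둉ᄃᆡ왕즉위십오연의홍희문밧긔ᄒᆞᆫᄌᆡ상이잇스되",
        "조선국 세종대왕 즉위 십오년에 홍회문 밖에 한 재상이 있으되,",
    ),
]
dataset = build_dataset(pairs)
print(len(dataset), "conversation(s) built")
```

Keeping the system prompt in a single constant means the same instruction is used at training time and at inference time, which matters for a model being tuned into a single-purpose tool.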
Before giving Gemma any specific training, I ran a quick baseline test. Base models are smart, but archaic grammar is a highly specific domain. Without tuning, Gemma tried its best but ended up giving long, overly literal explanations:
(Translation note: This line actually means: Upon hearing this, Mr. Baek was deeply impressed and said, "He does not hide his true nature; he is a true man!" and comforted him again and again.)
The base model was clearly lost in time. It needed a map.
To train the model efficiently, I used a Parameter-Efficient Fine-Tuning (PEFT) setup with LoRA.
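from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=16,                # scaling numerator; the update is scaled by lora_alpha / r
    lora_dropout=0.05,            # dropout applied on the LoRA path during training
    r=16,                         # rank of the low-rank update matrices
    bias="none",                  # leave bias terms frozen
    target_modules="all-linear",  # attach adapters to every linear layer
    task_type="CAUSAL_LM",
)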
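To get a feel for why this fits on a single T4, it helps to do the arithmetic for one layer. LoRA freezes the original weight matrix and trains two small matrices of rank `r` next to it, so each adapted layer gains `r * (d_in + d_out)` trainable weights. The sketch below works that out; the 2048 x 2048 layer size is a made-up example for illustration, not Gemma's actual dimensions.

```python
def lora_extra_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable weights LoRA adds to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def lora_scaling(lora_alpha: int, r: int) -> float:
    """Factor applied to the low-rank update before it is added to the frozen weight."""
    return lora_alpha / r


# Hypothetical layer size, chosen only for illustration.
d_in, d_out = 2048, 2048
r, lora_alpha = 16, 16  # values from the LoraConfig above

full = d_in * d_out
extra = lora_extra_params(d_in, d_out, r)
print(f"full layer: {full:,} weights")
print(f"LoRA adds:  {extra:,} trainable weights ({extra / full:.2%} of the layer)")
print(f"scaling:    {lora_scaling(lora_alpha, r)}")
```

With `r=16` and `lora_alpha=16` the scaling factor is exactly 1.0, so the adapter's output is added to the frozen layer's output without being amplified or damped.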
The Secret Sauce: collate_fn
When fine-tuning a chat model to behave like a specific tool, you don't want it to waste energy learning how to re-write your prompt. By using a custom data collator, I masked the system and user inputs (setting their labels to -100), forcing Gemma's loss calculation to focus strictly on generating the correct modern assistant response.
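Here is a minimal sketch of that masking idea in plain Python, so the logic is visible without any framework in the way. It is not the collator from the tutorial: the `prompt_len` field is a hypothetical stand-in for however you locate the start of the assistant turn, the token ids are toy values, and a real collator would tokenize with the chat template and return tensors rather than lists.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_prompt(input_ids, prompt_len):
    """Copy input_ids into labels, hiding the first prompt_len tokens from the loss."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels


def collate_fn(examples, pad_token_id=0):
    """Pad a batch to equal length and mask prompt and padding positions in the labels.

    Each example is a dict with two hypothetical keys:
      "input_ids":  token ids for the system + user + assistant turns
      "prompt_len": how many leading tokens belong to system + user
    """
    max_len = max(len(ex["input_ids"]) for ex in examples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ex in examples:
        ids = list(ex["input_ids"])
        pad = max_len - len(ids)
        batch["input_ids"].append(ids + [pad_token_id] * pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * pad)
        # Padding is masked too, so it never contributes to the loss.
        batch["labels"].append(mask_prompt(ids, ex["prompt_len"]) + [IGNORE_INDEX] * pad)
    return batch


# Toy token ids, purely illustrative.
batch = collate_fn([
    {"input_ids": [5, 6, 7, 8, 9], "prompt_len": 3},
    {"input_ids": [5, 6, 10], "prompt_len": 2},
])
print(batch["labels"])
```

Only the positions that still hold real token ids in `labels` are scored, which is exactly the assistant's modern Korean output.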
After setting our hyper-parameters to gently cruise through 5 epochs with a learning rate of 2e-5, I hit train.
After a bit of patience and letting the trainer do its magic, the results were incredibly rewarding. The character-by-character similarity score jumped all the way up to a brilliant 79.93%!
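If you want a comparable number for your own experiments, a character-level similarity score is easy to compute with the standard library. The sketch below uses `difflib.SequenceMatcher`; this is an assumption on my part, one plausible way to score character overlap, and it may not match the exact metric behind the figure above.

```python
from difflib import SequenceMatcher


def char_similarity(prediction: str, reference: str) -> float:
    """Character-level similarity between two strings, as a percentage from 0 to 100."""
    return SequenceMatcher(None, prediction, reference).ratio() * 100


def mean_similarity(predictions, references) -> float:
    """Average the per-sentence similarity over an evaluation set."""
    scores = [char_similarity(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0


# Toy check: the prediction differs from the reference by one extra space.
reference = "조선국 세종대왕"
prediction = "조선국 세종 대왕"
print(f"{char_similarity(prediction, reference):.2f}%")
```

A character-level score suits this task because Classical Korean has no word spacing, so word-level metrics would punish the model heavily for small spacing differences.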
Look at how it handles the text now:
Technology often pushes us relentlessly into the future, but my favorite tech projects are the ones that allow us to look backward with greater clarity. By spending a little time fine-tuning a lightweight model like Gemma 4, we can build tools that preserve cultural history, making ancient wisdom and classic stories accessible to anyone with a laptop.
Next time you find a piece of history that feels just a bit too out of reach, remember that a small dataset and a fine-tuning session might be all you need to bring it into the light.
Here's the structured workflow to follow when fine-tuning for your own domain:
👉 Check out this tutorial in Gemma Cookbook
👉 Star the repository to support us