polytales 

polytales (name subject to change) is a browser extension that turns the web pages you already read into reading practice: it translates them into a target language at your current reading level, then tests your comprehension.

Here is what the early v0.2 of polytales (August 2024) looks like:

Screenshot of polytales v0.2 in the untranslated state

Here I'm trying to read an interesting TechCrunch article, but I want to make it an opportunity to practice reading A2 French, so that's what I select in the side panel. If I then click 'Translate and Generate Quiz', this is what I see after a couple of seconds of loading:

Screenshot of polytales v0.2 in the translated state

The article body is translated to something close to the A2 French level, and I get a little multiple-choice quiz in French on the side to test my comprehension.
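The extension's actual data model isn't shown here, but a quiz like this could be represented with a small structure; `QuizQuestion`, `scoreQuiz`, and the sample content below are illustrative, not polytales' real API:

```typescript
// Hypothetical shape for an LLM-generated comprehension quiz.
interface QuizQuestion {
  prompt: string;      // question text, in the target language
  choices: string[];   // multiple-choice options
  answerIndex: number; // index into choices of the correct option
}

// Count how many of the user's picks match the correct answers.
function scoreQuiz(questions: QuizQuestion[], picks: number[]): number {
  return questions.reduce(
    (score, q, i) => score + (picks[i] === q.answerIndex ? 1 : 0),
    0,
  );
}

// Tiny made-up quiz, not real polytales output.
const quiz: QuizQuestion[] = [
  {
    prompt: "De quoi parle l'article ?",
    choices: ["La météo", "Une startup", "Le sport"],
    answerIndex: 1,
  },
  {
    prompt: "Qui est mentionné dans l'article ?",
    choices: ["Un fondateur", "Un acteur", "Un chef"],
    answerIndex: 0,
  },
];

console.log(scoreQuiz(quiz, [1, 2])); // one correct pick out of two → 1
```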

Screenshot of polytales v0.2 showing the final score of the quiz

It looks like I didn't do so well this time, which makes sense: my French is horrendous.

I hope this concept shows some potential for a more practical use of LLMs for language learning. Before LLMs, I used apps like Du Chinese, which had human-prepared Chinese news articles written for different HSK levels. While I thought this was interesting, the content was very limited and usually not what I was interested in. On the other hand, the LLM hype train has produced a lot of language tutor chatbots that can be customized to converse with users about topics they enjoy, but I found these conversations hollow and full of outdated or made-up facts. I didn't find any motivation to talk to someone's lightweight ChatGPT wrapper.

What I did realize, however, was that LLMs are good at highly contextual tasks: semantically connecting words and generating grammatically clean, textbook-like text. This is what I think makes tasks like RAG, semantic retrieval, summarization, and translation particularly useful applications of LLMs.

My thesis for this project was based on a few key ideas: 1) humans prefer to read what other humans write, 2) LLMs are good at simplifying and translating, and 3) we acquire languages largely through comprehensible input. So if we can produce more comprehensible input, sourced from human writing and translated by LLMs, perhaps we can dramatically speed up reading development for language learners.
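As a sketch of the second idea, the simplify-and-translate step can be a single LLM call whose prompt pins down the target language and CEFR level; `buildTranslationPrompt` is a hypothetical helper, not polytales' actual prompt:

```typescript
// Build a prompt asking an LLM to rewrite an article as graded reading
// material. Hypothetical sketch; the extension's real prompt is not shown.
function buildTranslationPrompt(
  article: string,
  language: string, // e.g. "French"
  level: string,    // CEFR level, e.g. "A2"
): string {
  return [
    `Translate the article below into ${language}.`,
    `Constrain vocabulary and grammar to CEFR level ${level};`,
    `prefer simple paraphrases over direct translations of hard words.`,
    `Preserve the article's meaning and structure.`,
    ``,
    `Article:`,
    article,
  ].join("\n");
}

const prompt = buildTranslationPrompt("Article text...", "French", "A2");
console.log(prompt.split("\n")[0]); // "Translate the article below into French."
```

Pinning the CEFR level in the instructions, rather than asking for a generic "simple" translation, is what lets the same source article serve learners at different levels.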

There are some flaws, though: LLMs don't always know what the CEFR levels look like, and when generating a translation they may fall back on direct translations that retain difficult vocabulary. The output is also fundamentally limited by the source material.

No link yet, I'll probably release a later version.

<- Take me back home