Why My Coding Assistant Started Replying in Korean When I Typed Chinese

Primarily, I work with my coding assistant in Chinese. However, my writing is often mixed: many engineering terms are more familiar to me in English (especially terms we use with Python, Git, and the like), and some are even difficult to translate naturally into Chinese.

Yesterday, I asked my coding assistant in Chinese: “run.py有早停吗?我在恒源云上跑,发现没有触发”, meaning “Does run.py implement early stopping? I was running the project on a shared GPU service, and I didn’t see early stopping triggered.” As usual, I naturally typed the technical token run.py in its original English form. The model inspected the code and responded with the following:

Image by author: Screenshot of coding assistant replying in Korean

All technical tokens remained in English (run.py, config.py, train_unified), while the explanatory structure shifted into Korean. This is not a unique case; it has happened from time to time: whenever I mixed Chinese text with English engineering terms, Korean would appear.

Image by author: Another screenshot of coding assistant replying in Korean

This made me ask: Is this a language issue, or something deeper in the embedding space?

Hypothesis

Embedding spaces are not primarily structured by language identity. Having been trained alongside language models, they tend to be organized by task registers such as academic writing, conversational text, and, in the case of coding assistants, engineering/code. Chinese, although spoken by more people than any other language, is not a natural medium for the engineering register and has limited representation in technical corpora.

In such a context, text may stop behaving like “Chinese” in the embedding space as soon as engineering tokens such as review / branch / commit / PR / diff appear. Instead, it may drift into an engineering attractor field.

We will conduct some experiments to provide empirical evidence for this hypothesis.

Controlled Language Drift

We construct the following controlled sequence of sentences, in which English tokens gradually replace Chinese ones:

Stage 0: 请帮我检查这个分支
Stage 1: 请帮我 review 这个分支
Stage 2: 请帮我 review 这个 branch
Stage 3: Please review this branch pull request commit
Stage 4: Please review this branch pull request commit code diff

We compute cosine similarity between sentence embeddings. The Korean and English “clusters” are defined as the average embedding of a small set of representative engineering-related sentences in each language, and Δ (EN − KO) denotes the difference between the English and Korean similarity scores, i.e., Δ = similarity(English) − similarity(Korean).
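To make the setup concrete, here is a minimal sketch of the computation. It assumes the sentence-transformers library with a multilingual model (paraphrase-multilingual-MiniLM-L12-v2 here), and the reference sentences are illustrative placeholders, not the exact sets behind the numbers below.

```python
# Minimal sketch of the cluster-similarity computation.
# Assumes sentence-transformers; the reference sentences are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

stages = [
    "请帮我检查这个分支",                                        # Stage 0
    "请帮我 review 这个分支",                                    # Stage 1
    "请帮我 review 这个 branch",                                 # Stage 2
    "Please review this branch pull request commit",             # Stage 3
    "Please review this branch pull request commit code diff",   # Stage 4
]

# Representative engineering-related sentences per language (placeholders).
en_refs = ["Please review this commit.", "Check out the branch and open a PR."]
ko_refs = ["이 커밋을 검토해 주세요.", "브랜치를 확인하고 PR을 열어 주세요."]

def centroid(sentences):
    """L2-normalized average embedding of a set of sentences."""
    embs = model.encode(sentences, normalize_embeddings=True)
    c = embs.mean(axis=0)
    return c / np.linalg.norm(c)

en_c, ko_c = centroid(en_refs), centroid(ko_refs)

for i, stage in enumerate(stages):
    e = model.encode(stage, normalize_embeddings=True)
    sim_en, sim_ko = float(e @ en_c), float(e @ ko_c)
    print(f"Stage {i}: KO={sim_ko:.4f}  EN={sim_en:.4f}  Δ={sim_en - sim_ko:.4f}")
```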

Stage   Korean similarity   English similarity   Δ (EN − KO)
0       0.4783              0.5141               0.0358
1       0.5235              0.5728               0.0492
2       0.5474              0.6140               0.0665
3       0.5616              0.7314               0.1698
4       0.5427              0.7398               0.1972

We observe an interesting pattern: Korean similarity rises through the early stages but plateaus and even dips at Stage 4, while English similarity keeps climbing and pulls away (Δ grows at every stage). Moreover, the growth in English similarity is non-linear: the jump between Stage 2 and Stage 3 suggests phase-transition-like behavior rather than gradual drift.

When projecting the embeddings into two dimensions using PCA, we observe a smooth trajectory in the early stages, followed by a sharp directional jump between Stage 2 and Stage 3, and subsequent stabilization. This pattern indicates that embeddings do not move linearly through space; instead, they appear to transition between attractor basins.

Image by author: Controlled Drift Trajectory in PCA space
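A sketch of how such a projection can be produced, assuming scikit-learn and matplotlib and reusing the model and stage sentences from the snippet above:

```python
# Project the five stage embeddings into 2-D with PCA and plot the trajectory.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

embs = model.encode(stages, normalize_embeddings=True)
xy = PCA(n_components=2).fit_transform(embs)

plt.plot(xy[:, 0], xy[:, 1], "o-")
for i, (x, y) in enumerate(xy):
    plt.annotate(f"Stage {i}", (x, y))
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Controlled drift trajectory in PCA space")
plt.show()
```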

Real-world Model Behavior

Consider again the sentence we mentioned at the beginning. I asked:

A. “run.py有早停吗?我在恒源云上跑,发现没有触发”, meaning “Does run.py implement early stopping? I was running the project on a shared GPU service, and I didn’t see early stopping triggered.”

B. “원인을 찾았습니다. 결론: run.py에는 실제로 조기 종료가 없습니다. config.py에 USE_EARLY_STOPPING = True” (in Korean, meaning “I found the cause. Conclusion: run.py actually has no early stopping. config.py has USE_EARLY_STOPPING = True.”).

Translated back into Chinese, we have:

C. “我找到了原因。结论:run.py实际上没有早停。config.py里有 USE_EARLY_STOPPING = True。”

We compute the cosine similarities of the sentence embeddings of A, B, and C against three reference clusters: a Chinese cluster, defined as the average embedding of general Chinese natural-language sentences, and English and Korean clusters defined analogously.
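A sketch of this comparison, reusing the model and the centroid helper from the first snippet; all reference sentences are again illustrative placeholders:

```python
# Compare A, B, C against general-language clusters (placeholder references).
zh_c = centroid(["今天天气很好。", "我们一起去吃饭吧。"])
en_c = centroid(["The weather is nice today.", "Let's grab dinner together."])
ko_c = centroid(["오늘 날씨가 좋네요.", "같이 저녁 먹으러 가요."])

texts = {
    "A (Chinese prompt)":     "run.py有早停吗?我在恒源云上跑,发现没有触发",
    "B (Korean response)":    "원인을 찾았습니다. 결론: run.py에는 실제로 조기 종료가 없습니다. config.py에 USE_EARLY_STOPPING = True",
    "C (Translated Chinese)": "我找到了原因。结论:run.py实际上没有早停。config.py里有 USE_EARLY_STOPPING = True。",
}

for label, text in texts.items():
    e = model.encode(text, normalize_embeddings=True)
    print(f"{label}: KO={e @ ko_c:.4f}  EN={e @ en_c:.4f}  ZH={e @ zh_c:.4f}")
```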

Text                      Korean sim   English sim   Chinese sim
A. (Chinese prompt)       0.2003       0.2688        0.3134
B. (Korean response)      0.2745       0.2983        0.1641
C. (Translated Chinese)   0.1634       0.3106        0.2798

As the table shows, translating the Korean response back into Chinese does not send the embedding back to the Chinese region. Instead, it moves even closer to the English cluster.

This suggests that translation can restore language form, but probably not embedding location.

Conclusion

Both experiments point to the same conclusion: the embedding space is not organized along language boundaries. Instead, it is more likely structured by task registers, in which engineering English dominates.
When a sentence enters this region, its language form may change, but its embedding remains in the engineering basin, leading to odd behaviors such as replies in Korean even when you are not a Korean speaker at all.
