“And that's what makes us human.” Phraseology in AI compared to human language. Does English shape the phraseology of AI-produced Czech?

Date

Wednesday 2026-04-29 14:10

Speaker

Denisa Šebestová

Abstract

I examine phraseological sequences in AI-produced language from a cross-linguistic perspective and compare them with those found in human-produced texts. Differences between AI- and human-produced language at the phraseological level are subtle, yet they may contribute substantially to the perceived "otherness" of AI language. Phraseological sequences are known to be difficult for foreign language learners to master, and LLMs may face similar difficulties, particularly in Czech, an inflectional language that is poorly represented in LLM training data. The basic premise is that LLMs process prompts internally in English before generating output in the target language (Zhao et al. 2024; Zhong et al. 2024; Schut, Gal, and Farquhar 2025). The study seeks to clarify whether and how this affects Czech output: Do English lexical bundles carry over into Czech AI-produced texts? If so, what discourse functions do they fulfil, and how are they distributed across registers? To answer these questions, I compare frequent n-grams across four corpora: two human-produced corpora, Koditex (Czech) and BE21 (English), and two AI-produced corpora, AI Koditex and AI Brown.
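For readers unfamiliar with lexical-bundle extraction, the n-gram comparison can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the regex tokenizer, the 4-gram length, the frequency cut-off (per million words) and the range cut-off (number of texts an n-gram must occur in) are placeholder choices, and the helper names lexical_bundles and bundles_only_in are hypothetical. Matching English bundles to their Czech counterparts is not covered here.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Simple lowercase word tokenizer (assumption); \w on str patterns is
    # Unicode-aware in Python 3, so Czech diacritics are kept inside words.
    return re.findall(r"\w+", text.lower())


def lexical_bundles(texts, n=4, min_per_million=20.0, min_range=3):
    """Return {n-gram: frequency per million words} for n-grams that pass
    both a normalized-frequency cut-off and a range (dispersion) cut-off.
    The default thresholds are illustrative placeholders."""
    counts = Counter()
    ranges = defaultdict(set)  # n-gram -> indices of texts containing it
    total_tokens = 0
    for i, text in enumerate(texts):
        tokens = tokenize(text)
        total_tokens += len(tokens)
        for j in range(len(tokens) - n + 1):
            gram = tuple(tokens[j:j + n])
            counts[gram] += 1
            ranges[gram].add(i)
    if total_tokens == 0:
        return {}
    bundles = {}
    for gram, count in counts.items():
        per_million = count * 1_000_000 / total_tokens
        if per_million >= min_per_million and len(ranges[gram]) >= min_range:
            bundles[gram] = per_million
    return bundles


def bundles_only_in(target, reference):
    # Bundles that pass the thresholds in one corpus but not in the other,
    # e.g. candidate AI-specific sequences (one possible comparison only).
    return sorted(set(target) - set(reference))


if __name__ == "__main__":
    # Toy English data, purely for illustration.
    ai_texts = ["it is important to note that x",
                "it is important to note the y",
                "it is important to note z"]
    human_texts = ["we note that x", "the data show y", "results are z"]
    ai = lexical_bundles(ai_texts, n=4, min_per_million=20, min_range=3)
    human = lexical_bundles(human_texts, n=4, min_per_million=20, min_range=3)
    print(bundles_only_in(ai, human))
```

With these toy inputs, only "it is important to" and "is important to note" occur in all three AI texts, so they are the only sequences reported. On real corpora such as those named above, the thresholds would need to be set according to corpus size, and the raw lists would still require manual functional and register analysis.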