Czech National Corpus brings you

Interpretable benchmark of stylistic variation in LLM-generated texts

The benchmark shows how the style of texts produced by various LLMs shifts relative to human-written texts. Details are in this paper by Jiří Milička, Anna Marklová, and Václav Cvrček. The benchmark is based on Biber's classic MDA stylistic vectors (Biber 1988) and their adaptation for Czech by Cvrček et al. (2018). The LLM-generated corpora (AI Brown and AI Koditex) are searchable via the Kontext interface (with Universal Dependencies tagging). Click a heatmap cell to open a scatterplot (every dot is one text chunk).
