One thing is clear about the role of AI in higher education: it is unavoidable. However, many other aspects remain uncertain. This paper aims to provide illustrative examples, offer several suggestions, and, most importantly, foster a discussion about how and in what contexts AI should be taught and used in the humanities and social sciences.
The official Recommendations regarding the use of generative artificial intelligence for university educators at Charles University advise educators to “Monitor developments in AI tools and spend some of your time exploring their capabilities. Check out what they can do, how they can benefit your work, and how reliable they are or aren’t... Actively use these tools where appropriate. Encourage students to use AI tools while respecting their varying levels of knowledge and skills.”
These recommendations are, perhaps necessarily, somewhat vague—particularly regarding questions such as: To what extent should teachers and students study the theory behind large language models to truly understand their capabilities and limitations? How should AI be studied and taught? In which areas is the use of AI most beneficial, and where might it pose the greatest challenges?
If you’re already tired of hearing about AI, brace yourself: it is not going away. That is why it is crucial to study the language of AI, from the early days of AI Dungeon (circa 2019), which gave the broader public its first taste of large language models, to the present (and beyond). In this talk, I present research on AI language using corpus-linguistic and psycholinguistic methods. First, I will give a live demonstration (or, if need be, a screenshot walkthrough) of the new publicly accessible AI corpora AI-Brown and AI-Koditex. I will then present experiments on AI-generated texts (including poetry), analyses of stylistic variability, and a study of AI-generated images. The goal of this talk is to offer a concise overview of recent work on large language models conducted at the Czech National Corpus. Let’s study AI before it studies us.
Comparing quantitative morphological features of languages: a study on annotated multi-parallel texts
Research on morphological diversity in typology and contrastive linguistics has traditionally focused on discrete, predominantly inflectional features. Corpus-based approaches, however, can provide complementary insights into the quantitative and dynamic aspects of morphological systems. While many languages have both morphological resources and large parallel corpora, sizeable corpora with detailed morphological annotation, including morphological segmentation and morpheme classification, remain very scarce. As part of a broader effort to address this gap, we present our current work on the detailed automatic annotation of part of the multi-parallel corpus Europarl, comprising over 10 million tokens in each of six languages: Czech, English, French, German, Hungarian, and Slovak. The presentation reports preliminary results on quantitative morphological features extracted from these data and their potential to inform further cross-linguistic research. In particular, we discuss observed cross-linguistic regularities in morpheme frequency distributions, relationships among morpheme classes, and their possible connection to word-formation strategies.
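As a rough illustration only (not the project's actual annotation pipeline), the sketch below shows how morpheme frequency distributions and per-class counts could be tallied from a segmented corpus; the file name and the tab-separated "morph:class" line format are assumptions made for the example.

```python
# Minimal sketch: morpheme frequency and class counts from a hypothetical
# annotated file in which each line holds one token as tab-separated
# "morph:class" segments, e.g. "un:PREFIX<TAB>break:ROOT<TAB>able:SUFFIX".
from collections import Counter

def morpheme_stats(path):
    morph_freq = Counter()   # frequency of each (morpheme, class) pair
    class_freq = Counter()   # frequency of each morpheme class
    with open(path, encoding="utf-8") as f:
        for line in f:
            for seg in line.strip().split("\t"):
                if ":" not in seg:
                    continue
                morph, cls = seg.rsplit(":", 1)
                morph_freq[(morph, cls)] += 1
                class_freq[cls] += 1
    return morph_freq, class_freq

if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    morph_freq, class_freq = morpheme_stats("europarl_cs.segmented.tsv")
    # Rank-frequency profile, e.g. for comparing distributions across languages.
    for rank, ((morph, cls), freq) in enumerate(morph_freq.most_common(10), 1):
        print(rank, morph, cls, freq)
    print(class_freq)
```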
Language in Aphasia with Naive Discriminative Learning
In this talk, I will give a brief overview of my research on language in aphasia. I will start with the relationship between aphasiology and aphasic data on the one hand and linguistics on the other. This will be followed by three case studies. In the first case study, I will show how entrenchment and chunking modulate fluency, using prepositional phrases as an example. This study shows how a usage-based approach to language can complement approaches that focus more on the role of structural complexity in explaining linguistic behaviour in aphasia. The second study shows how a linguistically informed analysis can provide a more systematic and principled description of aphasic data. Specifically, I will present a description of verb and argument structure production in aphasia from the perspective of Construction Grammar and Frame Semantics. The last part of the talk will be dedicated to a new project in which I will focus on Czech inflectional morphology in speakers with aphasia and the possibilities of applying computational models of learning to these data.
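For readers unfamiliar with naive discriminative learning, the sketch below illustrates the Rescorla-Wagner update that the approach builds on, mapping form cues to grammatical outcomes. The cues and outcomes are made up for the example; this is a toy illustration, not the speaker's model or data.

```python
# Minimal sketch of the Rescorla-Wagner update used in naive discriminative
# learning: cues (e.g. letter bigrams) are incrementally associated with
# outcomes (e.g. case/number functions) in proportion to prediction error.
from collections import defaultdict

def train_ndl(events, alpha=0.1, beta=0.1, lam=1.0, epochs=10):
    # weights[cue][outcome] = association strength
    weights = defaultdict(lambda: defaultdict(float))
    outcomes_seen = set()
    for _, outs in events:
        outcomes_seen.update(outs)
    for _ in range(epochs):
        for cues, outs in events:
            for outcome in outcomes_seen:
                activation = sum(weights[c][outcome] for c in cues)
                target = lam if outcome in outs else 0.0
                delta = alpha * beta * (target - activation)
                for c in cues:
                    weights[c][outcome] += delta
    return weights

# Toy, hypothetical events: letter-bigram cues of Czech noun forms paired
# with case/number outcomes ("žena", "ženu", "ženy").
events = [
    ({"#ž", "že", "en", "na", "a#"}, {"NOM.SG"}),
    ({"#ž", "že", "en", "nu", "u#"}, {"ACC.SG"}),
    ({"#ž", "že", "en", "ny", "y#"}, {"NOM.PL"}),
]
w = train_ndl(events)
print(round(w["a#"]["NOM.SG"], 3), round(w["a#"]["ACC.SG"], 3))
```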
Semantic networks for children with typical acquisition and specific language impairment
Tomáš Savčenko (OAJD)
I am preparing a study on semantic networks based on word vectors trained on the Clinical English Gillam corpus (Gillam & Pearson 2004), which contains narratives of children with typical language development and children with specific language impairment (SLI). The aim is to analyse the structure of these semantic networks at different stages of acquisition. The hypothesis is that typically developing children will show a 'small-world structure', characterized by prominent hub words with many connections and by local clusters of closely related words, whereas children with SLI will show a network with less dominant hubs and more evenly linked nodes. A small-world network in theory allows effective search strategies both within local clusters and across distant domains via the hub nodes (Watts & Strogatz 1998; Steyvers & Tenenbaum 2005), which is why I assume that this structure should be disrupted in SLI. A special focus will lie on whether this network measure can distinguish typically developing children from children with SLI who have a similar mean length of utterance; in that case, it would outperform a traditional psycholinguistic measure used to diagnose SLI (Rice et al. 2010).
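As a rough sketch of how such a small-world diagnostic could be computed (this is not the study's actual pipeline; the cosine threshold and the word-vector input are assumptions), the code below builds a semantic network from pairwise vector similarities and compares its clustering and path length to a size-matched random graph, the usual indication of small-world structure (Watts & Strogatz 1998).

```python
# Minimal sketch: semantic network from word vectors plus small-world profile.
import numpy as np
import networkx as nx

def semantic_network(words, vectors, threshold=0.6):
    # vectors: dict word -> 1-D numpy array (e.g. vectors trained on the corpus)
    G = nx.Graph()
    G.add_nodes_from(words)
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            v1, v2 = vectors[w1], vectors[w2]
            cos = float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))
            if cos >= threshold:
                G.add_edge(w1, w2)
    return G

def small_world_profile(G, seed=0):
    # Work on the largest connected component so path lengths are defined.
    G = G.subgraph(max(nx.connected_components(G), key=len)).copy()
    n, m = G.number_of_nodes(), G.number_of_edges()
    R = nx.gnm_random_graph(n, m, seed=seed)
    R = R.subgraph(max(nx.connected_components(R), key=len)).copy()
    C, C_rand = nx.average_clustering(G), nx.average_clustering(R)
    L, L_rand = (nx.average_shortest_path_length(G),
                 nx.average_shortest_path_length(R))
    # Small-world networks show C >> C_rand while L stays close to L_rand.
    return {"C": C, "C_rand": C_rand, "L": L, "L_rand": L_rand}
```

Under these assumptions, the same profile could be computed separately for the typically developing and SLI subcorpora and compared across groups matched on mean length of utterance.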