Campus Event Calendar

Event Entry

What and Who

Making Meaning: Predictive Processing of Narratives in Humans and Machines

Viktor Kewenig
UCL
Talk


My research sits at the intersection of AI, cognitive neuroscience, and philosophy, with a focus on language. Naturally, I am a big fan of Wittgenstein. After studying logic and the philosophy of science and mind at Cambridge, I transitioned to the cognitive neuroscience of language comprehension during my MSc at UCL. Currently, as part of the Eco-Brain Leverhulme DTP, my PhD research employs multimodal computational and ecologically valid methods to investigate alignment principles between AI and human cognition. I also collaborate with Microsoft Research Cambridge to study how generative AI affects human cognition in knowledge work and education.
AG 1, AG 2, AG 3, INET, AG 4, AG 5, D6, SWS, RG1, MMCI  
AG Audience
English

Date, Time and Location

Thursday, 30 January 2025
10:00
60 Minutes
E1 5
029
Saarbrücken

Abstract

In this talk, I examine how humans and artificial systems process narratives by integrating information from multiple modalities. First, I describe a behavioural study in which participants predicted upcoming words in short film clips while their eye movements were recorded. We compared unimodal (text-only) and multimodal (visual+text) computational models that differ in architecture, finding that models with cross-modal attention more closely matched human word predictions and gaze patterns—especially when the film clips provided meaningful visual context. Second, I discuss a neuroimaging study where participants listened to extended stories during fMRI. Using unimodal and multimodal variants of a large language model, we predicted brain activity (encoding) and decoded semantic content from recorded signals. The multimodal model substantially outperformed the unimodal version, predicting widespread brain activation and improving semantic decoding. Notably, only the multimodal model significantly benefited from including brain data, suggesting that multimodal embeddings are more biologically plausible. Taken together, these findings indicate that narrative comprehension relies on distributed, multimodal processes, and that incorporating non-linguistic cues (e.g., vision, audition) into large language models yields richer, more human-like representations of meaning.

Contact

Annika Meiser
+49 681 9303 9105
email hidden

Virtual Meeting Details

Zoom
966 8141 4048
passcode visible to logged-in users only

Annika Meiser, 01/17/2025 15:37 -- Created document.