Picture Viewing and Picture Description: Two Windows on the Mind

University dissertation in Cognitive Science

Abstract: In this thesis, I connect two disciplines, linguistics and vision research, and combine two methods, spoken language protocols and eye movement protocols, in order to cast light on the underlying cognitive processes. In a series of studies, I investigate the processual aspects of picture viewing and picture description. On the one hand, eye movements reflect human thought processes: it is easy to determine which elements attract the observer's eye and, consequently, their attention. Eye movement records thus offer one tool for accessing the mind. On the other hand, spoken segments are the linguistic expressions of a conscious focus of attention. Analysed with a specific discourse-analytic approach, spoken description provides another complex and subtle window on the mind.

Visual perception and spoken language description are conceived of with the help of a spotlight metaphor. By combining the contents of the verbal and the visual spotlight, we obtain a reinforcement effect: when using 'two windows on the mind', we gain more than twice as much information about cognition, since vision and spoken language interact with each other. For a sequential comparison of verbal and visual data, so-called multimodal score sheets were created. With the help of this new analytic format, configurations of verbal and visual clusters were extracted from the synchronised data.

Three groups of questions are central to the main study: (i) Can we identify comparable units in visual perception and in discourse production? (ii) Does the order of units in the verbal description reflect the general order in which information was acquired visually? (iii) Is the content of the units in picture viewing and picture description similar? My results show that a verbal focus does not always closely correspond to a visual fixation; a perfect temporal and semantic match between visual and verbal foci is very rare.
In the light of these findings, the hypothesis of a temporal and semantic correlation between verbal and visual data at the focus level is rejected. Instead, I suggest that the superfocus (roughly corresponding to a longer prosodic sentence) is a suitable unit of comparison, since it represents an entity that delimits separable clusters of both visual and verbal data. The results of the eye-tracking study in a monological setting are also compared to a spontaneous description in an interactive setting, where the spoken language description is accompanied by spontaneous drawing. I suggest that both the verbal and the non-verbal means of communication (spoken description, drawing, pointing gestures and gaze direction) contribute to a situatively anchored joint focusing process.

The method developed in the thesis can be applied in several areas. The scanning strategies, together with the on-line comments, can solve several puzzles in cognition and scene perception. The method illuminates mental processes and attitudes and can therefore be used as a sensitive evaluative tool for assessing a current design. The empirical results from my studies, especially the way speech and eye movements are synchronised and integrated, can contribute to the development of a new generation of multimodal interactive systems.