Search for dissertations about: "language corpora"

Showing results 1-5 of 59 Swedish dissertations containing the words "language corpora".

  1. Why the pond is not outside the frog? Grounding in contextual representations by neural language models

    Author: Mehdi Ghanimifard; Göteborgs universitet
    Keywords: NATURAL SCIENCES; Computational linguistics; Language grounding; Spatial language; Distributional semantics; Computer vision; Language modelling; Vision and language; Neural language model; Grounded language model

    Abstract: In this thesis, to build a multi-modal system for language generation and understanding, we study grounded neural language models. Literature in psychology informs us that spatial cognition involves different aspects of knowledge that include visual perception and human interaction with the world.

  2. Recycling Translations: Extraction of Lexical Data from Parallel Corpora and their Application in Natural Language Processing

    Author: Jörg Tiedemann; Anna Sågvall Hein; Joakim Nivre; Uppsala universitet
    Keywords: NATURAL SCIENCES; Computational linguistics; word alignment; parallel corpora; translation corpora; computational lexicography; machine translation

    Abstract: The focus of this thesis is on re-using translations in natural language processing. It involves the collection of documents and their translations in an appropriate format, the automatic extraction of translation data, and the application of the extracted data to different tasks in natural language processing.

  3. Splitting rocks: Learning word sense representations from corpora and lexica

    Author: Luis Nieto Piña; Göteborgs universitet
    Keywords: NATURAL SCIENCES; language technology; natural language processing; distributional models; semantic representations; distributed representations; word senses; embeddings; word sense disambiguation; linguistic resources; corpus; lexicon; machine learning; neural networks

    Abstract: The representation of written language semantics is a central problem of language technology and a crucial component of many natural language processing applications, from part-of-speech tagging to text summarization. These representations of linguistic units, such as words or sentences, allow computer applications that work with language to process and manipulate the meaning of text.

  4. Morphosyntactic Corpora and Tools for Persian

    Author: Mojgan Seraji; Joakim Nivre; Carina Jahani; Jan Hajic; Uppsala universitet
    Keywords: NATURAL SCIENCES; Persian; language technology; corpus; treebank; preprocessing; segmentation; part-of-speech tagging; dependency parsing; Computational Linguistics

    Abstract: This thesis presents open source resources in the form of annotated corpora and modules for automatic morphosyntactic processing and analysis of Persian texts. More specifically, the resources consist of an improved part-of-speech tagged corpus and a dependency treebank, as well as tools for text normalization, sentence segmentation, tokenization, part-of-speech tagging, and dependency parsing for Persian.

  5. Studies in Corpora and Idioms: Getting the cat out of the bag

    Author: David Minugh; Nils-Lennart Johannesson; Maria Kuteeva; Karin Aijmer; Stockholms universitet
    Keywords: HUMANITIES; Coll corpus; corpora; corpus creation; idioms; idiom variation; idiom-breaking; online newspapers; student newspapers; college newspapers; English language; English

    Abstract: “Idiomatic” expressions, usually called “idioms”, such as a dime a dozen, a busman’s holiday, or to have bats in your belfry are a curious part of any language: they usually have a fixed lexical (why a busman?) and structural composition (only dime and dozen in direct conjunction mean ‘common, ordinary’), can be semantically obscure (why bats?), yet are widely recognized in the speech community, in spite of being so rare that only large corpora can provide us with access to sufficient empirical data on their use. In this compilation thesis, four published studies focusing on idioms in corpora are presented.