Saturday, 23rd August

Biomedical/Clinical NLP

Tutors: Ozlem Uzuner and Meliha Yetisgen

This tutorial will present an overview of biomedical and clinical NLP data, tools, and methods, with the intent of giving researchers a jump-start into these domains. The focus will be on the demand for NLP in the biomedical and clinical domains, the potential for impact, and the required NLP tasks. The data and methods that support these NLP tasks will be introduced, and a vision for future work will be shared.

Tutorial Slide Deck

Using Neural Networks for Modelling and Representing Natural Languages

Tutor: Tomas Mikolov

Artificial neural networks are currently very popular in tasks such as language modelling, speech recognition, image classification and machine translation. The goal of this tutorial is to introduce the main concepts such as backpropagation and stochastic gradient descent, with focus on applications to language related problems. This includes language modelling with feedforward and recurrent neural nets, distributed representations of words and phrases, and applications to speech recognition and machine translation.

Tutorial Slide Deck

Sunday, 24th August

Multilingual Word Sense Disambiguation and Entity Linking

Tutors: Roberto Navigli and Andrea Moro

Nowadays the textual information available online is provided in an increasingly wide range of languages. This language explosion clearly forces researchers to focus on the challenging problem of being able to analyze and understand text written in any language. At the core of this problem lies the lexical ambiguity of language, an issue which is addressed by two key tasks in computational lexical semantics: multilingual Word Sense Disambiguation (WSD) and Entity Linking (EL). In this tutorial, we present the two tasks of multilingual WSD and EL, by surveying the challenges involved and the most effective solutions, both in isolation by illustrating the state of the art in the two fields, and when the tasks are addressed in a unified, multilingual way.

Tutorial Slide Deck

Automated Grammatical Error Correction for Language Learners

Tutors: Joel Tetreault and Claudia Leacock

A fast-growing area in Natural Language Processing is the use of automated tools for identifying and correcting grammatical errors made by language learners. This growth has been fuelled in part by the needs of the large number of people in the world who are learning and using a second or foreign language. For example, it is estimated that there are currently over one billion people who are non-native speakers of English. These numbers drive the demand for accurate tools that can help learners to write and speak proficiently in another language. Such demand also makes this an exciting time for those in the NLP community who are developing automated methods for grammatical error correction (GEC). Our motivation for proposing a tutorial at COLING is to make others more aware of this field and its particular set of challenges. Although applications of GEC are often geared toward the classroom, its methods are more generally applicable to a wide variety of NLP problems, especially where systems must contend with noisy data, such as MT evaluation and correction, analysis of microblogs and other user-generated content, and disfluency detection in speech. For these reasons, we believe that the tutorial will potentially benefit a broad range of conference attendees.

Tutorial Slide Deck

Selection Bias, Label Bias, and Bias in Ground Truth

Tutors: Anders Søgaard, Barbara Plank and Dirk Hovy

We argue that most important problems in NLP relate to three kinds of bias: (a) selection bias, exemplified by domain and cross-language adaptation problems, as well as robust learning problems, (b) label bias, e.g., involved in learning from crowdsourced annotations, or simply inconsistent annotations, and (c) bias in ground truth or theoretical bias, which is less frequently addressed, but motivates recent work on cross-framework evaluation, as well as much work on downstream evaluation of syntactic analysis. This tutorial will present a wide range of novel algorithms, as well as empirical results, with Twitter POS and named entity tagging as our running example.

Tutorial Slide Deck

Dependency Parsing: Past, Present, and Future

Tutors: Wenliang Chen, Zhenghua Li and Min Zhang

To date, research on dependency parsing has mainly focused on data-driven supervised approaches, and results show that supervised models can achieve reasonable performance on in-domain texts for a variety of languages when manually labelled data is provided. However, relatively little effort has been devoted to parsing out-of-domain texts and resource-poor languages, and few successful techniques have been proposed for such scenarios. This tutorial will cover all these research topics in dependency parsing and is composed of four major parts. In particular, we will survey current progress in semi-supervised dependency parsing, web data parsing, and multilingual text parsing, and show some directions for future work.

Tutorial Slide Deck