Speakers and Tutorials
Since the late '80s, statistical methods have gained attention in linguistics for their ability to model and automate several linguistic inferences, with an increasing impact on computational linguistics. Recently, the new wave of neural networks (NNs) has been the focus of interesting and promising research on several linguistic phenomena, ranging from grammatical inference to semantic interpretation, and from sentence classification to machine translation. Complex neural architectures seem able to account for syntagmatic and lexical semantic phenomena with surprisingly good generalization capabilities.
In the tutorial, after a not-so-short survey of the motivations, paradigms and mathematical properties that make neural learning methods so effective, the most recent architectures (e.g. attention-based recurrent neural networks) that are widely adopted across modern CL research will be presented. Using some tasks as use cases, the tutorial will try to shed some light on the potential and limitations of the neural paradigms, underlining the representational, that is, distinctly linguistic, design principles that make them so appealing.
Depending on the audience's background knowledge and e-mail requests, the tutorial will also cover best practices for using NNs in current programming frameworks and for applying neural learning to available linguistic datasets, so that attendees can take their first steps autonomously.
Abstract: (to be defined)
A Gentle Introduction to Universal Dependencies
Marco Passarotti (Catholic University of the Sacred Heart, Milan & University of Pavia)
when: 3 May, 14.30-16.30 place: Aula Bottigella, Palazzo San Tommaso
Abstract: Universal Dependencies (UD; http://universaldependencies.org/) is one of the most notable ongoing projects in computational linguistics. The project, run by contributors from the research community, aims at creating a collection of dependency treebanks for different languages built according to a cross-linguistically consistent annotation style meant to complement (but not to replace) the single language/treebank-specific schemes. Started in 2014 with the first set of guidelines, UD has published a new release of the treebank collection roughly every six months. Version 2 (v2), which introduces a new set of guidelines, was released in March 2017. The current version is 2.1 (November 2017). It includes 102 treebanks covering 60 languages. The talk will introduce the basic aspects of the annotation style of UD v2, focussing in particular on a number of specific syntactic constructions. The data format will be detailed and a number of tools for handling UD data will be presented. Finally, a network-based method for comparing the UD treebanks will be sketched.
Abstract: The application of research practices and methodologies from Natural Language Processing (NLP) to Humanities studies is having a great impact on the way humanities research is conducted. However, although many applications have been developed to automatically analyse document collections from the historical or the literary domain, they often fail to provide real support to scholars. There are a number of reasons for this: automated systems, typically tailored to news, may perform poorly in different domains, or the algorithms may be deemed too complex by humanities researchers. Furthermore, a system may suffer from poor usability in spite of good accuracy, or the annotation scheme behind a task may not suit scholars' needs. This talk will give an overview of such issues and present some possible solutions proposed by the Digital Humanities research community.