Graphs, Computation, and Language

Graphs, Computation, and Language (GCL) is a series of tutorials by Dmitry Ustalov dedicated to human-computer interaction systems, graph theory, and computational linguistics. The tutorials discuss their real-world applications and elaborately go through the corresponding algorithms step-by-step.

Currently the GCL series includes five lectures:

GCL has been accepted at the 33rd European Summer School in Logic, Language and Information (ESSLLI 2022)!

The slides are licensed under a CC BY-NC-SA 4.0 license and are available via the DOI link below.

Graphs, Computation, and Language (DOI: 10.5281/zenodo.4698904)

Nearest Neighbor Graph

Nearest Neighbor Graph
(Source: Wikimedia)

Crowdsourcing for Language Resources and Evaluation


Crowdsourcing is an efficient approach for knowledge acquisition and data annotation that enables gathering and evaluating large-scale linguistic datasets. In this lecture we will focus on practical use of human-assisted computation for language resource construction and evaluation. We will analyze three established approaches for crowdsourcing in NLP. First, we will consider the case study of Wikipedia and Wiktionary that facilitate the community effort using automatic quality control via content assessment and edit patrolling. Second, we will dive deep in microtask-based crowdsourcing using reCAPTCHA and Mechanical Turk as the examples. We will discuss task design and decomposition issues and then carefully describe standard approaches for inter-annotator agreement evaluation (Krippendorff’s α) and answer aggregation (Majority Vote and Dawid-Skene). Third, we will study the case of various games with a purpose, including ESP Game, Infection Game for BabelNet, and OpenCorpora gamification. Finally, we will provide recommendations for ensuring the high quality of the crowdsourced annotation and show useful datasets for further studies.

Tutorial at AINL 2019 in Tartu, Estonia

Tutorial at AINL 2019 in Tartu, Estonia

Graph Clustering for Natural Language Processing


Graph clustering allows to extract useful knowledge by exploiting the implicit structure of the data. In this lecture we will introduce the problem of non-overlapping and overlapping graph clustering. We will demonstrate and elaborately describe several efficient clustering algorithms for both these problems widely used in NLP, including Chinese Whispers and Markov Clustering for non-overlapping clustering (aka partitioning), and MaxMax and Watset for overlapping (aka fuzzy) clustering. We will show their strengths and weaknesses as well as their implementations and successful applications in word sense and frame induction. Then, we will focus on evaluation of unsupervised NLP methods using pairwise precision, recall, F-score, (inverse) purity and its modifications. Finally, we will discuss the randomization-based statistical tests of these measures, algorithm choice, and useful language resources for further studies.

Tutorial at AINL 2018 in Saint Petersburg, Russia

Tutorial at AINL 2018 in Saint Petersburg, Russia