Automata and Dictionaries

Reference

Maurel, Denis, and Guenthner, Franz (2006). Automata and Dictionaries. Texts in Computing, vol. 6, King’s College Publications, London, 240 pages.

ISBN 1-904-987-32-X

Abstract

Automata and Dictionaries is aimed at students and specialists in natural language processing and related disciplines where efficient text analysis plays a role. Large linguistic resources, in particular lexica, are now recognized as a fundamental prerequisite for all natural language processing tasks. Specialists in this domain cannot afford to be ignorant of state-of-the-art lexicon-management algorithms. This monograph, which is also intended to be used as an advanced textbook in computational linguistics, fills a gap among natural language processing monographs and complements other publications in this area.

This book is also a source of examples, exercises and problems for software engineering in general. The algorithms presented are excellent examples of non-trivial problems of graph construction, graph handling and graph traversal. Although they have been published in scientific journals, they have not previously been available in a form easily accessible to teachers and students. These algorithms will also be of interest for the training of software engineers.

Chapter 1 of Automata and Dictionaries provides the application-oriented motivation for solving the problems studied in the rest of the book. It introduces and exemplifies several key notions of lexicon-based natural language processing in a way accessible to any computer science student.

Chapter 2 surveys the main solutions to the problem, using only a very small toy lexicon as an example. Chapter 3 defines the underlying mathematical notions, immediately illustrating the theory with practical examples, which makes this part quite readable.

Chapters 4 and 5 are dedicated to the two central notions of lexicon construction: the algorithms of determinization and minimization. Both algorithms are presented in their standard form, along with their variants and some special cases that occur frequently in practice. The operation of the algorithms is described step by step in examples, introducing the beginner to the world of epsilon-transitions, state heights and reverse automata.

Chapter 6 goes a step further into complexity. It is based on algorithms published from 1998 onward, which are presented here with the same clarity as the more classical algorithms of the preceding chapters. This remarkable achievement owes much to the rigorous structuring of the chapter. These algorithms have variants for transducers, which are presented in Chapter 7 with the same pedagogical skill.

The last chapter studies the time and space complexity of the algorithms and explains several tricks that are useful for speeding up their operation.