Table of Contents

FR97/2011/G1 - Formal Models in Natural Language Processing

Materials for Lectures (FRVŠ: Horáček, Burgetová, Zámečníková (2011))

Lecture | Topic | Presentation | Printable

Formal Description of Natural Languages
  01 | Aims of Linguistic Theory | PDF | PDF
  02 | Generative Power of Natural Languages | PDF | PDF
NLP Formalisms
  03 | Augmented Transition Networks | PDF | PDF
  04 | Generalized Phrase Structure Grammar | PDF | PDF
  05 | Head-driven Phrase Structure Grammar | PDF | PDF
  06 | Lexical Functional Grammar | PDF | PDF
  07 | Lexicalized Tree Adjoining Grammar | PDF | PDF
  08 | Dependency Grammars | PDF | PDF
Statistical NLP
  09 | Probabilistic Context-Free Grammar | PDF | PDF
Alternative Approach
  10 | Scattered Context Grammar | PDF | PDF
Parsing
  11 | LR Parsing | PDF | PDF

All Lectures in One Archive | ZIP | ZIP

Introduction

The study of natural languages was one of the key motivations behind the foundation of Formal Language Theory (FLT). The earliest works in FLT were often driven by the effort to understand and formally describe the syntax of natural languages. Since then, however, FLT and Natural Language Processing (NLP) have grown quite distinct.

The materials available here build on the connections between NLP and FLT. For the NLP-oriented reader, they provide an overview of the formal background of the tools used in NLP, presenting several well-known models. For readers with a primary interest in formal languages, they describe an important application area of FLT. All discussed topics are illustrated with practical examples.

Topics

Formal Description of Natural Languages

References

  1. James Allen: Natural Language Understanding. The Benjamin/Cummings Publishing Company Inc., 1995
  2. Robert N. Moll, Michael A. Arbib, Assaf J. Kfoury: An Introduction to Formal Language Theory. Springer-Verlag, 1988

NLP Formalisms

This section presents several formalisms used in NLP practice. Rather than attempting in-depth coverage of every aspect, the materials give an overview of the available alternatives. The core principles of each model are described and illustrated with examples.

References

  1. James Allen: Natural Language Understanding. The Benjamin/Cummings Publishing Company Inc., 1995
  2. Doug Arnold: Lexical Functional Grammar. Dept of Language and Linguistics, University of Essex, 2011
  3. Andrew Carnie: Syntax: A Generative Introduction. Blackwell Publishing, Oxford, 2002
  4. Gerald Gazdar, Ewan H. Klein, Geoffrey K. Pullum, Ivan A. Sag: Generalized Phrase Structure Grammar. Harvard University Press, 1985
  5. Robert N. Moll, Michael A. Arbib, Assaf J. Kfoury: An Introduction to Formal Language Theory. Springer-Verlag, 1988
  6. Carl Pollard, Ivan A. Sag: Head-Driven Phrase Structure Grammar. University of Chicago Press, 1994
  7. Yves Schabes, Aravind K. Joshi: Tree-Adjoining Grammars and Lexicalized Grammars. 1991
  8. Dependency grammar formalisms
    1. Ralph Debusmann, Denys Duchier, Geert-Jan Kruijff: Extensible Dependency Grammar: A New Methodology. In Proceedings of the Workshop on Recent Advances in Dependency Grammar, p. 78-85, 2004
    2. Richard Hudson: Word Grammar. Blackwell, 1984
    3. Igor Mel'čuk: Dependency Syntax: Theory and Practice. State University of New York Press, 1988
    4. Petr Sgall, Eva Hajičová, Jarmila Panevová, Jacob Mey: The meaning of the sentence in its semantic and pragmatic aspects. Springer, 1986
    5. Lucien Tesnière: Éléments de syntaxe structurale. Editions Klincksieck, 1959

Statistical NLP

With the availability of large text corpora (both annotated and unannotated) for various languages, statistical approaches have become prominent in modern NLP. Instead of the traditional approach of describing all relevant aspects of a language manually (hand-crafted rules, etc.), we can apply machine-learning methods to extract generalized information from a corpus.

Most of the models discussed in the previous section can be (and have been) adapted for use in statistical NLP.
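As a concrete illustration of such corpus-based estimation, the following minimal Python sketch computes relative-frequency probabilities for a Probabilistic Context-Free Grammar (lecture 09) from a toy treebank. The treebank, rule names, and representation are invented for illustration; real treebanks are, of course, far larger and richer.

```python
from collections import Counter

# A toy "treebank": each tree is recorded as the list of (lhs, rhs) rule
# applications used to derive an annotated sentence. All rules here are
# hypothetical examples.
treebank = [
    [("S", ("NP", "VP")), ("NP", ("she",)), ("VP", ("V", "NP")),
     ("V", ("saw",)), ("NP", ("stars",))],
    [("S", ("NP", "VP")), ("NP", ("stars",)), ("VP", ("V",)),
     ("V", ("shine",))],
]

def estimate_pcfg(trees):
    """Maximum-likelihood (relative-frequency) estimate:
    P(A -> alpha) = count(A -> alpha) / count(A)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for tree in trees:
        for lhs, rhs in tree:
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

probs = estimate_pcfg(treebank)
print(probs[("S", ("NP", "VP"))])  # 1.0 -- S always rewrites to NP VP here
print(probs[("NP", ("stars",))])   # 2/3 -- 2 of the 3 observed NP expansions
```

Note that the probabilities of all rules sharing a left-hand side sum to one, which is exactly the normalization a PCFG requires.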

References

  1. Christopher D. Manning, Hinrich Schütze: Foundations of Statistical Natural Language Processing. MIT Press, 1999

Alternative Approach

One of the major trends in FLT is so-called regulated rewriting. The basic idea is to increase the generative power of a given formal model (usually CFG) by adding some mathematically simple mechanism that controls (regulates) the sentence generation. A well-known example is the Matrix Grammar [1]. Another way to increase generative capacity is to modify the form of the rules, as in the Scattered Context Grammar [2].

Such models could potentially be useful in NLP as well, because they make it possible to capture even non-context-free features of natural languages in a straightforward way.
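To make the idea concrete, here is a minimal Python sketch (the representation is an invented illustration) of applying a scattered context rule: an n-tuple of nonterminals is rewritten simultaneously, at scattered positions, by an n-tuple of strings. The classic grammar below generates the non-context-free language {a^n b^n c^n | n >= 1}.

```python
def apply_scattered(form, lhs, rhs):
    """Apply a scattered context rule (A1,...,An) -> (x1,...,xn):
    find occurrences of A1,...,An in this order (anywhere in the
    sentential form) and rewrite them all in a single step."""
    positions, start = [], 0
    for sym in lhs:
        try:
            i = form.index(sym, start)
        except ValueError:
            return None  # rule not applicable to this sentential form
        positions.append(i)
        start = i + 1
    new = list(form)
    for i, repl in zip(positions, rhs):
        new[i] = repl
    # flatten multi-symbol replacements into a flat symbol list
    out = []
    for part in new:
        out.extend(part if isinstance(part, list) else [part])
    return out

# Rules for {a^n b^n c^n}: grow all three blocks in lockstep, then stop.
grow = (("A", "B", "C"), (["a", "A"], ["b", "B"], ["c", "C"]))
stop = (("A", "B", "C"), (["a"], ["b"], ["c"]))

form = ["A", "B", "C"]
form = apply_scattered(form, *grow)  # a A b B c C
form = apply_scattered(form, *grow)  # a a A b b B c c C
form = apply_scattered(form, *stop)
print("".join(form))  # aaabbbccc
```

Because the three nonterminals are rewritten in one synchronized step, the counts of a's, b's, and c's stay equal, something no CFG can enforce.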

References

  1. Jürgen Dassow, Gheorghe Păun: Regulated Rewriting in Formal Language Theory. Springer, 1989
  2. Sheila A. Greibach, John E. Hopcroft: Scattered Context Grammars. In Journal of Computer and System Sciences, 3:233-247, 1969
  3. Alexander Meduna and Jiří Techet: Scattered Context Grammars and their Applications. WITpress, 2010

Parsing

In practice, having a formal description of the syntax of a language is usually not enough. We need not only to generate well-formed sentences of the language, but often also to analyze a given sentence: for instance, to decide whether it belongs to the language, or to obtain further information about its constituents and structure.

Syntactic analysis (parsing) is an important part of FLT. Practical applications (in NLP, but also in other areas) require efficient parsing methods and algorithms. This is the main reason why many of the formal models used in NLP are in some way based on CFG or context-free rules: CFG is the most powerful grammar class in the Chomsky hierarchy for which syntactic analysis can still be performed efficiently enough.

There are several well-known parsing algorithms for CFG, such as CKY or chart parsing. Some algorithms are applicable only to a certain subset of CFGs (for example, LL or LR parsing), while others are general.
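As an illustration of a general algorithm, the following is a minimal Python sketch of CKY recognition for a CFG in Chomsky normal form. The grammar encoding and the toy rules are invented for illustration.

```python
from itertools import product

def cky(words, grammar):
    """CKY recognition for a CFG in Chomsky normal form.
    `grammar` maps right-hand sides (tuples) to sets of left-hand sides:
    unary keys ('word',) encode lexical rules, binary keys (B, C) the rest."""
    n = len(words)
    # table[i][j]: set of nonterminals deriving words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(grammar.get((w,), set()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # try every split point
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= grammar.get((b, c), set())
    return "S" in table[0][n]

# Toy CNF grammar (hypothetical):
#   S -> NP VP, VP -> V NP, NP -> "she" | "stars", V -> "saw"
grammar = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("she",): {"NP"},
    ("stars",): {"NP"},
    ("saw",): {"V"},
}

print(cky(["she", "saw", "stars"], grammar))  # True
print(cky(["saw", "she"], grammar))           # False
```

The triple loop over spans, start positions, and split points gives the familiar O(n^3) running time (times a grammar-dependent factor), which is what makes general context-free parsing practical.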

References

  1. Alexander Meduna: Elements of Compiler Design. Taylor & Francis, New York, 2008

Acknowledgement

This work was supported by the MŠMT FRVŠ grant FR97/2011/G1.