Table of Contents

FR97/2011/G1 - Formal Models in Natural Language Processing

Materials for Lectures (FRVŠ: Horáček, Burgetová, Zámečníková (2011))

Lecture | Topic | Presentation | Printable

Formal Description of Natural Languages
  01 | Aims of Linguistic Theory | PDF | PDF
  02 | Generative Power of Natural Languages | PDF | PDF
NLP Formalisms
  03 | Augmented Transition Networks | PDF | PDF
  04 | Generalized Phrase Structure Grammar | PDF | PDF
  05 | Head-driven Phrase Structure Grammar | PDF | PDF
  06 | Lexical Functional Grammar | PDF | PDF
  07 | Lexicalized Tree Adjoining Grammar | PDF | PDF
  08 | Dependency Grammars | PDF | PDF
Statistical NLP
  09 | Probabilistic Context-Free Grammar | PDF | PDF
Alternative Approach
  10 | Scattered Context Grammar | PDF | PDF
Parsing
  11 | LR Parsing | PDF | PDF

All Lectures in One Archive | ZIP | ZIP

Introduction

The study of natural languages was one of the key motivations behind the foundation of Formal Language Theory (FLT). The earliest works in FLT were often driven by the effort to understand and formally describe the syntax of natural languages. Since then, however, FLT and Natural Language Processing (NLP) have grown quite distinct.

The materials available here build on the connections between NLP and FLT. For the NLP-oriented reader, they provide an overview of the formal background of the tools used in NLP, presenting several well-known models. For readers with a primary interest in formal languages, they describe an important application area of FLT. All discussed topics are illustrated with practical examples.

Topics

Formal Description of Natural Languages

References

  1. James Allen: Natural Language Understanding. The Benjamin/Cummings Publishing Company Inc., 1995
  2. Robert N. Moll, Michael A. Arbib, Assaf J. Kfoury: An Introduction to Formal Language Theory. Springer-Verlag, 1988

NLP Formalisms

This section presents several formalisms used in NLP practice. Rather than attempting in-depth coverage of every aspect, the materials give an overview of the available alternatives. The core principles of each model are described and illustrated with examples.

References

  1. James Allen: Natural Language Understanding. The Benjamin/Cummings Publishing Company Inc., 1995
  2. Doug Arnold: Lexical Functional Grammar. Dept of Language and Linguistics, University of Essex, 2011
  3. Andrew Carnie: Syntax: A Generative Introduction. Blackwell Publishing, Oxford, 2002
  4. Gerald Gazdar, Ewan H. Klein, Geoffrey K. Pullum, Ivan A. Sag: Generalized Phrase Structure Grammar. Harvard University Press, 1985
  5. Robert N. Moll, Michael A. Arbib, Assaf J. Kfoury: An Introduction to Formal Language Theory. Springer-Verlag, 1988
  6. Carl Pollard, Ivan A. Sag: Head-Driven Phrase Structure Grammar. University of Chicago Press, 1994
  7. Yves Schabes, Aravind K. Joshi: Tree-Adjoining Grammars and Lexicalized Grammars. 1991
  8. Dependency grammar formalisms
    1. Ralph Debusmann, Denys Duchier, Geert-Jan Kruijff: Extensible Dependency Grammar: A New Methodology. In Proceedings of the Workshop on Recent Advances in Dependency Grammar, p. 78-85, 2004
    2. Richard Hudson: Word Grammar. Blackwell, 1984
    3. Igor Mel'čuk: Dependency Syntax: Theory and Practice. State University of New York Press, 1988
    4. Petr Sgall, Eva Hajičová, Jarmila Panevová, Jacob Mey: The meaning of the sentence in its semantic and pragmatic aspects. Springer, 1986
    5. Lucien Tesnière: Éléments de syntaxe structurale. Editions Klincksieck, 1959

Statistical NLP

With the availability of large text corpora (both annotated and unannotated) for various languages, statistical approaches have become prominent in modern NLP. Instead of the traditional approach of describing all relevant aspects of a language manually (hand-crafted rules, etc.), we can apply machine-learning methods to extract generalized information from a corpus.

Most of the models discussed in the previous section can be (and have been) adapted for use in statistical NLP.
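As a concrete illustration of such corpus-based estimation, the following minimal Python sketch computes relative-frequency probabilities for a Probabilistic Context-Free Grammar (lecture 09) from a toy treebank. The treebank, rule names, and representation are invented for illustration; real treebanks are, of course, far larger and richer.

```python
from collections import Counter

# A toy "treebank": each tree is recorded as the list of (lhs, rhs) rule
# applications used to derive an annotated sentence. All rules here are
# hypothetical examples.
treebank = [
    [("S", ("NP", "VP")), ("NP", ("she",)), ("VP", ("V", "NP")),
     ("V", ("saw",)), ("NP", ("stars",))],
    [("S", ("NP", "VP")), ("NP", ("stars",)), ("VP", ("V",)),
     ("V", ("shine",))],
]

def estimate_pcfg(trees):
    """Maximum-likelihood (relative-frequency) estimate:
    P(A -> alpha) = count(A -> alpha) / count(A)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for tree in trees:
        for lhs, rhs in tree:
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

probs = estimate_pcfg(treebank)
print(probs[("S", ("NP", "VP"))])  # 1.0 -- S always rewrites to NP VP here
print(probs[("NP", ("stars",))])   # 2/3 -- 2 of the 3 observed NP expansions
```

Note that the probabilities of all rules sharing a left-hand side sum to one, which is exactly the normalization a PCFG requires.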

References

  1. Christopher D. Manning, Hinrich Schütze: Foundations of Statistical Natural Language Processing. MIT Press, 1999

Alternative Approach

One of the major trends in FLT is so-called regulated rewriting. The basic idea is to increase the generative power of a given formal model (usually CFG) by adding some mathematically simple mechanism that controls (regulates) the sentence generation. A well-known example is the Matrix Grammar [1]. Another way to increase generative capacity is to modify the form of the rules, as in the Scattered Context Grammar [2].

Such models could potentially be useful in NLP as well, because they make it possible to capture even non-context-free features of natural languages in a straightforward way.
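To make the idea concrete, here is a minimal Python sketch (the representation is an invented illustration) of applying a scattered context rule: an n-tuple of nonterminals is rewritten simultaneously, at scattered positions, by an n-tuple of strings. The classic grammar below generates the non-context-free language {a^n b^n c^n | n >= 1}.

```python
def apply_scattered(form, lhs, rhs):
    """Apply a scattered context rule (A1,...,An) -> (x1,...,xn):
    find occurrences of A1,...,An in this order (anywhere in the
    sentential form) and rewrite them all in a single step."""
    positions, start = [], 0
    for sym in lhs:
        try:
            i = form.index(sym, start)
        except ValueError:
            return None  # rule not applicable to this sentential form
        positions.append(i)
        start = i + 1
    new = list(form)
    for i, repl in zip(positions, rhs):
        new[i] = repl
    # flatten multi-symbol replacements into a flat symbol list
    out = []
    for part in new:
        out.extend(part if isinstance(part, list) else [part])
    return out

# Rules for {a^n b^n c^n}: grow all three blocks in lockstep, then stop.
grow = (("A", "B", "C"), (["a", "A"], ["b", "B"], ["c", "C"]))
stop = (("A", "B", "C"), (["a"], ["b"], ["c"]))

form = ["A", "B", "C"]
form = apply_scattered(form, *grow)  # a A b B c C
form = apply_scattered(form, *grow)  # a a A b b B c c C
form = apply_scattered(form, *stop)
print("".join(form))  # aaabbbccc
```

Because the three nonterminals are rewritten in one synchronized step, the counts of a's, b's, and c's stay equal, something no CFG can enforce.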

References

  1. Jürgen Dassow, Gheorghe Păun: Regulated Rewriting in Formal Language Theory. Springer, 1989
  2. Sheila A. Greibach, John E. Hopcroft: Scattered Context Grammars. In Journal of Computer and System Sciences, 3:233-247, 1969
  3. Alexander Meduna and Jiří Techet: Scattered Context Grammars and their Applications. WITpress, 2010

Parsing

In practice, having a formal description of the syntax of a language is usually not enough. We need not only to generate well-formed sentences of the language, but often also to analyze a given sentence: for instance, to decide whether it belongs to the language, or to obtain further information about its constituents and structure.

Syntactic analysis (parsing) is an important part of FLT. Practical applications (in NLP, but also in other areas) require efficient parsing methods and algorithms. This is the main reason why many of the formal models used in NLP are in some way based on CFG or context-free rules: CFG is the most powerful grammar class in the Chomsky hierarchy for which syntactic analysis can still be performed efficiently enough.

There are several well-known parsing algorithms for CFG, such as CKY or chart parsing. Some algorithms are applicable only to a certain subset of CFGs (for example, LL or LR parsing), while others are general.
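As an illustration of a general algorithm, the following is a minimal Python sketch of CKY recognition for a CFG in Chomsky normal form. The grammar encoding and the toy rules are invented for illustration.

```python
from itertools import product

def cky(words, grammar):
    """CKY recognition for a CFG in Chomsky normal form.
    `grammar` maps right-hand sides (tuples) to sets of left-hand sides:
    unary keys ('word',) encode lexical rules, binary keys (B, C) the rest."""
    n = len(words)
    # table[i][j]: set of nonterminals deriving words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(grammar.get((w,), set()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # try every split point
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= grammar.get((b, c), set())
    return "S" in table[0][n]

# Toy CNF grammar (hypothetical):
#   S -> NP VP, VP -> V NP, NP -> "she" | "stars", V -> "saw"
grammar = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("she",): {"NP"},
    ("stars",): {"NP"},
    ("saw",): {"V"},
}

print(cky(["she", "saw", "stars"], grammar))  # True
print(cky(["saw", "she"], grammar))           # False
```

The triple loop over spans, start positions, and split points gives the familiar O(n^3) running time (times a grammar-dependent factor), which is what makes general context-free parsing practical.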

References

  1. Alexander Meduna: Elements of Compiler Design. Taylor & Francis, New York, 2008

Acknowledgement

This work was supported by the MŠMT FRVŠ grant FR97/2011/G1.