This text, co-authored with Isabelle Moulinier and first published by John Benjamins in 2002, describes applications of NLP technology to the world of Internet search and publishing. A second, revised edition has been available since 2007.
Table of Contents
Preface to the Second Edition
1. Natural Language Processing
1.1. What is NLP?
1.2. NLP and Linguistics
1.3. Linguistic Tools
1.4. Plan of the Book
2. Document Retrieval
2.1. Information Retrieval
2.2. Indexing Technology
2.3. Query Processing
2.4. Evaluating Search Engines
2.5. Attempts to Enhance Search Engine Performance
2.6. The Future of Web Searching
3. Information Extraction
3.1. The Message Understanding Conferences
3.2. Regular Expressions
3.3. Finite Automata in FASTUS
3.4. Context Free Grammars
3.5. Limitations of Current Technology and Future Research
3.6. Summary of Information Extraction
4. Text Categorization
4.1. Overview of Categorization Tasks and Methods
4.2. Handcrafted Rule Based Methods
4.3. Inductive Learning for Text Classification
4.4. Nearest Neighbor Algorithms
4.5. Combining Classifiers
4.6. Evaluation of Text Categorization Systems
5. Towards Text Mining
5.1. What is Text Mining?
5.2. Reference & Coreference
5.3. Automatic Summarization
5.4. Testing of Automatic Summarization Programs
5.5. Prospects for Text Mining
Errata (1st Edn)
p.17, Fig 1.2. "NP: sleep furiously" should be "VP: sleep furiously" in both parse trees.
p.26. Fig 2.1. Column 3 of bottom posting should read "4 26", not "4 26 32".
p.100. Table 3.7. There is a missing row in the wfsst.
Errata (2nd Edn)
p.26. Fig 2.1. Column 3 of bottom posting should read "4 26", not "4 26 32". (Yep, we missed this one.)
Isabelle Moulinier is a Lead Research Scientist in the R&D lab at Thomson. She previously studied at Paris VI and Pierre et Marie Curie Universities in France, and worked for IBM in Paris. Her Ph.D. was in the area of text categorization.
Since joining Thomson in 1997, she has worked in the areas of non-English information retrieval, concept search, and vertical search engines for legal documents and business news, leading to the launch of a number of new portals around the world.
Reviews of the 1st Edn
"The book is a very good, concise reference book, filled with many theoretical principles and practical guidelines." (In Linguist List, vol 14, 226, 2003)
"I would recommend it to anyone who is interested in NLP and its applications to the challenges brought about by the arrival of the information age." (In Terminology, vol 10(1), 2004)
"Some special features of the book include solid coverage of evaluation techniques in every chapter, excellent endnotes, and references to exactly the right stuff." (In Language, 80(1), 2004)
Reviews of the 2nd Edn
"L'ensemble de l'ouvrage est d'une lecture agréable, l'anglais reste accessible, évitant tout jargon. La construction de l'ouvrage est particulaierement didactique, permettant au novice de se familiariser avec les différents domains d'applications et au professionnel ou au chercher de maîtriser et d'approfondir les technologies et les techniques mises en oeuvres. L'ouvrage est une réédition, enrichie des avancées technologiques et techniques servenues depuis la premiere edition, et a été rendu moins académique dans sa présentation pour permettre à un public professionnel de l'utiliser. Les références citées ont également été mis à jour." (In TAL, 48(3), 2007)
Copyright Peter Jackson. All rights reserved.