Taming Text: How to Find, Organize, and Manipulate It

Summary

Taming Text, winner of the 2013 Jolt Awards for Productivity, is a hands-on, example-driven guide to working with unstructured text in the context of real-world applications. This book explores how to automatically organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. The book guides you through examples illustrating each of these topics, as well as the foundations upon which they are built.

About this Book

There is so much text in our lives, we are practically drowning in it. Fortunately, there are innovative tools and techniques for managing unstructured information that can throw the smart developer a much-needed lifeline. You'll find them in this book.

Taming Text is a practical, example-driven guide to working with text in real applications. This book introduces you to useful techniques like full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. You'll explore real use cases as you systematically absorb the foundations upon which they are built. Written in a clear and concise style, this book avoids jargon, explaining the subject in terms you can understand without a background in statistics or natural language processing. Examples are in Java, but the concepts can be applied in any language.

Written for Java developers, the book requires no prior knowledge of GWT.

Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.

Winner of 2013 Jolt Awards: The Best Books—one of five notable books every serious programmer should read.

What's Inside

  • When to use text-taming techniques
  • Important open-source libraries like Solr and Mahout
  • How to build text-processing applications

About the Authors

Grant Ingersoll is an engineer, speaker, and trainer, a Lucene committer, and a cofounder of the Mahout machine-learning project. Thomas Morton is the primary developer of OpenNLP and Maximum Entropy. Drew Farris is a technology consultant, software developer, and contributor to Mahout, Lucene, and Solr.

"Takes the mystery out of very complex processes."—From the Foreword by Liz Liddy, Dean, iSchool, Syracuse University

Table of Contents

  1. Getting started taming text
  2. Foundations of taming text
  3. Searching
  4. Fuzzy string matching
  5. Identifying people, places, and things
  6. Clustering text
  7. Classification, categorization, and tagging
  8. Building an example question answering system
  9. Untamed text: exploring the next frontier

Data Quality: The Accuracy Dimension (The Morgan Kaufmann Series in Data Management Systems)
An informative resource for any data management staff, IT management staff, and CIOs of companies with data assets.

Data Quality: The Accuracy Dimension is about assessing the quality of corporate data and improving its accuracy using the data profiling method. Corporate data is increasingly important as companies continue to find new
...
The Driver: My Dangerous Pursuit of Speed and Truth in the Outlaw Racing World

On his deathbed, Alex Roy's father dropped tantalizing hints about the notorious Cannonball Run of the 1970s, the utterly illegal high-speed nonstop race from New York to L.A. that was nothing at all like the one portrayed in the Burt Reynolds movie.

Inspired by his father's dying words, and against the advice of his loyal,...

Object Oriented Programming in Eiffel
Provides a clear introduction to the Eiffel programming language. Covers the language, logical assertions, and design of object-oriented systems, making it ideal for a new programmer or those unfamiliar with object-oriented programming. Paper. DLC: Object-oriented prog.

The book is an introductory text on Eiffel for the new
...

C++ 2013 for C# Developers

C++/CLI was originally envisioned as a high-level assembler for the .NET runtime, much like C is often considered a high-level assembler for native code generation. That original vision even included the ability to directly mix IL with C++ code, mostly eliminating the need for the IL assembler ilasm.

As the design of C++/CLI
...

Electricity and Magnetism: New Formulation by Introduction of Superconductivity (Undergraduate Lecture Notes in Physics)

The author introduces the concept that superconductivity can establish a perfect formalism of electricity and magnetism. The correspondence of electric materials that exhibit perfect electrostatic shielding (E=0) in the static condition and superconductors that show perfect diamagnetism (B=0) is given to help readers understand the...

Foundations of Software Testing: ISTQB Certification
“….the dream team for this topic. If I could have my choice for any 4 authors worldwide on this topic, I’d go with these.” Ross Collard, Attglobal.net

“…there is no-one better placed to ensure that the book is perfectly aligned with the ISTQB” Stuart Reid, founding chair of the ISTQB
...