Taming Text: How to Find, Organize, and Manipulate It

Summary

Taming Text, winner of the 2013 Jolt Awards for Productivity, is a hands-on, example-driven guide to working with unstructured text in the context of real-world applications. This book explores how to automatically organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. The book guides you through examples illustrating each of these topics, as well as the foundations upon which they are built.

About this Book

There is so much text in our lives, we are practically drowning in it. Fortunately, there are innovative tools and techniques for managing unstructured information that can throw the smart developer a much-needed lifeline. You'll find them in this book.

Taming Text is a practical, example-driven guide to working with text in real applications. This book introduces you to useful techniques like full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. You'll explore real use cases as you systematically absorb the foundations upon which they are built.

Written in a clear and concise style, this book avoids jargon, explaining the subject in terms you can understand without a background in statistics or natural language processing. Examples are in Java, but the concepts can be applied in any language.

Written for Java developers, the book requires no prior knowledge of GWT.

Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.

Winner of 2013 Jolt Awards: The Best Books—one of five notable books every serious programmer should read.

What's Inside

  • When to use text-taming techniques
  • Important open-source libraries like Solr and Mahout
  • How to build text-processing applications

About the Authors

Grant Ingersoll is an engineer, speaker, and trainer, a Lucene committer, and a cofounder of the Mahout machine-learning project. Thomas Morton is the primary developer of OpenNLP and Maximum Entropy. Drew Farris is a technology consultant, software developer, and contributor to Mahout, Lucene, and Solr.

"Takes the mystery out of very complex processes."—From the Foreword by Liz Liddy, Dean, iSchool, Syracuse University

Table of Contents

  1. Getting started taming text
  2. Foundations of taming text
  3. Searching
  4. Fuzzy string matching
  5. Identifying people, places, and things
  6. Clustering text
  7. Classification, categorization, and tagging
  8. Building an example question answering system
  9. Untamed text: exploring the next frontier

Writing for Science and Engineering, Second Edition: Papers, Presentations and Reports (Elsevier Insights)

Learning how to write clearly and concisely is an integral part of furthering your research career; however, doing so is not always easy. In this second edition, fully updated and revised, Dr. Silyn-Roberts explains in plain English the steps to writing abstracts, theses, journal papers, funding bids, literature reviews, and more. The book...

Finding Source Code on the Web for Remix and Reuse

In recent years, searching for source code on the web has become increasingly common among professional software developers and is emerging as an area of academic research. This volume surveys past research and presents the state of the art in the area of "code retrieval on the web." This work is concerned with the algorithms, ...

Probability Theory: A Comprehensive Course (Universitext)

This second edition of the popular textbook contains a comprehensive course in modern probability theory. Overall, probabilistic concepts play an increasingly important role in mathematics, physics, biology, financial engineering and computer science. They help us in understanding magnetism, amorphous media, genetic diversity and the perils...


Software Maintenance Success Recipes

Dispelling much of the folklore surrounding software maintenance, Software Maintenance Success Recipes identifies actionable formulas for success based on in-depth analysis of more than 200 real-world maintenance projects. It details the set of factors that are usually present when effective software maintenance teams do...

The Dip: A Little Book That Teaches You When to Quit (and When to Stick)

A New York Times, USA Today, and Wall Street Journal bestseller 

In this iconic bestseller, popular business blogger and bestselling author Seth Godin proves that winners are really just the best quitters. Godin shows that winners quit fast, quit often, and quit without guilt—until they commit...

Starting Statistics: A Short, Clear Guide

Statistics: A Simple Guide for Students is an accessible, humorous and easy introduction to statistics for social science students.

In this refreshing book, experienced author and academic Neil Burdess shows that statistics is not...
