Pro Perl Parsing

Pro Perl Parsing, 9781590595046 (1590595041), Apress, 2005

Over the past decade, we have all witnessed an explosion of information,
in terms of both the amount of knowledge that exists in the world and the
availability of that information, with the proliferation of the World Wide Web being a
prime example. Although these advances in knowledge have undoubtedly been
beneficial, they have also created new challenges in information retrieval, in information
processing, and in the extraction of relevant information. This is in part due to a diversity
of file formats as well as the proliferation of loosely structured formats, such as HTML.
The solution to such information retrieval and extraction problems has been to develop
specialized parsers to conduct these tasks. This book will address these tasks, starting
with the most basic principles of data parsing.

The book will begin with an introduction to parsing basics using Perl’s regular expression
engine. Once these regex basics are mastered, the book will introduce the concept of
generative grammars and the Chomsky hierarchy of grammars. Such grammars form the
base set of rules that parsers will use to try to successfully parse content of interest, such as
text or XML files. Once grammars are covered, the book proceeds to explain the two basic
types of parsers—those that use a top-down approach and those that use a bottom-up
approach to parsing. Coverage of these parser types is designed to facilitate the understanding
of more powerful parsing modules such as Yapp (bottom-up) and RecDescent
(top-down).
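
To make the top-down approach concrete, here is a minimal sketch of a hand-written recursive descent parser. It is an illustration only: the book works in Perl with modules such as RecDescent, whereas this sketch uses Python and a hypothetical toy arithmetic grammar that is not taken from the book. The function names (tokenize, parse_expr, parse_term, parse_factor, evaluate) are likewise assumptions made for the example.

```python
import re

# Hypothetical toy grammar (an assumption for illustration, not from the book):
#   expr   -> term   (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | '(' expr ')'
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number and operator tokens using a regex."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_expr(tokens):
    """expr -> term (('+' | '-') term)*"""
    value = parse_term(tokens)
    while tokens and tokens[0] in ("+", "-"):
        op = tokens.pop(0)
        rhs = parse_term(tokens)
        value = value + rhs if op == "+" else value - rhs
    return value


def parse_term(tokens):
    """term -> factor (('*' | '/') factor)*"""
    value = parse_factor(tokens)
    while tokens and tokens[0] in ("*", "/"):
        op = tokens.pop(0)
        rhs = parse_factor(tokens)
        value = value * rhs if op == "*" else value / rhs
    return value


def parse_factor(tokens):
    """factor -> NUMBER | '(' expr ')'"""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        value = parse_expr(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return value
    if token.isdigit():
        return int(token)
    raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    """Parse and evaluate a whole expression, rejecting trailing tokens."""
    tokens = tokenize(text)
    value = parse_expr(tokens)
    if tokens:
        raise SyntaxError(f"unexpected token {tokens[0]!r}")
    return value
```

Each grammar rule maps to one function, and parsing starts from the top-level rule (expr) and works down toward the individual tokens, which is what "top-down" refers to. A bottom-up parser would instead start from the tokens and combine them into progressively larger grammar units.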

Once these powerful and flexible generalized parsing modules are covered, the book
begins to delve into more specialized parsing modules, such as those designed
to work with HTML. In Chapter 6, the book also provides an overview of the LWP modules,
which facilitate access to documents posted on the Web. The parsing examples within
this chapter will use the LWP modules to parse data that is directly accessed from the Web.
Next, the book examines the parsing of XML, a markup language that is
growing in popularity. The XML coverage also discusses SOAP and XML-RPC,
which are two of the most popular methods for accessing remote XML-formatted data. The
book then covers several smaller parsing modules, such as an RSS parser and a date/time
parser, as well as some useful parsing tasks, such as the parsing of configuration files. Lastly,
the book introduces data mining. Data mining provides a means for individuals to work
with extracted data (as well as other types of data) so that the data can be used to learn
more about a given area or to make predictions about future directions that area of interest
may take. This content aims to demonstrate that although parsing is often a critical data
extraction and retrieval task, it may just be a component of a larger data mining system.

Amazing Books

Expert Systems in Chemistry Research

CRC Press, 2007

Expert systems allow scientists to access, manage, and apply data and specialized knowledge from various disciplines to their own research. Expert Systems in Chemistry Research explains the general scientific basis and computational principles behind expert systems and demonstrates how they can improve the efficiency of scientific workflows and...

Beginning RSS and Atom Programming

Wrox Press, 2005

RSS and Atom are specifications that give users the power to subscribe to information they want to receive and give content developers tools to provide continuous subscriptions to willing recipients in a spam-free setting. RSS and Atom are the technical power behind the growing millions of blogs on the Web. Blogs change the Web from a set of static...

Imaging of Soft Tissue Tumors

Lippincott Williams & Wilkins, 2006

Based on a vast number of cases seen at the Armed Forces Institute of Pathology and the Mayo Clinic, this volume is a comprehensive reference on the radiologic evaluation of soft tissue tumors. The book covers the entire spectrum of soft tissue pathologies, with over 1,400 images showing common and atypical appearances. The authors...

Marketing and Selling Professional Services in Architecture and Construction

John Wiley & Sons, 2009

This practical book on selling and marketing will help architects, engineers, project managers, facilities managers, surveyors, and contractors ‘sell’ themselves to prospective clients.

As clients become more sophisticated at both local and international level, and as competition in the construction industry increases,...

Computable Models of the Law: Languages, Dialogues, Games, Ontologies (Lecture Notes in Computer Science / Lecture Notes in Artificial Intelligence)

Springer, 2008

Information technology has now pervaded the legal sector, and the very modern concepts of e-law and e-justice show that automation processes are ubiquitous. European policies on transparency and information society, in particular, require the use of technology and its steady improvement.

Some of the revised papers presented in this book...

Aspects of Semidefinite Programming: Interior Point Algorithms and Selected Applications (Applied Optimization)

Springer, 2002

This monograph has grown from my PhD thesis Interior point Methods for Semidefinite Programming [39] which was published in December 1997. Since that time, Semidefinite Programming (SDP) has remained a popular research topic and the associated body of literature has grown considerably. As SDP has proved such a useful tool in many applications, like...