The focus of this book is effective databases for text and document management, including new and enhanced techniques, methods, theories and practices. The research contained in these chapters is of particular significance to researchers and practitioners alike because of the rapid pace at which the Internet and related technologies are changing our world. A vast amount of data is already stored in local databases and Web pages (HTML, DHTML, XML and other markup language documents). To take advantage of this wealth of knowledge, we need to develop effective ways of extracting, retrieving and managing the data. In addition, advances in both database and Web technologies require innovative ways of dealing with data in terms of syntactic and semantic representation, integrity, consistency, performance and security.
One of the objectives of this book is to disseminate research that builds on existing Web and database technologies to improve information extraction and retrieval capabilities. Another important objective is to compile international efforts in database systems and in text and document management, in order to share the innovations and research advances being made at a global level.
The book is organized into four sections, each of which contains chapters that focus on similar research in the database and Web technology areas. In the section entitled Information Extraction and Retrieval in Web-Based Systems, Web and database theories, methods and technologies are shown to be efficient at extracting and retrieving information from Web-based documents. In the first chapter, “System of Information Retrieval in XML Documents,” Saliha Smadhi introduces a process for retrieving relevant information from XML documents. Smadhi’s approach supports keyword-based searching and ranks the retrieved information by its similarity to the user’s query. In “Information Extraction from Free-Text Business Documents,” Witold Abramowicz and Jakub Piskorski investigate the applicability of information extraction techniques to free-text documents typically retrieved from Web-based systems. They also demonstrate the indexing potential of lightweight linguistic text processing techniques for processing large amounts of textual data.