Python Web Scraping Cookbook: Over 90 proven recipes to get you scraping with Python, micro services, Docker and AWS


Untangle your web scraping complexities and access web data with ease using Python scripts

Key Features

  • Hands-on recipes to advance your web scraping skills to expert level
  • Address complex and challenging web scraping tasks using Python
  • Understand the web page structure and collect meaningful data from the website with ease

Book Description

Python Web Scraping Cookbook is a solution-focused book that will teach you techniques to develop high-performance scrapers and deal with crawlers, sitemaps, forms automation, Ajax-based sites, and caches. You'll explore a number of real-world scenarios in which every part of the development/product life cycle is fully covered. You will not only develop the skills to design and develop reliable data flows, but also deploy your codebase to AWS. If you are involved in software engineering, product development, or data mining (or are interested in building data-driven products), you will find this book useful, as each recipe has a clear purpose and objective.

From extracting data from websites to writing a sophisticated web crawler, the book's independent recipes will be a godsend on the job. This book covers Python libraries such as requests and BeautifulSoup. You will learn about crawling, web spidering, working with AJAX websites, and paginated items. You will also learn to tackle problems such as 403 errors, working with proxies, scraping images, and using LXML.

By the end of this book, you will be able to scrape websites more efficiently and deploy and operate your scraper in the cloud.

What you will learn

  • Use a wide variety of tools to scrape any website and data, including BeautifulSoup, Scrapy, Selenium, and many more
  • Master expression languages such as XPath, CSS, and regular expressions to extract web data
  • Deal with scraping traps such as hidden form fields, throttling, pagination, and different status codes
  • Build robust scraping pipelines with SQS and RabbitMQ
  • Scrape assets such as images and media, and know what to do when a scraper fails to run
  • Explore ETL techniques for building a customized crawler and parser, and for converting structured and unstructured data from websites
  • Deploy and run your scraper as a service in AWS Elastic Container Service
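
As an illustration of one of the scraping traps listed above (hidden form fields), here is a minimal sketch that is not taken from the book. It uses only Python's standard-library html.parser; the class name HiddenFieldCollector, the helper hidden_fields, and the sample form are hypothetical names chosen for this example. A scraper that submits a form usually has to send these hidden values back along with the visible ones.

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Only <input> tags can carry hidden form values.
        if tag != "input":
            return
        attr_map = dict(attrs)
        is_hidden = (attr_map.get("type") or "").lower() == "hidden"
        if is_hidden and attr_map.get("name"):
            # A missing value attribute is treated as an empty string.
            self.fields[attr_map["name"]] = attr_map.get("value") or ""


def hidden_fields(html):
    """Return a dict of the hidden input fields found in an HTML string."""
    collector = HiddenFieldCollector()
    collector.feed(html)
    return collector.fields


# Hypothetical sample form, used only to exercise the sketch.
SAMPLE_FORM = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="username">
  <input type="hidden" name="next" value="/dashboard">
</form>
"""

if __name__ == "__main__":
    print(hidden_fields(SAMPLE_FORM))
```

In a real scraper, the returned dictionary would typically be merged with the visible form values before the form is posted. Libraries such as BeautifulSoup, which the book covers, can do the same job with less code.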

Who This Book Is For

Python Web Scraping Cookbook is ideal for Python programmers, web administrators, security professionals, and anyone who wants to perform web analytics. Familiarity with Python and a basic understanding of web scraping will help you take full advantage of this book.

Table of Contents

  1. Getting Started with Scraping
  2. Data Acquisition and Extraction
  3. Processing Data
  4. Working with Images, Audio and Other Assets
  5. Scraping - Code of Conduct
  6. Scraping Challenges and Solutions
  7. Text Wrangling and Analysis
  8. Searching, Mining and Visualizing Data
  9. Working with an API and Providing a Data API
  10. Creating Scraper Microservices with Docker
  11. A Complete Real-World Example

Pro DevOps with Google Cloud Platform: With Docker, Jenkins, and Kubernetes
Use DevOps principles with Google Cloud Platform (GCP) to develop applications and services. This book builds chapter by chapter to a complete real-life scenario, explaining how to build, monitor, and maintain a complete application using DevOps in practice.

Starting with core DevOps concepts, continuous...
Crossroads: History of Science, History of Art: Essays by David Speiser, vol. II
Perusing the titles of the essays in this book makes it clear that its author is a man with many interests and a great curiosity. David Speiser is a lover and connoisseur of art. His view of the world is coloured by his familiarity with mathematics, that is, with “organized imagination” (his definition, see p. 49). As a...
OS X Mountain Lion Server For Dummies

Create a Mac network in your home or office

There's no doubt about it: Macs, iPhones, and iPads have invaded the workplace. But, you don't need an IT department to administer a Mac network in your home or business. This friendly guide explains everything you need to know to set one up yourself using OS X Mountain...

Creating Web Pages All in One

Did you know that all the tools you need to create GREAT Web pages are free - they are built right into Windows XP or are free online!

In Sams Teach Yourself to Create Web Pages All in One you will learn the basics on creating a variety of different types of...

VSAT Networks
Now fully revised and updated, VSAT Networks continues to cover all of the essential issues involved with the installation and operation of networks of small earth stations called ‘Very Small Aperture Terminal’. VSATs are typically one to two meters in antenna reflector diameter and communicate with one another, or with a...
Biomarkers of Environmentally Associated Disease: Technologies, Concepts, and Perspectives

The end of the 20th century brought with it a revolution in molecular biology that culminated in advances such as the completion of the human genome. This has brought optimism to the fields of toxicology and environmental health, and the anticipation that molecular biomarkers might soon come of age and have a major impact on human and...

©2019 LearnIT (support@pdfchm.net)