TutsPlus.com: Parsing HTML With PHP Using DiDOM
The TutsPlus.com site has posted a tutorial showing you how to use the DiDOM library to parse HTML in PHP. DiDOM is a "simple and fast parser" with a lot of functionality for parsing, searching and modifying HTML.
Every now and then, developers need to scrape webpages to get some information from a website. For example, let’s say you are working on a personal project where you have to get geographical information about the capitals of different countries from Wikipedia. Entering this manually would take a lot of time. However, you can do it very quickly by scraping the Wikipedia page with the help of PHP. You will also be able to automatically parse the HTML to get specific information instead of going through the whole markup manually.
In this tutorial, we will learn about a fast, easy-to-use HTML parser called DiDOM. We will begin with the installation process and then learn how to extract information from different elements on a webpage using different kinds of selectors like tags, classes, etc.
The tutorial starts by helping you install the package (via Composer) and provides a simple example of using it to parse a string of HTML, a local document or a remote site. It then walks you through the search functionality built into the library, using either CSS selector strings or XPath expressions. It also includes examples of traversing the DOM, updating element attributes, and adding, removing and replacing elements.
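To give a feel for the workflow described above, here is a minimal sketch rather than code taken from the tutorial. It assumes DiDOM has been installed with Composer (package `imangazaliev/didom`) and that `vendor/autoload.php` is available; the HTML snippet, the `city` class and the variable names are made up for illustration.

```php
<?php
// Assumption: DiDOM was installed with `composer require imangazaliev/didom`.
require 'vendor/autoload.php';

use DiDom\Document;

// Hypothetical markup standing in for a scraped page.
$html = '<ul id="capitals">'
      . '<li><a class="city" href="/wiki/Paris">Paris</a></li>'
      . '<li><a class="city" href="/wiki/Rome">Rome</a></li>'
      . '</ul>';

// Build a document from an HTML string.
$document = new Document($html);

// Search with a CSS selector; find() returns an array of elements.
$names = [];
foreach ($document->find('a.city') as $link) {
    $names[] = $link->text();
}

// Grab the first match and update one of its attributes.
$first = $document->first('a.city');
$first->setAttribute('rel', 'nofollow');
$rel = $first->getAttribute('rel');

echo implode(', ', $names), PHP_EOL;
```

To load a local file or a remote page instead of a string, the `Document` constructor should accept a path or URL with `true` as its second argument, for example `new Document('page.html', true)`.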