
Wednesday, April 30, 2014

Parsing HTML Pages using HTML::Parser


Introduction to HTML::Parser

There are times when you will need to read an HTML file and extract a field from that file. Perl has a module called HTML::Parser that simplifies this task.

This module reads an HTML file and lets you define actions to run when it encounters a start tag, the text between tags, or an end tag. To do this, you define subroutines that are executed when these events occur. The HTML::Parser documentation lists all the events that can happen during processing. Here, we will discuss only the start, text, and end events.
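
As a rough illustration of the three events described above, here is a minimal sketch that collects the text of an h1 element. The sample HTML string, the handler names (start_handler, text_handler, end_handler), and the $heading and $in_heading variables are made up for this example; only the HTML::Parser calls themselves come from the module.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical state for this example: are we inside <h1>, and what text did we see?
my $in_heading = 0;
my $heading    = '';

# start event: called for each opening tag with the lowercased tag name.
sub start_handler {
    my ($tagname) = @_;
    $in_heading = 1 if $tagname eq 'h1';
}

# text event: called for each run of text between tags.
# The text may arrive in more than one chunk, so append rather than assign.
sub text_handler {
    my ($text) = @_;
    $heading .= $text if $in_heading;
}

# end event: called for each closing tag with the lowercased tag name.
sub end_handler {
    my ($tagname) = @_;
    $in_heading = 0 if $tagname eq 'h1';
}

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h     => [ \&start_handler, 'tagname' ],
    text_h      => [ \&text_handler,  'dtext' ],   # dtext = text with entities decoded
    end_h       => [ \&end_handler,   'tagname' ],
);

# Sample input invented for this sketch; parse_file($path) would read from a file instead.
$parser->parse('<html><body><h1>Sample heading</h1><p>Hello</p></body></html>');
$parser->eof;

print "$heading\n";
```

The start and end handlers only flip a flag, and the text handler does the actual collecting. This is one common way to pull a single field out of a page, though not the only one.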