Source code from http://www.jamesmolloy.co.uk/tutorial_html/index.html with an improved build system and some simplifications. Behaviour is very close to the tutorial so ...
htmlparser2 is the fastest HTML parser, and takes some shortcuts to get there. If you need strict HTML spec compliance, have a look at parse5.