Comparison of HTML parsers

HTML parsers are software components that automatically parse Hypertext Markup Language (HTML). They serve two main purposes:

  • HTML traversal: provide an interface that lets programmers access and modify the parsed HTML markup. Canonical example: DOM parsers.
  • HTML cleaning: fix invalid HTML and improve the layout and indentation of the resulting markup. Canonical example: HTML Tidy.
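
As a rough illustration of the traversal use case, the sketch below walks a document and collects link targets. It assumes Python's standard-library `html.parser` module, which is not one of the parsers compared in this article and is event-driven rather than a DOM parser; the names `LinkCollector` and `extract_links` are hypothetical and chosen only for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> start tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, where value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all links in the given HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

Calling `extract_links` on a page's source returns the link targets in document order. A DOM-style parser would instead build a tree that can be queried and modified after parsing.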
| Parser | License | Implementation language(s) | Latest date* | HTML parsing[1] | HTML5-compliant parsing | Clean HTML** | Update HTML*** |
|---|---|---|---|---|---|---|---|
| HTML Tidy | W3C license | ANSI C | 2021-07-17[2] | Yes[3] | Yes | Yes[3] | Yes |
| HtmlUnit | Apache License 2.0 | Java | 2023-10-31[4] | Yes | ? | No | No |
| Beautiful Soup | MIT License | Python | 2023-04-07[5] | Yes | Yes | ? | No |
| jsoup | MIT License | Java | 2025-08-25[6] | Yes | Yes | Yes | Yes |
* Date of the latest release with significant changes.
** Sanitizes (generates standards-compliant markup, reduces spam, etc.) and cleans (strips surplus presentational tags, removes XSS code, etc.) HTML code.
*** Updates HTML 4.x to XHTML or HTML5, converting deprecated tags (e.g. CENTER) to valid ones (e.g. DIV with style="text-align:center;").
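
The "Update HTML" conversion described in the last note can be sketched as follows. This is a minimal illustration, again assuming Python's standard-library `html.parser` rather than any parser from the table; `CenterUpgrader` and `upgrade_center` are hypothetical names. It rewrites only CENTER elements, drops any attributes they carry, and re-emits all other markup largely unchanged.

```python
from html.parser import HTMLParser


class CenterUpgrader(HTMLParser):
    """Re-emits HTML, replacing <center> with a styled <div>."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "center":
            # Simplification: attributes on the original tag are discarded.
            self.out.append('<div style="text-align:center;">')
        else:
            # Reuse the original source text of every other start tag.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</div>" if tag == "center" else f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def upgrade_center(html):
    """Return html with every CENTER element converted to a styled DIV."""
    upgrader = CenterUpgrader()
    upgrader.feed(html)
    upgrader.close()
    return "".join(upgrader.out)
```

A dedicated tool such as HTML Tidy handles many more deprecated constructs and also repairs malformed input, which this sketch does not attempt.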

References

  1. "HTML Standard". html.spec.whatwg.org. Archived from the original on January 16, 2013.
  2. "Release 5.8.0 · htacg/tidy-html5". GitHub.
  3. "HTML Tidy". www.html-tidy.org.
  4. "Release HtmlUnit 3.7.0 · HtmlUnit/htmlunit". GitHub.
  5. "Index of /software/BeautifulSoup/bs4/download/4.12". www.crummy.com.
  6. "jsoup release 1.21.2 (2025-Aug-25)". jsoup.org.