Empty tag in Beautiful Soup

June 6 2008

Beautiful Soup is an HTML parser written in Python, robust and well designed. I am using it for this website and it helps a lot. But it has a limitation: it only outputs empty elements in XML syntax. If you feed '<br>' to Beautiful Soup, it will output '<br />'.

That's a problem when working with HTML 4.01 Strict documents where '<br>' is preferable. See Empty elements in SGML, HTML, XML, and XHTML for more details.

So I hacked BeautifulSoup.py to add an htmlDialect parameter to the 'rendering' methods:

    >>> import BeautifulSoup
    >>> tag = BeautifulSoup.BeautifulSoup("<br>")
    >>> tag.__str__(htmlDialect=True)
    '<br>'
    >>> tag.renderContents(htmlDialect=True)
    '<br>'
    >>> tag.prettify(htmlDialect=True)
    '<br>\n'
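
If patching the library is not an option, a rough alternative is to post-process the rendered string. The sketch below is not the patch itself and does not touch Beautiful Soup at all: it is a standalone regular-expression pass, and the names `EMPTY_ELEMENTS` and `to_html_empty_tags` are made up for this illustration. It assumes the markup has no '<' or '>' inside attribute values, and it will also rewrite matching text inside comments or scripts.

```python
import re

# Elements declared EMPTY in the HTML 4.01 Strict DTD.
EMPTY_ELEMENTS = ("area", "base", "br", "col", "hr",
                  "img", "input", "link", "meta", "param")

# Matches an XML-style empty tag such as '<br />' or '<img src="x" />'.
# Group 1 is the tag name, group 2 the attributes (possibly empty).
_XML_EMPTY_TAG = re.compile(
    r"<(%s)\b([^<>]*?)\s*/>" % "|".join(EMPTY_ELEMENTS),
    re.IGNORECASE,
)


def to_html_empty_tags(markup):
    """Rewrite XML-style empty tags ('<br />') as HTML-style ones ('<br>')."""
    return _XML_EMPTY_TAG.sub(r"<\1\2>", markup)
```

Calling `to_html_empty_tags('<p>a<br />b</p>')` returns `'<p>a<br>b</p>'`. Tags that are not in the list, such as '<colgroup />', are left untouched.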
    


I sent the patch to the author of Beautiful Soup twice but did not get a
reply. So I am posting it here; maybe it will be useful to others.


The patch is meant to be applied against version 3.0.6.