Empty tag in Beautiful Soup
Beautiful Soup is an HTML
parser written in Python, robust and well designed. I am using it for this
website and it helps a lot. But it has a limitation: it only outputs XML-style
empty elements. If you feed '<br>' to Beautiful Soup,
it will output '<br />'.
That's a problem when working with HTML 4.01 Strict documents where
'<br>' is preferable. See Empty elements in
SGML, HTML, XML, and XHTML for more details.
So I hacked BeautifulSoup.py to add an htmlDialect parameter to the 'rendering' methods:
>>> import BeautifulSoup
>>> tag = BeautifulSoup.BeautifulSoup("<br>")
>>> tag.__str__(htmlDialect=True)
'<br>'
>>> tag.renderContents(htmlDialect=True)
'<br>'
>>> tag.prettify(htmlDialect=True)
'<br>\n'
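To illustrate the idea behind such a flag, here is a minimal standalone sketch. It is not the patch itself and does not use Beautiful Soup's code: the function name render_start_tag, the html_dialect argument, and the HTML_EMPTY_ELEMENTS set are hypothetical names chosen for this example. The set lists the elements declared EMPTY in the HTML 4.01 Strict DTD.

```python
from html import escape

# Hypothetical sketch, not the actual patch and not Beautiful Soup code.
# Elements declared EMPTY in the HTML 4.01 Strict DTD.
HTML_EMPTY_ELEMENTS = frozenset([
    "area", "base", "br", "col", "hr",
    "img", "input", "link", "meta", "param",
])


def render_start_tag(name, attrs=(), html_dialect=False):
    """Serialize one start tag; attrs is a sequence of (key, value) pairs."""
    attr_text = "".join(
        ' %s="%s"' % (key, escape(value, quote=True)) for key, value in attrs
    )
    if name.lower() in HTML_EMPTY_ELEMENTS:
        # XML style closes the tag in place; HTML 4.01 style omits the slash.
        closing = ">" if html_dialect else " />"
        return "<%s%s%s" % (name, attr_text, closing)
    # Non-empty elements are rendered the same way in both dialects.
    return "<%s%s>" % (name, attr_text)


print(render_start_tag("br"))
print(render_start_tag("br", html_dialect=True))
print(render_start_tag("img", [("src", "a.png")], html_dialect=True))
```

The flag defaults to False so that existing callers would keep the XML-style output, which seems to match the way the htmlDialect parameter is passed explicitly in the session above.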
I sent the patch twice to the author of Beautiful Soup but did not get any
reply. So I am posting it here; maybe it will be useful to others.
The patch applies against
version 3.0.6.