Empty tag in Beautiful Soup
Beautiful Soup is an HTML
parser written in Python, robust and well designed. I am using it for this
website and it helps a lot. But it has a limitation: it only outputs empty
elements in XML syntax. If you feed
'<br>' to Beautiful Soup,
it will output '<br />'.
That's a problem when working with HTML 4.01 Strict documents where
'<br>' is preferable. See Empty elements in
SGML, HTML, XML, and XHTML for more details.
So I hacked BeautifulSoup.py to add an htmlDialect parameter to the 'rendering' methods:
>>> import BeautifulSoup
>>> tag = BeautifulSoup.BeautifulSoup("<br>")
>>> tag.__str__(htmlDialect=True)
'<br>'
>>> tag.renderContents(htmlDialect=True)
'<br>'
>>> tag.prettify(htmlDialect=True)
'<br>\n'
I sent the patch twice to the author of Beautiful Soup but did not get any reply. So I am posting it here; maybe it will be useful to others.
The patch applies against version 3.0.6.
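If patching the library is not an option, a similar result can probably be approximated by post-processing the rendered string. The sketch below is not the patch itself and does not use Beautiful Soup at all: `to_html_dialect` and `HTML_EMPTY_TAGS` are hypothetical names introduced here for illustration, and the tag list is assumed to be the set of elements declared EMPTY in the HTML 4.01 Strict DTD. It relies on a regular expression, so it will also rewrite matching text inside comments or scripts, and it may misbehave on attribute values that contain '<' or '>'.

```python
import re

# Assumption: the elements declared EMPTY in the HTML 4.01 Strict DTD.
HTML_EMPTY_TAGS = ("area", "base", "br", "col", "hr",
                   "img", "input", "link", "meta", "param")

# Matches an XML-style empty element such as <br />, <br/> or
# <img src="a.png" />. Group 1 is the tag name, group 2 the attributes.
# The lookahead stops "col" from matching the start of "colgroup".
_EMPTY_RE = re.compile(
    r"<(%s)(?=[\s/])([^<>]*?)\s*/>" % "|".join(HTML_EMPTY_TAGS),
    re.IGNORECASE,
)


def to_html_dialect(markup):
    """Rewrite XML-style empty elements (<br />) in HTML style (<br>).

    Hypothetical helper: only the tags listed in HTML_EMPTY_TAGS are
    rewritten; any other self-closed tag is left untouched.
    """
    return _EMPTY_RE.sub(r"<\1\2>", markup)


if __name__ == "__main__":
    print(to_html_dialect('<p>line one<br />line two</p>'))
```

The function would be applied to whatever string the unpatched library returns, for example to the result of `str(tag)`. Unlike the htmlDialect parameter, this approach knows nothing about the parse tree, which is why the patch remains the cleaner solution.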