Beautiful Soup is an HTML parser written in Python, robust and well designed. I am using it for this website and it helps a lot. But it has a limitation: it only outputs XML-style empty-element tags. If you feed '<br>' to Beautiful Soup, it will output '<br />'.
That's a problem when working with HTML 4.01 Strict documents, where '<br>' is preferable. See "Empty elements in SGML, HTML, XML, and XHTML" for more details.
So I hacked BeautifulSoup.py to add an htmlDialect parameter to the 'rendering' methods:
'<br>'
>>> BeautifulSoup.BeautifulSoup("<br>").renderContents(htmlDialect=True)
'<br>'
>>> BeautifulSoup.BeautifulSoup("<br>").prettify(htmlDialect=True)
'<br>\n'
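To illustrate what such a switch changes, here is a minimal sketch on a toy element tree. It is not Beautiful Soup's code and not the patch itself. The Element class, its render() method, the html_dialect parameter and the EMPTY_ELEMENTS set are all hypothetical names chosen for illustration. The only behaviour it models is the one described above: empty elements come out as '<br />' by default and as '<br>' when the HTML dialect is requested.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Dict, List, Union

# Hypothetical subset of the elements HTML 4.01 Strict declares EMPTY.
EMPTY_ELEMENTS = frozenset(
    {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}
)


@dataclass
class Element:
    """Toy element node (an assumption, not Beautiful Soup's Tag class)."""

    name: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List[Union["Element", str]] = field(default_factory=list)

    def render(self, html_dialect: bool = False) -> str:
        # Serialize attributes, escaping quotes and markup characters.
        attrs = "".join(
            f' {key}="{escape(value, quote=True)}"' for key, value in self.attrs.items()
        )
        if self.name in EMPTY_ELEMENTS:
            # HTML 4.01 style: <br>. XML/XHTML style: <br />.
            suffix = ">" if html_dialect else " />"
            return f"<{self.name}{attrs}{suffix}"
        # Pass the flag down so nested empty elements render consistently.
        inner = "".join(
            child.render(html_dialect) if isinstance(child, Element)
            else escape(child, quote=False)
            for child in self.children
        )
        return f"<{self.name}{attrs}>{inner}</{self.name}>"


if __name__ == "__main__":
    p = Element("p", children=["one", Element("br"), "two"])
    print(p.render())                   # XML-style empty element
    print(p.render(html_dialect=True))  # HTML-style empty element
```

Threading one boolean through every rendering call keeps the default output unchanged for existing callers, which is the same reason a keyword parameter with a False default is a natural fit for a patch like this one.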
I sent the patch to the author of Beautiful Soup twice but did not get a reply, so I am posting it here in case it is useful to others.
The patch applies against version 3.0.6.