Jay Taylor's notes

Html2text by Alir3z4

Original source (alir3z4.github.io)
Tags: python html markdown text alir3z4.github.io
Clipped on: 2017-05-28


Convert HTML to Markdown-formatted text.

html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

Usage: html2text [(filename|url) [encoding]]

Option              Description
--version           Show program's version number and exit
-h, --help          Show this help message and exit
--ignore-links      Don't include any formatting for links
--escape-all        Escape all special characters. Output is less readable, but avoids corner-case formatting issues.
--reference-links   Use reference links instead of inline links to create Markdown
--mark-code         Mark preformatted and code blocks with [code]...[/code]

For a complete list of options, see the docs.

Or you can use it from within Python:

>>> import html2text
>>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
**Zed's** dead baby, _Zed's_ dead.

Or with some configuration options:

>>> import html2text
>>> h = html2text.HTML2Text()
>>> # Ignore converting links from HTML
>>> h.ignore_links = True
>>> print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
Hello, world!

>>> # Don't ignore links anymore; I like links
>>> h.ignore_links = False
>>> print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
Hello, [world](http://earth.google.com/)!

Originally written by Aaron Swartz. This code is distributed under the GPLv3.

How to install

html2text is available on PyPI: https://pypi.python.org/pypi/html2text

$ pip install html2text

How to run unit tests

PYTHONPATH=$PYTHONPATH:. coverage run --source=html2text setup.py test -v

To see the coverage results:

coverage combine
coverage html

Then open the ./htmlcov/index.html file in your browser.


Documentation lives here

Html2text is maintained by Alir3z4.