Jay Taylor's notes

back to listing index

Alir3z4/html2text

[web search]
Original source (github.com)
Tags: python html markdown text github.com
Clipped on: 2017-05-28

Skip to content
  • You have no unread notifications
  • Create new…
  • Image (Asset 1/5) alt= Version Wheel? Image (Asset 2/5) alt=

    html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

    Usage:  html2text [(filename|url) [encoding]] 

    Option Description
     --version  Show program's version number and exit
     -h ,  --help  Show this help message and exit
     --ignore-links  Don't include any formatting for links
     --escape-all  Escape all special characters. Output is less readable, but avoids corner case formatting issues.
     --reference-links  Use reference links instead of links to create markdown
     --mark-code  Mark preformatted and code blocks with [code]...[/code]

    For a complete list of options see the docs

    Or you can use it from within  Python :

    >>> import html2text
    >>>
    >>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
    **Zed's** dead baby, _Zed's_ dead.
    
    

    Or with some configuration options:

    >>> import html2text
    >>>
    >>> h = html2text.HTML2Text()
    >>> # Ignore converting links from HTML
    >>> h.ignore_links = True
    >>> print h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!")
    Hello, world!
    
    >>> print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
    
    Hello, world!
    
    >>> # Don't Ignore links anymore, I like links
    >>> h.ignore_links = False
    >>> print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
    Hello, [world](http://earth.google.com/)!
    
    

    Originally written by Aaron Swartz. This code is distributed under the GPLv3.

    How to install

     html2text  is available on pypi https://pypi.python.org/pypi/html2text

    $ pip install html2text
    

    How to run unit tests

    PYTHONPATH=$PYTHONPATH:. coverage run --source=html2text setup.py test -v
    

    To see the coverage results:

    coverage combine
    coverage html
    

    then open the  ./htmlcov/index.html  file in your browser.

    Documentation

    Documentation lives here