Can't figure out how to invoke html5tidy from Python 3

Question

For Python 3.5.

Can someone please point me to some documentation for using html5tidy with Python 3? I'm amazed that multiple searches don't return anything.

In Python 3, the documentation in html5tidy.py states:

"""
HTML5Tidy
=========

Simple wrapper around html5lib & lxml.etree to "tidy" html in the wild to
well-formed xml/html

Usage
-----

    >>> from html5tidy import tidy
    >>> tidy('some text')
    '<html><head/><body>some text</body></html>'

Dependencies
------------

* [html5lib](http://code.google.com/p/html5lib/)
* [lxml](http://lxml.de/)

Okay, so I have all the pieces:

>>> import html5lib
>>> dir(html5lib)
['HTMLParser', '__all__', '__builtins__', '__cached__', [and so on]]
>>> 
>>> import lxml
>>> dir(lxml)
['__builtins__', '__cached__', '__doc__', '__file__', [and so on]]

BUT I note that dir(tidy) returns only double-underscore results:

>>> from html5tidy import tidy
>>> dir(tidy)
['__annotations__', '__call__', '__class__', [and so on...]'__subclasshook__']

So I read a file containing HTML into the variable untidiedHTML.

>>> print(untidiedHTML)
<!DOCTYPE html>
<html id="ng-app" lang="en" ng-app="TH" style="" xmlns:ng="http://angularjs.org">
 <head ng-controller="DZHeadController">
  <meta content="text/html; charset=utf-8" http-equiv="content-type"/>
  <title ng-bind="service.title">
   What the Heck Is OAuth? - DZone Security
  </title>
  <link href="WhatIsOAuth0200_files/tranquility.css" rel="stylesheet" type="text/css"/>
 </head>
 <body class="tranquility" >
 ... and so on...

Then, per the html5tidy documentation, I try:

from html5tidy import tidy
tidiedHTML = tidy(untidiedHTML)

This produces:

Traceback (most recent call last):
  File "[path to my Python source file].py", line 50, in <module>
    tidiedHTML = tidy(untidiedHTML)
  File "/usr/local/lib/python3.5/dist-packages/html5tidy.py", line 61, in tidy
    parts = [parser.parse(src, encoding=encoding, parseMeta=parseMeta, useChardet=useChardet)]
  File "/usr/local/lib/python3.5/dist-packages/html5lib/html5parser.py", line 289, in parse
    self._parse(stream, False, None, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/html5lib/html5parser.py", line 130, in _parse
    self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/html5lib/_tokenizer.py", line 36, in __init__
    self.stream = HTMLInputStream(stream, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/html5lib/_inputstream.py", line 149, in HTMLInputStream
    return HTMLUnicodeInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'parseMeta'

I have NO idea what to do. I've searched for documentation that explains how to invoke html5tidy from Python 3 but I've come up empty...

Recommended Answer

That library is broken and/or doesn't work with Python 3.5. I installed it and ran into errors related to html5lib.HTMLParser: https://github.com/aleray/html5tidy/blob/master/html5tidy.py#L57

There's one contributor, and the package has not been updated in 6 years.
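
To see why the call fails, here is a minimal standard-library sketch of the pattern visible in the traceback: a wrapper forwards keyword arguments such as parseMeta down to a constructor that accepts only the source. The names UnicodeStream, open_stream, tidy_broken and tidy_fixed are hypothetical stand-ins, not html5lib or html5tidy code, and tidy_fixed shows only one way a fork might avoid the error.

```python
class UnicodeStream:
    """Hypothetical stand-in for a stream class whose __init__ takes only the source."""

    def __init__(self, source):
        self.source = source


def open_stream(source, **kwargs):
    """Hypothetical stand-in for a factory that forwards every keyword argument."""
    return UnicodeStream(source, **kwargs)


def tidy_broken(src, parseMeta=True, useChardet=True):
    # Same call shape as in the traceback: the extra keywords travel down
    # to a constructor that does not accept them.
    return open_stream(src, parseMeta=parseMeta, useChardet=useChardet)


def tidy_fixed(src):
    # One possible fix in a fork (an assumption): stop forwarding the
    # keywords when the input is already text.
    return open_stream(src)


try:
    tidy_broken("some text")
except TypeError as exc:
    error_message = str(exc)  # mentions an unexpected keyword argument
else:
    error_message = None

stream = tidy_fixed("some text")
print(error_message)
print(stream.source)
```

In other words, the TypeError says nothing about your HTML; it comes from html5tidy passing arguments that the installed html5lib no longer accepts for string input.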

Your options are:

  • Fork the repo, fix the issue, and submit a pull request
  • Extract the code you need and roll your own
  • Find another library
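
For the "roll your own" option, a rough sketch could call html5lib and lxml directly, the same two dependencies the html5tidy docstring lists. This assumes both packages are installed; tidy_html is a hypothetical helper name, and it is not a drop-in replacement for html5tidy's tidy (it takes no encoding options and handles only a single document).

```python
import html5lib
from lxml import etree


def tidy_html(src):
    """Parse possibly malformed HTML and return well-formed markup as a str.

    Hypothetical helper, not part of html5tidy.
    """
    # html5lib repairs the markup much as a browser would; the "lxml"
    # tree builder returns an lxml ElementTree. No parseMeta/useChardet
    # keywords are passed, since src is expected to be a str.
    tree = html5lib.parse(src, treebuilder="lxml", namespaceHTMLElements=False)
    # encoding="unicode" makes tostring return str instead of bytes.
    return etree.tostring(tree, encoding="unicode")


if __name__ == "__main__":
    print(tidy_html("<p>unclosed paragraph"))
```

With the untidiedHTML string from the question, the call would simply be tidiedHTML = tidy_html(untidiedHTML).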
