How to use Python to diff two HTML files


Question

I want to use Python to diff two HTML files.

Example:

html_1 = """
<p>i love it</p>
"""
html_2 = """
<h2>i love it </p>
"""

The diff should look like this:

diff_html = """
<del><p>i love it</p></del><ins><h2>i love it</h2></ins>
"""

Is there a Python library that can do this?

Solution

lxml can do something similar to what you want. From the docs:

>>> from lxml.html.diff import htmldiff
>>> doc1 = '''<p>Here is some text.</p>'''
>>> doc2 = '''<p>Here is <b>a lot</b> of <i>text</i>.</p>'''
>>> print(htmldiff(doc1, doc2))
<p>Here is <ins><b>a lot</b> of <i>text</i>.</ins> <del>some text.</del> </p>

I don't know of any other Python library for this specific task, but you may want to look into word-by-word diffs. They may approximate what you want.

One example is a word-level diff script implemented in both PHP and Python (save it as diff.py, then import diff):

>>> import diff
>>> diff.htmlDiff(a, b)
'<del><p>i</del> <ins><h2>i</ins> love <del>it</p></del> <ins>it </p></ins>'
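If you would rather not depend on a third-party script, the word-by-word idea can be approximated with the standard library's difflib. The following is only a minimal sketch, not the diff.py script mentioned above: the names tokenize and html_word_diff are made up for this illustration, tags are treated as opaque tokens, and the output is not guaranteed to be well-formed HTML when a changed tag ends up inside <del> or <ins>.

```python
import difflib
import re


def tokenize(html):
    """Split markup into whole tags and whitespace-delimited words."""
    return re.findall(r"<[^>]+>|[^<\s]+", html)


def html_word_diff(old, new):
    """Return a token-level diff with <del>/<ins> wrappers (hypothetical helper)."""
    a, b = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    pieces = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            pieces.append(" ".join(a[i1:i2]))
            continue
        # "replace" contributes both a deletion and an insertion.
        if op in ("delete", "replace"):
            pieces.append("<del>" + " ".join(a[i1:i2]) + "</del>")
        if op in ("insert", "replace"):
            pieces.append("<ins>" + " ".join(b[j1:j2]) + "</ins>")
    return " ".join(pieces)


if __name__ == "__main__":
    print(html_word_diff("<p>i love it</p>", "<h2>i love it</h2>"))
```

For the two inputs in the __main__ block this prints <del><p></del> <ins><h2></ins> i love it <del></p></del> <ins></h2></ins>, which marks the changed tags separately instead of wrapping whole elements as in the question. Whitespace between tokens is normalized to single spaces, so this sketch is not suitable where exact whitespace matters.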
