Python HTML parsing from url

Views: 151 · Published: 2018/6/29 15:14:30 · Tags: python html parsing

Question

I've heard it's possible to get data from a link. I've read about it, but I still want to know how to do it and which module is best for the job. I want to parse this:

<div class="blalbal"><h2>DATA5</h2>
<div class="blabla">
<table class="tabledata">
<tr><th>Blablabla:</th><td>DATA3<br>(DATA4)</td></tr>
<tr><th>Blablabla:</th><td>DATA2</td></tr>
<tr><th>Blablabla:</th><td>DATA1</td></tr>
</td>

as a string, like DATA1, DATA2, DATA3 (DATA4), DATA5

So I'd like to see how this is possible (just an example) and what the best and fastest method is. Thanks!

Solution

From the Python HTMLParser documentation:

from html.parser import HTMLParser  # Python 3; in Python 2 the module was named HTMLParser

# create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Encountered a start tag:", tag)
    def handle_endtag(self, tag):
        print("Encountered an end tag :", tag)
    def handle_data(self, data):
        print("Encountered some data  :", data)

# instantiate the parser and feed it some HTML
parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1></body></html>')

In your case, you can override just the handle_data method to print the text content of the HTML.
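
For the snippet in the question, a rough sketch along these lines might work. It is only one possible approach, assuming Python 3 and the standard library alone. The names CellTextParser, extract_items, fetch_html and SAMPLE are made up for illustration, and the choice to keep only the text inside h2 and td elements is an assumption based on the sample markup, not something the original answer specifies.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class CellTextParser(HTMLParser):
    """Collect the text inside <h2> and <td> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.items = []      # one string per <h2>/<td> element
        self._parts = None   # text pieces of the element being read, else None

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "td"):
            self._parts = []

    def handle_endtag(self, tag):
        # A stray closing tag (like the final </td> in the sample) is ignored.
        if tag in ("h2", "td") and self._parts is not None:
            self.items.append(" ".join(self._parts))
            self._parts = None

    def handle_data(self, data):
        text = data.strip()
        if self._parts is not None and text:
            self._parts.append(text)


def extract_items(html):
    """Return the collected strings in document order."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.items


def fetch_html(url):
    """Download a page and decode it (assumes UTF-8 if no charset is sent)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# The markup from the question, used here instead of a real URL.
SAMPLE = """<div class="blalbal"><h2>DATA5</h2>
<div class="blabla">
<table class="tabledata">
<tr><th>Blablabla:</th><td>DATA3<br>(DATA4)</td></tr>
<tr><th>Blablabla:</th><td>DATA2</td></tr>
<tr><th>Blablabla:</th><td>DATA1</td></tr>
</td>"""

items = extract_items(SAMPLE)
# Reversed because the question lists the values bottom-up.
print(", ".join(reversed(items)))  # DATA1, DATA2, DATA3 (DATA4), DATA5
```

To run it against a live page, you would pass the result of fetch_html(your_url) to extract_items instead of SAMPLE. The th labels are skipped on purpose, since the question only asks for the DATA values.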
