Extracting data from HTML with Python

Views: 193 Published: 2018/6/15 12:08:34 python html


Problem description

I have the following text, produced by my Python code:

<td>
<a href="http://www.linktosomewhere.net" title="title here">some link</a>
<br />
some data 1<br />
some data 2<br />
some data 3</td>

Could you advise me how to extract the data from within the <td>? My idea is to put it in a CSV file with the following format: some link, some data 1, some data 2, some data 3.

I expect this might be hard without regular expressions, but honestly I still struggle with regular expressions.

I use my code more or less in the following manner:

tabulka = subpage.find("table")
for row in tabulka.findAll('tr'):
    col = row.findAll('td')
    print(col[0])

Ideally I would like to get the content of each td into an array. The HTML above is the output of my Python code.
Solution
Get BeautifulSoup and just use it. It's great.
$> easy_install pip
$> pip install BeautifulSoup
$> python
>>> from BeautifulSoup import BeautifulSoup as BS
>>> import urllib2
>>> html = urllib2.urlopen(your_site_here)
>>> soup = BS(html)
>>> elem = soup.findAll('a', {'title': 'title here'})
>>> elem[0].text
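
The session above targets Python 2 and the older BeautifulSoup 3 package. As a rough sketch of the same idea on Python 3, assuming the `beautifulsoup4` package (imported as `bs4`) is installed, the following collects every text fragment in each td and formats it as the CSV line the question asks for. The helper names `cell_rows` and `to_csv` are made up for this illustration, and the sample markup is the snippet from the question wrapped in a table; no regular expressions are needed.

```python
import csv
import io

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Sample markup from the question, wrapped in a table row (an assumption
# about the surrounding page, which the question does not show).
HTML = """
<table><tr><td>
<a href="http://www.linktosomewhere.net" title="title here">some link</a>
<br />
some data 1<br />
some data 2<br />
some data 3</td></tr></table>
"""


def cell_rows(html):
    """Return one list of text fragments per <td> cell (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for td in soup.find_all("td"):
        # stripped_strings yields each text node inside the cell with
        # surrounding whitespace removed, skipping whitespace-only nodes.
        rows.append(list(td.stripped_strings))
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


rows = cell_rows(HTML)
print(to_csv(rows), end="")
```

To write to a file instead of a string, open it with `open(path, "w", newline="")` and pass the file object to `csv.writer`; letting the `csv` module handle quoting keeps fields that contain commas intact.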
