(Beautiful Soup) How to extract data from HTML tags
Question
This is what I have so far. I can't get the plain text out of the div.
from BeautifulSoup import BeautifulSoup
import sys
import urllib2

get = BeautifulSoup(urllib2.urlopen("https://example/com/").read()).findAll('div', {'class':'h4 entry-title'})
for i in get:
    print i
How can I scrape the data from this HTML? I only need the color names and the paragraphs.
<div class="h4 entry-title">
<a href="https://example/com/01/">RED</a>
</div>
<p>
I am paragraph red
</p>
<div class="h4 entry-title">
<a href="https://example.com/02/">WHITE</a>
</div>
<p>
I am paragraph white
</p>
<div class="h4 entry-title">
<a href="https://example.com/03/">PINK</a>
</div>
<p>
I am paragraph pink
</p>
My questions:
- How do I scrape the data from this HTML? I only need the link text and the paragraphs.
Desired console output:
RED I am paragraph red
WHITE I am paragraph white
PINK I am paragraph pink
- How can I import this data set into an SQL file automatically?
Desired database table (name, description):
name: RED,WHITE,PINK
description: I am paragraph RED, I am paragraph WHITE, I am paragraph PINK
Recommended answer
To answer question one, write it like this:
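from BeautifulSoup import BeautifulSoup
import urllib2

soup = BeautifulSoup(urllib2.urlopen("https://example/com/").read())
for div in soup.findAll('div', {'class': 'h4 entry-title'}):
    # The color name is the text of the link inside the div.
    name = div.find('a').text.strip()
    # The <p> follows the div as a sibling; it is not a child of the div.
    p = div.findNextSibling('p')
    print name, p.text.strip() if p is not None else ''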
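The answer above only covers question one. For question two, one possible approach is sketched below using Python 3's standard sqlite3 module. This is a minimal sketch under assumptions, not part of the original answer: the table name colors and the helper names build_database and dump_sql are made up for illustration, and the rows are hard-coded from the desired output in the question rather than taken from a live scrape.

```python
import sqlite3

# Pairs copied from the desired output in the question; in practice they
# would be collected in the scraping loop instead of being hard-coded.
ROWS = [
    ("RED", "I am paragraph red"),
    ("WHITE", "I am paragraph white"),
    ("PINK", "I am paragraph pink"),
]


def build_database(rows):
    """Load (name, description) pairs into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # "colors" is a hypothetical table name; the question only names the columns.
    conn.execute("CREATE TABLE colors (name TEXT, description TEXT)")
    # Placeholders let sqlite3 handle the quoting of the scraped text.
    conn.executemany(
        "INSERT INTO colors (name, description) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def dump_sql(conn):
    """Return the database contents as a string of SQL statements."""
    return "\n".join(conn.iterdump())


if __name__ == "__main__":
    connection = build_database(ROWS)
    print(dump_sql(connection))
    connection.close()
```

Writing the string returned by dump_sql to a file gives an SQL script that can be loaded into another database. Passing the values as parameters, rather than building the INSERT statements by string concatenation, keeps quotes in scraped text from breaking the statements.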