(Beautiful Soup) How to extract data from HTML tags

Published: 2021/4/15 19:07:37 Tags: python sqlite web-scraping beautifulsoup urllib2


Problem description

So far I have started with the code below, but I can't get the plain text out of the div.

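from BeautifulSoup import BeautifulSoup
import urllib2

# Fetch the page and collect every <div class="h4 entry-title">
get = BeautifulSoup(urllib2.urlopen("https://example/com/").read()).findAll('div', {'class':'h4 entry-title'})
for i in get:
    print i  # prints the whole tag, markup included, not just its text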

How can I scrape the data from this HTML? I only need the color names and the paragraphs.

<div class="h4 entry-title">
<a href="https://example/com/01/">RED</a>
</div>
<p>
I am paragraph red
</p>

<div class="h4 entry-title">
<a href="https://example.com/02/">WHITE</a>
</div>
<p>
I am paragraph white
</p>


<div class="h4 entry-title">
<a href="https://example.com/03/">PINK</a>
</div>
<p>
I am paragraph pink
</p>

My questions:

How do I scrape the data from this HTML? I only need the link text and the paragraphs.

Desired console output:

RED I am paragraph red
WHITE I am paragraph white
PINK I am paragraph pink

How can I automatically import this data set into a SQL file?

Desired output database table (name, description):


name: RED,WHITE,PINK
description: I am paragraph RED, I am paragraph WHITE, I am paragraph PINK
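
For the second question, one possible approach is sketched here. It assumes SQLite (the question is tagged sqlite) and Python's standard sqlite3 module. The table name `colors`, the helper `save_rows`, and the in-memory database are hypothetical choices made for illustration; the rows are typed in by hand in place of scraped values.

```python
import sqlite3

# Rows in the shape the question asks for: (name, description)
rows = [
    ("RED", "I am paragraph red"),
    ("WHITE", "I am paragraph white"),
    ("PINK", "I am paragraph pink"),
]


def save_rows(conn, rows):
    """Create the hypothetical `colors` table if needed and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS colors (name TEXT PRIMARY KEY, description TEXT)"
    )
    # Parameterized inserts keep scraped text from being interpreted as SQL
    conn.executemany(
        "INSERT OR REPLACE INTO colors (name, description) VALUES (?, ?)", rows
    )
    conn.commit()


# In-memory database for the sketch; pass a file path instead to persist the data
conn = sqlite3.connect(":memory:")
save_rows(conn, rows)

stored = conn.execute(
    "SELECT name, description FROM colors ORDER BY rowid"
).fetchall()
for name, description in stored:
    print(name, description)
```

Using the name as the primary key together with INSERT OR REPLACE means that re-running the scraper updates existing rows instead of duplicating them. That is a design choice of this sketch, not something the question requires.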

Recommended answer

Answering question one, write it like this:

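from BeautifulSoup import BeautifulSoup
import urllib2

for div in BeautifulSoup(urllib2.urlopen("https://example/com/").read()).findAll('div', {'class':'h4 entry-title'}):
    for a in div.findAll('a'):
        print a.text
    # The <p> follows the div as a sibling; it is not nested inside it
    p = div.findNextSibling('p')
    if p is not None:
        print p.text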
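
The snippet above targets Python 2 with BeautifulSoup 3 and urllib2, neither of which is available on Python 3. As a rough sketch of the same idea on Python 3, assuming the beautifulsoup4 package (`bs4`) is installed: the HTML is inlined from the question so the example runs without network access, with the URLs normalized to example.com, and `extract_rows` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup  # assumption: the beautifulsoup4 package is installed

# Inline copy of the question's markup so the sketch runs without network access
HTML = """
<div class="h4 entry-title">
<a href="https://example.com/01/">RED</a>
</div>
<p>
I am paragraph red
</p>
<div class="h4 entry-title">
<a href="https://example.com/02/">WHITE</a>
</div>
<p>
I am paragraph white
</p>
<div class="h4 entry-title">
<a href="https://example.com/03/">PINK</a>
</div>
<p>
I am paragraph pink
</p>
"""


def extract_rows(html):
    """Return (name, description) pairs, one per entry-title div."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for div in soup.find_all("div", class_="entry-title"):
        link = div.find("a")
        # The paragraph follows the div as a sibling; it is not nested inside it
        para = div.find_next_sibling("p")
        if link is None or para is None:
            continue  # skip incomplete entries
        rows.append((link.get_text(strip=True), para.get_text(strip=True)))
    return rows


rows = extract_rows(HTML)
for name, description in rows:
    print(name, description)
```

Each name is printed on the same line as its paragraph, which matches the console output the question asks for. To run this against a live page, the HTML string could be replaced with the response body fetched by `urllib.request.urlopen`.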
