Can't produce some customized output after scraping data from a webpage

Problem description

I'm trying to collect data into dictionaries while scraping it from a webpage. The output I'm getting right now is not arranged the way I want. The page is the URL used in the code below.

What I've tried:

import requests
from bs4 import BeautifulSoup
from pprint import pprint

url = 'https://elllo.org/english/grammar/L1-01-AimeeTodd-Intros-BeVerb.htm'
data = []

r = requests.get(url)
soup = BeautifulSoup(r.text,"lxml")
for item in soup.select("#transcript p"):
    d = {}

    if "Aimee:" in item.text:
        d['Aimee'] = item.text.replace("Aimee:","").strip()

    elif "Todd:" in item.text:
        d['Todd'] = item.text.replace("Todd:","").strip()

    data.append(d)

pprint(data)

The output looks like this:

[{'Aimee': 'So Todd, where are you from?'},
 {'Todd': "I am from the U.S., I am from San Francisco. It's on the west "
          'coast.'},
 {'Aimee': 'And what do you do?'},
 {'Todd': "I'm an English teacher. Also, I create Elllo. I work on Elllo a "
          'lot.'}

Expected output:

[{'Aimee': 'So Todd, where are you from?',
  'Todd': "I am from the U.S., I am from San Francisco. It's on the west "
          'coast.'},
 {'Aimee': 'And what do you do?',
  'Todd': "I'm an English teacher. Also, I create Elllo. I work on Elllo a "
          'lot.'},

How can I produce the second output?

Recommended answer

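import requests
from bs4 import BeautifulSoup
from pprint import pprint

url = 'https://elllo.org/english/grammar/L1-01-AimeeTodd-Intros-BeVerb.htm'
data = []

r = requests.get(url)
soup = BeautifulSoup(r.text, "lxml")
d = {}  # start with an empty dict outside the loop so Aimee's and Todd's lines share it
for item in soup.select("#transcript p"):

    if "Aimee:" in item.text:
        d['Aimee'] = item.text.replace("Aimee:", "").strip()

    elif "Todd:" in item.text:
        d['Todd'] = item.text.replace("Todd:", "").strip()
        # Todd's reply closes the exchange: store the pair and start a fresh dict
        data.append(d)
        d = {}

# keep a trailing line from Aimee that has no reply after it
if d:
    data.append(d)

pprint(data)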
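The answer above hard-codes the two names and closes each pair on Todd's line. Below is a minimal sketch of a more general variant. It assumes every relevant paragraph starts with a capitalized "Name:" prefix. The helper group_turns is hypothetical, not part of BeautifulSoup or the original answer. It starts a new dict whenever a speaker who is already in the current dict speaks again, skips paragraphs without a prefix, and keeps a final unfinished exchange. Because it works on plain strings, it can be tried without fetching the page. The sample paragraphs are copied from the output shown in the question.

```python
import re
from pprint import pprint

# Assumed transcript format: "Name: text" at the start of each paragraph.
SPEAKER_RE = re.compile(r"^\s*([A-Z][\w'-]*):\s*(.*)$", re.DOTALL)


def group_turns(paragraphs):
    """Hypothetical helper: group 'Speaker: text' paragraphs into one dict per exchange.

    A new dict is started whenever a speaker already present in the current
    dict speaks again, so the speaker names are not hard-coded.
    """
    groups = []
    current = {}
    for para in paragraphs:
        match = SPEAKER_RE.match(para)
        if not match:
            continue  # skip paragraphs without a "Name:" prefix
        speaker, text = match.group(1), match.group(2).strip()
        if speaker in current:
            # this speaker already talked in the current exchange: close it
            groups.append(current)
            current = {}
        current[speaker] = text
    if current:
        groups.append(current)  # keep a final, possibly incomplete exchange
    return groups


paragraphs = [
    "Aimee: So Todd, where are you from?",
    "Todd: I am from the U.S., I am from San Francisco. It's on the west coast.",
    "Aimee: And what do you do?",
    "Todd: I'm an English teacher. Also, I create Elllo. I work on Elllo a lot.",
]
pprint(group_turns(paragraphs))
```

On the live page, the scraped paragraphs could be passed in directly, e.g. group_turns(p.get_text() for p in soup.select("#transcript p")), reusing the soup object from the answer. Splitting on a repeated speaker rather than on a fixed name means the same code would also handle transcripts where the other person speaks first.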
