I'm getting a KeyError trying to scrape data from a website

Views: 98 | Posted: 2020/9/20 6:40:01 | Tags: python web-scraping beautifulsoup


Problem description

I wrote some code for scraping data; it works well for some pages, but for others it fails with:

KeyError: 'isbn'

Could you please guide me on how to solve this issue?

Here is my code:

import requests
import re
import json
from bs4 import BeautifulSoup
import csv
import sys
import codecs


def Soup(content):
    soup = BeautifulSoup(content, 'html.parser')
    return soup


def Main(url):
    r = requests.get(url)
    soup = Soup(r.content)
    scripts = soup.findAll("script", type="application/ld+json",
                           text=re.compile("data"))
    prices = [span.text for span in soup.select(
        "p.product-field.price span span") if span.text != "USD"]
    with open("AudioBook/Fiction & Literature/African American.csv", 'a', encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Title", "Writer", "Price", "IMG", "URL", "ISBN"])
        for script, price in zip(scripts, prices):
            script = json.loads(script.text)
            title = script["data"]["name"]
            author = script["data"]["author"][0]["name"]    
            img = f'https:{script["data"]["thumbnailUrl"]}'
            isbn = script["data"]["isbn"]
            url = script["data"]["url"]
            writer.writerow([title, author, price, img, url, isbn])
for x in range(1,10):
    url = ("https://www.kobo.com/ww/en/audiobooks/contemporary-1?pageNumber=" + str(x))
    print("Scraping page " + str(x) + ".....")
    Main(url)

Recommended answer

Since audiobooks don't have an ISBN on the listings page, you could prepare for this case with a default value, e.g.:

isbn = script["data"].get("isbn", "")

In this case, if the "isbn" key doesn't exist in script["data"], the lookup falls back to an empty string.
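
As a minimal sketch of that difference, the snippet below compares plain indexing with dict.get on two hand-made records. The record contents and the helper name get_isbn are hypothetical and only mimic the shape of the parsed JSON used in the question.

```python
# Hypothetical records shaped like the parsed JSON-LD; the values are made up.
ebook = {"data": {"name": "Sample Ebook", "isbn": "0000000000000"}}
audiobook = {"data": {"name": "Sample Audiobook"}}  # no "isbn" key


def get_isbn(script):
    """Return the ISBN if present, otherwise an empty string."""
    return script["data"].get("isbn", "")


ebook_isbn = get_isbn(ebook)
audiobook_isbn = get_isbn(audiobook)

# Plain indexing on the same record raises the KeyError from the question.
try:
    audiobook["data"]["isbn"]
    raised = False
except KeyError:
    raised = True
```

An empty string keeps the CSV row aligned with its header. Whether a blank cell or some other placeholder is preferable depends on how the file is consumed later.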

Alternatively, you could get the ISBN from the audiobook-specific page (your script["data"]["url"] above), e.g.:

def Main(url):
    r = requests.get(url)
    soup = Soup(r.content)
    scripts = soup.findAll("script", type="application/ld+json",
                           text=re.compile("data"))
    prices = [span.text for span in soup.select(
        "p.product-field.price span span") if span.text != "USD"]
    with open("AudioBook/Fiction & Literature/African American.csv", 'a', encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Title", "Writer", "Price", "IMG", "URL", "ISBN"])
        for script, price in zip(scripts, prices):
            script = json.loads(script.text)
            title = script["data"]["name"]
            author = script["data"]["author"][0]["name"]    
            img = f'https:{script["data"]["thumbnailUrl"]}'
            # NEW CODE
            url = script["data"]["url"] 
            if "isbn" in script["data"]:
                # ebook listings
                isbn = script["data"]["isbn"]
            else:
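                # audiobook listings: fetch the detail page and read its JSON-LD
                r = requests.get(url)
                inner_soup = Soup(r.content)
                try:
                    inner_script = json.loads(
                        inner_soup.find("script", type="application/ld+json",
                                        text=re.compile("workExample")).text)
                    isbn = inner_script["workExample"]["isbn"]
                except (AttributeError, KeyError):
                    # AttributeError: no matching <script> tag (find returned None)
                    # KeyError: the JSON has no "workExample" or "isbn" entry
                    isbn = ""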
            # END NEW CODE
            writer.writerow([title, author, price, img, url, isbn])
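
The ISBN lookup on the detail page can also be separated from the network request, which makes it easier to test. The following is only a sketch under stated assumptions: the function name isbn_from_json_ld is hypothetical, the "workExample" key is taken from the answer above, and the handling of a list-valued "workExample" is a guess about how some pages might be structured rather than something confirmed for this site.

```python
import json


def isbn_from_json_ld(raw_json):
    """Return the ISBN found under "workExample", or "" if it is absent."""
    try:
        data = json.loads(raw_json)
    except (TypeError, ValueError):
        # Not a string, or not valid JSON.
        return ""
    if not isinstance(data, dict):
        return ""
    work = data.get("workExample")
    # Assumption: some pages may publish a list of examples; take the first.
    if isinstance(work, list):
        work = work[0] if work else None
    if isinstance(work, dict):
        return work.get("isbn", "")
    return ""


# Made-up inputs covering the cases handled above.
with_isbn = isbn_from_json_ld('{"workExample": {"isbn": "0000000000000"}}')
as_list = isbn_from_json_ld('{"workExample": [{"isbn": "1111111111111"}]}')
missing = isbn_from_json_ld('{"name": "Sample Audiobook"}')
broken = isbn_from_json_ld("not json")
```

Inside Main, the text of the script tag found on the detail page would be passed to this function in place of the try/except block, after first checking that the tag was actually found.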
