I'm getting KeyError trying to scrape data from website
Question
I wrote some code for data scraping; it works well for some pages, but for others it displays:
KeyError: 'isbn'.
Could you please guide me on how I can solve this issue?
Here's my code:
import requests
import re
import json
from bs4 import BeautifulSoup
import csv
import sys
import codecs


def Soup(content):
    soup = BeautifulSoup(content, 'html.parser')
    return soup


def Main(url):
    r = requests.get(url)
    soup = Soup(r.content)
    scripts = soup.findAll("script", type="application/ld+json",
                           text=re.compile("data"))
    prices = [span.text for span in soup.select(
        "p.product-field.price span span") if span.text != "USD"]
    with open("AudioBook/Fiction & Literature/African American.csv", 'a', encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Title", "Writer", "Price", "IMG", "URL", "ISBN"])
        for script, price in zip(scripts, prices):
            script = json.loads(script.text)
            title = script["data"]["name"]
            author = script["data"]["author"][0]["name"]
            img = f'https:{script["data"]["thumbnailUrl"]}'
            isbn = script["data"]["isbn"]
            url = script["data"]["url"]
            writer.writerow([title, author, price, img, url, isbn])


for x in range(1, 10):
    url = "https://www.kobo.com/ww/en/audiobooks/contemporary-1?pageNumber=" + str(x)
    print("Scraping page " + str(x) + ".....")
    Main(url)
Answer
Since audiobooks don't have an ISBN on the listings page, you could prepare for this case with a default value, e.g.:
isbn = script["data"].get("isbn", "")
In this case, if the "isbn" key doesn't exist in script["data"], it will fall back to an empty string.
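As a self-contained illustration, dict.get never raises a KeyError; the payloads below are hypothetical, shaped like the listing page's ld+json data:

```python
import json

# Hypothetical payloads shaped like the listing page's ld+json data:
# an ebook entry carries "isbn", an audiobook entry does not.
ebook = json.loads('{"data": {"name": "Some Ebook", "isbn": "9781234567890"}}')
audiobook = json.loads('{"data": {"name": "Some Audiobook"}}')

# .get returns the stored value when the key exists...
print(ebook["data"].get("isbn", ""))      # 9781234567890
# ...and the default (here an empty string) when it does not,
# instead of raising KeyError.
print(audiobook["data"].get("isbn", ""))  # prints an empty line
```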
Alternatively, you could get the book's ISBN from the audiobook-specific page (your script["data"]["url"] above), e.g.:
def Main(url):
    r = requests.get(url)
    soup = Soup(r.content)
    scripts = soup.findAll("script", type="application/ld+json",
                           text=re.compile("data"))
    prices = [span.text for span in soup.select(
        "p.product-field.price span span") if span.text != "USD"]
    with open("AudioBook/Fiction & Literature/African American.csv", 'a', encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Title", "Writer", "Price", "IMG", "URL", "ISBN"])
        for script, price in zip(scripts, prices):
            script = json.loads(script.text)
            title = script["data"]["name"]
            author = script["data"]["author"][0]["name"]
            img = f'https:{script["data"]["thumbnailUrl"]}'
            # NEW CODE
            url = script["data"]["url"]
            if "isbn" in script["data"]:
                # ebook listings
                isbn = script["data"]["isbn"]
            else:
                # audiobook listings
                r = requests.get(url)
                inner_soup = Soup(r.content)
                try:
                    inner_script = json.loads(
                        inner_soup.find("script", type="application/ld+json",
                                        text=re.compile("workExample")).text)
                    isbn = inner_script["workExample"]["isbn"]
                except AttributeError:
                    isbn = ""
            # END NEW CODE
            writer.writerow([title, author, price, img, url, isbn])
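To see why the try/except AttributeError guard matters: soup.find() returns None when no matching script tag exists, and accessing .text on None raises AttributeError. A minimal offline sketch of that logic, where FakeTag is a hypothetical stand-in for a BeautifulSoup tag:

```python
import json

class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup <script> tag."""
    def __init__(self, text):
        self.text = text

def extract_isbn(tag):
    # find() returns None when no matching <script> tag exists;
    # None.text raises AttributeError, which we turn into "".
    try:
        return json.loads(tag.text)["workExample"]["isbn"]
    except AttributeError:
        return ""

print(extract_isbn(FakeTag('{"workExample": {"isbn": "9780000000000"}}')))  # 9780000000000
print(extract_isbn(None))  # missing tag -> prints an empty line
```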