Enter query in search bar and scrape results

Question

I have a database with the ISBN numbers of different books, which I gathered using Python and BeautifulSoup. Next, I would like to add categories to the books. There is a standard for book categories, and the website https://www.bol.com/nl/ lists all books with categories according to that standard.

Starting URL: https://www.bol.com/nl/

ISBN:9780062457738

URL after search: https://www.bol.com/nl/p/the-subtle-art-of-not-giving-a-f-ck/9200000053655943/

HTML class of the categories: <li class="breadcrumbs__item"

Does anyone know how to (1) enter the ISBN value in the search bar, and (2) submit the search query and use the resulting page for scraping?

Step (3), scraping all the categories, is something I can do, but I don't know how to do the first two steps.

The code I have so far for step (2):

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

webpage = "https://www.bol.com/nl/" # edit me
searchterm = "9780062457738" # edit me

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get(webpage)

sbox = driver.find_element_by_class_name("appliedSearchContextId")
sbox.send_keys(searchterm)

submit = driver.find_element_by_class_name("wsp-search__btn  tst_headerSearchButton")
submit.click()

The code I have so far for step (3):

import requests
from bs4 import BeautifulSoup

data = requests.get('https://www.bol.com/nl/p/the-subtle-art-of-not-giving-a-f-ck/9200000053655943/')

soup = BeautifulSoup(data.text, 'html.parser')

categoryBar = soup.find('ul',{'class':'breadcrumbs breadcrumbs--show-last-item-small'})

for category in categoryBar.find_all('span',{'class':'breadcrumbs__link-label'}):
    print(category.text)

Recommended answer

You can use Selenium to locate the input box and loop over your ISBNs, entering each one in turn:

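from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Path to the chromedriver binary; recent Selenium releases expect a Service object instead.
d = webdriver.Chrome('/path/to/chromedriver')
books = ['9780062457738']  # ISBNs to look up
for book in books:
    d.get('https://www.bol.com/nl/')  # start from the home page each time
    # By.ID replaces find_element_by_id, which newer Selenium 4 releases removed
    e = d.find_element(By.ID, 'searchfor')  # the search input box
    e.send_keys(book)
    e.send_keys(Keys.ENTER)  # submit the search
    # scrape page here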

For each ISBN in books, this enters the value into the search box, submits it, and loads the desired page.
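
To connect the search loop to step (3), the HTML of the loaded page can be read from the driver's `page_source` attribute and parsed for the breadcrumb labels. The following is only a minimal sketch under a few assumptions: the function and class names (`extract_categories`, `categories_for_isbns`, `BreadcrumbParser`) are hypothetical, the `searchfor` id and the `breadcrumbs__link-label` class are taken from the question and answer and may have changed on the live site, and the 10-second wait is an arbitrary choice. It parses with the standard library's `html.parser` rather than BeautifulSoup so that the parsing part has no extra dependency; passing `driver.page_source` to `BeautifulSoup` as in the question's step (3) code should work just as well.

```python
from html.parser import HTMLParser


class BreadcrumbParser(HTMLParser):
    """Collects the text of every <span class="... breadcrumbs__link-label ...">."""

    def __init__(self):
        super().__init__()
        self.categories = []
        self._in_label = False  # True while inside a label span

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "breadcrumbs__link-label" in classes:
            self._in_label = True
            self.categories.append("")

    def handle_data(self, data):
        if self._in_label:
            self.categories[-1] += data

    def handle_endtag(self, tag):
        # Assumes label spans contain no nested spans
        if tag == "span" and self._in_label:
            self._in_label = False
            self.categories[-1] = self.categories[-1].strip()


def extract_categories(html):
    """Return the breadcrumb label texts found in an HTML string."""
    parser = BreadcrumbParser()
    parser.feed(html)
    parser.close()
    return [c for c in parser.categories if c]


def categories_for_isbns(isbns, driver):
    """Search each ISBN with an existing Selenium driver and collect categories."""
    # Imported here so the parsing helpers above work without Selenium installed
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    results = {}
    for isbn in isbns:
        driver.get("https://www.bol.com/nl/")
        box = driver.find_element(By.ID, "searchfor")  # id taken from the answer
        box.send_keys(isbn)
        box.send_keys(Keys.ENTER)
        # Wait (up to an assumed 10 s) until a breadcrumb label is present;
        # raises TimeoutException if the search did not land on a product page
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "breadcrumbs__link-label"))
        )
        results[isbn] = extract_categories(driver.page_source)
    return results
```

With a driver created as in the answer, `categories_for_isbns(books, d)` would return a dictionary mapping each ISBN to its list of category names. The explicit wait is there because `page_source` may otherwise be read before the result page has finished loading.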
