Noticing a warning to limit scraped results with BeautifulSoup in Python


Problem description


I am trying to scrape sales data for recently sold items from eBay with BeautifulSoup in Python. It works very well with the following code, which finds all prices and all dates of sold items.

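    # `soup` is the BeautifulSoup object for the search results page
    price = []

    # find_all() returns an empty result set when nothing matches,
    # so no try/except fallback is needed
    p = soup.find_all('span', class_='POSITIVE')

    for x in p:
        # Split the tag's markup into tokens on spaces and quotes
        x = str(x)
        x = x.replace(' ', '"')
        x = x.split('"')

        # Skip spans whose markup contains a '>Sold' token
        if '>Sold' in x:
            continue
        price.append(x)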

Now I am running into a problem, though. For this URL (https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2334524.m570.l1313&_nkw=babe+ruth+1933+goudey+149+psa+%281.5%29&_sacat=0&LH_TitleDesc=0&_osacat=0&_odkw=babe+ruth+1933+goudey+149+psa+1.5&LH_Complete=1&rt=nc&LH_Sold=1), eBay sometimes suggests other search results, below a warning, when a specific search query does not return enough matches.

As a result, my code finds not only the correct prices but also those of the suggested results below the warning. I tried to find where the warning message is located so that I could discard every listing found after it, but I cannot figure it out. I also thought of searching for the prices one by one, but even then I cannot figure out how to detect where the warning appears.

Is there any other way you guys can think of to solve this?

I am aware that this is really specific.

Solution

You can scrape the number of results (shown in the picture) and loop over only that many results.

The code will be something like:

    results = soup.find...
    # Convert the count to an int; strip any extra text from it first
    results = int(results)

    # Process only the first `results` entries (list indices start at 0)
    for i in range(results):
        price[i] = str(price[i])
        price[i] = price[i].replace(' ', '"')
        price[i] = price[i].split('"')

        if '>Sold' in price[i]:
            continue
        # Otherwise price[i] is a price from the exact-match results
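To make the idea concrete, here is a minimal, self-contained sketch of limiting the scrape to the reported result count. It runs on a small invented HTML sample rather than a live eBay page: the `result-count`, `item`, and `notice` class names, the page structure, and the `exact_match_prices` helper are all assumptions for illustration, so the selectors would need adjusting to whatever the real page uses.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for an eBay results page. The class names
# "result-count", "item" and "notice" are invented for this sketch.
SAMPLE_HTML = """
<h1 class="result-count">2 results for babe ruth 1933 goudey</h1>
<ul>
  <li class="item"><span class="POSITIVE">Sold Jan 5, 2021</span><span class="POSITIVE">$1,200.00</span></li>
  <li class="item"><span class="POSITIVE">Sold Jan 3, 2021</span><span class="POSITIVE">$950.00</span></li>
  <li class="notice">Results matching fewer words</li>
  <li class="item"><span class="POSITIVE">Sold Jan 2, 2021</span><span class="POSITIVE">$80.00</span></li>
</ul>
"""


def exact_match_prices(html):
    """Return prices from only the first N listings, N being the reported count."""
    soup = BeautifulSoup(html, "html.parser")

    # Read the first number in the heading text, e.g. "2 results for ..." -> 2
    heading = soup.find(class_="result-count")
    match = re.search(r"\d[\d,]*", heading.get_text()) if heading else None
    if match is None:
        return []
    count = int(match.group().replace(",", ""))

    prices = []
    # Slicing to `count` drops the suggested listings that follow the warning
    for item in soup.find_all("li", class_="item")[:count]:
        for span in item.find_all("span", class_="POSITIVE"):
            text = span.get_text(strip=True)
            # Skip the "Sold <date>" span and keep the price span
            if not text.startswith("Sold"):
                prices.append(text)
    return prices


print(exact_match_prices(SAMPLE_HTML))
```

Reading the text with `get_text()` instead of splitting the tag's markup on spaces and quotes is a design choice of this sketch, not part of the original answer; it avoids depending on the exact attribute layout of each tag.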
