使用BeautifulSoup搜寻Google新闻会返回空结果 [英] Scraping google news with BeautifulSoup returns empty results

查看:93
本文介绍了使用BeautifulSoup搜寻Google新闻会返回空结果的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用以下代码抓取Google新闻:

I am trying to scrape google news using the following code:

from bs4 import BeautifulSoup
import requests
import time
from random import randint


def scrape_news_summaries(s):
    time.sleep(randint(0, 2))  # relax and don't let google be angry
    r = requests.get("http://www.google.co.uk/search?q="+s+"&tbm=nws")
    content = r.text
    news_summaries = []
    soup = BeautifulSoup(content, "html.parser")
    st_divs = soup.findAll("div", {"class": "st"})
    for st_div in st_divs:
        news_summaries.append(st_div.text)
    return news_summaries


l = scrape_news_summaries("T-Notes")
#l = scrape_news_summaries("""T-Notes""")
for n in l:
    print(n)

即使这部分代码以前工作过,我现在也想不出为什么它不再工作了.自从我只运行3到4次代码以来,我就被Google禁止了吗? (我也曾尝试使用Bing新闻,但不幸的是,结果还是空的...)

Even though this bit of code was working before, I now can't figure out why it's not working anymore. Is it possible that I've been banned by google since I only ran the code 3 or four times? (I tried using Bing News with unfortunate empty results too...)

谢谢.

推荐答案

我尝试运行代码,并且在我的计算机上运行正常.

I tried running the code and it works fine on my computer.

您可以尝试打印请求的状态码,然后查看它是否不是200.

You could try printing the status code for the request, and see if it's anything other than 200.

from bs4 import BeautifulSoup
import requests
import time
from random import randint


def scrape_news_summaries(s):
    time.sleep(randint(0, 2))  # relax and don't let google be angry
    r = requests.get("http://www.google.co.uk/search?q="+s+"&tbm=nws")
    print(r.status_code)  # Print the status code
    content = r.text
    news_summaries = []
    soup = BeautifulSoup(content, "html.parser")
    st_divs = soup.findAll("div", {"class": "st"})
    for st_div in st_divs:
        news_summaries.append(st_div.text)
    return news_summaries


l = scrape_news_summaries("T-Notes")
#l = scrape_news_summaries("""T-Notes""")
for n in l:
    print(n)

https://www.scrapehero.com/how -to-prevent-getting-blackting-scraping/以获得状态代码列表,这是您已被禁止的标志.

https://www.scrapehero.com/how-to-prevent-getting-blacklisted-while-scraping/ for a list of status code that's a sign you have been banned.

这篇关于使用BeautifulSoup搜寻Google新闻会返回空结果的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆