Scraping Google News with BeautifulSoup returns empty results
Question
I am trying to scrape Google News using the following code:
from bs4 import BeautifulSoup
import requests
import time
from random import randint

def scrape_news_summaries(s):
    time.sleep(randint(0, 2))  # relax and don't let google be angry
    r = requests.get("http://www.google.co.uk/search?q="+s+"&tbm=nws")
    content = r.text
    news_summaries = []
    soup = BeautifulSoup(content, "html.parser")
    st_divs = soup.findAll("div", {"class": "st"})
    for st_div in st_divs:
        news_summaries.append(st_div.text)
    return news_summaries

l = scrape_news_summaries("T-Notes")
#l = scrape_news_summaries("""T-Notes""")
for n in l:
    print(n)
This code worked before, but now it returns nothing and I can't figure out why. Could Google have banned me, even though I only ran the code three or four times? (I also tried Bing News and got empty results there too.)
Thanks.
Recommended answer
I tried running the code and it works fine on my computer.
You could try printing the status code of the request and checking whether it is anything other than 200.
from bs4 import BeautifulSoup
import requests
import time
from random import randint

def scrape_news_summaries(s):
    time.sleep(randint(0, 2))  # relax and don't let google be angry
    r = requests.get("http://www.google.co.uk/search?q="+s+"&tbm=nws")
    print(r.status_code)  # Print the status code
    content = r.text
    news_summaries = []
    soup = BeautifulSoup(content, "html.parser")
    st_divs = soup.findAll("div", {"class": "st"})
    for st_div in st_divs:
        news_summaries.append(st_div.text)
    return news_summaries

l = scrape_news_summaries("T-Notes")
#l = scrape_news_summaries("""T-Notes""")
for n in l:
    print(n)
See https://www.scrapehero.com/how-to-prevent-getting-blacklisted-while-scraping/ for a list of status codes that are a sign you have been banned.
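The status-code check can be taken one step further by also counting how many elements the selector matched, which helps tell a blocked request apart from a successful request whose HTML simply contains no div with class "st". The following is only a rough sketch: the names diagnose, check_query and SUSPECT_STATUS_CODES are hypothetical, and the set of "suspect" codes (403, 429, 503) is an assumption for illustration, not an official list from Google or from the linked article.

```python
# Status codes that often accompany blocking or rate limiting.
# Assumption for illustration only; not an official or complete list.
SUSPECT_STATUS_CODES = {403, 429, 503}


def diagnose(status_code, match_count):
    """Return a short guess about why a scrape came back empty."""
    if status_code in SUSPECT_STATUS_CODES:
        return "possibly blocked or rate limited"
    if status_code != 200:
        return "request failed"
    if match_count == 0:
        return "request succeeded but selector matched nothing"
    return "ok"


def check_query(query):
    """Fetch a Google News search and diagnose it (performs a network call)."""
    # Imported here so diagnose() can be used without these packages installed.
    import requests
    from bs4 import BeautifulSoup

    r = requests.get(
        "http://www.google.co.uk/search",
        params={"q": query, "tbm": "nws"},  # requests URL-encodes the query
        timeout=10,
    )
    soup = BeautifulSoup(r.text, "html.parser")
    matches = soup.find_all("div", {"class": "st"})
    return diagnose(r.status_code, len(matches))
```

Calling print(check_query("T-Notes")) would then print one of the four messages. If it reports that the request succeeded but the selector matched nothing, the returned page probably differs from what the code expects, and inspecting r.text directly is a reasonable next step.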