Unable to scrape Google News headings via their class

Problem description

I am trying to scrape Google News headings, along with their links, for an input term. But when I search with the `find_all` method for the class that contains all the news headings, it returns an empty list.

I also tried the parent divs with their ids, but the result was no different.

import requests
from bs4 import BeautifulSoup

input_term = input("Enter a term to search:")
source = requests.get("https://www.google.com/search?q={0}&source=lnms&tbm=nws".format(input_term)).text
soup = BeautifulSoup(source, 'html.parser')

# 'bkWMgd' is the class that I found to contain all the search results.
heading_results = soup.find_all('div', class_='bkWMgd')
print(heading_results)

I want to scrape all the news headings and their respective links. I expected the code above to give me a list of all search results, but it returns an empty list.
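The empty list is how BeautifulSoup reports that no tag matched; `find_all` does not raise an error. A minimal offline sketch of that behavior follows. The HTML string is made up to stand in for the fetched page and is not Google's real markup. It also shows that `find()` returns `None` in the same situation, which is why a chained `.find(...).find(...)` call can fail with `AttributeError`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for what requests might receive;
# it is NOT Google's real HTML, only an illustration.
fetched_html = """
<div id="ires">
  <ol>
    <li><h3 class="r"><a href="/url?q=https://example.com/a">Headline A</a></h3></li>
  </ol>
</div>
"""

soup = BeautifulSoup(fetched_html, "html.parser")

# The class copied from the browser is absent here, so find_all matches
# nothing and returns an empty ResultSet (a list subclass).
missing = soup.find_all("div", class_="bkWMgd")

# find() with no match returns None; calling .find() on it would raise AttributeError.
missing_single = soup.find("div", class_="bkWMgd")

# A class that really is in this HTML is found normally.
present = soup.find_all("h3", class_="r")

print(len(missing), missing_single, len(present))  # prints: 0 None 1
```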

Recommended answer

The response that BeautifulSoup parses and the page you see in your browser are quite different, because the browser executes JavaScript while `requests` only downloads the raw HTML. Hence the selectors you found in the browser may not exist in the response at all. It's always a good idea to print the response you actually receive, analyze that HTML, and then choose your class/id selectors accordingly.
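Following that advice, one way to analyze the fetched HTML is to list every class name and id it actually contains and check whether the selector copied from the browser is among them. This is a sketch: `summarize_selectors` is a hypothetical helper name, and the sample string stands in for the text returned by `requests.get`.

```python
from bs4 import BeautifulSoup


def summarize_selectors(html):
    """Return (sorted class names, sorted ids) that actually occur in `html`.

    Run it on the HTML that requests fetched, not on the browser's DOM,
    to check whether a class such as 'bkWMgd' exists before using it.
    """
    soup = BeautifulSoup(html, "html.parser")
    classes = set()
    for tag in soup.find_all(class_=True):
        # BeautifulSoup stores class as a list because it is multi-valued
        classes.update(tag["class"])
    ids = {tag["id"] for tag in soup.find_all(id=True)}
    return sorted(classes), sorted(ids)


# Hypothetical sample; in practice pass the fetched response text instead
sample = '<div id="ires"><ol><li><h3 class="r"><a href="#">A</a></h3></li></ol></div>'
classes, ids = summarize_selectors(sample)
print("classes:", classes)
print("ids:", ids)
print("bkWMgd present:", "bkWMgd" in classes)
```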

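import requests
from bs4 import BeautifulSoup

input_term = input("Enter a term to search:")
# Pass the query via params so requests URL-encodes the search term
source = requests.get(
    "https://www.google.com/search",
    params={"q": input_term, "source": "lnms", "tbm": "nws"}).text
soup = BeautifulSoup(source, 'html.parser')

# In the HTML that requests receives, div#ires contains an ol which contains the results.
results_div = soup.find("div", {"id": "ires"})
# find() returns None when nothing matches, so guard before chaining
results_list = results_div.find("ol") if results_div is not None else None
heading_results = results_list.find_all('h3', {'class': 'r'}) if results_list is not None else []
# Loop over each item to obtain the title and link (anchor tag text and href)
for heading in heading_results:
    anchor = heading.find('a')
    if anchor is not None:
        print(anchor.get_text(), anchor.get('href'))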
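Because the page has to be fetched live and Google's markup can change, it can help to keep parsing separate from fetching and test selectors against a saved copy of the response. The sketch below expresses the answer's `div#ires` > `ol` > `h3.r` chain as a single CSS selector through `select()`; it matches `h3.r` under any `ol` inside `div#ires`, which is slightly broader than `find("ol")`, since that takes only the first `ol`. `extract_headlines` and the sample HTML are hypothetical, and the `ires`/`r` names only reflect the markup the answer observed, which may no longer be served.

```python
from bs4 import BeautifulSoup


def extract_headlines(html):
    """Return (title, link) pairs matched by div#ires ol h3.r a.

    If the markup differs, the result is simply an empty list rather than
    an AttributeError from chaining find() on None.
    """
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for anchor in soup.select("div#ires ol h3.r a"):
        pairs.append((anchor.get_text(strip=True), anchor.get("href")))
    return pairs


# Hypothetical saved response (e.g. response text written to disk earlier),
# so selectors can be tried offline without re-querying Google.
saved_html = """
<div id="ires"><ol>
  <li><h3 class="r"><a href="/url?q=https://example.com/a">Headline A</a></h3></li>
  <li><h3 class="r"><a href="/url?q=https://example.com/b">Headline B</a></h3></li>
</ol></div>
"""

for title, link in extract_headlines(saved_html):
    print(title, link)
```

In the live script, pass `source` from the answer's code to `extract_headlines` instead of `saved_html`.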
