How to access the top five Google result links using BeautifulSoup

Views: 45 · Published: 2021/4/15 19:05:47 · Tags: python, url, hyperlink, beautifulsoup, google-search

Question

I want to access the top five (or any specified number) of result links from Google. Through research, I found and modified the following code.

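import requests
from bs4 import BeautifulSoup

search = input("Search:")
# Pass the query via params so requests URL-encodes spaces and special characters
page = requests.get("https://www.google.com/search", params={"q": search})
soup = BeautifulSoup(page.content, "lxml")
# find() returns only the FIRST <a> element on the page
link = soup.find("a")
print(link.get("href"))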

This returns only the first link on the page, which every time seems to be the Google Images tab.
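The difference is that `soup.find("a")` stops at the first match, while `soup.find_all("a", limit=n)` returns a list of up to `n` matches in document order. A minimal sketch on a small hand-written HTML snippet (the markup below is made up for illustration and is not Google's real page; `html.parser` is used so no extra parser package is needed):

```python
from bs4 import BeautifulSoup

# Made-up markup: a navigation tab first, then two result-style redirect links
html = """
<a href="/search?tbm=isch">Images</a>
<a href="/url?q=https://example.com/">Example</a>
<a href="/url?q=https://example.org/">Example Org</a>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching tag, here the Images tab
first = soup.find("a")
print(first.get("href"))

# find_all() with limit returns a list of at most `limit` tags, in document order
first_two = soup.find_all("a", limit=2)
print([a.get("href") for a in first_two])
```

Note that `limit` counts every `<a>` tag, including navigation links, so filtering out non-result links has to happen before taking the first N.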

This is not quite what I want. For starters, I don't want links to any Google pages, only the search results. Also, I want the first three, five, or any specified number of results.

How can I do this in Python?

Thanks in advance!

Recommended answer

You can use:

import requests
from bs4 import BeautifulSoup

search = input("Search:")
results = 100  # valid options: 10, 20, 30, 40, 50, and 100
# params lets requests URL-encode the query string
page = requests.get("https://www.google.com/search", params={"q": search, "num": results})
soup = BeautifulSoup(page.content, "html5lib")
links = soup.find_all("a")
for link in links:
    # Some <a> tags have no href; fall back to "" so the checks below don't fail on None
    link_href = link.get("href") or ""
    # Result links are wrapped as /url?q=<target>&sa=U...; skip cached-page links
    if "url?q=" in link_href and "webcache" not in link_href:
        print(link_href.split("?q=")[1].split("&sa=U")[0])
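The string splitting above leaves the target URL percent-encoded (for example `%3F` instead of `?`) and does not stop after a given number of results. A minimal sketch of a helper that does both, using only the standard library. It assumes, as the answer does, that Google still wraps result links as `/url?q=<target>&...` redirects; the function name `top_result_urls` is hypothetical, and Google's markup changes often, so treat this as a starting point rather than a stable parser.

```python
from urllib.parse import urlparse, parse_qs


def top_result_urls(hrefs, limit=5):
    """Return up to `limit` decoded target URLs from Google-style /url?q=... hrefs.

    Hypothetical helper: assumes results are redirect links whose `q`
    query parameter holds the real destination.
    """
    urls = []
    for href in hrefs:
        if len(urls) >= limit:
            break
        if not href:
            continue  # <a> tags without an href come back as None
        parsed = urlparse(href)
        if parsed.path != "/url":
            continue  # not a result redirect (tabs, settings, etc.)
        # parse_qs percent-decodes the value, e.g. %3F -> ?
        target = parse_qs(parsed.query).get("q", [""])[0]
        host = urlparse(target).netloc
        # Skip non-HTTP targets, cached copies, and Google's own pages
        if not target.startswith("http") or "webcache" in target:
            continue
        if host == "google.com" or host.endswith(".google.com"):
            continue
        if target not in urls:  # keep each result only once
            urls.append(target)
    return urls
```

With the answer's `soup`, it could be called as `top_result_urls([a.get("href") for a in soup.find_all("a")], limit=5)`. Filtering before counting is what lets `limit` mean "N results" rather than "N links of any kind".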

For duckduckgo.com, use:

import requests
from bs4 import BeautifulSoup

search = input("Search:")
h = {"Host": "duckduckgo.com", "Origin": "https://duckduckgo.com", "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"}
d = {"q": search}
# The /html/ endpoint returns a plain HTML results page that BeautifulSoup can parse
page = requests.post("https://duckduckgo.com/html/", data=d, headers=h)
soup = BeautifulSoup(page.content, "html5lib")
# Each result title is an <a class="result__a">
links = soup.find_all("a", {"class": "result__a"})
for link in links:
    link_href = link.get("href") or ""
    # Skip links that point back to DuckDuckGo itself
    if link_href and "https://duckduckgo.com" not in link_href:
        print(link_href)
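Depending on the endpoint and over time, DuckDuckGo's HTML results may link through a redirect of the form `//duckduckgo.com/l/?uddg=<encoded target>` instead of directly to the target; this is an assumption to check against the actual response. In that case the filter above, which looks for `https://duckduckgo.com`, would not catch the redirect. A hedged sketch (hypothetical helper `ddg_result_urls`) that unwraps such redirects when present, drops links that still point at DuckDuckGo, and keeps the first `limit` results:

```python
from urllib.parse import urlparse, parse_qs


def ddg_result_urls(hrefs, limit=5):
    """Return up to `limit` target URLs, unwrapping assumed uddg redirects.

    Hypothetical helper: the //duckduckgo.com/l/?uddg=... redirect format
    is an assumption about DuckDuckGo's HTML output, not a documented API.
    """
    urls = []
    for href in hrefs:
        if len(urls) >= limit:
            break
        if not href:
            continue  # missing href
        parsed = urlparse(href)
        is_redirect = parsed.netloc.endswith("duckduckgo.com") or (
            not parsed.netloc and parsed.path.startswith("/l/")
        )
        if is_redirect:
            # parse_qs percent-decodes the uddg value; "" if it is absent
            target = parse_qs(parsed.query).get("uddg", [""])[0]
        else:
            target = href
        # Keep only absolute HTTP(S) links that don't point at DuckDuckGo itself
        if not target.startswith("http"):
            continue
        if "duckduckgo.com" in urlparse(target).netloc:
            continue
        if target not in urls:
            urls.append(target)
    return urls
```

With the answer's `links`, usage would be `ddg_result_urls([a.get("href") for a in links], limit=5)`.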
