Print all search results from a custom site

Question

How can I make my program print all search results from a custom site when I give it a command (in my example: https://discordpy.readthedocs.io/en/latest/search.html?q=test)?

I want something like this:

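import requests

search = 'test'  # the search term
site = requests.get(f'https://discordpy.readthedocs.io/en/latest/search.html?q={search}')
# Print every line of the returned HTML that starts with a link tag
for line in site.text.splitlines():
    if line.strip().startswith("<a"):
        print(line)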

Is something like this possible?

Recommended answer

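This site loads its search results dynamically with JavaScript. Fetching the page with requests and parsing it with BeautifulSoup won't work, because the downloaded HTML does not contain the results yet; they are only inserted when a browser runs the page's scripts. One solution is to load the page with Selenium, which drives a real (headless) browser. This example works out of the box on Google Colab: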

!apt update
!apt install chromium-chromedriver
!pip install selenium

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup

options = Options()
web = 'https://discordpy.readthedocs.io/en/latest/search.html?q=test'
path = '/usr/bin/chromedriver'  # set the path of your chromedriver file

# Run Chrome without a visible window (needed on Colab)
options.add_argument('--no-sandbox')
options.add_argument('--window-size=1920,1080')
options.add_argument('--headless')
options.add_argument('--disable-gpu')

driver = webdriver.Chrome(service=Service(path), options=options)
driver.get(web)
html = driver.page_source  # the DOM as rendered so far, including elements added by JavaScript
driver.quit()

soup = BeautifulSoup(html, 'html.parser')

# Collect the text of every <li> element on the rendered page
results = [i.get_text() for i in soup.find_all('li')]

Result:

['API Reference...– The extension name to load. It must be dot separated like\nregular Python imports if accessing a sub-module. e.g.\nfoo.test if you want to import foo/test.py.\n\nRaises\n\nExtensionNotFound – The extension could not be imported.\nExtensionAlrea...', "Commands...two are equivalent:\nfrom discord.ext import commands\n\nbot = commands.Bot(command_prefix='$')\n\n@bot.command()\nasync def test(ctx):\n    pass\n\n# or:\n\n@commands.command()\nasync def test(ctx):\n    pass\n\nbot.add_command(test)\n\n\nSince the Bot.com...", 'Migrating to v0.10.0....channels\nServer.members\n\nSome examples of previously valid behaviour that is now invalid\nif client.servers[0].name == "test":\n    # do something\n\n\nSince they are no longer lists, they no longer support indexing or any operation other than...', "Migrating to v1.0...ow use a File pseudo-namedtuple to upload a single file.\n# before\nawait client.send_file(channel, 'cool.png', filename='testing.png', content='Hello')\n\n# after\nawait channel.send('Hello', file=discord.File('cool.png', 'testing.png'))\n\n\nThis..."]
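Since the goal is to print the search results, it can help to split each string into the linked page title and its context snippet. In the output above, the snippet always begins with `...` right after the title. The following is a minimal sketch, assuming that page titles never contain `...` themselves; `split_result` is a hypothetical helper, not part of BeautifulSoup or Selenium, and the sample list reuses shortened excerpts of the result strings shown above (Python 3.9+).

```python
# Shortened excerpts of the result strings shown above
results = [
    'API Reference...– The extension name to load. It must be dot separated like...',
    'Commands...two are equivalent:\nfrom discord.ext import commands...',
    'Migrating to v0.10.0....channels\nServer.members...',
    'Migrating to v1.0...ow use a File pseudo-namedtuple to upload a single file...',
]


def split_result(item: str) -> tuple[str, str]:
    """Split one result into (page title, context snippet).

    Assumption: the title contains no '...', so everything before the
    first '...' is the title and everything after it is the snippet.
    """
    title, _, snippet = item.partition('...')
    return title.strip(), snippet.strip()


for item in results:
    title, snippet = split_result(item)
    # Show the page title and the first line of its snippet (if any)
    first_line = snippet.splitlines()[0] if snippet else ''
    print(f'{title}: {first_line}')
```

With the real `results` list from the Selenium example, the same loop prints one line per search hit. `str.partition` is used rather than `split` so that a string without `...` still yields a title and an empty snippet instead of raising an error.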
