PycURL and JavaScript


Question

I wrote a Python 3 script that runs a query on a search engine (DuckDuckGo), fetches the HTML source, and writes it to a text file.

import pycurl
from io import BytesIO

buffer = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, 'https://duckduckgo.com/?q=test')
c.setopt(c.WRITEDATA, buffer)
c.setopt(c.FOLLOWLOCATION, True)
c.perform()
c.close()

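# getvalue() returns bytes, so write them unchanged in binary mode
# (str(body) would write the b'...' repr instead of the HTML itself).
body = buffer.getvalue()
with open("output.htm", "wb") as html_file:
    html_file.write(body)
# Assumes the page is UTF-8; undecodable bytes are replaced rather than raising.
print(body.decode('utf-8', errors='replace'))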

That part of the code runs without errors. However, when I open the output.htm file that should contain the search engine's HTML source, I see no results, only a search input with my query filled in. I would like to get the same HTML source that I would get by running curl https://duckduckgo.com/?q=test in my terminal.

Recommended answer

DuckDuckGo's HTML pages use JavaScript to load the search results into their markup, so curl or PycURL cannot retrieve the same HTML you see in a browser: curl and PycURL only fetch resources over the network and do not execute JavaScript.
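
One way to check this on a saved page is to count its script tags and list the text that sits outside them. The sketch below uses only the standard library; the names ScriptCounter and inspect_html are made up for this illustration and are not part of PycURL.

```python
from html.parser import HTMLParser


class ScriptCounter(HTMLParser):
    """Count <script> tags and collect text that appears outside them."""

    def __init__(self):
        super().__init__()
        self.script_count = 0
        self.in_script = False
        self.visible_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script bodies are reported as data too, so skip them explicitly.
        if not self.in_script and data.strip():
            self.visible_text.append(data.strip())


def inspect_html(html):
    """Return (number of <script> tags, list of text chunks outside scripts)."""
    parser = ScriptCounter()
    parser.feed(html)
    parser.close()
    return parser.script_count, parser.visible_text
```

Calling inspect_html on the decoded contents of output.htm gives a rough picture: a page that builds its results in the browser would typically show several scripts and very little text outside them.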

Instead of scraping, use the DuckDuckGo API (https://duckduckgo.com/api) to query search results from their servers.
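
A minimal sketch of that approach with PycURL follows. It rests on several assumptions that the answer does not state: that the JSON endpoint is https://api.duckduckgo.com/, that it accepts the q, format and no_html parameters, and that the reply contains Heading, AbstractText and RelatedTopics fields. The helper names are invented for this example. This API returns instant answers rather than a full list of web results, so it may not cover every query.

```python
import json
from io import BytesIO
from urllib.parse import urlencode

# Assumed endpoint of the DuckDuckGo Instant Answer API (not given in the answer).
API_ENDPOINT = "https://api.duckduckgo.com/"


def build_api_url(query):
    """Return the API URL for a query, asking for a JSON response."""
    params = {"q": query, "format": "json", "no_html": 1}
    return API_ENDPOINT + "?" + urlencode(params)


def fetch(url):
    """Download a URL with PycURL and return the raw response bytes."""
    import pycurl  # imported here so the other helpers work without PycURL

    buffer = BytesIO()
    c = pycurl.Curl()
    c.setopt(c.URL, url)
    c.setopt(c.WRITEDATA, buffer)
    c.setopt(c.FOLLOWLOCATION, True)
    try:
        c.perform()
    finally:
        c.close()
    return buffer.getvalue()


def summarize(raw):
    """Extract heading, abstract and related-topic texts from a JSON reply."""
    data = json.loads(raw.decode("utf-8"))
    # Grouped entries have no "Text" key (assumed reply layout), so skip them.
    topics = [t["Text"] for t in data.get("RelatedTopics", []) if "Text" in t]
    return {
        "heading": data.get("Heading", ""),
        "abstract": data.get("AbstractText", ""),
        "related": topics,
    }


def main():
    """Query the API for 'test' and print what came back."""
    result = summarize(fetch(build_api_url("test")))
    print(result["heading"])
    print(result["abstract"])
    for text in result["related"]:
        print("-", text)
```

Calling main() performs the request. Because the reply is JSON rather than a JavaScript-driven page, no browser or script engine is needed to read it.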
