Extracting and parsing HTML from a secure website with Python?

Views: 232 Published: 2020/11/2 22:18:56 Tags: python, ssl, web, extract


Problem description

Let's dive right in.

OK, I need to write a script. I don't mind which language (I would prefer something like Python or JavaScript, but I will take the time to learn whatever works). The script will access multiple URLs, extract the text from each site, and store it in a folder on my PC. (From there I manipulate the data with Python, which I know how to do.)
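As a rough sketch of that workflow (not the asker's actual script), the loop below fetches each URL and writes the response body into a folder. The names filename_for, fetch, and save_pages, the ".txt" suffix, and the 30-second timeout are all assumptions made for illustration, and the default fetch does no authentication.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def filename_for(url):
    """Hypothetical naming scheme: host plus path, with unsafe characters replaced."""
    parts = urlparse(url)
    stem = (parts.netloc + parts.path).strip("/") or "index"
    safe = "".join(c if c.isalnum() or c in "-_." else "_" for c in stem)
    return safe + ".txt"


def fetch(url):
    """Download a page and return its body as bytes (no authentication)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def save_pages(urls, out_dir, fetcher=fetch):
    """Fetch each URL and write the body into out_dir; return the written paths."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for url in urls:
        target = out_path / filename_for(url)
        target.write_bytes(fetcher(url))
        written.append(target)
    return written
```

Passing the fetcher as a parameter keeps the saving logic independent of how a page is downloaded, so an authenticated fetcher can be swapped in later without touching the loop.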

Currently I am using Python's NLTK module. Here is a simplified version of my code:

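from urllib2 import urlopen  # Python 2; on Python 3 use: from urllib.request import urlopen
import nltk

url = "<URL HERE>"
html = urlopen(url).read()   # download the raw HTML of the page
raw = nltk.clean_html(html)  # strip the markup (NLTK 2.x only; not implemented in NLTK 3)
print(raw)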
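nltk.clean_html is no longer implemented in NLTK 3, so the markup-stripping step needs a replacement there. The sketch below is one possible stand-in using only the standard library's html.parser; TextExtractor and html_to_text are hypothetical names, not part of NLTK, and the output will not match clean_html exactly.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Hypothetical helper: return the text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The function expects a str, so bytes returned by urlopen(...).read() would need to be decoded first.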

This code works fine for both http and https URLs, but not for pages that require authentication.

Is there a Python module that handles secure authentication?
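The question does not say which authentication scheme the sites use. Assuming plain HTTP Basic authentication over https, the standard library's urllib.request can supply credentials, as in the sketch below. The helper names build_basic_auth_opener and fetch_html, the 30-second timeout, and the UTF-8 fallback are assumptions for illustration. Sites that use a login form with session cookies would need a different approach.

```python
import urllib.request


def build_basic_auth_opener(top_level_url, username, password):
    """Hypothetical helper: return an opener that answers HTTP Basic auth challenges."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means: use these credentials for any realm under top_level_url.
    password_mgr.add_password(None, top_level_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


def fetch_html(opener, url, timeout=30):
    """Fetch url through the opener and return the body decoded to str."""
    with opener.open(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Usage would look like opener = build_basic_auth_opener("https://example.com/", "user", "secret") followed by fetch_html(opener, "https://example.com/private/page"), where the URL and credentials are placeholders.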

Thanks in advance for your help! And to the mods who view this as a bad question: please just tell me how to make it better. I need ideas from people, not from Google.
