Python Web Crawlers and "getting" HTML source code

Views: 48 · Published: 2022/1/4 23:18:26 · Tags: python, get, web-crawler

Question

My brother asked me to write a web crawler in Python (I'm self-taught). I know C++, Java, and a bit of HTML. I'm using version 2.7 and reading the Python library documentation, but I have a few problems: 1. httplib.HTTPConnection and the request concept are new to me, and I don't understand whether it downloads an HTML script, like a cookie, or an instance. If you do both of those, do you get the source of a website page? And what are some terms I would need to know to modify the page and return the modified page?

Just for background: I need to download a page and replace every img with ones I have.

It would also be nice if you could tell me your opinion of 2.7 versus 3.1.

Recommended answer

~~Use Python 2.7; more third-party libraries are available for it at the moment.~~ (See below.)

I recommend the stdlib module urllib2, which lets you fetch web resources comfortably. Example:

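# Python 2.7 only: in Python 3, urllib2 became urllib.request.
import urllib2

# Open the URL and read the whole response body (the page's HTML source).
response = urllib2.urlopen("http://google.de")
page_source = response.read()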

To parse the page source, take a look at BeautifulSoup.

BTW, what exactly do you want to do?

Just for background: I need to download a page and replace every img with ones I have.
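As a rough sketch of that img replacement, the block below rewrites the src of every img tag in a page that has already been downloaded. It is written for Python 3 and deliberately uses the stdlib html.parser module instead of BeautifulSoup, so it runs without installing anything; BeautifulSoup would likely make this shorter. The names ImgSrcRewriter and replace_images are made up for illustration, and pointing every image at a single replacement URL is an assumption about what "ones I have" means.

```python
from html import escape
from html.parser import HTMLParser


class ImgSrcRewriter(HTMLParser):
    """Re-emit an HTML document with every <img> src pointing at new_src.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self, new_src):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.new_src = new_src
        self.parts = []

    def _emit_tag(self, tag, attrs):
        if tag != "img":
            # Not an image: copy the tag exactly as it appeared in the source.
            self.parts.append(self.get_starttag_text())
            return
        rebuilt = ["<img"]
        for name, value in attrs:
            if name == "src":
                value = self.new_src
            if value is None:
                # Attribute without a value, e.g. <img hidden>.
                rebuilt.append(" " + name)
            else:
                rebuilt.append(' %s="%s"' % (name, escape(value, quote=True)))
        rebuilt.append(">")
        self.parts.append("".join(rebuilt))

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> or <img ... />.
        self._emit_tag(tag, attrs)

    def handle_endtag(self, tag):
        self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)

    def handle_comment(self, data):
        self.parts.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.parts.append("<!%s>" % decl)


def replace_images(page_source, new_src):
    """Return page_source with each <img> src replaced by new_src."""
    rewriter = ImgSrcRewriter(new_src)
    rewriter.feed(page_source)
    rewriter.close()
    return "".join(rewriter.parts)
```

Calling replace_images(page_source, "my_image.png") on a downloaded page returns the modified HTML as a string. Images without a src attribute are left as they are, and processing instructions are dropped, which is a simplification of this sketch.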

It's 2014 now: most of the important libraries have been ported, and you should definitely use Python 3 if you can. python-requests is a very nice high-level library that is easier to use than urllib2.
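For comparison, here is roughly what the earlier urllib2 fetch might look like on Python 3 using only the standard library, where urllib2's functionality lives in urllib.request. With python-requests installed, the same fetch would be about as short as requests.get(url).text. The function name fetch_page_source, the 10-second timeout, and the UTF-8 fallback are assumptions of this sketch.

```python
from urllib.request import urlopen


def fetch_page_source(url, timeout=10):
    """Return the body of url decoded to text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes in Python 3, not str
        # Use the charset from the Content-Type header when one is declared;
        # falling back to UTF-8 is an assumption of this sketch.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

For example, page_source = fetch_page_source("http://google.de") mirrors the Python 2.7 snippet above, except that the result is already decoded text.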
