Saving content of a webpage using BeautifulSoup


Problem description

I'm trying to scrape a webpage using BeautifulSoup using the code below:

import urllib.request
from bs4 import BeautifulSoup

with urllib.request.urlopen("http://en.wikipedia.org//wiki//Markov_chain.htm") as url:
    s = url.read()

soup = BeautifulSoup(s)

with open("scraped.txt", "w", encoding="utf-8") as f:
    f.write(soup.get_text())

The problem is that it saves the Wikipedia's main page instead of that specific article. Why the address doesn't work and how should I change it?

Solution

The correct URL for that page is http://en.wikipedia.org/wiki/Markov_chain (no doubled slashes, no ".htm" suffix):

>>> import urllib.request
>>> from bs4 import BeautifulSoup
>>> url = "http://en.wikipedia.org/wiki/Markov_chain"
>>> soup = BeautifulSoup(urllib.request.urlopen(url))
>>> soup.title
<title>Markov chain - Wikipedia, the free encyclopedia</title>
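For completeness, here is a minimal sketch of the original script with the corrected URL (keeping the question's output filename, scraped.txt); the explicit "html.parser" argument is optional but avoids BeautifulSoup's "no parser was explicitly specified" warning:

import urllib.request
from bs4 import BeautifulSoup

# Corrected article URL: single slashes and no ".htm" suffix
url = "http://en.wikipedia.org/wiki/Markov_chain"

with urllib.request.urlopen(url) as response:
    html = response.read()

# Parse the downloaded HTML with Python's built-in parser
soup = BeautifulSoup(html, "html.parser")

# Write the plain text of the article to a file
with open("scraped.txt", "w", encoding="utf-8") as f:
    f.write(soup.get_text())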

