Using Python and BeautifulSoup (saved webpage source code in a local file)


Problem description

Could you help me with this problem, please?

(Environment: Python 2.7 + BeautifulSoup 4.3.2)

I am trying to use Python and BeautifulSoup to pick up information from a webpage. Because the page is on a company website that requires login and redirection, I copied the source code of the target page into a file and saved it as "example.html" in C:\ so that I can practice on it.

This is part of the original source:

<tr class="ghj">
    <td><span class="city-sh"><sh src="./citys/1.jpg" alt="boy" title="boy" /></span><a href="./membercity.php?mode=view&amp;u=12563">port_new_cape</a></td>
    <td class="position"><a href="./search.php?id=12563&amp;sr=positions" title="Search positions">452</a></td>
    <td class="details"><div>South</div></td>
    <td>May 09, 1997</td>
    <td>Jan 23, 2009 12:05 pm&nbsp;</td>
</tr>

The code I have worked out so far is:

from bs4 import BeautifulSoup
import re
import urllib2

url = "C:\example.html"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read())

cities = soup.find_all('span', {'class' : 'city-sh'})

for city in cities:
    print city

** This is only the first stage of testing, so it is not yet complete.

However, when I run it, I get an error message. It seems that "urllib2.urlopen" is not the right way to open a local file.

Traceback (most recent call last):
  File "C:\Python27\Testing.py", line 8, in <module>
    page = urllib2.urlopen(url)
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 404, in open
    response = self._open(req, data)
  File "C:\Python27\lib\urllib2.py", line 427, in _open
    'unknown_open', req)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 1247, in unknown_open
    raise URLError('unknown url type: %s' % type)
URLError:
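A likely reading of the traceback above: the error is raised from unknown_open, which is where urllib2 ends up when it sees a URL scheme it has no handler for. A bare Windows path contains a colon after the drive letter, so a URL parser treats the drive letter as the scheme. The sketch below only illustrates that parsing step. The `describe` helper and the `file:///` form of the path are assumptions added for illustration, not part of the original post.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2.7, as in the question


def describe(target):
    """Return the URL scheme that urlparse sees in `target` (hypothetical helper)."""
    return urlparse(target).scheme


windows_path = "C:\\example.html"
file_url = "file:///C:/example.html"  # assumed file:// spelling of the same path

# The drive letter before the colon is parsed as the scheme of the bare path.
print(describe(windows_path))
# The explicit file:// URL carries a scheme that urlopen has a handler for.
print(describe(file_url))
```

For a file that is already on disk, the built-in open() avoids URL handling altogether, so it is usually the simpler choice.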

So could you please show me how I can practice using a local file? Thank you.

Recommended answer

With Chandan's help, the problem is solved. Credit goes to him. :)

"urllib2.urlopen" is not needed here: the page is a local file, so the built-in open() can read it directly.

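from bs4 import BeautifulSoup

# Raw string so the backslash in the Windows path is taken literally.
path = r"C:\example.html"

# A local file is read with the built-in open(); urllib2 is not needed.
with open(path) as page:
    soup = BeautifulSoup(page.read(), "html.parser")

# Find every <span class="city-sh"> element in the saved page.
cities = soup.find_all('span', {'class': 'city-sh'})

for city in cities:
    print city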
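Since the question describes this as only the first stage of testing, here is a minimal sketch of a possible next step: pulling the text fields out of each table row instead of printing the whole span. It parses a copy of the sample row from the question rather than the file on disk. The `parse_rows` function, the dictionary keys, and the wrapping `<table>` element are illustrative assumptions, and the code is written to run on both Python 2.7 and Python 3.

```python
from bs4 import BeautifulSoup

# Sample row copied from the question; a real script would read example.html.
SAMPLE_ROW = """
<table>
<tr class="ghj">
    <td><span class="city-sh"><sh src="./citys/1.jpg" alt="boy" title="boy" /></span><a href="./membercity.php?mode=view&amp;u=12563">port_new_cape</a></td>
    <td class="position"><a href="./search.php?id=12563&amp;sr=positions" title="Search positions">452</a></td>
    <td class="details"><div>South</div></td>
    <td>May 09, 1997</td>
    <td>Jan 23, 2009 12:05 pm&nbsp;</td>
</tr>
</table>
"""


def parse_rows(html):
    """Return one dict per <tr class="ghj"> row (key names are hypothetical)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr", class_="ghj"):
        cells = tr.find_all("td")
        rows.append({
            "city": cells[0].a.get_text(strip=True),    # link text in the first cell
            "position": cells[1].get_text(strip=True),  # text of the "position" cell
            "region": cells[2].get_text(strip=True),    # text of the "details" cell
        })
    return rows


for row in parse_rows(SAMPLE_ROW):
    print(row)
```

Passing the contents of the saved file to `parse_rows` in place of `SAMPLE_ROW` should work the same way, provided the real page uses the same row structure as the excerpt.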
