Python urllib2 + Beautifulsoup
Question
So I'm struggling to integrate BeautifulSoup into my current Python project. To keep this plain and simple, I'll reduce the complexity of my current script.
Script without BeautifulSoup:
import urllib2

def check(self, name, proxy):
    urllib2.install_opener(
        urllib2.build_opener(
            urllib2.ProxyHandler({'http': 'http://%s' % proxy}),
            urllib2.HTTPHandler()
        )
    )
    req = urllib2.Request('http://example.com', "param=1")
    try:
        resp = urllib2.urlopen(req)
    except:
        self.insert()
    if 'example text' in resp.read():
        print 'success'
Now of course the indentation is wrong; this is just a sketch of what I have going on. As you can see, in simple terms I'm sending a POST request to " example.com ", and if example.com contains " example text " in resp.read(), I print success.
But what I actually want is to check
if ' example ' in resp.read():
and then output the text inside the right-aligned td from the example.com response using
soup.find_all('td', {'align':'right'})[4]
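(For reference: find_all returns a list of all matching tags, so the [4] index selects the fifth right-aligned td. A minimal offline sketch, with invented markup standing in for the example.com response:)

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the example.com response body.
html = "<table>" + "".join(
    '<td align="right">cell %d</td>' % i for i in range(6)
) + "</table>"

doc = BeautifulSoup(html, 'html.parser')
cells = doc.find_all('td', {'align': 'right'})  # list of all matching cells
print(cells[4].get_text())                      # fifth match: "cell 4"
```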
Now, the way I'm implementing BeautifulSoup isn't working. Example of this:
import urllib2
from bs4 import BeautifulSoup as soup

main_div = soup.find_all('td', {'align':'right'})[4]

def check(self, name, proxy):
    urllib2.install_opener(
        urllib2.build_opener(
            urllib2.ProxyHandler({'http': 'http://%s' % proxy}),
            urllib2.HTTPHandler()
        )
    )
    req = urllib2.Request('http://example.com', "param=1")
    try:
        resp = urllib2.urlopen(req)
        web_soup = soup(urllib2.urlopen(req), 'html.parser')
    except:
        self.insert()
    if 'example text' in resp.read():
        print 'success' + main_div
Now you see I added four new lines/adjustments:
from bs4 import BeautifulSoup as soup
web_soup = soup(urllib2.urlopen(url), 'html.parser')
main_div = soup.find_all('td', {'align':'right'})[4]
as well as " + main_div " on the print.
However it just doesn't seem to be working. I've had a few errors while adjusting, some of which said "local variable referenced before assignment" and "unbound method find_all must be called with BeautifulSoup instance as first argument".
Answer
Regarding the last code snippet:
from bs4 import BeautifulSoup as soup
web_soup = soup(urllib2.urlopen(url), 'html.parser')
main_div = soup.find_all('td', {'align':'right'})[4]
You should call find_all on the web_soup instance (not on the imported class). Also be sure to define the url variable before you use it:
from bs4 import BeautifulSoup as soup
url = "url to be opened"
web_soup = soup(urllib2.urlopen(url), 'html.parser')
main_div = web_soup.find_all('td', {'align':'right'})[4]
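The same pattern can be sketched without any network access by feeding BeautifulSoup an inline HTML string in place of urllib2.urlopen(url); the table markup below is invented purely for illustration:

```python
from bs4 import BeautifulSoup as soup

# Stand-in for the body that urllib2.urlopen(url) would return.
page = """
<html><body><table>
  <td align="right">a</td><td align="right">b</td>
  <td align="right">c</td><td align="right">d</td>
  <td align="right">e</td>
</table></body></html>
"""

web_soup = soup(page, 'html.parser')                       # parse into an instance
main_div = web_soup.find_all('td', {'align': 'right'})[4]  # call find_all on the instance
print(main_div.get_text())                                 # fifth right-aligned cell: "e"
```

One more caveat from the original snippet: a response object can only be read once, so calling both resp.read() and soup(urllib2.urlopen(req), ...) means opening the URL twice. It is cleaner to read the body into a string once and pass that same string to both the substring check and BeautifulSoup.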
That concludes this article on Python urllib2 + Beautifulsoup; we hope the recommended answer helps, and thank you for supporting IT屋!