Python urllib2 + Beautifulsoup

Problem description

So I'm struggling to implement BeautifulSoup into my current Python project. To keep this plain and simple, I'll reduce the complexity of my current script.

Script without BeautifulSoup -

import urllib2

    def check(self, name, proxy):
        # Route all requests through the supplied proxy
        urllib2.install_opener(
            urllib2.build_opener(
                urllib2.ProxyHandler({'http': 'http://%s' % proxy}),
                urllib2.HTTPHandler()
            )
        )

        # The data argument ("param=1") makes this a POST request
        req = urllib2.Request('http://example.com', "param=1")
        try:
            resp = urllib2.urlopen(req)
        except:
            self.insert()

        if 'example text' in resp.read():
            print 'success'

Now of course the indentation is wrong; this is just a sketch of what I have going on. In simple terms, I'm sending a POST request to "example.com", and if example.com contains "example text" in resp.read(), I print success.

But what I actually want is to check

if ' example ' in resp.read():

and then output the text inside the right-aligned td from the example.com response, using

soup.find_all('td', {'align':'right'})[4]

Now the way I'm implementing BeautifulSoup isn't working. An example of this -

import urllib2
from bs4 import BeautifulSoup as soup   # added

main_div = soup.find_all('td', {'align':'right'})[4]   # added

    def check(self, name, proxy):
        urllib2.install_opener(
            urllib2.build_opener(
                urllib2.ProxyHandler({'http': 'http://%s' % proxy}),
                urllib2.HTTPHandler()
            )
        )

        req = urllib2.Request('http://example.com', "param=1")
        try:
            resp = urllib2.urlopen(req)
            web_soup = soup(urllib2.urlopen(req), 'html.parser')   # added
        except:
            self.insert()

        if 'example text' in resp.read():
            print 'success' + main_div   # adjusted

Now you see I added 4 new lines/adjustments:

from bs4 import BeautifulSoup as soup

web_soup = soup(urllib2.urlopen(url), 'html.parser')

main_div = soup.find_all('td', {'align':'right'})[4]

as well as " + main_div " on the print line.

However, it just doesn't seem to be working. I've had a few errors whilst adjusting, some of which said "local variable referenced before assignment" and "unbound method find_all must be called with BeautifulSoup instance as first argument".

Answer

Regarding the last code snippet:

from bs4 import BeautifulSoup as soup

web_soup = soup(urllib2.urlopen(url), 'html.parser')
main_div = soup.find_all('td', {'align':'right'})[4]

You should call find_all on the web_soup instance. Also be sure to define the url variable before you use it:

from bs4 import BeautifulSoup as soup

url = "url to be opened"
web_soup = soup(urllib2.urlopen(url), 'html.parser')
main_div = web_soup.find_all('td', {'align':'right'})[4]
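
To tie this back into the question's check() method, here is a minimal sketch of how it could fit together. This is only an illustration under the question's own assumptions (example.com, "param=1", the proxy handling and self.insert() are placeholders carried over from the question), and get_text() is used so that a string, rather than the Tag object itself, is concatenated onto 'success':

import urllib2
from bs4 import BeautifulSoup as soup

    def check(self, name, proxy):
        # Route requests through the supplied proxy, as in the question
        urllib2.install_opener(
            urllib2.build_opener(
                urllib2.ProxyHandler({'http': 'http://%s' % proxy}),
                urllib2.HTTPHandler()
            )
        )

        req = urllib2.Request('http://example.com', "param=1")
        try:
            html = urllib2.urlopen(req).read()   # read the body once and reuse it
        except:
            self.insert()
            return

        if 'example text' in html:
            web_soup = soup(html, 'html.parser')
            cells = web_soup.find_all('td', {'align': 'right'})
            if len(cells) > 4:
                # print the text of the fifth right-aligned cell
                print 'success ' + cells[4].get_text()

Returning after self.insert() also avoids the "local variable referenced before assignment" error mentioned in the question, since html is only bound when urlopen() succeeds, and the length check guards against the page having fewer than five matching cells.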
