Elegant way to try/except a series of BeautifulSoup commands?

Question

I'm parsing webpages on a site that displays item data. Each item has about 20 fields that may or may not be present: price, quantity, last purchased, high, low, and so on.

I'm currently using a series of commands: about 20 lines of soup.find('div', {'class': SOME_FIELD_OF_INTEREST}), one for each field of interest. (Some fields are in a div, others in a span, dd, and so on, so a single soup.find_all('div') call won't cover them.)

My question: is there an elegant way to wrap each of these lookups in try/except so that the code stays compact and readable? Right now a sample line looks like this:

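try:
    src = soup.find('div', {'id': 'item-pic'}).img["src"]
except (AttributeError, TypeError, KeyError):
    # find() or .img returned None, or the tag has no src attribute
    src = ""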

I was hoping to combine everything into one line. I don't think try: <line of code> except: <code> is valid syntax on a single line, and I'm not sure how I'd write a function like try_command(soup.find('div', {'id': 'item-pic'}).img["src"]), because the argument would be evaluated (and could raise) before the function is ever called.

I'd love to hear any advice (including "this isn't possible/practical, move on"). :)

After some discussion, I think what I really want to know is what counts as good practice for inline exception handling, and whether that's the right route to take at all.

Recommended answer

Maybe something like this:

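def try_these(start_obj, *args):
    """Apply each step in turn; return None as soon as any step fails."""
    obj = start_obj
    for trythat in args:
        if obj is None:
            return None
        try:
            if isinstance(trythat, str):
                # A plain string means attribute access.
                obj = getattr(obj, trythat)
            else:
                # A (method_name, args_tuple) pair means a method call.
                method, opts = trythat
                obj = getattr(obj, method)(*opts)
        except Exception:
            return None
    return obj


src = try_these(soup,
                ('find', ('div', {'id': 'item-pic'})),
                'img',
                ('get', ('src',)))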

Each step is either a str, which fetches an attribute from the current object, or a tuple of (method name as str, tuple of arguments), which calls a method on it. At the end you get either None or the result. I'm not familiar with soup, so I'm not sure whether get('src') is a good approach (the tag is probably not a dict), but in any case you can easily modify the snippet to accept more than just "call or attr" steps.
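
As one possible way to accept more than "call or attr", the sketch below adds a third kind of step: any callable, which is applied to the current object. This is an illustrative variant, not part of the original answer. The Img, Div, and Page classes are hypothetical stand-ins so the example runs without BeautifulSoup installed; with a real soup object you would pass soup as the first argument instead.

```python
def try_these(start_obj, *steps):
    """Apply steps in order; return None as soon as one fails or yields None."""
    obj = start_obj
    for step in steps:
        if obj is None:
            return None
        try:
            if isinstance(step, str):
                obj = getattr(obj, step)            # attribute access
            elif callable(step):
                obj = step(obj)                     # arbitrary function of the current object
            else:
                method, opts = step                 # (method_name, args_tuple)
                obj = getattr(obj, method)(*opts)
        except Exception:
            return None
    return obj


# Hypothetical stand-ins that mimic the shape of the parsed page.
class Img:
    attrs = {"src": "pic.png"}

    def __getitem__(self, key):
        return self.attrs[key]


class Div:
    img = Img()


class Page:
    def find(self, name, attrs):
        return Div() if attrs.get("id") == "item-pic" else None


page = Page()
found = try_these(page, ("find", ("div", {"id": "item-pic"})), "img",
                  lambda tag: tag["src"])
missing = try_these(page, ("find", ("div", {"id": "no-such-id"})), "img",
                    lambda tag: tag["src"])
print(found, missing)
```

The callable step is what lets indexing such as tag["src"] take part in the chain, since the lambda is only evaluated inside the try block.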

Inspired by your question, I wrote a simple Python module that helps deal with this kind of situation; you can find it here

import silentcrawler

wrapped = silentcrawler.wrap(soup)
# value_ is None if any step in the chain fails
print(wrapped.find('div', {'id': 'item-pic'}).img["src"].value_)

# or
def on_success(value):
    print('found value:', value)

wrapped = silentcrawler.wrap(soup, success=on_success)
# on_success is called only if every step in the chain succeeds
wrapped.find('div', {'id': 'item-pic'}).img["src"].value_

There are more possibilities.
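
To illustrate the general idea behind a "silent" wrapper of this kind, here is a minimal sketch. It is not silentcrawler's actual implementation: the Quiet class is hypothetical, and only the value_ attribute name is borrowed from the usage shown above. Each attribute access, call, or index returns a new wrapper, and any failure collapses the rest of the chain to None.

```python
class Quiet:
    """Wrap a value so that failed attribute access, calls, or indexing yield None."""

    def __init__(self, value):
        self.value_ = value

    def _attempt(self, operation):
        # Once the chain has failed, every later step is a no-op.
        if self.value_ is None:
            return self
        try:
            return Quiet(operation(self.value_))
        except Exception:
            return Quiet(None)

    def __getattr__(self, name):
        # Only reached for names not found normally; guard against recursion
        # if value_ itself is missing, and leave dunder lookups alone.
        if name == "value_" or name.startswith("__"):
            raise AttributeError(name)
        return self._attempt(lambda v: getattr(v, name))

    def __getitem__(self, key):
        return self._attempt(lambda v: v[key])

    def __call__(self, *args, **kwargs):
        return self._attempt(lambda v: v(*args, **kwargs))


# Plain nested dicts stand in for a parsed page so the example is self-contained.
data = {"item": {"img": {"src": "pic.png"}}}
found = Quiet(data)["item"]["img"]["src"].value_
missing = Quiet(data)["item"]["thumbnail"]["src"].value_
upper = Quiet(data)["item"]["img"]["src"].upper().value_
print(found, missing, upper)
```

The trade-off is that a wrapper like this hides every exception, including ones caused by typos, so it is best kept to lookups where a missing field is genuinely expected.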
