Python Logical Operation


Question

I'm fairly new to Python and am working on a web scraping project using the Scrapy library. I'm not using the built-in domain restriction because I want to check whether any links to pages outside the domain are dead. However, I still want to treat pages within the domain differently from those outside it, so I'm trying to determine manually whether a site is within the domain before parsing the response.

Response URL:

http://www.siteSection1.domainName.com

If statement:

if 'domainName.com' and ('siteSection1' or 'siteSection2' or 'siteSection3') in response.url:
    parsePageInDomain()

The condition above is true (the page is parsed) when 'siteSection1' appears first in the chain of or's, but the page is not parsed if the response URL is the same and the if statement is instead:

if 'domainName.com' and ('siteSection2' or 'siteSection1' or 'siteSection3') in response.url:
    parsePageInDomain()

What am I doing wrong here? I haven't been able to think through clearly what the logical operators are doing, and any guidance would be greatly appreciated. Thanks!

Recommended answer

The or operator doesn't work that way. Try any():

if 'domainName.com' in response.url and any(name in response.url for name in ('siteSection1', 'siteSection2', 'siteSection3')):
    parsePageInDomain()

What's going on here is that or does not build a combined test; it returns one of its two operands. x or y returns x if x is truthy (for a string, that means non-empty) and y otherwise. So ('siteSection1' or 'siteSection2' or 'siteSection3') evaluates to 'siteSection1', because a non-empty string is truthy, and the in test then checks only that one string. With 'siteSection2' listed first, the expression evaluates to 'siteSection2', which is not in the URL, so the page is not parsed.
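To make that concrete, here is a small sketch that can be run in a plain Python interpreter. It uses an ordinary string in place of response.url (an assumption for illustration; no Scrapy is needed), and the variable names are made up for this example.

```python
# Stand-in for response.url, copied from the question's example.
url = "http://www.siteSection1.domainName.com"

# `or` returns the first truthy operand, not a boolean and not "all of them".
first = 'siteSection1' or 'siteSection2' or 'siteSection3'
swapped = 'siteSection2' or 'siteSection1' or 'siteSection3'

print(first)    # siteSection1
print(swapped)  # siteSection2

# The `in` test therefore only ever sees one string.
print(first in url)    # True
print(swapped in url)  # False
```

Only the first string in each chain ever reaches the in test, which is why reordering the strings changes the result.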

Moreover, you're also using and to combine your criteria. and returns its first operand if that operand is falsy, or its second operand if the first is truthy. Therefore, if x and y in z does not test whether both x and y are in z. in has higher precedence than and (I had to look that up), so the condition is evaluated as if x and (y in z). Again, 'domainName.com' is a non-empty string and therefore truthy, so the whole condition reduces to just y in z.
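The precedence point can be checked with a URL that is clearly outside the domain. The URL below is hypothetical, chosen only so that the section name matches while the domain does not.

```python
# Hypothetical out-of-domain URL that still contains a section name.
other_url = "http://siteSection1.example.org"

# Parsed as: 'domainName.com' and ('siteSection1' in other_url).
# The domain string is never tested against the URL at all.
unparenthesized = 'domainName.com' and 'siteSection1' in other_url
print(unparenthesized)  # True

# Giving each string its own `in` test checks both conditions.
corrected = 'domainName.com' in other_url and 'siteSection1' in other_url
print(corrected)  # False
```

The first form reports a match for a page outside the domain, which is the opposite of what the question is trying to detect.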

any, by contrast, is a built-in function that takes an iterable and returns True if any of its elements is truthy, and False otherwise. It stops as soon as it hits a truthy value, so it does no unnecessary work. I'm using a generator expression to have it check each of your three possible strings in turn and see whether any of them is in your response URL.
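As a rough sketch of how this could be packaged, the check can be wrapped in a small helper. The names is_in_domain, SECTIONS and contains are invented for this example and are not part of Scrapy or of the original answer; the recording list is there only to show where any stops.

```python
SECTIONS = ('siteSection1', 'siteSection2', 'siteSection3')

def is_in_domain(url, domain='domainName.com', sections=SECTIONS):
    """Return True if url contains the domain and at least one section name."""
    return domain in url and any(name in url for name in sections)

print(is_in_domain("http://www.siteSection1.domainName.com"))  # True
print(is_in_domain("http://www.siteSection1.example.org"))     # False

# Record which names are actually tested, to show that any() short-circuits.
checked = []

def contains(name, url):
    checked.append(name)
    return name in url

url = "http://www.siteSection2.domainName.com"
result = any(contains(name, url) for name in SECTIONS)
print(result)   # True
print(checked)  # ['siteSection1', 'siteSection2']
```

'siteSection3' is never tested, because any returns as soon as 'siteSection2' matches. In the spider, the helper would be called as is_in_domain(response.url).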
