Python Logical Operation

Problem description

I'm pretty new to Python and I'm working on a web scraping project using the Scrapy library. I'm not using the built-in domain restriction because I want to check whether any of the links to pages outside the domain are dead. However, I still want to treat pages within the domain differently from those outside it, and I am trying to manually determine whether a site is within the domain before parsing the response.

Response URL:

http://www.siteSection1.domainName.com

If statement:

if 'domainName.com' and ('siteSection1' or 'siteSection2' or 'siteSection3') in response.url:
    parsePageInDomain()

The above condition is true (the page is parsed) if 'siteSection1' is the first name in the list of or's, but the page is not parsed for the same response URL if the if statement is written as follows:

if 'domainName.com' and ('siteSection2' or 'siteSection1' or 'siteSection3') in response.url:
    parsePageInDomain()

What am I doing wrong here? I haven't been able to think through very clearly what is going on with the logical operators, and any guidance would be greatly appreciated. Thanks!

Recommended answer

The or operator doesn't work that way. Try any() instead:

if 'domainName.com' in response.url and any(name in response.url for name in ('siteSection1', 'siteSection2', 'siteSection3')):
    parsePageInDomain()

What's going on here is that or does not produce a plain boolean; it returns one of its operands. x or y returns x if x evaluates to True, which for a string means it's not empty, and returns y otherwise. So ('siteSection1' or 'siteSection2' or 'siteSection3') evaluates to 'siteSection1', because a non-empty string is True when considered as a boolean. The in test therefore only ever checks the first name in the list, which is why reordering the names changes the result.
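As a minimal sketch of that behavior, the snippet below evaluates the two parenthesized expressions from the question on their own. The variable names (url, first, second) are made up for illustration, and url simply reuses the response URL given in the question.

```python
# Response URL from the question (variable name is illustrative).
url = "http://www.siteSection1.domainName.com"

# `or` returns the first truthy operand, not a boolean.
first = 'siteSection1' or 'siteSection2' or 'siteSection3'
second = 'siteSection2' or 'siteSection1' or 'siteSection3'

print(first)   # siteSection1
print(second)  # siteSection2

# Only that single string is ever tested against the URL.
print(first in url)   # True
print(second in url)  # False
```

The other two names are never looked at, because or stops at the first truthy operand.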

Moreover, you're also using and to combine your criteria. and returns its first argument if that argument evaluates to False, or its second if the first argument evaluates to True. Therefore, if x and y in z does not test whether both x and y are in z. in has higher precedence than and (I had to look that up), so the condition is parsed as if x and (y in z). Again, 'domainName.com' is a non-empty string and evaluates as True, so the whole expression reduces to just y in z.
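To illustrate the precedence point, here is a small sketch. The URL under example.org is a made-up value chosen so that the section name is present but the domain is not; the variable names are illustrative only.

```python
# Hypothetical URL: contains the section name but NOT the domain.
other_url = "http://www.siteSection1.example.org"

# As written in the question's style: `in` binds tighter than `and`.
as_written = 'domainName.com' and 'siteSection1' in other_url
explicit = 'domainName.com' and ('siteSection1' in other_url)

# Both forms are the same expression; the domain string is never
# searched for, it is only tested for truthiness (non-empty -> True).
print(as_written, explicit)  # True True

# The test that was presumably intended checks both substrings.
intended = 'domainName.com' in other_url and 'siteSection1' in other_url
print(intended)  # False
```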

any, by contrast, is a built-in function that takes an iterable and returns True if any of its elements evaluates to True, and False otherwise. It stops as soon as it hits a True value, so it's efficient. I'm using a generator expression to have it check each of your three possible strings in turn, to see whether any of them is in your response URL.
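One way to package the corrected check is a small helper, sketched below. The function name is_in_domain and its default arguments are assumptions made for this example; they are not part of Scrapy or of the original code. The test URLs are also invented.

```python
def is_in_domain(url,
                 domain='domainName.com',
                 sections=('siteSection1', 'siteSection2', 'siteSection3')):
    """Return True if url contains the domain and at least one section name.

    Hypothetical helper: plain, case-sensitive substring checks,
    mirroring the condition in the answer.
    """
    return domain in url and any(name in url for name in sections)


# The order of the section names no longer matters.
print(is_in_domain("http://www.siteSection1.domainName.com"))  # True
print(is_in_domain("http://www.siteSection2.domainName.com"))  # True
print(is_in_domain("http://www.siteSection2.example.org"))     # False
print(is_in_domain("http://www.domainName.com"))               # False
```

In a spider callback this would be called as is_in_domain(response.url) before deciding whether to call parsePageInDomain(). Because these are substring checks, a URL that merely mentions the domain in its path or query string would also match; parsing the host with urllib.parse would be stricter, if that matters for your crawl.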
