Python Logical Operation
Question
I'm pretty new to Python and I'm working on a web scraping project using the Scrapy library. I'm not using the built-in domain restriction because I want to check whether any links to pages outside the domain are dead. However, I still want to treat pages within the domain differently from those outside it, and I am trying to manually determine whether a site is within the domain before parsing the response.
Response URL:
http://www.siteSection1.domainName.com
If statement:
if 'domainName.com' and ('siteSection1' or 'siteSection2' or 'siteSection3') in response.url:
    parsePageInDomain()
The above statement is true (the page is parsed) if 'siteSection1' is the first name in the chain of or's, but the page is not parsed for the same response URL when the if statement is written as follows:
if 'domainName.com' and ('siteSection2' or 'siteSection1' or 'siteSection3') in response.url:
    parsePageInDomain()
What am I doing wrong here? I haven't been able to think through what is going on with the logical operators very clearly and any guidance would be greatly appreciated. Thanks!
Recommended answer
The or operator doesn't work that way. Try any():
if 'domainName.com' in response.url and any(name in response.url for name in ('siteSection1', 'siteSection2', 'siteSection3')):
What's going on here is that the or operator returns a logical or of its two arguments: x or y returns x if x evaluates to True, which for a string means it's not empty, or y if x does not evaluate to True. So ('siteSection1' or 'siteSection2' or 'siteSection3') evaluates to 'siteSection1' because 'siteSection1' is True when considered as a boolean.
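As a rough illustration of the point above, the short sketch below evaluates the two parenthesized expressions from the question on their own. The variable names (url, first, second, skipped) are made up for this example, and url simply reuses the response URL given in the question.

```python
# URL copied from the question; the variable names are illustrative only.
url = 'http://www.siteSection1.domainName.com'

# `or` returns the first truthy operand, not an "any of these" value.
first = ('siteSection1' or 'siteSection2' or 'siteSection3')
second = ('siteSection2' or 'siteSection1' or 'siteSection3')

print(first)          # siteSection1
print(second)         # siteSection2

# Only the single string that `or` returned is tested against the URL.
print(first in url)   # True
print(second in url)  # False

# An empty string is falsy, so `or` skips it and returns the next operand.
skipped = ('' or 'siteSection3')
print(skipped)        # siteSection3
```

This is why reordering the names changes the outcome: only the first non-empty string ever reaches the in test.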
Moreover, you're also using the and operator to combine your criteria. It returns its first argument if that argument evaluates to False, or its second if the first argument evaluates to True. Therefore, if x and y in z does not test whether both x and y are in z. The in operator has higher precedence than and (I had to look that up), so the statement tests if x and (y in z). Again, 'domainName.com' evaluates as True, so this returns just the result of y in z.
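To make the precedence point concrete, here is a small sketch. The string 'notInUrl.example' is a made-up placeholder that deliberately does not occur in the URL, and the variable names are likewise hypothetical.

```python
# URL copied from the question; 'notInUrl.example' is a made-up string
# chosen because it does NOT appear in the URL.
url = 'http://www.siteSection1.domainName.com'

# `and` returns its first operand if it is falsy, otherwise its second.
and_falsy = '' and 'b'
and_truthy = 'a' and 'b'
print(repr(and_falsy))   # ''
print(repr(and_truthy))  # 'b'

# `in` binds tighter than `and`, so these two expressions are equivalent.
implicit = 'notInUrl.example' and 'siteSection1' in url
explicit = 'notInUrl.example' and ('siteSection1' in url)
print(implicit, explicit)  # True True

# Testing both substrings needs an `in` check on each side of `and`.
both = 'notInUrl.example' in url and 'siteSection1' in url
print(both)  # False
```

The first operand in implicit is never compared with the URL at all; it only acts as a non-empty (truthy) string.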
any, by contrast, is a built-in function that takes an iterable and returns True or False: True if any of its elements evaluates to True, False otherwise. It stops as soon as it hits a True value, so it's efficient. I'm using a generator expression to have it check each of your three possible strings in turn and see whether any of them is in your response URL.
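One possible way to package the check is sketched below. The helper names in_domain and contains, the checked list, and the example URLs are assumptions made for illustration; they are not part of Scrapy or of the original code. The contains helper exists only to record which names were actually tested, so the early exit of any becomes visible.

```python
def in_domain(url, domain='domainName.com',
              sections=('siteSection1', 'siteSection2', 'siteSection3')):
    """Return True if url contains the domain and at least one section name."""
    return domain in url and any(name in url for name in sections)


# Hypothetical helper that records every name it is asked about.
checked = []

def contains(name, url):
    checked.append(name)
    return name in url


url = 'http://www.siteSection2.domainName.com'
names = ('siteSection1', 'siteSection2', 'siteSection3')

# The generator is consumed lazily, so any() stops at the first True.
found = any(contains(name, url) for name in names)

print(found)    # True
print(checked)  # ['siteSection1', 'siteSection2']
```

Because the match is found on the second name, 'siteSection3' is never tested. Note that this is plain substring matching, as in the original condition, so it inherits the same looseness about where in the URL the names appear.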