Check if string in list, depending on last two characters
Problem description
Setup
I am using Scrapy to scrape housing ads. For each ad I retrieve a postal code, which consists of four digits followed by two letters, e.g. 1053ZM.
I have an Excel sheet that links districts to postal codes in the following way:
district  postcode_min  postcode_max
A         1011AB        1011BD
A         1011BG        1011CE
A         1011CH        1011CZ
So the second row of the sheet (the first data row) states that the postcodes 1011AB, 1011AC, ..., 1011AZ, 1011BA, ..., 1011BD belong to district A.
The actual list contains 1214 rows.
Problem
I'd like to match each ad with its district, using the ad's postal code and the list.
I am not sure what the best way to do this is, or how to implement it.
I've come up with two different approaches:
- Create all postcodes between postcode_min and postcode_max, assign all postcodes and their respective districts to a dictionary, and then match using a loop.
i.e. create
d = {'A': ['1011AB','1011AC',...,'1011BD',
           '1011BG','1011BH',...,'1011CE',
           '1011CH','1011CI',...,'1011CZ'],
     'B': [...],
     }
and then
found = False
for distr in d.keys():  # loop over districts
    for code in d[distr]:  # loop over district's postal codes
        if postal_code in code:  # assign if ad's postal code in code
            district = distr
            found = True
            break
        else:
            district = 'unknown'
    if found:
        break
- Make Python understand that there is a range between postcode_min and postcode_max, assign the ranges and their respective districts to a dictionary, and match using a loop.
i.e. something like
d = {'A': [range(1011AB,1011BD), range(1011BG,1011CE), range(1011CH,1011CZ)],
     'B': [...]
     }
and then
found = False
for distr in d.keys():  # loop over districts
    for range in d[distr]:  # loop over district's ranges
        if postal_code in range:  # assign if ad's postal code in range
            district = distr
            found = True
            break
        else:
            district = 'unknown'
    if found:
        break
Questions
For approach 1:
- How do I create all the postcodes and assign them to a dictionary?
For approach 2:
I used range() for explanatory purposes, but I know range() does not work like this.
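- What do I need in order to get something that effectively behaves like the range() in the example above?
- How do I loop over these ranges correctly?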
I think my preference lies with approach 2, but I am happy to work with either one, or with another solution if you have one.
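For reference, the expansion asked about under approach 1 could be sketched roughly as follows. This is only an illustration: the names `expand_postcodes`, `rows`, and `postcode_to_district` are hypothetical, and the sketch assumes that both ends of a range share the same four-digit prefix, as in the sample rows, and that the letters are uppercase A to Z.

```python
from itertools import product
from string import ascii_uppercase


def expand_postcodes(postcode_min, postcode_max):
    """Return every postcode from postcode_min to postcode_max, inclusive.

    Assumption: both codes share the same four-digit prefix.
    """
    digits = postcode_min[:4]
    if postcode_max[:4] != digits:
        raise ValueError("range spans more than one four-digit prefix")
    # All two-letter suffixes in alphabetical order: AA, AB, ..., ZZ
    suffixes = ("".join(pair) for pair in product(ascii_uppercase, repeat=2))
    return [digits + s for s in suffixes
            if postcode_min[4:] <= s <= postcode_max[4:]]


# Sample rows copied from the sheet: (district, postcode_min, postcode_max)
rows = [
    ("A", "1011AB", "1011BD"),
    ("A", "1011BG", "1011CE"),
    ("A", "1011CH", "1011CZ"),
]

# Map each individual postcode to its district, so no loop is needed at lookup time
postcode_to_district = {
    code: district
    for district, lo, hi in rows
    for code in expand_postcodes(lo, hi)
}

print(postcode_to_district.get("1011AC", "unknown"))  # A
print(postcode_to_district.get("1011BE", "unknown"))  # unknown
```

Keying the dictionary by postcode rather than by district turns each lookup into a single `dict.get` call, at the cost of storing every individual postcode.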
Recommended answer
You can simply collect the values from the Excel sheet like this:
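d = {'A': ['1011AB', '1011BD', '1011BG', '1011CE', '1011CH', '1011CZ'],
     'B': ['1061WB', '1061WB'],
     }

def is_in_postcode_range(current_postcode, min_code, max_code):
    # Fixed-width codes (four digits + two uppercase letters) compare correctly as plain strings
    return min_code <= current_postcode <= max_code

def get_district_by_post_code(postcode):
    for district, codes in d.items():
        # Quick check against the district's overall span (assumes codes are sorted)
        if is_in_postcode_range(postcode, codes[0], codes[-1]):
            # codes holds (min, max) pairs back to back, so step through it two at a time
            if any(is_in_postcode_range(postcode, codes[i], codes[i + 1])
                   for i in range(0, len(codes), 2)):
                return district
    # Only give up after every district has been checked
    return None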
Usage:
print(get_district_by_post_code('1011AC'))  # A
print(get_district_by_post_code('1011BE'))  # None
print(get_district_by_post_code('1061WB'))  # B
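With 1214 rows, a variant worth considering is to keep the rows as they appear in the sheet, sort them by postcode_min, and use binary search from the standard-library `bisect` module instead of looping over every district. The following is a minimal sketch, not part of the answer above; the names `rows`, `range_starts`, and `find_district` are hypothetical, and it assumes that the ranges do not overlap and that postcodes are already uppercase.

```python
from bisect import bisect_right

# Rows as they appear in the sheet: (postcode_min, postcode_max, district)
rows = [
    ("1011AB", "1011BD", "A"),
    ("1011BG", "1011CE", "A"),
    ("1011CH", "1011CZ", "A"),
    ("1061WB", "1061WB", "B"),
]
rows.sort()  # tuples sort by their first element, i.e. postcode_min
range_starts = [row[0] for row in rows]


def find_district(postcode, default="unknown"):
    """Return the district whose range contains postcode, else default."""
    # Index of the last range whose start is <= postcode
    i = bisect_right(range_starts, postcode) - 1
    if i >= 0:
        lo, hi, district = rows[i]
        if lo <= postcode <= hi:
            return district
    return default


print(find_district("1011AC"))  # A
print(find_district("1011BE"))  # unknown
print(find_district("1061WB"))  # B
```

Each lookup then takes a number of comparisons that grows with the logarithm of the row count, and the rows can be loaded from the sheet without regrouping them by district.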