Check if string in list, depending on last two characters


Question

Setup

I am using Scrapy to scrape housing ads. For each ad I retrieve a postal code, which consists of four digits followed by two letters, e.g. 1053ZM.

I have an Excel sheet linking districts to postal codes in the following way:

district    postcode_min    postcode_max
   A           1011AB           1011BD
   A           1011BG           1011CE
   A           1011CH           1011CZ

So the second row of the sheet (the first data row) states that the postcodes 1011AB, 1011AC, ..., 1011AZ, 1011BA, ..., 1011BD belong to district A.

The actual list contains 1214 rows.


Problem

I'd like to match each ad with its respective district, using its postal code and the list.

I am not sure what the best way to do this is, or how to implement it.

I've come up with two different approaches:

  1. Create all postcodes between postcode_min and postcode_max, assign all postcodes and their respective districts to a dictionary, and then match using a loop.

I.e., create

 d = {'A': ['1011AB','1011AC',...,'1011BD',
            '1011BG','1011BH',...,'1011CE',
            '1011CH','1011CI',...,'1011CZ'],
      'B': [...],           
      }

and then

found = False
for distr in d.keys(): # loop over districts
     for code in d[distr]: # loop over district's postal codes
         if postal_code in code: # assign if ad's postal code in code                 
             district = distr
             found = True
             break
         else:
             district = 'unknown'
     if found:
         break

  2. Make Python understand that there is a range between postcode_min and postcode_max, assign the ranges and their respective districts to a dictionary, and match using a loop.

I.e., something like

d = {'A': [range(1011AB,1011BD), range(1011BG,1011CE),range(1011CH,1011CZ)],
     'B': [...]
    }

and then

found = False
for distr in d.keys(): # loop over districts
     for range in d[distr]: # loop over district's ranges
         if postal_code in range: # assign if ad's postal code in range                 
             district = distr
             found = True
             break
         else:
             district = 'unknown'
     if found:
         break


Questions

For approach 1:

  • How do I create all the postcodes and assign them to a dictionary?
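One possible way to enumerate the codes for approach 1 is sketched below. It assumes every postcode is exactly four digits followed by two uppercase letters, so plain string comparison orders the codes correctly. The helper name `expand_postcodes`, the `rows` list, and the `postcode_to_district` dictionary are hypothetical names chosen for illustration. Note that the dictionary here is keyed by postcode rather than by district, which differs from the layout shown earlier but turns the matching loop into a single lookup.

```python
from itertools import product
from string import ascii_uppercase

# All two-letter suffixes in sorted order: 'AA', 'AB', ..., 'ZZ'.
SUFFIXES = [a + b for a, b in product(ascii_uppercase, repeat=2)]


def expand_postcodes(postcode_min, postcode_max):
    """Return every postcode from postcode_min to postcode_max, inclusive.

    Assumes both codes are four digits plus two uppercase letters.
    """
    codes = []
    for number in range(int(postcode_min[:4]), int(postcode_max[:4]) + 1):
        for suffix in SUFFIXES:
            code = f"{number:04d}{suffix}"
            # Fixed-width codes compare correctly as plain strings.
            if postcode_min <= code <= postcode_max:
                codes.append(code)
    return codes


# Hypothetical rows, copied from the sample table in the question.
rows = [
    ("A", "1011AB", "1011BD"),
    ("A", "1011BG", "1011CE"),
    ("A", "1011CH", "1011CZ"),
]

# Map each postcode straight to its district (an assumption: ranges do not overlap).
postcode_to_district = {
    code: district
    for district, low, high in rows
    for code in expand_postcodes(low, high)
}

# Postcodes that fall in no range get the fallback value.
district = postcode_to_district.get("1011AC", "unknown")
```

With this layout, `postcode_to_district.get(postal_code, "unknown")` replaces the nested loops, at the cost of storing every individual postcode in memory.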

For approach 2:

I used range() for explanatory purposes, but I know range() does not work like this.

  • What do I need in order to effectively get a range() like in the example above?
  • How do I correctly loop over these ranges?

I think my preference lies with approach 2, but I am happy to work with either one, or with another solution if you have one.

Recommended answer

You can just collect the values from the Excel sheet like this:

# Each district maps to flattened (min, max) pairs: [min1, max1, min2, max2, ...]
d = {'A': ['1011AB', '1011BD', '1011BG', '1011CE', '1011CH', '1011CZ'],
     'B': ['1061WB', '1061WB'],
     }

def is_in_postcode_range(current_postcode, low, high):
    # Fixed-width codes (four digits + two letters) compare correctly as strings.
    return low <= current_postcode <= high

def get_district_by_post_code(postcode):
    for district, codes in d.items():
        # Quick pre-check against the district's overall span (assumes sorted pairs).
        if is_in_postcode_range(postcode, codes[0], codes[-1]):
            if any(is_in_postcode_range(postcode, codes[i], codes[i + 1])
                   for i in range(0, len(codes), 2)):
                return district
    # Only give up after every district has been checked.
    return None

Usage:

print(get_district_by_post_code('1011AC'))  # A
print(get_district_by_post_code('1011BE'))  # None
print(get_district_by_post_code('1061WB'))  # B
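With 1214 rows, a sorted list plus binary search may be a reasonable alternative to scanning every district for each ad. The sketch below is not part of the original answer. It uses the standard-library `bisect` module and assumes the ranges do not overlap and that all codes share the same fixed-width format. The names `ranges`, `range_starts`, and `find_district` are hypothetical, and the rows are sample values rather than the real sheet.

```python
from bisect import bisect_right

# Hypothetical (postcode_min, postcode_max, district) rows; in practice these
# would be read from the Excel sheet. Sorting orders them by postcode_min.
ranges = sorted([
    ("1011AB", "1011BD", "A"),
    ("1011BG", "1011CE", "A"),
    ("1011CH", "1011CZ", "A"),
    ("1061WB", "1061WB", "B"),
])
range_starts = [low for low, _, _ in ranges]


def find_district(postcode):
    """Return the district whose range contains postcode, else None."""
    # Index of the last range that starts at or before the postcode.
    i = bisect_right(range_starts, postcode) - 1
    if i >= 0:
        low, high, district = ranges[i]
        # The range starts early enough; check that it also ends late enough.
        if postcode <= high:
            return district
    return None
```

Each lookup then costs one binary search over the range starts instead of a pass over all ranges.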
