How to retrieve partial matches from a list of strings

Question

For approaches to retrieving partial matches in a numeric list, go to:

Python: Find in list

But if you're looking for how to retrieve partial matches for a list of strings, you'll find the best approaches concisely explained in the answer below.

SO: Python list lookup with partial match shows how to return a bool if a list contains an element that partially matches (e.g. begins with, ends with, or contains) a certain string. But how can you return the element itself, instead of True or False?

l = ['ones', 'twos', 'threes']
wanted = 'three'

Here, the approach in the linked question will return True using:

any(s.startswith(wanted) for s in l)

So how can you return the element 'threes' instead?

Recommended answer

  • startswith and in both return a Boolean.
  • The in operator is a test of membership; for two strings, wanted in x is True when wanted is a substring of x.
  • To return the matching elements rather than a Boolean, use either test as the condition of a list-comprehension or filter.
  • Using a list-comprehension with in is the fastest implementation tested.
  • If case is not an issue, consider mapping all the words to lowercase.
    • l = list(map(str.lower, l))

  • Using filter creates a filter object, so list() is used to show all the matching values in a list.
l = ['ones', 'twos', 'threes']
wanted = 'three'

# using startswith
result = list(filter(lambda x: x.startswith(wanted), l))

# using in
result = list(filter(lambda x: wanted in x, l))

print(result)
[out]:
['threes']
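
Because filter returns a lazy iterator, a single match can also be taken with next() instead of building the whole list. The following is a minimal sketch and not part of the original answer; the extra list entry and the None default (returned when nothing matches) are assumptions made for illustration.

```python
l = ['ones', 'twos', 'threes', 'threescore']
wanted = 'three'

# filter() is lazy: next() consumes items only until the first match is found.
# The second argument to next() is returned when there is no match at all.
first_match = next(filter(lambda x: x.startswith(wanted), l), None)
print(first_match)  # threes

no_match = next(filter(lambda x: x.startswith('four'), l), None)
print(no_match)  # None
```

This only helps when one element is wanted; to collect every match, list(filter(...)) as shown above is still needed.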
          

List comprehension

l = ['ones', 'twos', 'threes']
wanted = 'three'

# using startswith
result = [v for v in l if v.startswith(wanted)]

# using in
result = [v for v in l if wanted in v]

print(result)
[out]:
['threes']
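
The three kinds of partial match mentioned in the question (begins with, ends with, contains) and the lowercase suggestion can be combined in one small function. This is only a sketch: partial_matches, its mode names, and the ignore_case flag are hypothetical names introduced here, not something from the original answer. Unlike mapping the whole list to lowercase, it compares lowercased copies and returns the strings in their original case.

```python
def partial_matches(items, wanted, mode='contains', ignore_case=False):
    """Return the elements of items that partially match wanted.

    Hypothetical helper for illustration; mode is one of
    'startswith', 'endswith' or 'contains'.
    """
    tests = {
        'startswith': str.startswith,
        'endswith': str.endswith,
        'contains': lambda s, sub: sub in s,
    }
    test = tests[mode]  # raises KeyError for an unknown mode
    if ignore_case:
        wanted = wanted.lower()
        # Compare lowercased copies, but return the original strings.
        return [s for s in items if test(s.lower(), wanted)]
    return [s for s in items if test(s, wanted)]


l = ['ones', 'Twos', 'Threes']
print(partial_matches(l, 'three', 'startswith', ignore_case=True))  # ['Threes']
print(partial_matches(l, 'os', 'endswith'))  # ['Twos']
print(partial_matches(l, 'one'))  # ['ones']
```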
          

Which implementation is faster?

  • Using the words corpus from nltk
  • Words with 'three':
    • ['three', 'threefold', 'threefolded', 'threefoldedness', 'threefoldly', 'threefoldness', 'threeling', 'threeness', 'threepence', 'threepenny', 'threepennyworth', 'threescore', 'threesome']

from nltk.corpus import words
                  
%timeit list(filter(lambda x: x.startswith(wanted), words.words()))
[out]:
47.4 ms ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit list(filter(lambda x: wanted in x, words.words()))
[out]:
27 ms ± 1.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit [v for v in words.words() if v.startswith(wanted)]
[out]:
34.1 ms ± 768 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit [v for v in words.words() if wanted in v]
[out]:
14.5 ms ± 63.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
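
%timeit is an IPython magic, so the measurements above only run in IPython or Jupyter. A rough way to repeat the comparison in a plain script is the standard-library timeit module. The sketch below uses a synthetic word list as a stand-in for the nltk corpus (an assumption, so no download is needed), which means its numbers will not match the figures above and may differ between machines.

```python
import timeit

# Synthetic stand-in for the nltk corpus (an assumption): made-up words,
# plus a few entries that contain 'three'.
word_list = [f'word{i}' for i in range(20_000)] + ['three', 'threefold', 'threesome']
wanted = 'three'

candidates = {
    'filter + startswith': lambda: list(filter(lambda x: x.startswith(wanted), word_list)),
    'filter + in': lambda: list(filter(lambda x: wanted in x, word_list)),
    'comprehension + startswith': lambda: [v for v in word_list if v.startswith(wanted)],
    'comprehension + in': lambda: [v for v in word_list if wanted in v],
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats of 5 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(func, number=5, repeat=3)) / 5
    timings[name] = best * 1000
    print(f'{name}: {timings[name]:.2f} ms per call')
```

Taking the minimum over the repeats reduces noise from other processes; the relative ordering is more informative than the absolute values.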
                  
