A case insensitive string class in Python


Question

I need to perform case-insensitive string comparisons in Python, in sets and as dictionary keys. Creating set and dict subclasses that are case insensitive proves surprisingly tricky (see: Case insensitive dictionary for ideas, and note that they all use lower; there is even a rejected PEP, although its scope is a bit broader). So I went with creating a case-insensitive string class (leveraging this answer by @AlexMartelli):

class CIstr(unicode):
    """Case insensitive with respect to hashes and comparisons string class"""

    #--Hash/Compare
    def __hash__(self):
        return hash(self.lower())
    def __eq__(self, other):
        if isinstance(other, basestring):
            return self.lower() == other.lower()
        return NotImplemented
    def __ne__(self, other): return not (self == other)
    def __lt__(self, other):
        if isinstance(other, basestring):
            return self.lower() < other.lower()
        return NotImplemented
    def __ge__(self, other): return not (self < other)
    def __gt__(self, other):
        if isinstance(other, basestring):
            return self.lower() > other.lower()
        return NotImplemented
    def __le__(self, other): return not (self > other)

I am fully aware that lower is not really enough to cover all cases of string comparison in Unicode, but I am refactoring existing code that used a much clunkier class for string comparisons (in both memory and speed) and that used lower() anyway, so I can amend this at a later stage. I am also on Python 2 (as the use of unicode shows). My questions are:


  • Did I get the operators right?

  • Is this class enough for my purposes, given that I take care to construct dict keys and set elements as CIstr instances? My purposes are checking equality, containment, set differences and similar operations in a case-insensitive way. Or am I missing something?

  • Is it worth caching the lower-case version of the string (as seen, for instance, in this ancient Python recipe: Case Insensitive Strings)? This comment suggests not, and I want construction to be as fast as possible and size as small as possible, but people seem to include it.

Python 3 compatibility tips are appreciated!

Tiny demo:

d = {CIstr('A'): 1, CIstr('B'): 2}
print 'a' in d # True
s = set(d)
print {'a'} - s # set([])


Recommended answer

In your demo you are using 'a' to look things up in your set. It wouldn't work if you tried to use 'A', because 'A' has a different hash. Likewise, 'A' in d.keys() would be true, but 'A' in d would be false. You've essentially created a type that violates the normal contract of all hashes, by claiming to be equal to objects that have different hashes.
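The mismatch can be reproduced with a small sketch. It assumes a Python 3 port of the question's class (str in place of unicode and basestring, ordering methods left out), which is not part of the original post. It also uses list(d) rather than d.keys(), because in Python 3 d.keys() is a view that does hash-based lookup just like the dict itself.

```python
class CIstr(str):
    """Python 3 port (an assumption) of the question's class; ordering methods omitted."""

    def __hash__(self):
        return hash(self.lower())

    def __eq__(self, other):
        if isinstance(other, str):
            return self.lower() == other.lower()
        return NotImplemented

    def __ne__(self, other):
        return not (self == other)


d = {CIstr('A'): 1, CIstr('B'): 2}

print(hash(CIstr('A')) == hash('a'))  # True: CIstr hashes its lower-cased form
print('a' in d)        # True: same hash as the stored key, then __eq__ matches
print('A' in d)        # False: plain 'A' hashes differently, so __eq__ is never consulted
print('A' in list(d))  # True: list membership only uses ==
```

The lookup with a plain 'a' only succeeds because it happens to be lower case already; any plain string containing an upper-case letter misses the stored key.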

You could combine this answer with the answers about creating specialised dicts, and have a dict that converts any possible key into CIstr before trying to look it up. Then all your CIstr conversions could be hidden away inside the dictionary class.

For example:

class CaseInsensitiveDict(dict):
    def __setitem__(self, key, value):
        super(CaseInsensitiveDict, self).__setitem__(convert_to_cistr(key), value)
    def __getitem__(self, key):
        return super(CaseInsensitiveDict, self).__getitem__(convert_to_cistr(key))
    # __init__, __contains__ etc.

(Based on https://stackoverflow.com/a/2082169/3890632)
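A fuller version of that idea might look like the following minimal sketch, written for Python 3. The body of convert_to_cistr and the extra methods (__init__, __contains__, __delitem__, get, update) are assumptions filled in for illustration, not code from the linked answer, and the CIstr here is a reduced Python 3 port of the question's class.

```python
class CIstr(str):
    """Reduced Python 3 port (an assumption) of the question's class."""

    def __hash__(self):
        return hash(self.lower())

    def __eq__(self, other):
        if isinstance(other, str):
            return self.lower() == other.lower()
        return NotImplemented

    def __ne__(self, other):
        return not (self == other)


def convert_to_cistr(key):
    """Hypothetical helper: wrap plain strings, pass every other key through."""
    if isinstance(key, str) and not isinstance(key, CIstr):
        return CIstr(key)
    return key


class CaseInsensitiveDict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__()
        # Route initial items through __setitem__ so their keys are converted too.
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        super().__setitem__(convert_to_cistr(key), value)

    def __getitem__(self, key):
        return super().__getitem__(convert_to_cistr(key))

    def __delitem__(self, key):
        super().__delitem__(convert_to_cistr(key))

    def __contains__(self, key):
        return super().__contains__(convert_to_cistr(key))

    def get(self, key, default=None):
        return super().get(convert_to_cistr(key), default)

    def update(self, *args, **kwargs):
        for key, value in dict(*args, **kwargs).items():
            self[key] = value


d = CaseInsensitiveDict({'A': 1, 'B': 2})
print('a' in d, 'A' in d)  # True True
print(d['b'])              # 2
```

Because every key is converted on the way in and on the way out, callers can pass plain strings in any case and never have to construct CIstr themselves. Other dict methods such as pop and setdefault would need the same treatment; they are left out here to keep the sketch short.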
