Flexible Collating (feedback please)


Problem description









I put together the following module today and would like some feedback on any
obvious problems. Or even opinions of whether or not it is a good approach.

While collating is not a difficult thing to do for experienced programmers, I
have seen quite a lot of poorly sorted lists in commercial applications, so it
seems it would be good to have an easy-to-use, ready-made API for collating.

I tried to make this both easy to use and flexible. My first thought was to
try to target actual uses such as phone directory sorting, library sorting,
etc., but it seemed that using keywords to alter the behavior is both easier
and more flexible.

I think the regular expressions I used to parse leading and trailing numerals
could be improved. They work, but you will probably get inconsistent results if
the strings are not well formed. Any suggestions on this would be appreciated.
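
One possible direction for the numeral parsing, offered only as a sketch: rather than anchoring one pattern at the start of the string, split on every run of digits with a single capturing group, so malformed input still yields a predictable key. The helper name `numeric_key` is hypothetical and not part of Collate.py. The sketch assumes Python 3 and ignores locale, decimal points, and commas.

```python
import re

# Hypothetical helper, not part of Collate.py.
_DIGITS = re.compile(r'(\d+)')

def numeric_key(s):
    """Return a sort key in which digit runs compare as integers."""
    parts = _DIGITS.split(s)
    # re.split with one capturing group alternates text, digits, text, ...
    # so odd positions are always digit runs.  Every chunk becomes a tuple
    # of the same shape, which keeps keys comparable in Python 3:
    # numbers (tag 0) sort before text (tag 1).
    key = []
    for i, chunk in enumerate(parts):
        if i % 2:
            key.append((0, int(chunk), ''))
        elif chunk:
            key.append((1, 0, chunk))
    return key

words = ['a5', 'a40', '4abc', '20abc', 'a10', 'b2']
print(sorted(words, key=numeric_key))
# ['4abc', '20abc', 'a5', 'a10', 'a40', 'b2']
```

Because digit runs are found anywhere in the string, numerals in the middle of a name are handled the same way as leading and trailing ones.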

Should I try to extend it to cover dates and currency sorting? Probably those
types should be converted before sorting, but maybe sometimes it's useful
not to?
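
As a rough illustration of the "convert before sorting" option, the strings can stay as they are while a key function does the conversion. The helper names, the `%m/%d/%Y` date format, and the `$` currency sign below are assumptions chosen for the example (Python 3).

```python
from datetime import datetime
from decimal import Decimal

def date_key(s, fmt='%m/%d/%Y'):
    """Parse a date string so it sorts chronologically (format is assumed)."""
    return datetime.strptime(s, fmt)

def currency_key(s):
    """Drop an assumed '$' sign and thousands commas, compare as Decimal."""
    return Decimal(s.replace('$', '').replace(',', ''))

dates = ['10/18/2006', '02/01/2007', '12/25/2005']
prices = ['$1,200.00', '$99.95', '$15']

print(sorted(dates, key=date_key))
# ['12/25/2005', '10/18/2006', '02/01/2007']
print(sorted(prices, key=currency_key))
# ['$15', '$99.95', '$1,200.00']
```

The original strings come back unchanged, so nothing has to be converted and re-formatted by the caller.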

Another variation is collating Dewey Decimal strings. It should be easy to add
if someone thinks that might be useful.
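
A minimal sketch of what such a key might look like, assuming call numbers laid out as a class number, a space, and a cutter (for example `005.133 LUT`). The layout, the helper name `dewey_key`, and the sample shelf are made up for illustration (Python 3).

```python
from decimal import Decimal

def dewey_key(call_number):
    """Split 'NNN.NNN Cutter' (assumed layout) into (Decimal, cutter text)."""
    number, _, cutter = call_number.partition(' ')
    # Decimal compares the class number as a decimal fraction,
    # so 005.13 < 005.133 < 005.2.
    return (Decimal(number), cutter.lower())

shelf = ['005.2 KNU', '005.133 LUT', '005.13 ABE', '004 TAN']
print(sorted(shelf, key=dewey_key))
# ['004 TAN', '005.13 ABE', '005.133 LUT', '005.2 KNU']
```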

I haven't tested this in *anything* yet, so don't plug it into production code
of any type. I also haven't done any performance testing.

See the doc tests below for examples of how it's used.

Cheers,
Ron Adam

"""
    Collate.py

    A general purpose configurable collate module.

    Collation can be modified with the following keywords:

        CAPS_FIRST           - Aaa, aaa, Bbb, bbb
        HYPHEN_AS_SPACE      - Don't ignore hyphens
        UNDERSCORE_AS_SPACE  - Underscores as white space
        IGNORE_LEADING_WS    - Disregard leading white space
        NUMERICAL            - Digit sequences as numerals
        COMMA_IN_NUMERALS    - Allow commas in numerals

    * See doctests for examples.

    Author: Ron Adam, ro*@ronadam.com, 10/18/2006

"""
import re
import locale

locale.setlocale(locale.LC_ALL, '')  # use current locale settings

# The above line may change the string constants from the string
# module.  This may have unintended effects if your program
# assumes they are always the ascii defaults.

CAPS_FIRST = 1
NUMERICAL = 2
HYPHEN_AS_SPACE = 4
UNDERSCORE_AS_SPACE = 8
IGNORE_LEADING_WS = 16
COMMA_IN_NUMERALS = 32

class Collate(object):
    """ A general purpose and configurable collator class.
    """
    def __init__(self, flag):
        self.flag = flag

    def transform(self, s):
        """ Transform a string for collating.
        """
        if self.flag & CAPS_FIRST:
            s = s.swapcase()
        if self.flag & HYPHEN_AS_SPACE:
            s = s.replace('-', ' ')
        if self.flag & UNDERSCORE_AS_SPACE:
            s = s.replace('_', ' ')
        if self.flag & IGNORE_LEADING_WS:
            s = s.strip()
        if self.flag & NUMERICAL:
            if self.flag & COMMA_IN_NUMERALS:
                rex = re.compile(r'^(\d*\,?\d*\.?\d*)(\D*)(\d*\,?\d*\.?\d*)',
                                 re.LOCALE)
            else:
                rex = re.compile(r'^(\d*\.?\d*)(\D*)(\d*\.?\d*)', re.LOCALE)
            slist = rex.split(s)
            for i, x in enumerate(slist):
                if self.flag & COMMA_IN_NUMERALS:
                    x = x.replace(',', '')
                try:
                    # Numeric chunks compare as numbers ...
                    slist[i] = float(x)
                except ValueError:
                    # ... everything else compares by locale rules.
                    slist[i] = locale.strxfrm(x)
            return slist
        return locale.strxfrm(s)

    def __call__(self, a, b):
        """ This allows the Collate class work as a sort key.

            USE: list.sort(key=Collate(flags))
        """
        return cmp(self.transform(a), self.transform(b))

def collate(slist, flags=0):
    """ Collate list of strings in place.
    """
    return slist.sort(Collate(flags))

def collated(slist, flags=0):
    """ Return a collated list of strings.

        This is a decorate-undecorate collate.
    """
    collator = Collate(flags)
    dd = [(collator.transform(x), x) for x in slist]
    dd.sort()
    return list([B for (A, B) in dd])

def _test():
    """
    DOC TESTS AND EXAMPLES:

    Sort (and sorted) normally order all words beginning with caps
    before all words beginning with lower case.

        >>> t = ['tuesday', 'Tuesday', 'Monday', 'monday']
        >>> sorted(t)     # regular sort
        ['Monday', 'Tuesday', 'monday', 'tuesday']

    Locale collation puts words beginning with caps after words
    beginning with lower case of the same letter.

        >>> collated(t)
        ['monday', 'Monday', 'tuesday', 'Tuesday']

    The CAPS_FIRST option can be used to put all words beginning
    with caps after words beginning in lowercase of the same letter.

        >>> collated(t, CAPS_FIRST)
        ['Monday', 'monday', 'Tuesday', 'tuesday']

    The HYPHEN_AS_SPACE option causes hyphens to be equal to space.

        >>> t = ['a-b', 'b-a', 'aa-b', 'bb-a']
        >>> collated(t)
        ['aa-b', 'a-b', 'b-a', 'bb-a']

        >>> collated(t, HYPHEN_AS_SPACE)
        ['a-b', 'aa-b', 'b-a', 'bb-a']

    The IGNORE_LEADING_WS and UNDERSCORE_AS_SPACE options can be
    used together to improve ordering in some situations.

        >>> t = ['sum', '__str__', 'about', ' round']
        >>> collated(t)
        [' round', '__str__', 'about', 'sum']

        >>> collated(t, IGNORE_LEADING_WS)
        ['__str__', 'about', ' round', 'sum']

        >>> collated(t, UNDERSCORE_AS_SPACE)
        [' round', '__str__', 'about', 'sum']

        >>> collated(t, IGNORE_LEADING_WS|UNDERSCORE_AS_SPACE)
        ['about', ' round', '__str__', 'sum']

    The NUMERICAL option orders leading and trailing digits as numerals.

        >>> t = ['a5', 'a40', '4abc', '20abc', 'a10.2', '13.5b', 'b2']
        >>> collated(t, NUMERICAL)
        ['4abc', '13.5b', '20abc', 'a5', 'a10.2', 'a40', 'b2']

    The COMMA_IN_NUMERALS option ignores commas instead of using them to
    separate numerals.

        >>> t = ['a5', 'a4,000', '500b', '100,000b']
        >>> collated(t, NUMERICAL|COMMA_IN_NUMERALS)
        ['500b', '100,000b', 'a5', 'a4,000']

    Collating also can be done in place using collate() instead of collated().

        >>> t = ['Fred', 'Ron', 'Carol', 'Bob']
        >>> collate(t)
        >>> t
        ['Bob', 'Carol', 'Fred', 'Ron']

    """
    import doctest
    doctest.testmod()

if __name__ == '__main__':
    _test()

Solution


Fixed...
Changed the collate() function to return None, the same as sort(), since it is
an in-place collate.
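
For readers less familiar with the convention being followed here, a short illustration using only the built-ins (Python 3): the in-place `list.sort()` returns `None`, while `sorted()` returns a new list and leaves its argument alone.

```python
names = ['Fred', 'Ron', 'Carol', 'Bob']

result = names.sort()              # sorts the list in place
print(result)                      # None
print(names)                       # ['Bob', 'Carol', 'Fred', 'Ron']

original = ['b', 'a']
copy = sorted(original)            # returns a new, sorted list
print(copy, original)              # ['a', 'b'] ['b', 'a']
```

Returning `None` from the in-place variant makes it harder to mistake it for the copying one.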

A comment in _test() doctests was reversed. CAPS_FIRST option puts words
beginning with capitals before, not after, words beginning with lower case of
the same letter.
It seems I always find a few obvious glitches right after I post something. ;-)

Cheers,
Ron




On Oct 18, 2:42 am, Ron Adam <r...@ronadam.com> wrote:

> I put together the following module today and would like some feedback on any
> obvious problems. Or even opinions of whether or not it is a good approach.

,,,
>     def __call__(self, a, b):
>         """ This allows the Collate class work as a sort key.
>
>             USE: list.sort(key=Collate(flags))
>         """
>         return cmp(self.transform(a), self.transform(b))

You document __call__ as useful for the "key" keyword to sort, but you
implement it for the "cmp" keyword. The "key" allows much better
performance, since it's called only once per value. Maybe just:

    return self.transform(a)

-- George
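
The difference between the two keywords can be made visible by counting calls. This is a sketch for Python 3, where the `cmp` argument no longer exists and `functools.cmp_to_key` stands in for it. `transform` is a simplified stand-in for `Collate.transform`, and the other names are hypothetical.

```python
from functools import cmp_to_key

calls = {'key': 0, 'cmp': 0}

def transform(s):
    # Stand-in for Collate.transform; here it only lowercases.
    return s.lower()

def as_key(s):
    calls['key'] += 1
    return transform(s)

def as_cmp(a, b):
    calls['cmp'] += 1
    ta, tb = transform(a), transform(b)
    return (ta > tb) - (ta < tb)      # Python 3 replacement for cmp()

words = ['tuesday', 'Monday', 'wednesday', 'Friday', 'thursday', 'Sunday']
by_key = sorted(words, key=as_key)
by_cmp = sorted(words, key=cmp_to_key(as_cmp))

print(by_key == by_cmp)    # True
print(calls['key'])        # 6, once per item
print(calls['cmp'])        # once per comparison, each running transform twice
```

Both orderings agree; only the number of times the transform runs differs.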




Thanks, I changed it to the following...

    def __call__(self, a):
        """ This allows the Collate class work as a sort key.

            USE: list.sort(key=Collate(flags))
        """
        return self.transform(a)

And also changed the sort call here ...

def collate(slist, flags=0):
    """ Collate list of strings in place.
    """
    slist.sort(key=Collate(flags))     # <<<

Today I'll do some performance tests to see how much faster it is for
moderately sized lists.
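
One way such a timing comparison might be set up, sketched for Python 3 with `timeit`. The word list is random test data, `transform` is a simplified stand-in for `Collate.transform`, and the list size and repeat count are arbitrary choices. No timings are claimed here, since they depend on the machine.

```python
import random
import string
import timeit
from functools import cmp_to_key

def transform(s):
    # Stand-in for Collate.transform; here it only lowercases.
    return s.lower()

def compare(a, b):
    ta, tb = transform(a), transform(b)
    return (ta > tb) - (ta < tb)

random.seed(0)  # make the test data repeatable
words = [''.join(random.choice(string.ascii_letters) for _ in range(8))
         for _ in range(2000)]

# Time 20 sorts of the same list with each approach.
t_key = timeit.timeit(lambda: sorted(words, key=transform), number=20)
t_cmp = timeit.timeit(lambda: sorted(words, key=cmp_to_key(compare)),
                      number=20)
print('key: %.4fs   cmp: %.4fs' % (t_key, t_cmp))
```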
Cheers,
Ron


