Sorting a list of strings with a specific locale in Python
Question
I work on an application that uses texts in different languages, so for viewing or reporting purposes some texts (strings) need to be sorted according to the rules of a specific language.
Currently I have a workaround that messes with the global locale settings, which is bad, and I don't want to put it in production:
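import locale
from functools import cmp_to_key

default_locale = locale.getlocale(locale.LC_COLLATE)

def sort_strings(strings, locale_=None):
    if locale_ is None:
        return sorted(strings)
    # Switches the process-wide collation locale, so this is not thread-safe.
    locale.setlocale(locale.LC_COLLATE, locale_)
    # Python 3 has no cmp= argument; wrap the comparison function as a key.
    sorted_strings = sorted(strings, key=cmp_to_key(locale.strcoll))
    locale.setlocale(locale.LC_COLLATE, default_locale)
    return sorted_strings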
The official Python locale documentation explicitly says that saving and restoring the locale is a bad idea, but it does not suggest an alternative: http://docs.python.org/library/locale.html#background-details-hints-tips-and-caveats
Recommended answer
Glibc does support a locale API with explicit state, where the locale is passed to each call instead of being set globally. Here's a quick wrapper for that API made with ctypes.
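# -*- coding: utf-8 -*-
import ctypes

class Locale(object):
    def __init__(self, locale):
        LC_ALL_MASK = 8127
        # LC_COLLATE_MASK = 8
        self.ctx = None  # set first so __del__ is safe if setup fails
        self.libc = ctypes.CDLL("libc.so.6")
        # locale_t is a pointer and strxfrm_l returns size_t; ctypes would otherwise assume int.
        self.libc.newlocale.restype = ctypes.c_void_p
        self.libc.newlocale.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_void_p]
        self.libc.strxfrm_l.restype = ctypes.c_size_t
        self.libc.strxfrm_l.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_size_t, ctypes.c_void_p]
        self.libc.freelocale.argtypes = [ctypes.c_void_p]
        self.ctx = self.libc.newlocale(LC_ALL_MASK, locale.encode(), None)
        if not self.ctx:
            raise ValueError('unsupported locale: %r' % locale)

    def strxfrm(self, src, iteration=1):
        # Python 3 str is encoded as UTF-8 here, which assumes a UTF-8 (or C) locale.
        if isinstance(src, str):
            src = src.encode('utf-8')
        size = 3 * iteration * len(src) + 1  # +1 leaves room for the terminating NUL
        dest = ctypes.create_string_buffer(size)
        n = self.libc.strxfrm_l(dest, src, size, self.ctx)
        if n < size:
            return dest.value
        elif iteration <= 4:
            # The transformed string did not fit; retry with a larger buffer.
            return self.strxfrm(src, iteration + 1)
        else:
            raise Exception('max number of iterations trying to increase dest reached')

    def __del__(self):
        if self.ctx:
            self.libc.freelocale(self.ctx)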
And a short test:
locale1 = Locale('C')
locale2 = Locale('mk_MK.UTF-8')
a_list = ['а', 'б', 'в', 'ј', 'ќ', 'џ', 'ш']
import random
random.shuffle(a_list)
assert sorted(a_list, key=locale1.strxfrm) == ['а', 'б', 'в', 'ш', 'ј', 'ќ', 'џ']
assert sorted(a_list, key=locale2.strxfrm) == ['а', 'б', 'в', 'ј', 'ќ', 'џ', 'ш']
What's left to do is implement all the locale functions, add support for Python unicode strings (with the wchar* functions, I guess), and automatically import the include file definitions or something similar.
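As a rough illustration of the wchar* idea, here is a minimal sketch for Python 3 that compares str objects directly with glibc's wcscoll_l instead of transforming byte strings. It is not part of the wrapper above: the helper name sorted_in_locale is made up for this example, the code assumes glibc on Linux ("libc.so.6" and the same LC_ALL_MASK value as above), and the requested locale has to be installed on the system.

```python
import ctypes
from functools import cmp_to_key

LC_ALL_MASK = 8127  # glibc-specific value, same as in the wrapper above

# Assumption: glibc on Linux; other C libraries use other names and masks.
_libc = ctypes.CDLL("libc.so.6", use_errno=True)
_libc.newlocale.restype = ctypes.c_void_p
_libc.newlocale.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_void_p]
_libc.freelocale.restype = None
_libc.freelocale.argtypes = [ctypes.c_void_p]
_libc.wcscoll_l.restype = ctypes.c_int
_libc.wcscoll_l.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_void_p]


def sorted_in_locale(strings, locale_name):
    """Hypothetical helper: sort str objects by the collation of locale_name."""
    ctx = _libc.newlocale(LC_ALL_MASK, locale_name.encode("ascii"), None)
    if not ctx:
        raise OSError(ctypes.get_errno(), "newlocale failed for %r" % locale_name)
    try:
        # wcscoll_l takes wide strings, so Python str can be passed as-is.
        def compare(a, b):
            return _libc.wcscoll_l(a, b, ctx)
        return sorted(strings, key=cmp_to_key(compare))
    finally:
        # The locale object is explicit state, so it is freed explicitly too.
        _libc.freelocale(ctx)
```

Calling sorted_in_locale(a_list, 'mk_MK.UTF-8') should then sort by the Macedonian rules without touching the global locale, provided that locale is available. Using a comparison function rather than a transformed key costs one C call per comparison, which seemed an acceptable trade-off for a sketch.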