在python中使用特定语言环境对字符串进行排序的列表 [英] Sorting list of string with specific locale in python

查看:80
本文介绍了在python中使用特定语言环境对字符串进行排序的列表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在使用不同语言文本的应用程序上工作,因此,出于查看或报告目的,某些文本(字符串)需要以特定语言进行排序.

I work on an application that uses texts from different languages, so, for viewing or reporting purposes, some texts (strings) need to be sorted in a specific language.

当前,我有一个变通方法,搞砸了全局语言环境设置,这很糟糕,我不想将其投入生产:

Currently I have a workaround messing with the global locale settings, which is bad, and I don't want to put it in production:

default_locale = locale.getlocale(locale.LC_COLLATE)

def sort_strings(strings, locale_=None):
    if locale_ is None:
        return sorted(strings)

    locale.setlocale(locale.LC_COLLATE, locale_)
    sorted_strings = sorted(strings, cmp=locale.strcoll)
    locale.setlocale(locale.LC_COLLATE, default_locale)

    return sorted_strings

官方的python语言环境文档明确表示保存和还原是个坏主意,但未给出任何建议:

The official python locale documentation explicitly says that saving and restoring is a bad idea, but does not give any suggestions: http://docs.python.org/library/locale.html#background-details-hints-tips-and-caveats

推荐答案

Glibc确实支持具有显式状态的语言环境API.这是用ctypes制作的API的快速包装.

Glibc does support a locale API with an explicit state. Here's a quick wrapper for that API made with ctypes.

# -*- coding: utf-8
import ctypes


class Locale(object):
    def __init__(self, locale):
        LC_ALL_MASK = 8127
        # LC_COLLATE_MASK = 8
        self.libc = ctypes.CDLL("libc.so.6")
        self.ctx = self.libc.newlocale(LC_ALL_MASK, locale, 0)



    def strxfrm(self, src, iteration=1):
        size = 3 * iteration * len(src)
        dest =  ctypes.create_string_buffer('\000' * size)
        n = self.libc.strxfrm_l(dest, src, size,  self.ctx)
        if n < size:
            return dest.value
        elif iteration<=4:
            return self.strxfrm(src, iteration+1)
        else:
            raise Exception('max number of iterations trying to increase dest reached')


    def __del__(self):
        self.libc.freelocale(self.ctx)

和简短测试

locale1 = Locale('C')
locale2 = Locale('mk_MK.UTF-8')

a_list = ['а', 'б', 'в', 'ј', 'ќ', 'џ', 'ш']
import random
random.shuffle(a_list)

assert sorted(a_list, key=locale1.strxfrm) == ['а', 'б', 'в', 'ш', 'ј', 'ќ', 'џ']
assert sorted(a_list, key=locale2.strxfrm) == ['а', 'б', 'в', 'ј', 'ќ', 'џ', 'ш']

剩下要做的是实现所有语言环境功能,支持python unicode字符串(我猜想是带wchar *函数),并自动导入包含文件定义或其他内容

what's left to do is implement all the locale functions, support for python unicode strings (with wchar* functions I guess), and automatically import the include file definitions or something

这篇关于在python中使用特定语言环境对字符串进行排序的列表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆