如何在Python中创建GSM-7编码? [英] How to create a GSM-7 encoding in Python?

查看:164
本文介绍了如何在Python中创建GSM-7编码?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

GSM-7字符集被定义为基本映射表+扩展字符映射表( https://en.wikipedia.org/wiki/GSM_03.38#GSM_7-bit_default_alphabet_and_extension_table_of_3GPP_TS_23.038_.2F_GSM_03.38 )。
意思是将 u'@'映射到 b'\x00'(一个字节串的长度为1),但 u'['应映射到 b'\x1b<' b'\x1b\x3c'(长度为2的字节串)

The GSM-7 character set is defined as a basic mapping table + an extension character mapping table (https://en.wikipedia.org/wiki/GSM_03.38#GSM_7-bit_default_alphabet_and_extension_table_of_3GPP_TS_23.038_.2F_GSM_03.38). Meaning that u'@' should be mapped to b'\x00' (a byte string of length 1), but u'[' should be mapped to b'\x1b<' or b'\x1b\x3c' (a byte string of length 2).

我设法得到编码部分通过扩展 encoding_table 来工作,但是我不知道该如何用 decode_table 进行操作。 。

I've managed to get the encoding part to work by extending the encoding_table, but I'm not sure what to do that with the decoding_table..?

以下是编解码器的完整性样板:

Here is the codec boilerplate for completeness:

import codecs
from encodings import normalize_encoding

class GSM7Codec(codecs.Codec):
    def encode(self, input, errors='strict'):
        return codecs.charmap_encode(input, errors, encoding_table)

    def decode(self, input, errors='strict'):
        return codecs.charmap_decode(input, errors, decoding_table)

class GSM7IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        return codecs.charmap_encode(input, self.errors, encoding_table)[0]

class GSM7IncrementalDecoder(codecs.IncrementalDecoder):
    def decode(self, input, final=False):
        return codecs.charmap_decode(input, self.errors, decoding_table)[0]

class GSM7StreamWriter(codecs.Codec, codecs.StreamWriter): pass
class GSM7StreamReader(codecs.Codec, codecs.StreamReader): pass

_cache = {}

def search_function(encoding):
    """Register the gsm-7 encoding with Python's codecs API. This involves
       adding a search function that takes in an encoding name, and returns
       a codec for that encoding if it knows one, or None if it doesn't.
    """
    if encoding in _cache:
        return _cache[encoding]
    norm_encoding = normalize_encoding(encoding)
    if norm_encoding in ('gsm_7', 'g7', 'gsm7'):
        cinfo = codecs.CodecInfo(
            name='gsm-7',
            encode=GSM7Codec().encode,
            decode=GSM7Codec().decode,
            incrementalencoder=GSM7IncrementalEncoder,
            incrementaldecoder=GSM7IncrementalDecoder,
            streamreader=GSM7StreamReader,
            streamwriter=GSM7StreamWriter,
        )
        _cache[norm_encoding] = cinfo
        return cinfo
    return None

codecs.register(search_function)

这里是表定义:

decoding_table = (
    u"@£$¥èéùìòÇ\nØø\rÅå" +
    u"Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ" +
    u" !\"#¤%&'()*+,-./" +
    u"0123456789:;<=>?" +
    u"¡ABCDEFGHIJKLMNO" +
    u"PQRSTUVWXYZÄÖÑܧ" +
    u"¿abcdefghijklmno" +
    u"pqrstuvwxyzäöñüà"
)

encoding_table = codecs.charmap_build(
    decoding_table + '\0' * (256 - len(decoding_table))
)

# extending the encoding table with extension characters
encoding_table[ord(u'|')] = '\x1b\x40'
encoding_table[ord(u'^')] = '\x1b\x14'
encoding_table[ord(u'€')] = '\x1b\x65'
encoding_table[ord(u'{')] = '\x1b\x28'
encoding_table[ord(u'}')] = '\x1b\x29'
encoding_table[ord(u'[')] = '\x1b\x3C'
encoding_table[ord(u'~')] = '\x1b\x3D'
encoding_table[ord(u']')] = '\x1b\x3E'
encoding_table[ord(u'\\')] = '\x1b\x2F'

现在的编码部分工作s,但解码不会:

The encoding part now works, but decoding does not:

>>> u'['.encode('g7')
'\x1b<'
>>> _.decode('g7')
u'\x1b<'
>>>

我无法找到有关编写编码的文档的好消息。

I haven't been able to find a good source for documentation about writing an encoding.

推荐答案

根据@Martijn Pieters的评论,我已将代码更改为:

Based on @Martijn Pieters comment, I've changed the code to:

def decode_gsm7(txt, errors):
    ext_table = {
        '\x40': u'|',
        '\x14': u'^',
        '\x65': u'€',
        '\x28': u'{',
        '\x29': u'}',
        '\x3C': u'[',
        '\x3D': u'~',
        '\x3E': u']',
        '\x2F': u'\\',
    }
    chunks = filter(None, txt.split('\x1b'))  # split on ESC
    res = u''
    for chunk in chunks:
        res += ext_table[chunk[0]]  # first character after ESC
        if len(chunk) > 1:
            # charmap_decode returns a tuple..
            decoded, _ = codecs.charmap_decode(chunk[1:], errors, decoding_table)
            res += decoded
    return res, len(txt)


class GSM7Codec(codecs.Codec):

    def encode(self, txt, errors='strict'):
        return codecs.charmap_encode(txt, errors, encoding_table)

    def decode(self, txt, errors='strict'):
        return decode_gsm7(txt, errors)

似乎有效: - )

这篇关于如何在Python中创建GSM-7编码?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆