What's an efficient way to encode a (very long) string from a dictionary? (Python)

Question

I have a dictionary assembled in the format {'character to be encoded':'corresponding binary code', etc.}. I've been encoding like this:

def encode(self, text):
    def generator():
        for ch in text:
            yield self.codes[ch]  # Get the encoded representation from the dictionary
    return ''.join(generator())

This works fine for short strings, but for novel-length strings it is so slow that it's unusable. What's a faster way to encode a string like this? Or should I completely rethink how I store and manipulate my data?

More code:

I've been testing using print c.encode(f), where f is a string (I just checked this) and c is the encoder object. This works for shorter files; I've tested up to 3000 characters. Thanks to thg435, my encode function is now:

def encode(self, text):
    return ''.join(map(self.codes.get, text))

self.codes is a dictionary of mappings - when the string 'hello' is input it will be set to {'h': '01', 'e': '00', 'l': '10', 'o': '11'}. I feel like I should put more code but I've tested the argument ('text') and the dictionary, so I'm not sure what would be relevant as they seem to be the only things that could affect the runtime of this function. The functions that get called before encode work fine in terms of speed - I know this because I have been using print statements to check their output and it is always printed within a couple of seconds of the time of execution.

Recommended answer

This seems to be the fastest:

''.join(map(codes.get, text))
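As a minimal sketch of how this one-liner behaves, here it is applied to the 'hello' mapping from the question. The standalone encode function is only for illustration and is not the asker's class method. One thing to be aware of: dict.get returns None for a character that has no entry, so ''.join raises a TypeError in that case, where codes[ch] would have raised a KeyError.

```python
codes = {'h': '01', 'e': '00', 'l': '10', 'o': '11'}

def encode(text):
    # map() calls codes.get once per character; join concatenates the results.
    return ''.join(map(codes.get, text))

print(encode('hello'))  # 0100101011

try:
    encode('help')  # 'p' has no code, so codes.get('p') returns None
except TypeError as exc:
    print('unencodable character:', exc)
```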

Timings:

codes = {chr(n): '[%d]' % n for n in range(255)}


def encode1(text): 
    return ''.join(codes[c] for c in text)

def encode2(text): 
    import re
    return re.sub(r'.', lambda m: codes[m.group()], text)

def encode3(text): 
    return ''.join(map(codes.get, text))


import timeit

a = 'foobarbaz' * 1000

# Parenthesized print works on both Python 2 and Python 3.
print(timeit.timeit(lambda: encode1(a), number=100))
print(timeit.timeit(lambda: encode2(a), number=100))
print(timeit.timeit(lambda: encode3(a), number=100))


# Output, in seconds per 100 calls:
# 0.113456964493
# 0.445501089096
# 0.0811159610748
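A further variant that may be worth timing on Python 3 is str.translate, which does the per-character lookup inside the interpreter. This is not one of the approaches benchmarked in the answer, so treat it as an untested alternative and measure it on your own data; the names encode_translate and encode_join are illustrative. It also differs in behavior: characters with no entry in the table are passed through unchanged instead of raising an error.

```python
codes = {'h': '01', 'e': '00', 'l': '10', 'o': '11'}

# str.maketrans accepts a dict whose keys are single characters and
# converts them to the code-point keys that str.translate expects.
table = str.maketrans(codes)

def encode_translate(text):
    # Characters missing from the table are left as they are.
    return text.translate(table)

def encode_join(text):
    # The join/map version from the answer, for comparison.
    return ''.join(map(codes.get, text))

print(encode_translate('hello') == encode_join('hello'))
```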
