What's an efficient way to encode a (very long) string from a dictionary? (Python)


Problem description

I have a dictionary assembled in the format {'character to be encoded':'corresponding binary code', etc.}. I've been encoding like this:

def encode(self, text): 
    encoded = ""
    def generator():
        for ch in text:
            yield self.codes[ch]  # Get the encoded representation from the dictionary
    return ''.join(generator())

This works fine for short strings, but for novel-length strings it is so slow that it's unusable. What's a faster way to encode a string like this? Or should I completely rethink how I store and manipulate my data?

More code:

I've been testing using print c.encode(f), where f is a string (I just checked this), and c is the encoder object. This works for shorter files - I've tested up to 3000 characters. Thanks to thg435 my encode function is now

def encode(self, text):
    return ''.join(map(self.codes.get, text))

self.codes is a dictionary of mappings - when the string 'hello' is input it will be set to {'h': '01', 'e': '00', 'l': '10', 'o': '11'}. I feel like I should put more code but I've tested the argument ('text') and the dictionary, so I'm not sure what would be relevant as they seem to be the only things that could affect the runtime of this function. The functions that get called before encode work fine in terms of speed - I know this because I have been using print statements to check their output and it is always printed within a couple of seconds of the time of execution.

Solution

This appears to be the fastest:

''.join(map(codes.get, text))
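As a quick illustration, here is that one-liner applied to the 'hello' mapping from the question. This is only a sketch: the standalone function names `encode` and `encode_strict` are made up for the example, and `encode_strict` is an added variant, not part of the original answer.

```python
# Mapping taken from the question's 'hello' example.
codes = {'h': '01', 'e': '00', 'l': '10', 'o': '11'}

def encode(text):
    # dict.get is called once per character; join concatenates the results.
    return ''.join(map(codes.get, text))

def encode_strict(text):
    # Hypothetical variant: codes[ch] raises KeyError for an unmapped character.
    return ''.join(codes[ch] for ch in text)

encoded = encode('hello')
print(encoded)  # 0100101011
```

One thing to keep in mind: `codes.get` returns None for a character that is not in the dictionary, so `encode('help')` fails inside `join` with a TypeError rather than a KeyError. If the input may contain unmapped characters, the indexing variant gives a more direct error.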

Timings:

codes = {chr(n): '[%d]' % n for n in range(255)}


def encode1(text): 
    return ''.join(codes[c] for c in text)

def encode2(text): 
    import re
    return re.sub(r'.', lambda m: codes[m.group()], text)

def encode3(text): 
    return ''.join(map(codes.get, text))


import timeit

a = 'foobarbaz' * 1000

print(timeit.timeit(lambda: encode1(a), number=100))
print(timeit.timeit(lambda: encode2(a), number=100))
print(timeit.timeit(lambda: encode3(a), number=100))


# 0.113456964493
# 0.445501089096
# 0.0811159610748
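On Python 3 there is one more candidate that may be worth timing: `str.translate` with a table built by `str.maketrans`. The sketch below is an addition, not part of the original answer; the function names are made up, and no timings are claimed for it, so measure it on your own data before relying on it.

```python
import timeit

# Same table as in the timings above: each character maps to '[<ordinal>]'.
codes = {chr(n): '[%d]' % n for n in range(255)}

# str.maketrans converts single-character keys into the ordinal-keyed
# table that str.translate expects (Python 3 only).
table = str.maketrans(codes)

def encode_map(text):
    return ''.join(map(codes.get, text))

def encode_translate(text):
    # Characters missing from the table are passed through unchanged.
    return text.translate(table)

a = 'foobarbaz' * 1000

# Both approaches should produce identical output for mapped characters.
assert encode_map(a) == encode_translate(a)

for fn in (encode_map, encode_translate):
    print(fn.__name__, timeit.timeit(lambda: fn(a), number=100))
```

Note the difference in behaviour for unmapped characters: `translate` silently leaves them as they are, whereas the `map(codes.get, ...)` version raises a TypeError.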
