Normalizing dictionary values


Problem Description


I have a (quite large) dictionary with numeric values, for example data = {'a': 0.2, 'b': 0.3, ...}. What is the best way to normalize these values (EDIT: make sure the values sum to 1)?

What I'm especially interested in: for certain dataset sizes, would it be beneficial to use, for example, numpy instead of a dict comprehension?

I'm using Python 2.7.

Solution

Try this to modify in place:

d={'a':0.2, 'b':0.3}
factor=1.0/sum(d.itervalues())
for k in d:
  d[k] = d[k]*factor

result:

>>> d
{'a': 0.4, 'b': 0.6}

Alternatively to modify into a new dictionary, use a dict comprehension:

d={'a':0.2, 'b':0.3}
factor=1.0/sum(d.itervalues())
normalised_d = {k: v*factor for k, v in d.iteritems() }

Note the use of d.iteritems() which uses less memory than d.items(), so is better for a large dictionary.
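
The question also asks whether numpy would help for large inputs. The answer above does not cover that, but as a rough sketch (assuming numpy is available and that pulling the keys and values into an array once is acceptable; this block is not from the original answer), a vectorised version could look like this:

import numpy as np

d = {'a': 0.2, 'b': 0.3}
keys = list(d)  # fix a key order once
values = np.fromiter((d[k] for k in keys), dtype=float, count=len(keys))
values /= values.sum()  # vectorised normalisation
normalised_d = dict(zip(keys, values))
print normalised_d

For small dictionaries the cost of building the array typically outweighs any gain; it only tends to pay off when the data is large or already lives in arrays.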

EDIT: Since there are quite a few comments, and getting this right seems to be important, I've summarised all the ideas from the comments on this answer into the following (including borrowing something from this post):

import math
import operator

def really_safe_normalise_in_place(d):
    # math.fsum gives an accurately rounded sum, which matters when adding many small floats
    factor = 1.0 / math.fsum(d.itervalues())
    for k in d:
        d[k] = d[k] * factor
    # push any residual rounding error onto the largest entry so the values sum to exactly 1.0
    key_for_max = max(d.iteritems(), key=operator.itemgetter(1))[0]
    diff = 1.0 - math.fsum(d.itervalues())
    #print "discrepancy = " + str(diff)
    d[key_for_max] += diff

d = {v: v + 1.0/v for v in xrange(1, 1000001)}
really_safe_normalise_in_place(d)
print math.fsum(d.itervalues())

It took a couple of goes to come up with a dictionary that actually produced a non-zero error when normalising, but I hope this illustrates the point.

EDIT: For Python 3.0, see the following change: Python 3.0 Wiki Built-in Changes

Remove dict.iteritems(), dict.iterkeys(), and dict.itervalues().

Instead: use dict.items(), dict.keys(), and dict.values() respectively.
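
Applying that change, a minimal sketch of the same in-place normalisation under Python 3 (this block is not part of the original answer):

import math

def normalise_in_place(d):
    factor = 1.0 / math.fsum(d.values())  # d.values() is a view in Python 3
    for k in d:
        d[k] = d[k] * factor

d = {'a': 0.2, 'b': 0.3}
normalise_in_place(d)
print(d)  # {'a': 0.4, 'b': 0.6}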
