Can I use more than 26 letters in `numpy.einsum`?


Question

I am using np.einsum to multiply probability tables like:

np.einsum('ijk,jklm->ijklm', A, B)

The issue is that I am dealing with more than 26 random variables (axes) overall, so if I assign each random variable a letter I run out of letters. Is there another way to specify the above operation that avoids this issue, without resorting to a mess of np.sum and np.dot operations?

Recommended answer

The short answer is that you can use any of the 52 letters (upper and lower case). That is every letter in the English alphabet. Any fancier axis names will have to be mapped onto those 52, or onto an equivalent set of numbers. Practically speaking, you will want to use only a fraction of those 52 in any one einsum call.
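
As a small illustration of the point above, the following sketch shows that upper- and lower-case letters are distinct einsum labels, so 'i' and 'I' can name different axes in the same call. The array shapes are made up for the example.

```python
import string
import numpy as np

# The label pool: 26 lowercase plus 26 uppercase letters.
print(len(string.ascii_letters))   # 52

A = np.ones((2, 3))
B = np.ones((3, 4))

# 'i' and 'I' are different labels; 'I' is the axis summed over.
C = np.einsum('iI,Ij->ij', A, B)
print(C.shape)                     # (2, 4)
```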

@kennytm suggests using the alternative input syntax (sublists of integers). A few sample runs suggest that this is not a solution. 26 is still the practical limit, despite the suspicious error messages.

In [258]: np.einsum(np.ones((2,3)),[0,20],np.ones((3,4)),[20,2],[0,2])
Out[258]: 
array([[ 3.,  3.,  3.,  3.],
       [ 3.,  3.,  3.,  3.]])

In [259]: np.einsum(np.ones((2,3)),[0,27],np.ones((3,4)),[27,2],[0,2])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-259-ea61c9e50d6a> in <module>()
----> 1 np.einsum(np.ones((2,3)),[0,27],np.ones((3,4)),[27,2],[0,2])

ValueError: invalid subscript '|' in einstein sum subscripts string, subscripts must be letters

In [260]: np.einsum(np.ones((2,3)),[0,100],np.ones((3,4)),[100,2],[0,2])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-260-ebd9b4889388> in <module>()
----> 1 np.einsum(np.ones((2,3)),[0,100],np.ones((3,4)),[100,2],[0,2])

ValueError: subscript is not within the valid range [0, 52]

I'm not entirely sure why you need more than 52 letters (upper and lower case), but I'm sure you need to do some sort of mapping. You don't want to write an einsum string that uses more than 52 axes all at once; the resulting iterator would be too large (for memory or time).
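
To put a rough number on "too large", here is a back-of-the-envelope calculation. It assumes the smallest interesting case, where every axis is a binary random variable (length 2) and the table is stored as float64; both are assumptions for illustration, not details from the question.

```python
n_axes = 52          # one axis per available letter
axis_size = 2        # assumed: each random variable is binary
bytes_per_item = 8   # assumed: float64

n_elements = axis_size ** n_axes
n_bytes = n_elements * bytes_per_item

print(n_elements)         # 4503599627370496
print(n_bytes / 2**50)    # 32.0 (PiB)
```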

I'm picturing some sort of mapping function that can be used as:

 astr = foo(A.names, B.names)
 # foo(['i','j','k'],['j','k','l','m'])
 # foo(['a1','a2','a3'],['a2','a3','b4','b5'])
 np.einsum(astr, A, B)


https://github.com/hpaulj/numpy-einsum/blob/master/einsum_py.py is a Python version of einsum. Crudely speaking, einsum parses the subscripts string and creates an op_axes list that can be used in np.nditer to set up the required sum-of-products calculation. With this code I can look at how the translation is done.

For instance, the example in the __name__ block:

    label_str, op_axes = parse_subscripts('ik,kj->ij', Labels([A.ndim, B.ndim]))
    print(op_axes)
    # [[0, -1, 1], [-1, 1, 0], [0, 1, -1]]  fine
    # map (4,newaxis,3)(newaxis,3,2)->(4,2,newaxis)
    print(sum_of_prod([A, B], op_axes))
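
The op_axes list shown in the comment above can also be passed to np.nditer directly. The following is a minimal sketch of a sum-of-products loop for 'ik,kj->ij'; it is not the code from einsum_py.py, and the shapes (4,3) and (3,2) are chosen only to match the mapping in the comment.

```python
import numpy as np

A = np.arange(12.0).reshape(4, 3)   # axes 'ik'
B = np.arange(6.0).reshape(3, 2)    # axes 'kj'

# Iteration axes are (i, j, k); -1 marks an axis an operand does not have.
op_axes = [[0, -1, 1],    # A:   i -> axis 0, k -> axis 1
           [-1, 1, 0],    # B:   j -> axis 1, k -> axis 0
           [0, 1, -1]]    # out: i -> axis 0, j -> axis 1, k is reduced

it = np.nditer([A, B, None],
               flags=['reduce_ok'],
               op_flags=[['readonly'], ['readonly'],
                         ['readwrite', 'allocate']],
               op_axes=op_axes)
with it:
    it.operands[2][...] = 0          # the allocated output starts uninitialised
    for a, b, c in it:
        c[...] += a * b              # accumulate over the reduced axis k
    result = it.operands[2]

print(result.shape)                  # (4, 2)
print(np.allclose(result, np.einsum('ik,kj->ij', A, B)))   # True
```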

Your example, with full diagnostic output, is:

In [275]:  einsum_py.parse_subscripts('ijk,jklm->ijklm',einsum_py.Labels([3,4])) 
jklm
{'counts': {105: 1, 106: 2, 107: 2, 108: 1, 109: 1}, 
 'strides': [], 
 'num_labels': 5, 
 'min_label': 105, 
 'nop': 2, 
 'ndims': [3, 4], 
 'ndim_broadcast': 0, 
 'shapes': [], 
 'max_label': 109}
[('ijk', [105, 106, 107], 'NONE'), 
 ('jklm', [106, 107, 108, 109], 'NONE')]
 ('ijklm', [105, 106, 107, 108, 109], 'NONE')
iter labels: [105, 106, 107, 108, 109],'ijklm'
op_axes [[0, 1, 2, -1, -1], [-1, 0, 1, 2, 3], [0, 1, 2, 3, 4]]

Out[275]: 
(<einsum_py.Labels at 0xb4f80cac>,
 [[0, 1, 2, -1, -1], [-1, 0, 1, 2, 3], [0, 1, 2, 3, 4]])

Using 'ajk,jkzZ->ajkzZ' changes the labels, but results in the same op_axes.

Here is a first draft of a translation function. It should work for any list of lists (of hashable items):

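def translate(ll):
    # collect every distinct index object used in any sublist
    mset=set()
    for i in ll:
        mset.update(i)
    # give each object an integer; set iteration order is arbitrary
    dd={k:v for v,k in enumerate(mset)}
    # map integers to lowercase letters (covers at most 26 distinct objects)
    x=[''.join([chr(dd[i]+97) for i in l]) for l in ll]
    #  e.g. ['cdb', 'dbea', 'cdbea']
    y=','.join(x[:-1])+'->'+x[-1]
    # e.g. 'cdb,dbea->cdbea'
    return y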

In [377]: A=np.ones((3,1,2),int)
In [378]: B=np.ones((1,2,4,3),int)
In [380]: ll=[list(i) for i in ['ijk','jklm','ijklm']]
In [381]: y=translate(ll)
In [382]: y
Out[382]: 'cdb,dbea->cdbea'

In [383]: np.einsum(y,A,B).shape
Out[383]: (3, 1, 2, 4, 3)

Using a set to map the index objects means that the final indexing characters are unordered. As long as you specify the RHS, that shouldn't be an issue. I have also ignored ellipsis.
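
If a deterministic mapping is preferred, one possible variant assigns letters in order of first appearance and draws from all 52 letters. This is only a sketch: the name names_to_subscripts and its two-argument signature are hypothetical, not part of NumPy or of translate above, and ellipsis is still not handled.

```python
import string
import numpy as np

def names_to_subscripts(operand_names, output_names):
    """Map arbitrary hashable axis names to an einsum subscripts string.

    Hypothetical helper: letters are handed out in order of first
    appearance, so the same input always yields the same string.
    """
    letters = string.ascii_letters   # 'a'..'z' then 'A'..'Z', 52 in total
    table = {}
    for names in list(operand_names) + [output_names]:
        for name in names:
            if name not in table:
                if len(table) == len(letters):
                    raise ValueError("more than 52 distinct axis names in one call")
                table[name] = letters[len(table)]
    inputs = ','.join(''.join(table[n] for n in names) for names in operand_names)
    output = ''.join(table[n] for n in output_names)
    return inputs + '->' + output

A = np.ones((3, 1, 2), int)
B = np.ones((1, 2, 4, 3), int)
subs = names_to_subscripts(
    [['a1', 'a2', 'a3'], ['a2', 'a3', 'b4', 'b5']],
    ['a1', 'a2', 'a3', 'b4', 'b5'],
)
print(subs)                          # abc,bcde->abcde
print(np.einsum(subs, A, B).shape)   # (3, 1, 2, 4, 3)
```

Because the table is rebuilt on every call, a model with far more than 52 named variables can still be handled, as long as no single einsum call touches more than 52 of them.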

=================


The list version of the einsum input is converted to the subscript string version in einsum_list_to_subscripts() (in numpy/core/src/multiarray/multiarraymodule.c). It replaces ELLIPSIS with '...'. It raises the [0,52] error message if ( s < 0 || s > 2*26), where s is a number in one of those sublists, and it converts s to a character with

        if (s < 26) {
            subscripts[subindex++] = 'A' + s;
        }
        else {
            subscripts[subindex++] = 'a' + s;

But it looks like the 2nd case is not working; for 26 I get errors like:

ValueError: invalid subscript '{' in einstein sum subscripts string, subscripts must be letters

'a'+s is wrong when s >= 26:

In [424]: ''.join([chr(ord('A')+i) for i in range(0,26)])
Out[424]: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

In [425]: ''.join([chr(ord('a')+i) for i in range(0,26)])
Out[425]: 'abcdefghijklmnopqrstuvwxyz'

In [435]: ''.join([chr(ord('a')+i) for i in range(26,52)])
Out[435]: '{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94'

'a'+s is wrong; it should be:

In [436]: ''.join([chr(ord('a')+i-26) for i in range(26,52)])
Out[436]: 'abcdefghijklmnopqrstuvwxyz'
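
Put together, the intended mapping from a sublist integer to a subscript letter can be sketched in Python as follows. The function name sublist_index_to_letter is made up for this illustration, and the range check here simply accepts 0 through 51; this is not the NumPy C code.

```python
import string

def sublist_index_to_letter(s):
    """Hypothetical helper: 0..25 -> 'A'..'Z', 26..51 -> 'a'..'z'."""
    if not 0 <= s < 52:
        raise ValueError("subscript is not within the valid range")
    if s < 26:
        return chr(ord('A') + s)
    # subtract 26 so that 26 maps to 'a' rather than '{'
    return chr(ord('a') + s - 26)

print(''.join(sublist_index_to_letter(s) for s in range(52)))
# ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
```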

I submitted https://github.com/numpy/numpy/issues/7741

That this bug has survived for so long indicates that the sublist format is not commonly used, and that large numbers in those lists are rarer still.
