What is the difference between numpy.shares_memory and numpy.may_share_memory?


Question

Why does numpy.may_share_memory exist?
What is the challenge to give an exact result?

Is the numpy.may_share_memory method deprecated?
numpy.may_share_memory may give false positives, but it does not give false negatives.

Does numpy.shares_memory give neither false positives nor false negatives?

I use numpy version 1.11.2.

See:

  1. numpy.may_share_memory
  2. numpy.shares_memory
  3. version 1.11.2 source on github

Solution

Quoting the release notes for 1.11.0:

A new function np.shares_memory that can check exactly whether two arrays have memory overlap is added. np.may_share_memory also now has an option to spend more effort to reduce false positives.

Semantically, this suggests that the older may_share_memory test was designed to give a loose guess as to whether memory is shared between the arrays. If it reports no sharing, one can safely proceed; if the test is positive (possibly a false positive), care must be taken. The new shares_memory function, on the other hand, allows exact checks. This takes more computational time, but can pay off in the long run: since the answer is free of false positives, more optimizations become possible. The looser check of may_share_memory probably only guarantees not to return false negatives.
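That usage pattern can be sketched as follows (a minimal illustration; the array names are mine):

```python
import numpy as np

a = np.arange(4)
b = np.arange(4)   # separately allocated: cannot overlap with a
c = a[1:]          # a view into a's buffer

# A negative from the loose check is trustworthy (no false negatives):
print(np.may_share_memory(a, b))   # False -> safe to treat as independent

# A positive may be a false positive; the exact check settles it:
print(np.may_share_memory(a, c))   # True
print(np.shares_memory(a, c))      # True here: c really is a view of a
```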

In terms of the documentation of may_share_memory and shares_memory, both have a keyword argument that tells numpy how strict a check the user wants.

may_share_memory:

max_work : int, optional

    Effort to spend on solving the overlap problem. See shares_memory for details. Default for may_share_memory is to do a bounds check.

shares_memory:

max_work : int, optional

    Effort to spend on solving the overlap problem (maximum number of candidate solutions to consider). The following special values are recognized:

    max_work=MAY_SHARE_EXACT (default)

        The problem is solved exactly. In this case, the function returns True only if there is an element shared between the arrays.
    max_work=MAY_SHARE_BOUNDS

        Only the memory bounds of a and b are checked.

Judging by the docs, this suggests that the two functions might call the same underlying machinery, but may_share_memory uses a less strict default setting for the check.
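A quick sanity check is consistent with that reading: forcing may_share_memory into the exact setting should agree with shares_memory's (exact) default. A small sketch, assuming the constants behave as documented:

```python
import numpy as np

v = np.arange(6)
pairs = [
    (v[::2], v[1::2]),    # interleaved slices: bounds overlap, no shared element
    (v[:4], v[2:]),       # genuinely overlapping views: v[2], v[3] in both
    (v, np.arange(6)),    # independently allocated buffers
]

for a, b in pairs:
    # may_share_memory forced to the exact setting should match
    # shares_memory with its default (also exact) setting.
    assert np.may_share_memory(a, b, max_work=np.MAY_SHARE_EXACT) \
        == np.shares_memory(a, b)
print("exact may_share_memory agrees with shares_memory on all pairs")
```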

Let's take a peek at the implementation:

static PyObject *
array_shares_memory(PyObject *NPY_UNUSED(ignored), PyObject *args, PyObject *kwds)
{
    return array_shares_memory_impl(args, kwds, NPY_MAY_SHARE_EXACT, 1);
}


static PyObject *
array_may_share_memory(PyObject *NPY_UNUSED(ignored), PyObject *args, PyObject *kwds)
{
    return array_shares_memory_impl(args, kwds, NPY_MAY_SHARE_BOUNDS, 0);
}

calling the same underlying function with signature

static PyObject *
array_shares_memory_impl(PyObject *args, PyObject *kwds, Py_ssize_t default_max_work,
                         int raise_exceptions)
{}

Without delving deeper into the source, it seems to me that shares_memory is an improvement over may_share_memory: with the appropriate keyword argument it can perform the same loose check as the older function, which remains available for convenience and backward compatibility.

Disclaimer: this is the first time I looked at this part of the source, and I didn't investigate further into array_shares_memory_impl, so my impression can be simply wrong.


As for a specific example of the difference between the two methods (called with default arguments): as explained at the links above, may_share_memory only checks the arrays' memory bounds. If the bounds are disjoint for the two arrays, there is no chance they share memory. But if the bounds overlap, the arrays can still be independent!

Simple example: a disjoint partitioning of a contiguous block of memory via slicing:

>>> import numpy as np
>>> v = np.arange(6)
>>> x = v[::2]
>>> y = v[1::2]
>>> np.may_share_memory(x, y)
True
>>> np.shares_memory(x, y)
False
>>> np.may_share_memory(x, y, max_work=np.MAY_SHARE_EXACT)
False

As you can see, x and y are two disjoint slices of the same array. Thus their data ranges largely overlap (they are almost the same, save a single integer in memory). However, none of their elements are actually the same: one contains the even, the other the odd elements of the original contiguous block. So may_share_memory correctly asserts that the arrays may share memory, but on a stricter check it turns out that they don't.
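For contrast, here is a sketch of the complementary case, where overlapping bounds do correspond to genuinely shared elements, so the loose and the exact checks agree:

```python
import numpy as np

v = np.arange(6)
x = v[:4]   # elements 0..3
y = v[2:]   # elements 2..5; v[2] and v[3] belong to both views

print(np.may_share_memory(x, y))   # True: the bounds overlap
print(np.shares_memory(x, y))      # True: elements are genuinely shared

# Writing through one view is visible through the other:
y[0] = 99
print(x[2])                        # 99
```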


As for the additional difficulty of computing the overlap exactly, the work can be traced down to the worker called solve_may_share_memory, which also contains a lot of helpful comments about what's going on. In a nutshell, there's

  1. a quick check and return if the bounds don't overlap, otherwise
  2. a return with MEM_OVERLAP_TOO_HARD if we asked for loose checking (i.e. may_share_memory with default args), which is handled on the calling side as "we don't know, so return True"
  3. otherwise we actually solve the Diophantine equations that the problem maps to starting here

So the work in point 3 above is what needs to additionally be done by shares_memory (or generally, a strict checking case).
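To get a feel for point 3: for two 1-D strided views, the exact question reduces to a bounded linear Diophantine problem. The following pure-Python brute force (the helper name is mine, and numpy's actual solver is far more sophisticated) illustrates what is being decided:

```python
import numpy as np

def views_share_element(off1, stride1, n1, off2, stride2, n2):
    """Does off1 + stride1*i == off2 + stride2*j have a solution
    with 0 <= i < n1 and 0 <= j < n2?  numpy answers this by solving
    a bounded Diophantine problem; here we simply enumerate."""
    addresses1 = {off1 + stride1 * i for i in range(n1)}
    return any(off2 + stride2 * j in addresses1 for j in range(n2))

v = np.arange(6)
x, y = v[::2], v[1::2]
s = v.strides[0]              # bytes between consecutive elements of v

# x covers byte offsets {0, 2s, 4s}, y covers {s, 3s, 5s}: disjoint sets,
# so the exact check can answer False even though the bounds overlap.
print(views_share_element(0, 2 * s, 3, s, 2 * s, 3))  # False
print(np.shares_memory(x, y))                         # False, same verdict
```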

