如何确定容器是否无限递归并找到其最小的唯一容器? [英] How do I determine whether a container is infinitely recursive and find its smallest unique container?

查看:44
本文介绍了如何确定容器是否无限递归并找到其最小的唯一容器?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在阅读扁平化(不规则的)列表列表 并决定采用它作为 Python 练习——一个我偶尔会重写的小函数,不参考原文,仅供练习.我第一次尝试这个时,我遇到了如下情况:

def flat(iterable):尝试:迭代(可迭代)除了类型错误:产量可迭代别的:对于可迭代项目:展平(项目)的产量

这适用于包含数字的嵌套 list 等基本结构,但字符串会使它崩溃,因为字符串的第一个元素是单字符字符串,其中第一个元素是它本身,第一个元素又是它自己,依此类推.检查上面链接的问题,我意识到这解释了对字符串的检查.这给了我以下内容:

def flatter(iterable):尝试:迭代(可迭代)如果 isinstance(iterable, str):引发类型错误除了类型错误:产量可迭代别的:对于可迭代项目:展平(项目)的产量

现在它也适用于字符串.但是,我随后回忆起 list 可以包含对自身的引用.

<预><代码>>>>lst = []>>>lst.append(lst)>>>首先[[...]]>>>lst[0][0][0][0] 是 lst真的

因此,字符串并不是唯一可能导致此类问题的类型.在这一点上,我开始寻找一种无需显式类型检查即可防范此问题的方法.

随后出现了以下 flattener.py.flattish() 是一个只检查字符串的版本.flatten_notype() 检查对象的第一项的第一项是否等于自身以确定递归.flatten() 执行此操作,然后检查对象或其第一项的第一项是否是另一个类型的实例.Fake 类基本上只是为序列定义了一个包装器.测试每个函数的行上的注释描述了结果,形式 应该是 `desired_result` [>`undesired_actual_result`].正如你所看到的,在Fake 包裹着一个字符串、Fake 包裹着一个list 整数、单字符字符串时,每种方法都以各种方式失败, 和多字符串.

def flattish(*i):对于 i 中的项目:尝试:迭代(项目)除了:产量项目别的:if isinstance(item, str): yield itemelse: 从 flattish(*item) 产生类假:def __init__(self, l):self.l = lself.index = 0def __iter__(self):回归自我def __next__(self):如果 self.index >= len(self.l):引发停止迭代别的:self.index +=1返回 self.l[self.index-1]def __str__(self):返回 str(self.l)def flatten_notype(*i):对于 i 中的项目:尝试:n = 下一个(迭代器(项目))尝试:n2 = 下一个(迭代器(n))递归 = n == n2除了类型错误:从展平(*项目)的产量别的:如果再次发生:产量项目别的:从展平(*项目)的产量除了类型错误:产量项目def展平(*i):对于 i 中的项目:尝试:n = 下一个(迭代器(项目))尝试:n2 = 下一个(迭代器(n))递归 = n == n2除了类型错误:从展平(*项目)的产量别的:如果再次发生:如果 isinstance(n2, type(item)) 或 isinstance(item, type(n2)) else n2,则产生项目别的:从展平(*项目)的产量除了类型错误:产量项目f = 假('abc')print(*flattish(f)) # 应该是 `abc`print(*flattish((f,))) # 应该是 `abc` >``print(*flattish(1, ('a',), ('bc',))) # 应该是`1 a bc`f = 假([1, 2, 3])print(*flattish(f)) # 应该是`1 2 3`print(*flattish((f,))) # 应该是 `1 2 3` >``print(*flattish(1, ('a',), ('bc',))) # 应该是`1 a bc`f = 假('abc')print(*flatten_notype(f)) # 应该是 `abc`print(*flatten_notype((f,))) # 应该是 `abc` >`c`print(*flatten_notype(1, ('a',), ('bc',))) # 应该是`1 a bc` >`1 ('a',) bc`f = 假([1, 2, 3])print(*flatten_notype(f)) # 应该是 `1 2 3` >`2 3`print(*flatten_notype((f,))) # 应该是`1 2 3` >``print(*flatten_notype(1, ('a',), ('bc',))) # 应该是`1 a bc` >`1 ('a',) bc`f = 假('abc')print(*flatten(f)) # 应该是 `abc` >`a`print(*flatten((f,))) # 应该是 `abc` >`c`print(*flatten(1, ('a',), ('bc',))) # 应该是`1 a bc`f = 假([1, 2, 3])print(*flatten(f)) # 应该是 `1 2 3` >`2 3`print(*flatten((f,))) # 应该是 `1 2 3` >``print(*flatten(1, ('a',), ('bc',))) # 应该是`1 a bc`

我还使用上面定义的递归 lstflatten() 尝试了以下操作:

<预><代码>>>>打印(*展平(lst))[[...]]>>>lst.append(0)>>>打印(*展平(lst))[[...], 0]>>>打印(*列表(展平(lst))[0])[[...], 0] 0

如您所见,它与 1 ('a',) bc 类似,并且以自己的特殊方式失败.

我阅读了 python 函数如何访问自己的属性? 认为也许函数可以跟踪它看到的每个对象,但这也行不通,因为我们的 lst 包含一个具有匹配身份和相等性的对象,字符串包含的对象可能只有匹配相等性,并且相等性是不够的,因为可能出现类似 flatten([1, 2], [1, 2]).

是否有任何可靠的方法(即不简单地检查已知类型,不要求递归容器及其容器都具有相同类型等)来检查容器是否包含潜在无限的可迭代对象递归,并可靠地确定最小的唯一容器?如果有,请解释它是如何做到的,为什么它是可靠的,以及它如何处理各种递归情况.如果不是,请解释为什么这在逻辑上是不可能的.

解决方案

您询问的场景定义非常松散.正如您的问题所定义的那样,检查容器是否包含具有潜在无限递归的可迭代对象[.]"在逻辑上是不可能的.您的问题范围的唯一限制是可迭代"对象.Python 官方文档对iterable"的定义如下:

<块引用>

一个能够一次返回一个成员的对象.可迭代对象的示例包括所有序列类型(例如列表、字符串和元组)和一些非序列类型,例如 dict、文件对象以及您使用 __iter__() 定义的任何类的对象代码>__getitem__() 方法.[...]

这里的关键短语是任何具有__iter__()__getitem__() 方法的[定义] 类".这允许具有按需生成的成员的可迭代"对象.例如,假设有人试图使用一堆字符串对象,这些对象根据创建特定字符串的时间按时间顺序自动排序和比较.他们要么继承 str 要么重新实现其功能,添加与每个指向 timestampedString( ) 对象的指针相关联的时间戳,并相应地调整比较方法.

通过索引位置访问子字符串是创建新字符串的一种方式,因此 len( ) == 1timestampedString( ) 可以合法地返回一个 timestampedString( )len( ) == 1 具有相同的字符,但在您访问 timestampedString( )[0:1] 时具有新的时间戳.因为时间戳是特定对象实例的一部分,所以没有一种身份测试可以说这两个对象是相同的,除非由相同字符组成的任何两个字符串被认为是相同的.您在问题中声明不应该是这种情况.

要检测无限递归,您首先需要向问题范围添加一个约束,即容器仅包含静态对象,即预先生成的对象.有了这个约束,容器中的任何合法对象都可以转换为对象的某种字节字符串表示.一种简单的方法是在到达容器时对容器中的每个对象进行腌制,并维护由腌制产生的字节字符串表示的堆栈.如果您允许任何任意静态对象​​,那么对对象的原始字节解释就可以正常工作.

然而,在算法上强制执行容器仅包含静态对象的约束会带来另一个问题:它需要针对一些预先批准的类型列表(例如一些原语概念)进行类型检查.然后可以容纳两类对象:已知静态类型的单个对象(例如基元)和可以预先确定包含项目数量的容器.当许多包含的对象被迭代并且所有对象都被证明是有限的时,后一类可以被证明是有限的.容器内的容器可以递归处理.已知静态类型的单个对象是递归基本情况.

如果容器产生更多的对象,那么就违反了该类对象的定义.在 Python 中允许任意对象的问题在于,这些对象可以在 Python 代码中定义,这些代码可以使用用 C 代码编写的组件以及 C 可以链接到的任何其他语言.无法评估此代码以确定它是否真正符合静态要求.

I was reading Flatten (an irregular) list of lists and decided to adopt it as a Python exercise - a small function I'll occasionally rewrite without referring to the original, just for practice. The first time I tried this, I had something like the following:

def flat(iterable):
    try:
        iter(iterable)
    except TypeError:
        yield iterable
    else:
        for item in iterable:
            yield from flatten(item)

This works fine for basic structures like nested lists containing numbers, but strings crash it because the first element of a string is a single-character string, the first element of which is itself, the first element of which is itself again, and so on. Checking the question linked above, I realized that that explains the check for strings. That gave me the following:

def flatter(iterable):
    try:
        iter(iterable)
        if isinstance(iterable, str):
            raise TypeError
    except TypeError:
        yield iterable
    else:
        for item in iterable:
            yield from flatten(item)

Now it works for strings as well. However, I then recalled that a list can contain references to itself.

>>> lst = []
>>> lst.append(lst)
>>> lst
[[...]]
>>> lst[0][0][0][0] is lst
True

So, a string isn't the only type that could cause this sort of problem. At this point, I started looking for a way to guard against this issue without explicit type-checking.

The following flattener.py ensued. flattish() is a version that just checks for strings. flatten_notype() checks whether an object's first item's first item is equal to itself to determine recursion. flatten() does this and then checks whether either the object or its first item's first item is an instance of the other's type. The Fake class basically just defines a wrapper for sequences. The comments on the lines that test each function describe the results, in the form should be `desired_result` [> `undesired_actual_result`]. As you can see, each fails in various ways on Fake wrapped around a string, Fake wrapped around a list of integers, single-character strings, and multiple-character strings.

def flattish(*i):
    for item in i:
        try: iter(item)
        except: yield item
        else:
            if isinstance(item, str): yield item
            else: yield from flattish(*item)

class Fake:
    def __init__(self, l):
        self.l = l
        self.index = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.index >= len(self.l):
            raise StopIteration
        else:
            self.index +=1
            return self.l[self.index-1]
    def __str__(self):
        return str(self.l)

def flatten_notype(*i):
    for item in i:
        try:
            n = next(iter(item))
            try:
                n2 = next(iter(n))
                recur = n == n2
            except TypeError:
                yield from flatten(*item)
            else:
                if recur:
                    yield item
                else:
                    yield from flatten(*item)
        except TypeError:
            yield item

def flatten(*i):
    for item in i:
        try:
            n = next(iter(item))
            try:
                n2 = next(iter(n))
                recur = n == n2
            except TypeError:
                yield from flatten(*item)
            else:
                if recur:
                    yield item if isinstance(n2, type(item)) or isinstance(item, type(n2)) else n2
                else:
                    yield from flatten(*item)
        except TypeError:
            yield item


f = Fake('abc')

print(*flattish(f)) # should be `abc`
print(*flattish((f,))) # should be `abc` > ``
print(*flattish(1, ('a',), ('bc',))) # should be `1 a bc`

f = Fake([1, 2, 3])

print(*flattish(f)) # should be `1 2 3`
print(*flattish((f,))) # should be `1 2 3` > ``
print(*flattish(1, ('a',), ('bc',))) # should be `1 a bc`

f = Fake('abc')
print(*flatten_notype(f)) # should be `abc`
print(*flatten_notype((f,))) # should be `abc` > `c`
print(*flatten_notype(1, ('a',), ('bc',))) # should be `1 a bc` > `1 ('a',) bc`

f = Fake([1, 2, 3])     

print(*flatten_notype(f)) # should be `1 2 3` > `2 3`
print(*flatten_notype((f,))) # should be `1 2 3` > ``
print(*flatten_notype(1, ('a',), ('bc',))) # should be `1 a bc` > `1 ('a',) bc`

f = Fake('abc')
print(*flatten(f)) # should be `abc` > `a`
print(*flatten((f,))) # should be `abc` > `c`
print(*flatten(1, ('a',), ('bc',))) # should be `1 a bc`

f = Fake([1, 2, 3])     

print(*flatten(f)) # should be `1 2 3` > `2 3`
print(*flatten((f,))) # should be `1 2 3` > ``
print(*flatten(1, ('a',), ('bc',))) # should be `1 a bc`

I've also tried the following with the recursive lst defined above and flatten():

>>> print(*flatten(lst))
[[...]]
>>> lst.append(0)
>>> print(*flatten(lst))
[[...], 0]
>>> print(*list(flatten(lst))[0])
[[...], 0] 0

As you can see, it fails similarly to 1 ('a',) bc as well as in its own special way.

I read how can python function access its own attributes? thinking that maybe the function could keep track of every object it had seen, but that wouldn't work either because our lst contains an object with matching identity and equality, strings contain objects that may only have matching equality, and equality isn't enough due to the possibility of something like flatten([1, 2], [1, 2]).

Is there any reliable way (i.e. doesn't simply check known types, doesn't require that a recursive container and its containers all be of the same type, etc.) to check whether a container holds iterable objects with potential infinite recursion, and reliably determine the smallest unique container? If there is, please explain how it can be done, why it is reliable, and how it handles various recursive circumstances. If not, please explain why this is logically impossible.

解决方案

The scenario you ask about is very loosely defined. As defined in your question, it is logically impossible "to check whether a container holds iterable objects with potential infinite recursion[.]" The only limit on the scope of your question is "iterable" object. The official Python documentation defines "iterable" as follows:

An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() or __getitem__() method. [...]

The key phrase here is "any classes [defined] with an __iter__() or __getitem__() method." This allows for "iterable" objects with members that are generated on demand. For example, suppose that someone seeks to use a bunch of string objects that automatically sort and compare in chronological order based on the time at which the particular string was created. They either subclass str or reimplement its functionality, adding a timestamp associated with each pointer to a timestampedString( ) object, and adjust the comparison methods accordingly.

Accessing a substring by index location is a way of creating a new string, so a timestampedString( ) of len( ) == 1 could legitimately return a timestampedString( ) of len( ) == 1 with the same character but a new timestamp when you access timestampedString( )[0:1]. Because the timestamp is part of the specific object instance, there is no kind of identity test that would say that the two objects are the same unless any two strings consisting of the same character are considered to be the same. You state in your question that this should not be the case.

To detect infinite recursion, you first need to add a constraint to the scope of your question that the container only contain static, i.e. pre-generated, objects. With this constraint, any legal object in the container can be converted to some byte-string representation of the object. A simple way to do this would be to pickle each object in the container as you reach it, and maintain a stack of the byte-string representations that result from pickling. If you allow any arbitrary static object, nothing less than a raw-byte interpretation of the objects is going to work.

However, algorithmically enforcing the constraint that the container only contain static objects presents another problem: it requires type-checking against some pre-approved list of types such as some notion of primitives. Two categories of objects can then be accommodated: single objects of a known-static type (e.g. primitives) and containers for which the number of contained items can be determined in advance. The latter category can then be shown to be finite when that many contained objects have been iterated through and all have been shown to be finite. Containers within the container can be handled recursively. The known-static type single objects are the recursive base-case.

If the container produces more objects, then it violates the definition of this category of object. The problem with allowing arbitrary objects in Python is that these objects can be defined in Python code that can use components written in C code and any other language that C can be linked to. There is no way to evaluate this code to determine if it actually complies with the static requirement.

这篇关于如何确定容器是否无限递归并找到其最小的唯一容器?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆