How to determine type of nested data structures in Python?

Question

I am currently translating some Python to F#, specifically neural-networks-and-deep-learning.

To make sure the data structures are translated correctly, I need the details of the nested types from Python. The type() function works for simple types but not for nested ones.

For example in Python:

> data = ([[1,2,3],[4,5,6],[7,8,9]],["a","b","c"])
> type(data)
<type 'tuple'>

only gives the type of the first level. Nothing is reported about the lists inside the tuple.

I was hoping for something like what F# does

> let data = ([|[|1;2;3|];[|4;5;6|];[|7;8;9|]|],[|"a";"b";"c"|]);;

val data : int [] [] * string [] =
  ([|[|1; 2; 3|]; [|4; 5; 6|]; [|7; 8; 9|]|], [|"a"; "b"; "c"|])

returning the signature independent of the value

int [] [] * string []

*         is a tuple item separator  
int [] [] is a two dimensional jagged array of int  
string [] is a one dimensional array of string

Can this be done in Python, and if so, how?

TLDR;

Currently I am using PyCharm with the debugger and, in the variables window, clicking the view option for an individual variable to see the details. The problem is that the output intermixes the values with the types, and I only need the type signature. When the variables are like (float[50000][784], int[50000]), the values get in the way. Yes, I am resizing the variables for now, but that is a workaround and not a solution.

e.g.

Using PyCharm Community

(array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        ...,     
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]], dtype=float32),
  array([7, 2, 1, ..., 4, 5, 6]))

Using Spyder

Using Visual Studio Community with Python Tools for Visual Studio

(array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],    
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],  
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],  
        ...,   
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],  
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],  
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]], dtype=float32),  
  array([5, 0, 4, ..., 8, 4, 8], dtype=int64)) 
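
For arrays like the ones in the debugger output above, a signature in the (float[50000][784], int[50000]) style can be built from the shape and dtype alone, without printing any values. The following is only a sketch, written for Python 3: the function names are made up for illustration, and FakeArray is a stand-in so the example runs without NumPy installed. It assumes the objects expose NumPy-style shape and dtype attributes, which real ndarrays do.

```python
from collections import namedtuple

# Stand-in for a NumPy array (assumption: only `shape` and `dtype` are needed).
FakeArray = namedtuple("FakeArray", "shape dtype")


def array_signature(arr):
    """Build a signature like 'float32[50000][784]' from shape and dtype only."""
    dims = "".join("[%d]" % n for n in arr.shape)
    return str(arr.dtype) + dims


def tuple_signature(arrays):
    """Signature for a tuple of arrays, one entry per array."""
    return "(" + ", ".join(array_signature(a) for a in arrays) + ")"


data = (FakeArray((50000, 784), "float32"), FakeArray((50000,), "int64"))
print(tuple_signature(data))  # (float32[50000][784], int64[50000])
```

With real NumPy arrays the same two functions should work unchanged, since str() of a dtype such as float32 yields its name.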

EDIT:

Since this question has been starred, someone is apparently looking for more details, so here is my modified version, which can also handle NumPy ndarrays. Thanks to Vlad for the initial version.

Also, because it uses a variation of run-length encoding, ? is no longer needed for heterogeneous types.

# Note: the types of the elements of iterables such as set, list, or dict
# are recorded using a variation of run-length encoding.

def type_spec_iterable(iterable, name):
    def iterable_info(iterable):
        # With an iterable for it to be comparable 
        # the identity must contain the name and length 
        # and for the elements the type, order and count.
        length = 0
        types_list = []
        pervious_identity_type = None
        pervious_identity_type_count = 0
        first_item_done = False
        for e in iterable:
            item_type = type_spec(e)
            if (item_type != pervious_identity_type):
                if not first_item_done:
                    first_item_done = True
                else:
                    types_list.append((pervious_identity_type, pervious_identity_type_count))
                pervious_identity_type = item_type
                pervious_identity_type_count = 1
            else:
                pervious_identity_type_count += 1
            length += 1
        if first_item_done:  # an empty iterable has no final run to record
            types_list.append((pervious_identity_type, pervious_identity_type_count))
        return (length, types_list)
    (length, identity_list) = iterable_info(iterable)
    element_types = ""
    for (identity_item_type, identity_item_count) in identity_list:
        if element_types == "":
            pass
        else:
            element_types += ","
        element_types += identity_item_type
        if (identity_item_count != length) and (identity_item_count != 1):
            element_types += "[" + `identity_item_count` + "]"
    result = name + "[" + `length` + "]<" + element_types + ">"
    return result

def type_spec_dict(dict, name):
    def dict_info(dict):
        # With a dict for it to be comparable 
        # the identity must contain the name and length 
        # and for the key and value combinations the type, order and count.
        length = 0
        types_list = []
        pervious_identity_type = None
        pervious_identity_type_count = 0
        first_item_done = False
        for (k, v) in dict.iteritems():
            key_type = type_spec(k)
            value_type = type_spec(v)
            item_type = (key_type, value_type)
            if (item_type != pervious_identity_type):
                if not first_item_done:
                    first_item_done = True
                else:
                    types_list.append((pervious_identity_type, pervious_identity_type_count))
                pervious_identity_type = item_type
                pervious_identity_type_count = 1
            else:
                pervious_identity_type_count += 1
            length += 1
        if first_item_done:  # an empty dict has no final run to record
            types_list.append((pervious_identity_type, pervious_identity_type_count))
        return (length, types_list)
    (length, identity_list) = dict_info(dict)
    element_types = ""
    for ((identity_key_type,identity_value_type), identity_item_count) in identity_list:
        if element_types == "":
            pass
        else:
            element_types += ","
        identity_item_type = "(" + identity_key_type + "," + identity_value_type + ")"
        element_types += identity_item_type
        if (identity_item_count != length) and (identity_item_count != 1):
            element_types += "[" + `identity_item_count` + "]"
    result = name + "[" + `length` + "]<" + element_types + ">"
    return result

def type_spec_tuple(tuple, name):
    return name + "<" + ", ".join(type_spec(e) for e in tuple) + ">"

def type_spec(obj):
    object_type = type(obj)
    name = object_type.__name__
    if (object_type is int) or (object_type is long) or (object_type is str) or (object_type is bool) or (object_type is float):            
        result = name
    elif object_type is type(None):
        result = "(none)"
    elif (object_type is list) or (object_type is set):
        result = type_spec_iterable(obj, name)
    elif (object_type is dict):
        result = type_spec_dict(obj, name)
    elif (object_type is tuple):
        result = type_spec_tuple(obj, name)
    else:
        if name == 'ndarray':
            ndarray = obj
            ndarray_shape = "[" + `ndarray.shape`.replace("L","").replace(" ","").replace("(","").replace(")","") + "]"
            ndarray_data_type = `ndarray.dtype`.split("'")[1]
            result = name + ndarray_shape + "<" + ndarray_data_type + ">"
        else:
            result = "Unknown type: " + name
    return result

I would not consider it done, but it has worked on everything I needed thus far.
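
The run-length idea used in the code above can be shown in isolation. This is a simplified sketch for Python 3, not a replacement for the functions above: it uses itertools.groupby, looks only at the top-level type name of each item, and always prints a count for runs longer than one. The names run_length_types and format_runs are made up for illustration.

```python
from itertools import groupby


def run_length_types(items):
    """Collapse consecutive items of the same type into (type_name, count) runs."""
    return [(name, len(list(run)))
            for name, run in groupby(type(x).__name__ for x in items)]


def format_runs(runs):
    """Render runs as 'int[3],str', omitting the count for runs of length 1."""
    return ",".join(name if count == 1 else "%s[%d]" % (name, count)
                    for name, count in runs)


runs = run_length_types([1, 2, 3, "a", 4.0, 5.0])
print(runs)               # [('int', 3), ('str', 1), ('float', 2)]
print(format_runs(runs))  # int[3],str,float[2]
```

Because order is preserved, two lists with the same types in a different order produce different signatures, which is what makes the result comparable.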

Solution

One way to do it by hand would be:

def type_spec_iterable(obj, name):
    tps = set(type_spec(e) for e in obj)
    if len(tps) == 1:
        return name + "<" + next(iter(tps)) + ">"
    else:
        return name + "<?>"


def type_spec_dict(obj):
    tps = set((type_spec(k), type_spec(v)) for (k,v) in obj.iteritems())
    keytypes = set(k for (k, v) in tps)
    valtypes =  set(v for (k, v) in tps)
    kt = next(iter(keytypes)) if len(keytypes) == 1 else "?"
    vt = next(iter(valtypes)) if len(valtypes) == 1 else "?"
    return "dict<%s, %s>" % (kt, vt)


def type_spec_tuple(obj):
    return "tuple<" + ", ".join(type_spec(e) for e in obj) + ">"


def type_spec(obj):
    t = type(obj)
    res = {
        int: "int",
        str: "str",
        bool: "bool",
        float: "float",
        type(None): "(none)",
        list: lambda o: type_spec_iterable(o, 'list'),
        set: lambda o: type_spec_iterable(o, 'set'),
        dict: type_spec_dict,
        tuple: type_spec_tuple,
    }.get(t, lambda o: type(o).__name__)
    return res if type(res) is str else res(obj)


if __name__ == "__main__":
    class Foo(object):
        pass
    for obj in [
        1,
        2.3,
        None,
        False,
        "hello",
        [1, 2, 3],
        ["a", "b"],
        [1, "h"],
        (False, 1, "2"),
        set([1.2, 2.3, 3.4]),
        [[1,2,3],[4,5,6],[7,8,9]],
        [(1,'a'), (2, 'b')],
        {1:'b', 2:'c'},
        [Foo()], # todo - inheritance?
    ]:
        print repr(obj), ":", type_spec(obj)

This prints:

1 : int
2.3 : float
None : (none)
False : bool
'hello' : str
[1, 2, 3] : list<int>
['a', 'b'] : list<str>
[1, 'h'] : list<?>
(False, 1, '2') : tuple<bool, int, str>
set([2.3, 1.2, 3.4]) : set<float>
[[1, 2, 3], [4, 5, 6], [7, 8, 9]] : list<list<int>>
[(1, 'a'), (2, 'b')] : list<tuple<int, str>>
{1: 'b', 2: 'c'} : dict<int, str>
[<__main__.Foo object at 0x101de6c50>] : list<Foo>
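
The same recursion can emit something closer to the F# notation from the question. This is a rough sketch for Python 3 that only covers lists, tuples and a few scalar types; the function names and the use of "obj" for mixed or empty lists are assumptions made for illustration.

```python
def fsharp_style(obj):
    """Render a nested list/tuple value in a notation resembling F# signatures."""
    if isinstance(obj, tuple):
        # F# separates tuple items with '*'.
        return " * ".join(_parenthesize(e) for e in obj)
    if isinstance(obj, list):
        elem_types = {_parenthesize(e) for e in obj}
        # Mixed or empty lists have no single element type; 'obj' is a stand-in.
        elem = elem_types.pop() if len(elem_types) == 1 else "obj"
        return elem + " []"
    names = {str: "string", bool: "bool", int: "int", float: "float"}
    return names.get(type(obj), type(obj).__name__)


def _parenthesize(obj):
    """Wrap tuple signatures in parentheses when they are nested in a container."""
    sig = fsharp_style(obj)
    return "(" + sig + ")" if isinstance(obj, tuple) else sig


data = ([[1, 2, 3], [4, 5, 6], [7, 8, 9]], ["a", "b", "c"])
print(fsharp_style(data))                  # int [] [] * string []
print(fsharp_style([(1, "a"), (2, "b")]))  # (int * string) []
```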

There's a question of how far you want to take it and how deeply to check, with trade-offs between speed and accuracy. For example, do you want to go through all the items in a large list? Do you want to handle custom types (and track down the common ancestors of those types)?
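
Both trade-offs can be sketched briefly. The code below is written for Python 3 and uses made-up names: sampled_type_names inspects only the first few items (faster, but it can miss a different type later in the list), and common_base walks the method resolution order of the first object to find a class that all objects share.

```python
from itertools import islice


def sampled_type_names(items, limit=100):
    """Collect the type names of at most `limit` leading items."""
    return {type(x).__name__ for x in islice(items, limit)}


def common_base(objects):
    """Return the first class in the first object's MRO that all objects share."""
    objects = list(objects)
    if not objects:
        return object
    # The MRO is ordered from most to least specific and ends with `object`.
    for candidate in type(objects[0]).__mro__:
        if all(isinstance(o, candidate) for o in objects):
            return candidate
    return object


class Animal(object):
    pass


class Dog(Animal):
    pass


class Cat(Animal):
    pass


print(common_base([Dog(), Cat()]).__name__)       # Animal
print(sampled_type_names(range(10**6), limit=5))  # {'int'}
```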

This PEP on type hints is worth a read, though I'm not sure it's applicable.
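
For comparison, here is how the structure from the question might be written down with the standard typing module in Python 3. This is only an illustration: NestedData and describe are hypothetical names, and annotations document the expected shape rather than inspect a value at run time, so they do not answer the original question by themselves.

```python
from typing import List, Tuple

# Hypothetical alias mirroring F#'s `int [] [] * string []`.
NestedData = Tuple[List[List[int]], List[str]]


def describe(data: NestedData) -> str:
    """Return a short summary; the annotation documents the expected shape."""
    matrix, labels = data
    return "%d rows, %d labels" % (len(matrix), len(labels))


print(describe(([[1, 2, 3], [4, 5, 6], [7, 8, 9]], ["a", "b", "c"])))  # 3 rows, 3 labels
```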
