Why is it required to typecast a map into a list to assign it to a pandas series?

Views: 85 · Posted: 2020/5/24 1:57:37 · Tags: python, python-3.x, pandas

Question

I have just started learning the basics of pandas, and there is one thing that made me think.

import pandas as pd
data = pd.DataFrame({'Column1': ['A', 'B', 'C']})
data['Column2'] = map(str.lower, data['Column1'])
print(data)

The output of this program is:

   Column1                             Column2
 0       A  <map object at 0x00000205D80BCF98>
 1       B  <map object at 0x00000205D80BCF98>
 2       C  <map object at 0x00000205D80BCF98>

One possible solution to get the desired output is to typecast the map object into a list.

import pandas as pd
data = pd.DataFrame({'Column1': ['A', 'B', 'C']})
data['Column2'] = list(map(str.lower, data['Column1']))
print(data)

Output:

   Column1 Column2
 0       A       a
 1       B       b
 2       C       c

However, if I use range(), which also returns its own type in Python 3, there is no need to typecast the object to a list.

import pandas as pd
data = pd.DataFrame({'Column1': ['A', 'B', 'C']})
data['Column2'] = range(3)
print(data)

Output:

   Column1  Column2
 0       A        0
 1       B        1
 2       C        2

Is there any reason why a range object does not need to be typecast, but a map object does?

Recommended answer

TL;DR: range has __getitem__ and __len__, while map doesn't.
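The claim in the TL;DR can be checked with plain Python, no pandas needed. This is only a minimal sketch; the variable names `r` and `m` are chosen here for illustration.

```python
# Compare which special methods the two types define.
r = range(3)
m = map(str.lower, ['A', 'B', 'C'])

for name, obj in (('range', r), ('map', m)):
    print(name,
          '__len__:', hasattr(obj, '__len__'),
          '__getitem__:', hasattr(obj, '__getitem__'),
          '__iter__:', hasattr(obj, '__iter__'))
# range defines all three methods; map defines __iter__ but
# neither __len__ nor __getitem__.
```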

I'm assuming that the syntax for creating a new dataframe column is in some way syntactic sugar for pandas.DataFrame.insert, which takes as its value argument a

scalar, Series, or array-like

Given that, it seems the question reduces to "Why does pandas treat a list and a range as array-like, but not a map?"

See: numpy: formal definition of "array_like" objects?

If you try making an array out of a range, it works fine, because range is close enough to array-like, but you can't do so with a map.

>>> import numpy as np
>>> foo = np.array(range(10))
>>> bar = np.array(map(lambda x: x + 1, range(10)))
>>> foo
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> bar
array(<map object at 0x7f7e553219e8>, dtype=object)

map is not "array-like", while range is.
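The practical difference shows up as soon as you ask either object for its length or for an element. The following sketch uses only the standard library; the names `values` and `leftover` are made up for illustration.

```python
m = map(str.lower, ['A', 'B', 'C'])

# A map object cannot report its length or be indexed.
try:
    len(m)
except TypeError as exc:
    print('len failed:', exc)

try:
    m[0]
except TypeError as exc:
    print('indexing failed:', exc)

values = list(m)    # walks the iterator once and stores the results
leftover = list(m)  # the iterator is already exhausted
print(values, leftover)

# A range can be measured, indexed, and iterated repeatedly.
r = range(3)
print(len(r), r[1], list(r), list(r))
```

Calling list() works because it only needs to iterate; the resulting list then has the length and indexing support that the map object lacks.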

Looking further into PyArray_GetArrayParamsFromObject, referred to in the linked answer, the end of the function calls PySequence_Check. That function is part of Python itself rather than numpy, and there's a good discussion of it on Stack Overflow: What is Python's sequence protocol?

Earlier in the same file, it says:

   /*
     * PySequence_Check detects whether an old type object is a
     * sequence by the presence of the __getitem__ attribute, and
     * for new type objects that aren't dictionaries by the
     * presence of the __len__ attribute as well. In either case it
     * is possible to have an object that tests as a sequence but
     * doesn't behave as a sequence and consequently, the
     * PySequence_GetItem call can fail. When that happens and the
     * object looks like a dictionary, we truncate the dimensions
     * and set the object creation flag, otherwise we pass the
     * error back up the call chain.
     */

This seems to be a major part of "array-like": any item that has __getitem__ and __len__ is array-like. range has both, while map has neither.
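PySequence_Check, mentioned above, has no direct Python-level equivalent, but on CPython it can be reached through ctypes. This is a CPython-specific sketch for experimentation only; the wrapper name `is_sequence` is an assumption made here, not something numpy or pandas provides.

```python
import ctypes

# CPython only: expose the C-API function PySequence_Check to Python code.
_check = ctypes.pythonapi.PySequence_Check
_check.argtypes = [ctypes.py_object]
_check.restype = ctypes.c_int

def is_sequence(obj):
    """Hypothetical helper: True if PySequence_Check accepts obj."""
    return bool(_check(obj))

results = {
    'list': is_sequence([1, 2, 3]),
    'range': is_sequence(range(3)),
    'map': is_sequence(map(str, range(3))),
    'dict': is_sequence({'a': 1}),
}
print(results)
```

The list and the range pass the check, while the map and the dict do not, which matches the behaviour described in the comment quoted above.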

__getitem__ and __len__ are necessary and sufficient to make a sequence, and therefore to get the column to display as you wish instead of as a single object.

Try it:

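import numpy as np

class Column(object):
    # Reports a fixed length of 5.
    def __len__(self):
        return 5

    # Returns index + 5 for valid indexes; IndexError marks the end.
    def __getitem__(self, index):
        if 0 <= index < 5:
            return index+5
        else:
            raise IndexError

col = Column()
a_col = np.array(col)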

If either __getitem__() or __len__() is missing, numpy will still create an array for you, but it will hold the object itself as a single element rather than iterating through it.
If you have both methods, it displays the way you want.

(Thanks to user2357112 for correcting me. In a slightly simpler example, I thought __iter__ was required. It's not. The __getitem__ function does need to make sure the index is in range, though.)
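The point about __iter__ can be seen without numpy. When a class defines __getitem__ but no __iter__, Python falls back to calling __getitem__ with 0, 1, 2, ... until IndexError is raised. The sketch below reuses the shape of the Column class above, with a docstring added for illustration.

```python
class Column:
    """Sequence-like class with __len__ and __getitem__ but no __iter__."""

    def __len__(self):
        return 5

    def __getitem__(self, index):
        if 0 <= index < 5:
            return index + 5
        # Without this IndexError, the fallback iteration would never stop.
        raise IndexError(index)

col = Column()
has_iter = hasattr(col, '__iter__')
items = list(col)  # iterates through __getitem__ until IndexError
print(has_iter, items)
```

This is why the range check in __getitem__ matters: the IndexError is the only signal that tells the fallback iteration where the data ends.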
