Why is it required to typecast a map into a list to assign it to a pandas series?
Question
I have just started learning the basics of pandas, and there is one thing that made me think.
import pandas as pd
data = pd.DataFrame({'Column1': ['A', 'B', 'C']})
data['Column2'] = map(str.lower, data['Column1'])
print(data)
The output of this program is:
Column1 Column2
0 A <map object at 0x00000205D80BCF98>
1 B <map object at 0x00000205D80BCF98>
2 C <map object at 0x00000205D80BCF98>
One possible solution to get the desired output is to convert the map object into a list.
import pandas as pd
data = pd.DataFrame({'Column1': ['A', 'B', 'C']})
data['Column2'] = list(map(str.lower, data['Column1']))
print(data)
Output:
Column1 Column2
0 A a
1 B b
2 C c
However, if I use range(), which also returns its own type in Python 3, there is no need to convert the object to a list.
import pandas as pd
data = pd.DataFrame({'Column1': ['A', 'B', 'C']})
data['Column2'] = range(3)
print(data)
Output:
Column1 Column2
0 A 0
1 B 1
2 C 2
Is there any reason why a range object does not need to be typecast, but a map object does?
Recommended answer
TL;DR: range has __getitem__ and __len__, while map does not.
I'm assuming that the syntax for creating a new dataframe column is in some way syntactic sugar for pandas.DataFrame.insert, whose value argument takes a scalar, Series, or array-like.
Given that, it seems the question reduces to "Why does pandas treat a list and a range as array-like, but not a map?"
See: numpy: formal definition of "array_like" objects?
If you try making an array out of a range, it works fine, because range is close enough to array-like, but you can't do so with a map.
>>> import numpy as np
>>> foo = np.array(range(10))
>>> bar = np.array(map(lambda x: x + 1, range(10)))
>>> foo
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> bar
array(<map object at 0x7f7e553219e8>, dtype=object)
map is not "array-like", while range is.
Looking further into PyArray_GetArrayParamsFromObject, referred to in the linked answer, the end of the function calls PySequence_Check. That function is part of Python itself rather than NumPy, and there's a good discussion of it on Stack Overflow: What is Python's sequence protocol?
Earlier in the same file, it says:
/*
* PySequence_Check detects whether an old type object is a
* sequence by the presence of the __getitem__ attribute, and
* for new type objects that aren't dictionaries by the
* presence of the __len__ attribute as well. In either case it
* is possible to have an object that tests as a sequence but
* doesn't behave as a sequence and consequently, the
* PySequence_GetItem call can fail. When that happens and the
* object looks like a dictionary, we truncate the dimensions
* and set the object creation flag, otherwise we pass the
* error back up the call chain.
*/
This seems to be a major part of "array-like": any object that has __getitem__ and __len__ is array-like. range has both, while map has neither.
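The has-both versus has-neither claim can be checked directly in plain Python. This is a minimal sketch: looks_like_sequence is a hypothetical helper written for illustration (it is not a NumPy or pandas function), and testing for the two attributes only approximates what PySequence_Check does internally.

```python
def looks_like_sequence(obj):
    """Hypothetical helper: True if obj defines both __len__ and __getitem__."""
    return hasattr(obj, "__len__") and hasattr(obj, "__getitem__")


# The three kinds of object discussed in the question.
candidates = {
    "list": [1, 2, 3],
    "range": range(3),
    "map": map(str.lower, ["A", "B", "C"]),
}

results = {name: looks_like_sequence(obj) for name, obj in candidates.items()}
print(results)  # {'list': True, 'range': True, 'map': False}
```

A map object only supports iteration (__iter__ and __next__), so it has no length to report and cannot be indexed until it is materialized with list().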
__getitem__ and __len__ are necessary and sufficient to make a sequence, and therefore to get the column to display as you wish instead of as a single object.
Try it:
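import numpy as np

class Column(object):
    # Report a fixed length so the object passes as a sequence.
    def __len__(self):
        return 5

    # Return a value for in-range indices; raise IndexError otherwise
    # so that iteration over the object terminates.
    def __getitem__(self, index):
        if 0 <= index < 5:
            return index + 5
        else:
            raise IndexError

col = Column()
a_col = np.array(col)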
- If you don't have either __getitem__() or __len__(), numpy will create an array for you, but it will contain the object itself, and it won't iterate through it for you.
- If you have both methods, it displays the way you want.
(Thanks to user2357112 for correcting me. In a slightly simpler example, I thought __iter__ was required. It's not. The __getitem__ function does need to make sure the index is in range, though.)
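The point about __iter__ can be illustrated without NumPy. The sketch below relies on Python's legacy iteration fallback: when a class has no __iter__, iter() calls __getitem__ with 0, 1, 2, ... until IndexError is raised. Whether NumPy takes exactly this path internally is an assumption here; the example only shows why the bounds check matters.

```python
class Column:
    """Same shape as the class above: __len__ and __getitem__, no __iter__."""

    def __len__(self):
        return 5

    def __getitem__(self, index):
        # Without this bounds check, iteration would never stop.
        if 0 <= index < 5:
            return index + 5
        raise IndexError(index)


col = Column()

# No __iter__ is defined, yet the object can still be iterated.
has_iter = hasattr(col, "__iter__")
values = list(col)

print(has_iter)  # False
print(values)    # [5, 6, 7, 8, 9]
```

If __getitem__ returned a value for every index instead of raising IndexError, the list() call would loop forever, which is why the range check is needed.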