Pandas column creation
Question
I'm struggling to understand the concept behind column naming conventions, given that one of the following attempts to create a new column appears to fail:
from numpy.random import randn
import pandas as pd
df = pd.DataFrame({'a': range(0, 10, 2), 'c': range(0, 1000, 200)},
                  columns=list('ac'))
df['b'] = 10*df.a
df
This gives the expected result: df now has the columns a, c and b.
Yet if I try to create column b with the following line instead, there is no error message, but the dataframe df still has only the columns a and c.
df.b = 10*df.a  # rather than the previous df['b'] = 10*df.a
What has pandas done and why is my command incorrect?
Recommended answer
What you did was add an attribute b to your df:
In [70]:
df.b = 10*df.a
df.b
Out[70]:
0 0
1 20
2 40
3 60
4 80
Name: a, dtype: int32
But we see that no new column has been added:
In [73]:
df.columns
Out[73]:
Index(['a', 'c'], dtype='object')
This means we would get a KeyError if we tried df['b']. To avoid this ambiguity, you should always use square brackets when assigning.
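The behaviour described above can be reproduced in a short script. This is a minimal sketch, not part of the original answer. The variable names attr_only, lookup_failed and cols_after are made up for illustration, and the warning suppression assumes that recent pandas versions may emit a UserWarning on this kind of attribute assignment.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'a': range(0, 10, 2), 'c': range(0, 1000, 200)})

with warnings.catch_warnings():
    # Assumption: recent pandas versions may warn about this assignment;
    # the warning is silenced here only to keep the demo output clean.
    warnings.simplefilter("ignore")
    df.b = 10 * df.a  # sets a plain Python attribute, not a column

# No column named 'b' exists yet.
attr_only = 'b' not in df.columns

# Bracket lookup of the missing column raises KeyError.
try:
    df['b']
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Bracket assignment is what actually creates the column.
df['b'] = 10 * df['a']
cols_after = list(df.columns)
```

After the bracket assignment, df.columns contains a, c and b, in that order.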
For instance, if you had a column named index, sum or max, then df.index would return the index and not the column named index, and similarly df.sum and df.max would refer to those DataFrame methods rather than to your columns.
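A small sketch of that name collision, using a made-up dataframe whose column labels index and sum deliberately clash with DataFrame attributes. The data and variable names are illustrative only.

```python
import pandas as pd

# Hypothetical data: column labels chosen to collide with DataFrame attributes.
df = pd.DataFrame({'index': [10, 20, 30], 'sum': [1, 2, 3]})

attr_index = df.index  # the row index, not the column named 'index'
attr_sum = df.sum      # the bound DataFrame.sum method, not the column named 'sum'

col_index = df['index']  # square brackets always reach the column
col_sum = df['sum']
```

Square brackets are unambiguous here, whereas attribute access silently resolves to the existing attribute or method.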
I strongly advise always using square brackets: it avoids any ambiguity, and the latest IPython is able to resolve column names using square brackets. It's also useful to think of a dataframe as a dict of Series, in which case using square brackets for assigning and returning a column makes sense.
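To illustrate the dict-of-Series analogy, the sketch below uses a dataframe through the same operations a dict supports. It is an illustration under that analogy, not a full description of the DataFrame interface, and the variable names are made up.

```python
import pandas as pd

df = pd.DataFrame({'a': range(0, 10, 2), 'c': range(0, 1000, 200)})

df['b'] = 10 * df['a']   # like d[key] = value
has_b = 'b' in df        # membership tests the column labels, like `key in d`
labels = list(df.keys()) # keys() returns the column labels
first = next(iter(df))   # iterating yields column labels, like iterating a dict
b_is_series = isinstance(df['b'], pd.Series)  # each "value" is a Series
```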