How to add a new column to an existing DataFrame?


Problem description


I have the following indexed DataFrame with named columns and non-contiguous row labels:

          a         b         c         d
2  0.671399  0.101208 -0.181532  0.241273
3  0.446172 -0.243316  0.051767  1.577318
5  0.614758  0.075793 -0.451460 -0.012493

I would like to add a new column, 'e', to the existing DataFrame without changing anything else in it (i.e., the new column always has the same length as the DataFrame). The new values are held in the following Series:

0   -0.335485
1   -1.166658
2   -0.385571
dtype: float64

How can I add column e to the above example?

Solution

Edit 2017

As noted in the comments and by @Alexander, the best way to add the values of a Series as a new column of a DataFrame may now be assign:

df1 = df1.assign(e=pd.Series(np.random.randn(sLength)).values)
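A minimal, self-contained sketch of the assign approach, using the frame and Series from the question. The names `new_values` and `result` are illustrative assumptions and do not appear in the original answer. The point it tries to show is that `.values` drops the Series labels (0, 1, 2), so the numbers are placed by position against the row labels 2, 3, 5.

```python
import pandas as pd

# Rebuild the question's frame: row labels 2, 3, 5 are deliberately non-contiguous.
df1 = pd.DataFrame(
    {
        "a": [0.671399, 0.446172, 0.614758],
        "b": [0.101208, -0.243316, 0.075793],
        "c": [-0.181532, 0.051767, -0.451460],
        "d": [0.241273, 1.577318, -0.012493],
    },
    index=[2, 3, 5],
)

# The Series to add carries the default labels 0, 1, 2.
new_values = pd.Series([-0.335485, -1.166658, -0.385571])

# .values strips the Series labels, so the numbers are placed by position.
# assign returns a new DataFrame and leaves df1 itself untouched.
result = df1.assign(e=new_values.values)
print(result)
```

Because assign returns a new object, the answer's `df1 = df1.assign(...)` rebinding is what makes the new column visible under the old name.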


Edit 2015
Some users reported getting a SettingWithCopyWarning with the code from the original answer.
However, that code still runs correctly on pandas 0.16.1, the current version at the time of this edit.

>>> sLength = len(df1['a'])
>>> df1
          a         b         c         d
6 -0.269221 -0.026476  0.997517  1.294385
8  0.917438  0.847941  0.034235 -0.448948

>>> df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e
6 -0.269221 -0.026476  0.997517  1.294385  1.757167
8  0.917438  0.847941  0.034235 -0.448948  2.228131

>>> pd.version.short_version
'0.16.1'

The SettingWithCopyWarning aims to flag a possibly invalid assignment on a copy of the DataFrame. It doesn't necessarily mean you did something wrong (it can trigger false positives), but from 0.13.0 onward it lets you know that there are more suitable methods for the same purpose. So if you get the warning, just follow its advice: try using .loc[row_index,col_indexer] = value instead:

>>> df1.loc[:,'f'] = pd.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e         f
6 -0.269221 -0.026476  0.997517  1.294385  1.757167 -0.050927
8  0.917438  0.847941  0.034235 -0.448948  2.228131  0.006109

In fact, this is currently the more efficient method, as described in the pandas docs.
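One common way to run into the warning is to add a column to a frame that was itself selected from another frame. The sketch below is an assumption about such a situation, not something taken from the answer: the names `df`, `subset` and `rng` and the seeded generator are hypothetical. It takes an explicit copy first and then adds the column through .loc. Newer pandas releases with copy-on-write semantics behave differently and may not emit the warning at all.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
df = pd.DataFrame(
    rng.standard_normal((4, 2)), columns=["a", "b"], index=[6, 8, 9, 11]
)

# A filtered selection may be a view or a copy of df; calling .copy()
# makes it an independent frame, so adding a column to it is unambiguous.
subset = df[df["a"] > df["a"].min()].copy()

# .loc with a full row slice and a new column label creates the column.
subset.loc[:, "f"] = pd.Series(
    rng.standard_normal(len(subset)), index=subset.index
)

print(subset)
```

The new column exists only on `subset`; `df` keeps its original two columns.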


Original answer:

Use the original df1 indexes to create the series:

df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
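The reason for passing `index=df1.index` is that pandas aligns a Series to the DataFrame by label, not by position, when it is assigned as a column. A small sketch of the difference, with a hypothetical one-column frame and the question's three values (the column name `by_label` is made up for the illustration):

```python
import pandas as pd

df1 = pd.DataFrame({"a": [0.67, 0.45, 0.61]}, index=[2, 3, 5])
values = [-0.335485, -1.166658, -0.385571]

# Default labels 0, 1, 2: only label 2 also exists in df1,
# so the rows labelled 3 and 5 receive NaN.
misaligned = pd.Series(values)
df1["by_label"] = misaligned

# Reusing df1.index gives each value the label of the row it belongs to.
aligned = pd.Series(values, index=df1.index)
df1["e"] = aligned

print(df1)
```

In the `by_label` column only the row labelled 2 is filled (with the value that had label 2 in the Series), while column `e` holds all three values in their original order.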
