Copying a column from one DataFrame to another gives NaN values?


Problem description

This question has been asked many times, and the usual answers seem to work for others. However, I get NaN values when I copy a column from a different DataFrame (df1 and df2 have the same length).

df1

        date     hour      var1
a   2017-05-01  00:00:00   456585
b   2017-05-01  01:00:00   899875
c   2017-05-01  02:00:00   569566
d   2017-05-01  03:00:00   458756
e   2017-05-01  04:00:00   231458
f   2017-05-01  05:00:00   986545

df2

      MyVar1     MyVar2 
 0  6169.719338 3688.045368
 1  5861.148007 3152.238704
 2  5797.053347 2700.469871
 3  5779.102340 2730.471948
 4  6708.219647 3181.298291
 5  8550.380343 3793.580394

What I want in df2:

       MyVar1    MyVar2        date        hour
 0  6169.719338 3688.045368  2017-05-01  00:00:00
 1  5861.148007 3152.238704  2017-05-01  01:00:00
 2  5797.053347 2700.469871  2017-05-01  02:00:00
 3  5779.102340 2730.471948  2017-05-01  03:00:00
 4  6708.219647 3181.298291  2017-05-01  04:00:00
 5  8550.380343 3793.580394  2017-05-01  05:00:00

I tried the following:

df2['date'] = df1['date']
df2['hour'] = df1['hour']

type(df1)
>> pandas.core.frame.DataFrame

type(df2)
>> pandas.core.frame.DataFrame

This is what I get:

       MyVar1    MyVar2      date       hour
 0  6169.719338 3688.045368  NaN        NaN
 1  5861.148007 3152.238704  NaN        NaN
 2  5797.053347 2700.469871  NaN        NaN

Why is this happening? Another post discusses merge, but I only need to copy the columns. Any help would be appreciated.

Recommended answer

The culprit is unalignable indexes

Your DataFrames have different indexes (and so does every column taken from them). When you assign a column of one DataFrame to another, pandas tries to align the two indexes by label, and wherever it finds no matching label it inserts NaN.

Consider the following examples to understand what this means:

# Setup
import pandas as pd
A = pd.DataFrame(index=['a', 'b', 'c']) 
B = pd.DataFrame(index=['b', 'c', 'd', 'f'])                                  
C = pd.DataFrame(index=[1, 2, 3])

# Example of alignable indexes - A & B (complete or partial overlap of indexes)
A.index B.index
      a        
      b       b   (overlap)
      c       c   (overlap)
              d
              f

# Example of unalignable indexes - A & C (no overlap at all)
A.index C.index
      a        
      b        
      c        
              1
              2
              3

When there is no overlap, pandas cannot match even a single label between the two DataFrames, so the assigned column is filled entirely with NaN.
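The alignment behaviour described above can be checked directly. The following is a minimal sketch, assuming a hypothetical column x added to A so that there is something to copy; the index labels are the same as in the setup above.

```python
import pandas as pd

# Hypothetical frames for illustration; only the index labels come from the answer.
A = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])
B = pd.DataFrame(index=["b", "c", "d", "f"])
C = pd.DataFrame(index=[1, 2, 3])

# Partial overlap: only the shared labels 'b' and 'c' receive values,
# the remaining labels 'd' and 'f' get NaN.
B["x"] = A["x"]

# No overlap: no label matches, so every row becomes NaN.
C["x"] = A["x"]

print(B)
print(C)
```

Note that the values are matched by label, not by position: B's first row gets the value stored under 'b' in A, not A's first value.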

You can confirm that this is the root cause (for example, in an IPython notebook) with:

df1.index.equals(df2.index)                                                                                               
# False
df1.index.intersection(df2.index).empty                                                                                     
# True
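To see the two checks together with the resulting NaN column, here is a small sketch that rebuilds a shortened version of the question's frames. The data is a hypothetical three-row reconstruction, not the asker's actual DataFrames.

```python
import pandas as pd

# Shortened, hypothetical reconstruction of the question's data.
df1 = pd.DataFrame(
    {"date": ["2017-05-01"] * 3, "hour": ["00:00:00", "01:00:00", "02:00:00"]},
    index=["a", "b", "c"],
)
df2 = pd.DataFrame({"MyVar1": [6169.719338, 5861.148007, 5797.053347]})

# The indexes are neither identical nor overlapping.
same_index = df1.index.equals(df2.index)
no_overlap = df1.index.intersection(df2.index).empty
print(same_index, no_overlap)

# With no shared labels, the copied column is entirely NaN.
df2["date"] = df1["date"]
print(df2)
```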


You can use either of the following solutions to fix this issue.

The first option is to give both DataFrames the same index. You may prefer it if you didn't mean to have different indexes in the first place, or if you don't particularly care about preserving the index.

# Optional, if you want a RangeIndex => [0, 1, 2, ...]
# df1.index = pd.RangeIndex(len(df1))
# Homogenize the index values,
df2.index = df1.index
# Assign the columns.
df2[['date', 'hour']] = df1[['date', 'hour']]

If you want to keep the existing index, but as a column, you can use reset_index() instead.
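A minimal sketch of the reset_index() variant, using a hypothetical two-row version of the question's data. It assumes df2 already has the default 0..n-1 index, as in the question.

```python
import pandas as pd

# Hypothetical two-row stand-ins for the question's frames.
df1 = pd.DataFrame(
    {"date": ["2017-05-01", "2017-05-01"], "hour": ["00:00:00", "01:00:00"]},
    index=["a", "b"],
)
df2 = pd.DataFrame({"MyVar1": [6169.719338, 5861.148007]})

# reset_index() moves the old labels into a column named "index"
# and replaces the index with a default RangeIndex (0, 1, ...).
df1_reset = df1.reset_index()

# Both frames now share the labels 0..n-1, so assignment aligns row by row.
df2["date"] = df1_reset["date"]
df2["hour"] = df1_reset["hour"]

print(df1_reset)
print(df2)
```

If the old labels are not needed at all, reset_index(drop=True) discards them instead of keeping them as a column.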

The second option bypasses index alignment altogether by assigning the column's underlying NumPy array, so values are copied by position. It only works if the lengths of the two DataFrames match.

# pandas >= 0.24
df2['date'] = df1['date'].to_numpy()
# pandas < 0.24
df2['date'] = df1['date'].values

To assign multiple columns easily, use:

df2 = df2.assign(**{c: df1[c].to_numpy() for c in ('date', 'hour')})
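The length requirement of the array-based approach can be illustrated with a short sketch. The frames below are hypothetical and deliberately have different lengths; to_numpy() requires pandas 0.24 or later.

```python
import pandas as pd

# Hypothetical frames: df1 has 3 rows, df2 has only 2.
df1 = pd.DataFrame({"date": ["2017-05-01"] * 3}, index=["a", "b", "c"])
df2 = pd.DataFrame({"MyVar1": [6169.719338, 5861.148007]})

try:
    # A raw array carries no index, so pandas cannot align it by label
    # and instead requires one value per row of df2.
    df2["date"] = df1["date"].to_numpy()
    failed = False
except ValueError as err:
    failed = True
    print(err)
```

When the lengths do match, the same assignment copies values purely by position, which is what the question asked for.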
