ValueError: Length of values does not match length of index | Pandas DataFrame.unique()


Problem description

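I am trying to get a new dataset, or change the values of the current dataset's columns to their unique values. Here is an example of what I am trying to get: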

   A B
 -----
0| 1 1
1| 2 5
2| 1 5
3| 7 9
4| 7 9
5| 8 9

Wanted Result    Not Wanted Result
       A B            A B
     -----          -----
    0| 1 1         0| 1 1
    1| 2 5         1| 2 5
    2| 7 9         2| 
    3| 8           3| 7 9
                   4|
                   5| 8

I don't really care about the index, but it seems to be the problem. My code so far is pretty simple. I tried two approaches, one with a new DataFrame and one without.

import pandas as pd

# With a new DataFrame
def UniqueResults(dataframe):
    df = pd.DataFrame()
    for col in dataframe:
        S=pd.Series(dataframe[col].unique())
        df[col]=S.values
    return df

#Without new DataFrame
def UniqueResults(dataframe):
    for col in dataframe:
        dataframe[col]=dataframe[col].unique()
    return dataframe

I get the error "Length of values does not match length of index" both times.

Solution

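The error is raised when you try to assign a list or NumPy array to a data frame column and its length differs from the number of rows. It can be reproduced as follows: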

A data frame of four rows:

df = pd.DataFrame({'A': [1,2,3,4]})

Now try to assign a list/array of two elements to it:

df['B'] = [3,4]   # or df['B'] = np.array([3,4])

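Both assignments raise:

ValueError: Length of values does not match length of index

This is because the data frame has four rows, but the list and the array each have only two elements.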
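The reproduction above can be put into a small runnable sketch. The helper name `try_assign` is hypothetical and exists only for illustration. The exact wording of the message can vary between pandas versions, so the sketch only captures and prints it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4]})

def try_assign(frame, values):
    """Return the error message raised by the assignment, or None if it succeeds."""
    try:
        frame["B"] = values
    except ValueError as exc:
        return str(exc)
    return None

# Two elements cannot fill four rows, whether they come as a list or an array.
list_error = try_assign(df.copy(), [3, 4])
array_error = try_assign(df.copy(), np.array([3, 4]))
print(list_error)
print(array_error)

# A list with exactly one value per row is accepted.
ok = try_assign(df.copy(), [5, 6, 7, 8])
print(ok)
```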

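Workaround (use with caution): convert the list/array to a pandas Series. On assignment, pandas aligns the Series with the data frame's index, and any index label missing from the Series is filled with NaN: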

df['B'] = pd.Series([3,4])

df
#   A     B
#0  1   3.0
#1  2   4.0
#2  3   NaN          # NaN because the value at index 2 and 3 doesn't exist in the Series
#3  4   NaN
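One reason for the caution is that the Series is matched to the data frame by index label, not by position. The sketch below assumes a hypothetical frame whose index runs from 10 to 13 to show what happens when the labels do not overlap.

```python
import pandas as pd

# Hypothetical frame whose index labels are 10..13 rather than 0..3.
df = pd.DataFrame({"A": [1, 2, 3, 4]}, index=[10, 11, 12, 13])

# The Series carries labels 0 and 1, none of which exist in df.index,
# so alignment finds no matches and every row of B becomes NaN.
df["B"] = pd.Series([3, 4])

# Giving the Series matching labels places the values where intended.
df["C"] = pd.Series([3, 4], index=df.index[:2])

print(df)
```

If positional assignment is what you want, building the Series with the target frame's own index labels, as done for column C, avoids the silent all-NaN column.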


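For your specific problem, if you don't care about the index or about the correspondence of values between columns, you can drop the duplicates in each column and then reset that column's index: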

df.apply(lambda col: col.drop_duplicates().reset_index(drop=True))

#   A     B
#0  1   1.0
#1  2   5.0
#2  7   9.0
#3  8   NaN
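As an alternative sketch, not part of the original answer, the same result can be built from the question's sample data by wrapping each column's unique values in a Series and letting the DataFrame constructor pad the shorter ones. The function name `unique_per_column` is hypothetical.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"A": [1, 2, 1, 7, 7, 8], "B": [1, 5, 5, 9, 9, 9]})

def unique_per_column(frame):
    """Build a new frame whose columns hold each input column's unique values."""
    # Each Series gets a fresh 0..n-1 index; the constructor takes the union
    # of those indexes and pads shorter columns with NaN.
    return pd.DataFrame({col: pd.Series(frame[col].unique()) for col in frame})

result = unique_per_column(df)
print(result)
```

Because the shorter column is padded with NaN, it is stored as float, which is why B shows 1.0, 5.0, 9.0 in the output above. The values on a given row no longer belong together, so this only makes sense when the columns are treated independently.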
