How to fix Index Error due to level in Pandas Groupby

Problem description

I have the following DataFrame, badges. The UserId column contains multiple entries for the same user. I want to obtain the minimum Date for every UserId for a given BadgeName. I wrote a function, user_badge_dt, to do this, but it raises an IndexError. Notably, although the dataset is the same for all users, I get this error only for some badges and not for others. I don't know why this is happening.

Part of the badges DataFrame:

   UserId  BadgeName       Date
0  23      Curious         2016-01-12T18:44:49.267
1  22      Autobiographer  2017-01-12T18:44:49.267
2  23      Curious         2018-01-12T18:44:49.267
3  20      Autobiographer  2019-01-12T18:44:49.267
4  22      Autobiographer  2020-01-12T18:44:49.267
5  30      Curious         2020-01-12T18:44:49.267
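For experimentation, the six sample rows above can be rebuilt as a DataFrame. This is a minimal sketch that assumes the column names shown in the sample (`UserId`, `BadgeName`, `Date`) and only these rows; the real `badges` frame is presumably larger. `Date` is kept as strings, as in the sample.

```python
import pandas as pd

# Rebuild only the six sample rows shown above (column names assumed from the sample)
badges = pd.DataFrame({
    "UserId": [23, 22, 23, 20, 22, 30],
    "BadgeName": ["Curious", "Autobiographer", "Curious",
                  "Autobiographer", "Autobiographer", "Curious"],
    "Date": ["2016-01-12T18:44:49.267", "2017-01-12T18:44:49.267",
             "2018-01-12T18:44:49.267", "2019-01-12T18:44:49.267",
             "2020-01-12T18:44:49.267", "2020-01-12T18:44:49.267"],
})

print(badges)
```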

Function

#Function to obtain each UserId with the date-time at which it first obtained the given badge
def user_badge_dt(badge_name):
  
  #Creating DataFrame with the UserId and date-time rows of the given badge
  df = badges[['UserId','Date']].loc[badges.BadgeName == badge_name]
  
  #Obtaining the first date-time of badge attainment for each user
  v = df.groupby("UserId", group_keys=False)['Date'].nsmallest(1)
  #Dropping the second index level so that v is indexed by UserId only (this line raises the IndexError)
  v.index = v.index.droplevel(1)

  #Mapping each UserId to its earliest date
  df['date'] = df['UserId'].map(v)
  df.drop(columns='Date', inplace=True)
  
  #Removing duplicate rows for the same user
  df.drop_duplicates(subset='UserId', inplace=True)

  return df

Error

IndexError: Too many levels: Index has only 1 level, not 2

Note
On further inspection, I found that the error is raised on the line v.index = v.index.droplevel(1)

This is because the line before it gives different results for different badge names:

CASE 1: When the code works correctly for the given badge

df = badges[['UserId','Date']].loc[badges.BadgeName == 'Autobiographer']
v = df.groupby("UserId", group_keys=False)['Date'].nsmallest(1)
print(v)

Output:

    1   22    2017-01-12T18:44:49.267 
    3   20    2019-01-12T18:44:49.267 

(This output has the index, the UserId and the minimum Date for the given badge.)

CASE 2: When the code works incorrectly for the given badge

df = badges[['UserId','Date']].loc[badges.BadgeName == 'Curious']
v = df.groupby("UserId", group_keys=False)['Date'].nsmallest(1)
print(v)

Output:

      23   2016-01-12T18:44:49.267 
      30   2020-01-12T18:44:49.267

(This output does not have the index, which is why the code fails on the next line. I don't know how this is happening.)
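The error message itself does not come from `groupby`: `Index.droplevel(1)` raises it whenever the index has only one level, as the CASE 2 output does. The sketch below reproduces it on a plain `Index` of the CASE 2 `UserId` values and shows a guard. `drop_row_level` is a hypothetical helper, not part of pandas; it drops level 1 only when a second level exists, assuming (as in CASE 2) that a single-level result is already indexed by `UserId`. The exact index shape returned by `nsmallest` may vary with the data, the pandas version and `group_keys`, so checking `nlevels` is safer than assuming two levels.

```python
import pandas as pd

# A single-level index like the one printed in CASE 2 (UserId values only)
single = pd.Index([23, 30], name="UserId")
try:
    single.droplevel(1)
except IndexError as exc:
    error_message = str(exc)
    print(error_message)  # Too many levels: Index has only 1 level, not 2

# A two-level index such as (UserId, original row label)
multi = pd.MultiIndex.from_tuples([(22, 1), (20, 3)], names=["UserId", None])
print(multi.droplevel(1))  # only the UserId level remains


def drop_row_level(s: pd.Series) -> pd.Series:
    """Hypothetical helper: drop index level 1 only if it exists.

    Assumes a single-level result is already indexed by UserId,
    as in the CASE 2 output.
    """
    if s.index.nlevels > 1:
        s = s.copy()
        s.index = s.index.droplevel(1)
    return s
```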

For any input badge_name, the function should return a DataFrame with each UserId and the minimum Date for the given badge. If my function is unclear, please suggest a different way to achieve this with a new function.

Recommended answer

I cannot reproduce your error, but I think your solution can be simplified with DataFrame.sort_values: sort by Date, then keep the first row for each user, which is the row with the smallest date:

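import pandas as pd

#Converting the date strings to datetimes so that sorting is chronological
badges['Date'] = pd.to_datetime(badges['Date'])

def user_badge_dt(badge_name):
  
  #Selecting UserId and Date for the given badge, sorting by Date and
  #keeping only the first (earliest) row for each UserId
  return (badges.loc[badges.BadgeName == badge_name, ['UserId','Date']]
                .sort_values('Date')
                .drop_duplicates(subset='UserId'))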
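An aggregation-based alternative, not part of the original answer, is to take the minimum `Date` per `UserId` directly with `groupby(...).min()`, so there are no index levels to drop at all. A minimal sketch, assuming the sample rows and column names from the question; `user_badge_dt_min` is a hypothetical name.

```python
import pandas as pd

# Sample rows from the question (column names assumed from the sample)
badges = pd.DataFrame({
    "UserId": [23, 22, 23, 20, 22, 30],
    "BadgeName": ["Curious", "Autobiographer", "Curious",
                  "Autobiographer", "Autobiographer", "Curious"],
    "Date": pd.to_datetime([
        "2016-01-12T18:44:49.267", "2017-01-12T18:44:49.267",
        "2018-01-12T18:44:49.267", "2019-01-12T18:44:49.267",
        "2020-01-12T18:44:49.267", "2020-01-12T18:44:49.267",
    ]),
})


def user_badge_dt_min(badge_name):
    """Hypothetical alternative: earliest Date for each UserId with the given badge."""
    subset = badges.loc[badges["BadgeName"] == badge_name, ["UserId", "Date"]]
    # as_index=False returns UserId as an ordinary column instead of an index level
    return subset.groupby("UserId", as_index=False)["Date"].min()


print(user_badge_dt_min("Curious"))
print(user_badge_dt_min("Autobiographer"))
```

Unlike the `sort_values` version, which keeps the original row labels and orders rows by date, this returns rows ordered by `UserId` with a fresh 0-based index.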
