Pandas for loop over dataframe gives too many values to unpack


Problem description

I don't see why this code isn't working. I am trying to iterate over a data frame in a for loop; in this case the frame has only one row. There are only two columns, and I have two loop variables to receive them. What am I missing?

  print("process_list =  ", process_list)

  for row in process_list.itertuples():
      print("row = ", row)

  df_to_date = pd.DataFrame()

  try:
      print("process_list = {}  and it's type {}  process_list.itertuples() {} ".format(
          process_list, type(process_list), process_list.itertuples()))

      for file_date, file_name in process_list.itertuples():  # a whole batch of days
          file_to_process = dev_env + file_name
          print("PROCESSING BATCH: ", file_to_process)
          df = pd.read_csv(file_to_process, header=None, skiprows=22, sep=',', comment='*',
                           converters={"Days": just_number, "Percentile": just_number, "Date": just_number},
                           names=column_names)
          df.insert(0, 'File_date', file_date)
          df_to_date = df_to_date.append(df)

  except Exception as e:
      print("nothing to process exception = ", e)
      sys.exit(0)

When I run it I get:

process_list =       File_date          File_name
94   20180507  mcmhv20180507.csv
row =  Pandas(Index=94, File_date=20180507, File_name='mcmhv20180507.csv')
process_list =     File_date          File_name
94   20180507  mcmhv20180507.csv  and it's type <class 'pandas.core.frame.DataFrame'>  process_list.itertuples() <map object at 0x7f6339371e48> 
nothing to process exception =  too many values to unpack (expected 2)

Recommended answer

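By default, itertuples() puts the row index first in each tuple, so a two-column frame yields three values per row, not two. There are two options to account for this.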
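A minimal sketch of the failure, assuming a one-row frame rebuilt by hand from the printed output in the question (the construction below is illustrative, not the asker's actual loading code). It checks how many values each row tuple carries and then triggers the same unpacking error:

```python
import pandas as pd

# Hypothetical reconstruction of the frame printed in the question:
# one row, index label 94, columns File_date and File_name.
process_list = pd.DataFrame(
    {"File_date": [20180507], "File_name": ["mcmhv20180507.csv"]},
    index=[94],
)

# Each tuple holds the index label followed by one value per column.
first_row = next(process_list.itertuples())
print(len(first_row))

# Unpacking that tuple into only two names raises ValueError.
error_message = None
try:
    for file_date, file_name in process_list.itertuples():
        pass
except ValueError as e:
    error_message = str(e)
    print(error_message)
```

The exact wording of the error message can vary slightly between Python versions, but it begins with "too many values to unpack".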

Option 1

Unpack three items instead of two; the first is the row index, which you do not use.

Here is a minimal example:

import pandas as pd

df = pd.DataFrame([[10, 20], [30, 40], [50, 60]],
                  columns=['A', 'B'])

for idx, a, b in df.itertuples():
    print(idx, a, b)

0 10 20
1 30 40
2 50 60

In your case, a good convention is to mark the unused variable with _:

for _, file_date, file_name in process_list[['File_date', 'File_name']].itertuples():
    ...  # do something with file_date and file_name

Option 2

Pass the index=False argument and unpack two elements:

for file_date, file_name in process_list[['File_date', 'File_name']].itertuples(index=False):
    ...  # do something with file_date and file_name

This behavior is noted in the documentation:

DataFrame.itertuples(index=True, name='Pandas')

Iterate over DataFrame rows as namedtuples, with index value as first element of the tuple.
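Because the rows are namedtuples, another way to sidestep the unpacking count is to keep each row whole and read fields by name, as the `Pandas(Index=94, File_date=..., File_name=...)` line in the question's output suggests. The sketch below assumes the same hand-built one-row frame (hypothetical, reconstructed from the printed output) and also shows `name=None`, which the pandas documentation describes as returning regular tuples:

```python
import pandas as pd

# Hypothetical reconstruction of the frame printed in the question.
process_list = pd.DataFrame(
    {"File_date": [20180507], "File_name": ["mcmhv20180507.csv"]},
    index=[94],
)

# Keep the row as a single namedtuple and access its fields by name.
seen = []
for row in process_list.itertuples():
    seen.append((row.Index, row.File_date, row.File_name))

# index=False drops the index; name=None yields plain tuples.
plain = list(process_list.itertuples(index=False, name=None))

print(seen)
print(plain)
```

Attribute access only works when the column names are valid Python identifiers, as File_date and File_name are here; for other column names, positional unpacking as in Option 1 or Option 2 is the safer choice.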
