Sum all integers in a PANDAS DataFrame "cell"

Problem description

I have a PANDAS DF object where each "cell" is a list of tuples:

import pandas as pd

d = {"seen": [[('A', 4)], [], [('B', 4), ('C', 3)], [('A', 1), ('C', 4)]],
     "unseen": [[('B', 2), ('C', 2)], [('A', 4), ('B', 2), ('C', 2)], [('A', 4)], [('C', 1)]]}
df = pd.DataFrame(d)
df

The result is:

    seen                 unseen
0   [(A, 4)]            [(B, 2), (C, 2)]
1   []                  [(A, 4), (B, 2), (C, 2)]
2   [(B, 4), (C, 3)]    [(A, 4)]
3   [(A, 1), (C, 4)]    [(C, 1)]

I need to create a new DF with 4 columns: the length of each list of tuples and the sum of all the numbers in each cell:

    seen_count    seen_sum    unseen_count    unseen_sum
0   1             4           2               4  
1   0             0           3               8  
2   2             7           1               4  
3   2             5           1               1  

I can iterate over the rows and count the length of each "cell" (a list in this case), and then iterate over the tuples in each list and sum the numbers... but I'm hoping there's a more efficient method than this. Any ideas?

Recommended answer

There is no point in creating the dataframe first when you're dealing with such a complicated set of rows. Clean it up first with custom functions before you make a dataframe out of it. The following is an illustration of clean up before dataframe creation:

import pandas as pd

# starting dictionary
d = {"seen":[[('A', 4)], [], [('B', 4), ('C',3)], [('A', 1), ('C',4)]],
     "unseen":[[('B', 2), ('C',2)], [('A', 4), ('B', 2), ('C',2)], [('A', 4)], [('C',1)]]
     }

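# custom funcs: each takes the list of cells for one key and returns one value per cell
# ('count' comes before 'sum' so the columns appear in the order shown below)
funcs = {'count': lambda r: [len(y) for y in r],
         'sum': lambda r: [sum(y[1] for y in x) for x in r]}

df = pd.DataFrame()
for k in d:
    for f in funcs:
        df[f"{k}_{f}"] = funcs[f](d[k])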

df

#    seen_count  seen_sum  unseen_count  unseen_sum
# 0           1         4             2           4
# 1           0         0             3           8
# 2           2         7             1           4
# 3           2         5             1           1
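If the DataFrame has already been built (as in the question), a minimal sketch that works on it directly can use `Series.str.len()`, which also returns the length of list-valued cells, and `Series.apply` for the per-cell sums. The helper name `summarize` is hypothetical. Note that `apply` still calls a Python function once per cell, so this is tidier than an explicit row loop rather than truly vectorized.

```python
import pandas as pd

# Sample data from the question: each cell holds a list of (label, number) tuples.
d = {"seen": [[('A', 4)], [], [('B', 4), ('C', 3)], [('A', 1), ('C', 4)]],
     "unseen": [[('B', 2), ('C', 2)], [('A', 4), ('B', 2), ('C', 2)], [('A', 4)], [('C', 1)]]}
df = pd.DataFrame(d)


def summarize(frame):
    """Hypothetical helper: build <col>_count and <col>_sum for every list-of-tuples column."""
    out = pd.DataFrame(index=frame.index)
    for col in frame.columns:
        # .str.len() works on list-valued cells too: it returns len() of each list.
        out[f"{col}_count"] = frame[col].str.len()
        # Sum the second element of every tuple; an empty list sums to 0.
        out[f"{col}_sum"] = frame[col].apply(lambda cell: sum(n for _, n in cell))
    return out


result = summarize(df)
print(result)
```

Because the loop runs over columns rather than rows, the number of Python-level iterations in the outer loop stays small even for long frames.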
