Sum all integers in a PANDAS DataFrame "cell"
Problem description
I have a PANDAS DataFrame object where each "cell" is a list of tuples:
import pandas as pd

d = {"seen": [[('A', 4)], [], [('B', 4), ('C', 3)], [('A', 1), ('C', 4)]],
     "unseen": [[('B', 2), ('C', 2)], [('A', 4), ('B', 2), ('C', 2)], [('A', 4)], [('C', 1)]]}
df = pd.DataFrame(d)
df
Which results in:
               seen                    unseen
0          [(A, 4)]          [(B, 2), (C, 2)]
1                []  [(A, 4), (B, 2), (C, 2)]
2  [(B, 4), (C, 3)]                  [(A, 4)]
3  [(A, 1), (C, 4)]                  [(C, 1)]
I need to create a new DataFrame with 4 columns: for each original column, the length of each list of tuples and the sum of all the numbers in each cell:
   seen_count  seen_sum  unseen_count  unseen_sum
0           1         4             2           4
1           0         0             3           8
2           2         7             1           4
3           2         5             1           1
I can iterate over the rows and count the length of each "cell" (a list in this case), and then iterate over the tuples in each list and sum the numbers, but I'm hoping there is a more efficient method than this. Any ideas?
Recommended answer
There is little point in creating the DataFrame first when you are dealing with such a complicated set of rows. Clean the data up with custom functions first, and only then build a DataFrame from the result. The following illustrates the cleanup before DataFrame creation:
import pandas as pd

# starting dictionary
d = {"seen": [[('A', 4)], [], [('B', 4), ('C', 3)], [('A', 1), ('C', 4)]],
     "unseen": [[('B', 2), ('C', 2)], [('A', 4), ('B', 2), ('C', 2)], [('A', 4)], [('C', 1)]]}

# custom funcs: each takes one column (a list of lists of tuples)
# 'count' is listed first so the column order matches the output below
funcs = {'count': lambda r: [len(y) for y in r],
         'sum': lambda r: [sum(y[1] for y in x) for x in r]}

df = pd.DataFrame()
for k, column in d.items():
    for f, func in funcs.items():
        # e.g. "seen_count", "seen_sum"
        df["{k}_{f}".format(k=k, f=f)] = func(column)

df

#    seen_count  seen_sum  unseen_count  unseen_sum
# 0           1         4             2           4
# 1           0         0             3           8
# 2           2         7             1           4
# 3           2         5             1           1
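If the data already lives in a DataFrame and rebuilding it from the raw dictionary is not an option, the same four columns can be derived column by column with `Series.map`. The following is a minimal sketch rather than part of the original answer: `summarize` is a hypothetical helper name, and it assumes every cell is a list of (label, number) tuples.

```python
import pandas as pd

d = {"seen": [[('A', 4)], [], [('B', 4), ('C', 3)], [('A', 1), ('C', 4)]],
     "unseen": [[('B', 2), ('C', 2)], [('A', 4), ('B', 2), ('C', 2)], [('A', 4)], [('C', 1)]]}
df = pd.DataFrame(d)


def summarize(frame):
    """Hypothetical helper: per column, the list length and the sum of tuple values."""
    out = pd.DataFrame(index=frame.index)
    for col in frame.columns:
        # number of tuples in each cell
        out[f"{col}_count"] = frame[col].map(len)
        # sum of the second element of every tuple; an empty list sums to 0
        out[f"{col}_sum"] = frame[col].map(lambda cell: sum(n for _, n in cell))
    return out


result = summarize(df)
```

This still calls a Python function once per cell, so it is unlikely to be faster than the pre-cleaning approach above; it is mainly convenient when the DataFrame is all you have.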