Extracting tuples from a list in a Pandas DataFrame

Problem description

I have a dataframe with 12 columns. I would like to extract the rows of one column depending on the values of another column.

Sample of my dataframe:

order_id    order_type   order_items
45           Lunch       [('Burger', 5), ('Fries', 6)]
12           Dinner      [('Shrimp', 10), ('Fish&Chips', 7)]
44           Lunch       [('Salad', 9), ('Steak', 9)]
23           Breakfast   [('Coffee', 2), ('Eggs', 3)]

I would like to extract the breakfast, lunch and dinner menus by taking the first item of each tuple, and extract the number of orders from the second item of each tuple.

Each item is of type string, according to this line of code:

print(type(df['order_items'][0]))
>> <class 'str'>
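
Because each cell is a string that merely looks like a list, a loop over it walks through characters rather than tuples. As a minimal sketch, assuming every cell holds a valid Python literal like the ones in the sample above, the standard-library function ast.literal_eval can turn one such string into a real list of tuples. The variable names raw, items, names and quantities are illustrative and do not come from the question.

```python
import ast

# One cell of the order_items column as it is stored: a string, not a list.
raw = "[('Burger', 5), ('Fries', 6)]"

# ast.literal_eval parses Python literals (lists, tuples, strings, numbers)
# without executing arbitrary code.
items = ast.literal_eval(raw)

names = [name for name, _ in items]       # first element of each tuple
quantities = [qty for _, qty in items]    # second element of each tuple

print(type(items).__name__, names, quantities)
```

The same function can be applied to the whole column with Series.apply, which is what "evaluation" refers to in the answer further down.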

I tried to apply a filter to extract the breakfast menu:

BreakfastLst=df.loc[df['order_type'] == 'Breakfast']['order_items']

But the output looks like this, and I can't use a for loop to iterate through the sublists and access the tuples.

2                           [('Coffee', 4), ('Eggs', 7)]
7                           [('Coffee', 2), ('Eggs', 3)]
8      [('Cereal', 7), ('Pancake', 8), ('Coffee', 4),...
9      [('Cereal', 3), ('Eggs', 1), ('Coffee', 1), ('...

I also tried to convert to lists:

orderTypeLst = df.groupby(['order_type'])['order_items'].apply(list)

and then extract the lists by doing this:

breakFast=orderTypeLst['Breakfast']
lunch=orderTypeLst['Lunch']
dinner=orderTypeLst['Dinner']

but each item in the output is still a string, and I can't iterate through that either.

["[('Coffee', 4), ('Eggs', 7)]",
 "[('Coffee', 2), ('Eggs', 3)]",
 "[('Cereal', 7), ('Pancake', 8), ('Coffee', 4), ('Eggs', 8)]"]

As for dictionaries, I tried the following, but the output is duplicated:

pd.Series(outlierFile.order_type.values,index=outlierFile.order_items).to_dict()

Sample output:

 "[('Fries', 1), ('Steak', 6), ('Salad', 8), ('Chicken', 10)]": 'Lunch',
 "[('Cereal', 6), ('Pancake', 8), ('Eggs', 3)]": 'Breakfast',
 "[('Shrimp', 9), ('Salmon', 9)]": 'Dinner',
 "[('Pancake', 3), ('Coffee', 5)]": 'Breakfast',
 "[('Eggs', 1), ('Pancake', 1), ('Coffee', 5), ('Cereal', 5)]": 'Breakfast'

My desired output is a clean version of each order_type (list or dictionary), so that I can iterate through the tuples and extract the needed items.

Any input would be helpful. Thanks.

Recommended answer

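IIUC, first evaluate the strings into real lists (for example with ast.literal_eval), then use pandas.DataFrame.groupby:

import ast
df['order_items'] = df['order_items'].apply(ast.literal_eval)  # string -> list of tuples
my_dict = df.groupby('order_type')['order_items'].apply(lambda x: sum(x, [])).to_dict()  # concatenate the lists per order_type
print(my_dict)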

Output:

{'Breakfast': [('Coffee', 2), ('Eggs', 3)],
 'Dinner': [('Shrimp', 10), ('Fish&Chips', 7)],
 'Lunch': [('Burger', 5), ('Fries', 6), ('Salad', 9), ('Steak', 9)]}
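
Once the data is in this shape, the tuples can be walked with an ordinary loop. The following is only a sketch of one way to do it, not part of the original answer: it copies the dictionary printed above so that it runs on its own, and the names menus and totals are hypothetical. It collects the distinct item names per order type and sums the quantities per item.

```python
from collections import Counter

# Assumed input: the dictionary printed above, copied so the sketch is self-contained.
my_dict = {
    'Breakfast': [('Coffee', 2), ('Eggs', 3)],
    'Dinner': [('Shrimp', 10), ('Fish&Chips', 7)],
    'Lunch': [('Burger', 5), ('Fries', 6), ('Salad', 9), ('Steak', 9)],
}

menus = {}   # hypothetical name: order_type -> list of distinct item names
totals = {}  # hypothetical name: order_type -> Counter of quantity per item

for order_type, items in my_dict.items():
    counts = Counter()
    for name, qty in items:        # unpack each (item, quantity) tuple
        counts[name] += qty        # repeated items are summed
    menus[order_type] = list(counts)   # keys stay in first-seen order
    totals[order_type] = counts

print(menus['Lunch'])
print(totals['Lunch']['Salad'])
```

A Counter is used here so that an item appearing in several orders of the same type is added up rather than overwritten; a plain dict would work equally well if only the menu names are needed.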
