Extracting tuples from a list in a Pandas DataFrame
Question
I have a dataframe with 12 columns. I would like to extract the rows of one column depending on the values of another column.
A sample of my dataframe:
order_id order_type order_items
45 Lunch [('Burger', 5), ('Fries', 6)]
12 Dinner [('Shrimp', 10), ('Fish&Chips', 7)]
44 Lunch [('Salad', 9), ('Steak', 9)]
23 Breakfast [('Coffee', 2), ('Eggs', 3)]
I would like to extract the breakfast, lunch and dinner menus by taking the first item of each tuple, and extract the number of orders from the second item of each tuple.
Each item is of type string according to this line of code:
print(type(df['order_items'][0]))
>> <class 'str'>
I tried to apply a filter to extract the breakfast menu:
BreakfastLst=df.loc[df['order_type'] == 'Breakfast']['order_items']
but the output looks like this, and I can't use a for loop to iterate through the sublists and access the tuples:
2 [('Coffee', 4), ('Eggs', 7)]
7 [('Coffee', 2), ('Eggs', 3)]
8 [('Cereal', 7), ('Pancake', 8), ('Coffee', 4),...
9 [('Cereal', 3), ('Eggs', 1), ('Coffee', 1), ('...
I also tried to convert to lists:
orderTypeLst = df.groupby(['order_type'])['order_items'].apply(list)
and then extract the lists by doing this:
breakFast=orderTypeLst['Breakfast']
lunch=orderTypeLst['Lunch']
dinner=orderTypeLst['Dinner']
but the output is a list of strings, and I can't iterate through the tuples in that either:
["[('Coffee', 4), ('Eggs', 7)]",
"[('Coffee', 2), ('Eggs', 3)]",
"[('Cereal', 7), ('Pancake', 8), ('Coffee', 4), ('Eggs', 8)]"]
As for dictionaries, I tried the below, but the output is duplicated:
pd.Series(outlierFile.order_type.values,index=outlierFile.order_items).to_dict()
Sample output:
"[('Fries', 1), ('Steak', 6), ('Salad', 8), ('Chicken', 10)]": 'Lunch',
"[('Cereal', 6), ('Pancake', 8), ('Eggs', 3)]": 'Breakfast',
"[('Shrimp', 9), ('Salmon', 9)]": 'Dinner',
"[('Pancake', 3), ('Coffee', 5)]": 'Breakfast',
"[('Eggs', 1), ('Pancake', 1), ('Coffee', 5), ('Cereal', 5)]": 'Breakfast'
My desired output is a clean version of each order_type (list or dictionary) so I can iterate through the tuples and extract the needed items.
Any input would be helpful. Thanks,
Recommended answer
IIUC, try using pandas.DataFrame.groupby after evaluating the strings in order_items into real Python lists (for example with ast.literal_eval):
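import ast
df['order_items'] = df['order_items'].apply(ast.literal_eval)  # parse each string into a list of tuples
my_dict = df.groupby('order_type')['order_items'].apply(lambda x: sum(x, [])).to_dict()
print(my_dict)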
Output:
{'Breakfast': [('Coffee', 2), ('Eggs', 3)],
'Dinner': [('Shrimp', 10), ('Fish&Chips', 7)],
'Lunch': [('Burger', 5), ('Fries', 6), ('Salad', 9), ('Steak', 9)]}
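For reference, here is a minimal end-to-end sketch of how the grouped dictionary could then be iterated to get the menu names and order counts the question asks for. It assumes the four sample rows from the question, parses the strings with ast.literal_eval, and swaps sum(x, []) for an equivalent list comprehension. The names menus and totals are hypothetical, introduced only for illustration.

```python
import ast
import pandas as pd

# Sample data copied from the question; order_items holds strings, not lists.
df = pd.DataFrame({
    "order_id": [45, 12, 44, 23],
    "order_type": ["Lunch", "Dinner", "Lunch", "Breakfast"],
    "order_items": [
        "[('Burger', 5), ('Fries', 6)]",
        "[('Shrimp', 10), ('Fish&Chips', 7)]",
        "[('Salad', 9), ('Steak', 9)]",
        "[('Coffee', 2), ('Eggs', 3)]",
    ],
})

# Parse each string into a real list of (name, quantity) tuples.
df["order_items"] = df["order_items"].apply(ast.literal_eval)

# Concatenate the lists of all orders that share an order_type.
my_dict = (
    df.groupby("order_type")["order_items"]
    .apply(lambda lists: [pair for lst in lists for pair in lst])
    .to_dict()
)

# Hypothetical containers: menu names and summed quantities per order type.
menus = {}
totals = {}
for order_type, pairs in my_dict.items():
    # First tuple element is the item name.
    menus[order_type] = sorted({name for name, _ in pairs})
    # Second tuple element is the number of orders; sum it per item.
    counts = {}
    for name, qty in pairs:
        counts[name] = counts.get(name, 0) + qty
    totals[order_type] = counts

print(menus["Lunch"])
print(totals["Lunch"])
```

The list comprehension avoids the repeated list copying that sum(x, []) does, which may matter when an order type has many rows; for small data either form should behave the same.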