Python pandas to ensure each row based on column value has a set of data present, if not add row
Question
I am organising AWS resources for tagging and have captured the data in a CSV file. A sample of the CSV file is shown below. For each resource_id, I need to ensure that a required set of tag_key values is present. The required set is:
tag_key
Application
Client
Environment
Name
Owner
Project
Purpose
I'm new to pandas, and so far I have only managed to read the CSV file into a DataFrame:
import pandas as pd

file_name = "z.csv"
# The CSV file has no header row, so the column names are supplied explicitly.
df = pd.read_csv(file_name, names=['resource_id', 'resource_type', 'tag_key', 'tag_value'])
print(df)
CSV file
vol-00441b671ca48ba41,volume,Environment,Development
vol-00441b671ca48ba41,volume,Name,Database Files
vol-00441b671ca48ba41,volume,Project,Application Development
vol-00441b671ca48ba41,volume,Purpose,Web Server
i-1234567890abcdef0,instance,Environment,Production
i-1234567890abcdef0,instance,Owner,Fast Company
I expect the output to be as follows:
vol-00441b671ca48ba41,volume,Environment,Development
vol-00441b671ca48ba41,volume,Name,Database Files
vol-00441b671ca48ba41,volume,Project,Application Development
vol-00441b671ca48ba41,volume,Purpose,Web Server
vol-00441b671ca48ba41,volume,Client,
vol-00441b671ca48ba41,volume,Owner,
vol-00441b671ca48ba41,volume,Application,
i-1234567890abcdef0,instance,Environment,Production
i-1234567890abcdef0,instance,Owner,Fast Company
i-1234567890abcdef0,instance,Application,
i-1234567890abcdef0,instance,Client,
i-1234567890abcdef0,instance,Name,
i-1234567890abcdef0,instance,Project,
i-1234567890abcdef0,instance,Purpose,
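Before adding any rows, it can help to see which required keys each resource lacks. The following is a minimal sketch, not part of the original question: it loads the sample rows from an in-memory string instead of z.csv, and the names csv_text, required_tags and missing are introduced here purely for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the question's CSV file (loaded from a string
# here so the sketch runs without z.csv on disk).
csv_text = """\
vol-00441b671ca48ba41,volume,Environment,Development
vol-00441b671ca48ba41,volume,Name,Database Files
vol-00441b671ca48ba41,volume,Project,Application Development
vol-00441b671ca48ba41,volume,Purpose,Web Server
i-1234567890abcdef0,instance,Environment,Production
i-1234567890abcdef0,instance,Owner,Fast Company
"""

# The required tag keys listed in the question.
required_tags = ["Application", "Client", "Environment", "Name",
                 "Owner", "Project", "Purpose"]

df = pd.read_csv(io.StringIO(csv_text),
                 names=["resource_id", "resource_type", "tag_key", "tag_value"])

# For each resource, collect the required keys that have no row yet.
missing = {
    resource_id: sorted(set(required_tags) - set(group["tag_key"]))
    for resource_id, group in df.groupby("resource_id", sort=False)
}

for resource_id, keys in missing.items():
    print(resource_id, keys)
```

The missing keys reported for each resource correspond to the rows with an empty tag_value in the expected output above.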
Recommended answer
One way to do this is to use a MultiIndex built with MultiIndex.from_product, together with reindex:
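# Every resource should end up with one row per key in this list.
taglist = ['Application',
           'Client',
           'Environment',
           'Name',
           'Owner',
           'Project',
           'Purpose']

# Reindex against every (resource_id, tag_key) pair; pairs that are missing become NaN rows.
df_out = df.set_index(['resource_id', 'tag_key'])\
           .reindex(pd.MultiIndex.from_product([df['resource_id'].unique(), taglist],
                                               names=['resource_id', 'tag_key']))

# Fill resource_type on the new rows from the existing rows of the same resource.
df_out = df_out.assign(resource_type=df_out.groupby('resource_id')['resource_type']
                                           .ffill().bfill()).reset_index()
print(df_out)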
Output:
    resource_id            tag_key      resource_type  tag_value
0   vol-00441b671ca48ba41  Application  volume         NaN
1   vol-00441b671ca48ba41  Client       volume         NaN
2   vol-00441b671ca48ba41  Environment  volume         Development
3   vol-00441b671ca48ba41  Name         volume         Database Files
4   vol-00441b671ca48ba41  Owner        volume         NaN
5   vol-00441b671ca48ba41  Project      volume         Application Development
6   vol-00441b671ca48ba41  Purpose      volume         Web Server
7   i-1234567890abcdef0    Application  instance       NaN
8   i-1234567890abcdef0    Client       instance       NaN
9   i-1234567890abcdef0    Environment  instance       Production
10  i-1234567890abcdef0    Name         instance       NaN
11  i-1234567890abcdef0    Owner        instance       Fast Company
12  i-1234567890abcdef0    Project      instance       NaN
13  i-1234567890abcdef0    Purpose      instance       NaN
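The question asks for CSV rows with an empty tag_value rather than NaN, and with resource_type as the second column. The sketch below is one possible way to get from the reindexed frame to that layout; it is a variation on the answer, not the answer's own code. It swaps ffill().bfill() for a per-group transform('first') so that resource_type is only ever filled from rows of the same resource, and it reads the sample from a string (csv_text, columns and csv_out are names assumed here for illustration).

```python
import io

import pandas as pd

# Sample rows from the question, loaded from a string instead of z.csv.
csv_text = """\
vol-00441b671ca48ba41,volume,Environment,Development
vol-00441b671ca48ba41,volume,Name,Database Files
vol-00441b671ca48ba41,volume,Project,Application Development
vol-00441b671ca48ba41,volume,Purpose,Web Server
i-1234567890abcdef0,instance,Environment,Production
i-1234567890abcdef0,instance,Owner,Fast Company
"""

taglist = ["Application", "Client", "Environment", "Name",
           "Owner", "Project", "Purpose"]
columns = ["resource_id", "resource_type", "tag_key", "tag_value"]

df = pd.read_csv(io.StringIO(csv_text), names=columns)

# Build every (resource_id, tag_key) pair and reindex against it;
# pairs absent from the input become rows filled with NaN.
full_index = pd.MultiIndex.from_product(
    [df["resource_id"].unique(), taglist],
    names=["resource_id", "tag_key"],
)
df_out = df.set_index(["resource_id", "tag_key"]).reindex(full_index).reset_index()

# Take the first non-null resource_type within each resource and
# broadcast it to all rows of that resource.
df_out["resource_type"] = (
    df_out.groupby("resource_id")["resource_type"].transform("first")
)

# Restore the input column order; to_csv writes NaN as an empty field by default.
csv_out = df_out[columns].to_csv(index=False, header=False)
print(csv_out)
```

Passing a file path as the first argument to to_csv would write the same rows to disk instead of returning them as a string. The rows come out grouped by resource and ordered by taglist, which differs from the row order in the question's expected output but contains the same data.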