如何使用具有不同dict和dict列表的数据爆炸Panda列 [英] How to explode Panda column with data having different dict and list of dict
问题描述
我有一个熊猫数据框,具有一组不同的值,例如第一个是列表或数组以及其他元素
I have a panda dataframe with different set of values like first one is an list or array and other elements or not
>>> df_3['integration-outbound:IntegrationEntity.integrationEntityDetails.supplier.forms.form.records.record']
0 [{'Internalid': '24348', 'isDelete': 'false', 'fields': {'field': [{'id': 'CATEGOR_LEVEL_1', 'value': 'MR'}, {'id': 'LOW_PRODSERV', 'value': 'RES'}, {'id': 'LOW_LEVEL_2', 'value': 'keylevel221'}, {'id': 'LOW_LEVEL_3', 'value': 'keylevel3127'}, {'id': 'LOW_LEVEL_4', 'value': 'keylevel4434'}, {'id': 'LOW_LEVEL_5', 'value': 'keylevel5545'}]}}, {'Internalid': '24349', 'isDelete': 'false', 'fields': {'field': [{'id': 'CATEGOR_LEVEL_1', 'value': 'MR'}, {'id': 'LOW_PRODSERV', 'value': 'RES'}, {'id': 'LOW_LEVEL_2', 'value': 'keylevel221'}, {'id': 'LOW_LEVEL_3', 'value': 'keylevel3125'}, {'id': 'LOW_LEVEL_4', 'value': 'keylevel4268'}, {'id': 'LOW_LEVEL_5', 'value': 'keylevel5418'}]}}, {'Internalid': '24350', 'isDelete': 'false', 'fields': {'field': [{'id': 'CATEGOR_LEVEL_1', 'value': 'MR'}, {'id': 'LOW_PRODSERV', 'value': 'RES'}, {'id': 'LOW_LEVEL_2', 'value': 'keylevel221'}, {'id': 'LOW_LEVEL_3', 'value': 'keylevel3122'}, {'id': 'LOW_LEVEL_4', 'value': 'keylevel425'}, {'id': 'LOW_LEVEL_5', 'value': 'keylevel5221'}]}}]
0 {'isDelete': 'false', 'fields': {'field': [{'id': 'S_EAST', 'value': 'N'}, {'id': 'W_EST', 'value': 'N'}, {'id': 'M_WEST', 'value': 'N'}, {'id': 'N_EAST', 'value': 'N'}, {'id': 'LOW_AREYOU_ASSET', 'value': '-1'}, {'id': 'LOW_SWART_PROG', 'value': '-1'}]}}
0 {'isDelete': 'false', 'fields': {'field': {'id': 'LOW_COD_CONDUCT', 'value': '-1'}}}
0 {'isDelete': 'false', 'fields': {'field': [{'id': 'LOW_SUPPLIER_TYPE', 'value': '2'}, {'id': 'LOW_DO_INT_BOTH', 'value': '1'}]}}
我想将其分解为多行.第一行是列表,其他行还是不是?
I want explode this into multiple rows. The first row is list and other rows or not ?
>>> type(df_3)
<class 'pandas.core.frame.DataFrame'>
>>> type(df_3['integration-outbound:IntegrationEntity.integrationEntityDetails.supplier.forms.form.records.record'])
<class 'pandas.core.series.Series'>
预期输出-
{'Internalid': '24348', 'isDelete': 'false', 'fields': {'field': [{'id': 'CATEGOR_LEVEL_1', 'value': 'MR'}, {'id': 'LOW_PRODSERV', 'value': 'RES'}, {'id': 'LOW_LEVEL_2', 'value': 'keylevel221'}, {'id': 'LOW_LEVEL_3', 'value': 'keylevel3127'}, {'id': 'LOW_LEVEL_4', 'value': 'keylevel4434'}, {'id': 'LOW_LEVEL_5', 'value': 'keylevel5545'}]}}
{'Internalid': '24349', 'isDelete': 'false', 'fields': {'field': [{'id': 'CATEGOR_LEVEL_1', 'value': 'MR'}, {'id': 'LOW_PRODSERV', 'value': 'RES'}, {'id': 'LOW_LEVEL_2', 'value': 'keylevel221'}, {'id': 'LOW_LEVEL_3', 'value': 'keylevel3125'}, {'id': 'LOW_LEVEL_4', 'value': 'keylevel4268'}, {'id': 'LOW_LEVEL_5', 'value': 'keylevel5418'}]}}
{'Internalid': '24350', 'isDelete': 'false', 'fields': {'field': [{'id': 'CATEGOR_LEVEL_1', 'value': 'MR'}, {'id': 'LOW_PRODSERV', 'value': 'RES'}, {'id': 'LOW_LEVEL_2', 'value': 'keylevel221'}, {'id': 'LOW_LEVEL_3', 'value': 'keylevel3122'}, {'id': 'LOW_LEVEL_4', 'value': 'keylevel425'}, {'id': 'LOW_LEVEL_5', 'value': 'keylevel5221'}]}}]
{'isDelete': 'false', 'fields': {'field': [{'id': 'S_EAST', 'value': 'N'}, {'id': 'W_EST', 'value': 'N'}, {'id': 'M_WEST', 'value': 'N'}, {'id': 'N_EAST', 'value': 'N'}, {'id': 'LOW_AREYOU_ASSET', 'value': '-1'}, {'id': 'LOW_SWART_PROG', 'value': '-1'}]}}
{'isDelete': 'false', 'fields': {'field': {'id': 'LOW_COD_CONDUCT', 'value': '-1'}}}
{'isDelete': 'false', 'fields': {'field': [{'id': 'LOW_SUPPLIER_TYPE', 'value': '2'}, {'id': 'LOW_DO_INT_BOTH', 'value': '1'}]}}
我试图爆炸这列
>>> df_3.explode('integration-outbound:IntegrationEntity.integrationEntityDetails.supplier.forms.form.records.record')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib64/python3.6/site-packages/pandas/core/frame.py", line 6318, in explode
result = df[column].explode()
File "/usr/local/lib64/python3.6/site-packages/pandas/core/series.py", line 3504, in explode
values, counts = reshape.explode(np.asarray(self.array))
File "pandas/_libs/reshape.pyx", line 129, in pandas._libs.reshape.explode
KeyError: 0
我可以遍历每一行,尝试找出它是否为列表并实现某些东西,但这似乎不正确
I can run through each row and try to find out if its a list and implement something but it doesnt seems right
if str(type(df_3.loc[i,'{}'.format(c)])) == "<class 'list'>":
有什么办法可以对这类数据使用爆炸功能
Is there any way we ca use an explode function on such kind of data
推荐答案
我能够做到,但是爆炸行都被过滤到DataFrame的顶部(以防在较低的行中有更多列表类型的对象)
I was able to do it, but the exploded rows are all filtered to the top of the DataFrame (in case there are more list type object in lower rows).
pd.concat((df.iloc[[type(item) == list for item in df['Column']]].explode('Column'),
df.iloc[[type(item) != list for item in df['Column']]]))
它基本上可以完成您所说的事情:检查对象类型是否为列表,如果是,则爆炸.然后将此分解系列与其余数据(即非列表)连接起来.较长的DataFrame似乎对性能没有太大影响.
It essentially does what you've said: check if object type is list, if so, explode. Then concatenate this exploded Series with the rest of the data (i.e. the non-lists). Performance doesn't seem to hurt much from longer DataFrames.
输出:
Column
0 {'Internalid': '24348', 'isDelete': 'false', '...
0 {'Internalid': '24349', 'isDelete': 'false', '...
0 {'Internalid': '24350', 'isDelete': 'false', '...
1 {'isDelete': 'false', 'fields': {'field': [{'i...
2 {'isDelete': 'false', 'fields': {'field': {'id...
3 {'isDelete': 'false', 'fields': {'field': [{'i...
这篇关于如何使用具有不同dict和dict列表的数据爆炸Panda列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!