DataFrame - table in table from nested dictionary
I use Python 3.
This is my data structure:
dictionary = {
    'HexaPlex x50': {
        'Vendor': 'Dell Inc.',
        'BIOS Version': '12.72.9',
        'Newest BIOS': '12.73.9',
        'Against M & S': 'Yes',
        'W10 Support': 'Yes',
        'Computers': {
            'someName001': '12.72.9',
            'someName002': '12.73.9',
            'someName003': '12.73.9'
        },
        'Mapped Category': ['SomeOtherCategory']
    },
    ...
}
I have managed to create a table whose columns are created from the keys of the first nested dictionary (the one that starts with 'Vendor'). The row name is 'HexaPlex x50'. One of the columns contains the computers with a version number, i.e. the nested dictionary:
{'someName001': '12.72.9',
'someName002': '12.73.9',
'someName003': '12.73.9'}
I would like the key-value pairs to appear inside the cell under the 'Computers' column, in effect a nested table.
At the moment it looks like this:
The table should look somewhat like this
How can I achieve this?
Further, I would like to color the numbers, or the cell, wherever the BIOS version is lower than the newest one.
I also face the problem that in one case the dictionary containing the computers is so large that it gets abbreviated, even though I have set pd.set_option('display.max_colwidth', -1). It looks like this:
As already emphasized in the comments, pandas does not support "sub-dataframes". For the sake of KISS, I would recommend duplicating those rows (or managing two separate tables, if really necessary).
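As a minimal sketch of the "duplicate the rows" idea, the nested dictionary from the question can be flattened into one row per computer. The column names 'Type', 'Computer' and 'Computer BIOS' are assumptions chosen for illustration; they do not come from the question's data.

```python
import pandas as pd

dictionary = {
    'HexaPlex x50': {
        'Vendor': 'Dell Inc.',
        'BIOS Version': '12.72.9',
        'Newest BIOS': '12.73.9',
        'Against M & S': 'Yes',
        'W10 Support': 'Yes',
        'Computers': {
            'someName001': '12.72.9',
            'someName002': '12.73.9',
            'someName003': '12.73.9',
        },
        'Mapped Category': ['SomeOtherCategory'],
    },
}

rows = []
for model, spec in dictionary.items():
    # Everything except the nested dict is shared by all computers of a model.
    shared = {key: value for key, value in spec.items() if key != 'Computers'}
    for computer, bios in spec['Computers'].items():
        # 'Type', 'Computer' and 'Computer BIOS' are hypothetical column names.
        rows.append({'Type': model, **shared,
                     'Computer': computer, 'Computer BIOS': bios})

flat = pd.DataFrame(rows)
print(flat[['Type', 'Vendor', 'Computer', 'Computer BIOS']])
```

Each computer now has its own row, and the model-level values are simply repeated, which keeps the frame compatible with ordinary pandas filtering and grouping.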
The answers in the question you referred to (parsing a dictionary in a pandas dataframe cell into new row cells (new columns)) result in new (frame-wide) columns for each (row-local) "computer name". I doubt that this is what you aim for, considering your domain model.
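If a literally nested table is wanted and HTML output is acceptable, one possible workaround (a different technique from the one recommended in this answer) is to render each inner dictionary as its own small HTML table and embed that markup in the cell. The sketch below assumes a reduced version of the question's data; computers_to_html is a hypothetical helper name.

```python
import pandas as pd

# Reduced, illustrative version of the question's data.
records = [{
    'Type': 'HexaPlex x50',
    'Vendor': 'Dell Inc.',
    'Computers': {'someName001': '12.72.9', 'someName002': '12.73.9'},
}]
df = pd.DataFrame(records)


def computers_to_html(computers):
    """Render one {name: version} dict as a small standalone HTML table."""
    sub = pd.DataFrame(list(computers.items()), columns=['Computer', 'BIOS'])
    return sub.to_html(index=False, border=0)


# Lifting the column-width limit guards against long cells being cut off;
# whether to_html truncates at all depends on the pandas version.
with pd.option_context('display.max_colwidth', None):
    df['Computers'] = df['Computers'].map(computers_to_html)
    # escape=False keeps the inner <table> markup instead of escaping it.
    html = df.to_html(escape=False, index=False)

print(html)
```

Because escape=False disables HTML escaping for every cell, this is only safe when the cell contents are trusted.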
The truncation in pandas' output can be circumvented by using another output engine, e.g. tabulate (see "Pretty Printing a pandas dataframe"):
# standard pandas output
      Vendor  BIOS Version  Newest BIOS  Against M & S  W10 Support     Computer  Location  ...     Category4     Category5     Category6     Category7     Category8     Category9     Category0
0  Dell Inc.       12.72.9      12.73.9            Yes          Yes  someName001   12.72.9  ...  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory
1  Dell Inc.       12.72.9      12.73.9            Yes          Yes  someName002   12.73.9  ...  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory
2  Dell Inc.       12.73.9      12.73.9            Yes          Yes  someName003   12.73.9  ...  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory

[3 rows x 17 columns]
# tabulate psql (with headers)
+----+------------+----------------+---------------+-----------------+---------------+-------------+------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| | Vendor | BIOS Version | Newest BIOS | Against M & S | W10 Support | Computer | Location | Category1 | Category2 | Category3 | Category4 | Category5 | Category6 | Category7 | Category8 | Category9 | Category0 |
|----+------------+----------------+---------------+-----------------+---------------+-------------+------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------|
| 0 | Dell Inc. | 12.72.9 | 12.73.9 | Yes | Yes | someName001 | 12.72.9 | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory |
| 1 | Dell Inc. | 12.72.9 | 12.73.9 | Yes | Yes | someName002 | 12.73.9 | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory |
| 2 | Dell Inc. | 12.73.9 | 12.73.9 | Yes | Yes | someName003 | 12.73.9 | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory |
+----+------------+----------------+---------------+-----------------+---------------+-------------+------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
# tabulate psql
+---+------------+---------+---------+-----+-----+-------------+---------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| 0 | Dell Inc. | 12.72.9 | 12.73.9 | Yes | Yes | someName001 | 12.72.9 | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory |
| 1 | Dell Inc. | 12.72.9 | 12.73.9 | Yes | Yes | someName002 | 12.73.9 | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory |
| 2 | Dell Inc. | 12.73.9 | 12.73.9 | Yes | Yes | someName003 | 12.73.9 | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory | SomeCategory |
+---+------------+---------+---------+-----+-----+-------------+---------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
# tabulate plain
    Vendor     BIOS Version  Newest BIOS  Against M & S  W10 Support  Computer     Location  Category1     Category2     Category3     Category4     Category5     Category6     Category7     Category8     Category9     Category0
 0  Dell Inc.  12.72.9       12.73.9      Yes            Yes          someName001  12.72.9   SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory
 1  Dell Inc.  12.72.9       12.73.9      Yes            Yes          someName002  12.73.9   SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory
 2  Dell Inc.  12.73.9       12.73.9      Yes            Yes          someName003  12.73.9   SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory  SomeCategory
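If adding tabulate is not an option, pandas' own display options can also be relaxed. The sketch below assumes pandas 1.0 or newer, where None (rather than the -1 used in the question) means "no limit" for display.max_colwidth; the 20-entry dictionary is made-up demo data.

```python
import pandas as pd

# Made-up data: one cell holding a fairly long dictionary.
df = pd.DataFrame({
    'Type': ['HexaPlex x50'],
    'Computers': [{f'someName{i:03d}': '12.73.9' for i in range(1, 21)}],
})

# option_context changes the options only inside the with-block.
with pd.option_context('display.max_colwidth', None,
                       'display.max_columns', None):
    print(df)               # regular repr, without column-width truncation
    text = df.to_string()   # the whole frame rendered as one string

print(text)
```

Very large dictionaries may still be shortened by other display options (for example display.max_seq_items), so flattening the data into rows remains the more robust fix.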
You could also use groupby(..).apply(..) plus some string manipulation to produce a string representation that simply hides the duplicates:
# tabulate + merge manually
+----+--------------+------------+----------------+---------------+-----------------+---------------+-------------+------------+--------------+--------------+
| | Type | Vendor | BIOS Version | Newest BIOS | Against M & S | W10 Support | Computer | Location | Category1 | Category2 |
|----+--------------+------------+----------------+---------------+-----------------+---------------+-------------+------------+--------------+--------------|
| 0 | HexaPlex x50 | Dell Inc. | 12.72.9 | 12.73.9 | Yes | Yes | someName001 | 12.72.9 | SomeCategory | SomeCategory |
| | | | 12.72.9 | | | | someName002 | 12.73.9 | | |
| | | | 12.73.9 | | | | someName003 | 12.73.9 | | |
+----+--------------+------------+----------------+---------------+-----------------+---------------+-------------+------------+--------------+--------------+
Styled output can be generated via the new Styling API which is still provisional and under development:
Again, you can use some logic to 'merge' consecutively redundant values in a column (quick example, I assume some more effort could result in much nicer output):
Code for the above examples
import pandas as pd
from tabulate import tabulate
import functools

def pprint(df, headers=True, fmt='psql'):
    # https://stackoverflow.com/questions/18528533/pretty-printing-a-pandas-dataframe
    print(tabulate(df, headers='keys' if headers else '', tablefmt=fmt))

df = pd.DataFrame({
    'Type': ['HexaPlex x50'] * 3,
    'Vendor': ['Dell Inc.'] * 3,
    'BIOS Version': ['12.72.9', '12.72.9', '12.73.9'],
    'Newest BIOS': ['12.73.9'] * 3,
    'Against M & S': ['Yes'] * 3,
    'W10 Support': ['Yes'] * 3,
    'Computer': ['someName001', 'someName002', 'someName003'],
    'Location': ['12.72.9', '12.73.9', '12.73.9'],
    'Category1': ['SomeCategory'] * 3,
    'Category2': ['SomeCategory'] * 3,
    'Category3': ['SomeCategory'] * 3,
    'Category4': ['SomeCategory'] * 3,
    'Category5': ['SomeCategory'] * 3,
    'Category6': ['SomeCategory'] * 3,
    'Category7': ['SomeCategory'] * 3,
    'Category8': ['SomeCategory'] * 3,
    'Category9': ['SomeCategory'] * 3,
    'Category0': ['SomeCategory'] * 3,
})

print("# standard pandas print")
print(df)
print("\n# tabulate tablefmt=psql (with headers)")
pprint(df)
print("\n# tabulate tablefmt=psql")
pprint(df, headers=False)
print("\n# tabulate tablefmt=plain")
pprint(df, fmt='plain')
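
def merge_cells_for_print(rows, ls='\n'):
    """Collapse a group of rows into one row; differing values are joined with `ls`."""
    result = pd.DataFrame()
    for col in rows.columns:
        vals = rows[col].values
        if all(val == vals[0] for val in vals):
            # Identical within the group: show the value once.
            result[col] = [vals[0]]
        else:
            # Different values: stack them inside a single multi-line cell.
            result[col] = [ls.join(vals)]
    return result

print("\n# tabulate + merge manually")
# Note: recent pandas versions warn that apply() operating on the grouping
# column ('Type') is deprecated.
pprint(df.groupby('Type').apply(merge_cells_for_print).reset_index(drop=True))
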
# https://pandas.pydata.org/pandas-docs/stable/style.html
# https://pandas.pydata.org/pandas-docs/version/0.22.0/generated/pandas.io.formats.style.Styler.apply.html#pandas.io.formats.style.Styler.apply

def highlight_lower(ref, col):
    # Note: col < ref compares the version strings lexicographically.
    return [f'color: {"red" if hgl else ""}' for hgl in col < ref]

def merge_duplicates(col):
    # Hide a value (transparent text) when it repeats the value directly above it.
    vals = col.values
    return [''] + ['color: transparent' if curr == prev else ''
                   for prev, curr in zip(vals, vals[1:])]

# Styler.to_html() requires pandas >= 1.3; older versions used style.render(),
# which was removed in pandas 2.0.
with open('only_red.html', 'w+') as f:
    style = df.style
    style = style.apply(functools.partial(highlight_lower, df['Newest BIOS']),
                        subset=['BIOS Version'])
    f.write(style.to_html())

with open('red_and_merged.html', 'w+') as f:
    style = df.style
    style = style.apply(functools.partial(highlight_lower, df['Newest BIOS']),
                        subset=['BIOS Version'])
    style = style.apply(merge_duplicates)
    f.write(style.to_html())
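
One caveat about highlight_lower in the code above: it compares version strings as text, so '12.9.1' would not count as lower than '12.73.9'. A possible variant, assuming versions always consist of dot-separated integers, parses them into tuples first. version_key and highlight_outdated are hypothetical names, not part of the original answer.

```python
import pandas as pd


def version_key(version):
    """Turn '12.72.9' into (12, 72, 9) so versions compare numerically."""
    return tuple(int(part) for part in version.split('.'))


def highlight_outdated(ref, col):
    """Return one CSS string per cell: red where col's version is below ref's."""
    return ['color: red' if version_key(cur) < version_key(new) else ''
            for cur, new in zip(col, ref)]


# Made-up data, including a version that plain string comparison gets wrong.
demo = pd.DataFrame({
    'BIOS Version': ['12.9.1', '12.72.9', '12.73.9'],
    'Newest BIOS': ['12.73.9'] * 3,
})

styles = highlight_outdated(demo['Newest BIOS'], demo['BIOS Version'])
print(styles)                # ['color: red', 'color: red', '']
print('12.9.1' < '12.73.9')  # False: plain string comparison misses this case
```

It takes the same two arguments as highlight_lower, so it can be passed to Styler.apply through functools.partial in the same way.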