Python Pandas - Read csv file containing multiple tables


Question

I have a single .csv file containing multiple tables.

Using Pandas, what would be the best strategy to get two DataFrames, inventory and HPBladeSystemRack, from this one file?

The input .csv looks like this:

Inventory       
System Name            IP Address    System Status
dg-enc05                             Normal
dg-enc05_vc_domain                   Unknown
dg-enc05-oa1           172.20.0.213  Normal

HP BladeSystem Rack         
System Name               Rack Name   Enclosure Name
dg-enc05                  BU40  
dg-enc05-oa1              BU40        dg-enc05
dg-enc05-oa2              BU40        dg-enc05

The best I've come up with so far is to convert this .csv file into an Excel workbook (.xlsx), split the tables into separate sheets and use:

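import pandas as pd
# Each table lives in its own sheet of the converted .xlsx workbook
inventory = pd.read_excel('path_to_file.xlsx', sheet_name='sheet1', skiprows=1)
HPBladeSystemRack = pd.read_excel('path_to_file.xlsx', sheet_name='sheet2', skiprows=2)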

However:

  • This approach requires the xlrd module.
  • Those log files have to be analyzed in real time, so it would be much better to parse them directly as they come from the logs.
  • The real logs have far more tables than those two.
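If the table names are not known in advance, or each table should be handled as soon as it arrives, one possible alternative to the Excel detour (not taken from the question or the answer) is to read the log line by line with Python's built-in csv module and cut it into tables at blank lines. This is only a sketch under assumptions that may not hold for the real logs: the file is comma-separated (the column alignment shown above is just for display), each table starts with a title line followed by a header line, and tables are separated by blank lines. The helper names `iter_tables` and `_to_frame` are hypothetical.

```python
import csv
import io
from typing import Iterable, Iterator, List, Tuple

import pandas as pd


def _to_frame(block: List[List[str]]) -> Tuple[str, pd.DataFrame]:
    """Turn [title row, header row, data rows...] into (title, DataFrame)."""
    title = block[0][0]
    if len(block) < 2:
        return title, pd.DataFrame()  # a title with no header: empty table
    header, rows = block[1], block[2:]
    width = len(header)
    # Pad short rows with None (and trim long ones) to match the header width.
    fixed = [(row + [None] * width)[:width] for row in rows]
    return title, pd.DataFrame(fixed, columns=header)


def iter_tables(lines: Iterable[str]) -> Iterator[Tuple[str, pd.DataFrame]]:
    """Yield (title, DataFrame) pairs, one per blank-line-separated table.

    `lines` can be an open file or any other iterator of text lines; because
    this is a generator, each table is yielded as soon as the blank line
    after it has been read.
    """
    block: List[List[str]] = []
    for row in csv.reader(lines):
        if any(cell.strip() for cell in row):
            block.append(row)  # title, header or data row
        elif block:
            yield _to_frame(block)  # blank line closes the current table
            block = []
    if block:
        yield _to_frame(block)  # last table may have no trailing blank line


# Usage with an in-memory copy of the sample (comma-separated by assumption).
sample = io.StringIO(
    "Inventory\n"
    "System Name,IP Address,System Status\n"
    "dg-enc05,,Normal\n"
    "dg-enc05_vc_domain,,Unknown\n"
    "dg-enc05-oa1,172.20.0.213,Normal\n"
    "\n"
    "HP BladeSystem Rack\n"
    "System Name,Rack Name,Enclosure Name\n"
    "dg-enc05,BU40\n"
    "dg-enc05-oa1,BU40,dg-enc05\n"
    "dg-enc05-oa2,BU40,dg-enc05\n"
)
tables = dict(iter_tables(sample))
for name, table in tables.items():
    print(name)
    print(table)
```

Unlike `pd.read_csv`, the csv module leaves empty fields as empty strings rather than NaN, and every value stays a string, so any type conversion has to be done afterwards.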

Solution

If you know the table names beforehand, then something like this:

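import pandas as pd

# No header row; names=range(3) pads short rows (titles, missing cells) with NaN
df = pd.read_csv("jahmyst2.csv", header=None, names=range(3))
table_names = ["Inventory", "HP BladeSystem Rack", "Network Interface"]
# Running count of title rows = id of the table each row belongs to
groups = df[0].isin(table_names).cumsum()
tables = {g.iloc[0,0]: g.iloc[1:] for k,g in df.groupby(groups)}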

should produce a dictionary whose keys are the table names and whose values are the corresponding sub-tables.

>>> list(tables)
['HP BladeSystem Rack', 'Inventory']
>>> for k,v in tables.items():
...     print("table:", k)
...     print(v)
...     print()
...     
table: HP BladeSystem Rack
              0          1               2
6   System Name  Rack Name  Enclosure Name
7      dg-enc05       BU40             NaN
8  dg-enc05-oa1       BU40        dg-enc05
9  dg-enc05-oa2       BU40        dg-enc05

table: Inventory
                    0             1              2
1         System Name    IP Address  System Status
2            dg-enc05           NaN         Normal
3  dg-enc05_vc_domain           NaN        Unknown
4        dg-enc05-oa1  172.20.0.213         Normal
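The grouping key is what makes this work: `isin` flags each title row as True, and `cumsum` turns those flags into a running count, so every row is labelled with the number of the last title seen above it. `groupby` on that label then yields one chunk per table, whose first row is the title. A minimal, self-contained check of that reasoning, with `jahmyst2.csv` replaced by an in-memory copy of the sample and assuming (as the answer does) that the file is really comma-separated:

```python
import io

import pandas as pd

# In-memory stand-in for the log file; comma separation is an assumption.
sample = """Inventory
System Name,IP Address,System Status
dg-enc05,,Normal
dg-enc05_vc_domain,,Unknown
dg-enc05-oa1,172.20.0.213,Normal

HP BladeSystem Rack
System Name,Rack Name,Enclosure Name
dg-enc05,BU40
dg-enc05-oa1,BU40,dg-enc05
dg-enc05-oa2,BU40,dg-enc05
"""

# names=range(3) gives every row three columns, padding short rows
# ("Inventory", "dg-enc05,BU40") with NaN; the blank line is skipped.
df = pd.read_csv(io.StringIO(sample), header=None, names=range(3))

table_names = ["Inventory", "HP BladeSystem Rack", "Network Interface"]
is_title = df[0].isin(table_names)  # True only on title rows
groups = is_title.cumsum()  # 1 for the first table, 2 for the second, ...
print(pd.DataFrame({"first_cell": df[0], "is_title": is_title, "group": groups}))

tables = {g.iloc[0, 0]: g.iloc[1:] for _, g in df.groupby(groups)}
```

Listing a name that never appears in the file, such as "Network Interface" here, is harmless: it simply never flags a row.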

Once you have that, you can turn each sub-table's first row into its column names, and so on.
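Promoting the first row to column names can be done with a small helper. `promote_header` is a hypothetical name, not a pandas API, and `rack_raw` below is a hand-built stand-in with the same contents and index as the `HP BladeSystem Rack` sub-table printed above:

```python
import pandas as pd


def promote_header(sub: pd.DataFrame) -> pd.DataFrame:
    """Use a sub-table's first row as column names and drop it from the data."""
    body = sub.iloc[1:].copy()
    body.columns = sub.iloc[0].tolist()  # e.g. ["System Name", "Rack Name", ...]
    return body.reset_index(drop=True)  # renumber rows from 0


# Stand-in for tables["HP BladeSystem Rack"] as produced by the answer.
rack_raw = pd.DataFrame(
    [
        ["System Name", "Rack Name", "Enclosure Name"],
        ["dg-enc05", "BU40", None],
        ["dg-enc05-oa1", "BU40", "dg-enc05"],
        ["dg-enc05-oa2", "BU40", "dg-enc05"],
    ],
    index=[6, 7, 8, 9],
)
rack = promote_header(rack_raw)
print(rack)
```

To apply it to every table, something like `{name: promote_header(t) for name, t in tables.items()}` would do. If a table is narrower than the `names=range(3)` width, its header row contains NaN, so dropping all-NaN columns first (`t.dropna(axis=1, how="all")`) may be worthwhile.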
