Python: unzipping special files into memory and getting them into a DataFrame


Question

I'm quite stuck with some code I'm writing in Python. I'm a beginner and maybe it is really easy, but I just can't see it. Any help would be appreciated, so thank you in advance :)

Here is the problem: I have to read some special data files with the extension .fen into a pandas DataFrame. Each .fen file sits inside a zipped .fenx file, which contains the .fen file and a .cfg configuration file.

In the code I've written, I use the zipfile library to unzip the files and then load them into the DataFrame. The code is the following:

import os
import zipfile
import numpy as np
import pandas as pd

def readfenxfile(Directory, File):
    # Extract the .fen and .cfg members to disk, next to the archive.
    with zipfile.ZipFile(os.path.join(Directory, File), 'r') as fenxzip:
        fenxzip.extractall(Directory)

    base = os.path.splitext(File)[0]  # file name without the .fenx extension
    # readCfgFile (defined elsewhere) reads the .cfg file and returns some important data.
    # Only cfgDtypes matters here: it describes the data types inside the .fen file,
    # and its field names become the column index of the final DataFrame.
    cfgGeneral, cfgDevice, cfgChannels, cfgDtypes = readCfgFile(Directory, base + '.CFG')
    if cfgChannels is not None:
        dtDtype = eval('np.dtype([' + cfgDtypes + '])')
        dt = np.fromfile(os.path.join(Directory, base + '.fen'), dtype=dtDtype)
        dt = pd.DataFrame(dt)
    else:
        dt = []

    return dt, cfgChannels, cfgDtypes

Now, the extractall() method saves the unzipped files to the hard drive. The .fenx files can be quite big, so having to store the extracted files (and delete them afterwards) is really slow. I would like to do the same thing I do now, but load the .fen and .cfg files into memory instead of onto the hard drive.

I have tried things like fenxzip.read('whateverthenameofthefileis.fen') and some other methods from the zipfile library, such as .open(). But whatever I tried, I couldn't get what .read() returns into a numpy array.

I know it can be a difficult question to answer, because you don't have the files to try it and see what happens. But if anyone has any ideas, I would be glad to read them. :) Thank you very much!

Recommended answer

Here is the solution I finally found, in case it helps anyone. It uses the tempfile library to create a temporary file object (a SpooledTemporaryFile), which is held in memory as long as its size stays below max_size.

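import os
import zipfile
import tempfile
import numpy as np
import pandas as pd

def readfenxfile(Directory, File, ExtractDirectory):
    # ExtractDirectory is not used in this version; nothing is extracted explicitly.
    base = os.path.splitext(File)[0]  # file name without the .fenx extension
    fenxzip = zipfile.ZipFile(os.path.join(Directory, File), 'r')

    # Spooled temporary file: kept in memory until it exceeds max_size bytes.
    fenfile = tempfile.SpooledTemporaryFile(max_size=10000000000, mode='w+b')
    fenfile.write(fenxzip.read(base + '.fen'))
    # readCfgFile now receives the open ZipFile instead of a directory.
    cfgGeneral, cfgDevice, cfgChannels, cfgDtypes = readCfgFile(fenxzip, base + '.CFG')

    if cfgChannels is not None:
        dtDtype = eval('np.dtype([' + cfgDtypes + '])')
        fenfile.seek(0)  # rewind before reading the data back
        dt = np.fromfile(fenfile, dtype=dtDtype)
        dt = pd.DataFrame(dt)
    else:
        dt = []
    fenfile.close()
    fenxzip.close()
    return dt, cfgChannels, cfgDtypes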
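
As an alternative to the temporary file, the bytes returned by ZipFile.read() can be handed directly to np.frombuffer, which interprets them with a given dtype without going through a file object at all. This is not part of the original answer; it is a minimal sketch under the assumption that the .fen member is a plain sequence of fixed-size binary records. The function name read_member_as_frame, the member name sample.fen and the two-field dtype are hypothetical, chosen only so the example runs without a real .fenx file.

```python
import io
import zipfile

import numpy as np
import pandas as pd


def read_member_as_frame(archive, member, dtype):
    """Read one binary member of a zip archive into a DataFrame.

    `archive` may be a path or a file-like object; nothing is extracted to disk.
    """
    with zipfile.ZipFile(archive, "r") as zf:
        raw = zf.read(member)  # decompressed contents as a bytes object
    # Interpret the bytes as records; the result is a read-only array.
    records = np.frombuffer(raw, dtype=dtype)
    return pd.DataFrame(records)


# Build a small stand-in archive in memory (hypothetical names and layout).
record_dtype = np.dtype([("time", "<f8"), ("value", "<i4")])
sample = np.array([(0.0, 10), (0.5, 20), (1.0, 30)], dtype=record_dtype)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("sample.fen", sample.tobytes())
buffer.seek(0)

frame = read_member_as_frame(buffer, "sample.fen", record_dtype)
print(frame)
```

Because np.frombuffer returns a read-only view over the bytes, calling .copy() on the array before further processing may be needed if the records have to be modified in place.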
