Read multiple csv files (size mxm) and load them as an n-dimensional array (size nxmxm) without concatenating


Problem description

I'm working on a program that needs to load a large number of csv files (thousands of them) into an array.

Each csv file has dimensions 45x100, and I want to create a 3-d array with dimensions nx45x100. For now, I load each csv file with pd.read_csv() and convert it into an array with np.array(). I then build the 3-d array with np.array([data_0, data_1, ..., data_n]), which gives me an array with the required dimensions.

Although this works, it is very tedious. Is there a way to do it without reading and processing each csv file individually?

   # this is my current code
   import numpy as np
   import pandas as pd
   from pandas import Series, DataFrame

   mBGS5L = pd.read_csv("strain5.csv")  # 45x100
   mBGS8L = pd.read_csv("strain8.csv")
   mBGS10L = pd.read_csv("strain10.csv")

   # convert each dataframe to a 2-d array
   mBGS5L_ = np.array(mBGS5L)
   mBGS8L_ = np.array(mBGS8L)
   mBGS10L_ = np.array(mBGS10L)

   # combine the 2-d arrays into one 3-d array
   mBGS = np.array([mBGS5L_, mBGS8L_, mBGS10L_])
   # mBGS.shape is then (3, 45, 100)

Note: I have checked other Stack Overflow questions on loading multiple csv files into one dataframe, which is where I learned about glob for getting the list of all the csv files I need. My problem is that using glob and reading the csv files returns a list and not a 3-d array, and I can't convert that list to a numpy array because it raises an error:

   import numpy as np
   import pandas as pd
   from glob import glob

   strain = glob("strain*.csv")
   df = [pd.read_csv(f) for f in strain]  # a list of dataframes
   df_ = np.asarray(df)
   # this returns an error: cannot copy sequence with size 45 to array axis with dimension 30

Any help would be greatly appreciated. Thanks.

Recommended answer

First, you need to convert each dataframe into an mxm array; a list of equally shaped arrays can then be turned into a single 3-d array. Refer to the code below.

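from glob import glob
import numpy as np
import pandas as pd

strain = glob("strain*.csv")
# .values turns each dataframe into a plain 2-d numpy array
df = [pd.read_csv(f).values for f in strain]
# equally shaped 2-d arrays combine into one 3-d array
df_ = np.asarray(df)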
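
The same idea can be wrapped in a small function. The following is only a sketch under a few assumptions: `load_csv_stack` is a hypothetical helper name (it is not part of pandas or NumPy), the files are assumed to contain only numbers with no header row (hence `header=None`), and the demo uses tiny synthetic 4x5 files in a temporary directory in place of the real 45x100 data. It uses `np.stack`, which raises a `ValueError` when the arrays do not all have the same shape, so a file with unexpected dimensions is reported rather than silently producing an odd result.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def load_csv_stack(pattern, directory="."):
    """Load every CSV matching `pattern` into one (n, rows, cols) array.

    Hypothetical helper for illustration; not a pandas or NumPy function.
    """
    # Sort the paths so the order along the first axis is reproducible.
    paths = sorted(Path(directory).glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r} in {directory!r}")
    # Assumption: purely numeric files without a header row.
    arrays = [pd.read_csv(p, header=None).to_numpy() for p in paths]
    # np.stack raises ValueError if the arrays do not all share one shape.
    return paths, np.stack(arrays)


# Demo: three synthetic 4x5 files, each filled with its own index.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        np.savetxt(Path(tmp) / f"strain{i}.csv", np.full((4, 5), float(i)), delimiter=",")
    paths, stacked = load_csv_stack("strain*.csv", tmp)

print(stacked.shape)  # (3, 4, 5)
```

Returning the sorted paths alongside the array makes it possible to tell which file each slice along the first axis came from. Note that the sort is lexicographic, so a name like strain10.csv sorts before strain5.csv; if numeric order matters, the sort key would need to be adapted.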
