Read multiple csv files (size mxm) and load as an n dimensional array (size nxmxm) (not concatenate)
Problem description
I'm working on a program that needs to load a large number of CSV files (thousands of them) into an array.
Each CSV file is 45x100, and I want to create a 3-D array of shape nx45x100. For now, I load each CSV file with pd.read_csv() and convert it to an array with np.array(). I then build the 3-D array with np.array([data_0, data_1, ..., data_n]), which gives me an array with the required dimensions.
Although this works, it is very tedious. Is there a way to do it without reading and processing each CSV file individually?
# This is my current code
import numpy as np
import pandas as pd

mBGS5L = pd.read_csv("strain5.csv")  # 45x100
mBGS8L = pd.read_csv("strain8.csv")
mBGS10L = pd.read_csv("strain10.csv")
mBGS5L_ = np.array(mBGS5L)
mBGS8L_ = np.array(mBGS8L)
mBGS10L_ = np.array(mBGS10L)
mBGS = np.array([mBGS5L_, mBGS8L_, mBGS10L_])
# mBGS.shape is then (3, 45, 100)
Note: I have checked other Stack Overflow questions on loading multiple CSV files into one DataFrame, which is where I learned about glob for getting the list of all the CSV files I need. My problem is that using glob and reading the CSV files in a list comprehension returns a list, not a 3-D array, and I can't convert that list to a NumPy array because it raises an error.
from glob import glob
strain = glob("strain*.csv")
df = [pd.read_csv(f) for f in strain]
df_ = np.asarray(df)
# This raises an error: cannot copy sequence with size 45 to array axis with dimension 30
Any help would be greatly appreciated. Thanks.
Recommended answer
First, convert each DataFrame to a 2-D NumPy array (mxm in the title, 45x100 here). np.asarray can then combine the list of equally shaped arrays into a single 3-D array. Refer to the code below.
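from glob import glob
import numpy as np
import pandas as pd  # needed for pd.read_csv

strain = glob("strain*.csv")  # list of matching file names
# .values turns each DataFrame into a plain 2-D NumPy array
df = [pd.read_csv(f).values for f in strain]
df_ = np.asarray(df)  # shape (n, rows, cols) when every file has the same shape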
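The same idea can be wrapped in a small helper that also checks that every file has the same shape before stacking, since a mismatch is one way the conversion to a 3-D array can fail. This is only a sketch: the name load_csv_stack is hypothetical, np.stack is used in place of np.asarray, and the file names are sorted because glob does not guarantee any order. If the files have no header row, passing header=None stops pandas from treating the first data row as column names.

```python
from glob import glob

import numpy as np
import pandas as pd


def load_csv_stack(pattern, **read_csv_kwargs):
    """Load every CSV matching `pattern` into one (n, rows, cols) array.

    Hypothetical helper, not part of pandas or NumPy. Extra keyword
    arguments (e.g. header=None) are forwarded to pd.read_csv.
    """
    # Sorting is lexicographic, so "strain10.csv" comes before "strain5.csv".
    paths = sorted(glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")

    arrays = [pd.read_csv(p, **read_csv_kwargs).to_numpy() for p in paths]

    # np.stack needs identical shapes, so name any file that differs.
    expected = arrays[0].shape
    mismatched = [(p, a.shape) for p, a in zip(paths, arrays) if a.shape != expected]
    if mismatched:
        raise ValueError(f"expected shape {expected}, but found {mismatched}")

    # Return the paths as well, so slice i can be traced back to its file.
    return np.stack(arrays), paths
```

A call such as mBGS, files = load_csv_stack("strain*.csv") would then give an array whose first axis follows the order of files.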