Read multiple *.txt files into Pandas DataFrame with filename as column header
Problem description
I am trying to import a set of *.txt files. I need to import the files into successive columns of a Pandas DataFrame in Python.
Requirements and background information:
- Each file has one column of numbers
- No headers are present in the files
- Positive and negative integers are possible
- All the *.txt files have the same length (number of lines)
- The columns of the DataFrame must have the name of file (without extension) as the header
- The number of files is not known ahead of time
Here is one sample *.txt file. All the others have the same format.
16
54
-314
1
15
4
153
86
4
64
373
3
434
31
93
53
873
43
11
533
46
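Because these files have no header line, read_csv has to be told so (header=None, or an explicit names=[...]); otherwise it takes the first number as the column label and silently drops it from the data. A quick check using the first four values of the sample above, held in memory via io.StringIO; the label "val1" is a hypothetical file stem:

```python
import io

import pandas as pd

# First four lines of the sample file, kept in memory instead of on disk.
sample = "16\n54\n-314\n1\n"

# Default header="infer": the first line ("16") becomes the column label.
default_df = pd.read_csv(io.StringIO(sample))

# header=None plus names=[...]: every line is data; "val1" is a hypothetical label.
named_df = pd.read_csv(io.StringIO(sample), header=None, names=["val1"])

print(default_df.columns.tolist())
print(named_df)
```

With the default call the frame has only three rows, because 16 was consumed as the header.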
Here is my attempt:
import pandas as pd
import os
import glob
# Step 1: set the target directory that holds the *.txt files
my_dir = "C:\\Python27\\Files\\"
filelist = []
filesList = []
os.chdir( my_dir )
# Step 2: Build up list of files:
for files in glob.glob("*.txt"):
    fileName, fileExtension = os.path.splitext(files)
    filelist.append(fileName) #filename without extension
    filesList.append(files) #filename with extension
# Step 3: Build up DataFrame:
df = pd.DataFrame()
for ijk in filelist:
    frame = pd.read_csv(filesList[ijk])
    df = df.append(frame)
print df
Steps 1 and 2 work. I am having problems with step 3. I get the following error message:
Traceback (most recent call last):
  File "C:\Python27\TextFile.py", line 26, in <module>
    frame = pd.read_csv(filesList[ijk])
TypeError: list indices must be integers, not str
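The TypeError is raised by the list lookup, not by read_csv, which accepts a file-name string without complaint. `for ijk in filelist` binds ijk to each file name itself, so filesList[ijk] tries to index a list with a str. A small pandas-free illustration, using hypothetical file names in place of the lists built in Step 2:

```python
# Hypothetical stand-ins for the two lists built in Step 2.
filelist = ["val1", "val2"]            # names without extension
filesList = ["val1.txt", "val2.txt"]   # names with extension

# Reproduce the bug: iterating over a list yields its items (str), not indices.
error_message = None
try:
    for ijk in filelist:
        filesList[ijk]  # indexing a list with a str raises TypeError
except TypeError as exc:
    error_message = str(exc)

# Iterating over both lists in parallel avoids indexing altogether.
pairs = list(zip(filelist, filesList))

print(error_message)
print(pairs)
```

Even with the indexing fixed, df.append(frame) stacks the files as extra rows rather than side-by-side columns (and DataFrame.append was removed in pandas 2.0), and read_csv without header=None would take each file's first number as its header.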
Question: Is there a better way to load these *.txt files into a Pandas dataframe? Why does read_csv not accept strings for file names?
Answer
You can read each file into its own DataFrame and then concatenate them side by side with pd.concat(..., axis=1). Suppose you have two of those files, containing the data shown.
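import pandas as pd
filelist = ['val1.txt', 'val2.txt']
# names= labels each file's single column (item[:-4] strips ".txt"); axis=1 joins side by side
print(pd.concat([pd.read_csv(item, names=[item[:-4]]) for item in filelist], axis=1))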
val1 val2
0 16 16
1 54 54
2 -314 -314
3 1 1
4 15 15
5 4 4
6 153 153
7 86 86
8 4 4
9 64 64
10 373 373
11 3 3
12 434 434
13 31 31
14 93 93
15 53 53
16 873 873
17 43 43
18 11 11
19 533 533
20 46 46
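The snippet above hard-codes two file names and strips the extension with item[:-4]. Since the number of files is not known ahead of time, a sketch that discovers them with glob and derives the column names with os.path.splitext might look like the following. The function name load_txt_columns is a hypothetical helper, not part of pandas; building full paths with os.path.join also removes the need for os.chdir.

```python
import glob
import os

import pandas as pd


def load_txt_columns(directory):
    """Read every *.txt file in `directory` into its own DataFrame column.

    Columns are named after the files, without the .txt extension.
    """
    # sorted() makes the column order deterministic; glob's order is not guaranteed.
    paths = sorted(glob.glob(os.path.join(directory, "*.txt")))
    if not paths:
        # pd.concat raises ValueError on an empty list, so return an empty frame.
        return pd.DataFrame()
    frames = []
    for path in paths:
        # e.g. ".../val1.txt" -> "val1"
        name = os.path.splitext(os.path.basename(path))[0]
        # header=None: the files have no header row; names=[name] labels the column.
        frames.append(pd.read_csv(path, header=None, names=[name]))
    # axis=1 puts the files side by side as columns rather than stacking rows.
    return pd.concat(frames, axis=1)
```

Usage would be along the lines of df = load_txt_columns("C:\\Python27\\Files"). Because concat with axis=1 aligns on the row index, files of unequal length would be padded with NaN; with equally long files, as stated in the question, this does not arise.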