Read multiple *.txt files into Pandas Dataframe with filename as column header

Problem description

I am trying to import a set of *.txt files. I need to import the files into successive columns of a Pandas DataFrame in Python.

Requirements and background information:

  1. Each file has one column of numbers
  2. No headers are present in the files
  3. Positive and negative integers are possible
  4. The size of all the *.txt files is the same
  5. The columns of the DataFrame must have the name of file (without extension) as the header
  6. The number of files is not known ahead of time

Here is one sample *.txt file. All the others have the same format.

16
54
-314
1
15
4
153
86
4
64
373
3
434
31
93
53
873
43
11
533
46

Here is my attempt:

import pandas as pd
import os
import glob

# Step 1: get a list of all csv files in target directory
my_dir = "C:\\Python27\Files\\"
filelist = []
filesList = []
os.chdir( my_dir )

# Step 2: Build up list of files:
for files in glob.glob("*.txt"):
    fileName, fileExtension = os.path.splitext(files)
    filelist.append(fileName) #filename without extension
    filesList.append(files) #filename with extension

# Step 3: Build up DataFrame:
df = pd.DataFrame()
for ijk in filelist:
    frame = pd.read_csv(filesList[ijk])
    df = df.append(frame)
print df

Steps 1 and 2 work. I am having problems with step 3. I get the following error message:

Traceback (most recent call last):
  File "C:\Python27\TextFile.py", line 26, in <module>
    frame = pd.read_csv(filesList[ijk])
TypeError: list indices must be integers, not str

Question: Is there a better way to load these *.txt files into a Pandas dataframe? Why does read_csv not accept strings for file names?

Recommended answer

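The TypeError does not come from read_csv, which accepts file names as strings. The loop iterates over filelist, so ijk is a file name, and filesList[ijk] then indexes a list with a string. A simpler approach is to read each file into its own DataFrame and concatenate them column-wise afterwards. Suppose you have two of those files, both containing the data shown.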

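In [6]:
import pandas as pd
filelist = ['val1.txt', 'val2.txt']
# names= sets each column header to the file name minus ".txt"; axis=1 places the files side by side
print(pd.concat([pd.read_csv(item, names=[item[:-4]]) for item in filelist], axis=1))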
    val1  val2
0     16    16
1     54    54
2   -314  -314
3      1     1
4     15    15
5      4     4
6    153   153
7     86    86
8      4     4
9     64    64
10   373   373
11     3     3
12   434   434
13    31    31
14    93    93
15    53    53
16   873   873
17    43    43
18    11    11
19   533   533
20    46    46
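
The example above hardcodes two file names, while the question states that the number of files is not known ahead of time. The following is a minimal sketch of how the same idea might be combined with glob, under a few assumptions: the helper name load_columns and its directory and pattern parameters are hypothetical, and files are ordered alphabetically by path, which the question does not specify.

```python
import glob
import os

import pandas as pd


def load_columns(directory, pattern="*.txt"):
    """Read every matching file into one DataFrame, one column per file.

    Hypothetical helper: its name and parameters are illustrative only.
    """
    # sorted() makes the column order deterministic (alphabetical by path).
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    if not paths:
        raise ValueError("no files matching %r in %r" % (pattern, directory))

    columns = []
    for path in paths:
        # File name without directory or extension becomes the column header.
        name = os.path.splitext(os.path.basename(path))[0]
        # header=None: the files have no header row, so the first line is data.
        columns.append(pd.read_csv(path, header=None, names=[name]))

    # axis=1 aligns the single-column frames side by side on the row index.
    return pd.concat(columns, axis=1)
```

Calling load_columns with the directory that holds the files should return one column per file. Using os.path.splitext instead of slicing off the last four characters also keeps the header correct if the extension has a different length.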
