How to read a dataset from a txt file in Python?


Question

I have a dataset in this format:

I need to import the data and work with it.

The main problem is that the first and fourth columns are strings, while the second and third columns are floats and ints, respectively.

I'd like to put the data in a matrix, or at least obtain a list of each column's data.

I tried to read the whole dataset as strings, but the result is a mess:

    f = open('input.txt', 'r')
    l = [map(str, line.split('\t')) for line in f]

What would be a good solution?

Recommended answer

You can use pandas. It is well suited to reading CSV files, tab-delimited files, and similar formats. pandas almost always infers each column's data type correctly, and accessing a column's values gives you a NumPy array, as demonstrated below.

I used this tab-delimited 'test.txt' file:

    bbbbffdd    434343  228 D 
    bbbWWWff    43545343    289 E
    ajkfbdafa   2345345 2312    F

Here is the pandas code. A single line of Python reads your file into a DataFrame. You can change the 'sep' value to whatever delimiter your file uses; header=None tells pandas that the file has no header row.

    import pandas as pd
    X = pd.read_csv('test.txt', sep="\t", header=None)

Then try:

    print(X)
            0         1     2   3
    0   bbbbffdd    434343   228  D 
    1   bbbWWWff  43545343   289   E
    2  ajkfbdafa   2345345  2312   F

    print(X[0])
    0     bbbbffdd
    1     bbbWWWff
    2    ajkfbdafa

    print(X[2])
    0     228
    1     289
    2    2312

    print(X[1][1:])
    1    43545343
    2     2345345
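
To check which types pandas actually inferred, you can inspect the dtypes of the DataFrame. The sketch below is only an illustration: it reads the same three rows from an in-memory string rather than from disk (the `sample` string and the `io.StringIO` buffer are assumptions made so the example is self-contained), and it also reads a comma-separated copy to show the effect of changing 'sep'.

```python
import io
import pandas as pd

# Same three rows as test.txt, held in memory so the example is self-contained.
sample = (
    "bbbbffdd\t434343\t228\tD\n"
    "bbbWWWff\t43545343\t289\tE\n"
    "ajkfbdafa\t2345345\t2312\tF\n"
)

X = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

# Columns 1 and 2 are inferred as integers; columns 0 and 3 hold strings.
print(X.dtypes)

# The same data with commas instead of tabs only needs a different sep.
Y = pd.read_csv(io.StringIO(sample.replace("\t", ",")), sep=",", header=None)
print(X.equals(Y))
```

If a numeric column comes back as strings, that usually points to a stray non-numeric value or a wrong separator in the file.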

You can add column names like this:

    X.columns = ['random_letters', 'number', 'simple_number', 'letter']

Then get a column as an array:

    X['number'].values
    array([  434343, 43545343,  2345345])
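
The question asked for either a matrix or one list per column. Here is a minimal sketch of both, assuming the same three rows and the column names used above. The in-memory `sample` string stands in for the file and is an assumption for illustration; `names=` assigns the column names at read time, `.tolist()` returns a plain Python list, and `.to_numpy()` returns the whole table as a 2-D array.

```python
import io
import pandas as pd

# Stand-in for test.txt so the example runs without a file on disk.
sample = (
    "bbbbffdd\t434343\t228\tD\n"
    "bbbWWWff\t43545343\t289\tE\n"
    "ajkfbdafa\t2345345\t2312\tF\n"
)

cols = ["random_letters", "number", "simple_number", "letter"]
X = pd.read_csv(io.StringIO(sample), sep="\t", header=None, names=cols)

# One plain Python list per column, keyed by column name.
columns_as_lists = {name: X[name].tolist() for name in cols}
print(columns_as_lists["simple_number"])

# Whole table as a 2-D array; mixing string and integer columns
# means the array has dtype object rather than a numeric dtype.
matrix = X.to_numpy()
print(matrix.shape)
```

Keeping the data in the DataFrame is usually more convenient than the object array, because each column retains its own type.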
