Parsing specific columns from a dataset in python
Problem description
I have a dataset with multiple columns, and I am only interested in analyzing the data from six of them. It is in a txt file, and I want to load the file and pull out columns (0, 1, 2, 4, 6, 7) with the headings (time, mode, event, xcoord, ycoord, phi). There are ten columns in total. Here is an example of what the data looks like:
1385940076332 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076336 2 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076339 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076342 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076346 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076350 2 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076353 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076356 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
When I use the following code to parse the data into columns, it only appears to count the data, but I would like to be able to list the data for further analysis. Here is the code I am using, from @alko:
import pandas as pd
df = pd.read_csv('filtered.txt', header=None, false_values=None, sep='\s+')[[0, 1, 2, 4, 6, 7]]
df.columns = ['time', 'mode', 'event', 'xcoord', 'ycoord', 'phi']
print df
Here is what that code returns:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 115534 entries, 0 to 115533
Data columns (total 6 columns):
time 115534 non-null values
mode 115534 non-null values
event 115534 non-null values
xcoord 115534 non-null values
ycoord 115534 non-null values
phi 115534 non-null values
dtypes: float64(3), int64(2), object(1)
So the goal is to pull out these 6 columns from the 10 original ones, label them, and list them.
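A note on the output above: older pandas releases printed a summary like this in place of the rows when a frame was too large for the display limits, so the data itself was still loaded. As a minimal sketch, assuming a small stand-in frame (the values below are illustrative, not the real file), the rows can be requested explicitly with `head()` or `to_string()`:

```python
import pandas as pd

# Hypothetical stand-in for the parsed data; values are illustrative only.
df = pd.DataFrame({
    'time': [1385940076332, 1385940076336, 1385940076339],
    'mode': [3, 2, 3],
    'event': ['M', 'M', 'M'],
})

first_rows = df.head(2)    # DataFrame holding only the first two rows
all_rows = df.to_string()  # every row rendered as one plain string

print(first_rows)
print(all_rows)
```

For a frame with more than 100,000 rows, `head()` is probably the more practical of the two, since `to_string()` renders everything.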
Recommended answer
import pandas as pd
from StringIO import StringIO
s = """1385940076332 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076336 2 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076339 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076342 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076346 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076350 2 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076353 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076356 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000"""
df = pd.read_csv(StringIO(s),header=None, sep='\s+')[[0, 2, 3, 4, 6, 7]]
df.columns = ['time', 'mode', 'event', 'xcoord', 'ycoord', 'phi']
print df
# time mode event xcoord ycoord phi
# 0 1385940076332 M subject_avatar -30 -59.028107 180
# 1 1385940076336 M subject_avatar -30 -59.028107 180
# 2 1385940076339 M subject_avatar -30 -59.028107 180
# 3 1385940076342 M subject_avatar -30 -59.028107 180
# 4 1385940076346 M subject_avatar -30 -59.028107 180
# 5 1385940076350 M subject_avatar -30 -59.028107 180
# 6 1385940076353 M subject_avatar -30 -59.028107 180
# 7 1385940076356 M subject_avatar -30 -59.028107 180
Note that I corrected the column indices, as the ones provided in the question do not seem to be correct.
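The snippet above is written for Python 2 (`from StringIO import StringIO`, `print df`). A rough Python 3 equivalent might look like the sketch below. It assumes the same column indices as the answer, uses a two-line sample in place of the real file, and swaps in `usecols` so that only the wanted columns are parsed; the variable name `sample` is made up for illustration.

```python
import io
import pandas as pd

# Two sample rows copied from the question; the real data would come from the txt file.
sample = """1385940076332 3 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000
1385940076336 2 M subject_avatar -30.000000 1.000000 -59.028107 180.000000 0.000000 0.000000"""

# usecols keeps only the listed positions; header=None because the file has no header row.
df = pd.read_csv(io.StringIO(sample), header=None, sep=r'\s+',
                 usecols=[0, 2, 3, 4, 6, 7])

# Label the six remaining columns in file order.
df.columns = ['time', 'mode', 'event', 'xcoord', 'ycoord', 'phi']

# to_string() renders every row, whatever the display limits are.
print(df.to_string())
```

To read from disk, the `io.StringIO(sample)` argument would be replaced by the file path, for example `'filtered.txt'` as in the question.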