Reading and doing calculations from a .dat file in Python
Problem description
I need to read a .dat file in Python that has 12 columns in total and millions of rows. I need to divide columns 2, 3 and 4 by column 1 for my calculation. So before I load the .dat file, do I need to delete all the other unwanted columns? If not, how do I select only the columns I need and have Python do the math?
An example of the .dat file would be data.dat
I am new to Python, so a little instruction on how to open, read and calculate would be appreciated.
I have added the code I am using as a starter, based on your suggestion:
    from sys import argv
    import pandas as pd

    script, filename = argv
    txt = open(filename)
    print "Here's your file %r:" % filename
    print txt.read()

    def your_func(row):
        return row['x-momentum'] / row['mass']

    columns_to_keep = ['mass', 'x-momentum']
    dataframe = pd.read_csv('~/Pictures', delimiter="," , usecols=columns_to_keep)
    dataframe['new_column'] = dataframe.apply(your_func, axis=1)
and also the error I get from it:
    Traceback (most recent call last):
      File "flash.py", line 18, in <module>
        dataframe = pd.read_csv('~/Pictures', delimiter="," , usecols=columns_to_keep)
      File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 529, in parser_f
        return _read(filepath_or_buffer, kwds)
      File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 295, in _read
        parser = TextFileReader(filepath_or_buffer, **kwds)
      File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 612, in __init__
        self._make_engine(self.engine)
      File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 747, in _make_engine
        self._engine = CParserWrapper(self.f, **self.options)
      File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 1119, in __init__
        self._reader = _parser.TextReader(src, **kwds)
      File "pandas/parser.pyx", line 518, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:5030)
    ValueError: No columns to parse from file
Solution
After looking at your flash.dat file, it's clear you need to do a little cleanup before you process it. The following code converts it to a CSV file:

    import csv

    # read flash.dat to a list of lists
    datContent = [i.strip().split() for i in open("./flash.dat").readlines()]

    # write it as a new CSV file
    with open("./flash.csv", "wb") as f:
        writer = csv.writer(f)
        writer.writerows(datContent)
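The conversion above targets Python 2 (note the "wb" file mode) and loads every line into memory at once. As a minimal sketch for Python 3, assuming the file is whitespace-separated as the split() call implies, the same conversion can be streamed one line at a time. The function name dat_to_csv is made up for illustration, and unlike the original it skips blank lines:

```python
import csv


def dat_to_csv(dat_path, csv_path):
    """Convert a whitespace-separated text file to CSV, one line at a time."""
    # newline="" lets the csv module control line endings (Python 3 idiom).
    with open(dat_path, "r") as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for line in src:
            fields = line.split()  # split on any run of spaces or tabs
            if fields:  # assumption: blank lines carry no data, so skip them
                writer.writerow(fields)
```

It could be called as dat_to_csv("./flash.dat", "./flash.csv") to produce the same flash.csv that the next step reads.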
Now use pandas to compute the new column.
    import pandas as pd

    def your_func(row):
        return row['x-momentum'] / row['mass']

    columns_to_keep = ['#time', 'x-momentum', 'mass']
    dataframe = pd.read_csv("./flash.csv", usecols=columns_to_keep)
    dataframe['new_column'] = dataframe.apply(your_func, axis=1)

    print dataframe
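As an alternative sketch, pandas can usually read a whitespace-separated file directly, which skips the intermediate CSV, and dividing whole columns at once avoids a row-by-row apply over millions of rows. This assumes Python 3, a single header line, and the column names used above. The names 'y-momentum' and 'z-momentum' in the usage note are hypothetical stand-ins for the other columns mentioned in the question, and load_and_divide is an illustrative helper, not part of pandas:

```python
import pandas as pd


def load_and_divide(path_or_buffer, numerators, denominator="mass"):
    """Read only the needed columns and divide each numerator by the denominator."""
    columns = [denominator] + list(numerators)
    # sep=r"\s+" treats any run of whitespace as the delimiter;
    # usecols keeps pandas from loading the unwanted columns.
    df = pd.read_csv(path_or_buffer, sep=r"\s+", usecols=columns)
    # Vectorized division: each numerator column is divided by the
    # denominator column, aligned row by row.
    ratios = df[list(numerators)].div(df[denominator], axis=0)
    return df.join(ratios.add_suffix("_per_" + denominator))
```

A call such as load_and_divide("./flash.dat", ["x-momentum", "y-momentum", "z-momentum"]) would return the selected columns plus one "_per_mass" column for each. If the file is too large to fit in memory, read_csv also accepts a chunksize argument so the same division can be applied piece by piece.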