Python - CSV时间导向将大量列转换为行 [英] Python - CSV time oriented Transposing large number of columns to rows
问题描述
我有很多csv文件,它们是列为导向的,我需要预处理来最终索引它们。
这是时间导向的数据,每个设备(最多128列)的列数非常大:
LDEV_XXXXXX.csv
编号:XXXXX(VSP)
From:2014/06/04 05:58
To:2014/06/05 05:58
采样率:1
No。,time,00:30:00X(X2497-1),00:30:01X(X2498-1),00:30:02X b242,2014/06/04 10:00,0,0,0
243,2014/06/04 10:01,0,0,0
244,2014/06/04 10:02,9,0,0
245,2014/06/04 10:03,0,0,0
246 ,2014/06/04 10:04,0,0,0
247,2014/06/04 10:05,0,0,0
我的目标是将数据转换(如果它是正确的)数据到行,这样我就能够操纵数据更有效率,例如:
time,device,value
2014/06/04 10:00 00:30:00X(X2497-1),0
2014/06/04 10:00,00:30:01X(X2498-1),0
2014/06/04 10:00,00:30:02X(X2499-1),0
2014/06/04 10:01,00:30:00X ,0
2014/06/04 10:01,00:30:01X(X2498-1),0
2014/06/04 10:01,00: 30:02X(X2499-1),0
2014/06/04 10:02,00:30:00X(X2497-1),9
2014/06/04 10:02,00:30:01X(X2498-1),0
2014/06/04 10:02,00:30:02X(X2499-1),0
等等...
我已经让原始数据(这是使用,作为分隔符),你会注意到,我需要删除6行的没有列,没有兴趣,但这不是主要的目标和困难)
我有一个python开始代码来转置csv数据,但它不完全是我需要的...
import csv
import sys
infile = sys.argv [1]
outfile = sys.argv [2]
with open(infile)as f:
reader = csv.reader(f)
cols = []
读取器中的行:
cols.append b:
$ b,其中open(outfile,'wb')as f:
writer = csv.writer(f)
for i in range(len(max(cols,key = len) )):
writer.writerow([c [i] if i
请注意,列数是任意的,有几个,最多128个取决于文件。
m很肯定这是一个常见的需求,但我还是找不到确切的python代码,或者我不能得到...
编辑: / p>
更精确:
每个时间戳记行将重复设备数量,更多的行(乘以设备数量),但只有几行(时间戳,设备,值)
最终所需的结果已更新: - )
我想能够使用脚本使用argument1作为infile和argument2作为outfile: - )
编辑:期望报价()
>,端口代码到python 2,指示python 3并删除调试
print
EDIT2:fixed stupid bug not递增索引
EDIT3:新版本允许输入文件包含多个标头,每个标头后跟数据
不确定是否值得使用 csv
模块,因为您的分隔符是固定的,您没有引号,并且没有包含换行符或分隔符的字段: line .strip.split(',')
就够了。
这是我试过的:
- 跳过行,直到以第一个开头为止,并读取两个第一个之后的字段以获取标识符
- 逐行前进
- 在第二栏输入日期
- 使用标识符
代码为python 2(删除第一行 from __future__ import print_function
for python 3)
from __future__ import print_function
类转换器(对象):
def _skip_preamble :
for self.fin:
如果line.strip()。startswith('No。'):
self.keys = line.strip()。split ,')[2:]
return
raise异常('未找到初始行)
def _do_loop(self):
在self.fin中的行:
elts [2:]:
打印出来的代码如下:b elts = line.strip()。split(',')
dat = elts [1]
ix = 0
(dat,self.keys [ix],val,sep =',',file = self.out)
ix + = 1
def转置(self,ficin,ficout):
with open(ficin)as fin:
with open(ficout,'w')as fout:
self.do_transpose(fin,fout)
def do_transpose ,fout):
self.fin = fin
self.out = fout
self._skip_preamble()
self._do_loop()
用法:
t = transposer
t.transpose('in','out')
如果输入文件包含多个标题,必须重置每个标题上的键列表:
从__future__ import print_function
类转换器(对象):
def _do_loop(self):
line_number = 0
在self.fin中的行:
line_number + = 1
line = 。跳闸();
如果line.strip()。startswith('No。'):
self.keys = line.strip()。split(',')[2:]
elif line.startswith(''):
elts = line.strip()。split(',')
if len(elts)==(len(self.keys)+ 2):
dat = elts [1]
ix = 0
val in elts [2:]:
print(dat,self.keys [ix],val,sep =', ',file = self.out)
ix + = 1
else:
raise Exception(语法错误行%d找到%d个预期%d
% ,len(self.keys),len(elts) - 2))
def transpose(self,ficin,ficout):
with open(ficin)as fin:
开放(ficout,'w')为fout:
self.do_transpose(fin,fout)
def do_transpose(self,fin,fout):
self.fin = fin
self.out = fout
self.keys = []
self._do_loop()
I have many csv files which are "column" oriented and that I need to pre-process to finally index them.
This is time oriented data, with a very large number of columns for each "device" (up to 128 columns) like:
LDEV_XXXXXX.csv Serial number : XXXXX(VSP) From : 2014/06/04 05:58 To : 2014/06/05 05:58 sampling rate : 1 "No.","time","00:30:00X(X2497-1)","00:30:01X(X2498-1)","00:30:02X(X2499-1)" "242","2014/06/04 10:00",0,0,0 "243","2014/06/04 10:01",0,0,0 "244","2014/06/04 10:02",9,0,0 "245","2014/06/04 10:03",0,0,0 "246","2014/06/04 10:04",0,0,0 "247","2014/06/04 10:05",0,0,0
My goal is to transpose (if it the term is the right one) data into rows, such that i will be able to manipulate the data much more efficiently, such as:
"time",device,value "2014/06/04 10:00","00:30:00X(X2497-1)",0 "2014/06/04 10:00","00:30:01X(X2498-1)",0 "2014/06/04 10:00","00:30:02X(X2499-1)",0 "2014/06/04 10:01","00:30:00X(X2497-1)",0 "2014/06/04 10:01","00:30:01X(X2498-1)",0 "2014/06/04 10:01","00:30:02X(X2499-1)",0 "2014/06/04 10:02","00:30:00X(X2497-1)",9 "2014/06/04 10:02","00:30:01X(X2498-1)",0 "2014/06/04 10:02","00:30:02X(X2499-1)",0
And so on...
Note: I have let the raw data (which is uses "," as a separator), you would note that I need to delete the 6 first lines the "No" column which has no interest, but this is not the main goal and difficulty)
I have a python starting code to transpose csv data, but it doesn't exactly what i need...
import csv import sys infile = sys.argv[1] outfile = sys.argv[2] with open(infile) as f: reader = csv.reader(f) cols = [] for row in reader: cols.append(row) with open(outfile, 'wb') as f: writer = csv.writer(f) for i in range(len(max(cols, key=len))): writer.writerow([(c[i] if i<len(c) else '') for c in cols])
Note the number of columns are arbitrary, something a few, and up to 128 depending on files.
I'm pretty sure this is a common need but I couldn't yet find the exact python code that does this, or I couldn't get...
Edit:
More precision:
Each timestamp row will be repeated by the number of devices, so that the file will have much more lines (multiplied by the number of devices) but only a few rows (timestamp,device,value) The final desired result has been updated :-)
Edit:
I would like to be able to use the script using argument1 for infile and argument2 for outfile :-)
解决方案EDIT : Expect quotes (
"
) aroundNo.
, port code to python 2 with indication for python 3 and remove debuggingEDIT2 : fixed stupid bug not incrementing indexes
EDIT3 : new version allowing the input file to contain multiple headers each followed by data
I am not sure it is worth to use
csv
module, because you separator is fixed, you have no quotes, and no field containing newline or separator character :line.strip.split(',')
is enough.Here is what I tried :
- skip lines until one begins with No. and read fields after 2 firsts to get identifiers
- proceed line by line
- take date on second field
- print on line for each field after 2 firsts using identifier
Code for python 2 (remove first line from __future__ import print_function
for python 3)
from __future__ import print_function
class transposer(object):
def _skip_preamble(self):
for line in self.fin:
if line.strip().startswith('"No."'):
self.keys = line.strip().split(',')[2:]
return
raise Exception('Initial line not found')
def _do_loop(self):
for line in self.fin:
elts = line.strip().split(',')
dat = elts[1]
ix = 0
for val in elts[2:]:
print(dat, self.keys[ix], val, sep=',', file = self.out)
ix += 1
def transpose(self, ficin, ficout):
with open(ficin) as fin:
with open(ficout, 'w') as fout:
self.do_transpose(fin, fout)
def do_transpose(self, fin, fout):
self.fin = fin
self.out = fout
self._skip_preamble()
self._do_loop()
Usage :
t = transposer()
t.transpose('in', 'out')
If input file contains multiple headers, it is necessary to reset the list of keys on each header :
from __future__ import print_function
class transposer(object):
def _do_loop(self):
line_number = 0
for line in self.fin:
line_number += 1
line = line.strip();
if line.strip().startswith('"No."'):
self.keys = line.strip().split(',')[2:]
elif line.startswith('"'):
elts = line.strip().split(',')
if len(elts) == (len(self.keys) + 2):
dat = elts[1]
ix = 0
for val in elts[2:]:
print(dat, self.keys[ix], val, sep=',', file = self.out)
ix += 1
else:
raise Exception("Syntax error line %d expected %d values found %d"
% (line_number, len(self.keys), len(elts) - 2))
def transpose(self, ficin, ficout):
with open(ficin) as fin:
with open(ficout, 'w') as fout:
self.do_transpose(fin, fout)
def do_transpose(self, fin, fout):
self.fin = fin
self.out = fout
self.keys = []
self._do_loop()
这篇关于Python - CSV时间导向将大量列转换为行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!