Python - CSV时间导向将大量列转换为行 [英] Python - CSV time oriented Transposing large number of columns to rows

查看:203
本文介绍了Python - CSV时间导向将大量列转换为行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有很多csv文件,它们是列为导向的,我需要预处理来最终索引它们。



这是时间导向的数据,每个设备(最多128列)的列数非常大:

  LDEV_XXXXXX.csv 
编号:XXXXX(VSP)
From:2014/06/04 05:58
To:2014/06/05 05:58
采样率:1

No。,time,00:30:00X(X2497-1),00:30:01X(X2498-1),00:30:02X b242,2014/06/04 10:00,0,0,0
243,2014/06/04 10:01,0,0,0
244,2014/06/04 10:02,9,0,0
245,2014/06/04 10:03,0,0,0
246 ,2014/06/04 10:04,0,0,0
247,2014/06/04 10:05,0,0,0

我的目标是将数据转换(如果它是正确的)数据到行,这样我就能够操纵数据更有效率,例如:

 time,device,value 
2014/06/04 10:00 00:30:00X(X2497-1),0
2014/06/04 10:00,00:30:01X(X2498-1),0
2014/06/04 10:00,00:30:02X(X2499-1),0
2014/06/04 10:01,00:30:00X ,0
2014/06/04 10:01,00:30:01X(X2498-1),0
2014/06/04 10:01,00: 30:02X(X2499-1),0
2014/06/04 10:02,00:30:00X(X2497-1),9
2014/06/04 10:02,00:30:01X(X2498-1),0
2014/06/04 10:02,00:30:02X(X2499-1),0

等等...



我已经让原始数据(这是使用,作为分隔符),你会注意到,我需要删除6行的没有列,没有兴趣,但这不是主要的目标和困难)



我有一个python开始代码来转置csv数据,但它不完全是我需要的...

  import csv 
import sys
infile = sys.argv [1]
outfile = sys.argv [2]

with open(infile)as f:
reader = csv.reader(f)
cols = []
读取器中的行:
cols.append b:
$ b,其中open(outfile,'wb')as f:
writer = csv.writer(f)
for i in range(len(max(cols,key = len) )):
writer.writerow([c [i] if i

请注意,列数是任意的,有几个,最多128个取决于文件。



m很肯定这是一个常见的需求,但我还是找不到确切的python代码,或者我不能得到...



编辑: / p>

更精确:



每个时间戳记行将重复设备数量,更多的行(乘以设备数量),但只有几行(时间戳,设备,值)
最终所需的结果已更新: - )



我想能够使用脚本使用argument1作为infile和argument2作为outfile: - )

解决方案

编辑:期望报价( >,端口代码到python 2,指示python 3并删除调试 print



EDIT2:fixed stupid bug not递增索引



EDIT3:新版本允许输入文件包含多个标头,每个标头后跟数据



不确定是否值得使用 csv 模块,因为您的分隔符是固定的,您没有引号,并且没有包含换行符或分隔符的字段: line .strip.split(',')就够了。



这是我试过的:




  • 跳过行,直到以第一个开头为止​​,并读取两个第一个之后的字段以获取标识符

  • 逐行前进

    • 在第二栏输入日期

    • 使用标识符

    >


代码为python 2(删除第一行 from __future__ import print_function for python 3)

  from __future__ import print_function 

类转换器(对象):
def _skip_preamble :
for self.fin:
如果line.strip()。startswith('No。'):
self.keys = line.strip()。split ,')[2:]
return
raise异常('未找到初始行)
def _do_loop(self):
在self.fin中的行:
elts [2:]:
打印出来的代码如下:b elts = line.strip()。split(',')
dat = elts [1]
ix = 0
(dat,self.keys [ix],val,sep =',',file = self.out)
ix + = 1

def转置(self,ficin,ficout):
with open(ficin)as fin:
with open(ficout,'w')as fout:
self.do_transpose(fin,fout)
def do_transpose ,fout):
self.fin = fin
self.out = fout
self._skip_preamble()
self._do_loop()



用法:

  t = transposer 
t.transpose('in','out')

如果输入文件包含多个标题,必须重置每个标题上的键列表:

 从__future__ import print_function 

类转换器(对象):
def _do_loop(self):
line_number = 0
在self.fin中的行:
line_number + = 1
line = 。跳闸();
如果line.strip()。startswith('No。'):
self.keys = line.strip()。split(',')[2:]
elif line.startswith(''):
elts = line.strip()。split(',')
if len(elts)==(len(self.keys)+ 2):
dat = elts [1]
ix = 0
val in elts [2:]:
print(dat,self.keys [ix],val,sep =', ',file = self.out)
ix + = 1
else:
raise Exception(语法错误行%d找到%d个预期%d
% ,len(self.keys),len(elts) - 2))

def transpose(self,ficin,ficout):
with open(ficin)as fin:
开放(ficout,'w')为fout:
self.do_transpose(fin,fout)
def do_transpose(self,fin,fout):
self.fin = fin
self.out = fout
self.keys = []
self._do_loop()


I have many csv files which are "column" oriented and that I need to pre-process to finally index them.

This is time oriented data, with a very large number of columns for each "device" (up to 128 columns) like:

LDEV_XXXXXX.csv             
Serial number : XXXXX(VSP)              
From : 2014/06/04 05:58             
To   : 2014/06/05 05:58             
sampling rate : 1               

"No.","time","00:30:00X(X2497-1)","00:30:01X(X2498-1)","00:30:02X(X2499-1)"
"242","2014/06/04 10:00",0,0,0
"243","2014/06/04 10:01",0,0,0
"244","2014/06/04 10:02",9,0,0
"245","2014/06/04 10:03",0,0,0
"246","2014/06/04 10:04",0,0,0
"247","2014/06/04 10:05",0,0,0

My goal is to transpose (if it the term is the right one) data into rows, such that i will be able to manipulate the data much more efficiently, such as:

"time",device,value
"2014/06/04 10:00","00:30:00X(X2497-1)",0
"2014/06/04 10:00","00:30:01X(X2498-1)",0
"2014/06/04 10:00","00:30:02X(X2499-1)",0
"2014/06/04 10:01","00:30:00X(X2497-1)",0
"2014/06/04 10:01","00:30:01X(X2498-1)",0
"2014/06/04 10:01","00:30:02X(X2499-1)",0
"2014/06/04 10:02","00:30:00X(X2497-1)",9
"2014/06/04 10:02","00:30:01X(X2498-1)",0
"2014/06/04 10:02","00:30:02X(X2499-1)",0

And so on...

Note: I have let the raw data (which is uses "," as a separator), you would note that I need to delete the 6 first lines the "No" column which has no interest, but this is not the main goal and difficulty)

I have a python starting code to transpose csv data, but it doesn't exactly what i need...

import csv
import sys
infile = sys.argv[1]
outfile = sys.argv[2]

with open(infile) as f:
    reader = csv.reader(f)
    cols = []
    for row in reader:
        cols.append(row)

with open(outfile, 'wb') as f:
    writer = csv.writer(f)
    for i in range(len(max(cols, key=len))):
        writer.writerow([(c[i] if i<len(c) else '') for c in cols])

Note the number of columns are arbitrary, something a few, and up to 128 depending on files.

I'm pretty sure this is a common need but I couldn't yet find the exact python code that does this, or I couldn't get...

Edit:

More precision:

Each timestamp row will be repeated by the number of devices, so that the file will have much more lines (multiplied by the number of devices) but only a few rows (timestamp,device,value) The final desired result has been updated :-)

Edit:

I would like to be able to use the script using argument1 for infile and argument2 for outfile :-)

解决方案

EDIT : Expect quotes (") around No., port code to python 2 with indication for python 3 and remove debugging print

EDIT2 : fixed stupid bug not incrementing indexes

EDIT3 : new version allowing the input file to contain multiple headers each followed by data

I am not sure it is worth to use csv module, because you separator is fixed, you have no quotes, and no field containing newline or separator character : line.strip.split(',') is enough.

Here is what I tried :

  • skip lines until one begins with No. and read fields after 2 firsts to get identifiers
  • proceed line by line
    • take date on second field
    • print on line for each field after 2 firsts using identifier

Code for python 2 (remove first line from __future__ import print_function for python 3)

from __future__ import print_function

class transposer(object):
    def _skip_preamble(self):
        for line in self.fin:
            if line.strip().startswith('"No."'):
                self.keys = line.strip().split(',')[2:]
                return
        raise Exception('Initial line not found')
    def _do_loop(self):
        for line in self.fin:
            elts = line.strip().split(',')
            dat = elts[1]
            ix = 0
            for val in elts[2:]:
                print(dat, self.keys[ix], val, sep=',', file = self.out)
                ix += 1

    def transpose(self, ficin, ficout):
        with open(ficin) as fin:
            with open(ficout, 'w') as fout:
                self.do_transpose(fin, fout)
    def do_transpose(self, fin, fout):
        self.fin = fin
        self.out = fout
        self._skip_preamble()
        self._do_loop()

Usage :

t = transposer()
t.transpose('in', 'out')

If input file contains multiple headers, it is necessary to reset the list of keys on each header :

from __future__ import print_function

class transposer(object):
    def _do_loop(self):
        line_number = 0
        for line in self.fin:
            line_number += 1
            line = line.strip();
            if line.strip().startswith('"No."'):
                self.keys = line.strip().split(',')[2:]
            elif line.startswith('"'):
                elts = line.strip().split(',')
                if len(elts) == (len(self.keys) + 2):
                    dat = elts[1]
                    ix = 0
                    for val in elts[2:]:
                        print(dat, self.keys[ix], val, sep=',', file = self.out)
                        ix += 1
                else:
                    raise Exception("Syntax error line %d expected %d values found %d"
                                    % (line_number, len(self.keys), len(elts) - 2))

    def transpose(self, ficin, ficout):
        with open(ficin) as fin:
            with open(ficout, 'w') as fout:
                self.do_transpose(fin, fout)
    def do_transpose(self, fin, fout):
        self.fin = fin
        self.out = fout
        self.keys = []
        self._do_loop()

这篇关于Python - CSV时间导向将大量列转换为行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆