Python - CSV时间导向将大量列转换为行 [英] Python - CSV time oriented Transposing large number of columns to rows

查看：203 发布时间：2017/2/24 18:47:59 python csv transpose

本文介绍了Python - CSV时间导向将大量列转换为行的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有很多csv文件，它们是列为导向的，我需要预处理来最终索引它们。

这是时间导向的数据，每个设备（最多128列）的列数非常大：

  LDEV_XXXXXX.csv 
编号：XXXXX（VSP）
 From：2014/06/04 05:58 
 To：2014/06/05 05:58 
采样率：1 
 
 No。，time，00：30：00X（X2497-1），00：30：01X（X2498-1），00：30：02X b242，2014/06/04 10:00，0,0,0 
243，2014/06/04 10:01，0,0,0 
 244，2014/06/04 10:02，9,0,0 
245，2014/06/04 10:03，0,0,0 
246 ，2014/06/04 10:04，0,0,0 
247，2014/06/04 10:05，0,0,0

我的目标是将数据转换（如果它是正确的）数据到行，这样我就能够操纵数据更有效率，例如：

 time，device，value 
2014/06/04 10:00 00：30：00X（X2497-1），0 
2014/06/04 10:00，00：30：01X（X2498-1），0 
 2014/06/04 10:00，00：30：02X（X2499-1），0 
2014/06/04 10:01，00：30：00X ，0 
2014/06/04 10:01，00：30：01X（X2498-1），0 
2014/06/04 10:01，00： 30：02X（X2499-1），0 
2014/06/04 10:02，00：30：00X（X2497-1），9 
2014/06/04 10:02，00：30：01X（X2498-1），0 
2014/06/04 10:02，00：30：02X（X2499-1），0

等等...

我已经让原始数据（这是使用，作为分隔符），你会注意到，我需要删除6行的没有列，没有兴趣，但这不是主要的目标和困难）

我有一个python开始代码来转置csv数据，但它不完全是我需要的...

  import csv 
 import sys 
 infile = sys.argv [1] 
 outfile = sys.argv [2] 
 
 with open（infile）as f：
 reader = csv.reader（f）
 cols = [] 
读取器中的行：
 cols.append b：
 $ b，其中open（outfile，'wb'）as f：
 writer = csv.writer（f）
 for i in range（len（max（cols，key = len） ））：
 writer.writerow（[c [i] if i

请注意，列数是任意的，有几个，最多128个取决于文件。

m很肯定这是一个常见的需求，但我还是找不到确切的python代码，或者我不能得到...

编辑： / p>

更精确：

每个时间戳记行将重复设备数量，更多的行（乘以设备数量），但只有几行（时间戳，设备，值）
最终所需的结果已更新： - ）

我想能够使用脚本使用argument1作为infile和argument2作为outfile： - ）

解决方案

编辑：期望报价（） >，端口代码到python 2，指示python 3并删除调试 print

 
 
  EDIT2：fixed stupid bug not递增索引
 
 
  EDIT3：新版本允许输入文件包含多个标头，每个标头后跟数据
 
 
 不确定是否值得使用 csv 模块，因为您的分隔符是固定的，您没有引号，并且没有包含换行符或分隔符的字段： line .strip.split（'，'）就够了。
 
 
 这是我试过的：
 
 
  
 跳过行，直到以第一个开头为止，并读取两个第一个之后的字段以获取标识符
 
 逐行前进
  
 在第二栏输入日期
 
 使用标识符
 
 
 > 
 
 
 
 代码为python 2（删除第一行 from __future__ import print_function  for python 3）
  from __future__ import print_function 
 
类转换器（对象）：
 def _skip_preamble ：
 for self.fin：
如果line.strip（）。startswith（'No。'）：
 self.keys = line.strip（）。split ，'）[2：] 
 return 
 raise异常（'未找到初始行）
 def _do_loop（self）：
在self.fin中的行：
 elts [2：]：
打印出来的代码如下：b elts = line.strip（）。split（'，'）
 dat = elts [1] 
 ix = 0 
 （dat，self.keys [ix]，val，sep ='，'，file = self.out）
 ix + = 1 
 
 def转置（self，ficin，ficout）： 
 with open（ficin）as fin：
 with open（ficout，'w'）as fout：
 self.do_transpose（fin，fout）
 def do_transpose ，fout）：
 self.fin = fin 
 self.out = fout 
 self._skip_preamble（）
 self._do_loop（）
  
 
 
 用法：
  t = transposer 
 t.transpose（'in'，'out'）
  
如果输入文件包含多个标题，必须重置每个标题上的键列表：
 从__future__ import print_function 
 
类转换器（对象）：
 def _do_loop（self）：
 line_number = 0 
在self.fin中的行：
 line_number + = 1 
 line = 。跳闸（）; 
如果line.strip（）。startswith（'No。'）：
 self.keys = line.strip（）。split（'，'）[2：] 
 elif line.startswith（''）：
 elts = line.strip（）。split（'，'）
 if len（elts）==（len（self.keys）+ 2）：
 dat = elts [1] 
 ix = 0 
 val in elts [2：]：
 print（dat，self.keys [ix]，val，sep ='， '，file = self.out）
 ix + = 1 
 else：
 raise Exception（语法错误行％d找到％d个预期％d
％ ，len（self.keys），len（elts） -  2））
 
 def transpose（self，ficin，ficout）：
 with open（ficin）as fin：
开放（ficout，'w'）为fout：
 self.do_transpose（fin，fout）
 def do_transpose（self，fin，fout）：
 self.fin = fin 
 self.out = fout 
 self.keys = [] 
 self._do_loop（）
  
 
I have many csv files which are "column" oriented and that I need to pre-process to finally index them.

This is time oriented data, with a very large number of columns for each "device" (up to 128 columns) like:
LDEV_XXXXXX.csv             
Serial number : XXXXX(VSP)              
From : 2014/06/04 05:58             
To   : 2014/06/05 05:58             
sampling rate : 1               

"No.","time","00:30:00X(X2497-1)","00:30:01X(X2498-1)","00:30:02X(X2499-1)"
"242","2014/06/04 10:00",0,0,0
"243","2014/06/04 10:01",0,0,0
"244","2014/06/04 10:02",9,0,0
"245","2014/06/04 10:03",0,0,0
"246","2014/06/04 10:04",0,0,0
"247","2014/06/04 10:05",0,0,0
My goal is to transpose (if it the term is the right one) data into rows, such that i will be able to manipulate the data much more efficiently, such as:
"time",device,value
"2014/06/04 10:00","00:30:00X(X2497-1)",0
"2014/06/04 10:00","00:30:01X(X2498-1)",0
"2014/06/04 10:00","00:30:02X(X2499-1)",0
"2014/06/04 10:01","00:30:00X(X2497-1)",0
"2014/06/04 10:01","00:30:01X(X2498-1)",0
"2014/06/04 10:01","00:30:02X(X2499-1)",0
"2014/06/04 10:02","00:30:00X(X2497-1)",9
"2014/06/04 10:02","00:30:01X(X2498-1)",0
"2014/06/04 10:02","00:30:02X(X2499-1)",0
And so on...

Note: I have let the raw data (which is uses "," as a separator), you would note that I need to delete the 6 first lines the "No" column which has no interest, but this is not the main goal and difficulty)

I have a python starting code to transpose csv data, but it doesn't exactly what i need...
import csv
import sys
infile = sys.argv[1]
outfile = sys.argv[2]

with open(infile) as f:
    reader = csv.reader(f)
    cols = []
    for row in reader:
        cols.append(row)

with open(outfile, 'wb') as f:
    writer = csv.writer(f)
    for i in range(len(max(cols, key=len))):
        writer.writerow([(c[i] if i<len(c) else '') for c in cols])
Note the number of columns are arbitrary, something a few, and up to 128 depending on files.

I'm pretty sure this is a common need but I couldn't yet find the exact python code that does this, or I couldn't get...

Edit:

More precision:

Each timestamp row will be repeated by the number of devices, so that the file will have much more lines (multiplied by the number of devices) but only a few rows (timestamp,device,value)
The final desired result has been updated :-)

Edit:

I would like to be able to use the script using argument1 for infile and argument2 for outfile :-)
 解决方案 
EDIT : Expect quotes (") around No., port code to python 2 with indication for python 3 and remove debugging print

EDIT2 : fixed stupid bug not incrementing indexes

EDIT3 : new version allowing the input file to contain multiple headers each followed by data

I am not sure it is worth to use csv module, because you separator is fixed, you have no quotes, and no field containing newline or separator character : line.strip.split(',') is enough.

Here is what I tried :


skip lines until one begins with No. and read fields after 2 firsts to get identifiers
proceed line by line

take date on second field
print on line for each field after 2 firsts using identifier



Code for python 2 (remove first line from __future__ import print_function for python 3)
from __future__ import print_function

class transposer(object):
    def _skip_preamble(self):
        for line in self.fin:
            if line.strip().startswith('"No."'):
                self.keys = line.strip().split(',')[2:]
                return
        raise Exception('Initial line not found')
    def _do_loop(self):
        for line in self.fin:
            elts = line.strip().split(',')
            dat = elts[1]
            ix = 0
            for val in elts[2:]:
                print(dat, self.keys[ix], val, sep=',', file = self.out)
                ix += 1

    def transpose(self, ficin, ficout):
        with open(ficin) as fin:
            with open(ficout, 'w') as fout:
                self.do_transpose(fin, fout)
    def do_transpose(self, fin, fout):
        self.fin = fin
        self.out = fout
        self._skip_preamble()
        self._do_loop()
Usage :
t = transposer()
t.transpose('in', 'out')
If input file contains multiple headers, it is necessary to reset the list of keys on each header :
from __future__ import print_function

class transposer(object):
    def _do_loop(self):
        line_number = 0
        for line in self.fin:
            line_number += 1
            line = line.strip();
            if line.strip().startswith('"No."'):
                self.keys = line.strip().split(',')[2:]
            elif line.startswith('"'):
                elts = line.strip().split(',')
                if len(elts) == (len(self.keys) + 2):
                    dat = elts[1]
                    ix = 0
                    for val in elts[2:]:
                        print(dat, self.keys[ix], val, sep=',', file = self.out)
                        ix += 1
                else:
                    raise Exception("Syntax error line %d expected %d values found %d"
                                    % (line_number, len(self.keys), len(elts) - 2))

    def transpose(self, ficin, ficout):
        with open(ficin) as fin:
            with open(ficout, 'w') as fout:
                self.do_transpose(fin, fout)
    def do_transpose(self, fin, fout):
        self.fin = fin
        self.out = fout
        self.keys = []
        self._do_loop()


                        
这篇关于Python  -  CSV时间导向将大量列转换为行的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！


                    
                        查看全文

Python - CSV时间导向将大量列转换为行 [英] Python - CSV time oriented Transposing large number of columns to rows

问题描述

相关文章

Python最新文章

热门教程

热门工具

登录关闭

Python - CSV时间导向将大量列转换为行 [英] Python - CSV time oriented Transposing large number of columns to rows

问题描述

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭