Convert table of row-based records to column-based records (CSV)


Question

Given a table in a text file that contains row-based data, what method would you recommend for converting it to a column-based table (e.g. CSV)?

Input_data.txt:

Source =         X:\folder_abc
Destination =    Y:\Abc_folder
Total bytes =    208,731,021
MB per min =     256.5
Source =         X:\folder_def
Destination =    Y:\xyz_folder
Total bytes =    123,134,545
MB per min =     326
Source =         X:\folder_foo
Destination =    Y:\Baz_folder
Total bytes =    24,344
MB per min =     532
...etc.

Desired Result (only formatted with tabs here for legibility):

Source,             Destination,        Total bytes,    MB per min
"X:\folder_abc",    "Y:\Abc_folder",    "208,731,021",  "256.5"
"X:\folder_def",    "Y:\xyz_folder",    "123,134,545",  "326"
"X:\folder_foo",    "Y:\Baz_folder",    "24,344",       "532"
...

Tools at my disposal are Windows batch files and Powershell. Prefer a .bat solution because I'm more comfortable there, but if that is too circuitous or opaque we can punt it.

UPDATE, as per comments

I've figured out how to turn the records into name & value variables, but don't know how to manipulate them from that point to transpose into columns.

for /f "tokens=1,2 delims==" %%a in ('findstr /c:"=" "%logfile%"') do (
  @echo %%a %%b
  )

It just occurred to me that I can write one column per text file and then append them all in Excel. Crude but workable, perhaps?

for /f "tokens=1,2 delims==" %%a in ('findstr /c:"=" "%logfile%"') do (
  @echo %%b >>  %%a.csv
  )

UPDATE-2: quote all values in desired result, as dbenham pointed out not doing so will cause problems.
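
To illustrate why the quoting matters, here is a minimal sketch in plain JavaScript (assumed to run under Node.js). The helper name csvQuote is hypothetical and not part of the original post, and doubling embedded quotes is an extra assumption based on the usual CSV convention:

```javascript
// Hypothetical helper: wrap a value in double quotes and double any
// embedded double quote, as CSV readers commonly expect.
function csvQuote(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

var fields = ['X:\\folder_abc', 'Y:\\Abc_folder', '208,731,021', '256.5'];

// Unquoted: the commas inside 208,731,021 look like field separators,
// so a naive reader sees 6 pieces instead of 4.
var unquoted = fields.join(',');
console.log(unquoted.split(',').length); // 6

// Quoted: each value stays in one field.
var quotedFields = [];
for (var i = 0; i < fields.length; i++) {
  quotedFields.push(csvQuote(fields[i]));
}
var quoted = quotedFields.join(',');
console.log(quoted);
```

With the values quoted, a CSV-aware reader such as Excel treats "208,731,021" as a single field.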

Solution

This is similar to Aacini's original answer, except I never store more than one line (one output row) in memory. Holding a large input file in memory would consume a lot of it, which would slow down the script. Storing only one row avoids that problem.

Another major difference is I let the code discover the column name that starts a new row, rather than hard coding the value.

I also use a different method to strip off the trailing space(s) from each column name in the header. I assume that the column names do not contain any of the following characters: :, ., \, or /. I rely on the fact that file names cannot end with a space, so the ~n modifier normalizes the "name" to remove any trailing space(s).

I also use "tokens=*" when stripping leading spaces from the values, just in case a value contains spaces.

@echo OFF
setlocal enableDelayedExpansion

set "input=test.txt"
set "output=result.csv"

set "row="
set "header="
set "begin="
set "first="
(
  for /f "usebackq tokens=1* delims==" %%A in ("%input%") do for /f "tokens=*" %%C in ("%%B") do (
    if "!begin!" equ "%%A" (
      if not defined first (
        set first=1
        echo !header:~1!
      )
      echo !row:~1!
      set "row="
    )
    set "row=!row!,"%%C""
    if not defined first for /f "delims=" %%H in ("%%A") do (
      if not defined begin set "begin=%%A"
      set "header=!header!,"%%~nH""
    )
  )
  echo !row:~1!
)>"%output%"


EDIT 2014-12-05

The same algorithm could be implemented more robustly in VBS or JScript, and it would be faster.
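
As a rough illustration of that idea, here is a sketch of the same one-row-at-a-time algorithm in plain JavaScript, assumed to run under Node.js rather than Windows Script Host. The names createConverter, feed, end and emit are hypothetical. Unlike the batch version, it also trims trailing whitespace from values and doubles embedded quotes, which are assumptions of this sketch:

```javascript
// Sketch only: lines are fed in one at a time, and only the header and
// the current row are kept in memory. All names here are hypothetical.
function createConverter(emit) {
  var begin = null;       // name of the field that starts each record
  var header = [];        // column names, collected from the first record
  var headerDone = false;
  var row = [];           // values of the record currently being built

  function quote(s) { return '"' + s.replace(/"/g, '""') + '"'; }

  function flush() {
    if (!row.length) return;              // nothing buffered
    if (!headerDone) { emit(header.join(',')); headerDone = true; }
    emit(row.join(','));
    row = [];
  }

  return {
    feed: function (line) {
      var eq = line.indexOf('=');
      if (eq < 1) return;                 // skip lines without name = value
      var name = line.substring(0, eq).replace(/\s+$/, '');
      var value = line.substring(eq + 1).replace(/^\s+|\s+$/g, '');
      if (!name) return;
      if (name === begin) flush();        // first field seen again: new record
      if (begin === null) begin = name;   // discover the record-start field
      if (!headerDone) header.push(quote(name));
      row.push(quote(value));
    },
    end: flush                            // write the final buffered row
  };
}

// Usage with two records from the question's sample data.
var out = [];
var conv = createConverter(function (line) { out.push(line); });
var input = [
  'Source =         X:\\folder_abc',
  'Destination =    Y:\\Abc_folder',
  'Total bytes =    208,731,021',
  'MB per min =     256.5',
  'Source =         X:\\folder_def',
  'Destination =    Y:\\xyz_folder',
  'Total bytes =    123,134,545',
  'MB per min =     326'
];
for (var i = 0; i < input.length; i++) conv.feed(input[i]);
conv.end();
console.log(out.join('\n'));
```

The emit callback keeps the sketch independent of where the output goes; in a real script it could write to a file stream instead of pushing to an array.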

Or you could get a bit of a jump start and use JREPL.BAT - a hybrid JScript/batch utility that performs regular expression search and replace on text. It allows user defined JScript code snippets to be incorporated into the process, but is executed within a batch context.

The entire command could be put on one lonnnnnggggggg line, but that would be really ugly. Instead I use batch line continuation to define a variable with most of the user defined JScript code and pass that in using /JBEG. It is impossible to pass a double quote literal to CSCRIPT, so I use '\x22' instead.

The script expects the source file to be passed as the first and only argument, and the output is written to the same location using the same base name with a .csv extension.

@echo off
setlocal
set beg=^
var begin, header='.', line='', q='\x22';^
function writeLn(){^
  if (header) output.WriteLine(header.substr(2));^
  header='';^
  if (line) output.WriteLine(line.substr(1));^
  line='';^
}^
function repl($1,$2){^
  if ($1==begin) writeLn();^
  if (!begin) begin=$1;^
  if (header) header+=','+q+$1+q;^
  line+=','+q+$2+q;^
  return false;^
}
call jrepl "^(.+?) *= *(.*)" "repl($1,$2);" /jmatch /jbeg "%beg%" /jend "writeLn();" /f %1 /o "%~dpn1.csv"
exit /b

The version below uses the exact same JScript code, but I use the /JLIB option to load it directly from the script file instead of from a variable. The script uses the standard hybrid JScript/batch technique. Loading the code from a file allows me to use a double quote literal in it.

@if (@X)==(@Y) @end /* harmless hybrid line that begins a JScript comment

::**** Batch code ********
@echo off
call jrepl "^(.+?) *= *(.*)" "repl($1,$2);" /jmatch /jlib "%~f0" /jend "writeLn();" /f %1 /o "%~dpn1.csv"
exit /b

****** Jscript code ******/

var begin, header='.', line='', q='"';

function writeLn(){
  if (header) output.WriteLine(header.substr(2));
  header='';
  if (line) output.WriteLine(line.substr(1));
  line='';
}

function repl($1,$2){
  if ($1==begin) writeLn();
  if (!begin) begin=$1;
  if (header) header+=','+q+$1+q;
  line+=','+q+$2+q;
  return false;
}
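
Both jrepl calls rely on the same regular expression to split each line into a name and a value. The following minimal sketch in plain JavaScript (assumed to run under Node.js; parseLine is a hypothetical helper, not part of JREPL) shows what the two capture groups receive:

```javascript
// Same pattern as in the jrepl calls. The lazy (.+?) stops at the first
// " *=", so trailing spaces after the name are excluded, and " *" after
// the equals sign swallows the leading spaces of the value.
function parseLine(line) {
  var m = /^(.+?) *= *(.*)/.exec(line);
  return m ? { name: m[1], value: m[2] } : null;
}

var r = parseLine('Total bytes =    208,731,021');
console.log(r.name + '|' + r.value);   // Total bytes|208,731,021

// A line without an equals sign does not match and is ignored.
console.log(parseLine('...etc.'));     // null
```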
