Windows batch script to parse CSV file and output a text file

Question

I've seen a response on another page (Help in writing a batch script to parse CSV file and output a text file) - brilliant code BTW:

@ECHO OFF
IF "%~1"=="" GOTO :EOF
SET "filename=%~1"
SET fcount=0
SET linenum=0
FOR /F "usebackq tokens=1-10 delims=," %%a IN ("%filename%") DO ^
CALL :process "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i" "%%j"
GOTO :EOF

:trim
SET "tmp=%~1"
:trimlead
IF NOT "%tmp:~0,1%"==" " GOTO :EOF
SET "tmp=%tmp:~1%"
GOTO trimlead

:process
SET /A linenum+=1
IF "%linenum%"=="1" GOTO picknames

SET ind=0
:display
IF "%fcount%"=="%ind%" (ECHO.&GOTO :EOF)
SET /A ind+=1
CALL :trim %1
SETLOCAL ENABLEDELAYEDEXPANSION
ECHO !f%ind%!!tmp!
ENDLOCAL
SHIFT
GOTO display

:picknames
IF %1=="" GOTO :EOF
CALL :trim %1
SET /a fcount+=1
SET "f%fcount%=%tmp%"
SHIFT
GOTO picknames

It works brilliantly for an example csv file I made in the format:

Header,Name,Place
one,two,three
four,five,six

However, the actual file I want to change comprises 64 fields, so I altered tokens=1-10 to tokens=1-64 and increased the %%a etc. right up to 64 variables (the last being called %%BL, for example). Now, however, when I run the batch on my 'big' CSV file (with the 64 tokens), nothing happens. No errors (good) but no output! (bad). If anyone can help, that would be fantastic... I'm soooo close to getting the whole app working if I can just nail this last bit! Or if anyone has some example code that will do something similar for an indefinite number of tokens... Ultimately I want to make a string which will be something like:

field7,field12,field15,field18

Recommended Answer

Important update - I don't think Windows batch is a good option for your needs because a single FOR /F cannot parse more than 31 tokens. See the bottom of the Addendum below for an explanation.

However, it is possible to do what you want with batch. This ugly code will give you access to all 64 tokens.

for /f "usebackq tokens=1-29* delims=," %%A in ("%filename%") do (
  for /f "tokens=1-26* delims=," %%a in ("%%^") do (
    for /f "tokens=1-9 delims=," %%1 in ("%%{") do (
      rem Tokens 1-26 are in variables %%A - %%Z
      rem Token  27 is in %%[
      rem Token  28 is in %%\
      rem Token  29 is in %%]
      rem Tokens 30-55 are in %%a - %%z
      rem Tokens 56-64 are in %%1 - %%9
    )
  )
)
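The variable bookkeeping in the three nested loops above is easy to get wrong, so here is a small Python model that checks it. It is a sketch, not batch code. It rests on several assumptions: the line is plain comma-delimited text with no quoted fields, a run of delimiters counts as one separator, variable names advance by one character code per requested token starting from the first name, and any token number above 31 produces nothing (the silent failure described in this answer). The helper `for_f` and the sample line `t1,t2,...,t64` are hypothetical.

```python
import re


def for_f(line, tokens, first_var, star=False, delims=","):
    """Rough model of one FOR /F pass over a single line (hypothetical helper)."""
    tokens = sorted(tokens)
    if tokens[-1] > 31:
        return {}  # assumed: a real FOR /F assigns nothing and reports no error
    # Tokens are runs of non-delimiter characters, so ",," never yields an empty token.
    spans = [m.span() for m in re.finditer("[^" + re.escape(delims) + "]+", line)]
    result, code = {}, ord(first_var)
    for t in tokens:
        if t <= len(spans):
            start, end = spans[t - 1]
            result[chr(code)] = line[start:end]
        code += 1  # the next requested token goes to the next character code
    if star and tokens[-1] < len(spans):
        # '*' receives everything from the start of the following token.
        result[chr(code)] = line[spans[tokens[-1]][0]:]
    return result


# Hypothetical 64-field line: t1,t2,...,t64
line = ",".join(f"t{n}" for n in range(1, 65))

outer = for_f(line, range(1, 30), "A", star=True)         # tokens=1-29* %%A
middle = for_f(outer["^"], range(1, 27), "a", star=True)  # tokens=1-26* %%a
inner = for_f(middle["{"], range(1, 10), "1")             # tokens=1-9   %%1

print(outer["A"], outer["]"], middle["a"], middle["z"], inner["1"], inner["9"])
# t1 t29 t30 t55 t56 t64
```

The remainder variables `%%^` and `%%{` in the model hold exactly the text that the next loop re-parses, which is why each inner loop can restart its numbering at 1.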

The addendum provides important info on how the above works.

If you only need a few of the tokens spread out amongst the 64 on the line, then the solution is marginally easier in that you might be able to avoid using crazy characters as FOR variables. But there is still careful bookkeeping to be done.

For example, the following will give you access to tokens 5, 27, 46 and 64:

for /f "usebackq tokens=5,27,30* delims=," %%A in ("%filename%") do (
  for /f "tokens=16,30* delims=," %%E in ("%%D") do (
    for /f "tokens=4 delims=," %%H in ("%%G") do (
      rem Token  5 is in %%A
      rem Token 27 is in %%B
      rem Token 46 is in %%E
      rem Token 64 is in %%H
    )
  )
)
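A Python sketch can confirm which variable ends up holding tokens 5, 27, 46 and 64. The hypothetical helper `for_f` models FOR /F under simplifying assumptions: plain comma-delimited text with no quoted fields, runs of delimiters collapse into one, names advance by character code, and token numbers above 31 yield nothing. Tokens 30 and 60 (`%%C` and `%%F`) are parsed only so that each `*` remainder starts at the right place.

```python
import re


def for_f(line, tokens, first_var, star=False, delims=","):
    """Rough model of one FOR /F pass over a single line (hypothetical helper)."""
    tokens = sorted(tokens)
    if tokens[-1] > 31:
        return {}  # assumed: a real FOR /F assigns nothing here
    spans = [m.span() for m in re.finditer("[^" + re.escape(delims) + "]+", line)]
    result, code = {}, ord(first_var)
    for t in tokens:
        if t <= len(spans):
            start, end = spans[t - 1]
            result[chr(code)] = line[start:end]
        code += 1
    if star and tokens[-1] < len(spans):
        result[chr(code)] = line[spans[tokens[-1]][0]:]  # '*' gets the rest
    return result


line = ",".join(f"t{n}" for n in range(1, 65))  # hypothetical t1,...,t64

outer = for_f(line, [5, 27, 30], "A", star=True)      # %%A %%B %%C, rest in %%D
middle = for_f(outer["D"], [16, 30], "E", star=True)  # %%E %%F, rest in %%G
inner = for_f(middle["G"], [4], "H")                  # %%H

print(outer["A"], outer["B"], middle["E"], inner["H"])  # t5 t27 t46 t64
```

The offsets are plain arithmetic: the remainder in `%%D` starts at token 31, so its token 16 is 30 + 16 = 46; `%%G` starts at token 61, so its token 4 is 60 + 4 = 64.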

April 2016 Update - Based on investigative work by DosTips users Aacini, penpen, and aGerman, I have developed a relatively easy method to simultaneously access thousands of tokens using FOR /F. The work is part of this DosTips thread. The actual code can be found in these 3 posts:

  • Work with a fixed number of columns
  • Work with varying numbers of columns
  • Dynamically choose which tokens to expand within the DO clause

Original Answer

FOR variables are limited to a single character, so your %%BL strategy can't work. The variables are case sensitive. According to Microsoft, you are limited to capturing 26 tokens within one FOR statement, but it is possible to get more if you use more than just letters. It's a pain, because you need an ASCII table to figure out which characters go where. However, FOR does not allow just any character, and the maximum number of tokens that a single FOR /F can assign is 31, plus one more for the remainder captured by *. Any attempt to parse and assign more than 31 will quietly fail, as you have discovered.
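The "ASCII table" lookup amounts to adding an offset to a character code. The following Python sketch assumes that FOR /F assigns variable names consecutively by character code. That assumption holds for plain ASCII, but the addendum explains that extended characters 128-254 do not follow their byte values. Some of the resulting characters also still need escaping or cannot be defined at all (see the addendum's table). `for_var` is a hypothetical helper.

```python
def for_var(first_var, n):
    """Name of the FOR variable that receives the n-th requested token (1-based),
    assuming names run consecutively by character code from first_var."""
    return chr(ord(first_var) + n - 1)


# Starting at %%A, tokens 26-31 land in Z, [, \, ], ^ and _.
print([for_var("A", n) for n in (26, 27, 28, 29, 30, 31)])
# ['Z', '[', '\\', ']', '^', '_']
```

For example, `for_var("<", 2)` gives `=`, which is how `tokens=1-3` can define `%%=` implicitly even though `%%=` cannot be named directly.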

Thankfully, I don't think you need that many tokens. You simply specify which tokens you want with the TOKENS option.

for /f "usebackq tokens=7,12,15,18 delims=," %%A in ("%filename%") do echo %%A,%%B,%%C,%%D

will give you your 7th, 12th, 15th and 18th tokens.
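For comparison, the single-loop selection can be modeled in Python as shown below. The hypothetical `for_f` helper assumes plain comma-delimited text with no quoted fields. It also mirrors one FOR /F behavior worth knowing for CSV input: consecutive delimiters count as one, so an empty field is not a token and every later field shifts down one position. Token numbers above 31 return nothing, as the answer describes.

```python
import re


def for_f(line, tokens, first_var, star=False, delims=","):
    """Rough model of one FOR /F pass over a single line (hypothetical helper)."""
    tokens = sorted(tokens)
    if tokens[-1] > 31:
        return {}  # assumed: a real FOR /F assigns nothing here
    # Runs of non-delimiters are tokens: ",," does not produce an empty token.
    spans = [m.span() for m in re.finditer("[^" + re.escape(delims) + "]+", line)]
    result, code = {}, ord(first_var)
    for t in tokens:
        if t <= len(spans):
            start, end = spans[t - 1]
            result[chr(code)] = line[start:end]
        code += 1
    if star and tokens[-1] < len(spans):
        result[chr(code)] = line[spans[tokens[-1]][0]:]
    return result


# Hypothetical row whose fields are named field1..field20.
row = ",".join(f"field{n}" for n in range(1, 21))
picked = for_f(row, [7, 12, 15, 18], "A")        # tokens=7,12,15,18 %%A
print(",".join(picked[v] for v in "ABCD"))       # field7,field12,field15,field18

# Caveat: field 2 is empty, so the 8th CSV field is reported as token 7.
gappy = "f1,,f3,f4,f5,f6,f7,f8"
print(for_f(gappy, [7], "A"))                    # {'A': 'f8'}
```

If the real file can contain empty fields, the token numbers in the batch version would need the same adjustment.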

Addendum

April 2016 Update

A couple of weeks ago I learned that the following rules (written 6 years ago) are code page dependent. The data below has been verified for code pages 437 and 850. More importantly, the FOR variable sequence of extended ASCII characters 128-254 does not match the byte code value, and varies tremendously by code page. It turns out the FOR /F variable mapping is based on the underlying UTF-(16?) code point, so extended ASCII characters are of limited use with FOR /F. See the thread at http://www.dostips.com/forum/viewtopic.php?f=3&t=7703 for more information.

I performed some tests, and can report the following (updated in response to jeb's comment):

Most characters can be used as a FOR variable, including extended ASCII 128-254. However, some characters cannot be used to define a variable in the first part of a FOR statement, yet can still be accessed in the DO clause. A few cannot be used at all. Others have no restrictions but require special syntax.

The following is a summary of characters that have restrictions or require special syntax. Note that text within angle brackets like <space> represents a single character.

Dec  Hex   Character   Define     Access
  0  0x00  <nul>       No         No
  9  0x09  <tab>       No         %%^<tab>  or  "%%<tab>"
 10  0x0A  <LF>        No         %%^<CR><LF><CR><LF>  or  %%^<LF><LF>
 11  0x0B  <VT>        No         %%<VT>
 12  0x0C  <FF>        No         %%<FF>
 13  0x0D  <CR>        No         No
 26  0x1A  <SUB>       %%%VAR%    %%%VAR% (%VAR% must be defined as <SUB>)
 32  0x20  <space>     No         %%^<space>  or  "%%<space>"
 34  0x22  "           %%^"       %%"  or  %%^"
 36  0x24  $           %%$        %%$ works, but %%~$ does not
 37  0x25  %           %%%%       %%~%%
 38  0x26  &           %%^&       %%^&  or  "%%&"
 41  0x29  )           %%^)       %%^)  or  "%%)"
 44  0x2C  ,           No         %%^,  or  "%%,"
 59  0x3B  ;           No         %%^;  or  "%%;"
 60  0x3C  <           %%^<       %%^<  or  "%%<"
 61  0x3D  =           No         %%^=  or  "%%="
 62  0x3E  >           %%^>       %%^>  or  "%%>"
 94  0x5E  ^           %%^^       %%^^  or  "%%^"
124  0x7C  |           %%^|       %%^|  or  "%%|"
126  0x7E  ~           %%~        %%~~ (%%~ may crash CMD.EXE if at end of line)
255  0xFF  <NB space>  No         No

Special characters like ^ < > | & must be either escaped or quoted. For example, the following works:

for /f %%^< in ("OK") do echo "%%<" %%^<

Some characters cannot be used to define a FOR variable. For example, the following gives a syntax error:

for /f %%^= in ("No can do") do echo anything

But %%= can be implicitly defined by using the TOKENS option, and the value accessed in the DO clause like so:

for /f "tokens=1-3" %%^< in ("A B C") do echo %%^< %%^= %%^>

The % is odd - you can define a FOR variable using %%%%, but the value cannot be accessed unless you use the ~ modifier. This means enclosing quotes cannot be preserved.

for /f "usebackq tokens=1,2" %%%% in ('"A"') do echo %%%% %%~%%

The above results in %% A

The ~ is a potentially dangerous FOR variable. If you attempt to access the variable using %%~ at the end of a line, you can get unpredictable results, and may even crash CMD.EXE! The only reliable way to access it without restrictions is to use %%~~, which of course strips any enclosing quotes.

for /f %%~ in ("A") do echo This can crash because it's at the end of the line: %%~

for /f %%~ in ("A") do echo But this (%%~) should be safe

for /f %%~ in ("A") do echo This works even at end of line: %%~~

The <SUB> (0x1A) character is special because <SUB> literals embedded within batch scripts are read as linefeeds (<LF>). In order to use <SUB> as a FOR variable, the value must be somehow stored within an environment variable, and then %%%VAR% will work for both definition and access.

As already stated, a single FOR /F can parse and assign a maximum of 31 tokens. For example:

@echo off
setlocal enableDelayedExpansion
set "str="
for /l %%n in (1 1 35) do set "str=!str! %%n"
for /f "tokens=1-31" %%A in ("!str!") do echo A=%%A _=%%_

The above results in A=1 _=31. Note: tokens 2-30 work just fine; I just wanted to keep the example short.

Any attempt to parse and assign more than 31 tokens will silently fail without setting ERRORLEVEL.

@echo off
setlocal enableDelayedExpansion
set "str="
for /l %%n in (1 1 35) do set "str=!str! %%n"
for /f "tokens=1-32" %%A in ("!str!") do echo this example fails entirely

You can parse and assign up to 31 tokens and assign the remainder to another token as follows:

@echo off
setlocal enableDelayedExpansion
set "str="
for /l %%n in (1 1 35) do set "str=!str! %%n"
for /f "tokens=1-31*" %%@ in ("!str!") do echo @=%%@  ^^=%%^^  _=%%_

The above results in @=1 ^=31 _=32 33 34 35
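The 31 + 1 arithmetic in this example can be checked with a Python model. The hypothetical `for_f` helper uses FOR /F's default delimiters (space and tab) and advances variable names by character code from `@`. It returns nothing for a token number above 31, mirroring the silent failure of `tokens=1-32`. It is a simplified model, not a reimplementation of CMD.EXE.

```python
import re


def for_f(line, tokens, first_var, star=False, delims=","):
    """Rough model of one FOR /F pass over a single line (hypothetical helper)."""
    tokens = sorted(tokens)
    if tokens[-1] > 31:
        return {}  # a real FOR /F silently assigns nothing
    spans = [m.span() for m in re.finditer("[^" + re.escape(delims) + "]+", line)]
    result, code = {}, ord(first_var)
    for t in tokens:
        if t <= len(spans):
            start, end = spans[t - 1]
            result[chr(code)] = line[start:end]
        code += 1
    if star and tokens[-1] < len(spans):
        result[chr(code)] = line[spans[tokens[-1]][0]:]  # '*' keeps the rest
    return result


s = "".join(f" {n}" for n in range(1, 36))  # same string the batch loop builds
v = for_f(s, range(1, 32), "@", star=True, delims=" \t")  # tokens=1-31* %%@
print(f"@={v['@']} ^={v['^']} _={v['_']}")  # @=1 ^=31 _=32 33 34 35
print(for_f(s, range(1, 33), "A"))          # {}
```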

Now for the really bad news. As I was looking at ...

The very unfortunate output is A=1 B=31 C=%C