如何使用批处理文件忽略对文本进行排序和求和的第一和第三数据列? [英] How to ignore first and third data column on sorting and summing the text using batch file?

查看:61
本文介绍了如何使用批处理文件忽略对文本进行排序和求和的第一和第三数据列?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在下面做了这个脚本:

I have made this script below:

start /w "" "C:\Program Files (x86)\CoolUtils\Total PDF Converter\PDFConverter.exe" "%userprofile%\Desktop\teste.oxps" "%userprofile%\Desktop\teste.txt"
del /f /s /q "%userprofile%\Desktop\teste.pdf"
findstr /v /r /c:"http" /c:"Banrisul" /c:"Sac" /c:":" /c:"-" /c:"SAC" /c:"OUVIDORIA" /c:"B A N R I S U L" /c:"+" "teste.txt" > "output.txt"
for /f "usebackq tokens=* delims=" %%a in ("output.txt") do (echo(%%a)>>output2.txt
move /y output2.txt output.txt

我的output.txt是以下内容,该内容归纳为真实文件太大:

My output.txt is that below which is summarized as real file is too large:

                    01  VERO BANRICOMPRAS A PRAZO                         914709              77,56
                        VERO BANRICOMPRAS A PRAZO                         914710             322,58
                        VERO BANRICOMPRAS A VISTA                         256534             187,64
                        VERO BANRICOMPRAS A VISTA                         256539              17,62
                        VERO CARTAO CREDITO                               391534             146,22
                        VERO CARTAO CREDITO                               391535             159,26
                        VERO CARTAO CREDITO                               391536             543,16
                        VERO CARTAO CREDITO                               391537             479,37
                        VERO CARTAO CREDITO                               391538              96,00
                        REDECARD S.A CARTAO CREDITO                       415856             217,25
                        REDECARD S.A CARTAO CREDITO                       416552             152,28
                        VERO CARTAO DEBITO                                338711             144,60
                        VERO CARTAO DEBITO                                338712             378,03
                        VERO CARTAO DEBITO                                338713             134,34
                        CIELO S/A CARTAO DEBITO                           312125              57,71
                        CIELO S/A CARTAO DEBITO                           312126              23,46
                        CIELO S/A CARTAO DEBITO                           312127             232,55
                        RESGATE AUTOMATICO                                000000          17.700,00
                    02  VERO BANRICOMPRAS A PRAZO                         208538             396,99
                        VERO BANRICOMPRAS A PRAZO                         208539             217,65
                        VERO BANRICOMPRAS A VISTA                         562243             147,65
                        VERO CARTAO CREDITO                               692285             105,58
                        VERO CARTAO CREDITO                               692286              28,53
                        VERO CARTAO CREDITO                               692287             561,48
                        VERO CARTAO CREDITO                               692288             150,14
                        VERO CARTAO CREDITO                               692289             101,78
                        CIELO S/A CARTAO CREDITO                          673728             125,39
                        CIELO S/A CARTAO CREDITO                          673729              25,36
                        REDECARD S.A CARTAO CREDITO                       716879             278,83
                        REDECARD S.A CARTAO CREDITO                       720069              47,78
                        VERO CARTAO DEBITO                                637568             124,70
                        VERO CARTAO DEBITO                                637569             272,15
                        VERO CARTAO DEBITO                                637570             244,13
                        CIELO S/A CARTAO DEBITO                           609977             100,13
                    03  VERO BANRICOMPRAS A PRAZO                         501238             158,81

我陷入了如何求和每个相同文本的值,然后求和所有求和文本总和的问题.中间的数据列与首字母0102等无关紧要.仅应使用文本和最后一列.它必须像下面这样:

I'm stuck in how to sum the values of each same text, and then sum the total of all text summed. The middle data column and the initials 01, 02, etc. don't matter. Only the text and the last column should be used. It need be like that bellow:

        VERO BANRICOMPRAS A PRAZO                 20.596,26
        VERO BANRICOMPRAS A VISTA                 14.658,12
        VERO CARTAO CREDITO                        2.549,34
        (etc.)
        ---------------------------------------------------
        TOTAL OF ALL:                             37.803,72


只需在此链接中上传我的完整output.txt: https://ufile.io/uyxr1

关于我的output.txt文件:数据列之间只有空格,但是描述的字符数和数字始终会变化.但是对我来说唯一重要的是描述和最后一栏.

About my output.txt file: There are just spaces between the data columns, but the number of charachteres of the description and the numbers will always change. But what only matter for me is the description and the last column.

@Mofi,看看我的期望如何:

@Mofi, look how i expected:

------------ CARDS OF MONTH -----------
CIELO S/A CARTAO CREDITO        2.147,13
CIELO S/A CARTAO DEBITO        10.867,72
REDECARD S.A CARTAO CREDITO    11.835,11
REDECARD S.A CARTAO DEBITO         87,20
VERO BANRICOMPRAS A PRAZO      17.083,70
VERO BANRICOMPRAS A VISTA       7.829,06
VERO CARTAO CREDITO            58.052,05
VERO CARTAO DEBITO             17.215,64
----------------------------------------
TOTAL OF ALL:                 125.117,61
----------------------------------------
RESGATE AUTOMATICO            152.900,00
PAGAMENTO DE FORNECEDOR        25.900,00
DOC-E                          63.487,36

推荐答案

批处理文件用于执行一系列命令和应用程序,但不适用于文本文件编辑或数据计算.还有许多其他编程和脚本语言,比使用Windows命令处理器cmd.exe更好地完成此任务.我永远不会想到使用cmd.exe使用批处理文件来完成这样的任务.因此,即使仅使用Windows命令行支持的命令都可以做到这一点,我对这个任务也很感兴趣.是的,经过数小时的编码,我可以提供一个批处理文件,该文件对于有问题的示例适用.我尚未在整个文件上运行它.

Batch files are for executing a sequence of commands and applications, but not for text file editing or data calculations. There are many other programming and scripting languages which would be much better for this task than using Windows command processor cmd.exe. I would have never come to the idea doing such a task with a batch file using cmd.exe. For that reason I was interested in this task if it is even possible at all to do this with just commands supported on Windows command line. And yes, after hours of coding I can offer a batch file which worked for the example as posted in question. I have not run it on entire file.

@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "DataCount=0"
set "MaxNameLength=19"
set "TempFile1=%TEMP%\Output.tmp"
set "TempFile2=%TEMP%\Sorted.tmp"
set "OutputFile=Output.txt"

"%ProgramFiles(x86)%\CoolUtils\Total PDF Converter\PDFConverter.exe" "%UserProfile%\Desktop\teste.oxps" "%UserProfile%\Desktop\teste.txt"
del /F "%UserProfile%\Desktop\teste.pdf"
%SystemRoot%\System32\findstr.exe /V /R /C:"http" /C:"Banrisul" /C:"Sac" /C:":" /C:"-" /C:"SAC" /C:"OUVIDORIA" /C:"B A N R I S U L" /C:"+" "%UserProfile%\Desktop\teste.txt" >"%TempFile1%"
if not exist "%TempFile1%" goto EndBatch

rem Process each line in the output file line by line with leading
rem spaces and tabs already removed by FOR, but no other characters.
for /F usebackq^ tokens^=*^ eol^= %%I in ("%TempFile1%") do (
    set "DataRow=%%~I"
    call :ProcessLine
)
goto FormatOutput


:ProcessLine
rem Remove all double quotes within the data row.
set "DataRow=%DataRow:"=%"
rem Remove all exclamation marks within the data row.
set "DataRow=%DataRow:!=%"
rem Remove all horizontal tabs within the data row by spaces.
set "DataRow=%DataRow:  = %"
rem Replace all $ by the string #DollarSign# within the data row.
set "DataRow=%DataRow:$=#DollarSign#%"
rem Replace two spaces in series by a dollar sign.
set "DataRow=%DataRow:  =$%"
rem Replace all occurrences of dollar sign plus space by dollar sign.
set "DataRow=%DataRow:$ =$%"

rem Split up the data row using dollar sign as delimiter which can result
rem in four or just three tokenized substrings depending on existence of
rem 01, 02, ... in first data column. Of interest is the name string in
rem first or second data column and the value string in third or fourth
rem data column.
for /F tokens^=1-4^ delims^=$^ eol^= %%J in ("%DataRow%") do (
    if "%%M" == "" (
        set "DataName=%%~J"
        set "DataValue=%%L"
    ) else (
        set "DataName=%%~K"
        set "DataValue=%%M"
    )
)

rem It is safe now to replace the string #DollarSign# back to dollar sign.
set "DataName=%DataName:#DollarSign#=$%"

rem If the first 2 characters of data name are two digits and third
rem character is a space then remove those 3 characters from data name.
if not "%DataName:~2,1%" == " " goto CheckName
if "%DataName:~0,1%" == ";" goto CheckName
for /F "delims=01234567890" %%J in ("%DataName:~0,2%") do goto CheckName
set "DataName=%DataName:~3%"

rem Data names not containing one of the following four strings
rem should be at end of the list and not included in the total sum.
:CheckName
if not "%DataName:DEBITO=%"  == "%DataName%" goto ReformatValue
if not "%DataName:CREDITO=%" == "%DataName%" goto ReformatValue
if not "%DataName:A VISTA=%" == "%DataName%" goto ReformatValue
if not "%DataName:A PRAZO=%" == "%DataName%" goto ReformatValue
set "DataName=z_%DataName%"

rem Floating point arithmetic is not supported by Windows command processor,
rem just signed 32-bit integer arithmetic with values in range -2147483648
rem to 2147483647. Therefore remove all dots and commas from data value.
:ReformatValue
set "DataValue=%DataValue:.=%"
set "DataValue=%DataValue:,=%"
for /F "tokens=* delims=0" %%J in ("%DataValue%") do set "DataValue=%%J"
if not defined DataValue set "DataValue=0"

rem Add the data value as integer to total sum. There is no special
rem overflow handling implemented yet in case of total sum exceeds
rem the maximum 32-bit positive signed integer value 2147483647.
if not "%DataName:~0,2%" == "z_" set /A "#TotalSum+=DataValue"

rem Find out if a data value of current data name is already in list
rem of environment variables and in this case just add the data value
rem to the existing sum for data data name and exit the subroutine.
for /F "tokens=1* delims==" %%J in ('set $\ 2^>nul') do if /I "%%K" == "%DataName%" set /A "#%%~nJ+=DataValue" & goto :EOF

rem This is a new data name with its first data value. So set
rem the appropriate environment variables for name and value.
set /A DataCount+=1
set "$\%DataCount%=%DataName%"
set "#%DataCount%=%DataValue%"

rem For a later aligned output find out the length of the data name and
rem remember its length if being greater than longest data name up to now.
rem z_ added at beginning of some data names must be always ignored for
rem length of data name.
set "NameLength=1"
:GetNameLength
set "DataName=%DataName:~1%"
if not "%DataName%" == "" set /A "NameLength+=1" & goto GetNameLength
if "%DataName:~0,2%" == "z_" set /A NameLength-=2
if %NameLength% GTR %MaxNameLength% set "MaxNameLength=%NameLength%"

rem Exit the subroutine.
goto :EOF


:FormatOutput
rem Add 3 to maximum name length to have always at least 3 spaces
rem between longest data name and the sum of the data values.
set /A MaxNameLength+=3

setlocal EnableDelayedExpansion
rem Build a string consisting of spaces according to maximum name length.
set "SpacesName="
for /L %%I in (1,1,%MaxNameLength%) do set "SpacesName=!SpacesName! "

rem Format value of total sum.
call :FormatNumber %#TotalSum%

rem For a later aligned output find out the length of the total
rem sum which is expected to be the greatest value in output.
set "ValueLength=1"
:GetValueLength
set "DataValue=%DataValue:~1%"
if not "%DataValue%" == "" set /A "ValueLength+=1" & goto GetValueLength

rem Build a string consisting of spaces according to maximum name length.
set "SpacesValue="
for /L %%I in (1,1,%ValueLength%) do set "SpacesValue=!SpacesValue! "

rem Output the data names with appropriate number of aligning spaces
rem and the data sum for each data name formatted and with aligning
rem spaces into the output file.
set /A ExtraNameLength=MaxNameLength+2
del "%TempFile1%" 2>nul
(for /F "tokens=1* delims==" %%I in ('set $\ 2^>nul') do (
    set "DataName=%%J%SpacesName%"
    if not "!DataName:~0,2!" == "z_" (
        set "DataName=!DataName:~0,%MaxNameLength%!"
    ) else set "DataName=!DataName:~0,%ExtraNameLength%!"
    call :FormatNumber !#%%~nI!
    set "DataValue=%SpacesValue%!DataValue!"
    set "DataValue=!DataValue:~-%ValueLength%!"
    echo !DataName!!DataValue!
))>>"%TempFile1%"

rem Sort the lines in output file according to name.
%SystemRoot%\System32\sort.exe "%TempFile1%" /O "%TempFile2%"
del "%TempFile1%" 2>nul

rem 16 is the length of string CARDS OF MONTH with a space on both sides.
set /A LineLength=MaxNameLength+ValueLength
set /A HeadLength=(LineLength-16) / 2

rem Build the heading for the output file with centered CARDS OF MONTH
rem and create the output file with this heading as first line.
set "HyphensHead="
for /L %%I in (1,1,%HeadLength%) do set "HyphensHead=!HyphensHead!-"
set "Heading=%HyphensHead% CARDS OF MONTH %HyphensHead%"
set /A HeadLength=HeadLength*2 + 16
if not %HeadLength% == %LineLength% set "Heading=%Heading%-"
echo %Heading%>"%OutputFile%"

set "TotalSumWrite=1"
for /F usebackq^ tokens^=*^ eol^= %%I in ("%TempFile2%") do (
    set "DataRow=%%I"
    if defined TotalSumWrite (
        if not "!DataRow:~0,2!" == "z_" (
            echo !DataRow!>>"%OutputFile%"
        ) else (
            rem Append the total summary to the output file.
            set "TotalSumWrite="
            >>"%OutputFile%" echo %SpacesName: =-%%SpacesValue: =-%
            set "DataName=TOTAL OF ALL:%SpacesName%"
            set "DataName=!DataName:~0,%MaxNameLength%!"
            call :FormatNumber %#TotalSum%
            set "DataValue=%SpacesValue%!DataValue!"
            set "DataValue=!DataValue:~-%ValueLength%!"
            >>"%OutputFile%" echo !DataName!!DataValue!
            >>"%OutputFile%" echo %SpacesName: =-%%SpacesValue: =-%
            >>"%OutputFile%" echo !DataRow:~2!
        )
    ) else echo !DataRow:~2!>>"%OutputFile%"
)

if defined TotalSumWrite (
    >>"%OutputFile%" echo %SpacesName: =-%%SpacesValue: =-%
    set "DataName=TOTAL OF ALL:%SpacesName%"
    set "DataName=!DataName:~0,%MaxNameLength%!"
    call :FormatNumber %#TotalSum%
    set "DataValue=%SpacesValue%!DataValue!"
    set "DataValue=!DataValue:~-%ValueLength%!"
    >>"%OutputFile%" echo !DataName!!DataValue!
)

del "%TempFile2%" 2>nul
endlocal
goto EndBatch


rem The subroutine below reformats 0 to 99 to 0,00 to 0,99 and inserts
rem dots on larger values after a series of 3 digits left to the comma.

:FormatNumber
set "DataValue=%1"
if "%DataValue:~1,1%" == "" set "DataValue=0%DataValue%"
if "%DataValue:~2,1%" == "" set "DataValue=0%DataValue%"
set "DataValue=%DataValue:~0,-2%,%DataValue:~-2%"
if not "%DataValue:~6,1%" == "" set "DataValue=%DataValue:~0,-6%.%DataValue:~-6%"
if not "%DataValue:~10,1%" == "" set "DataValue=%DataValue:~0,-10%.%DataValue:~-10%"
goto :EOF

:EndBatch
endlocal

注释1:在代码中只有一行,其中水平制表符必须在该行中,而在浏览器中根据HTML规范显示时,则不能有两个空格.因此,搜索包含字符串 horizo​​ntal tab 的注释,并用制表符替换:=之间下面一行的空格.

Note 1: There is one line in code where a horizontal tab character must be in the line and not two spaces as the browsers display according to HTML specification. So search for the comment containing the string horizontal tab and replace the spaces in the line below between : and = by a tab character.

注释2:阅读以rem开头的注释.批处理代码在处理每一行之前从所有行中删除所有"和所有!.

Note 2: Read the comments which are the lines starting with rem. The batch code removes all " as well as all ! from all lines before processing each line.

注释3: Windows命令处理器不支持浮点运算.它仅支持32位有符号整数算术运算.因此,每个总和包括在计算中变得大于21.474.836,47的总和,在超过此限制时都是错误的.如果确实需要根据数据,那么当然可以使用额外的代码来解决此限制.

Note 3: Windows command processor does not support floating point arithmetic. It supports only 32-bit signed integer arithmetic. So each sum including the total sum which becomes greater than 21.474.836,47 during the calculation is wrong on exceeding this limit. It would be of course possible to work around this limit with extra code if that would be really needed depending on the data.

注释4:批处理代码将两个或多个串联的空格解释为数据列之间的分隔符.因此,如果第二个数据列中的任何字符串偶然有两个空格串联,则批处理文件至少对于该数据行会产生错误的结果.

Note 4: The batch code interprets two or more spaces in series as separator between the data columns. So if any string in second data column has by chance two spaces in series, the batch file produces a wrong result for at least this data row.

注5::批处理文件需要一些时间才能完成任务,因为Windows命令处理器并不是真正为此类任务设计的.

Note 5: The batch file takes some time to finish the task because of Windows command processor is not really designed for such tasks.

要了解所使用的命令及其工作方式,请打开命令提示符窗口,在其中执行以下命令,并非常仔细地阅读每个命令显示的所有帮助页面.

For understanding the used commands and how they work, open a command prompt window, execute there the following commands, and read entirely all help pages displayed for each command very carefully.

  • call /?
  • del /?
  • echo /?
  • endlocal /?
  • findstr /?
  • for /?
  • goto /?
  • if /?
  • rem /?
  • set /?
  • setlocal /?
  • sort /?
  • call /?
  • del /?
  • echo /?
  • endlocal /?
  • findstr /?
  • for /?
  • goto /?
  • if /?
  • rem /?
  • set /?
  • setlocal /?
  • sort /?

另请参阅:

  • Where does GOTO :EOF return to?
  • Single line with multiple commands using Windows batch file
  • Microsoft article about Using command redirection operators

编辑:请使用此代码,该代码是此处发布的代码的增强版本如果在TOTAL OF ALL:行上输出的总和以下有一个总和,且其值具有更多字符,则输出的所有总和都是正确的.

Please use this code which is an enhanced version of the code posted here which outputs all sums correct in case of one sum below the sum output on line TOTAL OF ALL: has a value with more characters.

这篇关于如何使用批处理文件忽略对文本进行排序和求和的第一和第三数据列?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆