如何使用批处理文件忽略对文本进行排序和求和的第一和第三数据列? [英] How to ignore first and third data column on sorting and summing the text using batch file?
问题描述
我在下面做了这个脚本:
I have made this script below:
start /w "" "C:\Program Files (x86)\CoolUtils\Total PDF Converter\PDFConverter.exe" "%userprofile%\Desktop\teste.oxps" "%userprofile%\Desktop\teste.txt"
del /f /s /q "%userprofile%\Desktop\teste.pdf"
findstr /v /r /c:"http" /c:"Banrisul" /c:"Sac" /c:":" /c:"-" /c:"SAC" /c:"OUVIDORIA" /c:"B A N R I S U L" /c:"+" "teste.txt" > "output.txt"
for /f "usebackq tokens=* delims=" %%a in ("output.txt") do (echo(%%a)>>output2.txt
move /y output2.txt output.txt
我的output.txt
是以下内容,该内容归纳为真实文件太大:
My output.txt
is that below which is summarized as real file is too large:
01 VERO BANRICOMPRAS A PRAZO 914709 77,56
VERO BANRICOMPRAS A PRAZO 914710 322,58
VERO BANRICOMPRAS A VISTA 256534 187,64
VERO BANRICOMPRAS A VISTA 256539 17,62
VERO CARTAO CREDITO 391534 146,22
VERO CARTAO CREDITO 391535 159,26
VERO CARTAO CREDITO 391536 543,16
VERO CARTAO CREDITO 391537 479,37
VERO CARTAO CREDITO 391538 96,00
REDECARD S.A CARTAO CREDITO 415856 217,25
REDECARD S.A CARTAO CREDITO 416552 152,28
VERO CARTAO DEBITO 338711 144,60
VERO CARTAO DEBITO 338712 378,03
VERO CARTAO DEBITO 338713 134,34
CIELO S/A CARTAO DEBITO 312125 57,71
CIELO S/A CARTAO DEBITO 312126 23,46
CIELO S/A CARTAO DEBITO 312127 232,55
RESGATE AUTOMATICO 000000 17.700,00
02 VERO BANRICOMPRAS A PRAZO 208538 396,99
VERO BANRICOMPRAS A PRAZO 208539 217,65
VERO BANRICOMPRAS A VISTA 562243 147,65
VERO CARTAO CREDITO 692285 105,58
VERO CARTAO CREDITO 692286 28,53
VERO CARTAO CREDITO 692287 561,48
VERO CARTAO CREDITO 692288 150,14
VERO CARTAO CREDITO 692289 101,78
CIELO S/A CARTAO CREDITO 673728 125,39
CIELO S/A CARTAO CREDITO 673729 25,36
REDECARD S.A CARTAO CREDITO 716879 278,83
REDECARD S.A CARTAO CREDITO 720069 47,78
VERO CARTAO DEBITO 637568 124,70
VERO CARTAO DEBITO 637569 272,15
VERO CARTAO DEBITO 637570 244,13
CIELO S/A CARTAO DEBITO 609977 100,13
03 VERO BANRICOMPRAS A PRAZO 501238 158,81
我陷入了如何求和每个相同文本的值,然后求和所有求和文本总和的问题.中间的数据列与首字母01
,02
等无关紧要.仅应使用文本和最后一列.它必须像下面这样:
I'm stuck in how to sum the values of each same text, and then sum the total of all text summed. The middle data column and the initials 01
, 02
, etc. don't matter. Only the text and the last column should be used. It need be like that bellow:
VERO BANRICOMPRAS A PRAZO 20.596,26
VERO BANRICOMPRAS A VISTA 14.658,12
VERO CARTAO CREDITO 2.549,34
(etc.)
---------------------------------------------------
TOTAL OF ALL: 37.803,72
只需在此链接中上传我的完整output.txt
: https://ufile.io/uyxr1
关于我的output.txt
文件:数据列之间只有空格,但是描述的字符数和数字始终会变化.但是对我来说唯一重要的是描述和最后一栏.
About my output.txt
file: There are just spaces between the data columns, but the number of charachteres of the description and the numbers will always change. But what only matter for me is the description and the last column.
@Mofi,看看我的期望如何:
@Mofi, look how i expected:
------------ CARDS OF MONTH -----------
CIELO S/A CARTAO CREDITO 2.147,13
CIELO S/A CARTAO DEBITO 10.867,72
REDECARD S.A CARTAO CREDITO 11.835,11
REDECARD S.A CARTAO DEBITO 87,20
VERO BANRICOMPRAS A PRAZO 17.083,70
VERO BANRICOMPRAS A VISTA 7.829,06
VERO CARTAO CREDITO 58.052,05
VERO CARTAO DEBITO 17.215,64
----------------------------------------
TOTAL OF ALL: 125.117,61
----------------------------------------
RESGATE AUTOMATICO 152.900,00
PAGAMENTO DE FORNECEDOR 25.900,00
DOC-E 63.487,36
推荐答案
批处理文件用于执行一系列命令和应用程序,但不适用于文本文件编辑或数据计算.还有许多其他编程和脚本语言,比使用Windows命令处理器cmd.exe
更好地完成此任务.我永远不会想到使用cmd.exe
使用批处理文件来完成这样的任务.因此,即使仅使用Windows命令行支持的命令都可以做到这一点,我对这个任务也很感兴趣.是的,经过数小时的编码,我可以提供一个批处理文件,该文件对于有问题的示例适用.我尚未在整个文件上运行它.
Batch files are for executing a sequence of commands and applications, but not for text file editing or data calculations. There are many other programming and scripting languages which would be much better for this task than using Windows command processor cmd.exe
. I would have never come to the idea doing such a task with a batch file using cmd.exe
. For that reason I was interested in this task if it is even possible at all to do this with just commands supported on Windows command line. And yes, after hours of coding I can offer a batch file which worked for the example as posted in question. I have not run it on entire file.
@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "DataCount=0"
set "MaxNameLength=19"
set "TempFile1=%TEMP%\Output.tmp"
set "TempFile2=%TEMP%\Sorted.tmp"
set "OutputFile=Output.txt"
"%ProgramFiles(x86)%\CoolUtils\Total PDF Converter\PDFConverter.exe" "%UserProfile%\Desktop\teste.oxps" "%UserProfile%\Desktop\teste.txt"
del /F "%UserProfile%\Desktop\teste.pdf"
%SystemRoot%\System32\findstr.exe /V /R /C:"http" /C:"Banrisul" /C:"Sac" /C:":" /C:"-" /C:"SAC" /C:"OUVIDORIA" /C:"B A N R I S U L" /C:"+" "%UserProfile%\Desktop\teste.txt" >"%TempFile1%"
if not exist "%TempFile1%" goto EndBatch
rem Process each line in the output file line by line with leading
rem spaces and tabs already removed by FOR, but no other characters.
for /F usebackq^ tokens^=*^ eol^= %%I in ("%TempFile1%") do (
set "DataRow=%%~I"
call :ProcessLine
)
goto FormatOutput
:ProcessLine
rem Remove all double quotes within the data row.
set "DataRow=%DataRow:"=%"
rem Remove all exclamation marks within the data row.
set "DataRow=%DataRow:!=%"
rem Remove all horizontal tabs within the data row by spaces.
set "DataRow=%DataRow: = %"
rem Replace all $ by the string #DollarSign# within the data row.
set "DataRow=%DataRow:$=#DollarSign#%"
rem Replace two spaces in series by a dollar sign.
set "DataRow=%DataRow: =$%"
rem Replace all occurrences of dollar sign plus space by dollar sign.
set "DataRow=%DataRow:$ =$%"
rem Split up the data row using dollar sign as delimiter which can result
rem in four or just three tokenized substrings depending on existence of
rem 01, 02, ... in first data column. Of interest is the name string in
rem first or second data column and the value string in third or fourth
rem data column.
for /F tokens^=1-4^ delims^=$^ eol^= %%J in ("%DataRow%") do (
if "%%M" == "" (
set "DataName=%%~J"
set "DataValue=%%L"
) else (
set "DataName=%%~K"
set "DataValue=%%M"
)
)
rem It is safe now to replace the string #DollarSign# back to dollar sign.
set "DataName=%DataName:#DollarSign#=$%"
rem If the first 2 characters of data name are two digits and third
rem character is a space then remove those 3 characters from data name.
if not "%DataName:~2,1%" == " " goto CheckName
if "%DataName:~0,1%" == ";" goto CheckName
for /F "delims=01234567890" %%J in ("%DataName:~0,2%") do goto CheckName
set "DataName=%DataName:~3%"
rem Data names not containing one of the following four strings
rem should be at end of the list and not included in the total sum.
:CheckName
if not "%DataName:DEBITO=%" == "%DataName%" goto ReformatValue
if not "%DataName:CREDITO=%" == "%DataName%" goto ReformatValue
if not "%DataName:A VISTA=%" == "%DataName%" goto ReformatValue
if not "%DataName:A PRAZO=%" == "%DataName%" goto ReformatValue
set "DataName=z_%DataName%"
rem Floating point arithmetic is not supported by Windows command processor,
rem just signed 32-bit integer arithmetic with values in range -2147483648
rem to 2147483647. Therefore remove all dots and commas from data value.
:ReformatValue
set "DataValue=%DataValue:.=%"
set "DataValue=%DataValue:,=%"
for /F "tokens=* delims=0" %%J in ("%DataValue%") do set "DataValue=%%J"
if not defined DataValue set "DataValue=0"
rem Add the data value as integer to total sum. There is no special
rem overflow handling implemented yet in case of total sum exceeds
rem the maximum 32-bit positive signed integer value 2147483647.
if not "%DataName:~0,2%" == "z_" set /A "#TotalSum+=DataValue"
rem Find out if a data value of current data name is already in list
rem of environment variables and in this case just add the data value
rem to the existing sum for data data name and exit the subroutine.
for /F "tokens=1* delims==" %%J in ('set $\ 2^>nul') do if /I "%%K" == "%DataName%" set /A "#%%~nJ+=DataValue" & goto :EOF
rem This is a new data name with its first data value. So set
rem the appropriate environment variables for name and value.
set /A DataCount+=1
set "$\%DataCount%=%DataName%"
set "#%DataCount%=%DataValue%"
rem For a later aligned output find out the length of the data name and
rem remember its length if being greater than longest data name up to now.
rem z_ added at beginning of some data names must be always ignored for
rem length of data name.
set "NameLength=1"
:GetNameLength
set "DataName=%DataName:~1%"
if not "%DataName%" == "" set /A "NameLength+=1" & goto GetNameLength
if "%DataName:~0,2%" == "z_" set /A NameLength-=2
if %NameLength% GTR %MaxNameLength% set "MaxNameLength=%NameLength%"
rem Exit the subroutine.
goto :EOF
:FormatOutput
rem Add 3 to maximum name length to have always at least 3 spaces
rem between longest data name and the sum of the data values.
set /A MaxNameLength+=3
setlocal EnableDelayedExpansion
rem Build a string consisting of spaces according to maximum name length.
set "SpacesName="
for /L %%I in (1,1,%MaxNameLength%) do set "SpacesName=!SpacesName! "
rem Format value of total sum.
call :FormatNumber %#TotalSum%
rem For a later aligned output find out the length of the total
rem sum which is expected to be the greatest value in output.
set "ValueLength=1"
:GetValueLength
set "DataValue=%DataValue:~1%"
if not "%DataValue%" == "" set /A "ValueLength+=1" & goto GetValueLength
rem Build a string consisting of spaces according to maximum name length.
set "SpacesValue="
for /L %%I in (1,1,%ValueLength%) do set "SpacesValue=!SpacesValue! "
rem Output the data names with appropriate number of aligning spaces
rem and the data sum for each data name formatted and with aligning
rem spaces into the output file.
set /A ExtraNameLength=MaxNameLength+2
del "%TempFile1%" 2>nul
(for /F "tokens=1* delims==" %%I in ('set $\ 2^>nul') do (
set "DataName=%%J%SpacesName%"
if not "!DataName:~0,2!" == "z_" (
set "DataName=!DataName:~0,%MaxNameLength%!"
) else set "DataName=!DataName:~0,%ExtraNameLength%!"
call :FormatNumber !#%%~nI!
set "DataValue=%SpacesValue%!DataValue!"
set "DataValue=!DataValue:~-%ValueLength%!"
echo !DataName!!DataValue!
))>>"%TempFile1%"
rem Sort the lines in output file according to name.
%SystemRoot%\System32\sort.exe "%TempFile1%" /O "%TempFile2%"
del "%TempFile1%" 2>nul
rem 16 is the length of string CARDS OF MONTH with a space on both sides.
set /A LineLength=MaxNameLength+ValueLength
set /A HeadLength=(LineLength-16) / 2
rem Build the heading for the output file with centered CARDS OF MONTH
rem and create the output file with this heading as first line.
set "HyphensHead="
for /L %%I in (1,1,%HeadLength%) do set "HyphensHead=!HyphensHead!-"
set "Heading=%HyphensHead% CARDS OF MONTH %HyphensHead%"
set /A HeadLength=HeadLength*2 + 16
if not %HeadLength% == %LineLength% set "Heading=%Heading%-"
echo %Heading%>"%OutputFile%"
set "TotalSumWrite=1"
for /F usebackq^ tokens^=*^ eol^= %%I in ("%TempFile2%") do (
set "DataRow=%%I"
if defined TotalSumWrite (
if not "!DataRow:~0,2!" == "z_" (
echo !DataRow!>>"%OutputFile%"
) else (
rem Append the total summary to the output file.
set "TotalSumWrite="
>>"%OutputFile%" echo %SpacesName: =-%%SpacesValue: =-%
set "DataName=TOTAL OF ALL:%SpacesName%"
set "DataName=!DataName:~0,%MaxNameLength%!"
call :FormatNumber %#TotalSum%
set "DataValue=%SpacesValue%!DataValue!"
set "DataValue=!DataValue:~-%ValueLength%!"
>>"%OutputFile%" echo !DataName!!DataValue!
>>"%OutputFile%" echo %SpacesName: =-%%SpacesValue: =-%
>>"%OutputFile%" echo !DataRow:~2!
)
) else echo !DataRow:~2!>>"%OutputFile%"
)
if defined TotalSumWrite (
>>"%OutputFile%" echo %SpacesName: =-%%SpacesValue: =-%
set "DataName=TOTAL OF ALL:%SpacesName%"
set "DataName=!DataName:~0,%MaxNameLength%!"
call :FormatNumber %#TotalSum%
set "DataValue=%SpacesValue%!DataValue!"
set "DataValue=!DataValue:~-%ValueLength%!"
>>"%OutputFile%" echo !DataName!!DataValue!
)
del "%TempFile2%" 2>nul
endlocal
goto EndBatch
rem The subroutine below reformats 0 to 99 to 0,00 to 0,99 and inserts
rem dots on larger values after a series of 3 digits left to the comma.
:FormatNumber
set "DataValue=%1"
if "%DataValue:~1,1%" == "" set "DataValue=0%DataValue%"
if "%DataValue:~2,1%" == "" set "DataValue=0%DataValue%"
set "DataValue=%DataValue:~0,-2%,%DataValue:~-2%"
if not "%DataValue:~6,1%" == "" set "DataValue=%DataValue:~0,-6%.%DataValue:~-6%"
if not "%DataValue:~10,1%" == "" set "DataValue=%DataValue:~0,-10%.%DataValue:~-10%"
goto :EOF
:EndBatch
endlocal
注释1:在代码中只有一行,其中水平制表符必须在该行中,而在浏览器中根据HTML规范显示时,则不能有两个空格.因此,搜索包含字符串 horizontal tab 的注释,并用制表符替换:
和=
之间下面一行的空格.
Note 1: There is one line in code where a horizontal tab character must be in the line and not two spaces as the browsers display according to HTML specification. So search for the comment containing the string horizontal tab and replace the spaces in the line below between :
and =
by a tab character.
注释2:阅读以rem
开头的注释.批处理代码在处理每一行之前从所有行中删除所有"
和所有!
.
Note 2: Read the comments which are the lines starting with rem
. The batch code removes all "
as well as all !
from all lines before processing each line.
注释3: Windows命令处理器不支持浮点运算.它仅支持32位有符号整数算术运算.因此,每个总和包括在计算中变得大于21.474.836,47
的总和,在超过此限制时都是错误的.如果确实需要根据数据,那么当然可以使用额外的代码来解决此限制.
Note 3: Windows command processor does not support floating point arithmetic. It supports only 32-bit signed integer arithmetic. So each sum including the total sum which becomes greater than 21.474.836,47
during the calculation is wrong on exceeding this limit. It would be of course possible to work around this limit with extra code if that would be really needed depending on the data.
注释4:批处理代码将两个或多个串联的空格解释为数据列之间的分隔符.因此,如果第二个数据列中的任何字符串偶然有两个空格串联,则批处理文件至少对于该数据行会产生错误的结果.
Note 4: The batch code interprets two or more spaces in series as separator between the data columns. So if any string in second data column has by chance two spaces in series, the batch file produces a wrong result for at least this data row.
注5::批处理文件需要一些时间才能完成任务,因为Windows命令处理器并不是真正为此类任务设计的.
Note 5: The batch file takes some time to finish the task because of Windows command processor is not really designed for such tasks.
要了解所使用的命令及其工作方式,请打开命令提示符窗口,在其中执行以下命令,并非常仔细地阅读每个命令显示的所有帮助页面.
For understanding the used commands and how they work, open a command prompt window, execute there the following commands, and read entirely all help pages displayed for each command very carefully.
-
call /?
-
del /?
-
echo /?
-
endlocal /?
-
findstr /?
-
for /?
-
goto /?
-
if /?
-
rem /?
-
set /?
-
setlocal /?
-
sort /?
call /?
del /?
echo /?
endlocal /?
findstr /?
for /?
goto /?
if /?
rem /?
set /?
setlocal /?
sort /?
另请参阅:
- Where does GOTO :EOF return to?
- Single line with multiple commands using Windows batch file
- Microsoft article about Using command redirection operators
编辑:请使用此代码,该代码是此处发布的代码的增强版本如果在TOTAL OF ALL:
行上输出的总和以下有一个总和,且其值具有更多字符,则输出的所有总和都是正确的.
Please use this code which is an enhanced version of the code posted here which outputs all sums correct in case of one sum below the sum output on line TOTAL OF ALL:
has a value with more characters.
这篇关于如何使用批处理文件忽略对文本进行排序和求和的第一和第三数据列?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!