Slow processing a for loop that utilizes findstr
Problem description
I've got a somewhat weird case where a for /f loop is incredibly slow when I use findstr as the command it iterates over.
It's worth mentioning that the file I'm processing (old-file.xml) contains about 200,000 lines.
This part is blazing fast, but it becomes much slower if I remove the | find /c ":" pipe.
rem find total number of lines in xml-file
findstr /n ^^ old-file.xml | find /c ":" > "temp-count.txt"
set /p lines=< "temp-count.txt"
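Why this pipeline counts lines: `^` is a regular-expression anchor that matches the start of every line (it is written `^^` because `^` is the batch escape character), and `/n` prefixes each output line with its number and a colon. Every output line therefore contains at least one `:`, so `find /c ":"` counts all of them, empty lines included. A minimal Python model of the pipeline, with hypothetical function names that are not part of any tool:

```python
def findstr_number_lines(lines):
    """Model of `findstr /n ^^ file`: prefix every line with 'N:'."""
    return [f"{n}:{line}" for n, line in enumerate(lines, start=1)]


def find_count(lines, needle):
    """Model of `find /c "needle"`: count the lines that contain needle."""
    return sum(1 for line in lines if needle in line)


def count_lines(lines):
    # Every numbered line contains at least one ':', so the count equals
    # the number of input lines, empty lines included.
    return find_count(findstr_number_lines(lines), ":")


if __name__ == "__main__":
    sample = ["<root>", "", "  <date>2069-04-29</date>", "</root>"]
    print(count_lines(sample))  # prints 4
```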
The code that is slow looks like this, and I can't use the pipe trick above. The slow part seems to be the for itself, as I'm not seeing any progress in the title bar until after 10 minutes.
setlocal DisableDelayedExpansion
rem start replacing wrong dates with correct date
for /f "usebackq Tokens=1* Delims=:" %%i in (`"findstr /n ^^ old-file.xml"`) do (
rem cache the value of each line in a variable
set read-line=%%j
set line=%%i
rem enable delayed expansion for this line only
setlocal EnableDelayedExpansion
rem write progress in title bar
title Processing line: !line!/%lines%
rem remove leading line number
rem set read-line=!read-line:*:=!
for /f "usebackq" %%i in ("%tmpfile%") do (
rem replace all wrong dates with correct dates
set read-line=!read-line:%%i=%correctdate%!
)
rem write results to new file
echo(!read-line!>>"Updated-file.xml"
rem end local
endlocal
)
EDIT:
Further investigation showed that even this single line, which should just display the current line number of each iteration, takes about 10 minutes on my 8 MB, 200,000-line file. That is just the time before it starts displaying anything.
for /f "usebackq Tokens=1* Delims=:" %%i in (`"findstr /n ^^ old-file.xml"`) do echo %%i
So it seems like findstr is writing screen output that is hidden from the user but visible to the for loop. How can I prevent that from happening while still getting the same results?
EDIT 2: Solution
The solution as proposed by Aacini, with final revisions by me.
This is a snippet from a much bigger script: the wrong dates and the total number of lines are both retrieved in other loops.
setlocal enabledelayedexpansion
rem this part is for snippet only, dates are generated from another loop in final script
echo 2069-04-29 > dates-tmp.txt
echo 2069-04-30 >> dates-tmp.txt
findstr /n ^^ Super-Large-File.xml > out.tmp
set tmpfile=dates-tmp.txt
set correctdate=2011-11-25
set wrong-dates=
rem hardcoded total number of lines
set lines=186442
for /F %%i in (%tmpfile%) do (
set wrong-dates=!wrong-dates! %%i
)
rem process each line in out.tmp and loop them through :ProcessLines
call :ProcessLines < out.tmp
rem when the call above has processed every line in out.tmp, skip past the subroutine
goto ProcessLinesEnd
:ProcessLines
for /L %%l in (1,1,%lines%) do (
set /P read-line=
rem write progress in title bar
title Processing line: %%l/%lines%
for %%i in (%wrong-dates%) do (
rem replace all wrong dates with correct dates
set read-line=!read-line:%%i=%correctdate%!
)
rem write results to new file
echo(!read-line:*:=!>>"out2.tmp"
)
rem end here and continue below
goto :eof
:ProcessLinesEnd
echo this should not be printed until call has ended
:exit
exit /b
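The core of this solution is that `call :ProcessLines < out.tmp` makes out.tmp the subroutine's standard input, so each `set /P read-line=` consumes one line, and `!read-line:*:=!` then drops everything up to and including the first colon (the `N:` prefix added by findstr /n). A rough Python model of that per-line work, assuming %lines% equals the number of lines in out.tmp and that the wrong dates arrive as a whitespace-separated string like %wrong-dates%; the function names are hypothetical. Python's `str.replace` is case-sensitive while batch substring replacement is not, which makes no difference for numeric dates.

```python
import io


def strip_through_first_colon(text):
    """Model of !read-line:*:=! : drop everything up to and including the first ':'."""
    _, sep, tail = text.partition(":")
    return tail if sep else text  # no ':' found -> value left unchanged


def process_lines(stream, line_count, wrong_dates, correct_date):
    """Model of :ProcessLines from EDIT 2 (counted loop over redirected input)."""
    out = []
    for _ in range(line_count):                       # for /L %%l in (1,1,%lines%)
        read_line = stream.readline().rstrip("\r\n")  # set /P read-line=
        for wrong in wrong_dates.split():             # for %%i in (%wrong-dates%)
            read_line = read_line.replace(wrong, correct_date)
        out.append(strip_through_first_colon(read_line))  # echo(!read-line:*:=!
    return out


if __name__ == "__main__":
    out_tmp = io.StringIO("1:<date>2069-04-29</date>\n2:\n3:a:b\n")
    print(process_lines(out_tmp, 3, " 2069-04-29 2069-04-30", "2011-11-25"))
    # ['<date>2011-11-25</date>', '', 'a:b']
```

Note that the replacement runs before the prefix is stripped, exactly as in the batch code; the `N:` prefix cannot collide with a date because dates contain hyphens.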
Two points here:
1- The setlocal EnableDelayedExpansion command is executed for every line of the file. This means that the complete environment must be copied to a new local memory area about 200,000 times, which adds considerable overhead and may cause several other problems.
2- I suggest you start with the most basic part. How long does findstr take to execute? Run findstr /n ^^ old-file.xml alone and check this before trying to fix any other part. If this is fast, add a single step to it and test again until you discover the cause of the slowdown. I also suggest you use neither pipes nor for /f over the execution of findstr; instead, iterate over a file generated by a previous redirection.
EDIT: A faster solution
There is another way to do this. You can redirect the findstr output to a file and feed that file into a Batch subroutine as its standard input, so the lines can be read with the SET /P command. This method processes the lines entirely via delayed expansion rather than via the FOR /F parameter substitution (%%j), so the setlocal EnableDelayedExpansion and endlocal pair is no longer necessary. (The original loop needs that pair because, if delayed expansion were enabled while %%j is expanded, any ! characters in the data would be corrupted.) However, if you still want to display the line number, you must count it yourself.
Also, it is faster to load the wrong dates into a variable once instead of processing %tmpfile% for every line of the big file.
setlocal EnableDelayedExpansion
rem load wrong dates from tmpfile
set wrong-dates=
for /F %%i in (%tmpfile%) do (
set wrong-dates=!wrong-dates! %%i
)
echo creating findstr output, please wait...
findstr /n ^^ old-file.xml > findstr.txt
echo :EOF>> findstr.txt
rem start replacing wrong dates with correct date
call :ProcessLines < findstr.txt
goto :eof
:ProcessLines
set line=0
:read-next-line
set /P read-line=
rem check if the input file ends
if !read-line! == :EOF goto :eof
rem write progress in title bar
set /A line+=1
title Processing line: %line%/%lines%
for %%i in (%wrong-dates%) do (
rem replace all wrong dates with correct dates
set read-line=!read-line:%%i=%correctdate%!
)
rem write results to new file
echo(!read-line:*:=!>>"Updated-file.xml"
rem go back for next line
goto read-next-line
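The two refinements in this listing, reading the wrong dates once and stopping at an :EOF marker, can be modelled in Python to show why they are safe. Every real line read from findstr.txt begins with its `N:` prefix, so no genuine line can ever equal `:EOF`, not even one whose text is `:EOF` (it arrives as `2::EOF`). This is a sketch, not a replacement for the batch code: the names are hypothetical, the date file is assumed to hold one date per line as in the question's dates-tmp.txt, and the model raises an error where the batch loop would simply never terminate.

```python
import io
import os
import tempfile

SENTINEL = ":EOF"  # written by: echo :EOF>> findstr.txt


def load_wrong_dates(path):
    """Read the date file once, building ' d1 d2 ...' the way
    set wrong-dates=!wrong-dates! %%i does."""
    wrong_dates = ""
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = line.split()
            if tokens:  # for /F skips blank lines and keeps only token 1 by default
                wrong_dates += " " + tokens[0]
    return wrong_dates


def process_until_sentinel(stream, wrong_dates, correct_date):
    """Model of the goto-based :ProcessLines loop; returns (lines, count)."""
    out = []
    line_no = 0
    while True:
        raw = stream.readline()                    # set /P read-line=
        if raw == "":
            # Input ended without the marker: the batch loop would never stop.
            raise ValueError("input ended without the :EOF sentinel")
        read_line = raw.rstrip("\r\n")
        if read_line == SENTINEL:                  # if !read-line! == :EOF goto :eof
            return out, line_no
        line_no += 1                               # set /A line+=1
        for wrong in wrong_dates.split():          # for %%i in (%wrong-dates%)
            read_line = read_line.replace(wrong, correct_date)
        out.append(read_line.partition(":")[2])    # echo(!read-line:*:=!


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dates-tmp.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("2069-04-29 \n2069-04-30 \n")  # echo leaves a trailing space
        dates = load_wrong_dates(path)             # loaded once, not once per line
    findstr_txt = io.StringIO("1:<d>2069-04-30</d>\n2::EOF\n:EOF\n")
    print(process_until_sentinel(findstr_txt, dates, "2011-11-25"))
    # (['<d>2011-11-25</d>', ':EOF'], 2)
```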
SECOND EDIT: An even faster modification
The previous method can be sped up slightly if the loop is implemented with a for /L command instead of a goto.
:ProcessLines
for /L %%l in (1,1,%lines%) do (
set /P read-line=
rem write progress in title bar
title Processing line: %%l/%lines%
for %%i in (%wrong-dates%) do (
rem replace all wrong dates with correct dates
set read-line=!read-line:%%i=%correctdate%!
)
rem write results to new file
echo(!read-line:*:=!>>"Updated-file.xml"
)
This modification also omits the :EOF comparison and the line-number calculation, so the time saved may be significant once it is repeated 200,000 times. If you use this method, don't forget to delete the echo :EOF>> findstr.txt line in the first part.
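Dropping the sentinel makes the for /L loop depend entirely on %lines% being exact. Assuming SET /P behaves at end of input as it does for an empty line (it leaves read-line unchanged), a count that is too high would re-emit the last line, and a count that is too low would silently drop the tail of the file. The Python model below (hypothetical names, date replacement omitted) makes that visible; computing %lines% with the findstr /n ^^ ... | find /c ":" pipeline from the question keeps the count and the file in sync.

```python
import io


def counted_loop(stream, line_count):
    """Model of: for /L %%l in (1,1,%lines%) do set /P read-line= ... echo(!read-line:*:=!"""
    out = []
    read_line = None  # read-line not yet defined
    for _ in range(line_count):
        text = stream.readline().rstrip("\r\n")
        if text:
            # Assumed SET /P behaviour: the variable is only assigned when a
            # non-empty line is read; at end of input it keeps its old value.
            read_line = text
        if read_line is not None:  # nothing meaningful to echo before the first read
            out.append(read_line.partition(":")[2])
    return out


if __name__ == "__main__":
    numbered = "1:first\n2:last\n"
    print(counted_loop(io.StringIO(numbered), 2))  # ['first', 'last']
    print(counted_loop(io.StringIO(numbered), 3))  # ['first', 'last', 'last']
    print(counted_loop(io.StringIO(numbered), 1))  # ['first']
```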