Speed up folder reorg code


Problem description

I have some CMD code that Rojo and Magoo helped me write that runs against some XML files in a directory. The code grabs a date and time from each file's name, creates a year and month folder from it, and then moves the files into those folders. The problem I'm having is that the folder itself contains 914,000 XML files and the script just can't handle it. I need something faster or a way to multithread the script. Another option I was considering is to move a few thousand files at a time into a temp directory, run the script only on those, and at the very end move the resulting folders into the production location. Here is the code, plus another script that creates XML files for testing. The dates aren't validated, but for this exercise they don't need to be. This will be running on a Microsoft Server 2012 R2 VM with an Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz (2000 MHz, 1 core, 1 logical processor) and 4 GB of RAM. I'm also including the PowerShell and VBScript tags in case someone can offer any advice on writing the code in those languages.

XML move script

@ECHO OFF
SETLOCAL
Title Reorganizing XMLs - DO NOT CLOSE THIS WINDOW!
color 0F
mode con: cols=100 lines=6
prompt $t $d$_$p$g

::Get start time
for /F "tokens=1-4 delims=:.," %%a in ("%time%") do (
     set /A "start=(((%%a*60)+1%%b %% 100)*60+1%%c %% 100)*100+1%%d %% 100"
)

Echo Start time: %start%

set "sourcedir=C:\Temp\TestDummyFiles"
set "tempdir=C:\temp\xmlreorgtemp"

::call :Get1000Files %sourcedir% %tempdir% %total%

pushd %sourcedir%
SET "spinChars=\|/-"
for /f %%a in ('"prompt $H&for %%b in (1) do rem"') do set "BS=%%a"
set "spaces=          "
SET /a filesMoved = 0, spinPos = 0, prev = 0

echo Moving XML Files...

setlocal enabledelayedexpansion
for /L %%I in (1,1,7) do set "BS=!BS!!BS!"
for /L %%I in (1,1,3) do set "spaces=!spaces!!spaces!"

For %%A in (%sourcedir%\*.xml) do set /a cnt+=1
echo.
Echo Total XML files: %cnt%
echo.

FOR /f "tokens=1*delims=" %%a IN ('dir /b /a-d "%sourcedir%\*.xml" ' ) DO (
        set /a filesmoved += 1 
        call :spinner !filesmoved! "%%~nxa"
)
call :spinner %filesMoved% Done.

for /F "tokens=1-4 delims=:.," %%a in ("%time%") do (
     set /A "end=(((%%a*60)+1%%b %% 100)*60+1%%c %% 100)*100+1%%d %% 100"
)

echo End time: %end%
set /A elapsed=end-start

rem Show elapsed time:
set /A hh=elapsed/(60*60*100), rest=elapsed%%(60*60*100), mm=rest/(60*100), rest%%=60*100, ss=rest/100, cc=rest%%100
if %mm% lss 10 set mm=0%mm%
if %ss% lss 10 set ss=0%ss%
if %cc% lss 10 set cc=0%cc%
echo Elapsed Time: %hh%:%mm%:%ss%
endlocal & echo;
exit /b 0

:Get1000Files
@echo off
setlocal enabledelayedexpansion
for /f %%a in ('dir "%~1" /b /a-d *.xml') do (
    set /a cnt+=1 & move "%%~a" "%~2"
        if !cnt! EQU 1000 exit /b
)
exit /b

:spinner <filecount> <filename>
set /a spinPos += 1, spinPos %%= 4, ten = %~1 / 10 * 10
if "%~2"=="Done." set ten=%~1
set "str=[!spinChars:~%spinPos%,1!] %ten% files moved... [%~2]"
set "str=%str:~0,79%"
call :length len "%str%"
set /a diff = 79 - len
if %diff% gtr 0 set "str=%str%!spaces:~-%diff%!"
set /P "=!BS:~-79!%str%"<NUL
if "%~2" NEQ "Done." call :process %~2
exit /b 0

:length <return_var> <string>
setlocal enabledelayedexpansion
if "%~2"=="" (set ret=0) else set ret=1
set "tmpstr=%~2"
for %%I in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
        if not "!tmpstr:~%%I,1!"=="" (
                set /a ret += %%I
                set "tmpstr=!tmpstr:~%%I!"
        )
)
endlocal & set "%~1=%ret%"
exit /b 0

:process
FOR /f "tokens=2,3,6delims=_" %%m IN ("%~1") DO SET "date1=%%m"&SET "date2=%%n"&SET "whichdate=%%o"
IF DEFINED whichdate SET "date1=%date2%"
IF NOT DEFINED date2 exit /b 1
If not exist .\%date1:~0,4%\%date1:~4,2% MD .\%date1:~0,4%\%date1:~4,2%
MOVE %~1 .\%date1:~0,4%\%date1:~4,2%\ > nul

And the script to create some dummy files

@echo off
setlocal EnableDelayedExpansion

cd /d %~dp0
For /f %%a in ('copy /Z "%~dpf0" nul') Do set "CR=%%a"
set fileSize=%~Z1
set /a cnt=0
echo Creating files. Please wait.&echo.
:loop
    if %cnt% GTR 5000 exit /b
    set /a cnt+=1
    set /p "=Creating %cnt% File(s)       !CR!"<nul:
    Call :random 2009 2015 yyyy
    call :random 1 12 mm
    call :random 1 31 dd
    if %mm% LSS 10 set mm=0%mm%
    if %dd% LSS 10 set dd=0%dd%
    set /P "=0" > thisSize.txt < NUL
    (for /L %%i in (0,1,30) do (
         set /A "bit=(1<<%%i)&fileSize, fileSize&=~(1<<%%i)"
         if !bit! neq 0 type thisSize.txt
         if !fileSize! neq 0 type thisSize.txt >> thisSize.txt
    )) > IDABCDEFG001_STUFF_%yyyy%%mm%%dd%_ABC_0_1234567890.xml
    del thisSize.txt
goto :loop 
exit /b

:random Min Max [RtnVar]
    @echo off & setlocal
    set /a rtn=%random% %% ((%~2)-(%~1)+1) + (%~1)
    (endlocal
        if "%~3" neq "" (set %~3=%rtn%) else echo:%rtn%
    )
exit /b

The server has PowerShell 4 on it.

Solution

Not PowerShell, but maybe this could do the job:

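@echo off
    setlocal enableextensions disabledelayedexpansion

    rem Folder that holds the XML files to reorganize
    set "xmlFolder=C:\Temp\TestDummyFiles"

    pushd "%xmlFolder%" && (
        rem A plain FOR enumerates the folder lazily instead of buffering DIR output
        for %%x in ("*_*_*.xml") do if exist "%%x" (
            rem Date is the 3rd underscore-delimited token if the name has 4 or more tokens, else the 2nd
            for /f "tokens=2-4 delims=_" %%a in ("%%~nx") do if "%%c"=="" (set "fileDate=%%a") else (set "fileDate=%%b")
            setlocal enabledelayedexpansion
            rem Split yyyymmdd into the year and month folder names
            for /f "tokens=1,2" %%a in ("!fileDate:~0,4! !fileDate:~4,2!") do (
                endlocal
                <nul set /p "=%%a\%%b : "
                md ".\%%a\%%b" 2>nul 
                rem One MOVE per year and month: all files named *_yyyymm??_*.xml go at once.
                rem FIND /V drops lines containing a colon, hiding per-file paths and keeping the count line.
                move /y "*_%%a%%b??_*.xml" ".\%%a\%%b" 2>nul | find /v ":"
            )
        )
        popd
    )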

There are three reasons for your code to be slow (apart from the fact that you are handling 914000 files):

  1. There are 914000!! files
  2. Using call is slow: every file goes through three calls (:spinner, :length and :process), so 914000 * 3 calls = very slow
  3. 914000 status updates to console are slow
  4. for /f

Yes, the for /f commands used in

FOR /f "tokens=1*delims=" %%a IN ('dir /b /a-d "%sourcedir%\*.xml" ' ) DO (
for /f %%a in ('dir "%~1" /b /a-d *.xml') do (

are a problem because:

  1. The dir command has to enumerate the 914000 files
  2. The full list needs to be loaded into memory before starting to process it
  3. The for /f command loads data into a buffer. When the buffer is full, a new, bigger buffer (4KB larger in Windows 7) is created and the data is copied from the old buffer to the new one. This is repeated until all the data has been retrieved. Each resize needs a larger memory copy than the previous one, so the total time needed to handle all the data grows quadratically with the size of the output.

This means

914000 files * ( 50 chars file name + CR LF ) = 47528000 characters
47528000 characters / 4KB buffer increase = 11603 redim operations
11603 redim operations = 1103170928640 bytes moved in memory copy operations
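A rough model of the resize cost, under two assumptions. First, the buffer grows by a fixed b = 4096 characters, as in the Windows 7 behaviour described above. Second, the k-th resize copies the whole buffer filled so far, that is k·b characters. Under that model the total copying work is a sum of an arithmetic series:

```latex
\begin{align*}
N &= 914\,000 \times (50 + 2) = 47\,528\,000 \text{ characters} \\
r &= \lfloor N / b \rfloor = \lfloor 47\,528\,000 / 4096 \rfloor = 11\,603 \text{ resizes} \\
C &= \sum_{k=1}^{r} k\,b = b\,\frac{r(r+1)}{2} \approx \frac{N^2}{2b} \approx 2.76 \times 10^{11} \text{ characters copied}
\end{align*}
```

Because C grows with N², doubling the number of files roughly quadruples the copying work. Converting C to bytes depends on how many bytes cmd stores per character and on how each copy is counted. Byte totals like the one above therefore vary with those assumptions, but the N² growth does not.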

To handle all this, the proposed code will

  1. Use a plain for to enumerate the files. Processing starts as soon as the first file is found, and further directory search operations are done while the files are being iterated.

  2. Instead of processing each file, all the files matching a date are moved in only one move operation.
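For illustration only, assume the 914000 real files span the same range of years as the dummy files from the test script (2009 to 2015, months 1 to 12). The grouped approach then issues at most one move per year/month pair:

```latex
\begin{align*}
M_{\text{per-file}} &= 914\,000 \text{ move commands} \\
M_{\text{grouped}} &\le (2015 - 2009 + 1) \times 12 = 84 \text{ move commands}
\end{align*}
```

With real data, the bound is the number of distinct year/month pairs in the file names. The for loop still visits file names, but once a group has been moved, its remaining names fail the if exist test and are skipped.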
