Fastest way to recursively clean HTML files on a partition?
Question
I need a script that cleans HTML files, i.e. deletes everything after the </HTML> tag, for all files in a partition, recursively. This would be like recovering web server content after a virus has injected code into many HTML files.
Answer
Start with the top-level test code:
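' Each test case: input text, marker to search for, expected result
Dim aTests : aTests = Array( _
    Array( "", "", "" ) _
  , Array( "<html></html>junk", "</html>", "<html></html>" ) _
)
Dim aTest
For Each aTest In aTests
  WScript.Echo qq(aTest(0))
  WScript.Echo qq(aTest(1))
  WScript.Echo qq(cutTail(aTest(0), aTest(1)))
  WScript.Echo CStr(aTest(2) = cutTail(aTest(0), aTest(1)))
  WScript.Echo
Next

' qq: wrap a string in double quotes so empty strings stay visible in the output
Function qq(s) : qq = """" & s & """" : End Function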
A function that could solve your first subtask, cleaning a string:
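' Return sTxt up to and including the first occurrence of sFnd, matched
' case-insensitively (vbTextCompare); return sTxt unchanged if sFnd is absent
Function cutTail(sTxt, sFnd)
  cutTail = sTxt
  Dim nPos : nPos = InStr(1, sTxt, sFnd, vbTextCompare)
  If 0 < nPos Then cutTail = Left(sTxt, nPos + Len(sFnd) - 1)
End Function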
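The same string-cleaning step as a minimal Python sketch, for readers without Windows Script Host. `cut_tail` is a hypothetical name mirroring cutTail; it assumes the file content is already decoded to a str and uses a case-insensitive regex search for the first match, keeping everything up to and including it. The loop replays the two aTests cases.

```python
import re


def cut_tail(text: str, marker: str) -> str:
    """Keep text up to and including the first case-insensitive match of marker.

    Like the VBScript cutTail, the text is returned unchanged when the
    marker does not occur in it.
    """
    match = re.search(re.escape(marker), text, re.IGNORECASE)
    if match is None:
        return text
    return text[: match.end()]


# The same cases as the aTests table: (input, marker, expected)
TESTS = [
    ("", "", ""),
    ("<html></html>junk", "</html>", "<html></html>"),
]

for src, marker, expected in TESTS:
    result = cut_tail(src, marker)
    print(repr(src), repr(marker), repr(result), result == expected)
```

Using the match object's `end()` on the original string (rather than lower-casing a copy and searching that) keeps the cut position correct even when case-folding would change the string's length.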
Write a bare-bones Sub that traverses a folder tree and calls a "do what I want" Sub for each file it finds:
' Call fFile (a GetRef function reference) for every file in oDir and its subfolders
Sub walkDirs(oDir, fFile)
  Dim oItem
  For Each oItem In oDir.Files
    fFile oItem
  Next
  For Each oItem In oDir.SubFolders
    walkDirs oItem, fFile
  Next
End Sub
Test-drive it with a trivial worker Sub:
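Dim goFS : Set goFS = CreateObject("Scripting.FileSystemObject")
Dim sRDir : sRDir = "..\data"
Dim fFile : Set fFile = GetRef("justPrint")
walkDirs goFS.GetFolder(sRDir), fFile

' Trivial worker: report the file, but don't touch it
Sub justPrint(oFile)
  WScript.Echo "Processing:", qq(oFile.Path)
End Sub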
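A minimal Python sketch of walkDirs plus its test drive, assuming a plain callable can stand in for the GetRef function reference. `walk_dirs` is a hypothetical name; `os.walk` already descends into every subfolder, so the explicit recursion of walkDirs is not needed. The demo builds a throwaway tree with `tempfile` instead of using ..\data.

```python
import os
import tempfile
from typing import Callable, List


def walk_dirs(root: str, worker: Callable[[str], None]) -> None:
    """Call worker(path) for every file in root and all of its subfolders."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            worker(os.path.join(dirpath, name))


# Test drive with a trivial worker (the counterpart of justPrint) on a throwaway tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("index.html", os.path.join("sub", "page.htm")):
        with open(os.path.join(root, rel), "w") as f:
            f.write("<html></html>")
    seen: List[str] = []
    walk_dirs(root, seen.append)
    print(sorted(os.path.relpath(p, root) for p in seen))
```

Passing `seen.append` as the worker collects paths instead of printing them, which is a cheap way to check the traversal before any file is modified.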
Write a first-attempt version of a worker Sub that cleans a file:
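Const ForReading = 1, ForWriting = 2 ' FSO IOMode values; plain .vbs files don't predefine them
Sub cleanHtml(oFile)
  WScript.Echo "Processing:", qq(oFile.Path)
  Dim oTS : Set oTS = oFile.OpenAsTextStream(ForReading)
  Dim sAll : sAll = cutTail(oTS.ReadAll(), "</html>") ' keep up to and including </html>
  oTS.Close ' release the read handle before reopening the file for writing
  Set oTS = oFile.OpenAsTextStream(ForWriting)
  oTS.Write sAll
  oTS.Close
End Sub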
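A Python sketch of the worker, under two stated assumptions: the pages use an ASCII-compatible encoding (UTF-8, Windows-1252, ...), so `</html>` can be matched on raw bytes without decoding; and a file should only be rewritten when something is actually cut. `clean_html` and `CLOSING_TAG` are hypothetical names. Like cleanHtml, it keeps the first `</html>` and everything before it; UTF-16 pages would simply not match and are left untouched.

```python
import re
import tempfile
from pathlib import Path

# Case-insensitive "</html>" on raw bytes, so ASCII-compatible files are
# cut without being decoded (and without any newline translation).
CLOSING_TAG = re.compile(rb"</html>", re.IGNORECASE)


def clean_html(path: Path) -> bool:
    """Drop everything after the first </html> in path; return True if the file changed."""
    data = path.read_bytes()
    match = CLOSING_TAG.search(data)
    if match is None or match.end() == len(data):
        return False  # no closing tag, or nothing after it: leave the file alone
    path.write_bytes(data[: match.end()])
    return True


with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "index.html"
    page.write_bytes(b"<HTML><body>hi</body></HTML><script>evil()</script>")
    print(clean_html(page), page.read_bytes())
```

Skipping the write for unchanged files leaves their modification times alone, which makes it easier to see afterwards which pages were actually cleaned.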
Run it on a test folder containing a representative sample of files, and look for problems:
Will cutTail fail for data like the following (extra rows for the aTests array)?
, Array( "<html></html>", "</HTml>", "<html></html>" ) _
, Array( "<html><!--</html>-->keep</html>junk", "</HTml>", "<html><!--</html>-->keep</html>" ) _
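Working these two rows through: the first passes, because vbTextCompare makes InStr case-insensitive; the second fails, because InStr stops at the </html> inside the comment, so cutTail returns "<html><!--</html>" and drops "-->keep</html>". The Python sketch below contrasts a first-match cut (what cutTail does) with a last-match cut; `cut_first` and `cut_last` are hypothetical helpers. Cutting at the last match fixes the comment case but is no cure-all: if the injected junk itself contains </html>, the last match lies inside the junk and part of it survives.

```python
import re
from typing import List


def _matches(text: str, marker: str) -> List["re.Match[str]"]:
    """All case-insensitive occurrences of marker in text, in order."""
    return list(re.finditer(re.escape(marker), text, re.IGNORECASE))


def cut_first(text: str, marker: str) -> str:
    """Cut after the first match (the behaviour of cutTail)."""
    found = _matches(text, marker)
    return text[: found[0].end()] if found else text


def cut_last(text: str, marker: str) -> str:
    """Cut after the last match (an alternative, not the answer's code)."""
    found = _matches(text, marker)
    return text[: found[-1].end()] if found else text


cases = [
    ("<html></html>", "</HTml>", "<html></html>"),
    ("<html><!--</html>-->keep</html>junk", "</HTml>", "<html><!--</html>-->keep</html>"),
]
for src, marker, expected in cases:
    print(cut_first(src, marker) == expected, cut_last(src, marker) == expected)
```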
Will the traversal fail because of security restrictions, e.g. folders the script's account is not allowed to read?
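In VBScript an unreadable folder typically raises a runtime error (such as "Permission denied") when its Files or SubFolders collection is read, which stops the whole run. Python's `os.walk` behaves the other way: it silently skips directories it cannot list unless an `onerror` callback is passed. A minimal sketch, with the hypothetical name `walk_dirs_reporting`, that collects those errors (and per-file OSErrors raised by the worker) so they can be reported at the end instead of being lost or fatal:

```python
import os
from typing import Callable, List


def walk_dirs_reporting(root: str, worker: Callable[[str], None]) -> List[OSError]:
    """Visit every listable file under root; return the errors instead of hiding them.

    os.walk ignores directory-listing errors unless onerror is given, so an
    unreadable folder would otherwise be skipped without any trace.
    """
    errors: List[OSError] = []
    for dirpath, _dirnames, filenames in os.walk(root, onerror=errors.append):
        for name in filenames:
            try:
                worker(os.path.join(dirpath, name))
            except OSError as exc:  # e.g. a single locked or read-only file
                errors.append(exc)
    return errors


# A missing root is reported through onerror rather than raised.
for err in walk_dirs_reporting("no/such/folder", print):
    print("Skipped:", err.filename, "-", err.strerror)
```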
Will your script clobber .js, .css, or .jpg files?
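Since walkDirs hands every file to the worker, cleanHtml as written would also open and rewrite .js, .css and even .jpg files. One guard is an extension allow-list checked before cleaning; in VBScript that check could use goFS.GetExtensionName(oFile.Path). A Python sketch of such a check, where the hypothetical HTML_SUFFIXES set is an assumption to adjust (server-side pages such as .asp or .php may need the same treatment):

```python
from pathlib import Path

# Hypothetical allow-list: only these suffixes are treated as HTML pages.
HTML_SUFFIXES = {".htm", ".html"}


def is_html(path: str) -> bool:
    """True for .htm/.html files, compared case-insensitively (INDEX.HTM counts)."""
    return Path(path).suffix.lower() in HTML_SUFFIXES


for name in ["index.html", "INDEX.HTM", "app.js", "site.css", "logo.jpg", "page.html.bak"]:
    print(name, is_html(name))
```

Only the final suffix counts, so a backup such as page.html.bak is left alone.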