Fastest way to recursively clean HTML files on a partition?


Question

I need a script for cleaning HTML files, i.e. one that deletes everything after the </HTML> tag (without quotes), for all files recursively in a partition. This would be like recovering web server content after a virus infects or injects code into multiple HTML files.

Recommended answer

Start with top-level code that tests the string-cleaning function:

  Dim aTests : aTests = Array( _
      Array( "", "", "" ) _
    , Array( "<html></html>junk", "</html>", "<html></html>" ) _
  )
  Dim aTest
  For Each aTest In aTests
      WScript.Echo qq(aTest(0))
      WScript.Echo qq(aTest(1))
      WScript.Echo qq(cutTail(aTest(0), aTest(1)))
      Wscript.Echo CStr(aTest(2) = cutTail(aTest(0), aTest(1)))
      WScript.Echo
  Next

Then write the function under test, which could solve your first subtask, cleaning a string:

Function cutTail(sTxt, sFnd)
  cutTail = sTxt
  Dim nPos : nPos = Instr(1, sTxt, sFnd, vbTextCompare)
  If 0 < nPos Then cutTail = Left( sTxt, nPos + Len(sFnd) - 1)
End Function

Write a bare-bones Sub to traverse a folder tree and call a "do what I want" Sub for each file found:

Sub walkDirs(oDir, fFile)
  Dim oItem
  For Each oItem In oDir.Files
      fFile oItem
  Next
  For Each oItem In oDir.SubFolders
      walkDirs oItem, fFile
  Next
End Sub

Test-drive it with a trivial worker Sub:

  Dim sRDir : sRDir     = "..\data"
  Dim fFile : Set fFile = GetRef("justPrint")
  walkDirs goFS.GetFolder(sRDir), fFile

Sub justPrint(oFile)
  WScript.Echo "Processing:", qq(oFile.Path)
End Sub
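
The snippets in this answer use a few names they never define: qq, goFS, and the ForReading / ForWriting constants used by the file-cleaning worker. A minimal sketch of what they might look like follows. The body of qq is a guess (a helper that wraps a string in double quotes so empty strings stay visible in the output); the constant values are the documented FileSystemObject I/O modes.

```vbscript
' Assumed definitions for names the answer uses but does not show.
Const ForReading = 1   ' FileSystemObject I/O mode: open for reading
Const ForWriting = 2   ' FileSystemObject I/O mode: open for writing (overwrites)

' Global FileSystemObject; the name goFS comes from the answer's snippets.
Dim goFS : Set goFS = CreateObject("Scripting.FileSystemObject")

' Guessed implementation: quote a string for display.
Function qq(sTxt)
  qq = """" & sTxt & """"
End Function

' Quick check: an empty string is shown as a pair of quotes.
WScript.Echo qq("")
WScript.Echo qq(goFS.GetAbsolutePathName("."))
```

Saved as a .vbs file, this should run under cscript.exe. Placing these definitions at the top of the script makes them available to all the other snippets.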

Write a 'first attempt' version of a worker Sub that cleans a file:

Sub cleanHtml(oFile)
  WScript.Echo "Processing:", qq(oFile.Path)
  Dim sAll : sAll = cutTail(OFile.OpenAsTextStream(ForReading).ReadAll(),"</html>")
  OFile.OpenAsTextStream(ForWriting).Write sAll
End Sub

Use it on a test folder with a representative sample set of files. Look for problems:

Will cutTail fail for data like:

, Array( "<html></html>", "</HTml>", "<html></html>" ) _
, Array( "<html><!--</html>-->keep</html>junk", "</HTml>", "<html><!--</html>-->keep</html>" ) _
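
For the second test case, cutTail stops at the first "</html>", which sits inside the comment. One possible remedy, not part of the original answer, is to cut after the last occurrence instead. The sketch below uses InStrRev for that; the name cutTailLast is hypothetical.

```vbscript
' Hypothetical variant of cutTail: keep everything up to and including
' the LAST occurrence of sFnd (case-insensitive).
Function cutTailLast(sTxt, sFnd)
  cutTailLast = sTxt
  ' Nothing to search for: return the text unchanged.
  If Len(sFnd) = 0 Then Exit Function
  ' InStrRev searches backwards; -1 means "start at the last character".
  Dim nPos : nPos = InStrRev(sTxt, sFnd, -1, vbTextCompare)
  If 0 < nPos Then cutTailLast = Left(sTxt, nPos + Len(sFnd) - 1)
End Function

' The commented-out closing tag no longer ends the document early.
WScript.Echo cutTailLast("<html><!--</html>-->keep</html>junk", "</HTml>")
WScript.Echo cutTailLast("no closing tag here", "</html>")
```

This trades one failure mode for another: if the injected junk itself contains "</html>", the last occurrence lies inside the junk and too much is kept. Which rule is safer depends on what the infected files actually look like.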

Will the traversal fail because of security restrictions?
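
If it does, one way to keep going is to probe each folder before enumerating it and skip the ones that raise an error. The sketch below is an assumption-laden variant of walkDirs: the name walkDirsSafe, the local oFS, and the showPath worker are hypothetical, and it assumes that reading the Count of the Files and SubFolders collections raises the same error the enumeration would.

```vbscript
' Hypothetical variant of walkDirs that skips folders it cannot read.
Sub walkDirsSafe(oDir, fFile)
  Dim oItem, nProbe
  ' Probe the collections; an inaccessible folder is assumed to fail here.
  On Error Resume Next
  nProbe = oDir.Files.Count + oDir.SubFolders.Count
  If Err.Number <> 0 Then
     WScript.Echo "Skipping:", oDir.Path, "(" & Err.Description & ")"
     Err.Clear
     On Error GoTo 0
     Exit Sub
  End If
  ' Restore normal error handling so worker errors are not silently ignored.
  On Error GoTo 0
  For Each oItem In oDir.Files
      fFile oItem
  Next
  For Each oItem In oDir.SubFolders
      walkDirsSafe oItem, fFile
  Next
End Sub

' Trivial worker for a dry run over the current directory.
Sub showPath(oFile)
  WScript.Echo oFile.Path
End Sub

Dim oFS : Set oFS = CreateObject("Scripting.FileSystemObject")
walkDirsSafe oFS.GetFolder("."), GetRef("showPath")
```

Errors raised inside the worker still stop the script, which is deliberate here: a failed write to an HTML file is worth knowing about.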

Will your script clobber .js, .css, or .jpg files?
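
As written, walkDirs hands every file to the worker, so the answer to that last question is yes unless the worker filters by type. A minimal sketch of such a filter, assuming that only the extensions .html and .htm should be touched (isHtmlFile, htmlOnly, and oFS are hypothetical names):

```vbscript
Dim oFS : Set oFS = CreateObject("Scripting.FileSystemObject")

' True only for paths whose extension is html or htm (case-insensitive).
Function isHtmlFile(sPath)
  Select Case LCase(oFS.GetExtensionName(sPath))
    Case "html", "htm"
      isHtmlFile = True
    Case Else
      isHtmlFile = False
  End Select
End Function

' Worker wrapper: reports which files a cleaning pass would touch.
' In a real run the Echo would be replaced by a call to the cleaning Sub.
Sub htmlOnly(oFile)
  If isHtmlFile(oFile.Path) Then WScript.Echo "Would clean:", oFile.Path
End Sub

' GetExtensionName works on plain strings; the files need not exist.
WScript.Echo CStr(isHtmlFile("index.HTML"))
WScript.Echo CStr(isHtmlFile("site.css"))
```

Running the traversal once with a report-only worker like htmlOnly gives a list of candidate files to review before anything is overwritten.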
