Windows batch script to find string from html file and copy it to text file

Question

So, I have an HTML file with a line like:

<script data-cfasync="false" type="text/javascript"> fid="RandonString"; v_width=620; v_height=490;</script>

My task is to find fid="RandonString" and copy everything between the double quotes to a text file, without using any external software. The string (RandonString in this example) is 2-100 characters long.

Recommended answer

It's not all that hard; here's a 5-line solution:

set "x=<script data-cfasync="false" type="text/javascript"> fid="RandonString"; v_width=620; v_height=490;</script>"
set "x=%x:*fid=%"
set "x=%x:";="&rem %
set x=%x:~2%
echo %x%


An explanation of what is going on.

You have to deal with 5 special characters: the <, >, = and " in your string, and the & character used to trim the trailing data.

Lines 1-3: < and > are both redirection characters, so to deal with them the whole assignment must be surrounded by double quotes ("). BUT you don't want the double quotes to be added to the variable itself.

Line 1: By putting the first quote before the variable to be set ("x=) and the second one after the data to be set (<script data-cfasync="false" type="text/javascript"> fid="RandonString"; v_width=620; v_height=490;</script>), the SET command recognises that the quotes are not to be included in the variable. Thus variable data with special characters can be set without errors. (Putting the quotes inside the variable data will work too, but it adds 2 special characters to the variable data and makes the later search and replace commands more difficult.)
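The difference between the two quote placements can be seen in a minimal sketch. The variable names a and b and the sample value are made up for illustration, and delayed expansion is used only so the values can be printed without the ampersand being re-parsed:

```batch
@echo off
setlocal EnableDelayedExpansion

rem Quotes around the whole assignment protect the ampersand but are not stored.
set "a=one & two"

rem Quote placed after the equals sign: the quotes become part of the value.
set b="one & two"

rem Delayed expansion prints the values without re-parsing the ampersand.
echo [!a!]
echo [!b!]
```

The first echo should print [one & two] and the second ["one & two"], which is why the answer wraps the whole assignment rather than just the data.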

Line 2: The next step is to remove everything up to and including fid. *fid matches everything up to and including the first fid, and the = followed directly by the closing % replaces it with nothing.
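A minimal sketch of the same wildcard substitution on its own, using a made-up value (the variable s and its contents are assumptions chosen to avoid special characters):

```batch
@echo off
setlocal

rem Hypothetical sample value without special characters.
set "s=alpha beta fid gamma"

rem The search term *fid matches from the start through the first fid.
rem The empty replacement deletes that match.
set "s=%s:*fid=%"

echo [%s%]
```

This should print [ gamma], with the leading space left over from the text that followed fid.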

Line 3: The next step is to remove everything from "; onward, which requires a little hack. The command processor can be tricked into doing this by adding ="&rem % to the search and replace. The = tells the command processor to replace "; with the characters that follow, but the next character is ", which makes the preceding set command a quoted command, and means that the special & character is not quoted, leaving it available to be interpreted. This essentially puts everything after the & on a separate line, so in effect the search and replace cuts the string off at ";. The REM statement is there to make sure the data that comes after the matched "; is not interpreted as a command, and it also means that any redirection characters will be ignored.
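The &rem trick is easier to follow in isolation. The sketch below applies it to a made-up value with no quotes in it (the variable s and the semicolon separator are assumptions for illustration), so the ampersand is plainly unquoted once the variable has been expanded:

```batch
@echo off
setlocal

rem Hypothetical value: keep the text before the semicolon, drop the rest.
set "s=keep;drop this part"

rem After expansion the line is: set s=keep, an ampersand, then rem drop this part
set s=%s:;=&rem %

echo [%s%]
```

This should print [keep]: the set command ends at the ampersand, and the rest of the line becomes a comment.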

So what the command processor sees is:

set x=="RandonString
rem "; v_width=620; v_height=490;</script>

x is set to ="RandonString

Line 4: Now we have a problem, since %x% begins with =", and both = and " are special characters, with = being especially hard to match. Luckily, we know that the string now starts with =", so the solution is simple: skip the first two characters by telling the command processor to start at offset 2 (offsets are zero-based: character 0 is =, character 1 is ", so character 2 is R). Since Line 2 removed everything up to and including fid, and Line 3 removed everything from "; to the end of the string (in both cases including any redirection characters), %x:~2% = RandonString. With all the redirection characters removed, the variable does not need to be quoted at all.
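The substring syntax can be tried on its own with a made-up value (the variable s and the xy prefix are assumptions standing in for the two unwanted characters):

```batch
@echo off
setlocal

rem Hypothetical value whose first two characters are unwanted.
set "s=xyRandonString"

rem Offsets are zero-based, so offset 2 is the third character.
echo %s:~2%

rem A second number limits the length: 6 characters starting at offset 2.
echo %s:~2,6%
```

The first echo should print RandonString and the second Randon.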

Line 5: Simply echo the variable x.
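The answer works on a hard-coded line and only echoes the result, while the question asks to read from an HTML file and write to a text file. One possible way to connect the two is sketched below. It is not part of the original answer: the file names page.html and fid.txt are hypothetical, and the sketch assumes the file contains a line shaped like the example (fid="..." followed by ";, with no other special characters inside the value). If several lines match, the last one wins.

```batch
@echo off
setlocal DisableDelayedExpansion

rem Hypothetical file names, not part of the original answer.
set "src=page.html"
set "dst=fid.txt"

rem Take the line that contains fid= from the source file.
rem The for variable is expanded late, so special characters in the line are harmless here.
set "x="
for /f "usebackq delims=" %%L in (`findstr /c:"fid=" "%src%"`) do set "x=%%L"
if not defined x (
    echo No fid line found in %src%
    exit /b 1
)

rem Same three substitutions as in the answer.
set "x=%x:*fid=%"
set "x=%x:";="&rem %
set x=%x:~2%

rem Write the result to the text file.
>"%dst%" echo %x%
```

Putting the redirection before echo avoids a value that ends in a digit being read as a file handle number.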
