Need to parse out string from HTML document in a batch file


Problem description




I tried searching but couldn't find anything specific to what I need.

This is an excerpt from my HTML file:

<div id="pair_today">
    <div class="left_block">
        <div class="tpl_box">
            <h1 style="margin-top:5px;color:#ec1b1b;">
            <span style="font-size:15px;color:#000;">1 Australian Dollar =</span><br /> 93.663 Japanese Yen</h1>

                        <span style="display:inline-block; margin-top:10px; text-align:right; align:right; font-size:12px; color:#9c9c9c">rate on Fri, 6 March, 2015 15:58:20 (AEDT)</span>

           <a href="http://fx-rate.net/AUD/JPY/currency-transfer/" title="Currenty Transfer from Australia to Japan" style="float:right" class="btn" onclick="ga('send','event', {'eventCategory': 'CurrencyTransfer', 'eventAction' : 'click','eventLabel':'Today Box'});"><span class="btn-ico btn-ico-go">Get Rate</span></a>
           </span>

I need to parse out the 93.663 value from line 5. This value will be different every time I have to run the script, so I figured regex would be the best way to specifically target this value.

I've been tinkering with for /f loops but I don't know how to implement regex into the script.

Thanks guys!

Solution

Use Windows Script Host (VBScript or JScript). Use the htmlfile COM object. Parse the DOM. Then you can massage the innerText as needed with a regexp.

Here you go. Save this as a .bat file, modify the set "htmlfile=test.html" line as needed, and run it. (Derived from this answer. Documentation for the htmlfile COM object in WSH is sparse; but if you would like to learn more about it, follow that bread crumb.)

@if (@CodeSection == @Batch) @then

@echo off
setlocal

set "htmlfile=test.html"

rem // invoke JScript hybrid code and capture its output
for /f %%I in ('cscript /nologo /e:JScript "%~f0" "%htmlfile%"') do set "converted=%%I"

echo %converted%

rem // end main runtime
goto :EOF

@end // end batch / begin JScript chimera

var fso = WSH.CreateObject('scripting.filesystemobject'),
    DOM = WSH.CreateObject('htmlfile'),
    htmlfile = fso.OpenTextFile(WSH.Arguments(0), 1),
    html = htmlfile.ReadAll();

DOM.write(html);
htmlfile.Close();

var scrape = DOM.getElementById('pair_today').getElementsByTagName('h1')[0].innerText;
// index 1 is the captured token after "="; index 0 would be the entire match
WSH.Echo(scrape.match(/^.*=\s+(\S+).*$/)[1]);
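
To see what the regexp in the script picks out, here is a minimal standalone sketch. It assumes the h1's innerText comes back roughly as "1 Australian Dollar =", a line break, then "93.663 Japanese Yen"; the exact whitespace depends on the HTML engine. The names extractRate and sampleText are hypothetical and not part of the script above.

```javascript
// Hypothetical helper: returns the first non-whitespace token after "=",
// or an empty string when the text does not have the expected shape.
function extractRate(text) {
    var m = text.match(/^.*=\s+(\S+).*$/);
    // m[0] would be the whole matched text; m[1] is the captured token.
    return m ? m[1] : '';
}

// Assumed innerText of the <h1> from the excerpt (the <br /> becomes a line break).
var sampleText = '1 Australian Dollar =\r\n93.663 Japanese Yen';
var rate = extractRate(sampleText);
```

Inside the hybrid script you would pass scrape to such a helper and print the result with WSH.Echo. Returning an empty string on a failed match keeps the script from throwing if the page layout changes.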


You know, as long as you're invoking Windows Script Host anyway, if you're acquiring your html file using wget or similar, you might be able to get rid of that dependency. Unless the page you're downloading uses a convoluted series of cookies and session redirects, you can replace wget with the Microsoft.XMLHTTP COM object and download the page via XHR (or as those with less organized minds would say, Ajax). (Based on fetch.bat.)

@if (@CodeSection == @Batch) @then

@echo off
setlocal

set "from=%~1"
set "to=%~2"
set "URL=http://host.domain/currency?from=%from%&to=%to%"

for /f "delims=" %%I in ('cscript /nologo /e:jscript "%~f0" "%URL%"') do set "conv=%%I"

echo %conv%

rem // end main runtime
goto :EOF

@end // end batch / begin JScript chimera

var x = WSH.CreateObject("Microsoft.XMLHTTP"),
    DOM = WSH.CreateObject('htmlfile');

x.open("GET",WSH.Arguments(0),true);
x.setRequestHeader('User-Agent','XMLHTTP/1.0');
x.send('');
while (x.readyState!=4) {WSH.Sleep(50)};

DOM.Write(x.responseText);

var scrape = DOM.getElementById('pair_today').getElementsByTagName('h1')[0].innerText;
// index 1 is the captured token after "="; index 0 would be the entire match
WSH.Echo(scrape.match(/^.*=\s+(\S+).*$/)[1]);
