Can we fetch/scrape particular data from an HTML site in a batch script to do the following?


Problem description

Awesome work by the guys in this post on this site, but I need to modify this script to fit my needs.

I am not a hardcore coder, but with a little help I can get it working.


My website shows a timestamp in HH:MM AM/PM format.
This is what appears on the page:

Last Updated as of 11:14 pm March 11th

I would like to parse that time (11:14 PM in the example above), subtract it from the current timestamp, and get the difference in minutes. If the difference is greater than 30 minutes, I want the script to do something.

Can someone help me with this? Thanks


I am trying my best to understand and modify this code.
FYI, google.com is just an example; in practice it would be a link to my site.

@echo off
setLocal EnableDelayedExpansion
set curTimestamp=%date:~7,2% %date:~3,3%_%date:~10,4%_%time:~0,2%_%time:~3,2%
FOR /F "TOKENS=*" %%A IN ('TIME/T') DO SET Now=%%A


for %%F in (q r s t u v w x y z) do if exist %%F del %%F


wget http://google.com

find "Updated" < index.html > q


for /f "tokens=1-4 delims==" %%a in (q) do (
echo %%d >> r
)
for /f "tokens=2-3 delims=>" %%a in (r) do (
echo %%a %%b >> s
)
for /f "tokens=1-2 delims=<" %%a in (s) do (
echo %%a %%b >> t
)
for /f "tokens=1-2 delims=^/" %%a in (t) do (
echo %%a %%b >> u
)
for /f "tokens=1,3 delims= " %%a in (u) do (
echo %%b %%a >> v
)
for /f "tokens=1-4 delims=- " %%a in (v) do (
echo %%c %%b %%a %%d >> w
)
for /f "tokens=* delims= " %%a in (w) do (
set str=%%a
set str=!str:Jan=01!
set str=!str:Feb=02!
set str=!str:Mar=03!
set str=!str:Apr=04!
set str=!str:May=05!
set str=!str:Jun=06!
set str=!str:Jul=07!
set str=!str:Aug=08!
set str=!str:Sep=09!
set str=!str:Oct=10!
set str=!str:Nov=11!
set str=!str:Dec=12!
echo !str! >> x
)
sort < x > y
for /f "tokens=4 delims= " %%a in (y) do (
set AVG=%%a
)
echo wget www.google.com
echo %curTimestamp%
ECHO %Now%
::for %%F in (q r s t u v w x y z) do if exist %%F del %%F
Pause

Solution

Here's a batch script / JScript hybrid that uses JScript's ability to do arithmetic with dates. Since the web page never specifies the year, this script assumes the current year, so you might get unexpected results if you run it at the beginning of January against a document last modified in December. Anyway, here it is.

@if (@X)==(@Y) @end /* (batch + jscript hybrid script init)

:: *** Batch script *****

@echo off
setlocal enabledelayedexpansion
for /f "delims=" %%I in ('wget "%~1" -O- -q 2^>NUL ^| findstr /i "last.*updated.*as.*of"') do (
    for /f "delims=" %%x in ('cscript /nologo /e:jscript "%~f0" "%%I"') do (

        rem test whether date diff >= 30 minutes
        set /a "thirtyMinutes = 30 * 60 * 1000"
        if %%x GEQ !thirtyMinutes! (
            echo Do that voodoo that you do.
        )

        rem Just to demonstrate, you can do some maths to make further sense of the date difference.
        rem set milliseconds=%%x
        set /a "seconds = %%x / 1000, seconds %%= 60"
        set /a "minutes = %%x / 1000 / 60, minutes %%= 60"
        set /a "hours = %%x / 1000 / 60 / 60, hours %%= 24"
        set /a "days = %%x / 1000 / 60 / 60 / 24"
        echo %~1 last modified !days! days !hours! hours !minutes! minutes ago.
    )

    rem Once the for loop has fired, exit.
    exit /b
)

rem In case the web page does not contain "last updated as of"
exit /b


:: *** JScript script *****/
var args = [];
for (var i=0; i<WScript.arguments.length; i++) { args.push(WScript.arguments(i)) }
var t = args.join(' ').replace(/^\s+|<[^>]+>|\s+$/g,'').replace(/\&nbsp;/g, ' ').split(' ');
var h = t[4].split(':')[0] % 12;    // 12-hour clock: 12 AM -> 0, 12 PM -> 0 (12 is added below)
if (/pm/i.test(t[5])) h += 12;
var ds = t[6] + ' ' + t[7] + ', ' + new Date().getFullYear() + ' ' + h + ':' + t[4].split(':')[1];
var diff = new Date() - new Date(ds);
WScript.echo(diff);

Example output:

C:\Users\me\Desktop>test.bat http://stackoverflow.com/questions/15364653/
Do that voodoo that you do.
http://stackoverflow.com/questions/15364653/ last modified 0 days 23 hours 18 minutes ago.
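
The January caveat and the 12 AM / 12 PM edge cases could be handled in the script portion along the following lines. This is only a sketch, not a drop-in replacement: parseStamp and minutesSince are hypothetical helper names, the page is assumed to spell month names out in full English ("March", not "Mar"), and a stamp that lands in the future is assumed to belong to the previous year (clock skew between the server and the local machine would break that assumption). It is written in plain ES3-style JavaScript so it runs under Node as shown; under cscript, swap console.log for WScript.echo.

```javascript
// Hypothetical helpers; names and month list are illustrative assumptions.
var MONTHS = ['january', 'february', 'march', 'april', 'may', 'june', 'july',
              'august', 'september', 'october', 'november', 'december'];

// Parse text such as "Last Updated as of 11:14 pm March 11th".
// `now` is the reference Date. Returns a Date, or null if nothing matches.
function parseStamp(text, now) {
  var m = /(\d{1,2}):(\d{2})\s*([ap]m)\s+([a-z]+)\s+(\d{1,2})/i.exec(text);
  if (!m) return null;

  var month = -1;
  for (var i = 0; i < MONTHS.length; i++) {
    if (MONTHS[i] === m[4].toLowerCase()) month = i;
  }
  if (month < 0) return null;

  // 12-hour to 24-hour: 12 AM -> 0, 12 PM -> 12, 11 PM -> 23.
  var hour = parseInt(m[1], 10) % 12;
  if (/pm/i.test(m[3])) hour += 12;

  var stamp = new Date(now.getFullYear(), month, parseInt(m[5], 10),
                       hour, parseInt(m[2], 10));

  // Assumption: a stamp later than "now" was really written last year
  // (for example, a December update read in early January).
  if (stamp.getTime() > now.getTime()) {
    stamp.setFullYear(now.getFullYear() - 1);
  }
  return stamp;
}

// Whole minutes elapsed between stamp and now.
function minutesSince(stamp, now) {
  return Math.floor((now.getTime() - stamp.getTime()) / 60000);
}

// Usage with a fixed reference time so the result is repeatable.
var now = new Date(2013, 2, 12, 22, 33);   // 12 March 2013, 10:33 PM local time
var stamp = parseStamp('Last Updated as of 11:14 pm March 11th', now);
var age = minutesSince(stamp, now);
if (age > 30) {
  console.log('Page is ' + age + ' minutes old; do that voodoo.');
}
```

Working in whole minutes also keeps the number small enough for the batch side: set /a only handles 32-bit integers, so a millisecond difference overflows after roughly 24 days.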
