How can I find the source code for a website using only cmd?


Problem description

Ok, so here's the web page: https://www.faa.gov/air_traffic/flight_info/aeronav/digital_products/vfr/

What I want to do is download the source of that web page (the equivalent of right-clicking in a browser and selecting View Source), but I need to do it in a batch file without the use of outside tools like wget. I know how to download files using bitsadmin in a batch file, but I'm running into trouble because I don't know the actual URL of the web page. I've tried adding index.html and index.htm and all sorts of page names to the end and none of them are valid. So how can I find the ACTUAL page name to download?

More info for those who care: the purpose is to parse the code to determine the ever-changing filenames of the GEO-TIFF files on the page, then download those files automatically (rather than needing to manually right-click on each file and save-as about 55 times).

Solution

You could use the Microsoft.XMLHTTP COM object in Windows Script Host (VBScript or JScript). Here's a hybrid Batch + JScript example (should be saved with a .bat extension):

@if (@CodeSection == @Batch) @then
@echo off & setlocal

set "url=https://www.faa.gov/air_traffic/flight_info/aeronav/digital_products/vfr/"

cscript /nologo /e:JScript "%~f0" "%url%"

goto :EOF
@end // end Batch / begin JScript

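// Create the XMLHTTP COM object that ships with Windows
var xhr = WSH.CreateObject('Microsoft.XMLHTTP');

// Asynchronous GET of the URL passed in as the first script argument
xhr.open('GET', WSH.Arguments(0), true);
xhr.setRequestHeader('User-Agent','XMLHTTP/1.0');
xhr.send('');
// Poll until the request has completed (readyState 4)
while (xhr.readyState != 4) WSH.Sleep(50);

// Write the page source to stdout
WSH.Echo(xhr.responseText);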

Example usage would be something like scriptname.bat > saved.html. Or since you're going this far, you might as well let JScript turn that raw HTML data into something useful. Here's an example that scrapes all the tables on that page using DOM methods, builds an object of the table data, then serializes it into JSON for easier parsing or deserialization by other tools:

@if (@CodeSection == @Batch) @then
@echo off & setlocal

set "url=https://www.faa.gov/air_traffic/flight_info/aeronav/digital_products/vfr/"

cscript /nologo /e:JScript "%~f0" "%url%"

goto :EOF
@end // end Batch / begin JScript

var xhr = WSH.CreateObject('Microsoft.XMLHTTP'),
    DOM = WSH.CreateObject('htmlfile'),
    JSON, obj = {};

xhr.open('GET', WSH.Arguments(0), true);
xhr.setRequestHeader('User-Agent','XMLHTTP/1.0');
xhr.send('');
while (xhr.readyState != 4) WSH.Sleep(50);

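// Force IE9 document mode so the htmlfile document exposes a JSON object
DOM.write('<meta http-equiv="x-ua-compatible" content="IE=9" />'
    + xhr.responseText);

// Borrow JSON from the parsed document, since this JScript host has none of its own
JSON = DOM.parentWindow.JSON;

var tables = DOM.getElementsByTagName('table');

for (var i=0; i<tables.length; i++) {
    var cols = [],
        rows = tables[i].rows,
        // Key each table by its caption, falling back to its index
        caption = tables[i].caption ? tables[i].caption.innerText : i;

    for (var j=0; j<rows.length; j++) {
        if (!cols.length) {
            // First row: remember the column headings
            for (var k=0; k < rows[j].cells.length; k++) {
                var cell = rows[j].cells[k].innerText;
                cols.push(cell);
            }
            obj[caption] = {};
        } else {
            // Data row: the first cell names the row, the rest are its values
            var row = rows[j].cells[0].innerText;
            obj[caption][row] = {};
            for (var k=1; k < rows[j].cells.length; k++) {
                var a = rows[j].cells[k].getElementsByTagName('a'),
                    // Array from the document's own script context, so that
                    // the document's JSON.stringify recognizes it as an array
                    links = new DOM.parentWindow.Array();
                if (a && a.length) {
                    // Cell contains links: store their URLs
                    for (var l=0; l<a.length; l++) links.push(a[l].href);
                    obj[caption][row][cols[k]] = links;
                } else {
                    // No links: store the cell text
                    obj[caption][row][cols[k]] = rows[j].cells[k].innerText;
                }
            }
        }
    }
}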

WSH.Echo(JSON.stringify(obj, null, '    '));
DOM.close();

That lets you do neat stuff like query the data in a hierarchical structure, like this PowerShell script (saved with a .ps1 extension):

add-type -as System.Web.Extensions
$JSON = New-Object Web.Script.Serialization.JavaScriptSerializer
$data = cmd /c test.bat
$obj = $JSON.DeserializeObject($data)
$obj['Helicopter Route Charts']['Boston']['Current Edition No. and Date']

This all works with functionality built into Windows, without requiring any third-party applications or downloads beyond the web request to faa.gov.
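
Since the stated goal is to download the chart files rather than just read the table, the nested object built by the scraper could also be walked to pull out only the download links. The following is a minimal sketch of that idea, not part of the original answer: `collectLinks` and `fileNameOf` are hypothetical helper names, the sample data and its `example.invalid` URLs are made up, and the `.zip` pattern is only an assumption about how the files on the page are named. It is written in plain ES3-style JavaScript so that it should be usable from JScript as well, though that has not been verified here.

```javascript
// Hypothetical helper: walk the nested object produced by the scraper and
// collect every string value (i.e. link URL or cell text) matching a pattern.
function collectLinks(node, pattern, found) {
    found = found || [];
    if (typeof node === 'string') {
        if (pattern.test(node)) found.push(node);
    } else if (node && typeof node === 'object') {
        // Recurse into both objects (tables, rows) and arrays (link lists)
        for (var key in node) {
            if (Object.prototype.hasOwnProperty.call(node, key)) {
                collectLinks(node[key], pattern, found);
            }
        }
    }
    return found;
}

// Hypothetical helper: the part of a URL after the last slash,
// ignoring any query string.
function fileNameOf(url) {
    return url.split('?')[0].replace(/^.*\//, '');
}

// Made-up sample shaped like the scraper's output: table -> row -> column.
var sample = {
    'Sectional Charts': {
        'Example City': {
            'Current Edition': [
                'https://example.invalid/charts/Example_City.zip',
                'https://example.invalid/charts/Example_City.pdf'
            ],
            'Next Edition': 'Not yet available'
        }
    }
};

// Assumption: the files of interest end in .zip (no "g" flag, so that
// repeated pattern.test() calls do not carry state between strings).
var zips = collectLinks(sample, /\.zip$/i);
```

In the hybrid script, something like this would presumably be called on `obj` just before the final `WSH.Echo`, and each collected URL, together with the name from `fileNameOf`, could then be handed to whatever download step is already in use, such as the bitsadmin approach mentioned in the question.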
