How Can I Download the Source Code of a Webpage


Problem description

Hello everyone, I want to download the source code of a webpage. I have tried both the URL method and the Jsoup method, but neither gives me exactly the data that appears in the actual source code. For example:

<input type="image" name="ctl00$dtlAlbums$ctl00$imbAlbumImage" id="ctl00_dtlAlbums_ctl00_imbAlbumImage" title="Independence Day Celebr..." border="0" onmouseover="AlbumImageSlideShow('ctl00_dtlAlbums_ctl00_imbAlbumImage','ctl00_dtlAlbums_ctl00_hdThumbnails','0','Uploads/imagegallary/135/Thumbnails/IMG_3206.JPG','Uploads/imagegallary/135/Thumbnails/');" onmouseout="AlbumImageSlideShow('ctl00_dtlAlbums_ctl00_imbAlbumImage','ctl00_dtlAlbums_ctl00_hdThumbnails','1','Uploads/imagegallary/135/Thumbnails/IMG_3206.JPG','Uploads/imagegallary/135/Thumbnails/');" src="Uploads/imagegallary/135/Thumbnails/IMG_3206.JPG" alt="Independence Day Celebr..." style="height:79px;width:148px;border-width:0px;" />

In this tag, the last attribute, style, is not picked up by the Jsoup code. And if I download the page with the URL method, the style attribute is changed to border=""/>.
Can anybody tell me how to download the exact source code of a webpage?
My code is:

import java.io.*;
import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

URL url = new URL("http://www.apcob.org/");
InputStream is = url.openStream();  // throws an IOException
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line;
File fileDir = new File(contextpath + "\\extractedtxt.txt");
Writer fw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(fileDir), "UTF8"));
while ((line = br.readLine()) != null)
{
    // System.out.println("line\n " + line);
    fw.write("\n" + line);
}
fw.close();  // flush buffered output before the file is read back
br.close();

InputStream in = new FileInputStream(fileDir);  // read the same file that was written above
String baseUrl = "http://www.apcob.org/";
Document doc = Jsoup.parse(in, "UTF-8", baseUrl);
System.out.println(doc);




The second method I tried is:

Document doc = Jsoup.connect(url_of_currentpage).get();

Solution
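One way to get the page exactly as the server sends it is to copy the response stream to a file byte for byte, without decoding it into lines and without running it through an HTML parser. A parser such as jsoup builds a document tree, and printing that tree produces a normalized serialization that need not match the source character for character. The following is a minimal sketch under stated assumptions, not a definitive implementation: it assumes Java 11 or later, and the class name `RawPageDownloader`, the method names `saveRaw` and `download`, and the output file name `extractedtxt.html` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RawPageDownloader {

    // Copies the stream to the target file byte for byte:
    // no charset decoding, no line splitting, no HTML parsing.
    // Returns the number of bytes written.
    static long saveRaw(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Opens the URL and stores the response body unchanged.
    static long download(String pageUrl, Path target) throws IOException {
        try (InputStream in = URI.create(pageUrl).toURL().openStream()) {
            return saveRaw(in, target);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output file in the working directory.
        Path target = Path.of("extractedtxt.html");
        long bytes = download("http://www.apcob.org/", target);
        System.out.println("Saved " + bytes + " bytes to " + target);
    }
}
```

The saved file holds the HTML as served. It will still differ from what a browser's inspector shows if scripts modify the page after it loads.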
