How Can I Download The Source Code Of A Webpage

Views: 94   Published: 2019/6/11 16:47:32   Java, HTML

Problem description

Hello everyone, I want to download the source code of a webpage. I have tried a URL-based method and a Jsoup method, but neither gives me exactly the markup that appears in the actual page source. For example:

<input type="image" name="ctl00$dtlAlbums$ctl00$imbAlbumImage" id="ctl00_dtlAlbums_ctl00_imbAlbumImage" title="Independence Day Celebr..." border="0" onmouseover="AlbumImageSlideShow('ctl00_dtlAlbums_ctl00_imbAlbumImage','ctl00_dtlAlbums_ctl00_hdThumbnails','0','Uploads/imagegallary/135/Thumbnails/IMG_3206.JPG','Uploads/imagegallary/135/Thumbnails/');" onmouseout="AlbumImageSlideShow('ctl00_dtlAlbums_ctl00_imbAlbumImage','ctl00_dtlAlbums_ctl00_hdThumbnails','1','Uploads/imagegallary/135/Thumbnails/IMG_3206.JPG','Uploads/imagegallary/135/Thumbnails/');" src="Uploads/imagegallary/135/Thumbnails/IMG_3206.JPG" alt="Independence Day Celebr..." style="height:79px;width:148px;border-width:0px;" />

In this tag, the last attribute, style, is not detected by the Jsoup code, and when I download the page with the URL method, the style attribute is turned into border=""/>.
Can anybody tell me how to download the exact source code of a webpage?
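Before changing the download code, it can help to check whether the server actually sent the style attribute at all, independently of any HTML parser. The sketch below looks up an element by its id in the raw HTML text and returns the tag exactly as it appears. `TagFinder` and `findTagById` are hypothetical names for illustration, and this is a rough string search: it assumes the tag's attribute values contain no `<` or `>` characters (true for the tag above, not for HTML in general) and that no other attribute such as `data-id` carries the same value.

```java
public class TagFinder {

    /**
     * Returns the raw text of the tag that carries id="<id>", or null if it is not found.
     * Plain string search, no parsing: assumes the tag's attribute values contain no '<' or '>'.
     */
    public static String findTagById(String html, String id) {
        int idPos = html.indexOf("id=\"" + id + "\"");
        if (idPos < 0) {
            return null; // no element with that id in the raw text
        }
        int start = html.lastIndexOf('<', idPos); // '<' that opens the enclosing tag
        int end = html.indexOf('>', idPos);       // first '>' after the id attribute
        if (start < 0 || end < 0) {
            return null;
        }
        return html.substring(start, end + 1);
    }

    public static void main(String[] args) {
        // Shortened version of the tag from the question, wrapped in other markup
        String html = "<div><input type=\"image\" id=\"ctl00_dtlAlbums_ctl00_imbAlbumImage\""
                + " border=\"0\" style=\"height:79px;width:148px;border-width:0px;\" /></div>";
        System.out.println(findTagById(html, "ctl00_dtlAlbums_ctl00_imbAlbumImage"));
    }
}
```

Running this on the raw text of a download shows whether the style attribute was already missing in what the server returned, or only disappeared after parsing.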
My code is:

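import java.io.*;
import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class PageSource {
    static void download(String contextpath) throws IOException {
        URL url = new URL("http://www.apcob.org/");
        // Use one path for writing and reading (originally "\\extractedtxt.txt" vs "extractedtxt.txt")
        File fileDir = new File(contextpath, "extractedtxt.txt");
        // try-with-resources closes, and therefore flushes, the writer; otherwise the file can stay empty or truncated
        try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream())); // throws an IOException
             Writer fw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(fileDir), "UTF8"))) {
            String line;
            while ((line = br.readLine()) != null) {
                // System.out.println("line\n " + line);
                fw.write("\n" + line);
            }
        }

        String baseUrl = "http://www.apcob.org/";
        // Parse the saved copy (the stray ';' inside the File constructor call has been removed)
        try (InputStream in = new FileInputStream(fileDir)) {
            Document doc = Jsoup.parse(in, "UTF-8", baseUrl);
            System.out.println(doc);
        }
    }
}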
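The readLine() loop above does not produce an exact copy of the page: readLine() strips the original line terminators (so "\r\n" becomes "\n"), the loop writes an extra newline at the start of the file, and an InputStreamReader created without a charset decodes with the platform default encoding, which may not match the page. If the goal is the source exactly as the server sent it, one option is to copy the raw bytes without decoding them at all. A minimal JDK-only sketch, where `RawDownload` and `saveRaw` are hypothetical names:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class RawDownload {

    /** Copies the response body to target byte for byte: no line splitting, no charset decoding. */
    public static long saveRaw(URL source, Path target) throws IOException {
        try (InputStream in = source.openStream()) {
            // Returns the number of bytes copied; overwrites target if it already exists
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Same page and file name as in the question; the target directory is an assumption
        URL url = URI.create("http://www.apcob.org/").toURL();
        Path target = Paths.get("extractedtxt.txt");
        long bytes = saveRaw(url, target);
        System.out.println("Saved " + bytes + " bytes to " + target.toAbsolutePath());
    }
}
```

If the saved file is then parsed with Jsoup, keep in mind that printing the resulting Document shows Jsoup's own re-serialization of the parsed tree, not the saved bytes, so the printed markup can differ from the file even when the file is exact.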

The second method I tried is:

Document doc = Jsoup.connect(url_of_currentpage).get();
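Another thing worth ruling out is that the server itself sends different markup to different clients. The ctl00$... names suggest an ASP.NET WebForms page, and ASP.NET can adapt the HTML it renders to the browser it detects from the User-Agent header, so a request from Java may not receive the same markup a desktop browser does. That is an assumption about this particular site, not something the question confirms. The sketch below sets a browser-like User-Agent on a plain URLConnection and then checks whether the style attribute from the question is present. The header value is only an example string, and `BrowserLikeRequest` and `open` are hypothetical names. With Jsoup, the equivalent is `Jsoup.connect(url).userAgent(...)`, and `execute().body()` returns the response text without parsing it into a Document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class BrowserLikeRequest {

    // Example desktop-browser User-Agent string; any current browser's string could be used instead
    static final String BROWSER_UA =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/120.0 Safari/537.36";

    /** Prepares (but does not yet send) a request that identifies itself like a desktop browser. */
    public static URLConnection open(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection(); // no network I/O yet
        conn.setRequestProperty("User-Agent", BROWSER_UA);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        URLConnection conn = open("http://www.apcob.org/");
        // The request is actually sent when the input stream is first requested
        try (InputStream in = conn.getInputStream()) {
            // Decoding as UTF-8 is an assumption; check the page's Content-Type header or <meta> charset
            String html = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            System.out.println(html.contains("style=\"height:79px;width:148px;border-width:0px;\""));
        }
    }
}
```

Comparing this output with the same request made without the header would show whether the User-Agent changes the markup the server returns.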
