How Can I Download the Source Code of a Webpage
Problem description
Hello everyone, I want to download the source code of a webpage. I have tried both the URL method and Jsoup, but neither gives me exactly the data that appears in the actual page source. For example:
<input type="image" name="ctl00$dtlAlbums$ctl00$imbAlbumImage" id="ctl00_dtlAlbums_ctl00_imbAlbumImage" title="Independence Day Celebr..." border="0" onmouseover="AlbumImageSlideShow('ctl00_dtlAlbums_ctl00_imbAlbumImage','ctl00_dtlAlbums_ctl00_hdThumbnails','0','Uploads/imagegallary/135/Thumbnails/IMG_3206.JPG','Uploads/imagegallary/135/Thumbnails/');" onmouseout="AlbumImageSlideShow('ctl00_dtlAlbums_ctl00_imbAlbumImage','ctl00_dtlAlbums_ctl00_hdThumbnails','1','Uploads/imagegallary/135/Thumbnails/IMG_3206.JPG','Uploads/imagegallary/135/Thumbnails/');" src="Uploads/imagegallary/135/Thumbnails/IMG_3206.JPG" alt="Independence Day Celebr..." style="height:79px;width:148px;border-width:0px;" />
In this tag, the last attribute, style, is not detected by my Jsoup code. And if I download the page with the URL method, the style attribute is replaced by border=""/>.
Can anybody tell me how to download the exact source code of a webpage?
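One possible cause, offered as an assumption since the question does not confirm it: the ctl00_... ids suggest an ASP.NET WebForms page, and ASP.NET can render different markup depending on the User-Agent request header. For a client it does not recognise, such as the default Java agent sent by url.openStream(), it may fall back to older HTML 3.2-style output, which could explain style turning into border="". A minimal sketch of sending a browser-like User-Agent with HttpURLConnection; the class name and the agent string are arbitrary examples, not values the site is known to require:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class BrowserLikeRequest {
    // Example desktop-browser User-Agent (assumption: any current browser string should do)
    public static final String BROWSER_UA =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36";

    /** Creates (but does not yet connect) an HTTP request that identifies itself with the given agent. */
    public static HttpURLConnection prepare(URL url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }

    /** Connects and returns the response body stream; the caller must close it. */
    public static InputStream open(URL url) throws IOException {
        return prepare(url, BROWSER_UA).getInputStream();
    }
}
```

For example, BrowserLikeRequest.open(url) could stand in for url.openStream() in the code that follows. With Jsoup, Connection.userAgent(String) sets the same header, e.g. Jsoup.connect(url).userAgent(BrowserLikeRequest.BROWSER_UA).get().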
My code is:
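import java.io.*;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class PageDownloader {
    public static void main(String[] args) throws IOException {
        String contextpath = args[0]; // output directory; assumed here to be passed on the command line
        URL url = new URL("http://www.apcob.org/");
        File fileDir = new File(contextpath, "extractedtxt.txt"); // same file for writing and for reading back
        // Copy the page into the file line by line; UTF-8 is assumed, matching the Jsoup.parse call below
        try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)); // throws an IOException
             Writer fw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(fileDir), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                fw.write("\n" + line);
            }
        } // closing the writer flushes it, so the file is complete before it is parsed
        String baseUrl = "http://www.apcob.org/";
        try (InputStream in = new FileInputStream(fileDir)) {
            Document doc = Jsoup.parse(in, "UTF-8", baseUrl);
            System.out.println(doc);
        }
    }
}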
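Apart from what the server sends, this copy step cannot reproduce the page byte for byte: readLine() strips the original line terminators (\r\n or \n), the lines are written back with "\n", and the text is decoded and re-encoded along the way, which can alter characters if the charset guess is wrong. A minimal sketch that saves the response bytes unchanged instead; ExactCopy and copyExact are hypothetical names:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ExactCopy {
    /**
     * Writes every byte from in to target unchanged (no decoding, no line handling)
     * and returns the number of bytes copied. The input stream is not closed.
     */
    public static long copyExact(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

For example, try (InputStream is = url.openStream()) { ExactCopy.copyExact(is, fileDir.toPath()); } writes the file as exactly the bytes received. Files.copy leaves the input stream open, hence the try-with-resources around it.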
The second method I followed is:
Document doc = Jsoup.connect(url_of_currentpage).get();
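Jsoup.connect(...).get() returns a parsed Document, and printing it shows jsoup's re-serialisation of that tree, so details such as whitespace, attribute quoting and the /> on void elements can differ from the original. To keep the response text itself, jsoup's Connection.execute() returns a Connection.Response whose body() method gives the body as a String; if style is already missing there, the server never sent it. Outside jsoup, the equivalent step is decoding the raw bytes with the charset the server declares rather than the platform default. A minimal stdlib sketch, assuming the charset is read only from the Content-Type header (a <meta charset> inside the page is ignored) with UTF-8 as the fallback; ContentTypeCharset is a hypothetical name:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {
    /** Returns the charset named in a Content-Type header value, or fallback if none is named or it is unknown. */
    public static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Strip optional quotes, e.g. charset="utf-8"
                String name = param.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) { // illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    /** Decodes raw response bytes with the declared charset, falling back to UTF-8 (an assumption). */
    public static String decode(byte[] body, String contentType) {
        return new String(body, fromContentType(contentType, StandardCharsets.UTF_8));
    }
}
```

For instance, ContentTypeCharset.decode(bytes, conn.getContentType()), where conn is a hypothetical HttpURLConnection used for the download and bytes is the body read from it.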