Why is the HTML source different when I open it in a web browser and when I read it in Java?


Problem description




I have a question about parsing an online HTML page.

When I open the HTML source in a web browser, I can see the data there.

But when I read the same HTML page from Java, I cannot reach the data.

After I save the HTML file and read it as a local file, I am able to read the data.

I will take eBay.com.au as an example.

//--------Example---------

Target web page URL: http://www.ebay.com.au/sch/i.html?_trksid=p3907.m570.l1311&_nkw=imac+27&_sacat=0&_from=R40

Here is my Java code:

import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.TagNode;
import org.htmlcleaner.HtmlCleaner;
import java.net.URL;


public class HtmlCleanerTest
{

    public static void main(String[] args) throws Exception
    {

        CleanerProperties props = new CleanerProperties();

        URL myURL = new URL("http://www.ebay.com.au/sch/i.html?_trksid=p3907.m570.l1311&_nkw=imac+27&_sacat=0&_from=R40");

        TagNode tagNode = new HtmlCleaner(props).clean(myURL);

        Object[] myNodes = tagNode.getElementsByAttValue("class", "s1", true, true);

        for(Object oNote : myNodes)
        {
            TagNode n = (TagNode) oNote;
            System.out.println(n.getText());

        }
    }
}

I can get each product's price with this code, but I also expected to get the seller's location info this way. How do I do that?

//---RE-edited -------------------------------

I have found a way to solve my problem, and I am posting it here for anyone who has the same problem. I am not saying it is the best solution, but I hope it gives you an idea. Here it is:

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import java.util.List;

public class Test{
    public static void main(String[] args)
    {
        WebDriver driver = new FirefoxDriver();
        driver.get("http://www.ebay.com.au/sch/i.html?scp=ce0&_sacat=0&_from=R40&_nkw=imac+27&_pppn=r1&_rdc=1");

        driver.findElement(By.id("e1-14")).click();

        driver.findElement(By.name("Stores")).click();
        driver.findElement(By.id("e1-3")).click();

        driver.quit();
    }
}

/-------------- ------END------- --------------/

I came here with one question: if an HTML page comes with JavaScript, how do we grab the data from it after the JavaScript has been completely executed? I guess I am not a very good questioner.

Solution

Probably the page has some JavaScript code that is executed by the browser and loads more data to the page, after the HTML has been loaded. Reading only the HTML with Java does not execute the JavaScript, hence additional data is not visible in the stream.
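
To make that concrete, here is a minimal sketch that uses only the Java standard library and no network access. The markup in SERVED_HTML is a made-up stand-in (not eBay's real page), and the names StaticHtmlDemo, SERVED_HTML and rawTextById are hypothetical helpers for illustration. It shows that a value assigned by a script is absent from the element when the raw markup is read without running the script.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StaticHtmlDemo {

    // Hypothetical markup (an assumption, not eBay's actual page): the
    // "location" element is empty as served and would only be filled in
    // once a browser runs the inline script.
    static final String SERVED_HTML =
        "<html><body>"
        + "<span class=\"price\">AU $1,500.00</span>"
        + "<span id=\"location\"></span>"
        + "<script>document.getElementById('location').textContent = 'Sydney';</script>"
        + "</body></html>";

    // Returns the text inside <span id="..."> exactly as it appears in the
    // raw markup, or null if there is no such element.
    // No JavaScript is executed here, which is the point of the demo.
    static String rawTextById(String html, String id) {
        Pattern p = Pattern.compile("<span id=\"" + Pattern.quote(id) + "\">(.*?)</span>");
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String location = rawTextById(SERVED_HTML, "location");
        // The element exists, but its text is empty because the script never ran.
        System.out.println("location in raw HTML: '" + location + "'");
        // The value is only present as a string literal inside the script.
        System.out.println("value present only inside script: "
            + SERVED_HTML.contains("'Sydney'"));
    }
}
```

A plain HTML parser sees the page the same way this regex does: the element is there, but the data a browser would display has not been written into it yet. A saved copy of the page may behave differently if the browser saved the DOM after the scripts had run.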

Edit: A library like HtmlUnit may help, to a certain degree, with the common problem of loading Ajaxified HTML pages: http://htmlunit.sourceforge.net/
