Find XPath of an element in HTML page content using Java


Question

I'm a beginner with XPath expressions.

My URL (the Newark product page used in the code below) holds HTML page content. In JavaScript, each of the following XPaths returns the same ul element:

  1. //*[@id="moreStock_5257711"]
  2. //*[@id="priceWrap"]/div[1]/div/a/following-sibling::ul
  3. //html/body/div/div/div/div/div/div/div/div/div/div/a/following-sibling::ul
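For comparison, the JDK's built-in javax.xml.xpath engine implements XPath 1.0, including the following-sibling axis, so it can evaluate all three expressions above once the page exists as a well-formed W3C DOM. The sketch below is an illustration and is not taken from the question or the answer. The markup returned by sampleXhtml() is a hypothetical, hand-built stand-in that mimics the nesting the XPaths imply (ten nested divs, the eighth with id="priceWrap"), and the class name SameUlXPaths is made up. For the real page, the HTML would first have to be turned into a DOM; HtmlCleaner's DomSerializer class is one option worth checking in the documentation for your version.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class SameUlXPaths {

    // The three XPaths from the question, with single quotes so they fit in Java strings.
    static final String[] XPATHS = {
        "//*[@id='moreStock_5257711']",
        "//*[@id='priceWrap']/div[1]/div/a/following-sibling::ul",
        "//html/body/div/div/div/div/div/div/div/div/div/div/a/following-sibling::ul"
    };

    // Hypothetical, well-formed stand-in for the real page: ten nested divs, the eighth
    // has id="priceWrap", and the innermost one holds an <a> followed by the <ul>.
    static String sampleXhtml() {
        StringBuilder sb = new StringBuilder("<html><body>");
        for (int i = 1; i <= 10; i++) {
            sb.append(i == 8 ? "<div id='priceWrap'>" : "<div>");
        }
        sb.append("<a href='#'>More stock</a><ul id='moreStock_5257711'><li>item</li></ul>");
        for (int i = 1; i <= 10; i++) {
            sb.append("</div>");
        }
        return sb.append("</body></html>").toString();
    }

    // Parses the markup, then evaluates each XPath with the JDK's XPath 1.0 engine.
    // Each entry is the first matching node in document order, or null if nothing matched.
    static List<Node> evaluateAll(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<Node> results = new ArrayList<>();
        for (String expr : XPATHS) {
            results.add((Node) xpath.evaluate(expr, doc, XPathConstants.NODE));
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        List<Node> nodes = evaluateAll(sampleXhtml());
        for (int i = 0; i < XPATHS.length; i++) {
            Node n = nodes.get(i);
            System.out.println(XPATHS[i] + " -> " + (n == null ? "no match" : n.getNodeName()));
        }
        // isSameNode checks identity: all three expressions should select one and the same node.
        boolean same = nodes.get(0) != null
                && nodes.get(0).isSameNode(nodes.get(1))
                && nodes.get(0).isSameNode(nodes.get(2));
        System.out.println("same node: " + same);
    }
}
```

Comparing with isSameNode (rather than comparing text) confirms the three expressions select a single node, not three equal-looking copies.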

Using these XPaths, how should I get the same ul element in Java?

I have tried HtmlCleaner, but it fails for these XPaths:

//*[@id="priceWrap"]/div[1]/div/a/following-sibling::ul
//html/body/div/div/div/div/div/div/div/div/div/div/a/following-sibling::ul

It does work for the XPath //*[@id='moreStock_5257711']. Below is the code I tried with HtmlCleaner:

package com.test.htmlcleaner.HtmlCleaner;

import java.io.IOException;

import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.htmlcleaner.XPatherException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Test {
    public static void main(String[] args) {
        try {
            // Configure HtmlCleaner
            HtmlCleaner htmCleaner = new HtmlCleaner();
            CleanerProperties cleanerProperties = htmCleaner.getProperties();
            cleanerProperties.setTranslateSpecialEntities(true);
            cleanerProperties.setTransResCharsToNCR(true);
            cleanerProperties.setOmitComments(true);

            // Download the page with Jsoup (30 s timeout, browser-like user agent)
            String s = "http://www.newark.com/white-rodgers/586-902/contactor-spst-no-12vdc-200a-bracket/dp/35M1913?MER=PPSO_N_P_EverywhereElse_None";
            Document doc = Jsoup.connect(s)
                    .timeout(30000)
                    .userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.120 Safari/535.2")
                    .get();

            // Re-parse Jsoup's serialized HTML with HtmlCleaner and evaluate the XPath on its tree
            String pageContent = doc.toString();
            TagNode node = htmCleaner.clean(pageContent);
            Object[] statsNode = node.evaluateXPath("//*[@id='moreStock_5257711']");

            // Print the text of every matching element
            for (Object result : statsNode) {
                TagNode resultNode = (TagNode) result;
                System.out.println("Element Text : " + resultNode.getText().toString().trim());
            }
        } catch (IOException e) {
            // Network or I/O failure while fetching the page
            e.printStackTrace();
        } catch (XPatherException e) {
            // HtmlCleaner could not evaluate the XPath expression
            e.printStackTrace();
        }
    }
}

I need all of these XPaths to work with a single Java package.

Can anyone suggest how to get all of these XPath expressions for the ul element working in Java?

Thanks & regards.

Recommended answer

Try debugging the actual HTML DOM tree that HtmlCleaner creates, using the following code:

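import java.io.StringWriter;
import org.htmlcleaner.PrettyHtmlSerializer;

String pageContent = doc.toString();
TagNode node = htmCleaner.clean(pageContent);

// Serialize the cleaned tree so you can see exactly what HtmlCleaner built
StringWriter buffer = new StringWriter();
node.serialize(new PrettyHtmlSerializer(cleanerProperties), buffer);

System.out.println(buffer.toString());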

Now apply each of the XPaths to this buffered output and check why they don't match.
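One way to follow that advice systematically is to evaluate an XPath one step at a time and report the first prefix that matches nothing; that step is where the cleaned tree differs from what the expression expects. The sketch below is a hypothetical helper, not part of HtmlCleaner: it runs the JDK's javax.xml.xpath on markup assumed to be well-formed, and its naive splitting assumes that // appears only at the very start of the expression and that no predicate contains a /. The cleaned string in main is made-up output in which the link has been wrapped in an extra span.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathStepDebugger {

    // Parses well-formed (already cleaned) markup into a W3C DOM.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns the shortest prefix of an absolute XPath that matches no nodes,
    // or null if every prefix (and so the whole path) matches something.
    // Assumption: "//" occurs only at the start and no predicate contains "/".
    static String firstFailingPrefix(Document doc, String expr) throws XPathExpressionException {
        String prefix;
        String body;
        if (expr.startsWith("//")) {
            prefix = "/";
            body = expr.substring(2);
        } else if (expr.startsWith("/")) {
            prefix = "";
            body = expr.substring(1);
        } else {
            throw new IllegalArgumentException("expected an absolute XPath: " + expr);
        }
        XPath xpath = XPathFactory.newInstance().newXPath();
        StringBuilder current = new StringBuilder(prefix);
        for (String step : body.split("/")) {
            current.append('/').append(step);
            NodeList hits = (NodeList) xpath.evaluate(current.toString(), doc, XPathConstants.NODESET);
            if (hits.getLength() == 0) {
                return current.toString();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical cleaned output: the <a> sits inside an extra <span>.
        String cleaned = "<html><body><div id='priceWrap'><div><div>"
                + "<span><a href='#'>More stock</a></span><ul id='moreStock_5257711'/>"
                + "</div></div></div></body></html>";
        Document doc = parse(cleaned);
        String expr = "//*[@id='priceWrap']/div[1]/div/a/following-sibling::ul";
        System.out.println("first failing step: " + firstFailingPrefix(doc, expr));
    }
}
```

Here the report points at the /a step, which tells you to look at what actually sits under that div in the serialized output.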

