Parsing HTML table data with XPath and Selenium in Java

Question

I want to take the data from a table and organize it without the tags. It looks something like this:

<table class="SpecTable">
    <col width="40%" />
    <col width="60%" />
    <tr>
        <td class="LightRowHead">Optical Zoom:</td>
        <td class="LightRow">15x</td>
    </tr>
    <tr>
        <td class="DarkRowHead">Digital Zoom:</td>
        <td class="DarkRow">6x</td>
    </tr>
    <tr>
        <td class="LightRowHead">Battery Type:</td>
        <td class="LightRow">Alkaline</td>
    </tr>
    <tr>
        <td class="DarkRowHead">Resolution Megapixels:</td>
        <td class="DarkRow">14 MP</td>
    </tr>
</table>

I want to extract all of the text so that I can store it in a plaintext file containing just this:

Optical Zoom: 15x Digital Zoom: 6x Battery Type: Alkaline Resolution Megapixels: 14 MP
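Whichever selector ends up finding the cells, the label and value of each row still have to be joined into that single line and saved. A minimal sketch, assuming each row has already been read into a {label, value} String[] pair (for example from the two td cells of a tr); the class SpecLineWriter, its methods toLine and writeLine, and the file name specs.txt are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class SpecLineWriter {

    // Joins each {label, value} pair, in table order, into one space-separated line.
    static String toLine(List<String[]> rows) {
        List<String> parts = new ArrayList<>();
        for (String[] row : rows) {
            // trim() guards against stray whitespace around the cell text
            parts.add(row[0].trim() + " " + row[1].trim());
        }
        return String.join(" ", parts);
    }

    // Writes the line to a UTF-8 plaintext file, replacing any previous content.
    static void writeLine(Path file, String line) throws IOException {
        Files.writeString(file, line + System.lineSeparator(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // The four rows from the table in the question, hard-coded instead of scraped.
        List<String[]> rows = List.of(
                new String[] {"Optical Zoom:", "15x"},
                new String[] {"Digital Zoom:", "6x"},
                new String[] {"Battery Type:", "Alkaline"},
                new String[] {"Resolution Megapixels:", "14 MP"});
        Path out = Path.of("specs.txt");
        writeLine(out, toLine(rows));
        System.out.println(Files.readString(out, StandardCharsets.UTF_8).trim());
    }
}
```

Keeping the formatting separate from the scraping makes it possible to check the output format without starting a browser.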

public static void main(String[] args) {

        FirefoxProfile profile = new FirefoxProfile();
        profile.setPreference("general.useragent.override", "some UA string");
        WebDriver driver = new FirefoxDriver(profile);

        String Url = "http://www.walmart.com/ip/Generic-14-MP-X400-BK/19863348";
        driver.get(Url);
        List<WebElement> resultsDiv = driver.findElements(By.xpath("//table[contains (@class,'SpecTable')//td"));

        System.out.println(resultsDiv.size());
        for (int i=0; i<resultsDiv.size(); i++) {
            System.out.println(i+1 + ". " + resultsDiv.get(i).getText());
        }
}

I am programming in Java with Selenium and I cannot figure out the correct XPath expression for it.

Can someone figure out what I'm doing wrong and give me some pointers on how to parse this data correctly? I'm very new to Selenium and XPath, but I need this for work.

Also, if anyone has good resources for learning Selenium and XPath quickly, those would be greatly appreciated!

Solution

This will probably suit your needs:

String text = driver.findElement(By.cssSelector("table.SpecTable")).getText();

The string text will contain the visible text of the table with class SpecTable. I prefer CSS selectors because they are supported by IE and are faster than XPath. As for XPath tutorials, try this and this.
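The answer sidesteps XPath, but the expression in the question fails for a simpler reason: the predicate opened after table is never closed, so //table[contains (@class,'SpecTable')//td is not valid XPath and Selenium rejects it as an invalid selector instead of returning any cells. Adding the missing ] gives //table[contains(@class,'SpecTable')]//td. The sketch below checks that expression without a browser by evaluating it with the JDK's own javax.xml.xpath API against the table markup from the question. It assumes the markup is well-formed XML; real pages often are not, so with Selenium you would pass the same strings to By.xpath instead. It also pairs labels with values row by row. The class and method names are hypothetical, and the ROWS expression, which requires SpecTable to be a whole class token the way the CSS selector table.SpecTable does, is an optional refinement rather than something from the original answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SpecTableXPath {

    // The question's expression with the missing ']' added after the predicate.
    static final String CELLS = "//table[contains(@class,'SpecTable')]//td";

    // Same table, but "SpecTable" must be a whole class token (as with CSS "table.SpecTable").
    static final String ROWS =
            "//table[contains(concat(' ', normalize-space(@class), ' '), ' SpecTable ')]//tr";

    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    // Parses the markup as XML; this only works for well-formed (XHTML-like) input.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse markup as XML", e);
        }
    }

    // Evaluates an expression against a context node and returns the matching nodes.
    static NodeList nodes(String expression, Object context) {
        try {
            return (NodeList) XPATH.evaluate(expression, context, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    // Trimmed text of every <td> in document order: label, value, label, value, ...
    static List<String> cellTexts(String markup) {
        NodeList cells = nodes(CELLS, parse(markup));
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < cells.getLength(); i++) {
            texts.add(cells.item(i).getTextContent().trim());
        }
        return texts;
    }

    // One "label value" pair per row; "td" is evaluated relative to each <tr>.
    static String specLine(String markup) {
        NodeList rows = nodes(ROWS, parse(markup));
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < rows.getLength(); i++) {
            NodeList tds = nodes("td", rows.item(i));
            if (tds.getLength() >= 2) {
                parts.add(tds.item(0).getTextContent().trim() + " "
                        + tds.item(1).getTextContent().trim());
            }
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        String markup = "<table class=\"SpecTable\"><col width=\"40%\"/><col width=\"60%\"/>"
                + "<tr><td class=\"LightRowHead\">Optical Zoom:</td><td class=\"LightRow\">15x</td></tr>"
                + "<tr><td class=\"DarkRowHead\">Digital Zoom:</td><td class=\"DarkRow\">6x</td></tr>"
                + "</table>";
        System.out.println(cellTexts(markup));
        System.out.println(specLine(markup));
    }
}
```

With Selenium the same strings go into By.xpath(...): driver.findElements(By.xpath(SpecTableXPath.ROWS)) returns the rows, and row.findElements(By.xpath("./td")) returns the label and value cells of each row. The leading ./ keeps that second search relative to the row instead of the whole page.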
