How to identify and print lists of data from a table using Selenium and WebDriver?

Problem description

I have a table and want to grab its data column by column, then print every value one by one. Here is what the table looks like: https://i.stack.imgur.com/7F60c.png

Here is the HTML:

<div id="ember10911" class="ember-view flowsheet-table">
    <div class="header flowsheet-column">
        <div class="flowsheet-cell flowsheet-row"> </div>
        <div id="ember10912" class="ember-view flowsheet-row header-row">
            <h4 class="header4semibold " data-ember-action="10913">
                <div class="ellipsis" style="width: 302px">Vitals</div>
            </h4>
        </div>
        <div id="ember10914" class="ember-view flowsheet-row ellipsis"> Height </div>
        <div id="ember10915" class="ember-view flowsheet-row ellipsis"> Weight </div>
        <div id="ember10916" class="ember-view flowsheet-row ellipsis"> BMI </div>
        <div id="ember10917" class="ember-view flowsheet-row ellipsis"> BP </div>
        <div id="ember10918" class="ember-view flowsheet-row ellipsis"> Temperature </div>
        <div id="ember10919" class="ember-view flowsheet-row ellipsis"> Pulse </div>
        <div id="ember10920" class="ember-view flowsheet-row ellipsis"> Respiratory rate </div>
        <div id="ember10921" class="ember-view flowsheet-row ellipsis"> O2 Saturation </div>
        <div id="ember10922" class="ember-view flowsheet-row ellipsis"> Pain </div>
        <div id="ember10923" class="ember-view flowsheet-row ellipsis"> Head Circumference </div>
    </div>
    <div class="flowsheet-scroll-region">
        <div id="ember11083" class="ember-view flowsheet-column">
            <div class="flowsheet-cell flowsheet-table-header selectable" data-ember-action="11084">
                <div class="p-semibold">08/24/17</div>
                <div class="p-semibold">8:44 PM</div>
            </div>
            <div id="ember11086" class="ember-view flowsheet-cell header-row"> </div>
            <div id="ember11088" class="ember-view flowsheet-cell"> </div>
            <div id="ember11090" class="ember-view flowsheet-cell selectable">
                <span data-element="flowsheet-cell-value" class="">152 </span>
                <span class="p-666" data-element="flowsheet-cell-units">lb</span>
            </div>
            <div id="ember11092" class="ember-view flowsheet-cell"> </div>
            <div id="ember11094" class="ember-view flowsheet-cell selectable wide-cell">
                <span data-element="flowsheet-cell-value" class="">102/64 </span>
                <span class="p-666" data-element="flowsheet-cell-units">mmHg</span>
            </div>
            <div id="ember11096" class="ember-view flowsheet-cell selectable">
                <span data-element="flowsheet-cell-value" class="">97.9 </span>
                <span class="p-666" data-element="flowsheet-cell-units">°F</span>
            </div>
            <div id="ember11098" class="ember-view flowsheet-cell selectable">
                <span data-element="flowsheet-cell-value" class="">72 </span>
                <span class="p-666" data-element="flowsheet-cell-units">bpm</span>
            </div>
            <div id="ember11100" class="ember-view flowsheet-cell"> </div>
            <div id="ember11102" class="ember-view flowsheet-cell"> </div>
            <div id="ember11104" class="ember-view flowsheet-cell"> </div>
            <div id="ember11106" class="ember-view flowsheet-cell"> </div>
        </div>
    </div>
</div>

What I tried:

List<WebElement> list = driver.findElements(By.xpath("//*[@id=\"ember4341\"]"));
for (int i = 0; i < list.size(); i++) {
    System.out.print("Date" + driver.findElement(By.xpath("//*[@id=\"ember4929\"]/div[1]/div[1]")) + "Height" + driver.findElement(By.xpath("//*[@id=\"ember4944\"]/span[1]")) + "Weight" + driver.findElement(By.xpath("//*[@id=\"ember4946\"]/span[1]"))+......);
    System.out.println("");
}

I want to use a list to store each column's information, then use a for loop to go through it. What I expect is:

    "Date 10/17/16 Height 64 Weight 106........."
    "Date 11/17/17 Height 64 Weight 109.99......"

.......

But what I got was:

"Date[[ChromeDriver: chrome on MAC (bf163179196c60704f582de0d421323c)] -> xpath: //*[@id="ember4929"]/div[1]/div[1]]Height[[ChromeDriver: chrome on MAC (bf163179196c60704f582de0d421323c)] -> xpath: //*[@id="ember4944"]/span[1]]Weight[[ChromeDriver: chrome on MAC (bf163179196c60704f582de0d421323c)] -> xpath: //*[@id="ember4946"]/span[1]]"

I also tried getting and printing only the first column's data, using this code:

System.out.println("Date: " + driver.findElement(By.cssSelector("#ember5060 > div.flowsheet-cell.flowsheet-table-header.selectable > div:nth-child(1)")).getText() + "  " + "Height: " + driver.findElement(By.cssSelector("#ember5075 > div")).getText() + "   " + "Weight: " + driver.findElement(By.cssSelector("#ember5077 > div")).getText());

It worked, and I got:

Date: 10/25/17  Height: 67 in   Weight: 139.99 lb
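
The two attempts differ in one call. The first attempt concatenates each `WebElement` directly into the string, so Java calls the element's `toString()`. For a remote element, that method describes the driver session and locator, which is the `[[ChromeDriver: ...] -> xpath: ...]` text shown earlier. The second attempt calls `getText()`, which returns the element's visible text instead.

The sketch below models the difference with a hypothetical `FakeElement` class rather than a real `WebElement`, so it runs without a browser. The format of its `toString()` is made up for illustration.

```java
public class ToStringVsGetText {

    // Hypothetical stand-in for a WebElement: it only models the difference
    // between toString() (a debug description) and getText() (the visible text)
    static final class FakeElement {
        private final String locator;
        private final String text;

        FakeElement(String locator, String text) {
            this.locator = locator;
            this.text = text;
        }

        String getText() {
            return text;
        }

        @Override
        public String toString() {
            return "[FakeDriver] -> xpath: " + locator;
        }
    }

    public static void main(String[] args) {
        FakeElement height = new FakeElement("//*[@id=\"ember4944\"]/span[1]", "64");

        // Concatenating the object itself calls toString(), as in the first attempt
        System.out.println("Height" + height);

        // Concatenating getText() prints the cell's text, as in the working attempt
        System.out.println("Height " + height.getText());
    }
}
```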

Now I can print the first column's data, but I don't know how to iterate over the table and print every column. What should I do? Thanks!
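
Here is a minimal sketch of the list-of-columns idea, assuming each column has already been reduced to label/value strings. Each column is an ordered map (`LinkedHashMap` keeps insertion order), and the maps go into a list. One loop then prints one line per column in the expected format.

The class and helper names are hypothetical. The values are hard-coded from the expected output above rather than read with Selenium.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnLines {

    // Joins one column's label/value pairs into a single line such as
    // "Date 10/17/16 Height 64 Weight 106"
    static String formatColumn(Map<String, String> column) {
        StringBuilder line = new StringBuilder();
        for (Map.Entry<String, String> e : column.entrySet()) {
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(e.getKey()).append(' ').append(e.getValue());
        }
        return line.toString();
    }

    // Hypothetical helper: builds one column as an insertion-ordered map
    static Map<String, String> column(String date, String height, String weight) {
        Map<String, String> col = new LinkedHashMap<>();
        col.put("Date", date);
        col.put("Height", height);
        col.put("Weight", weight);
        return col;
    }

    public static void main(String[] args) {
        // In a real script these strings would come from getText() calls
        List<Map<String, String>> columns = new ArrayList<>();
        columns.add(column("10/17/16", "64", "106"));
        columns.add(column("11/17/17", "64", "109.99"));

        // One list entry per column, one printed line per entry
        for (Map<String, String> col : columns) {
            System.out.println(formatColumn(col));
        }
    }
}
```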

Recommended answer

The question went through several HTML revisions; this is the one I used to come up with the solution:

<div id="ember4342" class="ember-view flowsheet-table">
    <div class="header flowsheet-column">
        <div class="flowsheet-cell flowsheet-row">
        </div>
        <div id="ember4352" class="ember-view flowsheet-row header-row">
            <h4 class="header4semibold " data-ember-action="4353">
                <div class="ellipsis" style="width: 694px">Vitals</div>
            </h4>
        </div>
        <div id="ember4355" class="ember-view flowsheet-row ellipsis"> Height
        </div>
        <div id="ember4357" class="ember-view flowsheet-row ellipsis"> Weight
        </div>
        <div id="ember4359" class="ember-view flowsheet-row ellipsis"> BMI
        </div>
        <div id="ember4361" class="ember-view flowsheet-row ellipsis"> BMI Percentile
        </div>
        <div id="ember4363" class="ember-view flowsheet-row ellipsis"> BP
        </div>
        <div id="ember4365" class="ember-view flowsheet-row ellipsis"> Temperature
        </div>
        <div id="ember4367" class="ember-view flowsheet-row ellipsis"> Pulse
        </div>
        <div id="ember4369" class="ember-view flowsheet-row ellipsis"> Respiratory rate
        </div>
        <div id="ember4371" class="ember-view flowsheet-row ellipsis"> O2 Saturation
        </div>
        <div id="ember4373" class="ember-view flowsheet-row ellipsis"> Pain
        </div>
        <div id="ember4375" class="ember-view flowsheet-row ellipsis"> Head Circumference
        </div>
    </div>
    <div class="flowsheet-scroll-region">
        <div id="ember5636" class="ember-view flowsheet-column">
            <div class="flowsheet-cell flowsheet-table-header selectable" data-ember-action="5637">
                <!---->
                <div class="p-semibold">10/17/16</div>
                <div class="p-semibold"><!----></div>
                <!----></div>
            <div id="ember5639" class="ember-view flowsheet-cell header-row"><!----></div>
            <div id="ember5641" class="ember-view flowsheet-cell selectable"><!----><!---->
                <span data-element="flowsheet-cell-value" class="">64 </span><span class="p-666"
                                                                                   data-element="flowsheet-cell-units">in</span>
            </div>
            <div id="ember5643" class="ember-view flowsheet-cell selectable"><!----><!---->
                <span data-element="flowsheet-cell-value" class="">106 </span><span class="p-666"
                                                                                    data-element="flowsheet-cell-units">lb</span>
            </div>
            <div id="ember5645" class="ember-view flowsheet-cell selectable"><!----><!---->
                <span data-element="flowsheet-cell-value" class="">18.19 </span><span class="p-666"
                                                                                      data-element="flowsheet-cell-units"><!----></span>
            </div>
            <div id="ember5647" class="ember-view flowsheet-cell"><!----></div>
            <div id="ember5649" class="ember-view flowsheet-cell selectable wide-cell"><!----><!---->
                <span data-element="flowsheet-cell-value" class="">111/83 </span><span class="p-666"
                                                                                       data-element="flowsheet-cell-units">mmHg</span>
            </div>
            <div id="ember5651" class="ember-view flowsheet-cell selectable"><!----><!---->
                <span data-element="flowsheet-cell-value" class="">100.9 </span><span class="p-666"
                                                                                      data-element="flowsheet-cell-units">°F</span>
            </div>
            <div id="ember5653" class="ember-view flowsheet-cell selectable"><!----><!---->
                <span data-element="flowsheet-cell-value" class="">86 </span><span class="p-666"
                                                                                   data-element="flowsheet-cell-units">bpm</span>
            </div>
            <div id="ember5655" class="ember-view flowsheet-cell selectable"><!----><!---->
                <span data-element="flowsheet-cell-value" class="">14 </span><span class="p-666"
                                                                                   data-element="flowsheet-cell-units">bpm</span>
            </div>
            <div id="ember5657" class="ember-view flowsheet-cell selectable"><!----></div>
            <div id="ember5659" class="ember-view flowsheet-cell selectable"><!----></div>
            <div id="ember5661" class="ember-view flowsheet-cell selectable"><!----></div>
        </div>
        <div id="ember5663" class="ember-view flowsheet-column editable">
            <div class="flowsheet-cell flowsheet-table-header selectable" data-ember-action="5664">
                <!---->
                <div class="p-link-semibold" data-element="flowsheet-column-date">11/17/17</div>
                <div class="p-link-semibold" data-element="flowsheet-column-time"><!----></div>
                <!----></div>
            <div id="ember5666" class="ember-view flowsheet-cell header-row"><!----></div>
            <div id="ember5668" class="ember-view flowsheet-cell selectable"><!----><!---->
                <div data-element="flowsheet-cell-value-and-units" class="p-link">64 in</div>
            </div>
            <div id="ember5670" class="ember-view flowsheet-cell selectable"><!----><!---->
                <div data-element="flowsheet-cell-value-and-units" class="p-link">109.99 lb</div>
            </div>
            <div id="ember5672" class="ember-view flowsheet-cell selectable"><!----><!---->
                <span data-element="flowsheet-cell-value" class="">18.88 </span><span class="p-666"
                                                                                      data-element="flowsheet-cell-units"><!----></span>
            </div>
            <div id="ember5674" class="ember-view flowsheet-cell"><!----></div>
            <div id="ember5676" class="ember-view flowsheet-cell selectable wide-cell"><!----><!---->
                <div data-element="flowsheet-cell-value-and-units" class="p-link">116/72 mmHg</div>
            </div>
            <div id="ember5678" class="ember-view flowsheet-cell selectable"><!----><!---->
                <div data-element="flowsheet-cell-value-and-units" class="p-link">101.2 °F</div>
            </div>
            <div id="ember5680" class="ember-view flowsheet-cell selectable"><!----><!---->
                <div data-element="flowsheet-cell-value-and-units" class="p-link">87 bpm</div>
            </div>
            <div id="ember5682" class="ember-view flowsheet-cell selectable"><!----><!---->
                <div data-element="flowsheet-cell-value-and-units" class="p-link">16 bpm</div>
            </div>
            <div id="ember5684" class="ember-view flowsheet-cell selectable"><!----></div>
            <div id="ember5686" class="ember-view flowsheet-cell selectable"><!----></div>
            <div id="ember5688" class="ember-view flowsheet-cell selectable"><!----></div>
        </div>
        <div id="ember5690" class="ember-view flowsheet-column editable">
            <div class="flowsheet-cell flowsheet-table-header selectable" data-ember-action="5691">
                <!---->
                <div class="p-link-semibold" data-element="flowsheet-column-date">01/17/18</div>
                <div class="p-link-semibold" data-element="flowsheet-column-time"><!----></div>
                <!----></div>
            <div id="ember5693" class="ember-view flowsheet-cell header-row"><!----></div>
            <div id="ember5695" class="ember-view flowsheet-cell selectable"><!----><!---->
                <div data-element="flowsheet-cell-value-and-units" class="p-link">64 in</div>
            </div>
            <div id="ember5697" class="ember-view flowsheet-cell selectable"><!----><!---->
                <div data-element="flowsheet-cell-value-and-units" class="p-link">106 lb</div>
            </div>
            <div id="ember5699" class="ember-view flowsheet-cell selectable"><!----><!---->
                <span data-element="flowsheet-cell-value" class="">18.19 </span><span class="p-666"
                                                                                      data-element="flowsheet-cell-units"><!----></span>
            </div>
            <div id="ember5701" class="ember-view flowsheet-cell"><!----></div>
            <div id="ember5703" class="ember-view flowsheet-cell selectable wide-cell"><!----><!---->
                <div data-element="flowsheet-cell-value-and-units" class="p-link">123/84 mmHg</div>
            </div>
            <div id="ember5705" class="ember-view flowsheet-cell selectable"><!----><!---->
                <div data-element="flowsheet-cell-value-and-units" class="p-link">100.3 °F</div>
            </div>
            <div id="ember5707" class="ember-view flowsheet-cell selectable"><!----><!---->
                <div data-element="flowsheet-cell-value-and-units" class="p-link">77 bpm</div>
            </div>
            <div id="ember5709" class="ember-view flowsheet-cell selectable"><!----><!---->
                <div data-element="flowsheet-cell-value-and-units" class="p-link">14 bpm</div>
            </div>
            <div id="ember5711" class="ember-view flowsheet-cell selectable"><!----></div>
            <div id="ember5713" class="ember-view flowsheet-cell selectable"><!----></div>
            <div id="ember5715" class="ember-view flowsheet-cell selectable"><!----></div>
        </div>
        <div id="ember5717" class="ember-view flowsheet-column editable">
            <div class="flowsheet-cell flowsheet-table-header selectable" data-ember-action="5718">
                <!---->
                <div class="p-link-semibold" data-element="flowsheet-column-date">02/17/18</div>
                <div class="p-link-semibold" data-element="flowsheet-column-time"><!----></div>
                <!----></div>
            <div id="ember5720" class="ember-view flowsheet-cell header-row"><!----></div>
            <div id="ember5722" class="ember-view flowsheet-cell selectable"><!----><!---->
                <div data-element="flowsheet-cell-value-and-units" class="p-link">64 in</div>
            </div>
            <div id="ember5724" class="ember-view flowsheet-cell selectable"><!----><!---->
                <div data-element="flowsheet-cell-value-and-units" class="p-link">106 lb</div>
            </div>
            <div id="ember5726" class="ember-view flowsheet-cell selectable"><!----><!---->
                <span data-element="flowsheet-cell-value" class="">18.19 </span><span class="p-666"
                                                                                      data-element="flowsheet-cell-units"><!----></span>
            </div>
            <div id="ember5728" class="ember-view flowsheet-cell"><!----></div>
            <div id="ember5730" class="ember-view flowsheet-cell selectable wide-cell"><!----><!---->
                <div data-element="flowsheet-cell-value-and-units" class="p-link">121/70 mmHg</div>
            </div>
            <div id="ember5732" class="ember-view flowsheet-cell selectable"><!----><!---->
                <div data-element="flowsheet-cell-value-and-units" class="p-link">100.8 °F</div>
            </div>
            <div id="ember5734" class="ember-view flowsheet-cell selectable"><!----><!---->
                <div data-element="flowsheet-cell-value-and-units" class="p-link">77 bpm</div>
            </div>
            <div id="ember5736" class="ember-view flowsheet-cell selectable"><!----><!---->
                <div data-element="flowsheet-cell-value-and-units" class="p-link">18 bpm</div>
            </div>
            <div id="ember5738" class="ember-view flowsheet-cell selectable"><!----></div>
            <div id="ember5740" class="ember-view flowsheet-cell selectable"><!----></div>
            <div id="ember5742" class="ember-view flowsheet-cell selectable"><!----></div>
        </div>
        <div id="ember5743" class="ember-view flowsheet-column flowsheet-add-column current-context">
            <div class="flowsheet-cell selectable" data-ember-action="5744">
                <a class="icon icon-add" data-element="flowsheet-add-column-button"></a>
            </div>
            <div class="flowsheet-cell header-row"></div>
            <div class="flowsheet-cell selectable" data-ember-action="5745"></div>
            <div class="flowsheet-cell selectable" data-ember-action="5746"></div>
            <div class="flowsheet-cell selectable" data-ember-action="5747"></div>
            <div class="flowsheet-cell selectable" data-ember-action="5748"></div>
            <div class="flowsheet-cell selectable" data-ember-action="5749"></div>
            <div class="flowsheet-cell selectable" data-ember-action="5750"></div>
            <div class="flowsheet-cell selectable" data-ember-action="5751"></div>
            <div class="flowsheet-cell selectable" data-ember-action="5752"></div>
            <div class="flowsheet-cell selectable" data-ember-action="5753"></div>
            <div class="flowsheet-cell selectable" data-ember-action="5754"></div>
            <div class="flowsheet-cell selectable" data-ember-action="5755"></div>
        </div>
    </div>

    <div id="ember4390" class="ember-view"><!----></div>

    <div id="ember4399" class="ember-view"><!---->
        <!----></div>

    <div id="ember4400" class="ember-view"><!----></div>
</div>
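
In this markup, the row labels live in the `div.header` column. Every dated `flowsheet-column` lists its data cells in the same order, after a `header-row` spacer cell, so the i-th label belongs to the i-th non-spacer data cell. The script below relies on that alignment through `labels.get(i)`.

Here is a plain-Java sketch of just the pairing step. Hard-coded strings stand in for the `getText()` results, and `pair` is a hypothetical method name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LabelCellPairing {

    // Pairs the i-th row label with the i-th data cell of one column.
    // Assumes both lists are in document order and the header-row spacer
    // has already been excluded from the cells.
    static Map<String, String> pair(List<String> labels, List<String> cells) {
        if (labels.size() != cells.size()) {
            throw new IllegalArgumentException(
                    "expected " + labels.size() + " cells but got " + cells.size());
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < labels.size(); i++) {
            row.put(labels.get(i), cells.get(i));
        }
        return row;
    }

    public static void main(String[] args) {
        // First four labels and the matching 10/17/16 cells from the HTML above
        List<String> labels = Arrays.asList("Height", "Weight", "BMI", "BMI Percentile");
        List<String> cells = Arrays.asList("64 in", "106 lb", "18.19", "");

        Map<String, String> column = pair(labels, cells);
        column.forEach((label, value) -> System.out.println(label + ": " + value));
    }
}
```

The size check makes a mismatch between the label column and a data column fail loudly instead of silently shifting every value onto the wrong label.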

Here is the full script. It takes a few seconds to run because of some short but necessary timeouts. The value and units elements are not structured the same way in every cell, so the script has to wait briefly for elements that may not be present:

import org.openqa.selenium.By;
import org.openqa.selenium.SearchContext;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class WebDriverScratch {

    private static final String CHROME_DRIVER = "C:\\Users\\Joe\\Downloads\\chromedriver_win32\\chromedriver.exe";
    private static final String HTML_FILE = "file:///C:/Users/Joe/IdeaProjects/scratch/src/main/resources/test.html";
    private static final long SHORT_TIMEOUT = 1;                // 1 second
    private static final long VERY_SHORT_TIMEOUT_MILLIS = 100;  // 100 milliseconds
    private static final String EMPTY = "";

    public static void main(String[] args) {
        WebDriver driver = getDriver();
        driver.get(HTML_FILE);
        driver.manage().timeouts().implicitlyWait(SHORT_TIMEOUT, TimeUnit.SECONDS);

        List<WebElement> labels = driver.findElements(By.cssSelector("div.header div.flowsheet-row.ellipsis"));
        List<WebElement> dateElms = driver.findElements(By.cssSelector("div.flowsheet-table-header div:first-child"));
        List<ColumnData> cols = new ArrayList<>(dateElms.size());
        for (WebElement date : dateElms) {
            // parse the data for this column
            ColumnData col = new ColumnData(date.getText());
            WebElement column = date.findElement(By.xpath("../.."));
            List<WebElement> dataCells = column.findElements(By.cssSelector("div.ember-view.flowsheet-cell:not(.header-row)"));
            for (int i = 0; i < dataCells.size(); i++) {
                // get the header label by index
                WebElement dataCell = dataCells.get(i);
                String label = labels.get(i).getText();
                WebElement valueAndUnits = findElementOrNull(driver, dataCell, By.cssSelector("div[data-element='flowsheet-cell-value-and-units']"));
                if (valueAndUnits != null) {
                    col.addEntry(label, valueAndUnits.getText());
                } else {
                    // could be empty data cell, or separate value and units
                    WebElement value = findElementOrNull(driver, dataCell, By.cssSelector("span[data-element='flowsheet-cell-value']"));
                    if (value != null) {
                        // since value element is present, assume units element is too
                        WebElement units = dataCell.findElement(By.cssSelector("span[data-element='flowsheet-cell-units']"));
                        col.addEntry(label, value.getText(), units.getText());
                    } else {
                        // empty data
                        col.addEntry(label, EMPTY);
                    }
                }
            }
            // done parsing data for this column
            cols.add(col);
        }
        driver.quit();  // quit() closes every window and ends the session; close() only closes the current window

        // print it all out
        printData(cols);
    }

    /**
     * simple container for holding parsed data
     */
    private static class ColumnData {

        public class Entry {
            public final String label;
            public final String valueAndUnits;

            public Entry(String label, String valueAndUnits) {
                this.label = label;
                this.valueAndUnits = valueAndUnits;
            }
        }

        public final String date;
        public final List<Entry> entries = new ArrayList<>();

        public ColumnData(String date) {
            this.date = date;
        }

        public void addEntry(String label, String valueAndUnits) {
            entries.add(new Entry(label, valueAndUnits));
        }

        public void addEntry(String label, String value, String units) {
            entries.add(new Entry(label, String.format("%s %s", value, units)));
        }
    }

    private static WebDriver getDriver() {
        System.setProperty("webdriver.chrome.driver", CHROME_DRIVER);
        ChromeOptions options = new ChromeOptions();
        options.addArguments("disable-infobars");
        options.addArguments("start-maximized");
        return new ChromeDriver(options);
    }

    /**
     * returns null if element is not present
     */
    private static WebElement findElementOrNull(WebDriver driver, SearchContext parent, By by) {
        driver.manage().timeouts().implicitlyWait(VERY_SHORT_TIMEOUT_MILLIS, TimeUnit.MILLISECONDS);
        try {
            return parent.findElement(by);
        } catch (RuntimeException ex) {
            return null;
        } finally {
            driver.manage().timeouts().implicitlyWait(SHORT_TIMEOUT, TimeUnit.SECONDS);
        }
    }

    private static void printData(List<ColumnData> allData) {
        for (ColumnData data : allData) {
            System.out.println("Date: " + data.date);
            for (ColumnData.Entry entry : data.entries) {
                System.out.println(String.format("  %s: %s", entry.label, entry.valueAndUnits));
            }
        }
    }
}

Output:

Date: 10/17/16
  Height: 64 in
  Weight: 106 lb
  BMI: 18.19 
  BMI Percentile: 
  BP: 111/83 mmHg
  Temperature: 100.9 °F
  Pulse: 86 bpm
  Respiratory rate: 14 bpm
  O2 Saturation: 
  Pain: 
  Head Circumference: 
Date: 11/17/17
  Height: 64 in
  Weight: 109.99 lb
  BMI: 18.88 
  BMI Percentile: 
  BP: 116/72 mmHg
  Temperature: 101.2 °F
  Pulse: 87 bpm
  Respiratory rate: 16 bpm
  O2 Saturation: 
  Pain: 
  Head Circumference: 
Date: 01/17/18
  Height: 64 in
  Weight: 106 lb
  BMI: 18.19 
  BMI Percentile: 
  BP: 123/84 mmHg
  Temperature: 100.3 °F
  Pulse: 77 bpm
  Respiratory rate: 14 bpm
  O2 Saturation: 
  Pain: 
  Head Circumference: 
Date: 02/17/18
  Height: 64 in
  Weight: 106 lb
  BMI: 18.19 
  BMI Percentile: 
  BP: 121/70 mmHg
  Temperature: 100.8 °F
  Pulse: 77 bpm
  Respiratory rate: 18 bpm
  O2 Saturation: 
  Pain: 
  Head Circumference: 
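
The inner `if`/`else` in the script handles three cell shapes that appear in the HTML:

- a single `flowsheet-cell-value-and-units` div (for example `64 in`);
- separate `flowsheet-cell-value` and `flowsheet-cell-units` spans (for example `64 ` and `in`);
- an empty cell.

Here is that decision isolated as a plain-Java sketch, with `null` standing in for "element not found". The method name `cellText` is hypothetical. Unlike the script's `"%s %s"` format, which leaves a trailing space when the units are empty (as in `BMI: 18.19 `), this version trims the pieces and drops the joining space.

```java
public class CellText {

    // Returns the display text for one data cell.
    // valueAndUnits: text of div[data-element='flowsheet-cell-value-and-units'], or null if absent
    // value, units:  text of the separate value/units spans, or null if absent
    static String cellText(String valueAndUnits, String value, String units) {
        if (valueAndUnits != null) {
            return valueAndUnits.trim();          // combined form, e.g. "64 in"
        }
        if (value != null) {
            String u = units == null ? "" : units.trim();
            // Separate spans, e.g. "64 " + "in"; skip the space when units are empty
            return u.isEmpty() ? value.trim() : value.trim() + " " + u;
        }
        return "";                                // empty cell
    }

    public static void main(String[] args) {
        System.out.println("[" + cellText("109.99 lb", null, null) + "]");
        System.out.println("[" + cellText(null, "64 ", "in") + "]");
        System.out.println("[" + cellText(null, "18.19 ", "") + "]");
        System.out.println("[" + cellText(null, null, null) + "]");
    }
}
```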
