How to extract the value from a <td> using Selenium


Question

I'm using Selenium WebDriver to parse data from a website. I tried locating the elements by class name, tag name, XPath, and CSS selector, and every attempt to fetch the data failed. In this example I'm trying BeautifulSoup (BS4) instead, and all I get is the text from the "th" cells. How do I get the text from the "td" cells?

from bs4 import BeautifulSoup

# driver is an existing Selenium WebDriver; slot is the page URL
driver.get(slot)
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
table = soup.find_all('tr')
for x in table:
    print(x.get_text())

HTML:

<table data-v data-qa="table" class="table">
   <tr data-v>
      <th data-v> Name </th>
      <th data-v> Last_name </th>
      <th data-v> Phone </th>
      <th data-v> City </th>
      <th data-v> Salary </th>
      </tr data-v>
   <tr data-v data-qa="table-row">
      <td data-v class="table-name not-editable">Tetyana</td>
      <td data-v class="table-last-name not-editable">Ferguson</td>
      <td data-v class="table-phone not-editable">252-823-1658</td>
      <td data-v class="table-city not-editable">Tarboro</td>
      <td data-v class="table-salary not-editable">10000</td>
      </tr data-v>
   <tr data-v data-qa="table-row">
      <td data-v class="table-name not-editable">Alyonka</td>
      <td data-v class="table-last-name not-editable">Andrews</td>
      <td data-v class="table-phone not-editable">603-608-7504</td>
      <td data-v class="table-city not-editable">Northwood</td>
      <td data-v class="table-salary not-editable">12000</td>
      </tr data-v>
</table>

Recommended answer

You passed 'tr' to soup.find_all('tr'), so you are iterating over whole table rows, starting with the header row. If you pass 'td' instead, you get the data cells of the rows. When reading a table, you can also try the pandas.read_html method, which is helpful in many scenarios.

import pandas as pd
from bs4 import BeautifulSoup
html_src="""
<table data-v data-qa="table" class="table">
   <tr data-v>
      <th data-v> Name </th>
      <th data-v> Last_name </th>
      <th data-v> Phone </th>
      <th data-v> City </th>
      <th data-v> Salary </th>
      </tr data-v>
   <tr data-v data-qa="table-row">
      <td data-v class="table-name not-editable">Tetyana</td>
      <td data-v class="table-last-name not-editable">Ferguson</td>
      <td data-v class="table-phone not-editable">252-823-1658</td>
      <td data-v class="table-city not-editable">Tarboro</td>
      <td data-v class="table-salary not-editable">10000</td>
      </tr data-v>
   <tr data-v data-qa="table-row">
      <td data-v class="table-name not-editable">Alyonka</td>
      <td data-v class="table-last-name not-editable">Andrews</td>
      <td data-v class="table-phone not-editable">603-608-7504</td>
      <td data-v class="table-city not-editable">Northwood</td>
      <td data-v class="table-salary not-editable">12000</td>
      </tr data-v>
</table>
"""

Option 1:

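from io import StringIO

# read_html returns a list of DataFrames, one per <table> found;
# recent pandas versions expect a file-like object rather than a raw HTML string
dfs = pd.read_html(StringIO(html_src))
print(dfs[0].head(10))
# output:
#       Name  Last_name         Phone       City  Salary
# 0  Tetyana   Ferguson  252-823-1658    Tarboro   10000
# 1  Alyonka    Andrews  603-608-7504  Northwood   12000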

Option 2:

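# name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html_src, 'html.parser')
for each in soup.find_all("td"):
    print(each.get_text())
# output:
# Tetyana
# Ferguson
# 252-823-1658
# Tarboro
# 10000
# Alyonka
# Andrews
# 603-608-7504
# Northwood
# 12000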
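
The loop above prints every cell as one flat sequence, so the link between a value and its row or column is lost. If you need that link, one possible approach is to walk the data rows and pair each "td" with the matching "th" text. The following is only a sketch under assumptions: it uses a shortened copy of the sample table (three columns, without the data-v attributes), and the names headers and records are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample table (assumption: same structure as the real page)
html_src = """
<table data-qa="table" class="table">
  <tr><th> Name </th><th> Last_name </th><th> Phone </th></tr>
  <tr data-qa="table-row">
    <td class="table-name not-editable">Tetyana</td>
    <td class="table-last-name not-editable">Ferguson</td>
    <td class="table-phone not-editable">252-823-1658</td>
  </tr>
  <tr data-qa="table-row">
    <td class="table-name not-editable">Alyonka</td>
    <td class="table-last-name not-editable">Andrews</td>
    <td class="table-phone not-editable">603-608-7504</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html_src, "html.parser")

# Column names come from the <th> cells; strip=True trims the padding spaces
headers = [th.get_text(strip=True) for th in soup.find_all("th")]

# Visit only the data rows, then read the <td> cells of each row in order
records = []
for row in soup.find_all("tr", attrs={"data-qa": "table-row"}):
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    records.append(dict(zip(headers, cells)))

for record in records:
    print(record)
```

Each entry in records is a dict keyed by column name, so a single value can be read with something like records[0]["Phone"]. With a live page, the same code should work on driver.page_source in place of html_src, provided the rows are already present in the page source at that moment.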
