How can I get only the name and contact number from a div?
I'm trying to get the name and contact number from a div that contains up to three span elements. The problem is that the div sometimes has only one span, sometimes two, and sometimes three.
The first span has the name.
The second span has other data.
The third span has the contact number.
Here is the HTML:
<div class="ds-body-small" id="yui_3_18_1_1_1554645615890_3864">
    <span class="listing-field" id="yui_3_18_1_1_1554645615890_3863">beth budinich</span>
    <span class="listing-field"><a href="http://Www.redfin.com" target="_blank">See listing website</a></span>
    <span class="listing-field" id="yui_3_18_1_1_1554645615890_4443">(206) 793-8336</span>
</div>
Here is my code:
try:
    name = browser.find_element_by_xpath("//span[@class='listing-field'][1]")
    name = name.text.strip()
    print("name : " + name)
except:
    print("Name are missing")
    name = "N/A"
try:
    contact_info = browser.find_element_by_xpath("//span[@class='listing-field'][3]")
    contact_info = contact_info.text.strip()
    print("contact info : " + contact_info)
except:
    print("contact_info are missing")
    contact_info = "N/A"
My code is not giving me the correct result. Can anyone suggest the best possible solution? Thanks.
You can iterate through the contacts and check, for each one, whether it has a child a element or whether its text matches a phone number pattern:
import re

# Note: Selenium 4.3+ removed find_elements_by_*; use find_elements(By.CSS_SELECTOR, ...) there.
contacts = browser.find_elements_by_css_selector("span.listing-field")
contact_name = []
contact_phone = "N/A"
contact_web = "N/A"
for contact in contacts:
    links = contact.find_elements_by_tag_name("a")
    if links:
        # A span that wraps a link holds the listing website.
        contact_web = links[0].get_attribute("href")
    elif re.search(r"\(\d+\)\s*\d+-\d+", contact.text):
        # Text shaped like "(206) 793-8336" is the phone number.
        contact_phone = contact.text
    else:
        # Anything else is treated as part of the name.
        contact_name.append(contact.text)
contact_name = ", ".join(contact_name) if contact_name else "N/A"
Output:
contact_name: ['Kevin Howard', 'Howard enterprise']
contact_phone: '(206) 334-8414'
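To try the same classification idea without a browser, here is a minimal sketch that works on plain (text, href) pairs instead of Selenium elements. The function name `classify_fields` and the pair layout are assumptions made for illustration and are not part of Selenium or of the original answer; the phone pattern is only a rough match for numbers shaped like the one in the question.

```python
import re

# Rough pattern for numbers shaped like "(206) 793-8336".
PHONE_RE = re.compile(r"\(\d+\)\s*\d+-\d+")


def classify_fields(fields):
    """Sort (text, href) pairs into name, phone and website.

    `fields` is a hypothetical stand-in for the span elements: `text` is the
    visible text and `href` is the link target, or None when the span has no link.
    """
    names = []
    phone = "N/A"
    web = "N/A"
    for text, href in fields:
        text = " ".join(text.split())  # collapse line breaks and extra spaces
        if href is not None:
            web = href          # a span with a link is the website
        elif PHONE_RE.search(text):
            phone = text        # text that looks like a phone number
        else:
            names.append(text)  # everything else is treated as the name
    return {
        "name": ", ".join(names) if names else "N/A",
        "phone": phone,
        "web": web,
    }


# The three spans from the question, then a div that has only the name span.
full = classify_fields([
    ("beth\nbudinich", None),
    ("See listing website", "http://Www.redfin.com"),
    ("(206)\n793-8336", None),
])
name_only = classify_fields([("beth budinich", None)])
print(full)
print(name_only)
```

Because each span is classified by its content rather than by its position, the result does not depend on whether the div has one, two, or three spans.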
The page has a captcha. It is better to scrape it with requests, since all the information is provided in JSON format.