Extracting between <br> tags with Nokogiri?

Views: 77 | Published: 2018/6/19 16:17:18 | Tags: html ruby parsing nokogiri


Problem description

I am trying to extract the phone number and the address from this site using Nokogiri. Both of them are between <br> tags. How can I do this?

In case the site is down, here is an excerpt of the HTML from which I want to extract the phone number and address:
    <table width="900" style=" margin:8px; padding:5px; font-family:Verdana, Geneva, sans-serif; font-size:12px; line-height:165%; color:#333333; border-bottom:1px solid #cccccc; "><tbody><tr valign="top"><td>
    <strong>Alana's Cafe</strong><br>
    <em>Cafe/Desserts </em>
    <br>
     650 348-0417
    <br>
     1408 Burlingame Ave
    <br>
    <a href="http://www.alanascafe.com/burlingame.html" target="_blank">http://www.alanascafe.com/burlingame.html</a>
    </td><td align="right">
    <a href="index.cfm?vid=44885" style="text-decoration:none; color:black">
    <img src="iconmap.png" height="30" border="0"><br>
    Map</a></td></tr></tbody></table>

    <table width="900" style=" margin:8px; padding:5px; font-family:Verdana, Geneva, sans-serif; font-size:12px; line-height:165%; color:#333333; border-bottom:1px solid #cccccc; "><tbody><tr valign="top"><td>
    <strong>Amber Moon Indian Restaurant and Bar</strong><br>
    <em>Indian </em>
    <br>
     1425 Burlingame Ave
    </td><td align="right">
    <a href="index.cfm?vid=44872" style="text-decoration:none; color:black">
    <img src="iconmap.png" height="30" border="0"><br>
    Map</a></td></tr></tbody></table>

Solution
The simplest approach would be something like:
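    require 'nokogiri'
    doc = Nokogiri::HTML(html)  # html: placeholder for a string holding the page source
    data = doc.search('em').map { |em| em.search('~ br').map { |br| br.next.text.strip } }
    #=> [["650 348-0417", "1408 Burlingame Ave", "http://www.alanascafe.com/burlingame.html"], etc...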
That is: for each em element, find every br element that follows it as a sibling, and take the stripped text of the node immediately after each br.

Update

To sort each row into a phone number and an address, you could do:
    data.map { |row| { :phone => row[0][/^[\d -]+$/] ? row.shift : nil, :address => row.shift } }
    #=> [{:phone=>"650 348-0417", :address=>"1408 Burlingame Ave"}, etc...
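Two side effects of that snippet are worth knowing: the row.shift calls remove elements from the arrays stored in data, and row[0][...] raises NoMethodError if a row is empty (row[0] is then nil). Below is a non-mutating sketch of the same positional rule: the first line is the phone number only if it looks like one, and the line after it is the address. The PHONE_PATTERN regex (which, as an assumption, also accepts parentheses such as "(650) 348-0417") and the classify helper are hypothetical, not part of the original answer.

```ruby
# Hypothetical phone pattern: the whole string is digits, spaces, parentheses or hyphens.
PHONE_PATTERN = /\A[\d ()-]+\z/

# Same positional rule as the answer, without mutating the row:
# if the first line looks like a phone number, the second line is the
# address; otherwise there is no phone and the first line is the address.
def classify(row)
  first, second = row
  if first.to_s.match?(PHONE_PATTERN) # nil.to_s is "", which never matches
    { phone: first, address: second }
  else
    { phone: nil, address: first }
  end
end

data = [
  ["650 348-0417", "1408 Burlingame Ave", "http://www.alanascafe.com/burlingame.html"],
  ["1425 Burlingame Ave"],
  []
]
listings = data.map { |row| classify(row) }
p listings
p data.first.size # prints 3: the input rows are left untouched
```

Destructuring with first, second = row reads the row without changing it, so data can still be reused afterwards.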
