Parsing table content in PHP/regex and getting the result by td


Problem description

I have a table like this, and I have spent a full day trying to get the data out of it:

<table class="table table-condensed">
<tr>
<td>Monthely rent</td>
<td><strong>Fr. 1'950. </strong></td>
</tr>

<tr>
<td>Rooms(s)</td>
<td><strong>3</strong></td>
</tr>

<tr>
<td>Surface</td>
<td><strong>93m2</strong></td>

</tr>

<tr>
<td>Date of Contract</td>
<td><strong>01.04.17</strong></td>
</tr>

</table>

As you can see, the data is well organized, and I am trying to get this result:

monthly rent => Fr. 1'950. 
Rooms(s) => 3
Surface => 93m2
Date of Contract => 01.04.17

I have the table in a variable $table and tried to use DOM:

$dom = new DOMDocument(); 
$dom->loadHTML($table);
$dom = new \DomXPath($dom);
$result = $dom->query('//table/tr');
return $result; 

But to no avail. Is there an easier way to get the contents in PHP or with a regex?

Solution

You're on the right track with DOM and XPath. Do not use regular expressions to parse HTML/XML. Regular expressions are for matching text and are often used as part of a parser, but a parser for a format knows about that format's features; a regex does not.
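To make that concrete, here is a small illustration of my own (not part of the original answer; the one-row table and the `class="label"` attribute are made up for the demo). A regex written against the exact markup in the question, `~<td>(.*?)</td>~`, silently skips a cell as soon as it gains an attribute, while DOMDocument and DOMXPath still find both cells:

```php
<?php
// Illustrative input (made up for this demo): the first cell has an attribute.
$html = '<table><tr><td class="label">Surface</td><td><strong>93m2</strong></td></tr></table>';

// Naive regex that only matches a bare "<td>" opening tag.
preg_match_all('~<td>(.*?)</td>~s', $html, $matches);
$regexCells = $matches[1];

// DOM + XPath: the attribute on the first cell makes no difference.
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$domCells = [];
foreach ($xpath->evaluate('//td') as $td) {
    // textContent returns the text of the cell without any child markup.
    $domCells[] = $td->textContent;
}

var_dump($regexCells); // only the second cell, still wrapped in <strong>
var_dump($domCells);   // both cells, as plain text
```

Note that the regex also captures the inner `<strong>` markup, whereas the parser gives you the text directly.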

You should keep your variable names a little cleaner. Do not assign different types to the same variable in the same context (in your code, $dom first holds a DOMDocument and then a DOMXPath). It only shows that the variable name might be too generic.

DOMXPath::query() lets you use XPath expressions, but only expressions that return a node list. DOMXPath::evaluate() lets you fetch scalar values, too.
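A minimal sketch of the difference, assuming a cut-down two-row version of the question's table: query() hands back a DOMNodeList, while evaluate() can also return a number from count() or a string from string().

```php
<?php
// Cut-down copy of the question's table (two rows only, for brevity).
$html = '<table><tr><td>Rooms(s)</td><td><strong>3</strong></td></tr>'
      . '<tr><td>Surface</td><td><strong>93m2</strong></td></tr></table>';

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// query(): node-list expressions only; the result is a DOMNodeList.
$rows = $xpath->query('//table/tr');

// evaluate(): node lists as well, but also typed scalar results.
$rowCount = $xpath->evaluate('count(//table/tr)');               // float
$firstLabel = $xpath->evaluate('string(//table/tr[1]/td[1])');   // string

var_dump(get_class($rows), $rows->length, $rowCount, $firstLabel);
```

The scalar forms are what make the per-row loop in the next snippet short: no manual node handling is needed to get a cell's text.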

So you can fetch the tr elements, iterate over them, and use additional expressions to fetch the two values, using each tr element as the context.

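$document = new \DOMDocument(); 
$document->loadHTML($table);   // $table holds the HTML from the question
$xpath = new \DOMXPath($document);

// Fetch every row, then evaluate relative expressions with the row as context
foreach ($xpath->evaluate('//table/tr') as $tr) {
  var_dump(
     $xpath->evaluate('string(td[1])', $tr),        // text of the label cell
     $xpath->evaluate('string(td[2]/strong)', $tr)  // text inside <strong>
  );
}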

Output:

string(13) "Monthely rent"
string(11) "Fr. 1'950. "
string(8) "Rooms(s)"
string(1) "3"
string(7) "Surface"
string(4) "93m2"
string(16) "Date of Contract"
string(8) "01.04.17"
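If you want the `label => value` structure from the question rather than var_dump() output, the same loop can fill an array. This is a sketch, not part of the original answer: parseTable() is a hypothetical helper name, and it swaps string() for XPath's normalize-space(), which also trims the trailing space in "Fr. 1'950. ".

```php
<?php
// parseTable() is a hypothetical helper name chosen for this sketch.
// It returns an array of label => value pairs, one per table row.
function parseTable(string $table): array
{
    $document = new DOMDocument();
    $document->loadHTML($table);
    $xpath = new DOMXPath($document);

    $result = [];
    foreach ($xpath->evaluate('//table/tr') as $tr) {
        // normalize-space() trims and collapses whitespace in the cell text.
        $label = $xpath->evaluate('normalize-space(td[1])', $tr);
        $value = $xpath->evaluate('normalize-space(td[2]/strong)', $tr);
        $result[$label] = $value;
    }
    return $result;
}

// The table from the question, compacted onto fewer lines.
$table = <<<'HTML'
<table class="table table-condensed">
<tr><td>Monthely rent</td><td><strong>Fr. 1'950. </strong></td></tr>
<tr><td>Rooms(s)</td><td><strong>3</strong></td></tr>
<tr><td>Surface</td><td><strong>93m2</strong></td></tr>
<tr><td>Date of Contract</td><td><strong>01.04.17</strong></td></tr>
</table>
HTML;

foreach (parseTable($table) as $label => $value) {
    echo $label, ' => ', $value, "\n";
}
```

Run against the question's table, this prints one line per row, for example `Surface => 93m2`.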
