PHP: extract multiple data from html table with xpath
Question
I have to read information from an HTML page and transfer it into several arrays for further processing. My attempts with xpath have not been successful enough to give me access to the data I want.
The body section contains a table with a varying number of rows, as in the following example:
...
</tr>
<tr>
<td class="name" title="43PUS6551" datalabel="43PUS6551">
<span>43PUS6551</span>
</td>
<td datalabel="Internetnutzung" class="usage">eingeschränkt</td>
<td datalabel="Onlinezeit heute" class="bar time">
<span title="03:20 von 14:00 Stunden">
<span style="width:23.81%;"/>
</span>
</td>
<td datalabel="Zugangsprofil" class="profile">
<select name="profile:user6418">
<option value="filtprof1">Standard</option>
<option value="filtprof3">Unbeschränkt</option>
<option value="filtprof4">Gesperrt</option>
<option value="filtprof5334">Network</option>
<option value="filtprof5333" selected="selected">Stream</option>
<option value="filtprof4526">X-Box_One</option>
</select>
</td>
<td datalabel="" class="btncolumn">
<button type="submit" name="edit" id="uiEdit:user6418" value="filtprof5333" class="icon edit" title="Bearbeiten"/>
</td>
</tr>
<tr>
...
I need one array that uses the title attribute from line 2 as the key and the name attribute of the <select> element (line 12) as the value. Line numbers count from the opening <tr> of the example row.
$devices = [
'43PUS6551' => 'profile:user6418'
…
]
I started with this, and I am able to get the keys for this array:
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML($response);
$xmlSite = simplexml_import_dom($dom);
$devices = [];
$rows = $xmlSite->xpath('//tr/td[@title=@datalabel]');
foreach ($rows as $row) {
    $key = utf8_decode((string)$row->attributes()['title']);
    // the matching value is still missing here
}
But now I'm struggling to get the corresponding value. I tried different approaches: going up with parent and back down to the <select> node, or using following-sibling. But I could not get the xpath syntax right.
Once that works, I need an array that uses the name attribute of the <select> element (line 12) as the key and the value attribute of the <option> element that is marked as selected as the value.
$filters = [
'profile:user6418' => 'filtprof5333'
…
]
Finally, I need one array containing the data from the <option> elements (the same list appears in every row):
$profiles = [
'Standard' => 'filtprof1',
'Unbeschränkt' => 'filtprof3',
…
'X-Box_One' => 'filtprof4526',
]
Any help with the proper xpath expressions would be appreciated.
Recommended answer
The following code does what I wanted:
// convert the HTML response into SimpleXML
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML($response);
$xmlSite = simplexml_import_dom($dom);
// initialize the result arrays
$devices = [];
$profiles = [];
$filters = [];
// query the SimpleXML tree with xpath to get the current data
$rows = $xmlSite->xpath('//tr/td[@title=@datalabel]'); // name cells of the rows that assign devices to filters
foreach ($rows as $row) {
    $key = utf8_decode((string)$row->attributes()['title']); // name (label) of the device
    if (preg_match('/Alle /', $key)) { // skip standard settings
        continue;
    }
    $select = $row->xpath('parent::*//select[@name]'); // go up to the <tr>, then down to its <select>
    if (!$select) { // row without a dropdown
        continue;
    }
    $value = (string)$select[0]->attributes()['name']; // current ID ('profile:user*' or 'profile:landevice*')
    $devices[$key] = $value;
    $options = $select[0]->xpath('option'); // the defined filters (dropdown in each row)
    foreach ($options as $option) {
        $profiles[utf8_decode((string)$option)] = (string)$option->attributes()['value']; // label and ID of each filter
        if (isset($option->attributes()['selected'])) { // the filter currently assigned to the device
            $filters[$value] = (string)$option->attributes()['value']; // device (ID) and filter (ID)
        }
    }
}
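For comparison, the same three arrays can probably also be built with DOMXPath directly, without the SimpleXML conversion. The following is only a sketch under assumptions: the sample $response is a shortened, hypothetical version of the row from the question, character-set handling (the utf8_decode calls above) is left out, and the 'Alle ' filter is omitted. It uses following-sibling to move from the name cell to the cell that holds the <select>, which is one of the approaches mentioned in the question.

```php
<?php
// Hypothetical sample input: one shortened row shaped like the example in the question.
$response = <<<HTML
<table>
<tr>
<td class="name" title="43PUS6551" datalabel="43PUS6551"><span>43PUS6551</span></td>
<td datalabel="Zugangsprofil" class="profile">
<select name="profile:user6418">
<option value="filtprof1">Standard</option>
<option value="filtprof5333" selected="selected">Stream</option>
</select>
</td>
</tr>
</table>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($response);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$devices = [];
$filters = [];
$profiles = [];

// One iteration per <tr> that contains a device-name cell.
foreach ($xpath->query('//tr[td[@title = @datalabel]]') as $tr) {
    // Expressions are evaluated relative to the current row ($tr).
    $title = $xpath->evaluate('string(td[@title = @datalabel]/@title)', $tr);
    // following-sibling walks sideways from the name cell to the cell with the <select>.
    $name = $xpath->evaluate(
        'string(td[@title = @datalabel]/following-sibling::td/select/@name)',
        $tr
    );
    if ($name === '') {
        continue; // row without a dropdown
    }
    $devices[$title] = $name;
    // Yields an empty string if no option is marked as selected.
    $filters[$name] = $xpath->evaluate('string(.//select/option[@selected]/@value)', $tr);
    foreach ($xpath->query('.//select/option', $tr) as $option) {
        $profiles[trim($option->textContent)] = $option->getAttribute('value');
    }
}
```

Passing the row as the context node keeps each expression short and avoids the parent step, at the cost of selecting rows first and cells second.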