How can I extract structured text from an HTML list in PHP?
Question
I have this string:
<ul>
<li id="1">Page 1</li>
<li id="2">Page 2
<ul>
<li id="3">Sub Page A</li>
<li id="4">Sub Page B</li>
<li id="5">Sub Page C
<ul>
<li id="6">Sub Sub Page I</li>
</ul>
</li>
</ul>
</li>
<li id="7">Page 3
<ul>
<li id="8">Sub Page D</li>
</ul>
</li>
<li id="9">Page 4</li>
</ul>
and I want to extract every piece of information from it with PHP and arrange it like this:
----------------------------------
| ID | ORDER | PARENT | CHILDREN |
----------------------------------
|  1 |     1 |      0 | 0        |
|  2 |     2 |      0 | 3,4,5    |
|  3 |     1 |      2 | 0        |
|  4 |     2 |      2 | 0        |
|  5 |     3 |      2 | 6        |
|  6 |     1 |      5 | 0        |
|  7 |     3 |      0 | 8        |
|  8 |     1 |      7 | 0        |
|  9 |     4 |      0 | 0        |
----------------------------------
For extra information, this is what the list means to me:
ID 1 is 1st (Page 1) and has 0 parents and 0 children,
ID 2 is 2nd (Page 2) and has 0 parents and children IDs 3, 4, 5,
ID 3 is 1st (Sub Page A) and has parent ID 2 and 0 children,
ID 4 is 2nd (Sub Page B) and has parent ID 2 and 0 children,
ID 5 is 3rd (Sub Page C) and has parent ID 2 and child ID 6,
ID 6 is 1st (Sub Sub Page I) and has parent ID 5 and 0 children,
ID 7 is 3rd (Page 3) and has 0 parents and child ID 8,
ID 8 is 1st (Sub Page D) and has parent ID 7 and 0 children,
ID 9 is 4th (Page 4) and has 0 parents and 0 children.
If this is too tough, can anyone suggest another method for getting that information out of this string?
Recommended answer
That's not "a string", it's HTML. You need to use an HTML parser like DOMDocument or simple_html_dom.
See http://htmlparsing.com/php.html
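To make the DOMDocument suggestion concrete, here is a minimal sketch of one way to walk the nested list. It is not part of the original answer: the function name collectListItems, the $rows layout, and the use of '0' to mean "no parent" or "no children" are assumptions chosen to mirror the table in the question. It assumes every <li> carries an id attribute and that nested <ul> elements are direct children of their <li>.

```php
<?php
// Hypothetical helper (not from the original answer): walks one <ul> and
// records id, order, parent and children for every <li> directly inside it,
// then recurses into nested lists.
function collectListItems(DOMElement $ul, string $parentId, array &$rows): void
{
    $order = 0;
    foreach ($ul->childNodes as $li) {
        // Skip whitespace text nodes and anything that is not an <li>.
        if (!$li instanceof DOMElement || strtolower($li->nodeName) !== 'li') {
            continue;
        }
        $order++;
        $id = $li->getAttribute('id');
        $rows[$id] = [
            'id'       => $id,
            'order'    => $order,      // position among its siblings
            'parent'   => $parentId,   // '0' means top level (assumption)
            'children' => [],
        ];
        if ($parentId !== '0') {
            $rows[$parentId]['children'][] = $id;
        }
        // Recurse into any <ul> that is a direct child of this <li>.
        foreach ($li->childNodes as $child) {
            if ($child instanceof DOMElement && strtolower($child->nodeName) === 'ul') {
                collectListItems($child, $id, $rows);
            }
        }
    }
}

// The list from the question, as a nowdoc so nothing is interpolated.
$html = <<<'HTML'
<ul>
<li id="1">Page 1</li>
<li id="2">Page 2
<ul>
<li id="3">Sub Page A</li>
<li id="4">Sub Page B</li>
<li id="5">Sub Page C
<ul>
<li id="6">Sub Sub Page I</li>
</ul>
</li>
</ul>
</li>
<li id="7">Page 3
<ul>
<li id="8">Sub Page D</li>
</ul>
</li>
<li id="9">Page 4</li>
</ul>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$rows = [];
// The first <ul> in document order is the outermost list.
$root = $doc->getElementsByTagName('ul')->item(0);
collectListItems($root, '0', $rows);

// Print one line per item; an empty children list is shown as 0.
foreach ($rows as $row) {
    $children = $row['children'] ? implode(',', $row['children']) : '0';
    printf("| %s | %d | %s | %s |\n", $row['id'], $row['order'], $row['parent'], $children);
}
```

Run against the list in the question, this should yield the same ID, ORDER, PARENT and CHILDREN values as the table there. Rows are added before recursing into a nested list, so $rows ends up in document order.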