Using PHP and RegEx to fetch all option values from a site's source code
Question
I'm learning RegEx and site crawling, and have the following question which, if answered, should speed my learning process up significantly.
I have fetched the form element from a web site in htmlencoded format. That is to say, I have the $content string with all the tags intact, like so:
$content = '<form name="sth" action="">
<select name="city">
<option value="one">One town</option>
<option value="two">Another town</option>
<option value="three">Yet Another town</option>
...
</select>
</form>';
I would like to fetch all the options on the site, in this manner:
array("One town" => "one", "Another town" => "two", "Yet Another town" => "three" ...);
Now, I know this can easily be done by manipulating the string: slicing and dicing it, searching for substrings within each piece, and so on, until I have everything I need. But I'm certain there must be a simpler way of doing it with regex, one that fetches all the results from a given string in a single pass. Can anyone help me find a shortcut for this? I have searched the web's finest regex sites, but to no avail.
Many thanks.
Recommended answer
See Best methods to parse HTML. Find the DOM solution below:
// Load and parse the page at the given URL.
$dom = new DOMDocument;
$dom->loadHTMLFile('http://example.com');
$options = array();
// Map each option's visible text to its value attribute.
foreach ($dom->getElementsByTagName('option') as $option) {
    $options[$option->nodeValue] = $option->getAttribute('value');
}
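Since the question already holds the markup in a string, the same approach can probably be applied with loadHTML instead of loadHTMLFile. The following is a minimal sketch under that assumption: the function name optionsFromHtml and the sample markup are hypothetical, and the libxml calls are only there to silence warnings about imperfect markup.

```php
<?php
// Hypothetical sample markup standing in for the fetched $content string.
$content = '<form name="sth" action="">
<select name="city">
<option value="one">One town</option>
<option value="two">Another town</option>
<option value="three">Yet Another town</option>
</select>
</form>';

// Hypothetical helper: returns an array of option label => option value.
function optionsFromHtml(string $html): array
{
    $dom = new DOMDocument;

    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $options = [];
    foreach ($dom->getElementsByTagName('option') as $option) {
        // trim() drops whitespace surrounding the visible label.
        $options[trim($option->nodeValue)] = $option->getAttribute('value');
    }
    return $options;
}

print_r(optionsFromHtml($content));
```

Note that keying the array by label means two options with identical text would overwrite each other; keying by value instead may be safer if labels can repeat.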
This can be done with Regex too, but I don't find it practical to write a reliable HTML parser with Regex when there are plenty of native and third-party parsers readily available for PHP.
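For comparison, here is a rough sketch of what a regex version might look like using preg_match_all. It is not part of the original answer, and it rests on loud assumptions: the value attribute is double-quoted, the option label contains no nested tags, and every option has a closing tag. The function name optionsFromHtmlRegex is hypothetical.

```php
<?php
// Hypothetical helper: regex-based extraction of option label => option value.
function optionsFromHtmlRegex(string $html): array
{
    // Group 1: the double-quoted value attribute. Group 2: the label text.
    // Flags: i = case-insensitive, s = let "." match newlines.
    $pattern = '~<option\b[^>]*\bvalue="([^"]*)"[^>]*>(.*?)</option>~is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $options = [];
    foreach ($matches as $match) {
        // Decode entities such as &amp; so the result holds plain text.
        $label = html_entity_decode(trim($match[2]), ENT_QUOTES | ENT_HTML5, 'UTF-8');
        $value = html_entity_decode($match[1], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        $options[$label] = $value;
    }
    return $options;
}

// Assumed sample input, mirroring the markup in the question.
$sample = '<select name="city">
<option value="one">One town</option>
<option value="two">Another town</option>
</select>';

print_r(optionsFromHtmlRegex($sample));
```

Markup that breaks any of the stated assumptions (single-quoted or unquoted attributes, omitted closing tags) will be silently skipped, which illustrates why the DOM approach is the more robust choice.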