Using PHP and RegEx to fetch all option values from a site's source code

Question

I'm learning RegEx and site crawling, and have the following question which, if answered, should speed my learning process up significantly.

I have fetched the form element from a web site in htmlencoded format. That is to say, I have the $content string with all the tags intact, like so:

$content = '<form name="sth" action="">
<select name="city">
<option value="one">One town</option>
<option value="two">Another town</option>
<option value="three">Yet Another town</option>
...
</select>
</form>';

I would like to fetch all the options on the site, in this manner:

array("One town" => "one", "Another town" => "two", "Yet Another town" => "three" ...);

Now, I know this can easily be done by manipulating the string, slicing and dicing it, searching for substrings within each string, and so on, until I have everything I need. But I'm certain there must be a simpler way of doing it with regex, which should fetch all the results from a given string in one pass. Can anyone help me find a shortcut for this? I have searched the web's finest regex sites, but to no avail.

Thanks a lot

Recommended answer

See "Best methods to parse HTML". Find the DOM solution below:

$dom = new DOMDocument;
$dom->loadHTMLFile('http://example.com');
$options = array();
foreach($dom->getElementsByTagName('option') as $option) {
    $options[$option->nodeValue] = $option->getAttribute('value');
}
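
Since the question already has the markup in a `$content` string rather than at a URL, the same loop can be pointed at that string. The following is a minimal sketch, assuming `$content` holds the form from the question (with the elided options left out); the `libxml_use_internal_errors()` call and the `trim()` are additions for illustration, not part of the original answer.

```php
<?php
// Assumption: $content is the markup from the question, minus the elided "..." options.
$content = '<form name="sth" action="">
<select name="city">
<option value="one">One town</option>
<option value="two">Another town</option>
<option value="three">Yet Another town</option>
</select>
</form>';

// Collect parser warnings internally instead of printing them
// (real-world HTML is often not well-formed).
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($content); // parse a string instead of fetching a URL
libxml_clear_errors();

// Map each option's visible text to its value attribute.
$options = array();
foreach ($dom->getElementsByTagName('option') as $option) {
    $options[trim($option->nodeValue)] = $option->getAttribute('value');
}

print_r($options);
```

Note that two options with the same label would overwrite each other, because the label is used as the array key.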

This can be done with regex too, but I don't find it practical to write a reliable HTML parser with regex when there are plenty of native and third-party parsers readily available for PHP.
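
For completeness, here is a rough sketch of what the regex route might look like with `preg_match_all()`. It is not from the original answer, and it rests on assumptions that real pages often break: every `value` attribute is double-quoted, and `<option>` elements contain only plain text. The pattern itself is only an illustration.

```php
<?php
// Assumption: $content is the markup from the question, minus the elided "..." options.
$content = '<form name="sth" action="">
<select name="city">
<option value="one">One town</option>
<option value="two">Another town</option>
<option value="three">Yet Another town</option>
</select>
</form>';

// Group 1: the double-quoted value attribute. Group 2: the option's text.
// Flags: i = case-insensitive, s = let "." match newlines inside the text.
$pattern = '/<option\b[^>]*?\svalue="([^"]*)"[^>]*>(.*?)<\/option>/is';
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

// Map each option's visible text to its value attribute.
$options = array();
foreach ($matches as $match) {
    $options[trim(html_entity_decode($match[2]))] = $match[1];
}

print_r($options);
```

Single-quoted attributes, unquoted attributes, or markup nested inside an option would all slip past this pattern, which is the fragility the answer warns about.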
