Selecting a specific div from an external webpage using cURL
Question
Hi, can anyone help me select a specific div from the content of a webpage?
Let's say I want to get the div with id="wrapper_content" from the webpage http://www.test.com/page3.php.
My current code looks something like this (it does not work):
//REG EXP.
$s_searchFor = '@^/.dont know what to put here..@ui';
//CURL
$ch = curl_init();
$timeout = 5; // set to zero for no timeout
curl_setopt ($ch, CURLOPT_URL, 'http://www.test.com/page3.php');
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
if(!preg_match($s_searchFor, $ch))
{
$file_contents = curl_exec($ch);
}
curl_close($ch);
// display file
echo $file_contents;
So I'd like to know how I can use regular expressions to find a specific div, and how to discard the rest of the webpage so that $file_contents only contains that div.
Recommended answer
HTML is not a regular language, so regular expressions are a poor fit for parsing it. Instead, I would recommend an HTML parser such as Simple HTML DOM or PHP's DOM extension.
If you were going to use Simple HTML DOM, you would do something like the following:
$html = str_get_html($file_contents);
$elem = $html->find('div[id=wrapper_content]', 0);
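The answer also mentions DOM. Assuming that refers to PHP's built-in DOM extension (DOMDocument), the same lookup might be sketched as below. The function name extract_div_by_id and the sample markup are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): returns the outer HTML
// of the first <div> whose id attribute equals $id, or null if none is found.
function extract_div_by_id(string $html, string $id): ?string
{
    // loadHTML() rejects an empty string, so bail out early.
    if ($html === '') {
        return null;
    }

    $doc = new DOMDocument();

    // Real pages are rarely valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    // Walk all divs and compare the id attribute directly.
    foreach ($doc->getElementsByTagName('div') as $div) {
        if ($div->getAttribute('id') === $id) {
            // Passing a node to saveHTML() serializes just that subtree.
            return $doc->saveHTML($div);
        }
    }

    return null;
}

// Made-up sample markup standing in for the result of curl_exec($ch).
$sample = '<html><body><div id="header">top</div>'
        . '<div id="wrapper_content"><p>Hello</p><div>nested</div></div>'
        . '<div id="footer">bottom</div></body></html>';

$wrapper = extract_div_by_id($sample, 'wrapper_content');
echo $wrapper, "\n";
```

In the question's script, the string returned by curl_exec($ch) would be passed in place of $sample. Because the parser builds a tree, nested divs inside the target are kept intact.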
Even if you used regex, your code still wouldn't work correctly: it runs preg_match on the cURL handle instead of on the page. You need to fetch the contents of the page before you can apply a regex to them.
//wrong
if(!preg_match($s_searchFor, $ch)){
$file_contents = curl_exec($ch);
}
//right
$file_contents = curl_exec($ch); //get the page contents (false on failure)
if ($file_contents !== false && preg_match($s_searchFor, $file_contents, $matches)) { //match the element
$file_contents = $matches[0]; //keep only the matched element
}
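Even with the call order fixed, a regex can still return the wrong fragment when the target div contains other divs. The sketch below uses made-up sample markup and one plausible pattern (an assumption, since the question never settled on one) to show a non-greedy match stopping at the first closing tag.

```php
<?php
// Made-up sample markup: the target div contains a nested div.
$html = '<div id="wrapper_content"><div class="inner">first</div><p>second</p></div>'
      . '<div id="footer">end</div>';

// A typical non-greedy attempt: match up to the first closing </div>.
$pattern = '@<div[^>]*id="wrapper_content"[^>]*>.*?</div>@si';

// preg_match() returns 1 on a match, 0 on no match, and false on error,
// so check the result before reading $matches[0].
$matched = preg_match($pattern, $html, $matches);
$result = ($matched === 1) ? $matches[0] : null;

// The match ends at the inner div's closing tag, so <p>second</p> is lost.
echo $result, "\n";
```

A greedy .* has the opposite problem and runs on to the last closing div on the page, which is why a parser is the safer choice here.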