Scrape div contents using PHP and cURL
Question
I'm new to cURL. I have been trying to scrape the contents of this Amazon link (i.e., the image, book title, author, and price of the 20 books) into an HTML page. So far, all I have managed is to print the page using the code below:
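<?php
// Fetch a URL with cURL and return the response body as a string
// (curl_exec() returns false on failure).
function curl($url) {
    $options = array(
        CURLOPT_RETURNTRANSFER => true, // return the page instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_AUTOREFERER    => true, // set the Referer header when redirected
        CURLOPT_CONNECTTIMEOUT => 120,  // seconds to wait while connecting
        CURLOPT_TIMEOUT        => 120,  // seconds allowed for the whole request
        CURLOPT_MAXREDIRS      => 10,   // stop after 10 redirects
        CURLOPT_URL            => $url,
    );
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

$url = "http://www.amazon.in/gp/bestsellers/books/1318209031/ref=zg_bs_nav_b_2_1318203031";
$results_page = curl($url);
echo $results_page;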
I tried using regex and failed. I have tried everything I could think of for six hours straight and am really tired, so I'm hoping to find a solution here. Just saying thanks isn't enough for a solution, but thank you in advance. :)
UPDATE: Found a really helpful site (click here) for beginners like me (it doesn't use cURL, though).
Recommended answer
You really should be using the AWSECommerce API, but here's a way to do it by leveraging Yahoo's YQL service:
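<?php
// Build the YQL REST URL: select the product divs from the Amazon page via XPath.
$query = sprintf(
    'http://query.yahooapis.com/v1/public/yql?q=%s',
    urlencode('SELECT * FROM html WHERE url = "http://www.amazon.in/gp/bestsellers/books/1318209031/ref=zg_bs_nav_b_2_1318203031" AND xpath=\'//div[@class="zg_itemImmersion"]\'')
);
// Passing true as the third argument makes SimpleXMLElement load the XML from the URL.
$xml = new SimpleXMLElement($query, 0, true);
// Print the text of the title link inside each product div.
foreach ($xml->results->div as $product) {
    vprintf("%s\n", array(
        $product->div[1]->div[1]->a,
    ));
}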
/*
Engineering Thermodynamics
A Textbook of Fluids Mechanics
The Design of Everyday Things
A Forest History of India
Computer Networking
The Story of Microsoft
Private Empire: ExxonMobil and Americ...
Project Management Metrics, KPIs, and...
Design and Analysis of Experiments: I...
IES - 2013: General English
Foundation of Software Testing: ISTQB...
Faster: 100 Ways to Improve your Digi...
A Textbook of Fluid Mechanics and Hyd...
Software Engineering for Embedded Sys...
Communication Skills for Engineers
Making Things Move DIY Mechanisms for...
Virtual Instrumentation Using Labview
Geometric Dimensioning and Tolerancin...
Power System Protection & Switchgear...
Computer Networks
*/
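Yahoo has since shut down the public YQL endpoint, so it may help to see the same XPath idea applied locally with PHP's built-in DOMDocument and DOMXPath classes, which also sidesteps the regex approach that failed in the question. The following is only a minimal sketch under stated assumptions: the function name extract_item_titles and the sample markup are hypothetical, and the zg_itemImmersion class name is taken from the XPath in the answer above, so it may not match what Amazon serves today.

```php
<?php
// Sketch: parse already-fetched HTML with DOMDocument/DOMXPath instead of regex.

/**
 * Return the trimmed text of the first non-empty link inside every div
 * whose class attribute equals $itemClass.
 * (Hypothetical helper; the default class name comes from the answer's XPath.)
 */
function extract_item_titles(string $html, string $itemClass = 'zg_itemImmersion'): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid markup, so keep parser warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath  = new DOMXPath($doc);
    $titles = array();

    // Same selection as the YQL query: every div with the given class attribute.
    foreach ($xpath->query(sprintf('//div[@class="%s"]', $itemClass)) as $item) {
        // Skip links with no text (e.g. image-only links) and take the first remaining one.
        $links = $xpath->query('.//a[normalize-space()]', $item);
        if ($links->length > 0) {
            $titles[] = trim($links->item(0)->textContent);
        }
    }
    return $titles;
}

// Hypothetical sample markup standing in for the downloaded page.
$sample = '<html><body>'
    . '<div class="zg_itemImmersion"><div class="zg_title"><a href="/b1"> Computer Networks </a></div></div>'
    . '<div class="zg_itemImmersion"><div class="zg_title"><a href="/b2">Engineering Thermodynamics</a></div></div>'
    . '<div class="other"><a href="/x">Not a product</a></div>'
    . '</body></html>';

foreach (extract_item_titles($sample) as $title) {
    echo $title, "\n";
}
```

With the question's curl() function, the call would look like extract_item_titles(curl($url)), provided the request succeeded and the page still uses that class name. Image, author, and price could be read the same way with further relative XPath queries against each item node, once the actual markup has been inspected.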