Scrape div contents using PHP and cURL

Question

I'm new to cURL. I have been trying to scrape the contents of this Amazon link (i.e., the image, book title, author, and price of each of the 20 books) into an HTML page. So far, all I've managed is to print the page using the code below:

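<?php
// Fetch a URL with cURL and return the response body as a string (false on failure).
function curl($url) {
    $options = array(
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects...
        CURLOPT_MAXREDIRS      => 10,    // ...but at most 10 of them
        CURLOPT_AUTOREFERER    => true,  // set the Referer header when following a redirect
        CURLOPT_CONNECTTIMEOUT => 120,   // seconds to wait while connecting
        CURLOPT_TIMEOUT        => 120,   // seconds allowed for the whole transfer
        CURLOPT_URL            => $url,
    );

    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

// These lines must stay inside the PHP block; after a closing ?> tag they would be echoed as plain text.
$url = "http://www.amazon.in/gp/bestsellers/books/1318209031/ref=zg_bs_nav_b_2_1318203031";
$results_page = curl($url);
echo $results_page;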

I have tried using regex and failed. I've been at it for six hours straight and I'm really tired, so I'm hoping to find a solution here. A simple thanks isn't enough, but thank you in advance. :)
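Regular expressions tend to break on real-world HTML. A sturdier route is PHP's built-in DOM extension: load the page into `DOMDocument` and select nodes with `DOMXPath`. The sketch below is only an illustration under stated assumptions: the markup in `$html`, its class names (`zg_item`, `zg_title`, `zg_author`, `zg_price`), the placeholder data, and the helper name `extractBooks()` are all hypothetical, so inspect the live page's source and adjust the XPath expressions. In practice `$html` would be the string returned by the `curl()` function above.

```php
<?php
// Hypothetical markup with placeholder data; the real page's class names differ and change over time.
$html = <<<'HTML'
<html><body>
<div class="zg_item">
  <img src="https://example.com/a.jpg" alt="">
  <div class="zg_title"><a href="/book-a">Book A</a></div>
  <div class="zg_author">Author A</div>
  <div class="zg_price">100.00</div>
</div>
<div class="zg_item">
  <img src="https://example.com/b.jpg" alt="">
  <div class="zg_title"><a href="/book-b">Book B</a></div>
  <div class="zg_author">Author B</div>
  <div class="zg_price">200.00</div>
</div>
</body></html>
HTML;

// Hypothetical helper: returns one associative array per item <div>.
function extractBooks(string $html): array
{
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $prev = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $books = [];
    foreach ($xpath->query('//div[@class="zg_item"]') as $item) {
        // Expressions starting with "." are evaluated relative to the current item only.
        $books[] = [
            'image'  => $xpath->evaluate('string(.//img/@src)', $item),
            'title'  => trim($xpath->evaluate('string(.//div[@class="zg_title"]/a)', $item)),
            'author' => trim($xpath->evaluate('string(.//div[@class="zg_author"])', $item)),
            'price'  => trim($xpath->evaluate('string(.//div[@class="zg_price"])', $item)),
        ];
    }
    return $books;
}

print_r(extractBooks($html));
```

Because each field is queried relative to its own item, a missing author or price yields an empty string for that book instead of shifting every later field out of alignment, which is a common failure mode of regex-based scraping.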

UPDATE: Found a really helpful site (click here) for beginners like me (though it doesn't use cURL).

Answer

You really should be using the AWSECommerce API, but here's a way to leverage Yahoo's YQL service:

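<?php
// Ask YQL to fetch the page and return only the nodes matching the XPath expression.
$query = sprintf(
    'http://query.yahooapis.com/v1/public/yql?q=%s',
    urlencode('SELECT * FROM html WHERE url = "http://www.amazon.in/gp/bestsellers/books/1318209031/ref=zg_bs_nav_b_2_1318203031" AND xpath=\'//div[@class="zg_itemImmersion"]\'')
);

// Third argument true: treat the first argument as a URL to load rather than an XML string.
$xml = new SimpleXMLElement($query, 0, true);

// Each matched <div> is one bestseller; print the text of its title link.
foreach ($xml->results->div as $product) {
    printf("%s\n", $product->div[1]->div[1]->a);
}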

/*
    Engineering Thermodynamics
    A Textbook of Fluids Mechanics
    The Design of Everyday Things
    A Forest History of India
    Computer Networking
    The Story of Microsoft
    Private Empire: ExxonMobil and Americ...
    Project Management Metrics, KPIs, and...
    Design and Analysis of Experiments: I...
    IES - 2013: General English
    Foundation of Software Testing: ISTQB...
    Faster: 100 Ways to Improve your Digi...
    A Textbook of Fluid Mechanics and Hyd...
    Software Engineering for Embedded Sys...
    Communication Skills for Engineers
    Making Things Move DIY Mechanisms for...
    Virtual Instrumentation Using Labview
    Geometric Dimensioning and Tolerancin...
    Power System Protection & Switchgear...
    Computer Networks
*/
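Yahoo has since retired the public YQL service, so the query above no longer runs as written. The same XPath can be evaluated locally with `DOMXPath` on HTML fetched by the question's `curl()` function. One detail is easy to get wrong when translating the answer: SimpleXML child indices are 0-based, so `$product->div[1]->div[1]->a` means "second `div` child, then its second `div` child", whereas XPath positions are 1-based, giving the relative path `div[2]/div[2]/a`. The sketch below demonstrates this on a small hypothetical fragment shaped like that nesting; the fragment, its placeholder titles, and the helper name `bestsellerTitles()` are assumptions, and Amazon's current markup is different.

```php
<?php
// Hypothetical fragment mimicking the nesting the YQL answer walks:
// item div > 2nd child div > 2nd child div > a
$html = <<<'HTML'
<html><body>
<div class="zg_itemImmersion">
  <div>1.</div>
  <div>
    <div><img src="a.jpg" alt=""></div>
    <div><a href="/a">Book A</a></div>
  </div>
</div>
<div class="zg_itemImmersion">
  <div>2.</div>
  <div>
    <div><img src="b.jpg" alt=""></div>
    <div><a href="/b">Book B</a></div>
  </div>
</div>
</body></html>
HTML;

// Hypothetical helper: local replacement for the YQL query.
function bestsellerTitles(string $html): array
{
    $prev = libxml_use_internal_errors(true); // tolerate malformed real-world HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $titles = [];
    // Same expression the YQL query used.
    foreach ($xpath->query('//div[@class="zg_itemImmersion"]') as $product) {
        // SimpleXML $product->div[1]->div[1]->a (0-based) == XPath div[2]/div[2]/a (1-based).
        $titles[] = trim($xpath->evaluate('string(div[2]/div[2]/a)', $product));
    }
    return $titles;
}

foreach (bestsellerTitles($html) as $title) {
    echo $title, "\n";
}
// Book A
// Book B
```

Whitespace between the tags does not affect `div[2]`, because that step counts only `div` element children, not text nodes.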