Web scraping in PHP

Published: 2017/3/6 1:07:44 · Tags: php, html, curl, html-parsing, web-scraping

Question

I'm looking for a way to make a small preview of another page from a URL given by the user in PHP.

I'd like to retrieve only the title of the page, an image (like the logo of the website) and a bit of text or a description if it's available. Is there any simple way to do this without any external libraries/classes? Thanks

So far I've tried using the DOMDocument class, loading the HTML and displaying it on the screen, but I don't think that's the proper way to do it.

Recommended answer

I recommend you consider simple_html_dom for this. It will make it very easy.

Here is a working example of how to pull the title and the first image:

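<?php
// simple_html_dom is a third-party library: simple_html_dom.php must be downloaded separately.
require 'simple_html_dom.php';

// file_get_html() returns false if the page cannot be fetched.
$html = file_get_html('http://www.google.com/');
if ($html === false) {
    die("Could not load the page.\n");
}

// find($selector, $index) returns the matching element at $index, or null if there is none.
$title = $html->find('title', 0);
$image = $html->find('img', 0);

echo ($title ? $title->plaintext : '') . "<br>\n";
echo $image ? $image->src : '';
?>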
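
Because the URL in the question comes from the user, it is worth checking it before fetching. As far as I know, simple_html_dom's file_get_html() hands its argument to file_get_contents(), and file_get_contents() also accepts local file paths and stream wrappers such as file:// or php://, so an unchecked string could expose files on the server. The following is a minimal sketch; the function name isFetchableUrl is hypothetical, and it only checks the form of the URL: it does not stop requests to internal or private network addresses.

```php
<?php
/**
 * Hypothetical helper: accept only absolute http(s) URLs that have a host.
 * Anything else (local paths, file://, php://, javascript:, garbage) is rejected.
 */
function isFetchableUrl(string $url): bool
{
    // Reject strings that are not syntactically valid URLs at all.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // Allow-list the scheme instead of trying to block dangerous ones.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    $host = parse_url($url, PHP_URL_HOST);

    return in_array($scheme, ['http', 'https'], true) && !empty($host);
}
```

You would call it before fetching, e.g. `if (!isFetchableUrl($url)) { die("Invalid URL\n"); }`, and only then pass `$url` to `file_get_html()` or `file_get_contents()`.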

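Here is a second example that does the same without an external library, using regular expressions. I should note, though, that parsing HTML with regex is NOT a good idea: the patterns break as soon as the markup differs from what they expect.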

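<?php
// file_get_contents() returns false on failure.
$data = file_get_contents('http://www.google.com/');
if ($data === false) {
    die("Could not load the page.\n");
}

// preg_match() returns 1 on a match; only then is $matches[1] set.
// [^>]* also allows attributes on the tag, e.g. <title lang="en">.
$title = preg_match('/<title[^>]*>([^<]+)<\/title>/i', $data, $matches) ? trim($matches[1]) : '';

// src of the first <img> tag whose src value is quoted (\s avoids matching data-src).
$img = preg_match('/<img\b[^>]*\ssrc=[\'"]([^\'"]+)[\'"]/i', $data, $matches) ? $matches[1] : '';

echo $title . "<br>\n";
echo $img;
?>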
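
If, as the question asks, you want to avoid external libraries, PHP's bundled DOM extension (the DOMDocument class the question already tried) can replace the regular expressions: instead of echoing the loaded document, query it with DOMXPath. The sketch below assumes the page HTML has already been fetched into a string (for example with file_get_contents() as above). The function name extractPreview and the choice of fallbacks (the `<meta name="description">` tag, then Open Graph `og:description`; the Open Graph `og:image` tag, then the first `<img>`) are my assumptions, not something the original answer specified.

```php
<?php
/**
 * Sketch: build a link preview from an HTML string using only PHP's bundled
 * DOM extension (DOMDocument + DOMXPath), with no regex and no third-party library.
 * Returns the page title, a description and an image URL; each may be ''.
 */
function extractPreview(string $html): array
{
    // loadHTML() rejects an empty string, so handle that case up front.
    if (trim($html) === '') {
        return ['title' => '', 'description' => '', 'image' => ''];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Trimmed value of the first node matching $query, or '' if nothing matches.
    $first = function (string $query) use ($xpath): string {
        $nodes = $xpath->query($query);
        return ($nodes !== false && $nodes->length > 0)
            ? trim($nodes->item(0)->nodeValue)
            : '';
    };

    $title = $first('//title');

    $description = $first('//meta[@name="description"]/@content');
    if ($description === '') {
        $description = $first('//meta[@property="og:description"]/@content');
    }

    // Prefer the Open Graph image, which sites often set to a logo or banner;
    // otherwise fall back to the first <img> that has a src attribute.
    $image = $first('//meta[@property="og:image"]/@content');
    if ($image === '') {
        $image = $first('//img/@src');
    }

    return ['title' => $title, 'description' => $description, 'image' => $image];
}
```

For example, once `$data` holds the fetched page (and is not false), `extractPreview($data)` gives an array with `title`, `description` and `image` keys. Two caveats: the image URL may be relative (e.g. `/logo.png`) and would need resolving against the page URL before display, and DOMDocument::loadHTML() assumes ISO-8859-1 unless the page declares its charset in a `<meta>` tag, so non-ASCII text can come out garbled.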
