How to get the Open Graph Protocol tags of a webpage with PHP?


Question

PHP has a built-in function, get_meta_tags, for reading the meta tags of a webpage, but it only works for meta tags that have a name attribute. However, the Open Graph Protocol is becoming more and more popular these days. What is the easiest way to get the Open Graph (og:) values from a webpage? For example:

<meta property="og:url" content=""> 
<meta property="og:title" content=""> 
<meta property="og:description" content=""> 
<meta property="og:type" content=""> 

The basic approach I can see is to fetch the page via cURL and parse it with a regex. Any ideas?

Solution

When parsing data from HTML, you really shouldn't use regex. Take a look at the DOMXPath Query function.

[EDIT] Stefan Gehrig suggested a better XPath query, so the actual code can be shortened to:

libxml_use_internal_errors(true); // collect libxml parse warnings instead of silencing them with @
$doc = new DOMDocument();
$doc->loadHTML($html); // $html holds the page source, e.g. fetched via cURL
$xpath = new DOMXPath($doc);
// Select only the meta elements whose property attribute starts with "og:"
$query = '//*/meta[starts-with(@property, \'og:\')]';
$metas = $xpath->query($query);
$rmetas = array();
foreach ($metas as $meta) {
    $property = $meta->getAttribute('property');
    $content = $meta->getAttribute('content');
    $rmetas[$property] = $content; // a repeated property overwrites the earlier value
}
var_dump($rmetas);

instead of the original version, which selected every meta element and filtered by prefix in PHP:

$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses the warnings libxml raises on malformed HTML
$xpath = new DOMXPath($doc);
$query = '//*/meta'; // selects every meta element, og: or not
$metas = $xpath->query($query);
$rmetas = array();
foreach ($metas as $meta) {
    $property = $meta->getAttribute('property');
    $content = $meta->getAttribute('content');
    // Keep only the properties that start with "og:"
    if(!empty($property) && preg_match('#^og:#', $property)) {
        $rmetas[$property] = $content;
    }
}
var_dump($rmetas);
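
Both snippets above assume that $html already contains the page source, and both keep only the last value when a property appears more than once (a page may, for example, carry several og:image tags). As a minimal sketch of one way to handle that, the same XPath approach can be wrapped in a function that stores every value in a list. The function name extract_open_graph and the sample markup are hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): collect all og:* meta
// tags from an HTML string, keeping repeated properties such as og:image.
function extract_open_graph(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input, so bail out early
    }

    // Collect libxml parse warnings internally instead of emitting them,
    // then restore whatever setting was active before.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $tags = [];
    foreach ($xpath->query('//meta[starts-with(@property, "og:")]') as $meta) {
        // Each property maps to a list, so repeated tags are not overwritten.
        $tags[$meta->getAttribute('property')][] = $meta->getAttribute('content');
    }
    return $tags;
}

// Sample markup, made up for illustration only.
$html = <<<HTML
<html><head>
<meta property="og:title" content="Example title">
<meta property="og:image" content="https://example.com/a.png">
<meta property="og:image" content="https://example.com/b.png">
<meta name="description" content="ignored: no property attribute">
</head><body></body></html>
HTML;

$og = extract_open_graph($html);
var_dump($og);
```

In real use, $html would be the response body obtained with cURL or a similar HTTP client. Returning a list per property costs an extra index when reading single-valued tags, but avoids silently dropping duplicates.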
