Regex to pull all attributes out of all meta tags

Views: 95 | Published: 2020/5/9 2:32:21 | Tags: php regex preg-match-all meta-tags


Question

I'm trying to pull meta tags out of an HTML page in order to compare two pages (live and dev) and see whether their SEO is the same after a site redesign/refactor. I need to check that the title, meta tags (description, Open Graph, etc.), h1's, our analytics (Omniture), and our ad tags (DoubleClick) are all the same.

My problem is that get_meta_tags (http://php.net/manual/en/function.get-meta-tags.php) only picks up meta tags that have a name= attribute, and the same goes for the solution by "mariano at cricava dot com".
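To illustrate the limitation, here is a minimal sketch. The sample markup and the temporary file are assumptions made for the example; get_meta_tags reads from a file name or URL, so the markup is written to disk first.

```php
<?php
// Hypothetical sample markup: one tag with name=, one with property=,
// one with http-equiv=.
$html = implode("\n", [
    '<html><head>',
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">',
    '<meta name="description" content="some description">',
    '<meta property="og:type" content="website">',
    '</head><body></body></html>',
]);

// get_meta_tags() expects a file name or URL, so use a temporary file.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, $html);

$tags = get_meta_tags($file);
unlink($file);

// Only the tag that carries a name= attribute shows up in the result.
print_r($tags);
```

The property= and http-equiv= tags are skipped, which is why this function is not enough for Open Graph tags.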

I don't want to restrict the match to particular attributes. I could assume that all our meta tags have a name=, property= or http-equiv= attribute and change the regex accordingly, but I can't be sure of that: it's a massive website and any random crap could be in the tags (which is exactly what this tool is meant to catch!), so I'd like to keep it as dynamic as possible.

I have:

$page = @file_get_contents('http://.../');
preg_match_all('#<meta(?:\s+?([^\=]+)\=\"(.+?)\")+?\s*?/?>#sui', $page, $matches, PREG_SET_ORDER);

But the subpatterns overwrite each other, so this only pulls out the last attribute-name=attribute-value pair of each tag:

Array
(
    [0] => Array
        (
            [0] => <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
            [1] => content
            [2] => text/html; charset=UTF-8
        )

    [1] => Array
        (
            [0] => <meta name="description" content="some description" />
            [1] => content
            [2] => some description
        )

    [2] => Array
        (
            [0] => <meta property="og:type" content="website" />
            [1] => content
            [2] => website
        )
...
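The same behaviour can be reproduced on a single tag. The sketch below reuses the pattern from the question on one of the sample tags from the output above. Capture groups 1 and 2 sit inside a repeated group, so each repetition overwrites the previous capture and only the last one is reported.

```php
<?php
// Sample tag copied from the output above.
$tag = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />';

// Same pattern as in the question: both capture groups are inside (?: ... )+?
$pattern = '#<meta(?:\s+?([^\=]+)\=\"(.+?)\")+?\s*?/?>#sui';

preg_match($pattern, $tag, $m);

// The group repeated twice (http-equiv, then content), but only the
// final repetition survives in $m[1] and $m[2].
echo $m[1], ' = ', $m[2], PHP_EOL; // content = text/html; charset=UTF-8
```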

I need all the attributes of all the meta tags. I could do this in two steps, pulling out the contents of <meta ([^>]*)> and then running a second regular expression on the results, but that seems unnecessary given the power of regex?
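For reference, the two-step approach described above could be sketched roughly as follows. The function name extract_meta_attributes and the sample markup are hypothetical, and the sketch assumes double-quoted attribute values (as the pattern in the question does) and no ">" character inside a value.

```php
<?php
// Hypothetical helper: returns one attribute map per <meta> tag in $html.
function extract_meta_attributes(string $html): array
{
    $tags = [];

    // Step 1: capture the raw attribute string of every <meta ...> tag.
    preg_match_all('#<meta\b([^>]*)>#si', $html, $metaMatches);

    foreach ($metaMatches[1] as $attrString) {
        // Step 2: split the attribute string into name="value" pairs.
        // Assumption: values are always wrapped in double quotes.
        preg_match_all(
            '#([^\s=/"]+)\s*=\s*"([^"]*)"#s',
            $attrString,
            $pairs,
            PREG_SET_ORDER
        );

        $attrs = [];
        foreach ($pairs as $pair) {
            // Attribute names are case-insensitive in HTML, so normalise them.
            $attrs[strtolower($pair[1])] = $pair[2];
        }
        $tags[] = $attrs;
    }

    return $tags;
}

// Sample tags taken from the output shown earlier.
$html = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'
      . '<meta name="description" content="some description" />'
      . '<meta property="og:type" content="website" />';

$result = extract_meta_attributes($html);
print_r($result);
```

Each element of $result holds every attribute of one tag, whatever the attributes are called, so nothing depends on name=, property= or http-equiv= being present.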
