How to get the value of content from a meta tag?

Problem Description

I'm working on getting values from meta tags. So far I've had some success, but I'm stuck at a point where I get a meta tag like the one below:

<meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image">

With this I'm not able to extract the URL string that is in the content attribute of the meta tag.

What I have tried:

using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

Regex meta = new Regex(@"<meta\s*(?:(?:\b(\w|-)+\b\s*(?:=\s*(?:""[^""]*""|'" +
                       @"[^']*'|[^""'<> ]+)\s*)?)*)/?\s*>");

// Group values collected from every <meta> match.
List<string> metadata = new List<string>();

// url is the address of the page being scanned (defined elsewhere).
using (WebClient client = new WebClient())
{
    client.UseDefaultCredentials = true;

    // Add a user agent header in case the
    // requested URI contains a query.
    client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");

    // One DownloadString call replaces the second download through
    // OpenRead/StreamReader; the using block disposes the client.
    string s = client.DownloadString(url);

    foreach (Match m in meta.Matches(s))
    {
        // Group 0 is the whole <meta ...> tag; the pattern has one capture group.
        for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
        {
            metadata.Add(m.Groups[gIdx].Value);
        }
    }
}

Any solution?

Recommended Answers

Use a RegEx debugger to see where the match fails:
Debuggex: Online visual regex tester. JavaScript, Python, and PCRE.
Paste your RegEx.
Paste the data you want to match.
Use the cursor to see where the match fails.
When you have a working RegEx, use the Code Snippet button at the top.

You will see the problem is not what you think.
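One way to see what this hint points at: the sketch below (the class and method names are hypothetical, chosen for illustration) runs the question's pattern unchanged on the sample tag and collects group values the same way the question's loop does. The pattern does match the whole tag; what it never does is capture the attribute value on its own. Its only capture group, `(\w|-)`, sits inside repeated groups, and .NET keeps only the last capture in `Group.Value`, so group 1 ends up holding a single character.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OriginalPatternDemo
{
    // The pattern from the question, unchanged.
    private static readonly Regex Meta = new Regex(
        @"<meta\s*(?:(?:\b(\w|-)+\b\s*(?:=\s*(?:""[^""]*""|'" +
        @"[^']*'|[^""'<> ]+)\s*)?)*)/?\s*>");

    // Collects the value of every group of every match,
    // exactly as the question's nested loop does.
    public static List<string> GroupValues(string html)
    {
        var values = new List<string>();
        foreach (Match m in Meta.Matches(html))
        {
            for (int g = 0; g < m.Groups.Count; g++)
            {
                // For a repeated group, Value is only the LAST capture.
                values.Add(m.Groups[g].Value);
            }
        }
        return values;
    }
}
```

For the sample tag, `GroupValues` returns two strings: the full `<meta ...>` tag and `"p"`, the last letter of `itemprop`. The URL is consumed by a non-capturing group and never reported separately.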

perlre - perldoc.perl.org

[Update]
Note: There is more than one RegEx dialect. A JavaScript regex is not a C# regex; the differences are in the details.
Find out which dialect C# uses and what the differences are.
By the way, JavaScript and C# strings do not handle special characters the same way.
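A minimal sketch of a pattern that does capture the value, assuming the attribute is written as `content="..."` or `content='...'` inside a `<meta>` tag. It uses a named group, `(?<content>...)`, which the .NET dialect allows to be reused across alternatives. It also shows the C# side of the string-handling point: inside a verbatim string (`@"..."`) backslashes are literal and a double quote is written as `""`, whereas a JavaScript regex literal would contain a plain `"`. The class name is hypothetical, and a regex like this remains fragile on real-world HTML (for example, a `>` inside an attribute value).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MetaContentRegex
{
    // In a C# verbatim string, "" stands for one double-quote character.
    // \s before "content" requires whitespace, so data-content=... is skipped.
    // The named group "content" is reused: once for double-quoted values,
    // once for single-quoted values.
    private static readonly Regex MetaContent = new Regex(
        @"<meta\b[^>]*?\scontent\s*=\s*(?:""(?<content>[^""]*)""|'(?<content>[^']*)')[^>]*>",
        RegexOptions.IgnoreCase);

    // Returns the content attribute value of every <meta> tag that has one.
    public static List<string> ContentValues(string html)
    {
        var result = new List<string>();
        foreach (Match m in MetaContent.Matches(html))
        {
            result.Add(m.Groups["content"].Value);
        }
        return result;
    }
}
```

On the question's sample tag this returns a single string, `/images/branding/googleg/1x/googleg_standard_color_128dp.png`, regardless of whether `content` comes before or after `itemprop`.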

Instead of using Regex, you can use an HTML parser. I would recommend HTML Agility Pack, first of all: Html Agility Pack - Home.

-SA
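A minimal sketch of the parser approach, assuming the HtmlAgilityPack NuGet package is referenced in the project (the class and method names are hypothetical). The XPath `//meta[@content]` selects every `<meta>` element that has a `content` attribute; to get only the image, a filter such as `//meta[@itemprop='image']` would work the same way.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

public static class MetaReader
{
    // Returns the content attribute of every <meta> element that has one.
    public static List<string> ContentValues(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//meta[@content]");
        if (nodes == null)
        {
            return result;
        }

        foreach (HtmlNode node in nodes)
        {
            // Second argument is the default used if the attribute is missing.
            result.Add(node.GetAttributeValue("content", string.Empty));
        }
        return result;
    }
}
```

The parser handles attribute order, quoting style, and whitespace itself, which is the main reason to prefer it over a hand-written pattern.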