Extracting URLs in the code

Problem description

Hi,
I have the following code that reads a web page:

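using System.IO;
using System.Net;

// request is an existing WebRequest/HttpWebRequest for the page
using (WebResponse response = request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (StreamReader sr = new StreamReader(stream))
{
    // Read the whole response body into a string; the using blocks close everything
    htmlpage = sr.ReadToEnd();
}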

Once I get the web page, I am trying to extract my website's URLs to make sure they are being forwarded correctly.
My problem is that when I extract the URLs, some come out fine, while others have extra code in front of and at the end of the URL, for example:

(Javascrip:xyw('http://www.mysite.com/xyzpage.html')
I am trying to get rid of everything in front of and at the end of the URL, so that I end up with
http://www.mysite.com/xyzpage.html
I tried the following, which doesn't work at all:

// Returns an empty string for the sample above: \w+ cannot match ':', '/', '.' or quotes
string value = Regex.Match(str, @"\((\w+)\)").Groups[1].Value;

Any ideas on how to write that regex? I am not good at regular expressions.

Recommended answer

If you know that they must start with http://, why not make that part of your required match? Your current regex is very vague: it only looks for word characters between parentheses, and \w does not even match the ':', '/' and '.' characters that every URL contains.
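A minimal sketch of that suggestion in C#, assuming a URL ends at the first quote, whitespace, parenthesis or angle bracket (true for the sample above, but a URL that itself contains parentheses would be cut short). The pattern also accepts https:// as an assumption. The class and method names (UrlExtractor, ExtractFirst, ExtractAll) are hypothetical and not from the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlExtractor
{
    // Anchor the match on the scheme, then take characters up to the first one
    // assumed not to belong to the URL here: quote, whitespace, ( ) < or >.
    // In a verbatim string "" is a literal double quote.
    private static readonly Regex UrlPattern =
        new Regex(@"https?://[^'""\s()<>]+", RegexOptions.IgnoreCase);

    // Returns the first URL found in the input, or an empty string if none.
    public static string ExtractFirst(string input)
    {
        Match m = UrlPattern.Match(input);
        return m.Success ? m.Value : string.Empty;
    }

    // Returns every URL in the input, e.g. a whole downloaded HTML page.
    public static List<string> ExtractAll(string input)
    {
        var urls = new List<string>();
        foreach (Match m in UrlPattern.Matches(input))
        {
            urls.Add(m.Value);
        }
        return urls;
    }

    public static void Main()
    {
        string sample = "(Javascrip:xyw('http://www.mysite.com/xyzpage.html')";
        // Prints http://www.mysite.com/xyzpage.html
        Console.WriteLine(ExtractFirst(sample));
    }
}
```

To check only your own site's links, you could run ExtractAll over htmlpage, parse each result with new Uri(url) and compare its Host property to www.mysite.com.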