Extract URL From a String

Question

I have a URL:

url = "http://timesofindia.feedsportal.com/fy/8at2EuL0ihSIb3s7/story01.htmA"

There are some unwanted characters at the end, such as A or TRE. I want to remove them so that the URL looks like this:

url = "http://timesofindia.feedsportal.com/fy/8at2EuL0ihSIb3s7/story01.htm"

How can I remove them?

Recommended answer

If your URL always ends with .htm, .aspx, or .php, you can solve it with a simple regex:

url = url[/^(.+\.(htm|aspx|php))(:?.*)$/, 1]

You can test it on Rubular.

First I use String#[] with a regex and a capture group index to get a substring; it works like slice. Then comes the regex, read from left to right:

^                   # Start of line
  (                   # Group 1: the part of the URL we want to keep
    .+                  # 1 or more of any character
    \.                  # Followed by a literal dot
    (htm|aspx|php)      # htm or aspx or php
  )                   # End of group 1
  (                   # Group 3: the unwanted trailing part
    :?                  # An optional literal colon
    .*                  # 0 or more of any character
  )                   # End of group 3
$                   # End of line
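
To see the pieces working together, here is a minimal runnable sketch. The helper name strip_trailing_junk, the constant name, and the example.com URLs are made up for illustration; only the regex and the first URL come from the text above.

```ruby
# Hypothetical constant holding the regex from the answer, unchanged.
URL_WITH_KNOWN_EXTENSION = /^(.+\.(htm|aspx|php))(:?.*)$/

# Hypothetical helper: returns the URL up to and including the extension.
def strip_trailing_junk(url)
  # String#[] with a regex and a group index returns that capture group,
  # or nil when the pattern does not match at all.
  url[URL_WITH_KNOWN_EXTENSION, 1]
end

url = "http://timesofindia.feedsportal.com/fy/8at2EuL0ihSIb3s7/story01.htmA"
puts strip_trailing_junk(url)
# => http://timesofindia.feedsportal.com/fy/8at2EuL0ihSIb3s7/story01.htm

# A string with none of the three extensions does not match, so this prints nil.
p strip_trailing_junk("http://example.com/feed.xml")
```

Two things are worth keeping in mind with this approach. Because the result is nil when nothing matches, assigning it straight back to url would discard a URL that has a different extension. And since htm also matches the first three letters of html, a URL ending in .html would be cut down to .htm; adding html before htm in the alternation is one way to handle that case if it can occur in your data.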
