Get all URLs in a string with PHP

Problem description

I'm trying to figure out a way to get an array of URLs from a string of text. The text will be somewhat formatted like this:

Here is some random text

http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphones-bezel-a-massive-notification-light/?grcc=88888Z0ZwdgtZ0Z0Z0Z0Z0&grcc2=835637c33f965e6cdd34c87219233711~1342828462249~fca4fa8af1286d8a77f26033fdeed202~510f37324b14c50a5e9121f955fac3fa~1342747216490~0~0~0~0~0~0~0~0~7~3~

http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tickets-for-disrupt-sf/

Obviously, those links can be anything (and there can be many of them; these are just the ones I'm testing with now). If I use a simple URL, my regex works fine.

I'm using:

preg_match_all('((https?|ftp|gopher|telnet|file|notes|ms-help):'.
    '((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)',
    $bodyMessage, $matches, PREG_PATTERN_ORDER);

When I do a print_r( $matches); the result I get is:

Array ( [0] => Array (
    [0] => http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphon=
    [1] => http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick= 
    [2] => http://techcrunch.co=
    [3] => http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-ip= 
    [4] => http://techcrunch.com/2012/07/20/last-day-to-purc=
    [5] => http://tec=
)
...

None of the items in that array is a full link from the text above.

Anyone know of a good way to get what I need? I've found a bunch of regex stuff to get links for PHP, but none of it works.

Thanks!

OK, so I'm pulling these links from an e-mail. The script parses the e-mail, grabs the body of the message, and then tries to grab the links from that. After investigating the e-mail, it appears that something is adding a space in the middle of each URL. Here is the message body as seen by my PHP script.

 --00248c711bb99ca36d04c54ba5c6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphon= es-bezel-a-massive-notification-light/?grcc=3D88888Z0ZwdgtZ0Z0Z0Z0Z0&grcc2= =3D835637c33f965e6cdd34c87219233711~1342828462249~fca4fa8af1286d8a77f26033f= deed202~510f37324b14c50a5e9121f955fac3fa~1342747216490~0~0~0~0~0~0~0~0~7~3~ http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick= ets-for-disrupt-sf/ --00248c711bb99ca36d04c54ba5c6 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable 

Any suggestions on how to make it not break the URLs?

Edit 2

As per Laurnet's suggestion, I ran this code:

 $bodyMessage = str_replace("= ", "",$bodyMessage);

However, when I echo that out, it doesn't seem to want to replace "= ":

 --00248c711bb99ca36d04c54ba5c6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphon= es-bezel-a-massive-notification-light/?grcc=3D88888Z0ZwdgtZ0Z0Z0Z0Z0&grcc2= =3D835637c33f965e6cdd34c87219233711~1342828462249~fca4fa8af1286d8a77f26033f= deed202~510f37324b14c50a5e9121f955fac3fa~1342747216490~0~0~0~0~0~0~0~0~7~3~ http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick= ets-for-disrupt-sf/ --00248c711bb99ca36d04c54ba5c6 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable 
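
One possible explanation, offered as an assumption rather than something confirmed in the question: the body is marked quoted-printable, and in that encoding a line ending in `=` is a soft line break. The character after each trailing `=` is then probably a line break (`\r\n` or `\n`) that only looks like a space when echoed in a browser, so `str_replace("= ", ...)` never finds a match. The sketch below uses a short, made-up sample body (not the real message) to show the difference.

```php
<?php
// Hypothetical sample: a quoted-printable body with a soft line break
// ("=" followed by CRLF) in the middle of a URL. A real message may use
// "\n" instead of "\r\n".
$bodyMessage = "http://example.com/a-long-pa=\r\nth/?x=3D1\r\n";

// Searches for "=" followed by a literal space, which is absent here,
// so the string comes back unchanged.
$unchanged = str_replace("= ", "", $bodyMessage);
var_dump($unchanged === $bodyMessage); // bool(true)

// Removing "=" followed by a line break rejoins the URL, but leaves
// escapes such as "=3D" (an encoded "=") untouched.
$joined = preg_replace('/=\r?\n/', '', $bodyMessage);
echo $joined;
```

Stripping the soft line breaks by hand still leaves `=3D` in the query string, which is one reason a full quoted-printable decode may be the better route.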

Recommended answer

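    /**
     * Get all http(s) URLs from a string (the string may itself be a URL).
     *
     * @param string $string
     * @return array
     */
    function getUrls($string) {
        // Match http:// or https:// up to the next whitespace or double quote.
        // \s is used instead of a literal space so that a match cannot run
        // across a line break into the following text.
        $regex = '/https?:\/\/[^\s"]+/i';
        preg_match_all($regex, $string, $matches);
        return $matches[0];
    }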
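
Because the message body in the question is marked `Content-Transfer-Encoding: quoted-printable`, decoding it before matching may be what keeps the URLs whole. Here is a minimal sketch, assuming each trailing `=` in the body is followed by a real line break. The function name `getUrlsFromQuotedPrintable` and the sample body are made up for illustration; `quoted_printable_decode()` is PHP's built-in decoder.

```php
<?php
/**
 * Decode a quoted-printable body, then collect the http(s) URLs in it.
 * Hypothetical helper, not part of the original answer.
 */
function getUrlsFromQuotedPrintable(string $body): array
{
    // Removes soft line breaks ("=" at end of line) and turns "=3D" into "=".
    $decoded = quoted_printable_decode($body);

    // Match each URL up to the next whitespace or double quote.
    preg_match_all('#https?://[^\s"]+#i', $decoded, $matches);

    return $matches[0];
}

// Made-up sample body: the first URL is split by a soft line break,
// the second contains an encoded "=" (written as "=3D").
$body = "http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick=\r\n"
      . "ets-for-disrupt-sf/\r\n"
      . "http://example.com/?grcc=3D88888\r\n";

print_r(getUrlsFromQuotedPrintable($body));
```

With a multipart e-mail like the one in the question, this would be applied to each part's body after the MIME headers have been split off, since only the body is encoded.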
