How do I find all YouTube video ids in a string using a regex?

Problem description

I have a textfield where users can write anything.

For example:

Lorem Ipsum is simply dummy text. http://www.youtube.com/watch?v=DUQi_R4SgWo of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. http://www.youtube.com/watch?v=A_6gNZCkajU&feature=relmfu It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.

Now I would like to parse it and find all YouTube video URLs and their ids.

Any ideas?

Recommended answer

A YouTube video URL may be encountered in a variety of formats:

  • latest short format: http://youtu.be/NLqAF9hrVbY
  • iframe: http://www.youtube.com/embed/NLqAF9hrVbY
  • iframe (secure): https://www.youtube.com/embed/NLqAF9hrVbY
  • object param: http://www.youtube.com/v/NLqAF9hrVbY?fs=1&hl=en_US
  • object embed: http://www.youtube.com/v/NLqAF9hrVbY?fs=1&hl=en_US
  • watch: http://www.youtube.com/watch?v=NLqAF9hrVbY
  • users: http://www.youtube.com/user/Scobleizer#p/u/1/1p3vcRhsYGo
  • ytscreeningroom: http://www.youtube.com/ytscreeningroom?v=NRHVzbJVx8I
  • any/thing/goes!: http://www.youtube.com/sandalsResorts#p/c/54B8C800269D7C1B/2/PPS-8DMrAn4
  • any/subdomain/too: http://gdata.youtube.com/feeds/api/videos/NLqAF9hrVbY
  • more params: http://www.youtube.com/watch?v=spDj54kf-vY&feature=g-vrec
  • query may have dot: http://www.youtube.com/watch?v=spDj54kf-vY&feature=youtu.be
  • nocookie domain: http://www.youtube-nocookie.com

Here is a PHP function with a commented regex that matches each of these URL forms and converts them to links (if they are not links already):

// Linkify youtube URLs which are not already links.
function linkifyYouTubeURLs($text) {
    $text = preg_replace('~(?#!js YouTubeId Rev:20160125_1800)
        # Match non-linked youtube URL in the wild. (Rev:20130823)
        https?://          # Required scheme. Either http or https.
        (?:[0-9A-Z-]+\.)?  # Optional subdomain.
        (?:                # Group host alternatives.
          youtu\.be/       # Either youtu.be,
        | youtube          # or youtube.com or
          (?:-nocookie)?   # youtube-nocookie.com
          \.com            # followed by
          \S*?             # Allow anything up to VIDEO_ID,
          [^\w\s-]         # but char before ID is non-ID char.
        )                  # End host alternatives.
        ([\w-]{11})        # $1: VIDEO_ID is exactly 11 chars.
        (?=[^\w-]|$)       # Assert next char is non-ID or EOS.
        (?!                # Assert URL is not pre-linked.
          [?=&+%\w.-]*     # Allow URL (query) remainder.
          (?:              # Group pre-linked alternatives.
            [\'"][^<>]*>   # Either inside a start tag,
          | </a>           # or inside <a> element text contents.
          )                # End recognized pre-linked alts.
        )                  # End negative lookahead assertion.
        [?=&+%\w.-]*       # Consume any URL (query) remainder.
        ~ix', '<a href="http://www.youtube.com/watch?v=$1">YouTube link: $1</a>',
        $text);
    return $text;
}

And here is a JavaScript version with the exact same regex (with comments removed):

// Linkify youtube URLs which are not already links.
function linkifyYouTubeURLs(text) {
    var re = /https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|<\/a>))[?=&+%\w.-]*/ig;
    return text.replace(re,
        '<a href="http://www.youtube.com/watch?v=$1">YouTube link: $1</a>');
}

Notes:

  • The VIDEO_ID portion of the URL is captured in the one and only capture group: $1.
  • If you know that your text does not contain any pre-linked URLs, you can safely remove the negative lookahead assertion that tests for this condition (the assertion beginning with the comment "Assert URL is not pre-linked."). This will speed up the regex somewhat.
  • The replace string can be modified to suit. The one provided above simply creates a link to the generic "http://www.youtube.com/watch?v=VIDEO_ID" style URL and sets the link text to: "YouTube link: VIDEO_ID".
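
The question also asks for the ids themselves, not only for links. As a minimal sketch, the same regex can collect every captured $1 value into an array. The helper name extractYouTubeIds and the constant YOUTUBE_URL_RE are hypothetical and not part of the original answer, and String.prototype.matchAll assumes a reasonably recent JavaScript engine (ES2020 or later).

```javascript
// Same pattern as the JavaScript version above. matchAll requires the "g" flag.
const YOUTUBE_URL_RE = /https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|<\/a>))[?=&+%\w.-]*/ig;

// Hypothetical helper: returns the VIDEO_ID (capture group 1) of every
// non-linked YouTube URL found in the text, in order of appearance.
function extractYouTubeIds(text) {
    return Array.from(text.matchAll(YOUTUBE_URL_RE), (match) => match[1]);
}

// Shortened version of the sample text from the question.
const sample = 'Lorem Ipsum is simply dummy text. ' +
    'http://www.youtube.com/watch?v=DUQi_R4SgWo of the printing industry. ' +
    'http://www.youtube.com/watch?v=A_6gNZCkajU&feature=relmfu It was popularised...';

console.log(extractYouTubeIds(sample));
```

Because the pre-linked lookahead is still in place, URLs that already sit inside an href attribute or inside <a> element text are skipped. If those ids are wanted too, the lookahead can be dropped as described in the notes.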

Edit 2011-07-05: Added - hyphen to ID char class

Edit 2011-07-17: Fixed regex to consume any remaining part (e.g. query) of URL following YouTube ID. Added 'i' ignore-case modifier. Renamed function to camelCase. Improved pre-linked lookahead test.

Edit 2011-07-27: Added new "user" and "ytscreeningroom" formats of YouTube URLs.

Edit 2011-08-02: Simplified/generalized to handle new "any/thing/goes" YouTube URLs.

Edit 2011-08-25: Made several modifications:

  • Added a JavaScript version of the linkifyYouTubeURLs() function.
  • Previous version had the scheme (HTTP protocol) part optional and thus would match invalid URLs. Made the scheme part required.
  • Previous version used the \b word boundary anchor around the VIDEO_ID. However, this will not work if the VIDEO_ID begins or ends with a - dash. Fixed so that it handles this condition.
  • Changed the VIDEO_ID expression so that it must be exactly 11 characters long.
  • The previous version failed to exclude pre-linked URLs if they had a query string following the VIDEO_ID. Improved the negative lookahead assertion to fix this.
  • Added + and % to character class matching query string.
  • Changed the PHP version's regex delimiter from % to ~.
  • Added a "Notes" section with some handy notes.
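
The word-boundary change in the list above can be illustrated with a small sketch. The video id below is hypothetical and chosen only because it starts with a dash, and the two patterns are reduced to the parts that surround the id rather than being the full regex from the answer.

```javascript
// Hypothetical URL whose id starts with a dash, used only for illustration.
const url = 'http://www.youtube.com/watch?v=-8DMrAn4PPS';

// Earlier approach: \b anchors around the id. There is no word boundary
// between "=" and "-" (both are non-word characters), so this cannot match.
const withWordBoundary = /\b([\w-]{11})\b/;

// Current approach: require a non-id character before the id and assert
// that the next character is a non-id character or the end of the string.
const withExplicitChecks = /[^\w\s-]([\w-]{11})(?=[^\w-]|$)/;

console.log(withWordBoundary.test(url));        // false
console.log(url.match(withExplicitChecks)[1]);  // "-8DMrAn4PPS"
```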

Edit 2011-10-12: YouTube URL host part may now have any subdomain (not just www.).

Edit 2012-05-01: The consume URL section may now allow for '-'.

Edit 2013-08-23: Added additional format provided by @Mei. (The query part may have a . dot.)

Edit 2013-11-30: Added additional format provided by @CRONUS: youtube-nocookie.com.

Edit 2016-01-25: Fixed regex to handle error case provided by CRONUS.
