What characters are allowed in Twitter hashtags?

Question

In developing an iOS app that contains a Twitter client, I must allow for user-generated hashtags (which may be created elsewhere within the app, not just in the tweet body).

I would like to ensure that any such hashtags are valid for Twitter, so I would like to check the entered value for invalid characters. Bear in mind that users may be from non-English-speaking countries.

I am aware of the usual limitations, such as not beginning a hashtag with a number and not using special punctuation characters, but I was wondering whether there is a known list of all the additional characters (i.e. international characters) that are technically allowed within hashtags.

Recommended Answer

Karl, as you've rightly pointed out, any word in any language can be a valid Twitter hashtag (as long as it meets a few basic criteria). As such, what you are asking for is a list of valid international word characters. I'm sure someone has compiled such a list somewhere, but using it would not be the most efficient way to reach what appears to be your actual goal: ensuring that a given hashtag is valid for Twitter.

I believe what you are looking for is a regular expression that matches all word characters across the Unicode range. Such an expression would not depend on your locale and would match every character in modern typography that can appear as part of a word.

You didn't specify what language you are writing your app in, so I can't help you with a language specific implementation. However, the basic approach would be as follows:

  1. Check whether any of the bracket expressions or character classes in your language already support Unicode character ranges. If so, use them.

  2. Check whether there is a regex modifier that enables Unicode character range support in your language.
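One way to carry out both checks is to probe the regex engine directly. The following Ruby sketch feeds a handful of non-ASCII letters to a pattern and reports whether all of them match. The helper name `unicode_aware?` and the sample letters are illustrative assumptions, not part of any library, and passing this probe only suggests Unicode support; it does not prove full coverage.

```ruby
# Hypothetical probe: sample letters from several scripts (assumed, not exhaustive).
SAMPLE_LETTERS = [
  "\u00E9", # é  LATIN SMALL LETTER E WITH ACUTE
  "\u00DF", # ß  LATIN SMALL LETTER SHARP S
  "\u0436", # ж  CYRILLIC SMALL LETTER ZHE
  "\u65E5"  # 日 CJK ideograph
].freeze

# Returns true only if the pattern accepts every sample letter.
def unicode_aware?(pattern)
  SAMPLE_LETTERS.all? { |ch| ch.match?(pattern) }
end

ascii_range = /\A[a-zA-Z]\z/      # explicit ASCII range
posix_alpha = /\A[[:alpha:]]\z/   # POSIX bracket expression

puts unicode_aware?(ascii_range)  # prints false
puts unicode_aware?(posix_alpha)  # prints true
```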

Most modern languages implement regular expressions in a fairly similar way, and many of them borrow heavily from Perl, so I hope the following two examples will put you on the right track:

Perl:

Use POSIX bracket expressions (e.g. [[:alpha:]], [[:alnum:]], [[:digit:]]), as they give you finer control over the characters you want to match than character classes such as \w do.

Use the /u modifier to enable Unicode semantics when pattern matching. Under this modifier, the ASCII platform effectively becomes a Unicode platform; hence, for example, \w will match any of the more than 100,000 word characters in Unicode.

See the Perl documentation for more info:

Ruby:

Use POSIX bracket expressions, as they encompass non-ASCII characters. For instance, /\d/ matches only the ASCII decimal digits (0-9), whereas /[[:digit:]]/ matches any character in the Unicode Nd category.
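A quick way to see this difference is to test a non-ASCII digit. This minimal Ruby sketch uses U+0663 (ARABIC-INDIC DIGIT THREE), which belongs to the Nd category; the choice of test character is simply an assumption made for illustration.

```ruby
# U+0663 ARABIC-INDIC DIGIT THREE is a decimal digit (Unicode category Nd).
arabic_three = "\u0663"

puts arabic_three.match?(/\d/)          # prints false: \d covers only ASCII 0-9
puts arabic_three.match?(/[[:digit:]]/) # prints true: the POSIX class covers Nd
puts "7".match?(/[[:digit:]]/)          # prints true: ASCII digits are Nd as well
```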

See the Ruby documentation for more info:

Example:

Given a list of hashtags, the following regexes match every hashtag that starts with a letter (including international letters) followed by one or more letters, digits, or underscores:

    m/\A#[[:alpha:]][[:alnum:]_]+\z/u     # Perl (\A and \z anchor the whole string)

    /\A#[[:alpha:]][[:alnum:]_]+\z/       # Ruby (^ and $ would match at line breaks)
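To apply the pattern to a list of hashtags, it can be wrapped in a small predicate. The following Ruby sketch assumes a hypothetical helper named `valid_hashtag?` and anchors with `\A`/`\z` rather than `^`/`$`, because in Ruby `^` and `$` match at line boundaries, so a multi-line string could otherwise pass. It encodes this answer's rule of thumb, not Twitter's own parser, whose accepted characters may differ.

```ruby
# Hypothetical validator built from the example pattern above.
# \A and \z anchor the entire string, so embedded newlines cannot sneak through.
HASHTAG_PATTERN = /\A#[[:alpha:]][[:alnum:]_]+\z/

def valid_hashtag?(tag)
  tag.match?(HASHTAG_PATTERN)
end

candidates = [
  "#caf\u00E9",           # café
  "#\u65E5\u672C\u8A9E",  # 日本語
  "#\u00FCber_cool",      # über_cool
  "#1abc",                # starts with a digit
  "#a",                   # only one character after '#'
  "#no spaces",           # contains a space
  "#ok\n#bad!"            # two lines, second one has punctuation
]

valid, invalid = candidates.partition { |tag| valid_hashtag?(tag) }
puts "valid:   #{valid.inspect}"
puts "invalid: #{invalid.inspect}"
```

With these candidates, `#café`, `#日本語` and `#über_cool` should land in `valid`, while `#1abc`, `#a`, `#no spaces` and the two-line string land in `invalid`.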
