Who can crack this twitter regexp?


Problem description


I would like to grab all the hashtags, using PHP, from http://search.twitter.com/search.atom?q=%23eu-jele%C4%A1%C4%A1i

The hashtags are in the content, i.e. the title nodes within the RSS feed. They are prefixed with #.

The problem I am having is with non-English letters (outside of the range a-zA-Z).

If you look at the RSS feed and then view its HTML source, my struggle might be clearer.

    <title>And more: #eu-jele&#289;&#289;i #eu-kiest #ue-wybiera #eu-eleger #ue-alege #eu-vyvolenej #eu-izvoli #eu-elegir #eu-v&#228;lja #eu-elect</title>

Do I need to do something with the title node before I look for my regexp matches?

My ultimate aim is to replace each hashtag with a link to its Twitter search URL, e.g. http://search.twitter.com/search.atom?q=%23eu-jele%C4%A1%C4%A1i

Here is some sample code to help you along.


    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    </head>

    <body>
    <?php
    $title="And more: #eu-jele&#289;&#289;i #eu-kiest #ue-wybiera #eu-eleger #ue-alege #eu-vyvolenej #eu-izvoli #eu-elegir #eu-v&#228;lja #eu-elect";

    // this is the regexp that hashtags.org use (http://twitter.pbwiki.com/Hashtags)
    // the backreference is written as \\1: inside double quotes PHP reads "\1" as an octal escape
    $r = preg_replace("/(?:(?:^#|[\s\(\[]#(?!\d\s))(\w+(?:[_\-\.\+\/]\w+)*)+)/"," <a href=\"http://search.twitter.com/search?q=%23\\1\">\\1</a> ", $title);
    echo "<p>$r</p>";

    $r = preg_replace("/(#.+?)(?:(\s|$))/"," <a href=\"http://search.twitter.com/search?q=\\1\">\\1</a> ", $title);
    echo "<p>$r</p>";

    // This is my desired end result
    echo "<p><a href=\"http://search.twitter.com/search?q=%23eu-jeleġġi\">#eu-jeleġġi</a></p>";
    ?>

    </body>
    </html>

Any advice or solution would be greatly appreciated.

Solution

Or just

(#\S+)
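
To make that suggestion concrete, here is a minimal sketch of how `(#\S+)` might be applied to the title from the question. It assumes the numeric entities (`&#289;` and so on) are decoded to UTF-8 before matching, which is one possible answer to the "do I need to do something with the title node" part. The function name `linkifyHashtags` is hypothetical and not from the original post.

```php
<?php
// Hypothetical helper for illustration; not part of the original answer.
function linkifyHashtags(string $title): string
{
    // Decode entities such as &#289; into real UTF-8 characters first,
    // so a hashtag is not cut off at the "&".
    $decoded = html_entity_decode($title, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // (#\S+) from the answer: "#" followed by a run of non-whitespace.
    // The /u modifier makes PCRE treat the subject as UTF-8.
    $linked = preg_replace_callback(
        '/(#\S+)/u',
        function (array $m): string {
            // rawurlencode() turns "#" into %23 and "ġ" into %C4%A1.
            $url  = 'http://search.twitter.com/search?q=' . rawurlencode($m[1]);
            $text = htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8');
            return '<a href="' . $url . '">' . $text . '</a>';
        },
        $decoded
    );

    // preg_replace_callback() returns null on error (e.g. invalid UTF-8).
    return $linked ?? $decoded;
}

echo linkifyHashtags('And more: #eu-jele&#289;&#289;i #eu-elect'), "\n";
```

Two caveats with this sketch: `\S+` also swallows trailing punctuation (a tag written as `#eu-elect,` would keep the comma), and the text outside the hashtags is left decoded, so it may need `htmlspecialchars()` of its own before being echoed into a page.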
