Matching degree-based geographical coordinates with a regular expression
Question
I'd like to be able to identify patterns of the form
28°44'30"N., 33°12'36"E.
Here's what I have so far:
use utf8;
qr{
(?:
\d{1,3} \s* ° \s*
\d{1,2} \s* ' \s*
\d{1,2} \s* " \s*
[ENSW] \s* \.?
\s* ,? \s*
){2}
}x;
Needless to say, this doesn't match. Does it have anything to do with the extended characters (namely the degree symbol)? Or am I just screwing this up big time?
I'd also appreciate directions to CPAN, if you know of something there that will solve my problem. I've looked at Regex::Common and Geo::Formatter, but neither of these does what I want. Any ideas?
Update
It turns out that I needed to take out use utf8 when reading the coordinates from a file. If I manually initialized a variable with a coordinate, it would match fine, but as soon as I read that same line from a file, it wouldn't match. Taking out use utf8 solved that. I guess I don't really understand what utf8 is doing.
Recommended answer
Try removing the use utf8 statement.
The degree symbol corresponds to character value 0xB0 in my current encoding (whatever that is, but it ain't UTF-8). In UTF-8, 0xB0 is a "continuation byte": it is expected to be the second, third, or fourth byte of a sequence that begins with a byte between 0xC2 and 0xF4. Using that string with utf8 will give you an error.
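An alternative to removing use utf8 is to keep it and decode the input instead, so that the pattern and the data are both character strings. The sketch below is only an illustration and assumes that both the source file and the input data are UTF-8, which the question does not confirm. It reuses the pattern from the question and simulates an undecoded file line with Encode's encode; the variable names are made up for the example.

```perl
use strict;
use warnings;
use utf8;                       # assumption: this source file is saved as UTF-8
use Encode qw(decode encode);

# The pattern from the question, unchanged.
my $coord_re = qr{
    (?:
        \d{1,3} \s* ° \s*
        \d{1,2} \s* ' \s*
        \d{1,2} \s* " \s*
        [ENSW] \s* \.?
        \s* ,? \s*
    ){2}
}x;

# Simulate a line read from a UTF-8 file without an encoding layer:
# the degree sign arrives as the two bytes 0xC2 0xB0.
my $raw = encode('UTF-8', qq{28°44'30"N., 33°12'36"E.});

# Decoding turns those two bytes back into the single character U+00B0.
my $decoded = decode('UTF-8', $raw);

my $raw_matches     = $raw     =~ $coord_re ? 1 : 0;
my $decoded_matches = $decoded =~ $coord_re ? 1 : 0;

print "raw bytes: $raw_matches, decoded: $decoded_matches\n";
```

Under those assumptions this prints "raw bytes: 0, decoded: 1". When reading from a real file, opening it with an encoding layer, for example open my $fh, '<:encoding(UTF-8)', $path (with $path standing in for your file name), should give already-decoded lines and make the explicit decode call unnecessary.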