Find and replace long words in an NSString?

Question

I'm trying to write a method that will search an NSString, determine if an individual word within the string is over 6 characters long and replace that word with some other word (something arbitrary like 'hello').

I am starting with a long paragraph and I need to end up with a single NSString object whose format and spacing has not been affected by the find and replace.

Recommended answer

Why another answer?

There are a couple of subtle problems with the simple solutions using componentsSeparatedByString::

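  1. Punctuation is not handled as a word delimiter.
  2. Whitespace other than the plain space character (newlines, tabs) is removed.
  3. On long strings, a lot of memory is wasted.
  4. It's slow.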
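
To make these problems concrete, here is a minimal sketch of what such a simple solution might look like. The function name NaiveReplaceLongWords and its parameters are hypothetical; they are not taken from the answers being criticized, which may differ in detail.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical illustration of the naive approach; not code from the original answers.
static NSString *NaiveReplaceLongWords(NSString *text, NSUInteger maxChar, NSString *replaceWord) {
    // Splits on the plain space character only. Tabs and newlines stay glued
    // to the neighboring words, and the whole array is held in memory at once.
    NSArray<NSString *> *tokens = [text componentsSeparatedByString:@" "];
    NSMutableArray<NSString *> *output = [NSMutableArray arrayWithCapacity:tokens.count];
    for (NSString *token in tokens) {
        // Punctuation (and any embedded newline) counts toward the length and
        // is thrown away together with the word when the token is replaced.
        [output addObject:(token.length > maxChar ? replaceWord : token)];
    }
    return [output componentsJoinedByString:@" "];
}
```

Calling it with a limit of 6 on text that contains quotes, commas, and a line break loses all of them wherever a long word is replaced.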

Example

Assuming a substitution word of "–", a string like ...

"Essentially," the D.H.C. concluded,
"bokanovskification consists of a series of arrests of development."

... would result in ...

– the D.H.C. – – of a series of – of –

... while the correct output would be:

"–," the D.H.C. –,
"– – of a series of – of –."

Solution

Fortunately, there's a better yet simpler solution in Cocoa: -[NSString enumerateSubstringsInRange:options:usingBlock:]

It provides fast iteration over substrings defined by the options argument. One of the options is NSStringEnumerationByWords, which enumerates all substrings that are actual words (in the current locale). It even detects individual words in languages that don't use delimiters (spaces) to separate words, like Japanese.
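
As a minimal sketch of how the enumeration behaves, the following collects every word the method reports. The helper name WordsInString is hypothetical and only serves this illustration.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the substrings reported by NSStringEnumerationByWords.
static NSArray<NSString *> *WordsInString(NSString *text) {
    NSMutableArray<NSString *> *words = [NSMutableArray array];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        // substringRange covers the word only; enclosingRange also covers
        // the surrounding punctuation and whitespace.
        [words addObject:substring];
    }];
    return words;
}
```

Punctuation and whitespace never show up as words, so something like WordsInString(@"Hello, world!\nGoodbye.") should yield only the three words.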

Here's a simple demo project that works on the jargon file (1.6 MB, 237,239 words). It compares three different solutions:

  1. componentsSeparatedByString: 270 ms
  2. enumerateSubstringsInRange: 125 ms
  3. stringByReplacingOccurrencesOfString, as described by @Monolo: 200 ms

Implementation

The core of it is the replacement loop:

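// maxChar (NSUInteger) and replaceWord (NSString *) are assumed to be defined by the caller.
NSMutableString *result = [NSMutableString stringWithCapacity:[originalString length]];
// Index just past the last long word that has been replaced so far.
__block NSUInteger location = 0;
[originalString enumerateSubstringsInRange:(NSRange){0, [originalString length]}
                                   options:NSStringEnumerationByWords | NSStringEnumerationLocalized | NSStringEnumerationSubstringNotRequired
                                usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
                                    // substring is nil because of NSStringEnumerationSubstringNotRequired; only the ranges are used.
                                    if (substringRange.length > maxChar) {
                                        // Copy everything since the previous long word (short words, punctuation, whitespace) unchanged.
                                        NSString *charactersBetweenLongWords = [originalString substringWithRange:(NSRange){ location, substringRange.location - location }];
                                        [result appendString:charactersBetweenLongWords];
                                        [result appendString:replaceWord];
                                        location = substringRange.location + substringRange.length;
                                    }
                                }];
// Copy whatever follows the last long word.
[result appendString:[originalString substringFromIndex:location]];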
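
For convenience, the loop can be wrapped in a function so it can be called directly. This is only a sketch: the name ReplaceLongWords and the parameter order are my own choices, not something the demo project necessarily uses.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical wrapper around the replacement loop; the name and signature are assumptions.
static NSString *ReplaceLongWords(NSString *originalString, NSUInteger maxChar, NSString *replaceWord) {
    NSMutableString *result = [NSMutableString stringWithCapacity:originalString.length];
    __block NSUInteger location = 0; // index just past the last replaced word
    NSStringEnumerationOptions options = NSStringEnumerationByWords
                                       | NSStringEnumerationLocalized
                                       | NSStringEnumerationSubstringNotRequired;
    [originalString enumerateSubstringsInRange:NSMakeRange(0, originalString.length)
                                       options:options
                                    usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        if (substringRange.length > maxChar) {
            // Keep the text between the previous long word and this one as it is.
            NSRange gap = NSMakeRange(location, substringRange.location - location);
            [result appendString:[originalString substringWithRange:gap]];
            [result appendString:replaceWord];
            location = NSMaxRange(substringRange);
        }
    }];
    // Keep the tail after the last long word.
    [result appendString:[originalString substringFromIndex:location]];
    return result;
}
```

With a limit of 6 and "–" as the replacement, the second line of the example paragraph keeps its quotes, spaces, and final period while the four long words are replaced.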

Caveat

As pointed out by Monolo, the proposed code uses NSString's length to determine the number of characters in a word. That's a questionable approach, to say the least. In fact, a string's length specifies the number of UTF-16 code units used to encode the string, a value that often differs from what a human would consider the number of characters.

As the term "character" has different meanings in various contexts, and the OP didn't specify which kind of character count to use, I left the code as it was. If you want a different count, please refer to the documentation that discusses the topic:

  • Apple's String Programming Guide, Characters and Grapheme Clusters
  • Unicode FAQ: How are characters counted when measuring the length or position of a character in a string?
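
If the count you want is user-perceived characters, one possible approach is to count composed character sequences instead of relying on length. The sketch below is an assumption about what the OP might need, and ComposedCharacterCount is a hypothetical helper name.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: counts composed character sequences rather than UTF-16 code units.
static NSUInteger ComposedCharacterCount(NSString *text) {
    __block NSUInteger count = 0;
    NSStringEnumerationOptions options = NSStringEnumerationByComposedCharacterSequences
                                       | NSStringEnumerationSubstringNotRequired;
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:options
                          usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        count++; // one callback per composed character sequence
    }];
    return count;
}
```

For "cafe" followed by a combining acute accent (U+0301), length reports 5 while this helper reports 4. Using it in the replacement loop would mean comparing ComposedCharacterCount([originalString substringWithRange:substringRange]) against maxChar instead of substringRange.length, at the cost of an extra substring per word.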
