How would you scan an array of strings for a set of substrings in Objective-C?

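Problem description

So I basically have an array of words and phrases. Some of them contain curses. I want to create a method that automatically scans each element of the array for curses. If an element doesn't contain a curse, it is added to a new array.

I realize I can do this with a bunch of if/else if statements and rangeOfString methods, but I am appalled that I have not been able to find a method of NSString that will search for a bunch of words at the same time.

Is there something I might have overlooked that could be used to scan a single string for an array of substrings?

For example, if I have an array of phrases like: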

@[@"hey how are you",
  @"what is going on?",
  @"whats up dude?",
  @"do you want to get chipotle?"]

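I want to be able to scan it and derive a new array containing only the phrases that include none of the words from the following array: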

@[@"you", @"hey"]

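Solution

As you state, you are:

appalled that I have not been able to find a method of NSString that will search for a bunch of words at the same time

Though this seems a strange reaction (programming is about building solutions, after all), here is a solution that searches for all the words at the same time using a single method, albeit one belonging to NSRegularExpression rather than NSString.

Our sample data: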

NSArray *sampleLines = @[@"Hey how are you",
                         @"What is going on?",
                         @"What’s up dude?",
                         @"Do you want to get chipotle?",
                         @"They are the youth"
                         ];
NSArray *stopWords = @[@"you", @"hey"];

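The last sample line checks that we don't match partial words. Capitalisation has been added to test case-insensitive matching.

We construct an RE to match any of the stop words:

  • \b - a word boundary; in this example the options are set to use Unicode word boundaries
  • (?: ... ) - a non-capturing group, used because it is slightly faster than a capturing one and the group would be the same as the whole match anyway
  • | - or (alternation)

Pattern for the example stop words: \b(?:you|hey)\b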

// don't forget to use \\ in a string literal to insert a backslash into the pattern
NSString *pattern = [NSString stringWithFormat:@"\\b(?:%@)\\b", [stopWords componentsJoinedByString:@"|"]];
NSError *error = nil;
NSRegularExpression *stopRE = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                        options:(NSRegularExpressionCaseInsensitive | NSRegularExpressionUseUnicodeWordBoundaries)
                                                                          error:&error];
// always check for failure: test the return value, as the error is only meaningful when construction fails
if (stopRE == nil)
{
    NSLog(@"RE construction failed: %@", error);
    return;
}
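The pattern above interpolates the stop words directly, which works for plain words such as "you" and "hey". If a stop word could contain regex metacharacters (for example "." or "+"), one option is to escape each word first. The following is a minimal sketch under that assumption; the helper name StopWordPattern is hypothetical and not part of the original answer, while escapedPatternForString: is an existing NSRegularExpression class method.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): builds the alternation
// pattern after escaping each stop word, so that characters such as "." or
// "+" are matched literally rather than interpreted as regex syntax.
static NSString *StopWordPattern(NSArray<NSString *> *stopWords)
{
    NSMutableArray<NSString *> *escaped = [NSMutableArray arrayWithCapacity:stopWords.count];
    for (NSString *word in stopWords)
    {
        [escaped addObject:[NSRegularExpression escapedPatternForString:word]];
    }
    // same shape as the pattern in the answer: \b(?:word1|word2|...)\b
    return [NSString stringWithFormat:@"\\b(?:%@)\\b", [escaped componentsJoinedByString:@"|"]];
}
```

Note that the \b anchors only behave as intended when each stop word begins and ends with a word character, and that an empty stop-word list produces a pattern that matches at any word boundary, so callers would probably want to handle that case separately.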

Iterate through the sample lines, checking whether each contains a stop word, and display the result on the console:

for (NSString *aLine in sampleLines)
{
    // check for all words anywhere in line in one go
    NSRange match = [stopRE rangeOfFirstMatchInString:aLine
                                              options:0
                                                range:NSMakeRange(0, aLine.length)];
    BOOL containsStopWord = match.location != NSNotFound;
    NSLog(@"%@: %@", aLine, containsStopWord ? @"Bad" : @"OK");
}
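The loop above only logs the result, whereas the question asks for a new array of the clean phrases. A minimal sketch of that last step, reusing the same rangeOfFirstMatchInString:options:range: call, might look like the following; the function name LinesWithoutMatch is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): returns the lines in
// which the given regular expression finds no match at all.
static NSArray<NSString *> *LinesWithoutMatch(NSArray<NSString *> *lines,
                                              NSRegularExpression *stopRE)
{
    NSMutableArray<NSString *> *clean = [NSMutableArray array];
    for (NSString *aLine in lines)
    {
        NSRange match = [stopRE rangeOfFirstMatchInString:aLine
                                                  options:0
                                                    range:NSMakeRange(0, aLine.length)];
        if (match.location == NSNotFound)
        {
            // no stop word anywhere in the line, so keep it
            [clean addObject:aLine];
        }
    }
    return [clean copy];
}
```

Called with the sample lines and the stop-word regex built earlier, this should return "What is going on?", "What’s up dude?" and "They are the youth", since only the first and fourth lines contain "hey" or "you" as whole words.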

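Regular expression matching should be efficient and, as the example never copies individual words or matches into NSString objects, it should not create many temporary objects the way methods that enumerate the individual words do.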
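For comparison, a word-enumerating approach of the kind mentioned above could be sketched as follows. This is an assumption about what such a method might look like, not code from the original answer; the name LineContainsStopWord is hypothetical, and the stop words are assumed to be supplied in lowercase.

```objc
#import <Foundation/Foundation.h>

// Hypothetical comparison helper: checks a line word by word against a set
// of lowercase stop words. Every enumerated word is handed to the block as
// its own NSString, and lowercasing it creates a further temporary object.
static BOOL LineContainsStopWord(NSString *line, NSSet<NSString *> *lowercaseStopWords)
{
    __block BOOL found = NO;
    [line enumerateSubstringsInRange:NSMakeRange(0, line.length)
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *word, NSRange wordRange,
                                       NSRange enclosingRange, BOOL *stop) {
        if ([lowercaseStopWords containsObject:word.lowercaseString])
        {
            found = YES;
            *stop = YES; // no need to look at the remaining words
        }
    }];
    return found;
}
```

This gives whole-word, case-insensitive matching as well, but it does so by materialising each word, which is the overhead the regular expression version avoids.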

HTH

