Split Method vs Substring and IndexOf


Question

So I'm writing a program that parses a CSV file. I'm using the split method to separate the values into a String array, but I've read in some articles that it's faster to use substring and indexOf. I wrote essentially what I would do with those two methods, and it seems like split would be better. Could someone explain how substring and indexOf would be better, or whether I'm not using these methods correctly? Here's what I wrote:

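int indexOne = 0, indexTwo;
for (int i = 0; i < 4; i++) // there are four different values in one line
{
    indexTwo = line.indexOf(",", indexOne); // look up the next comma only once
    if (indexTwo == -1)
    {
        // no comma left: the last value runs to the end of the line
        lineArr[i] = line.substring(indexOne);
        break;
    }
    lineArr[i] = line.substring(indexOne, indexTwo);
    indexOne = indexTwo + 1;
}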
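
For comparison, here is a self-contained sketch of the same indexOf/substring idea, assuming (as in the question) that the number of values per line is known up front. The class name `ManualSplit` and the method `splitFixed` are hypothetical, chosen only for illustration. Because the count is fixed, the values go straight into a preallocated array rather than into the intermediate `ArrayList` that JDK 8's `String.split` builds internally; the result should match `line.split(",", n)` whenever the line has at least `n - 1` commas.

```java
import java.util.Arrays;

class ManualSplit {

    // Splits a line into exactly n comma-separated values (n >= 1) using
    // indexOf/substring. The last value takes the rest of the line, mirroring
    // line.split(",", n) when the line has at least n - 1 commas.
    static String[] splitFixed(String line, int n) {
        String[] values = new String[n];
        int start = 0;
        for (int i = 0; i < n - 1; i++) {
            int comma = line.indexOf(',', start);
            if (comma == -1) {
                throw new IllegalArgumentException(
                        "expected " + n + " values but found " + (i + 1) + ": " + line);
            }
            values[i] = line.substring(start, comma); // value before this comma
            start = comma + 1;                        // resume after the comma
        }
        values[n - 1] = line.substring(start);        // last value: no trailing comma
        return values;
    }

    public static void main(String[] args) {
        String line = "id,name,,42";
        System.out.println(Arrays.toString(splitFixed(line, 4))); // [id, name, , 42]
        System.out.println(Arrays.toString(line.split(",", 4)));  // [id, name, , 42]
    }
}
```

Throwing on a short line is just one design choice for this sketch; a loop like the one above instead leaves the missing entries of `lineArr` unset.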

Answer

The code below is taken from the source shipped with Oracle's JDK 8 update 73. As you can see, in the "fastpath" case, when you pass in a one-char String that is not a regex metacharacter, split falls through to a loop using indexOf, similar to your logic.
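
The fastpath only covers a single character that is not a regex metacharacter, or a backslash followed by a character that is not an ASCII letter or digit. A minimal sketch of what that means in practice, using a pipe-separated line as a hypothetical example (the class name `SplitSeparators` is illustrative): an unescaped `|` is treated as a regex, the escaped form `\\|` qualifies for the fastpath, and a precompiled `java.util.regex.Pattern` avoids recompiling a regex separator on every call.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitSeparators {

    // Compiled once and reused. In JDK 8, String.split compiles its argument
    // on every call whenever the separator does not qualify for the fastpath.
    private static final Pattern PIPE = Pattern.compile("\\|");

    public static void main(String[] args) {
        String line = "a|b|c";

        // "|" alone is a regex metacharacter (an empty alternation), so this
        // goes through Pattern and splits between every character, not on '|'.
        System.out.println(Arrays.toString(line.split("|")));

        // Backslash + non-alphanumeric character: eligible for the fastpath.
        System.out.println(Arrays.toString(line.split("\\|"))); // [a, b, c]

        // Same result through the precompiled Pattern.
        System.out.println(Arrays.toString(PIPE.split(line)));  // [a, b, c]
    }
}
```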

The short answer is yes, your code is a little faster (it skips the intermediate ArrayList and the final copy into an array that split performs), but I'll leave it to you to decide whether that is enough of a benefit to avoid using split in your use case.

Personally, I tend to agree with @pczeus's comment: use split unless you actually have evidence that it is causing an issue.

public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
     (1)one-char String and this character is not one of the
        RegEx's meta characters ".$|()[{^?*+\\", or
     (2)two-char String and the first char is the backslash and
        the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.value.length == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE))
    {
        int off = 0;
        int next = 0;
        boolean limited = limit > 0;
        ArrayList<String> list = new ArrayList<>();
        while ((next = indexOf(ch, off)) != -1) {
            if (!limited || list.size() < limit - 1) {
                list.add(substring(off, next));
                off = next + 1;
            } else {    // last one
                //assert (list.size() == limit - 1);
                list.add(substring(off, value.length));
                off = value.length;
                break;
            }
        }
        // If no match was found, return this
        if (off == 0)
            return new String[]{this};

        // Add remaining segment
        if (!limited || list.size() < limit)
            list.add(substring(off, value.length));

        // Construct result
        int resultSize = list.size();
        if (limit == 0) {
            while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                resultSize--;
            }
        }
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}
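
The `limit == 0` branch near the end of that method matters for CSV parsing: `split(",")` passes a limit of 0, so empty trailing fields are silently dropped and the array can be shorter than the number of columns. A small sketch (the class name `SplitLimit` is hypothetical) comparing the three kinds of limit on one line:

```java
import java.util.Arrays;

class SplitLimit {
    public static void main(String[] args) {
        String line = "a,b,,"; // four fields, the last two empty

        // limit 0 (what split(",") uses): trailing empty strings are removed.
        String[] dropped = line.split(",");
        // negative limit: every field is kept, including trailing empties.
        String[] kept = line.split(",", -1);
        // positive limit: at most 2 elements; the last one holds the rest of the line.
        String[] capped = line.split(",", 2);

        System.out.println(dropped.length + " " + Arrays.toString(dropped)); // 2 [a, b]
        System.out.println(kept.length + " " + Arrays.toString(kept));       // 4 [a, b, , ]
        System.out.println(capped.length + " " + Arrays.toString(capped));   // 2 [a, b,,]
    }
}
```

For CSV rows where trailing columns may be empty, `split(",", -1)` keeps the column count stable.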