Replicating String.split with StringTokenizer

Problem description

Encouraged by this, and by the fact that I have billions of strings to parse, I tried to modify my code to accept a StringTokenizer instead of a String[].

The only thing standing between me and that delicious x2 performance boost is that the two behave differently on empty fields:

"dog,,cat".split(",")
//output: ["dog","","cat"]

new StringTokenizer("dog,,cat", ",")
// nextToken() = "dog"
// nextToken() = "cat"

How can I achieve similar results with the StringTokenizer? Are there faster ways to do this?
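If you want to stay with StringTokenizer, one option is its three-argument constructor StringTokenizer(String str, String delim, boolean returnDelims). With returnDelims set to true, every delimiter comes back as its own one-character token, so two delimiters in a row signal an empty field. The following is a minimal sketch, assuming a single delimiter character; the class and method names (TokenizerSplit, splitKeepingEmpty) are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerSplit {
    // Splits s on a single delimiter character, keeping empty fields.
    // With returnDelims = true, StringTokenizer returns each delimiter as its
    // own token, so two delimiters in a row mean an empty field between them.
    static List<String> splitKeepingEmpty(String s, char delim) {
        String d = String.valueOf(delim);
        List<String> fields = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, d, true);
        boolean lastWasDelim = true; // start of input behaves like a preceding delimiter
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            if (tok.equals(d)) {
                if (lastWasDelim) {
                    fields.add(""); // empty field: leading, or between two delimiters
                }
                lastWasDelim = true;
            } else {
                fields.add(tok);
                lastWasDelim = false;
            }
        }
        if (lastWasDelim) {
            fields.add(""); // input was empty or ended with a delimiter
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitKeepingEmpty("dog,,cat", ',')); // [dog, , cat]
    }
}
```

One difference to watch for: String.split(",") with the default limit of zero discards trailing empty strings, so "a,b,".split(",") gives ["a", "b"], whereas this sketch returns [a, b, ""]. Whether that matters depends on your data.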

Recommended answer

Are you only actually tokenizing on commas? If so, I'd write my own tokenizer - it may well end up being even more efficient than the more general-purpose StringTokenizer, which can look for multiple delimiters, and you can make it behave however you'd like. For such a simple use case, it can be a simple implementation.
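A hand-written single-character splitter can be as short as a String.indexOf loop. This is a sketch under the assumption that the separator is exactly one character; SimpleSplitter is a hypothetical name. It keeps every empty field, including trailing ones:

```java
import java.util.ArrayList;
import java.util.List;

class SimpleSplitter {
    // Splits s on every occurrence of the single character sep, keeping empty
    // fields, without going through a regex or StringTokenizer.
    static List<String> split(String s, char sep) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = s.indexOf(sep, start)) >= 0) {
            fields.add(s.substring(start, idx)); // "" when two separators are adjacent
            start = idx + 1;
        }
        fields.add(s.substring(start)); // last field, "" if s ends with sep
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("dog,,cat", ',')); // [dog, , cat]
    }
}
```

Note that in OpenJDK, String.split already skips the regex engine when the pattern is a single character that is not a regex metacharacter (such as ","), so it is worth measuring before assuming a custom splitter will be faster.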

If it would be useful, you could even implement Iterable<String> and get enhanced-for-loop support with strong typing instead of the Enumeration support provided by StringTokenizer. Let me know if you want any help coding such a beast up - it really shouldn't be too hard.
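The Iterable<String> idea might look roughly like this, again assuming a single separator character; CharSplitIterable is a hypothetical name. Fields are found lazily with String.indexOf, so no intermediate array or list is built:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical class: a lazy single-character tokenizer usable in an enhanced for loop.
final class CharSplitIterable implements Iterable<String> {
    private final String text;
    private final char sep;

    CharSplitIterable(String text, char sep) {
        this.text = text;
        this.sep = sep;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int start = 0;        // index where the next field begins
            private boolean done = false; // true once the final field has been returned

            @Override
            public boolean hasNext() {
                return !done;
            }

            @Override
            public String next() {
                if (done) {
                    throw new NoSuchElementException();
                }
                int idx = text.indexOf(sep, start);
                if (idx < 0) {
                    done = true; // no separator left: the remainder is the last field
                    return text.substring(start);
                }
                String field = text.substring(start, idx);
                start = idx + 1;
                return field;
            }
        };
    }

    public static void main(String[] args) {
        for (String field : new CharSplitIterable("dog,,cat", ',')) {
            System.out.println("[" + field + "]"); // [dog], [], [cat]
        }
    }
}
```

Because fields are produced on demand, a caller that only needs the first few columns of a line can stop iterating early instead of splitting the whole line.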

Additionally, I'd try running performance tests on your actual data before leaping too far from an existing solution. Do you have any idea how much of your execution time is actually spent in String.split? I know you have a lot of strings to parse, but if you're doing anything significant with them afterwards, I'd expect that to be much more significant than the splitting.
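One rough way to answer that question on a sample of real data is to time the splitting phase and the processing phase separately. This sketch assumes the list holds a representative sample of your lines, and it uses a hashCode sum as a hypothetical stand-in for your real per-field work. Single-run System.nanoTime measurements are noisy and include JIT warm-up, so a harness such as JMH or a profiler will give more trustworthy numbers:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitShare {
    // Returns the fraction of elapsed time spent in String.split versus the
    // work done on the resulting fields. Rough, single-shot measurement only.
    static double splitFraction(List<String> lines) {
        long t0 = System.nanoTime();
        List<String[]> parsed = new ArrayList<>(lines.size());
        for (String line : lines) {
            parsed.add(line.split(","));
        }
        long t1 = System.nanoTime();

        long checksum = 0; // printed below so the work is not optimized away
        for (String[] fields : parsed) {
            for (String f : fields) {
                checksum += f.hashCode(); // hypothetical stand-in for real per-field processing
            }
        }
        long t2 = System.nanoTime();
        System.out.println("checksum: " + checksum);

        long total = t2 - t0;
        return total == 0 ? 0.0 : (double) (t1 - t0) / total;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("dog,,cat", "a,b,c", "x,y");
        System.out.printf("share of time in split: %.1f%%%n", 100 * splitFraction(sample));
    }
}
```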