Mapping from String to integer - performance of various approaches


Problem description



Let's say that I need to make a mapping from String to an integer. The integers are unique and form a continuous range starting from 0. That is:

Hello -> 0
World -> 1
Foo   -> 2
Bar   -> 3
Spam  -> 4
Eggs  -> 5
etc.

There are at least two straightforward ways to do it. With a hashmap:

HashMap<String, Integer> map = ...
int integer = map.get(string); // Plus maybe null check to avoid NPE in unboxing.

Or with a list:

List<String> list = ...
int integer = list.indexOf(string); // Plus maybe check for -1.
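Both lookups can be made runnable with the checks that the comments hint at. This is only an illustrative sketch. The class TwoLookups and its helpers buildMap, viaMap and viaList are hypothetical names. Map#getOrDefault (Java 8+) is just one way to avoid unboxing a null, and an explicit null check would work equally well.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper class showing both lookups side by side. */
public class TwoLookups {
    // Position in this list is the integer each string maps to.
    static final List<String> NAMES =
            Arrays.asList("Hello", "World", "Foo", "Bar", "Spam", "Eggs");

    /** Builds the String -> index map once, with indices 0..n-1 in list order. */
    static Map<String, Integer> buildMap(List<String> names) {
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < names.size(); i++) {
            map.put(names.get(i), i);
        }
        return map;
    }

    /** HashMap approach: returns -1 for an unknown key instead of unboxing null (NPE). */
    static int viaMap(Map<String, Integer> map, String key) {
        return map.getOrDefault(key, -1);
    }

    /** List approach: linear scan with equals(); indexOf already returns -1 if absent. */
    static int viaList(List<String> list, String key) {
        return list.indexOf(key);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = buildMap(NAMES);
        for (String key : new String[] {"Bar", "Ham"}) {
            int fromMap = viaMap(map, key);
            int fromList = viaList(NAMES, key);
            // Callers still have to decide what -1 ("not found") means for them.
            System.out.println(key + ": map=" + fromMap + ", list=" + fromList);
        }
    }
}
```

Building the map costs one pass over the names up front. After that, both helpers agree on every key, including the -1 result for unknown ones.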

Which approach should I use, and why? Arguably the relative performance depends on the size of the list/map. List#indexOf() is a linear search using String#equals(), so it is O(n). HashMap#get() uses the hash to narrow down the search, which is certainly more efficient when the map is big but may be slower when there are only a few elements (there must be some overhead in calculating the hash, right?).
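The O(n)-versus-hashing argument can be made concrete without timing anything by counting equals() calls. In the sketch below, CountingKey is a hypothetical wrapper that delegates to a String and increments a counter on every equals() call. When looking up the last of n keys, List#indexOf() calls equals() once per element it passes. HashMap#get() normally calls it only for entries whose hash matches. The sketch deliberately ignores the cost of hashCode(), which is exactly the overhead the question is about, so it is no substitute for a real benchmark.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Counts equals() calls made by List#indexOf and HashMap#get. Not a timing benchmark. */
public class EqualsCallCounter {
    static int equalsCalls = 0;

    /** Hypothetical wrapper: behaves like its String, but counts every equals() call. */
    static final class CountingKey {
        final String s;

        CountingKey(String s) {
            this.s = s;
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof CountingKey && s.equals(((CountingKey) o).s);
        }

        @Override
        public int hashCode() {
            return s.hashCode();
        }
    }

    /** Returns {equals() calls by List#indexOf, equals() calls by HashMap#get} for the last of n keys. */
    static int[] countCalls(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<CountingKey> list = new ArrayList<>();
        Map<CountingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            CountingKey key = new CountingKey("key" + i);
            list.add(key);
            map.put(key, i);
        }
        // A fresh instance, so HashMap#get cannot short-circuit on reference equality (==).
        CountingKey probe = new CountingKey("key" + (n - 1));

        equalsCalls = 0;
        int fromList = list.indexOf(probe);
        int listCalls = equalsCalls;

        equalsCalls = 0;
        int fromMap = map.get(probe);
        int mapCalls = equalsCalls;

        if (fromList != n - 1 || fromMap != n - 1) {
            throw new IllegalStateException("lookup returned the wrong index");
        }
        return new int[] {listCalls, mapCalls};
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 1000}) {
            int[] calls = countCalls(n);
            System.out.println("n=" + n + ": List#indexOf equals() calls=" + calls[0]
                    + ", HashMap#get equals() calls=" + calls[1]);
        }
    }
}
```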

Since benchmarking Java code properly is notoriously hard, I would like to get some educated guesses. Is my reasoning above correct (list is better for small, map is better for large)? What is the threshold size approximately? What difference do various List and HashMap implementations make?

Solution

A third option, and possibly my favorite, would be to use a trie.

I bet it beats the HashMap in performance (there are no collisions, and computing the hash code is O(length of string) anyway). It may also beat the List approach in some cases, for example if your strings have long common prefixes, since indexOf would waste a lot of time inside the equals calls.
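Here is a minimal trie sketch under stated assumptions. TrieIndex is a hypothetical class, and each node keeps its children in a HashMap<Character, Node> purely to keep the code short. That choice hashes every character, so this version will not show the speed advantage bet on above. A tuned trie would store children in arrays or sorted char arrays instead. What the sketch does show is that a lookup visits at most one node per character and gives up at the first character with no matching child.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal trie mapping strings to non-negative int indices. */
public class TrieIndex {
    private static final class Node {
        // Short but hash-based; a performance-oriented trie would avoid this.
        final Map<Character, Node> children = new HashMap<>();
        int value = -1; // -1 means no key ends at this node
    }

    private final Node root = new Node();

    /** Associates key with index, creating one node per new character. */
    public void put(String key, int index) {
        Node node = root;
        for (int i = 0; i < key.length(); i++) {
            node = node.children.computeIfAbsent(key.charAt(i), c -> new Node());
        }
        node.value = index;
    }

    /** Returns the index for key, or -1 if absent; stops at the first unmatched character. */
    public int get(String key) {
        Node node = root;
        for (int i = 0; i < key.length(); i++) {
            node = node.children.get(key.charAt(i));
            if (node == null) {
                return -1;
            }
        }
        return node.value;
    }

    public static void main(String[] args) {
        TrieIndex trie = new TrieIndex();
        String[] words = {"Hello", "World", "Foo", "Bar", "Spam", "Eggs"};
        for (int i = 0; i < words.length; i++) {
            trie.put(words[i], i);
        }
        System.out.println(trie.get("Spam")); // 4
        System.out.println(trie.get("Fo"));   // -1: a prefix of a key, not a key
    }
}
```

Because -1 marks "no key ends here", this sketch relies on the question's premise that the indices start at 0.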

When choosing between List and Map I would go for a Map (such as HashMap). Here is my reasoning:

  • Readability

    The Map interface simply provides a more intuitive interface for this use case.

  • Optimization in the right place

    I'd say that with a List you would be optimizing for the small cases anyway, and that's probably not where the bottleneck is.

A fourth option would be to use a LinkedHashMap: iterate through it if the size is small, and look up the associated number directly if the size is large.
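The fourth option might look roughly like this. HybridIndex and LINEAR_SCAN_THRESHOLD are hypothetical names, and the cutoff of 8 is a placeholder rather than a measured value. LinkedHashMap still hashes every key on insertion. Only lookups in the small case skip hashing: they walk the entries in insertion order and compare with equals().

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical hybrid: linear scan while small, hashed lookup once large. */
public class HybridIndex {
    // Placeholder cutoff, not a measured value; the real crossover would need benchmarking.
    static final int LINEAR_SCAN_THRESHOLD = 8;

    // Insertion order is preserved, so iteration visits keys in index order 0, 1, 2, ...
    private final LinkedHashMap<String, Integer> map = new LinkedHashMap<>();

    /** Assigns the next free index to a new key; returns the existing index for a known key. */
    public int add(String key) {
        Integer existing = map.get(key);
        if (existing != null) {
            return existing;
        }
        int index = map.size();
        map.put(key, index);
        return index;
    }

    /** Returns the key's index, or -1 if the key is unknown. */
    public int indexOf(String key) {
        if (map.size() <= LINEAR_SCAN_THRESHOLD) {
            // Small case: compare with equals() in insertion order, without hashing the key.
            for (Map.Entry<String, Integer> entry : map.entrySet()) {
                if (entry.getKey().equals(key)) {
                    return entry.getValue();
                }
            }
            return -1;
        }
        // Large case: ordinary hashed lookup.
        Integer index = map.get(key);
        return index != null ? index : -1;
    }

    public static void main(String[] args) {
        HybridIndex index = new HybridIndex();
        for (String s : new String[] {"Hello", "World", "Foo", "Bar", "Spam", "Eggs"}) {
            index.add(s);
        }
        System.out.println(index.indexOf("Spam")); // 4, found by linear scan
    }
}
```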

A fifth option is to encapsulate the decision in a separate class altogether. In that case you could even implement it to change strategy at runtime as the list grows.
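Here is one way to encapsulate the decision, sketched under some assumptions. AdaptiveIndex is a hypothetical class that starts with a plain ArrayList. Once it holds more than SWITCH_THRESHOLD keys (16 here, an arbitrary placeholder), it builds a HashMap and uses that for all later lookups. Callers only see add and indexOf, so the switch is invisible to them. The list is kept after the switch, which also makes a reverse lookup from index to string trivial.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical index that starts as a List and switches to a HashMap as it grows. */
public class AdaptiveIndex {
    // Arbitrary placeholder; a real threshold would come from measurements.
    static final int SWITCH_THRESHOLD = 16;

    private final List<String> keys = new ArrayList<>(); // position == index
    private Map<String, Integer> map;                    // null until the switch

    /** Adds a key if new and returns its index (0, 1, 2, ... in insertion order). */
    public int add(String key) {
        int existing = indexOf(key);
        if (existing >= 0) {
            return existing;
        }
        int index = keys.size();
        keys.add(key);
        if (map != null) {
            map.put(key, index);
        } else if (keys.size() > SWITCH_THRESHOLD) {
            // Switch strategy once: index every key so far, then keep the map up to date.
            map = new HashMap<>();
            for (int i = 0; i < keys.size(); i++) {
                map.put(keys.get(i), i);
            }
        }
        return index;
    }

    /** Returns the key's index, or -1 if absent, using whichever strategy is active. */
    public int indexOf(String key) {
        if (map == null) {
            return keys.indexOf(key);
        }
        Integer index = map.get(key);
        return index != null ? index : -1;
    }

    /** Reverse lookup from index to key, available because the list is kept. */
    public String keyAt(int index) {
        return keys.get(index);
    }

    public boolean usesMap() {
        return map != null;
    }

    public static void main(String[] args) {
        AdaptiveIndex index = new AdaptiveIndex();
        for (int i = 0; i < 20; i++) {
            index.add("key" + i);
        }
        System.out.println(index.usesMap() + " " + index.indexOf("key17") + " " + index.keyAt(3));
    }
}
```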

