TreeSet没有添加所有元素? [英] TreeSet not adding all elements?

查看:146
本文介绍了TreeSet没有添加所有元素?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在研究不同Java集合类型的速度,并且遇到了一些奇怪的东西。我将1,000,000个对象从静态数组添加到不同的集合类型并返回所需的时间。这部分代码工作正常。

I have been looking into the speeds of different Java collection types and have come across something weird. I am adding 1,000,000 objects from a static array to a different collection type and returning the time required. This part of the code works fine.

根据进一步调查,我注意到 TreeSet 没有接收到所有1,000,000个对象,并且正在接收不同的每次金额。下面是将对象从数组传输到 TreeSet 的方法:

Under further investigation I noticed that the TreeSet is not receiving all of the 1,000,000 objects, and is receiving a different amount each time. Below is the method to transfer the objects from an array to the TreeSet:

    public int treeSet(int num)
    {
       Date before = new Date();

       for(int i=0; i<num; i++) 
       {
           treeSet.add(personsArray[i]);
       }

       Date after = new Date();
       return (int) (after.getTime() - before.getTime());
    }

下面是调用treeSet()方法并测试其大小的代码。

Below is the code which calls the treeSet() method and tests for its size.

    System.out.println("\tTree set with 1,000,000 objects--" + t.treeSet(1000000));
    System.out.println("Tree set contains " + t.treeSet.size() + " elements");

这个输出是:

    Tree set with 1,000,000 objects--1192
    Tree set contains 975741 elements

我希望有人可以向我解释为什么 TreeSet 没有收到所有对象以及收到不一致金额的原因。

I'm hoping someone can explain to me why the TreeSet is not receiving all of the objects and why it is receiving inconsistent amounts.

推荐答案

您几乎肯定会生成重复的Person对象。

You are almost certainly generating duplicate Person objects.

评论,你说每个人都是从文字,名字和姓氏中随机生成的包含数百名称和年龄的文件。假设有两种性别可能性,每种名字和姓氏有300种可能性,还有100种可能的年龄值。这总共有18,000,000个可能的独特人。

In your comment, you said each person is randomly generated from sex, first names and last names from a text file containing "hundreds" of names, and age. Let's say there are two possibilities for sex, 300 possibilities for each of first name and last name, and 100 possible values of age. That's a total of 18,000,000 possible unique people.

让我们进一步假设 equals()在此对象上正确实现,也就是说,它会正确地检查所有这些字段。

Let's further assume that equals() is implemented correctly on this object, that is, that it checks all of these fields correctly.

你在18,000,000种可能的空间中使用随机特征生成1,000,000个独特的人。

You're generating 1,000,000 unique people using random characteristics out of a space of 18,000,000 possibilities.

直观地说,您可能认为重复的可能性微不足道,但重复的可能性实际上约为1.0减去epsilon。这被称为生日问题或有时候是生日悖论。

Intuitively, you might think there's a "miniscule" chance of duplicates, but the probability of there being duplicates is in fact about 1.0 minus epsilon. This is known as the Birthday Problem or sometimes the Birthday Paradox.

如该页面所示,任意两个选择之间发生碰撞的概率约为

As given on that page, the probability of a collision occuring between any two choices is about


1 - ((d-1)/ d)^ n(n-1)/ 2

1 - ((d-1) / d) ^ n(n-1)/2

其中 d 是域中值的数量, n 是所做选择的数量。我不完全确定,但是当d = 18,000,000和n = 1,000,000的值时,我认为这大约是 1.0 - 1E-323 。 (编辑:正确的价值大约是 1.0 - 2.84E-12294 。这很接近于一个。)

where d is the number of values in the domain, and n is the number of choices made. I'm not entirely sure, but with values of d = 18,000,000 and n = 1,000,000 I think this works out to be about 1.0 - 1E-323. ( The correct value is about 1.0 - 2.84E-12294. That's pretty darned close to one.)

此选择中预期的碰撞次数由以下公式给出:

The expected number of collisions in such a choice is given by this formula:


n - d + d *((d-1)/ d)^ n

n - d + d * ((d-1) / d) ^ n

如果d = 18,000,000且n = 1,000,000,则可达到约27,000 。也就是说,平均而言你会遇到27,000次碰撞。这非常接近TreeSet中缺失元素的数量,这就是碰撞会表现出来的方式。我承认我选择的数字非常接近您所看到的数字,但我的假设和结果完全合情合理。

If d = 18,000,000 and n = 1,000,000 then this works out to about 27,000. That is, on average you'd get 27,000 collisions. That's pretty close to the number of "missing" elements in your TreeSet, which is how collisions would manifest themselves. I admit I picked my numbers to come pretty close to what you're seeing, but my assumptions and the results are entirely plausible.

你需要重新考虑你的方式重新生成您存储在集合中的数据。

You need to rethink the way you're generating the data you're storing into the set.

这篇关于TreeSet没有添加所有元素?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆