Random sampling from a dataset, while preserving original probability distribution

Views: 186 Published: 2019/1/2 15:13:05 Tags: java sampling probability-density


Problem description


I have a set of more than 2000 numbers gathered from measurements. I want to sample from this data set, about 10 values per test, while preserving the probability distribution both overall and within each test (to the extent possible). For example, in each test I want some small values, some mid-range values, and some large values, with the mean and variance approximately close to those of the original distribution. Combining all the tests, I also want the overall mean and variance of all the samples to be approximately close to those of the original distribution.

As my dataset follows a long-tail probability distribution, the amount of data at each quantile is not the same:

Fig 1. Density plot of ~2k elements of data.

I am using Java. Right now I draw a uniformly distributed random int as an index into the dataset and return the data element at that position:

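import java.util.Random;

public int getRandomData() {
    // Measured values; the rest of the list is elided in the original.
    int[] data = {1231, 414, 222, 4211, 41, 203, 123, 432 /* , ... */};
    int length = data.length;
    Random r = new Random();
    // Pick an index uniformly at random and return the element at that position.
    int randomInt = r.nextInt(length);
    return data[randomInt];
}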

I don't know whether this works as I want, because the data is stored in the order it was measured, and it has a great deal of serial correlation.

Solution

It works as you want. The order of the data is irrelevant.
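
One way to convince yourself of this is an empirical check. The sketch below is not from the original answer: it builds a hypothetical long-tailed stand-in dataset (the real measurements are not shown in full), sorts it so that neighbouring elements are as strongly ordered as possible, and then draws many tests of 10 values each by uniform random index. The class name, the helper names, and the synthetic data are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Random;

class UniformSamplingCheck {

    // Hypothetical stand-in for the measured data: exponential-like values
    // with a long right tail. The real dataset is not available here.
    static int[] makeLongTailData(int n, long seed) {
        Random rng = new Random(seed);
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            // 1.0 - nextDouble() lies in (0, 1], so the logarithm is finite.
            data[i] = (int) Math.round(-500.0 * Math.log(1.0 - rng.nextDouble()));
        }
        // Sort on purpose: the storage order is now far from random.
        Arrays.sort(data);
        return data;
    }

    static double mean(int[] values) {
        double sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // One test: k draws, each from a uniformly random index (with replacement).
    static int[] sample(int[] data, int k, Random rng) {
        int[] out = new int[k];
        for (int i = 0; i < k; i++) {
            out[i] = data[rng.nextInt(data.length)];
        }
        return out;
    }

    // Mean of all values drawn across `tests` tests of `perTest` draws each.
    static double pooledSampleMean(int[] data, int tests, int perTest, Random rng) {
        double sum = 0;
        for (int t = 0; t < tests; t++) {
            for (int v : sample(data, perTest, rng)) {
                sum += v;
            }
        }
        return sum / ((double) tests * perTest);
    }

    public static void main(String[] args) {
        int[] data = makeLongTailData(2000, 1L);
        Random rng = new Random(42L);
        System.out.printf("dataset mean: %.1f%n", mean(data));
        System.out.printf("pooled mean, 2000 tests of 10: %.1f%n",
                pooledSampleMean(data, 2000, 10, rng));
        System.out.printf("mean of one test of 10: %.1f%n", mean(sample(data, 10, rng)));
    }
}
```

The pooled mean should land close to the dataset mean even though the array is sorted, because each draw picks every position with equal probability regardless of what is stored next to it. A single test of 10 values will usually deviate more, which is expected for such a small sample from a long-tailed distribution.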
