Random sampling from data quantiles, while preserving original probability distribution


Problem description

Following my previous question, "Random sampling from a dataset, while preserving original probability distribution", I want to sample from a set of more than 2000 numbers gathered from measurement. I want to perform several tests (taking at most 10 samples in each test) while preserving the probability distribution over the whole testing process and, as far as possible, within each test. Now, instead of completely random sampling, I partition the data into 5 quantiles and, in each of 10 tests, sample 2 data elements from each quantile, drawing uniformly at random from the array of data in each quantile.

The problem with completely random sampling was that, because the distribution of the data is long-tailed, I was getting almost the same values in each test. I want some small values, some middle values, and some large values in each test, so I sampled as described.

Fig 1. Density plot of ~2k elements of data.

This is the R code for calculating quantiles:

q <- quantile(data, probs = seq(0, 1, by = 0.2))  # by = 0.2 gives the boundaries of 5 quantiles

And then I partition data into 5 quantiles (each one as an array) and sample from each partition. For example, I do this in Java:

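import java.util.Random;

public int getRandomData(int quantile) {
    // Placeholder values; each row holds the data of one quantile.
    int[][] data = {{1,2,3,4,5}
                   ,{6,7,8,9,10}
                   ,{11,12,13,14,15}
                   ,{16,17,18,19,20}
                   ,{21,22,23,24,25}};
    // Number of elements in the selected quantile.
    int length = data[quantile].length;
    Random r = new Random();
    // Uniformly random index into the selected quantile.
    int randomInt = r.nextInt(length);
    return data[quantile][randomInt];
}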
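The partition-and-sample step described above could be sketched for real measurements roughly as follows. This is only an illustration, not the asker's actual code: the class `QuantileSampler` and the helpers `partition` and `sample` are hypothetical names, the data is a placeholder, and the sketch assumes the dataset has at least as many elements as there are groups. Like the method above, it draws with replacement within each group.

```java
import java.util.Arrays;
import java.util.Random;

public class QuantileSampler {
    // Hypothetical helper: sorts a copy of the data and cuts it into `parts`
    // consecutive groups of (nearly) equal size, one group per quantile.
    // Assumes data.length >= parts, so that no group is empty.
    public static int[][] partition(int[] data, int parts) {
        int[] sorted = data.clone();
        Arrays.sort(sorted);
        int[][] groups = new int[parts][];
        for (int k = 0; k < parts; k++) {
            int from = (int) ((long) k * sorted.length / parts);
            int to = (int) ((long) (k + 1) * sorted.length / parts);
            groups[k] = Arrays.copyOfRange(sorted, from, to);
        }
        return groups;
    }

    // Draws `perGroup` values uniformly at random (with replacement)
    // from every group and returns them in group order.
    public static int[] sample(int[][] groups, int perGroup, Random rng) {
        int[] out = new int[groups.length * perGroup];
        int i = 0;
        for (int[] group : groups) {
            for (int j = 0; j < perGroup; j++) {
                out[i++] = group[rng.nextInt(group.length)];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] data = new int[25];
        for (int i = 0; i < data.length; i++) {
            data[i] = 25 - i; // placeholder values 25..1, deliberately unsorted
        }
        int[][] groups = partition(data, 5);
        int[] test = sample(groups, 2, new Random()); // one test of 10 values
        System.out.println(Arrays.toString(test));
    }
}
```

Building the groups from the sorted data keeps each group at about one fifth of the observations, so two draws per group give every fifth of the distribution equal weight in each test.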

So, do the samples for each test, and for all tests overall, preserve the characteristics of the original distribution, for example mean and variance? If not, how should I arrange the sampling to achieve this goal?

Solution

> preserve the characteristics of the original distribution, for example mean and variance?

Sampling this way will give a similar distribution. You might want to add a check that each sample meets your requirement, and perhaps draw again if it does not, but this will get you close.
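One way the "check, and perhaps try again" idea might look in code is sketched below. It is an illustration under stated assumptions, not a prescribed method: the class `SampleCheck`, the method `drawUntilClose`, the relative tolerance `tol`, and the retry limit `maxTries` are all hypothetical, the data is a placeholder, and "meets your requirement" is taken to mean that the sample mean and variance are both close to those of the full dataset.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.Supplier;

public class SampleCheck {
    public static double mean(double[] x) {
        double sum = 0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    // Population variance (divides by n); an assumption, use n - 1 if preferred.
    public static double variance(double[] x) {
        double m = mean(x), sum = 0;
        for (double v : x) sum += (v - m) * (v - m);
        return sum / x.length;
    }

    // Calls draw.get() up to maxTries times and returns the first sample whose
    // mean and variance are both within relative tolerance tol of the
    // population values; returns null if no draw was accepted.
    public static double[] drawUntilClose(double[] population, Supplier<double[]> draw,
                                          double tol, int maxTries) {
        double m = mean(population), v = variance(population);
        for (int t = 0; t < maxTries; t++) {
            double[] s = draw.get();
            boolean meanOk = Math.abs(mean(s) - m) <= tol * Math.abs(m);
            boolean varOk = Math.abs(variance(s) - v) <= tol * v;
            if (meanOk && varOk) return s;
        }
        return null;
    }

    public static void main(String[] args) {
        double[] population = new double[25];
        for (int i = 0; i < 25; i++) population[i] = i + 1; // placeholder values 1..25, sorted
        Random rng = new Random();
        // Two uniform draws from each of five consecutive blocks of five values.
        Supplier<double[]> draw = () -> {
            double[] s = new double[10];
            for (int k = 0; k < 5; k++)
                for (int j = 0; j < 2; j++)
                    s[2 * k + j] = population[5 * k + rng.nextInt(5)];
            return s;
        };
        double[] accepted = drawUntilClose(population, draw, 0.2, 1000);
        System.out.println(accepted == null ? "no sample accepted" : Arrays.toString(accepted));
    }
}
```

With long-tailed data and only 10 values per test, a tight tolerance on the variance may rarely be met, so `tol` and `maxTries` would need tuning against the real measurements.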

> If not, how to arrange sampling to achieve this goal?

To get exactly the same distribution, the sample must contain every data value once, or every value the same number of times, for example the whole dataset duplicated. This is the only way to get exactly the same distribution.
