Generate random data based on existing data


Question

Is there a way in Python to generate random data based on the distribution of already existing data?

Here are the summary statistics of my dataset:

Data
count   209.000000
mean    1.280144
std     0.374602
min     0.880000
25%     1.060000
50%     1.150000
75%     1.400000
max     4.140000

Since the data is not normally distributed, np.random.normal is not an option. Any ideas?

Thank you.

Performing KDE:

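import seaborn as sns
from sklearn.neighbors import KernelDensity

# Gaussian KDE fitted to the 'y' column of the DataFrame `data`
kde = KernelDensity(kernel='gaussian', bandwidth=0.525566).fit(data['y'].to_numpy().reshape(-1, 1))
# Draw 2400 new points from the fitted density and plot them
# (histplot replaces the deprecated distplot)
sns.histplot(kde.sample(2400).ravel(), kde=True)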

Recommended answer

In general, real-world data doesn't exactly follow a "nice" distribution such as the normal or Weibull distribution.

As in machine learning, sampling from a distribution of data points generally takes two steps:

  • Fit a data model to the data.
  • Then, predict a new data point based on that model, with the help of randomness.
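The two steps above can be sketched with the simplest possible model, a histogram. This is only an illustration: the original dataset is not shown, so `data` below is a hypothetical right-skewed stand-in, and the bin count of 20 is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical stand-in for the real dataset (209 right-skewed values).
data = 0.88 + rng.lognormal(mean=-1.2, sigma=0.8, size=209)

# Step 1: fit a model. Here the "model" is just a histogram of the data.
counts, edges = np.histogram(data, bins=20)
probs = counts / counts.sum()

# Step 2: sample from the model. Pick a bin in proportion to its count,
# then draw a point uniformly inside that bin.
n_new = 1000
bins = rng.choice(len(counts), size=n_new, p=probs)
samples = rng.uniform(edges[bins], edges[bins + 1])
```

Samples drawn this way never fall outside the observed range, which may or may not be what you want.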

There are several ways to estimate the distribution of data and sample from that estimate:

  • Kernel density estimation.
  • Gaussian mixture models.
  • Histograms.
  • Regression models.
  • Other machine learning models.
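As one example from the list above, a Gaussian mixture model can be fitted and sampled with scikit-learn. This is a minimal sketch, not a tuned solution: `data` is a hypothetical stand-in for the real dataset, and `n_components=3` is an assumption that would normally be chosen by a criterion such as BIC.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(seed=0)

# Hypothetical stand-in for the real dataset (209 right-skewed values).
data = 0.88 + rng.lognormal(mean=-1.2, sigma=0.8, size=209)

# Fit a mixture of three Gaussians; scikit-learn expects a 2-D array.
gmm = GaussianMixture(n_components=3, random_state=0).fit(data.reshape(-1, 1))

# sample() returns the new points and the component each one came from.
points, labels = gmm.sample(1000)
samples = points.ravel()
```

Because each component is Gaussian, some samples can land below the smallest observed value. If the data has a hard lower bound, clip or reject those points.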

In addition, methods such as maximum likelihood estimation make it possible to fit a known distribution (such as the normal distribution) to data, but the estimated distribution is generally coarser than one obtained with kernel density estimation or other machine learning models.
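For comparison, a maximum likelihood fit of a known distribution might look like the sketch below, using SciPy. The choice of a log-normal family is purely an assumption made for illustration (the summary statistics suggest right skew, nothing more), and `data` is again a hypothetical stand-in.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Hypothetical stand-in for the real dataset (209 right-skewed values).
data = 0.88 + rng.lognormal(mean=-1.2, sigma=0.8, size=209)

# Maximum likelihood fit of a log-normal distribution.
# floc=0 fixes the location at zero so only shape and scale are estimated.
shape, loc, scale = stats.lognorm.fit(data, floc=0)

# Draw new points from the fitted distribution.
samples = stats.lognorm.rvs(shape, loc=loc, scale=scale, size=1000, random_state=0)
```

The whole dataset is summarized by two fitted parameters here, so any feature of the data that a log-normal curve cannot express is lost.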

See also my section "Random Numbers from a Distribution of Data Points".
