Generate random data based on existing data

Views: 167 Published: 2021/6/8 18:56:28 Tags: python random statistics normal-distribution weibull


Question

Is there a way in Python to generate random data based on the distribution of already existing data?

Here are the summary statistics of my dataset:

Data
count   209.000000
mean    1.280144
std     0.374602
min     0.880000
25%     1.060000
50%     1.150000
75%     1.400000
max     4.140000

As the data is not normally distributed, it is not possible to do this with np.random.normal. Any ideas?

Thanks.

Performing KDE:

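import seaborn as sns
from sklearn.neighbors import KernelDensity

# Gaussian KDE fitted to column 'y' of the asker's DataFrame `data` (not shown here)
kde = KernelDensity(kernel='gaussian', bandwidth=0.525566).fit(data['y'].to_numpy().reshape(-1, 1))
# Draw 2400 samples from the fitted density and plot them
# (histplot replaces distplot, which is deprecated in recent seaborn releases)
sns.histplot(kde.sample(2400).ravel(), kde=True, stat='density')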

Recommended answer

In general, real-world data doesn't exactly follow a "nice" distribution like the normal or Weibull distributions.

As in machine learning, there are generally two steps to sampling from a distribution of data points:

Fit a data model to the data.

Then, predict a new data point based on that model, with the help of randomness.
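The two steps can be sketched with SciPy's gaussian_kde. This is only an illustration, not the asker's setup: the array `data` is a synthetic, right-skewed stand-in for the real 209-point dataset, and the default bandwidth rule is used rather than a tuned one.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Hypothetical stand-in for the real dataset: 209 right-skewed values above 0.88.
data = 0.88 + rng.lognormal(mean=-1.2, sigma=0.7, size=209)

# Step 1: fit a model of the data's distribution (here, a Gaussian KDE).
model = gaussian_kde(data)

# Step 2: use randomness to draw new points from the fitted model.
# resample returns an array of shape (dimensions, size); take the single row.
new_points = model.resample(1000, seed=0)[0]
```

Because a Gaussian kernel has unbounded support, some generated points may fall below the smallest observed value; clip or reject them if the data has a hard lower bound.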

There are several ways to estimate the distribution of data and sample from that estimate:

Kernel density estimation.
Gaussian mixture models.
Histograms.
~~Regression models.~~
Other machine learning models.
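As one illustration of the options listed above, a histogram estimate can be sampled with NumPy alone: pick a bin with probability proportional to its count, then pick a uniform point inside that bin. The sketch below assumes a synthetic stand-in for the dataset and an arbitrary choice of 20 bins; neither comes from the original question.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the real dataset: 209 right-skewed values above 0.88.
data = 0.88 + rng.lognormal(mean=-1.2, sigma=0.7, size=209)

# Estimate the distribution with a histogram (the bin count is an assumption).
counts, edges = np.histogram(data, bins=20)

# Choose a bin index for each sample, weighted by the bin's share of the data.
bin_idx = rng.choice(len(counts), size=1000, p=counts / counts.sum())

# Draw each sample uniformly between the chosen bin's left and right edges.
hist_samples = rng.uniform(edges[bin_idx], edges[bin_idx + 1])
```

Unlike a Gaussian KDE, this never produces values outside the observed range, at the cost of a blockier density estimate.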

In addition, methods such as maximum likelihood estimation make it possible to fit a known distribution (such as the normal distribution) to data, but the resulting estimate is generally rougher than one obtained with kernel density estimation or other machine learning models.
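A minimal sketch of the maximum likelihood route, using scipy.stats. The choice of a two-parameter Weibull (location fixed at 0) is an assumption made for illustration, as is the synthetic `data` array; any other scipy.stats family could be substituted.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical stand-in for the real dataset: 209 right-skewed values above 0.88.
data = 0.88 + rng.lognormal(mean=-1.2, sigma=0.7, size=209)

# Maximum likelihood fit of a Weibull distribution, with the location fixed at 0.
shape, loc, scale = stats.weibull_min.fit(data, floc=0)

# Sample new values from the fitted distribution.
samples = stats.weibull_min.rvs(shape, loc=loc, scale=scale, size=1000, random_state=rng)

# Kolmogorov-Smirnov test as a rough check of how well the chosen family fits.
ks = stats.kstest(data, 'weibull_min', args=(shape, loc, scale))
```

A small ks.pvalue suggests the chosen family describes the data poorly, in which case one of the nonparametric estimates may be a better fit.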

See also my section "Random Numbers from a Distribution of Data Points".
