Generate random data based on existing data
Question
Is there a way in Python to generate random data based on the distribution of already existing data?
Here are the statistical parameters of my dataset:
Data
count 209.000000
mean 1.280144
std 0.374602
min 0.880000
25% 1.060000
50% 1.150000
75% 1.400000
max 4.140000
Since the data is not normally distributed, np.random.normal is not an option. Any ideas?
Thanks.
Performing KDE:
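import seaborn as sns
from sklearn.neighbors import KernelDensity

# Gaussian KDE fitted to the 'y' column of the DataFrame `data`
kde = KernelDensity(kernel='gaussian', bandwidth=0.525566).fit(data['y'].to_numpy().reshape(-1, 1))
# Draw 2400 samples and plot them (histplot replaces the deprecated distplot)
sns.histplot(kde.sample(2400).ravel(), kde=True)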
Recommended answer
In general, real-world data doesn't exactly follow a "nice" distribution like the normal or Weibull distributions.
As in machine learning, sampling from a distribution of data points generally takes two steps:
- Fit a data model to the data.
- Then, predict a new data point based on that model, with the help of randomness.
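The two steps above can be sketched with SciPy's `gaussian_kde`. This is only an illustration: the `observed` array is a synthetic, right-skewed stand-in (an assumption, not the asker's data), and the default bandwidth rule is used rather than a tuned value.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Synthetic, right-skewed stand-in for the real dataset (assumption for illustration)
observed = 0.88 + rng.lognormal(mean=-1.2, sigma=0.7, size=209)

# Step 1: fit a model of the distribution (Gaussian KDE, default bandwidth rule)
model = gaussian_kde(observed)

# Step 2: draw new points from the fitted model, using the seeded generator
new_points = model.resample(size=1000, seed=rng).ravel()

print(new_points[:5])
```

With real data, `observed` would be replaced by the actual column, for example `data['y'].to_numpy()`.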
There are several ways to estimate the distribution of data and sample from that estimate:
- Kernel density estimation.
- Gaussian mixture models.
- Histograms.
- Regression models.
- Other machine learning models.
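As one example from this list, the histogram approach can be sketched in plain NumPy: choose a bin in proportion to its count, then pick a uniform point inside that bin. The bin count of 20 and the synthetic `observed` array are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic, right-skewed stand-in for the real dataset (assumption for illustration)
observed = 0.88 + rng.lognormal(mean=-1.2, sigma=0.7, size=209)

# Estimate the distribution: counts per bin and the bin edges
counts, edges = np.histogram(observed, bins=20)
probs = counts / counts.sum()

# Sample: pick a bin with probability proportional to its count,
# then draw a uniform value between that bin's left and right edges
n_new = 1000
bin_idx = rng.choice(len(counts), size=n_new, p=probs)
samples = rng.uniform(edges[bin_idx], edges[bin_idx + 1])

print(samples.min(), samples.max())
```

Samples produced this way never fall outside the observed range, which may or may not be desirable. A kernel density estimate, by contrast, can produce values slightly beyond the minimum and maximum.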
In addition, methods such as maximum likelihood estimation can fit a known distribution (such as the normal distribution) to data, but the resulting estimate is generally rougher than one from kernel density estimation or other machine learning models.
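A minimal sketch of the maximum likelihood route, using `scipy.stats`. The choice of a log-normal family and the fixed location `floc=0` are assumptions made for illustration; whether any named family suits a given dataset has to be checked, for instance with a goodness-of-fit statistic. The `observed` array is again a synthetic stand-in.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Synthetic, right-skewed stand-in for the real dataset (assumption for illustration)
observed = 0.88 + rng.lognormal(mean=-1.2, sigma=0.7, size=209)

# Maximum likelihood fit of a log-normal with the location fixed at 0;
# fit() returns the parameters as (shape, loc, scale)
shape, loc, scale = stats.lognorm.fit(observed, floc=0)

# Freeze the fitted distribution and draw new values from it
fitted = stats.lognorm(shape, loc=loc, scale=scale)
samples = fitted.rvs(size=1000, random_state=rng)

# Kolmogorov-Smirnov statistic as a rough indicator of how well the family fits
ks = stats.kstest(observed, fitted.cdf)
print(shape, scale, ks.statistic)
```

A large KS statistic suggests the chosen family describes the data poorly, in which case a nonparametric estimate such as KDE is likely the better choice.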
See also my section "Random Numbers from a Distribution of Data Points".