如何在Redshift中进行分层随机抽样? [英] How to do a stratified random sample in Redshift?

查看：101 发布时间：2021/4/3 19:17:20 sql amazon-redshift

本文介绍了如何在Redshift中进行分层随机抽样?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我需要对从不同类别购买的客户进行随机抽样.有8个类别，订单分布在其中.

I need to take a random sample of customers who have purchased from different categories. There are 8 categories, and orders are spread among them.

如果我想随机抽取已购买商品的客户，但保持每个类别的订单比例不变，我该如何在我的sql代码中进行设置?

If I wanted to take a random sample of customers who have made a purchase, but keep the proportion of orders per category the same, how would I set that up in my sql code?

以下是突出显示此内容的表格(不包含客户数据-我希望我的客户列表基于订单的代表比例):

A table highlighting this is below (it doesn't include customer data - I want my list of customers to be based of the representative proportion of orders):

表可以在这里找到: https://imgur.com/a/Q0lMHWf

推荐答案

基本上，您将使用 row_number()， order by ，然后选择第n个值.因此，对于约1％的分层样本，请执行以下操作:

Basically, you would use row_number(), order by, and choose the nth values. So, for about a 1% stratified sample, do:

select t.*
from (select t.*,
             row_number() over (order by category, rand()) as seqnum
      from t
     ) t
where mod(seqnum, 101) = 1
order by category;

基本思想是，您可以通过按类别对结果集进行排序并对结果进行第n个样本来获得分层样本.

The basic idea is that you can get a stratified sample by ordering the result set by the categories and doing an nth sample of the result.

这篇关于如何在Redshift中进行分层随机抽样?的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

如何在Redshift中进行分层随机抽样? [英] How to do a stratified random sample in Redshift?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

如何在Redshift中进行分层随机抽样? [英] How to do a stratified random sample in Redshift?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭