Generate table with random values from other tables in PostgreSQL

Question

The following code generates 100,000 rows, using uuid_generate_v4() to produce random values for the Id column. However, the nested selects always choose the same row, so all the inserted rows end up with the same values for those columns. The goal is to create a table with 100k rows whose values are taken at random from the other sample tables. Each sample table has only two columns (Id and the column the values are taken from). How can this be achieved?

insert into "Tag" (
    "Id", "Time", "Account", "Name", "Value", "RollUpTableId"
)
select
    uuid_generate_v4(),
    current_timestamp,
    (select "Account" from "AccountSamples" OFFSET floor(random()*358) LIMIT 1),
    (select "Name" from "TagNameSamples" OFFSET floor(random()*19) LIMIT 1),
    (select "Value" from "TagValueSamples" OFFSET floor(random()*26) LIMIT 1),
    uuid_generate_v4()
from generate_series(1, 100000);

I've also tried:

select "Account" from "AccountSamples" where "Id" = (trunc(random() * 358)::integer)

Recommended answer

The subqueries do not reference the outer row, so Postgres treats them as uncorrelated: it most likely plans each one as an InitPlan, evaluates it once, and reuses the result for every row instead of re-executing it per row.
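If that is the cause, a minimal fix that keeps the question's query shape is to make each subquery reference the outer row, so Postgres must treat it as correlated and run it again, with a fresh random(), for every generated row. The sketch below assumes the question's tables and its hardcoded row counts (358, 19, 26); the alias g(i) and the always-true filter g.i is not null are illustrative additions, not part of the original answer.

```sql
insert into "Tag" (
    "Id", "Time", "Account", "Name", "Value", "RollUpTableId"
)
select
    uuid_generate_v4(),
    current_timestamp,
    -- "g.i is not null" is always true; it only ties each subquery to the
    -- current outer row, so the subquery is re-executed per row, not once
    (select "Account" from "AccountSamples" where g.i is not null
     offset floor(random() * 358)::int limit 1),
    (select "Name" from "TagNameSamples" where g.i is not null
     offset floor(random() * 19)::int limit 1),
    (select "Value" from "TagValueSamples" where g.i is not null
     offset floor(random() * 26)::int limit 1),
    uuid_generate_v4()
from generate_series(1, 100000) as g(i);
```

OFFSET without ORDER BY does not guarantee any particular row order, which is harmless here because the offset itself is random.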

I would recommend numbering the rows of each sample table in random order in a subquery, then joining the subqueries on that row number:

select uuid_generate_v4(), a."Account", tns."Name", tvs."Value"
from (
    select "Account", row_number() over(order by random()) rn from "AccountSamples"
) a
inner join (
    select "Name",    row_number() over(order by random()) rn from "TagNameSamples"
) tns on tns.rn = a.rn
inner join (
    select "Value",   row_number() over(order by random()) rn from "TagValueSamples"
) tvs on tvs.rn = a.rn
where a.rn <= 10

This is not exactly the same logic as in your original query: a given sample row can be selected at most once (sampling without replacement), and the inner joins return at most as many rows as the smallest sample table. Still, I think it's a reasonable approximation.

If some of your tables may have fewer than 10 rows, then generate_series() and left joins are safer: the query always returns 10 rows, with NULL in the columns whose sample table ran out of rows.

select uuid_generate_v4(), a."Account", tns."Name", tvs."Value"
from generate_series(1, 10) x(rn)
left join (
    select "Account", row_number() over(order by random()) rn from "AccountSamples"
) a on a.rn = x.rn
left join (
    select "Name",    row_number() over(order by random()) rn from "TagNameSamples"
) tns on tns.rn = x.rn
left join (
    select "Value",   row_number() over(order by random()) rn from "TagValueSamples"
) tvs on tvs.rn = x.rn
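Both queries in the answer draw each sample row at most once and return only 10 rows. If you need the 100,000 rows from the question, with every column drawn independently per row (sampling with replacement), one alternative is to load each sample table into an array once and pick a random element per row. This is a swapped-in technique, not part of the original answer; it is a sketch that assumes PostgreSQL 9.4 or later (for cardinality()) and the uuid-ossp extension for uuid_generate_v4(), as in the question. The CTE names a, n and v are illustrative.

```sql
-- Build one array per sample table (each CTE yields exactly one row),
-- then index the arrays with a per-row random subscript.
with a as (select array_agg("Account") as arr from "AccountSamples"),
     n as (select array_agg("Name")    as arr from "TagNameSamples"),
     v as (select array_agg("Value")   as arr from "TagValueSamples")
insert into "Tag" (
    "Id", "Time", "Account", "Name", "Value", "RollUpTableId"
)
select
    uuid_generate_v4(),
    current_timestamp,
    -- random() is in [0, 1), so the subscript is in [1, cardinality(arr)]
    a.arr[1 + floor(random() * cardinality(a.arr))::int],
    n.arr[1 + floor(random() * cardinality(n.arr))::int],
    v.arr[1 + floor(random() * cardinality(v.arr))::int],
    uuid_generate_v4()
from generate_series(1, 100000), a, n, v;
```

Because random() is a volatile function in the select list, it is evaluated for every output row, so each row gets its own subscripts while the arrays are built only once. The sample-table sizes also no longer need to be hardcoded.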
