Redshift Querying: error xx000 disk full redshift


Problem Description


I executed the query below:

select employee_name, max(employee_dept) as dept
from employeeDB 
where employee_name is not null and employee_name != ''
group by employee_name
order by employee_name asc
limit 1000

and received the error ERROR: XX000: Disk Full.

Upon investigating by executing the query below, I found that I have 941 GB of free space and 5000 GB of used space.

select
  sum(capacity)/1024 as capacity_gbytes, 
  sum(used)/1024 as used_gbytes, 
  (sum(capacity) - sum(used))/1024 as free_gbytes 
from 
  stv_partitions where part_begin=0;

Can anyone suggest how to mitigate this challenge so that I can get the desired results?

Solution

+-------+              +-------+
|-------|              |-------|
||10 kb||              ||25 kb||
+-------+              +-------+
|xxxxxxx|              |xxxxxxx|
|xxxxxxx|              |xxxxxxx|
|xxxxxxx+------------->+xxxxxxx|
+-------+              |xxxxxxx|
||10 kb||              |xxxxxxx|
+-------+              |xxxxxxx|
|xxxxxxx|              |xxxxxxx|
|xxxxxxx|              |xxxxxxx|
+-------+              |xxxxxxx|
||05 kb||              |xxxxxxx|
+-------+              +-------+

Look at the representation above. Let us assume that xxxxxxx represents the occupied space on the disk while the numbers represent the empty space available.

Both scenarios have 25 kb of vacant space. But in case 1, if you have to insert (or perform an operation) that requires a contiguous allocation of, say, 15 kb, you won't be able to. Although 25 kb of space is available, it isn't contiguous, so you might get a Memory / Disk Full error; the space either goes to waste or gets assigned to tasks with very low memory requirements.

In case 2, one contiguous block of memory is available, so a task requiring ~25 kb of memory can easily be executed.

This isn't specific to Redshift or any DBMS; it holds true for anything that remotely involves memory management, including operating systems.

What causes memory to become partitioned like this (a condition called fragmentation)?

Fragmentation is caused by continuously creating and deleting (i.e., modifying) files on disk. When a file occupying space is removed, it leaves a hole behind. Only a file smaller than the hole can reuse that space; otherwise the space goes to waste.
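To see where this reclaimable space lives in your cluster, you can inspect per-table statistics. A minimal sketch, assuming the standard Redshift SVV_TABLE_INFO system view (size is reported in 1 MB blocks, and tbl_rows includes rows marked for deletion but not yet vacuumed):

-- largest tables first; a high "unsorted" percentage or a row count
-- inflated by deleted-but-unreclaimed rows marks a vacuum candidate
select "table",
  size as size_mb,
  pct_used,
  unsorted as pct_unsorted,
  tbl_rows
from svv_table_info
order by size desc
limit 20;

Tables that are large and heavily unsorted are prime candidates for the fix below.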

What should be done?

Defragment! In your specific case, Amazon Redshift provides the VACUUM command for tables and/or schemas. You might have enough disk space overall, but not enough contiguous space for the engine to allocate to the task you executed.
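For example, a minimal sketch using the employeeDB table from the question (VACUUM FULL reclaims the space of deleted rows and re-sorts the remaining rows; DELETE ONLY is a lighter variant that reclaims space without re-sorting):

vacuum full employeeDB;   -- reclaim deleted-row space and re-sort
analyze employeeDB;       -- refresh planner statistics afterwards

Running ANALYZE once the vacuum finishes refreshes the table statistics, so the query planner can take advantage of the newly compacted storage.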

