How to drop a column from a Databricks Delta table?

Problem description

I have recently started exploring Databricks and ran into a situation where I need to drop a certain column of a Delta table. When I worked with PostgreSQL, it was as easy as:

ALTER TABLE main.metrics_table 
DROP COLUMN metric_1;

I was looking through the Databricks documentation on DELETE, but it only covers deleting rows that match a predicate.

I've also found docs on DROP DATABASE, DROP FUNCTION and DROP TABLE, but absolutely nothing on how to delete a column from a Delta table. What am I missing here? Is there a standard way to drop a column from a Delta table?

Recommended answer

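At the time this answer was written, there was no drop column option on Databricks tables: https://docs.databricks.com/spark/latest/spark-sql/language-manual/alter-table-or-view.html#delta-schema-constructs (Delta Lake 2.0 and later do support ALTER TABLE <table> DROP COLUMN <column>, but only after column mapping has been enabled on the table.)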

Remember that, unlike in a relational database, your data lives in physical Parquet files in your storage; your "table" is just a schema that has been applied to them.

In the relational world you can easily remove a column by updating the table metadata; in the big data world you have to rewrite the underlying files.

Technically, Parquet can handle schema evolution (see Schema evolution in parquet format), but the Databricks implementation of Delta does not support dropping columns this way. It is probably just too complicated to be worth it.

Therefore the solution in this case is to create a new table and insert the columns you want to keep from the old table.
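Following that approach, the new table can be created with a CREATE TABLE ... AS SELECT (CTAS) statement that lists only the columns you keep. Typing that list by hand is tedious for a wide table, so below is a minimal sketch of a hypothetical helper, ctas_without_columns (it is not part of any Databricks or Spark API), that builds the statement from the table's column list. It assumes Spark SQL's backtick identifier quoting, exact (case-sensitive) column-name matching, and a target name of your choosing; main.metrics_table_new is only an example.

```python
from typing import Iterable


def quote_ident(name: str) -> str:
    """Quote a column name with backticks for Spark SQL, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def ctas_without_columns(source: str, target: str,
                         all_columns: Iterable[str],
                         to_drop: Iterable[str]) -> str:
    """Build a CTAS statement copying every column of `source` except those in
    `to_drop` into a new Delta table `target`.

    Hypothetical helper for illustration only. Column names are matched exactly
    (case-sensitively), even though Spark itself is case-insensitive by default.
    """
    drop = set(to_drop)
    columns = list(all_columns)

    # Fail loudly on typos instead of silently keeping the column.
    missing = drop - set(columns)
    if missing:
        raise ValueError(f"columns not found in {source}: {sorted(missing)}")

    keep = [c for c in columns if c not in drop]
    if not keep:
        raise ValueError("cannot drop every column of the table")

    select_list = ", ".join(quote_ident(c) for c in keep)
    return (f"CREATE TABLE {target} USING DELTA AS "
            f"SELECT {select_list} FROM {source}")


if __name__ == "__main__":
    # Example column list; in Databricks you would use spark.table(...).columns instead.
    print(ctas_without_columns("main.metrics_table", "main.metrics_table_new",
                               ["id", "metric_1", "metric_2"], ["metric_1"]))
    # CREATE TABLE main.metrics_table_new USING DELTA AS SELECT `id`, `metric_2` FROM main.metrics_table
```

In a Databricks notebook you would pass the live column list, for example spark.sql(ctas_without_columns("main.metrics_table", "main.metrics_table_new", spark.table("main.metrics_table").columns, ["metric_1"])). After checking the new table you can drop the old one (for a managed table this also deletes its data) and, where your catalog allows it, rename the new table with ALTER TABLE ... RENAME TO. If you prefer the DataFrame API, spark.table("main.metrics_table").drop("metric_1") yields the same set of columns to write out.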
