如何基于雪花中的任何文本文件自动创建表? [英] how to create table automatically based on any text file in snowflake?

查看:60
本文介绍了如何基于雪花中的任何文本文件自动创建表?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

是否有任何工具或方法可以基于任何文本文件自动创建表?

Is there any tool or any ways that creates tables automatically based on any text files?

我有100多个csv文件,每个文件都有不同的列数.如果先在雪花中手动创建表定义然后加载数据,那将是很多工作.我正在寻找一种无需创建表即可加载数据的特定方法.

I have 100+ csv files and every files has different numbers of columns. It would be so much work if create table definition first in snowflake manually and then load the data. I am looking for a specific way to loading data without creating a table.

请让我知道是否有人知道如何解决此问题.谢谢!

Please let me know if anyone know how to tackle this. Thanks!

推荐答案

数据处理框架,例如

Data processing frameworks such as Spark and Pandas have readers that can parse CSV header lines and form schemas with inferred data types (not just strings). You can leverage this to create new tables.

以下示例作为示例:

  • 使用熊猫'的Snowflake Connector for Python的SQL写功能(通过 SQL炼金术)
  • 假设每个文件需要一个新表
  • 假设输入文件路径的文件名部分是表名
  • 假设CSV是标准格式的,并且具有列名标题
  • 使用相同的数据库和架构名称创建所有表
  • Uses Pandas's SQL write capability with the Snowflake Connector for Python (via SQL Alchemy)
  • Assumes you need one new table per file
  • Assumes the filename portion of the input file path is the table name
  • Assumes the CSVs are of standard formatting, and have column name headers
  • Creates all tables under the same database and schema name
import sqlalchemy as sql
import pandas as pd
import os

# Setup an SQL Alchemy Engine object
# This will provide a connection pool for Pandas to use later
engine = sql.create_engine(
    'snowflake://{u}:{p}@{a}/{d}/{s}?warehouse={w}&role={r}'.format(
        u='USERNAME',
        p='PASSWORD',
        a='account.region',
        r='ROLE_NAME',
        d='DATABASE',
        s='SCHEMA',
        w='WAREHOUSE_NAME',
    )
)

# List of (n) input CSV file paths
csv_input_filepaths = [
    '/tmp/test1.csv',
    '/tmp/test2.csv',
    '/tmp/test3.csv',
]

try:
    # Process each path
    for path in csv_input_filepaths:

        # Use filename component of path as tablename
        # '/tmp/test1.csv' creates table named 'test1', etc.
        filename, _ext = os.path.splitext(os.path.basename(path))

        # Default CSV reading options in Pandas sniff and infer headers
        # It will auto-populate schema and types based on data
        data = pd.read_csv(path)

        # Stores into Snowflake (will create the table name if it does not exist)
        # Default args will attempt to create an index, so we disable that
        data.to_sql(filename, engine, index = False)

finally:
    # Tear down all connections gracefully pre-exit
    engine.dispose()

这篇关于如何基于雪花中的任何文本文件自动创建表?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆