Extract specific fields from text file

Question

I have a CSV file with header names and over 5k fields/columns. I would like to import only some specific fields into my database.

I am using LOAD DATA LOCAL INFILE for other, smaller files that need to be imported:

LOAD DATA
LOCAL INFILE 'C:/wamp/www/imports/new_export.csv'
INTO TABLE table1
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
(colour,shape,size);

Assigning dummy variables to the columns I want to skip would be cumbersome. I would also prefer to reference fields by their header names, to future-proof the import in case the file gains additional fields.

I am considering running awk on the file before loading it into the database, but the examples I have found by searching don't seem to work.

Any suggestions on the best approach for this would be appreciated.

Recommended answer

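This is similar to MvG's answer, but it doesn't require gawk 4 and thus uses a plain comma field separator, as with the -F suggested in that answer (set here through FS in the BEGIN block). It also shows a technique for listing the desired fields and iterating over the list, which may make the code easier to maintain if the list is large. Note that splitting on a bare comma does not handle quoted fields that themselves contain commas.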

#!/usr/bin/awk -f
BEGIN {
    col_list = "colour shape size" # continuing with as many as desired for output
    num_cols = split(col_list, cols)
    FS = OFS = ","
}

NR==1 {
    for (i = 1; i <= NF; i++) {
        p[$i] = i # remember column for name
    }
    # next # enable this line to suppress headers.
}

{
    delim = ""
    for (i = 1; i <= num_cols; i++) {
        printf "%s%s", delim, $p[cols[i]]
        delim = OFS
    }
    printf "\n"
}
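
As a usage illustration, the same technique can be wrapped in a small shell function and run against a sample file. This is only a sketch under stated assumptions: the file names sample.csv and reduced.csv, the sample rows, and the function name extract_cols are hypothetical and not from the original post. It also adds two things the script above does not do: it strips a trailing carriage return (the question's file uses '\r\n' line endings, which would otherwise leave the last header name ending in a CR), and it stops with an error if a requested header is missing (otherwise $p[...] evaluates to $0 and the whole line is printed).

```shell
#!/bin/sh
# Hypothetical sample data with CRLF line endings; names and values are illustrative only.
printf 'id,colour,weight,shape,size\r\n1,red,10,round,small\r\n2,blue,20,square,large\r\n' > sample.csv

# extract_cols "<space-separated header names>" <file>
# Prints the named columns, in the order given, without the header row.
extract_cols() {
    awk -v col_list="$1" '
    BEGIN {
        num_cols = split(col_list, cols, " ")
        FS = OFS = ","
    }
    { sub(/\r$/, "") }    # strip a trailing CR so CRLF files match too
    NR == 1 {
        for (i = 1; i <= NF; i++) p[$i] = i    # remember column for name
        for (i = 1; i <= num_cols; i++) {
            if (!(cols[i] in p)) {             # fail early on an unknown header
                print "missing column: " cols[i] > "/dev/stderr"
                exit 1
            }
        }
        next                                   # do not print the header row
    }
    {
        delim = ""
        for (i = 1; i <= num_cols; i++) {
            printf "%s%s", delim, $p[cols[i]]
            delim = OFS
        }
        printf "\n"
    }' "$2"
}

extract_cols "colour shape size" sample.csv > reduced.csv
```

The reduced file contains only the three requested columns in the order listed, so it could then be loaded with a LOAD DATA LOCAL INFILE statement like the one in the question. Because this sketch writes LF line endings, the LINES TERMINATED BY clause would presumably need to be '\n' rather than '\r\n'.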
