Extract specific fields from text file
Question
I have a CSV file with over 5k fields/columns, with header names. I would like to import only some specific fields into my database.
I am using LOAD DATA LOCAL INFILE for other, smaller files that need to be imported:
LOAD DATA
LOCAL INFILE 'C:/wamp/www/imports/new_export.csv'
INTO TABLE table1
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
(colour,shape,size);
Assigning dummy variables to the columns I want to skip would be cumbersome. I would also prefer to reference the fields by their header names, to future-proof the import in case the file gains additional fields.
I am considering running awk on the file before loading it into the database, but the examples I have found by searching don't seem to work.
Any suggestions on the best approach for this would be appreciated.
Recommended answer
This is similar to MvG's answer, but it doesn't require gawk 4 and thus uses -F as suggested in that answer (plain comma splitting, so quoted fields that themselves contain commas are not handled). It also shows a technique for listing the desired fields and iterating over the list, which may make the code easier to maintain if the list is large.
#!/usr/bin/awk -f
BEGIN {
    col_list = "colour shape size"   # continuing with as many as desired for output
    num_cols = split(col_list, cols)
    FS = OFS = ","
}
NR==1 {
    for (i = 1; i <= NF; i++) {
        p[$i] = i                    # remember column for name
    }
    # next                           # enable this line to suppress headers.
}
{
    delim = ""
    for (i = 1; i <= num_cols; i++) {
        printf "%s%s", delim, $p[cols[i]]
        delim = OFS
    }
    printf "\n"
}
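As a rough illustration of how this might be run, the sketch below wraps a variant of the script in a shell function and applies it to a tiny sample file. Everything outside the original script is an assumption made for the example: the file names sample.csv and subset.csv, the function name extract_cols, and the extra columns id and weight are hypothetical. The variant also adds two guards that the original does not have: it strips a trailing carriage return (the question's file uses '\r\n' line endings, which would otherwise leave the last header name as "size\r" and prevent a match), and it stops with an error if a requested header is missing instead of silently printing the whole line.

```shell
#!/bin/sh
# Hypothetical sample data: the columns "id" and "weight" are invented for
# illustration. Lines end in CRLF, as in the question's LOAD DATA statement.
printf 'id,colour,weight,shape,size\r\n1,red,10,round,large\r\n2,blue,20,square,small\r\n' > sample.csv

# extract_cols "name1 name2 ..." file
# Prints only the named columns, in the order requested.
extract_cols() {
    awk -v col_list="$1" '
    BEGIN {
        num_cols = split(col_list, cols, " ")
        FS = OFS = ","
    }
    { sub(/\r$/, "") }                      # drop the CR of a CRLF line ending
    NR == 1 {
        for (i = 1; i <= NF; i++)
            p[$i] = i                       # remember column number for each name
        for (i = 1; i <= num_cols; i++)
            if (!(cols[i] in p)) {          # guard against misspelled or absent headers
                print "missing column: " cols[i] > "/dev/stderr"
                exit 1
            }
    }
    {
        delim = ""
        for (i = 1; i <= num_cols; i++) {
            printf "%s%s", delim, $p[cols[i]]
            delim = OFS
        }
        printf "\n"
    }' "$2"
}

extract_cols "colour shape size" sample.csv > subset.csv
cat subset.csv
```

On a Unix-like system the resulting file has '\n' line endings, so a LOAD DATA statement reading it would presumably need LINES TERMINATED BY '\n' rather than '\r\n', plus IGNORE 1 LINES if the header row is kept.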