Storing Variable Row/Column CSV Files In Database


Question

I have a large collection of CSV data, created by an application, which I would like to store in a database, preferably SQL Server. This data can have any number of columns and any number of rows, and storing each file as a separate table doesn't make much sense. It would also be great to be able to search this data. What is the best way of putting this data into a database?

For example (and I am simplifying things greatly here), consider just 3 CSV files that might look like this:

File 1:
aaa,bbb,ccc
ddd,eee,fff
ggg,hhh,iii

File 2:
jjj,kkk
lll,mmm

File 3:
nnn,ooo,ppp,qqq,rrr
sss,ttt,uuu,vvv,www
xxx,yyy,zzz,111,222
333,444,555,666,777

I might be oversimplifying this, but I can't post actual data due to a strict NDA.

How would it be best to store this in a database? There will be thousands of files, each of which could in theory have a different number of columns and a different number of rows.

Could a Data Mart be used to achieve this, and if so, how? Any pointers?

Recommended answer

For each file, create a record in a 'csv file' table.

For each column name, create a record in a 'csv file header name' table with the corresponding column index.

For each CSV row, create a key/value hashmap where the key is the column index and the value is that column's data for the row. Serialise this hashmap to an XML string and store it in an XML column of a 'csv file data' table.
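The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `row_to_xml` and `split_csv` are made up for the example, the first line of each file is assumed to hold the column names, column and row indexes are assumed to start at 0, and the element names simply mirror the XML shown in the sample tables.

```python
import csv
import io
import xml.etree.ElementTree as ET


def row_to_xml(row):
    """Serialise one CSV row as key/value pairs keyed by column index."""
    root = ET.Element("ArrayOfSerialisableKeyValuePair")
    for index, value in enumerate(row):
        pair = ET.SubElement(root, "SerialisableKeyValuePair")
        ET.SubElement(pair, "Key").text = str(index)
        ET.SubElement(pair, "Value").text = value
    return ET.tostring(root, encoding="unicode")


def split_csv(text):
    """Return (header_records, row_records) for the text of one CSV file.

    Assumption: the first line of the file holds the column names.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # (ColumnIndex, ColumnName) pairs for the header table
    header_records = list(enumerate(header))
    # (RowIndex, RowDataAsXML) pairs for the row data table
    row_records = [(i, row_to_xml(row)) for i, row in enumerate(reader)]
    return header_records, row_records


headers, rows = split_csv("foo quant,foo size\n17,8cm\n8,35cm\n")
print(headers)
print(rows[0][1])
```

Because every row becomes a single XML value, files with different numbers of columns fit the same three tables without any schema change.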

You can then use XPath to select the XML row data, joining on the 'column index' columns to retrieve the original file column headers.

Tables

CSVFile
PK  FilePath
...
7   [\\server1\somedir\foo.csv]
9   [\\server1\dir\bar.csv]
...

CSVFileColumnHeader
PK  FileId  ColumnIndex ColumnName
...
980 7       5           [foo quant]
981 7       6           [foo size]
982 9       3           [bar depth]
...

CSVFileRowData
PK      FileId  RowIndex    RowDataAsXML
...
1054    7       35          <ArrayOfSerialisableKeyValuePair>...<SerialisableKeyValuePair><Key>5</Key><Value>17</Value></SerialisableKeyValuePair><SerialisableKeyValuePair><Key>6</Key><Value>8cm</Value></SerialisableKeyValuePair>...</ArrayOfSerialisableKeyValuePair>
1055    7       36          <ArrayOfSerialisableKeyValuePair>...<SerialisableKeyValuePair><Key>5</Key><Value>8</Value></SerialisableKeyValuePair><SerialisableKeyValuePair><Key>6</Key><Value>35cm</Value></SerialisableKeyValuePair>...</ArrayOfSerialisableKeyValuePair>
1056    9       4           <ArrayOfSerialisableKeyValuePair>...<SerialisableKeyValuePair><Key>3</Key><Value>4 metres</Value></SerialisableKeyValuePair>...</ArrayOfSerialisableKeyValuePair>
...
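A possible schema for these three tables is sketched below. It uses SQLite from the Python standard library purely as a stand-in so the example can run anywhere; in SQL Server the `RowDataAsXML` column would be of type XML rather than TEXT. The uniqueness constraints and foreign keys are assumptions, not something the answer specifies.

```python
import sqlite3

# SQLite stands in for SQL Server here purely so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CSVFile (
    PK       INTEGER PRIMARY KEY,
    FilePath TEXT NOT NULL UNIQUE
);
CREATE TABLE CSVFileColumnHeader (
    PK          INTEGER PRIMARY KEY,
    FileId      INTEGER NOT NULL REFERENCES CSVFile (PK),
    ColumnIndex INTEGER NOT NULL,
    ColumnName  TEXT NOT NULL,
    UNIQUE (FileId, ColumnIndex)   -- assumed: one name per column per file
);
CREATE TABLE CSVFileRowData (
    PK           INTEGER PRIMARY KEY,
    FileId       INTEGER NOT NULL REFERENCES CSVFile (PK),
    RowIndex     INTEGER NOT NULL,
    RowDataAsXML TEXT NOT NULL,    -- an XML column in SQL Server
    UNIQUE (FileId, RowIndex)      -- assumed: one record per row per file
);
""")

# Register one file, then its column headers, using values from the sample data.
file_id = conn.execute(
    "INSERT INTO CSVFile (FilePath) VALUES (?)",
    (r"\\server1\somedir\foo.csv",),
).lastrowid

conn.executemany(
    "INSERT INTO CSVFileColumnHeader (FileId, ColumnIndex, ColumnName) "
    "VALUES (?, ?, ?)",
    [(file_id, 5, "foo quant"), (file_id, 6, "foo size")],
)

header_count = conn.execute(
    "SELECT COUNT(*) FROM CSVFileColumnHeader WHERE FileId = ?", (file_id,)
).fetchone()[0]
print(file_id, header_count)
```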

You could then run an XPath query like this:

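SELECT
    CFR.FileId                                      AS FileId
    ,tab.col.value('./Key[1]', 'INT')               AS ColumnIndex
    ,CFR.RowIndex                                   AS RowIndex
    ,tab.col.value('./Value[1]', 'VARCHAR(250)')    AS RowValue
    ,CFC.ColumnName                                 AS ColumnName
FROM
            CSVFileRowData  CFR
-- Shred each row's XML into one record per key/value pair
CROSS APPLY CFR.RowDataAsXML.nodes('//SerialisableKeyValuePair') tab(col)
-- Match on FileId as well, so a column index is never paired with another file's header
INNER JOIN  CSVFileColumnHeader CFC ON  CFC.FileId      = CFR.FileId
                                    AND CFC.ColumnIndex = tab.col.value('./Key[1]', 'INT')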

This would return data in the following format:

FileId  ColumnIndex RowIndex    RowValue        ColumnName
...
7       5           35          [17]            [foo quant]
7       6           35          [8cm]           [foo size]
...
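To make the shape of that result easier to follow, here is a rough in-memory analogue in Python of shredding each row's XML and joining it to the header table. It is a sketch, not the SQL Server execution model: the function name `shred` is made up, the data is copied from the sample tables with the elided pairs left out, and headers are matched on both FileId and ColumnIndex so that a column index from one file is never paired with another file's header.

```python
import xml.etree.ElementTree as ET

# (FileId, ColumnIndex) -> ColumnName, copied from the CSVFileColumnHeader sample
headers = {(7, 5): "foo quant", (7, 6): "foo size", (9, 3): "bar depth"}

# (FileId, RowIndex, RowDataAsXML); pairs elided as "..." in the sample are omitted
rows = [
    (7, 35,
     "<ArrayOfSerialisableKeyValuePair>"
     "<SerialisableKeyValuePair><Key>5</Key><Value>17</Value></SerialisableKeyValuePair>"
     "<SerialisableKeyValuePair><Key>6</Key><Value>8cm</Value></SerialisableKeyValuePair>"
     "</ArrayOfSerialisableKeyValuePair>"),
    (9, 4,
     "<ArrayOfSerialisableKeyValuePair>"
     "<SerialisableKeyValuePair><Key>3</Key><Value>4 metres</Value></SerialisableKeyValuePair>"
     "</ArrayOfSerialisableKeyValuePair>"),
]


def shred(rows, headers):
    """Yield one record per key/value pair, joined to its column name."""
    result = []
    for file_id, row_index, xml_text in rows:
        for pair in ET.fromstring(xml_text).iter("SerialisableKeyValuePair"):
            column_index = int(pair.findtext("Key"))
            name = headers.get((file_id, column_index))
            if name is None:
                continue  # like an INNER JOIN, drop pairs with no matching header
            result.append(
                (file_id, column_index, row_index, pair.findtext("Value"), name)
            )
    return result


for record in shred(rows, headers):
    print(record)
```

Each tuple follows the column order FileId, ColumnIndex, RowIndex, RowValue, ColumnName.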
