Excel比较两个csv文件并显示区别 [英] Excel comparing two csv files and showing the difference
问题描述
我想比较两个大的csv文件和/或csv文件和一个.txt文件。我认为的.txt文件可能需要转换为csv文件只是为了简单,但可能需要或可能不需要。我想要使用excel,c ++或python。我需要比较一个接受值列表到一个被测量的列表,并找到它们之间的差异,如果有一个。 Excel可能是最简单的方法做到这一点,但python或c ++可能工作,以及。这不是家庭作业,所以不要担心这种事情。非常感谢代码建议和/或模板。或链接到网站
编辑1
我阅读过有关Python的difflib
编辑2
$ b $ b
这两个文件都有一系列的列(不是在它们之间绘制的线或任何东西),在那些命名列下面会有数字。我需要比较第1列中的数字,第1列第1列第2列,如果有差异显示差异在另一个csv文件
您可以使用ADO(ODBC / JET / OLEDB文本驱动程序)将正常.txt / .csv / .tab / .flr文件视为SQL数据库中的表,启用语言。然后可以使用SQL的权力(DISTINCT,GROUP,(LEFT)JOINS,...)进行比较。
添加了您的评论: p>
这是你的问题,我不想推你不想去的地方。但SQL是一个很好的(最好的?)工具,如果你需要比较表格数据。作为证明一个脚本的输出在两个.txt文件中的差异的证据:
======= txt文件播放
------- file1.txt
AC;AM
40000; -19083,00
40100; 20000,00
40200; 350004,00
40300; 3498,99
------- file2.txt
AC;AM
40000; -19083,00
40300; 3498,99
40105; -234567,00
40200; 350,00
=======有些诊断SQL
-------< NULL>表示:在F1中但不在F2中(LEFT JOIN)
SELECT T1.AC,T1.AM,T2.AM FROM [file1.txt] AS T1 LEFT JOIN [file2.txt] AS T2 ON(T1.AC =
T2.AC)
-------结果
AC File1 File2
40000 -19083 -19083
40100 20000< NULL>
40200 350004 350
40300 3498,99 3498,99
-------< NULL>表示:不在其他文件中(LEFT JOIN,UNION)
SELECT T1.AC,T1.AM,T2.AM FROM [file1.txt] AS T1 LEFT JOIN [file2.txt] AS T2 ON(T1。 AS = $ t $ AC
T2.AC)UNION SELECT T2.AC,T1.AM,T2.AM FROM [file2.txt] AS T2 LEFT JOIN [file1.txt] AS T1
ON(T1.AC = T2.AC)
-------结果
AC File1 File2
40000 -19083 -19083
40100 20000< NULL>
40105< NULL> -234567
40200 350004 350
40300 3498,99 3498,99
-------问题:缺少,不同的值
SELECT T1.AC ,T1.AM,T2.AM FROM [file1.txt] AS T1 LEFT JOIN [file2.txt] AS T2 ON(T1.AC =
T2.AC)WHERE T2.AM IS NULL OR T1.AM< ;> T2.AM UNION SELECT T2.AC,T1.AM,T2.AM FROM [file2。
txt] AS T2 LEFT JOIN [file1.txt] AS T1 ON(T1.AC = T2.AC)WHERE T1.AM IS NULL OR T1.AM<>
T2.AM
-------结果
AC File1 File2
40100 20000< NULL>
40105< NULL> -234567
40200 350004 350
其他补充:
这篇文章涉及ADO和文本文件;在您的计算机上查找文件adoNNN.chm
(NNN =版本号,例如210)这是关于
ADO的好书。
您可以使用Access或OpenOffice Base来实验应用于链接/引用(未导入!)文本数据库的SQL语句
。
在掌握了最初的障碍之后,脚本/程序将很容易:将
连接到数据库,即连接到包含文件的文件夹,并且使用schema.ini
文件来定义文件的结构=表。
上面的输出是由以下输出生成的:
Const adClipString = 2
Dim oFS:Set oFS = CreateObject(Scripting.FileSystemObject)
Dim sDir:sDir = oFS.GetAbsolutePathName(.\txt)
Dim oDB:设置oDb = CreateObject(ADODB.Connection)
oDB.OpenProvider = Microsoft.Jet.OLEDB.4.0; Data Source =& sDir& ; Extended Properties =text
Dim sSQL
Dim sFiNa
WScript.Echo=======,要播放的.txt文件
对于每个sFiNa在数组(file1.txt,file2.txt)
WScript.Echo-------,sFiNa
WScript.Echo oFS.OpenTextFile txt \& sFiNa).ReadAll()
Next
WScript.Echo=======,一些诊断SQL
Dim aSQL
对于每个aSQL In数组(_
Array(< NULL>表示:在F1中但不在F2(LEFT JOIN)_
, SELECT T1.AC,T1.AM,T2.AM FROM_
,[file1.txt] AS T1_
,LEFT JOIN [file2.txt] AS T2 ON AC = T2.AC)_
),))_
,Array(表示:不在其他文件(LEFT JOIN,UNION)_
,join(Array(_
SELECT T1.AC,T1.AM,T2.AM from_
,[file1.txt] AS T1_
,LEFT JOIN [ file2.txt] AS T2 ON(T1.AC = T2.AC)_
,UNION_
,SELECT T2.AC,T1.AM,T2.AM FROM_
,[file2.txt] AS T2_
,LEFT JOIN [file1.txt] AS T1 ON(T1.AC = T2.AC)_
),) _
,Array(问题:缺少,不同的值_
,Join(Array(_
SELECT T1.AC,T1.AM,T2.AM FROM_
,[file1.txt] AS T1_
,LEFT JOIN [file2.txt] AS T2 ON(T1.AC = T2.AC)_
,WHERE T2.AM IS NULL或T1.AM<> T2.AM_
,UNION_
,SELECT T2.AC,T1.AM,T2.AM FROM_
,[file2.txt] AS T2_
,LEFT JOIN [file1.txt] AS T1 ON(T1.AC = T2.AC)_
,WHERE T1.AM IS NULL OR T1.AM< T2.AM_
),))_
)
sSQL = aSQL(1)
WScript.Echo-------,aSQL 0)
WScript.Echo sSQL
Dim oRS:设置oRS = oDB.Execute(sSQL)
WScript.Echo-------结果
WScript.Echo Join(Array(AC,File1,File2),vbTab)
WScript.Echo oRS.GetString(adClipString,,vbTab,vbCrLf,< NULL>)
Next
oDB.Close
如果删除/忽略fat(创建SQL语句,诊断输出) ,它将
下降到6行
Dim oDB:Set oDb = CreateObject(ADODB.Connection)
oDB.OpenProvider = Microsoft.Jet.OLEDB.4.0; Data Source =& sDir&;扩展属性=文本
sSQL =...
Dim oRS:Set oRS = oDB.Execute(sSQL)
WScript.Echo oRS.GetString(adClipString,,vbTab,vbCrLf,< NULL>)
oDB.Close
可以轻松地移植到每个启用COM的语言,因为ADO
对象重型起重。当
要保存一个结果集时,.GetString方法很方便:只需旋转分隔符/ delimiter / Null参数
并将其转储到文件
oFS.CreateTextFile(...).WriteLine oRS.GetString(_
adClipString,,,,vbCrLf,
)
(不要忘记向您的schema.ini添加该表的定义)。在
课程中,您还可以使用SELECT / INSERT INTO,但这样的语句可能不会很容易获得/传递ADO文本驱动程序的解析器。
添加wrt计算:
开始一个5 x 2主/批准文件,包含:
Num0 Num1 Num2 Num3 Num4
7,6 6,1 3,8 0,9 8,9
0,9 9,4 4,7 8,8 9,9
将其转换为expected.txt
Num0 Num1 Num2 Num3 Num4现货
7,6 6,1 3,8 0,9 8,9 1
0,9 9,4 4, 7 8,8 9,9 2
通过追加Spot列使其符合
[expected.txt]
ColNameHeader = True
CharacterSet = 1252
Format = Delimited b $ b Col1 = Num0 Float
Col2 = Num1 Float
Col3 = Num2 Float
Col4 = Num3 Float
Col5 = Num4 Float
Col6 = Spot Integer
。类似地,转换一个测量文件,如:
Num0 Num1 Num2 Num3 Num4
7,1,1 3,8 0,9 8,9
0,9 9,4 4,7 8,8 9,9
to measured.txt
Num0 Num1 Num2 Num3 Num4 Spot
7,1 1,1 3, 8 0,9 8,9 1
0,9 9,4 4,7 8,8 9,9 2
应用
sSQL = Join(Array(_
SELECT E.Num0 - M .Num0 AS Dif0_
,E.Num1 - M.Num1 AS Dif1_
,E.Num2 - M.Num2 AS Dif2_
,,E .Num3 - M.Num3 AS Dif3_
,E.Num4 - M.Num4 AS Dif4_
,E.Spot AS现货_
expected.txt] AS E_
,INNER JOIN [measured.txt] AS M_
,ON E.Spot = M.Spot_
),
将结果集写入differences.txt
aFNames = Array(Num0,...Spot)
oFS.CreateTextFile(sFSpec).Write _
Join(aFNames,sFSep)& sRSep& oRS.GetString(adClipString,,sFSep,sRSep,)
即可获得:
Num0 Num1 Num2 Num3 Num4 Spot
0,5 5 0 0 0 1
0 0 0 0 0 2
I'm looking to compare two big sets of csv files and/or a csv file and a .txt file. I "think" the .txt file may need to be converted to a csv file just for simplicity sake but that may or may not be needed. I either want to use excel, c++, or python. I need to compare one "accepted" value list to a list that is measured and find the difference between them if there is one. Excel may be the easiest way to do this but python or c++ may work just as well. This is not homework so don't worry about that sort of thing. Code advice and/or templates is greatly appreciated. or links to websites
EDIT 1
I've read about Python's difflib or differ class but unfamiliar how to use it and may be more than I want.
EDIT 2
The Files both will have a series of columns(not with lines drawn between them or anything) and below those "named" columns there will be numbers. I need to compare the number in column 1 spot one in file one to column 1 spot one of file 2 and if there is a difference show the difference in another csv file
You can use ADO (ODBC/JET/OLEDB Text Driver) to treat 'decent' .txt/.csv/.tab/.flr files as tables in a SQL Database from every COM-enabled language. Then the comparisons could be done using the power of SQL (DISTINCT, GROUP, (LEFT) JOINS, ...).
Added with regard to your comment:
It's your problem and I don't want to push you where you don't want to go. But SQL is a good (the best?) tool, if you need to compare tabular data. As evidence the output of a script that spots the differences in two .txt files:
======= The .txt files to play with
------- file1.txt
"AC";"AM"
40000;-19083,00
40100;20000,00
40200;350004,00
40300;3498,99
------- file2.txt
"AC";"AM"
40000;-19083,00
40300;3498,99
40105;-234567,00
40200;350,00
======= Some diagnostic SQL
------- <NULL> indicates: In F1 but not in F2 (LEFT JOIN)
SELECT T1.AC, T1.AM, T2.AM FROM [file1.txt] AS T1 LEFT JOIN [file2.txt] AS T2 ON (T1.AC =
T2.AC)
------- Result
AC File1 File2
40000 -19083 -19083
40100 20000 <NULL>
40200 350004 350
40300 3498,99 3498,99
------- <NULL> indicates: Not in the other file (LEFT JOIN, UNION)
SELECT T1.AC, T1.AM, T2.AM FROM [file1.txt] AS T1 LEFT JOIN [file2.txt] AS T2 ON (T1.AC =
T2.AC) UNION SELECT T2.AC, T1.AM, T2.AM FROM [file2.txt] AS T2 LEFT JOIN [file1.txt] AS T1
ON (T1.AC = T2.AC)
------- Result
AC File1 File2
40000 -19083 -19083
40100 20000 <NULL>
40105 <NULL> -234567
40200 350004 350
40300 3498,99 3498,99
------- the problems: missing, different values
SELECT T1.AC, T1.AM, T2.AM FROM [file1.txt] AS T1 LEFT JOIN [file2.txt] AS T2 ON (T1.AC =
T2.AC) WHERE T2.AM IS NULL OR T1.AM <> T2.AM UNION SELECT T2.AC, T1.AM, T2.AM FROM [file2.
txt] AS T2 LEFT JOIN [file1.txt] AS T1 ON (T1.AC = T2.AC) WHERE T1.AM IS NULL OR T1.AM <>
T2.AM
------- Result
AC File1 File2
40100 20000 <NULL>
40105 <NULL> -234567
40200 350004 350
Further additions:
This article deals with ADO and text files; look for a file adoNNN.chm (NNN=Version number, e.g. 210) on your computer; this is a good book about ADO.
You can use Access or OpenOffice Base to experiment with SQL statements applied to a linked/referenced (not imported!) text database.
A script/program will be easy after you mastered the initial hurdle: connecting to the the database, i.e. to a folder containing the files and a schema.ini file to define the structure of the files=tables.
The output above was generated by:
Const adClipString = 2
Dim oFS : Set oFS = CreateObject( "Scripting.FileSystemObject" )
Dim sDir : sDir = oFS.GetAbsolutePathName( ".\txt" )
Dim oDB : Set oDb = CreateObject( "ADODB.Connection" )
oDB.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & sDir & ";Extended Properties=""text"""
Dim sSQL
Dim sFiNa
WScript.Echo "=======", "The .txt files to play with"
For Each sFiNa In Array( "file1.txt", "file2.txt" )
WScript.Echo "-------", sFiNa
WScript.Echo oFS.OpenTextFile( "txt\" & sFiNa ).ReadAll()
Next
WScript.Echo "=======", "Some diagnostic SQL"
Dim aSQL
For Each aSQL In Array( _
Array( "<NULL> indicates: In F1 but not in F2 (LEFT JOIN)" _
, Join( Array( _
"SELECT T1.AC, T1.AM, T2.AM FROM" _
, "[file1.txt] AS T1" _
, "LEFT JOIN [file2.txt] AS T2 ON (T1.AC = T2.AC)" _
), " " ) ) _
, Array( "<NULL> indicates: Not in the other file (LEFT JOIN, UNION)" _
, Join( Array( _
"SELECT T1.AC, T1.AM, T2.AM FROM" _
, "[file1.txt] AS T1" _
, "LEFT JOIN [file2.txt] AS T2 ON (T1.AC = T2.AC)" _
, "UNION" _
, "SELECT T2.AC, T1.AM, T2.AM FROM" _
, "[file2.txt] AS T2" _
, "LEFT JOIN [file1.txt] AS T1 ON (T1.AC = T2.AC)" _
), " " ) ) _
, Array( "the problems: missing, different value" _
, Join( Array( _
"SELECT T1.AC, T1.AM, T2.AM FROM" _
, "[file1.txt] AS T1" _
, "LEFT JOIN [file2.txt] AS T2 ON (T1.AC = T2.AC)" _
, "WHERE T2.AM IS NULL OR T1.AM <> T2.AM" _
, "UNION" _
, "SELECT T2.AC, T1.AM, T2.AM FROM" _
, "[file2.txt] AS T2" _
, "LEFT JOIN [file1.txt] AS T1 ON (T1.AC = T2.AC)" _
, "WHERE T1.AM IS NULL OR T1.AM <> T2.AM" _
), " " ) ) _
)
sSQL = aSQL( 1 )
WScript.Echo "-------", aSQL( 0 )
WScript.Echo sSQL
Dim oRS : Set oRS = oDB.Execute( sSQL )
WScript.Echo "------- Result"
WScript.Echo Join( Array( "AC", "File1", "File2" ), vbTab )
WScript.Echo oRS.GetString( adClipString, , vbTab, vbCrLf, "<NULL>" )
Next
oDB.Close
If you delete/ignore the fat (create SQL statements, diagnostics output), it boils down to 6 lines
Dim oDB : Set oDb = CreateObject( "ADODB.Connection" )
oDB.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & sDir & ";Extended Properties=""text"""
sSQL = "..."
Dim oRS : Set oRS = oDB.Execute( sSQL )
WScript.Echo oRS.GetString( adClipString, , vbTab, vbCrLf, "<NULL>" )
oDB.Close
which can be 'ported' easily to every COM-enabled language, because the ADO objects do all the heavy lifting. The .GetString method comes handy, when you want to save a resultset: just twiddle the separator/delimiter/Null arguments and dump it to file
oFS.CreateTextFile( ... ).WriteLine oRS.GetString( _
adClipString, , ",", vbCrLf, ""
)
(don't forget to add a definition for that table to your schema.ini). Of course you also can use a "SELECT/INSERT INTO", but such statements may not be easy to get right/passed the ADO Text Driver's parser.
Addition wrt Computations:
Start with a 5 x 2 master/approved file containing:
Num0 Num1 Num2 Num3 Num4
7,6 6,1 3,8 0,9 8,9
0,9 9,4 4,7 8,8 9,9
transform it to expected.txt
Num0 Num1 Num2 Num3 Num4 Spot
7,6 6,1 3,8 0,9 8,9 1
0,9 9,4 4,7 8,8 9,9 2
by appending the Spot column so it conforms to
[expected.txt]
ColNameHeader=True
CharacterSet=1252
Format=Delimited(;)
Col1=Num0 Float
Col2=Num1 Float
Col3=Num2 Float
Col4=Num3 Float
Col5=Num4 Float
Col6=Spot Integer
in your schema.ini file. Similarly, transform a measure file like:
Num0 Num1 Num2 Num3 Num4
7,1 1,1 3,8 0,9 8,9
0,9 9,4 4,7 8,8 9,9
to measured.txt
Num0 Num1 Num2 Num3 Num4 Spot
7,1 1,1 3,8 0,9 8,9 1
0,9 9,4 4,7 8,8 9,9 2
Apply
sSQL = Join( Array( _
"SELECT E.Num0 - M.Num0 AS Dif0" _
, ", E.Num1 - M.Num1 AS Dif1" _
, ", E.Num2 - M.Num2 AS Dif2" _
, ", E.Num3 - M.Num3 AS Dif3" _
, ", E.Num4 - M.Num4 AS Dif4" _
, ", E.Spot AS Spot" _
, "FROM [expected.txt] AS E" _
, "INNER JOIN [measured.txt] AS M" _
, "ON E.Spot = M.Spot" _
), " " )
Write the resultset to differences.txt
aFNames = Array( "Num0", ... "Spot" ) oFS.CreateTextFile( sFSpec ).Write _ Join( aFNames, sFSep ) & sRSep & oRS.GetString( adClipString, , sFSep, sRSep, "" )
and you get:
Num0 Num1 Num2 Num3 Num4 Spot
0,5 5 0 0 0 1
0 0 0 0 0 2
这篇关于Excel比较两个csv文件并显示区别的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!