Detect repeated tuples [fi,(j-1), fi,j, fi,j+1] in a txt file using Java
Question
I'm looking for a small code snippet that finds the line(s) in a file containing unacceptable entries and alerts the user to them, but I could not find one.
For example, I have the following file:
myFile.txt:
Field1,Field2,Field3,Field4,Field5,Field6,Field7
a,b,a,d,e,f,g
h,i,h,i,h,ff,f27
f31,f32,f33,f34,f35,f36,f37
f41,f42,f43,f44,f45,f46,f47
f51,f52,f53,f54,f55,f56,f57
f61,f62,a,b,a,f66,f67
f71,f72,f73,f74,f75,f76,f77
f81,f82,f83,f84,f85,f86,f87
f91,f92,f93,f94,f95,f96,f97
f101,f102,f103,f104,f105,f106,f107
f111,f112,f113,f114,f115,f116,f117
f121,f122,f123,f124,f125,f126,f127
f131,f132,f133,f134,f135,f136,f137
f141,f142,f143,f144,f145,f146,f147
f151,f152,f153,f154,f155,f156,f157
f161,a,b,a,f165,f166,f167
i,h,ff,f174,f175,f176,f177
f181,f182,f183,f184,f185,f186,f187
f191,f192,f193,f194,f195,f196,f197
f201,f202,f203,f204,f205,f206,f207
f211,f212,f213,f214,f215,f216,f217
f221,f222,f223,f224,f225,f226,f227
f231,f232,f233,f234,f235,f236,f237
f241,f242,f243,f244,f245,f246,f247
f251,f252,f253,f254,f255,f256,f257
f261,f262,f263,f264,f265,f266,f267
f271,f272,f273,f274,f275,f276,f277
f281,f282,f283,i,h,ff,f287
fn1,fn2,fn3,fn4,fn5,fn6,fn7
f301,f302,f303,f304,f305,f306,f307
All values in the txt file are treated as strings.
An unacceptable entry is a field fi,j whose tuple [fi,(j-1), fi,j, fi,j+1] already occurs elsewhere in the txt file, before or after it. That is, for a targeted field X, check whether X together with the field on its left, XL, and the field on its right, XR, matches a tuple seen earlier in the txt file. If it matches, we have to report that field X on that line number is problematic because the tuple [XL,X,XR] is already defined on a previous line number, and we display:
- All the lines that cause a conflict, that is:
+ the previous line (the first occurrence, which will be accepted when the txt file is read), and
+ the problematic lines (which follow the previous line when the txt file is read and hence would be ignored)
- The row number of the accepted first occurrence of the tuple
- The row numbers of any later occurrences of the tuple, which are not accepted and would be ignored
- The tuples [XL,X,XR] that cause the problem.
Example:
Field1;Field2;Field3;Field4;Field5;Field6;Field7<--------Headers
a;b;a;d;e;f;g
h;i;h;i;h;ff;f27
f31;f32;f33;f34;f35;f36;f37
f41;f42;f43;f44;f45;f46;f47
f51;f52;f53;f54;f55;f56;f57
f61;f62;a;b;a;f66;f67
............................
f161;a;b;a;f165;f166;f167
i;h;ff;f174;f175;f176;f177
...........................
f281;f282;f283;i;h;ff;f287
fn1;fn2;fn3;fn4;fn5;fn6;fn7
It will display:
[a;b;a], accepted on line 1 but rejected on lines: 6,16
Line accepted is : a;b;a;d;e;f;g
Line(s) rejected are: f61;f62;a;b;a;f66;f67
f161;a;b;a;f165;f166;f167
[h;i;h], Not accepted at all. rejected on lines: 2
Line accepted is: empty
Lines rejected : h;i;h;i;h;ff;f27
[i;h;ff],Not accepted at all. rejected on lines: 2,17,28
Line accepted is: empty
Lines rejected :
h;i;h;i;h;ff;f27
i;h;ff;f174;f175;f176;f177
f281;f282;f283;i;h;ff;f287
N.B.: "Not accepted at all" is displayed when the list of accepted lines is empty, i.e. when the problem occurs within the same line.
Any advice or help is welcome.
I have posted an answer.
Thank you very much.
Recommended answer
This is sort of the point of objects: you should create an object model that reflects the things you are working with.
So first you would create a class, something like this:
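import java.util.Objects;

// Immutable value object for one seven-field row of the file.
public class SeptTuple {
    public final String field1, field2, field3, field4, field5, field6, field7;

    public SeptTuple(String f1, String f2, String f3, String f4,
                     String f5, String f6, String f7) {
        field1 = f1;
        field2 = f2;
        field3 = f3;
        field4 = f4;
        field5 = f5;
        field6 = f6;
        field7 = f7;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SeptTuple))
            return false;
        SeptTuple s = (SeptTuple) o;
        return Objects.equals(field1, s.field1) && Objects.equals(field2, s.field2)
            && Objects.equals(field3, s.field3) && Objects.equals(field4, s.field4)
            && Objects.equals(field5, s.field5) && Objects.equals(field6, s.field6)
            && Objects.equals(field7, s.field7);
    }

    @Override
    public int hashCode() {
        // If 2 objects are equal, they must return the same hash code
        return Objects.hash(field1, field2, field3, field4, field5, field6, field7);
    }
}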
Once you have that, finding dupes is as easy as:
Map<SeptTuple, SeptTuple> map = new HashMap<>();
....
// put returns the value previously mapped to this key, or null if there was none
SeptTuple temp = map.put(newSeptTuple, newSeptTuple);
if (temp != null) {
    // handle clash
}
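To make the clash handling concrete, here is a minimal sketch of the same idea for whole rows. It is not the answer's code: `RowClashDemo` and `findClashes` are hypothetical names, `List<String>` stands in for the tuple class because lists already compare by value, the separator is assumed to be a comma, and `putIfAbsent` is used instead of `put` so the first line number is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only.
class RowClashDemo {

    // Returns one message per row that is identical to an earlier row.
    // Line numbers are 1-based and count data rows only (no header).
    static List<String> findClashes(List<String> rows) {
        // Key: the row's fields; value: line number of its first occurrence.
        Map<List<String>, Integer> firstLine = new HashMap<>();
        List<String> clashes = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            // Assumption: comma-separated fields; -1 keeps trailing empty fields.
            List<String> key = Arrays.asList(rows.get(i).split(",", -1));
            // putIfAbsent returns the existing value, or null if the key was new.
            Integer earlier = firstLine.putIfAbsent(key, i + 1);
            if (earlier != null) {
                clashes.add("line " + (i + 1) + " repeats line " + earlier);
            }
        }
        return clashes;
    }
}
```

Calling `RowClashDemo.findClashes` with the data rows of the file (header removed) yields one message per repeated row, which could then be shown to the user.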
If you need to find equal parts in subsets of each row, then break this solution down into as many objects as you need to accurately represent each element of the tuple. (You will need to make 3 classes to represent each part of your tuple.)
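One way to apply this to the question's three-field windows is sketched below. It slides over each line, maps every triple of adjacent fields to the line numbers where it occurs, and keeps the triples seen more than once. `TripleScanner`, `scan`, and `repeated` are hypothetical names, `List<String>` stands in for a dedicated triple class because lists already compare by value, and deciding which occurrence counts as accepted is left to the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class TripleScanner {

    // Maps each triple [left, field, right] to the 1-based numbers of the data
    // lines containing it, in reading order. A line is listed once per
    // occurrence, so a triple repeated within one line lists that line twice.
    static Map<List<String>, List<Integer>> scan(List<String> dataLines, String separator) {
        Map<List<String>, List<Integer>> seen = new LinkedHashMap<>();
        for (int i = 0; i < dataLines.size(); i++) {
            // Quote the separator so it is matched literally; -1 keeps empty fields.
            String[] f = dataLines.get(i).split(Pattern.quote(separator), -1);
            for (int j = 1; j + 1 < f.length; j++) {
                List<String> triple = Arrays.asList(f[j - 1], f[j], f[j + 1]);
                seen.computeIfAbsent(triple, k -> new ArrayList<>()).add(i + 1);
            }
        }
        return seen;
    }

    // Keeps only the triples that occur more than once.
    static Map<List<String>, List<Integer>> repeated(Map<List<String>, List<Integer>> all) {
        Map<List<String>, List<Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<List<String>, List<Integer>> e : all.entrySet()) {
            if (e.getValue().size() > 1) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

The caller is expected to strip the header line first. In each list of line numbers, the first entry is the first occurrence in reading order and the remaining entries are the later ones; a list such as 2, 2 signals a triple repeated within a single line, the case the question labels "Not accepted at all".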