Detect repeated tuples [fi,(j-1), fi,j, fi,j+1] in a txt file using Java

Problem description

I'm looking for a small code snippet that will find the line(s) in a file that contain unacceptable entries and alert the user about them, but I could not find one.

For example, I have the following in a file:

myFile.txt:

Field1,Field2,Field3,Field4,Field5,Field6,Field7
a,b,a,d,e,f,g
h,i,h,i,h,ff,f27
f31,f32,f33,f34,f35,f36,f37
f41,f42,f43,f44,f45,f46,f47
f51,f52,f53,f54,f55,f56,f57
f61,f62,a,b,a,f66,f67
f71,f72,f73,f74,f75,f76,f77
f81,f82,f83,f84,f85,f86,f87
f91,f92,f93,f94,f95,f96,f97
f101,f102,f103,f104,f105,f106,f107
f111,f112,f113,f114,f115,f116,f117
f121,f122,f123,f124,f125,f126,f127
f131,f132,f133,f134,f135,f136,f137
f141,f142,f143,f144,f145,f146,f147
f151,f152,f153,f154,f155,f156,f157
f161,a,b,a,f165,f166,f167
i,h,ff,f174,f175,f176,f177
f181,f182,f183,f184,f185,f186,f187
f191,f192,f193,f194,f195,f196,f197
f201,f202,f203,f204,f205,f206,f207
f211,f212,f213,f214,f215,f216,f217
f221,f222,f223,f224,f225,f226,f227
f231,f232,f233,f234,f235,f236,f237
f241,f242,f243,f244,f245,f246,f247
f251,f252,f253,f254,f255,f256,f257
f261,f262,f263,f264,f265,f266,f267
f271,f272,f273,f274,f275,f276,f277
f281,f282,f283,i,h,ff,f287
fn1,fn2,fn3,fn4,fn5,fn6,fn7
f301,f302,f303,f304,f305,f306,f307

ALL VALUES IN THE TXT FILE ARE TREATED AS STRINGS.

Unacceptable entries in a line (or lines) are lines that include a field fi,j whose tuple [fi,(j-1), fi,j, fi,j+1] already exists before or after it in the txt file. That is, for a targeted field X, check whether X, together with the field on its left (XL) and the field on its right (XR), matches the same three consecutive fields anywhere earlier in the txt file. If it matches, we have to output: the field X on line number N is problematic because the tuple [XL,X,XR] is already defined on the previous line number M.

And we display:
- all the lines that cause a conflict, that is:
  + the previous line (the first occurrence, which is accepted when the txt file is read), and
  + the problematic lines (which follow the previous line when the txt file is read and hence would be ignored);
- the row number of the accepted first-occurrence tuple;
- the row numbers, if any, of the non-accepted tuples that would be ignored;
- the tuples [XL,X,XR] that cause the problem.
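The rule above boils down to looking at every window of three consecutive fields in a row. Here is a minimal sketch of that step, assuming each row is one line of text and that the separator is passed in (myFile.txt uses `,` while the example below uses `;`). The class name `TupleWindows` and the method `windows` are hypothetical, not from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TupleWindows {
    /**
     * Returns every window [f(i,j-1), f(i,j), f(i,j+1)] of one row, left to right.
     * Values stay plain strings, as the question requires.
     */
    static List<List<String>> windows(String row, String separator) {
        // Pattern.quote makes separators such as "." or "|" literal;
        // the -1 limit keeps trailing empty fields instead of dropping them
        String[] fields = row.split(Pattern.quote(separator), -1);
        List<List<String>> result = new ArrayList<>();
        // j is the middle field, so it runs from the second to the second-to-last field
        for (int j = 1; j + 1 < fields.length; j++) {
            result.add(List.of(fields[j - 1], fields[j], fields[j + 1]));
        }
        return result;
    }
}
```

A row with seven fields yields five windows; `List` compares element by element, so two windows with the same three strings are equal and can be used directly as map or set keys.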

Example:

Field1;Field2;Field3;Field4;Field5;Field6;Field7<--------Headers
a;b;a;d;e;f;g
h;i;h;i;h;ff;f27
f31;f32;f33;f34;f35;f36;f37
f41;f42;f43;f44;f45;f46;f47
f51;f52;f53;f54;f55;f56;f57
f61;f62;a;b;a;f66;f67
............................
f161;a;b;a;f165;f166;f167
i;h;ff;f174;f175;f176;f177
...........................
f281;f282;f283;i;h;ff;f287
fn1;fn2;fn3;fn4;fn5;fn6;fn7

It will display:

[a;b;a], accepted on line 1 but rejected on lines: 6,16
Line accepted is : a;b;a;d;e;f;g
Line(s) rejected are: f61;f62;a;b;a;f66;f67
                      f161;a;b;a;f165;f166;f167

[h;i;h], Not accepted at all. rejected on lines: 2 
Line accepted is: empty
Lines rejected :  h;i;h;i;h;ff;f27

[i;h;ff], Not accepted at all. rejected on lines: 2,17,28
Line accepted is: empty
Lines rejected :
             h;i;h;i;h;ff;f27
             i;h;ff;f174;f175;f176;f177
             f281;f282;f283;i;h;ff;f287
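One way to produce the "accepted on line X but rejected on lines ..." part of this report is to record, in reading order, every line on which each triple occurs: the first line is the accepted one and the rest are rejected. The sketch below assumes data rows are numbered from 1 after the header line (as in the expected output, where `f61;...` is line 6) and deliberately does not implement the "Not accepted at all" rule. `TupleReport`, `repeatedTriples` and `describe` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class TupleReport {
    /** Maps each triple found on two or more lines to those line numbers, in reading order. */
    static Map<List<String>, List<Integer>> repeatedTriples(List<String> rows, String separator) {
        // LinkedHashMap keeps triples in the order they were first seen
        Map<List<String>, List<Integer>> seen = new LinkedHashMap<>();
        // rows.get(0) is the header, so data line n is rows.get(n)
        for (int line = 1; line < rows.size(); line++) {
            String[] f = rows.get(line).split(Pattern.quote(separator), -1);
            for (int j = 1; j + 1 < f.length; j++) {
                List<Integer> lines = seen.computeIfAbsent(
                        List.of(f[j - 1], f[j], f[j + 1]), k -> new ArrayList<>());
                // record a line only once, even if the triple repeats inside it
                if (lines.isEmpty() || lines.get(lines.size() - 1) != line) {
                    lines.add(line);
                }
            }
        }
        // drop triples that occur on a single line only
        seen.values().removeIf(occurrences -> occurrences.size() < 2);
        return seen;
    }

    /** Formats one entry like "[a;b;a], accepted on line 1 but rejected on lines: 6,16". */
    static String describe(List<String> triple, List<Integer> lines) {
        String rejected = lines.subList(1, lines.size()).stream()
                .map(i -> Integer.toString(i))
                .collect(Collectors.joining(","));
        return "[" + String.join(";", triple) + "], accepted on line " + lines.get(0)
                + " but rejected on lines: " + rejected;
    }
}
```

Printing the offending rows themselves is then a matter of looking up `rows.get(n)` for each line number in the returned lists.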

N.B.: "Not accepted at all" is displayed when the list of accepted lines is empty, i.e. when the problem occurs within the same line.
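The same-line case can be checked row by row: [h;i;h] appears twice in line 2 (`h;i;h;i;h;...`) because the windows at fields 1-3 and 3-5 overlap. A minimal sketch of that per-row check, using the hypothetical names `SameLineCheck` and `repeatedWithinRow`:

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

class SameLineCheck {
    /** Returns the triples that occur more than once inside a single row. */
    static Set<List<String>> repeatedWithinRow(String row, String separator) {
        String[] f = row.split(Pattern.quote(separator), -1);
        Set<List<String>> seen = new HashSet<>();
        Set<List<String>> repeated = new LinkedHashSet<>();
        for (int j = 1; j + 1 < f.length; j++) {
            List<String> triple = List.of(f[j - 1], f[j], f[j + 1]);
            // Set.add returns false when an equal triple was already seen in this row
            if (!seen.add(triple)) {
                repeated.add(triple);
            }
        }
        return repeated;
    }
}
```

This only covers repeats inside one row; how such a row interacts with later rows (as with [i;h;ff] above) is left to the reporting logic.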

Any advice or help is welcome.

I have given an answer.

Thanks a lot.

Recommended answer

This is sort of the point of objects. You should create an object model that reflects the things you are working with.

So first, you would create a class, something like this:

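import java.util.Objects;

public class SeptTuple {
  public final String field1, field2, field3, field4, field5, field6, field7;

  public SeptTuple(String f1, String f2, String f3, String f4,
                   String f5, String f6, String f7) {
    field1 = f1;
    field2 = f2;
    field3 = f3;
    field4 = f4;
    field5 = f5;
    field6 = f6;
    field7 = f7;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o)
      return true;
    if (!(o instanceof SeptTuple))
      return false;

    SeptTuple s = (SeptTuple) o;
    return Objects.equals(field1, s.field1) && Objects.equals(field2, s.field2)
        && Objects.equals(field3, s.field3) && Objects.equals(field4, s.field4)
        && Objects.equals(field5, s.field5) && Objects.equals(field6, s.field6)
        && Objects.equals(field7, s.field7);
  }

  // Must be spelled hashCode (capital C) to actually override Object.hashCode
  @Override
  public int hashCode() {
    // If 2 objects are equal, they must return the same hashcode
    return Objects.hash(field1, field2, field3, field4, field5, field6, field7);
  }
}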

Once you have that, finding duplicates is as easy as:

import java.util.HashMap;
import java.util.Map;

Map<SeptTuple, SeptTuple> map = new HashMap<>();
....
// If an equal tuple is already in the map, put() returns the old value
SeptTuple temp = map.put(newSeptTuple, newSeptTuple);
if (temp != null) {
   // handle clash
}
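To see the map behaviour this relies on in isolation, here is a self-contained sketch that uses `List<String>` as the key instead of `SeptTuple` (a `List` also compares element by element, so it behaves the same way as a key). It swaps in `Map.putIfAbsent` for `put`: both return the existing value for an equal key, but `putIfAbsent` leaves it in place, so every later duplicate is reported against the first occurrence. `ClashDemo` and `clashes` are hypothetical names, and rows are indexed from 0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ClashDemo {
    /** Returns "row X repeats row Y" messages for rows that are exact duplicates. */
    static List<String> clashes(List<List<String>> rows) {
        Map<List<String>, Integer> firstRow = new HashMap<>();
        List<String> messages = new ArrayList<>();
        for (int row = 0; row < rows.size(); row++) {
            // null means the key was new; otherwise we get the row it was first seen on
            Integer previous = firstRow.putIfAbsent(rows.get(row), row);
            if (previous != null) {
                messages.add("row " + row + " repeats row " + previous);
            }
        }
        return messages;
    }
}
```

With plain `put`, a third copy of a row would be reported against the second copy instead of the first, because `put` overwrites the stored value.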

If you need to find equal parts in subsets of each row, then break this solution down into as many objects as you need to accurately represent each element of the tuple. (You will need to make 3 classes to represent each part of your tuple.)
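Applied to the question, the smaller object is the three-field window itself. The sketch below is a variation on the answer rather than its exact recipe: it assumes Java 16+ and uses a single `record` for the whole triple, because a record gets `equals` and `hashCode` generated from its components, which removes the hand-written boilerplate shown in `SeptTuple`. `TripleIndex`, `Triple` and `firstOccurrences` are hypothetical names; rows are assumed to be already split into fields, header excluded.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TripleIndex {
    /** One window [f(i,j-1), f(i,j), f(i,j+1)]; equals/hashCode are generated by the compiler. */
    record Triple(String left, String middle, String right) {}

    /** Maps every triple to the first data line (1-based) it appears on. */
    static Map<Triple, Integer> firstOccurrences(List<String[]> dataRows) {
        Map<Triple, Integer> first = new LinkedHashMap<>();
        for (int i = 0; i < dataRows.size(); i++) {
            String[] f = dataRows.get(i);
            for (int j = 1; j + 1 < f.length; j++) {
                // putIfAbsent keeps the earlier line number if the triple was seen before
                first.putIfAbsent(new Triple(f[j - 1], f[j], f[j + 1]), i + 1);
            }
        }
        return first;
    }
}
```

A later occurrence of a triple can then be flagged by comparing its own line number with `first.get(triple)`.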
