Find Unique Characters in a File
Problem description
I have a file with 450,000+ rows of entries. Each entry is about 7 characters in length. What I want to know is the unique characters of this file.
For instance, if my file were the following:
Entry
-----
Yabba
Dabba
Doo
then the result would be
Unique characters: {abdoy}
Notice I don't care about case and don't need to order the results. Something tells me this is very easy for the Linux folks to solve.
I'm looking for a very fast solution. I really don't want to have to create code to loop over each entry, loop through each character...and so on. I'm looking for a nice script solution.
By fast, I mean fast to implement, not necessarily fast to run.
Recommended answer
Here's a PowerShell example:
gc file.txt | select -Skip 2 | % { $_.ToCharArray() } | sort -CaseSensitive -Unique
which produces:
D
Y
a
b
o
I like that it's easy to read.
EDIT: Here's a faster version:
$letters = @{} ; gc file.txt | select -Skip 2 | % { $_.ToCharArray() } | % { $letters[$_] = $true } ; $letters.Keys
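Since the question mentions Linux and says case does not matter, here is a rough shell sketch of the same idea. It is not part of the PowerShell answer above, just one possible equivalent. The function name `unique_chars` is made up for illustration, and the sketch assumes ASCII text with a two-line header like the sample file.

```shell
# Hypothetical helper: print the distinct characters of a file, ignoring case.
# Assumes ASCII input whose first two lines are a header (as in the sample).
unique_chars() {
  tail -n +3 "$1" |               # skip the two header lines
    tr '[:upper:]' '[:lower:]' |  # fold case so 'D' and 'd' count once
    fold -w 1 |                   # put each character on its own line
    sort -u |                     # keep one copy of each character
    tr -d '\n'                    # join the characters back into one line
  echo                            # finish with a trailing newline
}
```

Calling `unique_chars file.txt` on the sample file from the question should print the letters a, b, d, o and y on a single line. The result comes out sorted as a side effect of `sort -u`, even though ordering was not required.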