Updating a row in a data file with values from another row
Problem description
I have some tab-delimited data that gives the result of device identification from user agents (UAs). In several rows the device is wrongly identified, and I need to change those rows to the correct device.
For instance, there are cases where an iPhone or HTC Wildfire UA is identified as another phone. For these cases I need to update the device information with the correct device by searching for certain keywords in the UA. For example,
781 Mozilla/5.0 (Linux; U; Android 2.1-update1; fi-fi; HTC_Wildfire_A3333 Build/ERE27) AppleWebKit/530.17 (KHTML, like Gecko) Version/4.0 Mobile Safari/530.17 htc_wildfire_ver1_suba3333 HTC Wildfire Android
This row is correct, but a similar case is wrong:
775 Mozilla/5.0 (Linux; U; Android 2.1-update1; fi-fi; HTC Wildfire Build/ERE27) AppleWebKit/525.10+ (KHTML, like Gecko) Version/3.0.4 Mobile Safari/523.12.2 (AdMob-ANDROID-20100709) T-Mobile Pulse Android
So I have to do something like this. I know that if the UA column contains the terms HTC and Wildfire, it is that phone. I want to find all rows whose UA contains the strings HTC and Wildfire but whose columns 3 and 4 (manufacturer and model) are wrong, and update them with the device information from row 781, which I know is correct. I would hard-code in the script that row 781 is the correct one, and for every row where the device is not correctly identified I would copy in the values from column 3 onwards of row 781.
Of course this is only one case. There are several cases like it, and I would repeat the same logic for each of them. There are also other columns besides these four that I have not shown.
How would I accomplish this in a Perl script (preferably, but a bash solution is also OK)?
Recommended answer
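- Loop over the input file and store every distinct (UA, manufacturer, model) triple as a key in a hash; write the sorted keys to a file (devices).
- Edit devices by hand (delete the "wrong" lines).
- Load devices into a hash, using the UA as key and (manufacturer, model) as value. Loop over the input file, look up each row's UA field in that hash, and replace the two fields with the values from the hash where necessary.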
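
The first and third steps above can be sketched for tab-delimited rows roughly as follows. This is only an illustration under assumptions: the column layout (id, UA, manufacturer, model, then any other columns) and the sub names distinct_triples and correct_rows are hypothetical, and the manual edit of devices is simulated with a grep.

```perl
use strict;
use warnings;

# Step 1: collect the distinct (UA, manufacturer, model) triples.
# Assumed (hypothetical) layout: id, UA, manufacturer, model, other columns.
sub distinct_triples {
    my @lines = @_;
    my %seen;
    for my $line (@lines) {
        chomp( my $copy = $line );
        my ( undef, $ua, $maker, $model ) = split /\t/, $copy, -1;
        $seen{ join "\t", $ua, $maker, $model } = 1;
    }
    my @sorted = sort keys %seen;    # this list is what goes into "devices"
    return @sorted;
}

# Step 3: map UA => [ manufacturer, model ] from the hand-edited triples
# and rewrite those two fields in every row whose UA has an entry.
sub correct_rows {
    my ( $devices, $rows ) = @_;
    my %by_ua;
    for my $triple (@$devices) {
        my ( $ua, $maker, $model ) = split /\t/, $triple, -1;
        $by_ua{$ua} = [ $maker, $model ];
    }
    my @fixed;
    for my $row (@$rows) {
        my @f = split /\t/, $row, -1;
        if ( my $good = $by_ua{ $f[1] } ) {
            @f[ 2, 3 ] = @$good;     # other columns stay untouched
        }
        push @fixed, join "\t", @f;
    }
    return @fixed;
}

# Made-up sample rows, not taken from the real data file.
my @rows = (
    "1\tUA-one\tHTC\tWildfire",
    "2\tUA-one\tT-Mobile\tPulse",
    "3\tUA-two\tApple\tiPhone",
);
my @devices = distinct_triples(@rows);
# Stand-in for the manual edit: drop the triple known to be wrong.
@devices = grep { !/T-Mobile/ } @devices;
print "$_\n" for correct_rows( \@devices, \@rows );
```

Because the lookup key is the full UA string, this only corrects rows whose UA matches a kept triple exactly; rows with a different UA for the same device need their own entry in devices.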
use strict;
use warnings;

# Fake log: each row is [ UA keyword, model ]; some rows carry a wrong model.
my @Log = (
    [ 'HTC', 'badModelHTC' ],
    [ 'ABC', 'badModelABC' ],
    [ 'HTC', 'goodModelHTC' ],
    [ 'ABC', 'badModelABC' ],
    [ 'ABC', 'goodModelABC' ],
    [ 'HTC', 'goodModelHTC' ],
    [ 'ABC', 'badModelABC' ],
);
my %Devs;

# Step 1: print the original log and collect the distinct pairs as hash keys.
printf "----------- Log org\n";
for (@Log) {
    printf "%s %s\n", @{$_};
    my $key = join '-', @{$_};
    $Devs{ $key } = $_->[ 1 ];
}

# Step 2: list the collected pairs and drop the wrong ones.
printf "----------- Devs org\n";
for (sort( keys( %Devs ) )) {
    printf "%s => %s\n", $_, $Devs{ $_ };
    if (/bad/) {
        delete $Devs{ $_ };    # fake manual removal
    }
}

# fake manual shortening of keys
my %Tmp = %Devs;
%Devs = ();
for (keys %Tmp) {
    $Devs{ (split( /-/, $_))[ 0 ] } = $Tmp{ $_ };
}
printf "----------- Devs corrected\n";
for (sort( keys( %Devs ) )) {
    printf "%s => %s\n", $_, $Devs{ $_ };
}

# Step 3: overwrite the model of every log row with the known-good value.
printf "----------- Log corrected\n";
for (@Log) {
    $_->[ 1 ] = $Devs{ $_->[ 0 ] };
    printf "%s %s\n", @{$_};
}
Output:
----------- Log org
HTC badModelHTC
ABC badModelABC
HTC goodModelHTC
ABC badModelABC
ABC goodModelABC
HTC goodModelHTC
ABC badModelABC
----------- Devs org
ABC-badModelABC => badModelABC
ABC-goodModelABC => goodModelABC
HTC-badModelHTC => badModelHTC
HTC-goodModelHTC => goodModelHTC
----------- Devs corrected
ABC => goodModelABC
HTC => goodModelHTC
----------- Log corrected
HTC goodModelHTC
ABC goodModelABC
HTC goodModelHTC
ABC goodModelABC
ABC goodModelABC
HTC goodModelHTC
ABC goodModelABC
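
The rule described in the question (a UA containing both HTC and Wildfire should carry the device columns of row 781) could also be written directly as a keyword table. The following is a minimal sketch, not a tested solution for the real file: the rule structure, the sub name apply_rules, the column layout (id, UA, then device columns), and the shortened sample UAs are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical rule table: if the UA contains every keyword, the row should
# carry the device columns of the reference row with the given id.
my @rules = (
    { keywords => [ 'HTC', 'Wildfire' ], ref_id => 781 },
);

# Assumed layout: column 0 = row id, column 1 = UA, columns 2.. = device info.
sub apply_rules {
    my ( $rules, @rows ) = @_;
    my @table = map { [ split /\t/, $_, -1 ] } @rows;
    my %by_id = map { $_->[0] => $_ } @table;

    for my $rule (@$rules) {
        my $ref = $by_id{ $rule->{ref_id} } or next;    # reference row missing
        my @good = @{$ref}[ 2 .. $#$ref ];              # copy of column 3 onward
        for my $row (@table) {
            my $ua = $row->[1];
            # Skip the row unless every keyword occurs in the UA (case-sensitive).
            next if grep { index( $ua, $_ ) < 0 } @{ $rule->{keywords} };
            splice @$row, 2;                            # drop old device columns
            push @$row, @good;                          # and append the good ones
        }
    }
    return map { join "\t", @$_ } @table;
}

# Shortened, made-up versions of rows 781 and 775.
my @rows = (
    "781\tMozilla/5.0 (Linux; Android 2.1; HTC_Wildfire_A3333)\tHTC\tWildfire\tAndroid",
    "775\tMozilla/5.0 (Linux; Android 2.1; HTC Wildfire Build/ERE27)\tT-Mobile\tPulse\tAndroid",
);
print "$_\n" for apply_rules( \@rules, @rows );
```

Rows that already hold the correct values are overwritten with identical data, so no separate "is it wrong?" check is needed. Each further case is one more entry in @rules.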