Concatenating (cat) two structs that do not have the same fields
Question
I have multiple CSV files:
a.csv
field_a, field_b
111, 121
112, 122
b.csv
field_a, field_c
211, 231
212, 232
c.csv
field_a, field_b, field_c
311, 321, 331
312, 322, 332
I want to concatenate them into
output.csv
field_a,field_b,field_c
111, 121, NA
112, 122, NA
211, NA, 231
212, NA, 232
311, 321, 331
312, 322, 332
I want to do this with Octave.
What I have done so far:
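% In Octave, run 'pkg load io' first (csv2cell is part of the io package)
a = csv2cell('a.csv');
% Row 1 holds the field names, so the fields run along dimension 2
A = cell2struct(a(2:end,:), a(1,:), 2)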
Now I am looking for something like
merge(A,B,C) or vertcat(A,B,C)
but I could not get all fields to appear in the output.
With R I did it like this:
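filelist <- list.files()
datas <- list()
merged <- NULL
for (i in seq_along(filelist)) {
  datas[[i]] <- read.csv(filelist[i])
  # the first file starts the result; every later file is outer-joined onto it
  merged <- if (is.null(merged)) datas[[i]] else merge(merged, datas[[i]], all=TRUE)
}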
But the for loop is terribly slow, so I am looking for a way to merge them all at once.
Recommended answer
I finally got it working:
With Octave (MATLAB):
% In Octave, run 'pkg load io' first (csv2cell is part of the io package)
d = dir(pwd);
names = {d.name};
FileNames = names(~[d.isdir]);   % keep files only, skip directories
fields = {};                     % all field names seen so far
for ii = 1:numel(FileNames)
  % Load the csv into a cell array
  datas{ii} = csv2cell(FileNames{ii});
  % Convert it to a struct array: row 1 holds the field names
  Datas{ii} = cell2struct(datas{ii}(2:end,:), datas{ii}(1,:), 2);
  fields = [fields, fieldnames(Datas{ii})'];
  Datalength(ii) = numel(Datas{ii});   % number of records in this file
end
fields = unique(fields);
for jj = 1:numel(Datas)
  % Fields that exist in other files but not in this one
  missing_fields{jj} = setdiff(fields, fieldnames(Datas{jj}));
  for kk = 1:numel(missing_fields{jj})
    % Add the missing field to every record, filled with NaN
    [Datas{jj}.(missing_fields{jj}{kk})] = deal(NaN);
  end
end
The problem was that I did not see an easy way to export the struct to a CSV file, so I switched back to R. Because I do not have enough memory, I could not load all files in R and export them as one CSV. So I first exported every NetCDF file to a CSV with exactly the same fields. Then I concatenated them all with the Unix/GNU cat command.
R:
# Converts all NetCDF files (*.nc) in a folder to ASCII (csv)
# when there is more than one file, all csv files will have the same fields
# when a field is missing in one NetCDF file, this script adds 'NA' values
# it saves memory, because only one NetCDF file is in memory at a time.
# Needs package RNetCDF:
# http://cran.r-project.org/web/packages/RNetCDF/index.html
# load package
library('RNetCDF')
# get list of all files to merge
filelist<-list.files()
# initialise variable names
varnames_all<-{}
varnames_file<-vector("list", length(filelist)) # one entry per file
n_files<-length(filelist)
n_vars<-rep(NA,n_files) # initialise
# get variables-names of each NetCDF file
for (i in 1:n_files) {
ncfile<-open.nc(filelist[i]) # open nc file
print(paste(filelist[i],"opened!"))
# get number of variable in the NetCDF
n_vars[i]<-file.inq.nc(ncfile)$nvars
varnames<-character(n_vars[i]) # initialise and clear
# read every variable name (NetCDF variable IDs start at 0, R indices at 1)
for (j in 0:(n_vars[i]-1)) {
varnames[j+1]<-var.inq.nc(ncfile,j)$name
}
close.nc(ncfile)
varnames_file[[i]]<-varnames # add to the list of all files
varnames_all<-(c(varnames_all,varnames)) # concat to one array
}
varnames_all<-unique(varnames_all) # take every varname only once
print("Existing variable names:")
print(varnames_all)
# initialise a data.frame to load the NetCDF data into
datas<-data.frame()
for (i in 1:length(filelist)) {
print(filelist[i])
ncfile<-open.nc(filelist[i]) # open nc file
print(paste("reading ", filelist[i], "..."))
datas<-as.data.frame(read.nc(ncfile)) #import data from ncfile as data frame
close.nc(ncfile)
# check which variables are missing
missing_vars<-setdiff(varnames_all,colnames(datas))
# add the missing variables as columns filled with NA
datas[missing_vars]<-NA
print(paste("writing ", filelist[i], " to ", filelist[i],".csv ...", sep=""))
# reorder the columns in the same way as in the array varnames_all
datas<-datas[varnames_all]
# Write File
write.csv(datas,file=paste(filelist[i],".csv", sep=""))
# clear Memory
rm(datas)
}
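If the inputs are already plain CSV files, as in the question, the same NA-padding idea can be sketched without loading anything into R. The following is only an illustration, not part of the original answer: the function name `merge_csv_union` is made up here, and it assumes simple CSV (comma-separated, no quoted fields containing commas, no `=` in the file names). It reads the files twice with awk: the first pass collects the union of the header names, the second pass prints every row under that merged header and writes NA where a file lacks a column. Whitespace around values is trimmed.

```shell
#!/usr/bin/env bash
# merge_csv_union FILE...  (hypothetical helper, not from the original answer)
# Prints all rows of the given CSV files under the union of their headers.
merge_csv_union() {
  awk -F',' '
    # Strip leading and trailing blanks from a field
    function trim(s) { gsub(/^[ \t\r]+|[ \t\r]+$/, "", s); return s }

    # Pass 1: collect column names in order of first appearance
    pass == 1 {
      if (FNR == 1)
        for (i = 1; i <= NF; i++) {
          name = trim($i)
          if (!(name in seen)) { seen[name] = 1; cols[++ncols] = name }
        }
      next
    }

    # Pass 2: skip blank lines
    NF == 0 { next }

    # Pass 2, header line of each file
    FNR == 1 {
      if (!printed) {                 # print the merged header only once
        line = cols[1]
        for (i = 2; i <= ncols; i++) line = line "," cols[i]
        print line
        printed = 1
      }
      split("", pos)                  # clear the name -> position map
      for (i = 1; i <= NF; i++) pos[trim($i)] = i
      next
    }

    # Pass 2, data line: emit every merged column, NA if this file lacks it
    {
      line = ""
      for (i = 1; i <= ncols; i++) {
        value = (cols[i] in pos) ? trim($(pos[cols[i]])) : "NA"
        line = line (i > 1 ? "," : "") value
      }
      print line
    }
  ' pass=1 "$@" pass=2 "$@"
}
```

Called as `merge_csv_union a.csv b.csv c.csv > output.csv`, this streams the rows one at a time, so memory use does not grow with the size of the files.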
The concatenation with cat is straightforward:
#!/bin/bash
# Concatenate csv files which have exactly the same fields
# Optionally change to the directory given as the first argument
if [ $# -gt 0 ]; then
  cd "$1" || exit 1
fi
# get a list of all data files (a glob is safer than parsing the output of ls)
datafile_array=( * )
echo "copying files ..."
echo "copying file: ${datafile_array[0]}"
# first file: keep the header line
cat "./${datafile_array[0]}" > ../outputCat.csv
for (( i=1; i<${#datafile_array[@]}; i++ ))
do
  echo "copying file ${datafile_array[$i]}"
  # remaining files: skip the header line
  tail -n +2 "./${datafile_array[$i]}" >> ../outputCat.csv
done
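Plain concatenation is only correct when every file really has the same header line. As a small safeguard, one could check that before concatenating. The sketch below is an addition for illustration only: the function name `same_header` is hypothetical, and it compares the first lines byte for byte, so differences in spacing or column order count as a mismatch.

```shell
#!/usr/bin/env bash
# same_header FILE...  (hypothetical helper, not from the original answer)
# Succeeds only if every given file has the same first line.
same_header() {
  local first file
  [ "$#" -gt 0 ] || return 1
  first=$(head -n 1 "$1")
  for file in "$@"; do
    if [ "$(head -n 1 "$file")" != "$first" ]; then
      echo "header differs in: $file" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `same_header ./* || exit 1` placed before the first `cat` would stop the script before a mismatched file is written to the output.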