Calculate the max length of each field in a CSV file

Problem description

I have a Groovy script that iterates through a CSV file and stores the maximum length of each field:

def csv = new File('./myfile.csv').text

def max = [ ] as ArrayList

csv.eachLine { line, count ->

    def params = line.split(',')

    // skip the header line
    if (count > 0) 
    {
        params.eachWithIndex() { p, index ->        
            if (p.length() > max[index] ) {
                max[index] = p.length()
            }
        }
    }
}
println "Max length of fields: ${max}"

I would like to achieve the same goal using R, ideally using a library function.

How can I print out the max length of each field in a CSV file?

Example input:

foo,bar
abcd,12345
def,234567

Output:

Max length of fields: [4, 6]

Solution

Read the data into a data frame and use sapply to apply the indicated function to each of its columns. If the data is in a file, replace text = Lines with file = "myfile.csv". See ?read.csv for additional arguments, which may or may not be needed depending on what your real file looks like.

# test data
Lines <- "foo,bar
abcd,12345
def,234567"

DF <- read.csv(text = Lines, colClasses = "character")
sapply(DF, function(x) max(nchar(x)))

giving:

foo bar 
  4   6 
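
For data that lives in a file, the same approach might look like the sketch below. The temporary file is only a stand-in for myfile.csv, and the names path and max_len are illustrative rather than part of the original answer. The final line imitates the output format of the Groovy script.

```r
# Write the sample data to a temporary file (a stand-in for myfile.csv)
path <- tempfile(fileext = ".csv")
writeLines(c("foo,bar", "abcd,12345", "def,234567"), path)

# read.csv treats the first line as a header by default, so it is
# excluded from the length calculation, like the Groovy script does
DF <- read.csv(file = path, colClasses = "character")
max_len <- sapply(DF, function(x) max(nchar(x)))

# Print in the same style as the Groovy script
cat("Max length of fields: [", paste(max_len, collapse = ", "), "]\n", sep = "")
```

With the sample data this prints "Max length of fields: [4, 6]".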

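Note: One potential gotcha is input like the following, where a field looks like a number in scientific notation. Because colClasses = "character" keeps every field as the text that appears in the file, this answer still gets it correct: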

Lines <- "foo,bar
abcd,1234567e9
def,234567"
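
To see why this input is a gotcha, the following sketch compares the answer's call with one that lets read.csv guess the column types. The names DF_chr, DF_num, len_chr and len_num are illustrative, and stringsAsFactors = FALSE is added to the second call only so that it behaves the same on older R versions.

```r
Lines <- "foo,bar
abcd,1234567e9
def,234567"

# As in the answer: every field is kept as text, so nchar() counts
# the characters exactly as they appear in the file
DF_chr <- read.csv(text = Lines, colClasses = "character")
len_chr <- sapply(DF_chr, function(x) max(nchar(x)))

# Type guessing: bar is read as numeric, so nchar() measures R's own
# text representation of the number, not the original field
DF_num <- read.csv(text = Lines, stringsAsFactors = FALSE)
len_num <- sapply(DF_num, function(x) max(nchar(x)))

print(len_chr)
print(len_num)
```

The first result gives 4 for foo and 9 for bar, matching the file. In the second, the value for bar no longer reflects the nine characters of 1234567e9.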
