R如何处理凌乱的数据格式? [英] What can R do about a messy data format?

查看:103
本文介绍了R如何处理凌乱的数据格式?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有时候,我看到数据以类似这个问题。这不是第一次,所以我决定问一个问题,并以一种使发布的数据可口的方式回答这个问题。

Sometimes I see data posted in a Stack Overflow question formatted like in this question. This is not the first time, so I have decided to ask a question about it, and answer the question with a way to make the posted data palatable.

我将发布

+------------+------+------+----------+--------------------------+
|    Date    | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A    | A1   |        0 |                        0 |
| 2018-06-03 | A    | A2   |        0 |                        1 |
| 2018-06-03 | A    | A3   |        0 |                        2 |
| 2018-06-03 | A    | A4   |        1 |                        1 |
| 2018-06-03 | A    | A5   |        2 |                        1 |
| 2018-06-04 | A    | A6   |        0 |                        3 |
| 2018-06-01 | B    | B1   |        0 |                        1 |
| 2018-06-02 | B    | B2   |        0 |                        2 |
| 2018-06-03 | B    | B3   |        0 |                        3 |
+------------+------+------+----------+--------------------------+

您会看到这不是发布数据的正确方法。正如用户在评论中写道,

As you can see this is not the right way to post data. As a user wrote in a comment,


您必须花费一些时间来格式化数据,就像您在这里显示
一样。不幸的是,这不是我们复制
的好方法。

It must've taken a bit of time to format the data the way you're showing it here. Unfortunately this is not a good format for us to copy & paste.

我相信这说明了一切。提问者的意图很强,花了一些时间和精力才能使它变得更好,但结果并不理想。

I believe this says it all. The asker is well intended and it took some work and time to try to be nice, but the result is not good.

R代码可以做什么使该表可用, 如果有什么?

What can R code do to make that table usable, if anything? Will it take a great deal of trouble?

推荐答案

使用 data.table :: fread

x = '
+------------+------+------+----------+--------------------------+
|    Date    | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A    | A1   |        0 |                        0 |
| 2018-06-03 | A    | A2   |        0 |                        1 |
| 2018-06-03 | A    | A3   |        0 |                        2 |
| 2018-06-03 | A    | A4   |        1 |                        1 |
| 2018-06-03 | A    | A5   |        2 |                        1 |
| 2018-06-04 | A    | A6   |        0 |                        3 |
| 2018-06-01 | B    | B1   |        0 |                        1 |
| 2018-06-02 | B    | B2   |        0 |                        2 |
| 2018-06-03 | B    | B3   |        0 |                        3 |
+------------+------+------+----------+--------------------------+
'

fread(gsub('\\+.+\\n' ,'', x, perl = T), drop=c(1,7))

#          Date Emp1 Case Priority PriorityCountinLast7days
# 1: 2018-06-01    A   A1        0                        0
# 2: 2018-06-03    A   A2        0                        1
# 3: 2018-06-03    A   A3        0                        2
# 4: 2018-06-03    A   A4        1                        1
# 5: 2018-06-03    A   A5        2                        1
# 6: 2018-06-04    A   A6        0                        3
# 7: 2018-06-01    B   B1        0                        1
# 8: 2018-06-02    B   B2        0                        2
# 9: 2018-06-03    B   B3        0                        3

gsub 部分删除了水平规则。 drop 删除由行尾定界符引起的多余列。

The gsub part removes the horizontal rules. drop removes the extra columns caused by delimiters at the line ends.

这篇关于R如何处理凌乱的数据格式?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆