In R, how to split text/paragraph at the end of a statement (full stop) but not at a . (dot) inside a sentence


Problem description

In R, for example:

text_data<-" I have been a Gig subscriber for a decent amount of time.  When the service was originally installed I would observe 900+Mbps speeds.  I have deployed to Kosovo and since then (about 8 months ago) my wife has told me that the internet has become Slow.I consistently avg less than 286 Mbps when utilizing both speedtest.xfinity.com and fast.com as well as speedtest.net Current hardware: 1) Motorola MB8600 Modem 2) Linksys EA9500 I have not had a technician out to the site.Troubleshooting:I have replaced the Coax cable from wall to modem with 2 different new coax cables. I have replaced the ethernet from Modem to Router with 2 different new ethernet cables I have rebooted my Modem as well as attempted a factory reset. I have connected my home PC directly to the Modem (bypass router) with 2 different ethernet cables*NOTE*Wether I use the EA9500 or go directly to the PC I get the same slow speeds.*NOTE 2*I do not have a cable subscrption. No splitters on the line. It goes from Pole ---> Wall Jack --> Modem.There is always significant Uncorrected errors.  Attached are the Upstream and Downstream information and error logs.  These are 4 days after a modem reset."

> textdata<- as.String(text_data)

> a<-strsplit(text_data,".", fixed = TRUE)

Output:

> a
[[1]]
 [1] " I have been a Gig subscriber for a decent amount of time"                                                                                                                           
 [2] "  When the service was originally installed I would observe 900+Mbps speeds"                                                                                                         
 [3] "  I have deployed to Kosovo and since then (about 8 months ago) my wife has told me that the internet has become Slow"                                                               
 [4] "I consistently avg less than 286 Mbps when utilizing both speedtest"                                                                                                                 
 [5] "xfinity"                                                                                                                                                                             
 [6] "com and fast"                                                                                                                                                                        
 [7] "com as well as speedtest"                                                                                                                                                            
 [8] "net Current hardware: 1) Motorola MB8600 Modem 2) Linksys EA9500 I have not had a technician out to the site"                                                                        
 [9] "Troubleshooting:I have replaced the Coax cable from wall to modem with 2 different new coax cables"                                                                                  
[10] " I have replaced the ethernet from Modem to Router with 2 different new ethernet cables I have rebooted my Modem as well as attempted a factory reset"                               
[11] " I have connected my home PC directly to the Modem (bypass router) with 2 different ethernet cables*NOTE*Wether I use the EA9500 or go directly to the PC I get the same slow speeds"
[12] "*NOTE 2*I do not have a cable subscrption"                                                                                                                                           
[13] " No splitters on the line"                                                                                                                                                           
[14] " It goes from Pole ---> Wall Jack --> Modem"                                                                                                                                         
[15] "There is always significant Uncorrected errors"                                                                                                                                      
[16] "  Attached are the Upstream and Downstream information and error logs"                                                                                                               
[17] "  These are 4 days after a modem reset"   

Desired R output: the text should be split at the end of each statement (full stop), not at a . (dot) inside a sentence.

1)I have been a Gig subscriber for a decent amount of time.  
When the service was originally installed I would observe 900+Mbps speeds.  
2)I have deployed to Kosovo and since then (about 8 months ago) my wife has told me that the internet has become Slow.
3)I consistently avg less than 286 Mbps when utilizing both speedtest.xfinity.com and fast.com as well as speedtest.net Current hardware: 1) Motorola MB8600 Modem 2) Linksys EA9500 I have not had a technician out to the site.
4)Troubleshooting:I have replaced the Coax cable from wall to modem with 2 different new coax cables. 
5) I have replaced the ethernet from Modem to Router with 2 different new ethernet cables I have rebooted my Modem as well as attempted a factory reset. 
6)I have connected my home PC directly to the Modem (bypass router) with 2 different ethernet cables*NOTE*Wether I use the EA9500 or go directly to the PC I get the same slow speeds.
7)*NOTE 2*I do not have a cable subscrption. 
8)No splitters on the line. 
9)It goes from Pole ---> Wall Jack --> Modem.
10)There is always significant Uncorrected errors.  
11)Attached are the Upstream and Downstream information and error logs.  
12)These are 4 days after a modem reset.

Please help.

Recommended answer

Edited with a pattern that works for this particular data: you can split on a . that is followed by a space or a capital letter, using the pattern \\.(?=( |[A-Z])).

You need to be careful, because some of your sentences are not followed by a space. This makes it impossible to distinguish them reliably (see the third split sentence in the output). The pattern at least does not split inside .com, as your first attempt does. Here, we rely on the capital letter to tell speedtest.xfinity.com apart from Slow.I or site.Troubleshooting, so it will not hold if someone omits the space and also forgets to capitalise the next sentence. It also does not split where the full stop is followed by any other character, such as speeds.*NOTE 2* in element [7] of the output.
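To see the lookahead in isolation, here is a minimal sketch of the same pattern on a short, made-up string, using base R's strsplit() with perl = TRUE instead of stringr. The sample sentence and the variable names x and parts are illustrative assumptions, not part of the original data.

```r
# Hypothetical sample: a missing space after "Slow." and dots inside host names
x <- "Speed is Slow.I tested speedtest.xfinity.com and fast.com today. It did not help."

# Split on a "." only when the next character is a space or a capital letter.
# perl = TRUE is required because the default regex engine has no lookahead.
parts <- strsplit(x, "\\.(?=( |[A-Z]))", perl = TRUE)[[1]]

# Remove the leading space left behind by ". "
parts <- trimws(parts)
print(parts)
```

The dots inside the host names are left alone, and the last piece keeps its full stop because nothing follows it for the lookahead to match.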

library(stringr)
text_data <- " I have been a Gig subscriber for a decent amount of time. When the service was originally installed I would observe 900+Mbps speeds. I have deployed to Kosovo and since then (about 8 months ago) my wife has told me that the internet has become Slow.I consistently avg less than 286 Mbps when utilizing both speedtest.xfinity.com and fast.com as well as speedtest.net Current hardware: 1) Motorola MB8600 Modem 2) Linksys EA9500 I have not had a technician out to the site.Troubleshooting:I have replaced the Coax cable from wall to modem with 2 different new coax cables. I have replaced the ethernet from Modem to Router with 2 different new ethernet cables I have rebooted my Modem as well as attempted a factory reset. I have connected my home PC directly to the Modem (bypass router) with 2 different ethernet cablesNOTEWether I use the EA9500 or go directly to the PC I get the same slow speeds.*NOTE 2*I do not have a cable subscrption. No splitters on the line. It goes from Pole ---> Wall Jack --> Modem.There is always significant Uncorrected errors. Attached are the Upstream and Downstream information and error logs. These are 4 days after a modem reset."
text_data %>%
  str_split("\\.(?=( |[A-Z]))")
#> [[1]]
#>  [1] " I have been a Gig subscriber for a decent amount of time"                                                                                                                                                                     
#>  [2] " When the service was originally installed I would observe 900+Mbps speeds"                                                                                                                                                    
#>  [3] " I have deployed to Kosovo and since then (about 8 months ago) my wife has told me that the internet has become Slow"                                                                                                          
#>  [4] "I consistently avg less than 286 Mbps when utilizing both speedtest.xfinity.com and fast.com as well as speedtest.net Current hardware: 1) Motorola MB8600 Modem 2) Linksys EA9500 I have not had a technician out to the site"
#>  [5] "Troubleshooting:I have replaced the Coax cable from wall to modem with 2 different new coax cables"                                                                                                                            
#>  [6] " I have replaced the ethernet from Modem to Router with 2 different new ethernet cables I have rebooted my Modem as well as attempted a factory reset"                                                                         
#>  [7] " I have connected my home PC directly to the Modem (bypass router) with 2 different ethernet cablesNOTEWether I use the EA9500 or go directly to the PC I get the same slow speeds.*NOTE 2*I do not have a cable subscrption"  
#>  [8] " No splitters on the line"                                                                                                                                                                                                     
#>  [9] " It goes from Pole ---> Wall Jack --> Modem"                                                                                                                                                                                   
#> [10] "There is always significant Uncorrected errors"                                                                                                                                                                                
#> [11] " Attached are the Upstream and Downstream information and error logs"                                                                                                                                                          
#> [12] " These are 4 days after a modem reset."

Created on 2018-08-12 by the reprex package (v0.2.0).
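The desired output in the question keeps the full stop on each sentence, while splitting on the pattern consumes it. One possible way to keep it, sketched here in base R under the assumption that the text contains no newline characters, is to insert a newline after each sentence-ending full stop and split on that marker instead. The sample text and the names txt, marked, sentences and numbered are hypothetical.

```r
# Hypothetical sample text; assumed to contain no newline characters
txt <- "The internet has become Slow.I tested fast.com and speedtest.net today. No splitters on the line."

# Step 1: add a newline after every "." that is followed by a space or a
# capital letter; the "." itself stays in the text
marked <- gsub("\\.(?=( |[A-Z]))", ".\n", txt, perl = TRUE)

# Step 2: split on the newline marker and trim surrounding whitespace
sentences <- trimws(strsplit(marked, "\n", fixed = TRUE)[[1]])

# Step 3: number the sentences in the style of the desired output
numbered <- paste0(seq_along(sentences), ")", sentences)
writeLines(numbered)
```

The same caveat applies as for the split itself: a sentence that ends without a following space or capital letter is not detected.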
