How do I find a substring within a string in F#?

Question

I found a "fun" project online for F#. The idea behind it is to count the substrings of a given string that satisfy a condition.

Prompt:

Description:
You are given a DNA sequence:
a string that contains only characters 'A', 'C', 'G', and 'T'.
Your task is to calculate the number of substrings of sequence,
in which each of the symbols appears the same number of times.

Example 1:
For sequence = "ACGTACGT", the output should be 6
All substrings of length 4 contain each symbol exactly once (+5),
and the whole sequence contains each symbol twice (+1).

Example 2:
For sequence = "AAACCGGTTT", the output should be 1
Only substring "AACCGGTT" satisfies the criterion above: it contains each symbol twice.


Input: String, a sequence that consists only of symbols 'A', 'C', 'G', and 'T'.
Length constraint: 0 < sequence.length < 100000.

Output: Integer, the number of substrings where each symbol appears equally many times.

I'm not exactly sure where to go with this, or more specifically what to do. I've looked around online to try to find out what I'm supposed to do, and I've only found the following code. I added the input variable and the var variable, and changed the show calls to take input followed by the substring to search for (I hope that makes sense):

open System

let countSubstring (where : string) (what : string) =
    match what with
    | "" -> 0
    | _ -> (where.Length - where.Replace(what, @"").Length) / what.Length


[<EntryPoint>]
let main argv =
    let input = System.Console.ReadLine()
    let var = input.Length
    Console.WriteLine(var)
    let show where what =
        printfn @"countSubstring(""%s"", ""%s"") = %d" where what (countSubstring where what)
    show input "ACGT"
    show input "CGTA"
    show input "GTAC"
    show input "TACG"
    0
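
A quick check of what countSubstring computes may help here. The sketch below repeats the function from the listing above and runs it on a few sample strings (the inputs "AAAA" and "AGCT" are illustrations chosen here, not part of the prompt). Because Replace removes non-overlapping matches, and because the four patterns are only the rotations of "ACGT", the function counts something different from what the task asks for.

```fsharp
// Same function as in the listing above: counts NON-overlapping occurrences of `what`.
let countSubstring (where : string) (what : string) =
    match what with
    | "" -> 0
    | _ -> (where.Length - where.Replace(what, "").Length) / what.Length

// "AA" starts at positions 0, 1 and 2 of "AAAA", but Replace removes only two copies.
printfn "%d" (countSubstring "AAAA" "AA")        // 2

// The four rotations of "ACGT" applied to Example 1 from the prompt.
let rotations = [ "ACGT"; "CGTA"; "GTAC"; "TACG" ]
let total = rotations |> List.sumBy (countSubstring "ACGTACGT")
printfn "%d" total                               // 5, while the expected answer is 6

// A balanced substring in a different order matches none of the rotations.
let missed = rotations |> List.sumBy (countSubstring "AGCT")
printfn "%d" missed                              // 0, although "AGCT" itself qualifies
```

So counting fixed patterns misses both longer balanced substrings (such as the whole sequence in Example 1) and balanced substrings whose symbols appear in another order.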

Anyways, if anyone can help me with this, it would be greatly appreciated.

Thanks in advance.

Recommended answer

First, declare a function numberACGT that takes a string and returns 1 if A, C, G, and T all occur the same number of times, and 0 otherwise. To do this, declare an array N of four integers initialized to 0, run through the string incrementing the corresponding counter, and finally compare the array elements with each other.

Then, for each substring whose length is a multiple of 4, call numberACGT and add the result to a counter count (initialized to 0 at the beginning).
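
The restriction to lengths that are multiples of 4 follows from the criterion itself: if each of the four symbols appears k times, the substring has length 4k. As a rough count, writing n for the length of the input (notation introduced here, not in the original answer), the number of candidate substrings is

$$\sum_{k=1}^{\lfloor n/4 \rfloor} (n - 4k + 1)$$

For n = 8 this gives 5 + 1 = 6 candidates, which matches Example 1, where every candidate happens to qualify. Checking one candidate of length 4k takes about 4k steps, so the total work of this approach grows roughly as n³.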

let numberACGT (aString:string) =
    let N = Array.create 4 (0:int)
    let last = aString.Length - 1 
    for i = 0 to last do
        match aString.[i] with
        | 'A' -> N.[0] <- N.[0] + 1
        | 'C' -> N.[1] <- N.[1] + 1
        | 'G' -> N.[2] <- N.[2] + 1
        | _ -> N.[3] <- N.[3] + 1
    if (N.[0] = N.[1]) && (N.[1] = N.[2]) && (N.[2] = N.[3]) then 1 else 0 

let numberSubStrings (aString:string) =
    let mutable count = 0
    let len = aString.Length 
    for k = 1 to len / 4 do //only multiple of 4
        for pos = 0 to len - 4*k do
            count <- count + numberACGT (aString.[pos..pos+4*k-1])
    count

I hope it's fast enough.

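[<EntryPoint>]
let main argv =
    let stopWatch = System.Diagnostics.Stopwatch.StartNew()
    let input = System.Console.ReadLine()
    printf "%i  " (numberSubStrings input)
    stopWatch.Stop()
    // The listing as posted never prints the timing shown in the result;
    // this line and its unit (milliseconds) are an assumption.
    printfn "%f" stopWatch.Elapsed.TotalMilliseconds
    let g = System.Console.ReadLine() // wait for Enter so the console stays open
    0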

Result:

62    4.542700

A new version in O(n²):

let numberSubStringsBis (aString:string) =
    let mutable count = 0 
    let len = aString.Length 
    for pos = 0 to len - 1 do
        let mutable a = 0 
        let mutable c = 0 
        let mutable g = 0 
        let mutable t = 0 
        let mutable k = pos 
        while k + 3 <= len - 1 do
            for i = k to k + 3 do
                match aString.[i] with
                | 'A' -> a <- a + 1
                | 'C' -> c <- c + 1
                | 'G' -> g <- g + 1
                | _ -> t <- t + 1
            k <- k + 4 
            if a=c && c=g && g=t then count <- count + 1               
    count
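
If O(n²) is still too slow near the upper end of the length constraint, one possible alternative (not part of the answer above) is a prefix-difference count. A substring is balanced exactly when the triple (#A - #C, #A - #G, #A - #T) is the same for the prefix that ends just before it and the prefix that ends with it, so it is enough to count pairs of prefixes with equal triples. The sketch below assumes hash lookups take constant time on average; the name numberSubStringsLinear is hypothetical, and the result is an int64 as a precaution against overflow on long inputs.

```fsharp
open System.Collections.Generic

// Hypothetical name; a prefix-difference sketch, not the method from the answer above.
let numberSubStringsLinear (aString : string) : int64 =
    // seen.[key] = number of prefixes seen so far whose counts give this key,
    // where key = (#A - #C, #A - #G, #A - #T).
    let seen = Dictionary<int * int * int, int64>()
    seen.[(0, 0, 0)] <- 1L                 // the empty prefix
    let mutable a = 0
    let mutable c = 0
    let mutable g = 0
    let mutable t = 0
    let mutable count = 0L
    for ch in aString do
        match ch with
        | 'A' -> a <- a + 1
        | 'C' -> c <- c + 1
        | 'G' -> g <- g + 1
        | _ -> t <- t + 1                  // as in the versions above, anything else counts as 'T'
        let key = (a - c, a - g, a - t)
        // Each earlier prefix with the same key starts one balanced substring ending here.
        let earlier =
            match seen.TryGetValue(key) with
            | true, n -> n
            | _ -> 0L
        count <- count + earlier
        seen.[key] <- earlier + 1L
    count

printfn "%d" (numberSubStringsLinear "ACGTACGT")     // 6, as in Example 1
printfn "%d" (numberSubStringsLinear "AAACCGGTTT")   // 1, as in Example 2
```

The trade-off is extra memory for the dictionary, which holds at most one entry per prefix.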
