将NMR ascii文件转换为峰列表 [英] Converting NMR ascii file to peak list

查看:113
本文介绍了将NMR ascii文件转换为峰列表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一些Bruker NMR光谱,我正在用它来创建一个程序,并将其作为项目的一部分.我的程序需要在实际频谱上工作.因此,我将布鲁克NMR光谱的1r文件转换为ASCII.对于Carnitine,这是ascii文件的外观(这不是完整列表.完整列表运行成千上万行.这只是一个快照):

I have some Bruker NMR spectra that i am using to create a program as part of a project. My program needs to work on the actual spectrum. So i converted the 1r files of the Bruker NMR spectra to ASCII. For Carnitine this is what the ascii file looks like(this is not the complete list. The complete list runs into thousands of lines. This is only a snapshot):

-0.807434   -23644  
-0.807067   -22980  
-0.806701   -22967  
-0.806334   -24513  
-0.805967   -27609  
-0.805601   -31145  
-0.805234   -33951  
-0.804867   -35553  
-0.804501   -35880  
-0.804134   -35240  
-0.803767   -34626  
-0.8034  -34613 
-0.803034   -34312  
-0.802667   -32411  
-0.8023  -28925 
-0.801934   -25177  
-0.801567   -22132  
-0.8012  -19395 

这就是频谱:
(来源: wisc.edu )

and this is what the spectrum is:
(source: wisc.edu)

我的程序必须从该数据中识别出峰.所以我需要知道如何解释这些数字.以及如何将它们准确地转换为频谱中的适当值.到目前为止,这是我所学到的:

My program has to identify the peaks from this data. So i need to know how to interpret these numbers. And how exactly they are converted into their appropriate values in the spectrum. So far this is what i have learnt:

1.)第一列代表光谱点位置(ppm)

1.) The first column represents the spectral point position (ppm)

2.)第二列代表每个峰的强度.

2.) The second column represents the intensity of each peak.

3.)注意,在第二列中有一些数字没有完全对齐,但更接近第一列.例如:-34613,-28925,-19395.我认为这很重要.

3.) notice that in the second column there are some numbers which are not perfectly aligned but are closer to the first column. Eg:-34613, -28925, -19395. I think this is significant.

为了全面披露-我正在用R语言进行编程.

For the sake of full disclosure- I am doing the programming in R.

注意:我也曾在Biostar中问过这个问题,但我认为我比在这里得到答案的机会更大,因为似乎没有多少人在那回答问题.

NOTE: I have also asked this in Biostar but i think i have a better chance of getting an answer here than there because not many people seem to be answering questions there.

这是我发现的一种可行的解决方案:

This is one solution that i have found is plausible:

一个朋友给我一个想法,使用awk脚本检查文件中强度从正变为负的确切位置,以找到局部最大值.这是一个工作脚本:

A friend gave me the idea to use an awk script to check where exactly the intensities in the file change from positive to negative to find the local maxima. Here is a working script:

awk 'BEGIN{dydx = 0;}
{ 
  if(NR > 1)
     { dydx = ($2 - y0)/($1 - x0); } 
  if(NR > 2 && last * dydx < 0)
     { printf( "%.4f  %.4f\n", (x0 + $1)/2, log((dydx<0)?-dydx:dydx)); } ;
  last=dydx; x0=$1; y0=$2
}' /home/chaitanya/Work/nmr_spectra/caffeine/pdata/1/spectrumtext.txt  | awk '$2 > 17'

如果您不理解,请告诉我.我将改进说明.

Tell me if you dont understand it. I will improve the explanation.

此外,还有我问过的与此相关的问题.

Also, there is this related question i asked.

推荐答案

这是一个带有可复制代码的可行示例.我不认为这对策略或编码有什么好处,但是它可以帮助您入门.

Here's a worked example with a reproducible piece of code. I don't claim it's any good with regard to the strategy or coding, but it could get you started.

find_peaks <- function (x, y, n.fine = length(x), interval = range(x), ...) {
  maxdif <- max(diff(x)) # longest distance between successive points

  ## selected interval for the search
  range.ind <- seq(which.min(abs(x - interval[1])),
                   which.min(abs(x - interval[2])))
  x <- x[range.ind]
  y <- y[range.ind]

  ## smooth the data
  spl <- smooth.spline(x, y, ...)
  ## finer x positions
  x.fine <- seq(range(x)[1], range(x)[2], length = n.fine)
  ## predicted y positions
  y.spl <- predict(spl, x.fine, der = 0)$y
  ## testing numerically the second derivative
  test <- diff(diff((y.spl), 1) > 0, 1)
  maxima <- which(test == -1) + 1

  ## according to this criterion, we found rough positions
  guess <- data.frame(x=x.fine[maxima], y=y.spl[maxima])

  ## cost function to maximize 
  obj <- function(x) predict(spl, x)$y

  ## optimize the peak position around each guess
  fit <- data.frame(do.call(rbind,
          lapply(guess$x, function(g) {
            fit <- optimize(obj, interval = g + c(-1,1) * maxdif, maximum=TRUE)
            data.frame(x=fit$maximum,y=fit$objective)
          })))

  ## return both guesses and fits
  invisible(list(guess=guess, fit=fit))
}

set.seed(123)
x <- seq(1, 15, length=100)
y <- jitter(cos(x), a=0.2)

plot(x,y)
res <- find_peaks(x,y)
points(res$guess,col="blue")
points(res$fit,col="red")

这篇关于将NMR ascii文件转换为峰列表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆