An automated signal alignment algorithm based on dynamic time warping for capillary electrophoresis data

Correcting retention time variation and measuring the similarity of time series are among the most common challenges in analyzing capillary electrophoresis (CE) data. In this study, an automated signal alignment method is proposed that modifies the dynamic time warping (DTW) approach to align time-series data. Preprocessing tools and further optimizations were developed to improve the performance of the algorithm. As a demonstrative case study, the algorithm is applied to CE data from a selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) evaluation of RNA secondary structure, where correcting time shifts is one of the main tasks in the analysis. The accuracy and execution time of the algorithm are illustrated with experimental results obtained by applying it to different types of data. These results show that the signal alignment algorithm efficiently corrects retention time variation. The developed tools can be readily adapted to the analysis of other biological datasets or time series.
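The abstract does not specify how the DTW approach was modified. As background, the following is a minimal sketch of classic DTW alignment in plain Python, not the authors' implementation. It assumes an absolute-difference local cost, an optional Sakoe-Chiba band (`window`) as one common speed optimization, and a simple rule that maps the sample onto the reference time axis by averaging the sample points matched to each reference point. The function names `dtw` and `warp_to_reference` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def dtw(reference: Sequence[float], sample: Sequence[float],
        window: Optional[int] = None) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW between two 1-D signals (a sketch, not the paper's variant).

    Returns the accumulated alignment cost and the warping path as a list of
    (reference_index, sample_index) pairs, ordered from start to end.
    `window` is an optional Sakoe-Chiba band half-width (an assumption).
    """
    n, m = len(reference), len(sample)
    if window is None:
        window = max(n, m)
    # The band must be at least |n - m| wide, otherwise (n, m) is unreachable.
    window = max(window, abs(n - m))
    # cost[i][j] = minimal cost of aligning reference[:i] with sample[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            local = abs(reference[i - 1] - sample[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # reference advances
                                     cost[i][j - 1],      # sample advances
                                     cost[i - 1][j - 1])  # both advance
    # Backtrack from (n, m), preferring the diagonal step on ties.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


def warp_to_reference(reference: Sequence[float], sample: Sequence[float],
                      path: List[Tuple[int, int]]) -> List[float]:
    """Resample `sample` onto the reference time axis using a warping path.

    Each reference index takes the mean of the sample values matched to it.
    """
    sums = [0.0] * len(reference)
    counts = [0] * len(reference)
    for i, j in path:
        sums[i] += sample[j]
        counts[i] += 1
    # A DTW path visits every reference index, so no count is zero.
    return [s / c for s, c in zip(sums, counts)]


if __name__ == "__main__":
    ref = [0, 0, 1, 3, 1, 0, 0, 0]
    smp = [0, 0, 0, 0, 1, 3, 1, 0]  # same peak, delayed by two samples
    total, path = dtw(ref, smp)
    print(total)                              # 0.0
    print(warp_to_reference(ref, smp, path))  # [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]
```

The band trades flexibility for speed, reducing the work from O(nm) to roughly O(n·window). Calling `dtw(ref, smp, window=1)` on the same signals yields a nonzero cost, because the two-sample shift falls outside the band; the band therefore has to be wider than the largest expected retention time shift.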
