Bios2mds: An R Package for Comparing Orthologous Protein Families by Metric Multidimensional Scaling

Title: Bios2mds: An R Package for Comparing Orthologous Protein Families by Metric Multidimensional Scaling
Author(s):
Pelé, Julien;
Becu, Jean-Michel;
Abdi, Hervé;
Chabbert, Marie
Format: text
Item Type: article
Keywords: Evolution; Phylogeny; Proteins; R (Computer program language)
Abstract:
Background: The distance matrix computed from multiple alignments of homologous sequences is widely used by distance-based phylogenetic methods to provide information on the evolution of protein families. This matrix can also be visualized in a low dimensional space by metric multidimensional scaling (MDS). Applied to protein families, MDS provides information complementary to the information derived from tree-based methods. Moreover, MDS gives a unique opportunity to compare orthologous sequence sets because it can add supplementary elements to a reference space.

Results: The R package bios2mds (from BIOlogical Sequences to MultiDimensional Scaling) has been designed to analyze multiple sequence alignments by MDS. Bios2mds starts with a sequence alignment, builds a matrix of distances between the aligned sequences, and represents this matrix by MDS to visualize a sequence space. This package also offers the possibility of performing K-means clustering in the MDS derived sequence space. Most importantly, bios2mds includes a function that projects supplementary elements (a.k.a. "out of sample" elements) onto the space defined by reference or "active" elements. Orthologous sequence sets can thus be compared in a straightforward way. The data analysis and visualization tools have been specifically designed for an easy monitoring of the evolutionary drift of protein sub-families.

Conclusions: The bios2mds package provides the tools for a complete integrated pipeline aimed at the MDS analysis of multiple sets of orthologous sequences in the R statistical environment. In addition, as the analysis can be carried out from user provided matrices, the projection function can be widely used on any kind of data.
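
The pipeline summarized in the abstract (distances between aligned sequences, metric MDS of the active elements, projection of supplementary elements) can be illustrated with a minimal sketch. It is written in plain Python rather than R, and every function name in it (`p_distance`, `leading_eigenpairs`, `mds_fit`, `project`) is hypothetical: none of them is part of the bios2mds API, and the package's own distance measures and numerical routines may differ. The sketch assumes equal weights for all active elements, skips alignment columns that contain a gap, and uses power iteration with deflation in place of a full eigendecomposition, which is only adequate when the leading eigenvalues are positive and well separated.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing residues between two aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped.
    """
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def leading_eigenpairs(mat, k, iters=500):
    """Top-k eigenpairs of a symmetric matrix (power iteration + deflation)."""
    n = len(mat)
    work = [row[:] for row in mat]
    result = []
    for _component in range(k):
        vec = [float(i + 1) for i in range(n)]  # arbitrary non-constant start
        for _step in range(iters):
            new = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in new))
            vec = [x / norm for x in new]
        # Rayleigh quotient of the unit-length vector gives the eigenvalue.
        lam = sum(vec[i] * work[i][j] * vec[j] for i in range(n) for j in range(n))
        result.append((lam, vec))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                work[i][j] -= lam * vec[i] * vec[j]
    return result


def mds_fit(dist, k=2):
    """Metric MDS of the active elements from a square distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    # Double-centred cross-product matrix B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    pairs = leading_eigenpairs(b, k)
    eigvals = [lam for lam, _vec in pairs]
    coords = [[vec[i] * math.sqrt(lam) for lam, vec in pairs] for i in range(n)]
    return coords, eigvals, row_mean


def project(sup_dist, coords, eigvals, row_mean):
    """Coordinates of one supplementary element in the active space.

    sup_dist holds its distances to the active elements, in the same order.
    """
    n = len(coords)
    s = [-0.5 * (d * d - row_mean[i]) for i, d in enumerate(sup_dist)]
    mean_s = sum(s) / n
    s = [x - mean_s for x in s]
    return [sum(s[i] * coords[i][c] for i in range(n)) / eigvals[c]
            for c in range(len(eigvals))]


# Usage with a user-provided matrix: four active elements at the corners of a
# 4-by-2 rectangle, and one supplementary element at the rectangle's centre.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 2.0), (4.0, 2.0)]
active_dist = [[math.dist(p, q) for q in points] for p in points]
coords, eigvals, row_mean = mds_fit(active_dist, k=2)
centre = (2.0, 1.0)
sup_coords = project([math.dist(centre, p) for p in points],
                     coords, eigvals, row_mean)
```

In this toy case the supplementary element lands at the origin of the active space, up to rounding, because it sits at the centroid of the active elements. Projecting an active element with its own row of distances returns that element's fitted coordinates, which is a convenient sanity check. For sequence data, the distance matrix would be built first, for example with `[[p_distance(a, b) for b in seqs] for a in seqs]`.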
ISSN: 1471-2105
Persistent Link: http://dx.doi.org/10.1186/1471-2105-13-133
http://hdl.handle.net/10735.1/2864
Terms of Use: © 2012 Pelé et al.; licensee BioMed Central Ltd.

Files in this item: BBS-FR-Abdi-310612.91.pdf (PDF, 1.109Mb)

This item appears in the following Collection(s)


  • Abdi, Hervé
    Professor of Cognitive-Neuroscience and Cognitive Psychology