Hierarchical classification of microorganisms based on high-dimensional phenotypic data
V. Tafintseva, E. Vigneau, V. Shapaval, V. Cariou, E.M. Qannari, A. Kohler
Classifying microorganisms with high-dimensional phenotyping methods such as Fourier transform infrared (FTIR) spectroscopy is often complicated by the complexity of microbial phylogenetic taxonomy. A hierarchical structure developed for such data can often facilitate the classification analysis. The hierarchical tree structure can either be imposed on a given set of phenotypic data by integrating the phylogenetic taxonomic structure, or be set up by revealing the inherent clusters in the phenotypic data. In this study, we compare different approaches to hierarchical classification of microorganisms based on high-dimensional phenotypic data. A set of 19 different species of moulds (filamentous fungi) obtained from the mycological strain collection of the Norwegian Veterinary Institute (Oslo, Norway) is used for the study. Hierarchical cluster analysis is performed to set up the classification trees. Classification algorithms such as Artificial Neural Networks (ANN), Partial Least Squares Discriminant Analysis (PLSDA), and Random Forest (RF) are used and compared. ANN and RF outperformed all the other approaches even though they did not use a predefined hierarchical structure. To our knowledge, the Random Forest approach is used here for the first time to classify microorganisms by FTIR spectroscopy.
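The data-driven route to a classification tree mentioned above can be illustrated with a minimal sketch. It is not the procedure used in the study: it assumes average-linkage agglomerative clustering of class-mean spectra with Euclidean distance, and it uses a nearest-class-mean rule at each node as a simple stand-in for the ANN, PLSDA, and RF classifiers. The class labels "A" to "D", the two-variable toy "spectra", and the helper names are all hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def class_means(spectra, labels):
    """Average the spectra belonging to each class label."""
    groups = {}
    for x, y in zip(spectra, labels):
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(col) for col in zip(*rows)]
            for y, rows in groups.items()}


def average_linkage_tree(means):
    """Agglomerative clustering of class means; returns a nested tuple tree."""
    # Each cluster is a pair: (subtree, list of member labels).
    clusters = [(label, [label]) for label in sorted(means)]

    def dist(a, b):
        # Average pairwise Euclidean distance between member class means.
        pairs = [(p, q) for p in a[1] for q in b[1]]
        return sum(math.dist(means[p], means[q]) for p, q in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters into a new internal node.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


def leaves(tree):
    """List the class labels under a node."""
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])


def classify(tree, means, x):
    """Descend the tree, taking at each node the branch whose nearest
    class mean is closest to x (stand-in for a trained node classifier)."""
    while not isinstance(tree, str):
        tree = min(tree, key=lambda branch: min(math.dist(x, means[label])
                                                for label in leaves(branch)))
    return tree


# Toy data (made up): two measurements per hypothetical class.
spectra = [[0.0, 0.0], [0.2, 0.0], [0.0, 1.0], [0.2, 1.0],
           [10.0, 10.0], [10.2, 10.0], [10.0, 11.5], [10.2, 11.5]]
labels = ["A", "A", "B", "B", "C", "C", "D", "D"]

means = class_means(spectra, labels)
tree = average_linkage_tree(means)
print(tree)                                 # nested tuple describing the tree
print(classify(tree, means, [9.9, 11.2]))   # predicted class label
```

In this sketch each internal node of the tree poses a smaller classification problem than the flat 4-class task. A flat classifier, such as the ANN and RF models compared in the study, would instead assign the class label in a single step without using the tree.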