Data Science

Our group specializes in the analysis of high-dimensional data in the life sciences. We develop tools for integrating different types of life science data, such as FTIR and Raman spectroscopy data and data from proteomics, metabolomics, DNA and mRNA analyses. Our methodological focus is on latent-variable methods, for example multiblock methods for integrating different types of data and sparse methods for biomarker detection.

For many years, members of our group have invested heavily in staying at the forefront of advanced multivariate data analysis for vibrational spectroscopic techniques such as FTIR and Raman. We have contributed substantially to the development of pre-processing techniques that separate scatter contributions from chemical information in infrared spectra of biological materials. These techniques are widely used in the life sciences and in medicine.
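As a rough illustration of what such a pre-processing step can look like, the sketch below implements a basic extended multiplicative signal correction (EMSC) with a polynomial baseline. It is a minimal sketch under stated assumptions, not the group's implementation: the function name `emsc_correct`, the quadratic baseline default and the synthetic spectrum are all chosen here for illustration.

```python
import numpy as np


def emsc_correct(spectra, reference, wavenumbers, poly_order=2):
    """Basic EMSC sketch: fit each spectrum as
    polynomial baseline + b * reference, then remove the baseline
    and divide by the multiplicative factor b."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    reference = np.asarray(reference, dtype=float)

    # Rescale the wavenumber axis to [-1, 1] so the polynomial
    # columns stay well conditioned.
    x = np.asarray(wavenumbers, dtype=float)
    x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0

    # Model matrix: baseline terms 1, x, x^2, ... followed by the reference.
    baseline = np.vander(x, poly_order + 1, increasing=True)
    model = np.column_stack([baseline, reference])

    # Least-squares fit of all spectra at once (one column per spectrum).
    coefs, *_ = np.linalg.lstsq(model, spectra.T, rcond=None)

    baseline_part = baseline @ coefs[:-1]  # additive, non-chemical part
    scale = coefs[-1]                      # multiplicative factor per spectrum
    corrected = (spectra.T - baseline_part) / scale
    return corrected.T, coefs.T


# Synthetic example (assumed data): two Gaussian bands distorted by an
# offset, a slope, a curvature and a multiplicative scaling.
wn = np.linspace(1000.0, 1800.0, 200)
reference = (np.exp(-((wn - 1650.0) / 20.0) ** 2)
             + 0.5 * np.exp(-((wn - 1240.0) / 30.0) ** 2))
t = (wn - 1400.0) / 400.0
raw = 0.3 + 1.7 * reference + 0.2 * t - 0.1 * t ** 2

corrected, coefs = emsc_correct(raw, reference, wn)
```

In this idealized case the distortion lies exactly within the model, so the corrected spectrum coincides with the reference. Real spectra also carry chemical differences, which remain in the corrected spectrum as residuals from the fit.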

We develop classification and calibration techniques tailored to FTIR spectroscopic data from biological materials. FTIR spectroscopy has proven to be a powerful tool for the identification and characterization of microorganisms such as bacteria, yeasts and moulds, and of pollen. It is a sensitive biophysical technique that identifies microorganisms at the species level and, in some cases, even at the strain level. Spectral reference libraries are used to establish hierarchical classification trees that allow identification of microorganisms at different taxonomic levels. We specialize in sparse calibration methods, which make it possible to establish robust calibration models that are easy to interpret. Biomarker selection tools provide valuable information on the principal differences among classes.
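The idea of a hierarchical classification tree can be sketched as a two-level classifier: one model assigns the genus, and a separate model per genus then assigns the species. The example below is an illustrative sketch only. It swaps in a simple nearest-centroid rule for the PLS-based classifiers used in practice, and the class names, labels and synthetic "spectra" are assumptions.

```python
import numpy as np


class NearestCentroid:
    """Assign each sample to the class with the closest mean vector."""

    def fit(self, X, labels):
        labels = np.asarray(labels)
        self.classes_ = np.unique(labels).tolist()
        self.centroids_ = np.array(
            [X[labels == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, X):
        # Euclidean distance from every sample to every class centroid.
        dist = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return [self.classes_[i] for i in dist.argmin(axis=1)]


class HierarchicalClassifier:
    """Two-level tree: predict the genus, then the species within it."""

    def fit(self, X, genus, species):
        X = np.asarray(X, dtype=float)
        genus, species = np.asarray(genus), np.asarray(species)
        self.genus_model_ = NearestCentroid().fit(X, genus)
        # One species-level model per genus, trained on that genus only.
        self.species_models_ = {
            g: NearestCentroid().fit(X[genus == g], species[genus == g])
            for g in np.unique(genus).tolist()
        }
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        result = []
        for row, g in zip(X, self.genus_model_.predict(X)):
            s = self.species_models_[g].predict(row[None, :])[0]
            result.append((g, s))
        return result


# Synthetic reference library (assumed data): two genera, two species each.
rng = np.random.default_rng(0)
centres = {
    ("A", "A1"): [0.0, 0.0, 0.0, 0.0, 0.0],
    ("A", "A2"): [1.0, 0.0, 0.0, 0.0, 0.0],
    ("B", "B1"): [10.0, 10.0, 0.0, 0.0, 0.0],
    ("B", "B2"): [11.0, 10.0, 0.0, 0.0, 0.0],
}
X, genus, species = [], [], []
for (g, s), centre in centres.items():
    X.append(np.array(centre) + 0.1 * rng.normal(size=(20, 5)))
    genus += [g] * 20
    species += [s] * 20
X = np.vstack(X)

tree = HierarchicalClassifier().fit(X, genus, species)
predictions = tree.predict(np.array(list(centres.values())))
```

Splitting the problem this way lets each node of the tree use a model tuned to the differences that matter at that taxonomic level. The trade-off is that an error at the genus level cannot be recovered at the species level.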

In addition to identification, FTIR analysis can provide the biochemical composition of microorganisms using sparse PLSR-based models. This makes it possible to screen large numbers of strains for the production of oil or other valuable components. We specialize in establishing calibration models for predicting metabolites in cells by FTIR spectroscopy, for example for estimating lipid profiles in biotechnologically relevant strains.
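To make the notion of a sparse PLSR calibration concrete, the sketch below fits a single-response PLS model in which each weight vector is soft-thresholded, so that variables with weak covariance to the response receive a weight of exactly zero. Several sparse PLSR formulations exist; this is one simple variant chosen for illustration, not necessarily the one used by the group. The function names, the `fraction` parameter and the synthetic data are assumptions.

```python
import numpy as np


def soft_threshold(w, fraction):
    """Shrink weights toward zero; entries whose magnitude is at most
    fraction * max|w| become exactly zero."""
    limit = fraction * np.max(np.abs(w))
    return np.sign(w) * np.maximum(np.abs(w) - limit, 0.0)


def sparse_pls1(X, y, n_components=2, fraction=0.5):
    """PLS1 with soft-thresholded weights; returns regression
    coefficients and an intercept for predicting y from X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xd, yd = X - x_mean, y - y_mean

    W, P, q = [], [], []
    for _ in range(n_components):
        w = soft_threshold(Xd.T @ yd, fraction)  # sparse weight vector
        norm = np.linalg.norm(w)
        if norm == 0.0:
            break  # no covariance left to model
        w = w / norm
        t = Xd @ w                     # scores
        tt = t @ t
        p = Xd.T @ t / tt              # X loadings
        c = yd @ t / tt                # y loading
        Xd = Xd - np.outer(t, p)       # deflate X
        yd = yd - c * t                # deflate y
        W.append(w)
        P.append(p)
        q.append(c)

    if not W:
        raise ValueError("no component could be extracted")
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


# Synthetic calibration set (assumed data): only variables 3 and 17
# carry information about the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 30))
y = 2.0 * X[:, 3] - 1.5 * X[:, 17] + 0.1 * rng.normal(size=200)

coef, intercept = sparse_pls1(X, y, n_components=2, fraction=0.5)
selected = np.flatnonzero(coef).tolist()  # indices of retained variables
```

Because variables that never enter a weight vector end up with a coefficient of exactly zero, the list of non-zero coefficients can be read directly as the set of selected variables, which is what makes such models easy to interpret.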


Orange and Quasar:

Orange is an open-source toolkit for data visualization, machine learning and data mining. It features a visual programming front end for explorative data analysis and interactive data visualization, and it can also be used as a Python library. The platform contains a number of data mining tools for classification, prediction and visualization of multivariate spectral data. It has been extended to Orange Infrared, with tools specifically developed for the visualization and analysis of vibrational spectra, by an international initiative in which BioSpec Norway is heavily involved. The extension provides visualization tools for vibrational spectroscopic data at all scales, including complex data such as 3D data cubes from macro and nano vibrational spectroscopy. A tutorial video is available for Spectral Orange, the part of Orange for analyzing spectroscopy data.

Quasar is an open-source project, a collection of data analysis toolboxes extending the Orange suite.

Projects:




DeepHyperSpec - Combining spectral and image information in the analysis of hyperspectral imaging data
Research Council of Norway (FRINATEK project Nº. TBD)

Belanoda - Multidisciplinary graduate and post-graduate education in big data analysis for life sciences
Senter for internasjonalisering av utdanning (SiU-CPEA-LT-2016/10126)

MIRACLE - Mid-infrared arthroscopy innovative imaging system for real-time clinical in depth examination and diagnosis of degenerative joint diseases
European Commission (H2020-ICT Nº. 780598)

LipoFungi - Bioconversion of low-cost fat materials into high-value PUFA-Carotenoid-rich biomass
Norwegian Research Council (BIONÆR, project Nº. 268305)

Single Cell Oil - Single cell oil PUFA production by food rest materials
Norwegian Research Council (BIONÆR, project Nº. 234258)

BigSpecData - Data analysis for big vibrational spectroscopic data
Norwegian Research Council (IS-AUR, project Nº. 281263)

FUST - Source tracking and monitoring of mould contamination in food production
European Commission (FP7-SME Nº. 315271)

MERITS - Development and mitigation of metabolic syndrome
Innovation Fund Denmark (Grant Nº. 2014-5158)

AMS - New approaches for management and breeding of dairy cows in automatic milking systems
Norwegian Research Council (MATFONDAVTALE, project Nº. 244231)


Master of Science in Data Science:

The MSc program at NMBU combines the disciplines of informatics, mathematics, statistics and data analysis.

Publications:



Tafintseva V., Shapaval V., Smirnova M., Kohler A.
Extended multiplicative signal correction for FTIR spectral quality test and pre‐processing of infrared imaging data.
Journal of Biophotonics 13 (2020)

Solheim J., Gunko E., Petersen D., Grosserruschkamp F., Gerwert K., Kohler A.
An open source code for Mie Extinction EMSC for infrared microscopy spectra of cells and tissues.
Journal of Biophotonics 12 (2019) e201800415

Guo S., Kohler A., Zimmermann B., Heinke R., Stöckel S., Rösch P., Popp J., Bocklitz T.
EMSC based model transfer for Raman spectroscopy in biological applications.
Analytical Chemistry 90 (2018) 9787.

Tafintseva V., Vigneau E., Shapaval V., Cariou V., Qannari E.M., Kohler A.
Hierarchical classification of microorganisms based on high-dimensional phenotypic data.
Journal of Biophotonics 11 (2018)

Karaman I., Nørskov N.P., Yde C.C., Skou Hedemann M., Bach Knudsen K.E., Kohler A. 
Sparse multi-block PLSR for biomarker discovery when integrating data from LC-MS and NMR metabolomics.
Metabolomics 11 (2015) 367.

Shapaval V., Afseth N.K., Vogt G., Kohler A. 
Fourier Transform Infrared Spectroscopy for the prediction of fatty acid profiles in Mucor fungi in media with different carbon sources.
Microbial Cell Factories 4 (2014)

Kohler A., Boecker U., Shapaval V., Forsmark A., Andersson M., Warringer J., Martens H., Omholt S.W., Blomberg A.
High-throughput biochemical fingerprinting of Saccharomyces cerevisiae by Fourier transform infrared spectroscopy.
PLOS One 10 (2014) e0118052

Hovde Liland K., Kohler A., Shapaval V.   
Hot PLS—a framework for hierarchically ordered taxonomic classification by partial least squares.
Chemometrics and Intelligent Laboratory Systems 15 (2014)

Hassani S., Hanafi M., Qannari M., Kohler A. 
Deflation strategies for multi-block principal component analysis revisited.
Chemometrics and Intelligent Laboratory Systems 120 (2013) 154–168.

Karaman I., Qannari M., Martens H., Skou Hedemann M., Bach Knudsen K.E., Kohler A.
Comparison of Sparse and Jack-knife partial least squares regression methods for variable selection.
Chemometrics and Intelligent Laboratory Systems 122 (2013) 65-77.

Eslami A., Qannari E.M., Kohler A., Bougeard S.
Analyses factorielles de données structurées en groupes d’individus (Multivariate data analysis of multi-group datasets). 
Journal de la Société Française de Statistique 44 (2013) 2102.

Eslami A., Qannari E.M., Kohler A., Bougeard S.
General overview of methods of analysis of multi-group datasets. 
Revue des Nouvelles Technologies de l’information (RNTI) 25 (2013) 113.

Shapaval V., Schmitt J., Møretrø T., Suso H.P., Skaar I., Åsli A.W., Lilehaug D., Kohler A.
Characterization of food spoilage fungi by FTIR spectroscopy.
Journal of Applied Microbiology 114 (2013)

Hassani S., Martens H., Qannari M., Kohler A. 
Degrees of freedom estimation in Principal Component Analysis and consensus principal component analysis.
Chemometrics and Intelligent Laboratory Systems 118 (2012) 246-259

Hassani S., Martens H., Qannari M., Hanafi M., Kohler A. 
Model validation and error estimation in multi-block partial least squares regression.
Chemometrics and Intelligent Laboratory Systems 117 (2012) 42-53

Hanafi M., Kohler A., Qannari M.
Connections between multiple co-inertia analysis and consensus principal component analysis.
Chemometrics and Intelligent Laboratory Systems 106 (2011) 37-40. 

Hanafi M., Kohler A., Qannari M.
Shedding new light on Hierarchical Principal Component Analysis. 
Journal of Chemometrics 24 (2010) 703-709.

Hassani S., Martens H., Qannari M., Hanafi M., Borge G.I., Kohler A. 
Analysis of –omics data: Graphical interpretation- and validation tools in multi–block methods.
Chemometrics and Intelligent Laboratory Systems 104 (2010) 140-153.


Published 23. May 2016 - 14:59 - Updated 19. March 2020 - 9:58