Data Science

Our group specializes in the analysis of high-dimensional data in the life sciences. We develop tools for integrating different types of life-science data, such as FTIR and Raman spectroscopy data and data from proteomics, metabolomics, DNA and mRNA analyses. Our focus is on methods based on latent variables, for example multiblock methods for integrating different types of data and sparse methods for biomarker detection.

For many years, members of our group have invested heavily in staying at the forefront of advanced multivariate data analysis for vibrational spectroscopic techniques such as FTIR and Raman. We have contributed substantially to the development of pre-processing techniques that separate scatter contributions from chemical information in infrared spectra of biological materials. These techniques are widely used in the life sciences and in medicine.
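The idea of separating scatter contributions from chemical information can be illustrated with basic multiplicative signal correction (MSC). This is a minimal sketch, not the group's own extended (EMSC) implementations: it assumes each spectrum is a reference spectrum distorted only by a constant baseline offset and a multiplicative scaling, and the function name `msc` and the synthetic band are hypothetical.

```python
import numpy as np

def msc(spectra, reference=None):
    """Basic multiplicative signal correction (illustrative sketch).

    Each spectrum x is modelled as x ~ a + b * reference. The offset a and
    scaling b are treated as physical (scatter-like) effects and removed,
    giving the corrected spectrum (x - a) / b.
    """
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        # A common default: use the mean spectrum as the reference.
        reference = spectra.mean(axis=0)
    reference = np.asarray(reference, dtype=float)
    # Design matrix: constant column (offset) and reference column (scaling).
    design = np.column_stack([np.ones_like(reference), reference])
    # Least-squares fit for all spectra at once; coef has shape (2, n_spectra).
    coef, *_ = np.linalg.lstsq(design, spectra.T, rcond=None)
    offsets, scales = coef[0], coef[1]
    return (spectra - offsets[:, None]) / scales[:, None]

# Synthetic demonstration: one Gaussian band with made-up offsets and scalings.
wavenumbers = np.linspace(1000.0, 1800.0, 200)
pure = np.exp(-((wavenumbers - 1650.0) / 30.0) ** 2)
distorted = np.vstack([0.1 + 1.5 * pure, -0.05 + 0.7 * pure, 0.2 + 1.0 * pure])
corrected = msc(distorted, reference=pure)
```

In this synthetic case the three corrected spectra coincide with the reference band. Real spectra of biological materials show more complex scatter effects, which is why extended models with additional terms are used in practice.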

We develop classification and calibration techniques tailored to FTIR spectroscopic data from biological materials. FTIR spectroscopy has proven to be a powerful tool for the identification and characterization of microorganisms such as bacteria, yeasts and moulds, and of pollen. It is a sensitive biophysical technique that identifies microorganisms at the species level and, in some cases, even at the strain level. Spectral reference libraries are used to establish hierarchical classification trees that allow identification of microorganisms at different taxonomic levels. We also specialize in sparse calibration methods, which yield robust calibration models that are easy to interpret. Biomarker selection tools provide valuable information on the principal differences among classes.
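A hierarchical classification tree built from a reference library can be sketched as a two-level classifier: first assign the genus, then choose among the species of that genus only. The sketch below swaps in a simple nearest-centroid rule at each node for illustration; it is not the method used in the group's publications, and the three-point "spectra" and all labels are made up.

```python
import numpy as np

def class_centroids(X, labels):
    """Map each label to the mean spectrum of its reference samples."""
    labels = np.asarray(labels)
    return {str(lab): X[labels == lab].mean(axis=0) for lab in np.unique(labels)}

def nearest_label(centroids, x):
    """Return the label whose centroid is closest to x (Euclidean distance)."""
    return min(centroids, key=lambda lab: np.linalg.norm(x - centroids[lab]))

def fit_tree(X, genus, species):
    """Fit one genus-level node and one species-level node per genus."""
    X = np.asarray(X, dtype=float)
    genus, species = np.asarray(genus), np.asarray(species)
    genus_model = class_centroids(X, genus)
    species_models = {
        g: class_centroids(X[genus == g], species[genus == g]) for g in genus_model
    }
    return genus_model, species_models

def predict(tree, x):
    """Classify top-down: genus first, then species within that genus."""
    genus_model, species_models = tree
    x = np.asarray(x, dtype=float)
    g = nearest_label(genus_model, x)
    s = nearest_label(species_models[g], x)
    return g, s

# Hypothetical reference library (made-up numbers, not real spectra).
library_spectra = [
    [1.0, 0.0, 0.0], [1.1, 0.1, 0.0],   # genus_A, species_A1
    [1.0, 0.0, 0.5], [0.9, 0.1, 0.6],   # genus_A, species_A2
    [0.0, 1.0, 0.0], [0.1, 1.1, 0.0],   # genus_B, species_B1
]
library_genus = ["genus_A"] * 4 + ["genus_B"] * 2
library_species = ["species_A1"] * 2 + ["species_A2"] * 2 + ["species_B1"] * 2

tree = fit_tree(library_spectra, library_genus, library_species)
result_a = predict(tree, [0.95, 0.05, 0.55])
result_b = predict(tree, [0.0, 1.0, 0.1])
```

Because each node only separates the classes below it, the model at every level can focus on the spectral differences that matter at that taxonomic level.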

In addition to identification, the biochemical composition of microorganisms can be estimated from FTIR spectra using sparse PLSR-based models. This makes it possible to screen large numbers of strains for oil production or other valuable components. We specialize in establishing calibration models that predict metabolites in cells from FTIR spectra, for example to estimate lipid profiles in biotechnologically relevant strains.
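The principle behind sparse PLSR can be sketched with a single component: compute the usual PLS weight vector, shrink small weights to exactly zero by soft-thresholding, and regress the response on the resulting score. This is a simplified illustration under stated assumptions, not the algorithm from the group's publications; the function names, the `sparsity` parameter and the synthetic orthogonal design are hypothetical.

```python
import numpy as np

def soft_threshold(v, lam):
    """Shrink entries towards zero; entries with |v| <= lam become zero."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_pls1_component(X, y, sparsity=0.5):
    """One-component sparse PLS regression (illustrative sketch).

    sparsity in [0, 1) sets the threshold as a fraction of the largest
    absolute weight; 0 gives the ordinary one-component PLS1 weights.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc                                   # covariance-based weights
    w = soft_threshold(w, sparsity * np.abs(w).max())
    w /= np.linalg.norm(w)                          # unit-length sparse weights
    t = Xc @ w                                      # latent scores
    q = (t @ yc) / (t @ t)                          # regress y on the scores
    coef = w * q                                    # coefficients per variable
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Synthetic design: 20 mutually orthogonal cosine "variables", 64 samples.
n, p = 64, 20
i = np.arange(n)[:, None]
k = np.arange(1, p + 1)[None, :]
X = np.cos(2.0 * np.pi * i * k / n)
# The response depends strongly on variables 3 and 12 and weakly on variable 7.
y = 2.0 * X[:, 3] - 2.0 * X[:, 12] + 0.01 * X[:, 7]

coef, intercept = sparse_pls1_component(X, y, sparsity=0.5)
selected = np.flatnonzero(coef)
```

For this design only variables 3 and 12 receive nonzero coefficients, which shows how sparsity keeps a calibration model interpretable: the selected variables can be read as candidate biomarkers. Practical models use several components and choose the degree of sparsity by validation.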

 

Orange and Quasar:

Orange is an open-source toolkit for data visualization, machine learning and data mining. It features a visual programming front end for exploratory data analysis and interactive data visualization, and can also be used as a Python library. The platform contains a number of data mining tools for classification, prediction and visualization of multivariate spectral data. It was recently extended to Orange Infrared, with tools specifically developed for the visualization and analysis of vibrational spectra, by an international initiative in which BioSpec Norway is heavily involved. The extension provides visualization tools for vibrational spectroscopic data at all scales, including complex data such as 3D data cubes from macro and nano vibrational spectroscopy. See the tutorial video for Spectral Orange, a part of Orange for analyzing spectroscopy data.


Quasar is an open-source project: a collection of data analysis toolboxes extending the Orange suite.

 

 

Projects:

PHOTONFOOD - Flexible Mid-Infrared Photonic Solution for Rapid Farm-to-Fork Sensing of Food Contaminants
European Commission (H2020-ICT-2020-2, Project Nº 101016444)

DigiFoods - Digital Food Quality
Research Council of Norway (SFI project Nº. 309259)

DeepHyperSpec - Combining spectral and image information in the analysis of hyperspectral imaging data
Research Council of Norway (FRINATEK project Nº. 289518)

Belanoda - Multidisciplinary graduate and post-graduate education in big data analysis for life sciences
Senter for internasjonalisering av utdanning (SiU-CPEA-LT-2016/10126)

MIRACLE - Mid-infrared arthroscopy innovative imaging system for real-time clinical in depth examination and diagnosis of degenerative joint diseases
European Commission (H2020-ICT Nº. 780598)

LipoFungi - Bioconversion of low-cost fat materials into high-value PUFA-Carotenoid-rich biomass
Research Council of Norway (BIONÆR, project Nº. 268305)

Single Cell Oil - Single cell oil PUFA production by food rest materials
Research Council of Norway (BIONÆR, project Nº. 234258)

BigSpecData - Data analysis for big vibrational spectroscopic data
Research Council of Norway (IS-AUR, project Nº. 281263)

FUST - Source tracking and monitoring of mould contamination in food production
European Commission (FP7-SME Nº. 315271)

MERITS - Development and mitigation of metabolic syndrome
Innovation Fund Denmark (Grant Nº. 2014-5158)

AMS - New approaches for management and breeding of dairy cows in automatic milking systems
Research Council of Norway (MATFONDAVTALE, Project Nº 244231)

 

Master of Science in Data Science:

The MSc program at NMBU combines the disciplines of informatics, mathematics, statistics and data analysis. You can find out more about the program HERE and HERE.

 

Literature:

Kong B., Brandsrud M.A., Heitmann Solheim J., Nedrebø I., Blümel R., Kohler A.
Effects of the coupling of dielectric spherical particles on signatures in infrared microspectroscopy
Scientific Reports 12 (2022) 13327

Akulava V., Miamin U., Akhremchuk K., Valentovich L., Dolgikh A., Shapaval V.
Isolation, Physiological Characterization, and Antibiotic Susceptibility Testing of Fast-Growing Bacteria from the Sea-Affected Temporary Meltwater Ponds in the Thala Hills Oasis (Enderby Land, East Antarctica).
Biology 11 (2022) 1143

Smirnova M., Tafintseva V., Kohler A., Miamin U., Shapaval V.
Temperature-and Nutrients-Induced Phenotypic Changes of Antarctic Green Snow Bacteria Probed by High-Throughput FTIR Spectroscopy.
Biology 11 (2022) 890

Rehman H.U., Tafintseva V., Zimmermann B., Solheim J., Virtanen V., Shaikh R., Nippolainen E., Afara I., Saarakkala S., Rieppo L., Krebs P., Fomina P., Mizaikoff B., Kohler A.
Preclassification of broadband and sparse infrared data by multiplicative signal correction approach.
Molecules 27 (2022) 2298

Virtanen V., Tafintseva V., Shaikh R., Nippolainen E., Haas J., Afara I., Töyräs J., Kröger H., Solheim J., Zimmermann B., Kohler A., Mizaikoff B., Finnilä M., Rieppo L., Saarakkala S.
Infrared spectroscopy is suitable for objective assessment of articular cartilage health
Osteoarthritis and Cartilage Open 4 (2022) 100250

Tafintseva V., Lintvedt T.A., Solheim J., Zimmermann B., Rehman H.U., Virtanen V., Shaikh R., Nippolainen E., Afara I., Saarakkala S., Rieppo L., Krebs P., Fomina P., Mizaikoff B., Kohler A.
Preprocessing Strategies for Sparse Infrared Spectroscopy: A Case Study on Cartilage Diagnostics.
Molecules 27 (2022) 873

Heitmann Solheim J., Borondics F., Zimmermann B., Sandt C., Muthreich F., Kohler A.
An automated approach for fringe frequency estimation and removal in infrared spectroscopy and hyperspectral imaging of biological samples
Journal of Biophotonics 14 (2021) e202100148

Figoli C.B., Garcea M., Bisioli C., Tafintseva V., Shapaval V., Gómez Peña M., Gibbons L., Althabe F., Yantorno O.M., Horton M., Schmitt J., Lasch P., Kohler A., Bosch A.
A robust metabolomics approach for the evaluation of human embryos from in vitro fertilization
Analyst 146 (2021) 6156

Blazhko U., Shapaval V., Kovalev V., Kohler A.
Comparison of augmentation and pre-processing for deep learning and chemometric classification of infrared spectra
Chemometrics and Intelligent Laboratory Systems 215 (2021) 104367.

Virtanen V., Nippolainen E., Shaikh R., Afara I.O., Töyräs J., Solheim J., Tafintseva V., Zimmermann B., Kohler A., Saarakkala S., Rieppo L.
Infrared Fiber-Optic Spectroscopy Detects Bovine Articular Cartilage Degeneration
Cartilage (2021)

Curtasu M.V., Tafintseva V., Bendiks Z., Marco M.L., Kohler A., Xu Y., Nørskov N.P., Nygaard Lærke H., Knudsen K.E.B., Hedemann M.S.
Obesity-Related Metabolome and Gut Microbiota Profiles of Juvenile Göttingen Minipigs—Long-Term Intake of Fructose and Resistant Starch
Metabolites 10 (2020) 456

Tafintseva V., Shapaval V., Smirnova M., Kohler A.
Extended multiplicative signal correction for FTIR spectral quality test and pre‐processing of infrared imaging data.
Journal of Biophotonics 13 (2020) e201960112

Trukhan S., Tafintseva V., Tøndel K., Großerueschkamp F., Mosig A., Kovalev V., Gerwert K., Kohler A.
Grayscale representation of infrared microscopy images by Extended Multiplicative Signal Correction for registration with histological images.
Journal of Biophotonics 13 (2020) e201960223

Sirovica S., Heitmann Solheim J., Skoda M.W.A., Hirschmugl C.J., Mattson E.C., Aboualizadeh E., Guo Y., Chen X., Kohler A., Romanyk D.L., Rosendahl S.M., Morsch S., Martin R.A., Addison O.
Origin of micro-scale heterogeneity in polymerisation of photo-activated resin composites.
Nature Communications 11 (2020) 1849

Kohler A., Solheim J., Tafintseva V., Zimmermann B., Shapaval V. 
Model-Based Pre-Processing in Vibrational Spectroscopy
Comprehensive Chemometrics, Second Edition (2020) 83

Solheim J., Gunko E., Petersen D., Grosserruschkamp F., Gerwert K., Kohler A.
An open source code for Mie Extinction EMSC for infrared microscopy spectra of cells and tissues.
Journal of Biophotonics 12 (2019) e201800415

Guo S., Kohler A., Zimmermann B., Heinke R., Stöckel S., Rösch P., Popp J., Bocklitz T.
EMSC based model transfer for Raman spectroscopy in biological applications.
Analytical Chemistry 90 (2018) 9787.

Tafintseva V., Vigneau E., Shapaval V., Cariou V., Qannari E.M., Kohler A.
Hierarchical classification of microorganisms based on high-dimensional phenotypic data.
Journal of Biophotonics 11 (2018)

Karaman I., Nørskov N.P., Yde C.C., Skou Hedemann M., Bach Knudsen K.E., Kohler A. 
Sparse multi-block PLSR for biomarker discovery when integrating data from LC-MS and NMR metabolomics.
Metabolomics 11 (2015) 367.

Shapaval V., Afseth N.K., Vogt G., Kohler A. 
Fourier Transform Infrared Spectroscopy for the prediction of fatty acid profiles in Mucor fungi in media with different carbon sources.
Microbial Cell Factories 4 (2014)

Kohler A., Boecker U., Shapaval V., Forsmark A., Andersson M., Warringer J., Martens H., Omholt S.W., Blomberg A.
High-throughput biochemical fingerprinting of Saccharomyces cerevisiae by Fourier transform infrared spectroscopy.
PLOS One 10 (2015) e0118052

Hovde Liland K., Kohler A., Shapaval V.   
Hot PLS—a framework for hierarchically ordered taxonomic classification by partial least squares.
Chemometrics and Intelligent Laboratory Systems 15 (2014)

Hassani S., Hanafi M., Qannari M., Kohler A. 
Deflation strategies for multi-block principal component analysis revisited.
Chemometrics and Intelligent Laboratory Systems. 120 (2013) 154–168.

Karaman I., Qannari M., Martens H., Skou Hedemann M., Bach Knudsen K.E., Kohler A.
Comparison of Sparse and Jack-knife partial least squares regression methods for variable selection. 
Chemometrics and Intelligent Laboratory Systems. 122 (2013) 65-77.

Eslami A., Qannari E.M., Kohler A., Bougeard S.
Analyses factorielles de données structurées en groupes d’individus (Multivariate data analysis of multi-group datasets). 
Journal de la Société Française de Statistique 44 (2013) 2102.

Eslami A., Qannari E.M., Kohler A., Bougeard S.
General overview of methods of analysis of multi-group datasets. 
Revue des Nouvelles Technologies de l’information (RNTI) 25 (2013) 113.

Shapaval V., Schmitt J., Møretrø T., Suso H.P., Skaar I., Åsli A.W., Lilehaug D., Kohler A.
Characterization of food spoilage fungi by FTIR spectroscopy.
Journal of Applied Microbiology 114 (2013)

Hassani S., Martens H., Qannari M., Kohler A. 
Degrees of freedom estimation in Principal Component Analysis and consensus principal component analysis.
Chemometrics and Intelligent Laboratory Systems 118 (2012) 246-259

Hassani S., Martens H., Qannari M., Hanafi M., Kohler A. 
Model validation and error estimation in multi-block partial least squares regression.
Chemometrics and Intelligent Laboratory Systems 117 (2012) 42-53

Hanafi M., Kohler A., Qannari M.
Connections between multiple co-inertia analysis and consensus principal component analysis.
Chemometrics and Intelligent Laboratory Systems 106 (2011) 37-40. 

Hanafi M., Kohler A., Qannari M.
Shedding new light on Hierarchical Principal Component Analysis.
Journal of Chemometrics 24 (2010) 703-709.

Hassani S., Martens H., Qannari M., Hanafi M., Borge G.I., Kohler A. 
Analysis of -omics data: Graphical interpretation- and validation tools in multi-block methods.
Chemometrics and Intelligent Laboratory Systems 104 (2010) 140-153.

 

Published 23. May 2016 - 14:59 - Updated 5. October 2022 - 13:15