"We must remember that big data also can be easily misused," says Biostatistics Professor Solve Sæbø at Norwegian University of Life Sciences (NMBU).
As a professor of statistics, he takes a keener interest than most in the opportunities and pitfalls afforded by big data.
"The step from statistics to big data is a short one: for what is statistics, other than analysing large amounts of data and extracting information from large data sets?
"What is new is the sheer volume of data and the improvements in computer technology. Big data has enabled a quantum leap in nearly all aspects of research," says Sæbø.
Sæbø is himself conducting research into how the various personality types learn best.
He is collaborating with psychology Doctor Helge Brovold on the analysis of results from the educational test carried out by the Norwegian Centre for Science Recruitment, where so far around 50,000 young people have voluntarily answered questions related to science education.
The questions concerned career interests, personality, preferred learning method and preferences for science subjects. Sæbø has used the same test on the students taking the basic course in statistics at NMBU.
Part of the research is based on the standard Five Factor Model of Personality, which describes people along five traits: how emotionally reactive, open, conscientious, extroverted and agreeable they are.
The purpose is to find out how well standard lectures work for different personality types, compared with more active teaching forms, such as the 'flipped classroom' technique.
As part of the latter teaching method students learn the subject through relevant exercises, group work and discussion, i.e. they learn more through their own efforts. Before class, the students watch pre-recorded lectures in peace and quiet online.
Jazz or brass band
The results so far indicate that the personality types that collaborate well and gain knowledge through talking can benefit greatly from the flipped classroom pedagogy. The same applies to more creative personality types, which Sæbø calls 'jazz musicians'.
In this way, research based on big data can reveal how to attract more of the jazz musician types into the sciences, and not only the brass band musicians that have traditionally been attracted to sciences in the greatest numbers.
Can go wrong
Big data is information characterised by high volume, velocity and variety.
The analysis of the huge volumes of data is, of course, made very much easier by a machine doing the work and searching for patterns for us. Just imagine what an enormous task it would have been to count, record and compare the various answers from 50,000 people in the science subjects survey.
"This is how it is in a great many research projects today. The data can be collected incredibly easily and analysed in a variety of ways. And this is where expertise in statistics comes in, as such analyses can also push things right off track," emphasises Sæbø.
Cause of rheumatism
It is important to know the difference between one thing causing another (causality) and two things merely occurring together (correlation). A lot of 'fake news' can be generated by confusing the two.
Limited funding in a research project can be enough to produce false causality, since researchers often depend on having many test subjects.
As an example Sæbø mentions a research team that may want to investigate whether the cause of rheumatism can be found in the genetic make-up of the individuals who get the disease.
Genetic analysis methods have now become so advanced that researchers can test for, say, 500,000 different genetic variations in a single tissue sample. Let’s say the researchers take tissue samples from 20 persons: ten healthy individuals and ten with the disease. This is time-consuming and labour-intensive, so assume they do not have the financial means to test more.
They analyse the samples for 500,000 different markers and typically find that several such genetic markers coincide with the presence of rheumatism. It is easy at this stage to jump to false conclusions if you have not taken into account a statistical problem known as multiple testing. Sæbø says:
"Because they are testing so many variables, it is highly likely that one or more completely random markers have measurements that go up among the sick subjects and go down among the healthy subjects."
The researchers may thus conclude that the random markers are an indication or, in the worst case, the cause of the disease.
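How likely such a chance finding is can be roughly estimated. The Python sketch below uses the article's figures (500,000 markers, ten sick and ten healthy subjects), but everything else is an illustrative assumption rather than the researchers' actual method: the markers are treated as independent pure noise, and a marker counts as a "hit" only if every sick subject measures higher than every healthy one. The function names are hypothetical.

```python
import math
import random

N_MARKERS = 500_000  # markers tested per sample (figure from the article)
N_PER_GROUP = 10     # ten healthy and ten sick subjects (from the article)


def p_perfect_split(n_per_group: int) -> float:
    """Chance that one pure-noise marker ranks every sick subject above every
    healthy one: a single favourable grouping out of C(2n, n) equally likely ones."""
    return 1 / math.comb(2 * n_per_group, n_per_group)


def expected_false_hits(n_markers: int, p: float) -> float:
    """Expected number of noise markers that pass a test with false-positive rate p."""
    return n_markers * p


def p_at_least_one(n_markers: int, p: float) -> float:
    """Probability that at least one of n independent noise markers passes.

    This is 1 - (1 - p)^n, computed via logarithms to stay accurate for tiny p.
    """
    return -math.expm1(n_markers * math.log1p(-p))


def simulate_perfect_splits(n_markers: int, n_per_group: int, rng: random.Random) -> int:
    """Count simulated noise markers whose lowest 'sick' value beats the highest 'healthy' value."""
    hits = 0
    for _ in range(n_markers):
        healthy = [rng.random() for _ in range(n_per_group)]
        sick = [rng.random() for _ in range(n_per_group)]
        if min(sick) > max(healthy):
            hits += 1
    return hits


if __name__ == "__main__":
    p = p_perfect_split(N_PER_GROUP)
    print("Chance per marker:", p)
    print("Expected chance hits:", expected_false_hits(N_MARKERS, p))
    print("P(at least one chance hit):", p_at_least_one(N_MARKERS, p))

    # Quick sanity check of the formula on a smaller, faster case:
    # groups of three, where the theoretical rate is 1 in C(6, 3) = 20.
    small = simulate_perfect_splits(20_000, 3, random.Random(0))
    print("Simulated hits (groups of 3):", small, "expected about", 20_000 / 20)
```

Under these assumptions, a single random marker separates the two groups perfectly only about once in 184,756 tries, yet with 500,000 markers one would expect roughly 2.7 such markers, and the chance of seeing at least one is about 93 per cent. A marker that looks like a flawless indicator of the disease may therefore be nothing but chance.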
"This can be easily revealed by retrieving data from 20 new subjects to check only these markers, but unfortunately very often such a follow-up study is not done as part of the main study,” he adds.
Goodbye to private life?
Big data makes it possible to create extremely complex systems that no human could ever build without computing power.
The combination of big data and artificial intelligence is yielding ever more devices to help us in our everyday lives, from autonomous cars and automated agricultural machinery to advanced prostheses and automatic face detection on your mobile phone. The possibilities are almost limitless.
Unfortunately, these tools can be used with less noble intentions than to create good teaching or to bring useful new knowledge to the masses through research, Sæbø admits.
Just think about all the information about ourselves that we post on social media. Facebook analyses our preferences, and shows us advertising for things we have shown interest in.
An overview of what we like on Facebook and search for on Google can provide revealing personality profiles right down to the individual level, provided the statistical expertise is there. This can in turn be used to create extremely personalised direct marketing to private individuals.
Knowledge is power
If the analysts have very many objects, for example all the Facebook users in the USA, and very many variables in the form of likes, click patterns and stated opinions, big data can yield terrifyingly accurate personality profiles.
Analyses of our Facebook and Twitter accounts can generate revealing knowledge about who we are, or at least about those parts of us we choose to share on the internet - and some people share an awful lot.
Big data opens up huge opportunities: within research, education, innovation and - manipulation. We must remain sceptical and vigilant - and a basic knowledge of statistics is also very useful.
Solve Sæbø rounds off laconically:
"The situation today is the same as in the 16th century, when Francis Bacon said: 'Knowledge is power'."