Course code STIN100

STIN100 Biological Data Analysis

There may be changes to the course due to coronavirus restrictions. See Canvas and StudentWeb for info.

Showing course contents for the educational year 2021-2022.

Course responsible: Jon Olav Vik, Torgeir Rhodén Hvidsten
Teachers: Jon Olav Vik, Simen Rød Sandve, Kathrine Frey Frøslie
ECTS credits: 10
Faculty: Faculty of Chemistry, Biotechnology and Food Science
Teaching language: NO
(NO=Norwegian, EN=English)
Limits of class size: 800
Teaching exam periods:
This course starts in the Fall semester. This course has teaching/evaluation in the Fall semester.
Course frequency: Annually
First time: 2018H
Course contents:

Biology has become a data-rich science with datasets that can no longer be analyzed manually. To extract knowledge from data, biologists need knowledge and skills in programming and data analysis that enable them to explore, visualize and interpret data. This must be done reproducibly, so that it is clear how the data has been processed and easy to modify the analyses if desired.

This course provides basic skills in the programming language R and introduces the student to common methods for visualization and analysis of multi-dimensional biological data. The course is organized around supervised student groups analysing relevant data sets.

In a time when trust in scientific knowledge is no longer obvious, yet challenges of sustainability require informed decisions, the understanding of data and verifiable production of knowledge are essential. STIN100 helps ensure that future employers and decision makers can rely on the knowledge basis prepared by our graduates.

Learning outcome:

KNOWLEDGE: The students will acquire

  • broad knowledge in handling, visualizing and analysing multidimensional biological data.
  • familiarity with how some of the most important biological data sets are generated and how this data should be preprocessed to correct for systematic errors.
  • a conceptual framework for mapping data to graphical elements.
  • a repertoire of programming techniques and concepts that are required to perform the analyses in the course.

SKILLS: Students will be able to

  • explain principles behind basic methods for data visualization and analysis.
  • write programs that perform basic data processing tasks (subsetting, transformation and groupwise summaries) and employ simple visualization and data analysis methods.
  • generate reproducible, executable reports that weave together expository text, program code and output.
  • propose biological interpretations of analysis results.
  • efficiently search documentation and internet resources to carry out analyses.
  • simplify data sets for prototyping and debugging of analyses.

COMPETENCES: Students will be well prepared to

  • explore datasets they encounter in later term papers, theses and working life.
  • perform reproducible research where data processing is fully documented through executable reports.
  • compose data graphics using elements appropriate to the data types and the biological structure in the data.
  • pose follow-up questions to data analyses for discussion with domain experts.
  • learn new methods and software packages with the help of documentation, code examples and web resources.
Learning activities:

This course will make you independent in exploring, processing and describing data, and in extracting biological meaning from data. Our main tools are the programming language R, the integrated development environment RStudio, and literate programming to create beautiful, reproducible reports from your analyses.

Each week begins with a plenary session which motivates new topics, concretizes the week's learning objectives, and introduces new ways of working. These sessions involve student discussions, pair programming and other activities.

In the middle of the week, students work on their own or in pairs. Teaching assistants are available in two real-time two-hour sessions per week (per student group), in person or in Microsoft Teams. Questions can also be posted in Discussions in Canvas.

During the first few weeks, weekly checkpoint quizzes are due Thursday at 17:00. The quizzes are designed to quickly verify that you can do what's required for the coming week. If something proves difficult, we address it on Friday. In the later parts of the course, there are hand-in assignments every other week, where you work in pairs, and finally three weeks for a final extended report.

Friday wrap-up sessions in Zoom address topics that proved challenging in the checkpoint quiz. We summarize what you have learned and outline the new possibilities that open up to you. We then preview the coming week and take questions.

It is possible to attend the course purely online. This enables students in quarantine or risk groups to attend, and in case of coronavirus lockdown we move all activity online. You will then collaborate via Teams with video chat and screen sharing, and set up shared folders in OneDrive to enable pair programming over the internet.

Our learning philosophy is: Active learning, in that you personally write programs and put into words what the data are telling you. Problem-based learning, centering on research questions relevant to NMBU. Collaborative learning, through pair programming and peer assessment. Student-driven, adaptive learning, in that drills and autogenerated exercises are available for you to practice what the checkpoint quizzes indicate that you need most.

Teaching support:

The week plans link to videos motivating each topic, concretize the week's learning objectives, and link to how-to videos, exercises and assignments, as explained in the video overview of the kinds of learning material in STIN100 (https://www.youtube.com/watch?v=eR-IzHm5358; Norwegian only, sorry).

Questions about data analysis and programming should preferably be posted in Discussions in Canvas with a reproducible example. This makes it easier to help you and shares the answer with the class. Asking effective questions is a key skill which you will learn during the course.

Checkpoint quizzes get individual feedback, partly automatically and partly from teaching assistants. Report assignments get feedback from teaching assistants.

Teaching assistants are available for questions in plenary sessions and in exercise classes.

See the "Syllabus" section for free online textbooks which we sometimes refer to.

Syllabus:

STIN100's focus is on doing, and the detailed learning objectives for each week state in very concrete terms what you should be capable of doing by the end of the week. The learning objectives are formulated to make it obvious to you and to us whether or not you have achieved them.

See lecture notes, exercises and handouts, and selected parts of the online textbooks Hands-On Programming with R (https://rstudio-education.github.io/hopr/) and R for Data Science (https://r4ds.had.co.nz), especially chapters 3 (Data visualisation), 9 (Introduction to data wrangling), 12 (Tidy data), 18 (Pipes) and 27 (R markdown) of the latter.

Recommended prerequisites:

Do I need a lot of mathematics? Biology? Computer skills?

You do not need to know a lot of math or biology to take the course, but you must know your file system, your keyboard, your web browser and your computer! For some this requires a major sprint the first week or so, but we offer drills with feedback tailored to your level!

Assessment:

Grading is pass/fail based on approved hand-ins of a number of tests and report assignments throughout the semester. If an item is not approved, you'll get specific guidance and one extra attempt.

Approved hand-ins are valid only in the current semester.

Nominal workload:
Plenary sessions: 54 hours. Exercise classes: 52 hours. Self study: 144 hours.
Entrance requirements:
MATRS - General admission requirements, and R1 or (S1+S2) or similar mathematical skills
Type of course:

Four weeks: 2 hours lecture with frequent computer exercises, 4 hours computer exercises with teacher and teaching assistants present.

Three double weeks: 1 hour guest lecture on selected datasets, 1 hour on related programming and analysis techniques, 10 hours analysis and report writing on computers with teacher and teaching assistants present.

Three weeks: 6 hours analysis and report writing on computers with teacher and teaching assistants present.

Note:
Students must bring their own laptop with Windows, Linux or macOS 10.13 or higher to run the computer programs we use. (See current system requirements.)
Examiner:
An external examiner must approve the evaluation arrangements for the course.
Examination details: Continuous exam: Passed / Failed