STIN100 Biological Data Analysis

Credits (ECTS): 10

Course responsible: Torgeir Rhodén Hvidsten, Jon Olav Vik

Campus / Online: Taught campus Ås

Teaching language: Norwegian

Limits of class size: 800

Course frequency: Annually

Nominal workload: Plenary sessions: 54 hours. Exercise classes: 52 hours. Self study: 144 hours.

Teaching and exam period: This course starts in the Fall semester, with teaching and evaluation in the same semester.

About this course

Biology has become a data-rich science with datasets that can no longer be analyzed manually. To extract knowledge from data, biologists need knowledge and skills in programming and data analysis that enable them to explore, visualize and interpret data. This must be done reproducibly, so that it is clear how the data has been processed and easy to modify the analyses if desired.

This course provides basic skills in the programming language R and introduces the student to common methods for visualization and analysis of multi-dimensional biological data. The course is organized around supervised student groups analysing relevant data sets.

In a time when trust in scientific knowledge is no longer obvious, yet challenges of sustainability require informed decisions, the understanding of data and verifiable production of knowledge are essential. STIN100 helps ensure that future employers and decision makers can rely on the knowledge basis prepared by our graduates.

Learning outcome

KNOWLEDGE: The students will acquire

  • broad knowledge in handling, visualizing and analysing multidimensional biological data.
  • familiarity with how some of the most important biological data sets are generated and how this data should be preprocessed to correct for systematic errors.
  • a conceptual framework for mapping data to graphical elements.
  • a repertoire of programming techniques and concepts that are required to perform the analyses in the course.

SKILLS: Students will be able to

  • explain principles behind basic methods for data visualization and analysis.
  • write programs that perform basic data processing tasks (subsetting, transformation and groupwise summaries) and employ simple visualization and data analysis methods.
  • generate reproducible, executable reports that weave together expository text, program code and output.
  • propose biological interpretations of analysis results.
  • efficiently search documentation and internet resources to carry out analyses.
  • simplify data sets for prototyping and debugging of analyses.

COMPETENCES: Students will be well prepared to

  • explore datasets they encounter in later term papers, theses and working life.
  • perform reproducible research where data processing is fully documented through executable reports.
  • compose data graphics using elements appropriate to the data types and the biological structure in the data.
  • pose follow-up questions to data analyses for discussion with domain experts.
  • learn new methods and software packages with the help of documentation, code examples and web resources.
  • This course will make you independent in exploring, processing and describing data, and in extracting biological meaning from data. Our main tools are the programming language R, the integrated development environment RStudio, and literate programming to create beautiful, reproducible reports from your analyses.

    Each week begins with a plenary session which motivates new topics, concretizes the week's learning objectives, and introduces new ways of working. These sessions involve student discussions, pair programming and other activities.

    In the middle of the week, students work on their own or in pairs. Teaching assistants are available in two real-time two-hour sessions per week (per student group), in person or in Microsoft Teams. Questions can also be posted in Discussions in Canvas.

    During the first few weeks, weekly checkpoint quizzes are due Thursday at 17:00. The quizzes are designed to quickly verify that you can do what is required for the coming week. If something proves difficult, we address it on Friday. In the later parts of the course, there are hand-in assignments every other week, where you work in pairs, followed by three weeks for a final extended report.

    Friday wrap-up sessions in Zoom address topics that proved challenging in the checkpoint quiz. We summarize what you have learned and outline the new possibilities that this opens up for you. We then preview the coming week and take questions.

    It is possible to attend the course purely online. You will then collaborate via Teams with video chat and screen sharing, and set up shared folders in OneDrive to enable pair programming over the internet. That said, we recommend attending physically if possible, since many key skills in the course are easier to advise on when we can watch how you use your hands and eyes.

    Our learning philosophy is: Active learning, in that you personally write programs and put into words what the data are telling you. Problem-based learning, centering on research questions relevant to NMBU. Collaborative learning, through pair programming and peer assessment. Student-driven, adaptive learning, in that drills and autogenerated exercises are available for you to practice what the checkpoint quizzes indicate that you need most.

  • The week plans link to videos motivating each topic, concretize the week's learning objectives, and link to howto videos, exercises and assignments, as explained in the video overview of the kinds of learning material in STIN100 (Norwegian only, sorry).

    Questions about data analysis and programming should preferably be posted in Discussions in Canvas with a reproducible example. This makes it easier to help you and shares the answer with the class. Asking effective questions is a key skill which you will learn during the course.

    Checkpoint quizzes get individual feedback, partly automatically and partly from teaching assistants. Report assignments get feedback from teaching assistants.

    Teaching assistants are available for questions in plenary sessions and in exercise classes.

    See the "Syllabus" section for free online textbooks which we sometimes refer to.

  • Grading is pass/fail based on approved hand-ins of a number of tests and report assignments throughout the semester. If an item is not approved, you'll get specific guidance and one extra attempt.

    Approved hand-ins are valid only in the current semester.

  • An external examiner must approve the evaluation arrangements for the course.
  • Students must bring their own laptop with Windows, Linux or macOS 11 or higher to run the computer programs we use. (See current system requirements.) Chromebooks do not meet the system requirements for the software we use.
  • Four weeks: 2 hours lecture with frequent computer exercises, 4 hours computer exercises with teacher and teaching assistants present.

    Three double weeks: 1 hour guest lecture on selected datasets, 1 hour on related programming and analysis techniques, 10 hours analysis and report writing on computers with teacher and teaching assistants present.

    Three weeks: 6 hours analysis and report writing on computers with teacher and teaching assistants present.

  • Passed / Not Passed
  • MATRS - General admission requirements, and R1 or (S1+S2) or similar mathematical skills