M8 - High Dimensional Data Analysis


Type of course

Due to the peak in Omicron infections, this course will be offered online only.


Dates

Six Monday and Thursday evenings in February 2022: February 7, 10, 14, 17, 21 and 24, from 5.30 pm to 9.30 pm.
Please note: UGent PhD students who want a refund must open a dossier on the DS website (Application for Registration) by January 7, 2022.


Venue

This is an online course.


Description

Modern high-throughput technologies easily generate data on thousands of variables, e.g. health care data, genomics, chemometrics, environmental monitoring, web logs, movie ratings, …

Conventional statistical methods are no longer suited to analysing such high-dimensional data effectively.
Multivariate statistical methods may be used, but the dimensionality of the data set is often much larger than the number of (biological) samples. Modern advances in statistical data analysis allow for the appropriate analysis of such data.

Methods for the analysis of high-dimensional data rely heavily on multivariate statistical methods. A large part of the course is therefore devoted to multivariate methods, with a focus on high-dimensional settings and issues.

Multivariate statistical analysis covers many methods. This course covers a selection of techniques that, in our experience, are frequently used in industry and research institutes.

The course is taught using case studies with applications from different fields (analytical chemistry, ecology, biotechnology, genomics, …).


  1. Dimension reduction: Singular Value Decomposition (SVD), Principal Component Analysis (PCA), Multidimensional Scaling (MDS) and biplots for dimension-reduced data visualisation
  2. Sparse SVD and sparse PCA 
  3. Prediction with high-dimensional predictors: principal component regression; ridge, lasso and elastic net penalised regression methods
  4. Classification (prediction of class membership): (penalised) logistic regression and linear discriminant analysis
  5. Evaluation of prediction models: sensitivity, specificity, ROC curves, mean squared error, cross validation
  6. Clustering
  7. Large-scale hypothesis testing: FDR, FDR control methods, empirical Bayes (local) FDR control

Target audience

This course targets professionals and investigators from all areas who work with high-dimensional data.


Exam

Participants can, if they wish, take part in an exam. Those who pass receive a certificate from Ghent University.
The exam consists of a take-home project assignment. Students are required to write a report by a set deadline.

Incorporation in DTP and reimbursement from DS for UGent PhD students

As a UGent PhD student, you need to follow strict rules to incorporate this 'specialist course' in your Doctoral Training Program (DTP) and to get a refund of the registration fee from your Doctoral School (DS): please take the necessary action in time. The deadline to open a dossier on the DS website (Application for Registration) for this course is January 7, 2022. Please note that opening a dossier does not mean that you are enrolled. You still need to enrol via the registration form on this site.

Course prerequisites

Working knowledge of basic statistics: data exploration and descriptive statistics, statistical modelling, and inference (linear models, confidence intervals, t-tests, F-tests, ANOVA, chi-squared test), as covered in Module 2, Module 5 and Module 12 of this year's course program.


Teachers

Prof. dr. Lieven Clement is an Associate Professor of Statistical Genomics at Ghent University. He is an expert in developing statistical methods and open source tools for differential omics data analysis. His lab is built around two strategic research pillars, each connected to an omics domain: (single cell) transcriptomics and proteomics. He is a member of the core team that established a new Master of Science in Bioinformatics at Ghent University and has a track record in teaching statistics, statistical genomics and high dimensional data analysis to students in the life sciences and statistical data analysis. He also gives short courses in statistics and proteomics data analysis in prominent bioinformatics programmes in Europe (Wellcome Trust Advanced Courses and Gulbenkian Institute Training Programme, amongst others). He is a strong advocate of open and reproducible science and teaching to empower researchers and students with freely available, user-friendly, operating system independent, state-of-the-art bioinformatics tools, and by making all research code, data and teaching materials well documented, open and accessible.

Drs. Milan Malfait has a background in bioengineering and bioinformatics. He is currently pursuing a PhD in statistical data analysis at Ghent University in collaboration with Janssen Pharmaceutica. His main research topic is the development of robust and scalable statistical methods for differential expression analysis with single-cell transcriptomics data.



Course material

All material will be available on a GitHub course website.

Book recommendations


Fees

Different prices apply, depending on your main type of employment.

Employment                                            Module 8   Exam
Industry/Private sector (1)                               1110     30
Non-profit, government, higher education staff (2)         835     30
(Doctoral) students, retired, unemployed (2)               375     30

(1) If two or more employees from the same company enrol simultaneously for this course, a reduction of 20% on the module price is taken into account starting from the second enrolment.

(2) UGent staff and UGent doctoral students who pay internally via SAP or internal transfer can participate at these special rates.

Enrol for this course