Practicals

The practical part of the Data Mining course consists of two larger assignments that will teach you about various aspects of data mining and introduce you to two software packages. Both assignments are individual and can be completed in your own time, with two separate deadlines this fall. Each assignment counts for 15% of your final grade, and the exam accounts for the remaining 70%. Participation in both assignments is mandatory. You may score as low as 5.0 on each assignment and still pass, as long as you achieve at least a 5.5 on average.
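As an illustration of the weighting above, the following minimal sketch computes a final grade from the two assignment grades and the exam grade. The function name and the example grades are hypothetical, and the sketch only covers the 15% / 15% / 70% weighting; it does not model the pass rule or any official rounding.

```python
# Weights taken from the course description: 15% per assignment, 70% exam.
ASSIGNMENT_WEIGHT = 0.15
EXAM_WEIGHT = 0.70


def final_grade(assignment1: float, assignment2: float, exam: float) -> float:
    """Return the weighted final grade (hypothetical helper, no rounding applied)."""
    return (
        ASSIGNMENT_WEIGHT * assignment1
        + ASSIGNMENT_WEIGHT * assignment2
        + EXAM_WEIGHT * exam
    )


if __name__ == "__main__":
    # Example grades are made up for illustration.
    print(f"{final_grade(6.0, 8.0, 7.0):.2f}")  # prints 7.00
```

With these weights, a one-point difference on the exam changes the final grade by 0.7, while a one-point difference on a single assignment changes it by 0.15.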

Assignment 1 will start on Oct 1, with a submission deadline of Nov 10. This first assignment consists of a challenge in black-box data mining. You will be provided with a labeled dataset (the training set), for which you need to produce a good classifier. You will then use your preferred classifier to produce predictions for a separate, unlabeled dataset (the test set), and hand in these predictions. You are advised to use the Weka package for data preprocessing, classifier selection and generation of predictions.

The first two links below contain (optional) exercises that will familiarize you with Weka and the operations required for producing successful predictions. During the second lecture on Oct 1, you can work on these exercises (and the actual challenge), and assistance will be available to help you along. No reports on the exercises need to be handed in, and you may work on them in teams. The final predictions must be handed in via email, and each submission should be linked to a single student.

Exercise 1: Introduction to the Weka Data Mining Software
Exercise 2: Weka Experiment Environment
Challenge: Data Mining Challenge

The grade for Assignment 1 will be computed on the basis of a ranking of the scores achieved on the predictions. A valid submission (that is, predictions that are syntactically correct but not necessarily good) will guarantee at least a 6.0. The better your score, the higher your grade will be. The best prediction(s) will score a 10.

Students who completed the similar challenge last year can ask us to carry over their score from last year, in which case Assignment 1 counts as submitted. The grade for this assignment will then be computed on the basis of last year's score. You are also free to resubmit predictions. Last year's students will still have to complete Assignment 2, though! Last year's grading method (bonus point) does not apply this year.

Assignment 2 will start on Nov 12, with a submission deadline of Dec 18. The assignment involves the software tool RapidMiner. Details can be found here.