Introduction
The final assignment of this course is a competition. The challenge is to build the most accurate model for a given dataset. You are given a labeled dataset on which you can try out several algorithms, and an unlabeled dataset for which you are asked to provide the labels (predictions) using your model. You participate in this challenge by yourself. You may ask classmates questions, but the final process and report must be your own work. Your submitted solution counts for 2 out of 10 points of your final grade (the remaining 8 points are based on the written exam). Since this is a competition, the best solutions earn a bonus: the solutions will be ranked on predictive accuracy, and the top 30% of submissions will gain an additional bonus point.
The results of the data mining challenge are available!
Experimenting
The labeled dataset can be found here. Use it to preprocess the data, select algorithms, optimize parameters and build models, using the WEKA Explorer or Experimenter. Note that this is a rather large dataset, and some classifiers may require a lot of memory. It is therefore advisable to start WEKA with additional memory, e.g., using 'java -Xmx1000M -jar weka.jar'.
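Because the dataset is large, it can help to gauge its size before settling on a heap size. The following is a minimal sketch in Python, not part of the assignment and not a substitute for the WEKA workflow. It assumes a standard ARFF file with one instance per line after the '@data' marker; the function name `summarize_arff` and the sample lines are hypothetical and chosen for illustration only.

```python
from typing import Iterable, NamedTuple


class ArffSummary(NamedTuple):
    relation: str
    num_attributes: int
    num_instances: int


def summarize_arff(lines: Iterable[str]) -> ArffSummary:
    """Scan ARFF text line by line and report its relation name and size.

    Assumes one instance per line after '@data' (an assumption that holds
    for ordinary dense and sparse ARFF files).
    """
    relation = ""
    num_attributes = 0
    num_instances = 0
    in_data = False
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and ARFF comments, which start with '%'.
        if not line or line.startswith("%"):
            continue
        if in_data:
            num_instances += 1
            continue
        # Header keywords are case-insensitive in ARFF.
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        if keyword == "@relation":
            relation = parts[1].strip().strip("'\"") if len(parts) > 1 else ""
        elif keyword == "@attribute":
            num_attributes += 1
        elif keyword == "@data":
            in_data = True
    return ArffSummary(relation, num_attributes, num_instances)


if __name__ == "__main__":
    # Hypothetical miniature file; the real dataset is read the same way,
    # e.g. by passing an open file object instead of a list of lines.
    sample = [
        "% illustrative data only",
        "@relation demo",
        "@attribute x numeric",
        "@attribute class {pos,neg}",
        "@data",
        "1.0,pos",
        "2.5,neg",
    ]
    print(summarize_arff(sample))
```

Since the function reads one line at a time, passing an open file handle keeps memory use small even for a large file.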
Submitting your predictions
The unlabeled dataset can be found here. Check that the first line of this file reads '@relation Challenge-unlabeled'. Once you have finished all preprocessing and have selected your classifier and parameter settings, use the resulting model to generate predictions for this unlabeled dataset. Guidelines can be found here. For instance, you can do the following:
Using this method, you will get output that looks like this:
=== Predictions on test set ===
These are the instance number, the actual label (unknown), the prediction (pos or neg), the error (unknown) and the probability of each prediction. If you use WEKA 3.7, the output may differ slightly. Finally, send in the entire prediction list.
Timing
What to hand in
Both can be sent to joaquin@liacs.nl. The predictions and the report together count for 2/10 of your final grade, and the top 30% most accurate predictions will gain an additional bonus point. In case of a tie, the report serves as the tie breaker.
Questions
If you have questions about the assignment, you can contact the course lecturers. There will also be a question-and-answer session on 19-11-12.