Introduction
The first assignment of this course is a competition. The challenge is to build a classification model for a given dataset, and all submitted predictions will be ranked (and graded) on achieved accuracy. You are given a labeled dataset on which you can try out several algorithms, and an unlabeled dataset for which you are asked to provide the labels (predictions) using your model.
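This page does not define the scoring formula. Assuming the usual definition of accuracy (the fraction of instances whose predicted label matches the true label), the grading criterion can be sketched as below. The function name `accuracy` is illustrative only and not part of any course tooling.

```python
def accuracy(predicted, actual):
    """Fraction of instances whose predicted label equals the true label.

    Assumes the standard definition: correct predictions / total instances.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy of an empty set")
    # Count positions where the prediction agrees with the true label.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

For example, `accuracy(["pos", "neg", "pos", "pos"], ["pos", "neg", "neg", "pos"])` gives 0.75, because three of the four predictions are correct.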
Experimenting
The labeled dataset can be found here. Use it to preprocess the data, select algorithms, optimize parameters and build models, using the Weka Explorer or Experimenter. Note that this is a rather large dataset, and some classifiers may require a lot of memory. It is therefore a good idea to start Weka with additional memory, e.g., using 'java -Xmx1000M -jar weka.jar'.
Submitting your predictions
The unlabeled dataset can be found here. Once you have finished all preprocessing and have selected your classifier and parameter settings, use the resulting model to generate predictions for this unlabeled dataset. For instance, you can do the following:
Using this method, you will get output that looks like this:
=== Predictions on test set ===
These columns are the instance number, the actual label (unknown), the prediction (pos or neg), the error (unknown) and the probability distribution for each prediction. If you use WEKA 3.7, the output may differ slightly. Finally, send in the entire prediction list.
Timing
What to hand in
You should hand in a predictions file named xxxxxxx-prediction.csv, where xxxxxxx is your student number (no leading 's').
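A misnamed file is easy to overlook, so a quick name check before submitting can help. The sketch below assumes a seven-digit student number, inferred from the seven x's in the required pattern. The function name `check_filename` is hypothetical.

```python
import re

# Assumption: student numbers have seven digits (matching "xxxxxxx"),
# written without the leading 's'.
FILENAME_PATTERN = re.compile(r"\d{7}-prediction\.csv")


def check_filename(filename):
    """Return True if filename follows the xxxxxxx-prediction.csv convention."""
    # fullmatch requires the whole name to match, so extra suffixes such as
    # ".zip" or a leading "s" are rejected.
    return FILENAME_PATTERN.fullmatch(filename) is not None
```

Under this assumption, `check_filename("1234567-prediction.csv")` returns True. Names such as `"s1234567-prediction.csv"` or `"1234567-prediction.csv.zip"` are rejected.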
Do not compress your file. The prediction file should start with the header line 'inst# actual predicted error probability distribution'. The number of instances should be exactly equal to the number of records in the unlabeled dataset file. (If it is not, you are probably about to hand in a result on the training data.) Both comma- and tab-delimited prediction files are accepted.
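The requirements above can be checked mechanically before sending the file. This is a minimal sketch that relies on several assumptions this page does not state:

- The unlabeled dataset is an ARFF file (Weka's usual format) with one instance per line.
- The header words may be separated by spaces, commas or tabs.
- "Comma or tab delimited" means every data row contains at least one comma or tab.

The helper names `count_arff_records` and `check_predictions` are hypothetical.

```python
import re

# Header words required by the assignment; splitting on commas, tabs and
# spaces lets the same check accept comma- and tab-delimited files.
EXPECTED_HEADER = ["inst#", "actual", "predicted", "error", "probability", "distribution"]


def count_arff_records(arff_text):
    """Count data rows after the @data line, skipping blank lines and % comments.

    Assumes the unlabeled dataset is in ARFF format with one instance per line.
    """
    in_data = False
    count = 0
    for line in arff_text.splitlines():
        stripped = line.strip()
        if not in_data:
            # Everything up to and including @data is the ARFF header.
            in_data = stripped.lower() == "@data"
            continue
        if stripped and not stripped.startswith("%"):
            count += 1
    return count


def check_predictions(pred_text, expected_records):
    """Return a list of problems; an empty list means no problems were found."""
    lines = [line for line in pred_text.splitlines() if line.strip()]
    if not lines:
        return ["file is empty"]

    problems = []
    header_words = re.split(r"[,\t ]+", lines[0].strip())
    if header_words != EXPECTED_HEADER:
        problems.append(f"unexpected header line: {lines[0]!r}")

    # Every non-blank line after the header is counted as one prediction.
    rows = lines[1:]
    if len(rows) != expected_records:
        problems.append(
            f"{len(rows)} prediction rows, expected {expected_records} "
            "(predictions on the training data by mistake?)"
        )

    # Assumption: the accepted formats require a comma or tab between fields.
    for number, row in enumerate(rows, start=1):
        if "," not in row and "\t" not in row:
            problems.append(f"row {number} is neither comma- nor tab-delimited")
            break  # report only the first offending row
    return problems
```

For example, you could pass `Path("1234567-prediction.csv").read_text()` and `count_arff_records(Path("test.arff").read_text())` (with `from pathlib import Path`) to `check_predictions`. Here `test.arff` is a placeholder for the unlabeled dataset file. An empty result list means none of these checks failed.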
Both files can be sent to dami@liacs.leidenuniv.nl