Handwritten Digit Recognition



To have full control over preprocessing, we created our own dataset for handwritten digit recognition. The preprocessing is described in full in technical report TR-2005-27 (see the citation below). The digits were contributed by Austrian university students as part of our 2005 lecture AI Methods for Data Analysis, which has sadly been discontinued. We scanned one page per student and automatically extracted the image data for the handwritten digits.

Please contact us if you want to set up something similar for your own lecture. We welcome contributions: with this approach, a single lecture of similar size would need around 5 years to collect a dataset as large as MNIST, and size seems to be the main determinant of error rate for SVM classifiers. Note that, because of its automated segmentation, MNIST has a segmentation error rate of around 1%, which makes quoted error rates below 1% quite hard to interpret.

The approach has the following advantages for you and your students.

  • You can give each class their own data. Cheating is made impossible. ;-)
  • Besides the pixel-based representation here, we also have code for OCR-like features. These are more amenable to feature selection, transformation and construction tasks.
  • You are helping to build a resource to determine the influence of preprocessing on handwritten digit recognition.


All files are in gzipped ARFF format (for WEKA); gunzip them before use. Digits were downsampled to 16x16 pixels using a Mitchell filter with the blur parameter set to 2.5 (see the paper for more details). WEKA's SMO classifier with the options -E 5 -C 10 -F gives a 6.46% error rate on these datasets (pass the training file with -t and the test file with -T). This set was contributed by students of the SS2005 class.
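For readers who want to inspect the data outside WEKA, the following is a minimal Python sketch of reading a gzipped ARFF file directly. It is not part of our tooling. It assumes a dense ARFF file with unquoted attribute names, comma-separated values, and the class label as the last attribute; the function names load_arff_gz and split_features_labels are hypothetical, and the actual attribute layout of the files should be checked before relying on it.

```python
import gzip


def load_arff_gz(path):
    """Read a dense, gzipped ARFF file into (attribute_names, rows).

    Simplification: attribute names are assumed to be unquoted and free of
    spaces, and sparse ARFF rows ({index value, ...}) are not handled.
    """
    names, rows = [], []
    in_data = False
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blank lines and ARFF comments.
            if not line or line.startswith("%"):
                continue
            if not in_data:
                lower = line.lower()
                if lower.startswith("@attribute"):
                    names.append(line.split()[1])
                elif lower.startswith("@data"):
                    in_data = True
                continue
            rows.append([value.strip() for value in line.split(",")])
    return names, rows


def split_features_labels(rows):
    """Assumption: every column but the last is numeric, the last is the class."""
    features = [[float(v) for v in row[:-1]] for row in rows]
    labels = [row[-1] for row in rows]
    return features, labels
```

With a hypothetical file name, usage would look like names, rows = load_arff_gz("train.arff.gz") followed by X, y = split_features_labels(rows). If the assumption about the layout holds, each row of X would contain the 256 pixel values of one 16x16 digit.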

If you use this dataset, please cite one of the technical reports.


In 2009 we also revisited some assumptions about machine learning and found that state-of-the-art machine learning systems are just as brittle as their classical AI counterparts. Brittleness in this context means that their generalization performance over the whole task space (estimated using three distinct datasets) is very unsatisfactory: the models are very specific to the dataset they were trained on and are unable to recognize handwritten digits in general. The paper gives the empirically grounded argument. The same may hold for machine learning beyond digit recognition, although that would be extremely hard to prove.