CS4811 HW5-Learning Decision Trees

CS4811: Homework 5 --- Learning Decision Trees

Due: Monday, April 2, 2012, beginning of class.
(Assigned: Monday, March 19, 2012.)

Reminder: This is an individual assignment. All the work should be the author's and in accordance with the university's academic integrity policies. You are allowed to use any written source in preparing your answers, but if you use any source other than the textbook and the class notes, you should cite it in your assignment.

Problem:

In this assignment, you will implement the basic algorithm for learning decision trees.

You should implement your own code from scratch. You may consult existing implementations but you may not build on them. The textbook's web site has Java, Python and Lisp implementations, if you'd like to take a look.

Task:

Implement a decision tree learner that uses the "probability of error" heuristic that we discussed in class. Test it with two datasets:

The restaurant example in the textbook
The mushroom dataset at the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/Mushroom)

For each run, print a trace of how the decision tree is generated. For each node of the decision tree, the trace should list each attribute tested along with its probability of error value, and then the attribute that was chosen. When the final decision tree is generated, print it in a format that is readable and convenient for you.
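To illustrate what one step of such a trace might look like, here is a minimal Python sketch. It assumes one common reading of the "probability of error" heuristic: the fraction of examples that would be misclassified if each branch of an attribute predicted its majority class. Use the definition from class if it differs. The function names, the trace wording, and the six toy rows are all hypothetical; the rows only borrow attribute names from the restaurant domain and are not the textbook's data.

```python
from collections import Counter, defaultdict

def probability_of_error(examples, attribute, target):
    """Fraction of examples misclassified if every branch of `attribute`
    predicts its majority class (an ASSUMED definition of the heuristic)."""
    by_value = defaultdict(Counter)
    for ex in examples:
        by_value[ex[attribute]][ex[target]] += 1
    # In each branch, everything outside the majority class counts as an error.
    errors = sum(sum(c.values()) - max(c.values()) for c in by_value.values())
    return errors / len(examples)

def choose_attribute(examples, attributes, target, depth=0):
    """Print one trace line per tested attribute and return the one with
    the lowest probability of error (ties go to the first listed)."""
    indent = "  " * depth
    scores = {}
    for a in attributes:
        scores[a] = probability_of_error(examples, a, target)
        print(f"{indent}tested {a}: probability of error = {scores[a]:.3f}")
    best = min(attributes, key=lambda a: scores[a])
    print(f"{indent}chose {best}")
    return best

# Hypothetical toy rows, for illustration only.
toy_examples = [
    {"Patrons": "None", "Hungry": "No",  "WillWait": "No"},
    {"Patrons": "Some", "Hungry": "Yes", "WillWait": "Yes"},
    {"Patrons": "Full", "Hungry": "Yes", "WillWait": "No"},
    {"Patrons": "Full", "Hungry": "No",  "WillWait": "No"},
    {"Patrons": "Some", "Hungry": "No",  "WillWait": "Yes"},
    {"Patrons": "Full", "Hungry": "Yes", "WillWait": "Yes"},
]
best = choose_attribute(toy_examples, ["Patrons", "Hungry"], "WillWait")
```

A full learner would call something like choose_attribute once per node, increasing depth as it recurses so that the indentation of the trace mirrors the shape of the tree.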

Write a short report that summarizes your implementation, how you cleaned up the data, and your experimental findings. Include full instructions on how to execute your code. We should be able to test it with another noise-free set of examples from the mushroom domain. Make sure to submit the data file you used and explain how we can feed another data file to your program.
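As one possible way to make the data file swappable, the sketch below reads examples from any stream of comma-separated lines. It assumes the layout of the UCI mushroom file as we understand it: the class label is the first field and "?" marks a missing value. Check this against the dataset description before relying on it. Dropping rows that contain a missing value is only one clean-up policy among several, and the function name load_examples and the three sample rows are hypothetical.

```python
import csv
import io

def load_examples(lines, missing="?"):
    """Parse comma-separated rows into (label, attributes) pairs.

    ASSUMES the class label is the first field and that `missing` marks an
    unknown value. Rows containing it are dropped and counted.
    """
    examples, dropped = [], 0
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        if missing in row:
            dropped += 1
            continue
        examples.append((row[0], tuple(row[1:])))
    return examples, dropped

# Hypothetical three-attribute rows standing in for a real data file.
sample = io.StringIO("p,x,s,n\ne,x,?,y\n\ne,b,s,w\n")
examples, dropped = load_examples(sample)
```

Because load_examples accepts any iterable of lines, a program could pass it an open file whose path comes from the command line, for example open(sys.argv[1], newline=""), and the report would then only need to state that the data file path is the first argument.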

Submit (both online and hardcopy):

The fully commented code.
The report in its original format and in PDF format.

Whenever possible, please use enscript (or equivalent) with two columns to save paper: enscript -2Gr -Pprinter files