Learning agents
Design of a learning element is affected by
- Which components of the performance element are to be learned
- What feedback is available to learn these components
- What representation is used for the components
Type of feedback:
- Supervised learning: correct answers for each example
- Unsupervised learning: correct answers not given
- Reinforcement learning: occasional rewards
Inductive learning
Simplest form: learn a function from examples
$f$ is the target function
An example is a pair $(x,f(x))$
Problem: find a hypothesis $h$
- such that $h \approx f$
- given a training set of examples
(This is a highly simplified model of real learning:
- Ignores prior knowledge
- Assumes examples are given)
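As a toy illustration (not from the source), here is a minimal Python sketch of this setup, assuming a small hand-picked hypothesis space: the learner returns the candidate $h$ that disagrees with the fewest training examples.

```python
# A minimal sketch of inductive learning, assuming the hypothesis space
# is a small hand-picked set of candidate functions (illustrative only).
CANDIDATES = {
    "identity": lambda x: x,
    "double":   lambda x: 2 * x,
    "square":   lambda x: x * x,
}

def learn(examples):
    """Return the name of the candidate h with the fewest mismatches h(x) != f(x)."""
    def mismatches(h):
        return sum(1 for x, fx in examples if h(x) != fx)
    return min(CANDIDATES, key=lambda name: mismatches(CANDIDATES[name]))

train = [(0, 0), (1, 1), (2, 4), (3, 9)]  # examples (x, f(x)) of f(x) = x^2
print(learn(train))  # -> "square": this h agrees with f on every example
```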
Decision tree learning
One possible representation for hypotheses
E.g., the “true” tree for deciding whether to wait for a table at a restaurant (figure omitted).
Expressiveness
Decision trees can express any function of the input attributes.
E.g., for Boolean functions, each truth table row corresponds to one root-to-leaf path, as in the sketch below.
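For instance, a hand-written decision tree for the Boolean function XOR (a sketch, not a learned tree), where each of the four truth-table rows follows exactly one root-to-leaf path:

```python
# A hand-written decision tree for XOR(A, B): each truth-table row
# traces exactly one root-to-leaf path through the nested tests.
def xor_tree(a: bool, b: bool) -> bool:
    if a:                # root node tests attribute A
        return not b     # A = true branch: leaf value is the negation of B
    else:
        return b         # A = false branch: leaf value is B itself

for a in (False, True):
    for b in (False, True):
        print(a, b, "->", xor_tree(a, b))  # prints the full truth table
```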
Hypothesis spaces
How many distinct decision trees with $n$ Boolean attributes?
= number of Boolean functions
= number of distinct truth tables with $2^n$ rows = $2^{2^n}$
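A quick check of this double-exponential growth (plain arithmetic, nothing beyond the formula):

```python
# Number of distinct Boolean functions (= distinct truth tables) of n attributes.
for n in range(1, 7):
    print(n, 2 ** (2 ** n))
# n = 6 already gives 2**64 = 18,446,744,073,709,551,616 possible functions.
```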
Decision tree learning algorithm
Aim: find a small tree consistent with the training examples
Idea: (recursively) choose “most significant” attribute as root of (sub)tree
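A minimal Python sketch of this recursion, assuming each example is a dict of attribute values plus a Boolean "class" label (both the data format and the function names are assumptions for illustration). choose_attribute is stubbed here; the information-gain version is sketched at the end of the section.

```python
# A minimal sketch of the decision-tree learning (DTL) recursion, assuming
# each example is a dict of attribute values plus a Boolean "class" label.
from collections import Counter

def plurality(examples):
    """Most common class label among the examples (ties broken arbitrarily)."""
    return Counter(e["class"] for e in examples).most_common(1)[0][0]

def choose_attribute(attributes, examples):
    """Stub: pick the first attribute; see the information-gain version below."""
    return attributes[0]

def dtl(examples, attributes, default):
    if not examples:
        return default                        # no data: fall back to parent plurality
    if len({e["class"] for e in examples}) == 1:
        return examples[0]["class"]           # all examples agree: make a leaf
    if not attributes:
        return plurality(examples)            # attributes exhausted: plurality leaf
    a = choose_attribute(attributes, examples)  # "most significant" attribute
    tree = {a: {}}
    for v in {e[a] for e in examples}:        # one subtree per observed value of a
        subset = [e for e in examples if e[a] == v]
        rest = [x for x in attributes if x != a]
        tree[a][v] = dtl(subset, rest, plurality(examples))
    return tree

data = [
    {"raining": True,  "hungry": True,  "class": True},
    {"raining": False, "hungry": True,  "class": True},
    {"raining": True,  "hungry": False, "class": False},
]
print(dtl(data, ["raining", "hungry"], default=True))
```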
Choosing an attribute
Idea: a good attribute splits the examples into subsets that are (ideally) “all positive” or “all negative”
Using information theory
To implement Choose-Attribute in the DTL algorithm
Information content (entropy)
For a training set containing $p$ positive examples and $n$ negative examples:

$$I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}$$
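A direct transcription of this formula as a Python sketch, with $p$ and $n$ as in the definition above:

```python
from math import log2

def entropy(p: int, n: int) -> float:
    """I(p/(p+n), n/(p+n)) in bits, for p positive and n negative examples."""
    bits = 0.0
    for count in (p, n):
        if count:                    # the term 0 * log2(0) is taken as 0
            q = count / (p + n)
            bits -= q * log2(q)
    return bits

print(entropy(6, 6))   # 1.0 bit: a 50/50 split is maximally uncertain
print(entropy(12, 0))  # 0.0 bits: a pure set needs no further information
```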
Information gain
A chosen attribute $A$ divides the training set $E$ into subsets $E_1, \ldots, E_v$ according to their values for $A$, where $A$ has $v$ distinct values.
Information gain (IG) or reduction in entropy from the attribute test:

$$IG(A) = I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - \text{remainder}(A)$$

where

$$\text{remainder}(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p+n}\, I\left(\frac{p_i}{p_i+n_i}, \frac{n_i}{p_i+n_i}\right)$$

Choose the attribute with the largest IG. Since the first term is the same for every attribute, this is the same as choosing the attribute with the smallest remainder.
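Continuing the sketch above (reusing entropy and the assumed example format), information gain and the resulting version of Choose-Attribute might look like this:

```python
def information_gain(attribute, examples):
    """IG(A) = I(p/(p+n), n/(p+n)) - remainder(A), reusing entropy() from above."""
    p = sum(1 for e in examples if e["class"])
    n = len(examples) - p
    remainder = 0.0
    for v in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == v]
        p_i = sum(1 for e in subset if e["class"])
        n_i = len(subset) - p_i
        remainder += (p_i + n_i) / (p + n) * entropy(p_i, n_i)
    return entropy(p, n) - remainder

def choose_attribute(attributes, examples):
    """Pick the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(a, examples))
```

Plugging this choose_attribute into the dtl sketch above replaces the stub, so the tree is grown greedily by information gain.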