
9-Machine Learning

Learning agents

Design of a learning element is affected by

  • Which components of the performance element are to be learned
  • What feedback is available to learn these components
  • What representation is used for the components

Type of feedback:

  • Supervised learning: correct answers for each example
  • Unsupervised learning: correct answers not given
  • Reinforcement learning: occasional rewards

Inductive learning

Simplest form: learn a function from examples

$f$ is the target function

An example is a pair $(x,f(x))$

Problem: find a hypothesis $h$

  • such that $h \approx f$
  • given a training set of examples

(This is a highly simplified model of real learning:

  • Ignores prior knowledge
  • Assumes examples are given)
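As a concrete illustration of this setup (a sketch, not from the original slides, using a hypothetical target function), the code below samples example pairs $(x, f(x))$ and fits a quadratic hypothesis $h$ by least squares; the learner only ever sees the example pairs.

```python
# A minimal sketch of inductive learning as curve fitting (illustrative only):
# the learner sees only (x, f(x)) pairs and returns a hypothesis h with h ≈ f.
import numpy as np

def target_f(x):
    # Hypothetical target function, hidden from the learner.
    return 0.5 * x**2 - x + 2.0

# Training set: a list of (x, f(x)) examples.
xs = np.linspace(-3.0, 3.0, 20)
examples = [(x, target_f(x)) for x in xs]

# Hypothesis space: degree-2 polynomials, fitted by least squares.
X = np.array([x for x, _ in examples])
Y = np.array([y for _, y in examples])
h = np.poly1d(np.polyfit(X, Y, deg=2))

# h should now approximate f on inputs it has never seen.
print(h(1.5), target_f(1.5))
```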

Decision tree learning

One possible representation for hypotheses

E.g., here is the “true” tree for deciding whether to wait:

Expressiveness

Decision trees can express any function of the input attributes.

E.g., for Boolean functions, truth table row → path to leaf:
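For instance, the two-attribute XOR function $A \oplus B$ has four truth-table rows; a tree that tests $A$ at the root and $B$ on each branch has exactly four root-to-leaf paths, one per row, with each leaf labelled by that row's output. Any Boolean function can be encoded this way, though the resulting tree may be exponentially large.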

There is another example:


Hypothesis spaces

How many distinct decision trees with n Boolean attributes?

= number of Boolean functions

= number of distinct truth tables with $2^n$ rows = $2^{2^n}$
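For example, with $n = 6$ Boolean attributes there are $2^{2^6} = 2^{64} \approx 1.8 \times 10^{19}$ distinct functions, so the hypothesis space cannot be searched exhaustively.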

Decision tree learning algorithm

Aim: find a small tree consistent with the training examples

Idea: (recursively) choose “most significant” attribute as root of (sub)tree
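A minimal recursive sketch of this idea is below (an illustration, not the textbook pseudocode verbatim). It assumes Boolean-valued attributes, examples given as (attribute-dict, label) pairs, and a choose_attribute helper that picks the “most significant” attribute, e.g. by the information gain defined later in this section.

```python
# A minimal sketch of recursive decision-tree learning (DTL), assuming
# Boolean-valued attributes and examples given as (dict, label) pairs.
# choose_attribute is a placeholder for the "most significant attribute" test.
from collections import Counter

def plurality_label(examples):
    # Most common label among the examples (ties broken arbitrarily).
    return Counter(label for _, label in examples).most_common(1)[0][0]

def dtl(examples, attributes, parent_examples, choose_attribute):
    if not examples:                        # no examples left: use parent's majority
        return plurality_label(parent_examples)
    labels = {label for _, label in examples}
    if len(labels) == 1:                    # all examples have the same label
        return labels.pop()
    if not attributes:                      # no attributes left to test
        return plurality_label(examples)
    a = choose_attribute(attributes, examples)   # "most significant" attribute
    tree = {"attribute": a, "branches": {}}
    for value in (True, False):
        exs = [(x, y) for x, y in examples if x[a] == value]
        subtree = dtl(exs, [b for b in attributes if b != a], examples,
                      choose_attribute)
        tree["branches"][value] = subtree
    return tree
```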

Choosing an attribute

Idea: a good attribute splits the examples into subsets that are (ideally) “all positive” or “all negative”

Using information theory

To implement Choose-Attribute in the DTL algorithm

Information Content (Entropy)

For a training set containing p positive examples and n negative examples:

$I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}$
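A small helper computing this quantity in bits (a sketch matching the formula above, not code from the slides):

```python
# Entropy of a set with p positive and n negative examples, in bits.
import math

def entropy_pn(p, n):
    """Information content I(p/(p+n), n/(p+n))."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:                      # 0 * log2(0) is taken as 0
            q = count / total
            result -= q * math.log2(q)
    return result

# Example: a 50/50 split carries one full bit of information.
print(entropy_pn(6, 6))   # 1.0
print(entropy_pn(4, 0))   # 0.0 (a pure set needs no further information)
```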

Information gain

A chosen attribute A divides the training set E into subsets $E_1, … , E_v$ according to their values for A, where A has $v$ distinct values.

If subset $E_i$ contains $p_i$ positive and $n_i$ negative examples, the expected information still needed after the test is

$remainder(A) = \sum_{i=1}^{v} \frac{p_i+n_i}{p+n}\, I\left(\frac{p_i}{p_i+n_i}, \frac{n_i}{p_i+n_i}\right)$

Information Gain (IG) or reduction in entropy from the attribute test:

$IG(A) = I\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - remainder(A)$

Equivalently, writing $H(\cdot)$ for the entropy of a set of examples:

$IG(A) = H(E) - \sum_{i=1}^{v} \frac{|E_i|}{|E|}\, H(E_i)$
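Putting the pieces together, the sketch below computes $IG(A)$ using the entropy_pn helper from the previous sketch and supplies a choose_attribute function suitable for the earlier DTL sketch (again illustrative, with hypothetical names, not the textbook's code).

```python
# Information gain for one attribute, over (attribute-dict, label) examples.
# Reuses entropy_pn from the sketch above.
def information_gain(attribute, examples):
    """IG(A) = I(p/(p+n), n/(p+n)) - remainder(A)."""
    p = sum(1 for _, label in examples if label)
    n = len(examples) - p
    remainder = 0.0
    for value in {x[attribute] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[attribute] == value]
        p_i = sum(1 for _, label in subset if label)
        n_i = len(subset) - p_i
        remainder += (p_i + n_i) / (p + n) * entropy_pn(p_i, n_i)
    return entropy_pn(p, n) - remainder

# choose_attribute for the DTL sketch: pick the attribute with maximal gain.
def choose_attribute(attributes, examples):
    return max(attributes, key=lambda a: information_gain(a, examples))
```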

Neural Networks