Hudsonblick 11

The eleventh post already from our New York Spreeblick correspondent with his own category, Charles (you can comment in German, he enjoys that and sometimes even understands it!), bears the promising title…

Cybernetics and Plan B

We’d love to have a cut-away view of the universe; a diagram of the hidden gears and equations for all of the mysteries that bounce between competing aphorisms. Physics and chemistry we’ve already documented remarkably well, but what happens in our own brains? This is where Freud meets flow charts, where Turing and Minsky sought intelligence. This is the land where Lewis Carroll wrote ethnographies and Douglas Hofstadter still describes mountainous landscapes. The land where this man spent his life.

One artifact from this world of computational epistemology is a model called Reinforcement Learning (RL). If you’re thinking of rats and mazes and little sugar pellets, you’ve got the right idea. The principle of the model is that a system (a person, a rat, or a computer simulation) reacts to rewards and punishments by changing behavior to optimize outcomes. For a simple algorithm, it has some interesting results.

In the equations that describe this model there are two fundamental parameters that describe how learning occurs: Stubbornness and Greed (more scientifically, α and ε). Strangely, there are no variables for traits like „intelligence“ or „decisiveness“ or „judgment“. It might be that concepts like these are just epiphenomena, all rooted in the greediness and stubbornness of the learner. Or maybe they’re just not relevant here.

Stubbornness controls the degree of correction that occurs when reality is different from what the learner expected. The greater the Stubbornness, the longer it takes for a learner to realize that a chosen course of action (say, the controversial military occupation of an oil-rich country) is not creating the expected results (joy and laughter and children throwing flowers), but is instead creating something else (quadriplegics and massive debt and international disgust). This is represented by the equation:

New value of an action = Old value of the action + (real outcome - expected outcome) / Stubbornness

Feel free to rewrite that with Greek letters and parameterized functions if it makes you feel better.
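For those who would rather have it in code than in Greek, here is a minimal sketch of that update. It assumes the "expected outcome" is simply the old value of the action, and that Stubbornness is a number of at least 1 (which would make the usual learning rate α equal to 1 / Stubbornness). The function name `update_value` and the example numbers are made up for illustration.

```python
def update_value(old_value, real_outcome, stubbornness):
    """Return the new value of an action after seeing how it turned out.

    Assumption: stubbornness >= 1. A value of 1 means the new outcome
    completely replaces the old value; larger numbers mean smaller corrections.
    """
    if stubbornness < 1:
        raise ValueError("stubbornness must be at least 1")
    surprise = real_outcome - old_value  # real outcome minus expected outcome
    return old_value + surprise / stubbornness


# Expected joy and flowers (+10), got something else (-10).
print(update_value(10.0, -10.0, stubbornness=1))   # prints -10.0 (fully malleable)
print(update_value(10.0, -10.0, stubbornness=20))  # prints 9.0 (very stubborn)
```

With a Stubbornness of 20, one very bad outcome barely dents the original expectation; it would take many more of them to bring the value down.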

So since Stubbornness prevents the system from enjoying the lessons offered by reality, why not make the system as open to new input as possible? Why not let new experiences completely define future expectations? This is why:


The y-axis is the value of each behavior option (each option in its own color). What this graph says is that if we’re too malleable, then the opinion of every jerk on the street makes us completely change our life. Mathematically, it’s impossible to benefit from experience this way. To stay between these extremes of stubbornness and malleability, the learner must balance how willing she is to change with a healthy sense of skepticism. No big surprise there, but now we have it in an equation! This equation tells us how much to value the choice just made, after seeing how well it worked.
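The effect is easy to reproduce in a few lines. The sketch below is not the simulation behind the graph; it is a small stand-in that assumes a single behavior whose true worth is 1.0, observed through Gaussian noise, and compares a fully malleable learner (Stubbornness 1) with a mildly skeptical one (Stubbornness 10). The names `track_value`, `malleable`, and `skeptical` are invented for this example.

```python
import random
import statistics


def track_value(outcomes, stubbornness, start_value):
    """Apply the update rule to each outcome; return the value after every step."""
    value = start_value
    history = []
    for outcome in outcomes:
        value += (outcome - value) / stubbornness
        history.append(value)
    return history


rng = random.Random(0)  # fixed seed so the run is repeatable
# One behavior whose true worth is 1.0, judged through a noisy world.
outcomes = [1.0 + rng.gauss(0.0, 1.0) for _ in range(1000)]

malleable = track_value(outcomes, stubbornness=1, start_value=1.0)
skeptical = track_value(outcomes, stubbornness=10, start_value=1.0)

print("spread when fully malleable:", statistics.pstdev(malleable))
print("spread with some skepticism:", statistics.pstdev(skeptical))
```

The malleable learner's opinion is just the most recent outcome, so it jumps around as much as the world does. The skeptical learner's opinion stays much closer to the true worth.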

The other parameter, Greed, tells us which choice to select next time. If Greed is high, then it says to take the highest valued action. „Go directly towards the Porsche-shaped pellet, you moron.“ Greed will tell us to do this even if we have no idea how good the Porsche-shaped pellets really taste or how to get them. Greed will happily take any half-baked notion we have of where the pellets are kept (Business school? Plastics? New York?) and drive us towards it until we’re on our deathbed, still dreaming of that Big Score. Or maybe we guessed right, who knows? Greed sure doesn’t.

The Reinforcement Learning literature suggests that you do the greedy thing only some of the time. The rest of the time, go bowling. Ride a bicycle. Write an article for your friend’s blog. Non-optimal behavior will produce surprises which, if you’re not too stubborn, you might learn from. But rampant Greed does still have uses.
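In code, "greedy only some of the time" might look like the sketch below. It assumes Greed is a probability between 0 and 1 (so in the usual ε-greedy scheme, ε would be 1 minus Greed), and the three option values are made up. The function name `choose_action` is hypothetical.

```python
import random


def choose_action(values, greed, rng):
    """Pick the index of the next action.

    With probability `greed`, take the highest valued action; otherwise
    pick any action at random (the "go bowling" branch).
    """
    if rng.random() < greed:
        return max(range(len(values)), key=lambda i: values[i])
    return rng.randrange(len(values))


rng = random.Random(1)  # fixed seed so the run is repeatable
values = [0.2, 0.9, 0.5]  # made-up values for three options
picks = [choose_action(values, greed=0.8, rng=rng) for _ in range(1000)]
for option in range(len(values)):
    print("option", option, "chosen", picks.count(option), "times")
```

Most picks go to the option currently believed to be best, but the other two still get tried now and then, which is where the surprises come from.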

After doing some computer simulations (nothing up my sleeves!), it looks like once we’ve figured out all of the angles, Greed is the way to go. That is, once we’ve figured out the behavior that brings us the greatest reward (measured in happiness, cash, pellets, or whatever), then we should do that, to the exclusion of all else. At least, so long as nothing changes. If we can hold our breath and those perfect pellets don’t lose their flavor, and no wind or rain or falling airplanes shake our house of cards, we’ve got it made. If you live in a world with wind, rain, or airplanes, however, then it never ends: you must keep exploring. Because if that house of cards does come down you’ll need a Plan B, and it had better be none too rusty.
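Here is a toy version of the Plan B argument. It is not the simulation described above, only a sketch under made-up assumptions: two options with fixed payoffs, a world that changes once after 200 steps, and a Stubbornness of 2. The function `run` and all of its parameters are invented for this example.

```python
import random


def run(greed, steps_before=200, steps_after=200, stubbornness=2, seed=0):
    """Return the total reward collected after the world changes."""
    rng = random.Random(seed)
    values = [0.0, 0.0]  # what the learner believes each option is worth
    reward_after_change = 0.0
    for step in range(steps_before + steps_after):
        changed = step >= steps_before
        # Option 0 is the perfect pellet until the house of cards falls;
        # option 1 is a modest but reliable Plan B.
        payoffs = [0.0, 0.5] if changed else [1.0, 0.5]
        if rng.random() < greed:
            action = max(range(2), key=lambda i: values[i])  # greedy choice
        else:
            action = rng.randrange(2)  # explore
        reward = payoffs[action]
        values[action] += (reward - values[action]) / stubbornness
        if changed:
            reward_after_change += reward
    return reward_after_change


print("pure greed:        ", run(greed=1.0))
print("greed 90% of time: ", run(greed=0.9))
```

The purely greedy learner never tries option 1, so after the change it keeps returning to the empty pellet dispenser: its fading memory of option 0 still beats an option it knows nothing about. The learner that explores a tenth of the time already knows what Plan B is worth and switches within a few steps.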

Reinforcement Learning for Dummies (and Computer Scientists)
Bob’s the kinda guy who knows just what to do
„We all got it comin‘, kid.“

Other links and quotes:

…any labor that accepts the conditions of competition with slave labor accepts the conditions of slave labor, and is essentially slave labor…. The answer, of course, is to have a society based on human values other than buying or selling.
–Norbert Wiener

The essential act of war is destruction, not necessarily of human lives, but of the products of human labour. War is a way of shattering to pieces, or pouring into the stratosphere, or sinking in the depths of the sea, materials which might otherwise be used to make the masses too comfortable, and hence, in the long run, too intelligent.
— George Orwell, 1984

(If you’re in the mood for stomach-churning horror, you should skip H.P. Lovecraft and go right for THE THEORY AND PRACTICE OF OLIGARCHICAL COLLECTIVISM. My god.)

And I’m happy to report that the debate in the U.S. over „Intelligent Design“ has progressed from patient discourse to outright ridicule. Finally.

5 Comments

  1. 01

    Nice article! Well-written and funny introduction to reinforcement learning. As for the question, why there are no other features, like „intelligence“:

    While reinforcement learning is very well understood, it can explain only certain kinds of learning, like rats pressing keys or robots avoiding obstacles. But you would never want to learn driving a car or reading a book by reinforcement learning.

  2. 02

    Thanks, stralau!

    I spent most of my summer looking at this, so I wanted to share the pain ;-)

    Yes, learning to drive a car strictly by trial-and-error would be a bad idea, especially for all of the innocent pedestrians. I guess this is why I play Grand Theft Auto.

  3. 03


  4. 04

    Actually i am not an active serfer, but this this site is really great, i will spread it through my friends.


