
Spreeblick

I live by the river!
23.11.08

Gates Of The West by Carlito | 5

Hudsonblick 11

The eleventh post already from Charles, our New York Spreeblick correspondent with a category of his own (you can comment in German, he enjoys that and sometimes even understands it!), bears the promising title…

Cybernetics and Plan B

We’d love to have a cut-away view of the universe; a diagram of the hidden gears and equations for all of the mysteries that bounce between competing aphorisms. Physics and chemistry we’ve already documented remarkably well, but what happens in our own brains? This is where Freud meets flow charts, where Turing and Minsky sought intelligence. This is the land where Lewis Carroll wrote ethnographies and Douglas Hofstadter still describes mountainous landscapes. The land where this man spent his life.

One artifact from this world of computational epistemology is a model called Reinforcement Learning (RL). If you’re thinking of rats and mazes and little sugar pellets, you’ve got the right idea. The principle of the model is that a system (a person, a rat, or a computer simulation) reacts to rewards and punishments by changing behavior to optimize outcomes. For a simple algorithm, it has some interesting results.

In the equations that describe this model there are two fundamental parameters that describe how learning occurs: Stubbornness and Greed (more scientifically, α and ε). Strangely, there are no variables for traits like “intelligence” or “decisiveness” or “judgment”. It might be that concepts like these are just epiphenomena, all rooted in the greediness and stubbornness of the learner. Or maybe they’re just not relevant here.

Stubbornness controls the degree of correction that occurs when reality is different from what the learner expected. The greater the Stubbornness, the longer it takes for a learner to realize that a chosen course of action (say, the controversial military occupation of an oil-rich country) is not creating the expected results (joy and laughter and children throwing flowers), but is instead creating something else (quadriplegics and massive debt and international disgust). This is represented by the equation:

New value of an action = Old value of the action + (real outcome – expected outcome) / Stubbornness

Feel free to rewrite that with Greek letters and parameterized functions if it makes you feel better.
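Or, if code makes you feel better than Greek letters do, here is a minimal Python sketch of that update rule. The name update_value and the "at least 1" limit on Stubbornness are assumptions made for illustration; read this way, Stubbornness would correspond to 1/α in the usual notation, which is an interpretation rather than something the equation states.

```python
def update_value(old_value, real_outcome, stubbornness):
    """Return the new value of an action after seeing how it turned out.

    Illustrative helper, not taken from any library. Assumption: a
    stubbornness of 1 means "believe the latest outcome completely",
    and larger numbers mean smaller corrections.
    """
    if stubbornness < 1:
        raise ValueError("stubbornness must be at least 1")
    surprise = real_outcome - old_value  # real outcome minus expected outcome
    return old_value + surprise / stubbornness


# Expected flowers (value 1.0), got none (outcome 0.0).
print(update_value(1.0, 0.0, stubbornness=1))   # fully malleable learner
print(update_value(1.0, 0.0, stubbornness=10))  # stubborn learner
```

The malleable learner throws away its old expectation at once; the stubborn one barely moves.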

So since Stubbornness prevents the system from enjoying the lessons offered by reality, why not make the system as open to new input as possible? Why not let new experiences completely define future expectations? This is why:

The y-axis is the value of each behavior option (each option in its own color). What this graph says is that if we’re too malleable, then the opinion of every jerk on the street makes us completely change our life. Mathematically, it’s impossible to benefit from experience this way. To stay between these extremes of stubbornness and malleability, the learner must balance how willing she is to change with a healthy sense of skepticism. No big surprise there, but now we have it in an equation! This equation tells us how much to value the choice just made, after seeing how well it worked.
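The same effect can be sketched in a few lines of Python. The feedback sequence below is made up (it is not the data behind the graph), and the names trace_values and swing are hypothetical helpers: one learner has a Stubbornness of 1, the other of 4, and both hear the same alternating praise and abuse.

```python
def trace_values(outcomes, stubbornness, start=0.5):
    """Apply the update rule to a stream of outcomes; return each new value."""
    value = start
    history = []
    for outcome in outcomes:
        value = value + (outcome - value) / stubbornness
        history.append(value)
    return history


def swing(history):
    """Distance between the highest and lowest opinion the learner held."""
    return max(history) - min(history)


# Hypothetical feedback: one jerk says "great", the next says "awful", repeat.
opinions = [1.0, 0.0] * 4

malleable = trace_values(opinions, stubbornness=1)
skeptical = trace_values(opinions, stubbornness=4)

print("malleable:", [round(v, 2) for v in malleable])
print("skeptical:", [round(v, 2) for v in skeptical])
print("swings:", swing(malleable), round(swing(skeptical), 2))
```

The malleable learner simply repeats whatever it heard last, so nothing accumulates; the skeptical one hovers around the middle, which is the fair summary of this particular street.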

The other parameter, Greed, tells us which choice to select next time. If Greed is high, then it says to take the highest valued action. “Go directly towards the Porsche-shaped pellet, you moron.” Greed will tell us to do this even if we have no idea how good the Porsche-shaped pellets really taste or how to get them. Greed will happily take any half-baked notion we have of where the pellets are kept (Business school? Plastics? New York?) and drive us towards it until we’re on our deathbed, still dreaming of that Big Score. Or maybe we guessed right, who knows? Greed sure doesn’t.

The Reinforcement Learning literature suggests that you do the greedy thing only some of the time. The rest of the time, go bowling. Ride a bicycle. Write an article for your friend’s blog. Non-optimal behavior will produce surprises which, if you’re not too stubborn, you might learn from. But rampant Greed does still have uses.
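A minimal sketch of "greedy only some of the time", assuming Greed is read as the probability of taking the highest valued action (so 1 minus Greed would be the ε of the literature; that mapping is an assumption). The function choose_action and the example values are invented for illustration.

```python
import random


def choose_action(values, greed, rng=random):
    """Pick an action index: the best-looking one with probability `greed`,
    otherwise any action at random (go bowling)."""
    if rng.random() < greed:
        return max(range(len(values)), key=lambda i: values[i])
    return rng.randrange(len(values))


# Hypothetical current estimates of where the pellets are kept.
names = ["business school", "plastics", "bowling"]
values = [0.4, 0.7, 0.1]
rng = random.Random(0)  # fixed seed so the run is repeatable

picks = [names[choose_action(values, greed=0.8, rng=rng)] for _ in range(20)]
print(picks)
```

Most picks go to the option that currently looks best, but the occasional random pick is what gives the other estimates a chance to be corrected.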

After doing some computer simulations (nothing up my sleeves!), it looks like once we’ve figured out all of the angles, Greed is the way to go. That is, once we’ve figured out the behavior that brings us the greatest reward (measured in happiness, cash, pellets, or whatever), then we should do that, to the exclusion of all else. At least, so long as nothing changes. If we can hold our breath and those perfect pellets don’t lose their flavor, and no wind or rain or falling airplanes shake our house of cards, we’ve got it made. If you live in a world with wind, rain, or airplanes, however, then it never ends: you must keep exploring. Because if that house of cards does come down you’ll need a Plan B, and it had better be none too rusty.
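This is not the simulation mentioned above, only a small Python sketch in the same spirit. Everything in it is an assumption made for illustration: two options, made-up rewards that change halfway through, a Stubbornness of 2, and Greed read as the probability of taking the best-looking option.

```python
import random


def run(greed, stubbornness=2.0, steps_per_phase=200, seed=0):
    """Simulate a two-option world whose rewards change halfway through.

    Returns (reward earned before the change, reward earned after it).
    """
    rng = random.Random(seed)
    phases = [
        [1.0, 0.5],  # before: option 0 is the perfect pellet
        [0.2, 0.5],  # after: option 0 has lost its flavor, option 1 is Plan B
    ]
    values = [0.0, 0.0]  # the learner starts out knowing nothing
    totals = []
    for rewards in phases:
        total = 0.0
        for _ in range(steps_per_phase):
            if rng.random() < greed:
                action = max(range(len(values)), key=lambda i: values[i])
            else:
                action = rng.randrange(len(values))
            outcome = rewards[action]
            # Same update rule as in the equation: correct by surprise / Stubbornness.
            values[action] += (outcome - values[action]) / stubbornness
            total += outcome
        totals.append(total)
    return tuple(totals)


pure_greed = run(greed=1.0)
some_bowling = run(greed=0.8)
print("pure greed   (before, after):", pure_greed)
print("some bowling (before, after):", some_bowling)
```

With these made-up numbers, pure Greed collects more while the world holds still, then stays stuck on the stale pellet after the change because it never once tried the other option. The learner that kept exploring already has a Plan B on file.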

Reinforcement Learning for Dummies (and Computer Scientists)
Bob’s the kinda guy who knows just what to do
“We all got it comin’, kid.”

Other links and quotes:

…any labor that accepts the conditions of competition with slave labor accepts the conditions of slave labor, and is essentially slave labor…. The answer, of course, is to have a society based on human values other than buying or selling.
–Norbert Wiener

The essential act of war is destruction, not necessarily of human lives, but of the products of human labour. War is a way of shattering to pieces, or pouring into the stratosphere, or sinking in the depths of the sea, materials which might otherwise be used to make the masses too comfortable, and hence, in the long run, too intelligent.
– George Orwell, 1984

(If you’re in the mood for stomach-churning horror, you should skip H. P. Lovecraft and go right for THE THEORY AND PRACTICE OF OLIGARCHICAL COLLECTIVISM. My god.)

And I’m happy to report that the debate in the U.S. over “Intelligent Design” has progressed from patient discourse to outright ridicule. Finally.

Carlito 24.08.2005 at 16:52

Gates Of The West



5 Comments

  1. 01

    stralau:

    Nice article! A well-written and funny introduction to reinforcement learning. As for the question of why there are no other features, like “intelligence”:

    While reinforcement learning is very well understood, it can explain only certain kinds of learning, like rats pressing keys or robots avoiding obstacles. But you would never want to learn to drive a car or read a book by reinforcement learning.

    24.08.2005 at 19:51
  2. 02

    charles:

    Thanks, stralau!

    I spent most of my summer looking at this, so I wanted to share the pain ;-)

    Yes, learning to drive a car strictly by trial-and-error would be a bad idea, especially for all of the innocent pedestrians. I guess this is why I play Grand Theft Auto.

    25.08.2005 at 17:52
  3. 03

    mom:

    huh???????

    28.08.2005 at 03:51
  4. 04

    jucy:

    Actually, I am not an active surfer, but this site is really great; I will spread it among my friends.

    24.01.2007 at 06:07
  5. 05

    knit:

    Actually, I am not an active surfer, but this site is really great; I will spread it among my friends.

    24.01.2007 at 06:18
