Learning Past Tenses of English [notes on cognitive science]


In the paper “On Learning the Past Tenses of English Verbs”, Rumelhart and McClelland explore whether a network can extract the regularities of language from its input in a fashion different from traditional rule systems. This bears on the argument about the LAD (Language Acquisition Device), the hypothesis that humans are born with an innate ability to extract and process linguistic rules from very impoverished input. They built a network that simulates past-tense acquisition in children, based on the following stages from developmental psychology:

  1. Children use only a small number of verbs in the past tense, all high-frequency words. Most of them are irregulars. There’s no evidence of the use of any rules.
  2. Evidence of implicit knowledge of a linguistic rule emerges. Children use a much larger set of verbs in the past tense.
    • The child can now generate a past tense for an invented word (e.g. rick – ricked)
    • Children supply incorrect regular past-tense endings to irregular verbs they used correctly in stage 1 (overregularization)
  3. Regular and irregular forms co-exist. Children regain knowledge of the correct irregular forms, and they apply the regular form to new verbs. This persists into adulthood.

These stages (as with most stage accounts in psychology) aren’t really distinct and sequential; the transitions between them are gradual.

The Model

Structure: 2 basic parts:

  1. A pattern associator network which learns the relationships between the base form and the past-tense form
  2. A decoding network that converts the featural representation produced by the pattern associator into the phonological structure of the past tense. All learning occurs in the pattern associator.

Units: Two pools of units:

  1. Input Pool: represents input corresponding to the verb form to be learnt
  2. Output Pool: represents the output pattern.

Each unit represents a particular feature of the input or output string. These features are based on Wickelphones: triples of phonemes consisting of a phoneme itself, its predecessor, and its successor. So /kut/ would be {#-k-u, k-u-t, u-t-#}. Wickelphones have interesting properties:

  • Few words contain the same Wickelphone more than once
  • No two words consist of the same sequence of Wickelphones.
  • They capture a great deal of context
  • But there are too many of them!
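The decomposition of a word into Wickelphones can be sketched as a small function. This is an illustrative implementation, not the paper's code; it assumes the word is already given as a string of one-character phoneme symbols and uses `#` as the word-boundary marker:

```python
def wickelphones(word):
    """Decompose a phoneme string into its Wickelphones: triples of
    (predecessor, phoneme, successor), with '#' marking word boundaries."""
    padded = "#" + word + "#"
    return [f"{padded[i-1]}-{padded[i]}-{padded[i+1]}"
            for i in range(1, len(padded) - 1)]

print(wickelphones("kut"))  # ['#-k-u', 'k-u-t', 'u-t-#']
```

Note that the number of possible Wickelphones grows cubically with the size of the phoneme inventory, which is exactly the “too many of them” problem mentioned above.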

To avoid this combinatorial explosion of Wickelphones, they represented each one as a distributed pattern of activation over a much smaller set of feature detectors (Wickelfeatures).

Connections: the model is fully interconnected; each input unit is connected to each output unit. Initially all weights are set to zero. The weights are adjusted with a simple PDP learning rule (the perceptron convergence procedure).
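The training scheme described above can be sketched in a few lines. This is a simplified, assumption-laden toy (binary threshold units, a single weight matrix, a fixed threshold of zero), not a reproduction of the original simulation, but it shows the core idea: every weight starts at zero and is nudged only when an output unit fires incorrectly:

```python
import numpy as np

def train_pattern_associator(pairs, n_in, n_out, epochs=50, lr=1.0):
    """Toy pattern associator: fully interconnected input/output pools,
    weights initialized to zero, trained by the perceptron convergence rule."""
    W = np.zeros((n_out, n_in))
    for _ in range(epochs):
        for x, t in pairs:
            y = (W @ x > 0).astype(float)   # binary threshold output units
            W += lr * np.outer(t - y, x)    # update only where output is wrong
    return W

# Toy data: two 4-feature input patterns mapped to 2-feature outputs.
pairs = [(np.array([1., 0., 1., 0.]), np.array([1., 0.])),
         (np.array([0., 1., 0., 1.]), np.array([0., 1.]))]
W = train_pattern_associator(pairs, n_in=4, n_out=2)
for x, t in pairs:
    assert ((W @ x > 0).astype(float) == t).all()
```

Because the same weights serve all verbs at once, regular and irregular mappings are superimposed in one matrix, which is what lets similar patterns reinforce (or interfere with) one another.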


The model reproduced the characteristic course of verb learning in young children, the so-called U-shaped learning curve, and it brought forward other interesting and specific predictions. All in all, it provides a novel way to explain the learning of verbs: the application of a uniform procedure in every case. We can see the network as one large rule for generating past tenses. Part of the model’s success arises from the fact that similar patterns blend into one another, reinforcing each other.

