Background:
Research on face recognition has shown that people will sometimes mistakenly recognize non-presented faces that are made up of components of different presented faces.
The usual ordering of "old" responses is: Target > Conjunction > Feature > Novel.
This has been interpreted as evidence for a blending or binding mechanism that creates a prototype face from the individual exemplars. However, the authors point out that these results are also consistent with a global familiarity account, and they attempt to account for the results with Nosofsky's (1986) Generalized Context Model (GCM).
Distance Between Faces
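Imagine each face as a point in a mental space whose dimensions are facial properties such as pudginess and age. Determining the distance between two faces in this mental space requires remembering some high school math. You can determine how far apart any two faces are (d_{i,j}, the distance between face i and face j) by means of the Pythagorean Theorem.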
Remember that the Pythagorean Theorem states that for a right triangle the square of the hypotenuse is equal to the sum of the squares of the other two sides:
$$h^2 = x^2 + y^2$$
So the distance between Fred and Barney is given by the formula:
$$d^2 = (\text{Fred's pudginess} - \text{Barney's pudginess})^2 + (\text{Fred's age} - \text{Barney's age})^2$$
Now this assumes that pudginess and age are equally important to you when you view faces. But that might not be the case. For that reason you can make some dimensions more important by multiplying them by a weight. Geometrically what this does is stretch out or shrink those dimensions.
$$d^2 = w_{\text{pudge}}\,(\text{Fred's pudginess} - \text{Barney's pudginess})^2 + w_{\text{age}}\,(\text{Fred's age} - \text{Barney's age})^2$$
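So the distance between faces i and j in this multidimensional space is given by
$$d_{i,j} = \sqrt{\sum_{m} w_m\,(x_{i,m} - x_{j,m})^2}$$
where x_{i,m} is face i's value on dimension m and w_m is the weight given to that dimension.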
This general formula can be applied no matter how many dimensions the mental space has. In this article, the authors assume (based on a multidimensional scaling, or MDS, fit) that faces have six dimensions.
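To make the weighted formula concrete, here is a minimal Python sketch. It assumes a Euclidean distance; the function name `weighted_distance` is hypothetical, and the coordinates for Fred and Barney are made-up values on a (pudginess, age) scale chosen only for illustration.

```python
import math

def weighted_distance(face_i, face_j, weights):
    """Weighted Euclidean distance between two faces in a mental space.

    Each face is a sequence of coordinates (one per dimension), and
    weights holds one attention weight per dimension.
    """
    if not (len(face_i) == len(face_j) == len(weights)):
        raise ValueError("faces and weights must have the same number of dimensions")
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, face_i, face_j)))

# Hypothetical coordinates: (pudginess, age)
fred = (7.0, 40.0)
barney = (4.0, 36.0)

print(weighted_distance(fred, barney, (1.0, 1.0)))  # 5.0: both dimensions count equally
print(weighted_distance(fred, barney, (1.0, 0.0)))  # 3.0: age is ignored entirely
```

Setting a weight to zero collapses that dimension, which is the "shrinking" described above; the same function works unchanged for six dimensions.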
Similarity
So far we can determine the distance between two faces in a mental space. But that has to somehow be translated into some sort of similarity metric.
One problem is that faces can be infinitely far apart in a mental space, but there is probably some sort of psychological limit to how dissimilar two things can be. In fact, it might be reasonable to say that at the limits two things can have zero similarity or 100% (1.0) similarity.
So it's necessary that however we translate distance into similarity, it should be able to take potentially infinite distances and compress them into a realistic range of similarity values. In Nosofsky's model similarity ranges between 0 and +1 (note that in Minerva2 similarity ranges between -1 and +1).
This is done with the following handy-dandy formula:
$$\text{Similarity} = e^{-c\,d}$$
A couple of things to remind yourself of when looking at this equation. First, e is a commonly used mathematical constant (like pi) whose value is approximately 2.71828. Second, any number raised to a negative power is equal to one over that number raised to the positive power, thusly
$$\text{Similarity} = e^{-c\,d} = \frac{1}{e^{c\,d}}$$
So how do you make sense of a formula like this? When you read an article like this one, one way you can make sense of the equations is to try to imagine extreme cases. So for instance, what happens when two things are identical? In that case the distance between them should be zero. When the distance between two faces is zero (the faces are identical) then the equation becomes
$$\text{Similarity} = \frac{1}{e^{c\,d}} = \frac{1}{e^{c \cdot 0}} = \frac{1}{e^{0}} = \frac{1}{1} = 1$$
(because anything raised to the zero power is 1)
That's very reassuring. If two faces are identical then similarity equals 1.0, the maximum similarity value. So the equation works just the way it should.
Now imagine that the distance between the two faces in the mental space is infinite, that is, they are as dissimilar as they could possibly be. Then the equation becomes...
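$$\text{Similarity} = \frac{1}{e^{c\,d}} \;\to\; \frac{1}{e^{\infty}} = \frac{1}{\infty} = 0 \quad \text{as } d \to \infty \text{ (for } c > 0\text{)}$$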
That too is reassuring. When two faces are as dissimilar as they can be, when the distance between them in the mental space is infinite then similarity equals 0.0, the smallest similarity value possible.
What's this c thing all about? Another way to make sense of an equation like this one is to graph it. To get a sense of what c does, I graphed the equation using different values of c.
When c is large, two concepts have to be very close together before they will be perceived as being similar.

When c is small, even large differences in your mental space are perceived as small differences in similarity. So when c is small it's harder to discriminate between the concepts (i.e., you need a very large difference in mental distance for differences between the concepts to be obvious).
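In place of the graph, a short Python sketch can tabulate the same curve. The function name `similarity` and the particular values of c and d are illustrative choices, not values from the article.

```python
import math

def similarity(distance, c):
    """Exponential-decay similarity e^(-c * d): 1 at d = 0, approaching 0 as d grows (c > 0)."""
    return math.exp(-c * distance)

distances = (0.0, 0.5, 1.0, 2.0, 4.0)
for c in (0.5, 1.0, 4.0):
    # Each row shows how quickly similarity falls off with distance for this c.
    print(f"c = {c}:", [round(similarity(d, c), 3) for d in distances])
```

With c = 4, similarity has already dropped to about 0.14 at d = 0.5, whereas with c = 0.5 it is still about 0.78 at that distance. Every row starts at 1.0, matching the identical-faces case worked out above.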
Familiarity
At test, the system produces a global feeling of familiarity for the face. It does this by determining how similar the test face is to every face in memory and then summing all of those similarity values together (much like Minerva2). So the global familiarity of test face k is given by
$$F_k = \sum_{j} s_{k,j}, \qquad s_{k,j} = e^{-c\,d_{k,j}}$$
where the sum runs over every face j stored in memory.
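A minimal Python sketch of the summing step. For brevity it assumes an unweighted Euclidean distance (all dimension weights equal to 1); the function names and the two-face memory are hypothetical.

```python
import math

def similarity(distance, c):
    """Exponential-decay similarity exp(-c * d)."""
    return math.exp(-c * distance)

def global_familiarity(test_face, memory, c):
    """Sum the test face's similarity to every stored face (unweighted Euclidean distance)."""
    return sum(similarity(math.dist(test_face, stored), c) for stored in memory)

# Hypothetical two-dimensional memory holding two studied faces.
memory = [(0.0, 0.0), (3.0, 4.0)]

studied_test = global_familiarity((0.0, 0.0), memory, c=1.0)   # 1 + exp(-5): matches a stored face
novel_test = global_familiarity((10.0, 10.0), memory, c=1.0)   # far from both stored faces
print(studied_test, novel_test)
```

A studied face gets a contribution of 1 from its own trace plus whatever it shares with other traces, so it ends up more familiar than a face far from everything in memory.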
According to GCM, recognition decisions are based on the global familiarity of the face. So there needs to be a function that translates global familiarity into a probability of saying "old". This probability has to lie between 0 and 1, so again a function has to be chosen that has those properties. The equation they use is
$$P(\text{"Old"} \mid k \text{ presented}) = \frac{1}{1 + b\,e^{-q F_k}}$$
where F_k is the global familiarity of face k and b and q are parameters of the model.
That makes sense. The higher the global familiarity the greater the likelihood that the subject will say "old". If familiarity is infinite then the probability of saying "old" should be 100%.
Now consider b, which captures response bias. To see this, try setting familiarity to zero in the equation. When familiarity is zero, the only reason to say "old" is response bias (i.e., you're just guessing).
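So now the equation becomes
$$P(\text{"Old"} \mid k \text{ presented}) = \frac{1}{1 + b\,e^{-q \cdot 0}} = \frac{1}{1 + b \cdot 1} = \frac{1}{1 + b}$$
Now imagine that b = 0. Then
$$P(\text{"Old"} \mid k) = \frac{1}{1 + 0} = 1.0$$
so even with zero familiarity the subject always says "old". (With b = 1 the probability is 0.5.)
Now imagine that b is very large, say b approaching infinity, with familiarity still at 0. Now the equation becomes
$$P(\text{"Old"} \mid k) = \frac{1}{1 + b} \;\to\; \frac{1}{\infty} = 0$$
so the subject never says "old". In other words, the larger b is, the more strongly the subject is biased toward responding "new".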
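The cases above can be checked numerically with a small Python sketch of the response rule. The parameter names `b` and `q` follow the notation used here; the specific values are arbitrary.

```python
import math

def p_old(familiarity, b, q):
    """P("old") = 1 / (1 + b * exp(-q * familiarity))."""
    return 1.0 / (1.0 + b * math.exp(-q * familiarity))

# Zero familiarity: the response depends only on the bias parameter b.
for b in (0.0, 1.0, 1000.0):
    print(b, p_old(0.0, b, q=1.0))
# b = 0 gives 1.0, b = 1 gives 0.5, b = 1000 gives about 0.001

# High familiarity overwhelms even a strong bias toward "new".
print(p_old(50.0, 1000.0, q=1.0))
```

Because the exponential term shrinks toward zero as familiarity grows, the probability of saying "old" approaches 1 for very familiar faces regardless of b.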
The first thing Busey and Tunnicliff do after explaining all of this is use the GCM to fit previous face recognition studies. GCM does a good job of producing the qualitative pattern
Target > Conjunction > Feature > Novel
In addition, it can simulate the effects of right hemisphere lesions reported by Kroll et al. (1996), who found that false recognition was greater in patients with right hemisphere lesions. To simulate this, B&T set c lower for the lesion patients than for the controls. Remember that, psychologically, c corresponds to how confusable the faces are: a lower c reduces the effect that distance has on similarity, so the faces become more confusable.
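A hypothetical sketch of this kind of simulation in Python. Everything numeric here is made up for illustration: two studied faces, a lure lying between them, b = q = 1, and c = 3 for "controls" versus c = 1 for "patients". These are not B&T's fitted values, and the distance is unweighted for brevity.

```python
import math

def familiarity(test_face, memory, c):
    """Global familiarity: summed exp(-c * d) over memory (unweighted Euclidean d)."""
    return sum(math.exp(-c * math.dist(test_face, stored)) for stored in memory)

def p_old(f, b=1.0, q=1.0):
    """P("old") = 1 / (1 + b * exp(-q * f))."""
    return 1.0 / (1.0 + b * math.exp(-q * f))

# Hypothetical study list and a lure lying between the two studied faces.
studied = [(0.0, 0.0), (2.0, 0.0)]
lure = (1.0, 0.5)

fa_control = p_old(familiarity(lure, studied, c=3.0))  # higher c: faces easy to tell apart
fa_patient = p_old(familiarity(lure, studied, c=1.0))  # lower c: faces more confusable
print(f"false-alarm probability, control: {fa_control:.3f}, patient: {fa_patient:.3f}")
```

Because lowering c raises every similarity at a nonzero distance, it raises the lure's familiarity and therefore its probability of a false "old" response.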
Experiments
All three experiments made use of faces that were created by morphing two parent faces (two faces that had actually been shown to subjects).
The faces that were morphed were either similar to each other or not similar.
The faces that were morphed were either presented sequentially or with many intervening faces. This was based on the possibility that if a memory binding or blending process occurs (rather than merely the familiarity mechanism of GCM) it might be more likely to occur for faces that are presented close together.
Experiment 1:
After presentation participants took an old/new recognition test. The test included targets (that had not been morphed), parents, morphs and novel distractors.
The basic finding was that people were somewhat more likely to say "old" to a morph created from two parents that were similar to each other than to the actual parents that had been presented.
People were more likely to say "old" to the parent than to the morph when the morph had been created from two parents that were not similar to each other.
Experiment 2:
Experiment 2 used a forced-choice procedure instead, in which participants had to choose between a parent and a morph of that parent.
When the morph had been created from two parents that were similar to each other, participants were somewhat more likely to choose the morph than the parent.
When the morph had been created from two parents that were not similar to each other, participants were more likely to choose the parent than the morph.
(Figure: Probability of correctly choosing the parent in forced choice)
Experiment 3:
In Experiment 3 (3a and 3b) they were apparently worried that showing both parents and morphs at test to the same subjects might cause the two to interact. So some subjects were tested with targets, morphs and novel faces, while other subjects were tested with targets, parents and novel faces.
Results were consistent with previous analyses...
| | Similar parents | Dissimilar parents |
| False recognition of morphs | 0.700 | 0.474 |
| True recognition of parents | 0.566 | 0.689 |
To test GCM they fit the model to the data. Remember from previous articles that this means they found the values of the model's parameters that produced the lowest possible error.
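GCM handled some parts of the data pretty well but not others: in particular, it could not make false recognition of similar morphs exceed recognition of similar parents, and it did not capture the higher hit rate for distinctive faces.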
They next tried an alternative version of Nosofsky's model called Identification. In the Identification model, the probability that an item will be stored in memory is a function of its distinctiveness.
In the Identification model, the probability of saying "old" to a test item k is based on the similarity ratio:
$$\frac{\max_j s_{k,j}}{\sum_j s_{k,j}}$$
where s_{k,j} is the similarity between item k and stored item j.
Think about what this does. When a test item is very distinctive, it has high similarity to itself (so the numerator is large) but low similarity to everything else in memory (so the denominator is small). Typical items will be similar to themselves but they will also be similar to many other things in memory. So the numerator will be large but so will the denominator. So the similarity ratio will be large for distinctive items and small for typical items.
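A minimal Python sketch of the ratio, assuming we already have the test item's similarity to every trace in memory. The function name and the similarity values are hypothetical.

```python
def similarity_ratio(sims):
    """Identification-style ratio for one test item k: max_j s(k, j) / sum_j s(k, j)."""
    total = sum(sims)
    if total <= 0:
        raise ValueError("need at least one positive similarity")
    return max(sims) / total

# Hypothetical similarities of a studied face to each trace in memory
# (the first entry is its own trace, so it equals 1.0).
distinctive = [1.0, 0.05, 0.05, 0.05]  # resembles nothing else
typical = [1.0, 0.6, 0.5, 0.6]         # resembles many stored faces

print(similarity_ratio(distinctive))  # about 0.87
print(similarity_ratio(typical))      # about 0.37
```

Both items have the same numerator, so the difference comes entirely from how crowded the denominator is.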
Again, this model fits some aspects of the data but not other aspects of the data.
To try to account for these aspects of the data the authors propose a new variation of Nosofsky's model which uses a recall mechanism rather than a global familiarity mechanism.
According to this model, when a face is presented at test you sample one memory trace from long-term memory. The probability of sampling any particular face is given by the equation:
$$P(\text{sample } k \mid i \text{ presented}) = \frac{s_{i,k}}{\sum_j s_{i,j}}$$
where s_{i,k} is the similarity between test face i and stored face k.
A rule of this general form has been used in a number of other models, including SAM (Gillund and Shiffrin, 1984) and the Construction/Integration Model (Kintsch, 1988). Think about why it makes sense. It's basically saying that the probability of retrieving an item from memory given a particular cue is proportional to how similar the cue is to that item, relative to how similar the cue is to everything else in memory.
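The sampling rule is easy to sketch in Python: normalize the similarities into probabilities, then draw one trace. The trace labels and similarity values are hypothetical, and the seeded `random.Random` is only there to make the draw reproducible.

```python
import random

def sampling_probabilities(sims):
    """P(sample k | i) = s(i, k) / sum_j s(i, j) for every stored trace k."""
    total = sum(sims)
    if total <= 0:
        raise ValueError("need at least one positive similarity")
    return [s / total for s in sims]

# Hypothetical similarities of one test cue to three stored traces.
traces = ["trace A", "trace B", "trace C"]
sims = [1.0, 0.5, 0.5]
probs = sampling_probabilities(sims)
print(probs)  # [0.5, 0.25, 0.25]

# Draw one trace with those probabilities (seeded for reproducibility).
rng = random.Random(0)
sampled = rng.choices(traces, weights=sims, k=1)[0]
print(sampled)
```

Passing the raw similarities as `weights` gives the same distribution as the normalized probabilities, since `choices` normalizes weights internally.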
But retrieving an item from memory isn't enough. You have to somehow decide whether the item you've retrieved is the face that you're being presented with at test. To do this you compare the similarity to a criterion: if it's above the criterion you say "old"; if it's below the criterion you say "new".
To formalize this idea they define a function, q(s_{i,k}), which is 1 when the similarity s_{i,k} between test face i and sampled face k is above the criterion and 0 when it is below the criterion.
The probability that a subject will say an item is "old" is then given by...
$$P(\text{"old"} \mid i \text{ presented}) = \sum_k P(\text{sample } k \mid i \text{ presented})\; q(s_{i,k})$$
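Basically, this gives you the expected probability of saying that the item is "old", averaged over which trace gets sampled. As a further modification, they suggest making the q function continuous and normally distributed, so that instead of jumping from 0 to 1 at the criterion it rises smoothly around it.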
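A minimal Python sketch of the SimSample "old" probability with both versions of q. The smooth version assumes that "continuous and normally distributed" means the criterion is normally distributed, so q becomes the normal CDF evaluated at the similarity; that reading, the function names, and all numeric values are assumptions for illustration.

```python
from statistics import NormalDist

def sampling_probabilities(sims):
    """P(sample k | i) = s(i, k) / sum_j s(i, j)."""
    total = sum(sims)
    return [s / total for s in sims]

def step_q(s, criterion):
    """1 if the sampled similarity reaches the criterion, else 0."""
    return 1.0 if s >= criterion else 0.0

def smooth_q(s, criterion, sd):
    """Assumed continuous version: chance that s exceeds a normally distributed criterion."""
    return NormalDist(mu=criterion, sigma=sd).cdf(s)

def p_old(sims, q):
    """P("old" | i) = sum_k P(sample k | i) * q(s(i, k))."""
    return sum(p * q(s) for p, s in zip(sampling_probabilities(sims), sims))

sims = [1.0, 0.5, 0.5]  # hypothetical similarities of test face i to stored traces
print(p_old(sims, lambda s: step_q(s, criterion=0.7)))            # 0.5
print(p_old(sims, lambda s: smooth_q(s, criterion=0.7, sd=0.1)))  # about 0.51
```

With the step function only the first trace clears the criterion, so the answer is just its sampling probability; the smooth version gives the other traces a small chance of passing too.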
The new SimSample model provides a good fit to the data. But again there are qualitative problems: it handles distinctiveness well but fails to make recognition of similar morphs greater than recognition of similar parents.
SimSample With Proportional Prototypes
In their final attempt to account for the data they assume that sometimes people do develop a weak prototype representation. This prototype is simply the morph itself. The weight given to the prototype in memory is given by the variable pw (prototype weight) and pw is a function of the similarity of the two parents to the morph
$$pw_i = \left(s_{i,p_1} + s_{i,p_2}\right)^{p}$$
where p is a free parameter of the model. Notice that this makes pw large when the two parents are both similar to the prototype.
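A small Python sketch of the prototype weight. The function name and the similarity values for "similar" and "dissimilar" parent pairs are hypothetical, and p = 2 is an arbitrary choice for the free parameter.

```python
def prototype_weight(sim_to_parent1, sim_to_parent2, p):
    """pw = (s(morph, parent1) + s(morph, parent2)) ** p."""
    return (sim_to_parent1 + sim_to_parent2) ** p

# Hypothetical values: a morph of similar parents resembles both strongly,
# a morph of dissimilar parents resembles each only weakly.
p = 2.0
print(prototype_weight(0.8, 0.8, p))  # about 2.56
print(prototype_weight(0.2, 0.2, p))  # about 0.16
```

With these numbers the similar-parent prototype gets 4 ** p times the weight of the dissimilar-parent one, so a larger p exaggerates the gap.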
For faces, the sampling probabilities now become...
$$P(\text{sample } k \mid i) = \frac{s_{i,k}}{\sum_{\text{faces } j} s_{i,j} + \sum_{\text{prototypes } m} pw_m\, s_{i,m}}$$
And the probability of sampling a prototype becomes...
$$P(\text{sample } k \mid i) = \frac{pw_k\, s_{i,k}}{\sum_{\text{faces } j} s_{i,j} + \sum_{\text{prototypes } m} pw_m\, s_{i,m}}$$
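A minimal Python sketch of sampling when memory holds both faces and weighted prototypes. The function name, the test item's similarities, and the two prototype weights are hypothetical values chosen to contrast a weakly and a strongly weighted prototype.

```python
def sampling_with_prototypes(face_sims, proto_sims, proto_weights):
    """Sampling probabilities for one test face i when memory holds faces and weighted prototypes.

    face_sims[j]: s(i, j) for each stored face j
    proto_sims[m]: s(i, m) for each stored prototype m
    proto_weights[m]: pw for prototype m
    """
    weighted = [pw * s for pw, s in zip(proto_weights, proto_sims)]
    denom = sum(face_sims) + sum(weighted)
    face_probs = [s / denom for s in face_sims]
    proto_probs = [w / denom for w in weighted]
    return face_probs, proto_probs

# Hypothetical test item: a morph that resembles both parents (0.6 each)
# and matches its own stored prototype perfectly (1.0).
for pw in (0.16, 2.56):
    faces, protos = sampling_with_prototypes([0.6, 0.6], [1.0], [pw])
    print(f"pw = {pw}: faces {faces}, prototype {protos}")
```

When pw is small the parents dominate sampling; when pw is large the prototype is the most likely trace to be retrieved, which is how the model lets morphs of similar parents attract more "old" responses.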
This new model results in a better fit of the data. There is less error and the model correctly predicts that there will be greater recognition of similar morphs than similar parents.
It's important to understand formal models like this, but as a reader you don't want to miss the conceptual forest for the mathematical trees (and no, this is not an allusion to multinomial models!). Busey and Tunnicliff do a good job throughout the paper of keeping their eyes on the prize, namely the psychological constructs that underlie the formalizations.
In the table below I summarize the main aspects of the different models and how well they account for the data.
| Model | Primary Mechanism | Strengths | Weaknesses |
| Generalized Context Model | Makes old/new decisions by assessing the global familiarity of the test item | Accounts for a lot of past data with a fairly simple mechanism | Can't account for the higher hit rate for distinctive faces; similar morphs not chosen more than similar parents |
| Identification | Instead of making old/new decisions with simple global familiarity, it uses a similarity ratio in place of familiarity | Accounts for the distinctiveness pattern | Still can't account for similar morphs being chosen more than similar parents |
| SimSample | Replaces both familiarity and the similarity ratio with a memory retrieval mechanism | Accounts for the distinctiveness pattern | Still can't account for similar morphs being chosen more than similar parents |
| SimSample plus Proportional Prototypes | Assumes that a weak prototype is sometimes formed, and that this is more likely for similar parents than for dissimilar parents | Accounts for the distinctiveness pattern and the high false alarm rate for similar morphs | Still doesn't account for all of the variance, but not too shabby... |