Busey, T.A., & Tunnicliff, J.L. (1999). Accounts of blending, distinctiveness, and typicality in the false recognition of faces. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 1210-1235.

Background:

Research on face recognition has shown that people will sometimes mistakenly recognize non-presented faces that are made up of components of different presented faces.

Target: Faces that had actually been presented
Conjunction Distractor: Made up of components of two presented faces (called Parent faces)
Feature Distractor: Made up of components of one presented face and one novel face
Novel Distractor: Never before seen face
The general finding has been that the probability of saying "old" to a face follows this ordering:

        Target > Conjunction > Feature > Novel

This has been interpreted as evidence for a blending or binding mechanism that creates a prototype face from the individual exemplars.  However, the authors point out that these results are also consistent with a global familiarity account, and they attempt to account for the results with Nosofsky's (1986) Generalized Context Model (GCM).

Generalized Context Model

Imagine, if you will, that all these faces that you see get put into long-term memory and that their similarity to one another (despite Tversky's objections) can be thought of as a distance in some sort of mental space.
 
So the closer two faces are in this mental space the more similar they are.  To borrow from the article let's say two dimensions along which faces differ are age and pudginess. Here are some faces I studied closely this Saturday morning...
 

Distance Between Faces

Determining the distance between two faces in this mental space requires remembering some high school math. You can determine how far apart any two faces are ($d_{i,j}$, the distance between face i and face j) by means of the Pythagorean Theorem.

Remember that the Pythagorean Theorem states that for a right triangle the square of the hypotenuse is equal to the sum of the squares of the sides:

    $h^2 = x^2 + y^2$

 

So the distance between Fred and Barney is given by the formula:

    $d^2 = (\text{Fred's Pudginess} - \text{Barney's Pudginess})^2 + (\text{Fred's Age} - \text{Barney's Age})^2$

Now this assumes that pudginess and age are equally important to you when you view faces.  But that might not be the case.  For that reason you can make some dimensions more important by multiplying them by a weight.  Geometrically what this does is stretch out or shrink those dimensions.

    $d^2 = w_{\text{pudge}} (\text{Fred's Pudginess} - \text{Barney's Pudginess})^2 + w_{\text{age}} (\text{Fred's Age} - \text{Barney's Age})^2$
 

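So the distance between face i and face j in this multidimensional space is given by

    $d_{i,j} = \sqrt{\sum_m w_m (x_{i,m} - x_{j,m})^2}$

where $x_{i,m}$ is the position of face i on dimension m and $w_m$ is the weight on that dimension.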

This general formula can be applied no matter how many dimensions the mental space has.  In this article they assume (based on an MDS fit) that faces have six dimensions.
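To make the weighted distance concrete, here is a minimal Python sketch. The coordinates for Fred and Barney are made up for illustration (they are not values from the article), and the function name `weighted_distance` is my own.

```python
import math

def weighted_distance(face_a, face_b, weights):
    """Weighted Euclidean distance between two faces in the mental space."""
    if not (len(face_a) == len(face_b) == len(weights)):
        raise ValueError("faces and weights must have the same number of dimensions")
    # Sum the weighted squared differences on each dimension, then take the root.
    squared = sum(w * (a - b) ** 2 for a, b, w in zip(face_a, face_b, weights))
    return math.sqrt(squared)

# Hypothetical coordinates, ordered as (pudginess, age).
fred = (8.0, 40.0)
barney = (5.0, 36.0)

# Equal weights: the plain Pythagorean distance.
print(weighted_distance(fred, barney, (1.0, 1.0)))  # 5.0

# Weighting pudginess more heavily stretches that dimension,
# so the same two faces end up farther apart.
print(weighted_distance(fred, barney, (4.0, 1.0)))
```

The same function works unchanged for six dimensions; you just pass longer tuples.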

Similarity

So far we can determine the distance between two faces in a mental space.  But that has to somehow be translated into some sort of similarity metric.

One problem is that faces can be infinitely far apart in a mental space but there is probably some sort of limit psychologically to how dissimilar two things can be. In fact it might be reasonable to say that at the limit two things can have zero similarity or 100% (1.0) similarity.

So it's necessary that however we translate distance into similarity it should be able to take potentially infinite distances and compress them into a realistic range of similarity values.  In Nosofsky's model similarity can range between 0 and +1 (Note that in Minerva2 similarity ranges between -1 and +1).

This is done with the following handy-dandy formula:

        $\text{Similarity} = e^{-c \cdot d}$

A couple things to remind yourself of when looking at this equation.  First off, e is a commonly used mathematical constant (like pi) and its value is approximately 2.71828.  Second, any number to a negative power is equal to one over that number to a positive power, thusly

        $\text{Similarity} = e^{-c \cdot d} = \frac{1}{e^{c \cdot d}}$

So how do you make sense of a formula like this? When you read an article like this one, one way you can make sense of the equations is to try to imagine extreme cases. So for instance, what happens when two things are identical? In that case the distance between them should be zero. When the distance between two faces is zero (the faces are identical) then the equation becomes

    $\text{Similarity} = \frac{1}{e^{c \cdot d}} = \frac{1}{e^{c \cdot 0}} = \frac{1}{e^{0}} = \frac{1}{1} = 1$ (because anything to the zero power is 1)

That's very reassuring.  If two faces are identical then similarity equals 1.0, the maximum similarity value.  So the equation works just the way it should.

Now imagine that the distance between the two faces in the mental space is infinite. That is, they are as dissimilar as they could be.  Then the equation becomes...

    $\text{Similarity} = \frac{1}{e^{c \cdot d}} = \frac{1}{e^{c \cdot \infty}} = \frac{1}{e^{\infty}} = \frac{1}{\infty} = 0$

That too is reassuring.  When two faces are as dissimilar as they can be, when the distance between them in the mental space is infinite then similarity equals 0.0, the smallest similarity value possible.

What's this c thing all about?  The other thing that you can do to make sense of an equation like this one is to graph it. To get a sense of what c does I graphed the equation using different values of c....

When c is large two concepts have to be very close together before they will be perceived as being similar. When c is small, even large differences in your mental space are perceived as small differences in similarity. So when c is small it's harder to discriminate between the concepts (i.e. you need a very large difference in mental distance for differences between the concepts to be obvious).
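If you don't have the graph handy, you can get the same intuition by tabulating the similarity function for a few values of c. This is only a sketch: the particular values of c and the distances are arbitrary choices of mine, not values from the article.

```python
import math

def similarity(distance, c):
    """Exponential decay of similarity with distance: e^(-c * d)."""
    return math.exp(-c * distance)

distances = (0.0, 0.5, 1.0, 2.0)

# Each row shows how quickly similarity falls off for one value of c.
for c in (0.5, 1.0, 4.0):
    row = [round(similarity(d, c), 3) for d in distances]
    print(f"c = {c}: {row}")

# The two extreme cases worked through in the text.
print(similarity(0.0, 2.0))       # identical faces
print(similarity(math.inf, 2.0))  # infinitely distant faces
```

Every row starts at 1 for a distance of zero, and the rows for larger c drop toward 0 much faster, which is the "large c means you must be very close to count as similar" pattern described above.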
 
Familiarity

At test, the system produces a global feeling of familiarity for the face.  It does this by determining how similar the test face is to every face in memory and then combining those similarity values by summing them all together (much like Minerva2). So global similarity is given by...

    $\text{Familiarity}_k = \sum_i \text{Similarity}_{k,i}$

Decision Making Function

According to GCM recognition decisions are based on the global familiarity of the face.  So this means there needs to be a function that translates global familiarity into a probability of saying "old".  Note that this probability needs to be between 0 and 1 and so again, a function has to be chosen that has those properties. The equation they use is

    $P(\text{"Old"} \mid k \text{ presented}) = \frac{1}{1 + b\,e^{-q \cdot \text{familiarity}}}$

Remember that the negative power of e means that e is basically in the denominator. So as familiarity increases (assuming q is positive) the expression $b\,e^{-q \cdot \text{familiarity}}$ approaches zero. And as that expression approaches zero, P("Old"|k) approaches 1.

That makes sense. The higher the global familiarity the greater the likelihood that the subject will say "old". If familiarity is infinite then the probability of saying "old" should be 100%.
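Putting the pieces together, the chain from distance to similarity to familiarity to P("Old") can be sketched as follows. The stored face coordinates and the parameter values (c, b, q) are hypothetical, and for brevity the sketch uses unweighted Euclidean distance via `math.dist` (Python 3.8+).

```python
import math

def similarity(distance, c):
    """Similarity decays exponentially with distance."""
    return math.exp(-c * distance)

def familiarity(test_face, memory, c):
    """Global familiarity: summed similarity of the test face to every stored face."""
    return sum(similarity(math.dist(test_face, stored), c) for stored in memory)

def p_old(fam, b, q):
    """Probability of responding "old" given global familiarity."""
    return 1.0 / (1.0 + b * math.exp(-q * fam))

# Hypothetical study list in a two-dimensional mental space.
memory = [(1.0, 1.0), (2.0, 1.0), (5.0, 5.0)]

near_face = (1.5, 1.0)  # sits between two studied faces
far_face = (9.0, 9.0)   # far from everything studied

for label, face in (("near", near_face), ("far", far_face)):
    fam = familiarity(face, memory, c=1.0)
    print(label, round(fam, 3), round(p_old(fam, b=5.0, q=3.0), 3))
```

Note that `near_face` was never studied, yet it gets a high familiarity simply because it lies close to two stored faces. That is the global familiarity account of false recognition in miniature.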

Now consider b. b provides information about the response bias. To see that try setting familiarity to zero in the equation. When familiarity is zero the only reason to say "old" is because you are responding based on a response bias (i.e. you're just guessing). So imagine that familiarity is zero.

So now the equation becomes

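    $P(\text{"Old"} \mid k \text{ presented}) = \frac{1}{1 + b\,e^{-q \cdot \text{familiarity}}}$

    $P(\text{"Old"} \mid k \text{ presented}) = \frac{1}{1 + b\,e^{-q \cdot 0}}$

    $P(\text{"Old"} \mid k \text{ presented}) = \frac{1}{1 + b \cdot 1}$

    $P(\text{"Old"} \mid k) = \frac{1}{1 + b}$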

Again, to see what effect b has, try imagining extreme cases. What if b = 0? If b = 0 and familiarity = 0 then the equation becomes

    $P(\text{"Old"} \mid k) = \frac{1}{1 + 0}$

    $P(\text{"Old"} \mid k) = 1.0$

So when b is small people have a lenient response bias. They will say "old" even when there is very little familiarity.

Now imagine that b is very large, say $b = \infty$, and familiarity = 0.  Now the equation becomes

    $P(\text{"Old"} \mid k) = \frac{1}{1 + \infty}$

    $P(\text{"Old"} \mid k) = \frac{1}{\infty}$

    $P(\text{"Old"} \mid k) = 0$

So a large b means that when familiarity is very low people will hardly ever say "old" (at the extreme they never say "old" with no familiarity).
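The extreme cases for b can also be checked numerically. This sketch holds familiarity at zero and sweeps b; the intermediate values of b are arbitrary choices of mine for illustration.

```python
import math

def p_old(fam, b, q):
    """Probability of responding "old" given global familiarity."""
    return 1.0 / (1.0 + b * math.exp(-q * fam))

# With familiarity fixed at zero, the formula reduces to 1 / (1 + b),
# so the response is driven by bias alone.
for b in (0.0, 0.5, 1.0, 9.0, math.inf):
    print(f"b = {b}: P(old) = {p_old(0.0, b, q=1.0):.3f}")
```

The printed probabilities fall from 1 (b = 0, always say "old") to 0 (b infinite, never say "old"), with b = 1 giving an unbiased 50/50 guess.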

Experiments and Model Fits

Initial Model Fit

The first thing Busey and Tunnicliff do after explaining all of this stuff is attempt to use the GCM to fit previous face recognition studies. GCM does a good job of producing the qualitative pattern

        Target > Conjunction > Feature > Novel

In addition, it can simulate the effects of right hemisphere lesions that had been found by Kroll et al. (1996). They had found that false recognition was greater in patients with right hemisphere lesions. To simulate this, B&T set c lower for the lesion patients than for the controls. Remember that psychologically c corresponds to how discriminable the faces are: lowering c reduces the effect that distance has on similarity, so faces become more confusable.

Experiments

All three experiments made use of faces that were created by morphing two parent faces (two faces that had actually been shown to subjects).

The faces that were morphed were either similar to each other or not similar.

The faces that were morphed were either presented sequentially or with many intervening faces. This was based on the possibility that if a memory binding or blending process occurs (rather than merely the familiarity mechanism of GCM) it might be more likely to occur for faces that are presented close together.

Experiment 1:

After presentation participants took an old/new recognition test. The test included targets (that had not been morphed), parents, morphs and novel distractors.

The basic finding was that people were somewhat more likely to choose the morph that had been created from two parents that were similar to each other than they were to choose the actual parents that had been presented.

People were more likely to pick the parent than the morph in the cases where the morph had been created from two parents that were not similar to each other.
 
 
                              Similar Parents   Dissimilar Parents
False Recognition of Morphs        0.618              0.468
True Recognition of Parents        0.570              0.661

The authors argue that it is difficult to account for the fact that false recognition of morphs from similar parents is greater than true recognition of the parents with a pure familiarity approach.

Experiment 2:

Experiment 2 used a forced-choice procedure instead, in which participants had to choose between a parent and a morph of that parent.

When the morph had been created from two parents that were similar to each other, participants were somewhat more likely to choose the morph than the parent.

When the morph had been created from two parents that were not similar to each other, participants were more likely to choose the parent than the morph.
 
 
                                                                Similar Parents   Dissimilar Parents
Probability of correctly choosing the parent in forced choice        0.461              0.655
 

Experiment 3:

In Experiment 3 (a&b) they were apparently worried about showing parents and morphs at test to the same subjects because they might somehow interact.  So some subjects were tested with targets, morphs and novel faces while other subjects were tested with targets, parents and novel faces.

Results were consistent with previous analyses...
 
                              Similar Parents   Dissimilar Parents
False Recognition of Morphs        0.700              0.474
True Recognition of Parents        0.566              0.689

MORE MONSTROUS MODELLING MAYHEM!

Fits of GCM

To test GCM they fit the model to the data.  Remember from previous articles, what this means is that they found the values of the different parameters that resulted in the lowest possible error.
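To show what "finding the parameter values with the lowest error" means in practice, here is a toy sketch. It is not the authors' fitting procedure: a brute-force grid search stands in for whatever optimizer they used, only b and q are fit, and the "observed" data are generated from known parameters (b = 4.0, q = 1.5) so that you can check the search recovers them.

```python
import math

def p_old(fam, b, q):
    """Probability of responding "old" given global familiarity."""
    return 1.0 / (1.0 + b * math.exp(-q * fam))

def sum_squared_error(data, b, q):
    """Squared difference between predicted and observed P(old), summed over items."""
    return sum((p_old(fam, b, q) - observed) ** 2 for fam, observed in data)

def grid_search(data, b_values, q_values):
    """Try every (b, q) pair on the grid and keep the one with the lowest error."""
    best = None
    for b in b_values:
        for q in q_values:
            err = sum_squared_error(data, b, q)
            if best is None or err < best[0]:
                best = (err, b, q)
    return best

# Synthetic (familiarity, P(old)) pairs generated from known parameters.
true_b, true_q = 4.0, 1.5
toy_data = [(fam, p_old(fam, true_b, true_q)) for fam in (0.2, 0.8, 1.5, 2.5)]

b_grid = [0.5 * i for i in range(1, 21)]   # 0.5, 1.0, ..., 10.0
q_grid = [0.25 * i for i in range(1, 21)]  # 0.25, 0.5, ..., 5.0

err, b_hat, q_hat = grid_search(toy_data, b_grid, q_grid)
print(err, b_hat, q_hat)
```

With real data the minimum error will not be zero, and how far above zero it sits is what tells you how well the model fits.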

GCM handled some parts of the data pretty well and other parts not so well.

Fits of Identification

They next tried an alternative version of Nosofsky's model called Identification.  In identification the probability that an item will be stored in memory is a function of its distinctiveness.

Recall that in GCM the probability of saying "old" to an item is given by the equation:

    $P(\text{"Old"} \mid k \text{ presented}) = \frac{1}{1 + b\,e^{-q \cdot \text{familiarity}}}$

In Identification, familiarity is replaced in the equation above by the following proportion (called the similarity ratio):

    $\frac{\max_j(\text{similarity}_{k,j})}{\sum_j \text{similarity}_{k,j}}$

Think about what this does.  When a test item is very distinctive, it has high similarity to itself (so the numerator is large) but low similarity to everything else in memory (so the denominator is small).  Typical items will be similar to themselves but they will also be similar to many other things in memory.  So the numerator will be large but so will the denominator.  So the similarity ratio will be large for distinctive items and small for typical items.
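A quick numerical sketch of the similarity ratio shows the distinctive/typical contrast. The similarity values below are invented for illustration; in each list the first entry is the test item's similarity to its own stored trace.

```python
def similarity_ratio(similarities):
    """Largest similarity divided by the sum of all similarities."""
    total = sum(similarities)
    if total == 0:
        raise ValueError("at least one similarity must be greater than zero")
    return max(similarities) / total

# Hypothetical similarities of a test face to every face in memory.
distinctive = [1.0, 0.05, 0.02, 0.03]  # resembles almost nothing else
typical = [1.0, 0.6, 0.5, 0.7]         # resembles many other faces

print(round(similarity_ratio(distinctive), 3))
print(round(similarity_ratio(typical), 3))
```

Both faces have the same numerator, but the typical face's crowded neighborhood inflates the denominator and drags its ratio down.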

Again, this model fits some aspects of the data but not other aspects of the data.

SimSample Model

To try to account for these aspects of the data the authors propose a new variation of Nosofsky's model which uses a recall mechanism rather than a global familiarity mechanism.

According to the model when a face is presented you sample one memory trace from long term memory.  The probability of sampling any particular face is given by the equation:

            $P(\text{sample } k \mid i \text{ presented}) = \frac{\text{similarity}_{i,k}}{\sum_j \text{similarity}_{i,j}}$

A rule of this general form has been used in a number of other models including SAM (Gillund and Shiffrin, 1984) and the Construction/Integration Model (Kintsch, 1988).  Think about why it makes sense.  It's basically saying that the probability that you will retrieve an item from memory given a particular cue is proportional to how similar the cue is to the item.

But retrieving an item from memory isn't enough.  You have to somehow decide if the item you've retrieved from memory is the face that you're being presented with at test.  To do this you compare the similarity to a criterion.  If it's above the criterion you say it's "old"; if it's below the criterion you say it's "new".

To formalize this idea they create a function, $q(\text{similarity}_{i,k})$, which is 1 when the similarity is above the criterion and 0 when it is below the criterion.

The probability that a subject will say an item is "old" is then given by...

    $P(\text{"old"} \mid i \text{ presented}) = \sum_k P(\text{sample } k \mid i \text{ presented}) \cdot q(\text{similarity}_{i,k})$

Basically what this gives you is an expected value for saying that the item is old.  As a further modification they suggest making the $q(\text{similarity}_{i,k})$ function continuous and normally distributed.
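Here is a minimal sketch of the sample-then-decide idea, using the simple all-or-none version of the decision function (the continuous version is not sketched here). The similarity values and the criterion are hypothetical, and the function names are mine.

```python
def sampling_probabilities(similarities):
    """Chance of sampling each trace, proportional to its similarity to the test face."""
    total = sum(similarities)
    return [s / total for s in similarities]

def above_criterion(similarity_value, criterion):
    """All-or-none decision function: 1 if the sampled trace is similar enough, else 0."""
    return 1.0 if similarity_value > criterion else 0.0

def p_old_simsample(similarities, criterion):
    """Expected probability of saying "old", averaged over which trace gets sampled."""
    probs = sampling_probabilities(similarities)
    return sum(p * above_criterion(s, criterion) for p, s in zip(probs, similarities))

# Hypothetical similarities of the test face to three stored traces.
sims = [0.9, 0.3, 0.1]

print(sampling_probabilities(sims))
print(p_old_simsample(sims, criterion=0.5))
```

Only the first trace clears the criterion here, so P("old") is just the probability of sampling that trace, 0.9 / 1.3.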

The new SimSample model provides a good fit with the data. But again qualitatively there are problems.  It handles distinctiveness well but fails to make recognition of similar morphs greater than recognition of similar parents.

SimSample With Proportional Prototypes

In their final attempt to account for the data they assume that sometimes people do develop a weak prototype representation.  This prototype is simply the morph itself.  The weight given to the prototype in memory is given by the variable pw (prototype weight) and pw is a function of the similarity of the two parents to the morph

                pwi = (similarity i, p1 + similarity i, p2)p

where p is a free parameter of the model.  Notice that this makes pw large when the two parents are both similar to the prototype.

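The probability of sampling a face now becomes...

    $P(\text{sample } k \mid i) = \frac{\text{similarity}_{i,k}}{\sum \text{similarity}_{\text{faces}} + \sum pw \cdot \text{similarity}_{\text{prototypes}}}$

And the probability of sampling a prototype becomes...

    $P(\text{sample } k \mid i) = \frac{pw \cdot \text{similarity}_{i,k}}{\sum \text{similarity}_{\text{faces}} + \sum pw \cdot \text{similarity}_{\text{prototypes}}}$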

This new model results in a better fit of the data.  There is less error and the model correctly predicts that there will be greater recognition of similar morphs than similar parents.
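The two sampling formulas can be sketched together as follows. The prototype weights are simply passed in as numbers rather than computed from the pw formula, and all of the similarity and weight values are hypothetical.

```python
def sampling_probabilities_with_prototypes(face_sims, proto_sims, proto_weights):
    """Sampling probabilities when weighted prototypes compete with stored faces.

    face_sims:     similarity of the test face to each stored face
    proto_sims:    similarity of the test face to each prototype
    proto_weights: prototype weight (pw) for each prototype
    """
    weighted_protos = [w * s for w, s in zip(proto_weights, proto_sims)]
    # Faces and weighted prototypes share one denominator, so all
    # probabilities together sum to 1.
    denominator = sum(face_sims) + sum(weighted_protos)
    face_probs = [s / denominator for s in face_sims]
    proto_probs = [ws / denominator for ws in weighted_protos]
    return face_probs, proto_probs

# Hypothetical test item: a morph that is fairly similar to both parents,
# weakly similar to an unrelated face, and identical to its own prototype.
face_probs, proto_probs = sampling_probabilities_with_prototypes(
    face_sims=[0.6, 0.6, 0.1],
    proto_sims=[1.0],
    proto_weights=[0.5],
)
print(face_probs, proto_probs)
```

Because the morph matches its own prototype perfectly, even a modest pw gives the prototype a real chance of being sampled, which is how the model can push recognition of similar morphs above recognition of the parents.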

Discussion and Summary

YIKES!  This article sure had a lot of stuff in it and it probably could have been done in multiple reading groups.

It's important to understand formal models like this but as a reader you don't want to miss the conceptual forest for the mathematical trees (and no this is not an allusion to multinomial models!).  Busey and Tunnicliff do a good job throughout the paper of keeping their eyes on the prize, namely the psychological constructs that underlie the formalizations.

Below I summarize the main aspects of the different models and talk about how they do at accounting for the data.

Model: Generalized Context Model
Primary Mechanism: Makes old/new decisions by assessing the global familiarity of the test item
Strengths: Accounts for a lot of past data with a fairly simple mechanism. Can account for higher hit rate for distinctive faces
Weaknesses: Similar morphs not chosen more than similar parents

Model: Identification
Primary Mechanism: Instead of making old/new decisions with simple global familiarity it uses a similarity ratio in place of familiarity
Strengths: Accounts for the distinctiveness pattern
Weaknesses: Still can't account for similar morphs being chosen more than similar parents

Model: SimSample
Primary Mechanism: Replaces both familiarity and similarity ratio with a memory retrieval mechanism
Strengths: Accounts for distinctiveness pattern
Weaknesses: Still can't account for similar morphs being chosen more than similar parents

Model: SimSample plus Proportional Prototypes
Primary Mechanism: Assumes that a weak prototype is sometimes formed and this is more likely for similar parents than for dissimilar parents
Strengths: Accounts for distinctiveness pattern and high false alarm rate for similar morphs
Weaknesses: Still not all of the variance accounted for but not too shabby...
 
Recall that what the authors were trying to get at was whether the face recognition data could be interpreted simply by a measure of global similarity (as in the GCM account) or whether one needs to assume some sort of blending mechanism too. They conclude after trying a number of different purely exemplar approaches that blended representations (i.e. prototypes) really are needed to account for all of the data both quantitatively and qualitatively.



 
University of Arkansas
Department of Psychology
Lampinen Lab
False Memory Reading Group
False Memory Reading Group Summer 2000