Chapter 9 · The Face as a Reader · Activity 9.3 · Master

Personality from Expression

For advanced students, and deliberately framed as reflection: you retrace the study that predicts even moral stances from many involuntary reactions, and you think about the limits of the most delicate reader of them all.

Duration: double lesson + discussion · Difficulty: high · Group project / whole class · Reflection M

In a nutshell

What: This Master activity doesn't merely build a program but retraces a real piece of research — "Your Face Mirrors Your Deepest Beliefs" — in which a camera predicted personality and even moral stances from the involuntary facial reactions to a series of videos. You build the measuring chain yourselves and then discuss what is disquieting about it.

Why "Master": the technical hurdle is higher (many measurements, a second learning step), and the real yield is not the code but the judgement: where is such a tool useful, where does it become dangerous?

What it's about

The face reveals not only how you feel right now but — across many reactions — who you are. In one of our studies we showed 85 people fifteen short videos of very different kinds: funny, moving, disgusting, provoking. While they watched, a camera read their involuntary facial reactions once a second — not a word was spoken, only the face was at work.

From the pattern of these reactions a model predicted personality traits and even moral stances, the kind of thing people otherwise report about themselves in long questionnaires, with an accuracy of up to 86 per cent. The striking part: no single video is enough. Only the mixture gives you away: how you react to the moving and, at the same time, to the provoking. And often this measurement is more honest than the questionnaire, because in self-judgement we like to flatter ourselves; the face does not. This is the honest signature in its purest form, and exactly for that reason it is also disquieting.

Before you begin: consent

This activity reads out of a person something they never explicitly agreed to reveal: their character. That makes it the most delicate experiment in the whole book. Rules that are non-negotiable: participation is strictly voluntary; each person measures only themselves and sees only their own results; no videos are saved, only rows of numbers; and no one is judged, compared or ranked by their values. That is the golden rule of this book: what you find out about a person belongs to them, not to you.

The design of the study

The measuring chain has four links, and you already know each in principle:

Stimulus. A fixed sequence of short, very different clips (funny, moving, disgusting, provoking, neutral). What matters is the variety — only it makes the mixture telling. For a lesson, 6–8 clips of 20–40 seconds each are enough.
Measurement. While watching, the emotion recorder from 9.2 runs once a second and writes one table per person: for each clip the averaged emotional reactions. That is your signature — a row of numbers, not an image.
Self-report. Each person voluntarily fills in a short, recognised questionnaire (e.g. the ten questions of the TIPI on the Big Five personality dimensions). That is the "truth" the model is later checked against — a self-reported one, mind you, not an absolute one.
Learning & checking. A simple model tries to predict the questionnaire values from the reaction signature. Because a school class is small, you check honestly by cross-validation — and always compute against a random baseline.
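The measurement link, turning the once-a-second readings into one signature row per person, can be sketched as follows. This is only a minimal sketch: it assumes the recorder from 9.2 writes one row per person, clip and second, and the column names (`person`, `clip`, `second`, `joy`, `anger`), the helper `build_signature` and all numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical recorder log: one row per person, clip and second.
# Column names and values are made up; adapt them to your own recorder output.
log = pd.DataFrame({
    "person": ["p1"] * 4 + ["p2"] * 4,
    "clip":   ["funny", "funny", "provoking", "provoking"] * 2,
    "second": [0, 1, 0, 1] * 2,
    "joy":    [0.8, 0.6, 0.1, 0.3, 0.4, 0.2, 0.0, 0.2],
    "anger":  [0.0, 0.2, 0.5, 0.7, 0.1, 0.1, 0.2, 0.4],
})


def build_signature(log: pd.DataFrame, emotions: list[str]) -> pd.DataFrame:
    """Average each emotion per person and clip, then string the clip
    averages together so that every person becomes a single row."""
    per_clip = log.groupby(["person", "clip"])[emotions].mean()
    wide = per_clip.unstack("clip")  # columns become (emotion, clip) pairs
    wide.columns = [f"{clip}_{emotion}" for emotion, clip in wide.columns]
    return wide.sort_index(axis=1)


signatures = build_signature(log, emotions=["joy", "anger"])
print(signatures.round(2))
```

Each row of `signatures` is one person and each column one clip-emotion pair, which is the shape the learning step expects for `X`. Nothing but these averages needs to be stored, so the rule "no videos, only rows of numbers" holds.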

The second learning step

The only new element compared with 9.1/9.2 is the last link: the signature becomes a prediction. It is again a light, transparent method (no deep network), and you choose the features yourself. The full code with sample data is on GitHub; here is the skeleton:

import pandas as pd
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict, KFold

# X: per person a reaction signature (emotion per clip, strung together)
# y: one personality value from the questionnaire (e.g. openness)
X = pd.read_csv("signaturen.csv")          # row = person, columns = reactions
y = pd.read_csv("fragebogen.csv")["offenheit"]   # 'offenheit' = openness

model = Ridge(alpha=1.0)             # deliberately simple and robust
kf = KFold(n_splits=5, shuffle=True, random_state=0)
prediction = cross_val_predict(model, X, y, cv=kf)

# honest evaluation: correlation predicted vs. self-reported
r = np.corrcoef(prediction, y)[0, 1]
print(f"Correlation prediction <-> questionnaire:  r = {r:.2f}")

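# baseline: what does pure guessing achieve (shuffled y)?
# Each round shuffles the questionnaire values, refits the model and scores
# its predictions against the same shuffled values it was trained on.
rng = np.random.default_rng(0)
baseline = []
for _ in range(200):
    y_shuffled = rng.permutation(y.to_numpy())
    yp = cross_val_predict(model, X, y_shuffled, cv=kf)
    baseline.append(np.corrcoef(yp, y_shuffled)[0, 1])
print(f"Random baseline (mean):               r = {np.mean(baseline):.2f}")
print(f"Random baseline (95th percentile):    r = {np.percentile(baseline, 95):.2f}")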

Why the baseline is sacred here

With few people and many features, a model almost always finds some pattern — even in pure noise. Only the comparison with the random baseline (questionnaire values shuffled) shows whether your result is real. If your correlation is not clearly above the baseline, you have found nothing — and saying that cleanly is the real achievement. Measure first, marvel second, stay sceptical throughout.
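You can watch this happen on data that contains nothing at all. The sketch below is an illustration, not part of the study: it invents 20 "people" with 40 purely random "features" and an equally random "questionnaire value", then compares the correlation on the training data with the cross-validated one. The sizes and variable names are assumptions chosen for the demonstration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)
n_people, n_features = 20, 40  # few people, many features (made-up sizes)

# Pure noise: by construction there is no link between X_noise and y_noise.
X_noise = rng.normal(size=(n_people, n_features))
y_noise = rng.normal(size=n_people)

model = Ridge(alpha=1.0)

# In-sample: the model is scored on the very people it was fitted to.
fitted = model.fit(X_noise, y_noise).predict(X_noise)
r_in_sample = np.corrcoef(fitted, y_noise)[0, 1]

# Cross-validated: every person is predicted by a model that never saw them.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
held_out = cross_val_predict(model, X_noise, y_noise, cv=kf)
r_cross_val = np.corrcoef(held_out, y_noise)[0, 1]

print(f"In-sample correlation on noise:       r = {r_in_sample:.2f}")
print(f"Cross-validated correlation on noise: r = {r_cross_val:.2f}")
```

The in-sample value looks impressive although nothing is there to find; the cross-validated value does not. That gap is the reason to report only held-out results and to set them against the shuffled baseline.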

What you should see

With a real school class (small!) your result will lie well below the 86 per cent accuracy reported for the original study; your correlation will perhaps be weakly positive, perhaps at chance. That is the honest and instructive outcome: the effect is real, but it needs many people and many stimuli to show itself. Whoever computes "character" from fifteen classmates and believes the result has fallen into exactly the trap this chapter warns against.

Worksheet & discussion

The most honest — and most delicate — reader

1. Why is no single video enough to predict personality? What makes precisely the mixture of reactions telling?
2. The study often calls the facial measurement more honest than the questionnaire. Give one reason for that, and one reason why the questionnaire, too, is no perfect measure of the "truth".
3. Was your correlation above the random baseline? What would you need to get a robust result, and why is a school class too small for it?
4. An HR manager wants to use this method in job interviews. Give three concrete reasons why that would be wrong (technically and ethically).
5. Put in one sentence the "golden rule" that turns a suspicion of surveillance into a tool of self-knowledge, and explain why it matters especially here.

Solution

1. A single video measures only a reaction that many people share (almost everyone laughs at something funny). Only the profile across many very different stimuli — how strongly you react to the moving versus the provoking — forms an individual pattern that distinguishes you from others.

2. More honest, because the facial reaction is involuntary and can hardly be dressed up, whereas in the questionnaire you paint a desirable self-image. But the questionnaire is itself no absolute truth — it is a self-report with its own biases; so the model learns to predict an imperfect reference.

3. Answers will vary; your correlation is probably near the baseline. You would need many people (dozens to hundreds) and many well-chosen stimuli, plus a strict separation of training and test subjects. With few people and many features, every model overrates itself because it memorises chance patterns.

4. Technically: the model is unreliable on small, biased data and demonstrably less accurate for some groups of people (training-data bias). Ethically: no one consented to being judged on their character from involuntary reactions; the method measures expressions, not suitability; and it shifts power secretly to the observer. An expression is not a fate.

5. "Aggregated results for management, personal results only for the person themselves." It matters especially here because the face can reveal not just mood but personality and moral stance — a judgement no one should be exposed to unasked.

Food for thought

This is the honest signature in its purest form: a signal you can hardly fake, coupled to a person's innermost self. Exactly what makes the technology powerful makes it dangerous. Powerful tools demand strict rules — not because you may never use them, but so that they serve people rather than rule over them.
An expression is not a fate. Even if a model guesses a stance from your face, you are more than your involuntary reactions — and you have the right that no one pins you down on something you never agreed to.
The same technology, ethically guided, is a gift: in animals, who can tell us nothing, it can make pain or fear readable for the first time (Part III). Only consent — and the rules you give yourself — separates a tool from an abuse.

Extension

Which stimulus carries the most weight? Check which clips improve the prediction the most. Often it is the provoking ones, because that is where people differ the most.
Test for fairness. If the group is diverse enough (and everyone agrees), check separately whether the model works equally well for all. Inequality here is not a computational error but the real bias problem in miniature.
Connection to Chapter 5. The same idea — predicting personality from non-verbal signals — sits in Pentland's sociometers and in the Happimeter. Compare: face, voice, pulse — which signal reveals how much, and at what risk?
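One possible way to approach the first extension is to leave out one clip at a time and see how much the cross-validated correlation drops. The sketch below is an assumption-laden illustration, not the study's method: the helpers `cv_correlation` and `clip_ablation` are hypothetical, the columns are assumed to be named `<clip>_<emotion>`, and the data are synthetic, built so that only the provoking clip carries any signal.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict


def cv_correlation(X: pd.DataFrame, y: pd.Series, seed: int = 0) -> float:
    """Cross-validated correlation between prediction and questionnaire value."""
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    pred = cross_val_predict(Ridge(alpha=1.0), X, y, cv=kf)
    return float(np.corrcoef(pred, y)[0, 1])


def clip_ablation(X: pd.DataFrame, y: pd.Series, clips: list[str]) -> pd.Series:
    """For each clip, report how far the correlation falls without its columns."""
    full = cv_correlation(X, y)
    drops = {}
    for clip in clips:
        keep = [col for col in X.columns if not col.startswith(f"{clip}_")]
        drops[clip] = full - cv_correlation(X[keep], y)
    return pd.Series(drops).sort_values(ascending=False)


# Synthetic demonstration data (invented, not results from any study):
# only the reaction to the provoking clip is linked to the target value.
rng = np.random.default_rng(1)
clips = ["funny", "moving", "provoking"]
emotions = ["joy", "anger"]
columns = [f"{clip}_{emotion}" for clip in clips for emotion in emotions]
X_demo = pd.DataFrame(rng.normal(size=(60, len(columns))), columns=columns)
y_demo = 2.0 * X_demo["provoking_anger"] + 0.3 * rng.normal(size=60)

ablation = clip_ablation(X_demo, y_demo, clips)
print(ablation.round(2))
```

The clip at the top of the list is the one the prediction can least do without. On a real class the differences will be small and noisy, so read them with the same scepticism as the main result.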