Chapter 9 · The Face as a Reader · Activity 9.1

The Expression Mirror

Your webcam becomes a reader: a program reads, frame by frame, which feelings it suspects in your face, and you experience how pixels turn into probabilities.

Duration: 60 min · Difficulty: easy · Group: alone or in pairs · Fully digital · S

In a nutshell

What: A small Python program opens your camera, finds your face and estimates, frame by frame, how much happiness, surprise, anger or sadness is in it. You see the most likely emotion live above your face; the Extension adds the percentages.

Why: You grasp first-hand the heart of Part II — the same image recognition that later reads horse and plant is working here on you. And you feel the unease that comes with it: a machine reads a signal out of you that you did not send.

You need: a computer with a webcam and Python. No model training — the heavy lifting sits in a ready-made, open library.

What it's about

Even an infant reads faces before it understands a word. The face is the body's most honest page: it often moves faster than we can steer it, and reveals in fractions of a second what we might have wanted to hide. Exactly this readability makes it the most obvious target when a machine is to learn to read honest signals.

The machine's path is the same as in the AI toolbox: a camera delivers an image, a model first looks for fixed points — corners of the mouth, eyebrows, eyelids — and computes probabilities for emotional expressions from their positions relative to each other. It was trained on hundreds of thousands of hand-labelled face photos: again a work of open datasets and communities. On this book's map of AI, the expression mirror sits in the middle — a snapshot model that decides anew for every single image, with no memory. That matters for the next activity (9.2): only your analysis, not the model, turns it into a course over time.

Figure (experimental setup: a camera reads facial expressions and maps them to emotions). **The same eye, three kingdoms.** The setup with which faces were read in the Perceptiface research. The same image recognition later reads a dog's posture and a plant's spectrogram; only the labels at the end differ.

A little background

From pixels to percentages. The model gives you not a truth but a distribution: "72 % happiness, 15 % surprise, …". These numbers are the model's confidence, not a measurement of your inner life. A raised corner of the mouth is a smile — not necessarily happiness. People smile out of politeness, embarrassment, even pain.
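How a network's raw scores become such a distribution can be sketched in a few lines. This is a minimal illustration of the softmax function, which is presumably what sits behind the `logits=False` switch used in the program; the three scores and their labels are invented for the example and are not real model output.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that add up to 1."""
    highest = max(logits)                      # subtract the maximum for numerical stability
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Invented raw scores for three expressions (illustration only)
labels = ["Happiness", "Surprise", "Neutral"]
scores = [2.0, 1.0, 0.1]

for label, p in zip(labels, softmax(scores)):
    print(f"{label}: {p * 100:.0f} %")
```

The highest score always receives the largest share, but the shares only say how the model weighs its own options against each other. They say nothing about whether any of those options matches what you feel.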

HSEmotion. We use an open, lightweight library for facial emotions (hsemotion) that runs on a small, pre-trained network — fast enough for live images, with no graphics card at all. The face search is handled by OpenCV. Both were built and given away by an open community; twenty years ago the same demo would have needed a research lab.

Only your own face

This activity points the camera at yourself — it is a toy of self-knowledge. Do not point it at others unasked. The same camera, secretly aimed at another person, is surveillance; the difference lies solely in consent. Record nothing, save no images, and if you work in pairs, each of you films only yourself.

Setting up

1. Install the Python packages. In a console: `pip install hsemotion opencv-python`. On first run HSEmotion downloads its small model automatically.
2. Allow the camera. The operating system asks for camera permission on the first run; confirm it. Close other programs that occupy the camera (video chat!).
3. Start the program. Save the script (below) and run it. A window with your camera image opens, with a frame around your face and the recognised emotion.
4. Quit. Press the `q` key in the window.
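Before starting the program, it can help to check whether Python actually sees both packages. The following is a small optional sketch, not part of the activity's script; the helper name `missing_modules` is made up for this example. Note that the package `opencv-python` is imported under the name `cv2`.

```python
import importlib.util

def missing_modules(names):
    """Return those module names that this Python installation cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# opencv-python is imported as "cv2"; hsemotion keeps its own name
missing = missing_modules(["cv2", "hsemotion"])
if missing:
    print("Missing:", ", ".join(missing))
    print("Try: python -m pip install hsemotion opencv-python")
else:
    print("Both packages found.")
```

If the check reports a package as missing although you installed it, you most likely have several Python versions and installed into a different one.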

The program

About thirty lines — all the heavy work sits in the two libraries. The full, commented code is on GitHub.

import cv2
from hsemotion.facial_emotions import HSEmotionRecognizer

# load the model (this 8-class variant recognises anger, contempt, disgust, fear,
# happiness, neutral, sadness, surprise)
fer = HSEmotionRecognizer(model_name="enet_b0_8_best_afew")

# OpenCV's face finder
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cam = cv2.VideoCapture(0)             # 0 = built-in camera
print("Window active - press q to quit.")

while True:
    ok, frame = cam.read()
    if not ok:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, 1.3, 5)

    for (x, y, w, h) in faces:
        # just the face; OpenCV delivers BGR, the emotion model expects RGB
        face_crop = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2RGB)
        emotion, probs = fer.predict_emotions(face_crop, logits=False)
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 180, 0), 2)
        cv2.putText(frame, emotion, (x, y-10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 180, 0), 2)

    cv2.imshow("Expression Mirror", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cam.release()
cv2.destroyAllWindows()

What you should see

A green frame around your face and, above it, a word that updates with every frame. Smile, and it usually jumps to Happiness. Draw your brows together: often Anger or Neutral. Play with the transitions. Notice how confident the model is on clear expressions and how it starts to waver on fine, mixed faces. That is exactly where it becomes visible that it measures expressions, not feelings.
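The wavering is easier to see in numbers than in a jumping word. The sketch below ranks a probability list and formats the three strongest entries. The helper names `top_k` and `format_line` are made up for this example, the values are invented, and the label order is an assumption: in the program it is safer to read the labels from the recognizer itself (in the hsemotion versions we are aware of, the mapping appears to be available as `fer.idx_to_class`).

```python
def top_k(probs, labels, k=3):
    """Return the k most likely (label, probability) pairs, strongest first."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

def format_line(label, prob):
    """Format one entry as text, e.g. 'Happiness: 72 %'."""
    return f"{label}: {prob * 100:.0f} %"

# Assumed label order and invented probabilities (illustration only)
LABELS = ["Anger", "Contempt", "Disgust", "Fear",
          "Happiness", "Neutral", "Sadness", "Surprise"]
example = [0.02, 0.01, 0.01, 0.03, 0.72, 0.05, 0.01, 0.15]

for label, prob in top_k(example, LABELS):
    print(format_line(label, prob))
```

Inside the program's loop, each formatted line could be drawn with `cv2.putText`, one below the other under the green frame, using the `probs` value that `fer.predict_emotions` already returns.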

Worksheet

From expression to probability

1. Show the mirror six expressions in turn (happiness, surprise, anger, sadness, fear, neutral). On which is the model accurate, on which unsure? Note it down.
2. Put on a polite smile, the kind you give a stranger, without feeling real happiness. What does the model show? What does that tell you about the difference between expression and feeling?
3. Change the light: once from the front, once from the side, once from behind (backlight). How stable does the recognition stay? Why is light so decisive?
4. The model was trained mainly on certain faces. Give two reasons why it might read some people worse than others, and why that is not a fringe problem.
5. The expression mirror is a snapshot model. What exactly does that mean, and how would you turn it into a course over an hour? (That is Activity 9.2.)

Show solution

1. Individual. Typically: happiness is recognised very confidently (a clear, unambiguous expression), while fear and disgust are often uncertain, because they resemble each other in the face and are shown far less often. "Neutral" is a common fallback value.

2. Usually the model shows "Happiness" anyway — it measures the muscle position (raised corners of the mouth), not your experience. That is the key point: a smile is an expression, not proof of happiness. People smile out of politeness, embarrassment or pain.

3. With side- and backlight the recognition often collapses, because the network is trained on evenly lit frontal faces. Light determines which points in the face are even visible — if they are missing, the model guesses.

4. First, in many training datasets lighter faces were over-represented and darker ones under-represented, which is why such systems are markedly less accurate for some people. Second, not all cultures express feelings the same way. This is not a fringe problem: it decides whom a tool serves and whom it harms.

5. "Snapshot" means: the model decides for each image on its own, without remembering the previous one. You get a course by running the model once a second and saving the results with a timestamp — the time then lives in your analysis, not in the model.

When it sticks

Problem	Likely cause & fix
Window stays black / `cam.read()` fails	Camera occupied by another program (video chat, browser). Close everything; if you have several cameras, change the camera index from `0` to `1`.
No face is found	Too little light, or the face too small/tilted in the image. Get closer, even light from the front, look straight into the camera.
`ModuleNotFoundError`	Packages missing. Run `pip install hsemotion opencv-python`; with several Python versions installed, make sure you use the right one (`python -m pip …`).
Runs but stutters badly	Face search on every frame is expensive. Analyse only every third frame, or shrink the camera image before the search.
Emotion jumps around wildly	Normal for a memoryless model. In 9.2 you smooth this by averaging over a few seconds.
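The last two fixes in the table can be sketched without any camera. The code below is one possible approach, not the one used in Activity 9.2: `should_analyse` skips frames, and `ProbabilitySmoother` averages the most recent probability lists. Both names and the window size are assumptions chosen for illustration.

```python
from collections import deque

def should_analyse(frame_number, every=3):
    """True for every third frame (0, 3, 6, ...), False for the ones in between."""
    return frame_number % every == 0

class ProbabilitySmoother:
    """Keeps the last `window` probability lists and returns their average."""

    def __init__(self, window=30):             # about one second at 30 frames per second
        self.history = deque(maxlen=window)     # old entries drop out automatically

    def update(self, probs):
        self.history.append(list(probs))
        count = len(self.history)
        # average each emotion's probability across the stored frames
        return [sum(column) / count for column in zip(*self.history)]

# Invented two-class values, window of two frames (illustration only)
smoother = ProbabilitySmoother(window=2)
print(smoother.update([1.0, 0.0]))
print(smoother.update([0.0, 1.0]))
```

The memory lives in the small helper class, not in the model. The model still decides anew for every image; only your code remembers what came before.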

Food for thought

The expression mirror measures an expression, not a feeling. Whoever makes that translation too quickly overestimates the technology — the most common mistake in the whole field. A smile does not mean "happy", a furrowed brow does not mean "angry".
Pointed at yourself, the camera is a tool of self-knowledge; pointed at others without their knowledge, it is surveillance. The tool is the same — the difference lies solely in consent. Remember: what you find out about others belongs to them, not to you.
The same image recognition that reads your face here reads a dog's posture in Part III and a plant's spectrogram in Part IV. One sense, lent to all the others — that is the secret lead character of this book.

Extension

Show the probabilities. `fer.predict_emotions(..., logits=False)` already returns the full distribution as its second value (`probs` in the program). Draw the three strongest emotions with their percentages into the image. That way you see the model's hesitation, not just its favourite.
Mirror in pairs. Sit facing each other (each filming themselves) and try to show the same expression. Does the model recognise you both equally well? A first, honest look at the question of fairness.
Bridge to 9.2. Attach a timestamp to every prediction and write it to a file — that is already halfway to "Mood over an hour".
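For the bridge to 9.2, a timestamped log could look roughly like this. It is a minimal sketch under assumptions: the helper name `log_prediction` and the column layout are made up, and the demo writes invented values into an in-memory buffer instead of a real file. Only words and numbers are stored, never images, so the rule from "Only your own face" still holds.

```python
import csv
import io
import time

def log_prediction(writer, emotion, probs, timestamp=None):
    """Write one row: time, strongest emotion, then one probability per class."""
    if timestamp is None:
        timestamp = time.time()                 # seconds since 1 January 1970
    row = [f"{timestamp:.1f}", emotion] + [f"{p:.3f}" for p in probs]
    writer.writerow(row)

# Demo with an in-memory "file" and invented values (illustration only)
buffer = io.StringIO()
writer = csv.writer(buffer)
log_prediction(writer, "Happiness", [0.1, 0.9], timestamp=12.0)
print(buffer.getvalue())
```

In the program you would open a real file once before the loop, for example with `open("mood_log.csv", "a", newline="")` (the filename is hypothetical), and call `log_prediction` after each `fer.predict_emotions`.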