Mapping pupil position to screen targets

This post describes the relationship between the different factors I consider in my eye tracking interface setup. I go through the geometric model that I plan to use to constrain my system.
To use eye tracking to interact with a computer, it is necessary to map the location of the pupil in an image (from a head-mounted camera, for example) to the fixation point of gaze on a screen. Most systems require calibration to account for variations in the positions of the cameras, the user and the screen. To calibrate the system you gather a set of training examples in which the user is asked to fixate on a set of points on the screen. A model is then fitted to this data so that gaze towards arbitrary points on the screen can be determined accurately.

Figure 1 - The set up

Figure 2 - the vectors
Figure 1 shows the main elements we consider in the tracking. We label these elements as shown in figure 2:
$O$ is the origin of the system, which is centred on the fixed camera below the screen.
$\mathbf{a}$ is the origin of the screen relative to the camera origin $O$.
$\mathbf{u}$ and $\mathbf{v}$ are the basis vectors for the screen; they correspond to the width and height of a single pixel in the real world.
$\mathbf{t}$ is the location of the target on the screen that the user is looking at.
$\mathbf{h}$ is the centre of the head target. The head target is a plane with four LEDs on it which are tracked by the fixed camera.
$\mathbf{e}$ is the centre of the eye which is being tracked.
$\hat{\mathbf{g}}$ is a unit vector pointing in the direction that the eye is looking.
$\mathbf{d} = \mathbf{t} - \mathbf{e}$ is the vector from the eye to the target.
It is useful to think of the system in two frames of reference. There is the frame of reference of the fixed camera and the frame of reference of the head.
Fixed camera frame of reference
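In the fixed camera’s frame of reference we have three fixed but unknown vectors: $\mathbf{a}$ (the screen origin), $\mathbf{u}$ and $\mathbf{v}$ (the screen basis vectors). Once these are known we can express the target vector $\mathbf{t}_i$ as follows:
$$\mathbf{t}_i = \mathbf{a} + x_i\,\mathbf{u} + y_i\,\mathbf{v}$$
where $(x_i, y_i)$ are the screen pixel coordinates of the target for training example $i$.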
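The target position is the screen origin plus x pixel-widths along one screen axis and y pixel-heights along the other. Here is a minimal Python sketch of that step. The function name `target_position`, the variable names `a`, `u`, `v` (screen origin and the two pixel-sized basis vectors) and all of the numbers are made up for illustration; they are not calibrated values.

```python
def target_position(a, u, v, x, y):
    """Return the 3D point a + x*u + y*v in the fixed-camera frame."""
    return tuple(a_k + x * u_k + y * v_k for a_k, u_k, v_k in zip(a, u, v))


# Hypothetical geometry in metres: the screen origin sits 0.2 m to the left
# of and 0.3 m above the camera, and each pixel is 0.25 mm square.
a = (-0.2, 0.3, 0.0)
u = (0.00025, 0.0, 0.0)    # one pixel to the right
v = (0.0, -0.00025, 0.0)   # one pixel down

t = target_position(a, u, v, 800, 400)
print(t)  # 3D position of pixel (800, 400) relative to the camera
```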
The relationship between the head and the fixed camera frames of reference is described by the following equation:
$$\mathbf{p}_c = R\,\mathbf{p}_h + \mathbf{T}$$
where $\mathbf{p}_h$ is a point in the head frame of reference and $\mathbf{p}_c$ is the same point relative to the fixed camera. $R$ is a rotation matrix and $\mathbf{T}$ the translation vector. To go back the other way you just need to use:
$$\mathbf{p}_h = R^\top (\mathbf{p}_c - \mathbf{T})$$
where $R^\top$ is the transpose of $R$ and hence the inverse rotation.
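To make the two directions of this transform concrete, here is a small Python sketch. It assumes the rotation is stored as a 3x3 tuple of rows `R` and the translation as a 3-tuple `T`; the function names and the example pose are hypothetical. The round trip at the end checks that rotating and translating, then undoing the translation and applying the transpose, returns the original point.

```python
def mat_vec(R, p):
    """Multiply a 3x3 matrix (tuple of rows) by a 3-vector."""
    return tuple(sum(r_k * p_k for r_k, p_k in zip(row, p)) for row in R)


def transpose(R):
    """Transpose a matrix stored as a tuple of rows."""
    return tuple(zip(*R))


def head_to_camera(R, T, p_head):
    """Rotate a head-frame point, then translate it into the camera frame."""
    return tuple(r + t for r, t in zip(mat_vec(R, p_head), T))


def camera_to_head(R, T, p_cam):
    """Undo the translation, then apply the inverse rotation (the transpose)."""
    shifted = tuple(p - t for p, t in zip(p_cam, T))
    return mat_vec(transpose(R), shifted)


# Hypothetical pose: head turned 90 degrees about the vertical (y) axis,
# 0.6 m in front of the camera and 0.1 m above it.
R = ((0.0, 0.0, 1.0),
     (0.0, 1.0, 0.0),
     (-1.0, 0.0, 0.0))
T = (0.0, 0.1, 0.6)

p_head = (0.03, -0.02, 0.05)
p_cam = head_to_camera(R, T, p_head)
round_trip = camera_to_head(R, T, p_cam)
```

Using the transpose rather than a general matrix inverse is only valid because `R` is a pure rotation.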
The head frame of reference
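We assume that the eye and the eye-camera are fixed in the head frame of reference. The position of the eye relative to the head target is fixed in the head coordinate space; we’ll call this translation $\mathbf{e}_h$. Thus, for training example $i$ with head rotation $R_i$ and head translation $\mathbf{T}_i$, the eye position in the fixed camera frame is
$$\mathbf{e}_i = R_i\,\mathbf{e}_h + \mathbf{T}_i$$
We assume that there exists a mapping from the image of the pupil from the eye-camera to the gaze vector $\hat{\mathbf{g}}$ (more on this in a future post):
$$\hat{\mathbf{g}}_{h,i} = f(\mathbf{q}_i; \boldsymbol{\theta})$$
$\mathbf{q}_i$ is the location of the centre of the pupil in the eye camera image for the $i$th training example. $f$ is the mapping from the pupil position in the eye image to the unit vector $\hat{\mathbf{g}}_{h,i}$ relative to the head, which is parameterised by the vector $\boldsymbol{\theta}$. To get $\hat{\mathbf{g}}_i$ in the fixed camera coordinates we need to rotate it (a direction is unaffected by the translation) as follows:
$$\hat{\mathbf{g}}_i = R_i\,f(\mathbf{q}_i; \boldsymbol{\theta})$$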
The intercept of the gaze and the target
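Consider a user looking at a target on a screen. We can consider their gaze as the line defined by the position of their eye $\mathbf{e}$ and the direction in which they are looking, $\hat{\mathbf{g}}$. We consider the screen as a plane defined by its origin $\mathbf{a}$ and two vectors lying in the plane, $\mathbf{u}$ and $\mathbf{v}$.
Thus the line of gaze can be expressed parametrically as:
$$\mathbf{P}(s) = \mathbf{e} + s\,\hat{\mathbf{g}}$$
The plane can be defined by
$$\mathbf{n} \cdot (\mathbf{p} - \mathbf{a}) = 0$$
where $\mathbf{p}$ is a point on the plane and $\mathbf{n} = \mathbf{u} \times \mathbf{v}$ is the normal to the plane. Provided the line of gaze is not parallel to the plane of the screen (that is, $\mathbf{n} \cdot \hat{\mathbf{g}} \neq 0$), we can plug $\mathbf{P}(s)$ into the definition of the plane and solve for $s$:
$$s = \frac{\mathbf{n} \cdot (\mathbf{a} - \mathbf{e})}{\mathbf{n} \cdot \hat{\mathbf{g}}}$$
Finally we have that the target is the point on the line of gaze at this value of $s$:
$$\mathbf{t} = \mathbf{e} + \frac{\mathbf{n} \cdot (\mathbf{a} - \mathbf{e})}{\mathbf{n} \cdot \hat{\mathbf{g}}}\,\hat{\mathbf{g}}$$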
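The line-plane intersection above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a finished implementation: the function name `gaze_screen_intersection`, the argument names (`eye`, `gaze`, `origin` for the screen origin, `u` and `v` for the two in-plane screen vectors) and the example values are all hypothetical. The normal is normalised here so that the parallel test does not depend on how small a pixel is.

```python
import math


def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def gaze_screen_intersection(eye, gaze, origin, u, v, eps=1e-9):
    """Return the point where the line eye + s*gaze meets the screen plane,
    or None if the gaze is (nearly) parallel to the screen."""
    n = cross(u, v)
    n_len = math.sqrt(dot(n, n))
    if n_len == 0.0:
        raise ValueError("u and v must not be parallel")
    n = tuple(c / n_len for c in n)       # unit normal to the screen
    denom = dot(n, gaze)
    if abs(denom) < eps:
        return None                        # no single intersection point
    s = dot(n, sub(origin, eye)) / denom   # parameter along the line of gaze
    return tuple(e + s * g for e, g in zip(eye, gaze))


# Hypothetical example: eye 0.6 m in front of the screen, looking straight at it.
origin = (-0.2, 0.3, 0.0)
u = (0.00025, 0.0, 0.0)
v = (0.0, -0.00025, 0.0)
hit = gaze_screen_intersection((0.0, 0.2, 0.6), (0.0, 0.0, -1.0), origin, u, v)
missed = gaze_screen_intersection((0.0, 0.2, 0.6), (1.0, 0.0, 0.0), origin, u, v)
```

A negative `s` would mean the screen lies behind the eye; this sketch does not reject that case.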
The unknowns
In order to fit this model we need some training data. These data are obtained by having a user look at a series of targets on the screen and storing the following for each presentation $i$:
$\mathbf{q}_i$: the pupil position
$\mathbf{T}_i$: the translation of the head from the camera
$R_i$: the rotation of the head relative to the camera
$(x_i, y_i)$: the screen pixel coordinate of the $i$th target
The remaining unknown constants are:
$\mathbf{e}_h$: the translation of the eye relative to the head
$\mathbf{a}$: the origin of the screen relative to the camera
$\mathbf{u}, \mathbf{v}$: the screen basis vectors (one pixel wide and one pixel high)
$\boldsymbol{\theta}$: the parameter vector for mapping pupil position to the unit eye vector.
I will talk about obtaining these unknowns in a future post.
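One way to keep the recorded quantities apart from the constants still to be estimated is a pair of small record types. The sketch below is only a suggestion: the class names `Sample` and `Unknowns`, the field names, and the example values are hypothetical, and the length of the pupil-mapping parameter vector is left open because the mapping itself is not specified here.

```python
from dataclasses import dataclass
from typing import Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass(frozen=True)
class Sample:
    """Quantities recorded for one target presentation."""
    pupil: Vec2              # pupil centre in the eye-camera image
    head_translation: Vec3   # head position relative to the fixed camera
    head_rotation: Mat3      # head rotation relative to the fixed camera
    target_pixel: Vec2       # screen pixel coordinate of the target


@dataclass
class Unknowns:
    """Constants to be estimated from the training samples."""
    eye_offset: Vec3                      # eye position relative to the head
    screen_origin: Vec3                   # screen origin relative to the camera
    screen_u: Vec3                        # one pixel along the screen width
    screen_v: Vec3                        # one pixel along the screen height
    pupil_map_params: Tuple[float, ...]   # parameters of the pupil-to-gaze mapping


# Made-up example of a single recorded presentation.
sample = Sample(
    pupil=(312.0, 240.5),
    head_translation=(0.0, 0.1, 0.6),
    head_rotation=((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)),
    target_pixel=(800.0, 400.0),
)
```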
Prediction
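Once the unknowns have been determined you are able to predict the screen target position given the pupil position and the head’s position and orientation. With $\mathbf{t}$ the point where the line of gaze meets the screen and $\mathbf{a}$ the screen origin, and taking the screen basis vectors $\mathbf{u}$ and $\mathbf{v}$ to be perpendicular, we can calculate $x$ and $y$ independently as follows:
$$x = \frac{(\mathbf{t} - \mathbf{a}) \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}}, \qquad y = \frac{(\mathbf{t} - \mathbf{a}) \cdot \mathbf{v}}{\mathbf{v} \cdot \mathbf{v}}$$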
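As a rough illustration of this last step, the Python sketch below converts a 3D point on the screen plane back into pixel coordinates by projecting onto each screen axis. It assumes the two screen basis vectors are perpendicular, which is what lets the two coordinates be computed independently; the function name `pixel_coordinates` and the example values are hypothetical.

```python
def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def pixel_coordinates(point, origin, u, v):
    """Project a point on the screen plane onto the pixel axes u and v.

    Assumes u and v are perpendicular, so each coordinate is a simple
    projection and does not depend on the other.
    """
    rel = sub(point, origin)
    x = dot(rel, u) / dot(u, u)
    y = dot(rel, v) / dot(v, v)
    return x, y


# Hypothetical screen: origin 0.2 m left of and 0.3 m above the camera,
# with 0.25 mm square pixels.
origin = (-0.2, 0.3, 0.0)
u = (0.00025, 0.0, 0.0)
v = (0.0, -0.00025, 0.0)

x, y = pixel_coordinates((0.0, 0.2, 0.0), origin, u, v)
print(round(x), round(y))
```

If the fitted basis vectors turn out not to be exactly perpendicular, the two coordinates would instead have to be solved for together as a small linear system.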