Argonauts:Projective transformations

Mon 23 and 30 October 2006

Spatio-temporal coordinates

IEB, Meeting room 3 and 2, 18:00 to 20:00 (exactly).

Attendees (in alphabetical order)

Mon 23 Oct

  • Nick Hughes
  • Niranjan Joshi
  • Rohan Loveland
  • Darren Morofke
  • Ingmar Posner
  • Dominique van de Sompel

Photo of attendees

Mon 30 Oct

  • Darren Gawley
  • Niranjan Joshi
  • Rohan Loveland
  • Ingmar Posner
  • Dominique van de Sompel
  • Brian Williams


Minutes

(Still being written ...) Projective transformations come up naturally when we start to think about how cameras work or how our eyesight works. The latter leads to stereo vision, where instead of one camera we have two (our eyes) or more. And so we set out on a journey to understand the following questions about projective transformations:

  • what are projective transformations?
  • what is the difference between projective and the other types of linear transformations that we usually come across (e.g. affine, similarity, rigid transformations)?
  • how do stereo reconstruction algorithms work?

Projective transformations are, as the name suggests, projections of points in 3D space onto a plane. Note that as soon as we project a point onto the plane, the depth information (how far the point was from the projecting plane) is lost. Now here comes the first trick: in order to simplify the calculations, a 2D point <math>(x,y)</math> is represented as a 3D vector <math>(x_1, x_2, x_3)</math> with <math>x=x_1/x_3</math> and <math>y= x_2/x_3</math>. This is clearly counter-intuitive. Whenever possible we try to convert higher-dimensional quantities into lower-dimensional ones, but here the dimension actually goes up by one! Why is that? For the simple practical reason that we still want to use all those matrix manipulations. Projection requires normalising by the depth, which means dividing by a co-ordinate, and a division cannot be written as a matrix multiplication. With the extra co-ordinate, the projection itself stays a matrix product and the division is postponed to the very end, when we convert back to <math>(x,y)</math>. The resulting 3D system is known as the homogeneous co-ordinate system. Note that multiplying <math>(x_1, x_2, x_3)</math> by any non-zero constant gives the same 2D point.
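To make the trick concrete, here is a minimal sketch in plain Python. The function names and the simple pinhole model with focal length f are my own assumptions for illustration, they were not part of the discussion. The point is only that the division happens once, at the very end.

```python
def to_homogeneous(x, y, w=1.0):
    """Lift a 2D point (x, y) to homogeneous co-ordinates (x1, x2, x3)."""
    return (x * w, y * w, w)


def from_homogeneous(p):
    """Convert (x1, x2, x3) back to the 2D point (x1/x3, x2/x3)."""
    x1, x2, x3 = p
    if x3 == 0:
        raise ValueError("a point with x3 = 0 lies at infinity")
    return (x1 / x3, x2 / x3)


def project_pinhole(X, Y, Z, f=1.0):
    """Assumed pinhole model: the 3D point (X, Y, Z) maps to the
    homogeneous image point (f*X, f*Y, Z). No division is needed here."""
    return (f * X, f * Y, Z)


# Any non-zero scale factor w describes the same 2D point.
same_point = from_homogeneous(to_homogeneous(3.0, 4.0, w=2.0))

# The depth Z only enters through the final division.
image_point = from_homogeneous(project_pinhole(2.0, 4.0, 8.0))
print(same_point, image_point)  # (3.0, 4.0) (0.25, 0.5)
```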

Now, a line <math>l_1 x + l_2 y + l_3 = 0</math> in 2D is represented as the homogeneous 3D vector <math>(l_1, l_2, l_3)</math>. From here onwards I will assume that points and lines are represented in the homogeneous co-ordinate system unless said otherwise. If we take the dot product of the line vector with the vector of any point lying on this line, it will be zero (for the simple reason that the point lies on the line and hence satisfies its equation). If we take the cross product of the vectors of two distinct points, it gives the vector of the line passing through them (up to a scale factor). Note that all these tricks are possible only due to the homogeneous co-ordinate representation.
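A small sketch of these two tricks in plain Python follows. The example line x + y - 2 = 0 and the two points on it are chosen by me purely for illustration.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


line = (1, 1, -2)   # the line x + y - 2 = 0
p = (0, 2, 1)       # the 2D point (0, 2)
q = (2, 0, 1)       # the 2D point (2, 0)

# Both points lie on the line, so both dot products vanish.
print(dot(line, p), dot(line, q))  # 0 0

# The cross product of the two points recovers the line up to scale:
# (2, 2, -4) is the same line as (1, 1, -2).
print(cross(p, q))  # (2, 2, -4)
```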

Linear transformations (those where the relationship between the original and the transformed co-ordinates can be written as a simple matrix multiplication) are widely used in many computer vision tasks. In medical image analysis applications we typically use them in image registration. There are four main types of linear transformations: rigid body, similarity, affine, and projective. The main difference between them is the number of degrees of freedom allowed in each transformation. In 3D space, rigid body transformations allow 6 degrees of freedom (3 translations and 3 rotations). Similarity transformations provide 7 degrees of freedom (in addition to the previous ones, 1 uniform scaling). Affine transformations, on the other hand, provide 12 degrees of freedom, namely 3 rotations, 3 translations, 3 scalings, and 3 skews. The most general of the linear transformations are the projective transformations, with 15 degrees of freedom (a 4x4 homogeneous matrix has 16 entries, but its overall scale does not matter).

This description can also be seen in a more intuitive sense. Suppose we are transforming a structure of connected straight lines. In all linear transformations, straight lines always remain straight. In rigid transformations, the relative angles and distances among the straight lines remain the same. In similarity transformations, the angles remain the same but the distances might change. In affine transformations, parallel lines always remain parallel and, subject to this, the relative angles and distances might change. In projective transformations, lines that are parallel before the transformation may intersect after it (recall that two rails appear to meet at a long distance, even though we know that they always remain parallel!! This is because our eyes make a projective transformation of the actual world).
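The rails example can be checked numerically. The sketch below works in 2D (3x3 matrices) instead of 3D to keep it short, and the two matrices are arbitrary examples of mine, not anything from the meeting. Two parallel rails are mapped by an affine and by a projective matrix, and the intersection of the mapped rails is computed with the cross product trick.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def apply(H, p):
    """Multiply the 3x3 matrix H with the homogeneous point p."""
    return tuple(sum(H[i][j] * p[j] for j in range(3)) for i in range(3))


def rails_intersection(H):
    """Map two parallel rails (y = 0 and y = 1) through H and return the
    homogeneous intersection point of the two mapped lines."""
    rail_a = [(0, 0, 1), (1, 0, 1)]   # two points on y = 0
    rail_b = [(0, 1, 1), (1, 1, 1)]   # two points on y = 1
    line_a = cross(*(apply(H, p) for p in rail_a))
    line_b = cross(*(apply(H, p) for p in rail_b))
    return cross(line_a, line_b)


# Affine example: the last row is (0, 0, 1).
AFFINE = [[2, 1, 3],
          [0, 1, 1],
          [0, 0, 1]]

# Projective example: the last row is no longer (0, 0, 1).
PROJECTIVE = [[1, 0, 0],
              [0, 1, 0],
              [1, 0, 1]]

# Third co-ordinate 0: the rails still meet only at infinity (parallel).
print(rails_intersection(AFFINE))      # (-4, 0, 0)

# Third co-ordinate non-zero: the rails now meet at the finite point (1, 0).
print(rails_intersection(PROJECTIVE))  # (-1, 0, -1)
```

The only structural difference between the two matrices is the last row, which is exactly where the extra degrees of freedom of the projective transformation live.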

Now we come to our final point: the stereo reconstruction algorithm. Here we are given the following information:

  • Two different pictures (projections) of the same view
  • Relative positioning of the two cameras which took these pictures
  • Internal calibration parameters of the two cameras

and what we want to find is the 3D reconstruction of the view, or the depth map. To attempt this, we note that a point in the first picture, in the absence of depth information, maps to a line in the second picture. This line is called the epipolar line. The homogeneous co-ordinates of the point and the epipolar line are related through a linear transformation given by the fundamental matrix. The fundamental matrix is completely defined by the relative position and orientation of the two cameras and their internal calibration parameters. Once we have found the epipolar line, it only remains to decide which point on this line corresponds to the point in the first picture. For this purpose we use stereo correspondence algorithms. Typically these algorithms use normalised correlation or the sum of squared differences as the similarity metric (this sounds very similar to image registration). And once we have found the corresponding points, we are in a position to find the depth of the actual point in 3D space by triangulation.
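The whole chain can be sketched on a toy example. Everything specific here is an assumption of mine and not something fixed in the meeting: both cameras share the same calibration matrix K, they are parallel, the second one is shifted by a baseline along the x axis, and the convention is X2 = R X1 + t, which gives the fundamental matrix as K^-T [t]x R K^-1. The scanlines used for the sum of squared differences search are made-up numbers.

```python
def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(a, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def skew(t):
    """Matrix [t]x such that [t]x v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def fundamental(k1_inv, k2_inv, rot, t):
    """F = K2^-T [t]x R K1^-1, assuming X2 = R X1 + t."""
    return matmul(transpose(k2_inv), matmul(matmul(skew(t), rot), k1_inv))


def project(k, point3d):
    """Project a 3D point in camera co-ordinates to pixel co-ordinates."""
    x = matvec(k, point3d)
    return [x[0] / x[2], x[1] / x[2], 1.0]


def ssd_match(row_a, row_b, u, half=1):
    """Column in row_b whose window best matches the window around
    column u in row_a, using the sum of squared differences."""
    patch = row_a[u - half:u + half + 1]
    best_u, best_cost = None, float("inf")
    for c in range(half, len(row_b) - half):
        window = row_b[c - half:c + half + 1]
        cost = sum((p - w) ** 2 for p, w in zip(patch, window))
        if cost < best_cost:
            best_u, best_cost = c, cost
    return best_u


# Assumed set-up: focal length, principal point, baseline (all made up).
FOCAL, CX, CY, BASELINE = 500.0, 320.0, 240.0, 0.1
K = [[FOCAL, 0.0, CX], [0.0, FOCAL, CY], [0.0, 0.0, 1.0]]
K_INV = [[1 / FOCAL, 0.0, -CX / FOCAL],
         [0.0, 1 / FOCAL, -CY / FOCAL],
         [0.0, 0.0, 1.0]]
ROT = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [-BASELINE, 0.0, 0.0]

# A 3D point seen by both cameras.
point_cam1 = [0.5, 0.2, 4.0]
point_cam2 = [r + t for r, t in zip(matvec(ROT, point_cam1), T)]
x1 = project(K, point_cam1)
x2 = project(K, point_cam2)

# Epipolar line in picture 2 for the point x1 in picture 1.
f_matrix = fundamental(K_INV, K_INV, ROT, T)
epiline = matvec(f_matrix, x1)
residual = sum(l * x for l, x in zip(epiline, x2))   # ~0: x2 is on the line

# For this parallel set-up the depth follows from the disparity.
disparity = x1[0] - x2[0]
depth = FOCAL * BASELINE / disparity                 # ~4.0

# Toy scanlines: the same intensity bump, shifted by two pixels.
ROW_LEFT = [0, 0, 1, 5, 9, 5, 1, 0, 0, 0]
ROW_RIGHT = [1, 5, 9, 5, 1, 0, 0, 0, 0, 0]
match = ssd_match(ROW_LEFT, ROW_RIGHT, 4)            # column 2
```

In this parallel set-up the epipolar line is horizontal, so the correspondence search reduces to a search along one image row, which is why the toy matcher only looks at a single scanline. For cameras in general position the search would follow the line given by the fundamental matrix and the depth would come from a full triangulation.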

--

Back to Argonauts Main Page