## The theory

I’ve recently ran in this problem, which is one of the very well know problems in computer vision, the **homography estimation**. Given two equicardinal sets of points:

with

with

find the matrix such that the following projective correspondence is valid:

where the lowercase points are the image coordinates, i.e. coordinates defined on a plane (in OpenGL terms typically the plane is the screen and the image points are the pixel coordinates) and the uppercase points are instead the 3D world coordinates of points to be projected on the screen plane. In order to make computations easier is useful to **homogenize** the coordinates. This is done by explicitly adding a to both image and world points, increasing their dimension by 1. For more details about correct and formal introduction to the meaning of coordinates homogenization look for other resources, here is not my interest to introduce such a complicate argument, I just say that in homogenous coordinates the projection of points into planes can be simply expressed as linear matrix-vector product. (Soatto et.al have a good introduction to homogenization and 3D views reconstruction in general).

The tilde means that this is a projective relation, in formal terms this is an ordinary equation up to a constant factor. As one might expect, for each corresponces we derive the relationship:

Where each is a row-vector of 4 entries, the i-th row of and is a row vector composed of 4 zeros. From a set of point correspondences, we obtain a matrix, called .

For the curious reader, the matrix can be explicitly built in the following way:

The elements of matrix are then the 12 values contained in the last column of the unitary matrix **rearranged in column-major order**.

The following code snippet does the dirty job:

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

Learn more about bidirectional Unicode characters

MatrixXd computeProjectionMatrix(const std::vector<Vector3d> &x, const std::vector<Vector4d> &X) | |

{ | |

if (x.size() != X.size() ) | |

throw std::logic_error("There must be the same number of correspondencies"); | |

unsigned int n=x.size(); | |

Eigen::MatrixXd G; | |

G.setZero(2*n,12); | |

for ( unsigned int i=0; i<n; i++ ) | |

{ | |

Vector3d m = x.at(i); | |

RowVector4d M = X.at(i).transpose(); | |

G.row(2*i) << 0,0,0,0, -m[2]*M, m[1]*M;// -m[2] is always 1 because of the homogenization | |

G.row(2*i+1) << m[2]*M,0,0,0,0, -m[0]*M; | |

} | |

MatrixXd A(3,4); // allocate space for the projection 3×4 matrix | |

JacobiSVD<MatrixXd> svd(G, ComputeFullV ); | |

// Copy the data in the last column of matrix V (the eigenvector with the smallest eigenvalue of A^T*A) | |

// maintaining the correct order | |

int k=0; | |

for (int i=0; i<3;i++) | |

{ | |

for (int j=0; j<4; j++) | |

{ | |

A(i,j)= svd.matrixV().col(11).coeffRef(k); | |

++k; | |

} | |

} | |

return P; | |

} |

What is now interesting and not very clear from the sources, is instead how to get from a 3×4 matrix like , the two 4×4 matrices of OpenGL, namely GL_MODELVIEW_MATRIX and GL_PROJECTION_MATRIX.

The problem is that the OpenGL convention is different from the usual cartesian coordinates used in computer vision literature. In OpenGL the y-axis of image coordinates is inverted and the origin is in the upper-left part of the monitor and the camera is looking down the negative z-axis. All these things make the computation of OpenGL projection and modelview matrices (here denoted respectively with and ) definitely a pain.

In order to get the OpenGL projection matrix from the ComputerVision projection matrix, one needs to better understand what the projection matrix contains in terms of camera model and how that matrix can be decomposed in its extrinsic and intrinsic parts. This process is also known as **camera resectioning **or *decomposition of camera matrix*.

### Homography and pinhole camera model

The homography matrix also called *projection matrix* in the computer vision literature, has internally 11 degrees of freedom which come from the pinhole camera model. The first 5 parameters are called *intrinsic parameters* and they encompass focal length, image format, and principal point. In this simplified basic pinhole model, we consider the central projection of points in space onto a plane. The center of projection here is the origin of Euclidean coordinates plane and the image (or focal) plane is the plane at Z=f, where f is the *focal distance*. Under this model, a point with world coordinates is mapped to the point on the image plane where a line joining the point to the center of projection meets the image plane. The center of projection is called the *camera center *also know as *optical center*, while the line from the *camera center * perpendicular to the image plane is called *principal axis *or *principal ray*. The plane through the camera centre, parallel to the image plane is called the *principal plane* of the camera.

The *principal point* is the image of the physical camera center in image-coordinates, the intersection of the *principal ray* with the image plane. In a simple setup, with a camera centered in (0,0,0) looking down the z-axis, at screen resolution 1024×768, the principal point is exactly half the width and height, i.e. 512×384. All the camera intrinsic parameters are contained in a upper-triangular matrix that we call .

The camera extrinsic parameters are 3 camera orientation angles and 3 camera translation along x,y,z axis, forming in total 6 *extrinsic parameters. *Numerically the camera center is the right-null vector of because and can be obtained as:

while the orientations are better written in terms of a rotation matrix . Both the matrices and are computed from the upper-left 3×3 part of the original projection matrix . We can write:

Now we can form the intrinsic matrix and the extrinsic orientations from RQ decomposition of , paying attention that the axes are correct. It turns out that the correct code to accomplish the RQ decomposition is the following, where we require, in order to remove ambiguities, that K has positive diagonal entries:

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

Learn more about bidirectional Unicode characters

void rq3(const Matrix3d &A, Matrix3d &R, Matrix3d& Q) | |

{ | |

// Find rotation Qx to set A(2,1) to 0 | |

double c = –A(2,2)/sqrt(A(2,2)*A(2,2)+A(2,1)*A(2,1)); | |

double s = A(2,1)/sqrt(A(2,2)*A(2,2)+A(2,1)*A(2,1)); | |

Matrix3d Qx,Qy,Qz; | |

Qx << 1 ,0,0, 0,c,-s, 0,s,c; | |

R = A*Qx; | |

// Find rotation Qy to set A(2,0) to 0 | |

c = R(2,2)/sqrt(R(2,2)*R(2,2)+R(2,0)*R(2,0) ); | |

s = R(2,0)/sqrt(R(2,2)*R(2,2)+R(2,0)*R(2,0) ); | |

Qy << c, 0, s, 0, 1, 0,-s, 0, c; | |

R*=Qy; | |

// Find rotation Qz to set A(1,0) to 0 | |

c = –R(1,1)/sqrt(R(1,1)*R(1,1)+R(1,0)*R(1,0)); | |

s = R(1,0)/sqrt(R(1,1)*R(1,1)+R(1,0)*R(1,0)); | |

Qz << c ,-s, 0, s ,c ,0, 0, 0 ,1; | |

R*=Qz; | |

Q = Qz.transpose()*Qy.transpose()*Qx.transpose(); | |

// Adjust R and Q so that the diagonal elements of R are +ve | |

// Make sure that R determinant is 1 | |

// if (R.determinant() < 0) | |

// R.col(2) =-R.col(2); | |

for (int n=0; n<3; n++) | |

{ | |

if (R(n,n)<0) | |

{ | |

R.col(n) = – R.col(n); | |

Q.row(n) = – Q.row(n); | |

} | |

} | |

} | |

Matrix3d K; //intrinsic matrix | |

Matrix3d R; //extrinsic orientation | |

Matrix3d M; //upper-left 3×3 part of projection matrix | |

rq3(M, K, R); |

## From intrinsic and extrinsinc matrices to OpenGL Projection and ModelView matrices

Now that we have developed all the necessary knowledge on homography estimation, pinhole model and camera resectioning, we can obtain the infamous OpenGL matrices.

First compute the GL_PROJECTION_MATRIX, this is done in the following code snippet:

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

Learn more about bidirectional Unicode characters

Matrix4d computeOpenGLProjectionMatrix(const Matrix3d &K, double x0,double y0, double width, double height, double znear, double zfar) | |

{ | |

double depth = zfar – znear; | |

double q = -(zfar + znear) / depth; | |

double qn = –2.0 * (zfar * znear) / depth; | |

Matrix4d OpenGLProjectionMatrix; | |

OpenGLProjectionMatrix << 2*K(0,0)/width, –2*K(0,1)/width, (-2*K(0,2)+width+2*x0)/width, 0 , | |

0, –2*K(1,1)/height,(-2*K(1,2)+height+2*y0)/height, 0, | |

0,0,q,qn, | |

0,0,-1,0; | |

} |

then the simpler GL_MODELVIEW_MATRIX is obtained in the following snippet:

This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.

Learn more about bidirectional Unicode characters

// The translation vector is the first 3 elements 4-th column of the inverse of K | |

Vector3d t = K.inverse()*A.col(3).head<3>(); | |

// Affine3d is a 4×4 matrix that allows simple manipulation of affine transformations | |

Affine3d computeOpenGLModelViewMatrix(const Eigen::Matrix3d &Rot, const Vector3d &trans) | |

{ | |

Affine3d MV = Affine3d::Identity(); | |

MV.linear().matrix() << Rot; // set the upper-left 3×3 rotation part | |

MV.translation() << trans; // set the 4-th column with the translation | |

return MV; | |

} |

## The code

In this long article I have tried to explain at my best how to get both GL_MODELVIEW_MATRIX and GL_PROJECTION_MATRIX from just the two sets of points correspondences already discussed before. I’ve written a C++ class that helps a user in the process. The full code is hosted on my GitHub page

https://github.com/CarloNicolini/OpenGL-CameraCalibration

Once downloaded, you can try to compile it. If you have CMake you just need to

$> git clone https://github.com/CarloNicolini/OpenGL-CameraCalibration CameraCalibration $> cd CameraCalibration $> mkdir build $> cmake ../ $> make testCameraCalibration $> ./testCameraCalibration [2D_points_file.txt] [3D_points_file.txt]

For the matrix computations I use the matrix library Eigen http://eigen.tuxfamily.org/ which is a freely available and very powerful, templatized header-only C++ library.

## Example:

In our example we already know the GL_MODELVIEW_MATRIX and the GL_PROJECTION_MATRIX and we want to find them from just the points correspondences. The 3D points are:

1 0 0 0 1 0 0 0 1 -1 -1 -1 1 1 1 0 0 0 -0.5 0.5 0 -0.2 -0.4 0.2 -0.4 0.1 0.8 0.2 0.51 0.118 5 1 2 1 4 6 10 -1 5

and their corresponding 2D images are:

817.258 513.731 769.14 562.358 848.552 519.778 730.073 405.62 851.791 594.5 791.141 500.383 766.171 524.836 805.826 476.392 823.965 516.715 791.94 536.803 992.483 651.365 1113.56 940.746 1261.94 641.017

## References

Yi Ma, S.Soatto – “An invitation to 3D vision”

R.Hartley, A.Zisserman – “Multiple view geometry in computer vision”, 2nd edition.