A Flexible New Technique for Camera Calibration

Abstract

We propose a flexible new technique to easily calibrate a camera. It is well suited for use without specialized knowledge of 3D geometry or computer vision. The technique only requires the camera to observe a planar pattern shown at a few (at least two) different orientations. Either the camera or the planar pattern can be freely moved. The motion need not be known. Radial lens distortion is modeled. The proposed procedure consists of a closed-form solution, followed by a nonlinear refinement based on the maximum likelihood criterion. Both computer simulation and real data have been used to test the proposed technique, and very good results have been obtained. Compared with classical techniques which use expensive equipment such as two or three orthogonal planes, the proposed technique is easy to use and flexible. It advances 3D computer vision one step from laboratory environments to real world use.

Index Terms: Camera calibration, calibration from planes, 2D pattern, absolute conic, projective mapping, lens distortion, closed-form solution, maximum likelihood estimation, flexible setup.

1 Motivations

Camera calibration is a necessary step in 3D computer vision in order to extract metric information from 2D images. Much work has been done, starting in the photogrammetry community (see [2, 4] to cite a few), and more recently in computer vision ([9, 8, 23, 7, 26, 24, 17, 6] to cite a few). We can classify those techniques roughly into two categories: photogrammetric calibration and self-calibration.

Photogrammetric calibration. Camera calibration is performed by observing a calibration object whose geometry in 3-D space is known with very good precision. Calibration can be done very efficiently [5]. The calibration object usually consists of two or three planes orthogonal to each other. Sometimes, a plane undergoing a precisely known translation is also used [23]. These approaches require an expensive calibration apparatus and an elaborate setup.

Self-calibration. Techniques in this category do not use any calibration object. Just by moving a camera in a static scene, the rigidity of the scene provides in general two constraints [17, 15] on the camera's internal parameters from one camera displacement, by using image information alone. Therefore, if images are taken by the same camera with fixed internal parameters, correspondences between three images are sufficient to recover both the internal and external parameters, which allow us to reconstruct 3-D structure up to a similarity [16, 13]. While this approach is very flexible, it is not yet mature [1]. Because there are many parameters to estimate, we cannot always obtain reliable results.

Other techniques exist: vanishing points for orthogonal directions [3, 14], and calibration from pure rotation [11, 21].

Our current research is focused on a desktop vision system (DVS), since the potential for using DVSs is large. Cameras are becoming cheap and ubiquitous. A DVS aims at the general public, who are not experts in computer vision. A typical computer user will perform vision tasks only from time to time, so will not be willing to invest money for expensive equipment. Therefore, flexibility, robustness and low cost are important. The camera calibration technique described in this paper was developed with these considerations in mind.

The proposed technique only requires the camera to observe a planar pattern shown at a few (at least two) different orientations.
The pattern can be printed on a laser printer and attached to a "reasonable" planar surface (e.g., a hard book cover). Either the camera or the planar pattern can be moved by hand. The motion need not be known. The proposed approach lies between photogrammetric calibration and self-calibration, because we use 2D metric information rather than 3D or purely implicit one. Both computer simulation and real data have been used to test the proposed technique, and very good results have been obtained. Compared with classical techniques, the proposed technique is considerably more flexible. Compared with self-calibration, it gains a considerable degree of robustness. We believe the new technique advances 3D computer vision one step from laboratory environments to the real world.

Note that Bill Triggs [22] recently developed a self-calibration technique from at least 5 views of a planar scene. His technique is more flexible than ours, but has difficulty to initialize. Liebowitz and Zisserman [14] described a technique of metric rectification for perspective images of planes using metric information such as a known angle, two equal though unknown angles, and a known length ratio. They also mentioned that calibration of the internal camera parameters is possible provided at least three such rectified planes, although no experimental results were shown.

The paper is organized as follows. Section 2 describes the basic constraints from observing a single plane. Section 3 describes the calibration procedure. We start with a closed-form solution, followed by nonlinear optimization. Radial lens distortion is also modeled. Section 4 studies configurations in which the proposed calibration technique fails. It is very easy to avoid such situations in practice. Section 5 provides the experimental results. Both computer simulation and real data are used to validate the proposed technique. In the Appendix, we provide a number of details, including the techniques for estimating the homography between the model plane and its image.

2 Basic Equations

We examine the constraints on the camera's intrinsic parameters provided by observing a single plane. We start with the notation used in this paper.

2.1 Notation

A 2D point is denoted by $\mathbf{m} = [u, v]^T$. A 3D point is denoted by $\mathbf{M} = [X, Y, Z]^T$. We use $\tilde{\mathbf{x}}$ to denote the augmented vector obtained by adding 1 as the last element: $\tilde{\mathbf{m}} = [u, v, 1]^T$ and $\tilde{\mathbf{M}} = [X, Y, Z, 1]^T$. A camera is modeled by the usual pinhole: the relationship between a 3D point $\mathbf{M}$ and its image projection $\mathbf{m}$ is

$$s\tilde{\mathbf{m}} = \mathbf{A}\,[\mathbf{R}\ \ \mathbf{t}]\,\tilde{\mathbf{M}}, \qquad (1)$$

where $s$ is an arbitrary scale factor, $(\mathbf{R}, \mathbf{t})$, called the extrinsic parameters, is the rotation and translation which relates the world coordinate system to the camera coordinate system, and $\mathbf{A}$, called the camera intrinsic matrix, is given by

$$\mathbf{A} = \begin{bmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

with $(u_0, v_0)$ the coordinates of the principal point, $\alpha$ and $\beta$ the scale factors in the image $u$ and $v$ axes, and $\gamma$ the parameter describing the skewness of the two image axes.

We use the abbreviation $\mathbf{A}^{-T}$ for $(\mathbf{A}^{-1})^T$ or $(\mathbf{A}^T)^{-1}$.

2.2 Homography between the model plane and its image

Without loss of generality, we assume the model plane is on $Z = 0$ of the world coordinate system. Let us denote the $i$th column of the rotation matrix $\mathbf{R}$ by $\mathbf{r}_i$. From (1), we have

$$s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \mathbf{A}\,[\mathbf{r}_1\ \ \mathbf{r}_2\ \ \mathbf{r}_3\ \ \mathbf{t}]\begin{bmatrix} X \\ Y \\ 0 \\ 1 \end{bmatrix} = \mathbf{A}\,[\mathbf{r}_1\ \ \mathbf{r}_2\ \ \mathbf{t}]\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}.$$

By abuse of notation, we still use $\mathbf{M}$ to denote a point on the model plane, but $\mathbf{M} = [X, Y]^T$ since $Z$ is always equal to 0. In turn, $\tilde{\mathbf{M}} = [X, Y, 1]^T$. Therefore, a model point $\mathbf{M}$ and its image $\mathbf{m}$ are related by a homography $\mathbf{H}$:

$$s\tilde{\mathbf{m}} = \mathbf{H}\tilde{\mathbf{M}} \quad \text{with} \quad \mathbf{H} = \mathbf{A}\,[\mathbf{r}_1\ \ \mathbf{r}_2\ \ \mathbf{t}]. \qquad (2)$$

As is clear, the $3 \times 3$ matrix $\mathbf{H}$ is defined up to a scale factor.
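As a concrete illustration of (2), here is a minimal NumPy sketch (our own, not part of the paper; function and variable names are hypothetical) that builds $\mathbf{H} = \mathbf{A}[\mathbf{r}_1\ \mathbf{r}_2\ \mathbf{t}]$ from a known pose and uses it to project model-plane points:

```python
import numpy as np

def homography_from_pose(A, R, t):
    """H = A [r1 r2 t] as in (2): drop the third column of R because
    the model plane lies on Z = 0."""
    return A @ np.column_stack((R[:, 0], R[:, 1], t))

def project_plane_points(H, XY):
    """Map (m, 2) model points [X, Y] to pixel coordinates (u, v)."""
    XY1 = np.column_stack((XY, np.ones(len(XY))))  # augmented points [X, Y, 1]
    suv = XY1 @ H.T                                # rows are s * [u, v, 1]
    return suv[:, :2] / suv[:, 2:]                 # divide out the scale s
```

Conversely, given detected image corners, the homography itself is estimated from point correspondences as described in Appendix A.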
2.3 Constraints on the intrinsic parameters

Given an image of the model plane, a homography can be estimated (see Appendix A). Let us denote it by $\mathbf{H} = [\mathbf{h}_1\ \ \mathbf{h}_2\ \ \mathbf{h}_3]$. From (2), we have

$$[\mathbf{h}_1\ \ \mathbf{h}_2\ \ \mathbf{h}_3] = \lambda \mathbf{A}\,[\mathbf{r}_1\ \ \mathbf{r}_2\ \ \mathbf{t}],$$

where $\lambda$ is an arbitrary scalar. Using the knowledge that $\mathbf{r}_1$ and $\mathbf{r}_2$ are orthonormal, we have

$$\mathbf{h}_1^T \mathbf{A}^{-T} \mathbf{A}^{-1} \mathbf{h}_2 = 0 \qquad (3)$$

$$\mathbf{h}_1^T \mathbf{A}^{-T} \mathbf{A}^{-1} \mathbf{h}_1 = \mathbf{h}_2^T \mathbf{A}^{-T} \mathbf{A}^{-1} \mathbf{h}_2. \qquad (4)$$

These are the two basic constraints on the intrinsic parameters, given one homography. Because a homography has 8 degrees of freedom and there are 6 extrinsic parameters (3 for rotation and 3 for translation), we can only obtain 2 constraints on the intrinsic parameters. Note that $\mathbf{A}^{-T}\mathbf{A}^{-1}$ actually describes the image of the absolute conic [16]. In the next subsection, we will give a geometric interpretation.

2.4 Geometric Interpretation

We are now relating (3) and (4) to the absolute conic.

It is not difficult to verify that the model plane, under our convention, is described in the camera coordinate system by the following equation:

$$\begin{bmatrix} \mathbf{r}_3 \\ \mathbf{r}_3^T \mathbf{t} \end{bmatrix}^T \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = 0,$$

where $w = 0$ for points at infinity and $w = 1$ otherwise. This plane intersects the plane at infinity at a line, and we can easily see that $\begin{bmatrix} \mathbf{r}_1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} \mathbf{r}_2 \\ 0 \end{bmatrix}$ are two particular points on that line. Any point on it is a linear combination of these two points, i.e.,

$$\mathbf{x}_\infty = a\begin{bmatrix} \mathbf{r}_1 \\ 0 \end{bmatrix} + b\begin{bmatrix} \mathbf{r}_2 \\ 0 \end{bmatrix} = \begin{bmatrix} a\mathbf{r}_1 + b\mathbf{r}_2 \\ 0 \end{bmatrix}.$$

Now, let us compute the intersection of the above line with the absolute conic. By definition, the point $\mathbf{x}_\infty$, known as the circular point, satisfies $\mathbf{x}_\infty^T \mathbf{x}_\infty = 0$, i.e.,

$$(a\mathbf{r}_1 + b\mathbf{r}_2)^T (a\mathbf{r}_1 + b\mathbf{r}_2) = 0, \quad \text{or} \quad a^2 + b^2 = 0.$$

The solution is $b = \pm ai$, where $i^2 = -1$. That is, the two intersection points are

$$\mathbf{x}_\infty = a\begin{bmatrix} \mathbf{r}_1 \pm i\mathbf{r}_2 \\ 0 \end{bmatrix}.$$

Their projection in the image plane is then given, up to a scale factor, by

$$\tilde{\mathbf{m}}_\infty = \mathbf{A}(\mathbf{r}_1 \pm i\mathbf{r}_2) = \mathbf{h}_1 \pm i\mathbf{h}_2.$$

Point $\tilde{\mathbf{m}}_\infty$ is on the image of the absolute conic, described by $\mathbf{A}^{-T}\mathbf{A}^{-1}$ [16]. This gives

$$(\mathbf{h}_1 \pm i\mathbf{h}_2)^T \mathbf{A}^{-T}\mathbf{A}^{-1} (\mathbf{h}_1 \pm i\mathbf{h}_2) = 0.$$

Requiring that both real and imaginary parts be zero yields (3) and (4).

3 Solving Camera Calibration

This section provides the details of how to effectively solve the camera calibration problem. We start with an analytical solution, followed by a nonlinear optimization technique based on the maximum likelihood criterion. Finally, we take into account lens distortion, giving both analytical and nonlinear solutions.

3.1 Closed-form solution

Let

$$\mathbf{B} = \mathbf{A}^{-T}\mathbf{A}^{-1} \equiv \begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{12} & B_{22} & B_{23} \\ B_{13} & B_{23} & B_{33} \end{bmatrix} = \begin{bmatrix} \frac{1}{\alpha^2} & -\frac{\gamma}{\alpha^2\beta} & \frac{v_0\gamma - u_0\beta}{\alpha^2\beta} \\ -\frac{\gamma}{\alpha^2\beta} & \frac{\gamma^2}{\alpha^2\beta^2} + \frac{1}{\beta^2} & -\frac{\gamma(v_0\gamma - u_0\beta)}{\alpha^2\beta^2} - \frac{v_0}{\beta^2} \\ \frac{v_0\gamma - u_0\beta}{\alpha^2\beta} & -\frac{\gamma(v_0\gamma - u_0\beta)}{\alpha^2\beta^2} - \frac{v_0}{\beta^2} & \frac{(v_0\gamma - u_0\beta)^2}{\alpha^2\beta^2} + \frac{v_0^2}{\beta^2} + 1 \end{bmatrix}. \qquad (5)$$

Note that $\mathbf{B}$ is symmetric, defined by a 6D vector

$$\mathbf{b} = [B_{11}, B_{12}, B_{22}, B_{13}, B_{23}, B_{33}]^T. \qquad (6)$$

Let the $i$th column vector of $\mathbf{H}$ be $\mathbf{h}_i = [h_{i1}, h_{i2}, h_{i3}]^T$. Then, we have

$$\mathbf{h}_i^T \mathbf{B} \mathbf{h}_j = \mathbf{v}_{ij}^T \mathbf{b} \qquad (7)$$

with

$$\mathbf{v}_{ij} = [h_{i1}h_{j1},\ h_{i1}h_{j2} + h_{i2}h_{j1},\ h_{i2}h_{j2},\ h_{i3}h_{j1} + h_{i1}h_{j3},\ h_{i3}h_{j2} + h_{i2}h_{j3},\ h_{i3}h_{j3}]^T.$$

Therefore, the two fundamental constraints (3) and (4), from a given homography, can be rewritten as 2 homogeneous equations in $\mathbf{b}$:

$$\begin{bmatrix} \mathbf{v}_{12}^T \\ (\mathbf{v}_{11} - \mathbf{v}_{22})^T \end{bmatrix} \mathbf{b} = \mathbf{0}. \qquad (8)$$

If $n$ images of the model plane are observed, by stacking $n$ such equations as (8) we have

$$\mathbf{V}\mathbf{b} = \mathbf{0}, \qquad (9)$$

where $\mathbf{V}$ is a $2n \times 6$ matrix. If $n \geq 3$, we will have in general a unique solution $\mathbf{b}$ defined up to a scale factor. If $n = 2$, we can impose the skewless constraint $\gamma = 0$, i.e., $[0, 1, 0, 0, 0, 0]\mathbf{b} = 0$, which is added as an additional equation to (9). (If $n = 1$, we can only solve two camera intrinsic parameters, e.g., $\alpha$ and $\beta$, assuming $u_0$ and $v_0$ are known (e.g., at the image center) and $\gamma = 0$, and that is indeed what we did in [19] for head pose determination based on the fact that eyes and mouth are reasonably coplanar.) The solution to (9) is well known as the eigenvector of $\mathbf{V}^T\mathbf{V}$ associated with the smallest eigenvalue (equivalently, the right singular vector of $\mathbf{V}$ associated with the smallest singular value).

Once $\mathbf{b}$ is estimated, we can compute the camera intrinsic matrix $\mathbf{A}$; see Appendix B for the details.
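The closed-form step reduces to a few lines of linear algebra. Below is a NumPy sketch (ours, with hypothetical names) that stacks the constraints (8) into $\mathbf{V}$, solves (9) via SVD, and extracts $\mathbf{A}$ from $\mathbf{b}$ using the closed-form expressions of Appendix B; it assumes the per-image homographies have already been estimated:

```python
import numpy as np

def intrinsics_from_homographies(Hs):
    """Closed-form intrinsics (Sect. 3.1): stack the constraints (8) from
    each 3x3 homography in Hs into V b = 0 and take the right singular
    vector of V with the smallest singular value."""
    def v(H, i, j):
        # v_ij as defined in (7); columns of H are h_1, h_2, h_3 (0-indexed)
        hi, hj = H[:, i], H[:, j]
        return np.array([hi[0]*hj[0],
                         hi[0]*hj[1] + hi[1]*hj[0],
                         hi[1]*hj[1],
                         hi[2]*hj[0] + hi[0]*hj[2],
                         hi[2]*hj[1] + hi[1]*hj[2],
                         hi[2]*hj[2]])
    V = np.vstack([row for H in Hs for row in (v(H, 0, 1),
                                               v(H, 0, 0) - v(H, 1, 1))])
    b = np.linalg.svd(V)[2][-1]                    # solution of (9), up to scale
    B11, B12, B22, B13, B23, B33 = b
    # Recover the intrinsic parameters from b (Appendix B)
    v0 = (B12*B13 - B11*B23) / (B11*B22 - B12**2)
    lam = B33 - (B13**2 + v0*(B12*B13 - B11*B23)) / B11
    alpha = np.sqrt(lam / B11)
    beta = np.sqrt(lam*B11 / (B11*B22 - B12**2))
    gamma = -B12 * alpha**2 * beta / lam
    u0 = gamma*v0/beta - B13*alpha**2/lam
    return np.array([[alpha, gamma, u0],
                     [0.0,   beta,  v0],
                     [0.0,   0.0,  1.0]])
```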
Once $\mathbf{A}$ is known, the extrinsic parameters for each image are readily computed. From (2), we have

$$\mathbf{r}_1 = \lambda \mathbf{A}^{-1} \mathbf{h}_1, \quad \mathbf{r}_2 = \lambda \mathbf{A}^{-1} \mathbf{h}_2, \quad \mathbf{r}_3 = \mathbf{r}_1 \times \mathbf{r}_2, \quad \mathbf{t} = \lambda \mathbf{A}^{-1} \mathbf{h}_3$$

with $\lambda = 1/\|\mathbf{A}^{-1}\mathbf{h}_1\| = 1/\|\mathbf{A}^{-1}\mathbf{h}_2\|$. Of course, because of noise in the data, the so-computed matrix $\mathbf{R} = [\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3]$ does not in general satisfy the properties of a rotation matrix. Appendix C describes a method to estimate the best rotation matrix from a general $3 \times 3$ matrix.

3.2 Maximum likelihood estimation

The above solution is obtained through minimizing an algebraic distance which is not physically meaningful. We can refine it through maximum likelihood inference.

We are given $n$ images of a model plane and there are $m$ points on the model plane. Assume that the image points are corrupted by independent and identically distributed noise. The maximum likelihood estimate can be obtained by minimizing the following functional:

$$\sum_{i=1}^{n} \sum_{j=1}^{m} \|\mathbf{m}_{ij} - \hat{\mathbf{m}}(\mathbf{A}, \mathbf{R}_i, \mathbf{t}_i, \mathbf{M}_j)\|^2, \qquad (10)$$

where $\hat{\mathbf{m}}(\mathbf{A}, \mathbf{R}_i, \mathbf{t}_i, \mathbf{M}_j)$ is the projection of point $\mathbf{M}_j$ in image $i$, according to equation (2). A rotation $\mathbf{R}$ is parameterized by a vector of 3 parameters, denoted by $\mathbf{r}$, which is parallel to the rotation axis and whose magnitude is equal to the rotation angle. $\mathbf{R}$ and $\mathbf{r}$ are related by the Rodrigues formula [5]. Minimizing (10) is a nonlinear minimization problem, which is solved with the Levenberg-Marquardt algorithm as implemented in Minpack [18]. It requires an initial guess of $\mathbf{A}, \{\mathbf{R}_i, \mathbf{t}_i \mid i = 1..n\}$, which can be obtained using the technique described in the previous subsection.

3.3 Dealing with radial distortion

Up to now, we have not considered lens distortion of a camera. However, a desktop camera usually exhibits significant lens distortion, especially radial distortion. In this section, we only consider the first two terms of radial distortion. The reader is referred to [20, 2, 4, 26] for more elaborated models. Based on the reports in the literature [2, 23, 25], it is likely that the distortion function is totally dominated by the radial components, and especially dominated by the first term. It has also been found that any more elaborated modeling not only would not help (negligible when compared with sensor quantization), but also would cause numerical instability [23, 25].

Let $(u, v)$ be the ideal (nonobservable, distortion-free) pixel image coordinates, and $(\breve{u}, \breve{v})$ the corresponding real observed image coordinates. The ideal points are the projection of the model points according to the pinhole model. Similarly, $(x, y)$ and $(\breve{x}, \breve{y})$ are the ideal (distortion-free) and real (distorted) normalized image coordinates. We have [2, 25]

$$\breve{x} = x + x[k_1(x^2 + y^2) + k_2(x^2 + y^2)^2]$$
$$\breve{y} = y + y[k_1(x^2 + y^2) + k_2(x^2 + y^2)^2],$$

where $k_1$ and $k_2$ are the coefficients of the radial distortion. The center of the radial distortion is the same as the principal point. From $\breve{u} = u_0 + \alpha\breve{x} + \gamma\breve{y}$ and $\breve{v} = v_0 + \beta\breve{y}$, and assuming $\gamma = 0$, we have

$$\breve{u} = u + (u - u_0)[k_1(x^2 + y^2) + k_2(x^2 + y^2)^2] \qquad (11)$$
$$\breve{v} = v + (v - v_0)[k_1(x^2 + y^2) + k_2(x^2 + y^2)^2]. \qquad (12)$$

Estimating Radial Distortion by Alternation. As the radial distortion is expected to be small, one would expect to estimate the other five intrinsic parameters, using the technique described in Sect. 3.2, reasonably well by simply ignoring distortion. One strategy is then to estimate $k_1$ and $k_2$ after having estimated the other parameters, which give us the ideal pixel coordinates $(u, v)$. Then, from (11) and (12), we have two equations for each point in each image:

$$\begin{bmatrix} (u - u_0)(x^2 + y^2) & (u - u_0)(x^2 + y^2)^2 \\ (v - v_0)(x^2 + y^2) & (v - v_0)(x^2 + y^2)^2 \end{bmatrix} \begin{bmatrix} k_1 \\ k_2 \end{bmatrix} = \begin{bmatrix} \breve{u} - u \\ \breve{v} - v \end{bmatrix}.$$

Given $m$ points in $n$ images, we can stack all equations together to obtain in total $2mn$ equations, or in matrix form as $\mathbf{D}\mathbf{k} = \mathbf{d}$, where $\mathbf{k} = [k_1, k_2]^T$. The linear least-squares solution is given by

$$\mathbf{k} = (\mathbf{D}^T\mathbf{D})^{-1}\mathbf{D}^T\mathbf{d}. \qquad (13)$$

Once $k_1$ and $k_2$ are estimated, one can refine the estimate of the other parameters by solving (10) with $\hat{\mathbf{m}}(\mathbf{A}, \mathbf{R}_i, \mathbf{t}_i, \mathbf{M}_j)$ replaced by (11) and (12). We can alternate these two procedures until convergence.
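As an illustration of (13), the distortion coefficients follow from one call to a linear least-squares solver once the ideal projections are available. This is our sketch; the function name and argument layout are hypothetical:

```python
import numpy as np

def estimate_radial_distortion(uv_ideal, uv_observed, xy_norm, u0, v0):
    """Linear least-squares estimate of (k1, k2) from (13).
    uv_ideal:    (N, 2) ideal pixel projections (u, v) from the pinhole model
    uv_observed: (N, 2) observed (distorted) pixel coordinates
    xy_norm:     (N, 2) ideal normalized image coordinates (x, y)
    Points from all images are stacked along the first axis."""
    u, v = uv_ideal[:, 0], uv_ideal[:, 1]
    r2 = (xy_norm**2).sum(axis=1)               # x^2 + y^2 for each point
    # Two rows of D per point, as in the 2mn x 2 system D k = d
    D = np.empty((2 * len(u), 2))
    D[0::2, 0], D[0::2, 1] = (u - u0) * r2, (u - u0) * r2**2
    D[1::2, 0], D[1::2, 1] = (v - v0) * r2, (v - v0) * r2**2
    d = (uv_observed - uv_ideal).reshape(-1)    # interleaved [u'-u, v'-v, ...]
    k, *_ = np.linalg.lstsq(D, d, rcond=None)   # solves (13)
    return k                                     # [k1, k2]
```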
Complete Maximum Likelihood Estimation. Experimentally, we found the convergence of the above alternation technique to be slow. A natural extension to (10) is then to estimate the complete set of parameters by minimizing the following functional:

$$\sum_{i=1}^{n} \sum_{j=1}^{m} \|\mathbf{m}_{ij} - \breve{\mathbf{m}}(\mathbf{A}, k_1, k_2, \mathbf{R}_i, \mathbf{t}_i, \mathbf{M}_j)\|^2, \qquad (14)$$

where $\breve{\mathbf{m}}(\mathbf{A}, k_1, k_2, \mathbf{R}_i, \mathbf{t}_i, \mathbf{M}_j)$ is the projection of point $\mathbf{M}_j$ in image $i$ according to equation (2), followed by distortion according to (11) and (12). (A typo in this formula was reported by Johannes Koester [johannes.koester@uni-dortmund.de] via email on Aug. 13, 2008.) This is a nonlinear minimization problem, which is solved with the Levenberg-Marquardt algorithm as implemented in Minpack [18]. A rotation is again parameterized by a 3-vector $\mathbf{r}$, as in Sect. 3.2. An initial guess of $\mathbf{A}$ and $\{\mathbf{R}_i, \mathbf{t}_i \mid i = 1..n\}$ can be obtained using the technique described in Sect. 3.1 or in Sect. 3.2. An initial guess of $k_1$ and $k_2$ can be obtained with the technique described in the last paragraph, or simply by setting them to 0.
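A compact sketch of minimizing (14), substituting SciPy's Levenberg-Marquardt driver for the Minpack binding used in the paper; the flat parameter packing [α, γ, u0, β, v0, k1, k2, r_1, t_1, ..., r_n, t_n] is our own choice, not the paper's:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project_all(params, model_pts, n_images):
    """Project the m model points into each of the n images: pinhole (2)
    followed by the radial distortion of (11)-(12)."""
    alpha, gamma, u0, beta, v0, k1, k2 = params[:7]
    out = []
    for i in range(n_images):
        r = params[7 + 6*i : 7 + 6*i + 3]        # Rodrigues vector r_i
        t = params[7 + 6*i + 3 : 7 + 6*i + 6]    # translation t_i
        R = Rotation.from_rotvec(r).as_matrix()
        # camera coordinates of the Z = 0 plane points: X*r1 + Y*r2 + t
        P = model_pts @ np.column_stack((R[:, 0], R[:, 1])).T + t
        x, y = P[:, 0] / P[:, 2], P[:, 1] / P[:, 2]   # ideal normalized coords
        d = k1 * (x**2 + y**2) + k2 * (x**2 + y**2)**2
        x, y = x * (1 + d), y * (1 + d)               # distorted coords
        out.append(np.column_stack((alpha * x + gamma * y + u0,
                                    beta * y + v0)))
    return np.concatenate(out)

def refine_all(params0, model_pts, image_pts):
    """Minimize (14); image_pts is the stacked (n*m, 2) array of detected
    corners, ordered image by image to match project_all."""
    n_images = (len(params0) - 7) // 6
    resid = lambda p: (project_all(p, model_pts, n_images) - image_pts).ravel()
    return least_squares(resid, params0, method='lm').x
```

Note that `least_squares` with `method='lm'` wraps the same Minpack Levenberg-Marquardt routine referenced above.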
3.4 Summary

The recommended calibration procedure is as follows:

1. Print a pattern and attach it to a planar surface;
2. Take a few images of the model plane under different orientations by moving either the plane or the camera;
3. Detect the feature points in the images;
4. Estimate the five intrinsic parameters and all the extrinsic parameters using the closed-form solution as described in Sect. 3.1;
5. Estimate the coefficients of the radial distortion by solving the linear least-squares (13);
6. Refine all parameters by minimizing (14).

4 Degenerate Configurations

We study in this section configurations in which additional images do not provide more constraints on the camera intrinsic parameters. Because (3) and (4) are derived from the properties of the rotation matrix, if $\mathbf{R}_2$ is not independent of $\mathbf{R}_1$, then image 2 does not provide additional constraints. In particular, if a plane undergoes a pure translation, then $\mathbf{R}_2 = \mathbf{R}_1$ and image 2 is not helpful for camera calibration. In the following, we consider a more complex configuration.

Proposition 1. If the model plane at the second position is parallel to its first position, then the second homography does not provide additional constraints.

Proof. Under our convention, $\mathbf{R}_2$ and $\mathbf{R}_1$ are related by a rotation around the z-axis. That is,

$$\mathbf{R}_2 = \mathbf{R}_1 \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

where $\theta$ is the angle of the relative rotation. We will use superscripts (1) and (2) to denote vectors related to image 1 and 2, respectively. It is clear that we have

$$\mathbf{h}_1^{(2)} = \lambda^{(2)}(\mathbf{A}\mathbf{r}_1^{(1)}\cos\theta + \mathbf{A}\mathbf{r}_2^{(1)}\sin\theta) = \frac{\lambda^{(2)}}{\lambda^{(1)}}(\mathbf{h}_1^{(1)}\cos\theta + \mathbf{h}_2^{(1)}\sin\theta)$$

$$\mathbf{h}_2^{(2)} = \lambda^{(2)}(-\mathbf{A}\mathbf{r}_1^{(1)}\sin\theta + \mathbf{A}\mathbf{r}_2^{(1)}\cos\theta) = \frac{\lambda^{(2)}}{\lambda^{(1)}}(-\mathbf{h}_1^{(1)}\sin\theta + \mathbf{h}_2^{(1)}\cos\theta).$$

Then, the first constraint (3) from image 2 becomes

$$\mathbf{h}_1^{(2)T}\mathbf{A}^{-T}\mathbf{A}^{-1}\mathbf{h}_2^{(2)} = \frac{\lambda^{(2)2}}{\lambda^{(1)2}}\Big[(\cos^2\theta - \sin^2\theta)\big(\mathbf{h}_1^{(1)T}\mathbf{A}^{-T}\mathbf{A}^{-1}\mathbf{h}_2^{(1)}\big) - \cos\theta\sin\theta\big(\mathbf{h}_1^{(1)T}\mathbf{A}^{-T}\mathbf{A}^{-1}\mathbf{h}_1^{(1)} - \mathbf{h}_2^{(1)T}\mathbf{A}^{-T}\mathbf{A}^{-1}\mathbf{h}_2^{(1)}\big)\Big],$$

which is a linear combination of the two constraints provided by $\mathbf{H}_1$. Similarly, we can show that the second constraint from image 2 is also a linear combination of the two constraints provided by $\mathbf{H}_1$. Therefore, we do not gain any constraint from $\mathbf{H}_2$.

The result is self-evident because parallel planes intersect the plane at infinity at the same circular points, and thus according to Sect. 2.4 they provide the same constraints.

In practice, it is very easy to avoid the degenerate configuration: we only need to change the orientation of the model plane from one snapshot to another.

Although the proposed technique will not work if the model plane undergoes pure translation, camera calibration is still possible if the translation is known. Please refer to Appendix D.

5 Experimental Results

The proposed algorithm has been tested on both computer simulated data and real data. The closed-form solution involves finding a singular value decomposition of a small $2n \times 6$ matrix, where $n$ is the number of images. The nonlinear refinement within the Levenberg-Marquardt algorithm takes 3 to 5 iterations to converge.

5.1 Computer Simulations

The simulated camera has the following properties: $\alpha = 1250$, $\beta = 900$, $\gamma = 1.09083$ (equivalent to $89.95°$), $u_0 = 255$, $v_0 = 255$. The image resolution is $512 \times 512$. The model plane is a checker pattern containing $10 \times 11 = 110$ corner points (so we usually have more data in the $v$ direction than in the $u$ direction). The size of the pattern is 18cm x 25cm. The orientation of the plane is represented by a 3D vector $\mathbf{r}$, which is parallel to the rotation axis and whose magnitude is equal to the rotation angle. Its position is represented by a 3D vector $\mathbf{t}$ (unit in centimeters).

Performance w.r.t. the noise level. In this experiment, we use three planes with $\mathbf{r}_1 = [20°, 0, 0]^T$, $\mathbf{t}_1 = [-9, -12.5, 500]^T$, $\mathbf{r}_2 = [0, 20°, 0]^T$, $\mathbf{t}_2 = [-9, -12.5, 510]^T$, $\mathbf{r}_3 = \frac{1}{\sqrt{5}}[-30°, -30°, -15°]^T$, $\mathbf{t}_3 = [-10.5, -12.5, 525]^T$. Gaussian noise with 0 mean and $\sigma$ standard deviation is added to the projected image points. The estimated camera parameters are then compared with the ground truth. We measure the relative error for $\alpha$ and $\beta$, and the absolute error for $u_0$ and $v_0$. We vary the noise level from 0.1 pixels to 1.5 pixels. For each noise level, we perform 100 independent trials, and the results shown are the average. As we can see from Fig. 1, errors increase linearly with the noise level. (The error for $\gamma$ is not shown, but has the same property.) For $\sigma = 0.5$ (which is larger than the normal noise in practical calibration), the errors in $\alpha$ and $\beta$ are less than 0.3%, and the errors in $u_0$ and $v_0$ are around 1 pixel. The error in $u_0$ is larger than that in $v_0$. The main reason is that there are less data in the $u$ direction than in the $v$ direction, as we said before.

Performance w.r.t. the number of planes. This experiment investigates the performance with respect to the number of planes (more precisely, the number of images of the model plane). The orientation and position of the model plane for the first three images are the same as in the last subsection. From the fourth image, we first randomly choose a rotation axis in a uniform sphere, then apply a rotation angle of 30°. We vary the number of images from 2 to 16. For each number, 100 trials of independent plane orientations (except for the first three) and independent noise with mean 0 and