FIGURE 2: Typical skin detection examples. (a) Input image. (b) Detected skin pixels marked in white.

2.1. Face region detection

Many face detection algorithms exploiting different heuristic and appearance-based strategies have been proposed (a comprehensive review is presented in [15]). Among these, color-based face region detection has gained increasing popularity because it enables fast localization of potential facial regions and is highly robust to geometric variations of face patterns and to illumination conditions (except colored lighting). The success of color-based face detection depends heavily on the accuracy of the skin-color model. In our approach, the Bayes skin probability map (SPM) [16] is employed in the normalized R/G chrominance color space, which has shown good performance for skin-color modeling. Typical skin detection results are presented in Figure 2.

Skin color alone is usually not enough to detect the potential face regions reliably, due to possible inaccuracies of camera color reproduction and the presence of nonface skin-colored objects in the background. Popular methods for skin-colored face region localization are based on connected-components analysis [17] and integral projection [18]. Unfortunately, these methods cannot locate the facial region reliably when the face is not well separated from what is mistakenly classified as "skin," as shown in Figure 2. In order to cope with these problems, a deformable elliptic model is developed, as shown in Figure 3a. The model is initialized near the expected face position; for example, the mass center of the largest skin-colored connected region is a good choice, as shown in Figure 3b. Subsequently, a number of rectangular probes (12 in the current implementation), each with area S_probe, placed on the ellipse border, deform the model to extract an elliptic region of skin-colored pixels, as shown in Figures 3c and 3d.

Let N_in and N_out be the numbers of skin-colored probe pixels inside and outside the face ellipse, respectively. The densities of skin-colored probe pixels inside and outside the face ellipse then control the probe displacement vector v_i:

\[
\mathbf{v}_i =
\begin{cases}
-k_{\text{in}}\,\mathbf{n}_i, & \text{if } N_{\text{in}}/S_{\text{probe}} < T_1,\\
k_{\text{out}}\,\mathbf{n}_i, & \text{if } N_{\text{out}}/S_{\text{probe}} \ge T_2,\\
\mathbf{0}, & \text{otherwise},
\end{cases}
\]

where i is the probe index, n_i is the probe expansion direction, and T_1 and T_2 are threshold values. The nonnegative coefficients k_in and k_out control the speed of the model deformation. After the displacement vectors for all probes are calculated, an ellipse is fitted to the centers of the repositioned probes. Taking advantage of the elliptical shape of the skin-colored region makes the method more robust, compared with the existing method for skin-pixel cluster detection reported in [19].

FIGURE 3: Face region detection with deformable model. (a) Deformable model. (b), (c), and (d) Convergence of the deformable model.
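As an illustration, one probe-update iteration can be written compactly. The following is a minimal numpy sketch under stated assumptions: the skin mask is a boolean array, the ellipse is axis-aligned, and the function name, probe size, step length, and threshold values are illustrative choices rather than values from the paper; the displacement rule follows the equation above.

    import numpy as np

    def deform_step(skin_mask, center, axes, n_probes=12, probe=(16, 8),
                    k_in=1.0, k_out=1.0, t1=0.3, t2=0.3, step=4.0):
        # One deformation iteration of the elliptic face model (sketch).
        # skin_mask: 2D bool array, True where the pixel is skin-colored.
        # center=(cx, cy), axes=(a, b): current axis-aligned ellipse.
        # Returns repositioned probe centers; an ellipse can then be
        # least-squares fitted to them to update the model.
        cx, cy = center
        a, b = axes
        h, w = skin_mask.shape
        ys, xs = np.mgrid[0:h, 0:w]
        inside = ((xs - cx) / a) ** 2 + ((ys - cy) / b) ** 2 <= 1.0

        centers = []
        for i in range(n_probes):
            theta = 2.0 * np.pi * i / n_probes
            px, py = cx + a * np.cos(theta), cy + b * np.sin(theta)
            # Radial direction used as the expansion direction n_i
            # (an approximation of the true ellipse normal).
            n_i = np.array([np.cos(theta), np.sin(theta)])
            x0, x1 = int(max(px - probe[0] // 2, 0)), int(min(px + probe[0] // 2, w))
            y0, y1 = int(max(py - probe[1] // 2, 0)), int(min(py + probe[1] // 2, h))
            skin = skin_mask[y0:y1, x0:x1]
            s_probe = max(skin.size, 1)
            n_in = np.count_nonzero(skin & inside[y0:y1, x0:x1]) / s_probe
            n_out = np.count_nonzero(skin & ~inside[y0:y1, x0:x1]) / s_probe
            if n_in < t1:            # too few skin pixels inside: contract
                v = -k_in * step * n_i
            elif n_out >= t2:        # skin extends past the border: expand
                v = k_out * step * n_i
            else:
                v = np.zeros(2)
            centers.append((px + v[0], py + v[1]))
        return np.array(centers)

In the described method this step is repeated, refitting an ellipse to the probe centers after each iteration, until the model converges, as in Figures 3b, 3c, and 3d.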
2.2. Eye position detection

Accurate eye detection is quite important for the subsequent feature extraction because, in the proposed system, the eyes provide the baseline information about the expected locations of the other facial features. Most eye detection methods exploit the observation that the eye regions usually exhibit sharp changes in both luminance and chrominance, in contrast to the surrounding skin. Researchers have used integral projection [20], morphological filters [18], edge-map analysis [21], and non-skin-color area detection [17, 22] to identify the potential eye locations in the facial image. Using nonskin color and low pixel intensities as the eye-region characteristics leads to severe distortion errors. Luminance edge analysis requires a good facial edge map, which is difficult to obtain when the image is noisy or has low contrast. However, detecting sharp changes in the red channel image provides a more stable result, because the iris usually exhibits low red-channel values (for both dark and light eyes) compared with the surrounding pixels of the eye white and skin. The proposed method reliably detects the eye-shaped variations of the red channel and is also easy to implement.

In order to more easily detect a change in intensity, the red channel image intensity is first stretched to the maximum range, and a variation image is then calculated. Let I be the original red channel image; the pixel value of the variation image at (x, y) can then be defined as

\[
V_{n,\alpha}(x,y) = \max_{r \in K_{n,x,y}} \left( \frac{\alpha}{|P_{n,r}|} \sum_{p \in P_{n,r}} \bigl| I(p) - \bar{I}_{P_{n,r}} \bigr| \right),
\]

where \bar{I}_{P_{n,r}} is the mean of I over P_{n,r}. Here, K_{n,x,y} denotes an (n × n)-sized rectangle centered at (x, y), while P_{n,r} is an (n × n/3)-sized ellipse centered at r. The control parameters are the scaling coefficient α and the expected size of the eye features, n.

The meaning of the variation image V_{n,α}(x, y) can be described as a dilation of the high-frequency patterns in the red channel facial image. The variation image is calculated for several (n, α) pairs in order to cope with the high variance of eye appearance, as shown in Figure 4. This results in stable and correct behavior for images of different lighting and quality. The connected components of the pixels with high variation values are then tested against shape, size, and symmetry restrictions in order to obtain the best-matching eye position for each variation image independently. Finally, the different (n, α) configurations are sorted so that the later ones generate a stronger response. Their results are combined in this order, so that a later result either provides an output if no response was generated previously, or refines the previous result otherwise.

FIGURE 4: Red channel variation images at different scales. (a) Input image with low quality and contrast. (b) n = FaceWidth/30, α = 1.4. (c) n = FaceWidth/25, α = 0.6. (d) n = FaceWidth/30, α = 0.6.
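A variation image of this kind can be computed with standard filtering operations. The sketch below is a rough approximation, not the exact operator: the elliptic neighborhood P_{n,r} is replaced by a box of the same aspect ratio, and the function and parameter names are illustrative.

    import numpy as np
    from scipy import ndimage

    def variation_image(red, n, alpha):
        # Approximate variation image V_{n,alpha} (sketch).
        red = red.astype(np.float64)
        # Stretch the red channel to the full range, as in the text.
        red = (red - red.min()) / max(np.ptp(red), 1e-9)
        # Local variation over an eye-shaped (n/3 x n) neighborhood:
        # mean absolute deviation from the local mean.
        size = (max(n // 3, 1), n)
        local_mean = ndimage.uniform_filter(red, size)
        mad = ndimage.uniform_filter(np.abs(red - local_mean), size)
        # Dilation (maximum filter) over the n x n rectangle K_{n,x,y}.
        return alpha * ndimage.maximum_filter(mad, size=(n, n))

    # Combining several (n, alpha) configurations, weaker response first:
    # responses = [variation_image(red, n, a) for n, a in [(12, 1.4), (15, 0.6)]]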
2.3. Eye contours detection

The eye contour model consists of an upper lid curve (a cubic polynomial), a lower lid curve (a quadratic polynomial), and an iris circle. The iris center and radius are estimated by the algorithm developed by Ahlberg [23], which is based on the assumptions that the iris is approximately circular and dark against its background, that is, the eye white. Conventional approaches to eyelid contour detection use deformable contour models attracted by high values of the luminance edge gradient [24]. Deformable models require a careful formulation of the energy term and a good initialization; otherwise, an unexpected contour extraction result may be obtained. Moreover, it is undesirable to use luminance edges for contour detection, because the eye area may contain many outlier edges.

This paper proposes a novel technique that achieves both stability and accuracy. Taking the luminance values along a single horizontal row of an eye image as a scalar function L_y(x), it can be seen that the significant local minima correspond to the eye boundary points, as shown in Figure 5. This observation is valid for many images taken under very different lighting conditions and of very different quality. The detected candidate pixels of the eye boundary are filtered to remove outliers before a curve is fitted to the upper eyelid points. The lower lid, on the other hand, is detected by fitting a quadratic curve to the eye corners and the lower point of the iris circle.

The eyebrows can be detected simply by fitting parabolas to the dark pixels after binarizing the luminance image in the areas above the eye bounding boxes.

FIGURE 5: Eye contour detection. (a) Source image. (b) Detected eyelid points and fitted curve. (c) Detected contour. (d) Pseudo-3D luminance graph of the eye area; candidate points for the eye border are marked with dark circles.

2.4. Lip contour detection

In most cases, the lip color differs significantly from that of the skin. Iteratively refined skin and lip color models are used to discriminate the lip pixels from the surrounding skin. The pixels classified as skin at the face detection stage and located inside the face ellipse are used to build a person-specific skin-color histogram. The pixels with low values in the person-specific skin-color histogram, located in the lower face part, are used to estimate the mouth rectangle. The skin and lip color classes are then modeled using 2D Gaussian probability density functions in the (R/G, B/G) color space. It is observed empirically that this color space separates the two classes well; the resulting lip classification is shown in Figures 6b and 6c.

2.5. Nose contour detection

The representative shape of the nose side has already been exploited to increase robustness, and matching it to the edge and dark pixels has been reported to be successful [24, 25]. However, in the case of a blurry picture or ambient face illumination, it becomes increasingly difficult to utilize the weak edge and brightness information. In our approach, a step is made beyond the naive template methods. The proposed approach utilizes the full information of the gradient vector from the edge detector. The nose-side template represents the typical nose-side shape at a fixed scale. In order to compensate for the fixed scale of the template, the nose image (the face part between the eye centers, vertically from the eye bottom to the top of the mouth box) is cropped and scaled to a fixed size. The template is then matched at each candidate position q by the figure of merit

\[
\mathrm{FoM}(q) = \sum_{p \in S(q)} f(p), \qquad
f(p) =
\begin{cases}
1, & \text{if } \exists\, r \in \Omega(p):\ \lVert \nabla I(r) \rVert \ge T_1 \ \text{and}\ \dfrac{\lvert \nabla I(r) \cdot \mathbf{n}(p) \rvert}{\lVert \nabla I(r) \rVert} \ge T_2, \\
0, & \text{otherwise}.
\end{cases}
\tag{6}
\]

In (6), p, q, and r denote pixel locations; S(q) is the set of template points; Ω(p) is a 5 × 5 neighborhood of p; ∇I(r) is the image spatial luminance gradient vector at r; and n(p) is the normal vector at p on the template curve. T_1 sets the minimum gradient magnitude value to exclude weak edges. FoM(q) is the counter function that yields the number of pixels in S(q) that exhibit a significant gradient magnitude with a gradient direction close to the template curve normal. Approximately 13% of the maximum figure-of-merit template positions form a set of nose-side candidates, as shown in Figure 7b. A pair of candidates, located at close vertical coordinates and at an approximately even distance from the face central line, is chosen as the most probable nose position, which is also shown in Figure 7b. As shown in Figure 7c, the final nose curves are estimated from the nose-side position and by considering the geometric relation between the eyes and the nose in the general human face structure.
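A direct implementation of (6) is short. The sketch below assumes precomputed gradient images and a template given as points with unit normals; the function name and the thresholds t_mag and t_dir are illustrative stand-ins for T_1 and T_2.

    import numpy as np

    def figure_of_merit(grad_x, grad_y, template_pts, template_normals, q,
                        t_mag=20.0, t_dir=0.9):
        # FoM(q): count template points that find, within their 5 x 5
        # neighborhood, a pixel with a significant gradient whose
        # direction is close to the template curve normal (sketch of (6)).
        h, w = grad_x.shape
        fom = 0
        for (px, py), n_vec in zip(template_pts, template_normals):
            x, y = int(px + q[0]), int(py + q[1])
            found = False
            for dy in range(-2, 3):          # Omega(p): 5 x 5 neighborhood
                for dx in range(-2, 3):
                    rx, ry = x + dx, y + dy
                    if not (0 <= rx < w and 0 <= ry < h):
                        continue
                    gx, gy = grad_x[ry, rx], grad_y[ry, rx]
                    mag = np.hypot(gx, gy)
                    if (mag >= t_mag and
                            abs(gx * n_vec[0] + gy * n_vec[1]) / mag >= t_dir):
                        found = True
                        break
                if found:
                    break
            fom += int(found)
        return fom

Evaluating FoM(q) over all positions q in the scaled nose image and keeping the strongest responses yields the nose-side candidate set described above.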
2.6. Chin and cheek contour detection

Deformable models [26] have proven to be efficient tools for detecting the chin and cheek contour [27]. However, in several cases the edge map, which is the main information source for face boundary estimation, yields very noisy and incomplete face contour information. A subtle model deformation rule derived from general knowledge of the human facial structure must then be applied for accurate detection [27]. This paper proposes a simpler but robust method that relies on a deformable model consisting of two fourth-degree polynomial curves linked at the bottom point, as shown in Figure 8a. The model deformation process is designed in such a way as to permit the detection of the precise facial boundary in the case of a noisy or incomplete face contour. The gradient magnitude and the gradient direction are utilized simultaneously.

The model's initial position is estimated from the already detected facial features, as shown in Figure 8b. After the initialization, the model begins expanding towards the face boundaries until it encounters strong luminance edges that are collinear with the model curves. The model curves are divided into several sections. Each section expands until a sufficient number of the pixels occupied by the curve section have an edge magnitude greater than a given threshold and an edge direction collinear with the model curve. The model curves are fitted to the repositioned section points by least-squares fitting after evaluating each section's displacement. This is repeated until the model achieves a stable convergence. Figures 8b, 8c, and 8d illustrate this process, showing the convergence to the actual chin and cheek boundary. The lower chin area may exhibit weak edges or no edges at all. In this case, the lower-part sections stop moving when they reach significant luminance valleys. In order to prevent excessive downward expansion, the model is not allowed to move below a horizontal line derived from the human face proportions.

FIGURE 8: Chin and cheek contour detection. (a) Deformable contour model. (b), (c), and (d) The results of contour initialization and fitting.
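The two core operations of this loop, the per-section stop test and the least-squares refit, can be sketched as follows. The assumed inputs are a gradient-magnitude image and per-pixel unit gradient directions; the helper names, thresholds, and the "sufficient fraction" value are illustrative choices, not values from the paper.

    import numpy as np

    def section_supported(grad_mag, grad_dir, pts, tangents,
                          mag_thresh=25.0, perp_thresh=0.1, min_frac=0.6):
        # Stop criterion for one curve section: a sufficient fraction of
        # its pixels must lie on a strong edge whose direction is
        # collinear with the curve, i.e., the gradient is roughly
        # perpendicular to the curve tangent. grad_dir holds unit
        # gradient vectors, shape (h, w, 2).
        good = 0
        for (x, y), t in zip(pts, tangents):
            m = grad_mag[int(y), int(x)]
            d = grad_dir[int(y), int(x)]
            if m >= mag_thresh and abs(float(d[0]) * t[0] + float(d[1]) * t[1]) <= perp_thresh:
                good += 1
        return good / max(len(pts), 1) >= min_frac

    def refit_curve(section_pts, degree=4):
        # Least-squares refit of one model curve to the repositioned
        # section points; x is modeled as a polynomial in y, since the
        # chin/cheek curves run roughly vertically. Needs at least
        # degree + 1 points.
        pts = np.asarray(section_pts, dtype=float)
        return np.polyfit(pts[:, 1], pts[:, 0], degree)

Sections that pass section_supported stop moving; the remaining sections step outward along the curve normal, and refit_curve is applied after each sweep until the coefficients stabilize.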
3. PROFILE FACIAL FEATURE EXTRACTION

The profile feature extraction consists of two steps: profile fiducial point detection and ear boundary detection.

3.1. Profile fiducial points detection

The algorithm can be roughly divided into two stages: profile curve detection and fiducial point detection. Robustness is achieved by a feedback strategy: the later steps examine the results of the previous steps, and when erroneous intermediate results are detected, the algorithm automatically goes back to the previous steps and fixes the errors using additional information.

3.1.1. Profile curve detection

The profile facial curve is detected as the boundary of the face region, which is the largest skin-colored connected component (or the component nearer the image center, when the two largest components have a comparable size). Note that the same skin-color classification algorithm as for the frontal image is employed.

However, the color-based algorithm gives incorrect results in many cases, as shown in Figure 9. For example, the detected face region will not include the nose when a strong shadow separates the nose from the face, as shown in Figure 9b. This failure case can be recognized by searching for the nose candidate, which is a skin-colored connected component near the face region in the horizontal direction. The nose tip is detected as an extreme point along the horizontal direction (the right-most point in this implementation) in the currently detected face region (or in the nose region, when the nose is separated from the face). In addition, the nose bridge is estimated by fitting a line segment to the local edge points, beginning from the nose tip. The next stage is to check whether there is another failure condition, that is, an incomplete nose area due to strong illumination, as shown in Figure 9e. This can be carried out by calculating the distance from the nose bridge to the face region boundary points. After the failure cases of skin-color segmentation are recognized, a pyramid-based area segmentation algorithm [28] from the Intel Open Source Computer Vision Library (http://www.intel.com/research/mrl/research/opencv) is used to detect the profile curve, as shown in Figures 9c, 9f, and 9i.

FIGURE 9: Failure cases of color-based segmentation and the corrected results by area segmentation. (a), (b), and (c) The case of separated skin and nose. (d), (e), and (f) The case of incomplete nose area. (g), (h), and (i) The case of incomplete chin area. (a), (d), and (g) Input images. (b), (e), and (h) The results of color-based segmentation. (c), (f), and (i) The results of area segmentation.

3.1.2. Fiducial points detection

The profile fiducial points are detected after extracting the profile curve. They are defined as a few characteristic points positioned on the profile curve. First, a "profile function" x = x(y) is constructed, where y varies along the vertical direction and x denotes the rightmost x coordinate of the face region in that row. The function is smoothed with a 1D Gaussian filter to eliminate noise. Figure 10 shows a typical profile function, on which the fiducial points to be detected are marked. The "nose tip" (A in Figure 10) is the global maximum position. The "nose bridge top" (B in Figure 10) is the ending point of the line segment representing the nose bridge. A search area for detecting the "under-nose point" (C in Figure 10) is then defined based on the length of the nose bridge, and the position where the curvature is maximized within this area is taken as the detection result. Afterwards, the profile curve below the under-nose point is approximated as a polyline with an adaptive polyline fitting algorithm. The "chin point" (D in Figure 10) is the first polyline vertex whose succeeding line segment has a direction close to horizontal; the "neck top point" (E in Figure 10) is the first polyline vertex after the chin point whose succeeding line segment has a direction close to vertical. Finally, the lip points are positioned by interpolating between the under-nose point and the neck top point with predetermined coefficients.

The distance from the nose tip to the chin point is then calculated. If it is too short, the third failure condition of the skin-color segmentation is recognized, as shown in Figure 9h; this is due to insufficient illumination. In this case, the algorithm returns to the previous stage in order to redetect the profile curve with the area segmentation algorithm.

FIGURE 10: Typical profile function. Nose tip (A), nose bridge top (B), under-nose point (C), chin point (D), and neck top point (E) are marked.
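Constructing the profile function and extracting the first fiducial point are simple operations. The following is a minimal sketch assuming a boolean face-region mask; the names and the smoothing width are illustrative.

    import numpy as np
    from scipy import ndimage

    def profile_function(face_mask, sigma=2.0):
        # Profile function x = x(y): the rightmost face-region pixel in
        # each image row, smoothed with a 1D Gaussian filter. Rows
        # without face pixels are left as NaN.
        h = face_mask.shape[0]
        xs = np.full(h, np.nan)
        for y in range(h):
            cols = np.flatnonzero(face_mask[y])
            if cols.size:
                xs[y] = cols[-1]
        valid = ~np.isnan(xs)
        xs[valid] = ndimage.gaussian_filter1d(xs[valid], sigma)
        return xs

    def nose_tip(xs):
        # Fiducial point A: the global maximum of the profile function.
        return int(np.nanargmax(xs))

The remaining fiducial points are found relative to the nose tip, as described above, by line fitting on the nose bridge, curvature maximization, and adaptive polyline fitting.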
3.2. Ear boundary detection

Since the local low-level cues are usually weak and erroneous in the area around the ear, a curved template, which represents a priori knowledge about the human ear, is utilized to detect the ear boundary. The complete ear detection consists of three steps: (1) profile image normalization, to compensate for the different scale and orientation; (2) ear initialization, to match the template with the image ear and translate it to an initial position; and (3) ear refinement, to deform the template in order to match the accurate ear boundary.

3.2.1. Profile image normalization

Two profile fiducial points, the nose bridge top and the chin point, are selected as the calibration points for normalization. The original image is rotated to make the segment connecting them vertical, and then scaled to make the distance between them a predefined value. These two points are selected because they can be detected robustly, and they are distant enough from each other that the normalization is not overly sensitive to their detection errors. In this stage, a rectangle is also defined as the search area for the ear, obtained by statistical analysis of the relative positions between the ears and the calibration points in our test images.

3.2.2. Ear initialization

As described previously, a curve template is utilized for the ear initialization and refinement. In this implementation, the template is a fifth-degree polynomial, as shown in Figure 11 (the thick white curve). The "skin-color boundary" (the boundary of the face region detected with the skin-color classification) is used for ear initialization, because in most cases it coincides with the ear boundary along some partial segment, as shown in Figure 11. The problem is then to identify the corresponding partial segments between the template and the skin-color boundary inside the search area. After the scale and orientation normalization, it can be solved with a simple curve-matching algorithm based on the similarity of the curve tangents.

In more detail, the two curves are preprocessed to be 4-connected, thereby avoiding local duplication. The resultant point sets are denoted as {q_i ∈ R²}, 1 ≤ i ≤ N, for the template and {p_j ∈ R²}, 1 ≤ j ≤ M, for the skin-color boundary, respectively. Next, two displacement arrays are constructed as {∇Q_s = q_{a(s+1)} − q_{as}} and {∇P_t = p_{a(t+1)} − p_{at}}, where a is a coefficient for the sampling step. l(s, t) is evaluated as the maximum integer l that satisfies

\[
\sum_{m=1}^{l} \bigl\lVert \nabla Q_{s+m} - \nabla P_{t+m} \bigr\rVert \le \delta,
\]

where δ is a threshold measuring the similarity of the tangential directions at q_{as} and p_{at}. The position (s_m, t_m) where l(s, t) is maximal gives the match result as {q_i}, a·s_m ≤ i ≤ a(s_m + l_m), and {p_j}, a·t_m ≤ j ≤ a(t_m + l_m). Finally, the template is translated based on the partial segment match. The proposed initialization method works very well in experiments, as shown in Figure 11.

FIGURE 11: The results of ear initialization, with the skin-color boundary (thin grey curve), the translated ear template (thick white curve), and the matched partial segments (thick black curve) on both the skin-color boundary and the ear template.
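The tangent-matching search itself reduces to a pair of nested loops. The sketch below follows the definitions above; the function names are illustrative, and the sampling coefficient a and threshold delta are free parameters.

    import numpy as np

    def displacement_array(points, a=3):
        # Sub-sampled displacement (tangent) vectors along a digital curve.
        pts = np.asarray(points, dtype=float)[::a]
        return np.diff(pts, axis=0)

    def longest_match(dq, dp, delta=1.5):
        # Find (s, t) maximizing l(s, t): the longest run of consecutive
        # tangent vectors of the two curves whose accumulated difference
        # stays below delta.
        best_l, best_s, best_t = 0, 0, 0
        for s in range(len(dq)):
            for t in range(len(dp)):
                acc, l = 0.0, 0
                while s + l < len(dq) and t + l < len(dp):
                    acc += np.linalg.norm(dq[s + l] - dp[t + l])
                    if acc > delta:
                        break
                    l += 1
                if l > best_l:
                    best_l, best_s, best_t = l, s, t
        return best_l, best_s, best_t

The translation vector for the template can then be taken as the mean offset between the two matched runs of points.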
However, the skin-color boundary does not coincide with the ear boundary when the hair is too short or the illumination is too strong, as shown in Figure 12. Such cases can be detected automatically during the previous match process by applying a threshold to l(s_m, t_m). In this case, an attempt is made to initialize the ear by employing edge information. A Nevatia-Babu edge detector [29] is selected in our implementation for its simplicity and good performance. The edges are thinned and linked together to generate long segments. The previous curve-matching algorithm is then utilized again, between the template and the edge segments, in order to obtain the matched partial segments and a translation vector for the template. In order to determine the best edge segment, the following factors are considered:

(i) the length of the matched partial segment for each edge segment: longer is better;
(ii) the translated template position: it should be inside the search area.

The result of ear initialization is shown in Figure 12.

FIGURE 12: The results of ear initialization using edge segments when the skin-color boundary is incorrect.

3.2.3. Ear refinement

Based on the initialized ear template and the matched segment on the ear boundary image, a contour-following method is developed to deform the template to match the whole ear boundary.

In more detail, the template is approximated with line segments using an adaptive polyline fitting algorithm. The first line segment that has its vertex on the ear boundary is selected as the starting position of the contour following. This vertex is denoted as Cont_n, and the next vertex along the polyline is denoted as Cont_next. The segment is rotated to a new position that gives the best match evaluation, which is defined in the next paragraph. All the connected segments after Cont_n are rotated together, as illustrated in Figure 13. Finally, letting n = next, this operation is performed iteratively to deform the whole template. The procedure is employed twice, once for each direction of the polyline.

The match evaluation between a particular segment and the image is defined by combining two factors. One is the local edge strength, which is evaluated as the sum of the edge gradient magnitudes along the segment; the pixels inside the ear are required to be brighter than the neighboring skin pixels. The other is the segment similarity, that is, the sum of the intensity values of all the segment pixels in a line segment.
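The edge-strength factor of this match evaluation, and the rotation search built on it, can be sketched as follows. The code scores a candidate segment position by the sum of gradient magnitudes along it; the function names, angle range, and sampling scheme are illustrative, and the second factor (segment similarity) is omitted for brevity.

    import numpy as np

    def segment_pixels(p0, p1):
        # Integer pixel positions along a line segment (simple sampling).
        n = int(max(abs(p1[0] - p0[0]), abs(p1[1] - p0[1]))) + 1
        xs = np.linspace(p0[0], p1[0], n).round().astype(int)
        ys = np.linspace(p0[1], p1[1], n).round().astype(int)
        return xs, ys

    def edge_strength(grad_mag, p0, p1):
        # Local edge strength of a candidate segment position: the sum
        # of edge gradient magnitudes along the segment.
        xs, ys = segment_pixels(p0, p1)
        h, w = grad_mag.shape
        ok = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
        return float(grad_mag[ys[ok], xs[ok]].sum())

    def best_rotation(grad_mag, pivot, vertex,
                      angles=np.linspace(-0.6, 0.6, 25)):
        # One contour-following step: rotate the segment about its pivot
        # vertex (Cont_n) and keep the angle with the strongest edge
        # support (illustrative search range in radians).
        pivot = np.asarray(pivot, dtype=float)
        vertex = np.asarray(vertex, dtype=float)
        best_angle, best_score = 0.0, -np.inf
        for a in angles:
            c, s = np.cos(a), np.sin(a)
            rot = np.array([[c, -s], [s, c]]) @ (vertex - pivot) + pivot
            score = edge_strength(grad_mag, pivot, rot)
            if score > best_score:
                best_angle, best_score = a, score
        return best_angle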