SYSTEM AND METHOD FOR FACE RECOGNITION WITH TWO-DIMENSIONAL SENSING MODALITY
A method and system in which facial image representations stored in a database are defined by facial coordinates in a plane common to other images in the database in order to facilitate comparison or likeness of the facial images by comparing the common plane facial coordinates, the common plane being determined by the locations of the eyes and mouth corners; at least one input operatively connected to the at least one processor and configured to input the corners of the eyes and mouth coordinates; the at least one processor configured to convert inputted coordinates for the corners of the eyes and mouth into estimated common plane coordinates by minimizing the error between the inputted corners of the eyes and mouth coordinates and the estimated coordinates corners of the eyes and mouth obtained from the least square estimation model of the common plane coordinates of the corners of eyes and mouth.
The invention described herein may be manufactured, used, and/or licensed by or for the United States Government without the payment of royalties.
BACKGROUND OF THE INVENTION
Face recognition includes the process of recognizing an individual by comparing the captured face image against one or more stored face images to identify a match. The stored images are usually called the gallery or watch list databases. The captured image or video is usually called the probe.
In national security or military applications, there is a need for nighttime personnel target identification. At night or in darkness (without illumination), a visible face image (produced using visible light) is of limited use for identification. Thermal face images are found to be more useful at night since the thermal face images can be acquired by thermal sensors without external illumination. The thermal camera measures the heat (temperature) that is emitted (radiated) from the human face. However, most watch list databases contain visible imagery. For a given picture, visible sensors measure the light reflected from the facial surface at a single observation angle. The thermal face images and the visible face images are very different in appearance; but there is a need for cross modality face recognition, i.e., thermal-to-visible face recognition. Thermal-to-visible face recognition would include the identification of a person in a thermal image by comparing the person's thermal facial image to many visible facial images in a database or watch list.
Another challenge in facial recognition occurs during adverse conditions, such as during night time surveillance when frequently the images of the subject are not in a still or frontal position, but instead consist of multiple 3-D head rotations (pose angles). Different pose angles and different illumination conditions can result in different images. For example, the face image of a person in a side view is very different from the image of the same person in a frontal view. It is beneficial to consider the pose changes in face recognition.
The research on thermal face recognition is summarized in (1) C. S. Martin et al., “Recent advances on face recognition using thermal infrared images,” Reviews, Refinements and New Ideas in Face Recognition, InTech Open Access Publisher, edited by Peter M. Corcoran, Chapter 5, pp. 95-112, July 2011; (2) M. K. Bhowmik, et al., “Thermal infrared face recognition—a biometric identification technique for robust security system,” Reviews, Refinements and New Ideas in Face Recognition, InTech Open Access Publisher, edited by Peter M. Corcoran, Chapter 6, pp. 114-138, July 2011; (3) L. B. Wolff, D. A. Socolinsky, and C. K. Eveland, “Face recognition in the thermal infrared,” Computer Vision Beyond the Visible Spectrum Book, pp. 167-191, Springer London, 2006. Thermal imagery has less texture information. However, thermal imagery is less sensitive to illumination variation and expression changes. It has applications in nighttime personnel identification, which is crucial to national security and military operations. U.S. Pat. No. 7,406,184, to L. B. Wolff, entitled “Method and apparatus for using thermal infrared for face recognition,” Jul. 29, 2008, discloses a method of incorporating the use of thermal face imagery into an end-to-end face recognition system to mitigate the influence of varying ambient illumination on systems using visible imagery. In this method, both visible face images and thermal face images are used to create a face representation template that is matched or compared with a stored database or gallery of face templates. The system requires both visible imagery and thermal imagery of an individual (wanted individual) in the probe. For a nighttime operation, especially without external illumination, this condition is not easy to meet.
Cross-modality face recognition research, in which the gallery contains images in one modality (e.g., visible) and the probe contains images in another modality (e.g., thermal), generally works using the existing techniques that were designed for visible face recognition (such as using a variation of edge information). U.S. Pat. No. 7,512,255, entitled “Multi-modal face recognition,” discloses a method to identify an individual using visible and infrared images using a sequence of multimodal data (3D from multiple visible light cameras, 2D infrared) and employing an Annotated Face Model (AFM), which is a 3D segmented facial model.
Generally, there are three common approaches to the problem of pose changes in facial recognition using two dimensional data. First, the training set contains many face images with different poses, which requires multiple datasets. An extension of this type of approach is to train each pose-specific face classifier with multiple examples of that pose. Then, the outputs of these classifiers are fused to give a face recognizer that can process a wider range of facial poses (for example, see U.S. Pat. No. 7,542,592 to Huang, “Pose-invariant Face Recognition System and Process”). Second, for each input image, the estimate of the pose is calculated and then the input image is normalized to a virtual frontal view pose before it is matched against the gallery (see for example, U.S. Pub. Pat. App. No. 2010/0284577, “Pose-variant face recognition using multiscale local descriptors,” Nov. 11, 2010).
A third approach is to use multiple 2D images to estimate a 3D face model. The 3D geometry information can be used for pose correction (see U.S. Pub. Pat. Application No. 2010/0149177, “Generation of normalized 2D imagery and IS systems via 2D to 3D lifting of multifeatured objects,” Jun. 17, 2010, M. I. Miller).
When using 3D sensor data, a 3D face model can be established, which requires a high-resolution 3D sensor (see U.S. Pub. Pat. Application No. 2006/0120571 A1, “System and method for passive face recognition,” Jun. 8, 2006, P. H. Tu, et al.).
A visible sensor measures the light reflected from the object. An infrared sensor measures the heat emitted from the object. Face images acquired by a visible sensor and by an infrared sensor therefore represent different facial phenomena. Pose rotations of the human head can produce significant changes and distortions in the facial appearance of a person in the face image. Such effects degrade face recognition performance.
Sensors of different modalities measure different properties of the object. Visible light and thermal infrared occupy different parts of the spectrum. Therefore, visible sensors and thermal sensors capture different physical properties of the human face, and these physical differences translate into different image features.
One example is that edges are not aligned with each other in thermal and visible images. The human visual system and man-made algorithms use edges as the primary information for recognition purposes. However, if edge information is used to match thermal and visible face images, the edges do not match well.
Another example is that when imagery of a person is acquired from different sensors at different times, the images do not have the same 3D orientation angles. This is a very common phenomenon. In a practical situation, the camera might not be looking straight at the face when the image is acquired. If one wants to match face images with different poses, one needs to address the 3D transformation among them first.
There exists a need to identify a personnel target when the probe image and gallery image are in different modalities. There also exists a need to utilize common structures (biometric landmarks) to characterize both images from gallery and probe, e.g., visible and thermal images.
It is known to use biometric landmarks such as the eyes and mouth to capture the predominant identifying features of an image. For example, Intel® Perceptual Computing SDK—How to use the Face Detection Software, https://software.intel.com/en-us, Oct. 30, 2012, discloses the Intel® Perceptual Computing SDK, which is a library of pattern detection and recognition algorithm implementations exposed through standardized interfaces. The SDK provides a suite of face analysis algorithms including face location detection, landmark detection, face recognition and face attribute detection. The Intel face detection algorithm locates the rectangle position of a face or multiple faces from an image or a video sequence in real-time capture or playback mode. The detection algorithm locates the 6 point or 7 point landmarks, namely, the outer and inner corners of the eyes, the tip of the nose, and the outer corners of the mouth.
Since the face images do not have the same 3D orientation angles when they are acquired from different sensors and/or at different times, there is a need to develop a new 3D registration method via a single frame for face images having 3D pose angles.
It is difficult for a machine to search facial photos for a match in part because when a person's photo appears in a non-standard position, the orientation of the face is different; i.e., the faces are generally pointed at an angle to the vertical plane of the camera. Because the orientation of the face is different, the coordinates of the facial features do not match those of a pose position in which the subject's head is upright and aligned with the camera (commonly referred to as a mug shot).
SUMMARY OF THE INVENTION
A preferred embodiment and method of the present invention provides a method and system for face recognition when face images in the gallery and probe are in different modalities and/or have different 3D pose angles due to the fact that they are acquired by different sensors and/or at different times. A preferred embodiment converts the coordinates of the eyes and mouth of a facial image appearing in a randomly oriented photograph to virtual coordinates corresponding to an estimate of how the head and facial coordinates would appear if the subject's head were to be turned such that the centers of the eyes and corner of the mouth were oriented in a vertical plane; i.e., the roll, pitch, and yaw are zero and the scale is one. The estimate is obtained using models based upon a non-linear Gaussian Least Square Differential Correction. The matching is performed based on matching the virtual coordinates of facial images in the probe against ones in the gallery.
The present invention provides a new system to mitigate degradation sources due to cross modality. It utilizes common biometric landmarks characterizing both visible and thermal images. Since the edge information and texture information do not correspond to each other in visible and thermal imagery, only common features remain for visible and thermal images of the same person. They are biometric landmarks, such as eye locations, mouth location, etc. The present invention also provides a new 3D registration method via a single frame for face images having 3D pose angles.
A preferred embodiment of the present invention comprises a system for facial recognition using an image of a human in which the eyes and mouth corners are not in a vertical plane when the image was taken to determine estimated virtual plane coordinates corresponding to estimated location of eyes and mouth corners of the human when the eyes and mouth corners are in a vertical plane; the system comprising:
at least one processor configured to determine the virtual plane coordinates;
at least one input operatively connected to the at least one processor and configured to input the first corners of the eyes and mouth coordinates F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12 from the image into the at least one processor comprising data points in a vector
F=(F1, . . . ,F12)t
where t represents the transpose, F1 represents the horizontal coordinate of the left eye outer corner, F2 represents the vertical coordinate of the left eye outer corner, F3 represents the horizontal coordinate of the left eye inner corner, F4 represents the vertical coordinate of the left eye inner corner, F5 represents the horizontal coordinate of the right eye outer corner, F6 represents the vertical coordinate of the right eye outer corner, F7 represents the horizontal coordinate of the right eye inner corner, F8 represents the vertical coordinate of the right eye inner corner, F9 represents the horizontal coordinate of the left mouth corner, F10 represents the vertical coordinate of the left mouth corner, F11 represents the horizontal coordinate of the right mouth corner, and F12 represents the vertical coordinate of the right mouth corner;
the at least one processor configured to convert the first coordinates F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12 into second coordinates for the corners of the eye and mouth in a virtual vertical plane comprising P1=(−xe1,ye1,0), P2=(−xe2,ye2,0), P3=(xe1,ye1,0), P4=(xe2,ye2,0), P5=(−xm,ym,0), and P6=(xm,ym,0), where P1 is the estimated left eye outer corner coordinates, P2 is the estimated left eye inner corner coordinates, P3 is the estimated right eye outer corner coordinates, P4 is the estimated right eye inner corner coordinates, P5 is the estimated left mouth corner coordinates and P6 is the estimated right mouth corner coordinates and x and y represent horizontal and vertical distances from a facial reference point, and to determine the head orientation of the human subject using roll, yaw and pitch relative to the virtual vertical plane where θ represents the yaw, φ represents the pitch, and ψ represents the roll;
the at least one processor configured to solve for the parameter vector Vp comprising 9 parameters xe1, ye1, xe2, ye2, xm, ym, θ, φ, ψ using the following equation:
Vp=(xe1,ye1,xe2,ye2,xm,ym,θ,φ,ψ)t
where xe1, ye1, xe2, ye2, xm, ym, represent the virtual plane coordinates of the corners of the eyes and mouth in the vertical virtual plane, and wherein the error to be minimized is the error between the inputted corners of the eyes and mouth coordinates F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12 and the least square estimation model function f(Vp) coordinates of the estimated inputted corners of the eyes and mouth coordinate values comprising horizontal coordinates f1(Vp), f3(Vp), f5(Vp), f7(Vp) f9(Vp), f11(Vp), and vertical coordinates f2(Vp), f4(Vp), f6(Vp), f8(Vp), f10(Vp), f12(Vp) and where the least square model function f(Vp) is computed using
f1(Vp)=[1,0,0]TP1
f2(Vp)=[0,1,0]TP1
f3(Vp)=[1,0,0]TP2
f4(Vp)=[0,1,0]TP2
f5(Vp)=[1,0,0]TP3
f6(Vp)=[0,1,0]TP3
f7(Vp)=[1,0,0]TP4
f8(Vp)=[0,1,0]TP4
f9(Vp)=[1,0,0]TP5
f10(Vp)=[0,1,0]TP5
f11(Vp)=[1,0,0]TP6
f12(Vp)=[0,1,0]TP6
where T correlates to the head orientation θ, φ, Ψ and the matrix
T=Tθ·Tφ·Tψ
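where
\[ T_\theta = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad T_\varphi = \begin{bmatrix} \cos\varphi & 0 & \sin\varphi \\ 0 & 1 & 0 \\ -\sin\varphi & 0 & \cos\varphi \end{bmatrix}, \quad T_\psi = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\psi & \sin\psi \\ 0 & -\sin\psi & \cos\psi \end{bmatrix} \]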
whereby the virtual plane coordinates are used to search a database of a plurality of images of people represented by either virtual plane or vertical plane coordinates in order to identify or recognize an individual.
These and other aspects of the present invention will be described in more detail below in conjunction with the following drawings.
The present invention can best be understood when reading the following specification with reference to the accompanying drawings, which are incorporated in and form a part of the specification, illustrate alternate embodiments of the present invention, and together with the description, serve to explain the principles of the invention. In the drawings:
The present invention now will be described more fully hereinafter with reference to the accompanying drawings, in which embodiments of the invention are shown. However, this invention should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. In the drawings, the thickness of layers and regions may be exaggerated for clarity. Like numbers refer to like elements throughout. As used herein the term “and/or” includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the full scope of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
A flow chart of a preferred embodiment of the present invention is shown in
Pictures of subjects are often taken with their faces not pointed directly at the camera, so the face images may contain yaw (pose rotation left and right), pitch (head tilt rotation up and down), and roll (in-plane rotation). Also, pictures of a subject may be taken using a different modality (such as a thermal camera instead of a conventional camera). Accordingly, facial recognition in pictures that are taken with the head of a subject not pointed directly at the camera may be affected by yaw, pitch, and roll. In accordance with a preferred embodiment of the present invention, yaw, pitch and roll are incorporated into a Unified Coordinate System (UCS). The Unified Coordinate System includes the yaw, pitch and roll of the subject's head and is defined by measures of the locations of the inner and outer corners of the two eyes and the corners of the mouth. Optionally, a preferred embodiment comprises the development of a parameter vector Vp, which comprises the six coordinates of the eyes and mouth, the yaw, pitch and roll of the subject's head, and the scale (size) of the face. The parameter vector is defined as Vp=(xe1,ye1,xe2,ye2,xm,ym,θ,φ,ψ,α)t where xe1, ye1, xe2, ye2, xm, and ym are the coordinates used to define the previously mentioned six points of the eyes and mouth, θ defines the yaw, φ defines the pitch, ψ defines the roll, and α defines the scale.
As illustrated in
Referring now to
Since the face is generally symmetrical in the lateral or “x” direction, the six points P1 through P6 are defined symmetrically about the face center line. The left eye outer corner is represented by P1=(−xe1,ye1,0). The left eye inner corner is represented by P2=(−xe2,ye2,0). The right eye outer corner is represented by P3=(xe1,ye1,0). The right eye inner corner is represented by P4=(xe2,ye2,0). The left mouth corner is represented by P5=(−xm,ym,0) and the right mouth corner is P6=(xm,ym,0). Due to symmetry, the 6 points may be represented by six parameters xe1, ye1, xe2, ye2, xm and ym, where xe1 and ye1 represent the outer corners of the two eyes, xe2 and ye2 represent the inner corners of the two eyes, and xm and ym represent the corners of the mouth. These form the first six parameters of the parameter vector Vp and define the face plane. The remaining parameters represent the yaw θ, pitch φ, roll ψ, and scale α.
Referring now to the conventional image 10A of
Facial images points 10A are measurements in the conventional system with the origin located to the left of and below the facial image 10A as depicted in
Box 22 of
In order to convert the measurement of the Image 10A (points 16A-21A) into the Unified Coordinate System, the yaw, pitch and roll must be taken into account. For each landmark point (corners of eyes and mouth) on a face, the transformation equation is written as,
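PMn=α·T·Pn (1)
where PMn=[u, v, r]t represents the coordinates of a landmark of an acquired face image in the measurement (image) domain, Pn=[x, y, z]t represents the coordinates of a landmark in the UCS domain, α is the scale factor, and T is the transformation matrix. This transformation matrix is written as,
T=Tψ·Tφ·Tθ (2)
where Tθ represents the in-plane rotation transformation in the (x, y) plane, that is,
\[ T_\theta = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad (3) \]
Tφ represents the pose-rotation transformation in the (x, z) plane, that is,
\[ T_\varphi = \begin{bmatrix} \cos\varphi & 0 & \sin\varphi \\ 0 & 1 & 0 \\ -\sin\varphi & 0 & \cos\varphi \end{bmatrix} \quad (4) \]
Tψ represents the tilt-rotation transformation in the (y, z) plane, that is,
\[ T_\psi = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\psi & \sin\psi \\ 0 & -\sin\psi & \cos\psi \end{bmatrix} \quad (5) \]
where θ, φ, and ψ are the angles of in-plane, pose, and tilt rotations, respectively, as shown in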
For a 2-D face image, the coordinates of each landmark have z=0 in the UCS domain. For the coordinates of each landmark in the measurement domain, r is not available. The measurable coordinates of a landmark in the measurement domain are denoted as PMn,2=[u, v]t.
From equation (1), in order to estimate the transformation that transforms a face image into the UCS, ten parameters are estimated in the UCS domain, which are defined as a parameter vector:
Vp=(xe1,ye1,xe2,ye2,xm,ym,θ,φ,ψ,α)t. (6)
The coordinates of each landmark in a face image can be represented by this parameter vector, Vp, as shown in (1). The coordinates of each face landmark can also be obtained by measurements from the face image, which is denoted as
For a face image, 6 landmarks are used in the measurement domain, which results in 6 pairs of (u, v) values. Therefore, 12 equations can be formed from equation (1) to estimate the 10 parameters in (6).
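The mapping from the ten parameters to the twelve predicted image coordinates can be sketched in a few lines of Python. This is an illustrative sketch only: the function names (`matmul`, `rotation`, `ucs_points`, `predict_landmarks`) are hypothetical, the rotation order follows equation (2) as written, and the output ordering (u then v for P1 through P6) is an assumption chosen to mirror the F1 to F12 ordering used in the Summary.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation(theta, phi, psi):
    """T = T_psi * T_phi * T_theta, the order written in equation (2)."""
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    cs, ss = math.cos(psi), math.sin(psi)
    t_theta = [[ct, st, 0.0], [-st, ct, 0.0], [0.0, 0.0, 1.0]]  # (x, y) plane
    t_phi = [[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]]    # (x, z) plane
    t_psi = [[1.0, 0.0, 0.0], [0.0, cs, ss], [0.0, -ss, cs]]    # (y, z) plane
    return matmul(t_psi, matmul(t_phi, t_theta))

def ucs_points(xe1, ye1, xe2, ye2, xm, ym):
    """Landmarks P1..P6 in the face plane (z = 0), symmetric about the center line."""
    return [(-xe1, ye1, 0.0), (-xe2, ye2, 0.0), (xe1, ye1, 0.0),
            (xe2, ye2, 0.0), (-xm, ym, 0.0), (xm, ym, 0.0)]

def predict_landmarks(vp):
    """Return [u1, v1, ..., u6, v6]: the 12 model values for a 10-element Vp."""
    xe1, ye1, xe2, ye2, xm, ym, theta, phi, psi, alpha = vp
    t = rotation(theta, phi, psi)
    out = []
    for p in ucs_points(xe1, ye1, xe2, ye2, xm, ym):
        u = alpha * sum(t[0][k] * p[k] for k in range(3))
        v = alpha * sum(t[1][k] * p[k] for k in range(3))
        out.extend([u, v])  # r (third row) is not measurable, so it is dropped
    return out
```

With all three angles set to zero and a scale of one, the predicted coordinates reduce to the UCS coordinates themselves, which is a convenient sanity check on any implementation of equation (1).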
The aim is to make the difference between the measurement coordinates and the estimated coordinates of face landmarks minimal. That is,
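\[ \min_{V_p} \left| F - f(V_p) \right| \quad (7) \]
where F collects the measured landmark coordinates and f(Vp) collects the estimated landmark coordinates.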
In the following, the minimization problem is formulated as a cost function via a procedure that we developed in S. S. Young, “Optimized target localization in stereotactic radiosurgery using real-time digital portal images,” Phys. Med. Biol. Vol. 41, pp. 1621-1632, 1996. In this way, the estimate of the ten parameters minimizes the cost function.
Define Vp as the parameter vector,
Vp=(xe1,ye1,xe2,ye2,xm,ym,θ,φ,ψ,α)t. (8)
The function F, which is the function of the parameter vector Vp, can be defined as
F=f(Vp) (9)
where
where PMn,2(1)=u and PMn,2(2)=v are calculated from equation (1) using the parameter vector Vp, n=1, . . . , 6 indexes the 6 landmarks, and k=1, . . . , 12 indexes the 12 equations that result from the 6 landmark measurements. The measurements
\[ \left| F - f(\tilde{V}_p) \right| \quad (14) \]
To estimate ten parameters Vp=(xe1,ye1,xe2,ye2,xm,ym,θ,φ,ψ,α)t, a non-linear least square algorithm, such as the Gaussian least square differential correction (GLSDC) can be applied. The function F can be approximated by
If ΔF is given, then ΔVp can be found as follows:
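For orientation, the textbook linearization behind a Gauss-Newton style differential correction such as GLSDC can be written as below. This is a sketch under stated assumptions, not the typeset form of equations (15) and (16): the symbol A for the Jacobian of f is introduced here for illustration, and the closed-form solution assumes that A has full column rank.

```latex
% A (assumed symbol): Jacobian of f evaluated at the current estimate \tilde{V}_p
\begin{aligned}
F &\approx f(\tilde{V}_p) + A\,\Delta V_p,
  \qquad A = \left.\frac{\partial f}{\partial V_p}\right|_{\tilde{V}_p} \\
\Delta F &= F - f(\tilde{V}_p) \\
\Delta V_p &= \left(A^{t} A\right)^{-1} A^{t}\,\Delta F
\end{aligned}
```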
The algorithm can be summarized in the following.
Step (1) Input measurements F (Box 200).
Step (2) Initialize the parameter vector (Box 210)
Vp=(xe1,ye1,xe2,ye2,xm,ym,θ,φ,ψ,α)t. (17)
Step (3) Calculate the estimated coordinate PMn,2=f(Vp) using (13). (Box 220)
Step (4) Calculate ΔF=F−f(Vp) (Box 230).
Step (5) Calculate the estimate correction vector ΔVp using (16) (Box 240).
Step (6) Update the parameter vector Vp=Vp+ΔVp (Box 250).
Step (7) Stop iteration if ΔF does not change significantly from one iteration to another (Box 260), otherwise go to step (3).
Step (8) Output the estimated parameter vector Vp (Box 270).
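The iteration in steps (1) through (8) can be sketched as follows. This is a minimal illustration, not the patented implementation, and it rests on several assumptions that are not in the text: the Jacobian is approximated by forward differences, a small damping term `lam` is added to the normal equations (a Levenberg-Marquardt style safeguard) because the twelve equations do not determine all ten parameters uniquely (for example, scaling α up while scaling the six coordinates down leaves f(Vp) unchanged), and a step is only accepted when it lowers the squared error. All function names are hypothetical.

```python
import math

def model(vp):
    """f(Vp): 12 predicted image coordinates, using T = T_psi * T_phi * T_theta."""
    xe1, ye1, xe2, ye2, xm, ym, th, ph, ps, alpha = vp
    ct, st = math.cos(th), math.sin(th)
    cp, sp = math.cos(ph), math.sin(ph)
    cs, ss = math.cos(ps), math.sin(ps)
    # Upper-left 2x2 block of T; the third column is unused because z = 0.
    t00, t01 = cp * ct, cp * st
    t10, t11 = -cs * st - ss * sp * ct, cs * ct - ss * sp * st
    pts = [(-xe1, ye1), (-xe2, ye2), (xe1, ye1), (xe2, ye2), (-xm, ym), (xm, ym)]
    out = []
    for x, y in pts:
        out += [alpha * (t00 * x + t01 * y), alpha * (t10 * x + t11 * y)]
    return out

def cost(f_meas, vp):
    """Squared length of delta F = F - f(Vp)."""
    return sum((m - e) ** 2 for m, e in zip(f_meas, model(vp)))

def jacobian(vp, eps=1e-6):
    """Forward-difference Jacobian A with A[k][j] = d f_k / d Vp_j."""
    base = model(vp)
    a = [[0.0] * len(vp) for _ in base]
    for j in range(len(vp)):
        bumped = list(vp)
        bumped[j] += eps
        for k, (fb, f0) in enumerate(zip(model(bumped), base)):
            a[k][j] = (fb - f0) / eps
    return a

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def glsdc(f_meas, vp0, max_iter=100, lam=1e-3, tol=1e-12):
    """Damped differential correction; returns (estimated Vp, final squared error)."""
    vp = list(vp0)                                            # step 2
    best = cost(f_meas, vp)
    n = len(vp)
    for _ in range(max_iter):
        a = jacobian(vp)                                      # step 3 (evaluates f)
        d_f = [m - e for m, e in zip(f_meas, model(vp))]      # step 4
        ata = [[sum(row[i] * row[j] for row in a) + (lam if i == j else 0.0)
                for j in range(n)] for i in range(n)]
        atf = [sum(row[i] * d for row, d in zip(a, d_f)) for i in range(n)]
        d_vp = solve(ata, atf)                                # step 5 (damped)
        trial = [p + d for p, d in zip(vp, d_vp)]             # step 6
        trial_cost = cost(f_meas, trial)
        if trial_cost < best:
            gain = best - trial_cost
            vp, best, lam = trial, trial_cost, max(lam * 0.5, 1e-6)
            if gain < tol:                                    # step 7
                break
        else:
            lam *= 10.0  # reject the step and damp more strongly
    return vp, best                                           # step 8
```

Two practical notes on this sketch. An exactly frontal initial guess (φ=ψ=0) makes the derivatives with respect to φ and ψ vanish, so a slightly non-zero starting angle is a safer choice. Also, because the fit is not unique, the recovered parameters need not equal the values that generated the measurements even when the residual is driven close to zero.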
Referring now to
Many algorithms are available for finding available coordinates of face features. In this invention, the stereotactic registration (Box 45) and the stereotactic-based matching (Box 46) are described in the following.
Stereotactic Registration
When a face image is acquired under an un-cooperative condition, the face image can contain three rotations, namely in-plane, pose, and tilt rotations, as shown in
In order to match face images that are taken by different sensors and at different times, the face images from gallery and probe need to be transformed into a common coordinate system. This common coordinate system should be invariant to the sensor position with respect to the human subject when the face image is taken. In other words, this common coordinate system should be invariant to the relative pose of the human head.
The following describes a method of a generalized yaw, pitch, roll, scale, and shift transformation to a unified coordinate system (UCS) via a single frame.
Unified Coordinate System (UCS)
Before describing the UCS, a human face is described in three dimensions. Looking at a side view of a human face, there is a depth from eyes to mouth. This depth is unique for each person. Therefore, for a normal posed human head, which is normally termed a frontal pose, the centers of the corners of each eye and the corners of the mouth are on two different vertical planes.
Using a unified coordinate system (UCS), four points on a human face are labelled.
This face plane is different from the Face Plane that was used in U.S. Pat. No. 7,221,809, “Face recognition system and method,” May 22, 2007, Z. J. Geng. The face plane that is used in conjunction with a preferred embodiment of the present invention is the plane that passes through the centers of the two eyes and the outer corners of the mouth and is vertical. The Face Plane that was used in U.S. Pat. No. 7,221,809 was the plane that passed through the centers of the two eyes and the outer corners of the mouth but was not defined as vertical. It also used the location of the nose tip to derive the pan (tilt) angle. Usually, it is difficult to locate the nose tip accurately. In accordance with the current invention, use of the nose tip is avoided.
This Unified Coordinate System nomenclature is unique for each person. That is, it is invariant to the coordinates of the sensor with respect to the person and to the relative pose of the human head. A face image is transformed from any other position into the UCS. Then the face recognition or face matching is performed in this UCS. The matching could be between any desired features.
Since face images at any other positions with possible three rotation angles are transformed into the Unified Coordinate System, the face recognition problem can be addressed under un-cooperative or uncontrolled conditions where face images have 3D rotational angles. This allows un-cooperative face recognition to be addressed based on a single 2D frame by performing the generalized yaw, pitch, roll, scale, and shift transformation to a unified coordinate system (UCS).
Shift is related to the relative position of the object to the center of the image plane. Scale is related to the object range and sensor focal plane properties (number of pixels, element spacing, etc.). Yaw, pitch, and roll are related to the pose rotation, head tilt rotation, and in-plane rotation, respectively.
Generalized Yaw, Pitch, Roll, Scale, and Shift Transformation to a Unified Coordinate System (UCS) Via a Single Frame
The face plane under the UCS is illustrated in
In the face plane of the Unified Coordinate System, the eye corners and mouth corners are in the same vertical plane where z=0. There are 6 primary points, or landmarks, to describe a face; i.e., the outer corners of the two eyes, the inner corners of the two eyes, and the corners of the mouth.
The origin point O(0,0,0) of the face plane is defined as the intersection of two lines L1 and L2, as shown in
Since these 6 primary points are symmetric with respect to the center line Lc, there are 6 parameters to define the face plane. They are (xe1,ye1,xe2,ye2,xm,ym), where xe1 and ye1 represent the outer corners of two eyes, xe2 and ye2 represent the inner corners of two eyes, xm and ym represent the corners of mouth, as shown in
Therefore, these 6 primary points are presented in a three dimensional space as follows.
The left eye outer corner is represented by P1=(−xe1,ye1,0). The left eye inner corner is represented by P2=(−xe2,ye2,0). The right eye outer corner is represented by P3=(xe1,ye1,0). The right eye inner corner is represented by P4=(xe2,ye2,0). The left mouth corner is represented by P5=(−xm,ym,0) and the right mouth corner is P6=(xm,ym,0). These parameters are obtained via a minimization procedure by solving equation (7).
Stereotactic-Based Matching
After the facial features are transformed into the UCS, matching can be performed. In the approach of stereotactic-based matching, the matching problem is formulated as a multiple hypotheses testing as illustrated in
In this multiple hypotheses testing formulation, we present the gallery model, the probe model, and the decision rule.
Gallery Model:
As shown in
Hm, m=1,2, . . . ,M (18)
with a prior probability
Pm=Prob(Gallery generating Hm) (19)
Probe Model:
The probe contains the test face image. The probe observation vector, R, contains N landmark coordinates (rxi,ryi), that is,
\[ \vec{R} = [r_{x1}, r_{y1}, \ldots, r_{xN}, r_{yN}] \quad (20) \]
The coordinates of landmarks of the gallery are denoted as (xmi,ymi) as shown in
rxi=xmi+nxi, ryi=ymi+nyi, i=1,2, . . . N (21)
where the nxi's and nyi's are independent identically distributed (iid) normal or Gaussian random variables with zero mean and variance \(\sigma_n^2\), that is,
\[ E(n_{xi}) = 0, \quad E(n_{xi}^2) = \sigma_n^2, \quad E(n_{yi}) = 0, \quad E(n_{yi}^2) = \sigma_n^2. \quad (22) \]
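Then, the conditional probability of the observation vector, given that the gallery generates Hm, is represented as follows,
\[ p(\vec{R} \mid H_m) = \frac{1}{(2\pi\sigma_n^2)^{N}} \exp\left\{ -\frac{1}{2\sigma_n^2} \sum_{i=1}^{N} \left[ (r_{xi} - x_{mi})^2 + (r_{yi} - y_{mi})^2 \right] \right\} \quad (23) \]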
Under the above model, the multiple hypotheses testing theory allows the development of the decision rule as shown in the following:
By manipulating the equation in (23), it follows that
In summary, the decision rule is to calculate the minimum value according to (26) among the M subjects in the gallery. The subject m that results in the minimum value in equation (26) is claimed as the right match to the test subject. The theory of multiple hypotheses testing shows that the cost function of this decision rule in equation (26) is optimal in the sense of minimizing the probability of error under the additive white Gaussian noise (AWGN) measurement model.
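A minimal sketch of such a decision rule is given below. It assumes the standard result for the Gaussian measurement model in (21) and (22): with equal priors, the most probable hypothesis is the gallery subject with the smallest sum of squared landmark distances, and unequal priors add a term of −2σn² ln Pm to that sum. The function name `match_probe` and the data layout (lists of (x, y) tuples in the UCS) are hypothetical, and the exact form of equation (26) is not reproduced here.

```python
import math

def match_probe(probe, gallery, priors=None, sigma2=1.0):
    """Return (index, score) of the gallery subject with the minimum score.

    probe   : list of N (rx, ry) landmark coordinates in the UCS
    gallery : list of M subjects, each a list of N (x, y) coordinates
    priors  : optional list of M prior probabilities Pm (assumed equal if None)
    sigma2  : assumed noise variance, only used when priors are given
    """
    best_index, best_score = None, float("inf")
    for m, landmarks in enumerate(gallery):
        # Sum of squared distances between probe and gallery landmarks.
        score = sum((rx - x) ** 2 + (ry - y) ** 2
                    for (rx, ry), (x, y) in zip(probe, landmarks))
        if priors is not None:
            # Prior term from maximizing Pm * p(R | Hm) under Gaussian noise.
            score -= 2.0 * sigma2 * math.log(priors[m])
        if score < best_score:
            best_index, best_score = m, score
    return best_index, best_score

# Illustrative data (made up for this sketch): two gallery subjects, one probe.
gallery = [[(45.0, 20.0), (15.0, 20.0), (25.0, -40.0)],
           [(50.0, 22.0), (18.0, 22.0), (28.0, -45.0)]]
probe = [(49.5, 22.3), (18.2, 21.8), (27.6, -44.9)]
best, _ = match_probe(probe, gallery)
```

With equal priors the noise variance drops out of the comparison, which is why `sigma2` only matters when `priors` is supplied.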
Example
In the present invention, the stereotactic matching can be implemented by exploring multiple frames through temporal information as shown in
Since many sensors can capture a video of a person, each of these frames can be transformed into the UCS domain of that person. The data can be averaged to obtain a better estimate of the coordinates of the landmarks in the UCS domain. This results in a better matching score of that person since the output error of a correct match goes down by 1/M where M is the number of temporal images used. Meanwhile, for an incorrect match, the averaging would not decrease the error since there is still a mismatch.
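The averaging step can be sketched as follows, assuming each frame has already been transformed into the UCS and yields the same number of landmarks in the same order. The function name `average_landmarks` and the sample values are hypothetical. If the per-frame errors are independent with equal variance, the variance of the averaged coordinates falls in proportion to one over the number of frames.

```python
def average_landmarks(frames):
    """Average UCS landmark estimates over frames.

    frames : list of per-frame estimates, each a list of (x, y) tuples
             with the same landmark order in every frame.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    n_frames = len(frames)
    n_landmarks = len(frames[0])
    return [(sum(frame[i][0] for frame in frames) / n_frames,
             sum(frame[i][1] for frame in frames) / n_frames)
            for i in range(n_landmarks)]

# Illustrative per-frame estimates of two landmarks (made-up values).
frames = [[(44.0, 21.0), (26.0, -41.0)],
          [(46.0, 19.0), (24.0, -39.0)]]
averaged = average_landmarks(frames)
```

The averaged landmarks can then be passed to whatever matching rule is used in place of a single-frame estimate.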
Although various preferred embodiments of the present invention have been described herein in detail to provide for complete and clear disclosure, it will be appreciated by those skilled in the art, that variations may be made thereto without departing from the spirit of the invention.
As used herein the terminology “yaw” means movement of a subject to the left or right.
As used herein the terminology “pitch” means the tilting of the subject's head on the y axis.
As used herein the terminology “roll” means the in plane rotation on the z-axis.
As used herein, virtual neutral pose position or virtual vertical plane position is defined as when the head is facing the camera or image producer and tilting such that the midpoints of the eyes and corners of the mouth are on a vertical plane and the corners of the eyes and corners of the mouth are equally distant from a face center line (as shown in
It should be emphasized that the above-described embodiments are merely possible examples of implementations. Many variations and modifications may be made to the above-described embodiments. All such modifications and variations are intended to be included herein within the scope of the disclosure and protected by the following claims. The invention has been described in detail with particular reference to certain preferred embodiments thereof, but it will be understood that variations and modifications can be effected within the spirit and scope of the invention.
Claims
1. A system for facial recognition using an image of a human in which the eyes and mouth corners are not in a vertical plane when the image was taken to determine estimated virtual plane coordinates corresponding to estimated location of eyes and mouth corners of the human when the eyes and mouth corners are in a vertical plane; the system comprising:
- at least one processor configured to determine the virtual plane coordinates;
- at least one input operatively connected to the at least one processor and configured to input the first corners of the eyes and mouth coordinates F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12 from the image into the at least one processor comprising data points in a vector F=(F1,...,F12)t
- where t represents the transpose, F1 represents the horizontal coordinate of the left eye outer corner, F2 represents the vertical coordinate of the left eye outer corner, F3 represents the horizontal coordinate of the left eye inner corner, F4 represents the vertical coordinate of the left eye inner corner, F5 represents the horizontal coordinate of the right eye outer corner, F6 represents the vertical coordinate of the right eye outer corner, F7 represents the horizontal coordinate of the right eye inner corner, F8 represents the vertical coordinate of the right eye inner corner, F9 represents the horizontal coordinate of the left mouth corner, F10 represents the vertical coordinate of the left mouth corner, F11 represents the horizontal coordinate of the right mouth corner, and F12 represents the vertical coordinate of the right mouth corner;
- the at least one processor configured to convert the first coordinates F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12 into second coordinates for the corners of the eye and mouth in a virtual vertical plane comprising P1=(−xe1,ye1,0), P2=(−xe2,ye2,0), P3=(xe1,ye1,0), P4=(xe2,ye2,0), P5=(−xm,ym,0), and P6=(xm,ym,0), where P1 is the estimated left eye outer corner coordinates, P2 is the estimated left eye inner corner coordinates, P3 is the estimated right eye outer corner coordinates, P4 is the estimated right eye inner corner coordinates, P5 is the estimated left mouth corner coordinates and P6 is the estimated right mouth corner coordinates and x and y represent horizontal and vertical distances from a facial reference point, and to determine the head orientation of the human subject using roll, yaw and pitch relative to the virtual vertical plane where θ represents the yaw, φ represents the pitch, and ψ represents the roll;
- the at least one processor configured to solve for the parameter vector Vp comprising 9 parameters xe1, ye1, xe2, ye2, xm, ym, θ, φ, ψ using the following equation: Vp=(xe1,ye1,xe2,ye2,xm,ym,θ,φ,ψ)t
- where xe1, ye1, xe2, ye2, xm, ym, represent the virtual plane coordinates of the corners of the eyes and mouth in the vertical virtual plane, and wherein the error to be minimized is the error between the inputted corners of the eyes and mouth coordinates F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12 and the least square estimation model function f(Vp) coordinates of the estimated inputted corners of the eyes and mouth coordinate values comprising horizontal coordinates f1(Vp), f3(Vp), f5(Vp), f7(Vp), f9(Vp), f11(Vp), and vertical coordinates f2(Vp), f4(Vp), f6(Vp), f8(Vp), f10(Vp), f12(Vp) and where the least square model function f(Vp) is computed using
- f1(Vp)=[1,0,0]TP1
- f2(Vp)=[0,1,0]TP1
- f3(Vp)=[1,0,0]TP2
- f4(Vp)=[0,1,0]TP2
- f5(Vp)=[1,0,0]TP3
- f6(Vp)=[0,1,0]TP3
- f7(Vp)=[1,0,0]TP4
- f8(Vp)=[0,1,0]TP4
- f9(Vp)=[1,0,0]TP5
- f10(Vp)=[0,1,0]TP5
- f11(Vp)=[1,0,0]TP6
- f12(Vp)=[0,1,0]TP6
- where T correlates to the head orientation θ, φ, ψ and the matrix T=Tθ·Tφ·Tψ
- where
$$T_\theta = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad T_\varphi = \begin{bmatrix} \cos\varphi & 0 & \sin\varphi \\ 0 & 1 & 0 \\ -\sin\varphi & 0 & \cos\varphi \end{bmatrix}, \quad T_\psi = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\psi & \sin\psi \\ 0 & -\sin\psi & \cos\psi \end{bmatrix}$$
- whereby the virtual plane coordinates are used to search a database of a plurality of images of people represented by either virtual plane or vertical plane coordinates in order to identify or recognize an individual.
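By way of a non-limiting illustration, the model function f(Vp) recited above may be sketched as follows. The sketch builds T=Tθ·Tφ·Tψ from the three recited matrices, applies it to the points P1 through P6, and keeps the first two components of each result, corresponding to the row selectors [1,0,0] and [0,1,0]. The function names are hypothetical, and the optional `alpha` argument is an assumption that stands in for the scale factor α of the 10-parameter variant.

```python
import numpy as np

def rotation_matrix(theta, phi, psi):
    """T = T_theta @ T_phi @ T_psi, using the three matrices as recited."""
    ct, st = np.cos(theta), np.sin(theta)
    cp, sp = np.cos(phi), np.sin(phi)
    cs, ss = np.cos(psi), np.sin(psi)
    T_theta = np.array([[ct, st, 0.0], [-st, ct, 0.0], [0.0, 0.0, 1.0]])
    T_phi = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    T_psi = np.array([[1.0, 0.0, 0.0], [0.0, cs, ss], [0.0, -ss, cs]])
    return T_theta @ T_phi @ T_psi

def model_f(vp, alpha=1.0):
    """Return (f1, ..., f12) for Vp = (xe1, ye1, xe2, ye2, xm, ym, theta, phi, psi)."""
    xe1, ye1, xe2, ye2, xm, ym, theta, phi, psi = vp
    # P1..P6: left/right eye outer and inner corners, then left/right mouth corners.
    P = np.array([
        [-xe1, ye1, 0.0],
        [-xe2, ye2, 0.0],
        [xe1, ye1, 0.0],
        [xe2, ye2, 0.0],
        [-xm, ym, 0.0],
        [xm, ym, 0.0],
    ])
    rotated = P @ rotation_matrix(theta, phi, psi).T  # row i equals T @ P_i
    # Keep horizontal and vertical components, interleaved as f1, f2, ..., f12.
    return alpha * rotated[:, :2].ravel()

# Made-up plane coordinates with zero yaw, pitch, and roll.
vp_neutral = [4.5, 1.0, 1.5, 1.0, 2.5, -5.0, 0.0, 0.0, 0.0]
f_neutral = model_f(vp_neutral)
```

With all three angles set to zero, T is the identity matrix and the output reproduces the virtual plane coordinates themselves.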
2. The system of claim 1 wherein the virtual plane correlates to a plane having zero values for the yaw, pitch, and roll.
3. The system of claim 1 wherein if the subject of the image is photographed when the eyes and mouth coordinates are in a vertical plane, the vertical plane coordinates are inputted into the database directly from the image.
4. The system of claim 1 wherein the plurality of images of people comprise images of suspected individuals, persons convicted of crimes, driver's license images of the general public, and images used for criminal detection, airline screening, access control, surveillance systems, identification of dead bodies, and terrorist watch lists, and wherein each image is represented using virtual or vertical plane coordinates of the corners of the eyes and mouth for each image in the gallery.
5. The system of claim 1 wherein the image of a human correlates to a probe and wherein the database comprises targeted individuals and wherein the virtual plane coordinates are used to determine vertical plane coordinates of the corners of the eyes and mouth of the image in the probe.
6. The system of claim 1 wherein the virtual plane coordinates of the corners of the eyes and mouth of the image of a human are matched against a gallery of virtual or vertical plane coordinates of the corners of eyes and mouths to generate the matching scores and to determine a minimum match score to determine a match.
7. The system of claim 6 wherein the matching is based on the multiple hypotheses testing with the risk function of minimizing the probability of error of a matching result and comprises calculating the matrix distances between the probe vertical or virtual plane coordinates of the corners of the eyes and mouth and each gallery vertical or virtual plane coordinates of the corners of the eyes and mouth to generate matching scores.
8. The system of claim 1 wherein the at least one processor is configured to solve for the parameter vector Vp using the non-linear Gaussian Least Square Differential Correction algorithm, the at least one processor being configured to perform repeated iterations comprising first determining whether to update the vector Vp by calculating ΔF=F−f(Vp) to calculate the change of an estimated least square estimation model function relative to the actual values F, and then using the equation
$$\Delta V_p = \left[\left(\frac{\partial f}{\partial V_p}\right)^{t}\left(\frac{\partial f}{\partial V_p}\right)\right]^{-1}\left(\frac{\partial f}{\partial V_p}\right)^{t}\Delta F$$
to calculate the estimate correction vector ΔVp; wherein for each iteration the parameter vector is updated using Vp=Vp+ΔVp and wherein the iteration continues until ΔF is less than a pre-determined threshold at which point the iteration is stopped and the estimated parameter vector Vp is outputted.
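By way of a non-limiting illustration, the iteration recited above may be sketched as follows. The sketch implements the update ΔVp=[(∂f/∂Vp)t(∂f/∂Vp)]−1(∂f/∂Vp)tΔF with a finite-difference Jacobian. It is demonstrated on a reduced stand-in model (in-plane rotation and scale only, with known plane coordinates), not on the full 9-parameter or 10-parameter model. The function names, the stand-in model, the tolerance, and the sample values are all hypothetical.

```python
import numpy as np

def numeric_jacobian(f, v, eps=1e-6):
    """Central-difference approximation of df/dv (rows: outputs, columns: parameters)."""
    v = np.asarray(v, dtype=float)
    cols = []
    for k in range(v.size):
        step = np.zeros_like(v)
        step[k] = eps
        cols.append((f(v + step) - f(v - step)) / (2.0 * eps))
    return np.stack(cols, axis=1)

def glsdc(f, F, v0, tol=1e-10, max_iter=50):
    """Differential-correction iteration: stop once the norm of F - f(v) drops below tol."""
    v = np.asarray(v0, dtype=float).copy()
    for _ in range(max_iter):
        dF = F - f(v)
        if np.linalg.norm(dF) < tol:
            break
        J = numeric_jacobian(f, v)
        dv = np.linalg.solve(J.T @ J, J.T @ dF)  # [(J^t J)^-1] J^t dF
        v = v + dv
    return v

# Stand-in model (assumption): known plane coordinates, unknown angle and scale.
PLANE_XY = np.array([[-4.5, 1.0], [-1.5, 1.0], [4.5, 1.0],
                     [1.5, 1.0], [-2.5, -5.0], [2.5, -5.0]])

def toy_model(v):
    angle, scale = v
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, s], [-s, c]])  # same sign convention as the upper-left block of T_theta
    return (scale * PLANE_XY @ R.T).ravel()

true_v = np.array([0.3, 1.2])
F_obs = toy_model(true_v)                        # noise-free synthetic observations
estimate = glsdc(toy_model, F_obs, [0.0, 1.0])
```

The stand-in was chosen because its Jacobian has full column rank at the initial guess, so that the matrix inverse in the update exists; whether that holds for a given model and initial guess should be checked before the update is applied.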
9. A system for facial recognition in which facial images in a database are defined by facial coordinates in a plane common to other images in order to facilitate comparison or likeness of the facial images:
- at least one input operatively connected to the at least one processor and configured to input the corners of the eyes and mouth coordinates F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12 from a facial image into the at least one processor using a vector F=(F1,...,F12)t
- where t represents the transpose, F1 represents the horizontal coordinate of the left eye outer corner, F2 represents the vertical coordinate of the left eye outer corner, F3 represents the horizontal coordinate of the left eye inner corner, F4 represents the vertical coordinate of the left eye inner corner, F5 represents the horizontal coordinate of the right eye outer corner, F6 represents the vertical coordinate of the right eye outer corner, F7 represents the horizontal coordinate of the right eye inner corner, F8 represents the vertical coordinate of the right eye inner corner, F9 represents the horizontal coordinate of the left mouth corner, F10 represents the vertical coordinate of the left mouth corner, F11 represents the horizontal coordinate of the right mouth corner, and F12 represents the vertical coordinate of the right mouth corner;
- the at least one processor configured to convert the first coordinates F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12 into second coordinates for the corners of the eye and mouth in a virtual vertical plane comprising P1=(−xe1,ye1,0), P2=(−xe2,ye2,0), P3=(xe1,ye1,0), P4=(xe2,ye2,0), P5=(−xm,ym,0), and P6=(xm,ym,0), where P1 is the estimated left eye outer corner coordinates, P2 is the estimated left eye inner corner coordinates, P3 is the estimated right eye outer corner coordinates, P4 is the estimated right eye inner corner coordinates, P5 is the estimated left mouth corner coordinates and P6 is the estimated right mouth corner coordinates; and wherein x and y represent horizontal and vertical distances from a facial reference point, xe1 and −xe1 represent the horizontal coordinates of the outer corners of the eyes, xe2 and −xe2 represent the horizontal coordinates of the inner corners of the eyes, ye1 and ye2 represent the vertical coordinates of the outer and inner corners of the eyes, xm and −xm represent the horizontal coordinates of the corners of the mouth, ym represent the vertical coordinates of the corners of the mouth in the virtual vertical plane;
- the at least one processor being configured to solve for the parameter vector Vp using the equation Vp=(xe1, ye1, xe2, ye2, xm, ym, θ, φ, ψ, α)t wherein the parameter vector Vp comprises 10 parameters xe1, ye1, xe2, ye2, xm, ym, θ, φ, ψ, α and wherein θ represents the yaw, φ represents the pitch, ψ represents the roll and α represents the scale in the following equation: Vp=(xe1,ye1,xe2,ye2,xm,ym,θ,φ,ψ,α)t
- where xe1, ye1, xe2, ye2, xm, ym, represent the virtual plane coordinates of the corners of the eyes and mouth in the vertical virtual plane, and wherein the error to be minimized is the error between the inputted corners of the eyes and mouth coordinates F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12 and the least square estimation model function f(Vp) coordinates of the estimated inputted corners of the eyes and mouth coordinate values comprising horizontal coordinates f1(Vp), f3(Vp), f5(Vp), f7(Vp), f9(Vp), f11(Vp), and vertical coordinates f2(Vp), f4(Vp), f6(Vp), f8(Vp), f10(Vp), f12(Vp) and where the least square model function f(Vp) is computed using
- f1(Vp)=α[1,0,0]TP1
- f2(Vp)=α[0,1,0]TP1
- f3(Vp)=α[1,0,0]TP2
- f4(Vp)=α[0,1,0]TP2
- f5(Vp)=α[1,0,0]TP3
- f6(Vp)=α[0,1,0]TP3
- f7(Vp)=α[1,0,0]TP4
- f8(Vp)=α[0,1,0]TP4
- f9(Vp)=α[1,0,0]TP5
- f10(Vp)=α[0,1,0]TP5
- f11(Vp)=α[1,0,0]TP6
- f12(Vp)=α[0,1,0]TP6
- where T correlates to the head orientation θ, φ, ψ and the matrix T=Tθ·Tφ·Tψ
- where
$$T_\theta = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad T_\varphi = \begin{bmatrix} \cos\varphi & 0 & \sin\varphi \\ 0 & 1 & 0 \\ -\sin\varphi & 0 & \cos\varphi \end{bmatrix}, \quad T_\psi = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\psi & \sin\psi \\ 0 & -\sin\psi & \cos\psi \end{bmatrix}$$
- whereby the common plane coordinates P1 through P6 are used to find a match in a database of a plurality of images of people represented by facial coordinates in a plane common to other images in order to identify or recognize an individual.
10. A system for facial recognition in which facial image representations stored in a database are defined by facial coordinates in a plane common to other images in the database;
- at least one processor configured to compare the likeness of the facial images by comparing the common plane facial coordinates, the common plane being determined by the locations of the eyes and mouth corners;
- at least one input operatively connected to the at least one processor and configured to input the corners of the eyes and mouth coordinates from a facial image into the at least one processor;
- the at least one processor configured to convert the inputted coordinates for the corners of the eyes and mouth into estimated coordinates in a common plane by minimizing the error between the inputted corners of the eyes and mouth coordinates and the estimated coordinates corners of the eyes and mouth obtained from an estimation of the common plane coordinates of the corners of eyes and mouth; the at least one processor configured to perform facial recognition to determine identification of a subject by comparing the common plane facial coordinates of an inputted facial image.
11. The system of claim 10 wherein inputted first corners of the eyes and mouth coordinates F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12 are determined from a facial image inputted into the at least one processor using a vector
- F=(F1,...,F12)t
- where t represents the transpose, F1 represents the horizontal coordinate of the left eye outer corner, F2 represents the vertical coordinate of the left eye outer corner, F3 represents the horizontal coordinate of the left eye inner corner, F4 represents the vertical coordinate of the left eye inner corner, F5 represents the horizontal coordinate of the right eye outer corner, F6 represents the vertical coordinate of the right eye outer corner, F7 represents the horizontal coordinate of the right eye inner corner, F8 represents the vertical coordinate of the right eye inner corner, F9 represents the horizontal coordinate of the left mouth corner, F10 represents the vertical coordinate of the left mouth corner, F11 represents the horizontal coordinate of the right mouth corner, and F12 represents the vertical coordinate of the right mouth corner; and wherein the coordinates in the common plane for the corners of the eye and mouth comprise P1=(−xe1,ye1,0), P2=(−xe2,ye2,0), P3=(xe1,ye1,0), P4=(xe2,ye2,0), P5=(−xm,ym,0), and P6=(xm,ym,0), where P1 is the estimated left eye outer corner coordinates, P2 is the estimated left eye inner corner coordinates, P3 is the estimated right eye outer corner coordinates, P4 is the estimated right eye inner corner coordinates, P5 is the estimated left mouth corner coordinates and P6 is the estimated right mouth corner coordinates; and wherein x and y represent horizontal and vertical distances from a facial reference point, xe1 and −xe1 represent the horizontal coordinates of the outer corners of the eyes, xe2 and −xe2 represent the horizontal coordinates of the inner corners of the eyes, ye1 and ye2 represent the vertical coordinates of the outer and inner corners of the eyes, xm and −xm represent the horizontal coordinates of the corners of the mouth, ym represent the vertical coordinates of the corners of the mouth in the virtual vertical plane;
- and wherein the at least one processor converts the first coordinates F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12 into second coordinates for the corners of the eye and mouth in a common vertical plane by solving for the parameter vector Vp using the equation Vp=(xe1,ye1,xe2,ye2,xm,ym,θ,φ,ψ)t wherein the parameter vector Vp comprises 9 parameters xe1, ye1, xe2, ye2, xm, ym, θ, φ, ψ and wherein θ represents the yaw, φ represents the pitch, and ψ represents the roll of the facial image relative to the virtual vertical plane, and wherein the error to be minimized is the error between the inputted corners of the eyes and mouth coordinates F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12 and the least square estimation model function f(Vp) coordinates of the estimated inputted corners of the eyes and mouth coordinate values comprising horizontal coordinates f1(Vp), f3(Vp), f5(Vp), f7(Vp), f9(Vp), f11(Vp), and vertical coordinates f2(Vp), f4(Vp), f6(Vp), f8(Vp), f10(Vp), f12(Vp) and where the least square model function f(Vp) is computed using
- f1(Vp)=[1,0,0]TP1
- f2(Vp)=[0,1,0]TP1
- f3(Vp)=[1,0,0]TP2
- f4(Vp)=[0,1,0]TP2
- f5(Vp)=[1,0,0]TP3
- f6(Vp)=[0,1,0]TP3
- f7(Vp)=[1,0,0]TP4
- f8(Vp)=[0,1,0]TP4
- f9(Vp)=[1,0,0]TP5
- f10(Vp)=[0,1,0]TP5
- f11(Vp)=[1,0,0]TP6
- f12(Vp)=[0,1,0]TP6
- where T correlates to the head orientation comprising yaw, pitch and roll (θ, φ, ψ) and is a matrix T=Tθ·Tφ·Tψ
- where
$$T_\theta = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad T_\varphi = \begin{bmatrix} \cos\varphi & 0 & \sin\varphi \\ 0 & 1 & 0 \\ -\sin\varphi & 0 & \cos\varphi \end{bmatrix}, \quad T_\psi = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\psi & \sin\psi \\ 0 & -\sin\psi & \cos\psi \end{bmatrix}$$
- whereby the common plane coordinates P1 through P6 in a database of a plurality of images of people are used to match a requested individual's facial image represented by facial coordinates in a plane common to other images in the database in order to identify or recognize the individual.
12. The system of claim 10 wherein the at least one processor defines common plane facial coordinates correlating to the virtual locations of the inputted facial feature coordinates located in a virtual vertical plane wherein the yaw, pitch and roll are zero.
13. The system of claim 10 wherein the at least one processor is configured to perform facial recognition by comparing the location of the coordinates of the corners of the eyes and mouth coordinates of a subject to the locations of the corners of the eyes and mouth coordinates of the images of people in the database.
14. The system of claim 10 wherein the at least one processor is configured to convert the inputted coordinates for the corners of the eyes and mouth into estimated coordinates in a common plane by minimizing the error between the inputted corners of the eyes and mouth coordinates and the estimated coordinates corners of the eyes and mouth using a least square estimation of the common plane coordinates of the corners of eyes and mouth.
15. The system of claim 14 wherein the least square estimation of the common plane coordinates of the corners of eyes and mouth is computed using a non-linear Gaussian Least Square Differential Correction algorithm.
16. The system of claim 15 wherein the Gaussian Least Square Differential Correction algorithm comprises a least square model function f(Vp) that is computed using
- f1(Vp)=α[1,0,0]TP1
- f2(Vp)=α[0,1,0]TP1
- f3(Vp)=α[1,0,0]TP2
- f4(Vp)=α[0,1,0]TP2
- f5(Vp)=α[1,0,0]TP3
- f6(Vp)=α[0,1,0]TP3
- f7(Vp)=α[1,0,0]TP4
- f8(Vp)=α[0,1,0]TP4
- f9(Vp)=α[1,0,0]TP5
- f10(Vp)=α[0,1,0]TP5
- f11(Vp)=α[1,0,0]TP6
- f12(Vp)=α[0,1,0]TP6
- where T correlates to the head orientation θ, φ, ψ and the matrix T=Tθ·Tφ·Tψ
- where
$$T_\theta = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad T_\varphi = \begin{bmatrix} \cos\varphi & 0 & \sin\varphi \\ 0 & 1 & 0 \\ -\sin\varphi & 0 & \cos\varphi \end{bmatrix}, \quad T_\psi = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\psi & \sin\psi \\ 0 & -\sin\psi & \cos\psi \end{bmatrix}$$
- whereby the common plane coordinates P1 through P6 are used to find a match in a database of a plurality of images of people represented by facial coordinates in a plane common to other images in order to identify or recognize a subject.
17. The system of claim 10 wherein the facial image representations stored in the database comprise one of facial images of suspected individuals, persons convicted of crimes, driver's license images of general public, criminals for criminal detection, persons for airline screening, persons for access control, persons for a surveillance system, persons for identification of dead bodies, and terrorists on a watch list.
Type: Application
Filed: Jun 30, 2016
Publication Date: Jan 4, 2018
Inventor: Shiqiong Susan Young (Bethesda, MD)
Application Number: 15/198,344