Segmenting Human Hairs and Faces
Systems for segmenting human hairs and faces in color images are disclosed, together with methods and processes for making and using the same. The image may be cropped around the face area and roughly centered. Optionally, the illumination environment of the input image may be determined; if the image was taken in a dark environment, or the contrast between the face and hair regions and the background is low, an extra image enhancement may be applied. Sub-processes for identifying the pose angle and chin contour may be performed. A preliminary mask for the face may be created using multiple cues, such as skin color, pose angle, face shape, and contour information. An initial hair mask may be created using the abovementioned cues plus texture and hair shape information. The preliminary face and hair masks are globally refined using multiple techniques.
This application claims priority to U.S. Provisional Patent Application No. 61/341,707, filed Apr. 5, 2010, which is hereby incorporated by reference herein in its entirety.
BACKGROUND OF THE INVENTION

Images have been utilized to capture precious moments since the advent of the photograph. With the emergence of the digital camera, an unimaginable number of photographs are captured every day. Certain precious moments have significant value to a particular person or group of people, such that photographs of a precious moment are often selected for a personalized presentation. For example, greeting card makers now allow users to edit, configure, or otherwise personalize their offered greeting cards, and a user will likely put in a photograph of choice to add their personal touch to a greeting card. Items that may be used for creating a personalized presentation abound, such as t-shirts, mugs, cups, hats, mouse-pads, other print-on-demand items, and other gift items and merchandise. Personalized presentations may also be created for sharing or viewing on certain devices, uploading to an online or offline location, or otherwise utilizing computer systems. For example, personalized presentations may be viewed on desktop computers, laptop computers, tablet user devices, smart phones, or the like, through online albums, greeting card websites, social networks, offline albums, or photo sharing websites.
Many applications exist for allowing a user to provide context to a photograph to convey a humorous, serious, sentimental, or otherwise personal message. Online photo galleries allow their customers to order such merchandise by selecting pictures from their albums. Kiosks are available at big retail stores all around the world to address similar needs. However, there is no approach available to the general population for creating a personalized presentation that allows for placing a person's head within another image or graphic. Indeed, the placement of a person's head from one image into another, though enabling a multitude of applications, is a daunting task for the regular consumer and, until now, has often been best handled by an expert image processor.
As should be apparent, there is a need for solutions that provide users with easier, automated, or semi-automated abilities for creating contextually personalized presentations of their images, which allow for the segmentation of a human head from one image to be utilized for other applications of choice, including creating personalized presentations.
SUMMARY

The following presents a simplified summary of the disclosure in order to provide a basic understanding to the reader. This summary is not an extensive overview of the disclosure and it does not identify key or critical elements of the embodiments disclosed nor delineate the scope of the disclosed embodiments. Its sole purpose is to present some concepts disclosed herein in a simplified form as a prelude to the more detailed description that is presented later.
Segmenting or distinguishing salient regions within an image is a difficult task. Disclosed embodiments provide easier, semi-automated, and automated abilities for segmenting or distinguishing human heads from an image. In one embodiment, a head or heads to be distinguished are provided, selected, located, or otherwise identified. The elements to distinguish may all be processed together or may be processed separately. In one alternative embodiment, data for the face of the head is gathered or acquired. Some or all of this data is then processed to distinguish the face of the head from the other elements of the image. In another alternative embodiment, data for the hair of the head is gathered or acquired. Some or all of this data is then processed to distinguish the hair from the other elements of the image. In an additional embodiment, data for both the face and hair elements of the head is gathered or acquired, and this data is processed to distinguish the face and hair, individually or together, from the rest of the image. In an alternative additional embodiment, the elements to be distinguished are selectively chosen and are distinguished individually or together. In images with more than one head, a single head to be distinguished may be chosen, or more than one head may be chosen.
The elements that are distinguished may be represented in any number of processes, including creating an image mask. A head that has been distinguished from one image may be placed into another image or graphic. The ability to create a unique personalized message or image may be utilized to attract users to physical locations or electronically available locations, such as websites and web-forums. The attraction of users may be associated with the ability to sell advertisement space or provide an advertising campaign, either specifically related to the creation of personalized messages or images, or generally otherwise.
In another alternative embodiment, the image may be cropped around the face area and roughly centered. Optionally, the illumination environment of the input image may be determined; if the image was taken in a dark environment, or the contrast between the face and hair regions and the background is low, an extra image enhancement may be applied. Sub-processes for identifying the pose angle and chin contour may be performed. A preliminary mask for the face may be created using multiple cues, such as skin color, pose angle, face shape, and contour information. An initial hair mask may be created using the abovementioned cues plus texture and hair shape information. The preliminary face and hair masks may be globally refined using multiple techniques.
The accompanying drawings, which are included as part of the present specification, illustrate the presently preferred embodiments and together with the general description given above and the detailed description of the preferred embodiments given below serve to explain and teach the principles of the disclosed embodiments.
It should be noted that the figures are not drawn to scale and that elements of similar structures or functions are generally represented by like reference numerals for illustrative purposes throughout the figures. It also should be noted that the figures are only intended to facilitate the description of the preferred embodiments of the present disclosure. The figures do not illustrate every aspect of the disclosed embodiments and do not limit the scope of the disclosure.
DETAILED DESCRIPTION

Systems for segmenting human hairs and faces in color images are disclosed, with methods and processes for making and using the same.
In the following description, for purposes of explanation, specific nomenclature is set forth to provide a thorough understanding of the various inventive concepts disclosed herein. However, it will be apparent to one skilled in the art that these specific details are not required in order to practice the various inventive concepts disclosed herein.
Some portions of the detailed description that follow are presented in terms of processes and symbolic representations of operations on data bits within a computer memory. These process descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A process is here, and generally, conceived to be a self-consistent sequence of sub-processes leading to a desired result. These sub-processes are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or “locating” or “finding” or “reconciling”, or “identifying”, or the like, may refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system's memories or registers or other such information storage, transmission, or display devices.
The disclosed embodiments also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk, including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories ("ROMs"), random access memories ("RAMs"), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.
The processes and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method sub-processes. The required structure for a variety of these systems will appear from the description below. In addition, the disclosed embodiments are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the disclosed embodiments.
In some embodiments an image is a bitmapped or pixmapped image. As used herein, a bitmap or pixmap is a type of memory organization or image file format used to store digital images. A bitmap is a map of bits, a spatially mapped array of bits. Bitmaps and pixmaps refer to the similar concept of a spatially mapped array of pixels. Raster images in general may be referred to as bitmaps or pixmaps. In some embodiments, the term bitmap implies one bit per pixel, while a pixmap is used for images with multiple bits per pixel. One example of a bitmap is a specific format used in Windows that is usually named with the file extension .BMP (or .DIB for device-independent bitmap). Besides BMP, other file formats that store literal bitmaps include InterLeaved Bitmap (ILBM), Portable Bitmap (PBM), X Bitmap (XBM), and Wireless Application Protocol Bitmap (WBMP). In addition to such uncompressed formats, as used herein, the terms bitmap and pixmap also refer to compressed formats. Examples of such formats include, but are not limited to, JPEG, TIFF, PNG, and GIF, in which the bitmap image (as opposed to a vector image) is stored in a compressed format. JPEG is usually lossy compression. TIFF is usually either uncompressed or losslessly Lempel-Ziv-Welch compressed like GIF. PNG uses deflate lossless compression, another Lempel-Ziv variant. More disclosure on bitmap images is found in Foley, 1995, Computer Graphics Principles and Practice, Addison-Wesley Professional, p. 13, ISBN 0201848406, as well as Pachghare, 2005, Comprehensive Computer Graphics: Including C++, Laxmi Publications, p. 93, ISBN 8170081858, each of which is hereby incorporated by reference herein in its entirety.
In typical uncompressed bitmaps, image pixels are generally stored with a color depth of 1, 4, 8, 16, 24, 32, 48, or 64 bits per pixel. Pixels of 8 bits and fewer can represent either grayscale or indexed color. An alpha channel, for transparency, may be stored in a separate bitmap, where it is similar to a greyscale bitmap, or in a fourth channel that, for example, converts 24-bit images to 32 bits per pixel. The bits representing the bitmap pixels may be packed or unpacked (spaced out to byte or word boundaries), depending on the format. Depending on the color depth, a pixel in the picture will occupy at least n/8 bytes, where n is the bit depth, since 1 byte equals 8 bits. For an uncompressed bitmap packed within rows, such as is stored in the Microsoft DIB or BMP file format, or in uncompressed TIFF format, the approximate size of an n-bit-per-pixel (2^n colors) bitmap, in bytes, can be calculated as size ≈ width × height × n/8, where height and width are given in pixels. In this formula, header size and color palette size, if any, are not included. Due to the effects of row padding to align each row start to a storage unit boundary such as a word, additional bytes may be needed.
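A minimal Python sketch of this size estimate follows; it assumes the packed, uncompressed case above and, like the formula, ignores header, color palette, and row padding:

```python
def approx_bitmap_size(width: int, height: int, bits_per_pixel: int) -> int:
    """Approximate size, in bytes, of an uncompressed, packed bitmap.

    Implements size ~ width x height x n / 8; header, color palette,
    and row padding are not included.
    """
    return width * height * bits_per_pixel // 8

# Example: a 1920x1080 image at 24 bits per pixel.
print(approx_bitmap_size(1920, 1080, 24))  # 6220800 bytes
```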
In computer vision, segmentation refers to the process of partitioning a digital image into multiple regions (sets of pixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images.
The result of image segmentation is a set of regions that collectively cover the entire image, or a set of contours extracted from the image. Each of the pixels in a region share a similar characteristic or computed property, such as color, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s).
Several general-purpose algorithms and techniques have been developed for image segmentation. Exemplary segmentation techniques are disclosed in The Image Processing Handbook, Fourth Edition, 2002, CRC Press LLC, Boca Raton, Fla., Chapter 6, which is hereby incorporated by reference herein for such purpose. Since there is no general solution to the image segmentation problem, these techniques often have to be combined with domain knowledge in order to effectively solve an image segmentation problem for a problem domain.
Some embodiments disclosed below create a mask, often stored as an alpha channel. In computer graphics, when a given image or portion of an image (or figure) is intended to be placed over another image (or background), the transparent areas can be specified through a binary mask. For each intended composite image there are three bitmaps: the image containing the figure, the background image and an additional mask, in which the figure areas are given a pixel value of all bits set to 1's and the surrounding areas a value of all bits set to 0's. The mask may be nonbinary when blending occurs between the figure and its surroundings.
To put the figure image over the background, the processes may first mask out the ground pixels in the figure image with the binary mask by taking a pixel by pixel product of the two bitmaps. This preserves the figure pixels. Another product is performed between the inverse of the binary mask and the background, removing the area where the figure will be placed. Then, the processes may render the final image pixels by adding the two product results. This way, the figure pixels are appropriately placed while preserving the background. The result is a compound of the figure over the background. Other blending techniques may be used to blend the figure with the new background, such as smoothing at the figure mask boundary.
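The product-and-sum compositing described above can be sketched with NumPy. This is an illustrative sketch only; the array shapes (HxWx3 images in [0, 1] and an HxW mask) are assumptions rather than requirements of the embodiments:

```python
import numpy as np

def composite_with_mask(figure: np.ndarray, background: np.ndarray,
                        mask: np.ndarray) -> np.ndarray:
    """Place the figure over the background using a binary (or soft) mask.

    figure, background: HxWx3 arrays in [0, 1]; mask: HxW array with 1 where
    the figure should show and 0 elsewhere. Intermediate values blend.
    """
    m = mask[..., np.newaxis]                     # broadcast over color channels
    kept_figure = figure * m                      # product keeps the figure pixels
    cleared_background = background * (1.0 - m)   # inverse mask clears the figure area
    return kept_figure + cleared_background       # sum composites figure over background
```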
A figure mask may be produced by segmenting the figure region from the background. In computer vision, segmentation refers to the process of partitioning a digital image into multiple regions. The pixels in a region share similar characteristics or computed properties. They may be similar in color and intensity, or be part of a larger texture or object. Adjacent regions are significantly different with respect to the same characteristic(s). Masks representing the different elements of the head may comprise many layers, where each layer represents a meaningful region of pixels, such as the face, hair, sunglasses, a hat, and so on.
Throughout the present description of the disclosed embodiments described herein, all steps or tasks will be described using one or more embodiments. However, it will be apparent to one skilled in the art, that the order of the steps described could change in certain areas, and that the embodiments are used for illustrative purposes and for the purpose of providing understanding of the inventive properties of the disclosed embodiments.
The elements that are distinguished may be represented in any number of processes, including creating an image mask. The elements distinguished may be utilized to create creative, humorous, message-delivering, or otherwise personalized presentations. A head that has been distinguished from one image may be placed into another image or graphic. For example, the image mask of the head may be placed into an electronically created greeting card, or onto images with celebrities, or otherwise placed into images or graphics with the intention of representing a person or persons heads within the images or graphics. As should be apparent, the utilization of the distinguished head may be utilized in a multiple of ways to create a personalized message or image. The ability to create a unique personalized message or image may be utilized to attract users to the creation of purchasable items, such as photos, t-shirts, coffee mugs, mouse-pads, greeting cards, and the like. The ability to create a unique personalized message or image may be utilized to attract users to physical locations or electronically available locations, such as websites and web-forums. The attraction of users may be associated with the ability to sell advertisement space or provide an advertising campaign, either specifically related to the creation of personalized messages or images, or generally otherwise.
In another alternative embodiment, the locations and rotation angles of the eyes, nose, and mouth are also determined by Haar-based AdaBoost classifiers. The location of the eyes can help to determine the pose angle, and all of the facial components are useful for finding the face contour.
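The embodiments do not name a specific detector implementation. As one hedged illustration, OpenCV's stock Haar/AdaBoost cascades can locate a face and its eyes, and the eye centers give a rough roll angle; the cascade file names, parameters, input file, and the angle estimate below are assumptions of this sketch, not details taken from the disclosure:

```python
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

img = cv2.imread("portrait.jpg")                        # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    eyes = eye_cascade.detectMultiScale(gray[y:y + h, x:x + w])
    if len(eyes) >= 2:
        # Rough roll angle from the line joining the two detected eye centers.
        (ex1, ey1, ew1, eh1), (ex2, ey2, ew2, eh2) = eyes[:2]
        angle = np.degrees(np.arctan2((ey2 + eh2 / 2) - (ey1 + eh1 / 2),
                                      (ex2 + ew2 / 2) - (ex1 + ew1 / 2)))
```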
Some images may optionally be processed to enhance image quality. For example, photos taken under a dark scene, low contrast background, and outdoor environment with strong directional sun lights may be selected for enhancement.
An image taken against a dark background may be detected by checking whether the average intensity level of the luminance channel in the image falls below a chosen or set threshold. The contrast value of the image may be computed as the average range of the intensity distribution of the luminance channel near the face area. An image with a low contrast background can be found by setting a threshold on the contrast value. If the input image is detected to have a dark background or a low contrast environment, one process that may be initially applied to the image is disclosed in "Contrast Limited Adaptive Histogram Equalization," Karel Zuiderveld, Graphics Gems IV, pp. 474-485, which is hereby incorporated by reference in its entirety for this purpose. All of the parameter settings may be determined by cross-validation.
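A hedged sketch of this check and enhancement is shown below, assuming OpenCV's CLAHE implementation of the Zuiderveld method; the thresholds are illustrative (the text notes such parameters may be set by cross-validation), and the contrast measure is computed over the whole image here rather than near the face, as a simplification:

```python
import cv2
import numpy as np

def enhance_if_dark_or_low_contrast(bgr: np.ndarray,
                                    dark_threshold: float = 60.0,
                                    contrast_threshold: float = 40.0) -> np.ndarray:
    """Apply CLAHE to the luminance channel when the image looks dark or flat."""
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    y = np.ascontiguousarray(ycrcb[:, :, 0])
    mean_luma = float(y.mean())
    # Spread of the luminance distribution as a crude contrast measure.
    contrast = float(np.percentile(y, 95) - np.percentile(y, 5))
    if mean_luma < dark_threshold or contrast < contrast_threshold:
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        ycrcb[:, :, 0] = clahe.apply(y)
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
```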
Images taken in an outdoor environment sometimes contain directional sunlight, which may cause specular and shadow effects on the face areas. All of the color channels of skin pixels under specular lighting effects usually become quite bright, and therefore a color-based skin detector may fail to detect those skin pixels. The idea of illumination correction is to find a new color space that is insensitive to specular light, as disclosed in "Beyond Lambert: Reconstructing Specular Surfaces Using Color," S. P. Mallick, T. E. Zickler, D. J. Kriegman, P. N. Belhumeur, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005, which is hereby incorporated by reference in its entirety for this purpose.
The following is a system, method, or process for enhancing an image by correcting outdoor environmental effects, according to one embodiment. Assuming that the sunlight can be represented by a white color vector, i.e., [1,1,1] in the RGB color space, a transformation may be found that rotates one color channel to be parallel to the white color vector. The remaining two rotated color channels become independent of the white color vector, and therefore the color space formed by these new channels is insensitive to specular illumination from the sun. Denote these two rotated color channels as Ip and Iq. The new image formed by Id = √(Ip² + Iq²) depends mostly on diffuse reflectance. The final output of the illumination correction is the linear blending of the diffuse image with each RGB channel of the original image according to the equation IR,G,B = αId + (1−α)IR,G,B, where α is the model parameter and may be set to 0.7.
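A minimal NumPy sketch of this correction follows, assuming images scaled to [0, 1]; the particular orthonormal basis chosen for Ip and Iq below is one of many valid rotations and is an assumption of this sketch:

```python
import numpy as np

def correct_specular_illumination(rgb: np.ndarray, alpha: float = 0.7) -> np.ndarray:
    """Blend each RGB channel with a specular-insensitive 'diffuse' image.

    rgb: HxWx3 float array in [0, 1]. Two axes orthogonal to the white vector
    [1,1,1] give Ip and Iq; Id = sqrt(Ip^2 + Iq^2) approximates diffuse
    reflectance, and the output is alpha*Id + (1 - alpha)*I per channel.
    """
    p = np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)   # orthogonal to [1,1,1]
    q = np.array([1.0, 1.0, -2.0]) / np.sqrt(6.0)   # orthogonal to [1,1,1] and p
    ip = rgb @ p
    iq = rgb @ q
    i_d = np.sqrt(ip ** 2 + iq ** 2)
    return alpha * i_d[..., np.newaxis] + (1.0 - alpha) * rgb
```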
In the embodiment disclosed in
At 203 of
At 204 of
An example color model is a 2-D Gaussian parametric model, trained with skin and non-skin pixels identified manually from a training data set. The description of a parametric color model is disclosed in "Systems and Methods for Rule-based Segmentation for Vertical Person or People With Full or Partial Frontal View in Color Images," U.S. patent application Ser. No. 12/735,093, filed on Jun. 4, 2010, and "System and Method for Segmenting an Image for a Person to Obtain a Mugshot," U.S. patent application Ser. No. 12/603,093, filed on Oct. 21, 2009, each of which is assigned to the assignee of the current application, and each of which is hereby incorporated by reference in its entirety. The color model helps to identify the skin pixels in the test image. Next, a set of color pixels is sampled from the refined face mask and used to re-calculate the 2-D Gaussian parametric model. The refined color model is then computed based on the statistics of the skin color intensity in the test image, and therefore the local variation of the skin color can be well captured by the re-calculated model. Finally, the results from the color model are combined with the initial face mask found by the shape model to form a refined face mask.
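As a hedged illustration of such a 2-D Gaussian parametric model, the sketch below fits the model over two chrominance channels; the Cb/Cr choice, the class interface, and the likelihood threshold are assumptions made for illustration, not details taken from the incorporated applications:

```python
import numpy as np

class Gaussian2DSkinModel:
    """2-D Gaussian parametric skin-color model, e.g., over Cb/Cr chrominance."""

    def fit(self, samples: np.ndarray) -> "Gaussian2DSkinModel":
        # samples: Nx2 array of chrominance values from labeled skin pixels.
        self.mean = samples.mean(axis=0)
        self.cov_inv = np.linalg.inv(np.cov(samples, rowvar=False))
        return self

    def likelihood(self, pixels: np.ndarray) -> np.ndarray:
        # Unnormalized Gaussian likelihood for Nx2 pixel values.
        d = pixels - self.mean
        mahal = np.einsum("ni,ij,nj->n", d, self.cov_inv, d)
        return np.exp(-0.5 * mahal)

# Typical flow: fit on training skin pixels, re-fit on pixels sampled from the
# face mask of the test image, then threshold the likelihood to get a skin map.
# model = Gaussian2DSkinModel().fit(training_skin_cbcr)   # hypothetical data
# model.fit(test_cbcr[face_mask_flat])                    # adapt to this image
# skin_map = model.likelihood(test_cbcr) > 0.1            # illustrative threshold
```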
In some cases, when the skin or shape model fails to capture the initial face mask, the results from the unstable model will be discarded, and only the results from the other, reliable model will be kept as the final result. The face skin mask may also be used to adjust the location of the initial face box returned by the face detector.
At 205 of
At 602 of
At 603 of
At 605 of
The following formula may be utilized to define a black hair expert:
BlackHairExpert = BlackHairExpert1 ∩ BlackHairExpert2
BlackHairExpert1: Thresholding based on Gaussian likelihood of luminance channel in YCbCr color space.
BlackHairExpert2: Thresholding based on the distance to the line between (0,0,0) and (1,1,1) in the normalized RGB color space.
The following formula may be utilized to define a blonde hair expert:
BlondeHairExpert: Thresholding based on Gaussian likelihood of CbCr channels in YCbCr color space.
The following formula may be utilized to define a grey hair expert:
GreyHairExpert = GreyHairExpert1 ∩ GreyHairExpert2
GreyHairExpert1: The average of the luminance channel in YCbCr color space is larger than a threshold.
GreyHairExpert2: Thresholding based on the distance to the line between (0,0,0) and (1,1,1) in the normalized RGB color space.
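A hedged sketch of such color experts is shown below. The thresholds are illustrative, the Gaussian-likelihood tests of the black and blonde experts are simplified to plain value thresholds, and the combination of the sub-experts is assumed to be a logical AND:

```python
import numpy as np

def distance_to_grey_line(rgb: np.ndarray) -> np.ndarray:
    """Per-pixel distance of RGB values (in [0, 1]) to the line from (0,0,0) to (1,1,1)."""
    white = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
    proj = (rgb @ white)[..., np.newaxis] * white   # component along the grey line
    return np.linalg.norm(rgb - proj, axis=-1)

def black_hair_expert(luma: np.ndarray, rgb: np.ndarray) -> np.ndarray:
    # Dark luminance AND close to the achromatic (grey) line.
    return (luma < 0.25) & (distance_to_grey_line(rgb) < 0.08)

def grey_hair_expert(mean_luma: float, rgb: np.ndarray) -> np.ndarray:
    # Bright on average AND close to the achromatic (grey) line.
    return (mean_luma > 0.6) & (distance_to_grey_line(rgb) < 0.08)
```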
After the classification of hair color in each JigCut region, the final hair color may be determined by voting across all JigCut regions inside the registered hair mask. A set of pixels in the detected JigCut regions with the majority hair color may be selected to retrain the Gaussian mixture model or models. The new Gaussian mixture model may then be used to find more JigCut regions of hair pixels, which may be utilized to update the initial hair mask.
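One way to realize this re-training step is sketched below with scikit-learn; the region labeling is taken as given, and the function name, component count, and use of RGB pixel colors are assumptions of the sketch:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def retrain_hair_model(rgb_pixels: np.ndarray, region_labels: np.ndarray,
                       winning_regions: np.ndarray, n_components: int = 3):
    """Re-fit a Gaussian mixture on pixels from regions voted as the dominant hair color.

    rgb_pixels: Nx3 pixel colors; region_labels: N region ids (one per pixel);
    winning_regions: ids of regions classified with the majority hair color.
    """
    selected = np.isin(region_labels, winning_regions)
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(rgb_pixels[selected])
    # Higher log-likelihood under the re-trained model suggests further hair
    # regions, which can be used to grow the initial hair mask.
    return gmm, gmm.score_samples(rgb_pixels)
```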
Optionally, at 606 of
At 607 of
The fitting operative process performs an exhaustive search in the parametric space (x0, y0, w, h) based on the face size and location, as well as the yaw angle information. During each search iteration, the distance di of each edge pixel (xi, yi) in the input image to the nearest point on the curve defined by (x0, y0, w, h) is computed, as well as the angular difference θi between the directions of the tangent vectors at those two points. The objective of the exhaustive search is to find the curve associated with the maximum number of edge points for which both the Euclidean distance and the angular difference to the nearest curve point are smaller than their thresholds:

Copt = argmax(x0,y0,w,h) Σi δ(di < t1 & θi < t2)

where t1 and t2 are two thresholds, and δ(di < t1 & θi < t2) = 1 if di < t1 & θi < t2 is true, and 0 otherwise.
After the optimal curve Copt is found, pixels inside the curve may be added to the hair mask, and pixels outside the curve will be removed from the hair mask to the background. These results may be further refined by utilizing a random walk operation as disclosed below.
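A sketch of scoring one candidate curve under this objective is given below; it assumes, for illustration only, that the curve family parameterized by (x0, y0, w, h) is an ellipse, which the disclosure does not specify:

```python
import numpy as np

def sample_ellipse(x0, y0, w, h, n=360):
    """Sample points and unit tangents on an ellipse parameterized by (x0, y0, w, h)."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    pts = np.stack([x0 + w * np.cos(t), y0 + h * np.sin(t)], axis=1)
    tang = np.stack([-w * np.sin(t), h * np.cos(t)], axis=1)
    return pts, tang / np.linalg.norm(tang, axis=1, keepdims=True)

def curve_score(edge_pts, edge_tangents, curve_pts, curve_tangents,
                t1=3.0, t2=np.pi / 8):
    """Count edge pixels lying near the curve with a similar tangent direction."""
    score = 0
    for p, tp in zip(edge_pts, edge_tangents):
        d = np.linalg.norm(curve_pts - p, axis=1)
        j = int(np.argmin(d))                       # nearest point on the curve
        ct = curve_tangents[j]
        ang = abs(np.arctan2(tp[0] * ct[1] - tp[1] * ct[0],
                             tp[0] * ct[0] + tp[1] * ct[1]))
        score += int(d[j] < t1 and ang < t2)
    return score

# Exhaustive search: evaluate curve_score over a grid of (x0, y0, w, h) candidates
# constrained by face size, location, and yaw angle, and keep the best-scoring curve.
```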
At 608 of
Optionally, at 206 of
At 800 of
When there are regions in the background with a color similar to skin or hair, such as wooden furniture or sand, it is possible to lose all skin- or hair-colored pixels after the least squares fitting process. The face skin mask from the previous modules above may be considered a higher-confidence mask, since it is more robust than the hair mask, which is harder to estimate due to complex color variation and different hair styles. This sub-process may add the skin mask to the refined mask after the least squares fitting step. The hair contour may be re-evaluated based on the current head mask. Pixels inside the curve may be added to the hair mask, and pixels outside the curve may be removed from the hair mask to the background. In addition, if the fitting score is less than the fitting score returned at 607 of
At 801 of
At 803 of
Based on the initial segmentation results, a set of boundary pixels between the background and the head mask may be selected. The labels of those boundary pixels are re-evaluated by solving a sparse linear system. The revised formulation includes a shape-prior regularization term and allows the unknown pixels to be specified and removed from the background.
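A hedged sketch of this boundary re-evaluation is shown below, using scikit-image's off-the-shelf random walker, which solves the sparse linear system but does not include the shape-prior regularization term mentioned above; the band width, beta value, and seed labeling are assumptions of the sketch:

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion
from skimage.segmentation import random_walker

def refine_boundary_with_random_walk(image: np.ndarray, head_mask: np.ndarray,
                                     band: int = 5) -> np.ndarray:
    """Re-evaluate labels of pixels near the head/background boundary.

    Pixels within `band` of the boundary are left unlabeled (0); the rest seed
    the solver with 1 (background) and 2 (head). image is an HxWx3 color image.
    """
    inner = binary_erosion(head_mask, iterations=band)
    outer = binary_dilation(head_mask, iterations=band)
    markers = np.zeros(head_mask.shape, dtype=np.int32)
    markers[~outer] = 1          # confident background
    markers[inner] = 2           # confident head
    labels = random_walker(image, markers, beta=130, channel_axis=-1)
    return labels == 2
```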
At 804, a convex hull sub-process is performed. After the global refinement in the previous sub-process, the refined segmentation results may have some dents in the top of the hair area because the Least Square Fitting may not have preserved the boundary of the hair contour. The heuristic that the top of the hair forms a convex contour shape can be used to fill those dents. A convex hull sub-process may be applied to find all of the small dents in the hair boundary. All the pixels inside the region of dents may be selected, and optionally, the random walk operation may again be performed to resolve the labels of those pixels and update the head mask.
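One way to locate such dents is sketched below with OpenCV; this version compares the whole head mask to its convex hull, and restricting the test to the top of the hair, as described above, is an additional step not shown:

```python
import cv2
import numpy as np

def find_hair_dents(head_mask: np.ndarray) -> np.ndarray:
    """Return a mask of 'dent' pixels: inside the convex hull but outside the head mask.

    Those pixels can then be re-resolved (e.g., by another random walk pass)
    before updating the head mask.
    """
    mask_u8 = (head_mask > 0).astype(np.uint8)
    contours, _ = cv2.findContours(mask_u8, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return np.zeros_like(mask_u8, dtype=bool)
    hull = cv2.convexHull(max(contours, key=cv2.contourArea))
    hull_mask = np.zeros_like(mask_u8)
    cv2.drawContours(hull_mask, [hull], -1, color=1, thickness=-1)   # filled hull
    return (hull_mask == 1) & (mask_u8 == 0)
```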
At 805, a determination of whether the refined mask's reconstruction error under Principal Component Analysis (hereinafter "PCA") projection is within a threshold is performed. The PCA subspace may be trained on a set of face images registered by the ASM control points described above. The current refined face mask will be replaced at 806 by the reconstructed mask if the reconstruction error is larger than a threshold. If replacement is chosen, the mask may optionally be re-processed by the random walk and/or convex hull operations, and again re-evaluated by the determination at 805.
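A minimal sketch of this PCA consistency check with scikit-learn follows; the component count, error metric, and flattened-mask representation are assumptions of the sketch:

```python
import numpy as np
from sklearn.decomposition import PCA

def build_mask_subspace(training_masks: np.ndarray, n_components: int = 20) -> PCA:
    """Fit a PCA subspace from registered training face masks, each flattened to a vector."""
    return PCA(n_components=n_components).fit(
        training_masks.reshape(len(training_masks), -1))

def pca_consistency_check(pca: PCA, mask: np.ndarray, error_threshold: float):
    """Project a refined mask onto the subspace and measure the reconstruction error.

    If the error exceeds the threshold, return the reconstructed mask as the
    replacement; otherwise keep the original mask.
    """
    v = mask.reshape(1, -1).astype(float)
    recon = pca.inverse_transform(pca.transform(v))
    error = float(np.linalg.norm(v - recon))
    if error > error_threshold:
        return recon.reshape(mask.shape), error
    return mask, error
```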
At 807, hair and skin area resolution operation may optionally occur. The refined head mask after Least Square Fitting and random walk operations may include pixels from the initial background mask. In this sub-process, those pixels will be assigned to either hair or face mask by a Gaussian Mixture Model in the RGB color space and by a random walk operation.
At 808, areas outside the chin may optionally be cleaned up. For example, the neck areas from the current refined face mask may need to be removed. This sub-process may select all the pixels outside the chin contour specified by ASM control points. In one embodiment, the pixels that appear in the hair mask will be kept. The pixels outside the chin contour may be treated as neck, hand, or body areas of the person and removed from the current segmentation result. The random walk sub-process may again be applied to refine the chin boundary of the final segmentation output. At 809, the final segmentation result is outputted, optionally scaled accordingly.
As desired, the methods disclosed herein may be executable on a conventional general-purpose computer (or microprocessor) system. Additionally, or alternatively, the methods disclosed herein may be stored on a conventional storage medium for subsequent execution via the general-purpose computer.
A data storage device 1027 such as a magnetic disk or optical disk and its corresponding drive is coupled to computer system 1000 for storing information and instructions. The data storage device 1027, for example, can comprise the storage medium for storing the method for segmentation for subsequent execution by the processor 1010. Although the data storage device 1027 is described as being magnetic disk or optical disk for purposes of illustration only, the methods disclosed herein can be stored on any conventional type of storage media without limitation.
Architecture 1000 is coupled to a second I/O bus 1050 via an I/O interface 1030. A plurality of I/O devices may be coupled to I/O bus 1050, including a display device 1043, an input device (e.g., an alphanumeric input device 1042 and/or a cursor control device 1041).
The communication device 1040 is for accessing other computers (servers or clients) via a network. The communication device 1040 may comprise a modem, a network interface card, a wireless network interface, or other well known interface device, such as those used for coupling to Ethernet, token ring, or other types of networks.
The foregoing embodiments are provided as illustrations and descriptions. They are not intended to limit the embodiments to the precise form described. In particular, it is contemplated that functional implementation of the embodiments described herein may be implemented equivalently in hardware, software, firmware, and/or other available functional components or building blocks, and that networks may be wired, wireless, or a combination of wired and wireless. Other variations and embodiments are possible in light of the above teachings, and it is thus intended that the scope of the invention not be limited by this detailed description, but rather by the claims that follow.
Claims
1-3. (canceled)
4. A method comprising:
- generating a generic facial mask associated with a facial feature of a person, wherein the generic facial mask is generated based on a first plurality of training images, wherein the generic facial mask is configured to identify the facial feature of a particular person;
- generating a generic hair mask associated with a hair feature of a person, wherein the generic hair mask is generated based on a second plurality of training images, wherein the generic hair mask is configured to identify the hair feature of the particular person; and
- storing the generic hair and facial masks.
5. The method of claim 4,
- wherein the generic facial mask comprises a first plurality of control data points, wherein the first plurality of control data points is based on a first plurality of training data point sets, wherein each training data point set of the first plurality of training data point sets is associated with a facial feature of a training image of the first plurality of training images, and
- wherein the generic hair mask comprises a second plurality of control data points, wherein the second plurality of control data points is based on a second plurality of training data point sets, wherein each training data point set of the second plurality of training data point sets is associated with a hair feature of a training image of the second plurality of training images.
6. The method of claim 4, wherein the generic facial mask comprises a first plurality of control data points, and wherein the generating of the generic facial mask comprises averaging training data point sets associated with the first plurality of training images.
7. The method of claim 4, wherein the generic hair mask comprises a second plurality of control data points, and wherein the generating of the generic hair mask comprises averaging training data point sets associated with the second plurality of training images.
8. The method of claim 4, wherein the generic facial mask is a mask associated with a chin of a person.
9. The method of claim 4 further comprising:
- generating a skin model associated with a skin color of a person based on skin color intensity of a third plurality of training images; and
- storing the skin model.
10. A method comprising:
- receiving a desired image;
- receiving a generic facial mask associated with a facial feature of a person from a memory component, wherein the generic facial mask is based on a first plurality of training images, and wherein the generic facial mask is configured to identify a facial feature of a person in an image;
- receiving a generic hair mask associated with a hair feature of a person from the memory component, wherein the generic hair mask is based on a second plurality of training images, and wherein the generic hair mask is configured to identify a hair feature of a person in an image;
- applying the generic facial mask and the generic hair mask to the desired image; and
- identifying facial feature and hair feature in the desired image.
11. The method of claim 10,
- wherein the generic facial mask comprises a first plurality of control data points, wherein the first plurality of control data points is based on a first plurality of training data point sets, wherein each training data point set of the first plurality of training data point sets is associated with a facial feature of a training image of the first plurality of training images; and
- wherein the generic hair mask comprises a second plurality of control data points, wherein the second plurality of control data points is based on a second plurality of training data point sets, wherein each training data point set of the second plurality of training data point sets is associated with a hair feature of a training image of the second plurality of training images.
12. The method of claim 11, wherein applying the generic facial mask and the generic hair mask comprises iteratively applying the first plurality of control data points to the desired image to identify the facial feature in the desired image and iteratively applying the second plurality of control data points to the desired image to identify the hair feature in the desired image.
13. The method of claim 11, wherein the identified facial feature is substantially a match of the first plurality of control data points in the desired image, and wherein the identified hair feature is substantially a match of the second plurality of control data points in the desired image.
14. The method of claim 10 further comprising:
- refining the identified hair feature and the identified facial feature in the desired image; and
- forming a head mask of a person in the desired image based on the refined hair and facial features.
15. The method of claim 10 further comprising:
- dividing the identified hair feature into a plurality of hair regions, wherein each hair region of the plurality of hair regions has a hair color associated therewith;
- classifying each hair region of the plurality of hair regions by a hair color;
- determining an overall hair color associated with the identified hair feature based on a majority of the hair classification;
- identifying a portion in the desired image that includes the overall hair color; and
- refining the identified hair feature by including the identified portion into the identified hair feature.
16. The method of claim 10 further comprising:
- applying a plurality of hair contour data points to the desired image to identify an outer hair boundary between the identified hair feature and a background portion of the desired image;
- identifying the outer hair boundary in response to the outer hair boundary substantially matching the plurality of hair contour data points;
- identifying uneven portions at the outer hair boundary;
- smoothing out the uneven portions to form a new outer hair boundary that is substantially convex contour shaped; and
- updating the identified hair features to include the new outer hair boundary.
17. The method of claim 10 further comprising:
- determining an illumination level based on an average illumination level of the identified facial feature;
- determining whether the illumination level exceeds an illumination level threshold; and
- enhancing image quality of the identified facial feature in response to the illumination level exceeding the illumination level threshold.
18. The method of claim 10, wherein the generic facial mask is associated with a chin feature of a person, and wherein the method further comprises:
- identifying a chin contour in the desired image in response to a match resulting from application of the generic facial mask to the desired image;
- determining whether the identified facial feature includes portions outside of the chin contour;
- refining the identified facial feature by removing the determined portions from the identified facial feature in response to determining that the identified facial feature includes portions outside of the chin contour, and
- responsive to the refining, smoothing uneven portions along the chin contour to form a new chin contour, wherein the new chin contour is substantially convex shaped.
19. The method of claim 10, wherein applying the generic hair mask includes applying a bald head template to the desired image, and wherein the method further comprises:
- determining that a person in the desired image has a bald head if a match between the facial feature and the bald head template is within a predetermined threshold; and
- responsive to determining that a person in the desired image is bald, refining the identified hair features by removing portions from the identified hair features that are outside of a boundary represented by the bald head template.
20. A method comprising:
- generating a generic feature mask associated with a facial feature of a person that is substantially invariant from one person to another person, wherein the generic feature mask is based on a plurality of training images; and
- storing the generic feature mask.
21. The method of claim 20, wherein the generic feature mask comprises a plurality of control data points, wherein the plurality of control data points is based on a plurality of training data point sets, wherein each training data point set of the plurality of training data point sets is associated with a facial feature of a training image of the plurality of training images.
22. The method of claim 20, wherein the generic feature mask comprises a plurality of control data points, and wherein the generating the generic feature mask comprises averaging training data point sets associated with the plurality of training images.
23. The method of claim 20, wherein the facial feature of the person is selected from a group consisting of a face, a nose, eyes, a mouth, and a chin.
Type: Application
Filed: Dec 20, 2013
Publication Date: May 1, 2014
Applicant: FLASHFOTO, INC. (Los Gatos, CA)
Inventors: Kuang-chih Lee (Union City, CA), Robinson Piramuthu (Oakland, CA), Katharine Ip (Mountain View, CA), Daniel Prochazka (Pacifica, CA)
Application Number: 14/137,771
International Classification: G06K 9/00 (20060101); G06K 9/46 (20060101);