Eye gaze direction tracker
A system for eye-gaze direction detection is disclosed that uses an infrared light emitting diode mounted coaxially with the optical axis and in front of the imaging lens of an infrared sensitive video camera for remotely recording images of the eye of the computer operator. The infrared light enters the eye and is absorbed and then re-emitted by the retina, thereby causing a "bright eye effect" that makes the pupil brighter than the rest of the eye. It also gives rise to an even brighter small glint that is formed on the surface of the cornea. The computer includes software and hardware that acquires a video image, digitizes it into a matrix of pixels, and then analyzes the matrix to identify the location of the pupil's center relative to the glint's center. Using this information, the software calibrates the system to provide a high degree of accuracy in determining the user's point of regard. When coupled with a computer screen and a graphical user interface, the system may place the cursor at the user's point of regard and then perform the various mouse clicking actions at the location on the screen where the user fixates. This grants the individual complete control over the computer with solely their eye. The technology is not limited to determining gaze position on a computer display; the system may determine point of regard on any surface, such as the light box radiologists use to study x-rays.
This invention describes an eye-gaze direction detecting device which is utilized as a means of interfacing a person with a computer.
The present invention is an improvement over the invention disclosed in U.S. patent application Ser. No. 267,266 filed on Nov. 4, 1988 now U.S. Pat. No. 4,950,069 which disclosure is hereby incorporated herein and made a part hereof. The improvement relates to a new calibration arrangement granting greater accuracy, better algorithms that allow a more robust determination of pupil-glint displacement, and features that allow an operator to effectively exploit a graphical user interface (GUI).
There are a number of eye-gaze direction detector techniques. Some of these have been applied to assist a handicapped person such as a quadriplegic who has lost many of his or her physical abilities and may have even lost the ability to speak clearly. Many of these severely handicapped people still have control over their direction of gaze and all of their mental faculties. Thus, a device that enables them to fully operate a computer with their eye and therefore communicate messages and maintain productivity is a great benefit both physically and psychologically. As used in this disclosure, "eye-gaze direction detector" refers generically to any technology related to detecting either movement of the eye or eye-gaze direction.
One of the eye movement technologies is electrooculography, which uses the difference in voltage between the cornea and the retina of the eye. Another technology is corneal reflection, which uses the reflection of a beam of light from the various surfaces of the eye the beam crosses. The brightest reflection comes from the outer corneal surface (first Purkinje image), with the second, third, and fourth Purkinje images being dimmer and corresponding respectively to the inner surface of the cornea and the outer and inner surfaces of the lens. Usually the reflections from the outer surface of the cornea and the inner surface of the lens are the two utilized.
A third technology is limbus tracking, which detects the generally sharp boundary between the iris and the sclera (the white part of the eye). A fourth technology is the use of a contact lens that rotates with the eye and is fitted with various devices to determine the amount of eye rotation. A fifth technique is to measure the eccentricity of the pupil, which varies from circular when viewed head on to elliptical as it rotates away from the axis along which it is viewed. A sixth technique is a movement measurement based on the head and eyes being moved together. A seventh technique is the oculometer, which determines the center of the pupil and the corneal highlight from a reflected light and the change in the distance and direction between the two as the eye is rotated. Additional techniques exist, but these are the major ones. Some of them incorporate head- or eye-mounted devices, whereas others permit the eye-gaze direction detector to be mounted remote from the head.
The present invention relates specifically to the category of oculometer eye measurement detectors but some aspects of the invention can also be utilized with other techniques. The invention is primarily designed to provide an eye-driven method of interfacing a person with the computer. This is of fundamental interest to handicapped individuals who would benefit greatly from an eye-driven device. When a healthy functional person suddenly loses their ability to act for themselves and to communicate, the mind becomes trapped within the body. The need to communicate and remain productive becomes more acute for the individual when they are robbed of motor functions and speech.
Complete motor dysfunction occurs as a result of both traumatic spinal injuries and gradual paralysis during the course of peripheral neuromuscular diseases. It affects young children by making them grow up in mental and physical isolation without a way to educate their minds or to acquire skills for an adult life. It also affects adults who are accustomed to a normal lifestyle and who must learn to cope with the resulting severe form of isolation from family, friends, and others who must now care for them.
Use of the present invention prevents this isolation by allowing the individual to have complete control over a computer with their eye. This grants the person the ability to effectively communicate on his or her own. Also, in a society where computers are becoming increasingly pervasive, this allows a handicapped person to remain productive and possibly employed. This greatly increases the quality of the handicapped person's life and the lives of those who care for them.
While the invention has been primarily developed for handicapped persons, the technology may also be used for industrial control, cockpit control, video game control, workstation control, and other applications where eye-gaze control is desired. The invention may also be used for testing and training purposes based on eye movement, such as testing or training radiologists to read x-rays.
Because of the physiological make-up of the human eye, the eye normally directs its gaze with a very high degree of accuracy at the gaze point. This is because the photoreceptors of the human retina are not uniformly distributed but instead show a pronounced density peak in a small region known as the fovea. In this region, which subtends a visual angle of about 1°, the receptor density increases to about 10 times the average density. The nervous system controls the muscles attached to the eye to keep the image of the region of current interest centered accurately on the fovea, as this gives the high resolution. The appearance of high resolution in all directions outside of this region is thus an illusion maintained by a combination of physiological mechanisms (rapid scanning with brief fixations) and psychological ones. As an example, a character on a typical computer display screen subtends an angle of about 0.3° at a normal viewing distance. Such characters cannot be accurately resolved unless the eye is accurately aligned for a duration of 0.2 seconds.
The eye-gaze direction detecting system described here is designed to allow control over the computer through dwell time activation of commands normally controlled by mouse clicks. By fixating on an area of the computer display, the system performs a user selected mouse action at that position, thus granting the user the same control over the computer that is allowed by a conventional mouse.
The main components of the system include a charge-coupled device (CCD) camera sensitive to the near infrared with a 300 mm zoom lens and a coaxially mounted infrared light source. An infrared filter allows primarily infrared light to reach the camera. This assembly is housed inside a module that rests beneath the computer monitor. The module employs a folded optical arrangement that directs the camera's line of sight to the user's eye through the use of two mirrors. This keeps the system relatively small and unobtrusive. One of the primary advantages of the system is that it does not require any devices to be fixed to the body of the user. The camera is coupled to an inexpensive computer that contains a frame grabber or image processing board. The board digitizes video images of the eye acquired by the video camera, and the algorithms forming a part of this invention then process the image.
The operant image consists of a video image encompassing one side of the user's face. It contains the iris and sclera (both dark), the reemission of the infrared light out of the pupil (bright eye), and the corneal reflection of the infrared light source (glint). An in-focus bright eye image gives a high contrast boundary at the pupil perimeter making it easily distinguishable. The system computes the relative x-y coordinates between the glint center and the pupil center as determined by the pattern recognition software incorporated in this patent. One of the improvements to the pattern recognition software is the inclusion of algorithms that enable individuals with glasses to use the system. Glasses create extraneous reflections from the rims and lenses (glares). These reflections can possibly make pupil and glint identification difficult. The new improvements essentially remove a majority of these added reflections from the image thereby allowing individuals with glasses to benefit from an eye-controlled device.
The glint and the bright eye are features produced by the infrared light emitted by the semiconductor diode, mounted coaxial with the video camera lens. The illumination is invisible to the user and has a principal advantage of producing very little subject awareness of the monitoring process. Since the light source is in coaxial alignment with the video camera lens, it results in light re-emitted back from the retina, which effectively backlights the pupil so that it appears as a bright disk. This is the bright eye effect, and it provides a distinctive contrast to the rest of the eye or facial details. This greatly facilitates the determination of the periphery of the pupil so that the center of the pupil can be accurately calculated, which is an important feature in an oculometer. The system measures the differential between the pupil center and glint center and uses this data to compute where the eye is gazing.
The system detects when the eye lingers for a predetermined period at any position on the display. If the predetermined linger period is exceeded, then the system magnifies the area the user was looking at. A window at or near the center of the screen appears and the magnified area is placed in this window. The user then fixates in the magnified area at a point where they wish for a mouse action to be performed. After the predetermined fixation period has expired, a menu appears that allows the user to select what mouse action they wish to perform--left mouse button drag, left mouse button single click, left mouse button double click, right mouse button drag, right mouse button single click, or right mouse button double click. After fixating on the menu command for a predetermined amount of time, the system then performs the desired mouse action at the position fixated on in the magnified area.
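The dwell-time test described above can be sketched as follows. This is only an illustration under stated assumptions: the class name `DwellDetector`, the pixel tolerance `radius_px`, and the linger period `dwell_s` are hypothetical and are not values given in this disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DwellDetector:
    """Reports a fixation once gaze stays within radius_px for dwell_s seconds."""
    radius_px: float = 30.0  # assumed tolerance, not taken from the disclosure
    dwell_s: float = 0.5     # assumed linger period, not taken from the disclosure
    _anchor: Optional[Tuple[float, float]] = None
    _start: float = 0.0

    def update(self, x: float, y: float, t: float) -> Optional[Tuple[float, float]]:
        """Feed one gaze sample (screen x, y and time t in seconds).

        Returns the fixation anchor when the dwell completes, otherwise None.
        """
        moved = self._anchor is None or (
            (x - self._anchor[0]) ** 2 + (y - self._anchor[1]) ** 2
            > self.radius_px ** 2
        )
        if moved:
            # Gaze left the tolerance circle: restart timing at the new position.
            self._anchor, self._start = (x, y), t
            return None
        if t - self._start >= self.dwell_s:
            fixation = self._anchor
            self._anchor = None  # require a fresh dwell before the next action
            return fixation
        return None
```

The same detector could in principle be reused for each stage (the initial linger, the fixation in the magnified window, and the menu selection), each with its own predetermined period.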
Thus, the present invention determines eye gaze position using effects generated by infrared light. An infrared light emitting diode mounted at the center of the video camera lens illuminates the user's face with infrared light. A small fraction of the light entering the eye is absorbed and re-emitted by the retina, creating an image through the same phenomenon that causes the red-eye effect in photography when the flash bulb is located close to the optical axis of the camera. Basically, it causes the pupil to be brighter than the rest of the eye, whereas it is usually darker than the remainder of the eye. The cornea reflects an intense virtual image of the light emitting source, which is referred to as the glint. The glint is usually located within the pupil's image or along or near its outer circumference.
The computer contains a commercially available frame grabber card that captures a frame from the video camera as a 640 × 480 × 8-bit image. The pattern recognition software of the present invention calculates eye-gaze direction by determining the distance and angle between the glint center and pupil center. To do this, the system first obtains the intensity thresholds between the glint and the surrounding portions of the eye and then between the pupil and the surrounding portions of the eye. This step is called threshold setting. Upon completing threshold setting, the system may then accurately calculate the displacement measurements between the glint and pupil center. To effectively control the computer, the user must undergo the next step, calibration. Completion of this step, which entails the user looking at a sequence of fixed points on the screen, allows the software to map a particular glint-pupil displacement to a particular point on the screen. Thus, the user may operate the computer using solely their eye.
The following figures illustrate the setup of the system and serve to facilitate an understanding of how the system operates as described below.
FIG. 1 shows the equipment used and the optical path of the system.
FIG. 2 describes how the glint-pupil relationships change as the user looks at different positions on the screen.
FIG. 3 is a flowchart for the basic procedure used in the threshold determination algorithm.
FIG. 4 describes the glare removal algorithm.
FIG. 5 shows how the glint center is determined.
FIG. 6 shows the six point calibration layout.
FIG. 7 shows how the system identifies features in the actual operation of the system.
FIG. 8 describes how the system maps the glint-pupil displacement to a particular position on the screen.
FIG. 9 shows how the zooming feature with the system works.
FIG. 10 shows the menu for activating mouse control actions.
FIG. 1 shows the setup of the system. The setup includes a standard computer display resting on top of a module that houses the eye-tracking equipment. Two mirrors on the module direct the camera's line of sight to the user's eye. The computer display sits so that the image of the camera in the mirror is at the center horizontally of the computer display.
The current camera employed is the Hitachi KP-160 camera but other cameras can perform the same job. The lens is a 300 mm zoom lens but again most any lens with suitable magnification would allow the device to operate.
The infrared light emitting source is a readily available gallium arsenide diode. Such diodes are approximately 1 to 3 cm in diameter and are normally used for communication purposes with a narrow light beam dispersion, as opposed to some types that send light in all directions and are used mainly as indicator lights. They usually have a small built-in molded clear lens that condenses the light into a narrow beam. Additionally, the diode is placed in a plastic tube that serves to condense the beam further and to ensure that the infrared light striking the eye does so by following the optical path shown in FIG. 1 and not by bypassing the mirrors altogether. Light that bypassed the mirrors would give a double glint in the camera image and decrease the reliability in determining the proper glint-pupil displacement. The preferred diode emits a focused beam of light at a wavelength of 880 nm, but wavelengths as high as 900 nm are acceptable. This wavelength is invisible to the user, and the light is of a sufficiently low intensity that it is safe for continuous use by the operator. The diode is mounted at the center of a filter screwed onto the lens. This filter blocks most visible light, allowing the infrared to pass through with little decrease in intensity.
The secondary monitor normally placed by the side of the computer monitor shows the eye image seen by the camera and allows the user to ensure that they are in focus and properly positioned in front of the video camera.
FIG. 2 describes qualitatively the glint-pupil relationships seen as the user gazes at different positions on the display. When the user looks directly back into the camera, the glint is approximately at the pupil's center. As they look directly above the camera, the glint center drops below the pupil center. When the user looks to the left, the glint center moves to the left, so the glint is to the bottom left of the pupil center. As the user looks to the right, the glint center moves to the right of the pupil center.
Installed inside the computer is a Matrox Meteor frame grabber card that acquires images from the video camera, but any compatible frame grabber card will do. The card uses the RS-170 standard to acquire an image, digitizes it into a sequence of pixels, and maps those pixel values into system memory. This results in a 640 × 480 resolution image with each pixel having a value from 0 to 255, with 0 being black and 255 being white. The intensity values in between represent different shades of gray.
ERICA HQ is the software program that provides an interface for manipulating the setup of the eye-controlled mouse and for undergoing calibration and threshold setting.
As mentioned earlier, the first step in using the system is to set thresholds; the basic procedure is outlined in FIG. 3. The object of this stage is to obtain the minimum intensity values of the glint and pupil so that the software may quickly identify both of these features during calibration and actual system operation.
In the first stage of threshold setting, the user looks directly back into the center of the camera, focusing on the image of the light emitting diode (LED) seen in the top mirror. The system snaps an image and maps it into memory.
Once the image has been acquired, the system then checks if any glares are present using the flowchart shown in FIG. 4. The origin of the camera coordinate system is the upper-left corner of the camera image. The x coordinates increase as the system moves to the right. The y coordinates increase as the system moves down in the image. There are six parameters of interest. All of these parameters are set in or determined by HQ based on the user's desires or the camera setup. The first parameter, search scheme, specifies whether the software searches horizontally or vertically when trying to find the boundary of a glare. A third setting has the software perform two passes, searching for glares horizontally first and then vertically. This last method maximizes the reduction of the glares but does slow system processing down. Two additional parameters specify how many lines to skip vertically and how many to skip horizontally as glare detection proceeds through the image. The more lines skipped, the less thoroughly the system will remove the glares, but the less time will elapse before the glare removal algorithm completes its task. Another parameter is the glare threshold, which is the intensity value above which a pixel could potentially be considered part of a glare. Another is the pixel length of the glare, which defines the number of pixels in succession that must be found greater than the glare threshold for a region of the image to be considered a glare. A glare generally has the same or higher intensity than a glint, but the glare is larger.
Once a glare has been found, the software finds a bounding rectangle for the glare. To determine this rectangle, the glare threshold is reduced by the last parameter, the adjust factor. This allows the software to remove the halo that typically accompanies a glare but is generally of a lower intensity than the glare. Once the threshold is reduced, the system begins bounding the glare.
The system starts by moving to the right of the current sequence of pixels it believes to be part of a glare. The system checks each pixel as it advances to the right and stops its check when it either reaches the edge of the screen or finds a pixel whose intensity is below the glare threshold. The software then checks to the left, again advancing left until it either reaches the screen edge or finds a pixel whose intensity is below the glare threshold. The system then decrements its y value and checks to the left and right again. After an upper bound has been reached (the pixel value at the new y coordinate is below the threshold or the screen edge is reached), then the system increments its y value to find the bottom of the rectangle. During the checking, the overall minimum and maximum x values are stored. After the bottom of the rectangle is found, then the system has found both the minimum and maximum x and y values for the glare, so the glare is ready for removal, and the glare threshold is returned to its old value.
The system begins its search for the glares in the upper left-hand corner of the camera image. If any glares were found, then the system removes the glares by turning every pixel in the glare's bounding rectangle black. Glare removal is fast and has the advantage of decreasing the number of areas that the feature finding algorithms will query to attempt to identify the glint and pupil. Upon removing the glares, the system is now ready to attempt to set the thresholds (minimum intensity values) for the glint and pupil.
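The glare detection and removal steps above can be sketched roughly as follows. This is a simplified illustration, not the disclosed implementation: it covers only the horizontal search scheme, it walks up and down along the column under the center of the detected run, and the function and parameter names (`remove_glares`, `min_run`, `adjust`, `skip_rows`) are hypothetical.

```python
def remove_glares(img, glare_threshold, min_run, adjust=0, skip_rows=1):
    """Black out bounding rectangles of bright runs at least min_run pixels long.

    img is a list of rows of 0..255 intensities and is modified in place.
    Returns the (x_min, y_min, x_max, y_max) rectangles that were removed.
    """
    height, width = len(img), len(img[0])
    bound_thr = glare_threshold - adjust  # lowered so the dimmer halo is included
    removed = []

    def row_extent(y, x):
        """Extend left and right from (x, y) while pixels stay >= bound_thr."""
        lo = hi = x
        while lo > 0 and img[y][lo - 1] >= bound_thr:
            lo -= 1
        while hi < width - 1 and img[y][hi + 1] >= bound_thr:
            hi += 1
        return lo, hi

    for y in range(0, height, skip_rows):
        x = 0
        while x < width:
            run = 0
            while x + run < width and img[y][x + run] > glare_threshold:
                run += 1
            if run < min_run:
                x += run + 1  # too short to be a glare (it may be a glint)
                continue
            cx = x + run // 2
            x_min, x_max = row_extent(y, cx)
            y_min = y_max = y
            # Walk up, then down, until the column under the run centre goes dark.
            for step in (-1, 1):
                yy = y + step
                while 0 <= yy < height and img[yy][cx] >= bound_thr:
                    lo, hi = row_extent(yy, cx)
                    x_min, x_max = min(x_min, lo), max(x_max, hi)
                    if step < 0:
                        y_min = yy
                    else:
                        y_max = yy
                    yy += step
            # Remove the glare by turning its bounding rectangle black.
            for ry in range(y_min, y_max + 1):
                for rx in range(x_min, x_max + 1):
                    img[ry][rx] = 0
            removed.append((x_min, y_min, x_max, y_max))
            x = x_max + 1
    return removed
```

Because short bright runs are skipped, a small glint survives this pass while a larger glare and its halo are blacked out.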
The first step is to determine a glint threshold. The system first makes a histogram of the intensity values for each pixel of the camera image. The intensity values range from 0 to 255, so there are 256 buckets. The number of pixels with an intensity value equal to the value of the bucket is stored in each bucket. Were this data to be plotted, several humps would appear. The lowest and largest hump would represent data that shows the intensity values of the face. The next highest hump would represent data for the bright eye and a spike near the highest intensity values would show an intensity value for the glint. To determine a starting threshold for the glint, the system begins at the end of the histogram (intensity value 255) and moves backward to each bucket, summing the value in each bucket. Once a value of twenty pixels is reached, the system sets the current bucket number to be the threshold for the glint.
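A minimal sketch of this histogram step is shown below. The function name `initial_glint_threshold` and the flat-iterable input are assumptions made for illustration; the twenty-pixel count comes from the description above.

```python
def initial_glint_threshold(pixels, min_count=20):
    """Walk the histogram down from 255 until min_count pixels are accumulated.

    pixels is any iterable of integer intensities in 0..255.
    Returns the bucket at which the running sum first reaches min_count.
    """
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1

    total = 0
    for level in range(255, -1, -1):
        total += hist[level]
        if total >= min_count:
            return level
    return 0  # fewer than min_count pixels in the whole image
```

For example, an image with 12 pixels at intensity 250 and 8 pixels at 245 reaches the twenty-pixel count at bucket 245, which becomes the starting glint threshold.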
The next stage is to test this threshold by attempting to find a glint. The system will alter the threshold as errors occur in determining the glint's bounding rectangle. FIG. 5 shows a flowchart representing how a glint center is found. This is the same algorithm used to determine the glint center in calibration and runtime operation of the system.
The first step to finding a glint is to find a seed point, which is the pixel that serves as the starting point for attempting to find the glint's bounding rectangle. The system scans the search region to find the first pixel whose intensity value is greater than the glint threshold. If a seed is found but it is in the seed point memory array, which serves as a buffer containing the locations of all invalid glints, then it is ignored. The search region is either the entire image if the system is in a global search mode or a region of 180 × 180 pixels centered on the last valid glint if the system is in the local search mode.
Once a valid seed is found, then the system tries to find an upper bound on the glint. The system decrements its y counter by one pixel. It then tries to find the minimum and maximum x coordinates for the glint with the current y value. The system checks each pixel to the left and right and stops its check once it finds a pixel with an intensity value lower than the threshold. The system stores the overall maximum and minimum x coordinates of the glint's bounding rectangle for all lines checked. This process of checking each line is called filling. A filling error occurs when the system reaches the edge of the camera image and all the pixels checked are still above the threshold value. Once the first filling operation is completed, the system determines the x coordinate center of the line. This is the x value where the system will start its checks on subsequent filling operations as the y counter is changed. As the system decrements its y counter, it checks the intensity of the point at the new y and the previously determined x center. If the intensity is below the glint threshold, then the top of the glint has been found.
Next, the system tries to find a valid lower bound of the glint if the upper bound was found without error. This follows the same algorithm as far as filling is concerned. The only difference is that the y counter is being incremented.
If filling completes successfully, then the system determines the glint diameter, center, and aspect ratio. The diameters are calculated based on the maximum and minimum x and y coordinates. The aspect ratio is the ratio of the x diameter to the y diameter. All three of these values are checked against preset values and tolerances to ensure that a proper glint is found--that its shape belongs to a valid glint. For example, the glint's bounding rectangle should be approximately a square, so an aspect ratio that is too far above or below the value of one would signify an erroneous glint. If these three checks pass, then the system calculates the center of mass of the glint. As filling occurs, the system sums the number of pixels in the glint, and the x coordinates of valid glint pixels are summed as well as the y coordinates. The x coordinate of the center of mass is the x sum divided by the number of pixels and the y coordinate is the y sum divided by the number of pixels.
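The filling procedure and the center-of-mass calculation can be sketched as below. This is an illustrative reading of the description, with some assumptions: a pixel counts as part of the feature when its intensity is at or above the threshold, the function name `fill_feature` and its return structure are hypothetical, and the shape checks against preset tolerances are left to the caller.

```python
def fill_feature(img, seed, threshold):
    """Bound the bright blob at seed=(x, y) and compute its centre of mass.

    Returns a dict with 'bbox', 'aspect' and 'center', or None on a filling
    error (the blob runs into an edge of the image).
    """
    height, width = len(img), len(img[0])
    sx, sy = seed

    def fill_row(y, x):
        """Return (x_min, x_max) of the bright span through (x, y), or None."""
        lo = hi = x
        while lo >= 0 and img[y][lo] >= threshold:
            lo -= 1
        while hi < width and img[y][hi] >= threshold:
            hi += 1
        if lo < 0 or hi >= width:
            return None  # filling error: still bright at the image edge
        return lo + 1, hi - 1

    first = fill_row(sy, sx)
    if first is None:
        return None
    cx = (first[0] + first[1]) // 2  # x centre used to start later rows
    rows = {sy: first}

    for step in (-1, 1):  # find the upper bound first, then the lower bound
        y = sy + step
        while True:
            if not 0 <= y < height:
                return None  # filling error at the top or bottom edge
            if img[y][cx] < threshold:
                break  # bound found
            span = fill_row(y, cx)
            if span is None:
                return None
            rows[y] = span
            y += step

    x_min = min(lo for lo, _ in rows.values())
    x_max = max(hi for _, hi in rows.values())
    y_min, y_max = min(rows), max(rows)

    count = sum_x = sum_y = 0
    for y, (lo, hi) in rows.items():
        n = hi - lo + 1
        count += n
        sum_x += (lo + hi) * n / 2  # sum of the integers lo..hi
        sum_y += y * n

    x_diam, y_diam = x_max - x_min + 1, y_max - y_min + 1
    return {
        "bbox": (x_min, y_min, x_max, y_max),
        "aspect": x_diam / y_diam,
        "center": (sum_x / count, sum_y / count),
    }
```

A caller would then compare the diameters and aspect ratio with its preset tolerances before accepting the center as a valid glint.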
If the glint is not found due to a filling error, then the software performs the glint error handling routine as described below and attempts to find another glint. If no seed was found, then decrement the threshold by 10 units. If the glint threshold goes below the average image intensity, then snap a new image and restart the threshold setting procedure.
The glint error handling routine relies on a feature called seed point memory. This is an innovation in the system that keeps it from bouncing between potential glints. Once a glint is found to be erroneous, its bounding rectangle is added to the seed point memory. The system will then ignore any seed points that it finds in these regions while it attempts to analyze the present image. Once a new image is snapped, this memory is cleared.
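One way to picture seed point memory is as a list of rejected rectangles that the seed search skips over. The sketch below assumes this representation; the function name `find_seed` and the `(x_min, y_min, x_max, y_max)` tuple layout are hypothetical.

```python
def find_seed(img, threshold, region, rejected):
    """Return the first (x, y) in region brighter than threshold that lies
    outside every rectangle in rejected, or None if there is no such pixel.

    region and each entry of rejected are inclusive
    (x_min, y_min, x_max, y_max) rectangles.
    """
    x0, y0, x1, y1 = region
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if img[y][x] <= threshold:
                continue
            in_memory = any(
                rx0 <= x <= rx1 and ry0 <= y <= ry1
                for rx0, ry0, rx1, ry1 in rejected
            )
            if not in_memory:
                return x, y
    return None
```

The caller would append the bounding rectangle of each erroneous glint to `rejected` and reset the list to empty whenever a new image is snapped.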
Next, the system sets an initial approximation for the pupil threshold. This value is set at 10 percent above the average image intensity.
The software then tests the threshold by attempting to find a pupil. The algorithm used is the same as the one used for the glint, except the system checks pixel intensities relative to the pupil threshold. If no pupil is found because no seed could be found, then the system is not looking at the correct glint. Perhaps the system is trying to bound some other reflection from the eye, or a small glare escaped elimination. The software performs the glint error handling routine and tries to find a new glint. If the fill routines failed, then the pupil threshold might be too low, so a higher threshold is tried: the pupil threshold is increased by 10 percent, and the system tries to find the pupil again. If this step has already occurred, then the system is searching around a bad glint, so it performs the glint error handling routine and tries to find a new glint. The system then checks that there are not too many pixels in the pupil greater than the glint threshold. If this test fails, then the pupil found is not the true pupil, so the system performs the glint error handling routine and tries to find a new glint. If the pupil is found, the system then tries to relocate the pupil after making the threshold 90 percent of the current average pupil intensity. The system now uses a different routine to find the pupil. This is the method used in the runtime operation of the system. This routine is essentially the same as the one used to find the glint, except the system will try to find the pupil several times before failing this section of the routine. Basically, if the pupil center determination fails, then the search region is adjusted based on the previously found maximum of the pupil; the new x starting position is set to the pupil x maximum and the y starting position is set to the pupil seed's y value plus 1. The initial search region for the pupil seed is a rectangle that is 30 pixels by 10 pixels and centered at 5 pixels above the potential glint's center.
If the pupil cannot be found, then the glint is bad so perform the glint error handling routine and try to find a new glint.
Once both a glint and a pupil are identified, the system has found valid thresholds. One of the improvements in the system is the use of dynamic thresholding, which means the threshold values used to define the minimum glint and pupil intensity are constantly updated as the system operates to provide more accurate bounds on the features during runtime. So, the software will now set up the minimum allowed pupil and glint thresholds, which are set to 2/3 of the current threshold for the respective feature, ensuring that the minimum glint threshold is not below the current pupil intensity. Also, the system stores the current threshold to use as a reset point if the feature threshold ever gets adjusted too far. The system also stores the current glint intensity relative to pupil intensity to serve as an additional check during calibration and runtime. Lastly, minimum and maximum glint and pupil sizes are set based on the current feature sizes.
An innovation added to this invention is the storage of the x and y offsets of the pupil center from the glint center that occurs as the user looks back into the camera. All of the mathematical models used in calibration and the discussions so far have assumed that the glint center is truly at the pupil center when the user looks back into the center of the camera; however, this is not the case for everyone. Thus, in subsequent determinations of the glint-pupil displacement, the system will subtract these stored offsets from the calculated x-y glint-pupil displacement before converting to the polar coordinates used by the calibration equations. This compensation more than doubles system accuracy for the typical user, granting them a consistent accuracy over the entire computer display. This increases the proportion of the population that can effectively use the system to control the GUI.
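The offset compensation and the conversion to polar coordinates can be illustrated as follows. The sign convention (pupil center minus glint center), the use of degrees, and the function name `corrected_polar` are assumptions for this sketch; the disclosure does not specify them.

```python
import math


def corrected_polar(glint, pupil, offset):
    """Subtract the stored look-at-camera offset, then convert to polar form.

    glint and pupil are (x, y) centres in camera pixels; offset is the (x, y)
    pupil-minus-glint displacement stored during threshold setting.
    Returns (radius_px, angle_deg).
    """
    dx = pupil[0] - glint[0] - offset[0]
    dy = pupil[1] - glint[1] - offset[1]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))
```

With this correction, a user whose glint sits one pixel off the pupil center when looking into the camera still yields a radius of zero for that gaze direction, which is what the calibration models assume.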
In the second stage of threshold setting, the user looks at the bottom of the computer display. The above algorithm is repeated, except the glint-pupil offset is not stored. This stage is necessary because the pupil is usually at a lower intensity when looking at the bottom of the screen than when looking directly back to the camera, and the pupil threshold needs to be set at the lowest possible valid value.
Once the user has successfully completed threshold setting, they are ready to calibrate. Calibration is the process of determining the equations used to map angles and radii between the pupil and glint centers of the user's eye to radii and angles on the screen, with the origin of the coordinate system for the screen being just below the bottom center of the monitor's screen. To determine these equations, the user fixates on a sequence of calibration points shown on the computer display as either a dot or a picture of some type. Calibration is essential to achieving good accuracy with the system. No matter how well the system determines the pupil and glint centers, if the regression equations are faulty, then the accuracy will be poor. The current calibration algorithm sets up calibration points at fixed angles and radii from the origin of the monitor's coordinate system. The resulting configuration is a web of points expanding from just below the bottom center of the monitor's screen. Each radial line, a line of calibration points at a fixed angle from the monitor origin, has its own linear regression for determining the equation used to map a radius between the pupil and glint center to a radius on the screen. Each arc of constant radius has its own linear regression for determining the equation used to map an angle between the pupil and glint center to an angle on the screen. These resulting equations allow the system to determine where the user is looking on the screen with a high degree of accuracy. FIG. 6 shows the current six point calibration method; these points grant the typical user an accuracy of 5 mm or better at a standard viewing distance from the display.
The calibration procedure consists of the user fixating on each calibration point as the system shows the points one at a time. While the user fixates on the point, the software snaps images and identifies glint-pupil displacements using the methods described above to determine the glint and pupil centers. Note that these displacements are corrected by the glint-pupil offset found during the first stage of threshold setting. The system tries to obtain ten valid displacements for each calibration point. The software then computes the median value for the ten samples and discards any samples that are too far away from the median; if the computed radius for a sample is more than one pixel from the median or if the computed angle for a sample is more than two degrees from the median, then the sample is considered erroneous. An average glint-pupil displacement is then determined by averaging the displacements of the remaining samples. If five or more samples are thrown out or ten valid samples cannot be obtained, then the system attempts to recalibrate the point. If the point fails calibration three times, then the software moves on to the next point and removes the calibration point from the linear regression algorithm.
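The median-based rejection of samples at a single calibration point may be sketched as follows. This is a minimal illustration, assuming each sample is a (radius in pixels, angle in degrees) pair; the function name and parameter names are hypothetical, and angle wrap-around at 360 degrees is ignored for simplicity.

```python
from statistics import mean, median

def average_displacement(samples, max_radius_dev=1.0, max_angle_dev=2.0,
                         max_discards=4):
    """Average the samples that lie close to the median.

    samples is a list of (radius_px, angle_deg) pairs. Returns the averaged
    (radius, angle), or None when the point should be recalibrated.
    """
    if len(samples) < 10:
        return None  # ten valid samples could not be obtained
    med_r = median([r for r, _ in samples])
    med_a = median([a for _, a in samples])
    kept = [(r, a) for r, a in samples
            if abs(r - med_r) <= max_radius_dev
            and abs(a - med_a) <= max_angle_dev]
    if len(samples) - len(kept) > max_discards:
        return None  # five or more samples were thrown out
    return mean([r for r, _ in kept]), mean([a for _, a in kept])

good = [(10.0, 30.0)] * 8 + [(10.4, 31.0), (14.0, 30.0)]
result = average_displacement(good)   # the (14.0, 30.0) sample is discarded
```

A caller would retry the point when None is returned and, after three failures, leave the point out of the regression.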
As described above, the system determines a sequence of equations to map a particular glint-pupil vector to the screen. Once all calibration points have been completed, the system uses linear regression to calculate these equations. The system tries to find the best-fit line through the points corresponding to the different radii and angles, that is, the line that maximizes the correlation coefficient. If this coefficient is too low or only two points went into the regression (a linear regression can always determine a line through two points), then the system will signal to the user that some of the points may need recalibrating if they wish to obtain better accuracy. The software has a method that allows the user to recalibrate only the points that caused the regressions to fail.
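One way to picture the regression check is the following sketch, which assumes an ordinary least-squares fit and a hypothetical minimum correlation coefficient of 0.95; neither the fitting method nor the cutoff value is specified in this description.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit ys ~ slope * xs + intercept; returns (slope, intercept, r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Correlation coefficient; a constant y is treated as a perfect fit.
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r

def needs_recalibration(xs, ys, min_r=0.95):
    """Flag a regression built on too few points or with a weak correlation."""
    if len(xs) < 3:
        return True  # a line through two points always fits exactly
    _, _, r = fit_line(xs, ys)
    return abs(r) < min_r

slope, intercept, r = fit_line([1, 2, 3], [2, 4, 6])
```

Here xs would hold eye-side radii (or angles) and ys the corresponding screen-side values for one radial line (or one arc).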
With calibration completed, the user may then enable the system through the HQ interface. If mouse emulation is enabled, then the mouse cursor will follow where the user is looking.
To determine the user's point of regard on the screen and to effectively control the cursor, the software uses the algorithm shown in FIG. 7. HQ allows the user to specify the sampling rate--the frequency at which the system acquires an image and attempts to determine where the user is looking. A higher sampling rate burdens the system more, but delivers more data to the operating system. Even with the maximum sampling rate of 15 samples per second, effective control over the computer is possible with little degradation of system performance; however, when using a processor intensive application like a video game, the sampling rate might need to be reduced to improve performance. When the system is enabled, a timer starts that signals the system to acquire images at a frequency corresponding to the sampling rate. Once the image has been mapped into memory, the software uses the routines described above to determine the bounding rectangle for both the glint and the pupil. To speed up the search process, the system has two searching modes--local and global search. A global search attempts to find the glint at any region in the camera image. A local search centers the searching region on the previously determined valid glint; in essence, the system is assuming the user is sitting in the same position. Local searching greatly reduces the processing that must occur for the system to identify valid features, as a user's position typically does not vary much between successive frames.
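The difference between the two searching modes can be illustrated by how the search rectangle is chosen. In this sketch the function name and the half-width of the local window are assumptions, and the rectangle is given as (left, top, right, bottom) in image pixels.

```python
def search_region(image_size, last_glint=None, half_width=40):
    """Return the rectangle to search for the glint.

    With no previously valid glint the whole image is searched (global search).
    Otherwise the region is centered on the last valid glint and clipped to
    the image bounds (local search).
    """
    width, height = image_size
    if last_glint is None:
        return (0, 0, width, height)
    gx, gy = last_glint
    left = max(0, gx - half_width)
    top = max(0, gy - half_width)
    right = min(width, gx + half_width)
    bottom = min(height, gy + half_width)
    return (left, top, right, bottom)

global_region = search_region((640, 480))
local_region = search_region((640, 480), last_glint=(20, 470))
```

The local rectangle covers far fewer pixels than the full image, which is where the reduction in processing comes from.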
In the event that a glint cannot be found, the system dynamically adjusts the glint threshold in an attempt to find the correct glint. If a filling error occurred, such as the system believing the glint was at the edge of the camera image, then the system simply performs the glint error handling routine and jumps to a global search. If no seed was found, then the search region was exhausted, so it is time to adjust the glint threshold. If the bound on the bad glint resulted in a glint that was too big, then the threshold is increased by 5 pixel intensity units (pixel intensities range from 0 to 255). If the glint was too small, then the threshold is decreased. If neither of these conditions arose, then the system checks the search mode. If a local search is being conducted, then the threshold is decreased. If this occurs too many times, then the search mode is changed to global search. If a valid glint cannot be found globally, then the system simply waits. If three frames pass with no valid glint, then the threshold is reduced. If the threshold is reduced below the minimum threshold value for the glint, then tracking fails for the image, and the thresholds are reset.
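The size-based part of the threshold adjustment may be sketched as below. Only the increase of 5 intensity units is stated above; the equal-sized decrease, the size limits, and the function name are assumptions, and the frame counters and search-mode changes are left out.

```python
def adjust_glint_threshold(threshold, glint_size, min_size, max_size, step=5):
    """Return a new glint threshold after a rejected glint candidate.

    glint_size, min_size and max_size are in pixels. The result is kept
    inside the 0 to 255 pixel intensity range.
    """
    if glint_size > max_size:
        threshold += step   # glint too big: raise the threshold
    elif glint_size < min_size:
        threshold -= step   # glint too small: lower the threshold (assumed step)
    return max(0, min(255, threshold))

raised = adjust_glint_threshold(200, glint_size=50, min_size=4, max_size=30)
lowered = adjust_glint_threshold(200, glint_size=2, min_size=4, max_size=30)
```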
If a pupil could not be found, then the system jumps to a global search mode and performs the glint error handling routine and tries to find a new glint.
Once both features are found, the system performs a few additional checks before calculating gaze position. First, the software checks the glint intensity relative to the pupil intensity. If this ratio is too small (as compared to the minimum set in threshold setting), then the detected glint is most likely part of the pupil, so the glint threshold is increased and the system attempts to find a new glint. Next, the system checks that the glint center is not above the pupil center. During operation of a computer, the glint center will never be above the pupil center, so if it is, then an erroneous glint was found, so the glint error handling routine occurs, a global search for the glint is initiated, and the threshold is reduced. Lastly, the system checks the counter for the number of pixels in the pupil greater than the glint's threshold. If this is too high, then the glint is erroneous, so the same steps occur as for the glint center being above the pupil center.
If all features are found correctly, then the glint threshold is readjusted to be the current glint's intensity value. The x-y displacement is now calculated, and these values are adjusted by the glint-pupil offset found in threshold setting.
The system then applies a smoothing algorithm to the displacement. This has the advantage of removing some of the noise inherent in the calculation of the glint and pupil bounding rectangles. The software computes a final displacement that is the average of the previous n displacements found, where n is the number of displacements to average over as set by the user in the HQ control interface. The higher the value of n, the less jitter occurs in the data as the user fixates, but the longer the system takes to update the cursor position to reflect a change in the user's dwell area. The primary advantage of this smoothing is better control over the cursor.
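Averaging over the previous n displacements amounts to a moving average, sketched here in Python. The class name is hypothetical, and the sketch assumes that fewer than n samples are simply averaged as they are until the history fills.

```python
from collections import deque

class DisplacementSmoother:
    """Moving average over the last n glint-pupil displacements."""

    def __init__(self, n):
        # A bounded deque drops the oldest displacement automatically.
        self.history = deque(maxlen=n)

    def update(self, dx, dy):
        """Add one displacement and return the smoothed (dx, dy)."""
        self.history.append((dx, dy))
        count = len(self.history)
        avg_x = sum(x for x, _ in self.history) / count
        avg_y = sum(y for _, y in self.history) / count
        return avg_x, avg_y

smoother = DisplacementSmoother(n=3)
outputs = [smoother.update(dx, dy) for dx, dy in [(3, 0), (6, 3), (9, 6), (12, 9)]]
```

With n = 3, the fourth output averages only the last three inputs, which shows both the reduced jitter and the lag that a larger n introduces.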
After the displacement has been smoothed, the system uses the regression equations to determine the user's point of regard as described in FIG. 8. The x-y displacement is converted to polar coordinates. The system first uses an angle regression based on all calibration points to get an approximation for the angle at which the user is looking. The system then uses this angle to determine which radial regression equations it should use to calculate the radius of the user's gaze on the screen from the monitor origin. The system finds the regression equations for the angles just above and below the angle previously determined. It then calculates the radius it thinks the user is looking at for both these equations. The system then interpolates between the two results, placing a higher weight on whichever equation is at the closest angle to the angle previously determined. The system then uses this radius to better determine the angle at which the user is looking by using the same method described above except the system is checking angular regressions now based on a fixed radius. Interpolation again occurs, and the system repeats the above sequence of determining the radius and angle to better approximate the user's point of regard. The system then converts the radius and angle of where the user is looking to Cartesian coordinates on the monitor.
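A single radius step of this procedure can be sketched as follows. The sketch assumes each radial regression is stored as a (line angle in degrees, slope, intercept) triple and that the weighting between the two bracketing lines is linear in angle; the exact weighting is not specified above, and all names are hypothetical.

```python
import math

def to_polar(dx, dy):
    """Convert an x-y displacement to (radius, angle in degrees)."""
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))

def interpolate_radius(eye_radius, angle, lower, upper):
    """Blend the screen radii predicted by the two bracketing radial lines.

    lower and upper are (line_angle_deg, slope, intercept) triples for the
    radial regressions just below and just above the estimated angle.
    """
    (a0, m0, b0), (a1, m1, b1) = lower, upper
    r0 = m0 * eye_radius + b0
    r1 = m1 * eye_radius + b1
    # Weight is 0 at the lower line and 1 at the upper line, so the line
    # closer in angle contributes more to the result.
    w = (angle - a0) / (a1 - a0)
    return (1 - w) * r0 + w * r1

eye_radius, eye_angle = to_polar(3, 4)
screen_radius = interpolate_radius(5.0, 60, lower=(45, 10, 0), upper=(90, 20, 0))
```

The angle step works the same way with the roles of radius and angle exchanged, and the two steps are then repeated to refine the estimate.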
These Cartesian coordinates then undergo a smoothing routine. This routine throws out any data that would place the new gaze position more than a certain distance away from the previously determined gaze position. Essentially, the new gaze position is considered an erroneous point because it would be impossible for the user's gaze position to change by that much that fast. There is also a minimum movement check. If the new gaze position is only a small distance from the previous gaze position, then the mouse cursor's position will not be updated. This removes jitter in the cursor caused by noise in determining the glint-pupil displacement.
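The two distance checks can be sketched as a small filter. The function name and both distance limits, in screen pixels, are assumed values for illustration.

```python
import math

def filter_gaze(previous, candidate, max_jump=200.0, min_move=5.0):
    """Return the gaze position the cursor should use.

    The candidate is rejected when it jumps implausibly far from the previous
    position, and ignored when it moved too little to matter.
    """
    if previous is None:
        return candidate
    distance = math.hypot(candidate[0] - previous[0], candidate[1] - previous[1])
    if distance > max_jump or distance < min_move:
        return previous
    return candidate

kept = filter_gaze((100, 100), (150, 100))      # plausible move, accepted
jitter = filter_gaze((100, 100), (102, 101))    # too small, cursor stays put
outlier = filter_gaze((100, 100), (500, 100))   # too far, treated as erroneous
```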
Using this smoothing and the above algorithms, the user now has effective control over the placement of the cursor in the GUI. The user can look at a certain position on the screen and have the cursor reach that position to an accuracy of about 5 mm or better for the typical user.
The system uses dwell time to provide the user with access to various mouse clicking actions. When the user fixates (focuses at a point on the screen and keeps the mouse cursor stationary) for a predetermined amount of time on the computer display, a red rectangle appears, centered on the point of fixation. This rectangle begins collapsing in on itself. The rectangle serves as a visual cue to the user that if they keep fixating at that point, they will be asked to perform a mouse control action at the point. When the rectangle reaches the end of its collapse by becoming a point at the user's position of fixation, a window pops up near the center of the screen. The region around which the user was fixating appears magnified in this window as shown in FIG. 9. At the bottom of the window is an eye-gaze controlled button that closes the window if the user fixates on the button for a predefined length of time. The user then fixates in this window on the spot where they actually wish to perform a mouse action. A red rectangle again signals the user that continued fixation will cause another event to occur. If the rectangle reaches the end of its collapse, then a menu of six additional buttons appears in the window as shown in FIG. 10. Fixating on a button for a predetermined amount of time causes that button's action to be performed. The available actions include left mouse button drag, left mouse button single click, left mouse button double click, right mouse button drag, right mouse button single click, and right mouse button double click. A collapsing red rectangle again shows which button will be clicked if fixation continues. If the user is in the middle of a drag, then all buttons stop the drag and then perform their set action, with the exception of the drag start buttons, which only cease the dragging.
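Detecting that a fixation has lasted for the predetermined time can be sketched with a small timer. The class name, the dwell time, and the tolerance radius within which the gaze counts as stationary are assumptions for illustration.

```python
import math

class DwellTimer:
    """Report when the gaze has stayed near one point for long enough."""

    def __init__(self, dwell_seconds=1.0, tolerance_px=15.0):
        self.dwell_seconds = dwell_seconds
        self.tolerance_px = tolerance_px
        self.anchor = None   # where the current fixation started
        self.start = None    # when the current fixation started

    def update(self, position, timestamp):
        """Feed one gaze sample; return True once the dwell time has elapsed."""
        moved = (self.anchor is None or
                 math.hypot(position[0] - self.anchor[0],
                            position[1] - self.anchor[1]) > self.tolerance_px)
        if moved:
            # The gaze left the fixation area, so restart the timer here.
            self.anchor = position
            self.start = timestamp
            return False
        return timestamp - self.start >= self.dwell_seconds

timer = DwellTimer(dwell_seconds=1.0, tolerance_px=15.0)
events = [timer.update(p, t) for p, t in
          [((100, 100), 0.0), ((105, 100), 0.5), ((104, 102), 1.0), ((300, 300), 1.2)]]
```

The same timer could drive each dwell-activated element, such as the collapsing rectangle, the close button, and the menu buttons, each with its own dwell time.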
The fixation times for all events described are user customizable to suit different users' skill levels in operating the system, as are the size of the zoom window and the factor by which it magnifies the area on which the user initially fixates. The window size and the zoom level determine the amount of area on the screen that appears zoomed each time the zoom window opens.
The zooming feature is a recent addition to the invention and allows any accuracy concerns with the system to become moot. Even with an accuracy level of 2 to 3 millimeters, the user cannot consistently fixate on certain GUI control objects such as scroll bars and toolbar buttons when operating at a high resolution. The zooming feature provides a reliable means for activating a GUI control and accomplishing various tasks in the GUI environment. With this feature, individuals can effectively tap into the GUI using solely the eye.
The system is not limited to using the zooming feature and a menu for implementing click actions. If the user knows they are going to be only using certain actions, like left clicking, then the menu may be bypassed and different dwell times can activate the different clicking actions in the zoom window. Also, if the user is working in applications with large controls, as is the case in a children's game, then the zooming feature may be turned off, and dwell time activation of the different clicking actions can occur on the display.
When dwell time clicking is used, the collapsing rectangle goes through different phases to signal to the user the different actions that can be performed. The first stage is the collapsing red rectangle. If the dragging is enabled, then the rectangle will turn to a blue circle and continue its collapse. If the user looks away while the blue circle is shown, then they will start dragging what they were fixating on. If the user prolongs their fixation and allows the circle to reach the end of its collapse, then the system clicks once on where they were looking. Lastly, a green rectangle will appear and after a predetermined fixation time, the system will double click on where the user is fixating.
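The phases of dwell time clicking can be read as a mapping from fixation length to action. The sketch below is one interpretation only: the phase durations, the function name, and the behavior when dragging is disabled (no action before the collapse ends) are assumptions not stated in this description.

```python
def dwell_outcome(fixation_seconds, red=0.5, blue=0.5, green=1.0,
                  dragging_enabled=True):
    """Return the action that results when fixation ends after the given time.

    red and blue are the durations of the two collapse phases; green is the
    additional fixation time required for a double click.
    """
    click_time = red + blue  # moment the collapse completes
    if fixation_seconds < red:
        return "none"
    if fixation_seconds < click_time:
        # Looking away during the blue circle starts a drag.
        return "drag" if dragging_enabled else "none"
    if fixation_seconds < click_time + green:
        return "single_click"
    return "double_click"

outcomes = [dwell_outcome(t) for t in (0.2, 0.7, 1.2, 2.5)]
```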
The colors used throughout this description are arbitrary and serve no purpose save delineating the different modes to the user.
The system may use an alternative method for clicking instead of the dwell time. If the operator still has some use of their hands, then pressing a single key on the keyboard will make the system click where they are looking. This has the advantage of greatly speeding up the clicking process. The key on the keyboard works like a mouse button; pressing the key twice quickly performs a double click.
This variety in the clicking mechanisms allows the eye-gaze system to be optimized for each user--the user interfaces with the GUI in the most efficient method that suits their particular needs.
An eye-gaze direction detector and method has been disclosed that provides a means by which a human operator, including a paraplegic, can interface directly with a computer by eye gaze alone in an efficient, reliable, and flexible manner that is both very fast and has a high degree of screen resolution. All material set forth should be considered illustrative and not in a limiting sense. The invention is to be interpreted within the spirit and scope of the following claims.
This appendix contains an in-depth description of the functionality of the HQ control interface software and provides an introductory user's guide.
This appendix is considered part of the specification of the invention.
1. A system for eye-gaze direction detection, said system comprising
- an infrared light source
- means to project said light source upon the eye of a user of a system employing the eye-gaze direction system,
- a video camera means adapted to view the position of the glint-pupil relationships of the user's eye,
- a computing means adapted to assist in computing the eye position of the user so as to calibrate the system to the user's eye movements whereby the system may be used to control the interaction between a computer/monitor system and the user,
- wherein said video camera means has a frame grabber card associated therewith.
2. A system as in claim 1 wherein the card acquires the image and digitizes it into a sequence of pixels and maps the pixels into the system memory.
3. A system as in claim 2 wherein the system checks to see, once the image is acquired, if any glares are present in a predetermined sequence.
4. A system for eye-gaze direction detection, said system comprising
- an infrared light source
- means to project said light source upon the eye of a user of a system employing the eye-gaze direction system,
- a video camera means adapted to view the position of the glint-pupil relationships of a user's eye,
- a computing means adapted to assist in computing the eye position of the user so as to calibrate the system to the user's eye movements whereby the system may be used to control the interaction between a computer/monitor system and the user,
- and including mirrors positioned to bounce the infrared light source onto the user's eye and to bounce the reflection back to the video means.
5. The method of controlling a computer system which has a graphic display by eye movement alone, said method comprising
- fixating one's gaze upon a specific area of said display to produce a window which is a magnification of that specific area and which contains specific information,
- again fixating one's gaze upon the window to produce a menu of specific actions and
- fixating one's gaze upon that specific action that one desires the computer to initiate whereby one has interacted with the computer with only eye movement.
6. The method of claim 5 including the step of fixating one's gaze initially on an area of the monitor to turn the computer system on.
7. The method of claim 6 which includes the step of the user first setting thresholds and calibrating the system by making initial eye movements to allow the system to calibrate the equations used to map angles and radii between the pupil and glint centers of the user's eye to radii and angles on the screen.
8. The method of claim 5 including the step of
- physically activating a computer key to allow the user to drag items on the screen across the screen by fixating on that item and then dragging it while the computer key is depressed.
9. The method of claim 5 including the step of
- releasing said key to allow said item to stay where the user's gaze has left it.
10. A method of calibrating an eye-gaze computer control system which is directed solely by user eye gaze fixation upon the computer display as an interactive user control means, said method comprising
- directing an infra-red beam upon the user's eye to detect the glint-pupil relationships,
- viewing the glint-pupil relationships and establishing a grid therefrom on which to measure all future user eye movements in relationship to the display,
- and establishing, through angles and radii, lines of calibration points from which to calibrate all future user eye movements.
11. A method as in claim 10 wherein a regression equation is used during system operation to determine where the user is looking.
12. A method as in claim 10 including the initial step of
- causing the user to initiate said eye only control.
U.S. Patent Documents
5,861,940 | January 19, 1999 | Robinson et al.
International Classification: A61B 3/14