Method and apparatus for adaptive model network on image recognition
Techniques for forming, designing, generating or building up recognizers using recursive qualifications are described, where the recognizers can be used in any devices or systems with recognition capabilities, such as robotic vision systems, motion detections, artificial intelligence and driverless vehicles. Through respective and recursive observations on a set of actual data, recognizers are generated to reduce the inconsistencies among the observations to produce better recognition accuracies.
This application claims the benefits of U.S. Provisional Application No. 62/133,356, filed Mar. 14, 2015, and entitled “Adaptive Model Network on Image Recognition”, which is hereby incorporated by reference for all purposes.
BACKGROUND OF THE INVENTION

1. Field of the Invention
The present invention is related to the area of pattern recognition and more particularly, related to processes, systems, architectures and software products for building up recognizers using recursive qualifications, where the recognizers can be used in any devices or systems with vision capabilities, such as robotic vision systems, motion detections, artificial intelligence and driverless vehicles.
2. Description of Related Art
Pattern recognition is a branch of machine learning that focuses on the recognition of patterns and regularities in data, although it is in some cases considered to be nearly synonymous with machine learning. Pattern recognition systems are in many cases trained from labeled “training” data (supervised learning), but when no labeled data are available other algorithms can be used to discover previously unknown patterns (unsupervised learning).
In machine learning, pattern recognition is the assignment of a label to a given input value. An example of pattern recognition is classification, which attempts to assign each input value to one of a given set of classes (e.g., determine whether an object in an image is a human being or a structure). However, pattern recognition is a more general problem that encompasses other types of output as well. Other examples are regression, which assigns a real-valued output to each input; sequence labeling, which assigns a class to each member of a sequence of values (e.g., part of speech tagging, which assigns a part of speech to each word in an input sentence); and parsing, which assigns a parse tree to an input sentence, describing the syntactic structure of the sentence.
Pattern recognition has encountered many challenges over the past several decades. The major challenges are: how to find a good feature set that represents the data to provide good discriminative power; how to acquire a sufficient amount of oracle data (i.e., data with labels that indicate the true nature of the data) for a pattern recognition system to learn from; and how to make the oracle data representative of data in real applications so that what is learned on the oracle data can be applied to real applications.
Some of the major problems with the latest pattern recognition software or systems are that they rely on the assumption that a training dataset is representative of a testing dataset, which in practice is often not true, and that when new data deviates from a training set, an algorithm, if any adaptation features are implemented, relies heavily on domain-specific knowledge. For example, font-adaptive optical character recognition has to provision a classifier for each font so that all of the classifiers are applied simultaneously to the target and the best one is picked. This requires precise tuning of the workflow, which is specific to font adaptation and in general is difficult to reuse when designing adaptation algorithms for other domains.
Thus there is a great need for recognizers that can be formed, generated and built up quickly with high accuracy to reduce the inconsistencies among different models (observations) to produce better recognition accuracies.
SUMMARY OF INVENTION

This section is for the purpose of summarizing some aspects of the present invention and to briefly introduce some preferred embodiments. Simplifications or omissions may be made to avoid obscuring the purpose of the section. Such simplifications or omissions are not intended to limit the scope of the present invention.
In general, the present invention is related to processes, systems, architectures and software products for forming, designing, generating or building up recognizers using recursive qualifications, where the recognizers can be used in any devices or systems with recognition capabilities, such as robotic vision systems, motion detections, artificial intelligence and driverless vehicles. According to one aspect of the present invention, an image pattern recognition process, also referred to as an adaptive model network (AMN) herein, is designed to generate a set of image recognizer models or recognizers based on a set of input data (e.g., image data), select and combine a confident subset of the recognizers to interpret the image data, and output a proposed label therefor.
According to another aspect of the present invention, AMN is designed to combine existing image recognition techniques in a model network, and adapt the model network to reduce the inconsistencies among different models (observations) to produce better recognition accuracies. One of the major differences from a standard pattern recognition process is that AMN does not require a training set to be representative of a testing set (actual data set); rather it adapts itself to testing data by leveraging the intrinsic prior knowledge that a valid data set should get consistent interpretations over valid but different observations.
According to still another aspect of the present invention, depending on a defined resolution, each of the recognizers in the AMN is subsequently dividable in the sense that a recognizer can be represented in a tree structure with one node leading to multiple branches, each of which ends with a node. In other words, a recognizer may include a plurality of sub-recognizers, each of the sub-recognizers may include a plurality of next sub-recognizers, and each of the next sub-recognizers may include a plurality of further dividable sub-recognizers, as far as permitted by the defined resolution.
According to yet another aspect of the present invention, the AMN is designed to update the recognizers by recursively testing the recognizers, their respective sub-recognizers, next sub-recognizers and/or further dividable sub-recognizers. As a result, the AMN reduces inconsistencies on each recursion level, and outputs a result when a top level has a set of observations producing consistent interpretations on a target data set.
Various embodiments may be implemented as a method, a software product, a service or a part of a system. According to one embodiment, the present invention is a method for generating recognizers for pattern recognition. The method comprises: receiving in a computing device a set of initial recognizers, wherein the recognizers are generated from a set of training data not required to be representative of a set of actual data. Each of the recognizers is dividable to form a set of sub-recognizers and each of the sub-recognizers is further dividable to form a set of next sub-recognizers, until a predefined resolution on the recognizers is reached. The method further comprises: performing observations on the set of input data received in the computing device in accordance with the recognizers; and generating recursively and respectively subsequent observations with reduction of inconsistencies on each recursion level when one of the observations is determined uncertain, wherein a recursion stops when a top level has a set of observations producing consistent interpretations on the set of input data. Meanwhile, the recognizers are recursively and respectively updated by discarding one or more of the recognizers, the sub-recognizers or the further dividable sub-recognizers, and adding new recognizers, sub-recognizers or further dividable sub-recognizers generated based on the input data.
According to another embodiment, the present invention is a computing device for generating recognizers for pattern recognition. The computing device comprises: an input receiving a set of actual data, where the actual data is captured by a source (e.g., a camera); a memory for storing code; and a processor coupled to the memory and executing the code to perform operations of: loading a set of initial recognizers in the memory, wherein the recognizers are generated from a set of training data not required to be representative of the set of actual data, each of the recognizers is dividable to form a set of sub-recognizers, and each of the sub-recognizers is further dividable to form a set of next sub-recognizers until a predefined resolution on the recognizers is reached. The operations further include: generating observations on the set of input data received in the computing device in accordance with the recognizers; and generating recursively and respectively subsequent observations on the set of input data with reduction of inconsistencies on each recursion level, when one of the observations is uncertain, wherein a recursion stops when a top level has a set of observations producing consistent interpretations on the set of input data.
One of the objectives in the present invention is to provide a mechanism that adapts itself to testing data by leveraging the intrinsic prior knowledge that a valid data set should get consistent interpretations over valid but different observations.
Other objects, features, and advantages of the present invention will become apparent upon examining the following detailed description of an embodiment thereof, taken in conjunction with the attached drawings.
These and other features, aspects, and advantages of the present invention will become better understood with regard to the following description, appended claims, and accompanying drawings where:
The present invention is related to processes, systems, architectures and software products for forming, designing, generating or building up recognizers using recursive qualifications. In one perspective, a process, referred herein as adaptive model network (AMN), is designed to update the recognizers by recursively testing the recognizers, their respective sub-recognizers, next sub-recognizers and/or further dividable sub-recognizers, up to a defined resolution. As a result, the AMN reduces inconsistencies on each recursion level, and outputs a result when a top level has a set of observations producing consistent interpretations on a target data set.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will become obvious to those skilled in the art that the present invention may be practiced without these specific details. The description and representation herein are the common means used by those experienced or skilled in the art to most effectively convey the substance of their work to others skilled in the art. In other instances, well-known methods, procedures, components, and circuitry have not been described in detail to avoid unnecessarily obscuring aspects of the present invention.
Reference herein to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Further, the order of blocks in process flowcharts or diagrams representing one or more embodiments of the invention does not inherently indicate any particular order, nor does it imply any limitations on the invention.
Embodiments of the present invention are discussed herein with reference to
It is assumed that an object 106 is in a scene captured by the camera 104. The object 106 appears in an image 108. One of the objectives in the recognition system is to determine whether the object 106 is a structure or a human being (possibly crossing a street). To determine what the object is, the recognition system shall be equipped with a set of recognizers that not only interprets the image data correctly but also expands the already generated recognizers with one or more recognizers based on the provided image when there is a need. It is evident to those skilled in the art that the recognizers must be not only robust but also accurate to interpret the image correctly.
The input interface 128 includes one or more input mechanisms. A user may use an input mechanism to interact with the device 120 by entering a command to the microcontroller 122. Examples of the input mechanisms include a microphone or mic to receive an audio command and a keyboard (e.g., a displayed soft keyboard) to receive a click or text command. Another example of an input mechanism is a camera provided to generate images, where the image data from the images are used for subsequent processing with other module(s) or application(s) 127. In the context of the present invention, some of the image data are subsequently provided to the recognition system for interpretation.
The driver 130, coupled to the microcontroller 122, is provided to take instructions therefrom to drive a display screen 132. In one embodiment, the driver 130 is caused to drive the display screen 132 to display an image or images or play back a video. The network interface 134 is provided to allow the device 120 to communicate with other devices via a designated medium (e.g., a data network).
One of the objects, advantages and benefits of the present invention is to combine existing image recognition techniques in a model network, and adapt the model network to reduce the inconsistencies among different models (observations) to produce better recognition accuracies. According to one embodiment, the recognizers are generated or updated in a recursive manner with reduction of inconsistencies on each recursion level; the recursion stops when a top level has a set of observations producing consistent interpretations of the target.
As will be further described below, whenever one of the observations in state O is uncertain, state O goes to state Sk. State Sk is caused to go to state Sk_m when one of the observations in state Sk encounters some uncertainty (e.g., compared with a threshold). At each of the states, Sk or Sk_m, the recognizers are verified or updated by removing one and/or adding a new one. State Sk or Sk_m then returns to a previous state; as such, the state diagram 200 forms a recursive loop to fine-tune, update and generate the recognizers for recognition on a given set of data.
As shown in
These n different observations 302 produce n results i1, i2, . . . , in. Mathematically, they are often expressed in vectors. Ignoring the exact representation of the n results, these n results are coupled to a statistical operation (M) 402 as shown in
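As a minimal illustrative sketch (all names are hypothetical and the n results are modeled as scalars rather than vectors), the statistical operation M and the disagreement computation could look like the following, using the median as one possible central estimate:

```python
import statistics

def observe_all(observations, target):
    """Apply each of the n observations to the same target,
    producing the n results i1, i2, ..., in (scalars here)."""
    return [obs(target) for obs in observations]

def statistical_operation(results):
    """M: a central estimate of the n results; the median is one
    robust choice."""
    return statistics.median(results)

def disagreements(results, center):
    """Per-result disagreement: the distance of each result from
    the central estimate produced by M."""
    return [abs(r - center) for r in results]

def overall_disagreement(ds):
    """Average the per-result disagreements into a single overall
    disagreement value."""
    return sum(ds) / len(ds)
```

With results such as [1.0, 1.2, 0.9, 5.0], the median is 1.1 and the fourth result stands out with the largest disagreement, which is the kind of outlier the subsequent states act on.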
As shown in
At the same time, one or more new recognizers can be added into the library of the recognizers when new features or characteristics of the actual data are not recognized by any of the existing recognizers. These new recognizers may be used for observation. As a result, state Sk returns to state O, labeled as A2 in
According to one embodiment, a data transformation may be used at any recursion level to facilitate the observation with respect to one or more recognizers.
Referring to
In state Sk, the overall disagreements among its observations are also computed: dkc. If it is higher than a pre-set threshold, then its own reduction of inconsistency is triggered and drives the adaptation for Sk. Since Sk is a recursive process of O, it can expand one of its own observations (m) into a set of observations, reaching a state Sk_m.
If the dkc in Sk cannot be reduced any further by any means, the process returns to the state O. The return to the state O has two possible paths, based on the following condition: if dkc is higher than its threshold, the A1 operation is conducted, which removes the current Sk; otherwise Sk is kept. New observations are performed after returning from Sk.
It should be noted that the observations on each level may be applied in parallel on the same target with outliers selected. If an outlier deviates enough, more observations can be triggered so that more evidence can be obtained on whether there is a need to adopt the outlier or to ignore it.
The state transition graph shown in
At stage O, multiple observations {Ok, k in 1 . . . n} recognize the same target t and output a set of interpretations {ik, k in 1 . . . n}. Then, the interpretation set {ik, k in 1 . . . n} is sent to the invariant checker I to obtain per-observation disagreements {dk, k in 1 . . . n} and the overall disagreement, namely the inconsistency among {ik, k in 1 . . . n}: dc. I's outputs are fed into the selector W to select one interpretation from the set of interpretations {ik, k in 1 . . . n} as output iout. One implementation of I is to have a merging module M get the average interpretation of {ik, k in 1 . . . n}: c; then a set of disagreement detectors {Dk, k in 1 . . . n} compares each corresponding ik to c to get its distance dk; the set of per-observation disagreements {dk, k in 1 . . . n} is then fed into an averaging module P to get the overall disagreement dc.
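The pipeline from observations through the invariant checker I to the selector W can be sketched in a few lines (a hypothetical sketch: interpretations are modeled as scalars, M as the average, and W as picking the interpretation closest to the consensus):

```python
def state_O(observations, target):
    """One pass of state O: produce interpretations {ik}, run the
    invariant checker I (merging module M, disagreement detectors Dk,
    averaging module P), and let the selector W pick an output iout."""
    interps = [obs(target) for obs in observations]   # {ik}
    c = sum(interps) / len(interps)                   # M: average interpretation c
    ds = [abs(i - c) for i in interps]                # Dk: per-observation disagreements dk
    dc = sum(ds) / len(ds)                            # P: overall disagreement dc
    i_out = interps[ds.index(min(ds))]                # W: interpretation closest to consensus
    return i_out, ds, dc
```

If the returned dc exceeds the threshold, the network would enter state Sk for the observation with the largest dk.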
If dc is beyond a threshold that is pre-set or adjusted on-the-fly, the algorithm starts the process of reduction of inconsistency to drive down dc. It decides that some observation Ok is the next priority to dive into, so it expands that single observation into a set of observations with its own invariant checker Ik and selector Wk, the same structure as its parent O; correspondingly, the network enters the Sk state.
In Sk, the network adapts itself to drive down its own overall disagreement dkc (which belongs in the Ok module but is not shown in the graph due to space). If the disagreement is irreducible, it traverses back to state O by one of two alternative actions, A1 and A2. Both A1 and A2 add a new observation On+1 to the set of observations. The difference is that A1 also removes Ok (or Ok′) from the set of observations. The choice of A1 or A2 is determined by whether dkc is beyond a threshold that is pre-set or adjusted on-the-fly (A1 if yes, A2 otherwise).
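The A1/A2 traversal back to state O can be expressed as a small helper (hypothetical names; `d_kc` is the overall disagreement of Sk and `k` indexes the observation Ok):

```python
def return_to_O(observations, k, d_kc, threshold, new_observation):
    """Traverse from Sk back to state O. Both actions add a new
    observation On+1; A1 (taken when d_kc is beyond the threshold)
    additionally removes Ok from the set."""
    updated = list(observations)       # leave the original set untouched
    if d_kc > threshold:               # A1: Ok is irreducibly inconsistent
        del updated[k]
    updated.append(new_observation)    # A1 and A2 both recruit On+1
    return updated
```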
Now it is assumed that the transition traverses back to the state O. If the updated overall disagreement dc′ is still higher than the threshold, the transition continues to conduct reduction of inconsistency. One possible action is to dive into an observation Ol other than Ok to reach the state Sl (not shown in the graph), or to conduct the action A2 to expand the observation set (staying in state O) in the hope of reducing the overall disagreement.
On the other hand, Sk is a recursion of O, which means it can also pick one of its observations (say Ok_m) and expand it, reaching the state Sk_m. Note that Ok_m intentionally has its own internal structure: the target t is first processed by a pre-processing observation Ok_mt to produce a transformed target tm. Then another observation Ok (the one used in the top-level state O, or any observation) takes the intermediate target tm and classifies it to get an interpretation. The way to expand Ok_m is then very similar to what is done in state Sk, except that the pre-processing observation Ok_mt remains the same. As described above, observations can be chained together into a workflow to work together.
At any state (O, Sk or Sk_m), the process in the state diagram of
Back in state O, suppose that after all the operations the updated network results in an overall disagreement dc′ that is below its pre-set threshold; the top-level interpretations then reach the status of consistent interpretations, and therefore the output i′out can be accepted as the final output.
From a high-level perspective, the transition starts from an initial set of observations (referred to as the parent state) and checks the consistency of their interpretations. If they are consistent, the result is obtained and the transition exits; otherwise, every individual observation is checked. When needed, an observation is expanded into a set of child observations (with sub-recognizers) to check consistency and simultaneously adapt the set to maximize consistency, just as if it were the parent state. If the set of child observations cannot become consistent, the parent observation is removed from the parent state. After a parent observation's recursion completes, new parent observations are recruited to the parent state, and the process returns to the consistency check, where a new cycle starts.
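The parent/child cycle described above can be sketched end to end (a hedged sketch under stated assumptions: interpretations are scalars, `expand` and `recruit` are hypothetical callbacks standing in for sub-recognizer generation and recruitment, and an explicit depth bound plays the role of the predefined resolution):

```python
def amn_recognize(observations, target, threshold, expand, recruit,
                  depth=0, max_depth=2, max_rounds=3):
    """Sketch of the parent/child recursion: check consistency, dive
    into the most disagreeing observation, remove it if its children
    stay inconsistent (A1), and recruit a new observation (A1/A2)."""
    for _ in range(max_rounds):                        # bounded adaptation cycles
        interps = [obs(target) for obs in observations]
        c = sum(interps) / len(interps)                # consensus interpretation
        ds = [abs(i - c) for i in interps]             # per-observation disagreements
        dc = sum(ds) / len(ds)                         # overall disagreement
        if dc <= threshold:                            # consistent: accept the output
            return interps[ds.index(min(ds))]
        if depth >= max_depth:                         # resolution limit reached
            break
        k = ds.index(max(ds))                          # dive into the most disagreeing one
        child = amn_recognize(expand(observations[k]), target, threshold,
                              expand, recruit, depth + 1, max_depth, max_rounds)
        if child is None:                              # children stayed inconsistent: A1
            observations = observations[:k] + observations[k + 1:]
        observations = observations + [recruit()]      # A1 and A2 both recruit anew
    return None                                        # disagreement irreducible here
```

In this sketch a level whose disagreement cannot be driven below the threshold reports failure upward, which is what triggers the parent to remove the offending observation before recruiting a replacement.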
In AMN, a target is “a matter” or “stuff” that has independent feature representations under different observations (possibly the same feature extraction on different derivative images or data), with each observation having its own feature space and being able to produce an independent interpretation of the target. The interpretations from different observations of a target should form a consensus for the target to qualify as an AMN target.
An AMN target is different from random noise in that it has invariant properties carrying through different (valid) observations. For example, one can precisely recognize a character image no matter how challenging the task is (even in a CAPTCHA) because the character has invariant properties, which can be reliably captured by human visual perception that presumably employs a flexible set of “observations.” However, one cannot precisely define, identify or recognize an exact shape of a cloud because it changes from one moment to the next, with no stable shape. Therefore a character is an AMN target, but an exact shape of a cloud is not.
To facilitate better understanding of the present invention, it is deemed necessary to provide a set of Questions & Answers. Without any inherent limitations, the answers are provided according to only one embodiment of the present invention.
Question 1: can an exact shape of a cloud be defined as an AMN target? (An exact shape at high resolution, not something vague such as a “mushroom-like shape”.) Answer: no. A cloud's exact shape changes every second; one cannot find a stable exact shape over time that can be taken as an “invariant” property, not to mention that every cloud image has its own exact shape. As a result, a cloud's exact shape fails to meet the AMN target requirement that there be different observations producing consistent interpretations for an invariant property.
Question 2: can a cloud image (with either a “this-is-a-cloud” or “this-is-not-a-cloud” label) be defined as an AMN target? Answer: yes. A cloud image, if it is truly a cloud, has consistent properties (for example, its color is white) over different observations, so consistent classifications that it is a cloud can be obtained. Therefore, it can be defined as an AMN target.
Question 3: can a sample in a pattern recognition task be defined as an AMN target? Answer: yes. In all pattern recognition tasks, a sample is associated with an oracle label by definition. Therefore, there should exist some ideal (but different) classifiers that can consistently output its true label; therefore, it can be defined as an AMN target. Most real-world physical objects (doors, roads, cars, etc.) qualify as AMN targets due to the fact that different perspectives for perceiving them come to the same interpretation. If each way of perception can be simulated by software (an observation), then the physical object is an AMN target.
AMN is a recognition process that aims to find the invariant interpretations over a set of observations on an AMN target. It not only finds interpretations for a target as traditional pattern recognition does; it also makes sure those interpretations are consistent across different observations. If the algorithm cannot identify an AMN target in the input, it adapts the set of observations until it can find one.
To recognize an AMN target, the prior knowledge is utilized that an AMN target should manifest its identity consistently over a sequence of valid observations, which is similar to WBR, where all characters in a book abide by image homogeneity constraints. Therefore, if the current set of observations does not satisfy this prior knowledge, the set of observations is adjusted until the prior knowledge is satisfied.
The invention is preferably implemented in software, but can also be implemented in hardware or a combination of hardware and software. The invention can also be embodied as computer readable code on a computer readable medium. The computer readable medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of the computer readable medium include read-only memory, random-access memory, CD-ROMs, DVDs, magnetic tape, optical data storage devices, and carrier waves. The computer readable medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
The processes, sequences or steps and features discussed above are related to each other and each is believed independently novel in the art. The disclosed processes, sequences or steps and features may be performed alone or in any combination to provide a novel and unobvious system or a portion of a system. It should be understood that the processes, sequences or steps and features in combination yield an equally independently novel combination as well, even if combined in their broadest sense, i.e., with less than the specific manner in which each of the processes, sequences or steps and features has been reduced to practice.
The foregoing description of embodiments is illustrative of various aspects/embodiments of the present invention. Various modifications to the present invention can be made to the preferred embodiments by those skilled in the art without departing from the true spirit and scope of the invention as defined by the appended claims. Accordingly, the scope of the present invention is defined by the appended claims rather than the foregoing description of embodiments.
Claims
1. A method for generating recognizers for pattern recognition, the method comprising:
- receiving in a computing device a set of initial recognizers, wherein the recognizers are generated from a set of training data not required to be representative of a set of actual data, each of the recognizers is dividable as a set of sub-recognizers and each of the sub-recognizers is further dividable as a set of next sub-recognizers till a predefined resolution on the recognizers;
- performing observations on the set of input data received in the computing device in accordance with the recognizers; and
- performing recursively and respectively subsequent observations with reduction of inconsistencies on each recursion level, when one of the observations is determined uncertain, wherein a recursion stops when a top level has a set of observations producing consistent interpretations on the set of input data.
2. The method as recited in claim 1, wherein the recognizers are recursively and respectively updated by discarding one or more of the recognizers, the sub-recognizers or the next sub-recognizers, and adding new recognizers, sub-recognizers or next sub-recognizers generated based on the input data.
3. The method as recited in claim 2, wherein the input data is obtained from actual data captured by a source, wherein the recognizers are used in the observation to determine a pattern from the actual data.
4. The method as recited in claim 3, wherein the source is an image capturing device.
5. The method as recited in claim 1, wherein the recognizers are generated to reduce the inconsistencies among the observations to produce better recognition accuracies.
6. The method as recited in claim 5, wherein said generating recursively and respectively subsequent observations with reduction of inconsistencies on each recursion level comprises: transforming the input data into a transformed data set to carry out an observation.
7. The method as recited in claim 4, further comprising:
- determining a statistic measurement among the results from the observations;
- performing a logical operation on the results from the observations with respect to the statistic measurement to produce respective disagreements from the observations; and
- determining an overall disagreement for comparisons with the respective disagreements.
8. The method as recited in claim 7, wherein the statistic measurement is to determine a median among the results from the observations.
9. The method as recited in claim 8, wherein the logical operation is based on an XOR operator.
10. A computing device for generating recognizers for pattern recognition, the computing device comprising:
- an input receiving a set of actual data;
- a memory for storing code;
- a processor, coupled to the memory, executing the code to cause the computing device to perform operations of: loading a set of recognizers in the memory, wherein the recognizers are generated from a set of training data not required to be representative of the actual data, each of the recognizers representing one or more features that are supposed to describe the actual data, wherein each of the recognizers is represented in a tree structure with one node leading to multiple branches, each of the branches ends with a node; performing observations on the set of input data received in the computing device in accordance with the recognizers to produce results from the observations; when one of the observations is uncertain: performing recursively and respectively subsequent observations with reduction of inconsistencies on each recursion level, wherein a recursion stops when a top level has a set of observations producing consistent interpretations on the set of input data.
11. The computing device as recited in claim 10, wherein the recognizers are recursively and respectively updated by discarding one or more of the recognizers, sub-recognizers or next sub-recognizers, and adding new recognizers, sub-recognizers or next sub-recognizers generated based on the input data.
12. The computing device as recited in claim 11, wherein the input data is obtained from actual data captured by a source, wherein the recognizers are used in the observation to determine a pattern from the actual data.
13. The computing device as recited in claim 12, wherein the source is an image capturing device.
14. The computing device as recited in claim 10, wherein the recognizers are generated to reduce the inconsistencies among the observations to produce better recognition accuracies.
15. The computing device as recited in claim 14, wherein said generating recursively and respectively subsequent observations with reduction of inconsistencies on each recursion level comprises: transforming the input data into a transformed data set to carry out an observation.
16. The computing device as recited in claim 13, further comprising:
- determining a statistic measurement among the results from the observations;
- performing a logical operation on the results from the observations with respect to the statistic measurement to produce respective disagreements from the observations; and
- determining an overall disagreement for comparisons with the respective disagreements.
17. The computing device as recited in claim 16, wherein the statistic measurement is to determine a median among the results from the observations.
18. The computing device as recited in claim 17, wherein the logical operation is based on an XOR operator.
Type: Application
Filed: Mar 14, 2016
Publication Date: Sep 15, 2016
Inventor: Pingping Xiu (Santa Clara, CA)
Application Number: 15/069,905