Hiroaki Sakoe has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
Abstract: For use in recognizing an input pattern consisting of feature vectors positioned at a first, . . . , an i-th, . . . , and an I-th pattern time instant along a pattern time axis, a connected word recognition system comprises first through N-th neural networks B(1) to B(N) which are assigned to reference words identified by a first, . . . , an n-th, . . . , and an N-th word identifier and are arranged along a signal time axis divisible into a first, . . . , a j-th, . . . , and a J-th signal time instant. The n-th word identifier n corresponds to consecutive ones of the pattern time instants by a first function n(i). The time axes are related to each other by a second function j(i).
Abstract: A pattern matching system including a calculating arrangement for calculating a current value of cumulative distances between an input pattern and a reference pattern of a current pattern number at a current input pattern time instant and a current reference pattern time instant by using a dynamic programming algorithm which decreases workload because the current value is selected as a selected distance only when the current value is not greater than a threshold value which is determined at the current input pattern time instant. Work areas or memories are used as a result memory device accessed by a combination of the current input and reference pattern time instants for memorizing or storing the selected distance, the current pattern number, and the current reference pattern time instant, which are for later use in calculating another of the cumulative distances. Preferably, the threshold value should monotonically increase with time. More preferably, two of the work areas are used alternately.
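As an illustration only, the thresholded selection described in this abstract can be sketched in Python. The function name, the absolute-difference local distance, the three-predecessor recurrence, and the use of scalar features are assumptions made for the sketch; the abstract does not specify them.

```python
import math

def pruned_dp_distance(inp, ref, thresholds):
    """Cumulative-distance DP that keeps a value only when it does not
    exceed the threshold for the current input time instant.

    inp, ref   : sequences of scalar features (assumed for simplicity)
    thresholds : one threshold per input time instant
    """
    inf = math.inf
    J = len(ref)
    prev = [inf] * J  # work area holding the previous input instant
    cur = [inf] * J   # work area being filled for the current instant
    for i, x in enumerate(inp):
        for j, y in enumerate(ref):
            if i == 0:
                best = 0.0 if j == 0 else inf
            else:
                # Assumed recurrence: predecessors j, j-1, j-2 at instant i-1.
                best = min(prev[max(j - 2, 0): j + 1])
            g = best + abs(x - y)  # assumed local distance
            # Pruning: discard values greater than the threshold.
            cur[j] = g if g <= thresholds[i] else inf
        prev, cur = cur, prev  # the two work areas are used alternately
    return prev[J - 1]
```

A pruned cell is stored as infinity, so later instants skip it without any extra bookkeeping. If every path is pruned, the function returns infinity.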
Abstract: In a neural network, input neuron units of an input layer are grouped into first through J-th input layer frames, where J represents a predetermined natural number. Intermediate neuron units of an intermediate layer are grouped into first through J-th intermediate layer frames. An output layer comprises an output neuron unit. Each intermediate neuron unit of a j-th intermediate layer frame is connected to the input neuron units of j'-th input layer frames, where j is variable between 1 and J and j' represents at least two consecutive integers, one of which is equal to j and at least one other of which is less than j. Each output neuron unit is connected to the intermediate neuron units of the intermediate layer. For recognition of an input pattern represented by a time sequence of feature vectors, each consisting of K vector components, where K represents a predetermined positive integer, each input layer frame consists of K input neuron units.
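The frame-wise connectivity in this abstract can be pictured with a small Python sketch. The helper name `frame_connections`, the `span` parameter, and the clipping at the first frame are illustrative assumptions, not details taken from the patent.

```python
def frame_connections(J, span=2):
    """Map each intermediate frame j (1-based) to the input frames j' it
    connects to: frame j itself plus the span-1 frames just before it.

    Frames before the first one do not exist, so the list is clipped
    at 1 (an assumed boundary rule).
    """
    return {
        j: [jp for jp in range(j - span + 1, j + 1) if jp >= 1]
        for j in range(1, J + 1)
    }

# With K input units per frame, a unit in intermediate frame j would
# carry K * len(frame_connections(J)[j]) input weights.
```

For example, `frame_connections(4)` gives `{1: [1], 2: [1, 2], 3: [2, 3], 4: [3, 4]}`.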
Abstract: A pattern matching apparatus for comparing an input pattern of features with a reference pattern. Address information is stored in the reference pattern along with features, so that branching in a work memory which stores cumulative distances between the input and reference patterns may be effected. In this manner, memory requirements for storing reference patterns are reduced, and the number of required distance calculations also is reduced.
Abstract: A voice verification system in which multiple generic reference patterns are obtained by speaking the password in a number of different ways and in which a speaker-specific reference pattern is generated by the speaker undergoing registration. A subset of the generic reference patterns is selected having the greatest similarity to the registered speaker's pattern. During verification, the speaker's identity is verified if the dissimilarity between the input pattern and the registered speaker's reference pattern is both less than any dissimilarity between the input pattern and the selected generic reference patterns and also less than a threshold value.
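The selection and decision rules stated in this abstract reduce to a few comparisons. The Python sketch below assumes the dissimilarities have already been computed elsewhere; the function names and the parameter `k` are hypothetical.

```python
def select_cohort(dissim_to_registered, k):
    """Return the indices of the k generic reference patterns that are
    most similar (least dissimilar) to the registered speaker's pattern."""
    order = sorted(range(len(dissim_to_registered)),
                   key=lambda idx: dissim_to_registered[idx])
    return order[:k]

def verify(d_registered, d_cohort, threshold):
    """Accept only if the dissimilarity to the registered pattern is below
    the threshold and below every dissimilarity to the selected generic
    patterns."""
    return d_registered < threshold and all(d_registered < d for d in d_cohort)
```

For example, `select_cohort([5.0, 1.0, 3.0], 2)` returns `[1, 2]`, and `verify(1.0, [2.0, 3.0], 1.5)` returns `True`.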
Abstract: For improved pattern recognition, the reference pattern feature sequence contains control parameters (operators) which provide branching and/or omission of those portions of words which may be non-standard due to speaker or dialect deformations. A pattern matching apparatus comprises a stack controller with two PUSH/POP stacks for addressing reference patterns to be used for correlations against the detected pattern information. The reference patterns may contain control characters indicating alternative reference pattern segments or segments which may be omitted.
Abstract: A continuous speech recognition system determines the similarity between input patterns and reference patterns over time such that similarities between previously spoken speech patterns and reference patterns are determined while speech continues to be spoken. Degrees of dissimilarity at arbitrary reference pattern word times are determined asymptotically and are recorded. The minimum degree of dissimilarity is determined and the corresponding word is categorized. Recognition decisions are ultimately made in reverse chronological order.
Abstract: There is provided a voice recognition system comprising a standard pattern memory in which a voice pattern of a predetermined word is stored as a positive reference pattern and also voice patterns of words similar to but different from the first-mentioned word are stored as negative reference patterns, a pattern comparator for calculating dissimilarities of an input voice pattern with respect to the positive reference pattern and negative reference patterns, and a discriminator for providing a coincidence confirmation output signal when the dissimilarity with respect to the positive reference pattern is less than a predetermined threshold value and less than the dissimilarities with respect to the negative reference patterns while otherwise rejecting the result of recognition.
Abstract: A connected word recognition system, operable according to a DP algorithm and in compliance with a regular grammar, is put into operation in synchronism with successive specification of feature vectors of an input pattern. In an m-th period in which an m-th feature vector is specified, similarity measures are calculated (58, 59) between reference patterns representative of reference words and those fragmentary patterns of the input pattern, which start at several previous periods and end at the m-th period, for start and end states of the reference words. In the m-th period, an extremum of the similarity measures is found (66, 69, 86), together with a particular word and a particular pair of start and end states thereof, and stored (61-63). Moreover, a particular start period is selected (67, 86) and stored (64).
Abstract: A system for recognizing a continuously spoken word sequence with reference to preselected reference words with the problem of coarticulation removed, comprises a pattern memory for memorizing demi-word pair reference patterns consisting of a former and a latter reference pattern segment for each reference word and a word pair reference pattern segment for each permutation with repetition of two words selected from the preselected reference words. A recognition unit is operable as a finite-state automaton on concatenating the demi-word pair reference patterns so that no contradiction occurs at each interface of the reference patterns in every concatenation. It is possible to use the automaton in restricting the number of reference patterns in each concatenation either to an odd or an even positive integer.
Abstract: A pattern matching device generally comprises a first circuit (36) for calculating an elementary similarity measure between two feature vectors, one and the other selected from two feature vector sequences representative of two patterns, respectively, and a second circuit (37) for iteratively calculating a recurrence formula which defines a recurrence value by a sum of such an elementary similarity measure and an extremum of a prescribed number of previously calculated recurrence values. The recurrence formula eventually gives an overall similarity measure between the two patterns. The elementary similarity measure is now calculated by calculating a primitive similarity measure by a conventional circuit (15) and subtracting a predetermined value therefrom by a compensation circuit (31). Preferably, the second circuit (37) comprises circuitry (41, 42) for preventing the sum from overflowing outwardly of a preselected range.
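A minimal Python sketch of the recurrence described in this abstract follows. It assumes a scalar product as the primitive similarity measure, a maximum as the extremum, and three predecessors per cell; the names `delta`, `lo`, and `hi` are illustrative and do not come from the patent.

```python
def overall_similarity(a, b, delta, lo, hi):
    """Iterate g(i, j) = s(i, j) + max of previously calculated values,
    where s is a primitive similarity minus a predetermined value delta,
    and each sum is kept inside the preselected range [lo, hi]."""
    neg = float("-inf")  # marks cells that no path reaches
    prev = None
    for i, av in enumerate(a):
        cur = []
        for j, bv in enumerate(b):
            primitive = sum(x * y for x, y in zip(av, bv))  # assumed measure
            s = primitive - delta                           # compensation step
            if i == 0:
                best = 0.0 if j == 0 else neg
            else:
                best = max(prev[max(j - 2, 0): j + 1])
            if best == neg:
                cur.append(neg)  # unreachable cells stay unreachable
            else:
                cur.append(min(max(best + s, lo), hi))  # prevent overflow
        prev = cur
    return prev[-1]
```

Subtracting `delta` keeps the running sums centered near zero, so they can fit in fewer bits. The clamp handles the sums that still leave the range.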
Abstract: This pattern matching system features the calculation of a weighting factor based on the variable interval between feature vector samples. On carrying out matching of two information-compressed patterns, a weighted similarity measure calculator (64) calculates a weighted similarity measure by multiplying an intervector similarity measure between one feature vector from each of the respective patterns by a weighting factor calculated by the use of a variable interval between each feature vector and the immediately preceding one. A recurrence formula is calculated by the use of such weighted similarity measures instead of the intervector similarity measures. A predetermined value δ may be used in reducing the number of signal bits used for the recurrence formula. Preferably, a sum for the recurrence formula is restricted by two preselected values. Most preferably, an additional similarity measure is used for the recurrence formula.
Abstract: Speaker recognition is decided by a similarity measure (D) calculated from comparing selected feature vectors among an input speech signal sequence of feature vectors (A) and a selected sequence (B) of reference vectors selected from a plurality of pre-stored reference sequences. Prior to comparison of the input and reference vector sequences, the two sequences are time normalized to align corresponding feature vectors. A significant sound specifying signal (V) including a time sequence of elementary signals is generated in synchronism with one of the input and reference sequences and indicates which feature vectors in that one of the input and reference sequences are considered to represent significant sound. The similarity measure (D) is then calculated in accordance with the comparison of those feature vectors in the one sequence which are indicated by the significant sound specifying signal as representing significant sound and the corresponding feature vectors of the other sequence.
Abstract: Operation of a continuous speech recognition system operable according to the dynamic programming technique is controlled by a state transition diagram in compliance with which word sequences to be recognized by the system with reference to a predetermined number of reference words B^n are pronounced. The system comprises a state transition table accessed by the reference words B^n to successively produce particular states y in the diagram and previous states z for each particular state y. In cooperation with a recurrence value and an optimum parameter table, a matching unit determines a recurrence value T_y(m) and an optimum parameter set ZUN_y(m) according to: ##EQU1## where u and m represent an end and a start point of a fragmentary pattern A(u, m) of an input pattern A representative of a word sequence and D(u, m, n), a similarity measure between the fragmentary pattern A(u, m) and a reference word B.sup.
Abstract: A similarity calculator for calculating a set of similarity measures S(A(u, m), B^c) according to the technique of dynamic programming comprises an input pattern buffer for successively producing input pattern feature vectors of an input pattern A to be pattern matched with reference patterns B^c, an m-th input pattern feature vector a_m at a time. The similarity measure set is for a set of fragmentary patterns A(u, m) defined by a common end point m and start points u predetermined relative to the end point m. Scalar products (a_m · b_j^n) are calculated between the m-th input pattern feature vector and reference pattern feature vectors b_j^n of an n-th reference pattern B^n and stored in a scalar product buffer. Recurrence values are calculated according to a recurrence formula for each end point m, rather than for each fragmentary pattern set, and for each reference pattern B^n to provide a similarity measure subset S(A(u, m), B.sup.
Abstract: A continuous speech recognition system utilizes a format memory (14) which specifies a sequence of word sets and a plurality of words, or reference patterns, which may be included in each word set. The input pattern sequence is divided into all possible partial patterns having start points p and end points q, and each of these partial patterns is compared with all reference patterns to derive elementary similarity measures. The elementary similarity measures for each combination of a partial pattern and a permitted word in a word set under the specified format are then examined to determine the optimum input pattern segmentation points and corresponding sequence of reference patterns which will yield a maximum similarity result. The maximum similarity is represented by ##EQU1## where S(p(x-1), p(x),n(x)) indicates the degree of similarity between an input partial pattern having a start point p(x-1) and an end point p(x) and a reference word unit n(x) within a word set f.sub.
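The search over segmentation points and permitted words can be sketched as a dynamic program. The formula itself is not reproduced in this listing, so the Python sketch assumes the overall similarity is the sum of the elementary measures S(p(x-1), p(x), n(x)). It also assumes the input has at least as many points as there are word sets. The function name and the callable `sim` are hypothetical.

```python
def best_segmentation(length, word_sets, sim):
    """Pick segmentation points and one word per word set so that the
    summed elementary similarity is maximal (additive form assumed).

    length    : number of input frames; points run from 0 to length
    word_sets : list of word lists, one per position in the format
    sim       : callable sim(p, q, word) giving the elementary similarity
    """
    neg = float("-inf")
    X = len(word_sets)
    score = [[neg] * (length + 1) for _ in range(X + 1)]
    back = [[None] * (length + 1) for _ in range(X + 1)]
    score[0][0] = 0.0
    for x, words in enumerate(word_sets, start=1):
        for q in range(1, length + 1):
            for p in range(q):
                if score[x - 1][p] == neg:
                    continue  # no valid segmentation ends at p
                for n in words:
                    cand = score[x - 1][p] + sim(p, q, n)
                    if cand > score[x][q]:
                        score[x][q] = cand
                        back[x][q] = (p, n)
    # Trace the stored choices back from the end of the input.
    segments, q = [], length
    for x in range(X, 0, -1):
        p, n = back[x][q]
        segments.append((p, q, n))
        q = p
    return score[X][length], segments[::-1]
```

The returned list holds one `(start, end, word)` triple per word set, in spoken order.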
Abstract: In a pattern recognition device according to pattern matching, one or more specific dimensions of vector components are memorized for each reference pattern feature vector sequence in a reference pattern memory for the reference pattern feature vector sequences. A warping function for time-normalizing input pattern feature vectors of a sequence and the vectors of each reference pattern feature vector sequence is determined so as to minimize the difference between a pattern represented by the specific vector components of the specific dimension or dimensions and another pattern represented by the vector components corresponding in the input pattern feature vector sequence to the specific reference pattern feature vector components as regards the dimensions of a space in which each input or reference pattern feature vector is defined. The input pattern feature vector sequence and each reference pattern feature vector sequence are subjected to nonlinear pattern matching with reference to the warping function.
Abstract: In a speech recognition system which time-normalizes (i.e., aligns by time-warping) each pre-stored reference pattern of feature vectors b_j before comparison with the input signal pattern vectors a_i, an improved time-alignment technique requires less calculation by selecting "standard" vectors v_m(j) which approximate the reference vectors, thereby simplifying the derivation of the time-warp mapping function j=j(i).
Abstract: A speech recognition system adaptable to noisy environments is disclosed. The system includes a recognition unit for recognizing input speech signals and a noise measuring unit for measuring the intensity of ambient noises. The system also includes a rejection unit responsive to a rejection standard controlled by the intensity of the measured noise for rejecting the recognition results given from the recognition unit when the rejection standard is exceeded.
Abstract: In a speech recognition system of the type including a recognition unit responsive to a voice input and a conditioning input for recognizing the voice input to produce a recognition output, a start signal is produced whenever a voice input exceeds a threshold level and a pause interval detection signal is produced whenever a voice input falls below a threshold level. An output timing signal is produced when the detection signal lasts a preselected interval of time that may be either about 250 milliseconds or about 250 milliseconds plus a delay. The recognition output from the recognition unit produced in response to the detection signal is displayed in response also to the detection signal. The result is delivered to a utilization device in response to the output timing signal.
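The signal timing in this abstract can be illustrated with a small Python sketch. It assumes the voice input arrives as one level value per fixed-length frame and uses a single threshold for both start and pause detection. The event names, `frame_ms`, and `hold_ms` are hypothetical; only the value of about 250 milliseconds comes from the abstract.

```python
def endpoint_events(levels, threshold, frame_ms=10, hold_ms=250):
    """Emit (frame_index, event) pairs:
    'start'          when the level rises above the threshold,
    'pause_detected' when the level falls to or below it after speech,
    'output_timing'  once that pause has lasted at least hold_ms.
    """
    events = []
    voiced = False    # True while the level is above the threshold
    pause_ms = None   # length of the pause being timed, or None
    for t, level in enumerate(levels):
        if level > threshold:
            if not voiced:
                events.append((t, "start"))
            voiced, pause_ms = True, None  # speech cancels any pending pause
        else:
            if voiced:
                events.append((t, "pause_detected"))
                voiced, pause_ms = False, 0
            if pause_ms is not None:
                pause_ms += frame_ms
                if pause_ms >= hold_ms:
                    events.append((t, "output_timing"))
                    pause_ms = None  # fire once per pause
    return events
```

In this sketch a pause shorter than `hold_ms` produces no output timing event, so a brief gap inside an utterance does not release the result early.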