ROBUST AUTO-ASSOCIATIVE MEMORY WITH RECURRENT NEURAL NETWORK
Computer systems and computer-implemented methods recursively train a content-addressable auto-associative memory such that: (i) the content addressable auto-associative memory system is trained to produce an output pattern for each of the input examples; and (ii) a quantity of the learned parameters for the content-addressable auto-associative memory is equal to the number of input variables times a quantity that is independent of the number of input variables. The quantity of learned parameters for the content-addressable auto-associative memory system can be varied based on the number of input examples to be learned.
The present application claims priority to U.S. provisional patent application Ser. No. 62/564,754, entitled “Aggressive Development with Cooperative Generators,” filed Sep. 28, 2017, which is incorporated herein by reference in its entirety.
BACKGROUND
In information retrieval and in data science it is often necessary to retrieve an item from a large collection of data without knowing where the data item is stored. Often there is only an incomplete, partial description of the item being sought. The data collection may contain billions or even trillions of items, so an exhaustive search of the collection may be prohibitively expensive and time consuming. In some cases, it is not known whether an exact match of a desired item is in the collection. In some cases, it is merely necessary to find any item that matches a query to some degree.
A memory system that can retrieve a data item based on a description of its content rather than by knowing its location is called a “content addressable” memory. If, furthermore, the retrieval can be done in spite of partial or imperfect knowledge of the contents, the memory system is said to be a “robust content addressable” memory. For example, a Hopfield network may be used as a robust content addressable memory. In addition, such a Hopfield network is auto-associative. A variation on a Hopfield network, called a “Bi-directional associative memory (BAM)” is hetero-associative. However, the number of learned parameters (a measure of memory capacity) in a Hopfield network is just the number of undirected arcs, which is to say, the number of unordered pairs of input variables. That is, the number of learned parameters of a Hopfield network is roughly one-half the square of the number of input variables. For a BAM, the number of learned parameters is the product of the number of input variables times the number of output variables.
There is no flexibility in the number of learned parameters for these networks. In either case, the capacity of the memory is determined by the number of input and output variables rather than by the number of data items to be learned. In an auto-associative memory, the number of scalar values to be represented is the product of the number of data items times the number of input variables. Thus, with, say, 100 input variables, a Hopfield network does not have the capacity to learn a database with, say, 100,000,000 data items. On the other hand, a high-resolution color image may have twenty million pixels in three colors, so the number of learned parameters in a Hopfield network would be on the order of two hundred trillion for a black-and-white image or roughly two quadrillion for the full color image, which would be totally impractical.
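By way of a worked illustration of this capacity arithmetic only, and not as part of any embodiment, the following Python sketch computes the learned-parameter counts described above; the function names and example sizes are illustrative assumptions:

```python
def hopfield_param_count(num_inputs: int) -> int:
    """Learned weights in a Hopfield network: one per unordered pair of input variables."""
    return num_inputs * (num_inputs - 1) // 2

def bam_param_count(num_inputs: int, num_outputs: int) -> int:
    """Learned weights in a BAM: one per (input variable, output variable) pair."""
    return num_inputs * num_outputs

# A 20-megapixel black-and-white image has 20 million input variables.
print(hopfield_param_count(20_000_000))   # about 2.0e14, i.e., two hundred trillion
# The same image in three colors has 60 million input variables.
print(hopfield_param_count(60_000_000))   # about 1.8e15, i.e., roughly two quadrillion
```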
In addition, Hopfield networks and BAMs are connected via undirected arcs and, unlike deep layered neural networks, do not have hidden layers and are not trained by the highly successful back propagation computation commonly used in deep learning.
SUMMARY
The present invention, in one general aspect, uses a machine learning system, such as a deep neural network, to implement a robust, content-addressable auto-associative memory system. The memory is content-addressable in the sense that an item in the memory system may be retrieved by a description or example rather than by knowing the address of the item in memory. The memory system is robust in that an item can be retrieved from a query that is a transformed, distorted, noisy version of the item to be retrieved. An item may also be retrieved based on an example of only a small portion of the item. The associative memory system may also be trained to be a classifier. The memory system is recurrent and auto-associative because it operates by feeding its output back to its input.
The memory system may be based on a layered deep neural network with an arbitrary number of hidden layers. The number of learned parameters may be varied based on the number of data items to be learned or other considerations. Embodiments based on such deep neural networks may be trained by well-known deep learning techniques such as stochastic gradient descent based on back propagation.
These and other benefits realizable with the present invention will be apparent from the description below.
Various embodiments of the present invention are described herein by way of example in conjunction with the following figures.
The task of an auto-associative memory is to memorize its training data. A robust auto-associative memory 100 not only memorizes its training data, it is able to retrieve an example from its training data given only a partial input or a degraded input. In
An illustrative example of a computer system 300 that may perform the computations associated with
In the illustrative embodiment of
In some embodiments, the amount of a translation, rotation, or other transformation at step 102 may be characterized by a parameter, such as the distance of the translation or the angle of the rotation. This transformation-characterizing parameter may be controlled directly as a hyperparameter, or its maximum magnitude may be controlled by a hyperparameter. Hyperparameter values may be set by the system designer or may be controlled by a second machine learning system, called a “learning coach.” A learning coach may also adjust the number of learned parameters based on the number of data items to be learned, for example, by adding nodes and/or layers, thereby increasing the number of arcs that need weights and the number of nodes that need biases.
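By way of illustration only, one possible realization of a hyperparameter-bounded transformation of the kind described above is sketched below; the choice of a rotation, the use of NumPy and SciPy, and all names are assumptions made for this sketch rather than requirements of any embodiment:

```python
import numpy as np
from scipy.ndimage import rotate

def random_rotation(image: np.ndarray, max_angle_degrees: float,
                    rng: np.random.Generator) -> np.ndarray:
    """Rotate the image by a random angle whose magnitude is bounded by the
    hyperparameter max_angle_degrees, which a learning coach could adjust."""
    angle = rng.uniform(-max_angle_degrees, max_angle_degrees)
    return rotate(image, angle, reshape=False, mode="nearest")

rng = np.random.default_rng(0)
image = rng.random((28, 28))                       # placeholder training example
augmented = random_rotation(image, max_angle_degrees=10.0, rng=rng)
```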
A learning coach is a second machine learning system that is trained to help manage the learning process of a first machine learning system. Learning coaches are described in more detail in the following applications, which are incorporated herein by reference in their entirety: published PCT application Pub. No. WO 2018/063840 A1, published Apr. 5, 2018, entitled “LEARNING COACH FOR MACHINE LEARNING SYSTEM”; and PCT Application No. PCT/US18/20887, filed Mar. 5, 2018, entitled “LEARNING COACH FOR MACHINE LEARNING SYSTEM.”
In the embodiment illustrated in
In that connection,
If there are no more transformations to be made to the selected training example at step 125, or if step 125 is omitted, the process advances to step 126, where, as shown by the feedback loop from step 126 back to step 121, the process is repeated for the next training example in the epoch, and so on until the process has been performed for all of the training examples in the epoch. The process is then repeated for multiple epochs until the training has converged or another stopping criterion, such as a specified number of epochs, is met, as indicated by the feedback loop from the decision step 127 back to step 120.
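By way of illustration only, the loop of steps 120 through 127 might be sketched as follows, assuming PyTorch, a small fully connected network standing in for the associative memory 104, a mean-squared-error loss, and a caller-supplied list of transformation functions (such as the random_rotation sketch above); none of these choices is required by the embodiments described herein:

```python
import torch
import torch.nn as nn

num_vars = 28 * 28                                   # one input node per input variable
memory = nn.Sequential(                              # stand-in for associative memory 104
    nn.Linear(num_vars, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, num_vars), nn.Sigmoid(),
)
optimizer = torch.optim.SGD(memory.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

def train_epochs(training_data, transforms, num_epochs: int) -> None:
    """Steps 120-127: for each epoch and each training example, apply each
    selected transformation and train the network to reproduce the
    untransformed example (the target of the output pattern is the input)."""
    for _ in range(num_epochs):                              # step 120; loop via step 127
        for example in training_data:                        # steps 121 and 126
            target = torch.as_tensor(example, dtype=torch.float32).flatten()
            for transform in transforms:                     # steps 122 through 125
                query = torch.as_tensor(transform(example), dtype=torch.float32).flatten()
                optimizer.zero_grad()
                output = memory(query)
                loss = loss_fn(output, target)               # normal (positive) feedback
                loss.backward()                              # back propagation
                optimizer.step()
```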
The use of the fixed points of the recursive process to represent memorized data items results in the robustness and other remarkable properties for the robust auto-associative memory system 100. For example, with the recursive process, an entire complex image may be recovered from a small piece of the image if the small piece occurs in only one image in the set of images memorized by the robust auto-associative memory unit 100. Similarly, a memorized document (e.g., word processing document, pdf file, spreadsheet, presentation, etc.) may be recovered from a small, unique portion of text; or an audio recording may be recovered from a small interval of sound. As already mentioned, the auto-associative memory system 100 may be trained to be robust against a wide variety of transformations or noise. As another example, a work of art or a photograph may be retrieved from a memorized database from a sketch-like query. This robustness and other properties are further enhanced by another training process for which an illustrative embodiment is discussed in association with
The training process illustrated in
Normal feedback may be represented by a loss function that has its minimum at a data item that is an intended target or positive example. Negative feedback may be represented by a loss function that has its maximum at a data item that is a negative example. For either type of feedback, the partial derivatives of the loss function may be computed by computer system 300 using back propagation if the associative memory 104 is a neural network. Back propagation is well-known to those skilled in the art of training neural networks and is discussed in association with
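For concreteness only, one possible way to realize the two types of feedback, assuming a network, optimizer, and loss function such as those in the sketch above, is to minimize the loss for a positive example and to minimize the negated loss (that is, maximize the loss, since the negated loss has its maximum at the negative example) for a negative example:

```python
import torch
import torch.nn as nn

def feedback_update(memory: nn.Module, optimizer: torch.optim.Optimizer,
                    loss_fn: nn.Module, query: torch.Tensor,
                    target: torch.Tensor, is_negative: bool) -> None:
    """Normal feedback drives the output toward the target example; negative
    feedback drives the output away from a negative example by maximizing the
    loss (the negated loss has its maximum at the negative example)."""
    optimizer.zero_grad()
    output = memory(query)
    loss = loss_fn(output, target)
    if is_negative:
        loss = -loss            # gradient ascent away from the negative example
    loss.backward()             # partial derivatives computed by back propagation
    optimizer.step()
```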
In some embodiments, the number of learned parameters for the associative memory 104 may be adjusted during the learning process. For example, such a change may be based on testing the performance of the associative memory 104 on new data not used in the training. The number of learned parameters may be increased to increase the capacity of the memory 104 or the number of learned parameters may be decreased if testing reveals that there is spare capacity. Where, for example, the associative memory 104 comprises a deep neural network, additional learned parameters may be added by adding additional directed arcs, by adding additional nodes, or by adding additional layers to the network.
In the embodiment illustrated in
Once computer system 300 has computed the output 105 of the associative memory 104, the computer system 300 applies the output 105 recursively as input to the associative memory 104. The computer system 300 repeats this recursion until the recursion converges or a stopping condition is met. A possible stopping condition is detection of an infinite cycle. An infinite cycle may be detected by observing that an output for a cycle is identical to the output for some previous cycle. In some embodiments, the computer system 300 may save an output example 109 at convergence or at any stage of the recursive process, including the input 103, for use in later training or for detection of an infinite cycle. In some embodiments, all stages of the recursion process are saved. In some embodiments, examples to be saved may be selected by a learning coach.
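By way of illustration only, the recursion and its stopping conditions might be sketched as follows, reusing a network such as the one assumed in the earlier sketch; the convergence tolerance and the detection of an infinite cycle by comparison against saved prior outputs are illustrative choices:

```python
import torch
import torch.nn as nn

def recall(memory: nn.Module, query: torch.Tensor,
           max_rounds: int = 100, tol: float = 1e-4):
    """Feed the output back to the input until the recursion converges to a
    fixed point, an earlier output recurs (an infinite cycle), or max_rounds
    is reached.  All intermediate stages are saved so that they may later be
    used as positive or negative training examples."""
    history = [query.detach()]
    current = query
    with torch.no_grad():
        for _ in range(max_rounds):
            output = memory(current)
            if torch.allclose(output, current, atol=tol):      # converged
                return output, history, "converged"
            if any(torch.allclose(output, prev, atol=tol)       # infinite cycle detected
                   for prev in history):
                return output, history, "cycle"
            history.append(output.detach())
            current = output
    return current, history, "max_rounds"
```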
Although the associative memory 104 is a machine learning system, such as a deep neural network, its task is not to classify its input as in most machine learning tasks. Instead, as its name implies, the associative memory 104 has the task of retrieving from its “memory” of the training data 101 (see
If the recursion converges to an output that is not a training data item, in some embodiments the computer system 300 may save the output at convergence in 109 for later use in training as a negative example 107 of
If the recursion converges to an output that is a training data example, the computer system 300 may save the input and one or more of the intermediate stage outputs in association with the training data example as positive examples 108 for future training.
For either negative or positive examples transferred by computer system 300 to block 108, if the learned parameters of the associative memory 104 are being trained during operation, such as with adaptive training, then the computer system also saves in 108 a snapshot of the current values of the learned parameters of the associative memory unit 104 and links the snapshot to the corresponding positive or negative example.
A negative example saved in block 108 of
In step 200 of
In step 201, the computer system 300 obtains training data that is labeled with classification category labels. The computer system 300 may store the training data in memory. In step 202, the computer system 300 sets aside some of the training data obtained at step 201. The data set aside at step 202 may be used for development testing by a learning coach.
The computer system 300 applies rules and procedures 203, 204, and 205 throughout the training process, as described below. In some embodiments, in procedure 203, the computer system 300 uses the auto-associative memory system 100 (e.g., the one obtained at step 200) as a classifier. As such, at least some of the training data items obtained at step 201 are labeled with a classification category. In operation, the illustrative embodiment of the auto-associative memory system 100 shown in
In procedure 204, the computer system 300 may control the maximum amount or degree to be allowed of a transformation or data augmentation in step 102 of
In operation of the auto-associative memory system 100 in the embodiment illustrated in
After steps 200, 201, and 202 and putting in place the procedures 203, 204, and 205 for operation during the training process, the computer system proceeds to the iterative training loop illustrated in
In the first pass through the loop from step 211 to step 217, delta is preferably set to a small value. In some embodiments, in the first pass the value of delta is set to zero, representing that no transformations, data augmentations, or input variable deletions are to be performed in step 102 of
In step 212, the computer system 300 measures the performance of the auto-associative memory system 100 acting as a classifier. Preferably, this performance measurement is made on development data that has been set aside from the training data, as specified in step 202 of
In step 213, the computer system 300 compares the performance measured in step 212 with measures of performance from previous passes through the loop from step 211 to step 217 or from the preliminary auto-associative memory obtained in step 200 of
In step 214, the computer system generates data according to step 102 of
In step 215, the computer system classifies data generated in step 214 and counts as a misclassification any data 102 that is classified with a category different from the category of the untransformed data 101. Under control of hyperparameters, some or all of these misclassifications are used for training as negative examples.
In step 216, the computer system again measures the performance of the auto-associative memory system 100 acting as a classifier with the performance measured on set-aside development data. In some embodiments the same set of set-aside development data may be used for performance comparisons in step 213 so that the performance difference does not depend on differences in the data on which the performance is measured. In some embodiments, a second set of set-aside development data is used to confirm the cumulative progress of multiple passes through the loop from step 211 to step 217.
In step 217, the computer system again compares the performance of the current system with the previous performance. If there has been an improvement in performance, the computer system returns to step 211 to continue the process with a larger value of delta. If there has been no improvement in performance, the computer system backs up the learned parameter values to the best performance values previously obtained and terminates the training process.
In some embodiments there may be other remedial actions to reduce the classification errors caused when delta is increased. In such embodiments, the computer system may experimentally try some of these other remedial actions, returning to step 214 multiple times to try to find an improvement and proceeding to step 211, before eventually deciding to proceed to step 218. For example, in some embodiments, a learning coach may modify the specified function of delta for one or more types of transformation, eliminating or limiting the amount of increase in such a transformation. In some embodiments, the computer system may merely return to step 214 multiple times to generate more negative examples.
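By way of illustration only, the loop of steps 210 through 217 might be sketched as follows; the helper callables are hypothetical placeholders that a particular embodiment would supply (for example, under control of a learning coach) and are not themselves part of the disclosure:

```python
from typing import Any, Callable, Iterable, List, Tuple

def incremental_robustness_training(
    measure_dev_performance: Callable[[], float],
    generate_transformed_data: Callable[[float], Iterable[Tuple[Any, Any]]],
    classify: Callable[[Any], Any],
    train_on_negatives: Callable[[List[Any]], None],
    snapshot_parameters: Callable[[], Any],
    restore_parameters: Callable[[Any], None],
    delta_step: float,
    max_delta: float,
) -> float:
    """Sketch of steps 210-217: gradually increase the hyperparameter delta that
    bounds the allowed transformations, train on misclassified transformed data
    as negative examples, and back up to the best learned parameters and stop
    when performance on set-aside development data no longer improves."""
    delta = 0.0                                            # initial value of delta
    best_score = measure_dev_performance()                 # baseline performance
    best_params = snapshot_parameters()
    while delta < max_delta:
        delta += delta_step                                # step 211: increment delta
        transformed = generate_transformed_data(delta)     # step 214 (per step 102)
        negatives = [x for x, label in transformed
                     if classify(x) != label]              # step 215: misclassifications
        train_on_negatives(negatives)                      # train with negative examples
        score = measure_dev_performance()                  # steps 212/216: dev-set test
        if score > best_score:                             # steps 213/217: compare
            best_score, best_params = score, snapshot_parameters()
        else:
            restore_parameters(best_params)                # back up to best values
            break
    return best_score
```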
Although the auto-associative memory unit 104 has generally been described as a single neural network, such as shown in
In various embodiments, the different processor cores 304 may train and/or implement different networks or subnetworks or components. For example, in one embodiment, the cores of the first processor unit 302A may implement the auto-associative memory 104 and the second processor unit 302B may implement the learning coach. For example, the cores of the first processor unit 302A may train the machine learning system (e.g., neural network) of the auto-associative memory 104 according to techniques described herein, whereas the cores of the second processor unit 302B may learn, from implementation of the learning coach, the hyperparameters for the auto-associative memory 104. Further, where the associative memory 104 comprises an ensemble of machine learning systems, different sets of cores in the first processor unit 302A may be responsible for different ensemble members. One or more host processors 310 may coordinate and control the processor units 302A-B.
In other embodiments, the system 300 could be implemented with one processor unit 302. In embodiments where there are multiple processor units, the processor units could be co-located or distributed. For example, the processor units 302 may be interconnected by data networks, such as a LAN, WAN, the Internet, etc., using suitable wired and/or wireless data communication links. Data may be shared between the various processing units 302 using suitable data links, such as data buses (preferably high-speed data buses) or network links (e.g., Ethernet).
The software for the various compute systems described herein and other computer functions described herein may be implemented in computer software using any suitable computer programming language such as .NET, C, C++, Python, and using conventional, functional, or object-oriented techniques. Programming languages for computer software and other computer-implemented instructions may be translated into machine language by a compiler or an assembler before execution and/or may be translated directly at run time by an interpreter. Examples of assembly languages include ARM, MIPS, and x86; examples of high level languages include Ada, BASIC, C, C++, C#, COBOL, Fortran, Java, Lisp, Pascal, Object Pascal, Haskell, ML; and examples of scripting languages include Bourne script, JavaScript, Python, Ruby, Lua, PHP, and Perl.
In this discussion, a neural network comprises a network of nodes organized into layers: a layer of input nodes, zero or more inner or “hidden” layers of nodes, and a layer of output nodes. There is an input node associated with each input variable and an output node associated with each output variable. An inner layer may also be called a “hidden layer.” A given node in the output layer or in an inner layer is connected to one or more nodes in lower layers by means of a directed arc from the node in the lower layer to the given higher layer node. In this example network, there is also a directed arc from each output node back to the corresponding input node. A directed arc is an arc in which direction matters, as opposed to an undirected arc. Note that there are only directed arcs in the recurrent neural network shown in
The directed arcs are each associated with a trainable parameter, called its weight, which represents the strength of the connection from the lower node to the given higher node (or from an output node to its corresponding input node for the directed arcs from the output nodes to the input nodes). A trainable parameter is also called a “learned” parameter. Each node is also associated with an additional learned parameter called its “bias.” In a preferred embodiment, the weight associated with an arc from an output node to the corresponding input node implicitly has the value 1.0 and there is no learned parameter for these particular output-to-input arcs, as opposed to other arcs in the network that will have trainable (or learned) weights. Other parameters that control the learning process are called “hyperparameters.” The neural network illustrated in
A neural network in which there is no cycle of directed arcs leading from a node back to itself is called a “feed-forward” network. A neural network in which there is a cycle of directed arcs is called a “recurrent neural network.” In embodiments of the present invention, the cycles of the recurrent neural network are formed by the directed arcs from the output nodes to their corresponding input nodes, as shown in
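By way of illustration only, such a recurrent auto-associative network might be sketched in PyTorch as follows; the layer sizes are assumptions, and the feedback arcs from output nodes to their corresponding input nodes carry an implicit, fixed weight of 1.0 and contribute no learned parameters, which is realized here simply by feeding the output tensor back in as the next input:

```python
import torch
import torch.nn as nn

class AutoAssociativeMemory(nn.Module):
    """A layered network with equal numbers of input and output nodes.  The
    directed arc from each output node back to its corresponding input node
    carries an implicit, fixed weight of 1.0 and contributes no learned
    parameters; it is realized by reusing the output as the next input."""

    def __init__(self, num_vars: int, hidden: int = 256, num_hidden_layers: int = 2):
        super().__init__()
        layers, width = [], num_vars
        for _ in range(num_hidden_layers):                       # inner ("hidden") layers
            layers += [nn.Linear(width, hidden), nn.ReLU()]
            width = hidden
        layers += [nn.Linear(width, num_vars), nn.Sigmoid()]     # output layer
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)

    def recur(self, x: torch.Tensor, rounds: int) -> torch.Tensor:
        """Apply the output-to-input feedback arcs for a fixed number of rounds."""
        for _ in range(rounds):
            x = self.forward(x)
        return x
```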
For training purposes, a recurrent neural network R may be “unrolled” by making a sequence of copies of the neural network, one copy R(t) for each value of t in {0, 1, . . . , T}. In the case of the auto-associative memory unit 104 in
The recurrent neural network R on the left side of the equation in
The copies R(0), . . . , R(T) therefore collectively form a single large feed-forward neural network: in each copy R(t), a directed arc that would go from a node in a higher numbered layer to a destination node in a lower numbered layer (or from a node to itself or to a lower numbered destination node in its own layer) does not go to the destination node in its own copy R(t), but rather goes to the copy of the corresponding destination node in the next copy of the network, R(t+1). Thus, in the unrolled network there are no cycles of directed arcs, so the unrolled network is a feed-forward network, as shown by the right side of
In the auto-associative memory unit 104 or the network shown in
A feed-forward neural network or an unrolled recurrent neural network may be trained using an iterative training process called stochastic gradient descent with a gradient estimate and learned parameter update for each minibatch of training data. An epoch of this iterative training process comprises a minibatch update for all the minibatches in the full batch of training data.
The estimate of the gradient of the objective function for each minibatch may be computed by accumulating an estimate of the gradient of the objective function for each training data item in the minibatch. The estimate of the gradient for a training data item may be computed by a feed-forward computation of the activation of each node in the network followed by a backwards computation of the partial derivatives of the objective function based on the chain rule of calculus. The backwards computation of the partial derivatives of the objective function is called “back propagation.”
Stochastic gradient descent, including the feed-forward computation, the back propagation of partial derivatives of an objective function, and unrolling a recurrent neural network are all well-known to those skilled in the art of training neural networks.
In the auto-associative memory unit 104 and in the neural network in
In many recurrent neural network architectures, the unrolled feed-forward network is only an approximate model of the recurrent neural network because the activation computation of the nodes in a cycle in the recurrent network can, in principle, go around the cycle an infinite number of times. Furthermore, for many recurrent neural network architectures, during training a problem called “vanishing gradient” may occur for an unrolled recurrent neural network with too large a value of T. The magnitudes of the back propagated partial derivatives may decrease by roughly a multiplicative factor less than 1.0 for each round of recursion, producing an exponential decay in the magnitudes of the partial derivatives.
Having an objective for each unrolled copy R(t) of the network R and accumulating the combined back propagated partial derivatives from higher numbered copies of R prevents this form of “vanishing gradient.” In addition, the number of rounds of recursion in the auto-associative memory is limited to a finite number because of convergence or some other stopping criterion for the recursion.
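By way of illustration only, this training arrangement might be sketched as follows, unrolling the recursion for a fixed number of rounds and applying the same target (the training example itself) as an objective at every copy R(t); the network may be, for example, an instance of the AutoAssociativeMemory class sketched above, and the choice of a mean-squared-error objective is an assumption of this sketch:

```python
import torch
import torch.nn as nn

def unrolled_training_step(memory: nn.Module, optimizer: torch.optim.Optimizer,
                           query: torch.Tensor, target: torch.Tensor,
                           num_unroll: int) -> float:
    """Unroll the recurrent network for num_unroll rounds (copies R(1), ..., R(T))
    and apply the same objective at every copy; the back-propagated partial
    derivatives from all copies are combined in a single parameter update."""
    loss_fn = nn.MSELoss()
    optimizer.zero_grad()
    x, total_loss = query, 0.0
    for _ in range(num_unroll):                        # assumes num_unroll >= 1
        x = memory(x)                                  # feed the output back as the next input
        total_loss = total_loss + loss_fn(x, target)   # objective at each copy R(t)
    total_loss.backward()                              # back propagation through the unrolled net
    optimizer.step()
    return float(total_loss)
```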
In one general aspect, therefore, the present invention is directed to computer systems and computer-implemented methods for recursively training a content-addressable auto-associative memory. The system comprises a set of processor cores and computer memory that is in communication with the set of processor cores. The computer memory stores software that when executed by the set of processor cores, causes the set of processor cores to recursively train the content-addressable auto-associative memory system with a plurality of learned parameters and with a plurality of input examples, where each input example is represented by a plurality of input variables, such that: (i) the content addressable auto-associative memory system is trained to produce an output pattern for each of the input examples; and (ii) a quantity of the learned parameters for the content-addressable auto-associative memory is equal to the number of input variables times a quantity that is independent of the number of input variables. In various implementations, the software causes the set of processor cores to train the content-addressable auto-associative memory system such that the quantity of learned parameters for the content-addressable auto-associative memory system can be varied based on the number of input examples to be learned.
In another general aspect, the present invention is directed to computer systems and computer-implemented methods for recursively training a recurrent neural network with a plurality of input examples. In such embodiments, the computer system comprises a set of processor cores and computer memory in communication with the set of processor cores. The computer memory stores software that when executed by the set of processor cores, causes the set of processor cores to recursively train a recurrent neural network with a plurality of input examples, such that: (i) the recurrent neural network comprises a deep neural network that comprises N+1 layers, numbered 0, . . . , N, wherein N>3, and wherein layer 0 is an input layer and layer N is an output layer of the recurrent neural network, and wherein layers 1 to N−1 are between the input layer and the output layer; (ii) the recurrent neural network is trained to produce an output pattern for each of the input examples; (iii) a target for the output pattern for each input example is the input example; and (iv) the recurrent neural network comprises a plurality of directed arcs, wherein at least some of the directed arcs are between a node in one layer of the recurrent neural network and a node in another layer of the recurrent neural network.
In another implementation, the software stored by the computer memory causes the set of processor cores to recursively train the recurrent neural network such that: (i) the recurrent neural network is trained to produce an output pattern for each of the input examples; and (ii) a quantity of learned parameters for the recurrent neural network is equal to the number of input variables times a quantity that is independent of the number of input variables.
In various implementations, the software can cause the set of processor cores to train the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, such that the quantity of learned parameters for the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, can be varied based on the number of input examples to be learned. Also, the software can cause the set of processor cores to train the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, by back propagating partial derivatives of a loss function through the content-addressable auto-associative memory system or the recurrent neural network. Also, the software can cause the set of processor cores to train the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, by for each input example: randomly transforming the input example; and recursively providing the randomly transformed input example to the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, for training, until an output of the content-addressable auto-associative memory system converges to the input example. The random transformations of the input examples can comprise one or more of: translating the input example; rotating the input example; linearly transforming the input example; degrading the input example; and subsampling the input example.
In still further implementations, the computer memory may store software that when executed by the set of processor cores further causes the set of processor cores to train the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, with negative input examples. The negative input examples may comprise input examples where the output of the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, in operation, does not converge to an input example.
In still further implementations, at least some of the input examples are labeled examples that have, for each such input example, a classification category label such that the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, is trained to act as a classifier. The classification category labels may comprise error-correcting encoding.
Based on the above description, it is clear that embodiments of the auto-associative memory system described herein can be content-addressable in the sense that an item in the memory may be retrieved by a description or example rather than by knowing the address of the item in memory. Further, the auto-associative memory is associative in that an item can be retrieved with a query based on an example of an associated item rather than by an example of the item itself. The auto-associative memory is also robust in that an item can be retrieved from a query that is a transformed, distorted, noisy version of the item to be retrieved. The auto-associative memory can be used, for example, to retrieve images, documents, acoustic files, etc. from inputs, which can be small pieces of the images, documents, acoustic files, etc.
The examples presented herein are intended to illustrate potential and specific implementations of the present invention. It can be appreciated that the examples are intended primarily for purposes of illustration of the invention for those skilled in the art. No particular aspect or aspects of the examples are necessarily intended to limit the scope of the present invention. Further, it is to be understood that the figures and descriptions of the present invention have been simplified to illustrate elements that are relevant for a clear understanding of the present invention, while eliminating, for purposes of clarity, other elements. While various embodiments have been described herein, it should be apparent that various modifications, alterations, and adaptations to those embodiments may occur to persons skilled in the art with attainment of at least some of the advantages. The disclosed embodiments are therefore intended to include all such modifications, alterations, and adaptations without departing from the scope of the embodiments as set forth herein.
Claims
1. A computer system comprising:
- a set of processor cores; and
- computer memory in communication with the set of processor cores, wherein the computer memory stores software that when executed by the set of processor cores, causes the set of processor cores to recursively train a content-addressable auto-associative memory system with a plurality of learned parameters and with a plurality of input examples, wherein each input example is represented by a plurality of input variables, such that: the content addressable auto-associative memory system is trained to produce an output pattern for each of the input examples; and a quantity of the learned parameters for the content-addressable auto-associative memory is equal to the number of input variables times a quantity that is independent of the number of input variables.
2. The computer system of claim 1, wherein the software causes the set of processor cores to train the content-addressable auto-associative memory system such that the quantity of learned parameters for the content-addressable auto-associative memory system can be varied based on the number of input examples to be learned.
3. The computer system of claim 1, wherein the computer memory stores software that when executed by the set of processor cores causes the set of processor cores to train the content-addressable auto-associative memory system by back propagating partial derivatives of a loss function through the content-addressable auto-associative memory system.
4. The computer system of claim 1, wherein the computer memory stores software that when executed by the set of processor cores causes the set of processor cores to train the content-addressable auto-associative memory system by, for each input example:
- randomly transforming the input example; and
- recursively providing the randomly transformed input example to the content-addressable auto-associative memory system for training, until an output of the content-addressable auto-associative memory system converges to the input example.
5. The computer system of claim 4, wherein the computer memory stores software that when executed by the set of processor cores further causes the set of processor cores to train the content-addressable auto-associative memory system with negative input examples.
6. The computer system of claim 5, wherein the negative input examples comprise input examples where the output of the content-addressable auto-associative memory system, in operation, does not converge to an input example.
7-11. (canceled)
12. A computer system comprising:
- a set of processor cores; and
- computer memory in communication with the set of processor cores, wherein the computer memory stores software that when executed by the set of processor cores, causes the set of processor cores to recursively train a recurrent neural network with a plurality of input examples, such that: the recurrent neural network comprises a deep neural network that comprises N+1 layers, numbered 0,..., N, wherein N>3, and wherein layer 0 is an input layer and layer N is an output layer of the recurrent neural network, and wherein layers 1 to N−1 are between the input layer and the output layer;
- the recurrent neural network is trained to produce an output pattern for each of the input examples;
- a target for the output pattern for each input example is the input example; and
- the recurrent neural network comprises a plurality of directed arcs, wherein at least some of the directed arcs are between a node in one layer of the recurrent neural network and a node in another layer of the recurrent neural network.
13-15. (canceled)
16. The computer system of claim 12, wherein the software causes the set of processor cores to train the recurrent neural network such that the only directed arcs in the recurrent neural network from a higher numbered layer to a lower numbered layer are from a node in the output layer N to a node in the input layer 0.
17. The computer system of claim 16, wherein:
- the output layer of the recurrent neural network comprises a plurality of output layer nodes;
- the input layer of the recurrent neural network comprises a plurality of input layer nodes;
- the quantity of input layer nodes equals the quantity of output layer nodes, such that each output layer node has one and only one corresponding input layer node; and
- the only directed arcs in the recurrent neural network that are from a higher numbered layer to a lower numbered layer are directed arcs from the output layer to the input layer, wherein there is a directed arc from each output layer node to its associated input layer node.
18. The computer system of claim 12, wherein the computer memory stores software that when executed by the set of processor cores causes the set of processor cores to train the recurrent neural network by back propagating partial derivatives of a loss function through the recurrent neural network.
19. The computer system of claim 13, wherein the computer memory stores software that when executed by the set of processor cores causes the set of processor cores to train the recurrent neural network by back propagating partial derivatives of a loss function through the recurrent neural network.
20. The computer system of claim 12, wherein the computer memory stores software that when executed by the set of processor cores causes the set of processor cores to train the recurrent neural network by back propagating partial derivatives of a loss function through the recurrent neural network.
21. The computer system of claim 17, wherein the computer memory stores software that when executed by the set of processor cores causes the set of processor cores to train the recurrent neural network by, for each input example:
- randomly transforming the input example; and
- recursively providing the randomly transformed input example to the content-addressable auto-associative memory system for training, until an output of the content-addressable auto-associative memory system converges to the input example.
22. The computer system of claim 21, wherein the computer memory stores software that when executed by the set of processor cores causes the set of processor cores to transform an input example by performing a distortion on the input example that comprises a distortion selected from the group consisting of:
- translating the input example;
- rotating the input example;
- linearly transforming the input example;
- degrading the input example; and
- subsampling the input example.
23. The computer system of claim 21, wherein the transformations of the input examples are controlled by one or more hyperparameters.
24. The computer system of claim 23, further comprising:
- a second set of processor cores; and
- second computer memory in communication with the second set of processor cores, wherein the second computer memory stores software that when executed by the second set of processor cores causes the second set of processor cores to implement a machine-learning learning coach that learns, through machine learning, values for the one or more hyperparameters.
25. The computer system of claim 21, wherein the computer memory stores software that when executed by the set of processor cores further causes the set of processor cores to train the recurrent neural network with negative input examples.
26. The computer system of claim 25, wherein the negative input examples comprise input examples where the output of the recurrent neural network, in operation, does not converge to an input example.
27. The computer system of claim 12, wherein at least some of the input examples are labeled examples that have, for each such input example, a classification category label such that the recurrent neural network is trained to act as a classifier.
28. The computer system of claim 27, wherein the classification category labels comprise error-correcting encoding.
29. The computer system of claim 12, wherein:
- the input examples are digital images; and
- the recurrent neural network is trained to, in operation, retrieve one of the digital images in response to receiving as input a portion of the digital image.
30. The computer system of claim 12, wherein:
- the input examples are audio files; and
- the recurrent neural network is trained to, in operation, retrieve one of the audio files in response to receiving as input a portion of the audio file.
31. The computer system of claim 12, wherein:
- the input examples are document files; and
- the recurrent neural network is trained to, in operation, retrieve one of the document files in response to receiving as input a portion of the document file.
32. A method comprising:
- recursively training, by a computer system that comprises a set of processor cores, a content-addressable auto-associative memory system with a plurality of learned parameters and with a plurality of input examples, wherein each input example is represented by a plurality of input variables, such that: the content addressable auto-associative memory system is trained to produce an output pattern for each of the input examples; and a quantity of the learned parameters for the content-addressable auto-associative memory is equal to the number of input variables times a quantity that is independent of the number of input variables.
33. The method of claim 32, wherein training the content-addressable auto-associative memory system comprises training the content-addressable auto-associative memory system such that the quantity of learned parameters for the content-addressable auto-associative memory system can be varied based on the number of input examples to be learned.
34. The method of claim 32, wherein training the content-addressable auto-associative memory system comprises back propagating partial derivatives of a loss function through the content-addressable auto-associative memory system.
35. The method of claim 32, wherein training the content-addressable auto-associative memory system comprises, for each input example:
- randomly transforming the input example; and
- recursively providing the randomly transformed input example to the content-addressable auto-associative memory system for training, until an output of the content-addressable auto-associative memory system converges to the input example.
36. The method of claim 35, wherein training the content-addressable auto-associative memory system comprises training the content-addressable auto-associative memory system with negative input examples.
37. The method of claim 36, wherein the negative input examples comprise input examples where the output of the content-addressable auto-associative memory system, in operation, does not converge to an input example.
38. The method of claim 35, wherein at least some of the input examples are labeled examples that have, for each such input example, a classification category label such that the content-addressable auto-associative memory system is trained to act as a classifier.
39. The method of claim 38, wherein the classification category labels comprise error-correcting encoding.
40. The method of claim 32, wherein:
- the input examples are digital images; and
- the content-addressable auto-associative memory system is trained to, in operation, retrieve one of the digital images in response to receiving as input a portion of the digital image.
41. The method of claim 32, wherein:
- the input examples are audio files; and
- the content-addressable auto-associative memory system is trained to, in operation, retrieve one of the audio files in response to receiving as input a portion of the audio file.
42. The method of claim 32, wherein:
- the input examples are document files; and
- the content-addressable auto-associative memory system is trained to, in operation, retrieve one of the document files in response to receiving as input a portion of the document file.
43. A method comprising:
- training, recursively, by a computer system that comprises a set of processor cores, a recurrent neural network with a plurality of input examples, such that: the recurrent neural network comprises a deep neural network that comprises N+1 layers, numbered 0,..., N, wherein N>3, and wherein layer 0 is an input layer and layer N is an output layer of the recurrent neural network, and wherein layers 1 to N−1 are between the input layer and the output layer; the recurrent neural network is trained to produce an output pattern for each of the input examples; a target for the output pattern for each input example is the input example; and the recurrent neural network comprises a plurality of directed arcs, wherein at least some of the directed arcs are between a node in one layer of the recurrent neural network and a node in another layer of the recurrent neural network.
44. A method comprising:
- training, recursively, by a computer system that comprises a set of processor cores, a recurrent neural network with a plurality of input examples, such that: the recurrent neural network is trained to produce an output pattern for each of the input examples; and a quantity of learned parameters for the recurrent neural network is equal to the number of input variables times a quantity that is independent of the number of input variables.
45-46. (canceled)
47. The method of claim 43, wherein training the recurrent neural network comprises training the recurrent neural network such that the only directed arcs in the recurrent neural network from a higher numbered layer to a lower numbered layer are from a node in the output layer N to a node in the input layer 0.
48. The method of claim 47, wherein:
- the output layer of the recurrent neural network comprises a plurality of output layer nodes;
- the input layer of the recurrent neural network comprises a plurality of input layer nodes;
- the quantity of input layer nodes equals the quantity of output layer nodes, such that each output layer node has one and only one corresponding input layer node; and
- the only directed arcs in the recurrent neural network that are from a higher numbered layer to a lower numbered layer are directed arcs from the output layer to the input layer, wherein there is a directed arc from each output layer node to its associated input layer node.
49. The method of claim 43, wherein training the recurrent neural network comprises back propagating partial derivatives of a loss function through the recurrent neural network.
50. The method of claim 44, wherein training the recurrent neural network comprises back propagating partial derivatives of a loss function through the recurrent neural network.
51. (canceled)
52. The method of claim 48, wherein training the recurrent neural network comprises, for each input example:
- randomly transforming the input example; and
- recursively providing the randomly transformed input example to the content-addressable auto-associative memory system for training, until an output of the content-addressable auto-associative memory system converges to the input example.
53. The method of claim 52, wherein randomly transforming the input example comprises performing a distortion on the input example selected from the group consisting of:
- translating the input example;
- rotating the input example;
- linearly transforming the input example;
- degrading the input example; and
- subsampling the input example.
54-55. (canceled)
56. The method of claim 52, wherein training the recurrent neural network comprises training the recurrent neural network with negative input examples.
57-62. (canceled)
Type: Application
Filed: Sep 19, 2018
Publication Date: Sep 10, 2020
Inventor: James K. BAKER (Maitland, FL)
Application Number: 16/646,071