METHODS AND SYSTEMS TO TRAIN NEURAL NETWORKS
A computer-implemented technique for training an artificial neural network is disclosed. The technique includes obtaining a training sample for training the artificial neural network; determining multiple sub concepts within the training sample; processing the sub concepts to obtain differential neurons associated with the sub concepts, wherein the differential neurons provide a relative distinction between the sub concepts; integrating the differential neurons to obtain sub concepts neurons, wherein the sub concepts neurons provide an absolute distinction of sub concepts; and integrating the sub concepts neurons to obtain concept neurons that form an output of the neural network.
This application is a U.S. National Stage Application of International Application No. PCT/US2021/019470, filed Feb. 24, 2021, which claims priority to U.S. Provisional Application No. 62/980,687 filed Feb. 24, 2020, each of the foregoing being hereby incorporated by reference in their entireties.
FIELD

The present disclosure relates to methods and systems to train neural networks.
BACKGROUND

A defining feature of living systems is the ability to integrate multiple signals and respond appropriately. It is often difficult to understand these processes mechanistically because they involve multiple, distributed agents. For human cognition, the ancient question of how the brain gives rise to thought has been embodied during the last century in the divide between symbolic cognitive models and connectionist network models.
On the one hand, symbolic models of reasoning have historically been phenomenological and therefore have lacked a direct mechanistic link to the brain’s neuronal structure. On the other hand, connectionist models of neural networks have been unable to explain the emergence of conceptual thought and symbolic manipulation.
The divide between symbolism and connectionism has been especially evident in their various implementations in artificial intelligence (AI) systems. Recent attempts at capturing symbolic reasoning in connectionist models do not address this divide because they are hybrids in which symbolic work only occurs on the outputs of networks, ignoring the need to integrate symbolic manipulation within the network itself.
Because of this divide, both symbolic and connectionist AI currently have other fundamental limitations. Symbolic AI often proves too rigid, does not scale well to combinatorially large problems, and is not able to learn features from raw input data.
Recently, connectionist deep learning models have become popular due to their superhuman accuracy across a large range of tasks. This known approach utilizes artificial neural networks with layers of biologically inspired neurons, which are trained by gradient descent (GD) with backpropagation. Despite its success, this approach is widely regarded as a black box because there appears to be no understandable explanation for the learned synaptic weights or for the process by which individual neurons give rise to the final output.
Deep learning techniques also have difficulty generalizing and therefore require large labeled datasets. And the networks are paradoxically fooled by adversarial attacks with small, human-imperceptible input perturbations. More fundamentally, various biological phenomena such as modularity, hubs, and sparse neuron firing are not naturally learned by artificial neural networks. Much effort has been devoted to addressing these problems, yet they remain largely unresolved.
It is an object of the present disclosure to provide methods and systems to train neural networks that overcome these limitations.
SUMMARY

A computer-implemented method for training an artificial neural network is disclosed. The method includes obtaining a training sample for training the artificial neural network; determining multiple sub concepts within the training sample; processing the sub concepts to obtain differential neurons associated with the sub concepts, wherein the differential neurons provide a relative distinction between the sub concepts; integrating the differential neurons to obtain sub concepts neurons, wherein the sub concepts neurons provide an absolute distinction of sub concepts; and integrating the sub concepts neurons to obtain concept neurons that form an output of the neural network.
The training sample may be designed to teach one or more rules. The determining of the sub concepts within the training sample may include obtaining various subsets of the training sample and distinguishing between the various subsets. Unsupervised learning may be used to determine the hierarchical structure of the sub concepts. The sub concepts may be overlapping or hierarchically structured. One or more of the differential neurons may be pruned before the integrating of the differential neurons to obtain sub concepts neurons. The neurons of the artificial neural network can be deliberative, temporarily changing their parameters. The artificial neural network can be tuned after the training to improve its performance. The neurons of the artificial neural network may provide symbolic outputs that are interpretable as algorithms.
A system for training a neural network is disclosed. The system comprises a processor and an associated memory, the processor being configured to: obtain a training sample for training the artificial neural network; determine multiple sub concepts within the training sample; process the sub concepts to obtain differential neurons associated with the sub concepts, wherein the differential neurons provide a relative distinction between the sub concepts; integrate the differential neurons to obtain sub concepts neurons, wherein the sub concepts neurons provide an absolute distinction of sub concepts; and integrate the sub concepts neurons to obtain concept neurons that form an output of the neural network.
Other objects and advantages of the present disclosure will become apparent to those skilled in the art upon reading the following detailed description of exemplary embodiments, in conjunction with the accompanying drawings, in which like reference numerals have been used to designate like elements, and in which:
The present disclosure provides a machine learning algorithm to bridge the aforementioned symbolic-connectionist divide using the philosophy of essences. Such algorithms (“essence neural networks” or ENNs) can be more explainable and capable of simulating symbolic reasoning. The integration of symbolism can allow ENNs to be explainable and capable of hierarchical organization, deliberation, symbolic manipulation, and concept generalization. They can also be more modular, sparse, and robust to noise and adversarial attacks. These networks can represent a new interpretation of the complex connections and activities of biological neural networks and how they give rise to perception and reasoning.
The disclosed ENNs can integrate the symbolic Aristotelian theory of essences with the connectionist model of deep neural networks. This approach does not start with a random guess or use incremental improvements in network performance but instead builds the network using the underlying structure (or “essence”) of the learning problem. This allows a human-level explainability of decision-making, the ability to find symbolic and therefore highly generalizable solutions, and greater robustness to input noise and adversarial attacks.
Such an approach is different from the known techniques, which are based on gradient minimization of the error function and are referred to as gradient descent networks (GDNs). The known approaches are therefore not able to utilize the structure of the problem. In contrast, the present approach can be based on finding an optimal category structure for the problem, rather than seeding with a random structure and performing the gradient optimization used in the known approaches.
In an exemplary embodiment, training images used can be 28×28 black images with a one-pixel-wide stripe across the full length or height of the image, which means there can be 56 total training images. The diagonal line and box outline datasets can be generated as follows. For each pair of possible heights and widths of non-square rectangles in the image, no more than 50 unique rectangles with randomly placed corners can be generated. This rectangle’s outline can be drawn to make the box outline datasets, and one of its two diagonals can be chosen randomly to make the diagonal line dataset.
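The stripe dataset described above can be generated programmatically. The following is a minimal, non-limiting sketch (assuming NumPy; the function name is illustrative):

```python
import numpy as np

def make_stripe_images(size=28):
    """Generate every one-pixel-wide full-length stripe image: one horizontal
    stripe per row and one vertical stripe per column (2 * size images)."""
    images, labels = [], []
    for i in range(size):
        horizontal = np.zeros((size, size))
        horizontal[i, :] = 1.0          # white stripe across row i
        vertical = np.zeros((size, size))
        vertical[:, i] = 1.0            # white stripe down column i
        images += [horizontal, vertical]
        labels += [0, 1]                # 0 = horizontal, 1 = vertical
    return np.stack(images), np.array(labels)

imgs, labs = make_stripe_images()       # 56 images for a 28x28 canvas
```

Each class contributes exactly 28 images, so the full training set has 56 samples, matching the count given above.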
The method 100 can include a step 120 of determining multiple sub concepts within the training samples. In an exemplary embodiment, the determining of the sub concepts can be done via unsupervised learning. Hierarchical linkage clustering can be used within each class, choosing a single cutoff value for all concepts' linkage trees such that the desired total number of sub concepts is obtained. The Ward clustering metric can provide good results due to its emphasis on generating compact clusters of comparable size.
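The clustering in step 120 can be sketched with standard tooling. The following non-limiting example uses SciPy's Ward linkage with a single distance cutoff (the function name is illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def find_subconcepts(X, cutoff):
    """Split one concept's samples into sub concepts with Ward linkage
    clustering, cutting the linkage tree at a single distance cutoff."""
    Z = linkage(X, method="ward")
    return fcluster(Z, t=cutoff, criterion="distance")  # one label per sample

# Two well-separated groups within a concept should yield two sub concepts.
X = np.vstack([np.zeros((5, 2)), 10.0 * np.ones((5, 2))])
subconcept_labels = find_subconcepts(X, cutoff=5.0)
```

In practice the cutoff would be chosen once across all concepts' linkage trees so that the desired total number of sub concepts is obtained, as described above.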
The method 100 can include a step 130 of processing the sub concepts to obtain differential neurons associated with the sub concepts. The differential neurons can provide a relative distinction between the sub concepts.
The step 130 of processing the sub concepts can be performed using linear SVMs. The weights and intercepts of the SVMs can be scaled by a multiplier hyperparameter to alter the steepness of the neuron response, and these can become the weights and biases of the inputs to each differential neuron in the first layer.
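A minimal sketch of step 130, assuming scikit-learn's linear SVM (the function names and the multiplier value are illustrative):

```python
import numpy as np
from sklearn.svm import LinearSVC

def differential_neuron(X_a, X_b, multiplier=4.0):
    """Fit a linear SVM separating sub concepts A and B; its scaled weights
    and intercept become one differential neuron's parameters. The multiplier
    is a hyperparameter controlling the steepness of the neuron response."""
    X = np.vstack([X_a, X_b])
    y = np.r_[np.zeros(len(X_a)), np.ones(len(X_b))]
    svm = LinearSVC(C=1.0).fit(X, y)
    w = multiplier * svm.coef_.ravel()
    b = multiplier * svm.intercept_.item()
    return w, b

def neuron_output(x, w, b):
    """Sigmoid response of the differential neuron to input x."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

# Two toy sub concepts on either side of the origin.
rng = np.random.default_rng(0)
X_a = -1.0 + 0.1 * rng.standard_normal((10, 2))
X_b = 1.0 + 0.1 * rng.standard_normal((10, 2))
w, b = differential_neuron(X_a, X_b, multiplier=4.0)
```

Inputs on sub concept B's side of the hyperplane drive the neuron toward 1, and inputs on sub concept A's side toward 0.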
In an exemplary embodiment, the output of a neuron can be σ(w·x+b), which is a function of the distance of the incoming signal x from the neuron's hyperplane with weights w and bias b, with a sigmoid activation function σ saturating the output between non-firing (0) and maximal firing (1). It is therefore natural to model neurons as responsible for separating, or distinguishing, concepts.
SVMs (support vector machines), as described herein, are merely an example of the supervised learning techniques that may be used in step 130. A person of ordinary skill in the art would appreciate that other similar techniques can also be used. In an exemplary embodiment, it may be unnecessary to compute differentiae between sub concepts of the same concept.
The method 100 can include a step 140 of integrating the differential neurons to obtain sub concepts neurons. The sub concepts neurons can provide an absolute distinction of the sub concepts. To perform step 140, an initial SVM can be generated between each sub concept and all other concepts using the differentia neuron outputs. To improve running time, this SVM may use as features the differentiae associated with the particular sub concept.
Neurons whose absolute weight values in the SVM are low can be sequentially masked and the SVM can then be recomputed. This sequential pruning can be halted either when the SVM's margin drops below a certain fraction of the original SVM's margin or when its misclassification error increases by a certain amount. This can be done for each sub concept, and the differential neurons that are no longer being used by any sub conceptual SVM can be pruned from the network.
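The sequential pruning described above can be sketched as follows, using the SVM margin 1/||w|| and the training error as halting criteria (a non-limiting sketch assuming scikit-learn; the function name and tolerances are illustrative):

```python
import numpy as np
from sklearn.svm import LinearSVC

def prune_features(X, y, margin_fraction=0.5, error_tol=0.02):
    """Sequentially mask the input feature with the smallest |weight|,
    refitting the SVM each time; halt when the margin (1/||w||) drops below a
    fraction of the original margin or the error grows by more than error_tol."""
    active = list(range(X.shape[1]))
    svm = LinearSVC().fit(X[:, active], y)
    margin0 = 1.0 / np.linalg.norm(svm.coef_)
    err0 = 1.0 - svm.score(X[:, active], y)
    while len(active) > 1:
        trial = list(active)
        trial.pop(int(np.argmin(np.abs(svm.coef_).ravel())))  # mask weakest
        trial_svm = LinearSVC().fit(X[:, trial], y)
        margin = 1.0 / np.linalg.norm(trial_svm.coef_)
        err = 1.0 - trial_svm.score(X[:, trial], y)
        if margin < margin_fraction * margin0 or err > err0 + error_tol:
            break                                             # halt pruning
        active, svm = trial, trial_svm
    return active

# One informative feature plus two near-constant noise features.
rng = np.random.default_rng(1)
y = np.r_[np.zeros(20), np.ones(20)]
X = np.column_stack([
    2 * y - 1 + 0.1 * rng.standard_normal(40),   # informative
    0.01 * rng.standard_normal(40),              # noise
    0.01 * rng.standard_normal(40),              # noise
])
kept = prune_features(X, y)
```

The noise features receive near-zero SVM weights and are pruned first without hurting the margin or the error, leaving only the informative input.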
The method 100 can include a step 150 of integrating the sub concepts neurons to obtain concept neurons that form an output of the neural network. To connect the sub conceptual neurons with the output conceptual neurons, either an SVM can first be computed for each conceptual class separating it from all other classes, or each sub concept can be connected to its own concept. The weights and intercepts of the SVM can be scaled by a multiplier to become the weights and biases for each sub conceptual neuron, which can be followed by a sigmoid activation function.
In an exemplary embodiment, to improve network accuracy and assign more meaningful output probabilities, the weights can be refined using a stochastic gradient descent approach for this final layer, using a categorical cross-entropy loss function. This gradient descent can also be used to find the best hyperparameter value for the sub concept SVM multiplier. At the end of the ENN, a softmax layer can be placed to turn the concept neuron outputs into probabilities, or the outputs can be left as sigmoid outputs.
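The final-layer refinement can be sketched as plain gradient descent on a softmax cross-entropy loss (a non-limiting sketch; the function name, learning rate, and epoch count are illustrative):

```python
import numpy as np

def train_output_layer(H, Y, lr=0.5, epochs=200):
    """Refine the final sub concept -> concept weights by gradient descent on
    a categorical cross-entropy loss, with a softmax turning concept neuron
    outputs into probabilities. H: sub concept activations; Y: one-hot labels."""
    n, d = H.shape
    W = np.zeros((d, Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(epochs):
        z = H @ W + b
        z -= z.max(axis=1, keepdims=True)        # numerical stability
        P = np.exp(z)
        P /= P.sum(axis=1, keepdims=True)        # softmax probabilities
        G = (P - Y) / n                          # dL/dz for cross-entropy
        W -= lr * H.T @ G
        b -= lr * G.sum(axis=0)
    return W, b

# Four sub concept activation patterns mapping to two concepts.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Y = np.eye(2)[[0, 1, 0, 1]]
W, b = train_output_layer(H, Y)
preds = np.argmax(H @ W + b, axis=1)
```

The gradient (P − Y)/n is the standard softmax cross-entropy derivative with respect to the pre-activations, which is why no separate softmax derivative appears in the update.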
There are several hyperparameters found in method 100 that can be user-defined. The number of sub concepts can define the size of the second layer of neurons. Each SVM may require a cost to set the softness of the margin and a multiplier to scale the response. For the pruning, there can both be a tolerated margin fraction and misclassification tolerance used to determine when to halt. With gradient descent in the final layer the multiplier for the sub conceptual layer can also be found, with the hyperparameter here serving as a maximum value.
To find an optimal set of hyperparameters, a grid search can be done using 10-fold cross validation. To speed up the search process this can be done on a restricted set of training data to narrow down several hyperparameters. For the final results, several ENNs can be trained on each problem with all the same hyperparameters except for a variable number of sub concepts and a variable pruning toleration margin to vary the size of the network.
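A non-limiting sketch of the cross-validated grid search on a restricted training subset, here shown only for the SVM cost hyperparameter (the data and grid values are illustrative):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Toy separable problem: the label is the sign of feature 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = (X[:, 0] > 0).astype(int)

# 10-fold cross-validated grid search over the SVM cost C,
# restricted to the first half of the data to speed up the search.
search = GridSearchCV(LinearSVC(), {"C": [0.01, 0.1, 1.0, 10.0]}, cv=10)
search.fit(X[:100], y[:100])
best_C = search.best_params_["C"]
```

The same pattern extends to the other hyperparameters named above (number of sub concepts, multipliers, pruning tolerances) by widening the parameter grid.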
Based on the above steps (110-150), the method 100 can optimally split the input space into distinct convex regions using differentiating feature hyperplanes, and the neural network can be trained to judge all inputs according to the distinguishing features and then to assign the input to a unique sub concept based on the unique combination of features that it satisfies. The sub concept can then be assigned to the given output, with multiple sub concepts possibly mapping to the same output. A linear separability of disjoint convex sets is useful because artificial neurons can be modeled mathematically as hyperplanes.
The disclosed ENNs, in contrast to known Voronoi neural networks (VNNs), can learn from aggregate concepts and sub concepts. This is a fundamentally different approach that focuses on concepts instead of exemplars and therefore makes smooth, natural distinctions instead of memorizing every past experience (i.e. a nearest-neighbors approach).
Other such non-limiting examples can include recognizing animals with life-cycle morphological changes, reading letters in different fonts or cases, etc. The next section describes various properties of ENNs in detail and compares the disclosed ENNs and known techniques (e.g. GDNs) for the same input signals.
Scalability

ENNs can scale to problems with larger datasets and with many more features without having to grow exponentially in the number of neurons or in training time. Table 1 below illustrates a scale comparison between the disclosed ENNs and known GDNs. Table 1 provides the results of training ENNs and a GDN of the same size on several datasets. Shown are the sizes of each training set, the sizes of the learned ENN, the training times, and the performance results on a test set for the ENN and a trained GDN of the same size.
In an exemplary embodiment, the reported training times can be the measured wall times (starting once the training data is loaded and ending with the storing of all of the network's parameters) on a computing cluster on a single non-GPU node without parallelization for a fair comparison against ENNs.
In an exemplary embodiment, error rates can be from the test sets, which can be held out from training and hyperparameter optimization. In order to assess how the size of the training set affected performance, ENNs and GDNs can be trained on random subsamples of MNIST, each subsample with the same number of images from each class. For each subsample size this can be repeated five times. The same ENN hyperparameters can be used, with 60 sub concepts and without any pruning to maintain a consistent network size, as shown in panel (C) of
In an exemplary embodiment, training of cENNs can begin by obtaining sub images from the training set randomly sampled equally from each class and uniformly within each image. k-means clustering, or other similar techniques, can be used to divide up the sub images into feature sub concepts, with k corresponding to the number of convolutional filters. Each cluster can be collapsed into its average, and one-versus-all SVMs can be computed for each, which generate the convolutional filters.
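The filter-learning step for cENNs can be sketched as follows; for simplicity this non-limiting sketch fits the one-versus-all SVMs on the raw patches rather than on the collapsed cluster averages (function names and patch shapes are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def learn_conv_filters(patches, k=2, multiplier=2.0):
    """Cluster flattened sub-image patches into k feature sub concepts, then
    fit a one-versus-all linear SVM per cluster; the scaled SVM weights and
    intercepts become the k convolutional filters and their biases."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(patches)
    filters, biases = [], []
    for c in range(k):
        svm = LinearSVC().fit(patches, (labels == c).astype(int))
        filters.append(multiplier * svm.coef_.ravel())
        biases.append(multiplier * svm.intercept_.item())
    return np.array(filters), np.array(biases)

# Two obvious 2x2 patch types: left column bright vs right column bright.
rng = np.random.default_rng(0)
left = np.tile([1.0, 0.0, 1.0, 0.0], (20, 1)) + 0.05 * rng.standard_normal((20, 4))
right = np.tile([0.0, 1.0, 0.0, 1.0], (20, 1)) + 0.05 * rng.standard_normal((20, 4))
F, B = learn_conv_filters(np.vstack([left, right]), k=2)
```

Each returned row of F is one convolutional filter (reshaped to the patch dimensions when applied), and k plays the role of the number of filters described above.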
The outputs of this layer can be passed through a max-pooling layer. Another set of convolutional filters and a max-pooling layer can be performed on the outputs of the first max-pooled convolutional layer. The outputs of the second max-pooling layer can then be fed into the ENN learning algorithm.
In an exemplary embodiment, for the convolutional layers, SVM multipliers of 2, a stride rate of 1×1 pixel, and max-pooling with non-overlapping 2×2 pixel boxes can be used. The MNIST input images can be padded out to be 32×32 pixels, consistent with LeNet-5. The smaller of the two cENNs in Table 1 can be designed to be of similar dimensions to LeNet-5 (the tolerated margin fraction adjusted to achieve this) and its learned filters, as shown in
The filters can be visualized both by plotting their weights and by computing the weighted average of all windows in the test set which lie on either side of the filter’s hyperplane. Taking the filter neuron’s output yi for each window, the weight applied to each can be |yi - 0.5|. For the second set of convolutional filters, the same can be done, but taking the full receptive field from the original image that pertained to each filter.
In an exemplary embodiment, ENNs can also be amenable to post-training adaptation. Two cognitive systems described by dual-process theory can be simulated: System 1 making rapid, intuitive decisions, and System 2 performing slow, deliberative reasoning. Feedforward neural networks can be analogous to System 1, and System 2 can be mimicked by allowing deliberative ENNs (dENNs) to dynamically modify the bias factors of their sub concept neurons whenever they have low classification certainty. This can improve classification accuracy, especially when training with smaller amounts of data or on symbolic problems.
Their dynamic, post-training deliberation can be implemented in multiple ways. One such way includes providing a test sample to the network and checking whether two output probabilities are within a given factor of each other (which can be 2 in certain cases, except for the TSP, where it can be 10). In such a case, deliberation can be allowed to occur on that sample.
The network then can uniformly increase or decrease the bias values of its sub conceptual neurons in order to attempt to find a result where the output probabilities are well separated. It can choose to increase the bias factor if none of the sub conceptual neurons are firing over 0.5 and decrease otherwise. The biases can be changed uniformly because computing the SVMs scales them all so that the weighted distance to the hyperplane is the same.
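The deliberation loop described above can be sketched as follows. This is a non-limiting sketch: the `forward` callable, the toy forward pass, and the step size are illustrative assumptions, not the disclosed implementation.

```python
import numpy as np

def deliberate(forward, biases, x, factor=2.0, step=0.5, max_iter=20):
    """Sketch of dENN deliberation: while the two largest output probabilities
    are within `factor` of each other, uniformly shift all sub concept biases
    and re-run the forward pass. `forward(x, biases)` is assumed to return
    (output_probs, subconcept_activations)."""
    probs, acts = forward(x, biases)
    for _ in range(max_iter):
        top = np.sort(probs)[::-1]
        if top[0] > factor * top[1]:
            break                                    # confident enough; stop
        # Raise biases if no sub concept neuron fires over 0.5, else lower them.
        delta = step if acts.max() < 0.5 else -step
        biases = biases + delta                      # uniform shift
        probs, acts = forward(x, biases)
    return probs

# Toy forward pass: two steep sub concept sigmoids; outputs are normalized.
def toy_forward(x, biases):
    acts = 1.0 / (1.0 + np.exp(-5.0 * (x + biases)))
    return acts / acts.sum(), acts

p = deliberate(toy_forward, np.zeros(2), np.array([0.1, -0.1]))
```

The biases are shifted uniformly because, as noted above, the SVM computation scales them all so that the weighted distance to the hyperplane is the same.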
Explainability
While a distributed neural circuit encodes information in diffuse firing patterns across many non-selective neurons, a localized network has neurons highly selective for specific stimuli or processes (e.g. “grandmother cells”). In ENNs, differentiae neurons can have a distributed firing pattern, while the sub concept and concept neurons can fire much more sparsely and selectively. There is increasing evidence that this hierarchical separation of distributed and localized firing patterns is how parts of animal nervous systems are organized.
Generalize Concepts Using Symbolic Manipulation

The ENNs can learn large-magnitude weights so that each neuron only fires at 0 or 1, as shown in
Symbolic reasoning can allow learning simple rules from limited experience and applying them to more complex problems, including the ability to extrapolate from one distribution of inputs to a different one without any additional training (blind generalization). To test this, a GDN and a symbolic ENN can be trained on a set of images that contained a one-pixel white stripe oriented horizontally or vertically, for 56 total 28×28 images, as shown in
In an exemplary embodiment, the TSP can feature a salesman trying to find the shortest possible route that takes him through all cities on a map and returns him home. Both the training and test sets can include samples with 55 features, 45 corresponding to the upper half of the inter-city distance matrix for a 10-city map, and the remaining 10 serving as a one-hot encoding of the current city, scaled up by 10. The cities can be located on a map on the unit square. Cities that have already been visited can be denoted in the distance matrix as being a distance of 10 from all other cities. The training set can include 90 samples corresponding to the maps with only one unvisited city. In order to teach a generalizable rule, the correct city to visit next can be located a distance of 0 from the current city.
In an exemplary embodiment, ENN training can be allowed on each sub concept to use as inputs its associated differentiae, and the initial concept layer can use connections of weight 10 between sub concepts and their specific concept neurons which had bias -5. The output neurons can use a sigmoid activation function. After training each network can be asked to find a route for the test set maps. The distance matrix can be given to the networks, which picked the next city to visit. The distance matrix can then be altered by switching the indices of the new city with the current city (i.e. the first index) and setting all distances from the previous city as 10. The network outputs corresponding to cities already visited can be masked to prevent the possibility of endless loops.
In an exemplary embodiment, the test set can include 5000 maps with the 10 cities all placed randomly. To serve as a reference for a greedy algorithm, each map can be put through the greedy nearest-neighbor algorithm (i.e. choose for the next city the closest unvisited city). The test error reported for the TSP can be the average difference in the route length found by the neural network compared to the nearest-neighbor algorithm.
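The greedy nearest-neighbor reference can be sketched directly from a distance matrix (function names are illustrative):

```python
import numpy as np

def greedy_route(dist, start=0):
    """Reference greedy nearest-neighbor tour: from the current city, always
    visit the closest unvisited city, then return home at the end."""
    n = len(dist)
    route, current = [start], start
    unvisited = set(range(n)) - {start}
    while unvisited:
        nxt = min(unvisited, key=lambda c: dist[current][c])
        route.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    route.append(start)                  # return home
    return route

def route_length(dist, route):
    return sum(dist[a][b] for a, b in zip(route, route[1:]))

# Four cities on a line at positions 0, 1, 2, 3.
dist = np.abs(np.subtract.outer(np.arange(4.0), np.arange(4.0)))
route = greedy_route(dist)
```

The reported TSP test error is then the average difference between the network's route length and this reference's route length, as described above.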
A GDN could not be made to generalize perfectly by changing its size, running GD many times, and choosing the network with the best test set performance. GD could not train a network that perfectly generalized to the diagonal line and box outline datasets even when the noise levels were as low as 1% and 3%, respectively.
In an exemplary embodiment, to demonstrate the rarity of finding a generalizable solution with GDNs, the weights of the generalizing ENN can be perturbed by a small amount, and then be trained as a GDN. The perturbation can include adding a normally distributed value to all weights and biases, with the standard deviation being a given fraction of the mean weight magnitude for each layer separately.
In an exemplary embodiment, lesions can be performed in the second layer (sub conceptual neurons in ENNs). Neurons can be deleted sequentially, and test accuracy can be calculated individually for each class. The sequence of neuron deletions can be decided by using hierarchical linkage clustering on their outputs on the test set, with the assumption that neurons with similar firing patterns are physically located more closely together.
GDNs performed worse even with noise of 0.01%, demonstrating the virtual impossibility of GDNs learning a generalizable rule to map short routes. GDN pre-training weights can be seeded with ENN weights that have various amounts of noise added to them. The results of these GDNs, trained for different numbers of epochs, are shown. Dots indicate all 5 repeats for each noise level and epoch number.
At each branch point of the tree, the networks were asked to choose which feature to split on. Training included only the 20 samples of 10-feature truth tables for which the optimal BDT contained a single branch node, while the test set included truth tables with deeper optimal BDTs, as shown in
The BDT problem is to find a BDT of minimum depth (i.e. cost) that fully reproduces a truth table. The depth of the tree can be defined as the average depth necessary to classify each entry of the truth table. Both the training and test sets can include samples with 1024 features corresponding to the label associated with each value in the 10-input truth table, encoded as zeros and ones. The training set can include 20 samples corresponding to all possible BDTs with only a single branch node.
In an exemplary embodiment, after training, each network can be asked to build full trees on the test set. This can be done by feeding the truth table to the network and taking its output as the first branch node. Going down each of the branches in turn, if all entries on the branch are labelled the same a leaf can be placed at the end with the corresponding label. If more branch nodes are necessary, the truth table can be reformed by taking the half corresponding to its side of the split and copying onto the other half, such that the already split feature is no longer needed to be split. This new truth table can be put through the network again with masking of the output choices that had already been split in order to prevent an infinite tree. This can be done until all branches have terminated in leaves.
In an exemplary embodiment, the test set can include 5000 truth tables corresponding to trees of much greater depth. For each test sample, a random BDT can be generated by allowing each node to branch with probability 0.7 and not allowing branches beyond a depth of 7. The BDT's truth table can be found and used as the test sample. To serve as a reference for a greedy algorithm, each tree can be put through the CART algorithm with Gini impurity as the splitting criterion, using scikit-learn's DecisionTreeClassifier. The test error reported can be the average difference in the tree depth found by the neural network compared to the greedy CART algorithm.
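The CART reference on a truth table can be sketched as follows, using a 3-input table whose optimal BDT is a single branch node (the inputs and depth computation are illustrative):

```python
import numpy as np
from itertools import product
from sklearn.tree import DecisionTreeClassifier

# Truth table of a 3-input function whose optimal BDT has one branch node:
# the label simply copies input feature 0.
X = np.array(list(product([0, 1], repeat=3)))
y = X[:, 0]

# Greedy CART reference with Gini impurity as the splitting criterion.
ref = DecisionTreeClassifier(criterion="gini").fit(X, y)

# Average depth needed to classify each truth-table entry (path length - 1).
avg_depth = float(ref.decision_path(X).sum(axis=1).mean()) - 1.0
```

Here CART recovers the single split on feature 0, so the average classification depth equals 1, matching the "single branch node" training cases described above.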
In an exemplary embodiment, for the orientation problem and the TSP, GDNs of varying layer widths can be trained, performing a grid search by scaling from 0 to twice the width of each ENN layer. 10 GDNs can be generated for the orientation problem and 5 for the TSP. Then an architecture can be chosen with the best performance on the test sets as this optimally generalizing GDN. The symbolic nature of ENNs can allow them to be translated directly into computer code. This can be demonstrated by translating into pseudocode.
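As an illustration of this translation, a symbolic stripe-orientation network could be hand-written as explicit rules. This is a non-limiting example in the spirit of the disclosure, not the actual translated network:

```python
import numpy as np

def classify_orientation(img):
    """Illustrative hand-translation of a symbolic stripe-orientation network
    into code: a horizontal-stripe image has one row entirely on, and a
    vertical-stripe image has one column entirely on."""
    row_full = bool(np.any(img.sum(axis=1) == img.shape[1]))  # some row all 1s
    col_full = bool(np.any(img.sum(axis=0) == img.shape[0]))  # some col all 1s
    if row_full and not col_full:
        return "horizontal"
    if col_full and not row_full:
        return "vertical"
    return "ambiguous"

h = np.zeros((28, 28)); h[3, :] = 1.0       # horizontal stripe in row 3
v = np.zeros((28, 28)); v[:, 7] = 1.0       # vertical stripe in column 7
```

Because each neuron in a symbolic ENN fires at 0 or 1, such rules correspond one-to-one with neuron activations, which is what makes the direct translation possible.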
Robust to Input Noise and Adversarial Attacks
To measure the separation between data samples and decision boundaries on a less structured problem, individual images can be taken from the MNIST and rectangle test sets, interpolating between them and either an image of a different class or white noise. Along this interpolation, the closest point lying directly on a decision boundary can be found, and the average pixel difference from the starting image (proportional to the L1 distance) can be measured.
In an exemplary embodiment, for each sample in the test set correctly predicted by both the ENN and GDN (about 96% of MNIST and 99% of the rectangles), 20 target locations can be chosen for interpolation. This target can either be a test image from a different class or white noise (i.e. random black and white pixels) that the networks classified differently than the test image. Interpolating between the sample and the target, the point at which the network changes its predicted class can be found and the average pixel difference can be calculated (which is proportional to the L1 distance to the boundary). The distribution of these distances for each sample can be reported.
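The boundary-distance measurement can be sketched with a simple line search along the interpolation (the classifier here is a toy stand-in for a trained network):

```python
import numpy as np

def boundary_distance(predict, x, target, steps=200):
    """Interpolate from sample x toward target; return the average pixel
    difference from x (proportional to the L1 distance) at the first point
    where the predicted class changes, or None if it never changes."""
    c0 = predict(x)
    for t in np.linspace(0.0, 1.0, steps + 1):
        point = (1.0 - t) * x + t * target
        if predict(point) != c0:
            return float(np.mean(np.abs(point - x)))
    return None

# Toy classifier: class 1 when the mean pixel value exceeds 0.5.
predict = lambda v: int(v.mean() > 0.5)
d = boundary_distance(predict, np.zeros(4), np.ones(4))
```

A finer interpolation grid (larger `steps`) locates the crossing point more precisely at the cost of more forward passes.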
Both GDNs and ENNs place decision boundaries at about the same distance when interpolating between images. However, when interpolating between images and white noise, ENN decision boundaries are at a greater distance than those of GDNs, suggesting a more robust placement of decision boundaries.
In an exemplary embodiment, the plots in
Moreover, robust decision boundary arrangement can be important when defending against adversarial attacks.
In an exemplary embodiment, the error rate in classifying the test set can be computed with increasing amounts of noise. For different noise levels, Gaussian noise with a corresponding standard deviation can be added to the test set, and the classification error computed. This can be repeated 20 times for each noise level.
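The noise-robustness protocol above can be sketched as follows (the toy classifier and data are illustrative stand-ins for a trained network and test set):

```python
import numpy as np

def noise_error_curve(predict, X, y, sigmas, repeats=20, seed=0):
    """For each noise level, add Gaussian noise of that standard deviation to
    the test set and record the classification error, averaged over repeats."""
    rng = np.random.default_rng(seed)
    curve = []
    for sigma in sigmas:
        errs = [np.mean(predict(X + rng.normal(0.0, sigma, size=X.shape)) != y)
                for _ in range(repeats)]
        curve.append(float(np.mean(errs)))
    return curve

# Toy classifier: class 1 when the mean pixel value exceeds 0.5.
predict = lambda A: (A.mean(axis=1) > 0.5).astype(int)
X = np.vstack([np.zeros((1, 16)), np.ones((1, 16))])
y = np.array([0, 1])
curve = noise_error_curve(predict, X, y, sigmas=[0.0, 5.0])
```

With zero noise the error is zero, and it rises with the noise standard deviation, which is the curve that would be compared between ENNs and GDNs.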
To generate adversarial images, the fast gradient sign method (FGSM) can be used, which calculates the sign of the gradient of the loss function L with respect to the inputs x, sign(∇xL), and then scales this vector by a small ε until the minimum perturbation to cause misclassification, εmin, is found. For both the GDN and ENN, sign(∇xL) can be computed for each image, with the loss function L for both being the categorical cross-entropy function used to train the GDN. This network-specific perturbation can be allowed to scale separately to find εmin for each network.
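FGSM can be sketched on a single logistic neuron, where the input gradient of the cross-entropy loss has the closed form (p − y)·w (the grid of ε values and toy parameters are illustrative):

```python
import numpy as np

def fgsm_min_eps(w, b, x, y, eps_grid):
    """Fast gradient sign method on one logistic neuron: perturb x by
    eps * sign(grad_x L) for increasing eps and return the smallest eps in
    eps_grid that flips the predicted class (None if none does)."""
    def predict(v):
        return int(1.0 / (1.0 + np.exp(-(w @ v + b))) > 0.5)
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    grad = (p - y) * w                 # d(cross-entropy)/dx for a sigmoid unit
    direction = np.sign(grad)          # the FGSM perturbation direction
    c0 = predict(x)
    for eps in eps_grid:
        if predict(x + eps * direction) != c0:
            return eps
    return None

w, x = np.array([1.0, 1.0]), np.array([0.4, 0.4])   # w.x = 0.8 -> class 1
eps_min = fgsm_min_eps(w, 0.0, x, y=1, eps_grid=[0.1, 0.2, 0.3, 0.5])
```

For a full network the gradient would be computed by backpropagation rather than in closed form, but the sign-and-scale search for εmin is the same.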
Applications

The disclosed techniques (ENNs) can be used in applications requiring legal and business interpretability, AI safety, autonomous self-driving, background checks, and forensic/medical diagnoses. The need has been demonstrated by the existence of adversarial inputs that contain human-imperceptible levels of noise but which GDNs spectacularly misclassify. This need can also be evidenced by the unpredictable behaviors of neural networks given certain edge cases. The disclosed ENN will therefore be necessary in all such applications. ENNs satisfy this need because they are built from a classification framework, and the role of each neuron can be understood on an individual level, making ENN decisions inherently interpretable and amenable to improvement by the relevant parties, who may not be experts in the technical domain of this technique. Future methods using this technique for applications related to interpretability must, by definition, also reveal the use of this technique during the interpretation verification step.
Because ENNs are built from a framework consistent with principles of human cognition from cognitive and experimental neuroscience, the interpretable, rule-finding performance of ENNs can be a foundation for building human-like artificial intelligence, for example in software or robotics interfaces, as well as for performing future tasks that are currently accessible only to human brains. ENNs share “physiological” features with biological neural networks, such as modularity and modular failure patterns (rather than the all-or-none failure common with GDNs).
ENNs can learn underlying rules that allow them to generalize to problem types they have not trained on, and that allow the decisions made by ENNs to be understood by humans. Such tasks can include finding robust rules in the mapping from genotype to phenotype; learning rules from in vitro experiments that generalize to an in vivo setting; and automated tasks that require human interpretability, such as tasks involving legal consequences like navigation, identity fraud, or drug dosage protocols.
The processor 1510 can be configured to determine multiple sub concepts within the training sample. This aspect of system 1500 can be similar to previously described step 120 of method 100. The processor 1510 can be configured to process the sub concepts to obtain differential neurons associated with the sub concepts such that the differential neurons provide a relative distinction between the sub concepts. This aspect is similar to previously described step 130 of method 100.
The processor 1510 can be configured to integrate the differential neurons to obtain sub concepts neurons such that the sub concepts neurons provide an absolute distinction of sub concepts. This aspect is similar to previously described step 140 of method 100. The processor 1510 can be configured to integrate the sub concepts neurons to obtain concept neurons that form an output of the neural network. This aspect is similar to previously described step 150 of method 100.
In alternative embodiments, the machine can operate as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of either a server or a client machine in server-client network environments, or it may act as a peer machine in peer-to-peer (or distributed) network environments.
Example computer system 1600 includes a processor 1602 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both), a main memory 1604, and a static memory 1606, which communicate with each other via an interconnect 1608 (e.g., a link, a bus, etc.). The computer system 1600 may further include a video display unit 1610, an input device 1612 (e.g., a keyboard), and a user interface (UI) navigation device 1614 (e.g., a mouse). In one embodiment, the video display unit 1610, input device 1612, and UI navigation device 1614 are a touch screen display. The computer system 1600 may additionally include a storage device 1616 (e.g., a drive unit), a signal generation device 1618 (e.g., a speaker), an output controller 1632, a network interface device 1620 (which may include or operably communicate with one or more antennas 1630, transceivers, or other wireless communications hardware), and one or more sensors 1628.
The storage device 1616 includes a machine-readable medium 1622 on which is stored one or more sets of data structures and instructions 1624 (e.g., software) embodying or utilized by any one or more of the methodologies or functions described herein. The instructions 1624 may also reside, completely or at least partially, within the main memory 1604, static memory 1606, and/or within the processor 1602 during execution thereof by the computer system 1600, with the main memory 1604, static memory 1606, and the processor 1602 constituting machine-readable media.
While the machine-readable medium 1622 is illustrated in an example embodiment to be a single medium, the term “machine-readable medium” may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions 1624.
The term “machine-readable medium” shall also be taken to include any tangible medium that is capable of storing, encoding or carrying instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure or that is capable of storing, encoding or carrying data structures utilized by or associated with such instructions.
The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media. Specific examples of machine-readable media include non-volatile memory, including, by way of example, semiconductor memory devices (e.g., Electrically Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM)) and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
The instructions 1624 may further be transmitted or received over a communications network 1626 using a transmission medium via the network interface device 1620 utilizing any one of several well-known transfer protocols (e.g., HTTP). Examples of communication networks include a local area network (LAN), wide area network (WAN), the Internet, mobile telephone networks, Plain Old Telephone (POTS) networks, and wireless data networks (e.g., Wi-Fi, 3G, and 4G LTE/LTE-A or WiMAX networks).
The term “transmission medium” shall be taken to include any intangible medium that is capable of storing, encoding, or carrying instructions for execution by the machine, and includes digital or analog communications signals or other intangible media to facilitate communication of such software.
Other applicable network configurations may be included within the scope of the presently described communication networks. Although examples were provided with reference to a local area wireless network configuration and a wide area Internet network connection, it will be understood that communications may also be facilitated using any number of personal area networks, LANs, and WANs, using any combination of wired or wireless transmission media.
The embodiments described above may be implemented in one or a combination of hardware, firmware, and software. For example, the features of the system architecture 1600 of the processing system may be implemented as client-operated software or be embodied on a server running an operating system with software running thereon.
While some embodiments described herein illustrate only a single machine or device, the terms “system”, “machine”, or “device” shall also be taken to include any collection of machines or devices that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
Examples, as described herein, may include, or may operate on, logic or several components, modules, features, or mechanisms. Such items are tangible entities (e.g., hardware) capable of performing specified operations and may be configured or arranged in a certain manner. In an example, circuits may be arranged (e.g., internally or with respect to external entities such as other circuits) in a specified manner as a module, component, or feature. In an example, the whole or part of one or more computer systems (e.g., a standalone, client or server computer system) or one or more hardware processors may be configured by firmware or software (e.g., instructions, an application portion, or an application) as an item that operates to perform specified operations. In an example, the software may reside on a machine readable medium. In an example, the software, when executed by underlying hardware, causes the hardware to perform the specified operations.
Accordingly, such modules, components, and features are understood to encompass a tangible entity, be that an entity that is physically constructed, specifically configured (e.g., hardwired), or temporarily (e.g., transitorily) configured (e.g., programmed) to operate in a specified manner or to perform part or all operations described herein. Considering examples in which modules, components, and features are temporarily configured, each of the items need not be instantiated at any one moment in time. For example, where the modules, components, and features comprise a general-purpose hardware processor configured using software, the general-purpose hardware processor may be configured as respective different items at different times. Software may accordingly configure a hardware processor, for example, to constitute a particular item at one instance of time and to constitute a different item at a different instance of time.
Additional examples of the presently described method, system, and device embodiments are suggested according to the structures and techniques described herein. Other non-limiting examples may be configured to operate separately or can be combined in any permutation or combination with any one or more of the other examples provided above or throughout the present disclosure.
It will be appreciated by those skilled in the art that the present disclosure can be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The presently disclosed embodiments are therefore considered in all respects to be illustrative and not restrictive. The scope of the disclosure is indicated by the appended claims rather than the foregoing description, and all changes that come within the meaning and range of equivalence thereof are intended to be embraced therein.
Claims
1. A computer-implemented method for training an artificial neural network, the method comprising:
- obtaining a training sample for training the artificial neural network;
- determining multiple sub concepts within the training sample;
- processing the sub concepts to obtain differential neurons associated with the sub concepts, wherein the differential neurons provide a relative distinction between the sub concepts;
- integrating the differential neurons to obtain sub concepts neurons, wherein the sub concepts neurons provide an absolute distinction of sub concepts; and
- integrating the sub concepts neurons to obtain concept neurons that form an output of the neural network.
2. The method of claim 1, wherein the training sample is designed to teach one or more rules.
3. The method of claim 1, wherein the determining of the sub concepts within the training sample includes obtaining various subsets of the training sample and distinguishing between the various subsets.
4. The method of claim 1, wherein unsupervised learning is used to determine hierarchical structure of the sub concepts.
5. The method of claim 1, wherein the sub concepts are overlapping or hierarchically structured.
6. The method of claim 1, wherein one or more of the differential neurons are pruned before the integrating of the differential neurons to obtain sub concepts neurons.
7. The method of claim 1, wherein neurons of the artificial neural network are deliberative, temporarily changing their parameters.
8. The method of claim 1, comprising:
- tuning the artificial neural network after the training to improve its performance.
9. The method of claim 1, wherein neurons of the artificial neural network provide symbolic outputs that are interpretable as algorithms.
10. A system for training a neural network, the system comprising a processor and an associated memory, the processor being configured to:
- obtain a training sample for training the artificial neural network;
- determine multiple sub concepts within the training sample;
- process the sub concepts to obtain differential neurons associated with the sub concepts, wherein the differential neurons provide a relative distinction between the sub concepts;
- integrate the differential neurons to obtain sub concepts neurons, wherein the sub concepts neurons provide an absolute distinction of sub concepts; and
- integrate the sub concepts neurons to obtain concept neurons that form an output of the neural network.
11. The system of claim 10, wherein the training sample is designed to teach one or more rules.
12. The system of claim 10, wherein to determine the sub concepts within the training sample, the processor is configured to obtain various subsets of the training sample and distinguish between the various subsets.
13. The system of claim 10, wherein unsupervised learning is used to determine hierarchical structure of the sub concepts.
14. The system of claim 10, wherein the sub concepts are overlapping or hierarchically structured.
15. The system of claim 10, wherein one or more of the differential neurons are pruned before the integrating of the differential neurons to obtain sub concepts neurons.
16. The system of claim 10, wherein neurons of the artificial neural network are deliberative, temporarily changing their parameters.
17. The system of claim 10, wherein the processor is configured to tune the artificial neural network after the training to improve its performance.
18. The system of claim 10, wherein neurons of the artificial neural network provide symbolic outputs that are interpretable as algorithms.
Type: Application
Filed: Feb 24, 2021
Publication Date: Aug 3, 2023
Applicant: THE BOARD OF REGENTS OF THE UNIVERSITY OF TEXAS SYSTEM (Austin, TX)
Inventors: Milo M. LIN (Dallas, TX), Paul J. BLAZEK (Irving, TX)
Application Number: 17/801,175