GENERATIVE AUGMENTATION OF IMAGE DATA
Systems and methods to receive one or more first images associated with a training set of images to train a machine learning model; provide the one or more first images as a first input to a first set of layers of computational units, wherein the first set of layers utilizes image filters; provide a first output of the first set of layers of computational units as a second input to a second layer of the computational units, wherein the second layer utilizes random parameter sets for computations; obtain distortion parameters from the second layer of the computational units; generate one or more second images comprising a representation of the one or more first images modified with the distortion parameters; obtain, as a third output, the one or more second images; and add the one or more second images to the training set of images to train the machine learning model.
The present application is a continuation of application Ser. No. 15/938,897 filed on Mar. 28, 2018 and claims the benefit of priority under 35 U.S.C. § 119 to Russian Patent Application No. 2018110382 filed Mar. 23, 2018, the disclosure of which is incorporated herein by reference in its entirety for all purposes.
TECHNICAL FIELD
The present disclosure is generally related to image processing, and is more specifically related to systems and methods for generating random distortions and augmenting image data, including for use in training machine learning models.
BACKGROUND
Machine learning enables computer systems to learn to perform tasks from observational data. Machine learning algorithms may enable the computer systems to learn without being explicitly programmed. Machine learning approaches may include, but are not limited to, neural networks, decision tree learning, deep learning, etc. A machine learning model, such as a neural network, may be used in solutions related to image recognition, including optical character recognition. The observational data in the case of image recognition may be a plurality of images. A neural network may thus be provided with training sets of images from which the neural network can learn image recognition.
SUMMARY OF THE DISCLOSURE
In accordance with one or more aspects of the present disclosure, an example method for generating image augmentation may comprise: receiving, by a processing device, one or more first images associated with a training set of images to train a machine learning model in training; providing, by the processing device, the one or more first images as a first input to a first set of layers of computational units, wherein the first set of layers utilizes image filters; providing a first output of the first set of layers of computational units as a second input to a second layer of the computational units, wherein the second layer utilizes random parameter sets for computations; obtaining distortion parameters from the second layer of the computational units; generating one or more second images based on the one or more first images and the distortion parameters; obtaining, as a third output, the one or more second images; and adding the one or more second images to the training set of images to train the machine learning model.
In accordance with one or more aspects of the present disclosure, an example system for generating image augmentation may comprise: a memory; and a processor, coupled to the memory, the processor to: receive one or more first images associated with a training set of images to train a machine learning model in training; provide the one or more first images as a first input to a first set of layers of computational units, wherein the first set of layers utilizes image filters; provide a first output of the first set of layers of computational units as a second input to a second layer of the computational units, wherein the second layer utilizes random parameter sets for computations; obtain distortion parameters from the second layer of the computational units; provide a second output of the second layer of the computational units as a third input to a third set of layers of the computational units; provide a third output of the third set of layers of the computational units as one or more second images, the third output being based on the one or more first images and the distortion parameters; and add the one or more second images to the training set of images to train the machine learning model.
In accordance with one or more aspects of the present disclosure, an example computer-readable non-transitory storage medium may comprise executable instructions that, when executed by a processing device, cause the processing device to: receive one or more first images associated with a training set of images to train a machine learning model in training; provide the one or more first images as a first input to a first set of layers of computational units, wherein the first set of layers utilizes image filters; provide a first output of the first set of layers of computational units as a second input to a second layer of the computational units, wherein the second layer utilizes random parameter sets for computations; obtain distortion parameters from the second layer of the computational units; generate one or more second images comprising a representation of the one or more first images modified with the distortion parameters; obtain, as a third output, the one or more second images; and add the one or more second images to the training set of images to train the machine learning model.
The present disclosure is illustrated by way of examples, and not by way of limitation, and may be more fully understood with references to the following detailed description when considered in connection with the figures, in which:
Described herein are methods and systems for generative augmentation of image data for a training set of images for use in a machine learning model.
“Computer system” herein shall refer to a data processing device having a general purpose processor, a memory, and at least one communication interface. Examples of computer systems that may employ the methods described herein include, without limitation, desktop computers, notebook computers, tablet computers, and smart phones.
Machine learning models may be used to perform image recognition, including optical character recognition (OCR), pattern recognition, photo recognition, facial recognition, etc. As an example, a neural network may be used as a machine learning model for image recognition. A machine learning model may be provided with sample images as training sets of images from which the machine learning model can learn. The larger and more varied the training sample, the better the machine learning model can be trained. However, providing a machine learning model with an adequate number of varied sample training images is often a difficult task due to the limited availability of such images. Additionally, creating initial sets of training samples is a demanding task because, during the initial stage, there may only be a small number of sample images available, which may not be enough to train the model.
An effort to produce a larger number of training samples may be made using augmentation techniques that can create artificial sample images from an original, existing image. The techniques may involve creating alterations or distortions to an original image to produce a slightly different version of the original image. In this manner, multiple training samples with distortions may be produced from each original image. Traditional systems involve applying numerous tasks and/or sequential actions to each original image to augment it and obtain one training sample of one distortion type. This process is resource intensive and error prone. Using traditional methods, only a limited number of augmented images can be obtained (e.g., one augmented image for one original image). Also, in conventional systems, the techniques may involve applying a particular type of distortion to each available original image. The particular type of distortion may be based on specified rules or restrictions. Thus, for example, when augmenting several original images with similar characteristics, such as images with different Chinese-Japanese-Korean (“CJK”) symbols, the distortions applied to each of the original images may be similar to one another and, as a result, cause inaccuracies and difficulties in training the model. Traditional methods do not provide random distortions that are close to natural distortions, and they restrict the number of synthetically (e.g., artificially) distorted images that can be obtained from a single original image. When images are not close to naturally distorted images, machine learning using the images may lead to inaccuracies and ineffective learning for recognizing other naturally distorted images. Thus, traditional techniques may be inefficient, inaccurate, error prone, slow, and labor intensive.
The systems and methods herein provide improvements to an image augmentation system. Image augmentation adds value to base image data by appending additional information, thereby increasing the size of an image data set. An autoencoder (AE) may be used to augment (e.g., increase, expand, etc.) image data. An autoencoder is a direct distribution (e.g., learns to mimic the data distribution of the input data) machine learning model (e.g., neural network) that restores an input signal to the output. The neural network may include an input layer, an output layer, and one or more hidden layers, with the output layer reconstructing the inputs. Autoencoders are designed such that they cannot precisely copy the input at the output layer. The input signal in the autoencoder is reconstructed with some errors, and the neural network minimizes the errors by learning to select the most important characteristics. One type of autoencoder is the variational autoencoder (“VAE”), which may be used for learning latent representations (e.g., latent variables, which are inferred through mathematical models from observed variables). In some examples, autoencoders may be used for learning generative models of data. A generative model is a model for generating all values for a phenomenon, both those that can be observed in the world and target variables that can only be computed from those observed. Some augmentation systems may utilize convolutional neural networks. Convolution may refer to a process of adding each element of an image to its local neighbors, weighted (e.g., multiplied) by specified numbers. Convolutional neural networks (“CNN”) may consist of layers of computational units to hierarchically process visual data, and may feed forward the results of one layer to another layer, extracting a certain feature from input images. Each of the layers may be referred to as a convolutional layer or convolution layer.
The systems and methods described herein represent significant improvements to image augmentation systems by imposing random distortions on images. The systems described herein use randomized parameter sets for image augmentation. The technology provides for effectively creating a useful synthetic training dataset for a machine learning model. The systems herein provide for using a single original image and superimposing nearly natural random distortions on the original image every time the original image passes through the improved augmentation system, without any restriction on the number of times the original image may be augmented. The systems also provide for regulation of the roughness or coarseness of distortions applied to the original image.
In one implementation, the systems and methods herein provide for an improved layer of a convolutional neural network. The improved layer may be a new layer that utilizes random parameter sets. The improved layer may be referred to as the “random convolutional layer,” “random layer,” “variational convolutional layer,” and/or “variational layer” throughout the disclosure. In an example, one or more images may be received by an input layer of a convolutional neural network (CNN). The input layer may feed forward the one or more images to another set of layers of a CNN, which may be two dimensional layers with different number of image filters and channels. The set of layers may include iterative filtering of the one or more images, passing the images from one layer to the next layer within the set of layers. The filtered images may be fed to a random convolutional layer of the CNN. The random layer may utilize random parameter sets for computations. The random layer may include matrices with learnable parameters, such as a matrix of mean values, a matrix of standard deviation values, a matrix of displacement values, and an “epsilon” matrix with non-learned parameters. The matrix of mean values may be initialized with random values. The epsilon matrix may be based on a normal distribution value and an arbitrary standard deviation value each time the computation for the layer is performed. The matrices may be used to generate a randomized kernel matrix for the random layer. Randomized distortion parameters may be obtained from the random layer. The output of the random layer may be fed to a deconvolution layer of the CNN, where the input images may be restored and superimposed with random distortions. One or more images with the superimposed random distortions may be obtained as an output of the deconvolution layer. The one or more images with the superimposed random distortions may be added to a training set of images to train a machine learning model. 
The machine learning model may be a support vector machine, a neural network, etc. Once trained, the machine learning model can be used to automatically recognize new images.
As described herein, random augmentation provides for random distortions at the output of a neural network every time the same image passes through the augmentation system. Using the methods described herein, an unlimited number of randomly distorted training images can be derived from a single image. The random distortions may be imposed using random rules and/or random values. Each random distortion may closely approximate real (e.g., natural) distortions. The random convolutional layer may be built into another neural network, an autoencoder, a variational autoencoder (“VAE”), etc. In an example, the random convolutional layer may be embedded within an AE (although it is not necessary to restrict it to an AE). The synthetic (e.g., simulated), augmented image dataset derived by the systems and methods described herein allows for inclusion of a vast number of different types of images in a training set of images, improving the quality, accuracy, and usefulness of training of a neural network. The image processing effectively improves image recognition quality. The image recognition quality produced by the systems and methods of the present disclosure allows significant improvement in the optical character recognition (OCR) accuracy over various common methods. Additionally, the random convolution layer can provide for better augmentation for large images, high resolution images, rare images, and images containing hieroglyphs, CJK symbols, Arabic strings, or other complex symbols. However, the disclosure is not limited to these types of images, but rather includes any type of image.
Various aspects of the above referenced methods and systems are described in detail herein below by way of examples, rather than by way of limitation.
An image 140 may be used as an input image that is to be augmented. In one example, image 140 may be a digital image depicting a document 141. In another example, image 140 may be included within document 141. Document 141 may be a printed document, an electronic document, etc. Image 140 may include an item 142 representing, for example, a symbol, a face, a pattern, a large image, a high resolution image, a rare image, or any other type of image. The image 140 may include or be part of a document with one or more sentences each having one or more words that each has one or more characters. The one or more characters may include, but not be limited to, hieroglyphs, CJK symbols, Arabic strings, or other complex symbols.
The image 140 may be received in any suitable manner. For example, a digital copy of the image 140 may be received by scanning the document 141 or photographing the document 141. Additionally, in some instances a client device connected to a server via the network 130 may upload a digital copy of the image 140 to the server. In some instances, for a client device connected to a server via the network 130, the client device may download the image 140 from the server. The image 140 may depict a document or one or more of its parts. In an example, image 140 may depict document 141 in its entirety. In another example, image 140 may depict a portion of document 141. In yet another example, image 140 may depict multiple portions of document 141. Image 140 may include multiple images. Image 140 may comprise multiple items 142, multiple documents 141, etc. The image 140 may be used to produce additional images for training a set of machine learning models.
Server machine 150 may be a rackmount server, a router computer, a personal computer, a portable digital assistant, a mobile phone, a laptop computer, a tablet computer, a camera, a video camera, a netbook, a desktop computer, a media center, or any combination of the above. The server machine 150 may include a random augmentation engine 151. The set of machine learning models 114 may be trained using training images 116 that have been generated using the random augmentation engine 151. The random augmentation engine 151 may generate multiple training images 116 from a single image (e.g., image 140) and provide images 116 to train the set of machine learning models 114. The set of machine learning models 114 may be composed of, e.g., a single level of linear or non-linear operations (e.g., a support vector machine [SVM]) or may be a deep network, i.e., a machine learning model that is composed of multiple levels of non-linear operations. Examples of deep networks are neural networks including convolutional neural networks, recurrent neural networks with one or more hidden layers, and fully connected neural networks. For example, a neural network for OCR may be trained using the augmented dataset (e.g., images 116) produced by the random augmentation engine 151.
The set of machine learning models 114 may be trained using training data to be able to recognize contents of various images. Once the set of machine learning models 114 are trained, the set of machine learning models 114 can be provided to image recognition engine 112 for analysis of new images.
The repository 120 may be a persistent storage that is capable of storing image 140, item 142, and/or document 141, training images 116, as well as various data structures used by various components of system 100. Repository 120 may be hosted by one or more storage devices, such as main memory, magnetic or optical storage based disks, tapes or hard drives, NAS, SAN, and so forth. Although depicted as separate from the computing device 110 and server machine 150, in an implementation, the repository 120 may be part of the computing device 110 or server machine 150. In some implementations, repository 120 may be a network-attached file server, while in other embodiments repository 120 may be some other type of persistent storage such as an object-oriented database, a relational database, and so forth, that may be hosted by a server machine or one or more different machines coupled via the network 130.
The computing device 110 may be used for performing image recognition. The computing device 110 may include an image recognition engine 112. Image recognition may include, but not be limited to, character recognition, optical character recognition, pattern recognition, photo recognition, facial recognition, etc. The computing device 110 may be a desktop computer, a laptop computer, a smartphone, a tablet computer, a server, a scanner, or any suitable computing device capable of performing the techniques described herein.
The image recognition engine 112 may include instructions stored on one or more tangible, machine-readable media of the computing device 110 and executable by one or more processing devices of the computing device 110. In an implementation, the image recognition engine 112 may use a set of trained machine learning models 114 that are trained to recognize various images. The set of machine learning models 114 may be trained using a set of images 116. In some instances, the set of trained machine learning models 114 may be part of the image recognition engine 112 or may be accessed on another machine (e.g., server machine 150) by the image recognition engine 112. Based on the output of the set of trained machine learning models 114, the image recognition engine 112 may recognize objects in various images, such as content of documents including one or more words, sentences, logos, patterns, faces, etc.
In one implementation, random augmentation system 300 may include input layer 220. Input layer 220 may be used to receive one or more images. The received one or more images may have arbitrary dimensions, such as, arbitrary height values, arbitrary width values, etc. The image may be of any size. The one or more images may include any type of an image. The one or more images may include, but not be limited to, an image of various sizes (e.g., small, medium, large), different resolution (e.g., high, low resolution), rare images, images containing symbols (e.g., hieroglyphs, CJK symbols, Arabic symbols), etc. An input layer may be used to pass on input values to the next layer of the CNN. In that regard, an input layer may receive an image signal in a particular format and pass on the same values as an output of the input layer to further layers. For example, the computational unit of input layer 220 may accept as input 222 one or more parameters, such as, number of images in a batch of images, number of channels, image height, image width, etc. Different images may be of different dimensions, without restriction. If multiple images are provided as input, each of the multiple images may be processed one at a time. In an implementation, input layer 220 may be designed to accept input 222 in the format “(number of images in a batch, number of channels, image height, image width).” An example of values for input 222 that are provided to input layer 220 is “(None, 1, None, None),” as depicted in
In an implementation, random augmentation system 300 may include various convolutional layers. The convolutional layers may be logically grouped. Convolutional layers may be used to perform filtering of an input image, or of representations of an input image, fragment by fragment. A fragment may be obtained by dividing an input image into a plurality of portions. Image filters used in the convolution layers may be represented as matrices or arrays of numbers. The numbers in the arrays (or matrices) may be referred to as weights or parameter sets. Applying a filter to a fragment of an input image may include calculating a dot product of the weights of a matrix and the pixel values of the fragment being processed. Each fragment may include multiple pixel values. Thus, the dot product calculation may be an element-wise multiplication, that is, each pixel value of the fragment may be multiplied by the corresponding matrix weight. A “kernel” matrix (e.g., a small matrix) may be used to sum the results of the dot products for each fragment. A kernel matrix may be the core of the convolution performed by each of the convolution layers.
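The fragment-wise filtering described above may be illustrated with a minimal NumPy sketch. The function names, the 4×4 test image, and the 3×3 averaging filter are assumptions chosen for illustration only; the sketch uses a stride of one and no padding, which the disclosure does not specify.

```python
import numpy as np

def filter_fragment(fragment, weights):
    """Multiply each pixel by the corresponding weight and sum the products."""
    return float(np.sum(fragment * weights))

def convolve2d_valid(image, weights):
    """Apply the filter to every fragment of the image (stride 1, no padding)."""
    fh, fw = weights.shape
    out_h = image.shape[0] - fh + 1
    out_w = image.shape[1] - fw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = filter_fragment(image[i:i + fh, j:j + fw], weights)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)  # hypothetical 4x4 input
weights = np.ones((3, 3)) / 9.0                   # hypothetical averaging filter
result = convolve2d_valid(image, weights)         # one value per 3x3 fragment
```

Each entry of `result` corresponds to one fragment of the input, so a 4×4 image filtered with a 3×3 matrix yields a 2×2 output.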
In one implementation, random augmentation system 300 may include a first set of layers of computational units as a first set of convolutional layers 231, 232, 233. Each of the first set of convolutional layers may be two dimensional computational units. For example, the two dimensions may include height and width. The first set of convolutional layers may use particular image filters and channels. An image filter may remove components or features of an image signal. The first set of convolutional layers may be used to process a representation of an image. A representation of the image may be a processed version of an original image.
In one implementation, random augmentation system 300 may include a second layer of computational units as a second convolutional layer 240. The second convolution layer may also be known as a random convolutional layer or a variational convolutional layer. The random convolutional layer may be a two dimensional computational unit. The random convolutional layer may further filter images using randomized parameters and produce random distortion parameters to modify the image. The distortion parameters obtained from the random layer may be superimposed on a representation of the image.
In one implementation, random augmentation system 300 may include a third set of layers of computational units as a third set of convolutional layers 251, 252, 253. The third set of convolution layers may also be known as transposed convolutional layers or deconvolution layers. Each of the deconvolution layers may be a two dimensional computational unit. The deconvolution layers may use particular channels and sets of image filters to restore a representation of the image superimposed with distortion parameters.
Calculation of the randomized kernel matrix of the random convolutional layer may involve selecting a sampling standard deviation value. For example, a parameter of the layer 400, sampling_std 460, may be used as the sampling standard deviation for random convolutional layer 400. The value of the sampling_std 460 may be selected arbitrarily, depending on the problem being solved. For example, values of the sampling_std 460 may include, but not be limited to, 0.1, 0.2, 0.3, 0.4, or other numeric values. In an implementation that augments an image using an autoencoder or other NN type, the sampling_std 460 parameter may specify the “roughness” of distortions applied to the representation of the input image. The larger the selected value of the sampling_std 460, the coarser (e.g., grainier) the applied image distortion may be.
In an implementation, random convolutional layer 400 may include a kernel matrix 410, which may represent a randomized kernel matrix. The kernel matrix may be generated based on one or more matrices. For example, random convolutional layer 400 may be generated based on four matrices: 1) a kernel_mean matrix 420, 2) a kernel_stddev matrix 430, 3) a bias matrix 440, and 4) an epsilon matrix 450. The first three matrices 420, 430, and 440 may comprise learnable weights or parameter sets. Kernel_mean matrix 420, kernel_stddev matrix 430, and epsilon matrix 450 may each be four dimensional matrices with parameters filter height, filter width, filters_in, and filters_out. Parameters filter height and filter width may represent the size of a filter. In an example, filter height may have a value of “3” (e.g., 3 pixels) and filter width may have a value of “3.” The parameter filters_in may be the number of channels at the input of the matrix. The parameter filters_out may be the number of filters to apply to the input tensor. In an example, filters_in and filters_out may each have a value of “32” or a different value. The value of filters_in and/or filters_out may vary, according to the problem being solved.
In an example, kernel_mean matrix 420 may be a matrix of mean values. The kernel_mean matrix 420 may be initialized with random values. The kernel_mean matrix 420 may have the same or a similar shape as the kernel matrix of the first set of convolutional layers. The deviation between kernel matrix 410 and kernel_mean matrix 420 may be affected by the value of the selected sampling_std 460, as sampling_std 460 is used to calculate the standard deviation applied to the kernel_mean matrix 420 to derive kernel matrix 410. Kernel_mean matrix 420, kernel_stddev matrix 430, and epsilon matrix 450 may each have a shape similar to the kernel matrix of the first set of convolutional layers.
In an example, kernel_stddev matrix 430 may be a matrix of standard deviation values. The kernel_stddev matrix 430 may be initialized with zero values. The kernel_stddev matrix 430 may have the same or a similar shape as the kernel matrix of the first set of convolutional layers.
In an example, bias matrix 440 may be a matrix of displacement values. The bias matrix 440 may be generated based on a number of filters to apply to the input of the random layer. The bias matrix 440 may have the same or a similar role as the bias matrix of the first set of convolutional layers (e.g., applying pixel offset to the convolution results). The bias matrix 440 dimensions may be specified by the parameter filters_out. As described above, the parameter filters_out may be the number of filters to apply to the input of the matrix.
In an example, epsilon matrix 450 may be a matrix that is based on an arbitrary standard deviation value and a normal distribution value. The epsilon matrix may be a non-learned matrix. The epsilon matrix 450 may be initialized with random values. The epsilon matrix may be generated anew with each pass of an input through the random layer. For example, a random number generator may be used to derive the epsilon matrix. Epsilon matrix 450 may be generated from a normal distribution with a mean value of zero and a standard deviation value of sampling_std 460. The shape of the epsilon matrix 450 may coincide with the shape of the kernel_mean matrix 420.
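As a non-limiting illustration, the four matrices may be set up as in the following NumPy sketch. The sizes (3, 3, 32, 32) and the sampling_std value of 0.2 are example values from the description; the scale of the random initial values of kernel_mean, the zero initialization of bias, the seed, and the helper name `draw_epsilon` are assumptions made only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable

# Example sizes from the description: 3x3 filters, 32 input channels, 32 filters.
filter_height, filter_width, filters_in, filters_out = 3, 3, 32, 32
shape = (filter_height, filter_width, filters_in, filters_out)
sampling_std = 0.2  # one of the example sampling_std values

# Learnable matrices.
kernel_mean = rng.normal(0.0, 0.05, size=shape)  # random initial values (scale is an assumption)
kernel_stddev = np.zeros(shape)                  # initialized with zero values
bias = np.zeros(filters_out)                     # one displacement per filter (zero init is an assumption)

def draw_epsilon():
    """Non-learned matrix, generated anew for each pass through the layer."""
    return rng.normal(loc=0.0, scale=sampling_std, size=shape)

epsilon_first = draw_epsilon()
epsilon_second = draw_epsilon()  # differs from epsilon_first on every pass
```

Only kernel_mean, kernel_stddev, and bias would be updated during training in such a sketch; epsilon is simply redrawn.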
The randomized kernel matrix of the random convolution layer may be generated based on the one or more matrices using a specified formula. For example, kernel matrix 410 may be calculated using the formula:
Kernel 410 = Kernel_mean 420 + exp(Kernel_stddev 430 / 2) × epsilon 450.
Thus, the weights or parameter sets generated for the kernel matrix 410 may be random each time and generated from the specific parameters of the normal distribution obtained in the learning process of the CNN. According to the formula, the randomized kernel matrix 410 may be normally distributed with a mean value of “kernel_mean 420” and a standard deviation value of “exp (Kernel_stddev/2)” scaled by sampling_std 460, because epsilon matrix 450 is drawn with a standard deviation of sampling_std 460. Computation for the randomized kernel matrix may include arithmetic operations that are performed on a pixel by pixel basis. Determination of the distortion parameters may be performed for each portion (e.g., fragment) of the input image rather than determining an overall distortion parameter for the entirety of the input image as a whole. Each portion of the image may be processed separately within the CNN, resulting in local transformation of the input image for each portion. The convolution performed with the generated randomized kernel matrix applied to the image data on the input of the random layer may represent random image distortion parameters superimposed on a representation of the input image.
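The statistical behavior of the formula may be checked numerically with a short sketch such as the following. The layer size, the kernel_stddev value of -1.0, the number of draws, and the function name `sample_kernel` are hypothetical. Because epsilon is drawn with a standard deviation of sampling_std, the observed spread of the kernel around kernel_mean is exp(kernel_stddev/2) × sampling_std, which reduces to exp(kernel_stddev/2) when sampling_std equals one.

```python
import numpy as np

def sample_kernel(kernel_mean, kernel_stddev, sampling_std, rng):
    """Draw one randomized kernel: kernel_mean + exp(kernel_stddev / 2) * epsilon."""
    epsilon = rng.normal(0.0, sampling_std, size=kernel_mean.shape)
    return kernel_mean + np.exp(kernel_stddev / 2.0) * epsilon

rng = np.random.default_rng(1)
shape = (3, 3, 4, 4)                         # small hypothetical layer
kernel_mean = rng.normal(0.0, 0.1, size=shape)
kernel_stddev = np.full(shape, -1.0)         # hypothetical learned values

spread = {}  # empirical standard deviation of the kernel for each sampling_std
for sampling_std in (0.1, 0.4):
    draws = np.stack([sample_kernel(kernel_mean, kernel_stddev, sampling_std, rng)
                      for _ in range(2000)])
    spread[sampling_std] = draws.std(axis=0).mean()
    average_kernel = draws.mean(axis=0)      # approaches kernel_mean as draws grow
```

A larger sampling_std produces kernels that deviate further from kernel_mean, which is consistent with the coarser distortions described for larger values of sampling_std 460.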
Convolution operation 470 for the random convolution layer may be performed using the randomized kernel matrix 410, and obtained data 480 may be fed forward to deconvolutional layers 251-253 as output 490 of the convolution operation 470. Therefore, once a network that comprises this layer (e.g., an AE) has been trained, one or more randomly distorted images may be generated from a single input image each time a representation of the input image is passed through the random convolution layer, thereby augmenting the image data set for the training set of images for a machine learning model.
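A minimal sketch of one pass through such a layer, assuming a stride of one, no padding, and a channels-last layout, is shown below. The function name `random_conv2d`, the 8×8×2 stand-in input, and the parameter values are assumptions for illustration, and the deconvolution layers that would restore an image from the output are omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def random_conv2d(x, kernel_mean, kernel_stddev, bias, sampling_std, rng):
    """One pass through a random convolutional layer (stride 1, no padding).

    x has shape (height, width, filters_in); the kernel matrices have shape
    (filter_height, filter_width, filters_in, filters_out).
    """
    epsilon = rng.normal(0.0, sampling_std, size=kernel_mean.shape)  # new on every pass
    kernel = kernel_mean + np.exp(kernel_stddev / 2.0) * epsilon
    fh, fw = kernel.shape[:2]
    # windows has shape (out_height, out_width, filters_in, fh, fw)
    windows = sliding_window_view(x, (fh, fw), axis=(0, 1))
    return np.einsum("hwcij,ijco->hwo", windows, kernel) + bias

rng = np.random.default_rng(2)
features = rng.random((8, 8, 2))             # stand-in for the first set of layers' output
kernel_mean = rng.normal(0.0, 0.1, size=(3, 3, 2, 4))
kernel_stddev = np.zeros((3, 3, 2, 4))
bias = np.zeros(4)

# The same representation passed through the layer several times
# yields a differently distorted output on each pass.
outputs = [random_conv2d(features, kernel_mean, kernel_stddev, bias, 0.3, rng)
           for _ in range(3)]
```

With sampling_std set to zero in this sketch, epsilon is all zeros and the layer behaves as an ordinary convolution with kernel_mean, so repeated passes give identical outputs.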
At block 510, the computer system implementing the method 500 may obtain one or more first images used to train a set of machine learning models. The one or more first images may be used as an input image for which augmentation may be generated using method 500. Optionally, although not necessary, method 500 may pre-process the input image, including performing zooming, Sobel (e.g., Sobel-Feldman operation), Canny (e.g., Canny edge detection), morphological, and other image processing operations. For example,
At block 520, the computer system implementing method 500 may receive the one or more first images associated with a training set of images used to train a machine learning model. The machine learning model may comprise a neural network, such as a convolutional neural network (CNN). As illustrated in
At block 530, the computer system may provide the one or more first images as a first input to a first set of layers of computational units. The first set of layers may utilize image filters as described in reference to
At block 540, the computer system may provide a first output of the first set of layers of computational units as a second input to a second layer of the computational units. The second layer may utilize random parameter sets for computations within the layer. Additionally, the computer system may generate a randomized kernel matrix for the second layer of the computational units based on one or more matrices. The one or more matrices may include one or more of: 1) a first matrix of mean values, the first matrix initialized with random values; 2) a second matrix of standard deviation values, the second matrix initialized with zero values; 3) a third matrix of displacement values, the third matrix based on a number of filters to apply to the second input; or 4) a fourth matrix (e.g., epsilon matrix) that is based on an arbitrary standard deviation value and a normal distribution value. In an example, the arbitrary standard deviation value specifies the roughness of the image distortions. Parameters for the one or more matrices may include at least one of filter height, filter width, image height, image width, size of filter, number of channels, number of filters, or number of images. The image height and image width each may include arbitrary values. In one example, the first matrix, the second matrix, and the third matrix each may include learnable parameters.
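The initialization of the four matrices listed above may be sketched, purely as an assumption-laden illustration, as follows. The tensor layout (filter height, filter width, number of channels, number of filters), the 0.1 scale of the random initial mean values, the zero initial displacement values, and all names in the code are hypothetical choices made for this sketch; the description only states how each matrix is initialized or what it is based on.

```python
import random


def build_tensor(shape, make_value):
    """Build a nested list of the given shape, calling make_value per element."""
    if not shape:
        return make_value()
    return [build_tensor(shape[1:], make_value) for _ in range(shape[0])]


def init_random_layer_matrices(filter_height, filter_width, num_channels,
                               num_filters, roughness, seed=0):
    """Create the four matrices used to form a randomized kernel.

    `roughness` is the arbitrary standard deviation of the epsilon values;
    larger values are assumed to yield rougher distortions.
    """
    rng = random.Random(seed)
    kernel_shape = (filter_height, filter_width, num_channels, num_filters)
    return {
        # 1) mean values, initialized with random values (scale is assumed)
        "kernel_mean": build_tensor(kernel_shape, lambda: rng.gauss(0.0, 0.1)),
        # 2) standard deviation values, initialized with zero values
        "kernel_stddev": build_tensor(kernel_shape, lambda: 0.0),
        # 3) displacement values, one per filter (zero start is assumed)
        "displacement": [0.0] * num_filters,
        # 4) epsilon values drawn from a normal distribution
        "epsilon": build_tensor(kernel_shape, lambda: rng.gauss(0.0, roughness)),
    }
```

Under this sketch, the first three entries would be updated during training as learnable parameters, while the epsilon entry would be redrawn each time a new randomized kernel is needed.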
As illustrated in
At block 550, the computer system may obtain distortion parameters from the second layer of the computational units. The distortion parameters obtained from the random layer may be superimposed on a representation of the input image 610 of
At block 560, the computer system may provide a second output of the second layer of the computational units as a third input to a third set of layers of the computational units. In an example, the second layer may represent a deconvolution layer. As illustrated in
At block 570, the computer system may provide a third output of the third set of layers of the computational units as one or more second images, the third output being based on the one or more first images and the distortion parameters. As illustrated in
At block 580, the computer system may add the one or more second images to the training set of images to train the machine learning model. For example, one or more second images 620 (as depicted in
In other implementations, 801 may be a neural network, a convolutional neural network, a variational autoencoder, etc.
Exemplary computer system 1100 includes a processor 1102, a main memory 1104 (e.g., read-only memory (ROM) or dynamic random access memory (DRAM)), and a data storage device 1118, which communicate with each other via a bus 1130.
Processor 1102 may be represented by one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. More particularly, processor 1102 may be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, or a processor implementing other instruction sets or processors implementing a combination of instruction sets. Processor 1102 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. Processor 1102 is configured to execute instructions 1126 for performing the operations and functions of method 500 for generating image augmentation, as described herein above.
Computer system 1100 may further include a network interface device 1122, a video display unit 1110, a character input device 1112 (e.g., a keyboard), and a touch screen input device 1114.
Data storage device 1118 may include a computer-readable storage medium 1124 on which is stored one or more sets of instructions 1126 embodying any one or more of the methods or functions described herein. Instructions 1126 may also reside, completely or at least partially, within main memory 1104 and/or within processor 1102 during execution thereof by computer system 1100, main memory 1104 and processor 1102 also constituting computer-readable storage media. Instructions 1126 may further be transmitted or received over network 1116 via network interface device 1122.
In certain implementations, instructions 1126 may include instructions of method 500 for generating image augmentation, as described herein above. While computer-readable storage medium 1124 is shown in the example of
The methods, components, and features described herein may be implemented by discrete hardware components or may be integrated in the functionality of other hardware components such as ASICs, FPGAs, DSPs or similar devices. In addition, the methods, components, and features may be implemented by firmware modules or functional circuitry within hardware devices. Further, the methods, components, and features may be implemented in any combination of hardware devices and software components, or only in software.
In the foregoing description, numerous details are set forth. It will be apparent, however, to one of ordinary skill in the art having the benefit of this disclosure, that the present disclosure may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present disclosure.
Some portions of the detailed description have been presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “determining,” “computing,” “calculating,” “obtaining,” “identifying,” “modifying,” “generating” or the like, refer to the actions and processes of a computer system, or similar electronic computer system, that manipulates and transforms data represented as physical (e.g., electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
The present disclosure also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions.
It is to be understood that the above description is intended to be illustrative, and not restrictive. Various other implementations will be apparent to those of skill in the art upon reading and understanding the above description. The scope of the disclosure should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
Claims
1. A method comprising:
- providing a representation of a first image associated with a machine learning model to a first layer of a set of computational units, wherein the first layer utilizes random parameter sets for computations;
- obtaining distortion parameters from the first layer of the set of computational units; and
- generating a second image based on the first image and the distortion parameters, the second image generated to be used in the machine learning model.
2. The method of claim 1, further comprising:
- providing an output of the first layer of the set of computational units as an input to a second layer of the set of computational units.
3. The method of claim 1, wherein the machine learning model comprises a convolutional neural network.
4. The method of claim 1, wherein the representation of the first image is obtained by:
- receiving the first image, wherein the first image is associated with a training set of images to train the machine learning model;
- dividing the first image into a plurality of portions;
- providing each of the plurality of portions to a third layer of the set of computational units; and
- obtaining an output of the third layer as the representation of the first image.
5. The method of claim 4, wherein obtaining the distortion parameters comprises:
- obtaining a distortion parameter for each of the plurality of portions of the first image.
6. The method of claim 1, further comprising:
- generating a randomized kernel matrix for the first layer of the set of computational units based on one or more matrices.
7. The method of claim 6, wherein the one or more matrices comprise one or more of:
- a first matrix of mean values, the first matrix initialized with random values;
- a second matrix of standard deviation values, the second matrix initialized with zero values;
- a third matrix of displacement values, the third matrix based on a number of filters to apply to the representation of the first image; or
- a fourth matrix that is based on an arbitrary standard deviation value and a normal distribution value.
8. The method of claim 7, wherein the arbitrary standard deviation value specifies a roughness of image distortions.
9. The method of claim 6, wherein parameters for the one or more matrices comprise at least one of filter height, filter width, image height, image width, size of filter, number of channels, number of filters, or number of images.
10. The method of claim 9, wherein the image height and image width each comprises arbitrary values.
11. The method of claim 1, wherein the first image comprises one or more of:
- one or more hieroglyphs;
- one or more Chinese-Japanese-Korean (CJK) symbols;
- one or more Arabic strings; or
- a combination of one or more other symbols.
12. The method of claim 1, wherein generating the second image comprises:
- generating the second image corresponding to naturally distorted images.
13. The method of claim 1, further comprising: adding the second image to a training set of images to train the machine learning model.
14. The method of claim 7, wherein the first matrix, the second matrix, and the third matrix each comprises learnable parameters.
15. A system comprising:
- a memory; and
- a processor, coupled to the memory, the processor to: provide a representation of a first image associated with a machine learning model to a first layer of a set of computational units, wherein the first layer utilizes random parameter sets for computations; obtain distortion parameters from the first layer of the set of computational units; and generate a second image based on the first image and the distortion parameters, the second image generated to be used in the machine learning model.
16. The system of claim 15, wherein the processor is further to:
- generate a randomized kernel matrix for the first layer of the set of computational units based on one or more matrices.
17. The system of claim 16, wherein the one or more matrices comprise one or more of:
- a first matrix of mean values, the first matrix initialized with random values;
- a second matrix of standard deviation values, the second matrix initialized with zero values;
- a third matrix of displacement values, the third matrix based on a number of filters to apply to the representation of the first image; or
- a fourth matrix that is based on an arbitrary standard deviation value and a normal distribution value.
18. The system of claim 17, wherein the arbitrary standard deviation value specifies roughness of image distortions.
19. A computer-readable non-transitory storage medium comprising executable instructions that, when executed by a processing device, cause the processing device to:
- provide a representation of a first image associated with a machine learning model to a first layer of a set of computational units, wherein the first layer utilizes random parameter sets for computations;
- obtain distortion parameters from the first layer of the set of computational units; and
- generate a second image comprising the representation of the first image modified with the distortion parameters, the second image generated to be used in the machine learning model.
20. The computer-readable non-transitory storage medium of claim 19, wherein the processing device is further to:
- generate a randomized kernel matrix for the first layer of the set of computational units based on one or more matrices.
Type: Application
Filed: Jun 1, 2020
Publication Date: Oct 22, 2020
Inventors: Konstantin Zuev (Moscow), Andrejs Sautins (Moscow)
Application Number: 16/889,619