Patents by Inventor Amol Ashok Ambardekar

Amol Ashok Ambardekar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Power-efficient deep neural network module configured for executing a layer descriptor list

Patent number: 11100391

Abstract: A deep neural network (DNN) processor is configured to execute descriptors in layer descriptor lists. The descriptors define instructions for performing a pass of a DNN by the DNN processor. Several types of descriptors can be utilized: memory-to-memory move (M2M) descriptors; operation descriptors; host communication descriptors; configuration descriptors; branch descriptors; and synchronization descriptors. A DMA engine uses M2M descriptors to perform multi-dimensional strided DMA operations. Operation descriptors define the type of operation to be performed by neurons in the DNN processor and the activation function to be used by the neurons. M2M descriptors are buffered separately from operation descriptors and can be executed as soon as possible, subject to explicitly set dependencies. As a result, latency can be reduced and, consequently, the neurons can complete their processing faster. The DNN module can then be powered down earlier than it otherwise would have, thereby saving power.

Type: Grant

Filed: April 11, 2018

Date of Patent: August 24, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Amol Ashok Ambardekar, Kent D. Cedola, Larry Marvin Wall, Boris Bobrov, George Petre, Chad Balling McBride
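
The scheduling idea in the abstract above (M2M descriptors buffered separately from operation descriptors and run as soon as their explicit dependencies allow) can be illustrated with a minimal Python sketch. This is a simplified model, not the patented design: it assumes one queue per descriptor kind, one descriptor per queue per step, and dependencies given by name. `Descriptor` and `run` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    name: str
    kind: str          # "m2m" (data move) or "op" (neuron operation)
    deps: tuple = ()   # names of descriptors that must finish first


def run(descriptors):
    """Return, for each step, the names of the descriptors executed in it."""
    # Assumption: M2M and operation descriptors sit in separate FIFO queues.
    m2m = deque(d for d in descriptors if d.kind == "m2m")
    ops = deque(d for d in descriptors if d.kind == "op")
    done, schedule = set(), []
    while m2m or ops:
        # A queue head may run once everything it depends on has completed.
        ready = [q for q in (m2m, ops)
                 if q and all(dep in done for dep in q[0].deps)]
        if not ready:
            raise ValueError("unsatisfiable dependency")
        step = [q.popleft().name for q in ready]
        done.update(step)
        schedule.append(step)
    return schedule


layer = [
    Descriptor("load0", "m2m"),
    Descriptor("conv0", "op", ("load0",)),
    Descriptor("load1", "m2m"),
    Descriptor("conv1", "op", ("load1",)),
    Descriptor("store1", "m2m", ("conv1",)),
]
schedule = run(layer)
```

In this toy run, `load1` overlaps with `conv0`, so the five descriptors finish in four steps instead of five.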

FLEXIBLE HARDWARE FOR HIGH THROUGHPUT VECTOR DEQUANTIZATION WITH DYNAMIC VECTOR LENGTH AND CODEBOOK SIZE

Publication number: 20210232904

Abstract: The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as memory data management of a NN/DNN. Using vector quantization of neuron weight values, the processing of data by neurons can be optimized in terms of the number of operations as well as memory utilization to enhance the overall performance of a NN/DNN. Operatively, one or more contiguous segments of weight values can be converted into one or more vectors of arbitrary length and each of the one or more vectors can be assigned an index. The generated indexes can be stored in an exemplary vector quantization lookup table and retrieved by exemplary fast weight lookup hardware at run time, on the fly, as part of an inline de-quantization operation within an exemplary data processing function of the NN, to obtain the one or more needed neuron weight values.

Type: Application

Filed: April 15, 2021

Publication date: July 29, 2021

Inventors: Amol Ashok AMBARDEKAR, Aleksandar TOMIC, Chad Balling McBRIDE, George PETRE, Kent D. CEDOLA, Larry Marvin WALL, Boris BOBROV

REDUCING POWER CONSUMPTION IN A NEURAL NETWORK ENVIRONMENT USING DATA MANAGEMENT

Publication number: 20210232205

Abstract: Techniques are provided for improved (i.e., reduced) power consumption in an exemplary neural network (NN) and/or Deep Neural Network (DNN) environment using data management. Improved power consumption in the NN/DNN may be achieved by reducing a number of bit flips needed to process operands associated with one or more storages. Reducing the number of bit flips associated with the NN/DNN may be achieved by multiplying an operand associated with a first storage with a plurality of individual operands associated with a plurality of kernels of the NN/DNN. The operand associated with the first storage may be neuron input data and the plurality of individual operands associated with the second storage may be weight values for multiplication with the neuron input data. The plurality of kernels may be arranged or sorted and subsequently processed in a manner that improves power consumption in the NN/DNN.

Type: Application

Filed: April 16, 2021

Publication date: July 29, 2021

Inventors: Amol Ashok AMBARDEKAR, Chad Balling MCBRIDE, George PETRE, Kent D. CEDOLA, Larry Marvin WALL

Data processing performance enhancement for neural networks using a virtualized data iterator

Patent number: 11030131

Abstract: The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as management of data among the various memory components of the NN/DNN. Using virtualized hardware iterators, data for processing by the NN/DNN can be traversed and configured to optimize the number of operations as well as memory utilization to enhance the overall performance of a NN/DNN. Operatively, an iterator controller can generate instructions for execution by the NN/DNN representative of one or more desired iterator operation types and to perform one or more iterator operations. Data can be iterated according to a selected iterator operation and communicated to one or more neuron processors of the NN/DNN for processing and output to a destination memory. The iterator operations can be applied to various volumes of data (e.g., blobs) in parallel or multiple slices of the same volume.

Type: Grant

Filed: July 29, 2020

Date of Patent: June 8, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Chad Balling McBride, George Petre, Amol Ashok Ambardekar, Kent D. Cedola, Larry Marvin Wall, Boris Bobrov

Flexible hardware for high throughput vector dequantization with dynamic vector length and codebook size

Patent number: 11010315

Abstract: The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as memory data management of a NN/DNN. Using vector quantization of neuron weight values, the processing of data by neurons can be optimized in terms of the number of operations as well as memory utilization to enhance the overall performance of a NN/DNN. Operatively, one or more contiguous segments of weight values can be converted into one or more vectors of arbitrary length and each of the one or more vectors can be assigned an index. The generated indexes can be stored in an exemplary vector quantization lookup table and retrieved by exemplary fast weight lookup hardware at run time, on the fly, as part of an inline de-quantization operation within an exemplary data processing function of the NN, to obtain the one or more needed neuron weight values.

Type: Grant

Filed: January 26, 2018

Date of Patent: May 18, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Amol Ashok Ambardekar, Aleksandar Tomic, Chad Balling McBride, George Petre, Kent D. Cedola, Larry Marvin Wall, Boris Bobrov
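
The index-and-lookup scheme in the abstract above can be sketched in a few lines of Python. This is an illustrative software model only: the codebook contents, the nearest-vector rule used in `quantize`, and both function names are assumptions, and the patent describes dedicated lookup hardware rather than code like this.

```python
def quantize(weights, codebook, vector_len):
    """Replace each contiguous segment of weights by a codebook index."""
    if len(weights) % vector_len != 0:
        raise ValueError("weights must split evenly into vectors")
    indexes = []
    for start in range(0, len(weights), vector_len):
        segment = weights[start:start + vector_len]
        # Assumption: pick the codebook vector with the smallest squared error.
        best = min(
            range(len(codebook)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(segment, codebook[i])),
        )
        indexes.append(best)
    return indexes


def dequantize(indexes, codebook):
    """Inline de-quantization: look up each index and emit its weights."""
    return [w for i in indexes for w in codebook[i]]


codebook = [(0.0, 0.0), (1.0, -1.0), (0.5, 0.5)]   # hypothetical values
weights = [0.9, -1.1, 0.1, 0.0, 0.4, 0.6]
indexes = quantize(weights, codebook, vector_len=2)
restored = dequantize(indexes, codebook)
```

Here six weight values are stored as three small indexes, and `dequantize` recovers approximate weights by table lookup alone, with no arithmetic on the weights themselves.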

Reducing power consumption in a neural network environment using data management

Patent number: 10996739

Abstract: Techniques are provided for improved (i.e., reduced) power consumption in an exemplary neural network (NN) and/or Deep Neural Network (DNN) environment using data management. Improved power consumption in the NN/DNN may be achieved by reducing a number of bit flips needed to process operands associated with one or more storages. Reducing the number of bit flips associated with the NN/DNN may be achieved by multiplying an operand associated with a first storage with a plurality of individual operands associated with a plurality of kernels of the NN/DNN. The operand associated with the first storage may be neuron input data and the plurality of individual operands associated with the second storage may be weight values for multiplication with the neuron input data. The plurality of kernels may be arranged or sorted and subsequently processed in a manner that improves power consumption in the NN/DNN.

Type: Grant

Filed: December 19, 2017

Date of Patent: May 4, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Amol Ashok Ambardekar, Chad Balling McBride, George Petre, Kent D. Cedola, Larry Marvin Wall
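
To make the bit-flip argument in the abstract above concrete, the following Python sketch counts how many bits toggle when a sequence of weight operands is presented one after another, before and after reordering. Sorting by numeric value is a simple heuristic chosen for illustration, not the ordering claimed in the patent, and `bit_flips` and `total_flips` are hypothetical helper names.

```python
def bit_flips(a, b):
    """Number of bit positions that differ between two operands."""
    return bin(a ^ b).count("1")


def total_flips(operands):
    """Total bits toggled when operands are applied in the given order."""
    return sum(bit_flips(x, y) for x, y in zip(operands, operands[1:]))


# Hypothetical 8-bit weight values, one per kernel.
weights = [0x0F, 0xF0, 0x0E, 0xF1]

unsorted_cost = total_flips(weights)          # 8 + 7 + 8 flips
sorted_cost = total_flips(sorted(weights))    # 1 + 8 + 1 flips
```

In this example the original order toggles 23 bits while the sorted order toggles 10, which is the kind of reduction the abstract attributes to arranging or sorting kernels.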

Processing discontiguous memory as contiguous memory to improve performance of a neural network environment

Patent number: 10963403

Abstract: The performance of a neural network (NN) can be limited by the number of operations being performed. Using a line buffer that is directed to shift a memory block by a selected shift stride for cooperating neurons, data that is operatively residing in memory and which would require multiple write cycles into a cooperating line buffer can be processed in a single line buffer write cycle, thereby enhancing the performance of a NN/DNN. A controller and/or iterator can generate one or more instructions having the memory block shifting values for communication to the line buffer. The shifting values can be calculated using various characteristics of the input data as well as the NN/DNN inclusive of the data dimensions. The line buffer can read data for processing, shift the data of the memory block and write the data in the line buffer for subsequent processing.

Type: Grant

Filed: December 1, 2017

Date of Patent: March 30, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: George Petre, Chad Balling McBride, Amol Ashok Ambardekar, Kent D. Cedola, Boris Bobrov, Larry Marvin Wall

COMPRESSION-ENCODING SCHEDULED INPUTS FOR MATRIX COMPUTATIONS

Publication number: 20210049232

Abstract: A method of performing matrix computations includes receiving a compression-encoded matrix including a plurality of rows. Each row of the compression-encoded matrix has a plurality of defined element values and, for each such defined element value, a schedule tag indicating a schedule for using the defined element value in a scheduled matrix computation. The method further includes loading the plurality of rows of the compression-encoded matrix into a corresponding plurality of work memory banks, and providing decoded input data to a matrix computation module configured for performing the scheduled matrix computation. For each work memory bank, a next defined element value and a corresponding schedule tag are read. If the schedule tag meets a scheduling condition, the next defined element value is provided to the matrix computation module. Otherwise, a default element value is provided to the matrix computation module.

Type: Application

Filed: October 30, 2020

Publication date: February 18, 2021

Applicant: Microsoft Technology Licensing, LLC

Inventors: Shuayb M. ZARAR, Amol Ashok AMBARDEKAR, Jun ZHANG

REAL-WORLD OBJECT RECOGNITION FOR COMPUTING DEVICE

Publication number: 20200372715

Abstract: A method for object recognition includes, at a computing device, receiving an image of a real-world object. An identity of the real-world object is recognized using an object recognition model trained on a plurality of computer-generated training images. A digital augmentation model corresponding to the real-world object is retrieved, the digital augmentation model including a set of augmentation-specific instructions. A pose of the digital augmentation model is aligned with a pose of the real-world object. An augmentation is provided, the augmentation associated with the real-world object and specified by the augmentation-specific instructions.

Type: Application

Filed: May 22, 2019

Publication date: November 26, 2020

Applicant: Microsoft Technology Licensing, LLC

Inventors: Harpreet Singh SAWHNEY, Andrey KONIN, Bilha-Catherine W. GITHINJI, Amol Ashok AMBARDEKAR, William Douglas GUYMAN, Muhammad Zeeshan ZIA, Ning XU, Sheng Kai TANG, Pedro URBINA ESCOS

Compression-encoding scheduled inputs for matrix computations

Patent number: 10846363

Abstract: A method of performing matrix computations includes receiving a compression-encoded matrix including a plurality of rows. Each row of the compression-encoded matrix has a plurality of defined element values and, for each such defined element value, a schedule tag indicating a schedule for using the defined element value in a scheduled matrix computation. The method further includes loading the plurality of rows of the compression-encoded matrix into a corresponding plurality of work memory banks, and providing decoded input data to a matrix computation module configured for performing the scheduled matrix computation. For each work memory bank, a next defined element value and a corresponding schedule tag are read. If the schedule tag meets a scheduling condition, the next defined element value is provided to the matrix computation module. Otherwise, a default element value is provided to the matrix computation module.

Type: Grant

Filed: January 29, 2019

Date of Patent: November 24, 2020

Assignee: Microsoft Technology Licensing, LLC

Inventors: Shuayb M. Zarar, Amol Ashok Ambardekar, Jun Zhang
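
The decode loop described in the abstract above (read the next defined value and its schedule tag, then pass either that value or a default to the computation) can be sketched in Python as follows. The sketch assumes that the schedule tag is simply the step index at which the value is consumed and that the default element is zero; the abstract does not fix either choice, and `encode_row` and `decode_row` are hypothetical names.

```python
def encode_row(row, default=0):
    """Keep only defined (non-default) values, each tagged with its step."""
    return [(value, step) for step, value in enumerate(row) if value != default]


def decode_row(encoded_row, num_steps, default=0):
    """Feed one value per step to the computation, as one memory bank would."""
    out, pos = [], 0
    for step in range(num_steps):
        # Scheduling condition (assumed): the tag equals the current step.
        if pos < len(encoded_row) and encoded_row[pos][1] == step:
            out.append(encoded_row[pos][0])
            pos += 1
        else:
            out.append(default)
    return out


matrix = [
    [0, 5, 0, 0, 7],
    [3, 0, 0, 0, 0],
]
# One encoded row per work memory bank.
banks = [encode_row(row) for row in matrix]
decoded = [decode_row(bank, num_steps=5) for bank in banks]
```

Each bank stores only its defined elements, yet the computation still receives a full-width row at every step.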

DATA PROCESSING PERFORMANCE ENHANCEMENT FOR NEURAL NETWORKS USING A VIRTUALIZED DATA ITERATOR

Publication number: 20200356500

Abstract: The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as management of data among the various memory components of the NN/DNN. Using virtualized hardware iterators, data for processing by the NN/DNN can be traversed and configured to optimize the number of operations as well as memory utilization to enhance the overall performance of a NN/DNN. Operatively, an iterator controller can generate instructions for execution by the NN/DNN representative of one or more desired iterator operation types and to perform one or more iterator operations. Data can be iterated according to a selected iterator operation and communicated to one or more neuron processors of the NN/DNN for processing and output to a destination memory. The iterator operations can be applied to various volumes of data (e.g., blobs) in parallel or multiple slices of the same volume.

Type: Application

Filed: July 29, 2020

Publication date: November 12, 2020

Inventors: Chad Balling MCBRIDE, George PETRE, Amol Ashok AMBARDEKAR, Kent D. CEDOLA, Larry Marvin WALL, Boris BOBROV

Data processing performance enhancement for neural networks using a virtualized data iterator

Patent number: 10795836

Abstract: The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as management of data among the various memory components of the NN/DNN. Using virtualized hardware iterators, data for processing by the NN/DNN can be traversed and configured to optimize the number of operations as well as memory utilization to enhance the overall performance of a NN/DNN. Operatively, an iterator controller can generate instructions for execution by the NN/DNN representative of one or more desired iterator operation types and to perform one or more iterator operations. Data can be iterated according to a selected iterator operation and communicated to one or more neuron processors of the NN/DNN for processing and output to a destination memory. The iterator operations can be applied to various volumes of data (e.g., blobs) in parallel or multiple slices of the same volume.

Type: Grant

Filed: September 1, 2017

Date of Patent: October 6, 2020

Assignee: Microsoft Technology Licensing, LLC

Inventors: Chad Balling McBride, George Petre, Amol Ashok Ambardekar, Kent D. Cedola, Larry Marvin Wall, Boris Bobrov

SELECTIVELY CONTROLLING MEMORY POWER FOR SCHEDULED COMPUTATIONS

Publication number: 20200293105

Abstract: A computer system comprising a scheduled computation module, a work memory storage device, and a controller. The scheduled computation module is configured to receive and process data values according to a predetermined access pattern.

Type: Application

Filed: March 15, 2019

Publication date: September 17, 2020

Applicant: Microsoft Technology Licensing, LLC

Inventors: Amol Ashok AMBARDEKAR, Shuayb M. ZARAR, Jun ZHANG

ENHANCING PROCESSING PERFORMANCE OF A DNN MODULE BY BANDWIDTH CONTROL OF FABRIC INTERFACE

Publication number: 20200233820

Abstract: An exemplary computing environment having a DNN module can maintain one or more bandwidth throttling mechanisms. Illustratively, a first throttling mechanism can specify the number of cycles to wait between transactions on a cooperating fabric component (e.g., data bus). Illustratively, a second throttling mechanism can be a transaction count limiter that operatively sets a threshold of a number of transactions to be processed during a given transaction sequence and limits the number of transactions such as multiple transactions in flight to not exceed the set threshold. In an illustrative operation, in executing these two exemplary calculated throttling parameters, the average bandwidth usage and the peak bandwidth usage can be limited. Operatively, with this fabric bandwidth control, the processing units of the DNN are optimized to process data across each transaction cycle resulting in enhanced processing and lower power consumption.

Type: Application

Filed: April 8, 2020

Publication date: July 23, 2020

Inventors: Chad Balling McBRIDE, Timothy Hume HEIL, Amol Ashok AMBARDEKAR, George PETRE, Kent D. CEDOLA, Larry Marvin WALL, Boris BOBROV

COMPRESSION-ENCODING SCHEDULED INPUTS FOR MATRIX COMPUTATIONS

Publication number: 20200159812

Abstract: A method of performing matrix computations includes receiving a compression-encoded matrix including a plurality of rows. Each row of the compression-encoded matrix has a plurality of defined element values and, for each such defined element value, a schedule tag indicating a schedule for using the defined element value in a scheduled matrix computation. The method further includes loading the plurality of rows of the compression-encoded matrix into a corresponding plurality of work memory banks, and providing decoded input data to a matrix computation module configured for performing the scheduled matrix computation. For each work memory bank, a next defined element value and a corresponding schedule tag are read. If the schedule tag meets a scheduling condition, the next defined element value is provided to the matrix computation module. Otherwise, a default element value is provided to the matrix computation module.

Type: Application

Filed: January 29, 2019

Publication date: May 21, 2020

Applicant: Microsoft Technology Licensing, LLC

Inventors: Shuayb M. ZARAR, Amol Ashok AMBARDEKAR, Jun ZHANG

Enhancing processing performance of a DNN module by bandwidth control of fabric interface

Patent number: 10628345

Abstract: An exemplary computing environment having a DNN module can maintain one or more bandwidth throttling mechanisms. Illustratively, a first throttling mechanism can specify the number of cycles to wait between transactions on a cooperating fabric component (e.g., data bus). Illustratively, a second throttling mechanism can be a transaction count limiter that operatively sets a threshold of a number of transactions to be processed during a given transaction sequence and limits the number of transactions such as multiple transactions in flight to not exceed the set threshold. In an illustrative operation, in executing these two exemplary calculated throttling parameters, the average bandwidth usage and the peak bandwidth usage can be limited. Operatively, with this fabric bandwidth control, the processing units of the DNN are optimized to process data across each transaction cycle resulting in enhanced processing and lower power consumption.

Type: Grant

Filed: April 11, 2018

Date of Patent: April 21, 2020

Assignee: Microsoft Technology Licensing, LLC

Inventors: Chad Balling McBride, Timothy Hume Heil, Amol Ashok Ambardekar, George Petre, Kent D. Cedola, Larry Marvin Wall, Boris Bobrov
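
The two throttling mechanisms in the abstract above can be modeled with a small cycle-by-cycle simulation in Python. This is only a sketch under stated assumptions: every transaction takes a fixed `latency`, the second mechanism is modeled as a cap on transactions in flight, and `issue_schedule` and its parameter names are hypothetical.

```python
def issue_schedule(num_transactions, wait_cycles, max_in_flight, latency):
    """Return the cycle at which each transaction is issued on the fabric."""
    if max_in_flight < 1 or latency < 1 or wait_cycles < 0:
        raise ValueError("invalid throttling parameters")
    issue_cycles = []
    completions = []    # completion cycle of each transaction in flight
    next_allowed = 0    # earliest cycle permitted by the wait-cycle throttle
    cycle = 0
    while len(issue_cycles) < num_transactions:
        # Retire transactions that have completed by this cycle.
        completions = [c for c in completions if c > cycle]
        # Issue only if both throttles allow it.
        if cycle >= next_allowed and len(completions) < max_in_flight:
            issue_cycles.append(cycle)
            completions.append(cycle + latency)
            next_allowed = cycle + wait_cycles + 1
        cycle += 1
    return issue_cycles


cycles = issue_schedule(num_transactions=5, wait_cycles=1,
                        max_in_flight=2, latency=5)
```

With these parameters the wait-cycle throttle spaces out back-to-back issues (limiting peak bandwidth), while the in-flight cap stalls the third and fifth transactions until an earlier one completes (limiting average bandwidth).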

Queue management for direct memory access

Patent number: 10540584

Abstract: A direct memory access (DMA) engine may be responsible for enabling and controlling DMA data flow within a computing system. The DMA engine moves blocks of data, associated with descriptors in a plurality of queues, from a source to a destination memory location or address, autonomously from control by a computer system's processor. Based on analysis of the data blocks linked to the descriptors in the queues, the DMA engine and its associated DMA fragmenter ensure that data blocks linked to descriptors in the queues do not remain idle for an excessive period of time. The DMA fragmenter may divide large data blocks into smaller data blocks to ensure that the processing of large data blocks does not preclude the timely processing of smaller data blocks associated with one or more descriptors in the queues. The data blocks stored may be two-dimensional data blocks.

Type: Grant

Filed: September 12, 2017

Date of Patent: January 21, 2020

Assignee: Microsoft Technology Licensing, LLC

Inventors: Chad Balling McBride, Amol Ashok Ambardekar, Kent D. Cedola, George Petre, Larry Marvin Wall, Boris Bobrov
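
The effect of the DMA fragmenter described in the abstract above can be illustrated with a short Python sketch. It simplifies the setting considerably: blocks are one-dimensional lengths rather than two-dimensional regions, each queue holds a single block, and queues are served round-robin one fragment at a time. `fragment` and `round_robin` are hypothetical names, and the service policy is an assumption.

```python
from collections import deque


def fragment(block_len, max_fragment):
    """Split one block into fragment lengths no larger than max_fragment."""
    return [min(max_fragment, block_len - offset)
            for offset in range(0, block_len, max_fragment)]


def round_robin(blocks, max_fragment):
    """Serve one fragment per queue in turn; return the transfer order."""
    pending = deque((name, deque(fragment(length, max_fragment)))
                    for name, length in blocks)
    order = []
    while pending:
        name, frags = pending.popleft()
        order.append((name, frags.popleft()))
        if frags:
            # The block still has fragments left, so it rejoins the rotation.
            pending.append((name, frags))
    return order


order = round_robin([("big", 10), ("small", 3)], max_fragment=4)
```

In this run the small block is transferred second, after only one fragment of the big block, instead of waiting for all ten units of the big block to move first.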

REDUCING POWER CONSUMPTION IN A NEURAL NETWORK ENVIRONMENT USING DATA MANAGEMENT

Publication number: 20190187771

Abstract: Techniques are provided for improved (i.e., reduced) power consumption in an exemplary neural network (NN) and/or Deep Neural Network (DNN) environment using data management. Improved power consumption in the NN/DNN may be achieved by reducing a number of bit flips needed to process operands associated with one or more storages. Reducing the number of bit flips associated with the NN/DNN may be achieved by multiplying an operand associated with a first storage with a plurality of individual operands associated with a plurality of kernels of the NN/DNN. The operand associated with the first storage may be neuron input data and the plurality of individual operands associated with the second storage may be weight values for multiplication with the neuron input data. The plurality of kernels may be arranged or sorted and subsequently processed in a manner that improves power consumption in the NN/DNN.

Type: Application

Filed: December 19, 2017

Publication date: June 20, 2019

Inventors: Amol Ashok AMBARDEKAR, Chad Balling MCBRIDE, George PETRE, Kent D. CEDOLA, Larry Marvin WALL

DYNAMICALLY PARTITIONING WORKLOAD IN A DEEP NEURAL NETWORK MODULE TO REDUCE POWER CONSUMPTION

Publication number: 20180300616

Abstract: A deep neural network (DNN) module is disclosed that can dynamically partition neuron workload to reduce power consumption. The DNN module includes neurons and a group partitioner and scheduler unit. The group partitioner and scheduler unit divides a workload for the neurons into partitions in order to maximize the number of neurons that can simultaneously process the workload. The group partitioner and scheduler unit then assigns a group of neurons to each of the partitions. The groups of neurons in the DNN module process the workload in their assigned partition to generate a partial output value. The neurons in each group can then sum their partial output values to generate a final output value for the workload. The neurons can be powered down once the groups of neurons have completed processing their assigned workload to reduce power consumption.

Type: Application

Filed: April 13, 2018

Publication date: October 18, 2018

Inventors: Amol Ashok AMBARDEKAR, Boris BOBROV, Chad Balling McBRIDE, George PETRE, Kent D. CEDOLA, Larry Marvin WALL
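
The partition-then-sum flow in the abstract above can be sketched in Python for a single dot-product workload. The even split used here, the treatment of each group as one partial-sum unit, and the names `partition` and `partitioned_dot` are illustrative assumptions, not details from the application.

```python
def partition(n_items, n_groups):
    """Split n_items into n_groups contiguous ranges of near-equal size."""
    base, extra = divmod(n_items, n_groups)
    bounds, start = [], 0
    for group in range(n_groups):
        size = base + (1 if group < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def partitioned_dot(inputs, weights, n_groups):
    """Each group computes a partial output; the partials are then summed."""
    partials = [
        sum(x * w for x, w in zip(inputs[a:b], weights[a:b]))
        for a, b in partition(len(inputs), n_groups)
    ]
    return sum(partials), partials


total, partials = partitioned_dot([1, 2, 3, 4, 5], [1, 1, 1, 1, 1], n_groups=2)
```

Because each group handles only its own slice, the groups could work at the same time and finish sooner, which is what lets the neurons be powered down earlier.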

PROCESSING DISCONTIGUOUS MEMORY AS CONTIGUOUS MEMORY TO IMPROVE PERFORMANCE OF A NEURAL NETWORK ENVIRONMENT

Publication number: 20180300613

Abstract: The performance of a neural network (NN) can be limited by the number of operations being performed. Using a line buffer that is directed to shift a memory block by a selected shift stride for cooperating neurons, data that is operatively residing in memory and which would require multiple write cycles into a cooperating line buffer can be processed in a single line buffer write cycle, thereby enhancing the performance of a NN/DNN. A controller and/or iterator can generate one or more instructions having the memory block shifting values for communication to the line buffer. The shifting values can be calculated using various characteristics of the input data as well as the NN/DNN inclusive of the data dimensions. The line buffer can read data for processing, shift the data of the memory block and write the data in the line buffer for subsequent processing.

Type: Application

Filed: December 1, 2017

Publication date: October 18, 2018

Inventors: George PETRE, Chad Balling McBRIDE, Amol Ashok AMBARDEKAR, Kent D. CEDOLA, Larry Marvin WALL, Boris BOBROV
