Patents by Inventor Boris Bobrov

Boris Bobrov has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Data processing performance enhancement for neural networks using a virtualized data iterator

Patent number: 11030131

Abstract: The performance of a neural network (NN) and/or deep neural network (DNN) can limited by the number of operations being performed as well as management of data among the various memory components of the NN/DNN. Using virtualized hardware iterators, data for processing by the NN/DNN can be traversed and configured to optimize the number of operations as well as memory utilization to enhance the overall performance of a NN/DNN. Operatively, an iterator controller can generate instructions for execution by the NN/DNN representative of one more desired iterator operation types and to perform one or more iterator operations. Data can be iterated according to a selected iterator operation and communicated to one or more neuron processors of the NN/DD for processing and output to a destination memory. The iterator operations can be applied to various volumes of data (e.g., blobs) in parallel or multiple slices of the same volume.

Type: Grant

Filed: July 29, 2020

Date of Patent: June 8, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Chad Balling McBride, George Petre, Amol Ashok Ambardekar, Kent D. Cedola, Larry Marvin Wall, Boris Bobrov
Flexible hardware for high throughput vector dequantization with dynamic vector length and codebook size

Patent number: 11010315

Abstract: The performance of a neural network (NN) and/or deep neural network (DNN) can limited by the number of operations being performed as well as memory data management of a NN/DNN. Using vector quantization of neuron weight values, the processing of data by neurons can be optimize the number of operations as well as memory utilization to enhance the overall performance of a NN/DNN. Operatively, one or more contiguous segments of weight values can be converted into one or more vectors of arbitrary length and each of the one or more vectors can be assigned an index. The generated indexes can be stored in an exemplary vector quantization lookup table and retrieved by exemplary fast weight lookup hardware at run time on the flyas part of an exemplary data processing function of the NN as part of an inline de-quantization operation to obtain needed one or more neuron weight values.

Type: Grant

Filed: January 26, 2018

Date of Patent: May 18, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Amol Ashok Ambardekar, Aleksandar Tomic, Chad Balling McBride, George Petre, Kent D. Cedola, Larry Marvin Wall, Boris Bobrov
Processing discontiguous memory as contiguous memory to improve performance of a neural network environment

Patent number: 10963403

Abstract: The performance of a neural network (NN) can be limited by the number of operations being performed. Using a line buffer that is directed to shift a memory block by a selected shift stride for cooperating neurons, data that is operatively residing memory and which would require multiple write cycles into a cooperating line buffer can be processed as in a single line buffer write cycle thereby enhancing the performance of a NN/DNN. A controller and/or iterator can generate one or more instructions having the memory block shifting values for communication to the line buffer. The shifting values can be calculated using various characteristics of the input data as well as the NN/DNN inclusive of the data dimensions. The line buffer can read data for processing, shift the data of the memory block and write the data in the line buffer for subsequent processing.

Type: Grant

Filed: December 1, 2017

Date of Patent: March 30, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: George Petre, Chad Balling McBride, Amol Ashok Ambardekar, Kent D. Cedola, Boris Bobrov, Larry Marvin Wall
NEURAL PROCESSING ELEMENT WITH SINGLE INSTRUCTION MULTIPLE DATA (SIMD) COMPUTE LANES

Publication number: 20200409663

Abstract: An architecture is disclosed for an neural processing element having single instruction, multiple data (“SIMD”) compute lanes. The neural processing element includes compute lanes having multipliers configured to multiply a binary operand with another binary operand to generate a binary output. The neural processing element also includes a single adder tree for summing the binary outputs of the hardware binary multipliers. The neural processing element also includes a storage element for storing a binary output of the single hardware binary adder tree.

Type: Application

Filed: June 26, 2019

Publication date: December 31, 2020

Inventors: Chad Balling MCBRIDE, Amol A. AMBARDEKAR, Boris BOBROV, Kent D. CEDOLA, George PETRE, Larry Marvin WALL
MANAGING WORKLOADS OF A DEEP NEURAL NETWORK PROCESSOR

Publication number: 20200409757

Abstract: A computing system includes processor cores for executing applications that utilize functionality provided by a deep neural network (“DNN”) processor. One of the cores operates as a resource and power management (“RPM”) processor core. When the RPM processor receives a request to execute a DNN workload, it divides the DNN workload into workload fragments. The RPM processor then determines whether a workload fragment is to be statically allocated or dynamically allocated to a DNN processor. Once the RPM processor has selected a DNN processor, the RPM enqueues the workload fragment on a queue maintained by the selected DNN processor. The DNN processor dequeues workload fragments from its queue for execution. Once execution of a workload fragment has completed, the DNN processor generates an interrupt indicating that execution of the workload fragment has completed. The RPM processor can then notify the processor core that originally requested execution of the workload fragment.

Type: Application

Filed: June 26, 2019

Publication date: December 31, 2020

Inventors: Chad Balling MCBRIDE, Amol A. AMBARDEKAR, Boris BOBROV, Kent D. CEDOLA, George PETRE, Larry Marvin WALL
INCREASED PRECISION NEURAL PROCESSING ELEMENT

Publication number: 20200410329

Abstract: Neural processing elements are configured with a hardware AND gate configured to perform a logical AND operation between a sign extend signal and a most significant bit (“MSB”) of an operand. The state of the sign extend signal can be based upon a type of a layer of a deep neural network (“DNN”) that generate the operand. If the sign extend signal is logical FALSE, no sign extension is performed. If the sign extend signal is logical TRUE, a concatenator concatenates the output of the hardware AND gate and the operand, thereby extending the operand from an N-bit unsigned binary value to an N+1 bit signed binary value. The neural processing element can also include another hardware AND gate and another concatenator for processing another operand similarly. The outputs of the concatenators for both operands are provided to a hardware binary multiplier.

Type: Application

Filed: June 28, 2019

Publication date: December 31, 2020

Inventors: Amol A. AMBARDEKAR, Boris BOBROV, Kent D. CEDOLA, Chad Balling MCBRIDE, George PETRE, Larry Marvin WALL
DATA PROCESSING PERFORMANCE ENHANCEMENT FOR NEURAL NETWORKS USING A VIRTUALIZED DATA ITERATOR

Publication number: 20200356500

Abstract: The performance of a neural network (NN) and/or deep neural network (DNN) can limited by the number of operations being performed as well as management of data among the various memory components of the NN/DNN. Using virtualized hardware iterators, data for processing by the NN/DNN can be traversed and configured to optimize the number of operations as well as memory utilization to enhance the overall performance of a NN/DNN. Operatively, an iterator controller can generate instructions for execution by the NN/DNN representative of one more desired iterator operation types and to perform one or more iterator operations. Data can be iterated according to a selected iterator operation and communicated to one or more neuron processors of the NN/DD for processing and output to a destination memory. The iterator operations can be applied to various volumes of data (e.g., blobs) in parallel or multiple slices of the same volume.

Type: Application

Filed: July 29, 2020

Publication date: November 12, 2020

Inventors: Chad Balling MCBRIDE, George PETRE, Amol Ashok AMBARDEKAR, Kent D. CEDOLA, Larry Marvin WALL, Boris BOBROV
Data processing performance enhancement for neural networks using a virtualized data iterator

Patent number: 10795836

Abstract: The performance of a neural network (NN) and/or deep neural network (DNN) can limited by the number of operations being performed as well as management of data among the various memory components of the NN/DNN. Using virtualized hardware iterators, data for processing by the NN/DNN can be traversed and configured to optimize the number of operations as well as memory utilization to enhance the overall performance of a NN/DNN. Operatively, an iterator controller can generate instructions for execution by the NN/DNN representative of one more desired iterator operation types and to perform one or more iterator operations. Data can be iterated according to a selected iterator operation and communicated to one or more neuron processors of the NN/DD for processing and output to a destination memory. The iterator operations can be applied to various volumes of data (e.g., blobs) in parallel or multiple slices of the same volume.

Type: Grant

Filed: September 1, 2017

Date of Patent: October 6, 2020

Assignee: Microsoft Technology Licensing, LLC

Inventors: Chad Balling McBride, George Petre, Amol Ashok Ambardekar, Kent D. Cedola, Larry Marvin Wall, Boris Bobrov
ENHANCING PROCESSING PERFORMANCE OF A DNN MODULE BY BANDWIDTH CONTROL OF FABRIC INTERFACE

Publication number: 20200233820

Abstract: An exemplary computing environment having a DNN module can maintain one or more bandwidth throttling mechanisms. Illustratively, a first throttling mechanism can specify the number of cycles to wait between transactions on a cooperating fabric component (e.g., data bus). Illustratively, a second throttling mechanism can be a transaction count limiter that operatively sets a threshold of a number of transactions to be processed during a given transaction sequence and limits the number of transactions such as multiple transactions in flight to not exceed the set threshold. In an illustrative operation, in executing these two exemplary calculated throttling parameters, the average bandwidth usage and the peak bandwidth usage can be limited. Operatively, with this fabric bandwidth control, the processing units of the DNN are optimized to process data across each transaction cycle resulting in enhanced processing and lower power consumption.

Type: Application

Filed: April 8, 2020

Publication date: July 23, 2020

Inventors: Chad Balling McBRIDE, Timothy Hume HEIL, Amol Ashok AMBARDEKAR, George PETRE, Kent D. CEDOLA, Larry Marvin WALL, Boris BOBROV
Enhancing processing performance of a DNN module by bandwidth control of fabric interface

Patent number: 10628345

Abstract: An exemplary computing environment having a DNN module can maintain one or more bandwidth throttling mechanisms. Illustratively, a first throttling mechanism can specify the number of cycles to wait between transactions on a cooperating fabric component (e.g., data bus). Illustratively, a second throttling mechanism can be a transaction count limiter that operatively sets a threshold of a number of transactions to be processed during a given transaction sequence and limits the number of transactions such as multiple transactions in flight to not exceed the set threshold. In an illustrative operation, in executing these two exemplary calculated throttling parameters, the average bandwidth usage and the peak bandwidth usage can be limited. Operatively, with this fabric bandwidth control, the processing units of the DNN are optimized to process data across each transaction cycle resulting in enhanced processing and lower power consumption.

Type: Grant

Filed: April 11, 2018

Date of Patent: April 21, 2020

Assignee: Microsoft Technology Licensing, LLC

Inventors: Chad Balling McBride, Timothy Hume Heil, Amol Ashok Ambardekar, George Petre, Kent D. Cedola, Larry Marvin Wall, Boris Bobrov
Direct memory access descriptor processing using timestamps

Patent number: 10572401

Abstract: Hardware accelerated synchronization of data movement across multiple direct memory access (DMA) engines is provided using techniques in which the order of descriptor processing is guaranteed for scenarios involving a single CPU and multiple DMA engines as well as those involving multiple CPUs and multiple DMA engines.

Type: Grant

Filed: July 17, 2017

Date of Patent: February 25, 2020

Assignee: Microsoft Technology Licensing, LLC

Inventors: Chad McBride, Jeffrey Bradford, Steven Wheeler, Christopher Johnson, Boris Bobrov, Andras Tantos
Queue management for direct memory access

Patent number: 10540584

Abstract: A direct memory access (DMA) engine may be responsible to enable and control DMA data flow within a computing system. The DMA engine moves blocks of data, associated with descriptors in a plurality of queues, from a source to a destination memory location or address, autonomously from control by a computer system's processor. Based on analysis of the data blocks linked to the descriptors in the queues, the DMA engine and its associated DMA fragmenter ensure that data blocks stored linked to descriptors in the queues do not remain idle for an exorbitant period of time. The DMA fragmenter may divide large data blocks into smaller data blocks to ensure that the processing of large data blocks does not preclude the timely processing of smaller data blocks associated with one or more descriptors in the queues. The data blocks stored may be two-dimensional data blocks.

Type: Grant

Filed: September 12, 2017

Date of Patent: January 21, 2020

Assignee: Microsoft Technology Licensing, LLC

Inventors: Chad Balling McBride, Amol Ashok Ambardekar, Kent D. Cedola, George Petre, Larry Marvin Wall, Boris Bobrov
Direct memory access (“DMA”) descriptor processing using identifiers assigned to descriptors on DMA engines

Patent number: 10528494

Abstract: Hardware accelerated synchronization of data movement across multiple direct memory access (DMA) engines is provided using techniques in which the order of descriptor processing is guaranteed for scenarios involving a single CPU and multiple DMA engines as well as those involving multiple CPUs and multiple DMA engines.

Type: Grant

Filed: July 17, 2017

Date of Patent: January 7, 2020

Assignee: Microsoft Technology Licensing, LLC

Inventors: Chad McBride, Jeffrey Bradford, Steven Wheeler, Christopher Johnson, Boris Bobrov, Andras Tantos
POWER-EFFICIENT DEEP NEURAL NETWORK MODULE CONFIGURED FOR LAYER AND OPERATION FENCING AND DEPENDENCY MANAGEMENT

Publication number: 20180300604

Abstract: A deep neural network (DNN) processor is configured to execute layer descriptors in layer descriptor lists. The descriptors define instructions for performing a forward pass of a DNN by the DNN processor. The layer descriptors can also be utilized to manage the flow of descriptors through the DNN module. For example, layer descriptors can define dependencies upon other descriptors. Descriptors defining a dependency will not execute until the descriptors upon which they are dependent have completed. Layer descriptors can also define a “fence,” or barrier, function that can be used to prevent the processing of upstream layer descriptors until the processing of all downstream layer descriptors is complete. The fence bit guarantees that there are no other layer descriptors in the DNN processing pipeline before the layer descriptor that has the fence to be asserted is processed.

Type: Application

Filed: April 11, 2018

Publication date: October 18, 2018

Inventors: Chad Balling McBRIDE, Amol Ashok AMBARDEKAR, Kent D. CEDOLA, George PETRE, Larry Marvin WALL, Boris BOBROV
ENHANCING PROCESSING PERFORMANCE OF A DNN MODULE BY BANDWIDTH CONTROL OF FABRIC INTERFACE

Publication number: 20180299943

Abstract: An exemplary computing environment having a DNN module can maintain one or more bandwidth throttling mechanisms. Illustratively, a first throttling mechanism can specify the number of cycles to wait between transactions on a cooperating fabric component (e.g., data bus). Illustratively, a second throttling mechanism can be a transaction count limiter that operatively sets a threshold of a number of transactions to be processed during a given transaction sequence and limits the number of transactions such as multiple transactions in flight to not exceed the set threshold. In an illustrative operation, in executing these two exemplary calculated throttling parameters, the average bandwidth usage and the peak bandwidth usage can be limited. Operatively, with this fabric bandwidth control, the processing units of the DNN are optimized to process data across each transaction cycle resulting in enhanced processing and lower power consumption.

Type: Application

Filed: April 11, 2018

Publication date: October 18, 2018

Inventors: Chad Balling McBRIDE, Timothy Hume HEIL, Amol Ashok AMBARDEKAR, George PETRE, Kent D. CEDOLA, Larry Marvin WALL, Boris BOBROV
QUEUE MANAGEMENT FOR DIRECT MEMORY ACCESS

Publication number: 20180300634

Abstract: A direct memory access (DMA) engine may be responsible to enable and control DMA data flow within a computing system. The DMA engine moves blocks of data, associated with descriptors in a plurality of queues, from a source to a destination memory location or address, autonomously from control by a computer system's processor. Based on analysis of the data blocks linked to the descriptors in the queues, the DMA engine and its associated DMA fragmenter ensure that data blocks stored linked to descriptors in the queues do not remain idle for an exorbitant period of time. The DMA fragmenter may divide large data blocks into smaller data blocks to ensure that the processing of large data blocks does not preclude the timely processing of smaller data blocks associated with one or more descriptors in the queues. The data blocks stored may be two-dimensional data blocks.

Type: Application

Filed: September 12, 2017

Publication date: October 18, 2018

Inventors: Chad Balling McBRIDE, Amol Ashok AMBARDEKAR, Kent D. CEDOLA, George PETRE, Larry Marvin Wall, Boris BOBROV
DYNAMIC SEQUENCING OF DATA PARTITIONS FOR OPTIMIZING MEMORY UTILIZATION AND PERFORMANCE OF NEURAL NETWORKS

Publication number: 20180300601

Abstract: Optimized memory usage and management is crucial to the overall performance of a neural network (NN) or deep neural network (DNN) computing environment. Using various characteristics of the input data dimension, an apportionment sequence is calculated for the input data to be processed by the NN or DNN that optimizes the efficient use of the local and external memory components. The apportionment sequence can describe how to parcel the input data (and its associated processing parameters—e.g., processing weights) into one or more portions as well as how such portions of input data (and its associated processing parameters) are passed between the local memory, external memory, and processing unit components of the NN or DNN. Additionally, the apportionment sequence can include instructions to store generated output data in the local and/or external memory components so as to optimize the efficient use of the local and/or external memory components.

Type: Application

Filed: September 28, 2017

Publication date: October 18, 2018

Inventors: Kent D. CEDOLA, Chad Balling McBRIDE, Amol Ashok AMBARDEKAR, George PETRE, Larry Marvin WALL, Boris BOBROV
MINIMIZING MEMORY READS AND INCREASING PERFORMANCE BY LEVERAGING ALIGNED BLOB DATA IN A PROCESSING UNIT OF A NEURAL NETWORK ENVIRONMENT

Publication number: 20180300607

Abstract: The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as management of data among the various memory components of the NN/DNN. By inserting a selected padding in the input data to align the input data in memory, data read/writes can be optimized for processing by the NN/DNN thereby enhancing the overall performance of a NN/DNN. Operatively, an operations controller/iterator can generate one or more instructions that inserts the selected padding into the data. The data padding can be calculated using various characteristics of the input data as well as the NN/DNN as well as characteristics of the cooperating memory components. Padding on the output data can be utilized to support the data alignment at the memory components and the cooperating processing units of the NN/DNN.

Type: Application

Filed: November 15, 2017

Publication date: October 18, 2018

Inventors: George PETRE, Chad Balling McBRIDE, Amol Ashok AMBARDEKAR, Kent D. CEDOLA, Larry Marvin WALL, Boris BOBROV
FLEXIBLE HARDWARE FOR HIGH THROUGHPUT VECTOR DEQUANTIZATION WITH DYNAMIC VECTOR LENGTH AND CODEBOOK SIZE

Publication number: 20180300603

Abstract: The performance of a neural network (NN) and/or deep neural network (DNN) can limited by the number of operations being performed as well as memory data management of a NN/DNN. Using vector quantization of neuron weight values, the processing of data by neurons can be optimize the number of operations as well as memory utilization to enhance the overall performance of a NN/DNN. Operatively, one or more contiguous segments of weight values can be converted into one or more vectors of arbitrary length and each of the one or more vectors can be assigned an index. The generated indexes can be stored in an exemplary vector quantization lookup table and retrieved by exemplary fast weight lookup hardware at run time on the flyas part of an exemplary data processing function of the NN as part of an inline de-quantization operation to obtain needed one or more neuron weight values.

Type: Application

Filed: January 26, 2018

Publication date: October 18, 2018

Inventors: Amol Ashok AMBARDEKAR, Aleksandar TOMIC, Chad Balling McBRIDE, George PETRE, Kent D. CEDOLA, Larry Marvin Wall, Boris BOBROV
DYNAMICALLY PARTITIONING WORKLOAD IN A DEEP NEURAL NETWORK MODULE TO REDUCE POWER CONSUMPTION

Publication number: 20180300616

Abstract: A deep neural network (DNN) module is disclosed that can dynamically partition neuron workload to reduce power consumption. The DNN module includes neurons and a group partitioner and scheduler unit. The group partitioner and scheduler unit divides a workload for the neurons into partitions in order to maximize the number of neurons that can simultaneously process the workload. The group partitioner and scheduler unit then assigns a group of neurons to each of the partitions. The groups of neurons in the DNN module process the workload in their assigned partition to generate a partial output value. The neurons in each group can then sum their partial output values to generate a final output value for the workload. The neurons can be powered down once the groups of neurons have completed processing their assigned workload to reduce power consumption.

Type: Application

Filed: April 13, 2018

Publication date: October 18, 2018

Inventors: Amol Ashok AMBARDEKAR, Boris BOBROV, Chad Balling McBRIDE, George PETRE, Kent D. CEDOLA, Larry Marvin WALL

prev 1 2 3 next