Patents by Inventor Olatunji Ruwase

Olatunji Ruwase has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11200486
    Abstract: A hardware acceleration component is provided for implementing a convolutional neural network. The hardware acceleration component includes an array of N rows and M columns of functional units, an array of N input data buffers configured to store input data, and an array of M weights data buffers configured to store weights data. Each of the N input data buffers is coupled to a corresponding one of the N rows of functional units. Each of the M weights data buffers is coupled to a corresponding one of the M columns of functional units. Each functional unit in a row is configured to receive a same set of input data. Each functional unit in a column is configured to receive a same set of weights data from the weights data buffer coupled to the column. Each of the functional units is configured to perform a convolution of the received input data and the received weights data, and the M columns of functional units are configured to provide M planes of output data.
    Type: Grant
    Filed: June 13, 2019
    Date of Patent: December 14, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric Chung, Karin Strauss, Kalin Ovtcharov, Joo-Young Kim, Olatunji Ruwase
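
The data flow this abstract describes, rows sharing inputs and columns sharing weights so that each column yields one output plane, can be summarized in a short sketch. The following Python/NumPy model is illustrative only: the buffer shapes, the dot product standing in for the per-unit convolution, and all names are assumptions, not details from the patent.

```python
# A minimal sketch of the N x M functional-unit array, assuming toy
# 3x3 tiles and a dot product as the per-unit "convolution".
import numpy as np

def functional_unit(input_tile: np.ndarray, weights: np.ndarray) -> float:
    # One functional unit: combine the input it receives from its row
    # with the weights it receives from its column.
    return float(np.sum(input_tile * weights))

def accelerator_array(input_buffers, weight_buffers):
    # input_buffers:  N tiles, one per row of functional units.
    # weight_buffers: M kernels, one per column of functional units.
    N, M = len(input_buffers), len(weight_buffers)
    out = np.zeros((N, M))
    for i in range(N):          # every unit in row i sees the same input
        for j in range(M):      # every unit in column j sees the same weights
            out[i, j] = functional_unit(input_buffers[i], weight_buffers[j])
    # Column j of `out` is the j-th output plane (M planes in total).
    return [out[:, j] for j in range(M)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inputs = [rng.standard_normal((3, 3)) for _ in range(4)]   # N = 4 rows
    kernels = [rng.standard_normal((3, 3)) for _ in range(2)]  # M = 2 columns
    planes = accelerator_array(inputs, kernels)
    print(len(planes), planes[0].shape)  # 2 output planes, 4 values each
```
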
  • Patent number: 10686869
    Abstract: A performance investigation tool (PIT) is described herein for investigating the performance of a distributed processing system (DPS). The PIT operates by first receiving input information that describes a graph processing task to be executed using a plurality of computing units. The PIT then determines, based on the input information, at least one time-based performance measure that describes the performance of a DPS that is capable of performing the graph processing task. More specifically, the PIT can operate in a manual mode to explore the behavior of a specified DPS, or in an automatic mode to find an optimal DPS from within a search space of candidate DPSs. A configuration system may then be used to construct a selected DPS, using the plurality of computing units. In one case, the graph processing task involves training a deep neural network model having a plurality of layers.
    Type: Grant
    Filed: September 29, 2014
    Date of Patent: June 16, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Trishul Chilimbi, Yutaka Suzue, Johnson T. Apacible, Karthik Kalyanaraman, Olatunji Ruwase, Yuxiong He, Feng Yan
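
A rough sketch of the manual/automatic flow the abstract describes may help. The cost model below is a toy stand-in for the tool's time-based performance measure, and every class, function, and parameter name is an invented assumption.

```python
# A hedged sketch: estimate a time-based measure for one candidate DPS
# (manual mode), or search a space of candidates for the best (automatic).
from dataclasses import dataclass

@dataclass
class DPSConfig:
    num_workers: int
    batch_size: int

def estimate_time(task_ops: float, config: DPSConfig) -> float:
    # Toy time-based performance measure: compute scales down with
    # workers, communication overhead scales up with them.
    compute = task_ops / config.num_workers
    comm = 0.05 * task_ops * (config.num_workers - 1) / config.batch_size
    return compute + comm

def automatic_mode(task_ops: float, search_space: list) -> DPSConfig:
    # Automatic mode: pick the candidate DPS with the lowest estimate.
    return min(search_space, key=lambda c: estimate_time(task_ops, c))

candidates = [DPSConfig(w, b) for w in (1, 2, 4, 8) for b in (32, 128)]
best = automatic_mode(task_ops=1e6, search_space=candidates)
print(best, estimate_time(1e6, best))
```
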
  • Patent number: 10592252
    Abstract: Efficient instruction processing for sparse data includes extensions to a processor pipeline that identify zero-optimizable instructions (those with at least one zero input operand) and bypass the execute stage of the pipeline, determining the result of the operation without executing the instruction. When possible, the extensions also bypass the writeback stage of the processor pipeline.
    Type: Grant
    Filed: December 31, 2015
    Date of Patent: March 17, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Trishul A. Chilimbi, Olatunji Ruwase, Vivek Seshadri
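
The bypass idea can be sketched as a tiny issue-stage check: if an operand is zero and the result is therefore known, skip execution. The toy ISA, the set of zero-optimizable opcodes, and all names below are assumptions for illustration.

```python
# A minimal sketch of zero-operand bypass on a toy two-opcode ISA.
def issue(op, a, b, dest, regs):
    if op == "mul" and (a == 0 or b == 0):
        regs[dest] = 0          # result known without executing: bypass execute
        return "bypassed-execute"
    if op == "add" and a == 0:
        regs[dest] = b          # a + b == b: forward the known result
        return "bypassed-execute"
    regs[dest] = {"add": a + b, "mul": a * b}[op]   # normal execute stage
    return "executed"

regs = {}
print(issue("mul", 0, 7, "r1", regs), regs)   # bypassed-execute {'r1': 0}
print(issue("add", 3, 4, "r2", regs), regs)   # executed {'r1': 0, 'r2': 7}
```
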
  • Patent number: 10459727
    Abstract: Loop code processor optimizations are implemented as a loop optimizer extension to a processor pipeline. The loop optimizer generates optimized code associated with code loops that include at least one zero-optimizable instruction. The loop optimizer may generate multiple versions of optimized code associated with a particular code loop, where each of the multiple versions of optimized code has a different associated condition under which it can be safely executed.
    Type: Grant
    Filed: December 31, 2015
    Date of Patent: October 29, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Trishul A. Chilimbi, Olatunji Ruwase, Vivek Seshadri
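
Loop multi-versioning of this kind can be sketched as a guarded dispatch: each optimized version carries the condition under which it is safe, and the first satisfied guard wins. The zero-input guard and all names below are illustrative assumptions.

```python
# A hedged sketch of multi-versioned loop code with safety guards.
import numpy as np

def dot_general(x, w):
    return float(np.dot(x, w))      # general version, always safe

def dot_zero_version(x, w):
    return 0.0                      # optimized version, safe only if x is all zeros

def optimized_loop(x, w):
    versions = [
        (lambda v: not np.any(v), dot_zero_version),  # guard, fast body
        (lambda v: True, dot_general),                # unconditional fallback
    ]
    for guard, body in versions:    # dispatch on the first satisfied guard
        if guard(x):
            return body(x, w)

x = np.zeros(1024)
w = np.random.default_rng(1).standard_normal(1024)
print(optimized_loop(x, w))  # 0.0 via the zero-specialized version
```
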
  • Patent number: 10452971
    Abstract: A method is provided for implementing a deep neural network on a server component that includes a host component including a CPU and a hardware acceleration component coupled to the host component. The deep neural network includes a plurality of layers. The method includes partitioning the deep neural network into a first segment and a second segment, the first segment including a first subset of the plurality of layers, the second segment including a second subset of the plurality of layers, configuring the host component to implement the first segment, and configuring the hardware acceleration component to implement the second segment.
    Type: Grant
    Filed: June 29, 2015
    Date of Patent: October 22, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric Chung, Karin Strauss, Kalin Ovtcharov, Joo-Young Kim, Olatunji Ruwase
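
The partitioning step reduces to splitting an ordered list of layers into a host-resident prefix and an accelerator-resident suffix. The sketch below models that split in plain Python; the split heuristic and every name are assumptions, not the patented method.

```python
# A minimal sketch: partition a network's layers into a first segment for
# the host CPU and a second segment for the hardware accelerator.
def partition(layers, split_index):
    first_segment = layers[:split_index]    # runs on the host component
    second_segment = layers[split_index:]   # runs on the accelerator
    return first_segment, second_segment

def run(x, host_layers, accel_layers):
    for layer in host_layers:   # host executes its subset of layers
        x = layer(x)
    for layer in accel_layers:  # accelerator executes the remainder
        x = layer(x)
    return x

layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
host, accel = partition(layers, split_index=1)
print(run(5, host, accel))  # ((5 + 1) * 2) - 3 = 9
```
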
  • Publication number: 20190311253
    Abstract: A hardware acceleration component is provided for implementing a convolutional neural network. The hardware acceleration component includes an array of N rows and M columns of functional units, an array of N input data buffers configured to store input data, and an array of M weights data buffers configured to store weights data. Each of the N input data buffers is coupled to a corresponding one of the N rows of functional units. Each of the M weights data buffers is coupled to a corresponding one of the M columns of functional units. Each functional unit in a row is configured to receive a same set of input data. Each functional unit in a column is configured to receive a same set of weights data from the weights data buffer coupled to the column. Each of the functional units is configured to perform a convolution of the received input data and the received weights data, and the M columns of functional units are configured to provide M planes of output data.
    Type: Application
    Filed: June 13, 2019
    Publication date: October 10, 2019
    Inventors: Eric Chung, Karin Strauss, Kalin Ovtcharov, Joo-Young Kim, Olatunji Ruwase
  • Publication number: 20170193361
    Abstract: A neural network training tool selects from a plurality of parallelizing techniques and from a plurality of forward-propagation computation techniques. The neural network training tool performs a forward-propagation phase to train a neural network using the selected parallelizing technique and the selected forward-propagation computation technique based on one or more inputs. Additionally, the neural network training tool selects from a plurality of computation techniques and from a plurality of parallelizing techniques for a backward-propagation phase. The neural network training tool performs a backward-propagation phase of training the neural network using the selected backward-propagation parallelizing technique and the selected backward-propagation computation technique to generate error gradients and weight deltas and to update weights associated with one or more layers of the neural network.
    Type: Application
    Filed: December 31, 2015
    Publication date: July 6, 2017
    Inventors: Trishul A. Chilimbi, Olatunji Ruwase, Samyam Rajbhandari, Michael Carbin, Yuxiong He
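
The selection logic, choosing a parallelizing technique and a computation technique independently for each phase, can be sketched as a small cost-driven lookup. The candidate techniques, costs, and names below are invented for illustration.

```python
# A hedged sketch: pick (parallelizing, computation) pairs per phase by
# lowest estimated cost; a real tool would profile or model the hardware.
FORWARD_PARALLEL = {"data-parallel": 1.0, "model-parallel": 1.4}
FORWARD_COMPUTE = {"dense-gemm": 1.0, "sparse-aware": 0.7}
BACKWARD_PARALLEL = {"data-parallel": 1.2, "model-parallel": 1.1}
BACKWARD_COMPUTE = {"dense-gemm": 1.0, "recompute": 0.9}

def select(parallel_opts, compute_opts):
    # Choose the cheapest parallelizing technique and the cheapest
    # computation technique for one phase of training.
    p = min(parallel_opts, key=parallel_opts.get)
    c = min(compute_opts, key=compute_opts.get)
    return p, c

fwd = select(FORWARD_PARALLEL, FORWARD_COMPUTE)
bwd = select(BACKWARD_PARALLEL, BACKWARD_COMPUTE)
print("forward:", fwd)   # ('data-parallel', 'sparse-aware')
print("backward:", bwd)  # ('model-parallel', 'recompute')
```
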
  • Publication number: 20170192793
    Abstract: Efficient instruction processing for sparse data includes extensions to a processor pipeline that identify zero-optimizable instructions (those with at least one zero input operand) and bypass the execute stage of the pipeline, determining the result of the operation without executing the instruction. When possible, the extensions also bypass the writeback stage of the processor pipeline.
    Type: Application
    Filed: December 31, 2015
    Publication date: July 6, 2017
    Inventors: Trishul A. Chilimbi, Olatunji Ruwase, Vivek Seshadri
  • Publication number: 20170192787
    Abstract: Loop code processor optimizations are implemented as a loop optimizer extension to a processor pipeline. The loop optimizer generates optimized code associated with code loops that include at least one zero-optimizable instruction. The loop optimizer may generate multiple versions of optimized code associated with a particular code loop, where each of the multiple versions of optimized code has a different associated condition under which it can be safely executed.
    Type: Application
    Filed: December 31, 2015
    Publication date: July 6, 2017
    Inventors: Trishul A. Chilimbi, Olatunji Ruwase, Vivek Seshadri
  • Publication number: 20170192896
    Abstract: A zero cache memory system extension includes a zero cache to store cache tags associated with zero cache lines, while a corresponding data cache stores cache tags and data bytes associated with non-zero cache lines. As non-zero data is written to the cache, cache lines may be moved from the zero cache to the data cache. Similarly, as zero data is written to the cache, cache lines may be moved from the data cache to the zero cache.
    Type: Application
    Filed: December 31, 2015
    Publication date: July 6, 2017
    Inventors: Trishul A. Chilimbi, Olatunji Ruwase, Vivek Seshadri
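
The zero-cache split can be sketched with two structures: a tag-only set for all-zero lines and a tag-to-data map for non-zero lines, with writes migrating lines between them. The line size and all names below are illustrative assumptions.

```python
# A minimal sketch of the zero-cache extension: all-zero lines are stored
# as tags only; non-zero lines keep tag and data bytes in the data cache.
LINE = 4  # bytes per cache line (illustrative)

zero_cache = set()   # tags of all-zero lines (no data bytes stored)
data_cache = {}      # tag -> data bytes for non-zero lines

def write(tag: int, data: bytes):
    if any(data):                    # non-zero data belongs in the data cache
        zero_cache.discard(tag)      # migrate out of the zero cache if present
        data_cache[tag] = data
    else:                            # all-zero data becomes a tag-only entry
        data_cache.pop(tag, None)    # migrate out of the data cache if present
        zero_cache.add(tag)

def read(tag: int) -> bytes:
    if tag in zero_cache:
        return bytes(LINE)           # zeros are synthesized, not stored
    return data_cache[tag]

write(0x10, b"\x00\x00\x00\x00")
write(0x20, b"\x01\x02\x03\x04")
print(read(0x10), read(0x20))
write(0x20, bytes(LINE))             # overwriting with zeros migrates the line
print(0x20 in zero_cache, 0x20 in data_cache)  # True False
```
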
  • Publication number: 20160379109
    Abstract: A hardware acceleration component is provided for implementing a convolutional neural network. The hardware acceleration component includes an array of N rows and M columns of functional units, an array of N input data buffers configured to store input data, and an array of M weights data buffers configured to store weights data. Each of the N input data buffers is coupled to a corresponding one of the N rows of functional units. Each of the M weights data buffers is coupled to a corresponding one of the M columns of functional units. Each functional unit in a row is configured to receive a same set of input data. Each functional unit in a column is configured to receive a same set of weights data from the weights data buffer coupled to the column. Each of the functional units is configured to perform a convolution of the received input data and the received weights data, and the M columns of functional units are configured to provide M planes of output data.
    Type: Application
    Filed: June 29, 2015
    Publication date: December 29, 2016
    Inventors: Eric Chung, Karin Strauss, Kalin Ovtcharov, Joo-Young Kim, Olatunji Ruwase
  • Publication number: 20160379108
    Abstract: A method is provided for implementing a deep neural network on a server component that includes a host component including a CPU and a hardware acceleration component coupled to the host component. The deep neural network includes a plurality of layers. The method includes partitioning the deep neural network into a first segment and a second segment, the first segment including a first subset of the plurality of layers, the second segment including a second subset of the plurality of layers, configuring the host component to implement the first segment, and configuring the hardware acceleration component to implement the second segment.
    Type: Application
    Filed: June 29, 2015
    Publication date: December 29, 2016
    Inventors: Eric Chung, Karin Strauss, Kalin Ovtcharov, Joo-Young Kim, Olatunji Ruwase
  • Publication number: 20160092765
    Abstract: A performance investigation tool (PIT) is described herein for investigating the performance of a distributed processing system (DPS). The PIT operates by first receiving input information that describes a graph processing task to be executed using a plurality of computing units. The PIT then determines, based on the input information, at least one time-based performance measure that describes the performance of a DPS that is capable of performing the graph processing task. More specifically, the PIT can operate in a manual mode to explore the behavior of a specified DPS, or in an automatic mode to find an optimal DPS from within a search space of candidate DPSs. A configuration system may then be used to construct a selected DPS, using the plurality of computing units. In one case, the graph processing task involves training a deep neural network model having a plurality of layers.
    Type: Application
    Filed: September 29, 2014
    Publication date: March 31, 2016
    Inventors: Trishul Chilimbi, Yutaka Suzue, Johnson T. Apacible, Karthik Kalyanaraman, Olatunji Ruwase, Yuxiong He, Feng Yan