Patents by Inventor Olatunji Ruwase

Olatunji Ruwase has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11200486
    Abstract: A hardware acceleration component is provided for implementing a convolutional neural network. The hardware acceleration component includes an array of N rows and M columns of functional units, an array of N input data buffers configured to store input data, and an array of M weights data buffers configured to store weights data. Each of the N input data buffers is coupled to a corresponding one of the N rows of functional units. Each of the M weights data buffers is coupled to a corresponding one of the M columns of functional units. Each functional unit in a row is configured to receive a same set of input data. Each functional unit in a column is configured to receive a same set of weights data from the weights data buffer coupled to the column. Each of the functional units is configured to perform a convolution of the received input data and the received weights data, and the M columns of functional units are configured to provide M planes of output data.
    Type: Grant
    Filed: June 13, 2019
    Date of Patent: December 14, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric Chung, Karin Strauss, Kalin Ovtcharov, Joo-Young Kim, Olatunji Ruwase
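
The data flow this abstract describes, rows sharing inputs and columns sharing weights so that each column yields one output plane, can be summarized in a short sketch. The following Python/NumPy model is illustrative only: the buffer shapes, the dot product standing in for the per-unit convolution, and all names are assumptions, not details from the patent.

```python
# A minimal sketch of the N x M functional-unit array, assuming toy
# 3x3 tiles and a dot product as the per-unit "convolution".
import numpy as np

def functional_unit(input_tile: np.ndarray, weights: np.ndarray) -> float:
    # One functional unit: combine the input it receives from its row
    # with the weights it receives from its column.
    return float(np.sum(input_tile * weights))

def accelerator_array(input_buffers, weight_buffers):
    # input_buffers:  N tiles, one per row of functional units.
    # weight_buffers: M kernels, one per column of functional units.
    N, M = len(input_buffers), len(weight_buffers)
    out = np.zeros((N, M))
    for i in range(N):          # every unit in row i sees the same input
        for j in range(M):      # every unit in column j sees the same weights
            out[i, j] = functional_unit(input_buffers[i], weight_buffers[j])
    # Column j of `out` is the j-th output plane (M planes in total).
    return [out[:, j] for j in range(M)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inputs = [rng.standard_normal((3, 3)) for _ in range(4)]   # N = 4 rows
    kernels = [rng.standard_normal((3, 3)) for _ in range(2)]  # M = 2 columns
    planes = accelerator_array(inputs, kernels)
    print(len(planes), planes[0].shape)  # 2 output planes, 4 values each
```
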
  • Patent number: 10686869
    Abstract: A performance investigation tool (PIT) is described herein for investigating the performance of a distributed processing system (DPS). The PIT operates by first receiving input information that describes a graph processing task to be executed using a plurality of computing units. The PIT then determines, based on the input information, at least one time-based performance measure that describes the performance of a DPS that is capable of performing the graph processing task. More specifically, the PIT can operate in a manual mode to explore the behavior of a specified DPS, or in an automatic mode to find an optimal DPS from within a search space of candidate DPSs. A configuration system may then be used to construct a selected DPS, using the plurality of computing units. In one case, the graph processing task involves training a deep neural network model having a plurality of layers.
    Type: Grant
    Filed: September 29, 2014
    Date of Patent: June 16, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Trishul Chilimbi, Yutaka Suzue, Johnson T. Apacible, Karthik Kalyanaraman, Olatunji Ruwase, Yuxiong He, Feng Yan
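
A rough sketch of the manual/automatic flow the abstract describes may help. The cost model below is a toy stand-in for the tool's time-based performance measure, and every class, function, and parameter name is an invented assumption.

```python
# A hedged sketch: estimate a time-based measure for one candidate DPS
# (manual mode), or search a space of candidates for the best (automatic).
from dataclasses import dataclass

@dataclass
class DPSConfig:
    num_workers: int
    batch_size: int

def estimate_time(task_ops: float, config: DPSConfig) -> float:
    # Toy time-based performance measure: compute scales down with
    # workers, communication overhead scales up with them.
    compute = task_ops / config.num_workers
    comm = 0.05 * task_ops * (config.num_workers - 1) / config.batch_size
    return compute + comm

def automatic_mode(task_ops: float, search_space: list) -> DPSConfig:
    # Automatic mode: pick the candidate DPS with the lowest estimate.
    return min(search_space, key=lambda c: estimate_time(task_ops, c))

candidates = [DPSConfig(w, b) for w in (1, 2, 4, 8) for b in (32, 128)]
best = automatic_mode(task_ops=1e6, search_space=candidates)
print(best, estimate_time(1e6, best))
```
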
  • Patent number: 10592252
    Abstract: Efficient instruction processing for sparse data includes extensions to a processor pipeline that identify zero-optimizable instructions (those with at least one zero input operand) and bypass the execute stage of the pipeline, determining the result of the operation without executing the instruction. When possible, the extensions also bypass the writeback stage of the processor pipeline.
    Type: Grant
    Filed: December 31, 2015
    Date of Patent: March 17, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Trishul A. Chilimbi, Olatunji Ruwase, Vivek Seshadri
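
The bypass idea can be sketched as a tiny issue-stage check: if an operand is zero and the result is therefore known, skip execution. The toy ISA, the set of zero-optimizable opcodes, and all names below are assumptions for illustration.

```python
# A minimal sketch of zero-operand bypass on a toy two-opcode ISA.
def issue(op, a, b, dest, regs):
    if op == "mul" and (a == 0 or b == 0):
        regs[dest] = 0          # result known without executing: bypass execute
        return "bypassed-execute"
    if op == "add" and a == 0:
        regs[dest] = b          # a + b == b: forward the known result
        return "bypassed-execute"
    regs[dest] = {"add": a + b, "mul": a * b}[op]   # normal execute stage
    return "executed"

regs = {}
print(issue("mul", 0, 7, "r1", regs), regs)   # bypassed-execute {'r1': 0}
print(issue("add", 3, 4, "r2", regs), regs)   # executed {'r1': 0, 'r2': 7}
```
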
  • Patent number: 10459727
    Abstract: Loop code processor optimizations are implemented as a loop optimizer extension to a processor pipeline. The loop optimizer generates optimized code associated with code loops that include at least one zero-optimizable instruction. The loop optimizer may generate multiple versions of optimized code associated with a particular code loop, where each of the multiple versions of optimized code has a different associated condition under which it can be safely executed.
    Type: Grant
    Filed: December 31, 2015
    Date of Patent: October 29, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Trishul A. Chilimbi, Olatunji Ruwase, Vivek Seshadri
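
Loop multi-versioning of this kind can be sketched as a guarded dispatch: each optimized version carries the condition under which it is safe, and the first satisfied guard wins. The zero-input guard and all names below are illustrative assumptions.

```python
# A hedged sketch of multi-versioned loop code with safety guards.
import numpy as np

def dot_general(x, w):
    return float(np.dot(x, w))      # general version, always safe

def dot_zero_version(x, w):
    return 0.0                      # optimized version, safe only if x is all zeros

def optimized_loop(x, w):
    versions = [
        (lambda v: not np.any(v), dot_zero_version),  # guard, fast body
        (lambda v: True, dot_general),                # unconditional fallback
    ]
    for guard, body in versions:    # dispatch on the first satisfied guard
        if guard(x):
            return body(x, w)

x = np.zeros(1024)
w = np.random.default_rng(1).standard_normal(1024)
print(optimized_loop(x, w))  # 0.0 via the zero-specialized version
```
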
  • Patent number: 10452971
    Abstract: A method is provided for implementing a deep neural network on a server component that includes a host component including a CPU and a hardware acceleration component coupled to the host component. The deep neural network includes a plurality of layers. The method includes partitioning the deep neural network into a first segment and a second segment, the first segment including a first subset of the plurality of layers, the second segment including a second subset of the plurality of layers, configuring the host component to implement the first segment, and configuring the hardware acceleration component to implement the second segment.
    Type: Grant
    Filed: June 29, 2015
    Date of Patent: October 22, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric Chung, Karin Strauss, Kalin Ovtcharov, Joo-Young Kim, Olatunji Ruwase
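
The partitioning step reduces to splitting an ordered list of layers into a host-resident prefix and an accelerator-resident suffix. The sketch below models that split in plain Python; the split heuristic and every name are assumptions, not the patented method.

```python
# A minimal sketch: partition a network's layers into a first segment for
# the host CPU and a second segment for the hardware accelerator.
def partition(layers, split_index):
    first_segment = layers[:split_index]    # runs on the host component
    second_segment = layers[split_index:]   # runs on the accelerator
    return first_segment, second_segment

def run(x, host_layers, accel_layers):
    for layer in host_layers:   # host executes its subset of layers
        x = layer(x)
    for layer in accel_layers:  # accelerator executes the remainder
        x = layer(x)
    return x

layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
host, accel = partition(layers, split_index=1)
print(run(5, host, accel))  # ((5 + 1) * 2) - 3 = 9
```
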
  • Publication number: 20190311253
    Abstract: A hardware acceleration component is provided for implementing a convolutional neural network. The hardware acceleration component includes an array of N rows and M columns of functional units, an array of N input data buffers configured to store input data, and an array of M weights data buffers configured to store weights data. Each of the N input data buffers is coupled to a corresponding one of the N rows of functional units. Each of the M weights data buffers is coupled to a corresponding one of the M columns of functional units. Each functional unit in a row is configured to receive a same set of input data. Each functional unit in a column is configured to receive a same set of weights data from the weights data buffer coupled to the column. Each of the functional units is configured to perform a convolution of the received input data and the received weights data, and the M columns of functional units are configured to provide M planes of output data.
    Type: Application
    Filed: June 13, 2019
    Publication date: October 10, 2019
    Inventors: Eric Chung, Karin Strauss, Kalin Ovtcharov, Joo-Young Kim, Olatunji Ruwase
  • Publication number: 20170193361
    Abstract: A neural network training tool selects from a plurality of parallelizing techniques and from a plurality of forward-propagation computation techniques. The neural network training tool performs a forward-propagation phase to train a neural network using the selected parallelizing technique and the selected forward-propagation computation technique based on one or more inputs. Additionally, the neural network training tool selects from a plurality of computation techniques and from a plurality of parallelizing techniques for a backward-propagation phase. The neural network training tool performs a backward-propagation phase of training the neural network using the selected backward-propagation parallelizing technique and the selected backward-propagation computation technique to generate error gradients and weight deltas and to update weights associated with one or more layers of the neural network.
    Type: Application
    Filed: December 31, 2015
    Publication date: July 6, 2017
    Inventors: Trishul A. Chilimbi, Olatunji Ruwase, Samyam Rajbhandari, Michael Carbin, Yuxiong He
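
The selection logic, choosing a parallelizing technique and a computation technique independently for each phase, can be sketched as a small cost-driven lookup. The candidate techniques, costs, and names below are invented for illustration.

```python
# A hedged sketch: pick (parallelizing, computation) pairs per phase by
# lowest estimated cost; a real tool would profile or model the hardware.
FORWARD_PARALLEL = {"data-parallel": 1.0, "model-parallel": 1.4}
FORWARD_COMPUTE = {"dense-gemm": 1.0, "sparse-aware": 0.7}
BACKWARD_PARALLEL = {"data-parallel": 1.2, "model-parallel": 1.1}
BACKWARD_COMPUTE = {"dense-gemm": 1.0, "recompute": 0.9}

def select(parallel_opts, compute_opts):
    # Choose the cheapest parallelizing technique and the cheapest
    # computation technique for one phase of training.
    p = min(parallel_opts, key=parallel_opts.get)
    c = min(compute_opts, key=compute_opts.get)
    return p, c

fwd = select(FORWARD_PARALLEL, FORWARD_COMPUTE)
bwd = select(BACKWARD_PARALLEL, BACKWARD_COMPUTE)
print("forward:", fwd)   # ('data-parallel', 'sparse-aware')
print("backward:", bwd)  # ('model-parallel', 'recompute')
```
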
  • Publication number: 20170192793
    Abstract: Efficient instruction processing for sparse data includes extensions to a processor pipeline that identify zero-optimizable instructions (those with at least one zero input operand) and bypass the execute stage of the pipeline, determining the result of the operation without executing the instruction. When possible, the extensions also bypass the writeback stage of the processor pipeline.
    Type: Application
    Filed: December 31, 2015
    Publication date: July 6, 2017
    Inventors: Trishul A. Chilimbi, Olatunji Ruwase, Vivek Seshadri
  • Publication number: 20170192787
    Abstract: Loop code processor optimizations are implemented as a loop optimizer extension to a processor pipeline. The loop optimizer generates optimized code associated with code loops that include at least one zero-optimizable instruction. The loop optimizer may generate multiple versions of optimized code associated with a particular code loop, where each of the multiple versions of optimized code has a different associated condition under which it can be safely executed.
    Type: Application
    Filed: December 31, 2015
    Publication date: July 6, 2017
    Inventors: Trishul A. Chilimbi, Olatunji Ruwase, Vivek Seshadri
  • Publication number: 20170192896
    Abstract: A zero cache memory system extension includes a zero cache to store cache tags associated with zero cache lines, while a corresponding data cache stores cache tags and data bytes associated with non-zero cache lines. As non-zero data is written to the cache, cache lines may be moved from the zero cache to the data cache. Similarly, as zero data is written to the cache, cache lines may be moved from the data cache to the zero cache.
    Type: Application
    Filed: December 31, 2015
    Publication date: July 6, 2017
    Inventors: Trishul A. Chilimbi, Olatunji Ruwase, Vivek Seshadri
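
The zero-cache split can be sketched with two structures: a tag-only set for all-zero lines and a tag-to-data map for non-zero lines, with writes migrating lines between them. The line size and all names below are illustrative assumptions.

```python
# A minimal sketch of the zero-cache extension: all-zero lines are stored
# as tags only; non-zero lines keep tag and data bytes in the data cache.
LINE = 4  # bytes per cache line (illustrative)

zero_cache = set()   # tags of all-zero lines (no data bytes stored)
data_cache = {}      # tag -> data bytes for non-zero lines

def write(tag: int, data: bytes):
    if any(data):                    # non-zero data belongs in the data cache
        zero_cache.discard(tag)      # migrate out of the zero cache if present
        data_cache[tag] = data
    else:                            # all-zero data becomes a tag-only entry
        data_cache.pop(tag, None)    # migrate out of the data cache if present
        zero_cache.add(tag)

def read(tag: int) -> bytes:
    if tag in zero_cache:
        return bytes(LINE)           # zeros are synthesized, not stored
    return data_cache[tag]

write(0x10, b"\x00\x00\x00\x00")
write(0x20, b"\x01\x02\x03\x04")
print(read(0x10), read(0x20))
write(0x20, bytes(LINE))             # overwriting with zeros migrates the line
print(0x20 in zero_cache, 0x20 in data_cache)  # True False
```
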
  • Publication number: 20160379109
    Abstract: A hardware acceleration component is provided for implementing a convolutional neural network. The hardware acceleration component includes an array of N rows and M columns of functional units, an array of N input data buffers configured to store input data, and an array of M weights data buffers configured to store weights data. Each of the N input data buffers is coupled to a corresponding one of the N rows of functional units. Each of the M weights data buffers is coupled to a corresponding one of the M columns of functional units. Each functional unit in a row is configured to receive a same set of input data. Each functional unit in a column is configured to receive a same set of weights data from the weights data buffer coupled to the column. Each of the functional units is configured to perform a convolution of the received input data and the received weights data, and the M columns of functional units are configured to provide M planes of output data.
    Type: Application
    Filed: June 29, 2015
    Publication date: December 29, 2016
    Inventors: Eric Chung, Karin Strauss, Kalin Ovtcharov, Joo-Young Kim, Olatunji Ruwase
  • Publication number: 20160379108
    Abstract: A method is provided for implementing a deep neural network on a server component that includes a host component including a CPU and a hardware acceleration component coupled to the host component. The deep neural network includes a plurality of layers. The method includes partitioning the deep neural network into a first segment and a second segment, the first segment including a first subset of the plurality of layers, the second segment including a second subset of the plurality of layers, configuring the host component to implement the first segment, and configuring the hardware acceleration component to implement the second segment.
    Type: Application
    Filed: June 29, 2015
    Publication date: December 29, 2016
    Inventors: Eric Chung, Karin Strauss, Kalin Ovtcharov, Joo-Young Kim, Olatunji Ruwase
  • Publication number: 20160092765
    Abstract: A performance investigation tool (PIT) is described herein for investigating the performance of a distributed processing system (DPS). The PIT operates by first receiving input information that describes a graph processing task to be executed using a plurality of computing units. The PIT then determines, based on the input information, at least one time-based performance measure that describes the performance of a DPS that is capable of performing the graph processing task. More specifically, the PIT can operate in a manual mode to explore the behavior of a specified DPS, or in an automatic mode to find an optimal DPS from within a search space of candidate DPSs. A configuration system may then be used to construct a selected DPS, using the plurality of computing units. In one case, the graph processing task involves training a deep neural network model having a plurality of layers.
    Type: Application
    Filed: September 29, 2014
    Publication date: March 31, 2016
    Inventors: Trishul Chilimbi, Yutaka Suzue, Johnson T. Apacible, Karthik Kalyanaraman, Olatunji Ruwase, Yuxiong He, Feng Yan