Patents by Inventor Girish Venkataramani

Girish Venkataramani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SYSTEMS AND METHODS FOR QUANTIZING A NEURAL NETWORK

Publication number: 20210174214

Abstract: Systems and methods quantize an application having a trained Deep Neural Network (DNN) for deployment on target hardware. The application may be instrumented to observe data values generated during execution of the application. Statistics may be generated for the observed data values and presented in a visualization tool. The application may be quantized through a rules based approach. The quantization may be based on the statistics and on constraints imposed by resources available at the target hardware. The systems and methods may present the proposed data types resulting from the quantization and may create a quantized version of the application incorporating the proposed data types. The systems and methods may generate performance data to validate the quantized version of the application. Changes to the rules may be made and the quantization process repeated if the performance is not satisfactory.
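
As a rough illustration of how range statistics and a hardware constraint can drive a data type proposal, the sketch below picks a signed fixed-point format from an observed minimum and maximum. The function names, the single rule it applies, and the 8-bit default word length are assumptions made for this example; the abstract does not specify the rules actually used.

```python
import math


def propose_fixed_point(min_val, max_val, word_length=8):
    """Propose (word_length, frac_bits) for a signed fixed-point type.

    word_length stands in for a target-hardware constraint (assumed here).
    The rule: reserve one sign bit, give the integer part just enough bits
    to hold the largest observed magnitude, and spend the rest on fraction.
    """
    largest = max(abs(min_val), abs(max_val))
    if largest > 0:
        int_bits = max(0, math.floor(math.log2(largest)) + 1)
    else:
        int_bits = 0
    frac_bits = word_length - 1 - int_bits
    return word_length, frac_bits


def quantize(x, word_length, frac_bits):
    """Round x to the nearest representable value, saturating at the limits."""
    scale = 2 ** frac_bits
    lo = -(2 ** (word_length - 1))
    hi = 2 ** (word_length - 1) - 1
    stored = max(lo, min(hi, round(x * scale)))
    return stored / scale


# Observed range [-2.5, 3.7] needs 2 integer bits, leaving 5 fraction bits.
fmt = propose_fixed_point(-2.5, 3.7)
print(fmt, quantize(3.7, *fmt), quantize(100.0, *fmt))
```

Comparing the quantized values against the originals gives a simple error measure of the kind that could feed the validation step the abstract mentions.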

Type: Application

Filed: December 1, 2020

Publication date: June 10, 2021

Inventors: Vaidehi Venkatesan, Jayaprabha Shankar, Shixin Zhuang, Girish Venkataramani, FNU Hanumantharayappa

Systems and methods for configuring programmable logic devices for deep learning networks

Patent number: 11023360

Abstract: Systems and methods may configure a programmable logic device to efficiently run a deep learning (DL) network. Architecture code and algorithmic code may be generated. The architecture code may define convolutional and fully connected processor cores structured to run the layers of a Deep Neural Network (DNN). The processor cores may be interconnected by a First In First Out (FIFO) memory. The architecture code may also define stride-efficient memories for implementing convolution. The algorithmic code may include configuration instructions for running the DNN's layers at the processor cores. The algorithmic code may also include a schedule for executing the configuration instructions on the processor cores, for moving network parameters to the processor cores, and for transferring outputs between the layers.

Type: Grant

Filed: February 7, 2019

Date of Patent: June 1, 2021

Assignee: The MathWorks, Inc.

Inventors: Yongfeng Gu, Girish Venkataramani, Wang Chen, Bharathi Yogaraj, Yuteng Zhou, Vibha Patil, Anusha Vasantala, Purshottam Vishwakarma

Systems and methods for generating code for parallel processing units

Patent number: 10949182

Abstract: Systems and methods generate code from a source program where the generated code may be compiled and executed on a Graphics Processing Unit (GPU). A parallel loop analysis check may be performed on regions of the source program identified for parallelization. One or more optimizations also may be applied to the source program that convert mathematical operations into a parallel form. The source program may be partitioned into segments for execution on a host and a device. Kernels may be created for the segments to be executed on the device. The size of the kernels may be determined, and memory transfers between the host and device may be optimized.

Type: Grant

Filed: November 17, 2017

Date of Patent: March 16, 2021

Assignee: The MathWorks, Inc.

Inventors: Girish Venkataramani, Rama P. Kokku, Jayaprabha Shankar, James L. Brock, Chun-Yu Shei, Vijaya Raghavan

SYSTEMS AND METHODS FOR CONFIGURING PROGRAMMABLE LOGIC DEVICES FOR DEEP LEARNING NETWORKS

Publication number: 20200151088

Abstract: Systems and methods may configure a programmable logic device to efficiently run a deep learning (DL) network. Architecture code and algorithmic code may be generated. The architecture code may define convolutional and fully connected processor cores structured to run the layers of a Deep Neural Network (DNN). The processor cores may be interconnected by a First In First Out (FIFO) memory. The architecture code may also define stride-efficient memories for implementing convolution. The algorithmic code may include configuration instructions for running the DNN's layers at the processor cores. The algorithmic code may also include a schedule for executing the configuration instructions on the processor cores, for moving network parameters to the processor cores, and for transferring outputs between the layers.

Type: Application

Filed: February 7, 2019

Publication date: May 14, 2020

Inventors: Yongfeng Gu, Girish Venkataramani, Wang Chen, Bharathi Yogaraj, Yuteng Zhou, Vibha Patil, Anusha Vasantala, Purshottam Vishwakarma

Systems and methods for sharing resources having different data types

Patent number: 10423733

Abstract: A system and method generates optimized code for a source model. The system may include a resource sharing optimizer that evaluates the source model and replaces multiple model elements of the source model that are functionally equivalent with a single shared model element. The model elements replaced with the single shared model element may have different fixed point data types. The resource sharing optimizer may convert some of the fixed point data types to a common fixed point data type.
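
One way to pick a common fixed-point type for elements that are to share a single resource is to take the widest integer part and the widest fraction part among them, so that every original value is still representable. The sketch below assumes that rule; the `FixedPoint` class and `common_type` function are hypothetical names for this example, and the abstract does not state how the common type is actually chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedPoint:
    signed: bool
    word_length: int   # total bits, including the sign bit if signed
    frac_length: int   # bits to the right of the binary point

    @property
    def int_length(self):
        # Bits to the left of the binary point, excluding the sign bit.
        return self.word_length - self.frac_length - (1 if self.signed else 0)


def common_type(a, b):
    """Smallest type that represents every value of both a and b exactly."""
    signed = a.signed or b.signed
    int_length = max(a.int_length, b.int_length)
    frac_length = max(a.frac_length, b.frac_length)
    word_length = int_length + frac_length + (1 if signed else 0)
    return FixedPoint(signed, word_length, frac_length)


# A signed 16-bit type with 8 fraction bits and an unsigned 12-bit type
# with 10 fraction bits can share a signed 18-bit type with 10 fraction bits.
shared = common_type(FixedPoint(True, 16, 8), FixedPoint(False, 12, 10))
print(shared)
```

The shared element is wider than either original, so this choice trades some extra bits in one block for the removal of the duplicate blocks.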

Type: Grant

Filed: April 14, 2016

Date of Patent: September 24, 2019

Assignee: The MathWorks, Inc.

Inventors: Girish Venkataramani, Yongfeng Gu, Rama Kokku, Sanmukh Rao Kuppannagari

Streaming on hardware-software platforms in model based designs

Patent number: 10387584

Abstract: A method may include receiving functional model information regarding a set of functional blocks associated with a functional model. The functional model may include a streaming algorithm for exchanging streaming data. The method may include receiving architectural model information regarding physical devices included in a target device from a hardware-software co-design platform. The physical devices may include a software based processing device and a hardware based processing device. The method may include mapping the functional blocks to the physical devices to allow the streaming data to be communicated between the software based processing device and the hardware based processing device. The method may include generating a streaming interface to model communication of the streaming data between the software based processing device and the hardware based processing device.

Type: Grant

Filed: December 3, 2014

Date of Patent: August 20, 2019

Assignee: The MathWorks, Inc.

Inventors: Katalin Maria Popovici, Rajiv Ghosh-Roy, Senthilkumar Manickavasagam, Wang Chen, Girish Venkataramani, Wei Zang, Abhijeet H. Gadkari, Matthew H. Fornero

Resource sharing workflows within executable graphical models

Patent number: 10248390

Abstract: A system and method optimizes hardware description generated from a graphical program or model automatically. The system may include a streaming optimizer, a resource sharing optimizer and a delay balancing engine. The streaming optimizer transforms one or more vector data paths in the source model to scalar data paths or to smaller-sized vector data paths. The resource sharing optimizer may replace multiple blocks of the source model that are functionally equivalent with a single shared block. The streaming and resource sharing optimizers may also configure portions of the modified model to execute at a faster rate. The delay balancing engine may examine the modified model to determine whether any delays or latencies have been introduced. If so, the delay balancing engine may insert one or more blocks into the modified model to correct for any data path misalignment caused by the introduction of the delays or latencies.
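
The idea behind delay balancing can be shown with a small sketch: when several data paths meet at the same block, each path is padded with enough extra delay to match the slowest one. The function name and the dictionary representation of path latencies are assumptions made for illustration, not details taken from the patent.

```python
def balance_delays(path_latencies):
    """Return the extra delay (in cycles) to insert on each path.

    path_latencies maps a path name to the latency, in cycles, that
    optimizations have introduced on that path. After padding, every
    path arrives with the latency of the slowest one.
    """
    if not path_latencies:
        return {}
    target = max(path_latencies.values())
    return {name: target - latency for name, latency in path_latencies.items()}


# Path "a" gained 3 cycles, "c" gained 1, and "b" is unchanged, so "b"
# and "c" need 3 and 2 cycles of matching delay respectively.
print(balance_delays({"a": 3, "b": 0, "c": 1}))
```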

Type: Grant

Filed: January 12, 2016

Date of Patent: April 2, 2019

Assignee: The MathWorks, Inc.

Inventors: Girish Venkataramani, Kiran Kintali

Systems and methods for automatically generating code for deep learning systems

Patent number: 10157045

Abstract: Systems and methods may automatically generate code for deep learning networks. The systems and methods may provide a code generation framework for generating target specific code. The code generation framework may include one or more predefined class hierarchies for constructing objects of the generated code. The objects of the class hierarchies may provide an interface to predefined libraries of deep learning functions optimized for use on a target platform. The systems and methods may perform one or more optimizations on the code being generated.

Type: Grant

Filed: November 17, 2017

Date of Patent: December 18, 2018

Assignee: The MathWorks, Inc.

Inventors: Girish Venkataramani, Rama P. Kokku, Jayaprabha Shankar, James L. Brock, Chun-Yu Shei, Vijaya Raghavan, Yaohung Tsai

Systems and methods for mapping executable models to programmable logic device resources

Patent number: 10114917

Abstract: Systems and methods automatically generate code from an executable model. The code may be generated from one or more in-memory representations constructed for the model. The in-memory representations may be analyzed, and portions that can be mapped to DSP slices of a programmable logic device may be identified. The portions may be modified based on information for a particular programmable logic device, such as the structure of the device's DSP slices. The modifications may ensure that elements of the generated code get mapped to DSP slices, when the generated code is used to synthesize the programmable logic device.

Type: Grant

Filed: August 1, 2016

Date of Patent: October 30, 2018

Assignee: The MathWorks, Inc.

Inventors: Girish Venkataramani, Purshottam Vishwakarma, Rama Kokku

Systems and methods for estimating performance characteristics of hardware implementations of executable models

Patent number: 10078717

Abstract: Systems and methods automatically generate optimized hardware description language code for a model created in a modeling environment. A training tool selects and provides scripts to a hardware synthesis tool chain that direct the tool chain to synthesize hardware components for core components of the modeling environment. A report generated by the tool chain is evaluated to extract performance data for the core components, and the performance data is stored in a library. An optimization tool estimates the performance of the model using the performance data in the library. Based on the performance estimate and an analysis of the model, the optimization tool selects an optimization technique, which it applies to the model, generating a revised model. Estimating performance, and selecting and applying optimizations may be repeated until a performance constraint is satisfied or a termination criterion is met.

Type: Grant

Filed: December 5, 2014

Date of Patent: September 18, 2018

Assignee: The MathWorks, Inc.

Inventors: Girish Venkataramani, Yongfeng Gu, Rama Kokku

SYSTEMS AND METHODS FOR GENERATING CODE FOR PARALLEL PROCESSING UNITS

Publication number: 20180157471

Abstract: Systems and methods generate code from a source program where the generated code may be compiled and executed on a Graphics Processing Unit (GPU). A parallel loop analysis check may be performed on regions of the source program identified for parallelization. One or more optimizations also may be applied to the source program that convert mathematical operations into a parallel form. The source program may be partitioned into segments for execution on a host and a device. Kernels may be created for the segments to be executed on the device. The size of the kernels may be determined, and memory transfers between the host and device may be optimized.

Type: Application

Filed: November 17, 2017

Publication date: June 7, 2018

Inventors: Girish Venkataramani, Rama P. Kokku, Jayaprabha Shankar, James L. Brock, Chun-Yu Shei, Vijaya Raghavan

SYSTEMS AND METHODS FOR AUTOMATICALLY GENERATING CODE FOR DEEP LEARNING SYSTEMS

Publication number: 20180136912

Abstract: Systems and methods may automatically generate code for deep learning networks. The systems and methods may provide a code generation framework for generating target specific code. The code generation framework may include one or more predefined class hierarchies for constructing objects of the generated code. The objects of the class hierarchies may provide an interface to predefined libraries of deep learning functions optimized for use on a target platform. The systems and methods may perform one or more optimizations on the code being generated.

Type: Application

Filed: November 17, 2017

Publication date: May 17, 2018

Inventors: Girish Venkataramani, Rama P. Kokku, Jayaprabha Shankar, James L. Brock, Chun-Yu Shei, Vijaya Raghavan, Yaohung Tsai

Utilizing clock rate pipelining to generate code for multi-rate systems

Patent number: 9846571

Abstract: A device generates a model associated with a multi-rate system. The multi-rate system includes a system associated with a clock rate and a sample rate, and the clock rate is greater than the sample rate. The device identifies the clock rate of the multi-rate system based on the model, and identifies a portion, of the model, associated with the sample rate. The device applies clock rate pipelining to adjust the sample rate associated with the portion of the model so that the sample rate substantially equals the clock rate, and generates code associated with the model and the applied clock rate pipelining.

Type: Grant

Filed: January 14, 2015

Date of Patent: December 19, 2017

Assignee: The MathWorks, Inc.

Inventors: Girish Venkataramani, Yongfeng Gu, Wang Chen

Systems and methods for generating optimized hardware descriptions for models

Patent number: 9817931

Abstract: Systems and methods automatically generate optimized hardware description language (HDL) code for an executable model. An intermediate representation is generated for the executable model, which includes model elements. The intermediate representation includes nodes corresponding to the model elements. The HDL code is generated from the intermediate representation. A synthesis tool chain performs hardware synthesis using the HDL code. The synthesis tool chain generates performance characteristics of hardware components defined by the synthesis tool chain. The performance characteristics are mapped to the nodes of the intermediate representation, and one or more performance bottlenecks are identified. At least one optimization technique is applied to the intermediate representation to produce a revised intermediate representation, which is then used to generate new HDL code. The process may be repeated until the performance bottlenecks are eliminated or a termination criterion is met.

Type: Grant

Filed: December 5, 2014

Date of Patent: November 14, 2017

Assignee: The MathWorks, Inc.

Inventors: Yongfeng Gu, Girish Venkataramani, Rama Kokku

Model-based retiming with functional equivalence constraints

Patent number: 9779195

Abstract: A system and method tests for functional equivalence prior to automatically retiming a high-level specification. An Intermediate Representation (IR) includes one or more graphs or trees based on the high-level specification. A functional equivalence (FE) analyzer determines whether one or more components in the graph meet certain value and state conditions and thus are candidates for retiming. A scheduler can use components that fail FE as a retiming boundary.

Type: Grant

Filed: March 6, 2015

Date of Patent: October 3, 2017

Assignee: The MathWorks, Inc.

Inventors: Yongfeng Gu, Girish Venkataramani

High throughput synchronous resource-constrained scheduling for model-based design

Patent number: 9740529

Abstract: A system and method for optimizing a system design that includes two or more components, where at least one component is to be implemented using a constrained resource. From an initial schedule, the resource having a longest span time between a start busy time slot and an end busy time slot is identified. The schedule for the other resources is then also extended to the span time. The resulting design can be made synchronous by inserting up-sampler and down-sampler function blocks before and after any strongly connected components.

Type: Grant

Filed: December 4, 2014

Date of Patent: August 22, 2017

Assignee: The MathWorks, Inc.

Inventors: Chun-Yu Shei, Girish Venkataramani

Systems and methods for hardware resource sharing

Patent number: 9710237

Abstract: A system and method optimizes hardware description generated from a graphical program or model having oversampling constraints automatically. The system may include a streaming optimizer, a resource sharing optimizer, a delay balancing engine, and a global scheduler. The streaming optimizer may transform vector data paths to scalar or smaller-sized vector data paths. The resource sharing optimizer may replace multiple, functionally equivalent blocks with a single shared block. The delay balancing engine may insert one or more elements to correct for data path misalignment. The global scheduler may place portions of the program or model into conditional execution sections and create control logic that controls the model sample times or steps at which the portions are enabled. A validation model, a report, or hardware description code that utilizes fewer hardware resources may be generated from a modified version of the model that is created.

Type: Grant

Filed: June 27, 2016

Date of Patent: July 18, 2017

Assignee: The MathWorks, Inc.

Inventor: Girish Venkataramani

Systems and methods for hardware resource sharing

Patent number: 9658835

Abstract: A system and method optimizes hardware description generated from a graphical program or model having oversampling constraints automatically. The system may include a streaming optimizer, a resource sharing optimizer, a delay balancing engine, and a global scheduler. The streaming optimizer may transform vector data paths to scalar or smaller-sized vector data paths. The resource sharing optimizer may replace multiple, functionally equivalent blocks with a single shared block. The delay balancing engine may insert one or more elements to correct for data path misalignment. The global scheduler may place portions of the program or model into conditional execution sections and create control logic that controls the model sample times or steps at which the portions are enabled. A validation model, a report, or hardware description code that utilizes fewer hardware resources may be generated from a modified version of the model that is created.

Type: Grant

Filed: June 27, 2016

Date of Patent: May 23, 2017

Assignee: The MathWorks, Inc.

Inventor: Girish Venkataramani

Systems and methods for optimizing executable models for hardware synthesis

Patent number: 9454627

Abstract: Systems and methods optimize hardware description generated from a graphical model automatically. The system may include an optimizer. The optimizer may add a serializer component and a deserializer component to the model. The serializer component may receive parallel data and may produce serial data. The serializer may introduce one or more idle cycles into the serial data being produced. The deserializer component may receive serial data and may produce parallel data. The serializer and deserializer components may receive and generate control signals. The control signals may include a valid signal for indicating valid data elements of the serial and parallel data, and a start signal for indicating the beginning of a new frame or cycle when constructing parallel data from serial data.
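
The roles of the valid and start signals can be illustrated with a small software model of a serializer and deserializer pair. This is only a behavioral sketch: the function names, the tuple layout `(data, valid, start)`, and the choice to place idle cycles after each frame are assumptions made for the example, not details from the patent.

```python
def serialize(frame, idle_cycles=0):
    """Emit one (data, valid, start) tuple per cycle for a parallel frame.

    start is asserted on the first element of the frame; idle cycles
    carry no data and have valid de-asserted.
    """
    for index, value in enumerate(frame):
        yield value, True, index == 0
    for _ in range(idle_cycles):
        yield None, False, False


def deserialize(stream):
    """Rebuild parallel frames from a serial stream of (data, valid, start)."""
    frames = []
    current = None
    for data, valid, start in stream:
        if not valid:
            continue              # idle cycle: nothing to collect
        if start:
            if current is not None:
                frames.append(current)
            current = []          # start marks the beginning of a new frame
        if current is not None:
            current.append(data)
    if current is not None:
        frames.append(current)
    return frames


stream = list(serialize([1, 2, 3], idle_cycles=2)) + list(serialize([4, 5, 6], idle_cycles=2))
print(deserialize(stream))
```

Because the deserializer relies only on the two control signals, the number of idle cycles can change without affecting the reconstructed frames.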

Type: Grant

Filed: March 6, 2015

Date of Patent: September 27, 2016

Assignee: The MathWorks, Inc.

Inventors: Girish Venkataramani, Kiran K. Kintali, Wei Zang, Wang Chen

Systems and methods for hardware resource sharing

Patent number: 9436441

Abstract: A system and method optimizes hardware description generated from a graphical program or model having oversampling constraints automatically. The system may include a streaming optimizer, a resource sharing optimizer, a delay balancing engine, and a global scheduler. The streaming optimizer may transform vector data paths to scalar or smaller-sized vector data paths. The resource sharing optimizer may replace multiple, functionally equivalent blocks with a single shared block. The delay balancing engine may insert one or more elements to correct for data path misalignment. The global scheduler may place portions of the program or model into conditional execution sections and create control logic that controls the model sample times or steps at which the portions are enabled. A validation model, a report, or hardware description code that utilizes fewer hardware resources may be generated from a modified version of the model that is created.

Type: Grant

Filed: December 5, 2013

Date of Patent: September 6, 2016

Assignee: The MathWorks, Inc.

Inventor: Girish Venkataramani
