Patents by Inventor Girish Venkataramani

Girish Venkataramani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210174214
    Abstract: Systems and methods quantize an application having a trained Deep Neural Network (DNN) for deployment on target hardware. The application may be instrumented to observe data values generated during execution of the application. Statistics may be generated for the observed data values and presented in a visualization tool. The application may be quantized through a rules based approach. The quantization may be based on the statistics and on constraints imposed by resources available at the target hardware. The systems and methods may present the proposed data types resulting from the quantization and may create a quantized version of the application incorporating the proposed data types. The systems and methods may generate performance data to validate the quantized version of the application. Changes to the rules may be made and the quantization process repeated if the performance is not satisfactory.
    Type: Application
    Filed: December 1, 2020
    Publication date: June 10, 2021
    Inventors: Vaidehi Venkatesan, Jayaprabha Shankar, Shixin Zhuang, Girish Venkataramani, FNU Hanumantharayappa
  • Patent number: 11023360
    Abstract: Systems and methods may configure a programmable logic device to efficiently run a deep learning (DL) network. Architecture code and algorithmic code may be generated. The architecture code may define convolutional and fully connected processor cores structured to run the layers of a Deep Neural Network (DNN). The processor cores may be interconnected by a First In First Out (FIFO) memory. The architecture code may also define stride-efficient memories for implementing convolution. The algorithmic code may include configuration instructions for running the DNN's layers at the processor cores. The algorithmic code may also include a schedule for executing the configuration instructions on the processor cores, for moving network parameters to the processor cores, and for transferring outputs between the layers.
    Type: Grant
    Filed: February 7, 2019
    Date of Patent: June 1, 2021
    Assignee: The MathWorks, Inc.
    Inventors: Yongfeng Gu, Girish Venkataramani, Wang Chen, Bharathi Yogaraj, Yuteng Zhou, Vibha Patil, Anusha Vasantala, Purshottam Vishwakarma
  • Patent number: 10949182
    Abstract: Systems and methods generate code from a source program where the generated code may be compiled and executed on a Graphics Processing Unit (GPU). A parallel loop analysis check may be performed on regions of the source program identified for parallelization. One or more optimizations also may be applied to the source program that convert mathematical operations into a parallel form. The source program may be partitioned into segments for execution on a host and a device. Kernels may be created for the segments to be executed on the device. The size of the kernels may be determined, and memory transfers between the host and device may be optimized.
    Type: Grant
    Filed: November 17, 2017
    Date of Patent: March 16, 2021
    Assignee: The MathWorks, Inc.
    Inventors: Girish Venkataramani, Rama P. Kokku, Jayaprabha Shankar, James L. Brock, Chun-Yu Shei, Vijaya Raghavan
  • Publication number: 20200151088
    Abstract: Systems and methods may configure a programmable logic device to efficiently run a deep learning (DL) network. Architecture code and algorithmic code may be generated. The architecture code may define convolutional and fully connected processor cores structured to run the layers of a Deep Neural Network (DNN). The processor cores may be interconnected by a First In First Out (FIFO) memory. The architecture code may also define stride-efficient memories for implementing convolution. The algorithmic code may include configuration instructions for running the DNN's layers at the processor cores. The algorithmic code may also include a schedule for executing the configuration instructions on the processor cores, for moving network parameters to the processor cores, and for transferring outputs between the layers.
    Type: Application
    Filed: February 7, 2019
    Publication date: May 14, 2020
    Inventors: Yongfeng Gu, Girish Venkataramani, Wang Chen, Bharathi Yogaraj, Yuteng Zhou, Vibha Patil, Anusha Vasantala, Purshottam Vishwakarma
  • Patent number: 10423733
    Abstract: A system and method generates optimized code for a source model. The system may include a resource sharing optimizer that evaluates the source model and replaces multiple model elements of the source model that are functionally equivalent with a single shared model element. The model elements replaced with the single shared model element may have different fixed point data types. The resource sharing optimizer may convert some of the fixed point data types to a common fixed point data type.
    Type: Grant
    Filed: April 14, 2016
    Date of Patent: September 24, 2019
    Assignee: The MathWorks, Inc.
    Inventors: Girish Venkataramani, Yongfeng Gu, Rama Kokku, Sanmukh Rao Kuppannagari
  • Patent number: 10387584
    Abstract: A method may include receiving functional model information regarding a set of functional blocks associated with a functional model. The functional model may include a streaming algorithm for exchanging streaming data. The method may include receiving architectural model information regarding physical devices included in a target device from a hardware-software co-design platform. The physical devices may include a software based processing device and a hardware based processing device. The method may include mapping the functional blocks to the physical devices to allow the streaming data to be communicated between the software based processing device and the hardware based processing device. The method may include generating a streaming interface to model communication of the streaming data between the software based processing device and the hardware based processing device.
    Type: Grant
    Filed: December 3, 2014
    Date of Patent: August 20, 2019
    Assignee: The MathWorks, Inc.
    Inventors: Katalin Maria Popovici, Rajiv Ghosh-Roy, Senthilkumar Manickavasagam, Wang Chen, Girish Venkataramani, Wei Zang, Abhijeet H. Gadkari, Matthew H. Fornero
  • Patent number: 10248390
    Abstract: A system and method optimizes hardware description generated from a graphical program or model automatically. The system may include a streaming optimizer, a resource sharing optimizer and a delay balancing engine. The streaming optimizer transforms one or more vector data paths in the source model to scalar data paths or to smaller-sized vector data paths. The resource sharing optimizer may replace multiple blocks of the source model that are functionally equivalent with a single shared block. The streaming and resource sharing optimizers may also configure portions of the modified model to execute at a faster rate. The delay balancing engine may examine the modified model to determine whether any delays or latencies have been introduced. If so, the delay balancing engine may insert one or more blocks into the modified model to correct for any data path misalignment caused by the introduction of the delays or latencies.
    Type: Grant
    Filed: January 12, 2016
    Date of Patent: April 2, 2019
    Assignee: The MathWorks, Inc.
    Inventors: Girish Venkataramani, Kiran Kintali
  • Patent number: 10157045
    Abstract: Systems and methods may automatically generate code for deep learning networks. The systems and methods may provide a code generation framework for generating target specific code. The code generation framework may include one or more predefined class hierarchies for constructing objects of the generated code. The objects of the class hierarchies may provide an interface to predefined libraries of deep learning functions optimized for use on a target platform. The systems and methods may perform one or more optimizations on the code being generated.
    Type: Grant
    Filed: November 17, 2017
    Date of Patent: December 18, 2018
    Assignee: The MathWorks, Inc.
    Inventors: Girish Venkataramani, Rama P. Kokku, Jayaprabha Shankar, James L. Brock, Chun-Yu Shei, Vijaya Raghavan, Yaohung Tsai
  • Patent number: 10114917
    Abstract: Systems and methods automatically generate code from an executable model. The code may be generated from one or more in-memory representations constructed for the model. The in-memory representations may be analyzed, and portions that can be mapped to DSP slices of a programmable logic device may be identified. The portions may be modified based on information for a particular programmable logic device, such as the structure of the device's DSP slices. The modifications may ensure that elements of the generated code get mapped to DSP slices, when the generated code is used to synthesize the programmable logic device.
    Type: Grant
    Filed: August 1, 2016
    Date of Patent: October 30, 2018
    Assignee: The MathWorks, Inc.
    Inventors: Girish Venkataramani, Purshottam Vishwakarma, Rama Kokku
  • Patent number: 10078717
    Abstract: Systems and methods automatically generate optimized hardware description language code for a model created in a modeling environment. A training tool selects and provides scripts to a hardware synthesis tool chain that direct the tool chain to synthesize hardware components for core components of the modeling environment. A report generated by the tool chain is evaluated to extract performance data for the core components, and the performance data is stored in a library. An optimization tool estimates the performance of the model using the performance data in the library. Based on the performance estimate and an analysis of the model, the optimization tool selects an optimization technique which it applies to the model, generating a revised model. Estimating performance, and selecting and applying optimizations may be repeated until a performance constraint is satisfied or a termination criterion is met.
    Type: Grant
    Filed: December 5, 2014
    Date of Patent: September 18, 2018
    Assignee: The MathWorks, Inc.
    Inventors: Girish Venkataramani, Yongfeng Gu, Rama Kokku
  • Publication number: 20180157471
    Abstract: Systems and methods generate code from a source program where the generated code may be compiled and executed on a Graphics Processing Unit (GPU). A parallel loop analysis check may be performed on regions of the source program identified for parallelization. One or more optimizations also may be applied to the source program that convert mathematical operations into a parallel form. The source program may be partitioned into segments for execution on a host and a device. Kernels may be created for the segments to be executed on the device. The size of the kernels may be determined, and memory transfers between the host and device may be optimized.
    Type: Application
    Filed: November 17, 2017
    Publication date: June 7, 2018
    Inventors: Girish Venkataramani, Rama P. Kokku, Jayaprabha Shankar, James L. Brock, Chun-Yu Shei, Vijaya Raghavan
  • Publication number: 20180136912
    Abstract: Systems and methods may automatically generate code for deep learning networks. The systems and methods may provide a code generation framework for generating target specific code. The code generation framework may include one or more predefined class hierarchies for constructing objects of the generated code. The objects of the class hierarchies may provide an interface to predefined libraries of deep learning functions optimized for use on a target platform. The systems and methods may perform one or more optimizations on the code being generated.
    Type: Application
    Filed: November 17, 2017
    Publication date: May 17, 2018
    Inventors: Girish Venkataramani, Rama P. Kokku, Jayaprabha Shankar, James L. Brock, Chun-Yu Shei, Vijaya Raghavan, Yaohung Tsai
  • Patent number: 9846571
    Abstract: A device generates a model associated with a multi-rate system. The multi-rate system includes a system associated with a clock rate and a sample rate, and the clock rate is greater than the sample rate. The device identifies the clock rate of the multi-rate system based on the model, and identifies a portion, of the model, associated with the sample rate. The device applies clock rate pipelining to adjust the sample rate associated with the portion of the model so that the sample rate substantially equals the clock rate, and generates code associated with the model and the applied clock rate pipelining.
    Type: Grant
    Filed: January 14, 2015
    Date of Patent: December 19, 2017
    Assignee: The MathWorks, Inc.
    Inventors: Girish Venkataramani, Yongfeng Gu, Wang Chen
  • Patent number: 9817931
    Abstract: Systems and methods automatically generate optimized hardware description language (HDL) code for an executable model. An intermediate representation is generated for the executable model, which includes model elements. The intermediate representation includes nodes corresponding to the model elements. The HDL code is generated from the intermediate representation. A synthesis tool chain performs hardware synthesis using the HDL code. The synthesis tool chain generates performance characteristics of hardware components defined by the synthesis tool chain. The performance characteristics are mapped to the nodes of the intermediate representation, and one or more performance bottlenecks are identified. At least one optimization technique is applied to the intermediate representation to produce a revised intermediate representation, which is then used to generate new HDL code. The process may be repeated until the performance bottlenecks are eliminated or a termination criterion is met.
    Type: Grant
    Filed: December 5, 2014
    Date of Patent: November 14, 2017
    Assignee: The MathWorks, Inc.
    Inventors: Yongfeng Gu, Girish Venkataramani, Rama Kokku
  • Patent number: 9779195
    Abstract: A system and method tests for functional equivalence prior to automatically retiming a high-level specification. An Intermediate Representation (IR) includes one or more graphs or trees based on the high-level specification. A functional equivalence (FE) analyzer determines whether one or more components in the graph meet certain value and state conditions and thus are candidates for retiming. A scheduler can use components that fail FE as a retiming boundary.
    Type: Grant
    Filed: March 6, 2015
    Date of Patent: October 3, 2017
    Assignee: The MathWorks, Inc.
    Inventors: Yongfeng Gu, Girish Venkataramani
  • Patent number: 9740529
    Abstract: A system and method for optimizing a system design that includes two or more components, where at least one component is to be implemented using a constrained resource. From an initial schedule, the resource having a longest span time between a start busy time slot and an end busy time slot is identified. The schedule for the other resources is then also extended to the span time. The resulting design can be made synchronous by inserting up-sampler and down-sampler function blocks before and after any strongly connected components.
    Type: Grant
    Filed: December 4, 2014
    Date of Patent: August 22, 2017
    Assignee: The MathWorks, Inc.
    Inventors: Chun-Yu Shei, Girish Venkataramani
  • Patent number: 9710237
    Abstract: A system and method optimizes hardware description generated from a graphical program or model having oversampling constraints automatically. The system may include a streaming optimizer, a resource sharing optimizer, a delay balancing engine, and a global scheduler. The streaming optimizer may transform vector data paths to scalar or smaller-sized vector data paths. The resource sharing optimizer may replace multiple, functionally equivalent blocks with a single shared block. The delay balancing engine may insert one or more elements to correct for data path misalignment. The global scheduler may place portions of the program or model into conditional execution sections and create control logic that controls the model sample times or steps at which the portions are enabled. A validation model, a report, or hardware description code that utilizes fewer hardware resources may be generated from a modified version of the model that is created.
    Type: Grant
    Filed: June 27, 2016
    Date of Patent: July 18, 2017
    Assignee: The MathWorks, Inc.
    Inventor: Girish Venkataramani
  • Patent number: 9658835
    Abstract: A system and method optimizes hardware description generated from a graphical program or model having oversampling constraints automatically. The system may include a streaming optimizer, a resource sharing optimizer, a delay balancing engine, and a global scheduler. The streaming optimizer may transform vector data paths to scalar or smaller-sized vector data paths. The resource sharing optimizer may replace multiple, functionally equivalent blocks with a single shared block. The delay balancing engine may insert one or more elements to correct for data path misalignment. The global scheduler may place portions of the program or model into conditional execution sections and create control logic that controls the model sample times or steps at which the portions are enabled. A validation model, a report, or hardware description code that utilizes fewer hardware resources may be generated from a modified version of the model that is created.
    Type: Grant
    Filed: June 27, 2016
    Date of Patent: May 23, 2017
    Assignee: The MathWorks, Inc.
    Inventor: Girish Venkataramani
  • Patent number: 9454627
    Abstract: Systems and methods optimize hardware description generated from a graphical model automatically. The system may include an optimizer. The optimizer may add a serializer component and a deserializer component to the model. The serializer component may receive parallel data and may produce serial data. The serializer may introduce one or more idle cycles into the serial data being produced. The deserializer component may receive serial data and may produce parallel data. The serializer and deserializer components may receive and generate control signals. The control signals may include a valid signal for indicating valid data elements of the serial and parallel data, and a start signal for indicating the beginning of a new frame or cycle when constructing parallel data from serial data.
    Type: Grant
    Filed: March 6, 2015
    Date of Patent: September 27, 2016
    Assignee: The MathWorks, Inc.
    Inventors: Girish Venkataramani, Kiran K. Kintali, Wei Zang, Wang Chen
  • Patent number: 9436441
    Abstract: A system and method optimizes hardware description generated from a graphical program or model having oversampling constraints automatically. The system may include a streaming optimizer, a resource sharing optimizer, a delay balancing engine, and a global scheduler. The streaming optimizer may transform vector data paths to scalar or smaller-sized vector data paths. The resource sharing optimizer may replace multiple, functionally equivalent blocks with a single shared block. The delay balancing engine may insert one or more elements to correct for data path misalignment. The global scheduler may place portions of the program or model into conditional execution sections and create control logic that controls the model sample times or steps at which the portions are enabled. A validation model, a report, or hardware description code that utilizes fewer hardware resources may be generated from a modified version of the model that is created.
    Type: Grant
    Filed: December 5, 2013
    Date of Patent: September 6, 2016
    Assignee: The MathWorks, Inc.
    Inventor: Girish Venkataramani