Patents by Inventor Brucek Khailany

Brucek Khailany has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11769040
    Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.
    Type: Grant
    Filed: July 19, 2019
    Date of Patent: September 26, 2023
    Assignee: NVIDIA Corp.
    Inventors: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R. Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
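As a rough illustration of the multiply-accumulate dataflow described in the abstract of patent 11769040, here is a minimal NumPy sketch of a single processing element combining a resident weight buffer with incoming activations; the class name, shapes, and buffer layout are illustrative assumptions, not details from the patent.

```python
import numpy as np

class ProcessingElementSketch:
    """Toy model of one tile: a local weight buffer feeding parallel MAC units."""

    def __init__(self, weights: np.ndarray):
        self.weight_buffer = weights  # weights stay resident in the tile

    def multiply_accumulate(self, activations: np.ndarray, partial_sum: np.ndarray) -> np.ndarray:
        # NumPy's vectorized matmul stands in for the tile's parallel MAC units.
        return partial_sum + self.weight_buffer @ activations

# Two tiles each hold a slice of the weights and accumulate into a shared partial sum.
w0, w1 = np.random.rand(4, 8), np.random.rand(4, 8)
x0, x1 = np.random.rand(8), np.random.rand(8)
acc = np.zeros(4)
acc = ProcessingElementSketch(w0).multiply_accumulate(x0, acc)
acc = ProcessingElementSketch(w1).multiply_accumulate(x1, acc)
```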
  • Patent number: 11645533
    Abstract: IR drop predictions are obtained using a maximum convolutional neural network. A circuit structure is partitioned into a grid. For cells of the circuit structure in sub-intervals of a clock period, power consumption of the cell is amortized into a set of grid tiles that include portions of the cell, thus forming a set of power maps. The power maps are applied to a neural network to generate IR drop predictions for the circuit structure.
    Type: Grant
    Filed: March 17, 2020
    Date of Patent: May 9, 2023
    Assignee: NVIDIA Corp.
    Inventors: Zhiyao Xie, Haoxing Ren, Brucek Khailany, Sheng Ye
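To make the power-map construction step in patent 11645533 concrete, the following is a simplified sketch that amortizes each cell's per-sub-interval power over the grid tiles its bounding box covers; the cell record format, grid size, and any downstream CNN are assumptions for illustration only.

```python
import numpy as np

GRID = 32  # grid resolution (illustrative)

def build_power_maps(cells, num_subintervals):
    """Amortize each cell's power into the grid tiles that include portions of the cell,
    producing one power map per sub-interval of the clock period."""
    maps = np.zeros((num_subintervals, GRID, GRID))
    for cell in cells:
        x0, y0, x1, y1 = cell["tile_bbox"]            # tile coordinates covered by the cell
        tiles = (x1 - x0 + 1) * (y1 - y0 + 1)
        for t, power in enumerate(cell["power_per_subinterval"]):
            maps[t, y0:y1 + 1, x0:x1 + 1] += power / tiles   # amortized share per tile
    return maps  # these maps would then be applied to a neural network predicting IR drop

cells = [{"tile_bbox": (2, 3, 4, 5), "power_per_subinterval": [0.8, 0.2]},
         {"tile_bbox": (10, 10, 10, 11), "power_per_subinterval": [0.1, 0.5]}]
power_maps = build_power_maps(cells, num_subintervals=2)
```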
  • Publication number: 20230068941
    Abstract: One embodiment of a computer-implemented method for processing a neural network comprises receiving a first quantized matrix that corresponds to a portion of a multi-dimensional input tensor and has been quantized based on a first scale factor; and performing one or more computational operations using the first quantized matrix and the first scale factor to generate one or more data values that correspond to a first portion of a multi-dimensional output tensor.
    Type: Application
    Filed: February 11, 2022
    Publication date: March 2, 2023
    Inventors: Thierry Tambe, Steve Dai, Brucek Khailany, Rangharajan Venkatesan
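A hedged sketch of the quantize-then-compute pattern in publication 20230068941: a tile of the input tensor is quantized with a per-tile scale factor, an integer matrix multiply is performed, and the scale factors are reapplied to produce output values. The int8 format and function names are assumptions for illustration.

```python
import numpy as np

def quantize_tile(tile: np.ndarray):
    """Quantize one tile of a tensor to int8, returning the tile and its scale factor."""
    scale = float(np.max(np.abs(tile))) / 127.0
    scale = scale if scale > 0 else 1.0
    q = np.clip(np.round(tile / scale), -127, 127).astype(np.int8)
    return q, scale

def quantized_matmul(qa, scale_a, qb, scale_b):
    # Integer multiply-accumulate in int32, then rescale using both scale factors.
    acc = qa.astype(np.int32) @ qb.astype(np.int32)
    return acc.astype(np.float32) * (scale_a * scale_b)

a = np.random.randn(16, 16).astype(np.float32)
b = np.random.randn(16, 16).astype(np.float32)
qa, sa = quantize_tile(a)
qb, sb = quantize_tile(b)
approx = quantized_matmul(qa, sa, qb, sb)   # approximates a @ b up to quantization error
```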
  • Publication number: 20220076110
    Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
    Type: Application
    Filed: November 19, 2021
    Publication date: March 10, 2022
    Applicant: NVIDIA Corp.
    Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
  • Patent number: 11270197
    Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
    Type: Grant
    Filed: November 4, 2019
    Date of Patent: March 8, 2022
    Assignee: NVIDIA Corp.
    Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
  • Publication number: 20220067513
    Abstract: Solutions for improving the efficiency of Softmax computation for deep learning inference in transformers and other neural networks. The solutions use a reduced-precision implementation of various operations in Softmax, replacing e^x with 2^x to reduce the instruction overhead associated with computing e^x, and replacing the floating-point max computation with an integer max computation. Further described is a scalable implementation that decomposes Softmax into UnNormalized Softmax and Normalization operations.
    Type: Application
    Filed: December 4, 2020
    Publication date: March 3, 2022
    Applicant: NVIDIA Corp.
    Inventors: Jacob Robert Stevens, Rangharajan Venkatesan, Steve Haihang Dai, Brucek Khailany
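A small sketch of the decomposition described in publication 20220067513, with the base-2 substitution for e^x; the two stage names mirror the abstract, while the NumPy implementation and the pre-scaling note are illustrative assumptions.

```python
import numpy as np

def unnormalized_softmax(x):
    """First stage: compute 2**(x - max(x)) instead of e**(x - max(x)).
    2**x is cheaper to compute than e**x, and in a reduced-precision design
    the max can be taken over integer representations of the inputs."""
    m = np.max(x)              # stand-in for the integer max computation
    return np.exp2(x - m)

def normalization(u):
    """Second stage: divide the unnormalized values by their sum."""
    return u / np.sum(u)

def softmax_base2(x):
    return normalization(unnormalized_softmax(x))

x = np.array([1.0, 2.0, 3.0], dtype=np.float32)
reference = np.exp(x - x.max()) / np.exp(x - x.max()).sum()
# Base-2 Softmax matches the base-e result if the inputs are pre-scaled by log2(e).
matched = softmax_base2(x * np.log2(np.e))   # equals `reference` up to rounding
```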
  • Publication number: 20210158155
    Abstract: A graph neural network for average power estimation of netlists is trained with register toggle rates over a power window from an RTL simulation and gate-level netlists as input features. Combinational gate toggle rates are applied as labels. The trained graph neural network is then applied to infer combinational gate toggle rates over a different power window of interest and/or a different netlist.
    Type: Application
    Filed: August 13, 2020
    Publication date: May 27, 2021
    Applicant: NVIDIA Corp.
    Inventors: Yanqing Zhang, Haoxing Ren, Brucek Khailany
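Not part of the patent text, but for context: the combinational toggle rates such a network infers feed the standard dynamic-power estimate, P = 1/2 * C * Vdd^2 * f * toggle_rate summed over nets, sketched below with illustrative values.

```python
VDD = 0.8      # supply voltage in volts (illustrative)
FREQ = 1.0e9   # clock frequency in Hz (illustrative)

def average_dynamic_power(nets):
    """Sum 0.5 * C * Vdd**2 * f * toggle_rate over all nets.
    `nets` maps a net name to (capacitance in farads, toggles per cycle)."""
    return sum(0.5 * c * VDD**2 * FREQ * tr for c, tr in nets.values())

nets = {"n1": (2e-15, 0.10),   # capacitance, toggle rate
        "n2": (5e-15, 0.02)}
print(average_dynamic_power(nets))   # average switching power in watts
```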
  • Publication number: 20200327417
    Abstract: IR drop predictions are obtained using a maximum convolutional neural network. A circuit structure is partitioned into a grid. For cells of the circuit structure in sub-intervals of a clock period, power consumption of the cell is amortized into a set of grid tiles that include portions of the cell, thus forming a set of power maps. The power maps are applied to a neural network to generate IR drop predictions for the circuit structure.
    Type: Application
    Filed: March 17, 2020
    Publication date: October 15, 2020
    Applicant: NVIDIA Corp.
    Inventors: Zhiyao Xie, Haoxing Ren, Brucek Khailany, Sheng Ye
  • Publication number: 20200293867
    Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
    Type: Application
    Filed: November 4, 2019
    Publication date: September 17, 2020
    Applicant: NVIDIA Corp.
    Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
  • Patent number: 10657306
    Abstract: Techniques to improve the accuracy and speed of detection and remediation of difficult-to-test nodes in a circuit design netlist. The techniques utilize improved netlist representations, test point insertion, and trained neural networks.
    Type: Grant
    Filed: July 24, 2019
    Date of Patent: May 19, 2020
    Assignee: NVIDIA Corp.
    Inventors: Yuzhe Ma, Haoxing Ren, Brucek Khailany, Harbinder Sikka, Lijuan Luo, Karthikeyan Natarajan
  • Publication number: 20200151288
    Abstract: Techniques to improve the accuracy and speed of detection and remediation of difficult-to-test nodes in a circuit design netlist. The techniques utilize improved netlist representations, test point insertion, and trained neural networks.
    Type: Application
    Filed: July 24, 2019
    Publication date: May 14, 2020
    Applicant: NVIDIA Corp.
    Inventors: Yuzhe Ma, Haoxing Ren, Brucek Khailany, Harbinder Sikka, Lijuan Luo, Karthikeyan Natarajan
  • Publication number: 20200082246
    Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.
    Type: Application
    Filed: July 19, 2019
    Publication date: March 12, 2020
    Applicant: NVIDIA Corp.
    Inventors: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R. Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
  • Patent number: 10489542
    Abstract: A neural network including an embedding layer to receive a gate function vector and an embedding width and alter a shape of the gate function vector by the embedding width, a concatenator to receive a gate feature input vector and concatenate the gate feature input vector with the gate function vector altered by the embedding width, a convolution layer to receive a window size, stride, and output feature size and generate an output convolution vector with a shape based on a length of the gate function vector, the window size of the convolution layer, and the output feature size of the convolution layer, and a fully connected layer to reduce the gate output convolution vector to a final path delay output.
    Type: Grant
    Filed: April 24, 2018
    Date of Patent: November 26, 2019
    Assignee: NVIDIA Corp.
    Inventors: Mark Ren, Brucek Khailany
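A hedged PyTorch sketch of the layer stack enumerated in patent 10489542 (embedding, concatenation, convolution, fully connected layer reducing to a path delay); all dimensions, hyperparameters, and the use of PyTorch are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class PathDelaySketch(nn.Module):
    """Embed gate-function IDs, concatenate with per-gate features, convolve
    along the path, and reduce to a single predicted path delay."""

    def __init__(self, num_gate_types=32, embed_width=8, feat_width=4,
                 window=3, stride=1, out_features=16, path_len=20):
        super().__init__()
        self.embed = nn.Embedding(num_gate_types, embed_width)
        self.conv = nn.Conv1d(embed_width + feat_width, out_features,
                              kernel_size=window, stride=stride)
        conv_len = (path_len - window) // stride + 1
        self.fc = nn.Linear(out_features * conv_len, 1)

    def forward(self, gate_ids, gate_feats):
        # gate_ids: (batch, path_len) int64; gate_feats: (batch, path_len, feat_width)
        x = torch.cat([self.embed(gate_ids), gate_feats], dim=-1)   # concatenate
        x = self.conv(x.transpose(1, 2))                            # (batch, out_features, conv_len)
        return self.fc(x.flatten(1))                                # final path delay output

model = PathDelaySketch()
delay = model(torch.randint(0, 32, (2, 20)), torch.randn(2, 20, 4))
```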
  • Publication number: 20190325092
    Abstract: A neural network including an embedding layer to receive a gate function vector and an embedding width and alter a shape of the gate function vector by the embedding width, a concatenator to receive a gate feature input vector and concatenate the gate feature input vector with the gate function vector altered by the embedding width, a convolution layer to receive a window size, stride, and output feature size and generate an output convolution vector with a shape based on a length of the gate function vector, the window size of the convolution layer, and the output feature size of the convolution layer, and a fully connected layer to reduce the gate output convolution vector to a final path delay output.
    Type: Application
    Filed: April 24, 2018
    Publication date: October 24, 2019
    Inventors: Mark Ren, Brucek Khailany
  • Patent number: 8694757
    Abstract: Tracing command execution in a data processing system having a host processor and a co-processor. The host processor maintains a record of a plurality of commands for the co-processor, storing each of the plurality of commands in a command queue. Hardware trace logic is provided to store one or more events based, at least in part, on transfer of the plurality of commands to a small memory. Software is executed to store the one or more events to a main memory, wherein the one or more events are aggregated into a single memory trace within the main memory.
    Type: Grant
    Filed: August 15, 2008
    Date of Patent: April 8, 2014
    Assignee: Calos Fund Limited Liability Company
    Inventors: Brucek Khailany, Mark Rygh, Jim Jian Lin, Udo Uebel
  • Patent number: 8412917
    Abstract: Disclosed are methods and systems for dynamically determining data-transfer paths. The data-transfer paths are dynamically determined in response to an instruction that facilitates data transfer among execution lanes in an integrated-circuit processing device operable to execute operations in parallel. In addition, embodiments include an integrated-circuit processing device operable to execute operations in parallel, including the capability of providing confirmation information to potential source lanes, the confirmation information indicating whether the potential source lanes may send data to requested destination lanes during a data-transfer interval.
    Type: Grant
    Filed: September 20, 2011
    Date of Patent: April 2, 2013
    Assignee: Calos Fund Limited Liability Company
    Inventors: Brucek Khailany, William James Dally, Ujval J. Kapasi, Jim Jian Lin
  • Patent number: 8122078
    Abstract: A method of operation within an integrated-circuit processing device having an enhanced combined-arithmetic capability. In response to an instruction indicating a combined arithmetic operation, the processor generates a dot-product of first and second operands, adds the dot-product to an accumulated value, and then outputs the sum of the accumulated value and the dot-product.
    Type: Grant
    Filed: October 9, 2007
    Date of Patent: February 21, 2012
    Assignee: Calos Fund, LLC
    Inventors: Brucek Khailany, William James Dally, Raghunath Rao, DeForest Tovey
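The combined operation in patent 8122078 amounts to a fused dot-product-and-accumulate; a tiny Python sketch of the semantics (not the hardware implementation) follows.

```python
def dot_product_accumulate(acc, a, b):
    """Semantics of the combined instruction: acc += dot(a, b) in one step."""
    return acc + sum(x * y for x, y in zip(a, b))

# Worked example: acc starts at 10, dot([1, 2, 3], [4, 5, 6]) = 32, result is 42.
print(dot_product_accumulate(10, [1, 2, 3], [4, 5, 6]))  # 42
```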
  • Publication number: 20120011349
    Abstract: Disclosed are methods and systems for dynamically determining data-transfer paths. The data-transfer paths are determined in response to an instruction that facilitates data transfer among execution lanes in an integrated-circuit processing device operable to execute operations in parallel.
    Type: Application
    Filed: September 20, 2011
    Publication date: January 12, 2012
    Applicant: Calos Fund Limited Liability Company
    Inventors: Brucek Khailany, William James Dally, Ujval J. Kapasi, Jim Jian Lin, Raghunath Rao, DeForest Tovey, Mark Rygh, Jung-Ho Ahn
  • Patent number: 8024553
    Abstract: A method of operation within an integrated-circuit processing device having a plurality of execution lanes. Upon receiving an instruction to exchange data between the execution lanes, respective requests from the execution lanes are examined to determine a set of the execution lanes that may send data to one or more others of the execution lanes during a first interval. Each execution lane within the set of the execution lanes is signaled to indicate that the execution lane may send data to the one or more others of the execution lanes.
    Type: Grant
    Filed: August 15, 2008
    Date of Patent: September 20, 2011
    Assignee: Calos Fund Limited Liability Company
    Inventors: Brucek Khailany, William James Dally, Ujval J. Kapasi, Jim Jian Lin
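One plausible reading of the per-interval arbitration in patent 8024553 is sketched below: each source lane requests a destination lane, and a simple pass grants a conflict-free set of senders for the interval. The greedy, fixed-priority policy is an assumption, not the patented mechanism.

```python
def grant_senders(requests):
    """requests maps source lane -> requested destination lane.
    Return the set of source lanes allowed to send this interval,
    ensuring no destination receives from more than one source."""
    granted, taken = set(), set()
    for src in sorted(requests):        # fixed priority order (illustrative)
        dst = requests[src]
        if dst not in taken:
            granted.add(src)
            taken.add(dst)
    return granted

# Lanes 0 and 2 both request lane 5; only one is granted this interval.
print(grant_senders({0: 5, 1: 3, 2: 5}))    # {0, 1}
```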
  • Publication number: 20100257329
    Abstract: An application programming interface is disclosed for loading and storing multidimensional arrays of data between a data parallel processing unit and an external memory. Physical addresses reference the external memory and define two-dimensional arrays of data storage locations corresponding to data records. The data parallel processing unit has multiple processing lanes to process, in parallel, data records residing in respective register files. The interface comprises an X-dimension function call parameter to define an X-dimension in the memory array corresponding to a record for one lane and a Y-dimension function call parameter to define a Y-dimension in the memory array corresponding to the record for one lane. The X-dimension and Y-dimension function call parameters cooperate to generate memory accesses corresponding to the records.
    Type: Application
    Filed: August 6, 2009
    Publication date: October 7, 2010
    Inventors: Brucek Khailany, Nuwan Jayasena, Brian Pharris, Timothy Southgate
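To make the X-dimension and Y-dimension function call parameters concrete, here is a hypothetical sketch of how one lane's two-dimensional record could expand into strided external-memory addresses; the function name, signature, and address math are invented for illustration.

```python
def record_addresses(base_addr, lane, x_dim, y_dim, row_pitch, elem_size=4):
    """Enumerate the external-memory addresses covered by one lane's 2-D record.
    Each lane's record starts x_dim * elem_size bytes after the previous lane's."""
    lane_base = base_addr + lane * x_dim * elem_size
    return [lane_base + y * row_pitch + x * elem_size
            for y in range(y_dim) for x in range(x_dim)]

# Lane 1's 3-wide by 2-tall record in an array whose rows are 64 bytes apart.
print(record_addresses(0x1000, lane=1, x_dim=3, y_dim=2, row_pitch=64))
```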