Patents by Inventor Brucek Khailany
Brucek Khailany has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11769040
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.
Type: Grant
Filed: July 19, 2019
Date of Patent: September 26, 2023
Assignee: NVIDIA CORP.
Inventors: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
-
Patent number: 11645533
Abstract: IR drop predictions are obtained using a maximum convolutional neural network. A circuit structure is partitioned into a grid. For cells of the circuit structure in sub-intervals of a clock period, power consumption of the cell is amortized into a set of grid tiles that include portions of the cell, thus forming a set of power maps. The power maps are applied to a neural network to generate IR drop predictions for the circuit structure.
Type: Grant
Filed: March 17, 2020
Date of Patent: May 9, 2023
Assignee: NVIDIA Corp.
Inventors: Zhiyao Xie, Haoxing Ren, Brucek Khailany, Sheng Ye
-
Publication number: 20230068941
Abstract: One embodiment of a computer-implemented method for processing a neural network comprises receiving a first quantized matrix that corresponds to a portion of a multi-dimensional input tensor and has been quantized based on a first scale factor; and performing one or more computational operations using the first quantized matrix and the first scale factor to generate one or more data values that correspond to a first portion of a multi-dimensional output tensor.
Type: Application
Filed: February 11, 2022
Publication date: March 2, 2023
Inventors: Thierry TAMBE, Steve DAI, Brucek KHAILANY, Rangharajan VENKATESAN
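As a rough illustration of the idea in this abstract, the sketch below quantizes one matrix with a single scale factor, multiplies it by an integer matrix, and applies the scale factor to the integer result. The function names, the round-to-nearest quantizer, and the integer second operand are assumptions made for the example; the abstract does not specify them, and this is not the claimed method.

```python
def quantize(matrix, scale):
    """Quantize a float matrix to integers using one scale factor (assumed scheme)."""
    return [[int(round(value / scale)) for value in row] for row in matrix]


def scaled_matmul(q_matrix, scale, int_weights):
    """Multiply a quantized matrix by an integer matrix, then rescale.

    The accumulation is done on integers; the scale factor is applied once
    per output value to map the result back to real-valued units.
    """
    inner = len(int_weights)
    cols = len(int_weights[0])
    out = []
    for row in q_matrix:
        if len(row) != inner:
            raise ValueError("inner dimensions do not match")
        acc_row = [
            sum(row[k] * int_weights[k][j] for k in range(inner))
            for j in range(cols)
        ]
        out.append([scale * acc for acc in acc_row])
    return out


# Hypothetical usage: one tile of an input tensor and an identity weight tile.
tile = [[0.5, 1.0], [1.5, 2.0]]
q_tile = quantize(tile, 0.5)
result = scaled_matmul(q_tile, 0.5, [[1, 0], [0, 1]])
```

Keeping the scale factor outside the integer accumulation is one common design choice, since it confines floating-point work to a single multiply per output value.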
-
Publication number: 20220076110
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
Type: Application
Filed: November 19, 2021
Publication date: March 10, 2022
Applicant: NVIDIA Corp.
Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
-
Patent number: 11270197
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
Type: Grant
Filed: November 4, 2019
Date of Patent: March 8, 2022
Assignee: NVIDIA Corp.
Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
-
Publication number: 20220067513
Abstract: Solutions improving efficiency of Softmax computation applied for efficient deep learning inference in transformers and other neural networks. The solutions utilize a reduced-precision implementation of various operations in Softmax, replacing e^x with 2^x to reduce instruction overhead associated with computing e^x, and replacing floating point max computation with integer max computation. Further described is a scalable implementation that decomposes Softmax into UnNormalized Softmax and Normalization operations.
Type: Application
Filed: December 4, 2020
Publication date: March 3, 2022
Applicant: NVIDIA Corp.
Inventors: Jacob Robert Stevens, Rangharajan Venkatesan, Steve Haihang Dai, Brucek Khailany
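The three ideas named in this abstract (a base-2 exponential, an integer max, and a split into unnormalized and normalization steps) can be sketched in a few lines. This is a minimal floating-point illustration, not the reduced-precision hardware implementation the application describes; the function names and the use of a rounded-up integer as the max are assumptions.

```python
import math


def unnormalized_softmax_base2(scores):
    """Return the 2**(x - m) terms and their sum, where m is an integer max."""
    # Assumed integer max: compare rounded-up integers instead of floats.
    m = max(math.ceil(s) for s in scores)
    terms = [2.0 ** (s - m) for s in scores]
    return terms, sum(terms)


def normalize(terms, total):
    """Divide each unnormalized term by the running total."""
    return [t / total for t in terms]


def softmax_base2(scores):
    """Base-2 softmax built from the two stages above."""
    terms, total = unnormalized_softmax_base2(scores)
    return normalize(terms, total)


probs = softmax_base2([1.0, 2.0, 3.0])  # terms 0.25, 0.5, 1.0 before normalizing
```

Because e^x equals 2^(x * log2(e)), multiplying the inputs by `math.log2(math.e)` beforehand makes this sketch agree with the standard base-e Softmax. Any offset subtracted from every score cancels in the normalization step, which is why an integer upper bound can stand in for the exact floating-point max.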
-
Publication number: 20210158155
Abstract: A graph neural network for average power estimation of netlists is trained with register toggle rates over a power window from an RTL simulation and gate level netlists as input features. Combinational gate toggle rates are applied as labels. The trained graph neural network is then applied to infer combinational gate toggle rates over a different power window of interest and/or different netlist.
Type: Application
Filed: August 13, 2020
Publication date: May 27, 2021
Applicant: NVIDIA Corp.
Inventors: Yanqing Zhang, Haoxing Ren, Brucek Khailany
-
Publication number: 20200327417
Abstract: IR drop predictions are obtained using a maximum convolutional neural network. A circuit structure is partitioned into a grid. For cells of the circuit structure in sub-intervals of a clock period, power consumption of the cell is amortized into a set of grid tiles that include portions of the cell, thus forming a set of power maps. The power maps are applied to a neural network to generate IR drop predictions for the circuit structure.
Type: Application
Filed: March 17, 2020
Publication date: October 15, 2020
Applicant: NVIDIA Corp.
Inventors: Zhiyao Xie, Haoxing Ren, Brucek Khailany, Sheng Ye
-
Publication number: 20200293867
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
Type: Application
Filed: November 4, 2019
Publication date: September 17, 2020
Applicant: NVIDIA Corp.
Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
-
Patent number: 10657306
Abstract: Techniques to improve the accuracy and speed for detection and remediation of difficult to test nodes in a circuit design netlist. The techniques utilize improved netlist representations, test point insertion, and trained neural networks.
Type: Grant
Filed: July 24, 2019
Date of Patent: May 19, 2020
Assignee: NVIDIA Corp.
Inventors: Yuzhe Ma, Haoxing Ren, Brucek Khailany, Harbinder Sikka, Lijuan Luo, Karthikeyan Natarajan
-
Publication number: 20200151288
Abstract: Techniques to improve the accuracy and speed for detection and remediation of difficult to test nodes in a circuit design netlist. The techniques utilize improved netlist representations, test point insertion, and trained neural networks.
Type: Application
Filed: July 24, 2019
Publication date: May 14, 2020
Applicant: NVIDIA Corp.
Inventors: Yuzhe Ma, Haoxing Ren, Brucek Khailany, Harbinder Sikka, Lijuan Luo, Karthikeyan Natarajan
-
Publication number: 20200082246
Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.
Type: Application
Filed: July 19, 2019
Publication date: March 12, 2020
Applicant: NVIDIA Corp.
Inventors: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R. Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
-
Patent number: 10489542
Abstract: A neural network including an embedding layer to receive a gate function vector and an embedding width and alter a shape of the gate function vector by the embedding width, a concatenator to receive a gate feature input vector and concatenate the gate feature input vector with the gate function vector altered by the embedding width, a convolution layer to receive a window size, stride, and output feature size and generate an output convolution vector with a shape based on a length of the gate function vector, the window size of the convolution layer, and the output feature size of the convolution layer, and a fully connected layer to reduce the gate output convolution vector to a final path delay output.
Type: Grant
Filed: April 24, 2018
Date of Patent: November 26, 2019
Assignee: NVIDIA Corp.
Inventors: Mark Ren, Brucek Khailany
-
Publication number: 20190325092
Abstract: A neural network including an embedding layer to receive a gate function vector and an embedding width and alter a shape of the gate function vector by the embedding width, a concatenator to receive a gate feature input vector and concatenate the gate feature input vector with the gate function vector altered by the embedding width, a convolution layer to receive a window size, stride, and output feature size and generate an output convolution vector with a shape based on a length of the gate function vector, the window size of the convolution layer, and the output feature size of the convolution layer, and a fully connected layer to reduce the gate output convolution vector to a final path delay output.
Type: Application
Filed: April 24, 2018
Publication date: October 24, 2019
Inventors: Mark Ren, Brucek Khailany
-
Patent number: 8694757
Abstract: Tracing command execution in a data processing system having a host processor and a co-processor. The host processor maintains a record of a plurality of commands for the co-processor, storing each of the plurality of commands in a command queue. Hardware trace logic is provided to store one or more events based, at least in part, on transfer of the plurality of commands to a small memory. Software is executed to store the one or more events to a main memory, wherein the one or more events are aggregated into a single memory trace within the main memory.
Type: Grant
Filed: August 15, 2008
Date of Patent: April 8, 2014
Assignee: Calos Fund Limited Liability Company
Inventors: Brucek Khailany, Mark Rygh, Jim Jian Lin, Udo Uebel
-
Patent number: 8412917
Abstract: Disclosed are methods and systems for dynamically determining data-transfer paths. The data-transfer paths are dynamically determined in response to an instruction that facilitates data transfer among execution lanes in an integrated-circuit processing device operable to execute operations in parallel. In addition, embodiments include an integrated-circuit processing device operable to execute operations in parallel, including the capability of providing confirmation information to potential source lanes, the confirmation information indicating whether the potential source lanes may send data to requested destination lanes during a data-transfer interval.
Type: Grant
Filed: September 20, 2011
Date of Patent: April 2, 2013
Assignee: Calos Fund Limited Liability Company
Inventors: Brucek Khailany, William James Dally, Ujval J. Kapasi, Jim Jian Lin
-
Patent number: 8122078
Abstract: A method of operation within an integrated-circuit processing device having an enhanced combined-arithmetic capability. In response to an instruction indicating a combined arithmetic operation, the processor generates a dot-product of first and second operands, adds the dot-product to an accumulated value, and then outputs the sum of the accumulated value and the dot-product.
Type: Grant
Filed: October 9, 2007
Date of Patent: February 21, 2012
Assignee: Calos Fund, LLC
Inventors: Brucek Khailany, William James Dally, Raghunath Rao, DeForest Tovey
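The combined arithmetic operation in this abstract can be modeled in software as a single function: take the dot product of two operand vectors, add it to an accumulated value, and return the sum. The function name and the list-based operands below are assumptions for illustration; the patent describes a hardware instruction, not this code.

```python
def dot_product_accumulate(accumulated, first, second):
    """Return accumulated + dot(first, second) as one combined operation."""
    if len(first) != len(second):
        raise ValueError("operands must have the same number of elements")
    dot = sum(a * b for a, b in zip(first, second))
    return accumulated + dot


# Hypothetical usage: 1*4 + 2*5 + 3*6 = 32, plus an accumulated value of 10.
total = dot_product_accumulate(10, [1, 2, 3], [4, 5, 6])  # 42
```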
-
Publication number: 20120011349
Abstract: Disclosed are methods and systems for dynamically determining data-transfer paths. The data-transfer paths are determined in response to an instruction that facilitates data transfer among execution lanes in an integrated-circuit processing device operable to execute operations in parallel.
Type: Application
Filed: September 20, 2011
Publication date: January 12, 2012
Applicant: Calos Fund Limited Liability Company
Inventors: Brucek Khailany, William James Dally, Ujval J. Kapasi, Jim Jian Lin, Raghunath Rao, DeForest Tovey, Mark Rygh, Jung-Ho Ahn
-
Patent number: 8024553
Abstract: A method of operation within an integrated-circuit processing device having a plurality of execution lanes. Upon receiving an instruction to exchange data between the execution lanes, respective requests from the execution lanes are examined to determine a set of the execution lanes that may send data to one or more others of the execution lanes during a first interval. Each execution lane within the set of the execution lanes is signaled to indicate that the execution lane may send data to the one or more others of the execution lanes.
Type: Grant
Filed: August 15, 2008
Date of Patent: September 20, 2011
Assignee: Calos Fund Limited Liability Company
Inventors: Brucek Khailany, William James Dally, Ujval J. Kapasi, Jim Jian Lin
-
Publication number: 20100257329
Abstract: An application programming interface is disclosed for loading and storing multidimensional arrays of data between a data parallel processing unit and an external memory. Physical addresses reference the external memory and define two-dimensional arrays of data storage locations corresponding to data records. The data parallel processing unit has multiple processing lanes to parallel process data records residing in respective register files. The interface comprises an X-dimension function call parameter to define an X-dimension in the memory array corresponding to a record for one lane and a Y-dimension function call parameter to define a Y-dimension in the memory array corresponding to the record for one lane. The X-dimension and Y-dimension function call parameters cooperate to generate memory accesses corresponding to the records.
Type: Application
Filed: August 6, 2009
Publication date: October 7, 2010
Inventors: Brucek Khailany, Nuwan Jayasena, Brian Pharris, Timothy Southgate