Patents by Inventor Brucek Khailany

Brucek Khailany has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11769040
    Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.
    Type: Grant
    Filed: July 19, 2019
    Date of Patent: September 26, 2023
    Assignee: NVIDIA Corp.
    Inventors: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R. Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
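As a rough illustration of the multiply-accumulate dataflow described in the abstract of patent 11769040, here is a minimal NumPy sketch of a single processing element combining a resident weight buffer with incoming activations; the class name, shapes, and buffer layout are illustrative assumptions, not details from the patent.

```python
import numpy as np

class ProcessingElementSketch:
    """Toy model of one tile: a local weight buffer feeding parallel MAC units."""

    def __init__(self, weights: np.ndarray):
        self.weight_buffer = weights  # weights stay resident in the tile

    def multiply_accumulate(self, activations: np.ndarray, partial_sum: np.ndarray) -> np.ndarray:
        # NumPy's vectorized matmul stands in for the tile's parallel MAC units.
        return partial_sum + self.weight_buffer @ activations

# Two tiles each hold a slice of the weights and accumulate into a shared partial sum.
w0, w1 = np.random.rand(4, 8), np.random.rand(4, 8)
x0, x1 = np.random.rand(8), np.random.rand(8)
acc = np.zeros(4)
acc = ProcessingElementSketch(w0).multiply_accumulate(x0, acc)
acc = ProcessingElementSketch(w1).multiply_accumulate(x1, acc)
```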
  • Patent number: 11645533
    Abstract: IR drop predictions are obtained using a maximum convolutional neural network. A circuit structure is partitioned into a grid. For cells of the circuit structure in sub-intervals of a clock period, power consumption of the cell is amortized into a set of grid tiles that include portions of the cell, thus forming a set of power maps. The power maps are applied to a neural network to generate IR drop predictions for the circuit structure.
    Type: Grant
    Filed: March 17, 2020
    Date of Patent: May 9, 2023
    Assignee: NVIDIA Corp.
    Inventors: Zhiyao Xie, Haoxing Ren, Brucek Khailany, Sheng Ye
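To make the power-map construction step in patent 11645533 concrete, the following is a simplified sketch that amortizes each cell's per-sub-interval power over the grid tiles its bounding box covers; the cell record format, grid size, and any downstream CNN are assumptions for illustration only.

```python
import numpy as np

GRID = 32  # grid resolution (illustrative)

def build_power_maps(cells, num_subintervals):
    """Amortize each cell's power into the grid tiles that include portions of the cell,
    producing one power map per sub-interval of the clock period."""
    maps = np.zeros((num_subintervals, GRID, GRID))
    for cell in cells:
        x0, y0, x1, y1 = cell["tile_bbox"]            # tile coordinates covered by the cell
        tiles = (x1 - x0 + 1) * (y1 - y0 + 1)
        for t, power in enumerate(cell["power_per_subinterval"]):
            maps[t, y0:y1 + 1, x0:x1 + 1] += power / tiles   # amortized share per tile
    return maps  # these maps would then be applied to a neural network predicting IR drop

cells = [{"tile_bbox": (2, 3, 4, 5), "power_per_subinterval": [0.8, 0.2]},
         {"tile_bbox": (10, 10, 10, 11), "power_per_subinterval": [0.1, 0.5]}]
power_maps = build_power_maps(cells, num_subintervals=2)
```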
  • Publication number: 20230068941
    Abstract: One embodiment of a computer-implemented method for processing a neural network comprises receiving a first quantized matrix that corresponds to a portion of a multi-dimensional input tensor and has been quantized based on a first scale factor; and performing one or more computational operations using the first quantized matrix and the first scale factor to generate one or more data values that correspond to a first portion of a multi-dimensional output tensor.
    Type: Application
    Filed: February 11, 2022
    Publication date: March 2, 2023
    Inventors: Thierry Tambe, Steve Dai, Brucek Khailany, Rangharajan Venkatesan
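A hedged sketch of the quantize-then-compute pattern in publication 20230068941: a tile of the input tensor is quantized with a per-tile scale factor, an integer matrix multiply is performed, and the scale factors are reapplied to produce output values. The int8 format and function names are assumptions for illustration.

```python
import numpy as np

def quantize_tile(tile: np.ndarray):
    """Quantize one tile of a tensor to int8, returning the tile and its scale factor."""
    scale = float(np.max(np.abs(tile))) / 127.0
    scale = scale if scale > 0 else 1.0
    q = np.clip(np.round(tile / scale), -127, 127).astype(np.int8)
    return q, scale

def quantized_matmul(qa, scale_a, qb, scale_b):
    # Integer multiply-accumulate in int32, then rescale using both scale factors.
    acc = qa.astype(np.int32) @ qb.astype(np.int32)
    return acc.astype(np.float32) * (scale_a * scale_b)

a = np.random.randn(16, 16).astype(np.float32)
b = np.random.randn(16, 16).astype(np.float32)
qa, sa = quantize_tile(a)
qb, sb = quantize_tile(b)
approx = quantized_matmul(qa, sa, qb, sb)   # approximates a @ b up to quantization error
```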
  • Publication number: 20220076110
    Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
    Type: Application
    Filed: November 19, 2021
    Publication date: March 10, 2022
    Applicant: NVIDIA Corp.
    Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
  • Patent number: 11270197
    Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
    Type: Grant
    Filed: November 4, 2019
    Date of Patent: March 8, 2022
    Assignee: NVIDIA Corp.
    Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
  • Publication number: 20220067513
    Abstract: Solutions for improving the efficiency of Softmax computation for deep learning inference in transformers and other neural networks. The solutions use a reduced-precision implementation of various operations in Softmax, replacing e^x with 2^x to reduce the instruction overhead associated with computing e^x, and replacing the floating-point max computation with an integer max computation. Further described is a scalable implementation that decomposes Softmax into UnNormalized Softmax and Normalization operations.
    Type: Application
    Filed: December 4, 2020
    Publication date: March 3, 2022
    Applicant: NVIDIA Corp.
    Inventors: Jacob Robert Stevens, Rangharajan Venkatesan, Steve Haihang Dai, Brucek Khailany
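A small sketch of the decomposition described in publication 20220067513, with the base-2 substitution for e^x; the two stage names mirror the abstract, while the NumPy implementation and the pre-scaling note are illustrative assumptions.

```python
import numpy as np

def unnormalized_softmax(x):
    """First stage: compute 2**(x - max(x)) instead of e**(x - max(x)).
    2**x is cheaper to compute than e**x, and in a reduced-precision design
    the max can be taken over integer representations of the inputs."""
    m = np.max(x)              # stand-in for the integer max computation
    return np.exp2(x - m)

def normalization(u):
    """Second stage: divide the unnormalized values by their sum."""
    return u / np.sum(u)

def softmax_base2(x):
    return normalization(unnormalized_softmax(x))

x = np.array([1.0, 2.0, 3.0], dtype=np.float32)
reference = np.exp(x - x.max()) / np.exp(x - x.max()).sum()
# Base-2 Softmax matches the base-e result if the inputs are pre-scaled by log2(e).
matched = softmax_base2(x * np.log2(np.e))   # equals `reference` up to rounding
```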
  • Publication number: 20210158155
    Abstract: A graph neural network for average power estimation of netlists is trained with register toggle rates over a power window from an RTL simulation and gate-level netlists as input features. Combinational gate toggle rates are applied as labels. The trained graph neural network is then applied to infer combinational gate toggle rates over a different power window of interest and/or a different netlist.
    Type: Application
    Filed: August 13, 2020
    Publication date: May 27, 2021
    Applicant: NVIDIA Corp.
    Inventors: Yanqing Zhang, Haoxing Ren, Brucek Khailany
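Not part of the patent text, but for context: the combinational toggle rates such a network infers feed the standard dynamic-power estimate, P = 1/2 * C * Vdd^2 * f * toggle_rate summed over nets, sketched below with illustrative values.

```python
VDD = 0.8      # supply voltage in volts (illustrative)
FREQ = 1.0e9   # clock frequency in Hz (illustrative)

def average_dynamic_power(nets):
    """Sum 0.5 * C * Vdd**2 * f * toggle_rate over all nets.
    `nets` maps a net name to (capacitance in farads, toggles per cycle)."""
    return sum(0.5 * c * VDD**2 * FREQ * tr for c, tr in nets.values())

nets = {"n1": (2e-15, 0.10),   # capacitance, toggle rate
        "n2": (5e-15, 0.02)}
print(average_dynamic_power(nets))   # average switching power in watts
```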
  • Publication number: 20200327417
    Abstract: IR drop predictions are obtained using a maximum convolutional neural network. A circuit structure is partitioned into a grid. For cells of the circuit structure in sub-intervals of a clock period, power consumption of the cell is amortized into a set of grid tiles that include portions of the cell, thus forming a set of power maps. The power maps are applied to a neural network to generate IR drop predictions for the circuit structure.
    Type: Application
    Filed: March 17, 2020
    Publication date: October 15, 2020
    Applicant: NVIDIA Corp.
    Inventors: Zhiyao Xie, Haoxing Ren, Brucek Khailany, Sheng Ye
  • Publication number: 20200293867
    Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
    Type: Application
    Filed: November 4, 2019
    Publication date: September 17, 2020
    Applicant: NVIDIA Corp.
    Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
  • Patent number: 10657306
    Abstract: Techniques to improve the accuracy and speed of detection and remediation of difficult-to-test nodes in a circuit design netlist. The techniques utilize improved netlist representations, test point insertion, and trained neural networks.
    Type: Grant
    Filed: July 24, 2019
    Date of Patent: May 19, 2020
    Assignee: NVIDIA Corp.
    Inventors: Yuzhe Ma, Haoxing Ren, Brucek Khailany, Harbinder Sikka, Lijuan Luo, Karthikeyan Natarajan
  • Publication number: 20200151288
    Abstract: Techniques to improve the accuracy and speed of detection and remediation of difficult-to-test nodes in a circuit design netlist. The techniques utilize improved netlist representations, test point insertion, and trained neural networks.
    Type: Application
    Filed: July 24, 2019
    Publication date: May 14, 2020
    Applicant: NVIDIA Corp.
    Inventors: Yuzhe Ma, Haoxing Ren, Brucek Khailany, Harbinder Sikka, Lijuan Luo, Karthikeyan Natarajan
  • Publication number: 20200082246
    Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.
    Type: Application
    Filed: July 19, 2019
    Publication date: March 12, 2020
    Applicant: NVIDIA Corp.
    Inventors: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R. Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
  • Patent number: 10489542
    Abstract: A neural network including an embedding layer to receive a gate function vector and an embedding width and alter a shape of the gate function vector by the embedding width, a concatenator to receive a gate feature input vector and concatenate the gate feature input vector with the gate function vector altered by the embedding width, a convolution layer to receive a window size, stride, and output feature size and generate an output convolution vector with a shape based on a length of the gate function vector, the window size of the convolution layer, and the output feature size of the convolution layer, and a fully connected layer to reduce the gate output convolution vector to a final path delay output.
    Type: Grant
    Filed: April 24, 2018
    Date of Patent: November 26, 2019
    Assignee: NVIDIA Corp.
    Inventors: Mark Ren, Brucek Khailany
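A hedged PyTorch sketch of the layer stack enumerated in patent 10489542 (embedding, concatenation, convolution, fully connected layer reducing to a path delay); all dimensions, hyperparameters, and the use of PyTorch are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class PathDelaySketch(nn.Module):
    """Embed gate-function IDs, concatenate with per-gate features, convolve
    along the path, and reduce to a single predicted path delay."""

    def __init__(self, num_gate_types=32, embed_width=8, feat_width=4,
                 window=3, stride=1, out_features=16, path_len=20):
        super().__init__()
        self.embed = nn.Embedding(num_gate_types, embed_width)
        self.conv = nn.Conv1d(embed_width + feat_width, out_features,
                              kernel_size=window, stride=stride)
        conv_len = (path_len - window) // stride + 1
        self.fc = nn.Linear(out_features * conv_len, 1)

    def forward(self, gate_ids, gate_feats):
        # gate_ids: (batch, path_len) int64; gate_feats: (batch, path_len, feat_width)
        x = torch.cat([self.embed(gate_ids), gate_feats], dim=-1)   # concatenate
        x = self.conv(x.transpose(1, 2))                            # (batch, out_features, conv_len)
        return self.fc(x.flatten(1))                                # final path delay output

model = PathDelaySketch()
delay = model(torch.randint(0, 32, (2, 20)), torch.randn(2, 20, 4))
```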
  • Publication number: 20190325092
    Abstract: A neural network including an embedding layer to receive a gate function vector and an embedding width and alter a shape of the gate function vector by the embedding width, a concatenator to receive a gate feature input vector and concatenate the gate feature input vector with the gate function vector altered by the embedding width, a convolution layer to receive a window size, stride, and output feature size and generate an output convolution vector with a shape based on a length of the gate function vector, the window size of the convolution layer, and the output feature size of the convolution layer, and a fully connected layer to reduce the gate output convolution vector to a final path delay output.
    Type: Application
    Filed: April 24, 2018
    Publication date: October 24, 2019
    Inventors: Mark Ren, Brucek Khailany
  • Patent number: 8694757
    Abstract: Tracing command execution in a data processing system having a host processor and a co-processor. The host processor maintains a record of a plurality of commands for the co-processor, storing each of the plurality of commands in a command queue. Hardware trace logic is provided to store one or more events based, at least in part, on transfer of the plurality of commands to a small memory. Software is executed to store the one or more events to a main memory, wherein the one or more events are aggregated into a single memory trace within the main memory.
    Type: Grant
    Filed: August 15, 2008
    Date of Patent: April 8, 2014
    Assignee: Calos Fund Limited Liability Company
    Inventors: Brucek Khailany, Mark Rygh, Jim Jian Lin, Udo Uebel
  • Patent number: 8412917
    Abstract: Disclosed are methods and systems for dynamically determining data-transfer paths. The data-transfer paths are dynamically determined in response to an instruction that facilitates data transfer among execution lanes in an integrated-circuit processing device operable to execute operations in parallel. In addition, embodiments include an integrated-circuit processing device operable to execute operations in parallel, including the capability of providing confirmation information to potential source lanes, the confirmation information indicating whether the potential source lanes may send data to requested destination lanes during a data-transfer interval.
    Type: Grant
    Filed: September 20, 2011
    Date of Patent: April 2, 2013
    Assignee: Calos Fund Limited Liability Company
    Inventors: Brucek Khailany, William James Dally, Ujval J. Kapasi, Jim Jian Lin
  • Patent number: 8122078
    Abstract: A method of operation within an integrated-circuit processing device having an enhanced combined-arithmetic capability. In response to an instruction indicating a combined arithmetic operation, the processor generates a dot-product of first and second operands, adds the dot-product to an accumulated value, and then outputs the sum of the accumulated value and the dot-product.
    Type: Grant
    Filed: October 9, 2007
    Date of Patent: February 21, 2012
    Assignee: Calos Fund, LLC
    Inventors: Brucek Khailany, William James Dally, Raghunath Rao, DeForest Tovey
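The combined operation in patent 8122078 amounts to a fused dot-product-and-accumulate; a tiny Python sketch of the semantics (not the hardware implementation) follows.

```python
def dot_product_accumulate(acc, a, b):
    """Semantics of the combined instruction: acc += dot(a, b) in one step."""
    return acc + sum(x * y for x, y in zip(a, b))

# Worked example: acc starts at 10, dot([1, 2, 3], [4, 5, 6]) = 32, result is 42.
print(dot_product_accumulate(10, [1, 2, 3], [4, 5, 6]))  # 42
```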
  • Publication number: 20120011349
    Abstract: Disclosed are methods and systems for dynamically determining data-transfer paths. The data-transfer paths are determined in response to an instruction that facilitates data transfer among execution lanes in an integrated-circuit processing device operable to execute operations in parallel.
    Type: Application
    Filed: September 20, 2011
    Publication date: January 12, 2012
    Applicant: Calos Fund Limited Liability Company
    Inventors: Brucek Khailany, William James Dally, Ujval J. Kapasi, Jim Jian Lin, Raghunath Rao, DeForest Tovey, Mark Rygh, Jung-Ho Ahn
  • Patent number: 8024553
    Abstract: A method of operation within an integrated-circuit processing device having a plurality of execution lanes. Upon receiving an instruction to exchange data between the execution lanes, respective requests from the execution lanes are examined to determine a set of the execution lanes that may send data to one or more others of the execution lanes during a first interval. Each execution lane within the set of the execution lanes is signaled to indicate that the execution lane may send data to the one or more others of the execution lanes.
    Type: Grant
    Filed: August 15, 2008
    Date of Patent: September 20, 2011
    Assignee: Calos Fund Limited Liability Company
    Inventors: Brucek Khailany, William James Dally, Ujval J. Kapasi, Jim Jian Lin
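One plausible reading of the per-interval arbitration in patent 8024553 is sketched below: each source lane requests a destination lane, and a simple pass grants a conflict-free set of senders for the interval. The greedy, fixed-priority policy is an assumption, not the patented mechanism.

```python
def grant_senders(requests):
    """requests maps source lane -> requested destination lane.
    Return the set of source lanes allowed to send this interval,
    ensuring no destination receives from more than one source."""
    granted, taken = set(), set()
    for src in sorted(requests):        # fixed priority order (illustrative)
        dst = requests[src]
        if dst not in taken:
            granted.add(src)
            taken.add(dst)
    return granted

# Lanes 0 and 2 both request lane 5; only one is granted this interval.
print(grant_senders({0: 5, 1: 3, 2: 5}))    # {0, 1}
```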
  • Publication number: 20100257329
    Abstract: An application programming interface is disclosed for loading and storing multidimensional arrays of data between a data parallel processing unit and an external memory. Physical addresses reference the external memory and define two-dimensional arrays of data storage locations corresponding to data records. The data parallel processing unit has multiple processing lanes to process, in parallel, data records residing in respective register files. The interface comprises an X-dimension function call parameter to define an X-dimension in the memory array corresponding to a record for one lane and a Y-dimension function call parameter to define a Y-dimension in the memory array corresponding to the record for one lane. The X-dimension and Y-dimension function call parameters cooperate to generate memory accesses corresponding to the records.
    Type: Application
    Filed: August 6, 2009
    Publication date: October 7, 2010
    Inventors: Brucek Khailany, Nuwan Jayasena, Brian Pharris, Timothy Southgate
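To make the X-dimension and Y-dimension function call parameters concrete, here is a hypothetical sketch of how one lane's two-dimensional record could expand into strided external-memory addresses; the function name, signature, and address math are invented for illustration.

```python
def record_addresses(base_addr, lane, x_dim, y_dim, row_pitch, elem_size=4):
    """Enumerate the external-memory addresses covered by one lane's 2-D record.
    Each lane's record starts x_dim * elem_size bytes after the previous lane's."""
    lane_base = base_addr + lane * x_dim * elem_size
    return [lane_base + y * row_pitch + x * elem_size
            for y in range(y_dim) for x in range(x_dim)]

# Lane 1's 3-wide by 2-tall record in an array whose rows are 64 bytes apart.
print(record_addresses(0x1000, lane=1, x_dim=3, y_dim=2, row_pitch=64))
```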