Patents by Inventor Brucek Kurdo Khailany
Brucek Kurdo Khailany has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240112007
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Application
Filed: December 12, 2023
Publication date: April 4, 2024
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
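A minimal sketch of the idea behind this family of filings, assuming values are stored in logarithmic format as 2**(e/scale) with integer exponents e: multiplication adds exponents, and addition splits each product exponent into a quotient and remainder, accumulates the quotient terms per remainder bucket, and weights each partial sum by its remainder factor. The `scale` value and function names are illustrative assumptions, not from the filing.

```python
import math

def lns_dot_product(exponents_a, exponents_b, scale=4):
    """Toy log-number-system dot product (values are 2**(e/scale))."""
    # Multiply: add exponents element-wise (still in log format).
    product_exponents = [ea + eb for ea, eb in zip(exponents_a, exponents_b)]

    # Accumulate: one integer partial sum per remainder bucket.
    partial_sums = [0] * scale
    for e in product_exponents:
        q, r = divmod(e, scale)
        partial_sums[r] += 2 ** q          # cheap shift-and-add in hardware

    # Combine: weight each bucket's partial sum by its remainder factor.
    return sum(p * 2.0 ** (r / scale) for r, p in enumerate(partial_sums))

# Usage: matches a direct linear-domain dot product of the same values.
a, b = [3, 7, 2], [5, 2, 6]              # exponents; values are 2**(e/4)
direct = sum(2.0 ** ((x + y) / 4) for x, y in zip(a, b))
assert math.isclose(lns_dot_product(a, b), direct)
```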
-
Patent number: 11886980
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Grant
Filed: August 23, 2019
Date of Patent: January 30, 2024
Assignee: NVIDIA Corporation
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
-
Publication number: 20230237308
Abstract: Quantizing tensors and vectors processed within a neural network reduces power consumption and may accelerate processing. Quantization reduces the number of bits used to represent a value, and decreasing the number of bits used can decrease the accuracy of computations that use the value. Ideally, quantization is performed without reducing accuracy. Quantization-aware training (QAT) is performed by dynamically quantizing tensors (weights and activations) using optimal clipping scalars, “optimal” in the sense that the mean squared error (MSE) of the quantized operation is minimized and the clipping scalars define the degree or amount of quantization for various tensors of the operation. Conventional techniques that quantize tensors during training suffer from high amounts of noise (error). Other techniques compute the clipping scalars offline through a brute-force search to provide high accuracy.
Type: Application
Filed: July 26, 2022
Publication date: July 27, 2023
Inventors: Charbel Sakr, Steve Haihang Dai, Brucek Kurdo Khailany, William James Dally, Rangharajan Venkatesan, Brian Matthew Zimmer
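For illustration only, here is one way to pick a clipping scalar that minimizes quantization MSE. The filing describes computing such scalars dynamically during training; the grid search below is closer to the offline, brute-force approach the abstract contrasts against, shown simply because it is the easiest way to see what "MSE-optimal clipping" means. Bit width, candidate count, and function names are assumptions.

```python
import numpy as np

def quantize(x, clip, bits=4):
    """Uniform symmetric quantization of x to `bits` bits with clipping scalar `clip`."""
    levels = 2 ** (bits - 1) - 1
    scale = clip / levels
    q = np.clip(np.round(x / scale), -levels, levels)
    return q * scale

def mse_optimal_clip(x, bits=4, num_candidates=64):
    """Return the candidate clipping scalar with the smallest quantization MSE."""
    candidates = np.linspace(x.std(), np.abs(x).max(), num_candidates)
    errors = [np.mean((x - quantize(x, c, bits)) ** 2) for c in candidates]
    return candidates[int(np.argmin(errors))]

# Usage: quantize a weight tensor with its MSE-optimal clipping scalar.
w = np.random.randn(1024).astype(np.float32)
w_q = quantize(w, mse_optimal_clip(w))
```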
-
Publication number: 20230229916
Abstract: A method for contracting a tensor network is provided. The method comprises generating a graph representation of the tensor network, processing the graph representation to determine a contraction for the tensor network by an agent that implements a reinforcement learning algorithm, and processing the tensor network in accordance with the contraction to generate a contracted tensor network.
Type: Application
Filed: January 20, 2023
Publication date: July 20, 2023
Inventors: Gal Chechik, Eli Alexander Meirom, Haggai Maron, Brucek Kurdo Khailany, Paul Martin Springer, Shie Mannor
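The reinforcement-learning agent itself is not sketched here; the snippet below only illustrates the final step the abstract describes, applying a chosen pairwise contraction order to a small tensor network. The (tensor, index-label) representation and function names are assumptions.

```python
import numpy as np

def contract_pair(t1, idx1, t2, idx2):
    """Contract two tensors over their shared index labels."""
    shared = [i for i in idx1 if i in idx2]
    axes1 = [idx1.index(i) for i in shared]
    axes2 = [idx2.index(i) for i in shared]
    out = np.tensordot(t1, t2, axes=(axes1, axes2))
    out_idx = [i for i in idx1 if i not in shared] + [i for i in idx2 if i not in shared]
    return out, out_idx

def contract_network(tensors, order):
    """Contract a network of (ndarray, index-label list) nodes following a
    given pairwise contraction order; each (i, j) indexes the current node list."""
    nodes = list(tensors)
    for i, j in order:
        t2, idx2 = nodes.pop(max(i, j))
        t1, idx1 = nodes.pop(min(i, j))
        nodes.append(contract_pair(t1, idx1, t2, idx2))
    return nodes[0]

# Usage: a three-tensor chain A_ab B_bc C_cd contracted as (B C) then A.
A, B, C = np.random.rand(2, 3), np.random.rand(3, 4), np.random.rand(4, 5)
net = [(A, ['a', 'b']), (B, ['b', 'c']), (C, ['c', 'd'])]
result, labels = contract_network(net, order=[(1, 2), (0, 1)])
assert labels == ['a', 'd'] and np.allclose(result, A @ B @ C)
```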
-
Publication number: 20230089606
Abstract: To facilitate crosstalk analysis for an IC design, a plurality of input vectors are input into a gate-level simulation. In response, the gate-level simulation determines timing windows for all nets within the IC design, may perform aggressor pruning, and may then determine and output aggressor/victim pairs and associated features for the IC design. This gate-level simulation may be accelerated utilizing one or more graphics processor units (GPUs). Additionally, the aggressor/victim pairs and associated features for the IC design are then input into a trained machine learning environment, which outputs predicted delta delays for each of the aggressor/victim pairs. In this way, crosstalk analysis may be performed more accurately and efficiently.
Type: Application
Filed: December 1, 2021
Publication date: March 23, 2023
Inventors: Vidya Chhabria, Benjamin Andrew Keller, Yanqing Zhang, Brucek Kurdo Khailany, Haoxing Ren
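A rough sketch of two steps named in the abstract, under assumed data structures: pruning aggressor nets whose switching windows do not overlap the victim's, and handing the surviving aggressor/victim pair features to a trained model that predicts delta delays. The window representation, feature dictionary, and scikit-learn-style `model.predict` call are assumptions, not the filing's interfaces.

```python
def windows_overlap(w1, w2):
    """True if two (start, end) switching windows overlap in time."""
    return w1[0] <= w2[1] and w2[0] <= w1[1]

def prune_aggressors(victim, candidates, timing_windows):
    """Keep only aggressor nets whose switching window overlaps the victim's."""
    vwin = timing_windows[victim]
    return [a for a in candidates if windows_overlap(timing_windows[a], vwin)]

def predict_delta_delays(pairs, features, model):
    """Predict a delta delay for each surviving aggressor/victim pair using a
    pre-trained regression model (interface assumed, scikit-learn style)."""
    return {pair: model.predict([features[pair]])[0] for pair in pairs}
```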
-
Publication number: 20220067481
Abstract: The IR drop for a portion of a circuit is a voltage drop across resistance: the product of the current I passing through the resistance and the resistance value R. In order to determine IR drop for a circuit in a more accurate and efficient manner, a neural network produces coefficient maps (each indicating a time-varying distribution of power within an associated portion of the circuit), and these coefficient maps are then used by the neural network to determine an IR drop for each of a plurality of portions of the circuit.
Type: Application
Filed: March 24, 2021
Publication date: March 3, 2022
Inventors: Vidya Chhabria, Yanqing Zhang, Haoxing Ren, Brucek Kurdo Khailany
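The underlying relationship is simply V_drop = I * R per region. The sketch below assumes the coefficient maps act as per-timestep scalings of a static power map; the filing does not spell out that combination here, so treat this only as an illustration of the I*R computation. All names and the supply voltage are assumptions.

```python
import numpy as np

def estimate_ir_drop(power_map, coeff_maps, resistance, vdd=0.7):
    """Worst-case per-region IR drop from a static power map and a set of
    per-timestep coefficient maps (assumed to scale the power distribution)."""
    drops = []
    for coeff in coeff_maps:                   # one coefficient map per timestep
        current = (power_map * coeff) / vdd    # I = P / V per region
        drops.append(current * resistance)     # V_drop = I * R per region
    return np.max(np.stack(drops), axis=0)     # worst case over time
```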
-
Publication number: 20220067530
Abstract: Today neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimizations. However, operation of the neural networks for these applications consumes energy. Quantization of parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.
Type: Application
Filed: October 30, 2020
Publication date: March 3, 2022
Inventors: Brucek Kurdo Khailany, Steve Haihang Dai, Rangharajan Venkatesan, Haoxing Ren
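A small sketch of per-vector scaled quantization, assuming each row of a weight matrix is split into length-`vec_len` vectors that each get their own scale factor (a max-abs scale is used here purely for simplicity). Names and defaults are illustrative assumptions.

```python
import numpy as np

def quantize_per_vector(x, vec_len=16, bits=4):
    """Quantize each length-`vec_len` vector within a row using its own scale
    factor, rather than one scale factor shared by the whole tensor."""
    levels = 2 ** (bits - 1) - 1
    rows, cols = x.shape
    assert cols % vec_len == 0
    v = x.reshape(rows, cols // vec_len, vec_len)
    scales = np.abs(v).max(axis=-1, keepdims=True) / levels   # one per vector
    scales = np.where(scales == 0, 1.0, scales)               # avoid divide-by-zero
    q = np.clip(np.round(v / scales), -levels, levels)
    return (q * scales).reshape(rows, cols), scales

# Usage: per-vector scales track local dynamic range, reducing quantization error
# compared with a single tensor-wide scale factor.
w = np.random.randn(64, 64).astype(np.float32)
w_q, _ = quantize_per_vector(w)
```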
-
Publication number: 20220067512
Abstract: Today neural networks are used to enable autonomous vehicles and improve the quality of speech recognition, real-time language translation, and online search optimizations. However, operation of the neural networks for these applications consumes energy. Quantization of parameters used by the neural networks reduces the amount of memory needed to store the parameters while also reducing the power consumed during operation of the neural network. Matrix operations performed by the neural networks require many multiplication calculations, so reducing the number of bits that are multiplied reduces the energy that is consumed. Quantizing smaller sets of the parameters using a shared scale factor improves accuracy compared with quantizing larger sets of the parameters. Accuracy of the calculations may be maintained by quantizing and scaling the parameters using fine-grained per-vector scale factors. A vector includes one or more elements within a single dimension of a multi-dimensional matrix.
Type: Application
Filed: October 30, 2020
Publication date: March 3, 2022
Inventors: Brucek Kurdo Khailany, Steve Haihang Dai, Rangharajan Venkatesan, Haoxing Ren
-
Publication number: 20210056399
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Application
Filed: January 23, 2020
Publication date: February 25, 2021
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany, Stephen G. Tell
-
Publication number: 20210056446
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Application
Filed: January 23, 2020
Publication date: February 25, 2021
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
-
Publication number: 20210056397
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
Type: Application
Filed: August 23, 2019
Publication date: February 25, 2021
Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
-
Patent number: 9886409
Abstract: An integrated circuit device comprises pin resources, a memory controller circuit, a network interface controller circuit, and transmitter circuitry. The pin resources comprise pads coupled to off-chip pins of the integrated circuit device. The memory controller circuit comprises a first interface and the network interface controller circuit comprises a second interface. The transmitter circuitry is configurable to selectively couple either a first signal of the first interface or a second signal of the second interface to a first pad of the pin resources based on a pin distribution between the first interface and the second interface.
Type: Grant
Filed: May 18, 2015
Date of Patent: February 6, 2018
Assignee: NVIDIA Corporation
Inventors: Stephen William Keckler, William J. Dally, Steven Lee Scott, Brucek Kurdo Khailany, Michael Allen Parker
-
Publication number: 20170212857
Abstract: An integrated circuit device comprises pin resources, a memory controller circuit, a network interface controller circuit, and transmitter circuitry. The pin resources comprise pads coupled to off-chip pins of the integrated circuit device. The memory controller circuit comprises a first interface and the network interface controller circuit comprises a second interface. The transmitter circuitry is configurable to selectively couple either a first signal of the first interface or a second signal of the second interface to a first pad of the pin resources based on a pin distribution between the first interface and the second interface.
Type: Application
Filed: May 18, 2015
Publication date: July 27, 2017
Inventors: Stephen William Keckler, William J. Dally, Steven Lee Scott, Brucek Kurdo Khailany, Michael Allen Parker
-
Patent number: 9672008
Abstract: A system, method, and computer program product are provided for a pausible bisynchronous FIFO. Data is written synchronously with a first clock signal of a first clock domain to an entry of a dual-port memory array and an increment signal is generated in the first clock domain. The increment signal is determined to transition near an edge of a second clock signal, where the second clock signal is a pausible clock signal. A next edge of the second clock signal of the second clock domain is delayed and the increment signal is transmitted to the second clock domain.
Type: Grant
Filed: November 20, 2015
Date of Patent: June 6, 2017
Assignee: NVIDIA Corporation
Inventors: Benjamin Andrew Keller, Matthew Rudolph Fojtik, Brucek Kurdo Khailany
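A behavioral sketch, in software, of the pausible-clock decision the abstract describes: if the write-domain increment signal transitions inside a keep-out window around the next receiving-clock edge, that edge is delayed before the increment is sampled. The window and stretch values are arbitrary illustrations, not circuit parameters from the patent.

```python
def next_read_clock_edge(nominal_edge, increment_time, keepout=0.1, stretch=0.2):
    """Return the time of the next read-domain clock edge: delayed (stretched)
    if the increment signal from the write domain arrives too close to it,
    otherwise the nominal edge time. Times are in arbitrary units."""
    if abs(increment_time - nominal_edge) < keepout:
        return nominal_edge + stretch       # pause the clock to sample safely
    return nominal_edge                     # no conflict, clock runs normally
```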
-
Patent number: 9489201
Abstract: A system includes a processing unit and a register file. The register file includes at least a first memory structure and a second memory structure. The first memory structure has a lower access energy than the second memory structure. The processing unit is configured to address the register file using a single logical namespace for both the first memory structure and the second memory structure.
Type: Grant
Filed: November 18, 2013
Date of Patent: November 8, 2016
Assignee: NVIDIA Corporation
Inventors: Brucek Kurdo Khailany, Mark Alan Gebhart
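A toy sketch of a single logical register namespace backed by two physical structures, with low register numbers mapped to the smaller, lower-access-energy structure. The split point, sizes, and class name are assumptions for illustration.

```python
class HybridRegisterFile:
    """One logical register namespace over two physical storage structures."""

    def __init__(self, num_fast=8, num_total=64):
        self.num_fast = num_fast
        self.fast = [0] * num_fast                 # low-access-energy structure
        self.slow = [0] * (num_total - num_fast)   # larger, higher-energy structure

    def read(self, reg):
        """Read logical register `reg`, routing to whichever structure backs it."""
        if reg < self.num_fast:
            return self.fast[reg]
        return self.slow[reg - self.num_fast]

    def write(self, reg, value):
        """Write logical register `reg`; the caller never sees the split."""
        if reg < self.num_fast:
            self.fast[reg] = value
        else:
            self.slow[reg - self.num_fast] = value
```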
-
Publication number: 20160148661
Abstract: A system, method, and computer program product are provided for a pausible bisynchronous FIFO. Data is written synchronously with a first clock signal of a first clock domain to an entry of a dual-port memory array and an increment signal is generated in the first clock domain. The increment signal is determined to transition near an edge of a second clock signal, where the second clock signal is a pausible clock signal. A next edge of the second clock signal of the second clock domain is delayed and the increment signal is transmitted to the second clock domain.
Type: Application
Filed: November 20, 2015
Publication date: May 26, 2016
Inventors: Benjamin Andrew Keller, Matthew Rudolph Fojtik, Brucek Kurdo Khailany
-
Patent number: 9323679
Abstract: A system, method, and computer program product are provided for managing miss requests. In use, a miss request is received at a unified miss handler from one of a plurality of distributed local caches. Additionally, the miss request is managed utilizing the unified miss handler.
Type: Grant
Filed: August 14, 2012
Date of Patent: April 26, 2016
Assignee: NVIDIA Corporation
Inventors: Brucek Kurdo Khailany, Ronny Meir Krashinsky, James David Balfour
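A simplified model of a unified miss handler shared by several local caches: misses to the same address are merged so only the first one is forwarded to memory, and every waiting cache is serviced when the fill returns. The interfaces and merging policy are assumptions, not the patent's design.

```python
from collections import defaultdict

class UnifiedMissHandler:
    """One miss handler shared by multiple distributed local caches."""

    def __init__(self):
        self.pending = defaultdict(list)   # address -> ids of caches waiting on it

    def handle_miss(self, cache_id, address):
        """Record a miss from a local cache; return True if a new fill request
        must go to memory, False if it merges with an in-flight request."""
        is_new = address not in self.pending
        self.pending[address].append(cache_id)
        return is_new

    def fill(self, address, data):
        """Deliver returned data to every cache that was waiting on the address."""
        return [(cache_id, data) for cache_id in self.pending.pop(address, [])]

# Usage: two caches miss on the same line; only the first triggers a memory request.
umh = UnifiedMissHandler()
assert umh.handle_miss(cache_id=0, address=0x80) is True
assert umh.handle_miss(cache_id=1, address=0x80) is False
assert umh.fill(0x80, data="line") == [(0, "line"), (1, "line")]
```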
-
Patent number: 9251870
Abstract: A system is provided for transmitting signals. The system comprises a first processing unit, a second processing unit, a cache memory, and a package. The first processing unit comprises a first ground-referenced single-ended signaling (GRS) interface circuit and the second processing unit comprises a second GRS interface circuit. The cache memory comprises a third and a fourth GRS interface circuit. The package comprises one or more electrical traces that couple the first GRS interface to the third GRS interface and couple the second GRS interface to the fourth GRS interface, where the first GRS interface circuit, the second GRS interface circuit, the third GRS interface circuit, and the fourth GRS interface circuit are each configured to transmit a pulse along one trace of the one or more electrical traces by discharging a capacitor between the one trace and a ground network.
Type: Grant
Filed: April 4, 2013
Date of Patent: February 2, 2016
Assignee: NVIDIA Corporation
Inventors: William J. Dally, John W. Poulton, Thomas Hastings Greer, III, Brucek Kurdo Khailany, Carl Thomas Gray
-
Patent number: 9245601
Abstract: A system and device are provided for implementing memory arrays using high-density latch cells. The device includes an array of cells arranged into columns and rows. Each cell comprises a latch cell that includes a transmission gate, a pair of inverters, and an output buffer. Each row of latch cells is connected to at least one common node for addressing the row of latch cells, and each column of latch cells is connected to a particular bit of an input signal and a particular bit of an output signal. A register file may be implemented using one or more arrays of the high-density latch cells to replace any or all of the banks of SRAM cells typically used to implement the register file.
Type: Grant
Filed: June 4, 2014
Date of Patent: January 26, 2016
Assignee: NVIDIA Corporation
Inventors: Mahmut Ersin Sinangil, John W. Poulton, Brucek Kurdo Khailany, John H. Edmondson
-
Publication number: 20150357009
Abstract: A system and device are provided for implementing memory arrays using high-density latch cells. The device includes an array of cells arranged into columns and rows. Each cell comprises a latch cell that includes a transmission gate, a pair of inverters, and an output buffer. Each row of latch cells is connected to at least one common node for addressing the row of latch cells, and each column of latch cells is connected to a particular bit of an input signal and a particular bit of an output signal. A register file may be implemented using one or more arrays of the high-density latch cells to replace any or all of the banks of SRAM cells typically used to implement the register file.
Type: Application
Filed: June 4, 2014
Publication date: December 10, 2015
Inventors: Mahmut Ersin Sinangil, John W. Poulton, Brucek Kurdo Khailany, John H. Edmondson