Patents by Inventor William James Dally

William James Dally has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

ASYNCHRONOUS ACCUMULATOR USING LOGARITHMIC-BASED ARITHMETIC

Publication number: 20210056399

Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.

Type: Application

Filed: January 23, 2020

Publication date: February 25, 2021

Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany, Stephen G. Tell
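
The addition scheme described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the patented hardware: it assumes each value is a positive number of the form 2^(e/n) with an integer exponent e, ignores signs, and uses floating-point arithmetic where hardware would presumably use shifts and integer accumulators. The function name `log_domain_sum` and the parameter `n` (number of remainder bins) are hypothetical.

```python
import math


def log_domain_sum(exponents, n=4):
    """Sum the values 2**(e / n) for integer exponents e (hypothetical helper).

    Each exponent is split as e = n*q + r, so 2**(e/n) == 2**q * 2**(r/n).
    Values that share a remainder r are accumulated together, and each
    partial sum is scaled once by the constant 2**(r/n).
    """
    partial = [0.0] * n  # one partial sum per remainder bin
    for e in exponents:
        q, r = divmod(e, n)     # quotient and remainder, with 0 <= r < n
        partial[r] += 2.0 ** q  # stands in for a shift-and-accumulate step
    # One multiplication per remainder bin instead of one per input value.
    return sum(p * 2.0 ** (r / n) for r, p in enumerate(partial))


if __name__ == "__main__":
    exps = [3, 5, -2, 7]
    direct = sum(2.0 ** (e / 4) for e in exps)
    print(math.isclose(log_domain_sum(exps), direct))
```

The point of the grouping is that only n constant multiplications are needed regardless of how many values are summed.
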
NEURAL NETWORK ACCELERATOR USING LOGARITHMIC-BASED ARITHMETIC

Publication number: 20210056397

Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.

Type: Application

Filed: August 23, 2019

Publication date: February 25, 2021

Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
INFERENCE ACCELERATOR USING LOGARITHMIC-BASED ARITHMETIC

Publication number: 20210056446

Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.

Type: Application

Filed: January 23, 2020

Publication date: February 25, 2021

Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
PROCESSOR FOR PERFORMING DYNAMIC PROGRAMMING ACCORDING TO AN INSTRUCTION, AND A METHOD FOR CONFIGURING A PROCESSOR FOR DYNAMIC PROGRAMMING VIA AN INSTRUCTION

Publication number: 20210048992

Abstract: The disclosure provides processors that are configured to perform dynamic programming according to an instruction, a method for configuring a processor for dynamic programming according to an instruction, and a method of computing a modified Smith-Waterman algorithm employing an instruction for configuring a parallel processing unit. In one example, the method for configuring includes: (1) receiving, by execution cores of the processor, an instruction that directs the execution cores to compute a set of recurrence equations employing a matrix, (2) configuring the execution cores, according to the set of recurrence equations, to compute states for elements of the matrix, and (3) storing the computed states for current elements of the matrix in registers of the execution cores, wherein the computed states are determined based on the set of recurrence equations and input data.

Type: Application

Filed: March 6, 2020

Publication date: February 18, 2021

Inventor: William James Dally
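
For orientation, the kind of recurrence the abstract above refers to can be illustrated with the textbook Smith-Waterman local-alignment score. This sketch shows the standard algorithm with a linear gap penalty, not the modified version or the instruction claimed in the application; the function name and the scoring parameters (`match`, `mismatch`, `gap`) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=1):
    """Return the best local-alignment score of strings a and b.

    Standard recurrence (linear gap penalty):
        H[i][j] = max(0,
                      H[i-1][j-1] + s(a[i-1], b[j-1]),
                      H[i-1][j] - gap,
                      H[i][j-1] - gap)
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # row 0 and column 0 stay at 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          H[i - 1][j] - gap,    # gap in b
                          H[i][j - 1] - gap)    # gap in a
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))
```

Each matrix element depends only on its upper, left, and upper-left neighbors, which is what makes it plausible to keep the current states in registers as the abstract describes.
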
EFFICIENT NEURAL NETWORK ACCELERATOR DATAFLOWS

Publication number: 20200293867

Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.

Type: Application

Filed: November 4, 2019

Publication date: September 17, 2020

Applicant: NVIDIA Corp.

Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
Proportional AC-coupled edge-boosting transmit equalization for multi-level pulse-amplitude modulated signaling

Patent number: 10623217

Abstract: A PAM signaling system utilizes multiple equalizers on each data lane of a serial data bus, each of the equalizers associated with a different signal eye of the serial data bus.

Type: Grant

Filed: May 29, 2019

Date of Patent: April 14, 2020

Assignee: NVIDIA Corp.

Inventors: Walker Turner, William James Dally
SCALABLE MULTI-DIE DEEP LEARNING SYSTEM

Publication number: 20200082246

Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.

Type: Application

Filed: July 19, 2019

Publication date: March 12, 2020

Applicant: NVIDIA Corp.

Inventors: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R. Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
Network endpoint congestion management

Patent number: 10063481

Abstract: A congestion management protocol that can be used for small messages in which the last-hop switch determines the congestion of the end point. The last-hop switch drops messages when the end point is congested and schedules a retransmission. A second congestion management protocol transmits small messages in a speculative mode to avoid the overhead caused by reservation handshakes.

Type: Grant

Filed: May 23, 2016

Date of Patent: August 28, 2018

Assignee: U.S. Department of Energy

Inventors: Nan Jiang, Larry Robert Dennison, William James Dally
DRAM with segmented word line switching circuit for causing selection of portion of rows and circuitry for a variable page width control scheme

Patent number: 10026468

Abstract: This description is directed to a dynamic random access memory (DRAM) array having a plurality of rows and a plurality of columns. The array further includes a plurality of cells, each of which is associated with one of the columns and one of the rows. Each cell includes a capacitor that is selectively coupled to a bit line of its associated column so as to share charge with the bit line when the cell is selected. There is a segmented word line circuit for each row, which is controllable to cause selection of only a portion of the cells in the row.

Type: Grant

Filed: February 10, 2017

Date of Patent: July 17, 2018

Assignee: NVIDIA CORPORATION

Inventor: William James Dally
DRAM WITH SEGMENTED PAGE CONFIGURATION

Publication number: 20170154667

Abstract: This description is directed to a dynamic random access memory (DRAM) array having a plurality of rows and a plurality of columns. The array further includes a plurality of cells, each of which is associated with one of the columns and one of the rows. Each cell includes a capacitor that is selectively coupled to a bit line of its associated column so as to share charge with the bit line when the cell is selected. There is a segmented word line circuit for each row, which is controllable to cause selection of only a portion of the cells in the row.

Type: Application

Filed: February 10, 2017

Publication date: June 1, 2017

Inventor: William James Dally
SRAM voltage assist

Patent number: 9460776

Abstract: The disclosure provides for an SRAM array having a plurality of wordlines and a plurality of bitlines, referred to generally as SRAM lines. The array has a plurality of cells, each cell being defined by an intersection between one of the wordlines and one of the bitlines. The SRAM array further includes voltage boost circuitry operatively coupled with the cells, the voltage boost circuitry being configured to provide an amount of voltage boost that is based on an address of a cell to be accessed and/or to provide this voltage boost on an SRAM line via capacitive charge coupling.

Type: Grant

Filed: January 23, 2013

Date of Patent: October 4, 2016

Assignee: NVIDIA Corporation

Inventor: William James Dally
Current parking response to transient load demands

Patent number: 9287778

Abstract: Embodiments are disclosed relating to an electric power conversion device and methods for controlling the operation thereof. One disclosed embodiment provides an electric power conversion device comprising a first current control mechanism coupled to an electric power source and an upstream end of an inductor, where the first current control mechanism is operable to control inductor current. The electric power conversion device further comprises a second current control mechanism coupled between the downstream end of the inductor and a load, where the second current control mechanism is operable to control how much of the inductor current is delivered to the load.

Type: Grant

Filed: October 8, 2012

Date of Patent: March 15, 2016

Assignee: NVIDIA Corporation

Inventor: William James Dally
Multi-stage power supply with fast transient response

Patent number: 9178421

Abstract: Embodiments are disclosed relating to an electric power conversion device and methods for controlling the operation thereof. One disclosed embodiment provides a multi-stage electric power conversion device including a first regulator stage including a first stage energy storage device and a second regulator stage including a second stage energy storage device, the second stage energy storage device being operatively coupled between the first stage energy storage device and the load. The device further includes a control mechanism operative to control (i) a first stage output voltage on a node between the first stage energy storage device and the second stage energy storage device and (ii) a second stage output voltage on a node between the second stage energy storage device and the load.

Type: Grant

Filed: October 30, 2012

Date of Patent: November 3, 2015

Assignee: NVIDIA Corporation

Inventor: William James Dally
Unified streaming multiprocessor memory

Patent number: 9069664

Abstract: One embodiment of the present invention sets forth a technique for providing a unified memory for access by execution threads in a processing system. Several logically separate memories are combined into a single unified memory that includes a single set of shared memory banks, an allocation of space in each bank across the logical memories, a mapping rule that maps the address space of each logical memory to its partition of the shared physical memory, circuitry including switches and multiplexers that supports the mapping, and an arbitration scheme that allocates access to the banks.

Type: Grant

Filed: September 22, 2011

Date of Patent: June 30, 2015

Assignee: NVIDIA Corporation

Inventor: William James Dally
Hierarchical memory addressing

Patent number: 8982140

Abstract: One embodiment of the present invention sets forth a technique for addressing data in a hierarchical graphics processing unit cluster. A hierarchical address is constructed based on the location of a storage circuit where a target unit of data resides. The hierarchical address comprises a level field indicating a hierarchical level for the unit of data and a node identifier that indicates which GPU within the GPU cluster currently stores the unit of data. The hierarchical address may further comprise one or more identifiers that indicate which storage circuit in a particular hierarchical level currently stores the unit of data. The hierarchical address is constructed and interpreted based on the level field. The technique advantageously enables programs executing within the GPU cluster to efficiently access data residing in other GPUs using the hierarchical address.

Type: Grant

Filed: September 23, 2011

Date of Patent: March 17, 2015

Assignee: NVIDIA Corporation

Inventor: William James Dally
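
As a rough illustration of the address format described in the abstract above, the following sketch packs a level field, a node (GPU) identifier, a storage-circuit identifier, and an offset into one integer. The field widths, field order, and function names are all hypothetical; the abstract does not specify them. The sketch also uses one fixed layout for simplicity, whereas the abstract says the address is constructed and interpreted based on the level field.

```python
# Hypothetical field widths, chosen only for illustration.
LEVEL_BITS, NODE_BITS, UNIT_BITS, OFFSET_BITS = 2, 8, 6, 32


def pack_address(level, node, unit, offset):
    """Combine the four fields into a single integer, level field first."""
    fields = ((level, LEVEL_BITS), (node, NODE_BITS),
              (unit, UNIT_BITS), (offset, OFFSET_BITS))
    addr = 0
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError("field value does not fit in its width")
        addr = (addr << bits) | value  # append this field below the previous ones
    return addr


def unpack_address(addr):
    """Split an address back into (level, node, unit, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    addr >>= OFFSET_BITS
    unit = addr & ((1 << UNIT_BITS) - 1)
    addr >>= UNIT_BITS
    node = addr & ((1 << NODE_BITS) - 1)
    level = addr >> NODE_BITS
    return level, node, unit, offset


if __name__ == "__main__":
    a = pack_address(level=1, node=17, unit=3, offset=4096)
    print(unpack_address(a))
```
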
Timing calibration for on-chip interconnect

Patent number: 8941430

Abstract: One embodiment sets forth a timing calibration technique for on-chip source-synchronous, complementary metal-oxide-semiconductor (CMOS) repeater-based interconnect. Two transition patterns may be applied to calibrate the delay of an on-chip data or clock wire. Calibration logic is configured to apply the transition patterns and then trim the delays of the clock and data wires based on captured calibration patterns. The trimming adjusts the delay of the clock and data wires using a configurable delay circuit. Timing errors may be caused by crosstalk, power-supply-induced jitter (PSIJ), or wire delay variation due to transistor and wire metallization mismatch. Chip yields may be improved by reducing the occurrence of timing errors due to mismatched delays between different wires of an on-chip interconnect.

Type: Grant

Filed: September 12, 2012

Date of Patent: January 27, 2015

Assignee: NVIDIA Corporation

Inventors: Robert Palmer, John W. Poulton, Thomas Hastings Greer, III, William James Dally
ELECTRIC POWER CONVERSION WITH ASYMMETRIC PHASE RESPONSE

Publication number: 20140232368

Abstract: The disclosure is directed to a multi-phase electric power conversion device coupled between a power source and a load. The device includes a first regulator phase and a second regulator phase arranged in parallel, so that a first phase current and a second phase current are controllably provided in parallel to satisfy the current demand requirements of the load. Each phase current is based on current generated in an energy storage device within the respective phase. The regulator phases are asymmetric in that the energy storage device of the second regulator phase is configured so that its current can be varied more rapidly than the current in the energy storage device of the first regulator phase.

Type: Application

Filed: February 19, 2013

Publication date: August 21, 2014

Applicant: NVIDIA CORPORATION

Inventor: William James Dally
DRAM WITH SEGMENTED PAGE CONFIGURATION

Publication number: 20140219007

Abstract: This description is directed to a dynamic random access memory (DRAM) array having a plurality of rows and a plurality of columns. The array further includes a plurality of cells, each of which is associated with one of the columns and one of the rows. Each cell includes a capacitor that is selectively coupled to a bit line of its associated column so as to share charge with the bit line when the cell is selected. There is a segmented word line circuit for each row, which is controllable to cause selection of only a portion of the cells in the row.

Type: Application

Filed: February 7, 2013

Publication date: August 7, 2014

Applicant: NVIDIA Corporation

Inventor: William James Dally
SRAM VOLTAGE ASSIST

Publication number: 20140204657

Abstract: The disclosure provides for an SRAM array having a plurality of wordlines and a plurality of bitlines, referred to generally as SRAM lines. The array has a plurality of cells, each cell being defined by an intersection between one of the wordlines and one of the bitlines. The SRAM array further includes voltage boost circuitry operatively coupled with the cells, the voltage boost circuitry being configured to provide an amount of voltage boost that is based on an address of a cell to be accessed and/or to provide this voltage boost on an SRAM line via capacitive charge coupling.

Type: Application

Filed: January 23, 2013

Publication date: July 24, 2014

Applicant: NVIDIA Corporation

Inventor: William James Dally
System and method for explicitly managing cache coherence

Patent number: 8788761

Abstract: One embodiment of the present invention sets forth an extension to a cache coherence protocol with two explicit control states, P (private), and R (read-only), that provide explicit program control of cache lines for which the program logic can guarantee correct behavior. In the private state, only the owner of a cache line can access the cache line for read or write operations. In the read-only state, only read operations can be performed on the cache line, thereby disallowing write operations to be performed.

Type: Grant

Filed: September 23, 2011

Date of Patent: July 22, 2014

Assignee: NVIDIA Corporation

Inventor: William James Dally
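
The access rules for the two states described in the abstract above can be summarized in a short sketch. It models only the permission check stated in the abstract, not the rest of the coherence protocol; the names `State` and `access_allowed` and the use of integer requester IDs are assumptions made for illustration.

```python
from enum import Enum


class State(Enum):
    PRIVATE = "P"    # only the owner may read or write
    READ_ONLY = "R"  # anyone may read, nobody may write


def access_allowed(state, op, requester, owner=None):
    """Return True if `requester` may perform `op` on a line in `state`."""
    if op not in ("read", "write"):
        raise ValueError("op must be 'read' or 'write'")
    if state is State.PRIVATE:
        return requester == owner
    if state is State.READ_ONLY:
        return op == "read"
    raise ValueError("unknown state")


if __name__ == "__main__":
    print(access_allowed(State.PRIVATE, "write", requester=0, owner=0))
    print(access_allowed(State.READ_ONLY, "write", requester=0))
```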
