Patents by Inventor William James Dally

William James Dally has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210056399
    Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
    Type: Application
    Filed: January 23, 2020
    Publication date: February 25, 2021
    Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany, Stephen G. Tell
  • Publication number: 20210056397
    Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
    Type: Application
    Filed: August 23, 2019
    Publication date: February 25, 2021
    Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
  • Publication number: 20210056446
    Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum.
    Type: Application
    Filed: January 23, 2020
    Publication date: February 25, 2021
    Inventors: William James Dally, Rangharajan Venkatesan, Brucek Kurdo Khailany
  • Publication number: 20210048992
    Abstract: The disclosure provides processors that are configured to perform dynamic programming according to an instruction, a method for configuring a processor for dynamic programming according to an instruction, and a method of computing a modified Smith-Waterman algorithm employing an instruction for configuring a parallel processing unit. In one example, the method for configuring includes: (1) receiving, by execution cores of the processor, an instruction that directs the execution cores to compute a set of recurrence equations employing a matrix, (2) configuring the execution cores, according to the set of recurrence equations, to compute states for elements of the matrix, and (3) storing the computed states for current elements of the matrix in registers of the execution cores, wherein the computed states are determined based on the set of recurrence equations and input data.
    Type: Application
    Filed: March 6, 2020
    Publication date: February 18, 2021
    Inventor: William James Dally
  • Publication number: 20200293867
    Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.
    Type: Application
    Filed: November 4, 2019
    Publication date: September 17, 2020
    Applicant: NVIDIA Corp.
    Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
  • Patent number: 10623217
    Abstract: A PAM signaling system utilizes multiple equalizers on each data lane of a serial data bus, each of the equalizers associated with a different signal eye of the serial data bus.
    Type: Grant
    Filed: May 29, 2019
    Date of Patent: April 14, 2020
    Assignee: NVIDIA Corp.
    Inventors: Walker Turner, William James Dally
  • Publication number: 20200082246
    Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.
    Type: Application
    Filed: July 19, 2019
    Publication date: March 12, 2020
    Applicant: NVIDIA Corp.
    Inventors: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R. Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
  • Patent number: 10063481
    Abstract: A congestion management protocol that can be used for small messages in which the last-hop switch determines the congestion of the end point. The last-hop switch drops messages when the end point is congested and schedules a retransmission. A second congestion management protocol transmits small messages in a speculative mode to avoid the overhead caused by reservation handshakes.
    Type: Grant
    Filed: May 23, 2016
    Date of Patent: August 28, 2018
    Assignee: U.S. Department of Energy
    Inventors: Nan Jiang, Larry Robert Dennison, William James Dally
  • Patent number: 10026468
    Abstract: This description is directed to a dynamic random access memory (DRAM) array having a plurality of rows and a plurality of columns. The array further includes a plurality of cells, each of which is associated with one of the columns and one of the rows. Each cell includes a capacitor that is selectively coupled to a bit line of its associated column so as to share charge with the bit line when the cell is selected. There is a segmented word line circuit for each row, which is controllable to cause selection of only a portion of the cells in the row.
    Type: Grant
    Filed: February 10, 2017
    Date of Patent: July 17, 2018
    Assignee: NVIDIA CORPORATION
    Inventor: William James Dally
  • Publication number: 20170154667
    Abstract: This description is directed to a dynamic random access memory (DRAM) array having a plurality of rows and a plurality of columns. The array further includes a plurality of cells, each of which is associated with one of the columns and one of the rows. Each cell includes a capacitor that is selectively coupled to a bit line of its associated column so as to share charge with the bit line when the cell is selected. There is a segmented word line circuit for each row, which is controllable to cause selection of only a portion of the cells in the row.
    Type: Application
    Filed: February 10, 2017
    Publication date: June 1, 2017
    Inventor: William James Dally
  • Patent number: 9460776
    Abstract: The disclosure provides for an SRAM array having a plurality of wordlines and a plurality of bitlines, referred to generally as SRAM lines. The array has a plurality of cells, each cell being defined by an intersection between one of the wordlines and one of the bitlines. The SRAM array further includes voltage boost circuitry operatively coupled with the cells, the voltage boost circuitry being configured to provide an amount of voltage boost that is based on an address of a cell to be accessed and/or to provide this voltage boost on an SRAM line via capacitive charge coupling.
    Type: Grant
    Filed: January 23, 2013
    Date of Patent: October 4, 2016
    Assignee: NVIDIA Corporation
    Inventor: William James Dally
  • Patent number: 9287778
    Abstract: Embodiments are disclosed relating to an electric power conversion device and methods for controlling the operation thereof. One disclosed embodiment provides an electric power conversion device comprising a first current control mechanism coupled to an electric power source and an upstream end of an inductor, where the first current control mechanism is operable to control inductor current. The electric power conversion device further comprises a second current control mechanism coupled between the downstream end of the inductor and a load, where the second current control mechanism is operable to control how much of the inductor current is delivered to the load.
    Type: Grant
    Filed: October 8, 2012
    Date of Patent: March 15, 2016
    Assignee: NVIDIA Corporation
    Inventor: William James Dally
  • Patent number: 9178421
    Abstract: Embodiments are disclosed relating to an electric power conversion device and methods for controlling the operation thereof. One disclosed embodiment provides a multi-stage electric power conversion device including a first regulator stage including a first stage energy storage device and a second regulator stage including a second stage energy storage device, the second stage energy storage device being operatively coupled between the first stage energy storage device and the load. The device further includes a control mechanism operative to control (i) a first stage output voltage on a node between the first stage energy storage device and the second stage energy storage device and (ii) a second stage output voltage on a node between the second stage energy storage device and the load.
    Type: Grant
    Filed: October 30, 2012
    Date of Patent: November 3, 2015
    Assignee: NVIDIA Corporation
    Inventor: William James Dally
  • Patent number: 9069664
    Abstract: One embodiment of the present invention sets forth a technique for providing a unified memory for access by execution threads in a processing system. Several logically separate memories are combined into a single unified memory that includes a single set of shared memory banks, an allocation of space in each bank across the logical memories, a mapping rule that maps the address space of each logical memory to its partition of the shared physical memory, circuitry including switches and multiplexers that supports the mapping, and an arbitration scheme that allocates access to the banks.
    Type: Grant
    Filed: September 22, 2011
    Date of Patent: June 30, 2015
    Assignee: NVIDIA Corporation
    Inventor: William James Dally
  • Patent number: 8982140
    Abstract: One embodiment of the present invention sets forth a technique for addressing data in a hierarchical graphics processing unit cluster. A hierarchical address is constructed based on the location of a storage circuit where a target unit of data resides. The hierarchical address comprises a level field indicating a hierarchical level for the unit of data and a node identifier that indicates which GPU within the GPU cluster currently stores the unit of data. The hierarchical address may further comprise one or more identifiers that indicate which storage circuit in a particular hierarchical level currently stores the unit of data. The hierarchical address is constructed and interpreted based on the level field. The technique advantageously enables programs executing within the GPU cluster to efficiently access data residing in other GPUs using the hierarchical address.
    Type: Grant
    Filed: September 23, 2011
    Date of Patent: March 17, 2015
    Assignee: NVIDIA Corporation
    Inventor: William James Dally
  • Patent number: 8941430
    Abstract: One embodiment sets forth a timing calibration technique for on-chip source-synchronous, complementary metal-oxide-semiconductor (CMOS) repeater-based interconnect. Two transition patterns may be applied to calibrate the delay of an on-chip data or clock wire. Calibration logic is configured to apply the transition patterns and then trim the delays of the clock and data wires based on captured calibration patterns. The trimming adjusts the delay of the clock and data wires using a configurable delay circuit. Timing errors may be caused by crosstalk, power-supply-induced jitter (PSIJ), or wire delay variation due to transistor and wire metallization mismatch. Chip yields may be improved by reducing the occurrence of timing errors due to mismatched delays between different wires of an on-chip interconnect.
    Type: Grant
    Filed: September 12, 2012
    Date of Patent: January 27, 2015
    Assignee: NVIDIA Corporation
    Inventors: Robert Palmer, John W. Poulton, Thomas Hastings Greer, III, William James Dally
  • Publication number: 20140232368
    Abstract: The disclosure is directed to a multi-phase electric power conversion device coupled between a power source and a load. The device includes a first regulator phase and a second regulator phase arranged in parallel, so that a first phase current and a second phase current are controllably provided in parallel to satisfy the current demand requirements of the load. Each phase current is based on current generated in an energy storage device within the respective phase. The regulator phases are asymmetric in that the energy storage device of the second regulator phase is configured so that its current can be varied more rapidly than the current in the energy storage device of the first regulator phase.
    Type: Application
    Filed: February 19, 2013
    Publication date: August 21, 2014
    Applicant: NVIDIA CORPORATION
    Inventor: William James Dally
  • Publication number: 20140219007
    Abstract: This description is directed to a dynamic random access memory (DRAM) array having a plurality of rows and a plurality of columns. The array further includes a plurality of cells, each of which is associated with one of the columns and one of the rows. Each cell includes a capacitor that is selectively coupled to a bit line of its associated column so as to share charge with the bit line when the cell is selected. There is a segmented word line circuit for each row, which is controllable to cause selection of only a portion of the cells in the row.
    Type: Application
    Filed: February 7, 2013
    Publication date: August 7, 2014
    Applicant: NVIDIA Corporation
    Inventor: William James Dally
  • Publication number: 20140204657
    Abstract: The disclosure provides for an SRAM array having a plurality of wordlines and a plurality of bitlines, referred to generally as SRAM lines. The array has a plurality of cells, each cell being defined by an intersection between one of the wordlines and one of the bitlines. The SRAM array further includes voltage boost circuitry operatively coupled with the cells, the voltage boost circuitry being configured to provide an amount of voltage boost that is based on an address of a cell to be accessed and/or to provide this voltage boost on an SRAM line via capacitive charge coupling.
    Type: Application
    Filed: January 23, 2013
    Publication date: July 24, 2014
    Applicant: NVIDIA Corporation
    Inventor: William James Dally
  • Patent number: 8788761
    Abstract: One embodiment of the present invention sets forth an extension to a cache coherence protocol with two explicit control states, P (private) and R (read-only), that provide explicit program control of cache lines for which the program logic can guarantee correct behavior. In the private state, only the owner of a cache line can access the cache line for read or write operations. In the read-only state, only read operations can be performed on the cache line; write operations are disallowed.
    Type: Grant
    Filed: September 23, 2011
    Date of Patent: July 22, 2014
    Assignee: NVIDIA Corporation
    Inventor: William James Dally
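
Illustrative note on the logarithmic-format addition described in the abstracts for publication numbers 20210056399, 20210056397, and 20210056446. The following is a minimal software sketch of the quotient/remainder idea, not the patented hardware design. It assumes a base-2 format in which each value is sign * 2^(exponent / steps_per_octave); the function name log_format_sum, the steps_per_octave parameter, and the explicit sign field are hypothetical and do not appear in the abstracts. Each exponent is split into a quotient and a remainder, the power-of-two quotient terms are accumulated in one partial sum per remainder, and each partial sum is then scaled once by the constant for its remainder.

```python
import math
from collections import defaultdict


def log_format_sum(terms, steps_per_octave=8):
    """Add values given as (sign, exponent) pairs.

    Assumed format (not from the abstracts): each pair encodes
    sign * 2 ** (exponent / steps_per_octave), with an integer exponent.
    """
    # One partial sum per remainder value.
    partial_sums = defaultdict(float)
    for sign, exponent in terms:
        # exponent = quotient * steps_per_octave + remainder,
        # with 0 <= remainder < steps_per_octave.
        quotient, remainder = divmod(exponent, steps_per_octave)
        # 2 ** quotient is a pure power of two (a shift in hardware).
        partial_sums[remainder] += sign * math.ldexp(1.0, quotient)
    # Scale each partial sum once by 2 ** (remainder / steps_per_octave).
    return sum(
        partial * 2.0 ** (remainder / steps_per_octave)
        for remainder, partial in partial_sums.items()
    )


if __name__ == "__main__":
    values = [(1, 8), (1, 8), (-1, 0), (1, 3)]
    direct = sum(s * 2.0 ** (e / 8) for s, e in values)
    print(log_format_sum(values), direct)
```

In this sketch the number of fractional multiplications is bounded by steps_per_octave rather than by the number of terms, which is the saving the decomposition is meant to illustrate. Floating-point accumulators are used here only for brevity.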