Patents by Inventor William James Dally

William James Dally has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8732711
    Abstract: One embodiment of the present invention sets forth a technique for scheduling thread execution in a multi-threaded processing environment. A two-level scheduler maintains a small set of active threads called strands to hide function unit pipeline latency and local memory access latency. The strands are a subset of a larger set of pending threads that is also maintained by the two-level scheduler. Pending threads are promoted to strands and strands are demoted to pending threads based on latency characteristics. The two-level scheduler selects strands for execution based on strand state. The longer latency of the pending threads is hidden by selecting strands for execution. When the latency for a pending thread has expired, the pending thread may be promoted to a strand and begin (or resume) execution. When a strand encounters a latency event, the strand may be demoted to a pending thread while the latency is incurred.
    Type: Grant
    Filed: June 1, 2011
    Date of Patent: May 20, 2014
    Assignee: NVIDIA Corporation
    Inventors: William James Dally, Stephen William Keckler, David Tarjan, John Erik Lindholm, Mark Alan Gebhart, Daniel Robert Johnson
  • Publication number: 20140117951
    Abstract: Embodiments are disclosed relating to an electric power conversion device and methods for controlling the operation thereof. One disclosed embodiment provides a multi-stage electric power conversion device including a first regulator stage including a first stage energy storage device and a second regulator stage including a second stage energy storage device, the second stage energy storage device being operatively coupled between the first stage energy storage device and the load. The device further includes a control mechanism operative to control (i) a first stage output voltage on a node between the first stage energy storage device and the second stage energy storage device and (ii) a second stage output voltage on a node between the second stage energy storage device and the load.
    Type: Application
    Filed: October 30, 2012
    Publication date: May 1, 2014
    Applicant: NVIDIA Corporation
    Inventor: William James Dally
  • Publication number: 20140097813
    Abstract: Embodiments are disclosed relating to an electric power conversion device and methods for controlling the operation thereof. One disclosed embodiment provides an electric power conversion device comprising a first current control mechanism coupled to an electric power source and an upstream end of an inductor, where the first current control mechanism is operable to control inductor current. The electric power conversion device further comprises a second current control mechanism coupled between the downstream end of the inductor and a load, where the second current control mechanism is operable to control how much of the inductor current is delivered to the load.
    Type: Application
    Filed: October 8, 2012
    Publication date: April 10, 2014
    Applicant: NVIDIA Corporation
    Inventor: William James Dally
  • Patent number: 8689159
    Abstract: One embodiment sets forth a technique for satisfying the timing requirements of on-chip source-synchronous, CMOS-repeater-based interconnect. Each channel of the on-chip interconnect may include one or more redundant wires. Calibration logic is configured to apply transition patterns to the wires of each channel and to capture the calibration patterns generated in response to the transition patterns. Based on the calibration patterns, the wires that best satisfy the timing requirements of the on-chip interconnect are selected to transmit data. The calibration logic also trims the delays of the clock and selected data wires based on the captured calibration patterns to improve the timing margin of the on-chip interconnect. Improving the timing margin of the on-chip interconnect improves chip yields.
    Type: Grant
    Filed: September 12, 2012
    Date of Patent: April 1, 2014
    Assignee: NVIDIA Corporation
    Inventors: Robert Palmer, John W. Poulton, Thomas Hastings Greer, III, William James Dally
  • Publication number: 20140077857
    Abstract: One embodiment sets forth a technique for delaying signals by varying amounts. A configurable delay circuit includes fixed and tri-state inverters. Pullup and pulldown transistors within one or more tri-state inverters may be activated to reduce the delay introduced by fixed inverters. The pullup and pulldown transistors within one or more tri-state inverters may be separately activated to independently adjust the rising delay and the falling delay incurred by the input signal.
    Type: Application
    Filed: September 14, 2012
    Publication date: March 20, 2014
    Inventors: John W. Poulton, Robert Palmer, William James Dally
  • Publication number: 20140075403
    Abstract: One embodiment sets forth a technique for satisfying the timing requirements of on-chip source-synchronous, CMOS-repeater-based interconnect. Each channel of the on-chip interconnect may include one or more redundant wires. Calibration logic is configured to apply transition patterns to the wires of each channel and to capture the calibration patterns generated in response to the transition patterns. Based on the calibration patterns, the wires that best satisfy the timing requirements of the on-chip interconnect are selected to transmit data. The calibration logic also trims the delays of the clock and selected data wires based on the captured calibration patterns to improve the timing margin of the on-chip interconnect. Improving the timing margin of the on-chip interconnect improves chip yields.
    Type: Application
    Filed: September 12, 2012
    Publication date: March 13, 2014
    Inventors: Robert Palmer, John W. Poulton, Thomas Hastings Greer, III, William James Dally
  • Publication number: 20140070862
    Abstract: One embodiment sets forth a timing calibration technique for on-chip source-synchronous, complementary metal-oxide-semiconductor (CMOS) repeater-based interconnect. Two transition patterns may be applied to calibrate the delay of an on-chip data or clock wire. Calibration logic is configured to apply the transition patterns and then trim the delays of the clock and data wires based on captured calibration patterns. The trimming adjusts the delay of the clock and data wires using a configurable delay circuit. Timing errors may be caused by crosstalk, power-supply-induced jitter (PSIJ), or wire delay variation due to transistor and wire metallization mismatch. Chip yields may be improved by reducing the occurrence of timing errors due to mismatched delays between different wires of an on-chip interconnect.
    Type: Application
    Filed: September 12, 2012
    Publication date: March 13, 2014
    Inventors: Robert Palmer, John W. Poulton, Thomas Hastings Greer, III, William James Dally
  • Patent number: 8659337
    Abstract: One embodiment of the present invention sets forth a technique for capturing and holding a level of an input signal using a latch circuit that presents few loads to the clock signal. The clock is coupled only to a bridging transistor and a pair of clock-activated pull-down or pull-up transistors. The level of the input signal is propagated to the output signal when the storage sub-circuit is not enabled. The storage sub-circuit is enabled by the bridging transistor, and a propagation sub-circuit is activated and deactivated by the pair of clock-activated transistors.
    Type: Grant
    Filed: July 21, 2011
    Date of Patent: February 25, 2014
    Assignee: NVIDIA Corporation
    Inventors: Ilyas Elkin, William James Dally, Jonah M. Alben
  • Patent number: 8604857
    Abstract: One embodiment of the present invention sets forth a technique for reducing jitter caused by changes in a power supply for a clock generated by a ring oscillator of inverter devices. An inverter sub-circuit is coupled in parallel with a current-starved inverter sub-circuit to produce an inverter circuit that is insensitive to changes in the power supply voltage. When the ring oscillator is used as the voltage controlled oscillator of a phase locked loop, the delay of the inverters may be controlled by varying a bias current for each inverter in response to changes in the power supply voltage to reduce any jitter in a clock output produced by the changes in the power supply voltage. When the transistor devices are sized appropriately and the bias current is adjusted, the sensitivity of the inverter circuit to changes in the power supply voltage may be reduced.
    Type: Grant
    Filed: November 10, 2011
    Date of Patent: December 10, 2013
    Assignee: NVIDIA Corporation
    Inventor: William James Dally
  • Publication number: 20130120047
    Abstract: One embodiment of the present invention sets forth a technique for reducing jitter caused by changes in a power supply for a clock generated by a ring oscillator of inverter devices. An inverter sub-circuit is coupled in parallel with a current-starved inverter sub-circuit to produce an inverter circuit that is insensitive to changes in the power supply voltage. When the ring oscillator is used as the voltage controlled oscillator of a phase locked loop, the delay of the inverters may be controlled by varying a bias current for each inverter in response to changes in the power supply voltage to reduce any jitter in a clock output produced by the changes in the power supply voltage. When the transistor devices are sized appropriately and the bias current is adjusted, the sensitivity of the inverter circuit to changes in the power supply voltage may be reduced.
    Type: Application
    Filed: November 10, 2011
    Publication date: May 16, 2013
    Inventor: William James Dally
  • Patent number: 8412917
    Abstract: Disclosed are methods and systems for dynamically determining data-transfer paths. The data-transfer paths are dynamically determined in response to an instruction that facilitates data transfer among execution lanes in an integrated-circuit processing device operable to execute operations in parallel. In addition, embodiments include an integrated-circuit processing device operable to execute operations in parallel, including the capability of providing confirmation information to potential source lanes, the confirmation information indicating whether the potential source lanes may send data to requested destination lanes during a data-transfer interval.
    Type: Grant
    Filed: September 20, 2011
    Date of Patent: April 2, 2013
    Assignee: Calos Fund Limited Liability Company
    Inventors: Brucek Khailany, William James Dally, Ujval J. Kapasi, Jim Jian Lin
  • Publication number: 20130021078
    Abstract: One embodiment of the present invention sets forth a technique for capturing and holding a level of an input signal using a latch circuit that presents few loads to the clock signal. The clock is coupled only to a bridging transistor and a pair of clock-activated pull-down or pull-up transistors. The level of the input signal is propagated to the output signal when the storage sub-circuit is not enabled. The storage sub-circuit is enabled by the bridging transistor, and a propagation sub-circuit is activated and deactivated by the pair of clock-activated transistors.
    Type: Application
    Filed: July 21, 2011
    Publication date: January 24, 2013
    Inventors: Ilyas Elkin, William James Dally, Jonah M. Alben
  • Publication number: 20120079201
    Abstract: One embodiment of the present invention sets forth an extension to a cache coherence protocol with two explicit control states, P (private) and R (read-only), that provide explicit program control of cache lines for which the program logic can guarantee correct behavior. In the private state, only the owner of a cache line can access the cache line for read or write operations. In the read-only state, only read operations can be performed on the cache line; write operations are disallowed.
    Type: Application
    Filed: September 23, 2011
    Publication date: March 29, 2012
    Inventor: William James Dally
  • Publication number: 20120079200
    Abstract: One embodiment of the present invention sets forth a technique for providing a unified memory for access by execution threads in a processing system. Several logically separate memories are combined into a single unified memory that includes a single set of shared memory banks, an allocation of space in each bank across the logical memories, a mapping rule that maps the address space of each logical memory to its partition of the shared physical memory, circuitry including switches and multiplexers that supports the mapping, and an arbitration scheme that allocates access to the banks.
    Type: Application
    Filed: September 22, 2011
    Publication date: March 29, 2012
    Inventor: William James Dally
  • Publication number: 20120075319
    Abstract: One embodiment of the present invention sets forth a technique for addressing data in a hierarchical graphics processing unit cluster. A hierarchical address is constructed based on the location of a storage circuit where a target unit of data resides. The hierarchical address comprises a level field indicating a hierarchical level for the unit of data and a node identifier that indicates which GPU within the GPU cluster currently stores the unit of data. The hierarchical address may further comprise one or more identifiers that indicate which storage circuit in a particular hierarchical level currently stores the unit of data. The hierarchical address is constructed and interpreted based on the level field. The technique advantageously enables programs executing within the GPU cluster to efficiently access data residing in other GPUs using the hierarchical address.
    Type: Application
    Filed: September 23, 2011
    Publication date: March 29, 2012
    Inventor: William James Dally
  • Publication number: 20120079503
    Abstract: One embodiment of the present invention sets forth a technique for scheduling thread execution in a multi-threaded processing environment. A two-level scheduler maintains a small set of active threads called strands to hide function unit pipeline latency and local memory access latency. The strands are a subset of a larger set of pending threads that is also maintained by the two-level scheduler. Pending threads are promoted to strands and strands are demoted to pending threads based on latency characteristics. The two-level scheduler selects strands for execution based on strand state. The longer latency of the pending threads is hidden by selecting strands for execution. When the latency for a pending thread has expired, the pending thread may be promoted to a strand and begin (or resume) execution. When a strand encounters a latency event, the strand may be demoted to a pending thread while the latency is incurred.
    Type: Application
    Filed: June 1, 2011
    Publication date: March 29, 2012
    Inventors: William James Dally, Stephen William Keckler, David Tarjan, John Erik Lindholm, Mark Alan Gebhart, Daniel Robert Johnson
  • Publication number: 20120079241
    Abstract: One embodiment of the present invention sets forth a technique for scheduling thread execution in a multi-threaded processing environment. A two-level scheduler maintains a small set of active threads called strands to hide function unit pipeline latency and local memory access latency. The strands are a subset of a larger set of pending threads that is also maintained by the two-level scheduler. Pending threads are promoted to strands and strands are demoted to pending threads based on latency characteristics, such as whether outstanding load operations have been executed. The longer latency of the pending threads is hidden by selecting strands for execution. When the latency for a pending thread has expired, the pending thread may be promoted to a strand and begin (or resume) execution. When a strand encounters a latency event, the strand may be demoted to a pending thread while the latency is incurred.
    Type: Application
    Filed: September 23, 2011
    Publication date: March 29, 2012
    Inventors: William James Dally, John Erik Lindholm
  • Patent number: 8122078
    Abstract: A method of operation within an integrated-circuit processing device having an enhanced combined-arithmetic capability. In response to an instruction indicating a combined arithmetic operation, the processor generates a dot-product of first and second operands, adds the dot-product to an accumulated value, and then outputs the sum of the accumulated value and the dot-product.
    Type: Grant
    Filed: October 9, 2007
    Date of Patent: February 21, 2012
    Assignee: Calos Fund, LLC
    Inventors: Brucek Khailany, William James Dally, Raghunath Rao, DeForest Tovey
  • Publication number: 20120011349
    Abstract: Disclosed are methods and systems for dynamically determining data-transfer paths. The data-transfer paths are determined in response to an instruction that facilitates data transfer among execution lanes in an integrated-circuit processing device operable to execute operations in parallel.
    Type: Application
    Filed: September 20, 2011
    Publication date: January 12, 2012
    Applicant: Calos Fund Limited Liability Company
    Inventors: Brucek Khailany, William James Dally, Ujval J. Kapasi, Jim Jian Lin, Raghunath Rao, DeForest Tovey, Mark Rygh, Jung-Ho Ahn
  • Patent number: 8024553
    Abstract: A method of operation within an integrated-circuit processing device having a plurality of execution lanes. Upon receiving an instruction to exchange data between the execution lanes, respective requests from the execution lanes are examined to determine a set of the execution lanes that may send data to one or more others of the execution lanes during a first interval. Each execution lane within the set of the execution lanes is signaled to indicate that the execution lane may send data to the one or more others of the execution lanes.
    Type: Grant
    Filed: August 15, 2008
    Date of Patent: September 20, 2011
    Assignee: Calos Fund Limited Liability Company
    Inventors: Brucek Khailany, William James Dally, Ujval J. Kapasi, Jim Jian Lin
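
The two-level scheduling idea in Patent 8732711 (also published as applications 20120079503 and 20120079241) can be illustrated with a small cycle-level simulation. The Python below is a hypothetical sketch, not the patented implementation. The `Thread` and `TwoLevelScheduler` names, the model of a program as a list of per-instruction latencies, the `LONG_LATENCY` demotion threshold, and the round-robin selection among strands are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Thread:
    tid: int
    # Hypothetical model: each entry is the latency (in cycles) of one instruction.
    program: list
    # Cycle at which this thread's most recently issued instruction completes.
    ready_at: int = 0


class TwoLevelScheduler:
    """Sketch: a small active set of strands drawn from a larger pending pool."""

    LONG_LATENCY = 10  # assumed threshold for a latency event that triggers demotion

    def __init__(self, threads, max_strands=2):
        self.pending = deque(threads)  # larger set of pending threads
        self.strands = []              # small set of active threads (strands)
        self.max_strands = max_strands
        self.trace = []                # (cycle, tid) for every issued instruction

    def step(self, cycle):
        # Promote pending threads whose long-latency event has expired.
        for t in list(self.pending):
            if len(self.strands) >= self.max_strands:
                break
            if t.ready_at <= cycle:
                self.pending.remove(t)
                self.strands.append(t)
        # Issue one instruction from the first ready strand.
        for t in self.strands:
            if t.ready_at <= cycle:
                latency = t.program.pop(0)
                self.trace.append((cycle, t.tid))
                t.ready_at = cycle + latency
                self.strands.remove(t)  # safe: we return before iterating further
                if not t.program:
                    return  # thread finished; drop it
                if latency >= self.LONG_LATENCY:
                    self.pending.append(t)  # demote while the long latency is incurred
                else:
                    self.strands.append(t)  # short latency: stay a strand, rotate to back
                return

    def run(self):
        """Advance one cycle at a time until every thread has issued all instructions."""
        cycle = 0
        while self.pending or self.strands:
            self.step(cycle)
            cycle += 1
        return cycle
```

With three threads whose programs are `[1, 20, 1]`, `[1, 1]` and `[1]` and two strand slots, thread 0 issues its 20-cycle instruction at cycle 2 and is demoted; threads 1 and 2 then issue at cycles 3 and 4 while thread 0's latency is outstanding, and thread 0 is promoted again at cycle 22.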
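
The lane-exchange arbitration in Patent 8024553 (and the related Patent 8412917 and application 20120011349) examines requests from execution lanes and confirms which source lanes may send during a data-transfer interval. The Python below is a hypothetical sketch of that idea. It assumes each source requests a single destination lane, each destination accepts at most one source per interval, and conflicts are broken by a rotating priority; the function names and these policies are illustrative assumptions, and the patents themselves allow a lane to send to one or more others.

```python
def grant_transfers(requests, start=0):
    """Return one confirmation flag per source lane for a single transfer interval.

    requests[i] is the destination lane requested by source lane i, or None if
    lane i has nothing to send. Assumed policy: each destination accepts at most
    one source per interval, with rotating priority beginning at lane `start`.
    """
    n = len(requests)
    claimed = set()          # destinations already granted this interval
    grants = [False] * n
    for k in range(n):
        src = (start + k) % n
        dst = requests[src]
        if dst is None or dst in claimed:
            continue
        claimed.add(dst)
        grants[src] = True   # confirmation: this source lane may send now
    return grants


def schedule_exchange(requests):
    """Repeat intervals until every request is confirmed; return the senders per interval."""
    pending = list(requests)
    intervals = []
    start = 0
    while any(dst is not None for dst in pending):
        grants = grant_transfers(pending, start)
        intervals.append([i for i, ok in enumerate(grants) if ok])
        for i, ok in enumerate(grants):
            if ok:
                pending[i] = None
        start = (start + 1) % len(pending)  # rotate priority for fairness
    return intervals
```

For requests `[2, 2, 0, None]`, lanes 0 and 2 are confirmed in the first interval and lane 1, which lost the conflict for destination 2, is confirmed in the second.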
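
The combined arithmetic operation in Patent 8122078 produces a dot product of two operands and adds it to an accumulated value in one instruction. A minimal Python model follows, assuming operands of `LANES` elements each; the operand width, the function names, and the chunked long-dot-product loop are hypothetical choices for illustration, not details taken from the patent.

```python
LANES = 4  # assumed number of elements per operand (hypothetical)


def dot_accumulate(a, b, acc):
    """Model of one combined instruction: return acc + dot(a, b)."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    return acc + dot


def long_dot(x, y):
    """Compute a long dot product by chaining the combined instruction over LANES-wide chunks."""
    acc = 0
    for i in range(0, len(x), LANES):
        acc = dot_accumulate(x[i:i + LANES], y[i:i + LANES], acc)
    return acc
```

Chaining the accumulator through successive calls is what lets a single fused instruction build up a dot product longer than one operand.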
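
The P (private) and R (read-only) states in application 20120079201 restrict who may access a cache line and how. The Python below is a hypothetical sketch of only the access check those two states imply. `LineState`, `access_allowed`, and the single `COHERENT` placeholder standing in for the ordinary protocol states are illustrative assumptions; how the underlying protocol handles those lines is not modelled.

```python
from enum import Enum


class LineState(Enum):
    COHERENT = "C"   # placeholder for the ordinary protocol states (not modelled)
    PRIVATE = "P"    # only the owner may read or write the line
    READ_ONLY = "R"  # any requester may read; writes are disallowed


def access_allowed(state, owner, requester, is_write):
    """Return whether an access is permitted under the explicit P and R states.

    Lines in the ordinary states are assumed to be handled by the underlying
    coherence protocol, so this sketch simply permits them.
    """
    if state is LineState.PRIVATE:
        return requester == owner
    if state is LineState.READ_ONLY:
        return not is_write
    return True
```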
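
Application 20120079200 describes combining logical memories into one set of shared banks, with a space allocation in each bank and a mapping rule from each logical address space to its partition. The Python below sketches one possible mapping rule under stated assumptions: four banks, word interleaving across banks, and each logical memory owning a fixed range of rows in every bank. The bank count, the logical memory names (`shared`, `l1`, `registers`) and their sizes are hypothetical, and the switch, multiplexer, and arbitration hardware is not modelled.

```python
from dataclasses import dataclass

NUM_BANKS = 4  # assumed number of shared physical banks


@dataclass(frozen=True)
class Partition:
    base_row: int  # first row of this logical memory in every bank
    num_rows: int  # rows allotted to this logical memory in every bank


# Hypothetical allocation of each bank's 128 rows across three logical memories.
PARTITIONS = {
    "shared": Partition(base_row=0, num_rows=64),
    "l1": Partition(base_row=64, num_rows=32),
    "registers": Partition(base_row=96, num_rows=32),
}


def map_address(memory, addr):
    """Map a word address in a logical memory to (bank, row) in the shared banks.

    Assumed mapping rule: consecutive words are interleaved across banks, and
    each logical memory occupies its own range of rows in every bank.
    """
    part = PARTITIONS[memory]
    bank, offset = addr % NUM_BANKS, addr // NUM_BANKS
    if offset >= part.num_rows:
        raise IndexError(f"address {addr} is outside logical memory {memory!r}")
    return bank, part.base_row + offset
```

Interleaving consecutive words across banks lets neighbouring addresses from different threads land in different banks, while the per-memory row ranges keep the logical memories from overlapping.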