Patents by Inventor William J. Dally
William J. Dally has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11847550
Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.
Type: Grant
Filed: December 4, 2020
Date of Patent: December 19, 2023
Assignee: NVIDIA Corporation
Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
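The coordinate arithmetic in this abstract can be illustrated with a minimal Python sketch. It is not the patented instruction. It assumes each index vector holds row-major linear positions of the non-zero elements, that every first coordinate set is summed with every second coordinate set, and that the output array is row-major; the names decode_index_vector, sum_and_linearize, and out_width are hypothetical.

```python
def decode_index_vector(index_vector, width):
    """Decode linear positions of non-zero elements into (row, col) coordinate sets.

    Assumption: the index vector stores row-major linear positions.
    """
    return [divmod(position, width) for position in index_vector]


def sum_and_linearize(first_coords, second_coords, out_width):
    """Sum every first coordinate set with every second one, then linearize.

    Assumption: all pairs are combined and the output array is row-major.
    """
    linear_indices = []
    for r1, c1 in first_coords:
        for r2, c2 in second_coords:
            row, col = r1 + r2, c1 + c2
            linear_indices.append(row * out_width + col)
    return linear_indices


first = decode_index_vector([0, 3], width=2)   # [(0, 0), (1, 1)]
second = decode_index_vector([1, 5], width=3)  # [(0, 1), (1, 2)]
print(sum_and_linearize(first, second, out_width=4))  # [1, 6, 6, 11]
```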
-
Patent number: 11226820
Abstract: Systems and methods for managing context switches among threads in a processing system. A processor may perform a context switch between threads using separate context registers. A context switch allows a processor to switch from processing a thread that is waiting for data to one that is ready for additional processing. The processor includes control registers with entries which may indicate that an associated context is waiting for data from an external source.
Type: Grant
Filed: November 23, 2016
Date of Patent: January 18, 2022
Assignee: ARM Finance Overseas Limited
Inventors: Robert Gelinas, W. Patrick Hays, Sol Katzman, William J. Dally
-
Patent number: 10997496
Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. Compressed-sparse data is received for input to a processing element, wherein the compressed-sparse data encodes non-zero elements and corresponding multi-dimensional positions. The non-zero elements are processed in parallel by the processing element to produce a plurality of result values. The corresponding multi-dimensional positions are processed in parallel by the processing element to produce destination addresses for each result value in the plurality of result values. Each result value is transmitted to a destination accumulator associated with the destination address for the result value.
Type: Grant
Filed: March 14, 2017
Date of Patent: May 4, 2021
Assignee: NVIDIA Corporation
Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
-
Publication number: 20210089864
Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.
Type: Application
Filed: December 4, 2020
Publication date: March 25, 2021
Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
-
Patent number: 10891538
Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.
Type: Grant
Filed: July 25, 2017
Date of Patent: January 12, 2021
Assignee: NVIDIA Corporation
Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
-
Patent number: 10860922
Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.
Type: Grant
Filed: November 18, 2019
Date of Patent: December 8, 2020
Assignee: NVIDIA Corporation
Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
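The multiply-then-scatter flow in this abstract can be sketched in software as follows. This is an illustrative model, not the accelerator itself. It assumes weight positions are (k, r, s) triples and activation positions are (x, y) pairs, that positions are "combined" by simple addition of the spatial coordinates, and that a dictionary stands in for the accumulator array; all of these names and choices are hypothetical.

```python
from collections import defaultdict
from itertools import product


def sparse_multiply_accumulate(weights, activations):
    """Multiply all non-zero weights by all non-zero activations and accumulate.

    weights:     list of (value, (k, r, s)) with positions in a 3D space
    activations: list of (value, (x, y)) with positions in a 2D space
    Assumption: the output position is (k, x + r, y + s).
    """
    accumulators = defaultdict(float)  # stands in for the accumulator array
    for (w, (k, r, s)), (a, (x, y)) in product(weights, activations):
        position = (k, x + r, y + s)    # one entry of the "fourth vector"
        accumulators[position] += w * a  # route the product to its adder
    return dict(accumulators)


weights = [(2.0, (0, 0, 0)), (3.0, (0, 0, 1))]
activations = [(1.0, (0, 0)), (4.0, (0, 1))]
print(sparse_multiply_accumulate(weights, activations))
# {(0, 0, 0): 2.0, (0, 0, 1): 11.0, (0, 0, 2): 12.0}
```

Because only non-zero operands appear in the two input vectors, no multiplication is spent on a zero; products that land on the same position are summed by the same accumulator.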
-
Publication number: 20200082254
Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.
Type: Application
Filed: November 18, 2019
Publication date: March 12, 2020
Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
-
Patent number: 10528864
Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.
Type: Grant
Filed: March 14, 2017
Date of Patent: January 7, 2020
Assignee: NVIDIA Corporation
Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
-
Patent number: 10505451
Abstract: A system and method are provided for controlling a modified buck converter circuit. A pull-up switching mechanism that is coupled to an upstream terminal of an inductor within a modified buck converter circuit is enabled. A load current at the output of the modified buck regulator circuit is measured. A capacitor current associated with a capacitor that is coupled to a downstream terminal of the inductor is continuously sensed and the pull-up switching mechanism is disabled when the capacitor current is greater than a sum of the load current and an enabling current value.
Type: Grant
Filed: January 15, 2019
Date of Patent: December 10, 2019
Assignee: NVIDIA Corporation
Inventors: Sudhir Shrikantha Kudva, William J. Dally, Thomas Hastings Greer, III, Carl Thomas Gray
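The disable condition stated in this abstract reduces to a single comparison, which the following Python sketch models over a series of sensed samples. It is only a behavioral illustration under stated assumptions: the capacitor current is represented as a list of discrete samples rather than a continuous signal, and the function name pull_up_disable_index and its parameters are hypothetical.

```python
def pull_up_disable_index(capacitor_currents, load_current, enabling_current):
    """Return the index of the first sample at which the pull-up is disabled.

    The pull-up switch is assumed enabled at the start. It is disabled when the
    sensed capacitor current exceeds load_current + enabling_current.
    Returns None if the condition is never met.
    """
    threshold = load_current + enabling_current
    for index, capacitor_current in enumerate(capacitor_currents):
        if capacitor_current > threshold:
            return index
    return None


# Load of 1.0 A plus an enabling current of 0.5 A gives a 1.5 A threshold.
print(pull_up_disable_index([0.2, 0.9, 1.6, 2.0], load_current=1.0, enabling_current=0.5))  # 2
```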
-
Patent number: 10361023
Abstract: A magnetic power supply coupling system is disclosed. An integrated circuit module includes an integrated circuit die and a secondary winding that is configured to generate an induced, alternating current based on a magnetic flux. A primary winding is external to the integrated circuit module, proximate to the integrated circuit module, and coupled to a main power supply corresponding to an alternating current that generates the magnetic flux. The induced, alternating current is converted into a direct current at a voltage level to supply power to the integrated circuit die.
Type: Grant
Filed: July 31, 2015
Date of Patent: July 23, 2019
Assignee: NVIDIA Corporation
Inventors: William J. Dally, Thomas Hastings Greer, III, Sudhir Shrikantha Kudva
-
Publication number: 20190173380
Abstract: A system and method are provided for controlling a modified buck converter circuit. A pull-up switching mechanism that is coupled to an upstream terminal of an inductor within a modified buck converter circuit is enabled. A load current at the output of the modified buck regulator circuit is measured. A capacitor current associated with a capacitor that is coupled to a downstream terminal of the inductor is continuously sensed and the pull-up switching mechanism is disabled when the capacitor current is greater than a sum of the load current and an enabling current value.
Type: Application
Filed: January 15, 2019
Publication date: June 6, 2019
Inventors: Sudhir Shrikantha Kudva, William J. Dally, Thomas Hastings Greer, III, Carl Thomas Gray
-
Patent number: 10224813
Abstract: A system and method are provided for controlling a modified buck converter circuit. A pull-up switching mechanism that is coupled to an upstream terminal of an inductor within a modified buck converter circuit is enabled. A load current at the output of the modified buck regulator circuit is measured. A capacitor current associated with a capacitor that is coupled to a downstream terminal of the inductor is continuously sensed and the pull-up switching mechanism is disabled when the capacitor current is greater than a sum of the load current and an enabling current value.
Type: Grant
Filed: March 24, 2016
Date of Patent: March 5, 2019
Assignee: NVIDIA Corporation
Inventors: Sudhir Shrikantha Kudva, William J. Dally, Thomas Hastings Greer, III, Carl Thomas Gray
-
Patent number: 10153985
Abstract: A multiprocessor computer system comprises a dragonfly processor interconnect network that comprises a plurality of processor nodes, a plurality of routers, each router directly coupled to a plurality of terminal nodes, the routers coupled to one another and arranged into a group, and a plurality of groups of routers, such that each group is connected to each other group via at least one direct connection.
Type: Grant
Filed: February 17, 2017
Date of Patent: December 11, 2018
Assignees: Intel Corporation, The Board of Trustees of the Leland Stanford Junior University
Inventors: John Kim, Dennis C. Abts, Steven L. Scott, William J. Dally
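The group structure described in this abstract can be sketched as a small link-list generator. The sketch makes several assumptions that the abstract does not state: routers within a group are fully connected, exactly one global link joins each pair of groups, and the endpoints of that link are picked round-robin. The function build_dragonfly and the (group, router) naming are hypothetical, and terminal nodes are omitted.

```python
from itertools import combinations


def build_dragonfly(num_groups, routers_per_group):
    """Return (local_links, global_links) for a simplified dragonfly topology.

    Routers are named (group, index). Assumptions: all-to-all links inside a
    group, and one global link between every pair of groups.
    """
    local_links = []
    for group in range(num_groups):
        routers = [(group, r) for r in range(routers_per_group)]
        local_links.extend(combinations(routers, 2))

    global_links = []
    for g1, g2 in combinations(range(num_groups), 2):
        # Round-robin choice of which router in each group hosts the link.
        router_a = (g1, (g2 - 1) % routers_per_group)
        router_b = (g2, g1 % routers_per_group)
        global_links.append((router_a, router_b))
    return local_links, global_links


local_links, global_links = build_dragonfly(num_groups=3, routers_per_group=2)
print(len(local_links), len(global_links))  # 3 3
```

With one global link per pair of groups, any two routers are at most three hops apart in this sketch: a local hop, a global hop, and a second local hop.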
-
Patent number: 10128904
Abstract: A repeater circuit is disclosed. The repeater circuit is coupled to a transmission line driven by a first transmitter circuit and configured to detect a signal transition from a first voltage level to a second voltage level at a first position on the transmission line. The repeater circuit then reinforces the signal transition from the second voltage level to a third voltage level at the first position on the transmission line without interrupting a current through the transmission line.
Type: Grant
Filed: June 23, 2015
Date of Patent: November 13, 2018
Assignee: NVIDIA CORPORATION
Inventor: William J. Dally
-
Patent number: 10096134
Abstract: A method, computer program product, and system for sparse convolutional neural networks that improves efficiency is described. Multi-bit data for input to a processing element is received at a compaction engine. The multi-bit data is determined to equal zero and a single bit signal is transmitted from the memory interface to the processing element in lieu of the multi-bit data, where the single bit signal indicates that the multi-bit data equals zero. A compacted data sequence for input to a processing element is received by a memory interface. The compacted data sequence is transmitted from the memory interface to an expansion engine. Non-zero values are extracted from the compacted data sequence and zeros are inserted between the non-zero values by the expansion engine to generate an expanded data sequence that is output to the processing element.
Type: Grant
Filed: February 1, 2017
Date of Patent: October 9, 2018
Assignee: NVIDIA Corporation
Inventors: Zhou Yan, Franciscus Wilhelmus Sijstermans, Yuanzhi Hua, Xiaojun Wang, Jeffrey Michael Pool, William J. Dally, Liang Chen
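The compaction and expansion steps in this abstract can be illustrated with a short round-trip sketch. The abstract does not specify the compacted format, so the sketch assumes a hypothetical encoding in which each non-zero value is stored with the count of zeros that precede it, plus a separate count of trailing zeros. The names compact and expand are illustrative only.

```python
def compact(values):
    """Encode a sequence as (zeros_before, non_zero_value) pairs.

    Assumed format, not taken from the patent. Returns the pairs and the
    number of trailing zeros so the sequence can be rebuilt exactly.
    """
    pairs, zeros = [], 0
    for value in values:
        if value == 0:
            zeros += 1
        else:
            pairs.append((zeros, value))
            zeros = 0
    return pairs, zeros


def expand(pairs, trailing_zeros=0):
    """Rebuild the expanded sequence by re-inserting zeros between non-zero values."""
    expanded = []
    for zeros, value in pairs:
        expanded.extend([0] * zeros)
        expanded.append(value)
    expanded.extend([0] * trailing_zeros)
    return expanded


data = [0, 0, 5, 0, 7, 0]
pairs, trailing = compact(data)
print(pairs, trailing)          # [(2, 5), (1, 7)] 1
print(expand(pairs, trailing))  # [0, 0, 5, 0, 7, 0]
```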
-
Patent number: 10056862
Abstract: A method includes measuring one or more performance metrics of a set of solar cells coupled to an inverter. Based at least on the performance metrics meeting a first criterion, a first subset of the set of solar cells are disabled, reducing a voltage, power, or current provided to the inverter. Based at least on the performance metrics meeting a second criterion, a second subset of the set of solar cells are disabled, further reducing a voltage, power, or current provided to the inverter.
Type: Grant
Filed: May 26, 2015
Date of Patent: August 21, 2018
Assignee: SunPower Corporation
Inventors: Andrew J Ponec, Darren Hau, Benjamin A. Johnson, Daniel J. M. Maren, William J. Dally
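The two-stage disabling logic in this abstract can be sketched as a simple decision function. The abstract leaves the metrics and criteria open, so this sketch assumes a single measured voltage and two hypothetical thresholds, first_limit and second_limit, as the first and second criteria; the function name cells_to_disable is also hypothetical.

```python
def cells_to_disable(measured_voltage, first_limit, second_limit,
                     first_subset, second_subset):
    """Return the set of solar cells to disable for a measured voltage.

    Assumption: each criterion is "measured voltage exceeds a limit", and
    second_limit is at least first_limit so the stages apply in order.
    """
    disabled = set()
    if measured_voltage > first_limit:
        disabled |= set(first_subset)   # first reduction stage
    if measured_voltage > second_limit:
        disabled |= set(second_subset)  # further reduction stage
    return disabled


print(sorted(cells_to_disable(620, 600, 650, ["c1", "c2"], ["c3"])))  # ['c1', 'c2']
print(sorted(cells_to_disable(700, 600, 650, ["c1", "c2"], ["c3"])))  # ['c1', 'c2', 'c3']
```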
-
Publication number: 20180218518
Abstract: A method, computer program product, and system for sparse convolutional neural networks that improves efficiency is described. Multi-bit data for input to a processing element is received at a compaction engine. The multi-bit data is determined to equal zero and a single bit signal is transmitted from the memory interface to the processing element in lieu of the multi-bit data, where the single bit signal indicates that the multi-bit data equals zero. A compacted data sequence for input to a processing element is received by a memory interface. The compacted data sequence is transmitted from the memory interface to an expansion engine. Non-zero values are extracted from the compacted data sequence and zeros are inserted between the non-zero values by the expansion engine to generate an expanded data sequence that is output to the processing element.
Type: Application
Filed: February 1, 2017
Publication date: August 2, 2018
Inventors: Zhou Yan, Franciscus Wilhelmus Sijstermans, Yuanzhi Hua, Xiaojun Wang, Jeffrey Michael Pool, William J. Dally, Liang Chen
-
Patent number: 9928104
Abstract: A system, method, and computer program product are provided for accessing a queue. The method includes receiving a first request to reserve a data record entry in a queue, updating a queue state block based on the first request, and returning a response to the request. A second request is received to commit the data record entry and the queue state block is updated based on the second request.
Type: Grant
Filed: June 19, 2013
Date of Patent: March 27, 2018
Assignee: NVIDIA Corporation
Inventors: William J. Dally, James David Balfour, Ignacio Llamas Ubieto
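The reserve-then-commit protocol in this abstract can be sketched with a small single-threaded class. This is an illustrative model only: the contents of the queue state block are not given in the abstract, so the sketch assumes it holds a count of reserved entries and a count of committed entries, and the names ReservableQueue, reserve, and commit are hypothetical.

```python
class ReservableQueue:
    """Fixed-capacity queue whose entries are reserved first and committed later."""

    def __init__(self, capacity):
        self.entries = [None] * capacity
        # Assumed layout of the queue state block: two counters.
        self.state = {"reserved": 0, "committed": 0}

    def reserve(self):
        """First request: reserve an entry and return its index, or None if full."""
        if self.state["reserved"] >= len(self.entries):
            return None
        index = self.state["reserved"]
        self.state["reserved"] += 1
        return index

    def commit(self, index, record):
        """Second request: store the record in a reserved entry and mark it committed."""
        if not 0 <= index < self.state["reserved"]:
            raise ValueError("entry was not reserved")
        self.entries[index] = record
        self.state["committed"] += 1


queue = ReservableQueue(capacity=2)
slot = queue.reserve()
queue.commit(slot, "record A")
print(queue.state)  # {'reserved': 1, 'committed': 1}
```

Separating the two requests lets a producer claim a slot before its data is ready; a real concurrent implementation would need atomic updates to the state block, which this sketch does not attempt.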
-
Publication number: 20180046916
Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. Compressed-sparse data is received for input to a processing element, wherein the compressed-sparse data encodes non-zero elements and corresponding multi-dimensional positions. The non-zero elements are processed in parallel by the processing element to produce a plurality of result values. The corresponding multi-dimensional positions are processed in parallel by the processing element to produce destination addresses for each result value in the plurality of result values. Each result value is transmitted to a destination accumulator associated with the destination address for the result value.
Type: Application
Filed: March 14, 2017
Publication date: February 15, 2018
Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
-
Publication number: 20180046900
Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.
Type: Application
Filed: July 25, 2017
Publication date: February 15, 2018
Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison