Patents by Inventor Torsten HOEFLER
Torsten HOEFLER has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11886938
Abstract: One example provides an integrated computing device, comprising one or more computing clusters, and one or more network controllers, each network controller comprising a local data notification queue to queue send message notifications originating from the computing clusters on the integrated computing device, a remote data notification queue to queue receive message notifications originating from network controllers on remote integrated computing devices, a local no-data notification queue to queue receive message notifications originating from computing clusters on the integrated computing device, and a connection scheduler configured to schedule sending of data from memory on the integrated computing device when a send message notification in the local data notification queue is matched with a receive message notification in the remote data notification queue, and to schedule sending of receive message notifications from the local no-data notification queue.
Type: Grant
Filed: March 11, 2021
Date of Patent: January 30, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Deepak Goel, Mattheus C Heddes, Torsten Hoefler, Xiaoling Xu
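The queue-matching idea in this abstract can be pictured with a small software model. This is a minimal sketch under stated assumptions, not the patented hardware: the `conn_id` matching key, the `addr`/`size` fields, and the tuple-shaped "actions" are hypothetical stand-ins for whatever the controller actually uses. It shows the two scheduling rules: receive notifications from the local no-data queue are forwarded directly, while data is sent only once a local send notification finds a matching remote receive notification.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Notification:
    conn_id: int    # hypothetical connection/tag used to pair sends and receives
    addr: int = 0   # hypothetical local memory address of the payload
    size: int = 0   # payload size in bytes (0 for receive notifications)


class NetworkController:
    def __init__(self):
        self.local_data = deque()     # send notifications from local compute clusters
        self.remote_data = deque()    # receive notifications from remote controllers
        self.local_no_data = deque()  # receive notifications from local clusters

    def _take_matching_recv(self, conn_id):
        # Remove and return the oldest remote receive notification for conn_id
        for idx, recv in enumerate(self.remote_data):
            if recv.conn_id == conn_id:
                del self.remote_data[idx]
                return recv
        return None

    def schedule(self):
        """Return the transfers this (hypothetical) connection scheduler would issue."""
        actions = []
        # Receive notifications carry no payload, so they are forwarded right away
        while self.local_no_data:
            actions.append(("send_recv_notification", self.local_no_data.popleft().conn_id))
        # Data moves only when a local send is matched with a remote receive
        unmatched = deque()
        while self.local_data:
            send = self.local_data.popleft()
            if self._take_matching_recv(send.conn_id) is None:
                unmatched.append(send)  # keep waiting for the receiver to post
            else:
                actions.append(("send_data", send.conn_id, send.addr, send.size))
        self.local_data = unmatched
        return actions
```

For example, with sends queued for connections 1 and 2 but a remote receive posted only for connection 1, `schedule()` forwards any local receive notifications and sends the connection-1 data, leaving the connection-2 send queued until its receiver is ready.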
-
Publication number: 20230333739
Abstract: Embodiments of the present disclosure include a digital circuit and method for multi-stage compression. Digital data values are compressed using a multi-stage compression algorithm and stored in a memory. A decompression circuit receives the values and performs a partial decompression. The partially compressed values are provided to a processor, which performs the final decompression. In one embodiment, a vector of N length compressed values are decompressed using a first bit mask into two N length sets having non-zero values. The two N length sets are further decompressed using two M length bit masks into M length sparse vectors, each having non-zero values.
Type: Application
Filed: June 23, 2023
Publication date: October 19, 2023
Inventors: Mattheus C. HEDDES, Ankit MORE, Nishit SHAH, Torsten HOEFLER
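One possible reading of the two-stage bit-mask decompression can be sketched in a few lines. This is an illustrative interpretation, not the disclosed circuit: it assumes the first bit mask has length 2N and scatters the packed values into two N-length sets, and that each M-length mask has exactly N set bits so it can place one N-length set into an M-length sparse vector. The helper names `expand` and `two_stage_decompress` are hypothetical; stage 1 stands in for the decompression circuit and stage 2 for the processor's final step.

```python
def expand(values, mask):
    """Scatter values, in order, into positions where the mask bit is 1; zeros elsewhere."""
    if sum(mask) != len(values):
        raise ValueError("mask popcount must equal the number of values")
    it = iter(values)
    return [next(it) if bit else 0 for bit in mask]


def two_stage_decompress(compressed, mask1, mask_a, mask_b):
    """Assumed layout: mask1 has length 2N; mask_a and mask_b have length M with N ones."""
    if len(mask1) % 2:
        raise ValueError("first mask must have even length 2N")
    n = len(mask1) // 2
    # Stage 1 (partial decompression, e.g. in the decompression circuit)
    stage1 = expand(compressed, mask1)
    set_a, set_b = stage1[:n], stage1[n:]
    # Stage 2 (final decompression, e.g. on the processor)
    return expand(set_a, mask_a), expand(set_b, mask_b)
```

With `compressed = [5, 7, 9]`, `mask1 = [1, 0, 1, 0, 0, 1, 0, 0]` (N = 4), and two 6-bit masks (M = 6) each holding four ones, the call yields two 6-element sparse vectors. Splitting the work this way lets the memory hold only the packed values plus small masks.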
-
Patent number: 11720252
Abstract: Embodiments of the present disclosure include a digital circuit and method for multi-stage compression. Digital data values are compressed using a multi-stage compression algorithm and stored in a memory. A decompression circuit receives the values and performs a partial decompression. The partially compressed values are provided to a processor, which performs the final decompression. In one embodiment, a vector of N length compressed values are decompressed using a first bit mask into two N length sets having non-zero values. The two N length sets are further decompressed using two M length bit masks into M length sparse vectors, each having non-zero values.
Type: Grant
Filed: March 4, 2022
Date of Patent: August 8, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Mattheus C. Heddes, Ankit More, Nishit Shah, Torsten Hoefler
-
Patent number: 11580388
Abstract: Embodiments of the present disclosure include techniques for processing neural networks. Various forms of parallelism may be implemented using topology that combines sequences of processors. In one embodiment, the present disclosure includes a computer system comprising a plurality of processor groups, the processor groups each comprising a plurality of processors. A plurality of network switches are coupled to subsets of the plurality of processor groups. A subset of the processors in the processor groups may be configurable to form sequences, and the network switches are configurable to form at least one sequence across one or more of the plurality of processor groups to perform neural network computations. Various alternative configurations for creating Hamiltonian cycles are disclosed to support data parallelism, pipeline parallelism, layer parallelism, or combinations thereof.
Type: Grant
Filed: January 3, 2020
Date of Patent: February 14, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Torsten Hoefler, Mattheus C. Heddes, Deepak Goel, Jonathan R Belk
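To make "Hamiltonian cycle across processor groups" concrete, here is a minimal sketch under simplifying assumptions: each group's processors are already chained into a sequence, and switch links (assumed here) join the last processor of one group to the first of the next, closing the loop. The helpers `build_cycle`, `is_hamiltonian_cycle`, and `ring_sum` are hypothetical. `ring_sum` is a deliberately naive ring reduction (accumulate once around, then pass the total around again), not a bandwidth-optimal ring all-reduce; it only shows how a data-parallel gradient sum can travel along such a cycle.

```python
def build_cycle(groups):
    """Return a successor map chaining each group's sequence and closing the loop."""
    order = [p for group in groups for p in group]
    return {order[k]: order[(k + 1) % len(order)] for k in range(len(order))}


def is_hamiltonian_cycle(successor):
    """True if following successor links visits every processor exactly once."""
    start = next(iter(successor))
    seen, p = set(), start
    while p not in seen:
        seen.add(p)
        p = successor[p]
    return p == start and len(seen) == len(successor)


def ring_sum(values, successor):
    """Naive ring reduction: accumulate once around the cycle, then broadcast the total."""
    start = next(iter(successor))
    total, p = 0, start
    for _ in range(len(successor)):
        total += values[p]
        p = successor[p]
    result = {}
    for _ in range(len(successor)):  # p is back at start here
        result[p] = total
        p = successor[p]
    return result
```

For two groups `[["a0", "a1"], ["b0", "b1"]]`, the cycle runs a0 → a1 → b0 → b1 → a0, and every processor ends up holding the same sum of the local values.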
-
Publication number: 20220291976
Abstract: One example provides an integrated computing device, comprising one or more computing clusters, and one or more network controllers, each network controller comprising a local data notification queue to queue send message notifications originating from the computing clusters on the integrated computing device, a remote data notification queue to queue receive message notifications originating from network controllers on remote integrated computing devices, a local no-data notification queue to queue receive message notifications originating from computing clusters on the integrated computing device, and a connection scheduler configured to schedule sending of data from memory on the integrated computing device when a send message notification in the local data notification queue is matched with a receive message notification in the remote data notification queue, and to schedule sending of receive message notifications from the local no-data notification queue.
Type: Application
Filed: March 11, 2021
Publication date: September 15, 2022
Applicant: Microsoft Technology Licensing, LLC
Inventors: Deepak GOEL, Mattheus C. HEDDES, Torsten HOEFLER, Xiaoling XU
-
Publication number: 20220244911
Abstract: The present disclosure includes digital circuits that generate values of a power of two (2) raised to an input value. For example, a digital circuit may include combinational logic that receives first digital bits representing an input mantissa of an input value and second digital bits representing an input exponent of the input value. The combinational logic generates a plurality of output mantissas and plurality of output exponents corresponding to an approximate value of a power of two (2) raised to a power of the input value when the input value is positive and negative and when the input exponent is above and below a first value. Selection circuits are configured to receive output mantissas and output exponents. The selection circuits include selection control inputs coupled to the input exponent and an input sign bit of the input value to select one of the output mantissas and one output exponents.
Type: Application
Filed: January 29, 2021
Publication date: August 4, 2022
Inventors: Torsten Hoefler, Mattheus C Heddes
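The general idea of producing 2 raised to an input as an output mantissa and exponent can be illustrated in software. This sketch swaps in a common textbook technique and is not the disclosed combinational logic: it splits x into an integer part i and a fraction f in [0, 1), uses i as the output exponent, and approximates the mantissa with the linear formula 2**f ≈ 1 + f. Because `math.floor` rounds toward negative infinity, positive and negative inputs share one path, whereas the abstract describes separate candidate outputs chosen by the sign bit and an exponent threshold. The function names are hypothetical.

```python
import math


def pow2_approx(x):
    """Return (mantissa, exponent) such that 2**x is approximately mantissa * 2**exponent.

    Uses the linear approximation 2**f ~= 1 + f for the fractional part f.
    """
    i = math.floor(x)  # integer part becomes the output exponent
    f = x - i          # fraction in [0, 1), also for negative x (floor rounds down)
    return 1.0 + f, i


def pow2_value(x):
    mantissa, exponent = pow2_approx(x)
    return math.ldexp(mantissa, exponent)  # mantissa * 2**exponent
```

For example, `pow2_approx(-1.5)` returns `(1.5, -2)`, so `pow2_value(-1.5)` is 0.375 against an exact value of about 0.354. The linear mantissa keeps the relative error within roughly 6.2% and is exact at integer inputs; a hardware design would likely refine the mantissa further.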
-
Publication number: 20220138524
Abstract: Embodiments of the present disclosure include systems and methods for training neural networks based on dual pipeline architectures. In some embodiments, a first set of compute elements are configured to implement a first set of layers of a first instance of a neural network. A second set of compute elements are configured to implement a second set of layers of the first instance of the neural network. The second set of compute elements are further configured to implement a first set of layers of a second instance of the neural network. The first set of compute elements are further configured to implement a second set of layers of the second instance of the neural network. The first set of layers of the first instance of the neural network and the first set of layers of the second instance of the neural network are each configured to receive training data.
Type: Application
Filed: January 15, 2021
Publication date: May 5, 2022
Inventors: Mattheus HEDDES, Torsten HOEFLER, Kenneth Andrew COLWELL, Amar PHANISHAYEE
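The dual-pipeline layout in this abstract amounts to placing two model instances on the same compute elements in opposite orders. The sketch below records only that placement and does not model training. It generalizes the abstract's two sets of compute elements to `num_devices` pipeline stages, which is an assumption, and `dual_pipeline_placement` and `device_path` are hypothetical names. Instance 1 runs its stages on devices 0, 1, …, while instance 2 runs them in reverse, so each device hosts one stage of each instance and the two first stages, which receive training data, sit on different devices.

```python
def dual_pipeline_placement(num_devices):
    """Map (instance, stage) -> device for two instances running in opposite directions."""
    placement = {}
    for stage in range(num_devices):
        placement[("instance1", stage)] = stage                    # forward direction
        placement[("instance2", stage)] = num_devices - 1 - stage  # reverse direction
    return placement


def device_path(placement, instance, num_stages):
    """Devices visited by a micro-batch of `instance`, from its first stage onward."""
    return [placement[(instance, s)] for s in range(num_stages)]
```

With two devices this reproduces the abstract's arrangement: the first compute set holds the first layers of instance 1 and the second layers of instance 2, and the second set holds the reverse.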
-
Patent number: 11076210
Abstract: Embodiments of the present disclosure include techniques for processing neural networks. Various forms of parallelism may be implemented using topology that combines sequences of processors. In one embodiment, the present disclosure includes a computer system comprising one or more processor groups, the processor groups each comprising a plurality of processors. A plurality of network switches are coupled to subsets of the plurality of processor groups. In one embodiment, the switches may be optical network switches. Processors in the processor groups may be configurable to form sequences, and the network switches are configurable to form at least one sequence across one or more of the plurality of processor groups to perform neural network computations. Various alternative configurations for creating Hamiltonian cycles are disclosed to support data parallelism, pipeline parallelism, layer parallelism, or combinations thereof.
Type: Grant
Filed: May 26, 2020
Date of Patent: July 27, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Torsten Hoefler, Mattheus C. Heddes, Jonathan R. Belk
-
Publication number: 20210209460
Abstract: Embodiments of the present disclosure include techniques for processing neural networks. Various forms of parallelism may be implemented using topology that combines sequences of processors. In one embodiment, the present disclosure includes a computer system comprising a plurality of processor groups, the processor groups each comprising a plurality of processors. A plurality of network switches are coupled to subsets of the plurality of processor groups. A subset of the processors in the processor groups may be configurable to form sequences, and the network switches are configurable to form at least one sequence across one or more of the plurality of processor groups to perform neural network computations. Various alternative configurations for creating Hamiltonian cycles are disclosed to support data parallelism, pipeline parallelism, layer parallelism, or combinations thereof.
Type: Application
Filed: January 3, 2020
Publication date: July 8, 2021
Inventors: Torsten HOEFLER, Mattheus C. HEDDES, Deepak GOEL, Jonathan R. BELK
-
Publication number: 20210211787
Abstract: Embodiments of the present disclosure include techniques for processing neural networks. Various forms of parallelism may be implemented using topology that combines sequences of processors. In one embodiment, the present disclosure includes a computer system comprising one or more processor groups, the processor groups each comprising a plurality of processors. A plurality of network switches are coupled to subsets of the plurality of processor groups. In one embodiment, the switches may be optical network switches. Processors in the processor groups may be configurable to form sequences, and the network switches are configurable to form at least one sequence across one or more of the plurality of processor groups to perform neural network computations. Various alternative configurations for creating Hamiltonian cycles are disclosed to support data parallelism, pipeline parallelism, layer parallelism, or combinations thereof.
Type: Application
Filed: May 26, 2020
Publication date: July 8, 2021
Inventors: Torsten HOEFLER, Mattheus C. HEDDES, Jonathan R. BELK