Patents by Inventor Mattheus C. HEDDES

Mattheus C. HEDDES has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11942970
    Abstract: Embodiments of the present disclosure include techniques for compressing data using a tree-encoded bit mask that may result in higher compression ratios. In one embodiment, an input vector having a plurality of values is received by a first plurality of switch circuits. Selection of the input values is controlled by sets of bits from the bit mask. The sets of bits specify locations of portions of the input vector where particular values of interest reside. The switch circuits output multiple values of the input vector, which include the particular values of interest. A second stage of switch circuits is controlled by a logic circuit that detects values on the outputs of the first stage of switch circuits and outputs the values of interest. In some embodiments, the values of interest may be non-zero values of a sparse input vector, and the switch circuits may be multiplexers.
    Type: Grant
    Filed: March 4, 2022
    Date of Patent: March 26, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Nishit Shah, Ankit More, Mattheus C. Heddes
  • Patent number: 11886938
    Abstract: One example provides an integrated computing device, comprising one or more computing clusters, and one or more network controllers, each network controller comprising a local data notification queue to queue send message notifications originating from the computing clusters on the integrated computing device, a remote data notification queue to queue receive message notifications originating from network controllers on remote integrated computing devices, a local no-data notification queue to queue receive message notifications originating from computing clusters on the integrated computing device, and a connection scheduler configured to schedule sending of data from memory on the integrated computing device when a send message notification in the local data notification queue is matched with a receive message notification in the remote data notification queue, and to schedule sending of receive message notifications from the local no-data notification queue.
    Type: Grant
    Filed: March 11, 2021
    Date of Patent: January 30, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Deepak Goel, Mattheus C. Heddes, Torsten Hoefler, Xiaoling Xu
  • Patent number: 11848689
    Abstract: Embodiments of the present disclosure include a digital circuit and method for compressing input digital values. A plurality of input digital values may include zero values and non-zero values. The input digital values are received on M inputs of a first switching stage. The first switching stage is arranged in groups that rearrange the non-zero values on first switching stage outputs according to a compression and shift. The compression and shift position the non-zero values on outputs coupled to inputs of a second switching stage. The second switching stage consecutively couples non-zero values to N outputs, where N is less than M.
    Type: Grant
    Filed: March 4, 2022
    Date of Patent: December 19, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Ankit More, Mattheus C. Heddes, Nishit Shah
  • Publication number: 20230334284
    Abstract: Embodiments of the present disclosure include systems and methods for sparsifying vectors for neural network models based on overlapping windows. A window is used to select a first set of elements in a vector of elements. A first element is selected from the first set of elements having the highest absolute value. The window is slid along the vector by a defined number of elements. The window is used to select a second set of elements in the vector, wherein the first set of elements and the second set of elements share at least one common element. A second element is selected from the second set of elements having the highest absolute value.
    Type: Application
    Filed: May 27, 2022
    Publication date: October 19, 2023
    Inventors: Girish Vishnu VARATKAR, Ankit MORE, Bita DARVISH ROUHANI, Mattheus C. HEDDES, Gaurav AGRAWAL
  • Publication number: 20230333739
    Abstract: Embodiments of the present disclosure include a digital circuit and method for multi-stage compression. Digital data values are compressed using a multi-stage compression algorithm and stored in a memory. A decompression circuit receives the values and performs a partial decompression. The partially compressed values are provided to a processor, which performs the final decompression. In one embodiment, a vector of N length compressed values is decompressed using a first bit mask into two N length sets having non-zero values. The two N length sets are further decompressed using two M length bit masks into M length sparse vectors, each having non-zero values.
    Type: Application
    Filed: June 23, 2023
    Publication date: October 19, 2023
    Inventors: Mattheus C. HEDDES, Ankit MORE, Nishit SHAH, Torsten HOEFLER
  • Publication number: 20230318620
    Abstract: Embodiments of the present disclosure include a digital circuit and method for compressing input digital values. A plurality of input digital values may include zero values and non-zero values. The input digital values are received on M inputs of a first switching stage. The first switching stage is arranged in groups that rearrange the non-zero values on first switching stage outputs according to a compression and shift. The compression and shift position the non-zero values on outputs coupled to inputs of a second switching stage. The second switching stage consecutively couples non-zero values to N outputs, where N is less than M.
    Type: Application
    Filed: March 4, 2022
    Publication date: October 5, 2023
    Inventors: Ankit MORE, Mattheus C. HEDDES, Nishit SHAH
  • Publication number: 20230283296
    Abstract: Embodiments of the present disclosure include techniques for compressing data using a tree-encoded bit mask that may result in higher compression ratios. In one embodiment, an input vector having a plurality of values is received by a first plurality of switch circuits. Selection of the input values is controlled by sets of bits from the bit mask. The sets of bits specify locations of portions of the input vector where particular values of interest reside. The switch circuits output multiple values of the input vector, which include the particular values of interest. A second stage of switch circuits is controlled by a logic circuit that detects values on the outputs of the first stage of switch circuits and outputs the values of interest. In some embodiments, the values of interest may be non-zero values of a sparse input vector, and the switch circuits may be multiplexers.
    Type: Application
    Filed: March 4, 2022
    Publication date: September 7, 2023
    Inventors: Nishit SHAH, Ankit MORE, Mattheus C. HEDDES
  • Patent number: 11720252
    Abstract: Embodiments of the present disclosure include a digital circuit and method for multi-stage compression. Digital data values are compressed using a multi-stage compression algorithm and stored in a memory. A decompression circuit receives the values and performs a partial decompression. The partially compressed values are provided to a processor, which performs the final decompression. In one embodiment, a vector of N length compressed values is decompressed using a first bit mask into two N length sets having non-zero values. The two N length sets are further decompressed using two M length bit masks into M length sparse vectors, each having non-zero values.
    Type: Grant
    Filed: March 4, 2022
    Date of Patent: August 8, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Mattheus C. Heddes, Ankit More, Nishit Shah, Torsten Hoefler
  • Patent number: 11580388
    Abstract: Embodiments of the present disclosure include techniques for processing neural networks. Various forms of parallelism may be implemented using topology that combines sequences of processors. In one embodiment, the present disclosure includes a computer system comprising a plurality of processor groups, the processor groups each comprising a plurality of processors. A plurality of network switches are coupled to subsets of the plurality of processor groups. A subset of the processors in the processor groups may be configurable to form sequences, and the network switches are configurable to form at least one sequence across one or more of the plurality of processor groups to perform neural network computations. Various alternative configurations for creating Hamiltonian cycles are disclosed to support data parallelism, pipeline parallelism, layer parallelism, or combinations thereof.
    Type: Grant
    Filed: January 3, 2020
    Date of Patent: February 14, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Torsten Hoefler, Mattheus C. Heddes, Deepak Goel, Jonathan R. Belk
  • Publication number: 20220405571
    Abstract: Embodiments of the present disclosure include systems and methods for sparsifying narrow data formats for neural networks. A plurality of activation values in a neural network are provided to a muxing unit. A set of sparsification operations is performed on a plurality of weight values to generate a subset of the plurality of weight values and mask values associated with the plurality of weight values. The subset of the plurality of weight values is provided to a matrix multiplication unit. The muxing unit generates a subset of the plurality of activation values based on the mask values and provides the subset of the plurality of activation values to the matrix multiplication unit. The matrix multiplication unit performs a set of matrix multiplication operations on the subset of the plurality of weight values and the subset of the plurality of activation values to generate a set of outputs.
    Type: Application
    Filed: June 16, 2021
    Publication date: December 22, 2022
    Inventors: Bita DARVISH ROUHANI, Venmugil ELANGO, Eric S. CHUNG, Douglas C. BURGER, Mattheus C. HEDDES, Nishit SHAH, Rasoul SHAFIPOUR, Ankit MORE
  • Publication number: 20220291976
    Abstract: One example provides an integrated computing device, comprising one or more computing clusters, and one or more network controllers, each network controller comprising a local data notification queue to queue send message notifications originating from the computing clusters on the integrated computing device, a remote data notification queue to queue receive message notifications originating from network controllers on remote integrated computing devices, a local no-data notification queue to queue receive message notifications originating from computing clusters on the integrated computing device, and a connection scheduler configured to schedule sending of data from memory on the integrated computing device when a send message notification in the local data notification queue is matched with a receive message notification in the remote data notification queue, and to schedule sending of receive message notifications from the local no-data notification queue.
    Type: Application
    Filed: March 11, 2021
    Publication date: September 15, 2022
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Deepak GOEL, Mattheus C. HEDDES, Torsten HOEFLER, Xiaoling XU
  • Publication number: 20220244911
    Abstract: The present disclosure includes digital circuits that generate values of a power of two (2) raised to an input value. For example, a digital circuit may include combinational logic that receives first digital bits representing an input mantissa of an input value and second digital bits representing an input exponent of the input value. The combinational logic generates a plurality of output mantissas and a plurality of output exponents corresponding to an approximate value of a power of two (2) raised to a power of the input value when the input value is positive and negative and when the input exponent is above and below a first value. Selection circuits are configured to receive output mantissas and output exponents. The selection circuits include selection control inputs coupled to the input exponent and an input sign bit of the input value to select one of the output mantissas and one of the output exponents.
    Type: Application
    Filed: January 29, 2021
    Publication date: August 4, 2022
    Inventors: Torsten HOEFLER, Mattheus C. HEDDES
  • Patent number: 11076210
    Abstract: Embodiments of the present disclosure include techniques for processing neural networks. Various forms of parallelism may be implemented using topology that combines sequences of processors. In one embodiment, the present disclosure includes a computer system comprising one or more processor groups, the processor groups each comprising a plurality of processors. A plurality of network switches are coupled to subsets of the plurality of processor groups. In one embodiment, the switches may be optical network switches. Processors in the processor groups may be configurable to form sequences, and the network switches are configurable to form at least one sequence across one or more of the plurality of processor groups to perform neural network computations. Various alternative configurations for creating Hamiltonian cycles are disclosed to support data parallelism, pipeline parallelism, layer parallelism, or combinations thereof.
    Type: Grant
    Filed: May 26, 2020
    Date of Patent: July 27, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Torsten Hoefler, Mattheus C. Heddes, Jonathan R. Belk
  • Publication number: 20210209460
    Abstract: Embodiments of the present disclosure include techniques for processing neural networks. Various forms of parallelism may be implemented using topology that combines sequences of processors. In one embodiment, the present disclosure includes a computer system comprising a plurality of processor groups, the processor groups each comprising a plurality of processors. A plurality of network switches are coupled to subsets of the plurality of processor groups. A subset of the processors in the processor groups may be configurable to form sequences, and the network switches are configurable to form at least one sequence across one or more of the plurality of processor groups to perform neural network computations. Various alternative configurations for creating Hamiltonian cycles are disclosed to support data parallelism, pipeline parallelism, layer parallelism, or combinations thereof.
    Type: Application
    Filed: January 3, 2020
    Publication date: July 8, 2021
    Inventors: Torsten HOEFLER, Mattheus C. HEDDES, Deepak GOEL, Jonathan R. BELK
  • Publication number: 20210211787
    Abstract: Embodiments of the present disclosure include techniques for processing neural networks. Various forms of parallelism may be implemented using topology that combines sequences of processors. In one embodiment, the present disclosure includes a computer system comprising one or more processor groups, the processor groups each comprising a plurality of processors. A plurality of network switches are coupled to subsets of the plurality of processor groups. In one embodiment, the switches may be optical network switches. Processors in the processor groups may be configurable to form sequences, and the network switches are configurable to form at least one sequence across one or more of the plurality of processor groups to perform neural network computations. Various alternative configurations for creating Hamiltonian cycles are disclosed to support data parallelism, pipeline parallelism, layer parallelism, or combinations thereof.
    Type: Application
    Filed: May 26, 2020
    Publication date: July 8, 2021
    Inventors: Torsten HOEFLER, Mattheus C. HEDDES, Jonathan R. BELK
  • Patent number: 5450351
    Abstract: A content addressable memory (CAM) implementation using random access memory (RAM) and a method for operating the implementation are described, wherein the RAM is divided into smaller, individually addressable units, which are addressed by a subword of the applied keyword, and the outputs of which are bitwise ANDed. The result of the bitwise AND operation is used to activate the matching lines of the CAM implementation. The new implementation allows the use of conventional circuit design.
    Type: Grant
    Filed: November 19, 1993
    Date of Patent: September 12, 1995
    Assignee: International Business Machines Corporation
    Inventor: Mattheus C. A. Heddes
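
The RAM-based CAM lookup described in the abstract of patent 5450351 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the class name RamCam, its parameters, and the choice to store each RAM row as an integer bit vector with one bit per CAM entry are all hypothetical. It shows the two ideas the abstract names: each RAM unit is addressed by one subword of the keyword, and the RAM outputs are bitwise ANDed to produce the match lines.

```python
class RamCam:
    """Toy software model of a CAM built from small RAM units (illustrative only)."""

    def __init__(self, key_bits: int, subword_bits: int, entries: int):
        if key_bits % subword_bits != 0:
            raise ValueError("key_bits must be a multiple of subword_bits")
        self.key_bits = key_bits
        self.subword_bits = subword_bits
        self.entries = entries
        self.units = key_bits // subword_bits
        # rams[u][addr] is a bit vector: bit j is set when CAM entry j
        # holds the value addr in subword position u.
        self.rams = [[0] * (1 << subword_bits) for _ in range(self.units)]

    def _subwords(self, key: int) -> list[int]:
        """Split the keyword into one address per RAM unit (low subword first)."""
        if not 0 <= key < (1 << self.key_bits):
            raise ValueError("key out of range")
        mask = (1 << self.subword_bits) - 1
        return [(key >> (u * self.subword_bits)) & mask for u in range(self.units)]

    def write(self, entry: int, key: int) -> None:
        """Store key in the given CAM entry, replacing any previous contents."""
        if not 0 <= entry < self.entries:
            raise ValueError("entry out of range")
        bit = 1 << entry
        for u, sub in enumerate(self._subwords(key)):
            for addr in range(len(self.rams[u])):
                self.rams[u][addr] &= ~bit  # clear the entry's old subword
            self.rams[u][sub] |= bit        # mark the new subword

    def match_lines(self, key: int) -> int:
        """Return a bit vector whose bit j is set when entry j stores key."""
        result = (1 << self.entries) - 1
        for u, sub in enumerate(self._subwords(key)):
            result &= self.rams[u][sub]     # bitwise AND of the RAM outputs
        return result


cam = RamCam(key_bits=8, subword_bits=4, entries=4)
cam.write(0, 0xA5)
cam.write(2, 0xA7)
cam.write(3, 0xA5)
print(bin(cam.match_lines(0xA5)))  # entries 0 and 3 match
```

In this model an 8-bit keyword needs two RAM units of 16 rows each instead of a single 256-row table, which is the storage saving that splitting the keyword into subwords is meant to buy. Entries that were never written have no bits set in any row, so they never activate a match line.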