Patents Assigned to Habana Labs Ltd.
  • Patent number: 12141093
    Abstract: A system includes a first processing device and a second processing device, each of which is coupled to a NIC implemented with an RDMA interface. The NICs are capable of rendezvous flows of RDMA write exchange. In an example where the first NIC is at the sender side and the second NIC is at the receiver side, a rendezvous flow is initiated by an execution of an RDMA write operation by the second NIC. The second NIC provides at least an address of a buffer in the second processing device to the first NIC through the RDMA write operation. Then the first NIC initiates an RDMA write operation to send data in a buffer in the first processing device to the second NIC. The second NIC may acknowledge receipt of the data to the first NIC. The second NIC can update a consumer index (CI) of the work queue entry (WQE) based on the acknowledgement.
    Type: Grant
    Filed: January 25, 2022
    Date of Patent: November 12, 2024
    Assignee: Habana Labs Ltd.
    Inventors: Itay Zur, Ira Joffe, Shlomi Gridish, Amit Pessach, Yanai Pomeranz
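The rendezvous sequence in the abstract above can be sketched as a toy Python simulation. All names (`Nic`, `rendezvous_recv`, the payload) are illustrative assumptions, not a real RDMA verbs API:

```python
# Toy model of the receiver-initiated rendezvous flow described above.
# All names are illustrative; a real NIC exposes RDMA verbs, not this API.

class Nic:
    def __init__(self, name):
        self.name = name
        self.memory = {}          # address -> data, stands in for host memory
        self.peer = None
        self.ci = 0               # consumer index (CI) of the work queue

    def rendezvous_recv(self, buf_addr):
        # Step 1: the receiver side initiates the flow by RDMA-writing
        # its buffer address to the sender.
        self.peer.on_buffer_advertised(buf_addr, requester=self)

    def on_buffer_advertised(self, buf_addr, requester):
        # Step 2: the sender side RDMA-writes its payload directly into
        # the advertised receiver buffer.
        payload = b"tensor-shard-0"
        requester.memory[buf_addr] = payload
        # Step 3: the receiver acknowledges receipt ...
        ack = requester.acknowledge(buf_addr)
        # Step 4: ... and the work queue entry's consumer index advances.
        if ack:
            self.ci += 1

    def acknowledge(self, buf_addr):
        return buf_addr in self.memory

sender, receiver = Nic("sender"), Nic("receiver")
sender.peer, receiver.peer = receiver, sender
receiver.rendezvous_recv(buf_addr=0x1000)
print(receiver.memory[0x1000])   # b'tensor-shard-0'
print(sender.ci)                 # 1
```

The point of the rendezvous: the sender never writes until it holds the receiver's buffer address, so no intermediate copy is needed.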
  • Patent number: 12066936
    Abstract: An example cache memory includes a schedule module, control modules, a datapath, and an output module. The cache memory receives requests to read and/or write cache lines. The schedule module maintains a queue of the requests. The schedule module may assign the requests to the control modules based on the queue. A control module, which receives a request, controls the datapath to execute the request, i.e., to read or write the cache line. The control module can control the execution by the datapath from start to end. Multiple control modules may control parallel executions by the datapath. The output module outputs, e.g., to a processor, responses of the cache memory to the requests after the executions. A response may include a cache line. The cache memory may include a buffer that temporarily stores cache lines before the output to avoid deadlock in the datapath during the parallel executions of requests.
    Type: Grant
    Filed: March 21, 2022
    Date of Patent: August 20, 2024
    Assignee: Habana Labs Ltd.
    Inventors: Ehud Eliaz, Eitan Joshua, Yori Teichman, Ofer Eizenberg
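The division of labor in the abstract above (schedule module, control modules, datapath, output buffer) can be sketched in software. All names are illustrative assumptions:

```python
# Toy sketch of the cache organization described above: a schedule
# module's request queue, a pool of control modules that each drive one
# request through the datapath start to end, and an output buffer that
# decouples the datapath from the response port.
from collections import deque

class ToyCache:
    def __init__(self, num_control_modules=2):
        self.lines = {}                      # tag -> cache line data
        self.queue = deque()                 # schedule module's request queue
        self.free_modules = list(range(num_control_modules))
        self.output_buffer = []              # holds responses before output

    def submit(self, op, tag, data=None):
        self.queue.append((op, tag, data))

    def step(self):
        # Assign queued requests to free control modules; each module
        # controls its request's execution from start to end.
        while self.queue and self.free_modules:
            module = self.free_modules.pop()
            op, tag, data = self.queue.popleft()
            if op == "write":
                self.lines[tag] = data
                self.output_buffer.append((tag, "ok"))
            else:  # read
                self.output_buffer.append((tag, self.lines.get(tag)))
            self.free_modules.append(module)  # execution done, module freed

    def drain(self):
        out, self.output_buffer = self.output_buffer, []
        return out

cache = ToyCache()
cache.submit("write", tag=7, data="line7")
cache.submit("read", tag=7)
cache.step()
print(cache.drain())  # [(7, 'ok'), (7, 'line7')]
```

In hardware the buffer matters because parallel executions could otherwise block each other in the datapath; here it simply collects responses until the output module drains them.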
  • Publication number: 20240265260
    Abstract: A DNN can be compressed by pruning one or more tensors for a deep learning operation. A first pruning parameter and a second pruning parameter are determined for a tensor. A vector having a size of the second pruning parameter may be extracted from the tensor. Pruning probabilities may be determined for the elements in the vector. One or more elements in the vector are selected based on the pruning probabilities. Alternatively, a matrix, in lieu of the vector, may be extracted from the tensor. Pruning probabilities may be determined for the columns in the matrix. One or more columns are selected based on their pruning probabilities. The number of the selected element(s) or column(s) may equal the first pruning parameter. The tensor can be modified by modifying the value(s) of the selected element(s) or column(s) and setting the value(s) of one or more unselected elements or columns to zero.
    Type: Application
    Filed: February 6, 2023
    Publication date: August 8, 2024
    Applicant: Habana Labs Ltd.
    Inventors: Brian Chmiel, Itay Hubara, Ron Banner
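The vector variant of the pruning scheme above can be sketched as follows, assuming magnitude-based pruning probabilities and deterministic top-k selection (the abstract leaves both the probability rule and the selection rule open):

```python
# A minimal sketch of the vector-pruning variant described above:
# from each vector of `vec_size` elements (the second pruning
# parameter), keep `keep` elements (the first pruning parameter) and
# set the unselected elements to zero.
def prune_vector(tensor, keep, vec_size):
    """Zero all but `keep` elements in each `vec_size`-element vector."""
    pruned = list(tensor)
    for start in range(0, len(pruned), vec_size):
        vec = pruned[start:start + vec_size]
        total = sum(abs(x) for x in vec) or 1.0
        probs = [abs(x) / total for x in vec]        # pruning probabilities
        ranked = sorted(range(len(vec)), key=lambda i: probs[i], reverse=True)
        selected = set(ranked[:keep])                # first pruning parameter
        for i in range(len(vec)):
            if i not in selected:
                pruned[start + i] = 0.0              # unselected -> zero
    return pruned

weights = [0.9, -0.1, 0.05, -1.2, 0.3, 0.02, -0.4, 0.6]
print(prune_vector(weights, keep=2, vec_size=4))
# [0.9, 0.0, 0.0, -1.2, 0.0, 0.0, -0.4, 0.6]
```

The matrix variant in the abstract works the same way, with whole columns taking the place of individual elements.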
  • Patent number: 11847491
    Abstract: An apparatus for Machine Learning (ML) processing includes computational engines and a Central Processing Unit (CPU). The CPU is configured to receive a work plan for processing one or more samples in accordance with a ML model represented by a corresponding ML graph. The work plan specifies jobs required for executing at least a subgraph of the ML graph by the computational engines, the at least subgraph includes multiple inputs, and is executable independently of other parts of the ML graph when the inputs are valid. The CPU is further configured to pre-process only a partial subset of the jobs in the work plan corresponding to the at least subgraph, for producing a group of pre-processed jobs that are required for executing part of the at least subgraph based on the one or more samples, and to submit the pre-processed jobs in the group to the computational engines for execution.
    Type: Grant
    Filed: April 22, 2021
    Date of Patent: December 19, 2023
    Assignee: HABANA LABS LTD.
    Inventors: Oren Kaidar, Oded Gabbay
  • Patent number: 11714653
    Abstract: A method for computing includes defining a processing pipeline, including at least a first stage in which producer processors compute and output data to respective locations in a buffer and a second processing stage in which one or more consumer processors read the data from the buffer and apply a computational task to the data read from the buffer. The computational task is broken into multiple, independent work units, for application by the consumer processors to respective ranges of the data in the buffer, and respective indexes are assigned to the work units in a predefined index space. A mapping is generated between the index space and the addresses in the buffer, and execution of the work units is scheduled such that at least one of the work units can begin execution before all the producer processors have completed the first processing stage.
    Type: Grant
    Filed: February 15, 2021
    Date of Patent: August 1, 2023
    Assignee: HABANA LABS LTD.
    Inventors: Tzachi Cohen, Michael Zuckerman, Doron Singer, Ron Shalev, Amos Goldman
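The index-space-to-buffer mapping described above can be sketched as a small simulation, assuming a simple contiguous-chunk mapping (the chunk size and scheduling policy are illustrative):

```python
# Sketch of the scheme described above: each work unit's index maps to
# a range of addresses in the shared buffer, and a work unit may begin
# as soon as its own range has been produced, before every producer
# has completed the first processing stage.
def unit_range(index, chunk):
    # the mapping between the index space and buffer addresses
    return index * chunk, (index + 1) * chunk

def run_pipeline(buffer_size, chunk, produced_order):
    produced = [False] * buffer_size
    started = set()
    start_order = []
    num_units = buffer_size // chunk
    for addr in produced_order:          # producers fill the buffer out of order
        produced[addr] = True
        for idx in range(num_units):
            lo, hi = unit_range(idx, chunk)
            if idx not in started and all(produced[lo:hi]):
                started.add(idx)         # unit starts before all producers finish
                start_order.append(idx)
    return start_order

# Work unit 1's range [2, 4) fills first, so it starts before unit 0.
print(run_pipeline(buffer_size=4, chunk=2, produced_order=[2, 3, 0, 1]))  # [1, 0]
```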
  • Patent number: 11532338
    Abstract: An electronic circuit includes a memory buffer and control logic. The memory buffer is configured to transfer data from a first domain to a second domain of the circuit, the first and the second domains operate in synchronization with respective clock signals. The control logic is configured to maintain a write indicator in the first domain indicative of a next write position in the memory buffer for storing data, to maintain a read indicator in the second domain indicative of a next read position in the memory buffer for retrieving the stored data, to generate in the second domain, based on the write and the read indicators, a first signal that is indicative of whether the memory buffer has data for reading or has become empty, and retain the first signal in a state that indicates that the memory buffer has become empty, until writing to the memory buffer resumes.
    Type: Grant
    Filed: April 6, 2021
    Date of Patent: December 20, 2022
    Assignee: HABANA LABS LTD.
    Inventors: Ehud Eliaz, Yamin Mokatren
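The empty-flag behavior described above can be sketched in software. This is a behavioral model only: a real clock-domain-crossing FIFO would synchronize gray-coded pointers between the domains, which this sketch omits:

```python
# Software sketch of the empty-flag behavior described above: the read
# side compares the write indicator (owned by the first domain) with
# its own read indicator, and once "empty" is raised it is retained
# until writing to the buffer resumes.
class ToyCdcFifo:
    def __init__(self, depth=4):
        self.buf = [None] * depth
        self.wr = 0            # write indicator, maintained in the first domain
        self.rd = 0            # read indicator, maintained in the second domain
        self.empty = True      # first signal, generated in the read domain

    def write(self, value):
        self.buf[self.wr % len(self.buf)] = value
        self.wr += 1
        self.empty = False     # writing resumed: the empty flag may deassert

    def read(self):
        assert not self.empty
        value = self.buf[self.rd % len(self.buf)]
        self.rd += 1
        if self.rd == self.wr:
            self.empty = True  # retained until the next write
        return value

fifo = ToyCdcFifo()
fifo.write("a")
fifo.write("b")
print(fifo.read(), fifo.read(), fifo.empty)  # a b True
```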
  • Patent number: 11468147
    Abstract: A computational apparatus for implementing a neural network model having multiple neurons that evaluate an activation function, the apparatus including a memory and circuitry. The memory is configured to hold values of a difference-function, each value being a respective difference between the activation function and a predefined baseline function. The circuitry is configured to evaluate the neural network model, including, for at least one of the neurons, given an argument of the activation function: evaluate the baseline function at the argument, retrieve from the memory one or more values of the difference-function responsively to the argument, and evaluate the activation function at the argument based on the baseline function at the argument and on the one or more values of the difference-function.
    Type: Grant
    Filed: February 24, 2020
    Date of Patent: October 11, 2022
    Assignee: HABANA LABS LTD.
    Inventors: Elad Hofer, Sergei Gofman, Shlomo Raikin
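The baseline-plus-difference scheme above can be sketched concretely. The choice of tanh as the activation and a hard clip as the cheap baseline is illustrative, not from the abstract:

```python
# Sketch of the scheme described above: the memory holds samples of
# the difference-function tanh(x) - clip(x) on a grid, and the
# circuit evaluates activation = baseline + retrieved difference.
import math

def baseline(x):                      # cheap piecewise-linear baseline
    return max(-1.0, min(1.0, x))

LO, HI, STEP = -4.0, 4.0, 0.05
TABLE = [math.tanh(LO + i * STEP) - baseline(LO + i * STEP)
         for i in range(int((HI - LO) / STEP) + 1)]

def activation(x):
    x = max(LO, min(HI, x))
    # retrieve two difference values and interpolate between them
    pos = (x - LO) / STEP
    i = min(int(pos), len(TABLE) - 2)
    frac = pos - i
    diff = TABLE[i] * (1 - frac) + TABLE[i + 1] * frac
    return baseline(x) + diff

for x in (-2.0, -0.3, 0.0, 0.7, 3.0):
    assert abs(activation(x) - math.tanh(x)) < 1e-3
print("approximation within 1e-3 at sampled points")
```

The win is that the difference-function is small and smooth, so a coarse table plus a cheap baseline can match a much larger direct lookup table of the activation itself.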
  • Patent number: 11467827
    Abstract: A method for computing includes providing software source code defining a processing pipeline including multiple, sequential stages of parallel computations, in which a plurality of processors apply a computational task to data read from a buffer. A static code analysis is applied to the software source code so as to break the computational task into multiple, independent work units, and to define an index space in which the work units are identified by respective indexes. Based on the static code analysis, mapping parameters that define a mapping between the index space and addresses in the buffer are computed, indicating by the mapping the respective ranges of the data to which the work units are to be applied. The source code is compiled so that the processors execute the work units identified by the respective indexes while accessing the data in the buffer in accordance with the mapping.
    Type: Grant
    Filed: April 6, 2021
    Date of Patent: October 11, 2022
    Assignee: HABANA LABS LTD.
    Inventors: Michael Zuckerman, Tzachi Cohen, Doron Singer, Ron Shalev, Amos Goldman
  • Publication number: 20220201075
    Abstract: Systems, methods, and computer-readable media that store instructions for remote direct memory access (RDMA) transfers.
    Type: Application
    Filed: December 17, 2020
    Publication date: June 23, 2022
    Applicant: HABANA LABS LTD.
    Inventors: Itay Zur, Ira Joffe, Guy Hershtig, Amit Pessach, Yanai Pomeranz
  • Patent number: 11321092
    Abstract: A processor includes an internal memory and processing circuitry. The internal memory is configured to store a definition of a multi-dimensional array stored in an external memory, and indices that specify elements of the multi-dimensional array in terms of multi-dimensional coordinates of the elements within the array. The processing circuitry is configured to execute instructions in accordance with an Instruction Set Architecture (ISA) defined for the processor. At least some of the instructions in the ISA access the multi-dimensional array by operating on the multi-dimensional coordinates specified in the indices.
    Type: Grant
    Filed: October 25, 2018
    Date of Patent: May 3, 2022
    Assignee: HABANA LABS LTD.
    Inventors: Shlomo Raikin, Sergei Gofman, Ran Halutz, Evgeny Spektor, Amos Goldman, Ron Shalev
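The coordinate-based access described above can be sketched in software. The descriptor class and its method names are illustrative, not the patented ISA:

```python
# Sketch of coordinate-based array access as described above: a
# descriptor holds the definition of a multi-dimensional array stored
# in "external memory", and accesses operate on multi-dimensional
# coordinates rather than flat addresses.
class ArrayDescriptor:
    def __init__(self, dims, memory):
        self.dims = dims
        self.memory = memory                  # flat external memory
        # row-major strides derived from the dimension definition
        self.strides = [1] * len(dims)
        for d in range(len(dims) - 2, -1, -1):
            self.strides[d] = self.strides[d + 1] * dims[d + 1]

    def load(self, coords):                   # a load-by-coordinates operation
        addr = sum(c * s for c, s in zip(coords, self.strides))
        return self.memory[addr]

mem = list(range(24))                         # a 2 x 3 x 4 array, flattened
arr = ArrayDescriptor(dims=(2, 3, 4), memory=mem)
print(arr.load((1, 2, 3)))                    # 1*12 + 2*4 + 3 = 23
```

Keeping coordinates in the indices lets the hardware, rather than software, own the address arithmetic for each dimension.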
  • Publication number: 20220060423
    Abstract: Systems, methods, and computer-readable media that store instructions for remote direct memory access (RDMA) congestion control.
    Type: Application
    Filed: August 23, 2020
    Publication date: February 24, 2022
    Applicant: HABANA LABS LTD.
    Inventors: Itay Zur, Ira Joffe, Shlomo Raikin
  • Patent number: 11249724
    Abstract: A computational apparatus includes a memory unit and Read-Modify-Write (RMW) logic. The memory unit is configured to hold a data value. The RMW logic, which is coupled to the memory unit, is configured to perform an atomic RMW operation on the data value stored in the memory unit.
    Type: Grant
    Filed: August 28, 2019
    Date of Patent: February 15, 2022
    Assignee: HABANA LABS LTD.
    Inventors: Shlomo Raikin, Ron Shalev, Sergei Gofman, Ran Halutz, Nadav Klein
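An atomic read-modify-write can be sketched in software, with a lock standing in for the RMW logic that makes the read, modify, and write steps indivisible:

```python
# Sketch of an atomic RMW operation: concurrent increments through
# fetch_add are never lost, because read-modify-write executes as
# one indivisible unit.
import threading

class AtomicCell:
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_add(self, delta):
        with self._lock:              # read-modify-write as one unit
            old = self._value
            self._value = old + delta
            return old

cell = AtomicCell()
threads = [threading.Thread(target=lambda: [cell.fetch_add(1) for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cell._value)  # 4000 -- no increments lost
```

Without the atomicity, two threads could both read the same old value and one increment would be lost.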
  • Patent number: 11240162
    Abstract: Systems, methods, and computer-readable media that store instructions for remote direct memory access (RDMA) congestion control.
    Type: Grant
    Filed: August 23, 2020
    Date of Patent: February 1, 2022
    Assignee: HABANA LABS LTD.
    Inventors: Itay Zur, Ira Joffe, Shlomo Raikin
  • Patent number: 10915297
    Abstract: Computational apparatus includes a systolic array of processing elements. In each of a sequence of processing cycles, the processing elements in a first row of the array each receive a respective first plurality of first operands, while the processing elements in a first column of the array each receive a respective second plurality of second operands. Each processing element, except in the first row and first column, receives the respective first and second pluralities of the operands from adjacent processing elements in a preceding row and column of the array. Each processing element multiplies pairs of the first and second operands together to generate multiple respective products, and accumulates the products in accumulators. Synchronization logic loads a succession of first and second vectors of the operands into the array, and upon completion of processing triggers the processing elements to transfer respective data values from the accumulators out of the array.
    Type: Grant
    Filed: November 12, 2018
    Date of Patent: February 9, 2021
    Assignee: HABANA LABS LTD.
    Inventors: Ran Halutz, Tomer Rothschild, Ron Shalev
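The skewed operand schedule described above can be modeled cycle by cycle. This is a functional sketch of the timing, not of the synchronization logic:

```python
# Cycle-level sketch of the systolic schedule described above: at
# cycle t, processing element (i, j) multiplies A[i][k] and B[k][j]
# for k = t - i - j, modeling operands arriving skewed from the left
# edge (rows of A) and the top edge (columns of B).
def systolic_matmul(A, B):
    m, k_dim, n = len(A), len(A[0]), len(B[0])
    acc = [[0] * n for _ in range(m)]     # one accumulator per element
    for t in range(m + n + k_dim):        # sequence of processing cycles
        for i in range(m):
            for j in range(n):
                k = t - i - j             # wavefront timing of the operands
                if 0 <= k < k_dim:
                    acc[i][j] += A[i][k] * B[k][j]
    return acc                            # transferred out upon completion

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(systolic_matmul(A, B))  # [[19, 22], [43, 50]]
```

Each element of `A` and `B` is touched exactly once per product, mirroring how operands stream through adjacent processing elements rather than being re-fetched from memory.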
  • Patent number: 10915494
    Abstract: A vector processor includes a coefficient memory and a processor. The processor has an Instruction Set Architecture (ISA), which includes an instruction that approximates a mathematical function by a polynomial. The processor is configured to approximate the mathematical function over an argument, by reading one or more coefficients of the polynomial from the coefficient memory and evaluating the polynomial at the argument using the coefficients.
    Type: Grant
    Filed: November 11, 2018
    Date of Patent: February 9, 2021
    Assignee: HABANA LABS LTD.
    Inventors: Ron Shalev, Evgeny Spektor, Sergei Gofman, Ran Halutz, Shlomo Raikin, Hilla Ben Yaacov
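The coefficient-memory approach above can be sketched as follows. The use of degree-7 Taylor coefficients for exp() is an illustrative choice; the patent does not specify the functions or polynomials:

```python
# Sketch of a polynomial-approximation instruction as described
# above: coefficients are read from a coefficient memory and the
# polynomial is evaluated at the argument by Horner's rule.
import math

COEFF_MEMORY = {
    # function id -> polynomial coefficients, highest degree first
    "exp": [1 / 5040, 1 / 720, 1 / 120, 1 / 24, 1 / 6, 1 / 2, 1.0, 1.0],
}

def approx(func_id, x):
    coeffs = COEFF_MEMORY[func_id]        # read from the coefficient memory
    acc = 0.0
    for c in coeffs:                      # Horner evaluation at the argument
        acc = acc * x + c
    return acc

for x in (0.0, 0.25, 0.5, 1.0):
    assert abs(approx("exp", x) - math.exp(x)) < 1e-4
print(round(approx("exp", 1.0), 4))  # 2.7183
```

Horner's rule needs one multiply-add per coefficient, which maps naturally onto a vector processor's fused multiply-add units.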
  • Patent number: 10853070
    Abstract: A processor includes a processing engine, an address queue, an address generation unit, and logic circuitry. The processing engine is configured to process instructions that access data in an external memory. The address generation unit is configured to generate respective addresses for the instructions to be processed by the processing engine, to provide the addresses to the processing engine, and to write the addresses to the address queue. The logic circuitry is configured to access the external memory on behalf of the processing engine while compensating for variations in access latency to the external memory, by reading the addresses from the address queue, and executing the instructions in the external memory in accordance with the addresses read from the address queue.
    Type: Grant
    Filed: October 3, 2018
    Date of Patent: December 1, 2020
    Assignee: HABANA LABS LTD.
    Inventors: Ron Shalev, Evgeny Spektor, Ran Halutz
  • Patent number: 10853448
    Abstract: Computational apparatus includes a memory, which is configured to contain multiple matrices of input data values. An array of processing elements is configured to perform multiplications of respective first and second input operands and to accumulate products of the multiplication to generate respective output values. Data access logic is configured to select from the memory a plurality of mutually-disjoint first matrices and a second matrix, and to distribute to the processing elements the input data values in a sequence that is interleaved among the first matrices, along with corresponding input data values from the second matrix, so as to cause the processing elements to compute, in the interleaved sequence, respective convolutions of each of the first matrices with the second matrix.
    Type: Grant
    Filed: September 11, 2017
    Date of Patent: December 1, 2020
    Assignee: HABANA LABS LTD.
    Inventors: Ron Shalev, Tomer Rothschild
  • Patent number: 10713214
    Abstract: Computational apparatus includes a systolic array of processing elements, each including a multiplier and first and second accumulators. In each of a sequence of processing cycles, the processing elements perform the following steps concurrently: Each processing element, except in the first row and first column of the array, receives first and second operands from adjacent processing elements in a preceding row and column of the array, respectively, multiplies the first and second operands together to generate a product, and accumulates the product in the first accumulator. In addition, each processing element passes a stored output data value from the second accumulator to a succeeding processing element along a respective column of the array, receives a new output data value from a preceding processing element along the respective column, and stores the new output data value in the second accumulator.
    Type: Grant
    Filed: September 20, 2018
    Date of Patent: July 14, 2020
    Assignee: HABANA LABS LTD.
    Inventors: Ron Shalev, Ran Halutz
  • Patent number: 10491241
    Abstract: An apparatus includes an input interface and compression circuitry. The input interface is configured to receive input source data. The compression circuitry is configured to set a symbol anchor value, having the highest occurrence probability among the symbol values in the input source data, to generate a bit-map by (i) for every symbol in the input source data whose symbol value is the anchor value, setting a respective bit in the bit-map to a first binary value, and (ii) for every symbol in the source data whose symbol value differs from the anchor value, setting the respective bit in the bit-map to a second binary value, and to generate compressed data including (i) the bit-map and (ii) the symbols whose symbol values differ from the symbol anchor value.
    Type: Grant
    Filed: July 1, 2018
    Date of Patent: November 26, 2019
    Assignee: Habana Labs Ltd.
    Inventor: Shlomo Raikin
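The anchor-and-bitmap scheme above can be sketched end to end (the choice of 1 as the "first binary value" marking anchor positions is illustrative):

```python
# Sketch of the bit-map compression described above: the most frequent
# symbol becomes the anchor; the bit-map marks anchor positions, and
# only the non-anchor symbols are kept alongside it.
from collections import Counter

def compress(symbols):
    anchor = Counter(symbols).most_common(1)[0][0]   # highest-probability value
    bitmap = [1 if s == anchor else 0 for s in symbols]
    residuals = [s for s in symbols if s != anchor]
    return anchor, bitmap, residuals

def decompress(anchor, bitmap, residuals):
    it = iter(residuals)
    return [anchor if bit else next(it) for bit in bitmap]

data = [0, 0, 5, 0, 9, 0, 0, 5]
packed = compress(data)
print(packed)                 # (0, [1, 1, 0, 1, 0, 1, 1, 0], [5, 9, 5])
print(decompress(*packed))    # [0, 0, 5, 0, 9, 0, 0, 5]
```

The scheme pays one bit per symbol for the bit-map and wins whenever the anchor value (zero, for sparse tensors) dominates the data.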
  • Patent number: 10491239
    Abstract: A computational device includes an input memory, which receives a first array of input numbers having a first precision represented by N bits. An output memory stores a second array of output numbers having a second precision represented by M bits, M<N. Quantization logic reads the input numbers from the input memory, extracts from each input number a set of M bits, at a bit offset within the input number that is indicated by a quantization factor, and writes a corresponding output number based on the extracted set of bits to the second array in the output memory. A quantization controller sets the quantization factor so as to optimally fit an available range of the output numbers in the second array to an actual range of the input numbers in the first array in extraction of the M bits from the input numbers.
    Type: Grant
    Filed: January 30, 2018
    Date of Patent: November 26, 2019
    Assignee: Habana Labs Ltd.
    Inventor: Itay Hubara
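The range-fitting extraction above can be sketched for unsigned values (the signed case and the exact fitting rule are left open by the abstract; the offset-selection heuristic here is an assumption):

```python
# Sketch of the quantization described above: the quantization factor
# is the bit offset at which an M-bit window is extracted from each
# N-bit input, chosen so the window covers the actual range of the
# inputs rather than their full representable range.
def choose_offset(values, n_bits, m_bits):
    # smallest offset whose M-bit window still holds the largest value
    top = max(values)
    offset = max(0, top.bit_length() - m_bits)
    return min(offset, n_bits - m_bits)

def quantize(values, n_bits=16, m_bits=8):
    offset = choose_offset(values, n_bits, m_bits)   # quantization factor
    mask = (1 << m_bits) - 1
    return offset, [(v >> offset) & mask for v in values]

offset, q = quantize([3, 700, 1023, 64])
print(offset, q)   # 2 [0, 175, 255, 16]
```

Because the actual range (here, up to 1023) needs only 10 bits, an offset of 2 preserves the top bits of every value instead of wasting the 8-bit window on the unused high end of the 16-bit range.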