Patents by Inventor Chen Koren

Chen Koren has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SYSTEMS AND METHODS OF INSTRUCTIONS TO ACCELERATE MULTIPLICATION OF SPARSE MATRICES USING BITMASKS THAT IDENTIFY NON-ZERO ELEMENTS

Publication number: 20240078285

Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and executing the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE further to store an NZ element for use in a subsequent multiplications.

Type: Application

Filed: November 6, 2023

Publication date: March 7, 2024

Inventors: Dan BAUM, Chen KOREN, Elmoustapha OULD-AHMED-VALL, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
Systems and methods of instructions to accelerate multiplication of sparse matrices using bitmasks that identify non-zero elements

Patent number: 11847185

Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and executing the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE further to store an NZ element for use in a subsequent multiplications.

Type: Grant

Filed: September 24, 2021

Date of Patent: December 19, 2023

Assignee: Intel Corporation

Inventors: Dan Baum, Chen Koren, Elmoustapha Ould-Ahmed-Vall, Michael Espig, Christopher J. Hughes, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
Concurrent compute and ECC for in-memory matrix vector operations

Patent number: 11513893

Abstract: A system includes a compute circuit that preemptively performs a computation on a data word before receiving an indication of data errors from an error checking and correction (ECC) circuit. The ECC circuit reads the data word from a memory array and performs error detection and error correction on the data word. The compute circuit reads the data word and performs the computation on the data word to generate an output value, without waiting for the ECC circuit to check and correct the data word. In response to error detection in the data word by the ECC circuit, the compute circuit delays outputting the output value until correction of the output value in accordance with the error detection by the ECC circuit.

Type: Grant

Filed: December 21, 2020

Date of Patent: November 29, 2022

Assignee: Intel Corporation

Inventors: Somnath Paul, Charles Augustine, Chen Koren, George Shchupak, Muhammad M. Khellah
Ultra-deep compute static random access memory with high compute throughput and multi-directional data propagation

Patent number: 11450672

Abstract: An ultra-deep compute Static Random Access Memory (SRAM) with high compute throughput and multi-directional data transfer capability is provided. Compute units are placed in both horizontal and vertical directions to achieve a symmetric layout while enabling communication between the compute units. An SRAM array supports simultaneous read and write to the left and right section of the same SRAM subarray by duplicating pre-decoding logic inside the SRAM array. This allows applications with non-overlapping read and write address spaces to have twice the bandwidth as compared to a baseline SRAM array.

Type: Grant

Filed: April 27, 2020

Date of Patent: September 20, 2022

Assignee: Intel Corporation

Inventors: Charles Augustine, Somnath Paul, Muhammad M. Khellah, Chen Koren
SYSTEMS AND METHODS OF INSTRUCTIONS TO ACCELERATE MULTIPLICATION OF SPARSE MATRICES USING BITMASKS THAT IDENTIFY NON-ZERO ELEMENTS

Publication number: 20220012305

Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and executing the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE further to store an NZ element for use in a subsequent multiplications.

Type: Application

Filed: September 24, 2021

Publication date: January 13, 2022

Inventors: Dan BAUM, Chen KOREN, Elmoustapha OULD-AHMED-VALL, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
CONCURRENT COMPUTE AND ECC FOR IN-MEMORY MATRIX VECTOR OPERATIONS

Publication number: 20210109809

Abstract: A system includes a compute circuit that preemptively performs a computation on a data word before receiving an indication of data errors from an error checking and correction (ECC) circuit. The ECC circuit reads the data word from a memory array and performs error detection and error correction on the data word. The compute circuit reads the data word and performs the computation on the data word to generate an output value, without waiting for the ECC circuit to check and correct the data word. In response to error detection in the data word by the ECC circuit, the compute circuit delays outputting the output value until correction of the output value in accordance with the error detection by the ECC circuit.

Type: Application

Filed: December 21, 2020

Publication date: April 15, 2021

Inventors: Somnath Paul, Charles Augustine, Chen Koren, George Shchupak, Muhammad M. Khellah
Apparatus and method for a masked multiply instruction to support neural network pruning operations

Patent number: 10929503

Abstract: An apparatus and method for a masked multiply instruction to support neural network pruning operations. For example, one embodiment of a processor comprises: a decoder to decode a matrix multiplication with masking (GEMM) instruction identifying a destination matrix register to store a result, and source registers storing an A-matrix, a B-matrix, and a matrix mask; execution circuitry to execute the GEMM instruction, the execution circuitry to multiply a plurality of B-matrix elements with a plurality of A-matrix elements, each of the B-matrix elements associated with a mask value in the matrix mask, wherein if the mask value is set to a first value, then the execution circuitry is to multiply the B-matrix element with one or more of the A-matrix elements to generate a first partial result, and if the mask value is set to a second value, then the execution circuitry is to multiply an alternate B-matrix element with a one or more of the A-matrix elements to generate a second partial result.

Type: Grant

Filed: December 21, 2018

Date of Patent: February 23, 2021

Assignee: Intel Corporation

Inventors: Omid Azizi, Chen Koren, Nitin Garegrat
ULTRA-DEEP COMPUTE STATIC RANDOM ACCESS MEMORY WITH HIGH COMPUTE THROUGHPUT AND MULTI-DIRECTIONAL DATA PROPAGATION

Publication number: 20200258890

Abstract: An ultra-deep compute Static Random Access Memory (SRAM) with high compute throughput and multi-directional data transfer capability is provided. Compute units are placed in both horizontal and vertical directions to achieve a symmetric layout while enabling communication between the compute units. An SRAM array supports simultaneous read and write to the left and right section of the same SRAM subarray by duplicating pre-decoding logic inside the SRAM array. This allows applications with non-overlapping read and write address spaces to have twice the bandwidth as compared to a baseline SRAM array.

Type: Application

Filed: April 27, 2020

Publication date: August 13, 2020

Inventors: Charles AUGUSTINE, Somnath PAUL, Muhammad M. KHELLAH, Chen KOREN
SYSTEMS AND METHODS TO ACCELERATE MULTIPLICATION OF SPARSE MATRICES

Publication number: 20200210517

Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and executing the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE further to store an NZ element for use in a subsequent multiplications.

Type: Application

Filed: December 27, 2018

Publication date: July 2, 2020

Inventors: Dan BAUM, Chen KOREN, Elmoustapha OULD-AHMED-VALL, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
Matrix multiplication acceleration of sparse matrices using column folding and squeezing

Patent number: 10620951

Abstract: Disclosed embodiments relate to sparse matrix multiplication (SMM) acceleration using column folding and squeezing. In one example, a processor, in response to a SMM instruction having fields to specify locations of first, second, and output matrices, the second matrix being a sparse matrix, uses execution circuitry to pack the second matrix by replacing one or more zero-valued elements with non-zero elements yet to be processed, each of the replaced elements further including a field to identify its logical position within the second matrix, and, the execution circuitry further to, for each non-zero element at row M and column K of the specified first matrix, generate a product of the element and each corresponding non-zero element at row K, column N of the packed second matrix, and accumulate each generated product with a previous value of a corresponding element at row M and column N of the specified output matrix.

Type: Grant

Filed: June 22, 2018

Date of Patent: April 14, 2020

Assignee: Intel Corporation

Inventors: Omid Azizi, Guy Boudoukh, Tony Werner, Andrew Yang, Michael Rotzin, Chen Koren, Eriko Nurvitadhi
Accelerator for processing data

Patent number: 10509846

Abstract: An accelerator for increasing the processing speed of a processor. The accelerator operates in two distinct modes. In a first mode for dense layer processing, row data sets and column data sets are sent to a multiplier for multiplication. In a second mode for sparse layer processing compressed row data sets are received by a row multiplexer and compressed column data sets are received by a column multiplexer. Each multiplexer is configured to compare the indexes of data sets with one another to determine matching indexes. When indexes match, the matching data sets are selected and sent to the multiplier for multiplication. When indexes do not match, data sets are stored in memory devices for subsequent cycles.

Type: Grant

Filed: December 13, 2017

Date of Patent: December 17, 2019

Assignee: Intel Corporation

Inventors: Chen Koren, Dan Baum
APPARATUS AND METHOD FOR A MASKED MULTIPLY INSTRUCTION TO SUPPORT NEURAL NETWORK PRUNING OPERATIONS

Publication number: 20190121837

Abstract: An apparatus and method for a masked multiply instruction to support neural network pruning operations. For example, one embodiment of a processor comprises: a decoder to decode a matrix multiplication with masking (GEMM) instruction identifying a destination matrix register to store a result, and source registers storing an A-matrix, a B-matrix, and a matrix mask; execution circuitry to execute the GEMM instruction, the execution circuitry to multiply a plurality of B-matrix elements with a plurality of A-matrix elements, each of the B-matrix elements associated with a mask value in the matrix mask, wherein if the mask value is set to a first value, then the execution circuitry is to multiply the B-matrix element with one or more of the A-matrix elements to generate a first partial result, and if the mask value is set to a second value, then the execution circuitry is to multiply an alternate B-matrix element with a one or more of the A-matrix elements to generate a second partial result.

Type: Application

Filed: December 21, 2018

Publication date: April 25, 2019

Inventors: OMID AZIZI, CHEN KOREN, NITIN GAREGRAT
ACCELERATOR FOR PROCESSING DATA

Publication number: 20190042538

Abstract: An accelerator for increasing the processing speed of a processor. The accelerator operates in two distinct modes. In a first mode for dense layer processing, row data sets and column data sets are sent to a multiplier for multiplication. In a second mode for sparse layer processing compressed row data sets are received by a row multiplexer and compressed column data sets are received by a column multiplexer. Each multiplexer is configured to compare the indexes of data sets with one another to determine matching indexes. When indexes match, the matching data sets are selected and sent to the multiplier for multiplication. When indexes do not match, data sets are stored in memory devices for subsequent cycles.

Type: Application

Filed: December 13, 2017

Publication date: February 7, 2019

Inventors: Chen Koren, Dan Baum
MATRIX MULTIPLICATION ACCELERATION OF SPARSE MATRICES USING COLUMN FOLDING AND SQUEEZING

Publication number: 20190042237

Abstract: Disclosed embodiments relate to sparse matrix multiplication (SMM) acceleration using column folding and squeezing. In one example, a processor, in response to a SMM instruction having fields to specify locations of first, second, and output matrices, the second matrix being a sparse matrix, uses execution circuitry to pack the second matrix by replacing one or more zero-valued elements with non-zero elements yet to be processed, each of the replaced elements further including a field to identify its logical position within the second matrix, and, the execution circuitry further to, for each non-zero element at row M and column K of the specified first matrix, generate a product of the element and each corresponding non-zero element at row K, column N of the packed second matrix, and accumulate each generated product with a previous value of a corresponding element at row M and column N of the specified output matrix.

Type: Application

Filed: June 22, 2018

Publication date: February 7, 2019

Inventors: Omid AZIZI, Guy BOUDOUKH, Tony WERNER, Andrew YANG, Michael ROTZIN, Chen KOREN, Eriko NURVITADHI
Apparatus and method for implement a multi-level memory hierarchy

Patent number: 9448879

Abstract: An apparatus and method are described for detecting and correcting instruction fetch errors within a processor core. For example, in one embodiment, an instruction processing apparatus for detecting and recovering from instruction fetch errors comprises, the instruction processing apparatus performing the operations of: detecting an error associated with an instruction in response to an instruction fetch operation; and determining if the instruction is from a speculative access, wherein if the instruction is not from a speculative access, then responsively performing one or more operations to ensure that the error does not corrupt an architectural state of the processor core.

Type: Grant

Filed: December 22, 2011

Date of Patent: September 20, 2016

Assignee: INTEL CORPORATION

Inventors: Theodros Yigzaw, Oded Lempel, Hisham Shafi, Geeyarpuram N. Santhanakrishnan, Jose A. Vargas, Ganapati N Srinivasa, Mohan J Kumar, Larisa Novakovsky, Lihu Rappoport, Chen Koren, Julius Mandelblat, Michael Mishaeli
Multi-level tracking of in-use state of cache lines

Patent number: 9348591

Abstract: This disclosure includes tracking of in-use states of cache lines to improve throughput of pipelines and thus increase performance of processors. Access data for a number of sets of instructions stored in an instruction cache may be tracked using an in-use array in a first array until the data for one or more of those sets reach a threshold condition. A second array may then be used as the in-use array to track the sets of instructions after a micro-operation is inserted into the pipeline. When the micro-operation retires from the pipeline, the first array may be cleared. The process may repeat after the second array reaches the threshold condition. During the tracking, an in-use state for an instruction line may be detected by inspecting a corresponding bit in each of the arrays. Additional arrays may also be used to track the in-use state.

Type: Grant

Filed: December 29, 2011

Date of Patent: May 24, 2016

Assignee: Intel Corporation

Inventors: Ilhyun Kim, Chen Koren, Alexandre J. Farcy, Robert L. Hinton, Choon Wei Khor, Lihu Rappoport
APPARATUS AND METHOD FOR IMPLEMENT A MULTI-LEVEL MEMORY HIERARCHY

Publication number: 20140298140

Abstract: An apparatus and method are described for detecting and correcting instruction fetch errors within a processor core. For example, in one embodiment, an instruction processing apparatus for detecting and recovering from instruction fetch errors comprises, the instruction processing apparatus performing the operations of: detecting an error associated with an instruction in response to an instruction fetch operation; and determining if the instruction is from a speculative access, wherein if the instruction is not from a speculative access, then responsively performing one or more operations to ensure that the error does not corrupt an architectural state of the processor core.

Type: Application

Filed: December 22, 2011

Publication date: October 2, 2014

Inventors: Theodros Yigzaw, Oded Lempel, Hisham Hafi, Geeyarpuram N Santhanakrisnan, Jose A Vargas, Ganapati N Srinivasa, Mohan J Kumar, Larisa Novakovsky, Lihu Rappoport, Chen Koren, Julius Yuli Mandelblat
Method and apparatus for inclusion of TLB entries in a micro-op cache of a processor

Patent number: 8782374

Abstract: Methods and apparatus for inclusion of TLB (translation look-aside buffer) in processor micro-op caches are disclosed. Some embodiments for inclusion of TLB entries have micro-op cache inclusion fields, which are set responsive to accessing the TLB entry. Inclusion logic may the flush the micro-op cache or portions of the micro-op cache and clear corresponding inclusion fields responsive to a replacement or invalidation of a TLB entry whenever its associated inclusion field had been set. Front-end processor state may also be cleared and instructions refetched when replacement resulted from a TLB miss.

Type: Grant

Filed: December 2, 2008

Date of Patent: July 15, 2014

Assignee: Intel Corporation

Inventors: Lihu Rappoport, Chen Koren, Franck Sala, Oded Lempel, Ido Ouziel, Ron Gabor, Gregory Pribush, Lior Libis
MULTI-LEVEL TRACKING OF IN-USE STATE OF CACHE LINES

Publication number: 20130275733

Abstract: This disclosure includes tracking of in-use states of cache lines to improve throughput of pipelines and thus increase performance of processors. Access data for a number of sets of instructions stored in an instruction cache may be tracked using an in-use array in a first array until the data for one or more of those sets reach a threshold condition. A second array may then be used as the in-use array to track the sets of instructions after a micro-operation is inserted into the pipeline. When the micro-operation retires from the pipeline, the first array may be cleared. The process may repeat after the second array reaches the threshold condition. During the tracking, an in-use state for an instruction line may be detected by inspecting a corresponding bit in each of the arrays. Additional arrays may also be used to track the in-use state.

Type: Application

Filed: December 29, 2011

Publication date: October 17, 2013

Inventors: Ilhyun Kim, Chen Koren, Alexandre J. Farcy, Robert L. Hinton, Choon Wei Khor, Lihu Rappoport
Method and apparatus for pipeline inclusion and instruction restarts in a micro-op cache of a processor

Patent number: 8433850

Abstract: Methods and apparatus for instruction restarts and inclusion in processor micro-op caches are disclosed. Embodiments of micro-op caches have way storage fields to record the instruction-cache ways storing corresponding macroinstructions. Instruction-cache in-use indications associated with the instruction-cache lines storing the instructions are updated upon micro-op cache hits. In-use indications can be located using the recorded instruction-cache ways in micro-op cache lines. Victim-cache deallocation micro-ops are enqueued in a micro-op queue after micro-op cache miss synchronizations, responsive to evictions from the instruction-cache into a victim-cache. Inclusion logic also locates and evicts micro-op cache lines corresponding to the recorded instruction-cache ways, responsive to evictions from the instruction-cache.

Type: Grant

Filed: December 2, 2008

Date of Patent: April 30, 2013

Assignee: Intel Corporation

Inventors: Lihu Rappoport, Chen Koren, Franck Sala, Ilhyun Kim, Lior Libis, Ron Gabor, Oded Lempel

1 2 next