Patents by Inventor Michael Behar

Michael Behar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

INNER PRODUCT CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20250086445

Abstract: A convolutional neural network (CNN) accelerator, including: a CNN circuit for performing a multiple-layer CNN computation, wherein the multiple layers are to receive an input feature according to an input feature map (IFM) and a weight matrix per output feature, wherein an output of a first layer provides an input for a next layer; and a mapping circuit to access a three-dimensional input matrix stored as a Z-major matrix; wherein the CNN circuit is to perform an inner-product direct convolution on the Z-major matrix, wherein the direct convolution lacks a lowering operation.
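The abstract describes an inner-product direct convolution over a Z-major input with no lowering (im2col) step. The sketch below is a minimal software illustration of that idea, not the patented circuit. It assumes that "Z-major" means the channel (Z) index varies fastest in memory, so element (y, x, z) sits at `(y*W + x)*C + z`. It also assumes stride 1 and no padding. The function name `direct_conv_zmajor` is hypothetical.

```python
def direct_conv_zmajor(ifm, H, W, C, weights, K):
    """Direct convolution on a Z-major input, with no im2col lowering.

    ifm:     flat list, element (y, x, z) at index (y*W + x)*C + z (assumed layout)
    weights: one flat Z-major K*K*C kernel per output feature
    Returns out[feature][y][x] for a 'valid' convolution with stride 1.
    """
    out_h, out_w = H - K + 1, W - K + 1
    out = []
    for w in weights:                      # one weight matrix per output feature
        fmap = []
        for oy in range(out_h):
            row = []
            for ox in range(out_w):
                acc = 0
                for ky in range(K):
                    for kx in range(K):
                        base_in = ((oy + ky) * W + (ox + kx)) * C
                        base_w = (ky * K + kx) * C
                        # The C channel values are contiguous in both operands,
                        # so each kernel tap is a plain inner product.
                        acc += sum(a * b for a, b in zip(ifm[base_in:base_in + C],
                                                         w[base_w:base_w + C]))
                row.append(acc)
            fmap.append(row)
        out.append(fmap)
    return out
```

For example, `direct_conv_zmajor(list(range(18)), 3, 3, 2, [[1] * 8], 2)` sums each 2x2x2 window of a 3x3x2 input. The Z-major layout lets the innermost loop read both operands sequentially. That sequential access is what makes the explicit lowering copy unnecessary.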

Type: Application

Filed: September 18, 2024

Publication date: March 13, 2025

Applicant: Intel Corporation

Inventors: Ehud Cohen, Moshe Maor, Ashutosh Parkhi, Michael Behar, Yaniv Fais
Real time context dependent deep learning

Patent number: 12223427

Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to receive a plurality of data inputs for training a neural network, wherein the data inputs comprise training data and weight inputs; represent the data inputs in a first form; and represent the weight inputs in a second form. Other embodiments are also disclosed and claimed.
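The abstract does not say what the "first form" and "second form" are. The sketch below assumes one plausible pairing purely for illustration. Data inputs are rounded to IEEE half precision, and weights are quantized to symmetric per-tensor int8. All function names are hypothetical.

```python
import struct

def to_fp16(values):
    """Assumed 'first form' for data: round each value to IEEE 754 half precision."""
    return [struct.unpack('<e', struct.pack('<e', v))[0] for v in values]

def quantize_int8(weights):
    """Assumed 'second form' for weights: symmetric per-tensor int8 with one scale."""
    max_abs = max(abs(w) for w in weights) or 1.0   # avoid a zero scale
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dot_mixed(data_fp16, q_weights, scale):
    """Dot product that dequantizes the int8 weights on the fly."""
    return sum(d * (qw * scale) for d, qw in zip(data_fp16, q_weights))
```

Keeping the two operand types separate lets each use a precision suited to its value range. The cost is a small rounding error, which the test below checks stays bounded.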

Type: Grant

Filed: May 30, 2023

Date of Patent: February 11, 2025

Assignee: INTEL CORPORATION

Inventors: Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Jeremie Dreyfuss, Amit Bleiweiss, Tomer Schwartz, Raanan Yonatan Yehezkel Rohekar, Michael Behar, Amitai Armon, Uzi Sarel
Methods and apparatus to configure heterogenous components in an accelerator

Patent number: 12217101

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to configure heterogenous components in an accelerator. An example apparatus includes a graph compiler to identify a workload node in a workload and generate a selector for the workload node, and the selector to identify an input condition and an output condition of a compute building block, wherein the graph compiler is to, in response to obtaining the identified input condition and output condition from the selector, map the workload node to the compute building block.
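The mapping step in the abstract can be illustrated with a toy graph compiler. For each workload node, it builds a selector that checks a compute building block's input and output conditions. It then maps the node to the first block that satisfies them. This is a sketch, not Intel's compiler. Modeling "conditions" as data formats, and the names `ComputeBuildingBlock`, `WorkloadNode`, `make_selector` and `map_workload`, are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ComputeBuildingBlock:
    name: str
    input_format: str     # assumed stand-in for the block's "input condition"
    output_format: str    # assumed stand-in for the block's "output condition"
    ops: frozenset        # operations the block can execute

@dataclass(frozen=True)
class WorkloadNode:
    name: str
    op: str
    input_format: str
    output_format: str

def make_selector(node):
    """Return a selector that reports whether a block's conditions fit this node."""
    def selector(cbb):
        return (node.op in cbb.ops
                and cbb.input_format == node.input_format
                and cbb.output_format == node.output_format)
    return selector

def map_workload(nodes, cbbs):
    """Map each workload node to the first compute building block its selector accepts."""
    mapping = {}
    for node in nodes:
        selector = make_selector(node)
        match = next((c for c in cbbs if selector(c)), None)
        if match is None:
            raise ValueError(f"no compute building block fits node {node.name}")
        mapping[node.name] = match.name
    return mapping
```

A real compiler would likely weigh several candidate blocks rather than taking the first match. First-fit keeps the selector-then-map sequence easy to follow.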

Type: Grant

Filed: April 28, 2023

Date of Patent: February 4, 2025

Assignee: INTEL CORPORATION

Inventors: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
Compression for deep learning in case of sparse values mapped to non-zero value

Patent number: 12147914

Abstract: Embodiments described herein provide a processing apparatus comprising compute circuitry to generate neural network data for a convolutional neural network (CNN) and write the neural network data to a memory buffer. The compute circuitry additionally includes a direct memory access (DMA) controller including a hardware codec having encode circuitry and decode circuitry. The DMA controller reads the neural network data from the memory buffer, encodes the neural network data via the encode circuitry, writes encoded neural network data to a memory device coupled with the processing apparatus, writes metadata for the encoded neural network data to the memory device coupled with the processing apparatus, and decodes encoded neural network data via the decode circuitry in response to a request from the compute circuitry.
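The title refers to sparse values that are mapped to a non-zero value. One way to picture this is a bitmask codec in which the "sparse" value is configurable and recorded in the metadata. The configurable value could be, for example, a quantization zero point, so a block dominated by a non-zero constant still compresses. The sketch below is a hypothetical software model of such an encode/decode pair, not the patented DMA hardware. The metadata layout is an assumption.

```python
def encode(block, sparse_value=0):
    """Bitmask compression relative to a configurable sparse value.

    Bit i of the mask is 1 when block[i] differs from sparse_value, and only
    those values are kept in the payload. Storing sparse_value in the metadata
    lets a non-zero 'sparse' value compress just as well as zero.
    """
    mask = 0
    payload = []
    for i, v in enumerate(block):
        if v != sparse_value:
            mask |= 1 << i
            payload.append(v)
    metadata = {"length": len(block), "sparse_value": sparse_value, "mask": mask}
    return payload, metadata

def decode(payload, metadata):
    """Rebuild the original block from the payload and its metadata."""
    it = iter(payload)
    return [next(it) if (metadata["mask"] >> i) & 1 else metadata["sparse_value"]
            for i in range(metadata["length"])]
```

For instance, encoding `[128, 128, 130, 128, 7, 128]` with `sparse_value=128` keeps only `[130, 7]` plus a mask and the value 128.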

Type: Grant

Filed: September 14, 2023

Date of Patent: November 19, 2024

Assignee: Intel Corporation

Inventors: Ajit Singh, Bharat Daga, Michael Behar
Inner product convolutional neural network accelerator

Patent number: 12131250

Abstract: A convolutional neural network (CNN) accelerator, including: a CNN circuit for performing a multiple-layer CNN computation, wherein the multiple layers are to receive an input feature according to an input feature map (IFM) and a weight matrix per output feature, wherein an output of a first layer provides an input for a next layer; and a mapping circuit to access a three-dimensional input matrix stored as a Z-major matrix; wherein the CNN circuit is to perform an inner-product direct convolution on the Z-major matrix, wherein the direct convolution lacks a lowering operation.

Type: Grant

Filed: September 29, 2017

Date of Patent: October 29, 2024

Assignee: Intel Corporation

Inventors: Ehud Cohen, Moshe Maor, Ashutosh Parkhi, Michael Behar, Yaniv Fais
HARDWARE IP OPTIMIZED CONVOLUTIONAL NEURAL NETWORK

Publication number: 20240112033

Abstract: In an example, an apparatus comprises at least one execution platform; and logic, at least partially including hardware logic, to receive a trained neural network model in a model optimizer and convert the trained neural network model to an optimized model comprising parameters that are fit to the at least one execution platform. Other embodiments are also disclosed and claimed.
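The abstract leaves open which parameters a model optimizer "fits" to an execution platform. As one hypothetical example, the sketch below picks, for each convolution layer, the largest output-channel tile whose weights fit in a platform's local buffer. The `Platform` and `ConvLayer` descriptors and the `fit_tiling` function are illustrative assumptions, not part of the published application.

```python
from dataclasses import dataclass

@dataclass
class Platform:              # hypothetical execution-platform descriptor
    local_buffer_bytes: int  # on-chip buffer available for weights
    weight_bytes: int        # bytes per stored weight (e.g. 1 for int8)

@dataclass
class ConvLayer:
    name: str
    in_channels: int
    out_channels: int
    kernel: int

def fit_tiling(layers, platform):
    """Choose, per layer, how many output channels' weights fit in the buffer at once."""
    plan = {}
    for layer in layers:
        per_oc = layer.in_channels * layer.kernel * layer.kernel * platform.weight_bytes
        if per_oc > platform.local_buffer_bytes:
            raise ValueError(f"{layer.name}: one output channel does not fit")
        plan[layer.name] = min(layer.out_channels, platform.local_buffer_bytes // per_oc)
    return plan
```

With a 4096-byte int8 buffer, a 3-to-64-channel 3x3 layer fits entirely. A 64-to-128-channel 3x3 layer is split into tiles of 7 output channels.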

Type: Application

Filed: November 20, 2023

Publication date: April 4, 2024

Applicant: Intel Corporation

Inventors: Amit Bleiweiss, Itamar Ben-Ari, Michael Behar, Guy Jacob, Gal Leibovich, Jacob Subag, Lev Faivishevsky, Yaniv Fais, Tomer Schwartz
COMPRESSION FOR DEEP LEARNING IN CASE OF SPARSE VALUES MAPPED TO NON-ZERO VALUE

Publication number: 20240078453

Abstract: Embodiments described herein provide a processing apparatus comprising compute circuitry to generate neural network data for a convolutional neural network (CNN) and write the neural network data to a memory buffer. The compute circuitry additionally includes a direct memory access (DMA) controller including a hardware codec having encode circuitry and decode circuitry. The DMA controller reads the neural network data from the memory buffer, encodes the neural network data via the encode circuitry, writes encoded neural network data to a memory device coupled with the processing apparatus, writes metadata for the encoded neural network data to the memory device coupled with the processing apparatus, and decodes encoded neural network data via the decode circuitry in response to a request from the compute circuitry.

Type: Application

Filed: September 14, 2023

Publication date: March 7, 2024

Applicant: Intel Corporation

Inventors: Ajit Singh, Bharat Daga, Michael Behar
Methods and apparatus to enable out-of-order pipelined execution of static mapping of a workload

Patent number: 11847497

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed that enable out-of-order pipelined execution of static mapping of a workload to one or more computational building blocks of an accelerator. An example apparatus includes an interface to load a first number of credits into memory; a comparator to compare the first number of credits to a threshold number of credits associated with memory availability in a buffer; and a dispatcher to, when the first number of credits meets the threshold number of credits, select a workload node of the workload to be executed at a first one of the one or more computational building blocks.
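The interface, comparator and dispatcher roles in the abstract can be modeled in a few lines. Nodes are visited in their static order. A node whose buffer has not yet accumulated enough credits does not stall later nodes that are ready, which gives the out-of-order behavior. This is a simplified sketch: the class name, the node dictionary fields, and the choice to consume credits on dispatch are assumptions.

```python
class CreditDispatcher:
    """Toy model of credit-based, out-of-order dispatch of statically mapped nodes."""

    def __init__(self):
        self.credits = {}                    # buffer name -> available credits

    def load_credits(self, buffer, n):       # "interface" role
        self.credits[buffer] = self.credits.get(buffer, 0) + n

    def ready(self, node):                   # "comparator" role
        return self.credits.get(node["buffer"], 0) >= node["threshold"]

    def dispatch(self, pending):             # "dispatcher" role
        """Issue every ready node; blocked nodes stay pending without stalling others."""
        issued, still_pending = [], []
        for node in pending:
            if self.ready(node):
                self.credits[node["buffer"]] -= node["threshold"]  # assumed: credits consumed
                issued.append(node["name"])
            else:
                still_pending.append(node)
        return issued, still_pending
```

In the test below, node A waits for buffer space while node B, later in the static order, runs first. A is issued once more credits arrive.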

Type: Grant

Filed: December 23, 2021

Date of Patent: December 19, 2023

Assignee: Intel Corporation

Inventors: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
REAL TIME CONTEXT DEPENDENT DEEP LEARNING

Publication number: 20230394305

Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to receive a plurality of data inputs for training a neural network, wherein the data inputs comprise training data and weight inputs; represent the data inputs in a first form; and represent the weight inputs in a second form. Other embodiments are also disclosed and claimed.

Type: Application

Filed: May 30, 2023

Publication date: December 7, 2023

Applicant: Intel Corporation

Inventors: Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Jeremie Dreyfuss, Amit Bleiweiss, Tomer Schwartz, Raanan Yonatan Yehezkel Rohekar, Michael Behar, Amitai Armon, Uzi Sarel
METHODS AND APPARATUS TO CONFIGURE HETEROGENOUS COMPONENTS IN AN ACCELERATOR

Publication number: 20230333913

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to configure heterogenous components in an accelerator. An example apparatus includes a graph compiler to identify a workload node in a workload and generate a selector for the workload node, and the selector to identify an input condition and an output condition of a compute building block, wherein the graph compiler is to, in response to obtaining the identified input condition and output condition from the selector, map the workload node to the compute building block.

Type: Application

Filed: April 28, 2023

Publication date: October 19, 2023

Inventors: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
Compression for deep learning in case of sparse values mapped to non-zero value

Patent number: 11763183

Abstract: Embodiments described herein provide a processing apparatus comprising compute circuitry to generate neural network data for a convolutional neural network (CNN) and write the neural network data to a memory buffer. The compute circuitry additionally includes a direct memory access (DMA) controller including a hardware codec having encode circuitry and decode circuitry. The DMA controller reads the neural network data from the memory buffer, encodes the neural network data via the encode circuitry, writes encoded neural network data to a memory device coupled with the processing apparatus, writes metadata for the encoded neural network data to the memory device coupled with the processing apparatus, and decodes encoded neural network data via the decode circuitry in response to a request from the compute circuitry.

Type: Grant

Filed: July 30, 2021

Date of Patent: September 19, 2023

Assignee: Intel Corporation

Inventors: Ajit Singh, Bharat Daga, Michael Behar
Real time context dependent deep learning

Patent number: 11704564

Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to receive a plurality of data inputs for training a neural network, wherein the data inputs comprise training data and weight inputs; represent the data inputs in a first form; and represent the weight inputs in a second form. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: August 17, 2021

Date of Patent: July 18, 2023

Assignee: INTEL CORPORATION

Inventors: Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Jeremie Dreyfuss, Amit Bleiweiss, Tomer Schwartz, Raanan Yonatan Yehezkel Rohekar, Michael Behar, Amitai Armon, Uzi Sarel
Methods and apparatus to configure heterogenous components in an accelerator

Patent number: 11675630

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to configure heterogenous components in an accelerator. An example apparatus includes a graph compiler to identify a workload node in a workload and generate a selector for the workload node, and the selector to identify an input condition and an output condition of a compute building block, wherein the graph compiler is to, in response to obtaining the identified input condition and output condition from the selector, map the workload node to the compute building block.

Type: Grant

Filed: August 15, 2019

Date of Patent: June 13, 2023

Assignee: INTEL CORPORATION

Inventors: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
Multiply-accumulate “0” data gating

Patent number: 11656846

Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
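Zero gating exploits the fact that a product with a zero operand is known in advance. The multiplier and the accumulator update can therefore be skipped without changing the result. The software model below only counts which operations would be gated; real hardware would save power by not toggling those units. The names `GatedMAC` and `dot` are hypothetical.

```python
class GatedMAC:
    """Software model of a multiply-accumulate unit with zero-input gating."""

    def __init__(self):
        self.acc = 0
        self.gated = 0    # operations skipped because an operand was zero
        self.active = 0   # operations that actually used the multiplier

    def step(self, a, b):
        if a == 0 or b == 0:
            # Product is known to be 0: gate the multiply and leave acc unchanged.
            self.gated += 1
            return self.acc
        self.active += 1
        self.acc += a * b
        return self.acc

def dot(xs, ws):
    """Dot product through the gated MAC; returns (result, number of gated ops)."""
    mac = GatedMAC()
    for x, w in zip(xs, ws):
        mac.step(x, w)
    return mac.acc, mac.gated
```

With sparse activations such as post-ReLU data, a large fraction of steps can take the gated path.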

Type: Grant

Filed: November 24, 2020

Date of Patent: May 23, 2023

Assignee: INTEL CORPORATION

Inventors: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby
Sub-graph in frequency domain and dynamic selection of convolution implementation on a GPU

Patent number: 11600035

Abstract: In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.
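Frequency-domain execution rests on the convolution theorem: circular convolution equals a pointwise product of spectra followed by an inverse transform. The 1-D sketch below checks that equivalence. It adds a hypothetical `conv_auto` that selects an implementation by kernel size, loosely mirroring the title's "dynamic selection". The selection rule and the `threshold` value are assumptions. The naive O(n²) DFT is used only for clarity; a real implementation would use an FFT, which is what makes the frequency path cheaper for large kernels.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); inverse includes the 1/n factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out

def circular_conv_direct(x, h):
    n = len(x)
    return [sum(x[k] * h[(m - k) % n] for k in range(n)) for m in range(n)]

def circular_conv_freq(x, h):
    # Convolution theorem: multiply spectra pointwise, then transform back.
    X, H = dft(x), dft(h)
    return [v.real for v in dft([a * b for a, b in zip(X, H)], inverse=True)]

def conv_auto(x, kernel, threshold=8):
    """Hypothetical selector: frequency domain for large kernels, direct otherwise."""
    h = list(kernel) + [0] * (len(x) - len(kernel))   # zero-pad kernel to len(x)
    if len(kernel) >= threshold:
        return circular_conv_freq(x, h), "frequency"
    return circular_conv_direct(x, h), "direct"
```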

Type: Grant

Filed: February 10, 2022

Date of Patent: March 7, 2023

Assignee: INTEL CORPORATION

Inventors: Uzi Sarel, Ehud Cohen, Tomer Schwartz, Amitai Armon, Yahav Shadmiy, Itamar Ben-Ari, Amit Bleiweiss, Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Michael Behar, Guy Jacob, Gal Leibovich, Jeremie Dreyfuss
Shared read—using a request tracker as a temporary read cache

Patent number: 11422939

Abstract: Disclosed embodiments relate to a shared read request (SRR) using a common request tracker (CRT) as a temporary cache. In one example, a multi-core system includes a memory and a memory controller to receive an SRR from a core when a Leader core is not yet identified, allocate a CRT entry and store the SRR therein, mark it as a Leader, send a read request to a memory address indicated by the SRR, and when read data returns from the memory, store the read data in the CRT entry, send the read data to the Leader core, and await receipt, unless already received, of another SRR from a Follower core, the other SRR having a same address as the SRR, then, send the read data to the Follower core, and deallocate the CRT entry.
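The Leader/Follower sequence in the abstract can be traced with a small event-driven model. The first SRR to an address allocates a tracker entry and issues the only memory read. The returned data is held in the entry until the Follower's SRR arrives, in either order, and the entry is then freed. This is an illustrative sketch, not Intel's memory controller. It assumes exactly one Follower per shared read, as in the abstract's example, and the class and method names are hypothetical.

```python
class SharedReadTracker:
    """Toy model of a common request tracker (CRT) used as a temporary read cache."""

    def __init__(self, memory):
        self.memory = memory      # dict: address -> data
        self.entries = {}         # address -> CRT entry
        self.delivered = []       # log of (core, address, data)
        self.memory_reads = 0

    def shared_read(self, core, address):
        entry = self.entries.get(address)
        if entry is None:
            # No Leader yet: allocate an entry, mark this core as Leader, issue the read.
            self.entries[address] = {"leader": core, "data": None, "follower": None}
            self.memory_reads += 1
            return
        entry["follower"] = core          # same address: this core is the Follower
        self._try_complete(address)

    def read_returns(self, address):
        entry = self.entries[address]
        entry["data"] = self.memory[address]   # keep the data in the CRT entry
        self.delivered.append((entry["leader"], address, entry["data"]))
        self._try_complete(address)

    def _try_complete(self, address):
        entry = self.entries[address]
        if entry["data"] is not None and entry["follower"] is not None:
            self.delivered.append((entry["follower"], address, entry["data"]))
            del self.entries[address]          # deallocate the CRT entry
```

Both cores receive the data while memory is read only once, which is the point of treating the tracker as a short-lived cache.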

Type: Grant

Filed: December 26, 2019

Date of Patent: August 23, 2022

Assignee: Intel Corporation

Inventors: Israel Diamand, Ravi K. Venkatesan, Shlomi Shua, Oz Shitrit, Michael Behar, Roni Rosner
SUB-GRAPH IN FREQUENCY DOMAIN AND DYNAMIC SELECTION OF CONVOLUTION IMPLEMENTATION ON A GPU

Publication number: 20220237850

Abstract: In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.

Type: Application

Filed: February 10, 2022

Publication date: July 28, 2022

Applicant: Intel Corporation

Inventors: Uzi Sarel, Ehud Cohen, Tomer Schwartz, Amitai Armon, Yahav Shadmiy, Itamar Ben-Ari, Amit Bleiweiss, Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Michael Behar, Guy Jacob, Gal Leibovich, Jeremie Dreyfuss
METHODS AND APPARATUS TO ENABLE OUT-OF-ORDER PIPELINED EXECUTION OF STATIC MAPPING OF A WORKLOAD

Publication number: 20220197703

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed that enable out-of-order pipelined execution of static mapping of a workload to one or more computational building blocks of an accelerator. An example apparatus includes an interface to load a first number of credits into memory; a comparator to compare the first number of credits to a threshold number of credits associated with memory availability in a buffer; and a dispatcher to, when the first number of credits meets the threshold number of credits, select a workload node of the workload to be executed at a first one of the one or more computational building blocks.

Type: Application

Filed: December 23, 2021

Publication date: June 23, 2022

Inventors: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
REAL TIME CONTEXT DEPENDENT DEEP LEARNING

Publication number: 20220076118

Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to receive a plurality of data inputs for training a neural network, wherein the data inputs comprise training data and weight inputs; represent the data inputs in a first form; and represent the weight inputs in a second form. Other embodiments are also disclosed and claimed.

Type: Application

Filed: August 17, 2021

Publication date: March 10, 2022

Applicant: Intel Corporation

Inventors: Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Jeremie Dreyfuss, Amit Bleiweiss, Tomer Schwartz, Raanan Yonatan Yehezkel Rohekar, Michael Behar, Amitai Armon, Uzi Sarel
DYNAMICALLY CONFIGURABLE MULTI-MODE MEMORY ALLOCATION IN AN ACCELERATOR MULTI-CORE SYSTEM ON CHIP

Publication number: 20220066923

Abstract: Systems, apparatuses and methods may provide for technology that determines runtime memory requirements of an artificial intelligence (AI) application, defines a remote address range for a plurality of memories based on the runtime memory requirements, wherein each memory in the plurality of memories corresponds to a processor in a plurality of processors, and defines a shared address range for the plurality of memories based on the runtime memory requirements, wherein the shared address range is aliased. In one example, the technology configures memory mapping hardware to access the remote address range in a linear sequence and access the shared address range in a hashed sequence.
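The two access patterns named in the abstract can be contrasted with a toy address translator. The remote range is laid out linearly, filling memory 0 first, then memory 1, and so on. The shared range is hashed at cache-line granularity, so consecutive lines spread across all per-processor memories. The range sizes, the 64-byte line, the XOR hash, and the `MemoryMap` name are all assumptions for illustration. The aliasing aspect of the shared range is not modeled.

```python
from dataclasses import dataclass

@dataclass
class MemoryMap:
    num_mems: int          # one local memory per processor; assumed a power of two
    remote_per_mem: int    # bytes of each memory given to the remote range
    shared_per_mem: int    # bytes of each memory given to the shared range
    line: int = 64         # hashing granularity (assumed)

    def translate(self, addr):
        """Map a global address to (memory index, offset within that memory)."""
        remote_total = self.num_mems * self.remote_per_mem
        if addr < remote_total:
            # Remote range: linear sequence across the memories.
            return addr // self.remote_per_mem, addr % self.remote_per_mem
        rel = addr - remote_total
        if rel >= self.num_mems * self.shared_per_mem:
            raise ValueError("address outside configured ranges")
        line_idx, byte = divmod(rel, self.line)
        slot, low = divmod(line_idx, self.num_mems)
        # Shared range: XOR hash spreads consecutive lines over all memories
        # (a bijection when num_mems is a power of two).
        mem = low ^ (slot % self.num_mems)
        return mem, self.remote_per_mem + slot * self.line + byte
```

In a real system, the sizes passed to `MemoryMap` would be derived from the application's runtime memory requirements rather than fixed by hand.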

Type: Application

Filed: November 10, 2021

Publication date: March 3, 2022

Inventors: Zigi Walter, Roni Rosner, Michael Behar
