Patents by Inventor Michael Behar

Michael Behar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Compression for deep learning in case of sparse values mapped to non-zero value

Patent number: 11080611

Abstract: Embodiments described herein provide a processing apparatus comprising compute logic to generate neural network data for a convolutional neural network (CNN) and write the neural network data to a memory buffer. The compute logic additionally includes a direct memory access (DMA) controller including a hardware codec having an encode unit and a decode unit, the DMA controller to read the neural network data from the memory buffer, encode the neural network data via the encode unit, write encoded neural network data to a memory device coupled with the processing apparatus, write metadata for the encoded neural network data to the memory device coupled with the processing apparatus, and decode encoded neural network data via the decode unit in response to a request from the compute logic.

Type: Grant

Filed: December 22, 2017

Date of Patent: August 3, 2021

Assignee: Intel Corporation

Inventors: Ajit Singh, Bharat Daga, Michael Behar

SHARED READ - USING A REQUEST TRACKER AS A TEMPORARY READ CACHE

Publication number: 20210200675

Abstract: Disclosed embodiments relate to a shared read request (SRR) using a common request tracker (CRT) as a temporary cache. In one example, a multi-core system includes a memory and a memory controller to receive a SRR from a core when a Leader core is not yet identified, allocate a CRT entry and store the SRR therein, mark it as a Leader, send a read request to a memory address indicated by the SRR, and when read data returns from the memory, store the read data in the CRT entry, send the read data to the Leader core, and await receipt, unless already received, of another SRR from a Follower core, the other SRR having a same address as the SRR, then, send the read data to the Follower core, and deallocate the CRT entry.

Type: Application

Filed: December 26, 2019

Publication date: July 1, 2021

Applicant: Intel Corporation

Inventors: Israel Diamand, Ravi K. Venkatesan, Shlomi Shua, Oz Shitrit, Michael Behar, Roni Rosner
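
Illustration (not part of the patent record): the Leader/Follower flow in the abstract above can be modeled roughly as follows. This is a minimal software sketch only. The class name, the dictionary-based tracker, the synchronous memory read, and the limit of one Follower per Leader are all assumptions made for the example, not details taken from the filing.

```python
class SharedReadController:
    """Toy model of a memory controller using a request tracker as a read cache."""

    def __init__(self, memory):
        self.memory = memory      # address -> data (stands in for the memory device)
        self.crt = {}             # address -> tracker entry (hypothetical layout)
        self.memory_reads = 0     # counts reads that actually reach memory

    def shared_read(self, core, address):
        entry = self.crt.get(address)
        if entry is None:
            # No Leader identified yet: allocate an entry, mark this core as
            # Leader, read memory once, and keep the data in the entry.
            self.memory_reads += 1
            data = self.memory[address]
            self.crt[address] = {"leader": core, "data": data}
            return data
        # A Follower asked for the same address: serve it from the tracker
        # entry instead of memory, then deallocate the entry.
        data = entry["data"]
        del self.crt[address]
        return data


ctrl = SharedReadController({0x40: 123})
leader_data = ctrl.shared_read("core0", 0x40)    # goes to memory
follower_data = ctrl.shared_read("core1", 0x40)  # served from the tracker entry
print(leader_data, follower_data, ctrl.memory_reads)  # 123 123 1
```

In this sketch both cores receive the same data while memory is read only once, which is the saving the abstract describes.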

MULTIPLY-ACCUMULATE "0" DATA GATING

Publication number: 20210141604

Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.

Type: Application

Filed: November 24, 2020

Publication date: May 13, 2021

Applicant: Intel Corporation

Inventors: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby
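
Illustration (not part of the patent record): the abstract above describes hardware that gates a multiply or accumulate unit when an input is zero. A rough software analogy, assuming a plain dot product and a hypothetical function name `gated_mac`, is to skip the multiply-accumulate step whenever either operand is zero. Real gating happens in circuitry, so this only shows why the result is unchanged.

```python
def gated_mac(weights, activations):
    """Dot product that skips the multiply and accumulate when an operand is zero.

    Returns the accumulated sum and the number of skipped (gated) steps.
    """
    acc = 0
    skipped = 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            # The product would be zero, so neither unit needs to do work.
            skipped += 1
            continue
        acc += w * a
    return acc, skipped


result, skipped = gated_mac([2, 0, 3, 4], [5, 7, 0, 1])
print(result, skipped)  # 14 2
```

Two of the four steps are skipped here, and the sum matches an ungated dot product.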

SUB-GRAPH IN FREQUENCY DOMAIN AND DYNAMIC SELECTION OF CONVOLUTION IMPLEMENTATION ON A GPU

Publication number: 20210049804

Abstract: In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.

Type: Application

Filed: August 28, 2020

Publication date: February 18, 2021

Applicant: Intel Corporation

Inventors: Uzi Sarel, Ehud Cohen, Tomer Schwartz, Amitai Armon, Yahav Shadmiy, Itamar Ben-Ari, Amit Bleiweiss, Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Michael Behar, Guy Jacob, Gal Leibovich, Jeremie Dreyfuss

Multiply-accumulate “0” data gating

Patent number: 10853035

Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: March 27, 2020

Date of Patent: December 1, 2020

Assignee: INTEL CORPORATION

Inventors: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby

MULTIPLY-ACCUMULATE "0" DATA GATING

Publication number: 20200293282

Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.

Type: Application

Filed: March 27, 2020

Publication date: September 17, 2020

Applicant: Intel Corporation

Inventors: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby

Sub-graph in frequency domain and dynamic selection of convolution implementation on a GPU

Patent number: 10762685

Abstract: In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: October 31, 2019

Date of Patent: September 1, 2020

Assignee: INTEL CORPORATION

Inventors: Uzi Sarel, Ehud Cohen, Tomer Schwartz, Amitai Armon, Yahav Shadmiy, Itamar Ben-Ari, Amit Bleiweiss, Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Michael Behar, Guy Jacob, Gal Leibovich, Jeremie Dreyfuss

System and method of encoding and decoding feature maps and weights for a convolutional neural network

Patent number: 10726583

Abstract: Embodiments described herein provide a processing apparatus comprising compute logic to generate output feature map data for a convolutional neural network (CNN) and write the feature map data to a memory buffer; a direct memory access (DMA) controller including a feature map encoder, the DMA controller to read the feature map data from the memory buffer, encode the feature map data using one of multiple encode algorithms, and write encoded feature map data to memory coupled with the processing apparatus; and wherein the compute logic is to read the encoded feature map data from the memory in an encoded format and decode the encoded feature map data while reading the encoded feature map data.

Type: Grant

Filed: December 30, 2016

Date of Patent: July 28, 2020

Assignee: INTEL CORPORATION

Inventors: Ajit Singh, Bharat Daga, Oren Agam, Michael Behar, Dmitri Vainbrand

SUB-GRAPH IN FREQUENCY DOMAIN AND DYNAMIC SELECTION OF CONVOLUTION IMPLEMENTATION ON A GPU

Publication number: 20200143579

Abstract: In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.

Type: Application

Filed: October 31, 2019

Publication date: May 7, 2020

Applicant: Intel Corporation

Inventors: Uzi Sarel, Ehud Cohen, Tomer Schwartz, Amitai Armon, Yahav Shadmiy, Itamar Ben-Ari, Amit Bleiweiss, Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Michael Behar, Guy Jacob, Gal Leibovich, Jeremie Dreyfuss

Multiply-accumulate “0” data gating

Patent number: 10606559

Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: June 12, 2019

Date of Patent: March 31, 2020

Assignee: INTEL CORPORATION

Inventors: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby

METHODS AND APPARATUS TO ENABLE DYNAMIC PROCESSING OF A PREDEFINED WORKLOAD

Publication number: 20190370076

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed that enable dynamic processing of a predefined workload to one or more computational building blocks of an accelerator. An example apparatus includes an interface to obtain a workload node, the workload node associated with a first amount of data, the workload node to be executed at a first one of the one or more computational building blocks; an analyzer to: determine whether the workload node is a candidate for early termination; and in response to determining that the workload node is a candidate for early termination, set a flag associated with a tile of the first amount of data; and a dispatcher to, in response to the tile being transmitted from the first one of the one or more computational building blocks to a buffer, stop execution of the workload node.

Type: Application

Filed: August 15, 2019

Publication date: December 5, 2019

Inventors: Michael Behar, Oren Agam, Ronen Gabbai, Zigi Walter, Roni Rosner, Moshe Maor

METHODS AND APPARATUS FOR MULTIPLE ASYNCHRONOUS CONSUMERS

Publication number: 20190370074

Abstract: An apparatus includes a communication processor to receive configuration information from a producing compute building block; a credit generator to generate a number of credits for the producing compute building block corresponding to the configuration information, the configuration information including characteristics of a buffer; a source identifier to analyze a returned credit to determine whether the returned credit originates from the producing compute building block or a consuming compute building block; and a duplicator to, when the returned credit originates from the producing compute building block, multiply the returned credit by a first factor, the first factor indicative of a number of consuming compute building blocks identified in the configuration information.

Type: Application

Filed: August 15, 2019

Publication date: December 5, 2019

Inventors: Roni Rosner, Moshe Maor, Michael Behar, Ronen Gabbai, Zigi Walter, Oren Agam
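
Illustration (not part of the patent record): the duplicator step in the abstract above multiplies a credit returned by the producing block by the number of consuming blocks. A minimal sketch of that one step might look like the following. The function name and the string labels for the source are hypothetical, and passing consumer-originated credits through unchanged is an assumption, since the abstract does not say how they are handled.

```python
def duplicate_returned_credits(returned, source, num_consumers):
    """Scale credits returned by the producer so each consumer gets its own copy."""
    if source not in ("producer", "consumer"):
        raise ValueError("source must be 'producer' or 'consumer'")
    if source == "producer":
        # One credit from the producer becomes one credit per consumer.
        return returned * num_consumers
    # Assumption: credits from a consumer are forwarded as-is.
    return returned


print(duplicate_returned_credits(2, "producer", 3))  # 6
print(duplicate_returned_credits(2, "consumer", 3))  # 2
```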

METHODS AND APPARATUS TO ENABLE OUT-OF-ORDER PIPELINED EXECUTION OF STATIC MAPPING OF A WORKLOAD

Publication number: 20190370073

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed that enable out-of-order pipelined execution of static mapping of a workload to one or more computational building blocks of an accelerator. An example apparatus includes an interface to load a first number of credits into memory; a comparator to compare the first number of credits to a threshold number of credits associated with memory availability in a buffer; and a dispatcher to, when the first number of credits meets the threshold number of credits, select a workload node of the workload to be executed at a first one of the one or more computational building blocks.

Type: Application

Filed: August 15, 2019

Publication date: December 5, 2019

Inventors: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam

METHODS AND APPARATUS TO CONFIGURE HETEROGENOUS COMPONENTS IN AN ACCELERATOR

Publication number: 20190370084

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to configure heterogenous components in an accelerator. An example apparatus includes a graph compiler to identify a workload node in a workload and generate a selector for the workload node, and the selector to identify an input condition and an output condition of a compute building block, wherein the graph compiler is to, in response to obtaining the identified input condition and output condition from the selector, map the workload node to the compute building block.

Type: Application

Filed: August 15, 2019

Publication date: December 5, 2019

Inventors: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam

METHODS AND APPARATUS TO IMPLEMENT MULTIPLE INFERENCE COMPUTE ENGINES

Publication number: 20190370209

Abstract: Methods and apparatus to implement multiple inference compute engines are disclosed herein. A disclosed example apparatus includes a first inference compute engine, a second inference compute engine, and an accelerator on coherent fabric to couple the first inference compute engine and the second inference compute engine to a converged coherency fabric of a system-on-chip, the accelerator on coherent fabric to arbitrate requests from the first inference compute engine and the second inference compute engine to utilize a single in-die interconnect port.

Type: Application

Filed: August 15, 2019

Publication date: December 5, 2019

Inventors: Israel Diamand, Roni Rosner, Ravi Venkatesan, Shlomi Shua, Oz Shitrit, Henrietta Bezbroz, Alexander Gendler, Ohad Falik, Zigi Walter, Michael Behar, Shlomi Alkalay

MULTIPLY-ACCUMULATE "0" DATA GATING

Publication number: 20190361674

Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.

Type: Application

Filed: June 12, 2019

Publication date: November 28, 2019

Applicant: INTEL CORPORATION

Inventors: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby

Sub-graph in frequency domain and dynamic selection of convolution implementation on a GPU

Patent number: 10467795

Abstract: In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: April 8, 2017

Date of Patent: November 5, 2019

Assignee: INTEL CORPORATION

Inventors: Uzi Sarel, Ehud Cohen, Tomer Schwartz, Amitai Armon, Yahav Shadmiy, Itamar Ben-Ari, Amit Bleiweiss, Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Michael Behar, Guy Jacob, Gal Leibovich, Jeremie Dreyfuss

Multiply-accumulate “0” data gating

Patent number: 10372416

Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.

Type: Grant

Filed: April 28, 2017

Date of Patent: August 6, 2019

Assignee: INTEL CORPORATION

Inventors: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby

COMPRESSION FOR DEEP LEARNING IN CASE OF SPARSE VALUES MAPPED TO NON-ZERO VALUE

Publication number: 20190197420

Abstract: Embodiments described herein provide a processing apparatus comprising compute logic to generate neural network data for a convolutional neural network (CNN) and write the neural network data to a memory buffer. The compute logic additionally includes a direct memory access (DMA) controller including a hardware codec having an encode unit and a decode unit, the DMA controller to read the neural network data from the memory buffer, encode the neural network data via the encode unit, write encoded neural network data to a memory device coupled with the processing apparatus, write metadata for the encoded neural network data to the memory device coupled with the processing apparatus, and decode encoded neural network data via the decode unit in response to a request from the compute logic.

Type: Application

Filed: December 22, 2017

Publication date: June 27, 2019

Applicant: Intel Corporation

Inventors: Ajit Singh, Bharat Daga, Michael Behar

INNER PRODUCT CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20190102671

Abstract: A convolutional neural network (CNN) accelerator, including: a CNN circuit for performing a multiple-layer CNN computation, wherein the multiple layers are to receive an input feature according to an input feature map (IFM) and a weight matrix per output feature, wherein an output of a first layer provides an input for a next layer; and a mapping circuit to access a three-dimensional input matrix stored as a Z-major matrix; wherein the CNN circuit is to perform an inner-product direct convolution on the Z-major matrix, wherein the direct convolution lacks a lowering operation.

Type: Application

Filed: September 29, 2017

Publication date: April 4, 2019

Applicant: Intel Corporation

Inventors: Ehud Cohen, Moshe Maor, Ashutosh Parkhi, Michael Behar, Yaniv Fais
