Patents by Inventor Michael Behar

Michael Behar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11080611
    Abstract: Embodiments described herein provide a processing apparatus comprising compute logic to generate neural network data for a convolutional neural network (CNN) and write the neural network data to a memory buffer. The compute logic additionally includes a direct memory access (DMA) controller including a hardware codec having an encode unit and a decode unit, the DMA controller to read the neural network data from the memory buffer, encode the neural network data via the encode unit, write encoded neural network data to a memory device coupled with the processing apparatus, write metadata for the encoded neural network data to the memory device coupled with the processing apparatus, and decode encoded neural network data via the decode unit in response to a request from the compute logic.
    Type: Grant
    Filed: December 22, 2017
    Date of Patent: August 3, 2021
    Assignee: Intel Corporation
    Inventors: Ajit Singh, Bharat Daga, Michael Behar
  • Publication number: 20210200675
    Abstract: Disclosed embodiments relate to a shared read request (SRR) using a common request tracker (CRT) as a temporary cache. In one example, a multi-core system includes a memory and a memory controller to receive a SRR from a core when a Leader core is not yet identified, allocate a CRT entry and store the SRR therein, mark it as a Leader, send a read request to a memory address indicated by the SRR, and when read data returns from the memory, store the read data in the CRT entry, send the read data to the Leader core, and await receipt, unless already received, of another SRR from a Follower core, the other SRR having a same address as the SRR, then, send the read data to the Follower core, and deallocate the CRT entry.
    Type: Application
    Filed: December 26, 2019
    Publication date: July 1, 2021
    Applicant: Intel Corporation
    Inventors: Israel DIAMAND, Ravi K. VENKATESAN, Shlomi SHUA, Oz SHITRIT, Michael BEHAR, Roni ROSNER
  • Publication number: 20210141604
    Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: November 24, 2020
    Publication date: May 13, 2021
    Applicant: Intel Corporation
    Inventors: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby
  • Publication number: 20210049804
    Abstract: In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: August 28, 2020
    Publication date: February 18, 2021
    Applicant: Intel Corporation
    Inventors: Uzi Sarel, Ehud Cohen, Tomer Schwartz, Amitai Armon, Yahav Shadmiy, Itamar Ben-Ari, Amit Bleiweiss, Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Michael Behar, Guy Jacob, Gal Leibovich, Jeremie Dreyfuss
  • Patent number: 10853035
    Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: December 1, 2020
    Assignee: INTEL CORPORATION
    Inventors: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby
  • Publication number: 20200293282
    Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: March 27, 2020
    Publication date: September 17, 2020
    Applicant: Intel Corporation
    Inventors: YANIV Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby
  • Patent number: 10762685
    Abstract: In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: October 31, 2019
    Date of Patent: September 1, 2020
    Assignee: INTEL CORPORATION
    Inventors: Uzi Sarel, Ehud Cohen, Tomer Schwartz, Amitai Armon, Yahav Shadmiy, Itamar Ben-Ari, Amit Bleiweiss, Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Michael Behar, Guy Jacob, Gal Leibovich, Jeremie Dreyfuss
  • Patent number: 10726583
    Abstract: Embodiments described herein provide a processing apparatus comprising compute logic to generate output feature map data for a convolutional neural network (CNN) and write the feature map data to a memory buffer; a direct memory access (DMA) controller including a feature map encoder, the DMA controller to read the feature map data from the memory buffer, encode the feature map data using one of multiple encode algorithms, and write encoded feature map data to memory coupled with the processing apparatus; and wherein the compute logic is to read the encoded feature map data from the memory in an encoded format and decode the encoded feature map data while reading the encoded feature map data.
    Type: Grant
    Filed: December 30, 2016
    Date of Patent: July 28, 2020
    Assignee: INTEL CORPORATION
    Inventors: Ajit Singh, Bharat Daga, Oren Agam, Michael Behar, Dmitri Vainbrand
  • Publication number: 20200143579
    Abstract: In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: October 31, 2019
    Publication date: May 7, 2020
    Applicant: Intel Corporation
    Inventors: Uzi Sarel, Ehud Cohen, Tomer Schwartz, Amitai Armon, Yahav Shadmiy, Itamar Ben-Ari, Amit Bleiweiss, Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Michael Behar, Guy Jacob, Gal Leibovich, Jeremie Dreyfuss
  • Patent number: 10606559
    Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: June 12, 2019
    Date of Patent: March 31, 2020
    Assignee: INTEL CORPORATION
    Inventors: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby
  • Publication number: 20190370076
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed that enable dynamic processing of a predefined workload to one or more computational building blocks of an accelerator. An example apparatus includes an interface to obtain a workload node, the workload node associated with a first amount of data, the workload node to be executed at a first one of the one or more computational building blocks; an analyzer to: determine whether the workload node is a candidate for early termination; and in response to determining that the workload node is a candidate for early termination, set a flag associated with a tile of the first amount of data; and a dispatcher to, in response to the tile being transmitted from the first one of the one or more computational building blocks to a buffer, stop execution of the workload node.
    Type: Application
    Filed: August 15, 2019
    Publication date: December 5, 2019
    Inventors: Michael Behar, Oren Agam, Ronen Gabbai, Zigi Walter, Roni Rosner, Moshe Maor
  • Publication number: 20190370074
    Abstract: An apparatus includes a communication processor to receive configuration information from a producing compute building block; a credit generator to generate a number of credits for the producing compute building block corresponding to the configuration information, the configuration information including characteristics of a buffer; a source identifier to analyze a returned credit to determine whether the returned credit originates from the producing compute building block or a consuming compute building block; and a duplicator to, when the returned credit originates from the producing compute building block, multiply the returned credit by a first factor, the first factor indicative of a number of consuming compute building blocks identified in the configuration information.
    Type: Application
    Filed: August 15, 2019
    Publication date: December 5, 2019
    Inventors: Roni Rosner, Moshe Maor, Michael Behar, Ronen Gabbai, Zigi Walter, Oren Agam
  • Publication number: 20190370073
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed that enable out-of-order pipelined execution of static mapping of a workload to one or more computational building blocks of an accelerator. An example apparatus includes an interface to load a first number of credits into memory; a comparator to compare the first number of credits to a threshold number of credits associated with memory availability in a buffer; and a dispatcher to, when the first number of credits meets the threshold number of credits, select a workload node of the workload to be executed at a first one of the one or more computational building blocks.
    Type: Application
    Filed: August 15, 2019
    Publication date: December 5, 2019
    Inventors: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
  • Publication number: 20190370084
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to configure heterogenous components in an accelerator. An example apparatus includes a graph compiler to identify a workload node in a workload and generate a selector for the workload node, and the selector to identify an input condition and an output condition of a compute building block, wherein the graph compiler is to, in response to obtaining the identified input condition and output condition from the selector, map the workload node to the compute building block.
    Type: Application
    Filed: August 15, 2019
    Publication date: December 5, 2019
    Inventors: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
  • Publication number: 20190370209
    Abstract: Methods and apparatus to implement multiple inference compute engines are disclosed herein. A disclosed example apparatus includes a first inference compute engine, a second inference compute engine, and an accelerator on coherent fabric to couple the first inference compute engine and the second inference compute engine to a converged coherency fabric of a system-on-chip, the accelerator on coherent fabric to arbitrate requests from the first inference compute engine and the second inference compute engine to utilize a single in-die interconnect port.
    Type: Application
    Filed: August 15, 2019
    Publication date: December 5, 2019
    Inventors: Israel Diamand, Roni Rosner, Ravi Venkatesan, Shlomi Shua, Oz Shitrit, Henrietta Bezbroz, Alexander Gendler, Ohad Falik, Zigi Walter, Michael Behar, Shlomi Alkalay
  • Publication number: 20190361674
    Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: June 12, 2019
    Publication date: November 28, 2019
    Applicant: INTEL CORPORATION
    Inventors: YANIV FAIS, TOMER BAR-ON, JACOB SUBAG, JEREMIE DREYFUSS, LEV FAIVISHEVSKY, MICHAEL BEHAR, AMIT BLEIWEISS, GUY JACOB, GAL LEIBOVICH, ITAMAR BEN-ARI, GALINA RYVCHIN, EYAL YAACOBY
  • Patent number: 10467795
    Abstract: In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: April 8, 2017
    Date of Patent: November 5, 2019
    Assignee: INTEL CORPORATION
    Inventors: Uzi Sarel, Ehud Cohen, Tomer Schwartz, Amitai Armon, Yahav Shadmiy, Itamar Ben-Ari, Amit Bleiweiss, Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Michael Behar, Guy Jacob, Gal Leibovich, Jeremie Dreyfuss
  • Patent number: 10372416
    Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: April 28, 2017
    Date of Patent: August 6, 2019
    Assignee: INTEL CORPORATION
    Inventors: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby
  • Publication number: 20190197420
    Abstract: Embodiments described herein provide a processing apparatus comprising compute logic to generate neural network data for a convolutional neural network (CNN) and write the neural network data to a memory buffer. The compute logic additionally includes a direct memory access (DMA) controller including a hardware codec having an encode unit and a decode unit, the DMA controller to read the neural network data from the memory buffer, encode the neural network data via the encode unit, write encoded neural network data to a memory device coupled with the processing apparatus, write metadata for the encoded neural network data to the memory device coupled with the processing apparatus, and decode encoded neural network data via the decode unit in response to a request from the compute logic.
    Type: Application
    Filed: December 22, 2017
    Publication date: June 27, 2019
    Applicant: Intel Corporation
    Inventors: Ajit Singh, Bharat Daga, Michael Behar
  • Publication number: 20190102671
    Abstract: A convolutional neural network (CNN) accelerator, including: a CNN circuit for performing a multiple-layer CNN computation, wherein the multiple layers are to receive an input feature according to an input feature map (IFM) and a weight matrix per output feature, wherein an output of a first layer provides an input for a next layer; and a mapping circuit to access a three-dimensional input matrix stored as a Z-major matrix; wherein the CNN circuit is to perform an inner-product direct convolution on the Z-major matrix, wherein the direct convolution lacks a lowering operation.
    Type: Application
    Filed: September 29, 2017
    Publication date: April 4, 2019
    Applicant: Intel Corporation
    Inventors: Ehud Cohen, Moshe Maor, Ashutosh Parkhi, Michael Behar, Yaniv Fais