Patents by Inventor Michael Behar

Michael Behar has filed patent applications for the following inventions. This listing includes both pending applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240112033
    Abstract: In an example, an apparatus comprises at least one execution platform; and logic, at least partially including hardware logic, to receive a trained neural network model in a model optimizer and convert the trained neural network model to an optimized model comprising parameters that are fit to the at least one execution platform. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: November 20, 2023
    Publication date: April 4, 2024
    Applicant: Intel Corporation
    Inventors: Amit Bleiweiss, Itamar Ben-Ari, Michael Behar, Guy Jacob, Gal Leibovich, Jacob Subag, Lev Faivishevsky, Yaniv Fais, Tomer Schwartz
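
    As a rough illustration of the model-optimizer idea in the entry above, the Python sketch below re-fits trained float32 parameters to a hypothetical 8-bit execution platform via post-training quantization. The function names and the quantization scheme are assumptions, not the patent's disclosed method:

      # Minimal sketch: take trained float32 parameters and re-fit them to a
      # target platform's native datatype. Hypothetical names and scheme.
      import numpy as np

      def quantize_tensor(w: np.ndarray, bits: int):
          """Symmetric linear quantization of one parameter tensor."""
          qmax = 2 ** (bits - 1) - 1
          scale = np.max(np.abs(w)) / qmax if np.any(w) else 1.0
          q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
          return q, scale

      def optimize_model(params: dict, platform_bits: int = 8) -> dict:
          """Convert a trained model's parameters so they fit the platform."""
          return {name: quantize_tensor(w, platform_bits) for name, w in params.items()}

      trained = {"conv1.weight": np.random.randn(16, 3, 3, 3).astype(np.float32)}
      optimized = optimize_model(trained)
      q, scale = optimized["conv1.weight"]
      print(q.dtype, scale)  # int8 values plus a per-tensor scale for dequantization
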
  • Publication number: 20240078453
    Abstract: Embodiments described herein provide a processing apparatus comprising compute circuitry to generate neural network data for a convolutional neural network (CNN) and write the neural network data to a memory buffer. The processing apparatus additionally includes a direct memory access (DMA) controller including a hardware codec having an encode circuit and a decode circuit. The DMA controller reads the neural network data from the memory buffer, encodes the neural network data via the encode circuit, writes the encoded neural network data to a memory device coupled with the processing apparatus, writes metadata for the encoded neural network data to that memory device, and decodes encoded neural network data via the decode circuit in response to a request from the compute circuitry.
    Type: Application
    Filed: September 14, 2023
    Publication date: March 7, 2024
    Applicant: Intel Corporation
    Inventors: Ajit Singh, Bharat Daga, Michael Behar
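
    The entry above describes a DMA-side codec that compresses neural-network data and stores decode metadata separately. Below is a hedged Python sketch using a zero-value bitmask codec; the actual hardware codec is not disclosed here, so this scheme is purely illustrative:

      # Compress neural-network data (often sparse after ReLU) before writing
      # it out, keeping separate metadata that describes how to decode it.
      import numpy as np

      def encode(data: np.ndarray):
          """Store only nonzero values plus a bitmask as metadata."""
          flat = data.ravel()
          mask = flat != 0
          return flat[mask], {"mask": np.packbits(mask), "shape": data.shape}

      def decode(values: np.ndarray, meta: dict) -> np.ndarray:
          mask = np.unpackbits(meta["mask"])[: np.prod(meta["shape"])].astype(bool)
          out = np.zeros(mask.size, dtype=values.dtype)
          out[mask] = values
          return out.reshape(meta["shape"])

      acts = np.maximum(np.random.randn(4, 8), 0)  # ReLU output: roughly half zeros
      vals, meta = encode(acts)
      assert np.array_equal(decode(vals, meta), acts)
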
  • Patent number: 11847497
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed that enable out-of-order pipelined execution of static mapping of a workload to one or more computational building blocks of an accelerator. An example apparatus includes an interface to load a first number of credits into memory; a comparator to compare the first number of credits to a threshold number of credits associated with memory availability in a buffer; and a dispatcher to, when the first number of credits meets the threshold number of credits, select a workload node of the workload to be executed at a first one of the one or more computational building blocks.
    Type: Grant
    Filed: December 23, 2021
    Date of Patent: December 19, 2023
    Assignee: Intel Corporation
    Inventors: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
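
    A minimal Python sketch of the credit-based dispatch described above, assuming invented names (Dispatcher, on_buffer_freed): a workload node is dispatched to a compute building block only when the loaded credits meet the buffer-availability threshold:

      from collections import deque

      class Dispatcher:
          def __init__(self, credits: int, threshold: int):
              self.credits = credits          # loaded via the "interface"
              self.threshold = threshold      # reflects buffer availability
              self.pending = deque()

          def submit(self, workload_node: str):
              self.pending.append(workload_node)

          def dispatch(self):
              """Comparator + dispatcher: run a node iff credits meet threshold."""
              if self.pending and self.credits >= self.threshold:
                  node = self.pending.popleft()
                  self.credits -= self.threshold   # consume buffer credits
                  print(f"executing {node} on compute building block 0")

          def on_buffer_freed(self):
              self.credits += self.threshold       # consumer returned credits

      d = Dispatcher(credits=4, threshold=2)
      d.submit("conv0"); d.dispatch()   # runs: 4 >= 2
      d.submit("conv1"); d.dispatch()   # runs: 2 >= 2
      d.submit("conv2"); d.dispatch()   # stalls until credits are returned
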
  • Publication number: 20230394305
    Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to receive a plurality of data inputs for training a neural network, wherein the data inputs comprise training data and weight inputs; represent the data inputs in a first form; and represent the weight inputs in a second form. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: May 30, 2023
    Publication date: December 7, 2023
    Applicant: Intel Corporation
    Inventors: Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Jeremie Dreyfuss, Amit Bleiweiss, Tomer Schwartz, Raanan Yonatan Yehezkel Rohekar, Michael Behar, Amitai Armon, Uzi Sarel
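
    The two-representation idea above can be pictured as keeping data and weights in different numeric forms during training. The Python sketch below emulates a 16-bit "first form" by truncating float32 mantissas; pairing that with float32 weights is an assumption for illustration, not the patent's claimed scheme:

      import numpy as np

      def to_first_form(x: np.ndarray) -> np.ndarray:
          """Truncate the float32 mantissa to emulate a 16-bit 'first form'."""
          bits = x.astype(np.float32).view(np.uint32) & 0xFFFF0000
          return bits.view(np.float32)

      def forward(data: np.ndarray, weights: np.ndarray) -> np.ndarray:
          # data in reduced-precision 'first form', weights in full 'second form'
          return to_first_form(data) @ weights

      data = np.random.randn(2, 4).astype(np.float32)
      weights = np.random.randn(4, 3).astype(np.float32)
      print(forward(data, weights))
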
  • Publication number: 20230333913
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to configure heterogenous components in an accelerator. An example apparatus includes a graph compiler to identify a workload node in a workload and generate a selector for the workload node, and the selector to identify an input condition and an output condition of a compute building block, wherein the graph compiler is to, in response to obtaining the identified input condition and output condition from the selector, map the workload node to the compute building block.
    Type: Application
    Filed: April 28, 2023
    Publication date: October 19, 2023
    Inventors: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
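
    A toy Python model of the graph-compiler/selector flow above: the selector reports a compute building block's input and output conditions, and the compiler maps a workload node onto the first block whose conditions match. All structures here are hypothetical simplifications:

      from dataclasses import dataclass

      @dataclass
      class ComputeBuildingBlock:
          name: str
          input_cond: str    # e.g. the tensor layout the block accepts
          output_cond: str

      def selector(cbb: ComputeBuildingBlock):
          """Identify a block's input and output conditions."""
          return cbb.input_cond, cbb.output_cond

      def map_workload(nodes, blocks):
          mapping = {}
          for node, conditions in nodes.items():
              for cbb in blocks:
                  if selector(cbb) == conditions:
                      mapping[node] = cbb.name
                      break
          return mapping

      blocks = [ComputeBuildingBlock("MAC-array", "NHWC", "NHWC"),
                ComputeBuildingBlock("DSP", "NCHW", "NCHW")]
      print(map_workload({"conv0": ("NHWC", "NHWC")}, blocks))  # {'conv0': 'MAC-array'}
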
  • Patent number: 11763183
    Abstract: Embodiments described herein provide a processing apparatus comprising compute circuitry to generate neural network data for a convolutional neural network (CNN) and write the neural network data to a memory buffer. The processing apparatus additionally includes a direct memory access (DMA) controller including a hardware codec having an encode circuit and a decode circuit. The DMA controller reads the neural network data from the memory buffer, encodes the neural network data via the encode circuit, writes the encoded neural network data to a memory device coupled with the processing apparatus, writes metadata for the encoded neural network data to that memory device, and decodes encoded neural network data via the decode circuit in response to a request from the compute circuitry.
    Type: Grant
    Filed: July 30, 2021
    Date of Patent: September 19, 2023
    Assignee: Intel Corporation
    Inventors: Ajit Singh, Bharat Daga, Michael Behar
  • Patent number: 11704564
    Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to receive a plurality of data inputs for training a neural network, wherein the data inputs comprise training data and weight inputs; represent the data inputs in a first form; and represent the weight inputs in a second form. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: August 17, 2021
    Date of Patent: July 18, 2023
    Assignee: Intel Corporation
    Inventors: Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Jeremie Dreyfuss, Amit Bleiweiss, Tomer Schwartz, Raanan Yonatan Yehezkel Rohekar, Michael Behar, Amitai Armon, Uzi Sarel
  • Patent number: 11675630
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to configure heterogenous components in an accelerator. An example apparatus includes a graph compiler to identify a workload node in a workload and generate a selector for the workload node, and the selector to identify an input condition and an output condition of a compute building block, wherein the graph compiler is to, in response to obtaining the identified input condition and output condition from the selector, map the workload node to the compute building block.
    Type: Grant
    Filed: August 15, 2019
    Date of Patent: June 13, 2023
    Assignee: Intel Corporation
    Inventors: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
  • Patent number: 11656846
    Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: November 24, 2020
    Date of Patent: May 23, 2023
    Assignee: Intel Corporation
    Inventors: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby
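
    Zero-gating as described above is a hardware power optimization; the Python sketch below only models the behavior, counting how many multiply/accumulate operations would be gated off when an input is zero:

      import numpy as np

      def gated_dot(a: np.ndarray, b: np.ndarray):
          acc, gated = 0.0, 0
          for x, w in zip(a, b):
              if x == 0 or w == 0:
                  gated += 1          # multiply and accumulate units stay idle
                  continue
              acc += x * w            # only non-zero pairs reach the MAC
          return acc, gated

      a = np.array([0.0, 1.5, 0.0, 2.0])   # e.g. post-ReLU activations
      w = np.array([0.3, 0.0, 0.7, 1.0])
      result, skipped = gated_dot(a, w)
      print(result, f"{skipped}/4 MAC operations gated")
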
  • Patent number: 11600035
    Abstract: In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: February 10, 2022
    Date of Patent: March 7, 2023
    Assignee: INTEL CORPORATION
    Inventors: Uzi Sarel, Ehud Cohen, Tomer Schwartz, Amitai Armon, Yahav Shadmiy, Itamar Ben-Ari, Amit Bleiweiss, Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Michael Behar, Guy Jacob, Gal Leibovich, Jeremie Dreyfuss
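
    The frequency-domain idea above rests on the convolution theorem: convolution in the original domain becomes elementwise multiplication after a Fourier transform, so a sub-graph of convolutions can be executed between one forward and one inverse FFT. A self-checking Python sketch:

      import numpy as np

      def conv_time_domain(x, k):
          return np.convolve(x, k)

      def conv_freq_domain(x, k):
          n = len(x) + len(k) - 1                   # linear-convolution length
          X, K = np.fft.rfft(x, n), np.fft.rfft(k, n)
          return np.fft.irfft(X * K, n)             # multiply, then transform back

      x = np.random.randn(64)
      k = np.random.randn(5)
      assert np.allclose(conv_time_domain(x, k), conv_freq_domain(x, k))
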
  • Patent number: 11422939
    Abstract: Disclosed embodiments relate to a shared read request (SRR) using a common request tracker (CRT) as a temporary cache. In one example, a multi-core system includes a memory and a memory controller to receive an SRR from a core when a Leader core is not yet identified, allocate a CRT entry and store the SRR therein, mark it as a Leader, send a read request to a memory address indicated by the SRR, and when read data returns from the memory, store the read data in the CRT entry, send the read data to the Leader core, and await receipt, unless already received, of another SRR from a Follower core, the other SRR having a same address as the SRR, then, send the read data to the Follower core, and deallocate the CRT entry.
    Type: Grant
    Filed: December 26, 2019
    Date of Patent: August 23, 2022
    Assignee: Intel Corporation
    Inventors: Israel Diamand, Ravi K. Venkatesan, Shlomi Shua, Oz Shitrit, Michael Behar, Roni Rosner
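
    A simplified Python model of the Leader/Follower flow above: the first core to request an address becomes the Leader and triggers the single memory read; a later Follower requesting the same address is served from the common request tracker (CRT) entry, which is then deallocated. Class and method names are invented:

      class MemoryController:
          def __init__(self, memory: dict):
              self.memory = memory
              self.crt = {}                      # CRT: address -> read data

          def shared_read(self, core: str, addr: int):
              if addr not in self.crt:           # no Leader yet: this core leads
                  data = self.memory[addr]       # single read to memory
                  self.crt[addr] = data          # CRT entry acts as a temp cache
                  print(f"{core} (Leader) got {data}")
              else:                              # Follower: reuse tracked data
                  data = self.crt.pop(addr)      # serve, then deallocate entry
                  print(f"{core} (Follower) got {data}")
              return data

      mc = MemoryController({0x100: 42})
      mc.shared_read("core0", 0x100)   # Leader: reads memory once
      mc.shared_read("core1", 0x100)   # Follower: served from the CRT
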
  • Publication number: 20220237850
    Abstract: In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: February 10, 2022
    Publication date: July 28, 2022
    Applicant: Intel Corporation
    Inventors: Uzi Sarel, Ehud Cohen, Tomer Schwartz, Amitai Armon, Yahav Shadmiy, Itamar Ben-Ari, Amit Bleiweiss, Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Michael Behar, Guy Jacob, Gal Leibovich, Jeremie Dreyfuss
  • Publication number: 20220197703
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed that enable out-of-order pipelined execution of static mapping of a workload to one or more computational building blocks of an accelerator. An example apparatus includes an interface to load a first number of credits into memory; a comparator to compare the first number of credits to a threshold number of credits associated with memory availability in a buffer; and a dispatcher to, when the first number of credits meets the threshold number of credits, select a workload node of the workload to be executed at a first one of the one or more computational building blocks.
    Type: Application
    Filed: December 23, 2021
    Publication date: June 23, 2022
    Inventors: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
  • Publication number: 20220076118
    Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to receive a plurality of data inputs for training a neural network, wherein the data inputs comprise training data and weight inputs; represent the data inputs in a first form; and represent the weight inputs in a second form. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: August 17, 2021
    Publication date: March 10, 2022
    Applicant: Intel Corporation
    Inventors: Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Jeremie Dreyfuss, Amit Bleiweiss, Tomer Schwartz, Raanan Yonatan Yehezkel Rohekar, Michael Behar, Amitai Armon, Uzi Sarel
  • Publication number: 20220066923
    Abstract: Systems, apparatuses and methods may provide for technology that determines runtime memory requirements of an artificial intelligence (AI) application, defines a remote address range for a plurality of memories based on the runtime memory requirements, wherein each memory in the plurality of memories corresponds to a processor in a plurality of processors, and defines a shared address range for the plurality of memories based on the runtime memory requirements, wherein the shared address range is aliased. In one example, the technology configures memory mapping hardware to access the remote address range in a linear sequence and access the shared address range in a hashed sequence.
    Type: Application
    Filed: November 10, 2021
    Publication date: March 3, 2022
    Inventors: Zigi Walter, Roni Rosner, Michael Behar
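
    The Python sketch below illustrates the two mapping policies described above: a remote range carved into contiguous per-memory slices (accessed in a linear sequence) and a shared, aliased range interleaved across all memories by a hash (here simply low address bits). The range sizes and the hash are invented for illustration:

      NUM_MEMS = 4
      REMOTE_BASE, REMOTE_SIZE = 0x0000, 0x4000   # linear region
      SHARED_BASE = 0x4000                        # hashed (aliased) region

      def route(addr: int) -> int:
          """Return which memory services this address."""
          if REMOTE_BASE <= addr < REMOTE_BASE + REMOTE_SIZE:
              return addr // (REMOTE_SIZE // NUM_MEMS)   # linear: contiguous slices
          return (addr >> 6) % NUM_MEMS                  # hashed: interleave 64B lines

      for a in (0x0000, 0x1000, 0x4000, 0x4040, 0x4080):
          print(hex(a), "-> memory", route(a))
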
  • Patent number: 11250610
    Abstract: In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: August 28, 2020
    Date of Patent: February 15, 2022
    Assignee: Intel Corporation
    Inventors: Uzi Sarel, Ehud Cohen, Tomer Schwartz, Amitai Armon, Yahav Shadmiy, Itamar Ben-Ari, Amit Bleiweiss, Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Michael Behar, Guy Jacob, Gal Leibovich, Jeremie Dreyfuss
  • Patent number: 11238338
    Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to receive a plurality of data inputs for training a neural network, wherein the data inputs comprise training data and weight inputs; represent the data inputs in a first form; and represent the weight inputs in a second form. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: April 24, 2017
    Date of Patent: February 1, 2022
    Assignee: Intel Corporation
    Inventors: Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Jeremie Dreyfuss, Amit Bleiweiss, Tomer Schwartz, Raanan Yonatan Yehezkel Rohekar, Michael Behar, Amitai Armon, Uzi Sarel
  • Patent number: 11231963
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed that enable out-of-order pipelined execution of static mapping of a workload to one or more computational building blocks of an accelerator. An example apparatus includes an interface to load a first number of credits into memory; a comparator to compare the first number of credits to a threshold number of credits associated with memory availability in a buffer; and a dispatcher to, when the first number of credits meets the threshold number of credits, select a workload node of the workload to be executed at a first one of the one or more computational building blocks.
    Type: Grant
    Filed: August 15, 2019
    Date of Patent: January 25, 2022
    Assignee: Intel Corporation
    Inventors: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
  • Publication number: 20210357793
    Abstract: Embodiments described herein provide a processing apparatus comprising compute circuitry to generate neural network data for a convolutional neural network (CNN) and write the neural network data to a memory buffer. The processing apparatus additionally includes a direct memory access (DMA) controller including a hardware codec having an encode circuit and a decode circuit. The DMA controller reads the neural network data from the memory buffer, encodes the neural network data via the encode circuit, writes the encoded neural network data to a memory device coupled with the processing apparatus, writes metadata for the encoded neural network data to that memory device, and decodes encoded neural network data via the decode circuit in response to a request from the compute circuitry.
    Type: Application
    Filed: July 30, 2021
    Publication date: November 18, 2021
    Applicant: Intel Corporation
    Inventors: Ajit Singh, Bharat Daga, Michael Behar
  • Patent number: 11151074
    Abstract: Methods and apparatus to implement multiple inference compute engines are disclosed herein. A disclosed example apparatus includes a first inference compute engine, a second inference compute engine, and an accelerator on coherent fabric to couple the first inference compute engine and the second inference compute engine to a converged coherency fabric of a system-on-chip, the accelerator on coherent fabric to arbitrate requests from the first inference compute engine and the second inference compute engine to utilize a single in-die interconnect port.
    Type: Grant
    Filed: August 15, 2019
    Date of Patent: October 19, 2021
    Assignee: Intel Corporation
    Inventors: Israel Diamand, Roni Rosner, Ravi Venkatesan, Shlomi Shua, Oz Shitrit, Henrietta Bezbroz, Alexander Gendler, Ohad Falik, Zigi Walter, Michael Behar, Shlomi Alkalay
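
    A minimal Python sketch of the arbitration idea above: two inference compute engines share one in-die interconnect port, with a round-robin arbiter granting one request per cycle. Round-robin is one plausible policy; the abstract does not specify one, and names like AcfArbiter are hypothetical:

      from collections import deque

      class AcfArbiter:
          """Hypothetical 'accelerator on coherent fabric' request arbiter."""
          def __init__(self):
              self.queues = {"ice0": deque(), "ice1": deque()}
              self.turn = 0

          def request(self, engine: str, req: str):
              self.queues[engine].append(req)

          def grant(self):
              """Issue at most one request per cycle over the single IDI port."""
              order = ["ice0", "ice1"] if self.turn == 0 else ["ice1", "ice0"]
              for eng in order:
                  if self.queues[eng]:
                      self.turn = 1 if eng == "ice0" else 0   # other engine next
                      return eng, self.queues[eng].popleft()
              return None

      arb = AcfArbiter()
      arb.request("ice0", "rd A"); arb.request("ice1", "rd B")
      print(arb.grant())  # ('ice0', 'rd A')
      print(arb.grant())  # ('ice1', 'rd B')
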