Patents by Inventor Michael Behar
Michael Behar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240112033Abstract: In an example, an apparatus comprises at least one execution platform; and logic, at least partially including hardware logic, to receive a trained neural network model in a model optimizer and convert the trained neural network model to an optimized model comprising parameters that are fit to the at least one execution platform. Other embodiments are also disclosed and claimed.Type: ApplicationFiled: November 20, 2023Publication date: April 4, 2024Applicant: Intel CorporationInventors: Amit Bleiweiss, Itamar Ben-Ari, Michael Behar, Guy Jacob, Gal Leibovich, Jacob Subag, Lev Faivishevsky, Yaniv Fais, Tomer Schwartz
-
Publication number: 20240078453Abstract: Embodiments described herein provide a processing apparatus comprising compute circuitry to generate neural network data for a convolutional neural network (CNN) and write the neural network data to a memory buffer. The compute circuitry additionally includes a direct memory access (DMA) controller including a hardware codec having encode circuitry and a decode circuitry. The DMA controller reads the neural network data from the memory buffer, encode the neural network data via the encode circuit, writes encoded neural network data to a memory device coupled with the processing apparatus, writes metadata for the encoded neural network data to the memory device coupled with the processing apparatus, and decodes encoded neural network data via the decode circuit in response to a request from the compute circuitry.Type: ApplicationFiled: September 14, 2023Publication date: March 7, 2024Applicant: Intel CorporationInventors: Ajit Singh, Bharat Daga, Michael Behar
-
Patent number: 11847497Abstract: Methods, apparatus, systems and articles of manufacture are disclosed that enable out-of-order pipelined execution of static mapping of a workload to one or more computational building blocks of an accelerator. An example apparatus includes an interface to load a first number of credits into memory; a comparator to compare the first number of credits to a threshold number of credits associated with memory availability in a buffer; and a dispatcher to, when the first number of credits meets the threshold number of credits, select a workload node of the workload to be executed at a first one of the one or more computational building blocks.Type: GrantFiled: December 23, 2021Date of Patent: December 19, 2023Assignee: Intel CorporationInventors: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
-
Publication number: 20230394305Abstract: In an example, an apparatus comprises a plurality of execution units comprising and logic, at least partially including hardware logic, to receive a plurality of data inputs for training a neural network, wherein the data inputs comprise training data and weights inputs; represent the data inputs in a first form; and represent the weight inputs in a second form. Other embodiments are also disclosed and claimed.Type: ApplicationFiled: May 30, 2023Publication date: December 7, 2023Applicant: Intel CorporationInventors: Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Jeremie Dreyfuss, Amit Bleiweiss, Tomer Schwartz, Raanan Yonatan Yehezkel Rohekar, Michael Behar, Amitai Armon, Uzi Sarel
-
Publication number: 20230333913Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to configure heterogenous components in an accelerator. An example apparatus includes a graph compiler to identify a workload node in a workload and generate a selector for the workload node, and the selector to identify an input condition and an output condition of a compute building block, wherein the graph compiler is to, in response to obtaining the identified input condition and output condition from the selector, map the workload node to the compute building block.Type: ApplicationFiled: April 28, 2023Publication date: October 19, 2023Inventors: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
-
Patent number: 11763183Abstract: Embodiments described herein provide a processing apparatus comprising compute circuitry to generate neural network data for a convolutional neural network (CNN) and write the neural network data to a memory buffer. The compute circuitry additionally includes a direct memory access (DMA) controller including a hardware codec having encode circuitry and a decode circuitry. The DMA controller reads the neural network data from the memory buffer, encode the neural network data via the encode circuit, writes encoded neural network data to a memory device coupled with the processing apparatus, writes metadata for the encoded neural network data to the memory device coupled with the processing apparatus, and decodes encoded neural network data via the decode circuit in response to a request from the compute circuitry.Type: GrantFiled: July 30, 2021Date of Patent: September 19, 2023Assignee: Intel CorporationInventors: Ajit Singh, Bharat Daga, Michael Behar
-
Patent number: 11704564Abstract: In an example, an apparatus comprises a plurality of execution units comprising and logic, at least partially including hardware logic, to receive a plurality of data inputs for training a neural network, wherein the data inputs comprise training data and weights inputs; represent the data inputs in a first form; and represent the weight inputs in a second form. Other embodiments are also disclosed and claimed.Type: GrantFiled: August 17, 2021Date of Patent: July 18, 2023Assignee: INTEL CORPORATIONInventors: Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Jeremie Dreyfuss, Amit Bleiweiss, Tomer Schwartz, Raanan Yonatan Yehezkel Rohekar, Michael Behar, Amitai Armon, Uzi Sarel
-
Patent number: 11675630Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to configure heterogenous components in an accelerator. An example apparatus includes a graph compiler to identify a workload node in a workload and generate a selector for the workload node, and the selector to identify an input condition and an output condition of a compute building block, wherein the graph compiler is to, in response to obtaining the identified input condition and output condition from the selector, map the workload node to the compute building block.Type: GrantFiled: August 15, 2019Date of Patent: June 13, 2023Assignee: INTEL CORPORATIONInventors: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
-
Patent number: 11656846Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.Type: GrantFiled: November 24, 2020Date of Patent: May 23, 2023Assignee: INTEL CORPORATIONInventors: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby
-
Patent number: 11600035Abstract: In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.Type: GrantFiled: February 10, 2022Date of Patent: March 7, 2023Assignee: INTEL CORPORATIONInventors: Uzi Sarel, Ehud Cohen, Tomer Schwartz, Amitai Armon, Yahav Shadmiy, Itamar Ben-Ari, Amit Bleiweiss, Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Michael Behar, Guy Jacob, Gal Leibovich, Jeremie Dreyfuss
-
Patent number: 11422939Abstract: Disclosed embodiments relate to a shared read request (SRR) using a common request tracker (CRT) as a temporary cache. In one example, a multi-core system includes a memory and a memory controller to receive a SRR from a core when a Leader core is not yet identified, allocate a CRT entry and store the SRR therein, mark it as a Leader, send a read request to a memory address indicated by the SRR, and when read data returns from the memory, store the read data in the CRT entry, send the read data to the Leader core, and await receipt, unless already received, of another SRR from a Follower core, the other SRR having a same address as the SRR, then, send the read data to the Follower core, and deallocate the CRT entry.Type: GrantFiled: December 26, 2019Date of Patent: August 23, 2022Assignee: Intel CorporationInventors: Israel Diamand, Ravi K. Venkatesan, Shlomi Shua, Oz Shitrit, Michael Behar, Roni Rosner
-
Publication number: 20220237850Abstract: In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.Type: ApplicationFiled: February 10, 2022Publication date: July 28, 2022Applicant: Intel CorporationInventors: Uzi Sarel, Ehud Cohen, Tomer Schwartz, Amitai Armon, Yahav Shadmiy, Itamar Ben-Ari, Amit Bleiweiss, Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Michael Behar, Guy Jacob, Gal Leibovich, Jeremie Dreyfuss
-
Publication number: 20220197703Abstract: Methods, apparatus, systems and articles of manufacture are disclosed that enable out-of-order pipelined execution of static mapping of a workload to one or more computational building blocks of an accelerator. An example apparatus includes an interface to load a first number of credits into memory; a comparator to compare the first number of credits to a threshold number of credits associated with memory availability in a buffer; and a dispatcher to, when the first number of credits meets the threshold number of credits, select a workload node of the workload to be executed at a first one of the one or more computational building blocks.Type: ApplicationFiled: December 23, 2021Publication date: June 23, 2022Inventors: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
-
Publication number: 20220076118Abstract: In an example, an apparatus comprises a plurality of execution units comprising and logic, at least partially including hardware logic, to receive a plurality of data inputs for training a neural network, wherein the data inputs comprise training data and weights inputs; represent the data inputs in a first form; and represent the weight inputs in a second form. Other embodiments are also disclosed and claimed.Type: ApplicationFiled: August 17, 2021Publication date: March 10, 2022Applicant: Intel CorporationInventors: Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Jeremie Dreyfuss, Amit Bleiweiss, Tomer Schwartz, Raanan Yonatan Yehezkel Rohekar, Michael Behar, Amitai Armon, Uzi Sarel
-
Publication number: 20220066923Abstract: Systems, apparatuses and methods may provide for technology that determines runtime memory requirements of an artificial intelligence (AI) application, defines a remote address range for a plurality of memories based on the runtime memory requirements, wherein each memory in the plurality of memories corresponds to a processor in a plurality of processors, and defines a shared address range for the plurality of memories based on the runtime memory requirements, wherein the shared address range is aliased. In one example, the technology configures memory mapping hardware to access the remote address range in a linear sequence and access the shared address range in a hashed sequence.Type: ApplicationFiled: November 10, 2021Publication date: March 3, 2022Inventors: Zigi Walter, Roni Rosner, Michael Behar
-
Patent number: 11250610Abstract: In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.Type: GrantFiled: August 28, 2020Date of Patent: February 15, 2022Assignee: INTEL CORPORATIONInventors: Uzi Sarel, Ehud Cohen, Tomer Schwartz, Amitai Armon, Yahav Shadmiy, Itamar Ben-Ari, Amit Bleiweiss, Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Michael Behar, Guy Jacob, Gal Leibovich, Jeremie Dreyfuss
-
Patent number: 11238338Abstract: In an example, an apparatus comprises a plurality of execution units comprising and logic, at least partially including hardware logic, to receive a plurality of data inputs for training a neural network, wherein the data inputs comprise training data and weights inputs; represent the data inputs in a first form; and represent the weight inputs in a second form. Other embodiments are also disclosed and claimed.Type: GrantFiled: April 24, 2017Date of Patent: February 1, 2022Assignee: INTEL CORPORATIONInventors: Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Jeremie Dreyfuss, Amit Bleiweiss, Tomer Schwartz, Raanan Yonatan Yehezkel Rohekar, Michael Behar, Amital Armon, Uzi Sarel
-
Patent number: 11231963Abstract: Methods, apparatus, systems and articles of manufacture are disclosed that enable out-of-order pipelined execution of static mapping of a workload to one or more computational building blocks of an accelerator. An example apparatus includes an interface to load a first number of credits into memory; a comparator to compare the first number of credits to a threshold number of credits associated with memory availability in a buffer; and a dispatcher to, when the first number of credits meets the threshold number of credits, select a workload node of the workload to be executed at a first one of the one or more computational building blocks.Type: GrantFiled: August 15, 2019Date of Patent: January 25, 2022Assignee: INTEL CORPORATIONInventors: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
-
Publication number: 20210357793Abstract: Embodiments described herein provide a processing apparatus comprising compute circuitry to generate neural network data for a convolutional neural network (CNN) and write the neural network data to a memory buffer. The compute circuitry additionally includes a direct memory access (DMA) controller including a hardware codec having encode circuitry and a decode circuitry. The DMA controller reads the neural network data from the memory buffer, encode the neural network data via the encode circuit, writes encoded neural network data to a memory device coupled with the processing apparatus, writes metadata for the encoded neural network data to the memory device coupled with the processing apparatus, and decodes encoded neural network data via the decode circuit in response to a request from the compute circuitry.Type: ApplicationFiled: July 30, 2021Publication date: November 18, 2021Applicant: Intel CorporationInventors: Ajit Singh, Bharat Daga, Michael Behar
-
Patent number: 11151074Abstract: Methods and apparatus to implement multiple inference compute engines are disclosed herein. A disclosed example apparatus includes a first inference compute engine, a second inference compute engine, and an accelerator on coherent fabric to couple the first inference compute engine and the second inference compute engine to a converged coherency fabric of a system-on-chip, the accelerator on coherent fabric to arbitrate requests from the first inference compute engine and the second inference compute engine to utilize a single in-die interconnect port.Type: GrantFiled: August 15, 2019Date of Patent: October 19, 2021Assignee: Intel CorporationInventors: Israel Diamand, Roni Rosner, Ravi Venkatesan, Shlomi Shua, Oz Shitrit, Henrietta Bezbroz, Alexander Gendler, Ohad Falik, Zigi Walter, Michael Behar, Shlomi Alkalay