Patents by Inventor Eugenio Culurciello
Eugenio Culurciello has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11861337
Abstract: A method of compiling neural network code to executable instructions for execution by a computational acceleration system having a memory circuit and one or more acceleration circuits having a maps data buffer and a kernel data buffer is disclosed, such as for execution by an inference engine circuit architecture which includes a matrix-matrix (MM) accelerator circuit having multiple operating modes to provide a complete matrix multiplication. A representative compiling method includes generating a list of neural network layer model objects; fusing available functions and layers in the list; selecting a cooperative mode, an independent mode, or a combined cooperative and independent mode for execution; selecting a data movement mode and an ordering of computations which reduces usage of the memory circuit; generating an ordered sequence of load objects, compute objects, and store objects; and converting the ordered sequence of load objects, compute objects, and store objects into the executable instructions.
Type: Grant
Filed: August 26, 2020
Date of Patent: January 2, 2024
Assignee: Micron Technology, Inc.
Inventors: Andre Xian Ming Chang, Aliasger Zaidy, Eugenio Culurciello, Marko Vitez
-
Patent number: 11775313
Abstract: An accelerator for processing of a convolutional neural network (CNN) includes a compute core having a plurality of compute units. Each compute unit includes a first memory cache configured to store at least one vector in a map trace, a second memory cache configured to store at least one vector in a kernel trace, and a plurality of vector multiply-accumulate units (vMACs) connected to the first and second memory caches. Each vMAC includes a plurality of multiply-accumulate units (MACs). Each MAC includes a multiplier unit configured to multiply a first word of the at least one vector in the map trace by a second word of the at least one vector in the kernel trace to produce an intermediate product, and an adder unit that adds the intermediate product to a third word to generate a sum of the intermediate product and the third word.
Type: Grant
Filed: May 25, 2018
Date of Patent: October 3, 2023
Assignee: Purdue Research Foundation
Inventors: Eugenio Culurciello, Vinayak Gokhale, Aliasger Zaidy, Andre Chang
-
Publication number: 20230306272
Abstract: An artificial neural network is trained via reinforcement learning to receive first data representative of execution dependency conditions of instructions of a program, second data representative of a schedule of a first portion of the instructions of the program for execution in a device having a plurality of circuit units operable in parallel, and third data identifying a next instruction selected from a second portion of the instructions of the program remaining to be scheduled for execution in the device. The artificial neural network selects a placement of the next instruction in one of the circuit units from a plurality of possible placements of the next instruction in the device. Performance of placements of instructions being tested in search for a valid schedule for running the program in the device can be measured to generate samples to train the artificial neural network via reinforcement learning.
Type: Application
Filed: March 16, 2023
Publication date: September 28, 2023
Inventors: Andre Xian Ming Chang, Abhishek Chaurasia, Parth Khopkar, Bashar Romanous, Patrick Alan Estep, Skyler Arron Windh, Eugenio Culurciello, Sheik Dawood Beer Mohideen
-
Patent number: 11675624
Abstract: An inference engine circuit architecture is disclosed which includes a matrix-matrix (MM) processor circuit and a MM accelerator circuit having multiple operating modes to provide a complete matrix multiplication. A representative MM accelerator circuit includes a first buffer circuit storing maps data; a first data network; multiple second buffer circuits each storing different kernel data; multiple second, serial data networks, with each coupled to a corresponding second buffer circuit; and a plurality of vector-vector (VV) acceleration circuits arranged in a plurality of arrays. Each VV acceleration circuit includes multiply and accumulate circuits; a shift register; a control multiplexer to provide a selected output, in response to a mode control word, of a bias parameter or a first accumulation sum; and a second adder circuit which adds the multiplicative product to the bias parameter or to the first accumulation sum to generate a second or next accumulation sum.
Type: Grant
Filed: March 29, 2020
Date of Patent: June 13, 2023
Assignee: Micron Technology, Inc.
Inventors: Aliasger Zaidy, Andre Xian Ming Chang, Eugenio Culurciello
-
Publication number: 20220351503
Abstract: A system, method and apparatus to label video images with assistance from an artificial neural network. After a user provides first inputs to label first aspects of an object shown in a first video frame, the artificial neural network infers or predicts second aspects to be labeled for the object in a second video frame. A graphical user interface presents the inferred or predicted second aspects over a display of the second video frame to allow the user to confirm or modify the inference or prediction. For example, an object of interest in the first frame can be labeled with a classification and a bounding box; and the artificial neural network is trained to infer or predict, for the corresponding object in the second frame, its bounding box, classification, and pixels representative of the image of the object in the second frame.
Type: Application
Filed: April 15, 2022
Publication date: November 3, 2022
Inventors: Michael Cody Glapa, Abhishek Chaurasia, Eugenio Culurciello
-
Publication number: 20220188632
Abstract: Systems, devices, and methods of evolutionary imitation learning are described. For example, a computing system trains an artificial neural network (ANN) using a supervised machine learning technique according to first example data representative of a behavior to be imitated by the ANN in performing a task. The ANN is used to generate first sample data representative of a behavior of the ANN in performing the task. The computing system modifies the first sample data using a technique of evolutionary algorithm to generate second sample data according to a criterion configured to select mutations of the behavior of the ANN. The computing system further trains the ANN according to the second sample data using the supervised machine learning technique.
Type: Application
Filed: December 16, 2020
Publication date: June 16, 2022
Inventors: Eugenio Culurciello, Andre Xian Ming Chang
-
Publication number: 20220147811
Abstract: Systems, devices, and methods related to a Deep Learning Accelerator and memory are described. For example, an integrated circuit device may be configured to execute instructions with matrix operands and configured with random access memory (RAM). A compiler can identify a plurality of portions of an artificial neural network for implementation on a plurality of such integrated circuit devices respectively. The compiler converts a description of the artificial neural network into a plurality of compiler outputs executable on the plurality of devices to generate an output of the artificial neural network in response to an input to the artificial neural network. Intermediate results are communicated among the devices in generating the output of the artificial neural network.
Type: Application
Filed: November 6, 2020
Publication date: May 12, 2022
Inventors: Jaime Cummins, Marko Vitez, Eugenio Culurciello, Andre Xian Ming Chang, Aliasger Tayeb Zaidy
-
Publication number: 20220147813
Abstract: Systems, devices, and methods related to a Deep Learning Accelerator and memory are described. For example, an integrated circuit device may be configured to execute instructions with matrix operands and configured with random access memory (RAM). A compiler is configured to generate instructions executable by the Deep Learning Accelerator from a description of a target artificial neural network. The instructions may call routines in a runtime library that has an embedded artificial neural network configured to predict optimized execution options available to implement the routines. The prediction is based at least in part on a pattern of data being processed in the target artificial neural network and/or a pattern of usages of the routines by the instructions.
Type: Application
Filed: November 6, 2020
Publication date: May 12, 2022
Inventors: Andre Xian Ming Chang, Aliasger Tayeb Zaidy, Marko Vitez, Eugenio Culurciello
-
Publication number: 20220147810
Abstract: Systems, devices, and methods related to a Deep Learning Accelerator and memory are described. For example, an integrated circuit device may be configured to execute instructions with matrix operands and configured with random access memory. A computing device running a compiler can interact with and/or probe an integrated circuit device to identify hardware characteristics of the integrated circuit device in performing matrix computations. The compiler can generate and optimize a result of compilation from a description of an artificial neural network based at least in part on the hardware characteristics of the integrated circuit device. The result of compilation can include first data representative of parameters of the artificial neural network and second data representative of instructions executable by the integrated circuit device to generate an output of the artificial neural network based on the first data and an input to the artificial neural network.
Type: Application
Filed: November 6, 2020
Publication date: May 12, 2022
Inventors: Aliasger Tayeb Zaidy, Marko Vitez, Eugenio Culurciello, Jaime Cummins, Andre Xian Ming Chang
-
Publication number: 20220147808
Abstract: Systems, devices, and methods related to a Deep Learning Accelerator and memory are described. For example, an integrated circuit device may be configured to execute instructions with matrix operands and configured with random access memory (RAM). A compiler can convert a description of an artificial neural network into a generic result of compilation according to a specification of a generic Deep Learning Accelerator and then map the first result of compilation into a platform-specific result according to a specification of a specific hardware platform of Deep Learning Accelerators. The platform-specific result can be stored into the RAM of the integrated circuit device to enable the integrated circuit device to autonomously perform the computation of the artificial neural network in generating an output in response to an input to the artificial neural network.
Type: Application
Filed: November 6, 2020
Publication date: May 12, 2022
Inventors: Andre Xian Ming Chang, Aliasger Tayeb Zaidy, Eugenio Culurciello, Jaime Cummins, Marko Vitez
-
Publication number: 20220147809
Abstract: Systems, devices, and methods related to a Deep Learning Accelerator and memory are described. For example, an integrated circuit device may be configured to execute instructions with matrix operands and configured with random access memory. A compiler can convert a description of an artificial neural network into a compiler output through optimization and/or selection of hardware options of the integrated circuit device. The compiler output can include parameters of the artificial neural network, instructions executable by processing units of the Deep Learning Accelerator to generate an output of the artificial neural network responsive to an input to the artificial neural network, and hardware options to be stored in registers connected to control hardware configurations of the processing units.
Type: Application
Filed: November 6, 2020
Publication date: May 12, 2022
Inventors: Aliasger Tayeb Zaidy, Marko Vitez, Eugenio Culurciello, Jaime Cummins, Andre Xian Ming Chang
-
Publication number: 20220147812
Abstract: Systems, devices, and methods related to a Deep Learning Accelerator and memory are described. For example, an integrated circuit device may be configured to execute instructions with matrix operands and configured with random access memory (RAM). A compiler has an artificial neural network configured to identify an optimized compilation option for an artificial neural network to be compiled by the compiler and/or for a hardware platform of Deep Learning Accelerators. The artificial neural network of the compiler can be trained via machine learning to identify the optimized compilation option based on the features of the artificial neural network to be compiled and/or features of the hardware platform on which the compiler output will be executed.
Type: Application
Filed: November 6, 2020
Publication date: May 12, 2022
Inventors: Andre Xian Ming Chang, Aliasger Tayeb Zaidy, Marko Vitez, Michael Cody Glapa, Abhishek Chaurasia, Eugenio Culurciello
-
Publication number: 20220066760
Abstract: A method of compiling neural network code to executable instructions for execution by a computational acceleration system having a memory circuit and one or more acceleration circuits having a maps data buffer and a kernel data buffer is disclosed, such as for execution by an inference engine circuit architecture which includes a matrix-matrix (MM) accelerator circuit having multiple operating modes to provide a complete matrix multiplication. A representative compiling method includes generating a list of neural network layer model objects; fusing available functions and layers in the list; selecting a cooperative mode, an independent mode, or a combined cooperative and independent mode for execution; selecting a data movement mode and an ordering of computations which reduces usage of the memory circuit; generating an ordered sequence of load objects, compute objects, and store objects; and converting the ordered sequence of load objects, compute objects, and store objects into the executable instructions.
Type: Application
Filed: August 26, 2020
Publication date: March 3, 2022
Inventors: Andre Xian Ming Chang, Aliasger Zaidy, Eugenio Culurciello, Marko Vitez
-
Publication number: 20210303358
Abstract: An inference engine circuit architecture is disclosed which includes a matrix-matrix (MM) processor circuit and a MM accelerator circuit having multiple operating modes to provide a complete matrix multiplication. A representative MM accelerator circuit includes a first buffer circuit storing maps data; a first data network; multiple second buffer circuits each storing different kernel data; multiple second, serial data networks, with each coupled to a corresponding second buffer circuit; and a plurality of vector-vector (VV) acceleration circuits arranged in a plurality of arrays. Each VV acceleration circuit includes multiply and accumulate circuits; a shift register; a control multiplexer to provide a selected output, in response to a mode control word, of a bias parameter or a first accumulation sum; and a second adder circuit which adds the multiplicative product to the bias parameter or to the first accumulation sum to generate a second or next accumulation sum.
Type: Application
Filed: March 29, 2020
Publication date: September 30, 2021
Inventors: Aliasger Zaidy, Andre Xian Ming Chang, Eugenio Culurciello
-
Patent number: 10157156
Abstract: A coprocessor (PL) is disclosed. The PL includes a memory router, at least one collection block that is configured to transfer data to/from the memory router, each collection block includes a collection router that is configured to i) transfer data to/from the memory router, ii) transfer data to/from at least one collection router of a neighboring collection block, and iii) transfer data to/from blocks within the collection block, and at least one programmable operator that is configured to i) transfer data to/from the collection router, and ii) perform a programmable operation on data received from the collection router.
Type: Grant
Filed: December 31, 2017
Date of Patent: December 18, 2018
Assignee: PURDUE RESEARCH FOUNDATION
Inventors: Eugenio Culurciello, Berin Eduard Martini, Vinayak Anand Gokhale, Jonghoon Jin, Aysegul Dundar
-
Publication number: 20180341495
Abstract: An accelerator for processing of a convolutional neural network (CNN) includes a compute core having a plurality of compute units. Each compute unit includes a first memory cache configured to store at least one vector in a map trace, a second memory cache configured to store at least one vector in a kernel trace, and a plurality of vector multiply-accumulate units (vMACs) connected to the first and second memory caches. Each vMAC includes a plurality of multiply-accumulate units (MACs). Each MAC includes a multiplier unit configured to multiply a first word of the at least one vector in the map trace by a second word of the at least one vector in the kernel trace to produce an intermediate product, and an adder unit that adds the intermediate product to a third word to generate a sum of the intermediate product and the third word.
Type: Application
Filed: May 25, 2018
Publication date: November 29, 2018
Inventors: Eugenio Culurciello, Vinayak Gokhale, Aliasger Zaidy, Andre Chang
-
Publication number: 20180107620
Abstract: A coprocessor (PL) is disclosed. The PL includes a memory router, at least one collection block that is configured to transfer data to/from the memory router, each collection block includes a collection router that is configured to i) transfer data to/from the memory router, ii) transfer data to/from at least one collection router of a neighboring collection block, and iii) transfer data to/from blocks within the collection block, and at least one programmable operator that is configured to i) transfer data to/from the collection router, and ii) perform a programmable operation on data received from the collection router.
Type: Application
Filed: December 31, 2017
Publication date: April 19, 2018
Applicant: Purdue Research Foundation
Inventors: Eugenio Culurciello, Berin Eduard Martini, Vinayak Anand Gokhale, Jonghoon Jin, Aysegul Dundar
-
Patent number: 9858220
Abstract: A coprocessor (PL) is disclosed. The PL includes a memory router, at least one collection block that is configured to transfer data to/from the memory router, each collection block includes a collection router that is configured to i) transfer data to/from the memory router, ii) transfer data to/from at least one collection router of a neighboring collection block, and iii) transfer data to/from blocks within the collection block, and at least one programmable operator that is configured to i) transfer data to/from the collection router, and ii) perform a programmable operation on data received from the collection router.
Type: Grant
Filed: March 17, 2015
Date of Patent: January 2, 2018
Assignee: Purdue Research Foundation
Inventors: Eugenio Culurciello, Berin Eduard Martini, Vinayak Anand Gokhale, Jonghoon Jin, Aysegul Dundar
-
Publication number: 20150261702
Abstract: A coprocessor (PL) is disclosed. The PL includes a memory router, at least one collection block that is configured to transfer data to/from the memory router, each collection block includes a collection router that is configured to i) transfer data to/from the memory router, ii) transfer data to/from at least one collection router of a neighboring collection block, and iii) transfer data to/from blocks within the collection block, and at least one programmable operator that is configured to i) transfer data to/from the collection router, and ii) perform a programmable operation on data received from the collection router.
Type: Application
Filed: March 17, 2015
Publication date: September 17, 2015
Applicant: PURDUE RESEARCH FOUNDATION
Inventors: Eugenio Culurciello, Berin Eduard Martini, Vinayak Anand Gokhale, Jonghoon Jin, Aysegul Dundar