Patents by Inventor Mohammad Ghasemzadeh
Mohammad Ghasemzadeh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
- Patent number: 12230361
  Abstract: An apparatus includes an in-memory compute circuit that includes a memory circuit configured to generate a set of products by combining received input values with respective weight values stored in rows of the memory circuit, and to combine the set of products to generate an accumulated output value. The in-memory compute circuit may further include a control circuit and a plurality of routing circuits, including a first routing circuit coupled to a first set of rows of the memory circuit. The control circuit may be configured to cause the first routing circuit to route groups of input values to different ones of the first set of rows over a plurality of clock cycles, and the memory circuit to generate, on a clock cycle following the plurality of clock cycles, a particular accumulated output value that is computed based on the routed groups of input values. (An illustrative software sketch of this route-then-accumulate scheme appears after the listing.)
  Type: Grant
  Filed: July 3, 2023
  Date of Patent: February 18, 2025
  Assignee: Apple Inc.
  Inventors: Paolo Di Febbo, Mohamed H. Abu-Rahma, Jelam K. Parekh, Yildiz Sinangil, Mohammad Ghasemzadeh, Anthony Ghannoum, Chaminda N. Vidanagamachchi
- Publication number: 20250021808
  Abstract: Embodiments relate to a neural processor circuit that may include a fetch circuit that fetches coefficient data of a machine learning model from a memory source. The neural processor circuit may also include one or more neural engine circuits that are coupled to the fetch circuit. A neural engine circuit may include a buffer circuit that stores the coefficient data. The neural engine circuit may also include a coefficient organizing circuit that generates at least a first mapping and a second mapping of the stored coefficient data according to one or more control signals. The neural engine may also include a computation circuit that receives and processes at least a portion of input data with the coefficient data as mapped according to the first mapping, or processes at least the portion of the input data with the coefficient data as mapped according to the second mapping. (An illustrative sketch of selecting between coefficient mappings appears after the listing.)
  Type: Application
  Filed: October 1, 2024
  Publication date: January 16, 2025
  Applicant: Apple Inc.
  Inventors: Waleed Abdulla, Paolo Di Febbo, Mohammad Ghasemzadeh, Yohan Rajan
- Patent number: 12141679
  Abstract: Embodiments relate to a neural processor circuit that may include a fetch circuit that fetches coefficient data of a machine learning model from a memory source. The neural processor circuit may also include one or more neural engine circuits that are coupled to the fetch circuit. A neural engine circuit may include a buffer circuit that stores the coefficient data. The neural engine circuit may also include a coefficient organizing circuit that generates at least a first mapping and a second mapping of the stored coefficient data according to one or more control signals. The neural engine may also include a computation circuit that receives and processes at least a portion of input data with the coefficient data as mapped according to the first mapping, or processes at least the portion of the input data with the coefficient data as mapped according to the second mapping.
  Type: Grant
  Filed: October 7, 2020
  Date of Patent: November 12, 2024
  Assignee: Apple Inc.
  Inventors: Waleed Abdulla, Paolo Di Febbo, Mohammad Ghasemzadeh, Yohan Rajan
- Publication number: 20240005972
  Abstract: An apparatus includes an in-memory compute circuit that includes a memory circuit configured to generate a set of products by combining received input values with respective weight values stored in rows of the memory circuit, and to combine the set of products to generate an accumulated output value. The in-memory compute circuit may further include a control circuit and a plurality of routing circuits, including a first routing circuit coupled to a first set of rows of the memory circuit. The control circuit may be configured to cause the first routing circuit to route groups of input values to different ones of the first set of rows over a plurality of clock cycles, and the memory circuit to generate, on a clock cycle following the plurality of clock cycles, a particular accumulated output value that is computed based on the routed groups of input values.
  Type: Application
  Filed: July 3, 2023
  Publication date: January 4, 2024
  Inventors: Paolo Di Febbo, Mohamed H. Abu-Rahma, Jelam K. Parekh, Yildiz Sinangil, Mohammad Ghasemzadeh, Anthony Ghannoum, Chaminda N. Vidanagamachchi
- Patent number: 11694733
  Abstract: An apparatus includes an in-memory compute circuit that includes a memory circuit configured to generate a set of products by combining received input values with respective weight values stored in rows of the memory circuit, and to combine the set of products to generate an accumulated output value. The in-memory compute circuit may further include a control circuit and a plurality of routing circuits, including a first routing circuit coupled to a first set of rows of the memory circuit. The control circuit may be configured to cause the first routing circuit to route groups of input values to different ones of the first set of rows over a plurality of clock cycles, and the memory circuit to generate, on a clock cycle following the plurality of clock cycles, a particular accumulated output value that is computed based on the routed groups of input values.
  Type: Grant
  Filed: August 19, 2021
  Date of Patent: July 4, 2023
  Assignee: Apple Inc.
  Inventors: Paolo Di Febbo, Mohamed H. Abu-Rahma, Jelam K. Parekh, Yildiz Sinangil, Mohammad Ghasemzadeh, Anthony Ghannoum, Chaminda N. Vidanagamachchi
- Patent number: 11651260
  Abstract: A method for hardware-based machine learning acceleration is provided. The method may include partitioning, into a first batch of data and a second batch of data, input data received at a hardware accelerator implementing a machine learning model. The input data may be a continuous stream of data samples. The input data may be partitioned based at least on a resource constraint of the hardware accelerator. An update of a probability density function associated with the machine learning model may be performed in real time. The probability density function may be updated by at least processing, by the hardware accelerator, the first batch of data before the second batch of data. An output may be generated based at least on the updated probability density function. The output may include a probability of encountering a data value. Related systems and articles of manufacture, including computer program products, are also provided. (An illustrative streaming sketch of this batched density update appears after the listing.)
  Type: Grant
  Filed: January 31, 2018
  Date of Patent: May 16, 2023
  Assignee: The Regents of the University of California
  Inventors: Bita Darvish Rouhani, Mohammad Ghasemzadeh, Farinaz Koushanfar
- Publication number: 20230059200
  Abstract: An apparatus includes an in-memory compute circuit that includes a memory circuit configured to generate a set of products by combining received input values with respective weight values stored in rows of the memory circuit, and to combine the set of products to generate an accumulated output value. The in-memory compute circuit may further include a control circuit and a plurality of routing circuits, including a first routing circuit coupled to a first set of rows of the memory circuit. The control circuit may be configured to cause the first routing circuit to route groups of input values to different ones of the first set of rows over a plurality of clock cycles, and the memory circuit to generate, on a clock cycle following the plurality of clock cycles, a particular accumulated output value that is computed based on the routed groups of input values.
  Type: Application
  Filed: August 19, 2021
  Publication date: February 23, 2023
  Inventors: Paolo Di Febbo, Mohamed H. Abu-Rahma, Jelam K. Parekh, Yildiz Sinangil, Mohammad Ghasemzadeh, Anthony Ghannoum, Chaminda N. Vidanagamachchi
- Patent number: 11386326
  Abstract: A method may include transforming a trained machine learning model, including by replacing at least one layer of the trained machine learning model with a dictionary matrix and a coefficient matrix. The dictionary matrix and the coefficient matrix may be formed by decomposing a weight matrix associated with the at least one layer of the trained machine learning model. A product of the dictionary matrix and the coefficient matrix may form a reduced-dimension representation of the weight matrix associated with the at least one layer of the trained machine learning model. The transformed machine learning model may be deployed to a client. Related systems and computer program products are also provided. (An illustrative factorization sketch appears after the listing.)
  Type: Grant
  Filed: June 11, 2019
  Date of Patent: July 12, 2022
  Assignee: The Regents of the University of California
  Inventors: Fang Lin, Mohammad Ghasemzadeh, Bita Darvish Rouhani, Farinaz Koushanfar
- Publication number: 20220108155
  Abstract: Embodiments relate to a neural processor circuit that may include a fetch circuit that fetches coefficient data of a machine learning model from a memory source. The neural processor circuit may also include one or more neural engine circuits that are coupled to the fetch circuit. A neural engine circuit may include a buffer circuit that stores the coefficient data. The neural engine circuit may also include a coefficient organizing circuit that generates at least a first mapping and a second mapping of the stored coefficient data according to one or more control signals. The neural engine may also include a computation circuit that receives and processes at least a portion of input data with the coefficient data as mapped according to the first mapping, or processes at least the portion of the input data with the coefficient data as mapped according to the second mapping.
  Type: Application
  Filed: October 7, 2020
  Publication date: April 7, 2022
  Inventors: Waleed Abdulla, Paolo Di Febbo, Mohammad Ghasemzadeh, Yohan Rajan
- Publication number: 20210166106
  Abstract: A method may include training, based on a training dataset, a machine learning model. The machine learning model may include a neuron configured to generate an output by applying, to one or more inputs to the neuron, an activation function. The output of the activation function may be subject to a multi-level binarization function configured to generate an estimate of the output. The estimate of the output may include a first bit providing a first binary representation of the output and a second bit providing a second binary representation of a first residual error associated with the first binary representation of the output. In response to determining that the training of the machine learning model is complete, the trained machine learning model may be deployed to perform a cognitive task. Related systems and articles of manufacture, including computer program products, are also provided. (An illustrative residual-binarization sketch appears after the listing.)
  Type: Application
  Filed: December 12, 2018
  Publication date: June 3, 2021
  Inventors: Mohammad Ghasemzadeh, Farinaz Koushanfar, Mohammad Samragh Razlighi
- Publication number: 20200027016
  Abstract: A method for hardware-based machine learning acceleration is provided. The method may include partitioning, into a first batch of data and a second batch of data, input data received at a hardware accelerator implementing a machine learning model. The input data may be a continuous stream of data samples. The input data may be partitioned based at least on a resource constraint of the hardware accelerator. An update of a probability density function associated with the machine learning model may be performed in real time. The probability density function may be updated by at least processing, by the hardware accelerator, the first batch of data before the second batch of data. An output may be generated based at least on the updated probability density function. The output may include a probability of encountering a data value. Related systems and articles of manufacture, including computer program products, are also provided.
  Type: Application
  Filed: January 31, 2018
  Publication date: January 23, 2020
  Inventors: Bita Darvish Rouhani, Mohammad Ghasemzadeh, Farinaz Koushanfar
- Publication number: 20190378015
  Abstract: A method may include transforming a trained machine learning model, including by replacing at least one layer of the trained machine learning model with a dictionary matrix and a coefficient matrix. The dictionary matrix and the coefficient matrix may be formed by decomposing a weight matrix associated with the at least one layer of the trained machine learning model. A product of the dictionary matrix and the coefficient matrix may form a reduced-dimension representation of the weight matrix associated with the at least one layer of the trained machine learning model. The transformed machine learning model may be deployed to a client. Related systems and computer program products are also provided.
  Type: Application
  Filed: June 11, 2019
  Publication date: December 12, 2019
  Inventors: Fang Lin, Mohammad Ghasemzadeh, Bita Darvish Rouhani, Farinaz Koushanfar
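The sketches below are illustrative software models keyed to the abstracts above; they are not the patented circuits or claimed methods, and all class, function, and parameter names are assumptions. For the in-memory compute entries (patent numbers 12230361 and 11694733 and the related publications), here is a minimal behavioral sketch assuming four rows of two weights each, split into two routing groups, with a toy per-cycle routing schedule.

```python
import numpy as np

class InMemoryComputeModel:
    """Behavioral stand-in for the in-memory compute array: weights live in
    rows, routed inputs are latched per row, and a later cycle accumulates."""

    def __init__(self, weights, rows_per_group):
        # One weight vector is stored per row of the array.
        self.weights = np.asarray(weights, dtype=float)
        self.rows_per_group = rows_per_group
        self.latched_inputs = np.zeros_like(self.weights)

    def route_cycle(self, group_index, row_in_group, input_values):
        # Model one clock cycle: the routing circuit for this group drives a
        # group of input values onto one row of its set of rows.
        row = group_index * self.rows_per_group + row_in_group
        self.latched_inputs[row] = input_values

    def accumulate(self):
        # Model the cycle after routing: each row multiplies its stored
        # weights by the inputs routed to it, and all products are combined
        # into a single accumulated output value.
        return float(np.sum(self.weights * self.latched_inputs))

# Usage: route inputs to two rows over two cycles, then accumulate.
model = InMemoryComputeModel(weights=np.arange(8).reshape(4, 2), rows_per_group=2)
model.route_cycle(group_index=0, row_in_group=0, input_values=[1.0, 2.0])
model.route_cycle(group_index=1, row_in_group=1, input_values=[0.5, 0.5])
print(model.accumulate())  # weighted sum over the routed rows
```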
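For the coefficient-organizing entries (patent number 12141679 and publications 20250021808 and 20220108155), here is a minimal sketch of choosing between two mappings of buffered coefficient data before a multiply-accumulate step. The specific mappings used here (a row-major view versus a transposed view) and the function names are assumptions for illustration only.

```python
import numpy as np

def fetch_coefficients(memory, offset, count):
    # Stand-in for the fetch circuit: read coefficient data from a memory
    # source into the engine's coefficient buffer.
    return np.array(memory[offset:offset + count], dtype=float)

def organize(buffer, control_signal):
    # Stand-in for the coefficient organizing circuit: produce one of two
    # mappings of the same buffered coefficients, selected by a control signal.
    if control_signal == "mapping_1":
        return buffer.reshape(2, 4)      # first mapping: row-major view
    return buffer.reshape(4, 2).T        # second mapping: reordered view

def compute(inputs, mapped_coefficients):
    # Stand-in for the computation circuit: multiply-accumulate the input
    # data with the coefficients as currently mapped.
    return mapped_coefficients @ inputs

memory = list(range(100))                # toy memory source
buf = fetch_coefficients(memory, offset=8, count=8)
x = np.ones(4)                           # toy input data
print(compute(x, organize(buf, "mapping_1")))
print(compute(x, organize(buf, "mapping_2")))
```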
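For the hardware-based machine learning acceleration entries (patent number 11651260 and publication 20200027016), here is a streaming sketch of partitioning an input stream into batches and folding each batch into a running probability density estimate in arrival order. The histogram estimator and the fixed batch size standing in for the accelerator's resource constraint are illustrative choices, not the claimed method.

```python
import numpy as np

class StreamingDensityEstimator:
    """Histogram-based density estimate updated batch by batch."""

    def __init__(self, bins, value_range, batch_size):
        self.edges = np.linspace(value_range[0], value_range[1], bins + 1)
        self.counts = np.zeros(bins)
        self.batch_size = batch_size     # stands in for the resource constraint

    def consume(self, stream):
        # Partition the incoming stream into batches and fold each batch into
        # the running histogram in arrival order (first batch before second).
        stream = np.asarray(stream, dtype=float)
        for start in range(0, len(stream), self.batch_size):
            batch = stream[start:start + self.batch_size]
            hist, _ = np.histogram(batch, bins=self.edges)
            self.counts += hist

    def probability(self, value):
        # Probability mass of the bin containing `value` under the current
        # (continuously updated) density estimate.
        total = self.counts.sum()
        if total == 0:
            return 0.0
        bin_index = int(np.clip(np.searchsorted(self.edges, value) - 1,
                                0, len(self.counts) - 1))
        return float(self.counts[bin_index] / total)

rng = np.random.default_rng(0)
estimator = StreamingDensityEstimator(bins=16, value_range=(-4.0, 4.0), batch_size=256)
estimator.consume(rng.normal(size=1000))   # stand-in for the continuous data stream
print(estimator.probability(0.0))          # probability of encountering a value near 0
```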
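For the dictionary/coefficient decomposition entries (patent number 11386326 and publication 20190378015), here is a hedged sketch of replacing a layer's weight matrix W with a dictionary matrix D and a coefficient matrix C whose product approximates W. A truncated SVD stands in for the factorization; the claimed method may form the factors differently.

```python
import numpy as np

def decompose_layer(weights, rank):
    # Factor an (out_features x in_features) weight matrix into a dictionary
    # matrix D (out_features x rank) and a coefficient matrix C (rank x in_features).
    u, s, vt = np.linalg.svd(weights, full_matrices=False)
    dictionary = u[:, :rank] * s[:rank]
    coefficients = vt[:rank, :]
    return dictionary, coefficients

def layer_forward(dictionary, coefficients, x):
    # The transformed layer applies C and then D, so the full weight matrix
    # is never materialized at inference time.
    return dictionary @ (coefficients @ x)

rng = np.random.default_rng(1)
w = rng.normal(size=(64, 128))            # toy layer weight matrix
d, c = decompose_layer(w, rank=16)
x = rng.normal(size=128)
print(d.size + c.size, "parameters vs", w.size)          # reduced-dimension representation
print(np.linalg.norm(layer_forward(d, c, x) - w @ x))    # approximation error on one input
```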
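For the multi-level binarization entry (publication 20210166106), here is a minimal sketch in which a first binary plane estimates an activation vector and a second binary plane estimates the residual error left by the first. The per-level mean-absolute scaling is an illustrative choice.

```python
import numpy as np

def residual_binarize(activations, levels=2):
    # First level: a binary plane (+1/-1) scaled to estimate the activations.
    # Each later level binarizes the residual error left by the previous ones.
    residual = np.asarray(activations, dtype=float).copy()
    estimate = np.zeros_like(residual)
    bit_planes, scales = [], []
    for _ in range(levels):
        scale = np.mean(np.abs(residual))    # per-level scaling factor (illustrative)
        bits = np.sign(residual)             # one binary plane
        estimate += scale * bits
        residual -= scale * bits             # error the next level must capture
        bit_planes.append(bits)
        scales.append(scale)
    return estimate, bit_planes, scales

x = np.array([0.9, -0.2, 0.4, -1.3])        # toy activation outputs
approx, planes, gammas = residual_binarize(x, levels=2)
print(approx)                                # two-level binary estimate of the output
print(np.abs(x - approx).mean())             # remaining error after two levels
```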