Patents by Inventor Senad DURAKOVIC

Senad DURAKOVIC has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11934863
    Abstract: A system to support a machine learning (ML) operation comprises an array-based inference engine comprising a plurality of processing tiles each comprising at least one or more of an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform one or more computation tasks on the data in the OCM by executing a set of task instructions. The system also comprises a data streaming engine configured to stream data between a memory and the OCMs and an instruction streaming engine configured to distribute said set of task instructions to the corresponding processing tiles to control their operations and to synchronize said set of task instructions to be executed by each processing tile, respectively, to wait current certain task at each processing tile to finish before starting a new one.
    Type: Grant
    Filed: April 22, 2021
    Date of Patent: March 19, 2024
    Assignee: Marvell Asia Pte Ltd
    Inventors: Avinash Sodani, Senad Durakovic, Gopal Nalamalapu
  • Patent number: 11733983
    Abstract: A method includes receiving a high-level function in a high-level code of an application; identifying resources in a hardware to execute a set of low-level instructions that is generated from the high-level function in the high-level code; compiling the high-level function in the high-level code of the application into the set of low-level instructions to be executed on the hardware; and generating a plurality of structured metadata associated with allocation of resources in the hardware to execute the set of low-level instructions.
    Type: Grant
    Filed: September 8, 2022
    Date of Patent: August 22, 2023
    Assignee: Marvell Asia Pte Ltd
    Inventors: Senad Durakovic, Chien-Chun Chou, Ulf Hanebutte, Harri Hakkarainen
  • Publication number: 20230015688
    Abstract: A method includes receiving a high-level function in a high-level code of an application; identifying resources in a hardware to execute a set of low-level instructions that is generated from the high-level function in the high-level code; compiling the high-level function in the high-level code of the application into the set of low-level instructions to be executed on the hardware; and generating a plurality of structured metadata associated with allocation of resources in the hardware to execute the set of low-level instructions.
    Type: Application
    Filed: September 8, 2022
    Publication date: January 19, 2023
    Inventors: Senad Durakovic, Chien-Chun Chou, Ulf Hanebutte, Harri Hakkarainen
  • Publication number: 20230004365
    Abstract: A system includes a compiler including a plurality of compiler blocks. The compiler blocks of the plurality of compiler blocks are compossible. The compiler is configured to identify one or more resources in a hardware to execute a set of low-level instructions that is generated from a high-level function in a high-level code. The compiler is further configured to determine one or more processing operations to be performed that is associated with the high-level function in the high-level code. The determining of the one or more processing operations occurs based on architecture of the hardware. The compiler is configured to compile the high-level function in the high-level code of the application into the set of low-level instructions to be executed on the hardware.
    Type: Application
    Filed: March 2, 2022
    Publication date: January 5, 2023
    Inventors: Ulf Hanebutte, Senad Durakovic, Chien-Chun Chou, Fu-Hwa Wang, Mohana Tandyala
  • Patent number: 11467811
    Abstract: A method includes receiving a high-level function in a high-level code of an application is received. The method also include identifying resources in a hardware to execute a set of low-level instructions that is generated from the high-level function in the high-level code. One or more processing operations are determined to be performed that is associated with the high-level function in the high-level code. The determining of the one or more processing operations occurs based on architecture of the hardware. The high-level function in the high-level code of the application is compiled into the set of low-level instructions to be executed on the hardware. A plurality of structured metadata is generated and includes information associated with the determining resources in the hardware and further includes information associated with the determining one or more processing operations.
    Type: Grant
    Filed: July 30, 2021
    Date of Patent: October 11, 2022
    Assignee: Marvell Asia Pte Ltd
    Inventors: Senad Durakovic, Chien-Chun Chou, Ulf Hanebutte, Harri Hakkarainen
  • Patent number: 11256517
    Abstract: A programmable hardware system for machine learning (ML) includes a core and an inference engine. The core receives commands from a host. The commands are in a first instruction set architecture (ISA) format. The core divides the commands into a first set for performance-critical operations, in the first ISA format, and a second set of performance non-critical operations, in the first ISA format. The core executes the second set to perform the performance non-critical operations of the ML operations and streams the first set to inference engine. The inference engine generates a stream of the first set of commands in a second ISA format based on the first set of commands in the first ISA format. The first set of commands in the second ISA format programs components within the inference engine to execute the ML operations to infer data.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: February 22, 2022
    Assignee: Marvell Asia Pte Ltd
    Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen
  • Patent number: 11086633
    Abstract: A programmable hardware system for machine learning (ML) includes a core and an inference engine. The core receives commands from a host. The commands are in a first instruction set architecture (ISA) format. The core divides the commands into a first set for performance-critical operations, in the first ISA format, and a second set of performance non-critical operations, in the first ISA format. The core executes the second set to perform the performance non-critical operations of the ML operations and streams the first set to inference engine. The inference engine generates a stream of the first set of commands in a second ISA format based on the first set of commands in the first ISA format. The first set of commands in the second ISA format programs components within the inference engine to execute the ML operations to infer data.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: August 10, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen
  • Publication number: 20210240521
    Abstract: A system to support a machine learning (ML) operation comprises an array-based inference engine comprising a plurality of processing tiles each comprising at least one or more of an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform one or more computation tasks on the data in the OCM by executing a set of task instructions. The system also comprises a data streaming engine configured to stream data between a memory and the OCMs and an instruction streaming engine configured to distribute said set of task instructions to the corresponding processing tiles to control their operations and to synchronize said set of task instructions to be executed by each processing tile, respectively, to wait current certain task at each processing tile to finish before starting a new one.
    Type: Application
    Filed: April 22, 2021
    Publication date: August 5, 2021
    Inventors: Avinash SODANI, Senad DURAKOVIC, Gopal NALAMALAPU
  • Patent number: 11029963
    Abstract: A processing unit of an inference engine for machine learning (ML) includes a first data load steamer, a second data load streamer, an operator component, and a store streamer. The first data load streamer streams a first data stream from an on-chip memory (OCM) to the operator component. The second data load streamer streams a second data stream from the OCM to the operator component. The operator component performs a matrix operation on the first data stream and the second data stream. The store streamer receives a data output stream from the operator component and to store the data output stream in a buffer.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: June 8, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen, Rishan Tan
  • Patent number: 11016801
    Abstract: A system to support a machine learning (ML) operation comprises an array-based inference engine comprising a plurality of processing tiles each comprising at least one or more of an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform one or more computation tasks on the data in the OCM by executing a set of task instructions. The system also comprises a data streaming engine configured to stream data between a memory and the OCMs and an instruction streaming engine configured to distribute said set of task instructions to the corresponding processing tiles to control their operations and to synchronize said set of task instructions to be executed by each processing tile, respectively, to wait current certain task at each processing tile to finish before starting a new one.
    Type: Grant
    Filed: May 22, 2019
    Date of Patent: May 25, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Senad Durakovic, Gopal Nalamalapu
  • Patent number: 10970080
    Abstract: A programmable hardware architecture for machine learning (ML) is proposed, which includes at least a host, a memory, a core, a data streaming engine, a instruction-streaming engine, and an interference engine. The core interprets a plurality of ML commands for a ML operation and/or data received from the host and coordinate activities of the engines based on the data in the received ML commands. The instruction-streaming engine translates the ML commands received from the core and provide a set of programming instructions to the data streaming engine and the inference engines based on the translated parameters. The data steaming engine sends one or more data streams to the inference engine in response to the received programming instructions. The inference engine then processes the data streams received from the data stream engine according to the programming instructions received from the instruction-streaming engine.
    Type: Grant
    Filed: November 9, 2018
    Date of Patent: April 6, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Chia-Hsin Chen, Ulf R. Hanebutte, Hamid Reza Ghasemi, Senad Durakovic
  • Publication number: 20210055934
    Abstract: An array-based inference engine includes a plurality of processing tiles arranged in a two-dimensional array of a plurality of rows and a plurality of columns. Each processing tile comprises at least one or more of an on-chip memory (OCM) configured to load and maintain data from the input data stream for local access by components in the processing tile and further configured to maintain and output result of the ML operation performed by the processing tile as an output data stream. The array includes a first processing unit (POD) configured to perform a dense and/or regular computation task of the ML operation on the data in the OCM. The array also includes a second processing unit/element (PE) configured to perform a sparse and/or irregular computation task of the ML operation on the data in the OCM and/or from the POD.
    Type: Application
    Filed: October 2, 2020
    Publication date: February 25, 2021
    Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen
  • Patent number: 10896045
    Abstract: A processing unit of an inference engine for machine learning (ML) includes a first, a second, and a third register, and a matrix multiplication block. The first register receives a first stream of data associated with a first matrix data that is read only once. The second register receives a second stream of data associated with a second matrix data that is read only once. The matrix multiplication block performs a multiplication operation based on data from the first register and the second register resulting in an output matrix. A row associated with the first matrix is maintained while rows associated with the second matrix is fed to the matrix multiplication block to perform a multiplication operation. The process is repeated for each row of the first matrix. The third register receives the output matrix from the matrix multiplication block and stores the output matrix.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: January 19, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen
  • Patent number: 10824433
    Abstract: An array-based inference engine includes a plurality of processing tiles arranged in a two-dimensional array of a plurality of rows and a plurality of columns. Each processing tile comprises at least one or more of an on-chip memory (OCM) configured to load and maintain data from the input data stream for local access by components in the processing tile and further configured to maintain and output result of the ML operation performed by the processing tile as an output data stream. The array includes a first processing unit (POD) configured to perform a dense and/or regular computation task of the ML operation on the data in the OCM. The array also includes a second processing unit/element (PE) configured to perform a sparse and/or irregular computation task of the ML operation on the data in the OCM and/or from the POD.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: November 3, 2020
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen
  • Publication number: 20190244130
    Abstract: A processing unit of an inference engine for machine learning (ML) includes a first, a second, and a third register, and a matrix multiplication block. The first register receives a first stream of data associated with a first matrix data that is read only once. The second register receives a second stream of data associated with a second matrix data that is read only once. The matrix multiplication block performs a multiplication operation based on data from the first register and the second register resulting in an output matrix. A row associated with the first matrix is maintained while rows associated with the second matrix is fed to the matrix multiplication block to perform a multiplication operation. The process is repeated for each row of the first matrix. The third register receives the output matrix from the matrix multiplication block and stores the output matrix.
    Type: Application
    Filed: December 19, 2018
    Publication date: August 8, 2019
    Inventors: Avinash SODANI, Ulf HANEBUTTE, Senad DURAKOVIC, Hamid Reza GHASEMI, Chia-Hsin CHEN
  • Publication number: 20190244141
    Abstract: A programmable hardware architecture for machine learning (ML) is proposed, which includes at least a host, a memory, a core, a data streaming engine, a instruction-streaming engine, and an interference engine. The core interprets a plurality of ML commands for a ML operation and/or data received from the host and coordinate activities of the engines based on the data in the received ML commands. The instruction-streaming engine translates the ML commands received from the core and provide a set of programming instructions to the data streaming engine and the inference engines based on the translated parameters. The data steaming engine sends one or more data streams to the inference engine in response to the received programming instructions. The inference engine then processes the data streams received from the data stream engine according to the programming instructions received from the instruction-streaming engine.
    Type: Application
    Filed: November 9, 2018
    Publication date: August 8, 2019
    Inventors: Avinash SODANI, Chia-Hsin CHEN, Ulf R. HANEBUTTE, Hamid Reza GHASEMI, Senad DURAKOVIC
  • Publication number: 20190243800
    Abstract: An array-based inference engine includes a plurality of processing tiles arranged in a two-dimensional array of a plurality of rows and a plurality of columns. Each processing tile comprises at least one or more of an on-chip memory (OCM) configured to load and maintain data from the input data stream for local access by components in the processing tile and further configured to maintain and output result of the ML operation performed by the processing tile as an output data stream. The array includes a first processing unit (POD) configured to perform a dense and/or regular computation task of the ML operation on the data in the OCM. The array also includes a second processing unit/element (PE) configured to perform a sparse and/or irregular computation task of the ML operation on the data in the OCM and/or from the POD.
    Type: Application
    Filed: December 19, 2018
    Publication date: August 8, 2019
    Inventors: Avinash SODANI, Ulf HANEBUTTE, Senad DURAKOVIC, Hamid Reza GHASEMI, Chia-Hsin CHEN
  • Publication number: 20190243653
    Abstract: A programmable hardware system for machine learning (ML) includes a core and an inference engine. The core receives commands from a host. The commands are in a first instruction set architecture (ISA) format. The core divides the commands into a first set for performance-critical operations, in the first ISA format, and a second set of performance non-critical operations, in the first ISA format. The core executes the second set to perform the performance non-critical operations of the ML operations and streams the first set to inference engine. The inference engine generates a stream of the first set of commands in a second ISA format based on the first set of commands in the first ISA format. The first set of commands in the second ISA format programs components within the inference engine to execute the ML operations to infer data.
    Type: Application
    Filed: December 19, 2018
    Publication date: August 8, 2019
    Inventors: Avinash SODANI, Ulf HANEBUTTE, Senad DURAKOVIC, Hamid Reza GHASEMI, Chia-Hsin CHEN
  • Publication number: 20190243871
    Abstract: A processing unit of an inference engine for machine learning (ML) includes a first data load steamer, a second data load streamer, an operator component, and a store streamer. The first data load streamer streams a first data stream from an on-chip memory (OCM) to the operator component. The second data load streamer streams a second data stream from the OCM to the operator component. The operator component performs a matrix operation on the first data stream and the second data stream. The store streamer receives a data output stream from the operator component and to store the data output stream in a buffer.
    Type: Application
    Filed: December 19, 2018
    Publication date: August 8, 2019
    Inventors: Avinash SODANI, Ulf HANEBUTTE, Senad DURAKOVIC, Hamid Reza GHASEMI, Chia-Hsin CHEN, Rishan TAN
  • Publication number: 20190244117
    Abstract: A programmable hardware system for machine learning (ML) includes a core and a streaming engine. The core receives a plurality of commands and a plurality of data from a host to be analyzed and inferred via machine learning. The core transmits a first subset of commands of the plurality of commands that is performance-critical operations and associated data thereof of the plurality of data for efficient processing thereof. The first subset of commands and the associated data are passed through via a function call. The streaming engine is coupled to the core and receives the first subset of commands and the associated data from the core. The streaming engine streams a second subset of commands of the first subset of commands and its associated data to an inference engine by executing a single instruction.
    Type: Application
    Filed: December 19, 2018
    Publication date: August 8, 2019
    Inventors: Avinash SODANI, Ulf HANEBUTTE, Senad DURAKOVIC, Hamid Reza GHASEMI, Chia-Hsin CHEN