Patents by Inventor Senad DURAKOVIC

Senad DURAKOVIC has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

METHOD AND SYSTEM FOR IN-LINE DATA CONVERSION AND DATA MANIPULATION

Publication number: 20250111214

Abstract: A machine learning (ML) hardware includes a first data format conversion block configured to receive data generated by an application source in a first data format. The first data format conversion block is configured to convert the received data from the first data format into a second data format. The first data format is different from the second data format. The ML hardware includes a plurality of processing units configured to perform one or more ML operations on the data in the second data format to generate a processed data. The ML hardware includes a second data format conversion block configured to convert the processed data to a third data format. The ML hardware further includes a transmitting component configured to output the processed data in the third data format to a memory component for use by an application destination.

Type: Application

Filed: October 27, 2023

Publication date: April 3, 2025

Inventors: Ulf Hanebutte, Senad Durakovic, Harri Hakkarainen, Derek Chickles, Shivah Shankar Narayan Rao
Method and apparatus for ML graphs by a compiler

Patent number: 12190086

Abstract: A system and method for splitting a machine learning (ML) graph is disclosed. The system includes a compiler configured to receive an ML model. The compiler generates a graph associated with the ML model, wherein the graph is an internal representation of the ML model. The graph is partitioned into a first subgraph and a second subgraph. The first subgraph is associated with an ML hardware, an ML emulator, or a combination thereof, and the second subgraph is associated with a processor different from the ML hardware. A set of low-level instructions associated with the first subgraph is generated. One or more resources in the ML hardware is identified to execute the set of low-level instructions associated with the first subgraph.

Type: Grant

Filed: May 18, 2022

Date of Patent: January 7, 2025

Assignee: Marvell Asia Pte Ltd

Inventors: Ulf Hanebutte, Chien-Chun Chou, Senad Durakovic, Pranav Jonnalagadda
Method and apparatus for correlating high-level code with low-level instructions for machine learning applications

Patent number: 12174727

Abstract: A new approach is proposed to support correlating high-level code with low-level instructions of an application running on a hardware. A compiler that compiles a high-level function in the high-level code of the application into a set of low-level instructions to be executed on the hardware is configured to utilize one or more reserved fields of the set of low-level instructions to incorporate one or more IDs and an actionable item. The IDs are mapped to the high-level function, wherein such mapping is programmable by the compiler. Based on the mapped IDs and the actionable item incorporated in the set of the low-level instructions, the runtime performance of the application on the hardware can be monitored and profiled and issues related to the high-level code of the application can be identified for debugging purposes.

Type: Grant

Filed: July 30, 2021

Date of Patent: December 24, 2024

Assignee: Marvell Asia Pte Ltd

Inventors: Ulf Hanebutte, Harri Hakkarainen, Senad Durakovic, Chien-Chun Chou
Instruction set architecture (ISA) format for multiple instruction set architectures in machine learning inference engine

Patent number: 12169719

Abstract: A programmable hardware system for machine learning (ML) operations includes a core and an inference engine. The core receives commands from a host. The commands are in a first instruction set architecture (ISA) format. The core divides the commands into a first set for performance-critical operations, in the first ISA format, and a second set of performance non-critical operations, in the first ISA format. The core executes the second set to perform the performance non-critical operations of the ML operations and streams the first set to inference engine. The inference engine generates a stream of the first set of commands in a second ISA format based on the first set of commands in the first ISA format. The first set of commands in the second ISA format programs components within the inference engine to execute the ML operations to infer data.

Type: Grant

Filed: January 6, 2021

Date of Patent: December 17, 2024

Assignee: Marvell Asia Pte Ltd

Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen
Method and system to expand accessible on-chip memory (OCM) of an inference engine

Patent number: 12124827

Abstract: A method includes determining that an amount of data external to an inference engine to be transmitted for local storage/processing by a first processing tile exceeds an available space at a first OCM of the first processing tile; receiving a first portion of the data at the first processing tile; transmitting the first portion of the data to a second OCM of a second processing tile for temporary local storage (the second processing tile is within the inference engine); receiving and storing a second portion of the data at the first OCM; processing the second portion of the data at the first processing tile by at least a first processing element; receiving and storing the first portion of the data at the first OCM of the first processing tile from the second processing tile prior to the first portion of data is needed by the first processing tile.

Type: Grant

Filed: October 14, 2022

Date of Patent: October 22, 2024

Assignee: Marvell Asia Pte Ltd

Inventors: Ulf Hanebutte, Senad Durakovic, Mohana Tandyala
Streaming engine for machine learning architecture

Patent number: 12112174

Abstract: A programmable hardware system for machine learning (ML) includes a core and a streaming engine. The core receives a plurality of commands and a plurality of data from a host to be analyzed and inferred via machine learning. The core transmits a first subset of commands of the plurality of commands that is performance-critical operations and associated data thereof of the plurality of data for efficient processing thereof. The first subset of commands and the associated data are passed through via a function call. The streaming engine is coupled to the core and receives the first subset of commands and the associated data from the core. The streaming engine streams a second subset of commands of the first subset of commands and its associated data to an inference engine by executing a single instruction.

Type: Grant

Filed: December 19, 2018

Date of Patent: October 8, 2024

Assignee: Marvell Asia Pte Ltd

Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen
Architecture to support color scheme-based synchronization for machine learning

Patent number: 11995463

Abstract: A system to support a machine learning (ML) operation comprises an array-based inference engine comprising a plurality of processing tiles each comprising at least one or more of an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform one or more computation tasks on the data in the OCM by executing a set of task instructions. The system also comprises a data streaming engine configured to stream data between a memory and the OCMs and an instruction streaming engine configured to distribute said set of task instructions to the corresponding processing tiles to control their operations and to synchronize said set of task instructions to be executed by each processing tile, respectively, to wait current certain task at each processing tile to finish before starting a new one.

Type: Grant

Filed: April 22, 2021

Date of Patent: May 28, 2024

Assignee: Marvell Asia Pte Ltd

Inventors: Avinash Sodani, Senad Durakovic, Gopal Nalamalapu
Method and apparatus for compiler and low-level instruction validation of machine learning operations on hardware

Patent number: 11977475

Abstract: A system to support validation and debugging of compiled low-level instructions for a machine learning (ML) network model on an ML-specific hardware. A compiler identifies well-defined boundaries in the ML network model based on primitives used to generate low-level instructions for the hardware. The ML network model is partitioned into units/layers/sub-graphs based on the plurality of well-defined boundaries. The compiler then generates an internal representation for each of the units wherein the internal representation is mapped to components in the hardware. Each of the units is compiled into a first set to be executed on the ML-specific hardware and a second set to be executed on a second computing device. The output results from executing the two sets of low-level instructions are compared to validate the first set of low-level instructions. If the outputs do not match fully, the first set of low-level instructions is debugged and recompiled.

Type: Grant

Filed: March 2, 2022

Date of Patent: May 7, 2024

Assignee: Marvell Asia Pte Ltd

Inventors: Chien-Chun Chou, Senad Durakovic, Ulf Hanebutte, Harri Hakkarainen, Yao Chou, Veena Karthikeyan
Architecture to support color scheme-based synchronization for machine learning

Patent number: 11934863

Abstract: A system to support a machine learning (ML) operation comprises an array-based inference engine comprising a plurality of processing tiles each comprising at least one or more of an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform one or more computation tasks on the data in the OCM by executing a set of task instructions. The system also comprises a data streaming engine configured to stream data between a memory and the OCMs and an instruction streaming engine configured to distribute said set of task instructions to the corresponding processing tiles to control their operations and to synchronize said set of task instructions to be executed by each processing tile, respectively, to wait current certain task at each processing tile to finish before starting a new one.

Type: Grant

Filed: April 22, 2021

Date of Patent: March 19, 2024

Assignee: Marvell Asia Pte Ltd

Inventors: Avinash Sodani, Senad Durakovic, Gopal Nalamalapu
Method and apparatus for generating metadata by a compiler

Patent number: 11733983

Abstract: A method includes receiving a high-level function in a high-level code of an application; identifying resources in a hardware to execute a set of low-level instructions that is generated from the high-level function in the high-level code; compiling the high-level function in the high-level code of the application into the set of low-level instructions to be executed on the hardware; and generating a plurality of structured metadata associated with allocation of resources in the hardware to execute the set of low-level instructions.

Type: Grant

Filed: September 8, 2022

Date of Patent: August 22, 2023

Assignee: Marvell Asia Pte Ltd

Inventors: Senad Durakovic, Chien-Chun Chou, Ulf Hanebutte, Harri Hakkarainen
METHOD AND APPARATUS FOR GENERATING METADATA BY A COMPILER

Publication number: 20230015688

Abstract: A method includes receiving a high-level function in a high-level code of an application; identifying resources in a hardware to execute a set of low-level instructions that is generated from the high-level function in the high-level code; compiling the high-level function in the high-level code of the application into the set of low-level instructions to be executed on the hardware; and generating a plurality of structured metadata associated with allocation of resources in the hardware to execute the set of low-level instructions.

Type: Application

Filed: September 8, 2022

Publication date: January 19, 2023

Inventors: Senad Durakovic, Chien-Chun Chou, Ulf Hanebutte, Harri Hakkarainen
MULTISTAGE COMPILER ARCHITECTURE

Publication number: 20230004365

Abstract: A system includes a compiler including a plurality of compiler blocks. The compiler blocks of the plurality of compiler blocks are compossible. The compiler is configured to identify one or more resources in a hardware to execute a set of low-level instructions that is generated from a high-level function in a high-level code. The compiler is further configured to determine one or more processing operations to be performed that is associated with the high-level function in the high-level code. The determining of the one or more processing operations occurs based on architecture of the hardware. The compiler is configured to compile the high-level function in the high-level code of the application into the set of low-level instructions to be executed on the hardware.

Type: Application

Filed: March 2, 2022

Publication date: January 5, 2023

Inventors: Ulf Hanebutte, Senad Durakovic, Chien-Chun Chou, Fu-Hwa Wang, Mohana Tandyala
Method and apparatus for generating metadata by a compiler

Patent number: 11467811

Abstract: A method includes receiving a high-level function in a high-level code of an application is received. The method also include identifying resources in a hardware to execute a set of low-level instructions that is generated from the high-level function in the high-level code. One or more processing operations are determined to be performed that is associated with the high-level function in the high-level code. The determining of the one or more processing operations occurs based on architecture of the hardware. The high-level function in the high-level code of the application is compiled into the set of low-level instructions to be executed on the hardware. A plurality of structured metadata is generated and includes information associated with the determining resources in the hardware and further includes information associated with the determining one or more processing operations.

Type: Grant

Filed: July 30, 2021

Date of Patent: October 11, 2022

Assignee: Marvell Asia Pte Ltd

Inventors: Senad Durakovic, Chien-Chun Chou, Ulf Hanebutte, Harri Hakkarainen
Architecture of crossbar of inference engine

Patent number: 11256517

Abstract: A programmable hardware system for machine learning (ML) includes a core and an inference engine. The core receives commands from a host. The commands are in a first instruction set architecture (ISA) format. The core divides the commands into a first set for performance-critical operations, in the first ISA format, and a second set of performance non-critical operations, in the first ISA format. The core executes the second set to perform the performance non-critical operations of the ML operations and streams the first set to inference engine. The inference engine generates a stream of the first set of commands in a second ISA format based on the first set of commands in the first ISA format. The first set of commands in the second ISA format programs components within the inference engine to execute the ML operations to infer data.

Type: Grant

Filed: December 19, 2018

Date of Patent: February 22, 2022

Assignee: Marvell Asia Pte Ltd

Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen
Single instruction set architecture (ISA) format for multiple ISAS in machine learning inference engine

Patent number: 11086633

Abstract: A programmable hardware system for machine learning (ML) includes a core and an inference engine. The core receives commands from a host. The commands are in a first instruction set architecture (ISA) format. The core divides the commands into a first set for performance-critical operations, in the first ISA format, and a second set of performance non-critical operations, in the first ISA format. The core executes the second set to perform the performance non-critical operations of the ML operations and streams the first set to inference engine. The inference engine generates a stream of the first set of commands in a second ISA format based on the first set of commands in the first ISA format. The first set of commands in the second ISA format programs components within the inference engine to execute the ML operations to infer data.

Type: Grant

Filed: December 19, 2018

Date of Patent: August 10, 2021

Assignee: Marvell Asia Pte, Ltd.

Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen
ARCHITECTURE TO SUPPORT COLOR SCHEME-BASED SYNCHRONIZATION FOR MACHINE LEARNING

Publication number: 20210240521

Abstract: A system to support a machine learning (ML) operation comprises an array-based inference engine comprising a plurality of processing tiles each comprising at least one or more of an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform one or more computation tasks on the data in the OCM by executing a set of task instructions. The system also comprises a data streaming engine configured to stream data between a memory and the OCMs and an instruction streaming engine configured to distribute said set of task instructions to the corresponding processing tiles to control their operations and to synchronize said set of task instructions to be executed by each processing tile, respectively, to wait current certain task at each processing tile to finish before starting a new one.

Type: Application

Filed: April 22, 2021

Publication date: August 5, 2021

Inventors: Avinash SODANI, Senad DURAKOVIC, Gopal NALAMALAPU
Architecture for irregular operations in machine learning inference engine

Patent number: 11029963

Abstract: A processing unit of an inference engine for machine learning (ML) includes a first data load steamer, a second data load streamer, an operator component, and a store streamer. The first data load streamer streams a first data stream from an on-chip memory (OCM) to the operator component. The second data load streamer streams a second data stream from the OCM to the operator component. The operator component performs a matrix operation on the first data stream and the second data stream. The store streamer receives a data output stream from the operator component and to store the data output stream in a buffer.

Type: Grant

Filed: December 19, 2018

Date of Patent: June 8, 2021

Assignee: Marvell Asia Pte, Ltd.

Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen, Rishan Tan
Architecture to support color scheme-based synchronization for machine learning

Patent number: 11016801

Abstract: A system to support a machine learning (ML) operation comprises an array-based inference engine comprising a plurality of processing tiles each comprising at least one or more of an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform one or more computation tasks on the data in the OCM by executing a set of task instructions. The system also comprises a data streaming engine configured to stream data between a memory and the OCMs and an instruction streaming engine configured to distribute said set of task instructions to the corresponding processing tiles to control their operations and to synchronize said set of task instructions to be executed by each processing tile, respectively, to wait current certain task at each processing tile to finish before starting a new one.

Type: Grant

Filed: May 22, 2019

Date of Patent: May 25, 2021

Assignee: Marvell Asia Pte, Ltd.

Inventors: Avinash Sodani, Senad Durakovic, Gopal Nalamalapu
Systems and methods for programmable hardware architecture for machine learning

Patent number: 10970080

Abstract: A programmable hardware architecture for machine learning (ML) is proposed, which includes at least a host, a memory, a core, a data streaming engine, a instruction-streaming engine, and an interference engine. The core interprets a plurality of ML commands for a ML operation and/or data received from the host and coordinate activities of the engines based on the data in the received ML commands. The instruction-streaming engine translates the ML commands received from the core and provide a set of programming instructions to the data streaming engine and the inference engines based on the translated parameters. The data steaming engine sends one or more data streams to the inference engine in response to the received programming instructions. The inference engine then processes the data streams received from the data stream engine according to the programming instructions received from the instruction-streaming engine.

Type: Grant

Filed: November 9, 2018

Date of Patent: April 6, 2021

Assignee: Marvell Asia Pte, Ltd.

Inventors: Avinash Sodani, Chia-Hsin Chen, Ulf R. Hanebutte, Hamid Reza Ghasemi, Senad Durakovic
ARRAY-BASED INFERENCE ENGINE FOR MACHINE LEARNING

Publication number: 20210055934

Abstract: An array-based inference engine includes a plurality of processing tiles arranged in a two-dimensional array of a plurality of rows and a plurality of columns. Each processing tile comprises at least one or more of an on-chip memory (OCM) configured to load and maintain data from the input data stream for local access by components in the processing tile and further configured to maintain and output result of the ML operation performed by the processing tile as an output data stream. The array includes a first processing unit (POD) configured to perform a dense and/or regular computation task of the ML operation on the data in the OCM. The array also includes a second processing unit/element (PE) configured to perform a sparse and/or irregular computation task of the ML operation on the data in the OCM and/or from the POD.

Type: Application

Filed: October 2, 2020

Publication date: February 25, 2021

Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen

1 2 next