Patents by Inventor Avinash Sodani

Avinash Sodani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220043503
    Abstract: A system includes a plurality of cores. Each core includes a processing unit, an on-chip memory (OCM), and an idle detector unit. Data is received and stored in the OCM. Instructions are received to process data in the OCM. The core enters an idle mode if the idle detector unit detects that the core has been idle for a first number of clocking signals. The core receives a command to process when in idle mode and transitions from the idle mode to an operational mode. A number of no operation (No-Op) commands is inserted for each time segment. A No-Op command prevents the core from processing instructions for a certain number of clocking signals. A number of No-Op commands inserted for a first time segment is greater than a number of No-Op commands inserted for a last time segment. After the last time segment no No-Op command is inserted.
    Type: Application
    Filed: October 22, 2021
    Publication date: February 10, 2022
    Inventors: Chia-Hsin Chen, Avinash Sodani, Atul Bhattarai, Srinivas Sripada
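The decreasing No-Op schedule in publication 20220043503 above lends itself to a short illustration. Below is a minimal Python sketch of a post-idle instruction stream; the segment length and No-Op counts (`NOOP_SCHEDULE`, `SEGMENT_LEN`) are invented for illustration and are not taken from the patent.

```python
# Hypothetical schedule: No-Ops per time segment, first > last,
# and no No-Ops at all once the last segment has passed.
NOOP_SCHEDULE = [8, 4, 2, 1]
SEGMENT_LEN = 4  # real instructions per segment (illustrative)

def wake_up_stream(instructions):
    """Yield the post-idle stream: each time segment is padded with a
    shrinking count of No-Ops that stall the core, throttling it as it
    transitions from idle mode back to operational mode."""
    it = iter(instructions)
    for noops in NOOP_SCHEDULE:
        yield from ["NOP"] * noops           # core stalls briefly
        for _ in range(SEGMENT_LEN):
            ins = next(it, None)
            if ins is None:
                return
            yield ins
    yield from it                            # steady state: no No-Ops

print(list(wake_up_stream(range(20))))
```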
  • Patent number: 11210105
    Abstract: A system to support data gathering for a machine learning (ML) operation comprises a memory unit configured to maintain data for the ML operation in a plurality of memory blocks each accessible via a memory address. The system further comprises an inference engine comprising a plurality of processing tiles each comprising one or more of an on-chip memory (OCM) configured to load and maintain data for local access by components in the processing tile. The system also comprises a core configured to program components of the processing tiles of the inference engine according to an instruction set architecture (ISA) and a data streaming engine configured to stream data between the memory unit and the OCMs of the processing tiles of the inference engine, wherein the data streaming engine is configured to gather data from multiple memory blocks at the same time via a single data gathering instruction of the ISA.
    Type: Grant
    Filed: November 2, 2020
    Date of Patent: December 28, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventor: Avinash Sodani
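The single-instruction gather in patent 11210105 above can be pictured as one operation that walks a list of memory-block addresses and lands their contents in one contiguous buffer for a tile's OCM. A minimal Python sketch; the names (`gather_to_ocm`, `ddr`) are invented:

```python
def gather_to_ocm(memory, addresses):
    """One 'gather': collect the values stored at scattered memory-block
    addresses into a single contiguous buffer bound for a tile's OCM."""
    return [memory[addr] for addr in addresses]

ddr = {0x100: 1.5, 0x340: -2.0, 0x808: 0.25}
print(gather_to_ocm(ddr, [0x808, 0x100, 0x340]))  # [0.25, 1.5, -2.0]
```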
  • Patent number: 11181967
    Abstract: A system includes a plurality of cores. Each core includes a processing unit, an on-chip memory (OCM), and an idle detector unit. Data is received and stored in the OCM. Instructions are received to process data in the OCM. The core enters an idle mode if the idle detector unit detects that the core has been idle for a first number of clocking signals. The core receives a command to process when in idle mode and transitions from the idle mode to an operational mode. A number of no operation (No-Op) commands is inserted for each time segment. A No-Op command prevents the core from processing instructions for a certain number of clocking signals. A number of No-Op commands inserted for a first time segment is greater than a number of No-Op commands inserted for a last time segment. After the last time segment no No-Op command is inserted.
    Type: Grant
    Filed: July 31, 2020
    Date of Patent: November 23, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Chia-Hsin Chen, Avinash Sodani, Atul Bhattarai, Srinivas Sripada
  • Publication number: 20210342734
    Abstract: A method of converting data stored in a memory from a first format to a second format is disclosed. The method includes extending a number of bits in the data stored in a double data rate (DDR) memory by one bit to form an extended data. The method further includes determining whether the data stored in the DDR is signed or unsigned data. Moreover, responsive to determining that the data is signed, a sign value is added to the most significant bit of the extended data and the data is copied to lower order bits of the extended data. Responsive to determining that the data is unsigned, the data is copied to lower order bits of the extended data and the most significant bit is set to an unsigned value, e.g., zero. The extended data is stored in an on-chip memory (OCM) of a processing tile of a machine learning computer array.
    Type: Application
    Filed: April 29, 2020
    Publication date: November 4, 2021
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
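The sign/zero extension described in publication 20210342734 above is easy to show concretely. A minimal sketch, assuming 8-bit DDR words extended to 9 bits; the function name is hypothetical:

```python
def extend_by_one_bit(value, nbits, signed):
    """Extend an nbits-wide DDR word to nbits + 1 bits for the OCM.

    Signed data has its sign bit replicated into the new most
    significant bit; unsigned data gets a zero MSB. The original bits
    are copied into the low-order positions either way."""
    value &= (1 << nbits) - 1           # raw nbits-wide bit pattern
    sign = (value >> (nbits - 1)) & 1   # top bit of the original word
    msb = sign if signed else 0
    return (msb << nbits) | value

# 0b1000_0001 as signed 8-bit (-127) -> 0b1_1000_0001 (still -127 in 9 bits)
print(bin(extend_by_one_bit(0b1000_0001, 8, signed=True)))   # 0b110000001
print(bin(extend_by_one_bit(0b1000_0001, 8, signed=False)))  # 0b10000001
```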
  • Publication number: 20210318740
    Abstract: A system includes a first and a second group of cores in a multicore system. Each core of the first/second group is configured to process data. Each core within the first/second group is configured to enter into an idle state in response to being idle for a first/second period of time respectively. Every idle core in the first/second group is configured to transition out of the idle state and into an operational mode in response to receiving a signal having a first/second value respectively and further in response to having a pending operation to process.
    Type: Application
    Filed: July 31, 2020
    Publication date: October 14, 2021
    Inventors: Srinivas Sripada, Chia-Hsin Chen, Avinash Sodani, Atul Bhattarai, Nikhil Jayakumar
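The two-group idle scheme in publication 20210318740 above amounts to per-group idle thresholds and per-group wake signals. A toy Python model; the thresholds and field names are invented:

```python
from dataclasses import dataclass

@dataclass
class Core:
    idle_limit: int        # cycles of inactivity before entering idle
    wake_value: int        # signal value this core's group wakes on
    idle_cycles: int = 0
    state: str = "active"

    def tick(self, busy):
        self.idle_cycles = 0 if busy else self.idle_cycles + 1
        if self.idle_cycles >= self.idle_limit:
            self.state = "idle"

    def on_signal(self, value, has_pending_op):
        # Wake only on the group's own value and only with work queued.
        if self.state == "idle" and value == self.wake_value and has_pending_op:
            self.state, self.idle_cycles = "active", 0

first_group = [Core(idle_limit=100, wake_value=1) for _ in range(4)]
second_group = [Core(idle_limit=1000, wake_value=2) for _ in range(4)]
```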
  • Publication number: 20210320880
    Abstract: Control logic circuitry stores packets in a queue in an order in which the packets are received. A head entry of the queue corresponds to an oldest packet in the order. The control logic circuitry receives flow control information corresponding to multiple target devices including at least a first target device and a second target device. The control logic circuitry determines, using the flow control information, whether the oldest packet stored in the head entry can be transferred to the first target device, and in response to determining that the oldest packet stored in the head entry cannot be transferred to the first target device, i) selects another entry with another packet behind the head entry according to the order, and ii) transfers the other packet to the second target device prior to transferring the oldest packet in the head entry to the first target device.
    Type: Application
    Filed: December 22, 2020
    Publication date: October 14, 2021
    Inventors: Avinash Sodani, Enric Musoll, Dan Tu, Chia-Hsin Chen
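The head-of-line bypass in publication 20210320880 above is a small scheduling loop: if flow control blocks the oldest packet's target, send the first packet behind it whose target is ready. A minimal Python sketch with hypothetical names:

```python
def dispatch(queue, ready):
    """Pick the next packet to transfer. `queue` holds (packet, target)
    pairs oldest-first; `ready[target]` is that device's flow-control
    state. A blocked head does not stall packets behind it whose
    targets can accept them."""
    for i, (packet, target) in enumerate(queue):
        if ready[target]:
            return queue.pop(i)
    return None  # everything is flow-controlled; try again later

q = [("p0", "devA"), ("p1", "devB"), ("p2", "devA")]
print(dispatch(q, {"devA": False, "devB": True}))  # ('p1', 'devB') goes first
```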
  • Publication number: 20210247836
    Abstract: A system includes a plurality of cores. Each core includes a processing unit, an on-chip memory (OCM), and an idle detector unit. Data is received and stored in the OCM. Instructions are received to process data in the OCM. The core enters an idle mode if the idle detector unit detects that the core has been idle for a first number of clocking signals. The core receives a command to process when in idle mode and transitions from the idle mode to an operational mode. A number of no operation (No-Op) commands is inserted for each time segment. A No-Op command prevents the core from processing instructions for a certain number of clocking signals. A number of No-Op commands inserted for a first time segment is greater than a number of No-Op commands inserted for a last time segment. After the last time segment no No-Op command is inserted.
    Type: Application
    Filed: July 31, 2020
    Publication date: August 12, 2021
    Inventors: Chia-Hsin Chen, Avinash Sodani, Atul Bhattarai, Srinivas Sripada
  • Publication number: 20210248497
    Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post processing unit configured to accept and maintain one or more lookup tables for performing a tanh and/or sigmoid operation/function. The inline post processing unit is further configured to accept data from a set of registers configured to maintain output from a processing block instead of streaming the data from an on-chip memory (OCM), perform the tanh and/or sigmoid operation on each element of the data from the processing block on a per-element basis via the one or more lookup tables, and stream post processing result of the per-element tanh and/or sigmoid operation back to the OCM after the tanh and/or sigmoid operation is complete.
    Type: Application
    Filed: April 6, 2021
    Publication date: August 12, 2021
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
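The lookup-table tanh in publication 20210248497 above (granted as 10997510 below) can be approximated in a few lines. A sketch assuming a 256-entry table over [-4, 4]; the table size and input range are invented, not from the patent:

```python
import math

LO, HI, N = -4.0, 4.0, 256  # hypothetical quantization of the LUT
TANH_LUT = [math.tanh(LO + (HI - LO) * i / (N - 1)) for i in range(N)]

def tanh_lut(x):
    """Per-element tanh by table lookup, saturating outside [LO, HI]."""
    if x <= LO:
        return -1.0
    if x >= HI:
        return 1.0
    return TANH_LUT[round((x - LO) / (HI - LO) * (N - 1))]

# Applied per element to register-held processing-block output; the
# results would then be streamed back to the OCM.
print([tanh_lut(v) for v in (-5.0, -0.5, 0.0, 0.5, 5.0)])
```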
  • Patent number: 11086633
    Abstract: A programmable hardware system for machine learning (ML) includes a core and an inference engine. The core receives commands from a host. The commands are in a first instruction set architecture (ISA) format. The core divides the commands into a first set for performance-critical operations, in the first ISA format, and a second set for performance non-critical operations, in the first ISA format. The core executes the second set to perform the performance non-critical operations of the ML operations and streams the first set to the inference engine. The inference engine generates a stream of the first set of commands in a second ISA format based on the first set of commands in the first ISA format. The first set of commands in the second ISA format programs components within the inference engine to execute the ML operations to infer data.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: August 10, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen
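The command split in patent 11086633 above is, at its core, a partition of the host's first-ISA command stream. A minimal sketch; `is_critical` and `translate` are hypothetical hooks standing in for whatever classification and first-to-second ISA translation the hardware applies:

```python
def split_and_stream(commands, is_critical, translate):
    """Partition first-ISA commands: performance-critical ones are
    translated into the second ISA and streamed to the inference
    engine; the rest the core executes itself."""
    to_engine = [translate(c) for c in commands if is_critical(c)]
    core_side = [c for c in commands if not is_critical(c)]
    return to_engine, core_side
```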
  • Publication number: 20210240521
    Abstract: A system to support a machine learning (ML) operation comprises an array-based inference engine comprising a plurality of processing tiles each comprising at least one or more of an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform one or more computation tasks on the data in the OCM by executing a set of task instructions. The system also comprises a data streaming engine configured to stream data between a memory and the OCMs and an instruction streaming engine configured to distribute said set of task instructions to the corresponding processing tiles to control their operations and to synchronize said set of task instructions to be executed by each processing tile, respectively, waiting for the current task at each processing tile to finish before starting a new one.
    Type: Application
    Filed: April 22, 2021
    Publication date: August 5, 2021
    Inventors: Avinash Sodani, Senad Durakovic, Gopal Nalamalapu
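The task synchronization in publication 20210240521 above (granted as 11016801 below) behaves like a barrier: one task's instructions must finish on every tile before the next task is released. A toy Python model where tiles are plain callables:

```python
from concurrent.futures import ThreadPoolExecutor

def stream_tasks(tiles, tasks):
    """Distribute each task's instructions to all processing tiles,
    then wait for every tile to finish before releasing the next task."""
    with ThreadPoolExecutor(max_workers=len(tiles)) as pool:
        for task in tasks:
            done = [pool.submit(tile, task) for tile in tiles]
            for f in done:
                f.result()  # barrier: the current task completes everywhere

stream_tasks([print, print], ["task0", "task1"])
```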
  • Publication number: 20210209492
    Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post processing unit configured to accept and maintain one or more lookup tables for performing each of one or more non-linear mathematical operations. The inline post processing unit is further configured to accept data from a set of registers maintaining output from a processing block instead of streaming the data from an on-chip memory (OCM), perform the one or more non-linear mathematical operations on elements of the data from the processing block via their corresponding lookup tables, and stream post processing result of the one or more non-linear mathematical operations back to the OCM after the one or more non-linear mathematical operations are complete.
    Type: Application
    Filed: December 23, 2020
    Publication date: July 8, 2021
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
  • Publication number: 20210191719
    Abstract: A method includes receiving input data at a floating-point (FP) arithmetic operating unit configured to perform an FP arithmetic operation on the input data. The method further includes determining whether the received input data generates an FP hardware exception responsive to the FP arithmetic operation on the input data, wherein the determining occurs prior to performing the FP arithmetic operation. The method also includes converting a value of the received input data to a modified value responsive to the determining that the received input data generates the FP hardware exception, wherein the converting eliminates generation of the FP hardware exception responsive to the FP arithmetic operation on the input data.
    Type: Application
    Filed: April 30, 2020
    Publication date: June 24, 2021
    Inventors: Chia-Hsin Chen, Avinash Sodani, Ulf Hanebutte, Rishan Tan, Soumya Gollamudi
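The pre-check in publication 20210191719 above inspects an operand before the FP operation runs and rewrites any value that would raise. A minimal sketch using the square root of a negative number as the would-be exception; the substitution policy (returning 0.0) is invented for illustration:

```python
import math

def safe_for_sqrt(x):
    """Pre-check an operand before an FP square root: detect an input
    that would generate an FP exception and convert it to a value that
    cannot (substituting 0.0 here, purely as an example policy)."""
    if math.isnan(x) or x < 0.0:   # would raise / signal invalid-op
        return 0.0
    return x

print(math.sqrt(safe_for_sqrt(-3.0)))  # 0.0, and no exception raised
```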
  • Publication number: 20210191787
    Abstract: A new approach for supporting tag-based synchronization among different tasks of a machine learning (ML) operation. When a first task tagged with a set tag indicating that one or more subsequent tasks need to be synchronized with it is received at an instruction streaming engine, the engine saves the set tag in a tag table and transmits instructions of the first task to a set of processing tiles for execution. When a second task having an instruction sync tag indicating that it needs to be synchronized with one or more prior tasks is received at the engine, the engine matches the instruction sync tag with the set tags in the tag table to identify prior tasks that the second task depends on. The engine holds instructions of the second task until these matching prior tasks have been completed and then releases the instructions to the processing tiles for execution.
    Type: Application
    Filed: April 30, 2020
    Publication date: June 24, 2021
    Inventors: Avinash Sodani, Gopal Nalamalapu
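The tag table in publication 20210191787 above can be modeled as a set of outstanding set tags that incoming sync tags are matched against. A toy Python sketch; the class and method names are invented:

```python
class TagTable:
    """Set tags register in-flight producer tasks; a task carrying sync
    tags is held until every matching producer has retired."""
    def __init__(self):
        self.outstanding = set()

    def issue(self, set_tag=None, sync_tags=frozenset()):
        blocked_on = self.outstanding & set(sync_tags)
        if blocked_on:
            return ("HOLD", blocked_on)   # keep instructions queued
        if set_tag is not None:
            self.outstanding.add(set_tag)
        return ("RELEASE", None)          # stream to the processing tiles

    def retire(self, set_tag):
        self.outstanding.discard(set_tag)

tags = TagTable()
tags.issue(set_tag="conv1")                # first task, tagged
print(tags.issue(sync_tags={"conv1"}))     # ('HOLD', {'conv1'})
tags.retire("conv1")
print(tags.issue(sync_tags={"conv1"}))     # ('RELEASE', None)
```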
  • Patent number: 11029963
    Abstract: A processing unit of an inference engine for machine learning (ML) includes a first data load streamer, a second data load streamer, an operator component, and a store streamer. The first data load streamer streams a first data stream from an on-chip memory (OCM) to the operator component. The second data load streamer streams a second data stream from the OCM to the operator component. The operator component performs a matrix operation on the first data stream and the second data stream. The store streamer receives a data output stream from the operator component and stores the data output stream in a buffer.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: June 8, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen, Rishan Tan
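The streamer arrangement in patent 11029963 above pairs two load streams into one operator and buffers its output through a store streamer. A toy Python sketch with an elementwise product standing in for the matrix operation; all names are invented:

```python
def load_streamer(ocm, addresses):
    """Stream operands out of the OCM one element at a time."""
    for addr in addresses:
        yield ocm[addr]

def run_operator(ocm, a_addrs, b_addrs, store):
    """Two load streamers feed the operator component; the store
    streamer collects the output stream into a buffer."""
    for a, b in zip(load_streamer(ocm, a_addrs), load_streamer(ocm, b_addrs)):
        store.append(a * b)   # elementwise stand-in for the matrix op

ocm = {i: float(i) for i in range(8)}
buffer = []
run_operator(ocm, [0, 1, 2], [4, 5, 6], buffer)
print(buffer)  # [0.0, 5.0, 12.0]
```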
  • Patent number: 11016801
    Abstract: A system to support a machine learning (ML) operation comprises an array-based inference engine comprising a plurality of processing tiles each comprising at least one or more of an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform one or more computation tasks on the data in the OCM by executing a set of task instructions. The system also comprises a data streaming engine configured to stream data between a memory and the OCMs and an instruction streaming engine configured to distribute said set of task instructions to the corresponding processing tiles to control their operations and to synchronize said set of task instructions to be executed by each processing tile, respectively, waiting for the current task at each processing tile to finish before starting a new one.
    Type: Grant
    Filed: May 22, 2019
    Date of Patent: May 25, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Senad Durakovic, Gopal Nalamalapu
  • Patent number: 10997510
    Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post processing unit configured to accept and maintain one or more lookup tables for performing a tanh and/or sigmoid operation/function. The inline post processing unit is further configured to accept data from a set of registers configured to maintain output from a processing block instead of streaming the data from an on-chip memory (OCM), perform the tanh and/or sigmoid operation on each element of the data from the processing block on a per-element basis via the one or more lookup tables, and stream post processing result of the per-element tanh and/or sigmoid operation back to the OCM after the tanh and/or sigmoid operation is complete.
    Type: Grant
    Filed: May 22, 2019
    Date of Patent: May 4, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
  • Publication number: 20210117866
    Abstract: A system includes a memory, an inference engine, and a master. The memory is configured to store data. The inference engine is configured to receive the data and to perform one or more computation tasks of a machine learning (ML) operation associated with the data. The master is configured to interleave an address associated with a memory access transaction for accessing the memory. The master is further configured to provide the content associated with the access to the inference engine.
    Type: Application
    Filed: December 23, 2020
    Publication date: April 22, 2021
    Inventors: Avinash Sodani, Ramacharan Sundararaman
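Address interleaving, as in publication 20210117866 above, spreads consecutive blocks of an address range across channels so streaming ML accesses load the memory evenly. A minimal sketch; the channel count and block size are invented:

```python
def interleave(addr, num_channels=4, block=64):
    """Map a flat byte address to (channel, address-within-channel) by
    rotating consecutive `block`-byte chunks across channels."""
    blk = addr // block
    channel = blk % num_channels
    local = (blk // num_channels) * block + (addr % block)
    return channel, local

print(interleave(0))     # (0, 0)
print(interleave(64))    # (1, 0)   next block, next channel
print(interleave(256))   # (0, 64)  wraps back to channel 0
```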
  • Patent number: 10970080
    Abstract: A programmable hardware architecture for machine learning (ML) is proposed, which includes at least a host, a memory, a core, a data streaming engine, an instruction-streaming engine, and an inference engine. The core interprets a plurality of ML commands for an ML operation and/or data received from the host and coordinates activities of the engines based on the data in the received ML commands. The instruction-streaming engine translates the ML commands received from the core and provides a set of programming instructions to the data streaming engine and the inference engine based on the translated parameters. The data streaming engine sends one or more data streams to the inference engine in response to the received programming instructions. The inference engine then processes the data streams received from the data streaming engine according to the programming instructions received from the instruction-streaming engine.
    Type: Grant
    Filed: November 9, 2018
    Date of Patent: April 6, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Chia-Hsin Chen, Ulf R. Hanebutte, Hamid Reza Ghasemi, Senad Durakovic
  • Publication number: 20210081846
    Abstract: A system to support a machine learning (ML) operation comprises a core configured to receive and interpret commands into a set of instructions for the ML operation and a memory unit configured to maintain data for the ML operation. The system further comprises an inference engine having a plurality of processing tiles, each comprising an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform tasks of the ML operation on the data in the OCM. The system also comprises an instruction streaming engine configured to distribute the instructions to the processing tiles to control their operations and to synchronize data communication between the core and the inference engine so that data transmitted between them correctly reaches the corresponding processing tiles while ensuring coherence of data shared and distributed among the core and the OCMs.
    Type: Application
    Filed: November 30, 2020
    Publication date: March 18, 2021
    Inventors: Avinash Sodani, Gopal Nalamalapu
  • Publication number: 20210055934
    Abstract: An array-based inference engine includes a plurality of processing tiles arranged in a two-dimensional array of a plurality of rows and a plurality of columns. Each processing tile comprises at least one or more of an on-chip memory (OCM) configured to load and maintain data from the input data stream for local access by components in the processing tile and further configured to maintain and output result of the ML operation performed by the processing tile as an output data stream. The array includes a first processing unit (POD) configured to perform a dense and/or regular computation task of the ML operation on the data in the OCM. The array also includes a second processing unit/element (PE) configured to perform a sparse and/or irregular computation task of the ML operation on the data in the OCM and/or from the POD.
    Type: Application
    Filed: October 2, 2020
    Publication date: February 25, 2021
    Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen
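The POD/PE division of labor in publication 20210055934 above maps dense, regular work and sparse, irregular work to different units in each tile. A closing toy sketch; both functions and the sparse-ReLU choice are invented for illustration:

```python
def pod_dense(weights, activations):
    """POD-style task: dense, regular computation (matrix-vector product)."""
    return [sum(w * a for w, a in zip(row, activations)) for row in weights]

def pe_sparse(values, nonzero_idx):
    """PE-style task: sparse/irregular computation over the POD's
    output, touching only the listed positions (a sparse ReLU here)."""
    return {i: max(values[i], 0.0) for i in nonzero_idx}

acts = pod_dense([[1, -2], [3, 4]], [0.5, 1.0])  # [-1.5, 5.5]
print(pe_sparse(acts, [0, 1]))                   # {0: 0.0, 1: 5.5}
```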