Patents by Inventor Avinash Sodani

Avinash Sodani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220043503
    Abstract: A system includes a plurality of cores. Each core includes a processing unit, an on-chip memory (OCM), and an idle detector unit. Data is received and stored in the OCM. Instructions are received to process data in the OCM. The core enters an idle mode if the idle detector unit detects that the core has been idle for a first number of clocking signals. The core receives a command to process when in idle mode and transitions from the idle mode to an operational mode. A number of no operation (No-Op) commands is inserted for each time segment. A No-Op command prevents the core from processing instructions for a certain number of clocking signals. A number of No-Op commands inserted for a first time segment is greater than a number of No-Op commands inserted for a last time segment. After the last time segment no No-Op command is inserted.
    Type: Application
    Filed: October 22, 2021
    Publication date: February 10, 2022
    Inventors: Chia-Hsin Chen, Avinash Sodani, Atul Bhattarai, Srinivas Sripada
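The decreasing No-Op schedule in publication 20220043503 above lends itself to a short illustration. Below is a minimal Python sketch of a post-idle instruction stream; the segment length and No-Op counts (`NOOP_SCHEDULE`, `SEGMENT_LEN`) are invented for illustration and are not taken from the patent.

```python
# Hypothetical schedule: No-Ops per time segment, first > last,
# and no No-Ops at all once the last segment has passed.
NOOP_SCHEDULE = [8, 4, 2, 1]
SEGMENT_LEN = 4  # real instructions per segment (illustrative)

def wake_up_stream(instructions):
    """Yield the post-idle stream: each time segment is padded with a
    shrinking count of No-Ops that stall the core, throttling it as it
    transitions from idle mode back to operational mode."""
    it = iter(instructions)
    for noops in NOOP_SCHEDULE:
        yield from ["NOP"] * noops           # core stalls briefly
        for _ in range(SEGMENT_LEN):
            ins = next(it, None)
            if ins is None:
                return
            yield ins
    yield from it                            # steady state: no No-Ops

print(list(wake_up_stream(range(20))))
```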
  • Patent number: 11210105
    Abstract: A system to support data gathering for a machine learning (ML) operation comprises a memory unit configured to maintain data for the ML operation in a plurality of memory blocks each accessible via a memory address. The system further comprises an inference engine comprising a plurality of processing tiles each comprising one or more of an on-chip memory (OCM) configured to load and maintain data for local access by components in the processing tile. The system also comprises a core configured to program components of the processing tiles of the inference engine according to an instruction set architecture (ISA) and a data streaming engine configured to stream data between the memory unit and the OCMs of the processing tiles of the inference engine, wherein the data streaming engine is configured to gather data from multiple memory blocks at the same time via a single data gathering instruction of the ISA.
    Type: Grant
    Filed: November 2, 2020
    Date of Patent: December 28, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventor: Avinash Sodani
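The single-instruction gather in patent 11210105 above can be pictured as one operation that walks a list of memory-block addresses and lands their contents in one contiguous buffer for a tile's OCM. A minimal Python sketch; the names (`gather_to_ocm`, `ddr`) are invented:

```python
def gather_to_ocm(memory, addresses):
    """One 'gather': collect the values stored at scattered memory-block
    addresses into a single contiguous buffer bound for a tile's OCM."""
    return [memory[addr] for addr in addresses]

ddr = {0x100: 1.5, 0x340: -2.0, 0x808: 0.25}
print(gather_to_ocm(ddr, [0x808, 0x100, 0x340]))  # [0.25, 1.5, -2.0]
```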
  • Patent number: 11181967
    Abstract: A system includes a plurality of cores. Each core includes a processing unit, an on-chip memory (OCM), and an idle detector unit. Data is received and stored in the OCM. Instructions are received to process data in the OCM. The core enters an idle mode if the idle detector unit detects that the core has been idle for a first number of clocking signals. The core receives a command to process when in idle mode and transitions from the idle mode to an operational mode. A number of no operation (No-Op) commands is inserted for each time segment. A No-Op command prevents the core from processing instructions for a certain number of clocking signals. A number of No-Op commands inserted for a first time segment is greater than a number of No-Op commands inserted for a last time segment. After the last time segment no No-Op command is inserted.
    Type: Grant
    Filed: July 31, 2020
    Date of Patent: November 23, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Chia-Hsin Chen, Avinash Sodani, Atul Bhattarai, Srinivas Sripada
  • Publication number: 20210342734
    Abstract: A method of converting data stored in a memory from a first format to a second format is disclosed. The method includes extending a number of bits in the data stored in a double data rate (DDR) memory by one bit to form an extended data. The method further includes determining whether the data stored in the DDR is signed or unsigned data. Moreover, responsive to determining that the data is signed, a sign value is added to the most significant bit of the extended data and the data is copied to lower order bits of the extended data. Responsive to determining that the data is unsigned, the data is copied to lower order bits of the extended data and the most significant bit is set to an unsigned value, e.g., zero. The extended data is stored in an on-chip memory (OCM) of a processing tile of a machine learning computer array.
    Type: Application
    Filed: April 29, 2020
    Publication date: November 4, 2021
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
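The sign/zero extension described in publication 20210342734 above is easy to show concretely. A minimal sketch, assuming 8-bit DDR words extended to 9 bits; the function name is hypothetical:

```python
def extend_by_one_bit(value, nbits, signed):
    """Extend an nbits-wide DDR word to nbits + 1 bits for the OCM.

    Signed data has its sign bit replicated into the new most
    significant bit; unsigned data gets a zero MSB. The original bits
    are copied into the low-order positions either way."""
    value &= (1 << nbits) - 1           # raw nbits-wide bit pattern
    sign = (value >> (nbits - 1)) & 1   # top bit of the original word
    msb = sign if signed else 0
    return (msb << nbits) | value

# 0b1000_0001 as signed 8-bit (-127) -> 0b1_1000_0001 (still -127 in 9 bits)
print(bin(extend_by_one_bit(0b1000_0001, 8, signed=True)))   # 0b110000001
print(bin(extend_by_one_bit(0b1000_0001, 8, signed=False)))  # 0b10000001
```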
  • Publication number: 20210318740
    Abstract: A system includes a first and a second group of cores in a multicore system. Each core of the first/second group is configured to process data. Each core within the first/second group is configured to enter into an idle state in response to being idle for a first/second period of time respectively. Every idle core in the first/second group is configured to transition out of the idle state and into an operational mode in response to receiving a signal having a first/second value respectively and further in response to having a pending operation to process.
    Type: Application
    Filed: July 31, 2020
    Publication date: October 14, 2021
    Inventors: Srinivas Sripada, Chia-Hsin Chen, Avinash Sodani, Atul Bhattarai, Nikhil Jayakumar
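The two-group idle scheme in publication 20210318740 above amounts to per-group idle thresholds and per-group wake signals. A toy Python model; the thresholds and field names are invented:

```python
from dataclasses import dataclass

@dataclass
class Core:
    idle_limit: int        # cycles of inactivity before entering idle
    wake_value: int        # signal value this core's group wakes on
    idle_cycles: int = 0
    state: str = "active"

    def tick(self, busy):
        self.idle_cycles = 0 if busy else self.idle_cycles + 1
        if self.idle_cycles >= self.idle_limit:
            self.state = "idle"

    def on_signal(self, value, has_pending_op):
        # Wake only on the group's own value and only with work queued.
        if self.state == "idle" and value == self.wake_value and has_pending_op:
            self.state, self.idle_cycles = "active", 0

first_group = [Core(idle_limit=100, wake_value=1) for _ in range(4)]
second_group = [Core(idle_limit=1000, wake_value=2) for _ in range(4)]
```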
  • Publication number: 20210320880
    Abstract: Control logic circuitry stores packets in a queue in an order in which the packets are received. A head entry of the queue corresponds to an oldest packet in the order. The control logic circuitry receives flow control information corresponding to multiple target devices including at least a first target device and a second target device. The control logic circuitry determines, using the flow control information, whether the oldest packet stored in the head entry can be transferred to the first target device, and in response to determining that the oldest packet stored in the head entry cannot be transferred to the first target device, i) selects another entry with another packet behind the head entry according to the order, and ii) transfers the other packet to the second target device prior to transferring the oldest packet in the head entry to the first target device.
    Type: Application
    Filed: December 22, 2020
    Publication date: October 14, 2021
    Inventors: Avinash Sodani, Enric Musoll, Dan Tu, Chia-Hsin Chen
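The head-of-line bypass in publication 20210320880 above is a small scheduling loop: if flow control blocks the oldest packet's target, send the first packet behind it whose target is ready. A minimal Python sketch with hypothetical names:

```python
def dispatch(queue, ready):
    """Pick the next packet to transfer. `queue` holds (packet, target)
    pairs oldest-first; `ready[target]` is that device's flow-control
    state. A blocked head does not stall packets behind it whose
    targets can accept them."""
    for i, (packet, target) in enumerate(queue):
        if ready[target]:
            return queue.pop(i)
    return None  # everything is flow-controlled; try again later

q = [("p0", "devA"), ("p1", "devB"), ("p2", "devA")]
print(dispatch(q, {"devA": False, "devB": True}))  # ('p1', 'devB') goes first
```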
  • Publication number: 20210247836
    Abstract: A system includes a plurality of cores. Each core includes a processing unit, an on-chip memory (OCM), and an idle detector unit. Data is received and stored in the OCM. Instructions are received to process data in the OCM. The core enters an idle mode if the idle detector unit detects that the core has been idle for a first number of clocking signals. The core receives a command to process when in idle mode and transitions from the idle mode to an operational mode. A number of no operation (No-Op) commands is inserted for each time segment. A No-Op command prevents the core from processing instructions for a certain number of clocking signals. A number of No-Op commands inserted for a first time segment is greater than a number of No-Op commands inserted for a last time segment. After the last time segment no No-Op command is inserted.
    Type: Application
    Filed: July 31, 2020
    Publication date: August 12, 2021
    Inventors: Chia-Hsin Chen, Avinash Sodani, Atul Bhattarai, Srinivas Sripada
  • Publication number: 20210248497
    Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post processing unit configured to accept and maintain one or more lookup tables for performing a tanh and/or sigmoid operation/function. The inline post processing unit is further configured to accept data from a set of registers configured to maintain output from a processing block instead of streaming the data from an on-chip memory (OCM), perform the tanh and/or sigmoid operation on each element of the data from the processing block on a per-element basis via the one or more lookup tables, and stream post processing result of the per-element tanh and/or sigmoid operation back to the OCM after the tanh and/or sigmoid operation is complete.
    Type: Application
    Filed: April 6, 2021
    Publication date: August 12, 2021
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
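The lookup-table tanh in publication 20210248497 above (granted as 10997510 below) can be approximated in a few lines. A sketch assuming a 256-entry table over [-4, 4]; the table size and input range are invented, not from the patent:

```python
import math

LO, HI, N = -4.0, 4.0, 256  # hypothetical quantization of the LUT
TANH_LUT = [math.tanh(LO + (HI - LO) * i / (N - 1)) for i in range(N)]

def tanh_lut(x):
    """Per-element tanh by table lookup, saturating outside [LO, HI]."""
    if x <= LO:
        return -1.0
    if x >= HI:
        return 1.0
    return TANH_LUT[round((x - LO) / (HI - LO) * (N - 1))]

# Applied per element to register-held processing-block output; the
# results would then be streamed back to the OCM.
print([tanh_lut(v) for v in (-5.0, -0.5, 0.0, 0.5, 5.0)])
```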
  • Patent number: 11086633
    Abstract: A programmable hardware system for machine learning (ML) includes a core and an inference engine. The core receives commands from a host. The commands are in a first instruction set architecture (ISA) format. The core divides the commands into a first set for performance-critical operations, in the first ISA format, and a second set for performance non-critical operations, in the first ISA format. The core executes the second set to perform the performance non-critical operations of the ML operations and streams the first set to the inference engine. The inference engine generates a stream of the first set of commands in a second ISA format based on the first set of commands in the first ISA format. The first set of commands in the second ISA format programs components within the inference engine to execute the ML operations to infer data.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: August 10, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen
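The command split in patent 11086633 above is, at its core, a partition of the host's first-ISA command stream. A minimal sketch; `is_critical` and `translate` are hypothetical hooks standing in for whatever classification and first-to-second ISA translation the hardware applies:

```python
def split_and_stream(commands, is_critical, translate):
    """Partition first-ISA commands: performance-critical ones are
    translated into the second ISA and streamed to the inference
    engine; the rest the core executes itself."""
    to_engine = [translate(c) for c in commands if is_critical(c)]
    core_side = [c for c in commands if not is_critical(c)]
    return to_engine, core_side
```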
  • Publication number: 20210240521
    Abstract: A system to support a machine learning (ML) operation comprises an array-based inference engine comprising a plurality of processing tiles each comprising at least one or more of an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform one or more computation tasks on the data in the OCM by executing a set of task instructions. The system also comprises a data streaming engine configured to stream data between a memory and the OCMs and an instruction streaming engine configured to distribute said set of task instructions to the corresponding processing tiles to control their operations and to synchronize said set of task instructions to be executed by each processing tile, respectively, waiting for the current task at each processing tile to finish before starting a new one.
    Type: Application
    Filed: April 22, 2021
    Publication date: August 5, 2021
    Inventors: Avinash Sodani, Senad Durakovic, Gopal Nalamalapu
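The task synchronization in publication 20210240521 above (granted as 11016801 below) behaves like a barrier: one task's instructions must finish on every tile before the next task is released. A toy Python model where tiles are plain callables:

```python
from concurrent.futures import ThreadPoolExecutor

def stream_tasks(tiles, tasks):
    """Distribute each task's instructions to all processing tiles,
    then wait for every tile to finish before releasing the next task."""
    with ThreadPoolExecutor(max_workers=len(tiles)) as pool:
        for task in tasks:
            done = [pool.submit(tile, task) for tile in tiles]
            for f in done:
                f.result()  # barrier: the current task completes everywhere

stream_tasks([print, print], ["task0", "task1"])
```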
  • Publication number: 20210209492
    Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post processing unit configured to accept and maintain one or more lookup tables for performing each of one or more non-linear mathematical operations. The inline post processing unit is further configured to accept data from a set of registers maintaining output from a processing block instead of streaming the data from an on-chip memory (OCM), perform the one or more non-linear mathematical operations on elements of the data from the processing block via their corresponding lookup tables, and stream post processing result of the one or more non-linear mathematical operations back to the OCM after the one or more non-linear mathematical operations are complete.
    Type: Application
    Filed: December 23, 2020
    Publication date: July 8, 2021
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
  • Publication number: 20210191719
    Abstract: A method includes receiving input data at a floating-point (FP) arithmetic operating unit configured to perform an FP arithmetic operation on the input data. The method further includes determining whether the received input data generates an FP hardware exception responsive to the FP arithmetic operation on the input data, wherein the determining occurs prior to performing the FP arithmetic operation. The method also includes converting a value of the received input data to a modified value responsive to the determining that the received input data generates the FP hardware exception, wherein the converting eliminates generation of the FP hardware exception responsive to the FP arithmetic operation on the input data.
    Type: Application
    Filed: April 30, 2020
    Publication date: June 24, 2021
    Inventors: Chia-Hsin Chen, Avinash Sodani, Ulf Hanebutte, Rishan Tan, Soumya Gollamudi
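The pre-check in publication 20210191719 above inspects an operand before the FP operation runs and rewrites any value that would raise. A minimal sketch using the square root of a negative number as the would-be exception; the substitution policy (returning 0.0) is invented for illustration:

```python
import math

def safe_for_sqrt(x):
    """Pre-check an operand before an FP square root: detect an input
    that would generate an FP exception and convert it to a value that
    cannot (substituting 0.0 here, purely as an example policy)."""
    if math.isnan(x) or x < 0.0:   # would raise / signal invalid-op
        return 0.0
    return x

print(math.sqrt(safe_for_sqrt(-3.0)))  # 0.0, and no exception raised
```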
  • Publication number: 20210191787
    Abstract: A new approach for supporting tag-based synchronization among different tasks of a machine learning (ML) operation. When a first task tagged with a set tag indicating that one or more subsequent tasks need to be synchronized with it is received at an instruction streaming engine, the engine saves the set tag in a tag table and transmits instructions of the first task to a set of processing tiles for execution. When a second task having an instruction sync tag indicating that it needs to be synchronized with one or more prior tasks is received at the engine, the engine matches the instruction sync tag with the set tags in the tag table to identify prior tasks that the second task depends on. The engine holds instructions of the second task until these matching prior tasks have been completed and then releases the instructions to the processing tiles for execution.
    Type: Application
    Filed: April 30, 2020
    Publication date: June 24, 2021
    Inventors: Avinash Sodani, Gopal Nalamalapu
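The tag table in publication 20210191787 above can be modeled as a set of outstanding set tags that incoming sync tags are matched against. A toy Python sketch; the class and method names are invented:

```python
class TagTable:
    """Set tags register in-flight producer tasks; a task carrying sync
    tags is held until every matching producer has retired."""
    def __init__(self):
        self.outstanding = set()

    def issue(self, set_tag=None, sync_tags=frozenset()):
        blocked_on = self.outstanding & set(sync_tags)
        if blocked_on:
            return ("HOLD", blocked_on)   # keep instructions queued
        if set_tag is not None:
            self.outstanding.add(set_tag)
        return ("RELEASE", None)          # stream to the processing tiles

    def retire(self, set_tag):
        self.outstanding.discard(set_tag)

tags = TagTable()
tags.issue(set_tag="conv1")                # first task, tagged
print(tags.issue(sync_tags={"conv1"}))     # ('HOLD', {'conv1'})
tags.retire("conv1")
print(tags.issue(sync_tags={"conv1"}))     # ('RELEASE', None)
```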
  • Patent number: 11029963
    Abstract: A processing unit of an inference engine for machine learning (ML) includes a first data load streamer, a second data load streamer, an operator component, and a store streamer. The first data load streamer streams a first data stream from an on-chip memory (OCM) to the operator component. The second data load streamer streams a second data stream from the OCM to the operator component. The operator component performs a matrix operation on the first data stream and the second data stream. The store streamer receives a data output stream from the operator component and stores the data output stream in a buffer.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: June 8, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen, Rishan Tan
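The streamer arrangement in patent 11029963 above pairs two load streams into one operator and buffers its output through a store streamer. A toy Python sketch with an elementwise product standing in for the matrix operation; all names are invented:

```python
def load_streamer(ocm, addresses):
    """Stream operands out of the OCM one element at a time."""
    for addr in addresses:
        yield ocm[addr]

def run_operator(ocm, a_addrs, b_addrs, store):
    """Two load streamers feed the operator component; the store
    streamer collects the output stream into a buffer."""
    for a, b in zip(load_streamer(ocm, a_addrs), load_streamer(ocm, b_addrs)):
        store.append(a * b)   # elementwise stand-in for the matrix op

ocm = {i: float(i) for i in range(8)}
buffer = []
run_operator(ocm, [0, 1, 2], [4, 5, 6], buffer)
print(buffer)  # [0.0, 5.0, 12.0]
```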
  • Patent number: 11016801
    Abstract: A system to support a machine learning (ML) operation comprises an array-based inference engine comprising a plurality of processing tiles each comprising at least one or more of an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform one or more computation tasks on the data in the OCM by executing a set of task instructions. The system also comprises a data streaming engine configured to stream data between a memory and the OCMs and an instruction streaming engine configured to distribute said set of task instructions to the corresponding processing tiles to control their operations and to synchronize said set of task instructions to be executed by each processing tile, respectively, waiting for the current task at each processing tile to finish before starting a new one.
    Type: Grant
    Filed: May 22, 2019
    Date of Patent: May 25, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Senad Durakovic, Gopal Nalamalapu
  • Patent number: 10997510
    Abstract: A processing unit to support inference acceleration for machine learning (ML) comprises an inline post processing unit configured to accept and maintain one or more lookup tables for performing a tanh and/or sigmoid operation/function. The inline post processing unit is further configured to accept data from a set of registers configured to maintain output from a processing block instead of streaming the data from an on-chip memory (OCM), perform the tanh and/or sigmoid operation on each element of the data from the processing block on a per-element basis via the one or more lookup tables, and stream post processing result of the per-element tanh and/or sigmoid operation back to the OCM after the tanh and/or sigmoid operation is complete.
    Type: Grant
    Filed: May 22, 2019
    Date of Patent: May 4, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Ulf Hanebutte, Chia-Hsin Chen
  • Publication number: 20210117866
    Abstract: A system includes a memory, an inference engine, and a master. The memory is configured to store data. The inference engine is configured to receive the data and to perform one or more computation tasks of a machine learning (ML) operation associated with the data. The master is configured to interleave an address associated with a memory access transaction for accessing the memory. The master is further configured to provide the content associated with the access to the inference engine.
    Type: Application
    Filed: December 23, 2020
    Publication date: April 22, 2021
    Inventors: Avinash Sodani, Ramacharan Sundararaman
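Address interleaving, as in publication 20210117866 above, spreads consecutive blocks of an address range across channels so streaming ML accesses load the memory evenly. A minimal sketch; the channel count and block size are invented:

```python
def interleave(addr, num_channels=4, block=64):
    """Map a flat byte address to (channel, address-within-channel) by
    rotating consecutive `block`-byte chunks across channels."""
    blk = addr // block
    channel = blk % num_channels
    local = (blk // num_channels) * block + (addr % block)
    return channel, local

print(interleave(0))     # (0, 0)
print(interleave(64))    # (1, 0)   next block, next channel
print(interleave(256))   # (0, 64)  wraps back to channel 0
```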
  • Patent number: 10970080
    Abstract: A programmable hardware architecture for machine learning (ML) is proposed, which includes at least a host, a memory, a core, a data streaming engine, an instruction-streaming engine, and an inference engine. The core interprets a plurality of ML commands for an ML operation and/or data received from the host and coordinates activities of the engines based on the data in the received ML commands. The instruction-streaming engine translates the ML commands received from the core and provides a set of programming instructions to the data streaming engine and the inference engine based on the translated parameters. The data streaming engine sends one or more data streams to the inference engine in response to the received programming instructions. The inference engine then processes the data streams received from the data streaming engine according to the programming instructions received from the instruction-streaming engine.
    Type: Grant
    Filed: November 9, 2018
    Date of Patent: April 6, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Chia-Hsin Chen, Ulf R. Hanebutte, Hamid Reza Ghasemi, Senad Durakovic
  • Publication number: 20210081846
    Abstract: A system to support a machine learning (ML) operation comprises a core configured to receive and interpret commands into a set of instructions for the ML operation and a memory unit configured to maintain data for the ML operation. The system further comprises an inference engine having a plurality of processing tiles, each comprising an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform tasks of the ML operation on the data in the OCM. The system also comprises an instruction streaming engine configured to distribute the instructions to the processing tiles to control their operations and to synchronize data communication between the core and the inference engine so that data transmitted between them correctly reaches the corresponding processing tiles while ensuring coherence of data shared and distributed among the core and the OCMs.
    Type: Application
    Filed: November 30, 2020
    Publication date: March 18, 2021
    Inventors: Avinash Sodani, Gopal Nalamalapu
  • Publication number: 20210055934
    Abstract: An array-based inference engine includes a plurality of processing tiles arranged in a two-dimensional array of a plurality of rows and a plurality of columns. Each processing tile comprises at least one or more of an on-chip memory (OCM) configured to load and maintain data from the input data stream for local access by components in the processing tile and further configured to maintain and output result of the ML operation performed by the processing tile as an output data stream. The array includes a first processing unit (POD) configured to perform a dense and/or regular computation task of the ML operation on the data in the OCM. The array also includes a second processing unit/element (PE) configured to perform a sparse and/or irregular computation task of the ML operation on the data in the OCM and/or from the POD.
    Type: Application
    Filed: October 2, 2020
    Publication date: February 25, 2021
    Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen
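The POD/PE division of labor in publication 20210055934 above maps dense, regular work and sparse, irregular work to different units in each tile. A closing toy sketch; both functions and the sparse-ReLU choice are invented for illustration:

```python
def pod_dense(weights, activations):
    """POD-style task: dense, regular computation (matrix-vector product)."""
    return [sum(w * a for w, a in zip(row, activations)) for row in weights]

def pe_sparse(values, nonzero_idx):
    """PE-style task: sparse/irregular computation over the POD's
    output, touching only the listed positions (a sparse ReLU here)."""
    return {i: max(values[i], 0.0) for i in nonzero_idx}

acts = pod_dense([[1, -2], [3, 4]], [0.5, 1.0])  # [-1.5, 5.5]
print(pe_sparse(acts, [0, 1]))                   # {0: 0.0, 1: 5.5}
```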