Patents by Inventor Avinash Sodani

Avinash Sodani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Data transmission between memory and on chip memory of inference engine for machine learning via a single data gathering instruction

Patent number: 10891136

Abstract: A system to support data gathering for a machine learning (ML) operation comprises a memory unit configured to maintain data for the ML operation in a plurality of memory blocks each accessible via a memory address. The system further comprises an inference engine comprising a plurality of processing tiles each comprising one or more of an on-chip memory (OCM) configured to load and maintain data for local access by components in the processing tile. The system also comprises a core configured to program components of the processing tiles of the inference engine according to an instruction set architecture (ISA) and a data streaming engine configured to stream data between the memory unit and the OCMs of the processing tiles of the inference engine wherein data streaming engine is configured to perform a data gathering operation via a single data gathering instruction of the ISA at the same time.

Type: Grant

Filed: May 22, 2019

Date of Patent: January 12, 2021

Assignee: Marvell Asia Pte, Ltd.

Inventor: Avinash Sodani
Array-based inference engine for machine learning

Patent number: 10824433

Abstract: An array-based inference engine includes a plurality of processing tiles arranged in a two-dimensional array of a plurality of rows and a plurality of columns. Each processing tile comprises at least one or more of an on-chip memory (OCM) configured to load and maintain data from the input data stream for local access by components in the processing tile and further configured to maintain and output result of the ML operation performed by the processing tile as an output data stream. The array includes a first processing unit (POD) configured to perform a dense and/or regular computation task of the ML operation on the data in the OCM. The array also includes a second processing unit/element (PE) configured to perform a sparse and/or irregular computation task of the ML operation on the data in the OCM and/or from the POD.

Type: Grant

Filed: December 19, 2018

Date of Patent: November 3, 2020

Assignee: Marvell Asia Pte, Ltd.

Inventors: Avinash Sodani, Ulf Hanebutte, Senad Durakovic, Hamid Reza Ghasemi, Chia-Hsin Chen
Providing Multiple Memory Modes For A Processor Including Internal Memory

Publication number: 20190286559

Abstract: In one embodiment, a processor comprises: at least one core formed on a die to execute instructions; a first memory controller to interface with an in-package memory; a second memory controller to interface with a platform memory to couple to the processor; and the in-package memory located within a package of the processor, where the in-package memory is to be identified as a more distant memory with respect to the at least one core than the platform memory. Other embodiments are described and claimed.

Type: Application

Filed: June 6, 2019

Publication date: September 19, 2019

Inventors: Avinash Sodani, Robert J. Kyanko, Richard J. Greco, Andreas Kleen, Milind B. Girkar, Christopher M. Cantalupo
ARCHITECTURE FOR IRREGULAR OPERATIONS IN MACHINE LEARNING INFFERENCE ENGINE

Publication number: 20190243871

Abstract: A processing unit of an inference engine for machine learning (ML) includes a first data load steamer, a second data load streamer, an operator component, and a store streamer. The first data load streamer streams a first data stream from an on-chip memory (OCM) to the operator component. The second data load streamer streams a second data stream from the OCM to the operator component. The operator component performs a matrix operation on the first data stream and the second data stream. The store streamer receives a data output stream from the operator component and to store the data output stream in a buffer.

Type: Application

Filed: December 19, 2018

Publication date: August 8, 2019

Inventors: Avinash SODANI, Ulf HANEBUTTE, Senad DURAKOVIC, Hamid Reza GHASEMI, Chia-Hsin CHEN, Rishan TAN
ARCHITECTURE FOR DENSE OPERATIONS IN MACHINE LEARNING INFERENCE ENGINE

Publication number: 20190244130

Abstract: A processing unit of an inference engine for machine learning (ML) includes a first, a second, and a third register, and a matrix multiplication block. The first register receives a first stream of data associated with a first matrix data that is read only once. The second register receives a second stream of data associated with a second matrix data that is read only once. The matrix multiplication block performs a multiplication operation based on data from the first register and the second register resulting in an output matrix. A row associated with the first matrix is maintained while rows associated with the second matrix is fed to the matrix multiplication block to perform a multiplication operation. The process is repeated for each row of the first matrix. The third register receives the output matrix from the matrix multiplication block and stores the output matrix.

Type: Application

Filed: December 19, 2018

Publication date: August 8, 2019

Inventors: Avinash SODANI, Ulf HANEBUTTE, Senad DURAKOVIC, Hamid Reza GHASEMI, Chia-Hsin CHEN
ARRAY-BASED INFERENCE ENGINE FOR MACHINE LEARNING

Publication number: 20190243800

Abstract: An array-based inference engine includes a plurality of processing tiles arranged in a two-dimensional array of a plurality of rows and a plurality of columns. Each processing tile comprises at least one or more of an on-chip memory (OCM) configured to load and maintain data from the input data stream for local access by components in the processing tile and further configured to maintain and output result of the ML operation performed by the processing tile as an output data stream. The array includes a first processing unit (POD) configured to perform a dense and/or regular computation task of the ML operation on the data in the OCM. The array also includes a second processing unit/element (PE) configured to perform a sparse and/or irregular computation task of the ML operation on the data in the OCM and/or from the POD.

Type: Application

Filed: December 19, 2018

Publication date: August 8, 2019

Inventors: Avinash SODANI, Ulf HANEBUTTE, Senad DURAKOVIC, Hamid Reza GHASEMI, Chia-Hsin CHEN
ARCHITECTURE OF CROSSBAR OF INFERENCE ENGINE

Publication number: 20190244118

Abstract: A programmable hardware system for machine learning (ML) includes a core and an inference engine. The core receives commands from a host. The commands are in a first instruction set architecture (ISA) format. The core divides the commands into a first set for performance-critical operations, in the first ISA format, and a second set of performance non-critical operations, in the first ISA format. The core executes the second set to perform the performance non-critical operations of the ML operations and streams the first set to inference engine. The inference engine generates a stream of the first set of commands in a second ISA format based on the first set of commands in the first ISA format. The first set of commands in the second ISA format programs components within the inference engine to execute the ML operations to infer data.

Type: Application

Filed: December 19, 2018

Publication date: August 8, 2019

Inventors: Avinash SODANI, Ulf HANEBUTTE, Senad DURAKOVIC, Hamid Reza GHASEMI, Chia-Hsin CHEN
SINGLE INSTRUCTION SET ARCHITECTURE (ISA) FORMAT FOR MULTIPLE ISAS IN MACHINE LEARNING INFERENCE ENGINE

Publication number: 20190243653

Abstract: A programmable hardware system for machine learning (ML) includes a core and an inference engine. The core receives commands from a host. The commands are in a first instruction set architecture (ISA) format. The core divides the commands into a first set for performance-critical operations, in the first ISA format, and a second set of performance non-critical operations, in the first ISA format. The core executes the second set to perform the performance non-critical operations of the ML operations and streams the first set to inference engine. The inference engine generates a stream of the first set of commands in a second ISA format based on the first set of commands in the first ISA format. The first set of commands in the second ISA format programs components within the inference engine to execute the ML operations to infer data.

Type: Application

Filed: December 19, 2018

Publication date: August 8, 2019

Inventors: Avinash SODANI, Ulf HANEBUTTE, Senad DURAKOVIC, Hamid Reza GHASEMI, Chia-Hsin CHEN
STREAMING ENGINE FOR MACHINE LEARNING ARCHITECTURE

Publication number: 20190244117

Abstract: A programmable hardware system for machine learning (ML) includes a core and a streaming engine. The core receives a plurality of commands and a plurality of data from a host to be analyzed and inferred via machine learning. The core transmits a first subset of commands of the plurality of commands that is performance-critical operations and associated data thereof of the plurality of data for efficient processing thereof. The first subset of commands and the associated data are passed through via a function call. The streaming engine is coupled to the core and receives the first subset of commands and the associated data from the core. The streaming engine streams a second subset of commands of the first subset of commands and its associated data to an inference engine by executing a single instruction.

Type: Application

Filed: December 19, 2018

Publication date: August 8, 2019

Inventors: Avinash SODANI, Ulf HANEBUTTE, Senad DURAKOVIC, Hamid Reza GHASEMI, Chia-Hsin CHEN
SYSTEMS AND METHODS FOR PROGRAMMABLE HARDWARE ARCHITECTURE FOR MACHINE LEARNING

Publication number: 20190244141

Abstract: A programmable hardware architecture for machine learning (ML) is proposed, which includes at least a host, a memory, a core, a data streaming engine, a instruction-streaming engine, and an interference engine. The core interprets a plurality of ML commands for a ML operation and/or data received from the host and coordinate activities of the engines based on the data in the received ML commands. The instruction-streaming engine translates the ML commands received from the core and provide a set of programming instructions to the data streaming engine and the inference engines based on the translated parameters. The data steaming engine sends one or more data streams to the inference engine in response to the received programming instructions. The inference engine then processes the data streams received from the data stream engine according to the programming instructions received from the instruction-streaming engine.

Type: Application

Filed: November 9, 2018

Publication date: August 8, 2019

Inventors: Avinash SODANI, Chia-Hsin CHEN, Ulf R. HANEBUTTE, Hamid Reza GHASEMI, Senad DURAKOVIC
Providing multiple memory modes for a processor including internal memory

Patent number: 10346300

Abstract: In one embodiment, a processor comprises: at least one core formed on a die to execute instructions; a first memory controller to interface with an in-package memory; a second memory controller to interface with a platform memory to couple to the processor; and the in-package memory located within a package of the processor, where the in-package memory is to be identified as a more distant memory with respect to the at least one core than the platform memory. Other embodiments are described and claimed.

Type: Grant

Filed: June 21, 2017

Date of Patent: July 9, 2019

Assignee: Intel Corporation

Inventors: Avinash Sodani, Robert J. Kyanko, Richard J. Greco, Andreas Kleen, Milind B. Girkar, Christopher M. Cantalupo
Thermal throttling of electronic devices

Patent number: 10275001

Abstract: Disclosed herein is a computing device configured to implement thermal throttling of a component of the computing device. The computing device includes an electronic component and a temperature sensor thermally coupled to the electronic component. The computing device also includes a thermal management controller to receive a temperature measurement from the temperature sensor and generate a throttling factor for the electronic component. If the temperature measurement is greater than a specified threshold, the throttling factor is to reduce performance of the electronic component to be at least the performance guarantee for the electronic component.

Type: Grant

Filed: June 26, 2015

Date of Patent: April 30, 2019

Assignee: Intel Corporation

Inventors: Timothy Y. Kam, Sandeep Ahuja, Rajat Agarwal, Avinash Sodani, Jinho Suh, Meenakshisundaram Chinthamani
Minimizing snoop traffic locally and across cores on a chip multi-core fabric

Patent number: 10102129

Abstract: A processor includes a processing core, a L1 cache comprising a first processing core and a first L1 cache comprising a first L1 cache data entry of a plurality of L1 cache data entries to store data. The processor also includes an L2 cache comprising a first L2 cache data entry of a plurality of L2 cache data entries. The first L2 cache data entry corresponds to the first L1 cache data entry and each of the plurality of L2 cache data entries are associated with a corresponding presence bit (pbit) of a plurality of pbits. Each of the plurality of pbits indicates a status of a corresponding one of the plurality of L2 cache data entries. The processor also includes a cache controller, which in response to a first request among a plurality of requests to access the data at the first L1 cache data entry, determines that a copy of the data is stored in the first L2 cache data entry; and retrieves the copy of the data from the L2 cache data entry in view of the status of the pbit.

Type: Grant

Filed: December 21, 2015

Date of Patent: October 16, 2018

Assignee: Intel Corporation

Inventors: Krishna N. Vinod, Avinash Sodani, Zainulabedin J. Aurangabadwala
Method and apparatus for user-level thread synchronization with a monitor and MWAIT architecture

Patent number: 9898351

Abstract: Instructions and logic provide user-level thread synchronization with MONITOR and MWAIT instructions. One or more model specific registers (MSRs) in a processor may be configured in a first execution state to specify support of a user-level thread synchronization architecture. Embodiments include multiple hardware threads or processing cores, corresponding monitored address state storage to store a last monitored address for each of a plurality of execution threads that issues a MONITOR request, cache memory to record MONITOR requests and associated states for addresses of memory storage locations, and responsive to receipt of an MWAIT request for the address, to record an associated wait-to-trigger state of monitored addresses for execution cores associated with an MWAIT request; wherein the execution core is to transition a requesting thread to an optimized sleep state responsive to the receipt of said MWAIT request when said one or more MSRs are configured in the first execution state.

Type: Grant

Filed: December 24, 2015

Date of Patent: February 20, 2018

Assignee: Intel Corporation

Inventors: Benjamin C. Chaffin, Robert J. Kyanko, Avinash Sodani
Scalable event handling in multi-threaded processor cores

Patent number: 9886396

Abstract: In one embodiment, a processor includes a frontend unit having an instruction decoder to receive and to decode instructions of a plurality of threads, an execution unit coupled to the instruction decoder to receive and execute the decoded instructions, and an instruction retirement unit having a retirement logic to receive the instructions from the execution unit and to retire the instructions associated with one or more of the threads that have an instruction or an event pending to be retired. The instruction retirement unit includes a thread arbitration logic to select one of the threads at a time and to dispatch the selected thread to the retirement logic for retirement processing.

Type: Grant

Filed: December 23, 2014

Date of Patent: February 6, 2018

Assignee: Intel Corporation

Inventors: Roger Gramunt, Rammohan Padmanabhan, Ramon Matas, Neal S. Moyer, Benjamin C. Chaffin, Avinash Sodani, Alexey P. Suprun, Vikram S. Sundaram, Chung-Lun Chan, Gerardo A. Fernandez, Julio Gago, Michael S. Yang, Aditya Kesiraju
Providing Multiple Memory Modes For A Processor Including Internal Memory

Publication number: 20170357580

Abstract: In one embodiment, a processor comprises: at least one core formed on a die to execute instructions; a first memory controller to interface with an in-package memory; a second memory controller to interface with a platform memory to couple to the processor; and the in-package memory located within a package of the processor, where the in-package memory is to be identified as a more distant memory with respect to the at least one core than the platform memory. Other embodiments are described and claimed.

Type: Application

Filed: June 21, 2017

Publication date: December 14, 2017

Inventors: Avinash Sodani, Robert J. Kyanko, Richard J. Greco, Andreas Kleen, Milind B. Girkar, Christopher M. Cantalupo
Mechanism to avoid hot-L1/cold-L2 events in an inclusive L2 cache using L1 presence bits for victim selection bias

Patent number: 9836399

Abstract: A processor includes a processing core, an L1 cache, operatively coupled to the processing core, the L1 cache comprising an L1 cache entry to store a data item, an L2 cache, inclusive with respect to the L1 cache, the L2 cache comprising an L2 cache entry corresponding to the L1 cache entry, an activity flag associated with the L2 cache entry, the activity flag indicating an activity status of the L1 cache entry, and a cache controller to, in response to detecting an access operation with respect to the L1 cache entry, set the flag to an active status.

Type: Grant

Filed: March 27, 2015

Date of Patent: December 5, 2017

Assignee: Intel Corporation

Inventors: Krishna N. Vinod, Avinash Sodani, Zainulabedin Aurangabadwala
Physical reference list for tracking physical register sharing

Patent number: 9733939

Abstract: A processor includes a processing unit including a storage module having stored thereon a physical reference list for storing identifications of physical registers that have been referenced by multiple logical registers, and a reclamation module for reclaiming physical registers to a free list based on a count of each of the physical registers on the physical reference list.

Type: Grant

Filed: September 28, 2012

Date of Patent: August 15, 2017

Assignee: Intel Corporation

Inventors: Vijaykumar Balaram Kadgi, James D. Hadley, Avinash Sodani, Matthew C. Merten, Morris Marden, Joseph A. McMahon, Grace C. Lee, Laura A. Knauth, Robert S. Chappell, Fariborz Tabesh
Providing multiple memory modes for a processor including internal memory

Patent number: 9720827

Abstract: In one embodiment, a processor comprises: at least one core formed on a die to execute instructions; a first memory controller to interface with an in-package memory; a second memory controller to interface with a platform memory to couple to the processor; and the in-package memory located within a package of the processor, where the in-package memory is to be identified as a more distant memory with respect to the at least one core than the platform memory. Other embodiments are described and claimed.

Type: Grant

Filed: November 14, 2014

Date of Patent: August 1, 2017

Assignee: Intel Corporation

Inventors: Avinash Sodani, Robert J. Kyanko, Richard J. Greco, Andreas Kleen, Milind B. Girkar, Christopher M. Cantalupo
Method and apparatus for user-level thread synchronization with a monitor and MWAIT architecture

Publication number: 20170185458

Abstract: Instructions and logic provide user-level thread synchronization with MONITOR and MWAIT instructions. One or more model specific registers (MSRs) in a processor may be configured in a first execution state to specify support of a user-level thread synchronization architecture. Embodiments include multiple hardware threads or processing cores, corresponding monitored address state storage to store a last monitored address for each of a plurality of execution threads that issues a MONITOR request, cache memory to record MONITOR requests and associated states for addresses of memory storage locations, and responsive to receipt of an MWAIT request for the address, to record an associated wait-to-trigger state of monitored addresses for execution cores associated with an MWAIT request; wherein the execution core is to transition a requesting thread to an optimized sleep state responsive to the receipt of said MWAIT request when said one or more MSRs are configured in the first execution state.

Type: Application

Filed: December 24, 2015

Publication date: June 29, 2017

Applicant: Intel Corporation

Inventors: Benjamin C. Chaffin, Robert J. Kyanko, Avinash Sodani

prev 1 2 3 4 5 6 7 8 next