Patents by Inventor Ankit More

Ankit More has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Compression circuits and methods using tree based encoding of bit masks

Patent number: 11942970

Abstract: Embodiments of the present disclosure include techniques for compressing data using a tree encoded bit mask that may result in higher compression ratios. In one embodiment, an input vector having a plurality of values is received by a first plurality of switch circuits. Selection of the input values is controlled by sets of bits from the bit mask. The sets of bits specify locations of portions of the input vector where particular values of interest reside. The switch circuits output multiple values of the input vector, which include the particular values of interest. A second stage of switch circuits is controlled by a logic circuit that detects values on the outputs of the first stage of switch circuits and outputs the values of interest. In some embodiments, the values of interest may be non-zero values of a sparse input vector, and the switch circuits may be multiplexers.

Type: Grant

Filed: March 4, 2022

Date of Patent: March 26, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Nishit Shah, Ankit More, Mattheus C. Heddes
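The abstract above does not spell out how the bit mask is tree encoded. As a rough software sketch only, assuming a hypothetical two-level scheme (one top-level bit per group of inputs, plus a leaf mask stored only for groups that contain a non-zero value), the idea might look like the following. The function names, the group size, and the two-level layout are illustrative assumptions, not the patented circuit.

```python
def tree_encode_mask(values, group=4):
    """Hypothetical two-level mask: one top bit per group, and a leaf
    mask only for groups holding at least one non-zero value.
    Assumes len(values) is a multiple of `group`."""
    top, leaves, packed = [], [], []
    for start in range(0, len(values), group):
        chunk = values[start:start + group]
        leaf = [1 if v != 0 else 0 for v in chunk]
        if any(leaf):
            top.append(1)          # group contains values of interest
            leaves.append(leaf)    # keep its per-element mask
            packed.extend(v for v in chunk if v != 0)
        else:
            top.append(0)          # all-zero group: no leaf mask stored
    return top, leaves, packed


def tree_decode_mask(top, leaves, packed, group=4):
    """Rebuild the original sparse vector from the two-level mask."""
    out = []
    leaf_iter, value_iter = iter(leaves), iter(packed)
    for bit in top:
        if bit == 0:
            out.extend([0] * group)
        else:
            leaf = next(leaf_iter)
            out.extend(next(value_iter) if b else 0 for b in leaf)
    return out


values = [0, 0, 0, 0, 5, 0, 0, 7, 0, 0, 0, 0, 0, 3, 0, 0]
top, leaves, packed = tree_encode_mask(values)
restored = tree_decode_mask(top, leaves, packed)
```

In this example the two-level mask takes 4 top bits plus 8 leaf bits, against 16 bits for a flat mask, which is the kind of saving a hierarchical mask can give when whole groups are empty.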
Method and apparatus for compression multiplexing for sparse computations

Patent number: 11848689

Abstract: Embodiments of the present disclosure include a digital circuit and method for compressing input digital values. A plurality of input digital values may include zero values and non-zero values. The input digital values are received on M inputs of a first switching stage. The first switching stage is arranged in groups that rearrange the non-zero values on first switching stage outputs according to a compression and shift. The compression and shift position the non-zero values on outputs coupled to inputs of a second switching stage. The second switching stage consecutively couples non-zero values to N outputs, where N is less than M.

Type: Grant

Filed: March 4, 2022

Date of Patent: December 19, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Ankit More, Mattheus C. Heddes, Nishit Shah
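As a behavioural illustration of the two-stage compaction described in the abstract above, the sketch below packs non-zero values within groups first and then places the group results consecutively on N outputs. It models data movement only, not switch hardware; the function name, the group size, and the zero padding of unused outputs are assumptions.

```python
def compress_two_stage(inputs, n_outputs, group=4):
    """Software model of a two-stage compaction of M inputs onto N outputs.
    Stage 1 packs the non-zero values of each group together;
    stage 2 places the packed groups one after another."""
    # Stage 1: per-group packing of non-zero values.
    staged = []
    for start in range(0, len(inputs), group):
        chunk = inputs[start:start + group]
        staged.append([v for v in chunk if v != 0])

    # Stage 2: couple the packed values consecutively to the outputs.
    outputs = [v for packed_group in staged for v in packed_group]
    if len(outputs) > n_outputs:
        raise ValueError("more non-zero values than outputs")
    # Assumption: unused outputs are driven with zero.
    return outputs + [0] * (n_outputs - len(outputs))


result = compress_two_stage([0, 3, 0, 0, 7, 0, 0, 2], n_outputs=4)
```

Here M is 8 and N is 4, so the sketch only succeeds when the input is at least 50% sparse; a real design would fix that sparsity bound up front.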
METHOD AND APPARATUS FOR COMPRESSING AND DECOMPRESSING SPARSE DATA SETS

Publication number: 20230333739

Abstract: Embodiments of the present disclosure include a digital circuit and method for multi-stage compression. Digital data values are compressed using a multi-stage compression algorithm and stored in a memory. A decompression circuit receives the values and performs a partial decompression. The partially compressed values are provided to a processor, which performs the final decompression. In one embodiment, a vector of N length compressed values are decompressed using a first bit mask into two N length sets having non-zero values. The two N length sets are further decompressed using two M length bit masks into M length sparse vectors, each having non-zero values.

Type: Application

Filed: June 23, 2023

Publication date: October 19, 2023

Inventors: Mattheus C. HEDDES, Ankit MORE, Nishit SHAH, Torsten HOEFLER
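The abstract above leaves the exact mask layout open. One possible reading, sketched below under stated assumptions, is that a first mask of length 2N expands N packed values into two N-length sets, and each set is then expanded by an M-length mask into an M-length sparse vector. The `expand` helper, the mask widths, and the split into halves are all hypothetical.

```python
def expand(values, mask):
    """Place values at the positions where mask is 1; zeros elsewhere."""
    if sum(mask) != len(values):
        raise ValueError("mask must have exactly one set bit per value")
    it = iter(values)
    return [next(it) if bit else 0 for bit in mask]


def two_stage_decompress(packed, first_mask, mask_a, mask_b):
    """Stage 1 (partial decompression): N packed values -> two N-length sets.
    Stage 2 (final decompression): each set -> one M-length sparse vector."""
    stage1 = expand(packed, first_mask)
    half = len(stage1) // 2
    set_a, set_b = stage1[:half], stage1[half:]
    return expand(set_a, mask_a), expand(set_b, mask_b)


# N = 2 packed values, M = 4 elements per output vector.
vec_a, vec_b = two_stage_decompress(
    packed=[5, 7],
    first_mask=[1, 0, 0, 1],
    mask_a=[0, 1, 0, 1],
    mask_b=[1, 0, 1, 0],
)
```

In the arrangement the abstract describes, the first stage would run in a decompression circuit and the second on the processor; the sketch runs both in one function purely for readability.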
SPARSIFYING VECTORS FOR NEURAL NETWORK MODELS BASED ON OVERLAPPING WINDOWS

Publication number: 20230334284

Abstract: Embodiments of the present disclosure include systems and methods for sparsifying vectors for neural network models based on overlapping windows. A window is used to select a first set of elements in a vector of elements. A first element is selected from the first set of elements having the highest absolute value. The window is slid along the vector by a defined number of elements. The window is used to select a second set of elements in the vector, wherein the first set of elements and the second set of elements share at least one common element. A second element is selected from the second set of elements having the highest absolute value.

Type: Application

Filed: May 27, 2022

Publication date: October 19, 2023

Inventors: Girish Vishnu VARATKAR, Ankit MORE, Bita DARVISH ROUHANI, Mattheus C. HEDDES, Gaurav AGRAWAL
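The window procedure in the abstract above can be sketched in a few lines. The window width, the stride, and the decision to zero every element that no window selects are assumptions made for illustration; the abstract only states that the highest-absolute-value element is selected in each window and that consecutive windows overlap.

```python
def sparsify_overlapping(vector, window=4, stride=2):
    """Slide an overlapping window along the vector and keep, per window,
    the element with the largest absolute value.
    Assumption: elements never selected by any window are set to zero."""
    keep = set()
    for start in range(0, len(vector) - window + 1, stride):
        indices = range(start, start + window)
        best = max(indices, key=lambda i: abs(vector[i]))
        keep.add(best)
    return [v if i in keep else 0 for i, v in enumerate(vector)]


sparse = sparsify_overlapping([1, -9, 2, 3, 0, 4, -5, 1])
```

With a stride smaller than the window, adjacent windows share elements, so two windows may pick the same element; this sketch simply keeps it once.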
METHOD AND APPARATUS FOR COMPRESSION MULTIPLEXING FOR SPARSE COMPUTATIONS

Publication number: 20230318620

Abstract: Embodiments of the present disclosure include a digital circuit and method for compressing input digital values. A plurality of input digital values may include zero values and non-zero values. The input digital values are received on M inputs of a first switching stage. The first switching stage is arranged in groups that rearrange the non-zero values on first switching stage outputs according to a compression and shift. The compression and shift position the non-zero values on outputs coupled to inputs of a second switching stage. The second switching stage consecutively couples non-zero values to N outputs, where N is less than M.

Type: Application

Filed: March 4, 2022

Publication date: October 5, 2023

Inventors: Ankit MORE, Mattheus C. HEDDES, Nishit SHAH
COMPRESSION CIRCUITS AND METHODS USING TREE BASED ENCODING OF BIT MASKS

Publication number: 20230283296

Abstract: Embodiments of the present disclosure include techniques for compressing data using a tree encoded bit mask that may result in higher compression ratios. In one embodiment, an input vector having a plurality of values is received by a first plurality of switch circuits. Selection of the input values is controlled by sets of bits from the bit mask. The sets of bits specify locations of portions of the input vector where particular value of interest reside. The switch circuits output multiple values of the input vector, which include the particular value of interest. A second stage of switch circuits is controlled by logic circuit that detects values on the outputs of the first stage of switch circuits and outputs the values of interest. In some embodiments, the values of interest may be non-zero values of a sparse input vector, and the switch circuits may be multiplexers.

Type: Application

Filed: March 4, 2022

Publication date: September 7, 2023

Inventors: Nishit SHAH, Ankit MORE, Mattheus C. HEDDES
Method and apparatus for compressing and decompressing sparse data sets

Patent number: 11720252

Abstract: Embodiments of the present disclosure include a digital circuit and method for multi-stage compression. Digital data values are compressed using a multi-stage compression algorithm and stored in a memory. A decompression circuit receives the values and performs a partial decompression. The partially compressed values are provided to a processor, which performs the final decompression. In one embodiment, a vector of N length compressed values are decompressed using a first bit mask into two N length sets having non-zero values. The two N length sets are further decompressed using two M length bit masks into M length sparse vectors, each having non-zero values.

Type: Grant

Filed: March 4, 2022

Date of Patent: August 8, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Mattheus C. Heddes, Ankit More, Nishit Shah, Torsten Hoefler
Memory system architecture for multi-threaded processors

Patent number: 11630691

Abstract: Disclosed embodiments relate to an improved memory system architecture for multi-threaded processors. In one example, a system comprises a multi-threaded processor core (MTPC), the MTPC comprising: P pipelines, each to concurrently process T threads; a crossbar to communicatively couple the P pipelines; a memory for use by the P pipelines; a scheduler to optimize reduction operations by assigning multiple threads to generate results of commutative arithmetic operations and then accumulate the generated results; and a memory controller (MC) to connect with external storage and other MTPCs, the MC further comprising at least one optimization selected from: an instruction set architecture including a dual-memory operation; a direct memory access (DMA) engine; a buffer to store multiple pending instruction cache requests; multiple channels across which to stripe memory requests; and a shadow-tag coherency management unit.

Type: Grant

Filed: August 24, 2021

Date of Patent: April 18, 2023

Assignee: Intel Corporation

Inventors: Robert Pawlowski, Ankit More, Jason M. Howard, Joshua B. Fryman, Tina C. Zhong, Shaden Smith, Sowmya Pitchaimoorthy, Samkit Jain, Vincent Cave, Sriram Aananthakrishnan, Bharadwaj Krishnamurthy
SPARSIFYING NARROW DATA FORMATS FOR NEURAL NETWORKS

Publication number: 20220405571

Abstract: Embodiments of the present disclosure include systems and methods for sparsifying narrow data formats for neural networks. A plurality of activation values in a neural network are provided to a muxing unit. A set of sparsification operations are performed on a plurality of weight values to generate a subset of the plurality of weight values and mask values associated with the plurality of weight values. The subset of the plurality of weight values are provided to a matrix multiplication unit. The muxing unit generates a subset of the plurality of activation values based on the mask values and provides the subset of the plurality of activation values to the matrix multiplication unit. The matrix multiplication unit performs a set of matrix multiplication operations on the subset of the plurality of weight values and the subset of the plurality of activation values to generate a set of outputs.

Type: Application

Filed: June 16, 2021

Publication date: December 22, 2022

Inventors: Bita DARVISH ROUHANI, Venmugil Elango, Eric S. Chung, Douglas C Burger, Mattheus C. Heddes, Nishit Shah, Rasoul Shafipour, Ankit More
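The data flow in the abstract above (sparsify the weights, derive mask values, use the mask to select matching activations, then multiply) can be sketched as follows. The abstract does not say which sparsification operations are used; keeping the two largest-magnitude weights in each group of four is an assumption, as are all function names.

```python
def sparsify_weights(weights, group=4, keep=2):
    """Assumed policy: keep the `keep` largest-magnitude weights per group.
    Returns the kept weights and a 0/1 mask over the original positions."""
    kept, mask = [], []
    for start in range(0, len(weights), group):
        chunk = weights[start:start + group]
        order = sorted(range(len(chunk)), key=lambda i: abs(chunk[i]), reverse=True)
        chosen = set(order[:keep])
        for i, w in enumerate(chunk):
            if i in chosen:
                kept.append(w)
                mask.append(1)
            else:
                mask.append(0)
    return kept, mask


def mux_activations(activations, mask):
    """Model of the muxing unit: pass only activations whose mask bit is set."""
    return [a for a, m in zip(activations, mask) if m]


def sparse_dot(weights, activations):
    """Multiply the kept weights with the matching activation subset."""
    kept, mask = sparsify_weights(weights)
    picked = mux_activations(activations, mask)
    return sum(w * a for w, a in zip(kept, picked))


kept, mask = sparsify_weights([1, -8, 2, 9])
value = sparse_dot([1, -8, 2, 9], [1, 2, 3, 4])
```

Because the mask is computed from the weights alone, it can be prepared ahead of time, and only the activation selection has to happen at run time.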
Multithreaded processor core with hardware-assisted task scheduling

Patent number: 11360809

Abstract: Embodiments of apparatuses, methods, and systems for scheduling tasks to hardware threads are described. In an embodiment, a processor includes multiple hardware threads and a task manager. The task manager is to issue a task to a hardware thread. The task manager includes a hardware task queue to store a descriptor for the task. The descriptor is to include a field to store a value to indicate whether the task is a single task, a collection of iterative tasks, or a linked list of tasks.

Type: Grant

Filed: June 29, 2018

Date of Patent: June 14, 2022

Assignee: Intel Corporation

Inventors: William Paul Griffin, Joshua Fryman, Jason Howard, Sang Phill Park, Robert Pawlowski, Michael Abbott, Scott Cline, Samkit Jain, Ankit More, Vincent Cave, Fabrizio Petrini, Ivan Ganev
LARGE-SCALE MATRIX RESTRUCTURING AND MATRIX-SCALAR OPERATIONS

Publication number: 20220100508

Abstract: Embodiments of apparatuses and methods for copying and operating on matrix elements are described. In embodiments, an apparatus includes a hardware instruction decoder to decode a single instruction and execution circuitry, coupled to the hardware instruction decoder, to perform one or more operations corresponding to the single instruction. The single instruction has a first operand to reference a base address of a first representation of a source matrix and a second operand to reference a base address of a second representation of a destination matrix. The one or more operations include copying elements of the source matrix to corresponding locations in the destination matrix and filling empty elements of the destination matrix with a single value.

Type: Application

Filed: December 25, 2020

Publication date: March 31, 2022

Applicant: Intel Corporation

Inventors: Robert Pawlowski, Ankit More, Vincent Cave, Sriram Aananthakrishnan, Jason M. Howard, Joshua B. Fryman
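In software terms, the copy-and-fill operation in the abstract above resembles converting a matrix from one representation to another. The sketch below assumes, purely for illustration, that the source is held as a coordinate dictionary and the destination as a dense row-major list of lists; the abstract does not name the representations, and `copy_and_fill` is a hypothetical helper, not the instruction itself.

```python
def copy_and_fill(source_coo, rows, cols, fill=0):
    """Copy elements of a sparse source (assumed {(row, col): value} form)
    into a dense destination, filling every empty element with `fill`."""
    dest = [[fill] * cols for _ in range(rows)]
    for (r, c), value in source_coo.items():
        dest[r][c] = value
    return dest


dense = copy_and_fill({(0, 1): 5, (2, 0): 7}, rows=3, cols=2, fill=-1)
```

The abstract describes a single decoded instruction doing this work; the loop here only shows the resulting data layout.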
IN-NETWORK MULTICAST OPERATIONS

Publication number: 20210409265

Abstract: Examples described herein relate to a first group of core nodes to couple with a group of switch nodes and a second group of core nodes to couple with the group of switch nodes, wherein: a core node of the first or second group of core nodes includes circuitry to execute one or more message passing instructions that indicate a configuration of a network to transmit data toward two or more endpoint core nodes and a switch node of the group of switch nodes includes circuitry to execute one or more message passing instructions that indicate the configuration to transmit data toward the two or more endpoint core nodes.

Type: Application

Filed: September 13, 2021

Publication date: December 30, 2021

Inventors: Robert PAWLOWSKI, Vincent CAVE, Shruti SHARMA, Fabrizio PETRINI, Joshua B. FRYMAN, Ankit MORE
MEMORY SYSTEM ARCHITECTURE FOR MULTI-THREADED PROCESSORS

Publication number: 20210389984

Abstract: Disclosed embodiments relate to an improved memory system architecture for multi-threaded processors. In one example, a system comprises a multi-threaded processor core (MTPC), the MTPC comprising: P pipelines, each to concurrently process T threads; a crossbar to communicatively couple the P pipelines; a memory for use by the P pipelines; a scheduler to optimize reduction operations by assigning multiple threads to generate results of commutative arithmetic operations and then accumulate the generated results; and a memory controller (MC) to connect with external storage and other MTPCs, the MC further comprising at least one optimization selected from: an instruction set architecture including a dual-memory operation; a direct memory access (DMA) engine; a buffer to store multiple pending instruction cache requests; multiple channels across which to stripe memory requests; and a shadow-tag coherency management unit.

Type: Application

Filed: August 24, 2021

Publication date: December 16, 2021

Inventors: Robert PAWLOWSKI, Ankit MORE, Jason M. HOWARD, Joshua B. FRYMAN, Tina C. ZHONG, Shaden SMITH, Sowmya PITCHAIMOORTHY, Samkit JAIN, Vincent CAVE, Sriram AANANTHAKRISHNAN, Bharadwaj KRISHNAMURTHY
Memory system architecture for multi-threaded processors

Patent number: 11106494

Abstract: Disclosed embodiments relate to an improved memory system architecture for multi-threaded processors. In one example, a system comprises a multi-threaded processor core (MTPC), the MTPC comprising: P pipelines, each to concurrently process T threads; a crossbar to communicatively couple the P pipelines; a memory for use by the P pipelines; a scheduler to optimize reduction operations by assigning multiple threads to generate results of commutative arithmetic operations and then accumulate the generated results; and a memory controller (MC) to connect with external storage and other MTPCs, the MC further comprising at least one optimization selected from: an instruction set architecture including a dual-memory operation; a direct memory access (DMA) engine; a buffer to store multiple pending instruction cache requests; multiple channels across which to stripe memory requests; and a shadow-tag coherency management unit.

Type: Grant

Filed: September 28, 2018

Date of Patent: August 31, 2021

Assignee: Intel Corporation

Inventors: Robert Pawlowski, Ankit More, Jason M. Howard, Joshua B. Fryman, Tina C. Zhong, Shaden Smith, Sowmya Pitchaimoorthy, Samkit Jain, Vincent Cave, Sriram Aananthakrishnan, Bharadwaj Krishnamurthy
System, apparatus and method for barrier synchronization in a multi-threaded processor

Patent number: 11061742

Abstract: In one embodiment, a first processor core includes: a plurality of execution pipelines each to execute instructions of one or more threads; a plurality of pipeline barrier circuits coupled to the plurality of execution pipelines, each of the plurality of pipeline barrier circuits associated with one of the plurality of execution pipelines to maintain status information for a plurality of barrier groups, each of the plurality of barrier groups formed of at least two threads; and a core barrier circuit to control operation of the plurality of pipeline barrier circuits and to inform the plurality of pipeline barrier circuits when a first barrier has been reached by a first barrier group of the plurality of barrier groups. Other embodiments are described and claimed.

Type: Grant

Filed: June 27, 2018

Date of Patent: July 13, 2021

Assignee: Intel Corporation

Inventors: Robert Pawlowski, Ankit More, Shaden Smith, Sowmya Pitchaimoorthy, Samkit Jain, Vincent Cavé, Sriram Aananthakrishnan, Jason M. Howard, Joshua B. Fryman
TECHNIQUES FOR ACCELERATION OF A PREFIX-SCAN OPERATION

Publication number: 20210149683

Abstract: Examples include techniques for an in-network acceleration of a parallel prefix-scan operation. Examples include configuring registers of a node included in a plurality of nodes on a same semiconductor package. The registers to be configured responsive to receiving an instruction that indicates a logical tree to map to a network topology that includes the node. The instruction associated with a prefix-scan operation to be executed by at least a portion of the plurality of nodes.

Type: Application

Filed: December 21, 2020

Publication date: May 20, 2021

Inventors: Ankit MORE, Fabrizio PETRINI, Robert PAWLOWSKI, Shruti SHARMA, Sowmya PITCHAIMOORTHY
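For readers unfamiliar with the operation named in the abstract above, a prefix scan turns a sequence into its running totals. The sketch below uses the well-known Hillis-Steele formulation, where each round combines elements a doubling distance apart and could run in parallel across nodes. It is a generic textbook scan chosen for illustration, not the in-network, register-configured method the application describes.

```python
def inclusive_scan(values):
    """Hillis-Steele inclusive prefix sum.
    Each round adds the element `step` positions to the left; all updates
    within a round are independent, so a round could run in parallel."""
    out = list(values)
    step = 1
    while step < len(out):
        out = [
            out[i] + out[i - step] if i >= step else out[i]
            for i in range(len(out))
        ]
        step *= 2
    return out


totals = inclusive_scan([3, 1, 4, 1, 5])
```

The number of rounds grows with the logarithm of the input length, which is why scans map naturally onto tree-shaped communication patterns.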
Systolic array accelerator systems and methods

Patent number: 11003619

Abstract: The present disclosure is directed to systems and methods for decomposing systolic array circuitry to provide a plurality of N×N systolic sub-array circuits, apportioning a first tensor or array into a plurality of N×M first input arrays, and apportioning a second tensor or array into a plurality of M×N second input arrays. Systolic array control circuitry transfers corresponding ones of the first input arrays and second input arrays to a respective one of the plurality of N×N systolic sub-array circuits. As the elements included in the first input array and the elements included in the second input array are transferred to the systolic sub-array, the systolic sub-array performs one or more mathematical operations using the first and the second input arrays. The systems and methods beneficially improve the usage of the systolic array circuitry thereby advantageously reducing the number of clock cycles needed to perform a given number of calculations.

Type: Grant

Filed: February 24, 2019

Date of Patent: May 11, 2021

Assignee: Intel Corporation

Inventors: Srinivasan Narayanamoorthy, Jayaram Bobba, Ankit More
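The apportioning described in the abstract above corresponds, in software terms, to a tiled matrix multiplication: the first operand is cut into N×M tiles, the second into M×N tiles, and each pair of tiles produces an N×N partial result. The sketch below is a plain-Python model of that tiling under assumed tile sizes; it says nothing about how the control circuitry schedules the sub-arrays.

```python
def tiled_matmul(a, b, n=2, m=2):
    """Multiply a (rows x inner) by b (inner x cols) tile by tile.
    Each (r0, c0, k0) iteration models one N x N sub-array consuming
    an N x M tile of `a` and an M x N tile of `b`."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for r0 in range(0, rows, n):
        for c0 in range(0, cols, n):
            for k0 in range(0, inner, m):
                # Accumulate this tile pair's partial products.
                for r in range(r0, min(r0 + n, rows)):
                    for c in range(c0, min(c0 + n, cols)):
                        for k in range(k0, min(k0 + m, inner)):
                            out[r][c] += a[r][k] * b[k][c]
    return out


product = tiled_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])
```

The `min` bounds let the sketch handle matrices whose dimensions are not multiples of the tile sizes; hardware would more likely pad the edge tiles.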
Array broadcast and reduction systems and methods

Patent number: 10983793

Abstract: The present disclosure is directed to systems and methods of performing one or more broadcast or reduction operations using direct memory access (DMA) control circuitry. The DMA control circuitry executes a modified instruction set architecture (ISA) that facilitates the broadcast distribution of data to a plurality of destination addresses in system memory circuitry. The broadcast instruction may include broadcast of a single data value to each destination address. The broadcast instruction may include broadcast of a data array to each destination address. The DMA control circuitry may also execute a reduction instruction that facilitates the retrieval of data from a plurality of source addresses in system memory and performing one or more operations using the retrieved data. Since the DMA control circuitry, rather than the processor circuitry, performs the broadcast and reduction operations, system speed and efficiency are beneficially enhanced.

Type: Grant

Filed: March 29, 2019

Date of Patent: April 20, 2021

Assignee: Intel Corporation

Inventors: Joshua Fryman, Ankit More, Jason Howard, Robert Pawlowski, Yigit Demir, Nick Pepperling, Fabrizio Petrini, Sriram Aananthakrishnan, Shaden Smith
Systems and methods for ISA support for indirect loads and stores for efficiently accessing compressed lists in graph applications

Patent number: 10929132

Abstract: Disclosed embodiments relate to systems and methods for performing instructions to access a compressed graphic list. In one example, a processor includes fetch and decode circuitry to fetch and decode the single instruction to access the compressed graphic list, and execution circuitry to execute the decoded single instruction to cause access to the compressed graphic list by: receiving, from a load store queue, at a first op-engine associated with a first data location, an indirection request, computing, via the first op-engine, a second data location associated with a second op-engine, computing, via the second op-engine, a third data location associated with a third op-engine responsive to the indirection request, and providing, via the third op-engine, a data response to the load store queue responsive to receiving data from the third data location.

Type: Grant

Filed: September 23, 2019

Date of Patent: February 23, 2021

Assignee: Intel Corporation

Inventors: Robert Pawlowski, Scott Hagan Schmittel, Joshua Fryman, Wim Heirman, Jason Howard, Ankit More, Shaden Smith, Scott Cline
HARDWARE SUPPORT FOR DUAL-MEMORY ATOMIC OPERATIONS

Publication number: 20200401412

Abstract: Disclosed embodiments relate to hardware support for dual-memory atomic operations. In one example, a processor includes multiple cores, each including multiple multi-threaded pipelines (MTPs), each associated with a memory, an atomic unit (ATMU) to perform atomic operations and a write-combine buffer (WCB) to manage access to and locks of cache lines in the associated memory, each MTP including fetch and decode stages to fetch and decode an instruction having fields to specify first and second memory locations and an opcode calling for a first MTP to send a request to a second MTP of the multiple MTPs, the second MTP being associated with a memory to which the first memory location is mapped, and to perform an atomic dual-memory operation on the first and second memory locations using its associated ATMU and WCB to perform the request.

Type: Application

Filed: June 24, 2019

Publication date: December 24, 2020

Applicant: Intel Corporation

Inventors: Robert PAWLOWSKI, Joshua B. FRYMAN, Vincent CAVE, Eric M. SCHWARTZ, Ivan B. GANEV, Jason M. HOWARD, Ankit MORE, Shaden SMITH
