Patents by Inventor Mridul Agarwal

Mridul Agarwal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Decoupling atomicity from operation size

Patent number: 11914511

Abstract: In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.

Type: Grant

Filed: June 22, 2020

Date of Patent: February 27, 2024

Assignee: Apple Inc.

Inventors: Francesco Spadini, Gideon Levinsky, Mridul Agarwal
Early load execution via constant address and stride prediction

Patent number: 11829763

Abstract: A system and method for efficiently reducing the latency of load operations. In various embodiments, logic of a processor accesses a prediction table after fetching instructions. For a prediction table hit, the logic executes a load instruction with a retrieved predicted address from the prediction table. For a prediction table miss, when the logic determines the address of the load instruction and hits in a learning table, the logic updates a level of confidence indication to indicate a higher level of confidence when a stored address matches the determined address. When the logic determines the level of confidence indication stored in a given table entry of the learning table meets a threshold, the logic allocates, in the prediction table, information stored in the given entry. Therefore, the predicted address is available during the next lookup of the prediction table.

Type: Grant

Filed: August 13, 2019

Date of Patent: November 28, 2023

Assignee: Apple Inc.

Inventors: Yuan C. Chou, Viney Gautam, Wei-Han Lien, Kulin N. Kothari, Mridul Agarwal
DSB Operation with Excluded Region

Publication number: 20230333851

Abstract: Techniques are disclosed relating to data synchronization barrier operations. A system includes a first processor that may receive a data barrier operation request from a second processor include in the system. Based on receiving that data barrier operation request from the second processor, the first processor may ensure that outstanding load/store operations executed by the first processor that are directed to addresses outside of an exclusion region have been completed. The first processor may respond to the second processor that the data barrier operation request is complete at the first processor, even in the case that one or more load/store operations that are directed to addresses within the exclusion region are outstanding and not complete when the first processor responds that the data barrier operation request is complete.

Type: Application

Filed: June 16, 2023

Publication date: October 19, 2023

Inventors: Jeff Gonion, John H. Kelm, James Vash, Pradeep Kanapathipillai, Mridul Agarwal, Gideon N. Levinsky, Richard F. Russo, Christopher M. Tsay
Scalable Interrupts

Publication number: 20230251985

Abstract: An interrupt delivery mechanism for a system includes and interrupt controller and a plurality of cluster interrupt controllers coupled to respective pluralities of processors in an embodiment. The interrupt controller may serially transmit an interrupt request to respective cluster interrupt controllers, which may acknowledge (Ack) or non-acknowledge (Nack) the interrupt based on attempting to deliver the interrupt to processors to which the cluster interrupt controller is coupled. In a soft iteration, the cluster interrupt controller may attempt to deliver the interrupt to processors that are powered on, without attempting to power on processors that are powered off. If the soft iteration does not result in an Ack response from one of the plurality of cluster interrupt controllers, a hard iteration may be performed in which the powered-off processors may be powered on.

Type: Application

Filed: April 17, 2023

Publication date: August 10, 2023

Inventors: Jeffrey E. Gonion, Charles E. Tucker, Tal Kuzi, Richard F. Russo, Mridul Agarwal, Christopher M. Tsay, Gideon N. Levinsky, Shih-Chieh Wen, Lior Zimet
DSB operation with excluded region

Patent number: 11720360

Abstract: Techniques are disclosed relating to data synchronization barrier operations. A system includes a first processor that may receive a data barrier operation request from a second processor include in the system. Based on receiving that data barrier operation request from the second processor, the first processor may ensure that outstanding load/store operations executed by the first processor that are directed to addresses outside of an exclusion region have been completed. The first processor may respond to the second processor that the data barrier operation request is complete at the first processor, even in the case that one or more load/store operations that are directed to addresses within the exclusion region are outstanding and not complete when the first processor responds that the data barrier operation request is complete.

Type: Grant

Filed: September 8, 2021

Date of Patent: August 8, 2023

Assignee: Apple Inc.

Inventors: Jeff Gonion, John H. Kelm, James Vash, Pradeep Kanapathipillai, Mridul Agarwal, Gideon N. Levinsky, Richard F. Russo, Christopher M. Tsay
Scalable interrupts

Patent number: 11630789

Abstract: An interrupt delivery mechanism for a system includes and interrupt controller and a plurality of cluster interrupt controllers coupled to respective pluralities of processors in an embodiment. The interrupt controller may serially transmit an interrupt request to respective cluster interrupt controllers, which may acknowledge (Ack) or non-acknowledge (Nack) the interrupt based on attempting to deliver the interrupt to processors to which the cluster interrupt controller is coupled. In a soft iteration, the cluster interrupt controller may attempt to deliver the interrupt to processors that are powered on, without attempting to power on processors that are powered off. If the soft iteration does not result in an Ack response from one of the plurality of cluster interrupt controllers, a hard iteration may be performed in which the powered-off processors may be powered on.

Type: Grant

Filed: April 30, 2021

Date of Patent: April 18, 2023

Assignee: Apple Inc.

Inventors: Jeffrey E. Gonion, Charles E. Tucker, Tal Kuzi, Richard F. Russo, Mridul Agarwal, Christopher M. Tsay, Gideon N. Levinsky, Shih-Chieh Wen, Lior Zimet
Hardware compression and decompression engine

Patent number: 11500638

Abstract: A method and system for compressing and decompressing data is disclosed. A compression command may initiate the prefetching of first data, which may be stored in a first buffer. Multiple words of the first data may be read from the first buffer and used to generate a plurality of compressed packets, each of which includes a command specifying a type of packet. The compressed packets may be combined into a group and multiple groups may be combined and stored in a second buffer. A decompression command may initiate the prefetching of second data, which is stored in the first buffer. A portion of the second data may be read from the first buffer and used to generate a group of compressed packets. Multiple output words may be generated dependent upon the group of compressed packets.

Type: Grant

Filed: January 10, 2020

Date of Patent: November 15, 2022

Assignee: Apple Inc.

Inventors: Aditya Kesiraju, James Vash, Pradeep Kanapathipillai, Mridul Agarwal, Zhaoming Hu, Tyler Huberty, Charles Tucker
Scalable Interrupts

Publication number: 20220083484

Abstract: An interrupt delivery mechanism for a system includes and interrupt controller and a plurality of cluster interrupt controllers coupled to respective pluralities of processors in an embodiment. The interrupt controller may serially transmit an interrupt request to respective cluster interrupt controllers, which may acknowledge (Ack) or non-acknowledge (Nack) the interrupt based on attempting to deliver the interrupt to processors to which the cluster interrupt controller is coupled. In a soft iteration, the cluster interrupt controller may attempt to deliver the interrupt to processors that are powered on, without attempting to power on processors that are powered off. If the soft iteration does not result in an Ack response from one of the plurality of cluster interrupt controllers, a hard iteration may be performed in which the powered-off processors may be powered on.

Type: Application

Filed: April 30, 2021

Publication date: March 17, 2022

Inventors: Jeffrey E. Gonion, Charles E. Tucker, Tal Kuzi, Richard F. Russo, Mridul Agarwal, Christopher M. Tsay, Gideon N. Levinsky, Shih-Chieh Wen
DSB Operation with Excluded Region

Publication number: 20220083338

Abstract: Techniques are disclosed relating to data synchronization barrier operations. A system includes a first processor that may receive a data barrier operation request from a second processor include in the system. Based on receiving that data barrier operation request from the second processor, the first processor may ensure that outstanding load/store operations executed by the first processor that are directed to addresses outside of an exclusion region have been completed. The first processor may respond to the second processor that the data barrier operation request is complete at the first processor, even in the case that one or more load/store operations that are directed to addresses within the exclusion region are outstanding and not complete when the first processor responds that the data barrier operation request is complete.

Type: Application

Filed: September 8, 2021

Publication date: March 17, 2022

Inventors: Jeff Gonion, John H. Kelm, James Vash, Pradeep Kanapathipillai, Mridul Agarwal, Gideon N. Levinsky, Richard F. Russo, Christopher M. Tsay
Decoupling Atomicity from Operation Size

Publication number: 20210397555

Abstract: In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.

Type: Application

Filed: June 22, 2020

Publication date: December 23, 2021

Inventors: Francesco Spadini, Gideon Levinsky, Mridul Agarwal
Buffer for replayed loads in parallel with reservation station for rapid rescheduling

Patent number: 11175917

Abstract: In an embodiment, a processor comprises a reservation station that issues a first load operation for execution, a store queue, and a replayed load buffer coupled in parallel with the reservation station. During execution of the first load operation, the store queue detects that the first load operation hits on a first store operation in the store queue that lacks store data and causes a replay of the first load operation. The replayed load buffer captures an identifier of the first load operation and the first store operation based on the replay of the first load operation, wherein the replayed load buffer monitors the reservation station for issuance of a first store data operation corresponding to the first store operation and issues the first load operation for reexecution based on the issuance of the first store data operation.

Type: Grant

Filed: September 11, 2020

Date of Patent: November 16, 2021

Assignee: Apple Inc.

Inventors: Mridul Agarwal, Kulin N. Kothari, Nikhil Gupta
Elevation based mode switch for 5G based aerial UE

Patent number: 11166212

Abstract: Methods, apparatuses, and computer-readable mediums for wireless communication are disclosed by the present disclosure. In an example, a user equipment (UE) may be located on an unmanned aerial vehicle (UAV). The UE may monitor at least one of an elevation of the UE or a number of cells detected by the UE. The UE may determine that the elevation of the UE exceeds an elevation threshold or the number of cells detected by the UE exceeds a cell threshold. The UE may determine a current communication mode of the UE. The UE may switch to a directional transmit mode, in response to determining that the current communication mode is an omnidirectional transmit mode and at least one of the elevation of the UE exceeds the elevation threshold or the number of cells detected by the UE exceeds the cell threshold.

Type: Grant

Filed: July 23, 2019

Date of Patent: November 2, 2021

Assignee: QUALCOMM Incorporated

Inventors: Akash Kumar, Mridul Agarwal, Hargovind Prasad Bansal
Managing serial miss requests for load operations in a non-coherent memory system

Patent number: 11099990

Abstract: A system and method for efficiently forwarding cache misses to another level of the cache hierarchy. Logic in a cache controller receives a first non-cacheable load miss request and stores it in a miss queue. When the logic determines the target address of the first load miss request is within a target address range of an older pending second load miss request stored in the miss queue with an open merge window, the logic merges the two requests into a single merged miss request. Additional requests may be similarly merged. The logic issues the merged miss requests based on determining the merge window has closed. The logic further prevents any other load miss requests, which were not previously merged in the merged miss request before it was issued, from obtaining a copy of data from the returned fill data. Such prevention in a non-coherent memory computing system supports memory ordering.

Type: Grant

Filed: August 20, 2019

Date of Patent: August 24, 2021

Assignee: Apple Inc.

Inventors: Gideon N. Levinsky, Brian R. Mestan, Deepak Limaye, Mridul Agarwal
Load/store ordering violation management

Patent number: 10983801

Abstract: A processor includes a load/store unit that includes one or more load pipelines and one or more store pipelines. Load operations may be issued into the load pipelines out of order with respect to older store operations. If a load operation is executed out or order with an older store operation that writes one or more bytes read by the load operation, and if the store operation is issued shortly after the load operation, such that the load operation is still in the load pipeline when the store operation is issued, some cases of flushing may be converted to replays by detecting the ordering violation while the load operation is still in the load pipeline.

Type: Grant

Filed: September 6, 2019

Date of Patent: April 20, 2021

Assignee: Apple Inc.

Inventors: Kulin N. Kothari, Mridul Agarwal
Load/Store Ordering Violation Management

Publication number: 20210072997

Abstract: A processor includes a load/store unit that includes one or more load pipelines and one or more store pipelines. Load operations may be issued into the load pipelines out of order with respect to older store operations. If a load operation is executed out or order with an older store operation that writes one or more bytes read by the load operation, and if the store operation is issued shortly after the load operation, such that the load operation is still in the load pipeline when the store operation is issued, some cases of flushing may be converted to replays by detecting the ordering violation while the load operation is still in the load pipeline.

Type: Application

Filed: September 6, 2019

Publication date: March 11, 2021

Inventors: Kulin N. Kothari, Mridul Agarwal
MANAGING SERIAL MISS REQUESTS FOR LOAD OPERATIONS IN A NON-COHERENT MEMORY SYSTEM

Publication number: 20210056024

Abstract: A system and method for efficiently forwarding cache misses to another level of the cache hierarchy. Logic in a cache controller receives a first non-cacheable load miss request and stores it in a miss queue. When the logic determines the target address of the first load miss request is within a target address range of an older pending second load miss request stored in the miss queue with an open merge window, the logic merges the two requests into a single merged miss request. Additional requests may be similarly merged. The logic issues the merged miss requests based on determining the merge window has closed. The logic further prevents any other load miss requests, which were not previously merged in the merged miss request before it was issued, from obtaining a copy of data from the returned fill data. Such prevention in a non-coherent memory computing system supports memory ordering.

Type: Application

Filed: August 20, 2019

Publication date: February 25, 2021

Inventors: Gideon N. Levinsky, Brian R. Mestan, Deepak Limaye, Mridul Agarwal
EARLY LOAD EXECUTION VIA CONSTANT ADDRESS AND STRIDE PREDICTION

Publication number: 20210049015

Abstract: A system and method for efficiently reducing the latency of load operations. In various embodiments, logic of a processor accesses a prediction table after fetching instructions. For a prediction table hit, the logic executes a load instruction with a retrieved predicted address from the prediction table. For a prediction table miss, when the logic determines the address of the load instruction and hits in a learning table, the logic updates a level of confidence indication to indicate a higher level of confidence when a stored address matches the determined address. When the logic determines the level of confidence indication stored in a given table entry of the learning table meets a threshold, the logic allocates, in the prediction table, information stored in the given entry. Therefore, the predicted address is available during the next lookup of the prediction table.

Type: Application

Filed: August 13, 2019

Publication date: February 18, 2021

Inventors: Yuan C. Chou, Viney Gautam, Wei-Han Lien, Kulin N. Kothari, Mridul Agarwal
Branch resolve pointer optimization

Patent number: 10628164

Abstract: A system and method for efficiently handling speculative execution. A load store unit (LSU) of a processor stores a commit candidate pointer, which points to a given store instruction buffered in the store queue. The given store instruction is an oldest store instruction not currently permitted to commit to the data cache. The LSU receives a first pointer from the mapping unit, which points to an oldest instruction of non-dispatched branches and unresolved system instructions. The LSU receives a second pointer from the execution unit, which points to an oldest unresolved, issued branch instruction. When the LSU determines the commit candidate pointer is older than each of the first pointer and the second pointer, the commit candidate pointer is updated to point to an oldest store instruction younger than the given store instruction stored in the store queue. The given store instruction is permitted to commit to the data cache.

Type: Grant

Filed: July 30, 2018

Date of Patent: April 21, 2020

Assignee: Apple Inc.

Inventors: Kulin N. Kothari, Mridul Agarwal, Aditya Kesiraju, Deepankar Duggal, Sean M. Reynolds
ELEVATION BASED MODE SWITCH FOR 5G BASED AERIAL UE

Publication number: 20200037219

Abstract: Methods, apparatuses, and computer-readable mediums for wireless communication are disclosed by the present disclosure. In an example, a user equipment (UE) may be located on an unmanned aerial vehicle (UAV). The UE may monitor at least one of an elevation of the UE or a number of cells detected by the UE. The UE may determine that the elevation of the UE exceeds an elevation threshold or the number of cells detected by the UE exceeds a cell threshold. The UE may determine a current communication mode of the UE. The UE may switch to a directional transmit mode, in response to determining that the current communication mode is an omnidirectional transmit mode and at least one of the elevation of the UE exceeds the elevation threshold or the number of cells detected by the UE exceeds the cell threshold.

Type: Application

Filed: July 23, 2019

Publication date: January 30, 2020

Inventors: Akash KUMAR, Mridul AGARWAL, Hargovind Prasad BANSAL
Load/store dependency predictor optimization for replayed loads

Patent number: 10437595

Abstract: Systems, apparatuses, and methods for optimizing a load-store dependency predictor (LSDP). When a younger load instruction is issued before an older store instruction and the younger load is dependent on the older store, the LSDP is trained on this ordering violation. A replay/flush indicator is stored in a corresponding entry in the LSDP to indicate whether the ordering violation resulted in a flush or replay. On subsequent executions, a dependency may be enforced for the load-store pair if a confidence counter is above a threshold, with the threshold varying based on the status of the replay/flush indicator. If a given load matches on multiple entries in the LSDP, and if at least one of the entries has a flush indicator, then the given load may be marked as a multimatch case and forced to wait to issue until all older stores have issued.

Type: Grant

Filed: March 15, 2016

Date of Patent: October 8, 2019

Assignee: Apple Inc.

Inventors: Pradeep Kanapathipillai, Stephan G. Meier, Gerard R. Williams, III, Mridul Agarwal, Kulin N. Kothari

1 2 next