Patents by Inventor Mridul Agarwal
Mridul Agarwal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250258670
Abstract: A reservation station that includes a storage circuit with multiple full and partial entries is disclosed. A given partial entry of the multiple partial entries may store less data than a given full entry of the multiple full entries. A control circuit may receive a load/store operation and, in response to a determination that the load/store operation includes only a single source, store the load/store operation in a particular partial entry of the multiple partial entries.
Type: Application
Filed: June 13, 2024
Publication date: August 14, 2025
Inventors: Amit KUMAR, Deepankar DUGGAL, Haoyan JIA, Manjunath SHEVGOOR, Mridul AGARWAL, Nikhil GUPTA
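The entry-allocation rule in this abstract can be pictured with a toy behavioral model. This is a sketch under stated assumptions, not the patented circuit: the `Op` and `ReservationStation` names are hypothetical, and the fallback of a single-source operation to a full entry when the partial entries are exhausted is an assumption the abstract does not state.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    sources: list  # hypothetical: names of the operation's source operands


@dataclass
class ReservationStation:
    num_full: int
    num_partial: int
    full: list = field(default_factory=list)
    partial: list = field(default_factory=list)

    def insert(self, op: Op) -> str:
        """Place an op; single-source load/store ops go to a smaller partial entry."""
        if len(op.sources) == 1 and len(self.partial) < self.num_partial:
            self.partial.append(op)
            return "partial"
        # Assumption: anything else (or overflow) uses a full entry if one is free.
        if len(self.full) < self.num_full:
            self.full.append(op)
            return "full"
        return "stall"  # no free entry of a usable size
```

The payoff of the split is that partial entries hold less state, so more operations fit in the same storage when many of them have only one source.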
-
Publication number: 20250258622
Abstract: A reservation station that includes primary and secondary storage circuits is disclosed. The primary storage circuit may include multiple full entries, while the secondary storage circuit includes multiple store-data entries. A given store-data entry of the multiple store-data entries stores a subset of the information stored in a given full entry of the multiple full entries. In response to a determination that a store address associated with a particular store operation, stored in a particular full entry of the multiple full entries, has been available for use for a threshold number of cycles without store data associated with the particular store operation being available for use, a control circuit may transfer the particular store operation to a particular store-data entry of the multiple store-data entries.
Type: Application
Filed: June 13, 2024
Publication date: August 14, 2025
Inventors: Amit KUMAR, Brian R. MESTAN, Mridul AGARWAL, Nikhil GUPTA
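The transfer condition (address ready for at least a threshold number of cycles, data still not ready) can be sketched as a per-cycle scan. This is a hypothetical model: `StoreOp`, `migrate_stalled_stores`, the cycle-stamp field, and the scan order are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreOp:
    name: str
    addr_ready_cycle: Optional[int] = None  # cycle the store address became available
    data_ready: bool = False


def migrate_stalled_stores(full_entries, store_data_entries, capacity, now, threshold):
    """Move stores whose address has waited >= threshold cycles without data."""
    for op in list(full_entries):  # iterate over a copy while removing
        if len(store_data_entries) >= capacity:
            break  # secondary storage is full
        if op.addr_ready_cycle is None or op.data_ready:
            continue
        if now - op.addr_ready_cycle >= threshold:
            full_entries.remove(op)        # frees a full entry
            store_data_entries.append(op)  # keep only what is needed to await data
    return full_entries, store_data_entries
```

Freeing the full entry lets other operations use it while the store waits on its data.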
-
Patent number: 12321746
Abstract: Techniques are disclosed relating to data synchronization barrier operations. A system includes a first processor that may receive a data barrier operation request from a second processor included in the system. Based on receiving that data barrier operation request from the second processor, the first processor may ensure that outstanding load/store operations executed by the first processor that are directed to addresses outside of an exclusion region have been completed. The first processor may respond to the second processor that the data barrier operation request is complete at the first processor, even in the case that one or more load/store operations that are directed to addresses within the exclusion region are outstanding and not complete when the first processor responds that the data barrier operation request is complete.
Type: Grant
Filed: June 16, 2023
Date of Patent: June 3, 2025
Assignee: Apple Inc.
Inventors: Jeff Gonion, John H. Kelm, James Vash, Pradeep Kanapathipillai, Mridul Agarwal, Gideon N. Levinsky, Richard F. Russo, Christopher M. Tsay
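The acknowledgement rule can be expressed as a single predicate over the outstanding operations. In this sketch the exclusion region is assumed to be one half-open address range `[excl_base, excl_limit)`; the `MemOp` type and `can_ack_dsb` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemOp:
    addr: int
    complete: bool = False


def can_ack_dsb(outstanding, excl_base, excl_limit):
    """Ack a remote barrier once every op outside the exclusion region is complete.

    Ops inside [excl_base, excl_limit) may still be in flight.
    """
    for op in outstanding:
        in_exclusion = excl_base <= op.addr < excl_limit
        if not in_exclusion and not op.complete:
            return False
    return True
```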
-
Patent number: 12314200
Abstract: An interrupt delivery mechanism for a system includes an interrupt controller and a plurality of cluster interrupt controllers coupled to respective pluralities of processors in an embodiment. The interrupt controller may serially transmit an interrupt request to respective cluster interrupt controllers, which may acknowledge (Ack) or non-acknowledge (Nack) the interrupt based on attempting to deliver the interrupt to processors to which the cluster interrupt controller is coupled. In a soft iteration, the cluster interrupt controller may attempt to deliver the interrupt to processors that are powered on, without attempting to power on processors that are powered off. If the soft iteration does not result in an Ack response from one of the plurality of cluster interrupt controllers, a hard iteration may be performed in which the powered-off processors may be powered on.
Type: Grant
Filed: May 24, 2024
Date of Patent: May 27, 2025
Assignee: Apple Inc.
Inventors: Jeffrey E. Gonion, Charles E. Tucker, Tal Kuzi, Richard F. Russo, Mridul Agarwal, Christopher M. Tsay, Gideon N. Levinsky, Shih-Chieh Wen, Lior Zimet
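The soft-then-hard delivery order can be sketched as two serial passes over the clusters. The `Cluster` model, with per-CPU "powered on" and "would accept" flags, is a hypothetical simplification; real acceptance depends on masking and priority state the abstract does not describe.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    powered_on: list  # hypothetical per-CPU power state
    accepting: list   # hypothetical: CPU would take the interrupt if powered on

    def try_deliver(self, hard: bool) -> bool:
        """Return True (Ack) if some CPU in this cluster takes the interrupt."""
        for i in range(len(self.powered_on)):
            if not self.powered_on[i]:
                if not hard:
                    continue  # soft iteration: leave powered-off CPUs alone
                self.powered_on[i] = True  # hard iteration: power the CPU on
            if self.accepting[i]:
                return True
        return False  # Nack


def deliver_interrupt(clusters):
    """Try each cluster serially, first softly, then with power-on allowed."""
    for hard in (False, True):
        for cluster in clusters:
            if cluster.try_deliver(hard):
                return cluster.name, "hard" if hard else "soft"
    return None
```

Trying the soft pass first avoids waking a sleeping processor when an already-running one could take the interrupt.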
-
Publication number: 20250147767
Abstract: A system may include multiple processors. One of the processors may receive an indication of a data synchronization barrier (DSB) instruction in another processor that follows a translation look-ahead buffer invalidate (TLBI) instruction to invalidate an entry of a translation look-ahead buffer. The processor may determine whether instructions are pending in the processor for which the virtual addresses used for memory accesses have been translated to physical addresses before receiving the DSB indication. If there are such pending instructions, the processor may provide, after these instructions retire, an indication to the other processor as a response to the DSB indication.
Type: Application
Filed: January 3, 2025
Publication date: May 8, 2025
Applicant: Apple Inc.
Inventors: Madhu Sudan Hari, Mridul Agarwal, Kulin N. Kothari, John D. Pape, Niket K. Choudhary
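The response condition can be modeled by timestamping when each pending instruction's address translation completed. The `PendingInstr` type and the cycle-stamp comparison are assumptions chosen for illustration; the abstract only requires that instructions translated before the DSB indication retire before the response.

```python
from dataclasses import dataclass


@dataclass
class PendingInstr:
    tag: int
    translated_cycle: int  # hypothetical: cycle its VA->PA translation completed
    retired: bool = False


def dsb_response_blockers(pending, dsb_cycle):
    """Tags that must retire before responding to a remote TLBI+DSB."""
    return [i.tag for i in pending
            if i.translated_cycle < dsb_cycle and not i.retired]


def can_respond(pending, dsb_cycle):
    return not dsb_response_blockers(pending, dsb_cycle)
```

Instructions translated after the DSB indication arrived are not waited on in this model.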
-
Patent number: 12229561
Abstract: A system may include multiple processors. One of the processors may receive an indication of a data synchronization barrier (DSB) instruction in another processor that follows a translation look-ahead buffer invalidate (TLBI) instruction to invalidate an entry of a translation look-ahead buffer. The processor may determine whether instructions are pending in the processor for which the virtual addresses used for memory accesses have been translated to physical addresses before receiving the DSB indication. If there are such pending instructions, the processor may provide, after these instructions retire, an indication to the other processor as a response to the DSB indication.
Type: Grant
Filed: September 16, 2022
Date of Patent: February 18, 2025
Assignee: Apple Inc.
Inventors: Madhu Sudan Hari, Mridul Agarwal, Kulin N Kothari, John D Pape, Niket K Choudhary
-
Publication number: 20240362027
Abstract: Techniques are disclosed relating to load value prediction. In some embodiments, a processor includes load address prediction circuitry and load value prediction circuitry. Training circuitry may train loads in a given entry, and may include a first entry configured to store first predicted load address information and a confidence indication of confidence that the first predicted load address information is correct and a second entry configured to store first predicted load value information and a confidence indication of confidence that the first predicted load value information is correct (note that a given entry may be configured for load address or load value prediction at different times). Control circuitry may, in response to an entry in the training circuitry reaching a threshold level of confidence, allocate a corresponding entry in either the load value prediction circuitry or the load address prediction circuitry.
Type: Application
Filed: July 5, 2024
Publication date: October 31, 2024
Inventors: Yuan C. Chou, Debasish Chandra, Mridul Agarwal, Haoyan Jia
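A shared training structure that promotes entries into separate address and value tables can be sketched as follows. Keying by load PC, resetting confidence on a mismatch, and removing the training entry on promotion are all assumptions; `LoadPredictor` and `TrainEntry` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class TrainEntry:
    kind: str       # "addr" or "value": what this entry is currently training
    predicted: int
    confidence: int = 0


@dataclass
class LoadPredictor:
    threshold: int = 3
    training: dict = field(default_factory=dict)     # pc -> TrainEntry
    addr_table: dict = field(default_factory=dict)   # pc -> predicted address
    value_table: dict = field(default_factory=dict)  # pc -> predicted value

    def train(self, pc, kind, observed):
        e = self.training.get(pc)
        if e is None or e.kind != kind:
            # (Re)allocate; the same slot can serve either kind at different times.
            self.training[pc] = TrainEntry(kind, observed)
            return
        if e.predicted == observed:
            e.confidence += 1
        else:
            e.predicted, e.confidence = observed, 0
        if e.confidence >= self.threshold:
            table = self.addr_table if kind == "addr" else self.value_table
            table[pc] = e.predicted  # promote for use in prediction
            del self.training[pc]
```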
-
Publication number: 20240329990
Abstract: A system, e.g., a system on a chip (SOC), may include one or more processors. A processor may execute an instruction synchronization barrier (ISB) instruction to enforce an ordering constraint on instructions. To execute the ISB instruction, the processor may determine whether contexts of the processor required for execution of instructions older than the ISB instruction are consumed for the older instructions. Responsive to determining that the contexts are consumed for the older instructions, the processor may initiate fetching of an instruction younger than the ISB instruction, without waiting for the older instructions to retire.
Type: Application
Filed: June 11, 2024
Publication date: October 3, 2024
Applicant: Apple Inc.
Inventors: Deepankar Duggal, Kulin N Kothari, Mridul Agarwal, Chang Xu, Yanran Yang, Richard F Russo, Yuan C Chou, Douglas C Holman
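The difference between a conventional ISB and the behavior described here comes down to which condition gates the fetch of younger instructions. This sketch represents each older instruction as a dict with hypothetical `context_consumed` and `retired` flags.

```python
def isb_baseline_can_fetch(older_instrs):
    """Conventional behavior: wait until every older instruction has retired."""
    return all(i["retired"] for i in older_instrs)


def isb_can_fetch_younger(older_instrs):
    """Described behavior: wait only until each older instruction has consumed
    the processor context it requires, even if it has not yet retired."""
    return all(i["context_consumed"] for i in older_instrs)
```

Whenever contexts are consumed before retirement, the second predicate becomes true earlier, which is where the latency saving comes from.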
-
Publication number: 20240311319
Abstract: An interrupt delivery mechanism for a system includes an interrupt controller and a plurality of cluster interrupt controllers coupled to respective pluralities of processors in an embodiment. The interrupt controller may serially transmit an interrupt request to respective cluster interrupt controllers, which may acknowledge (Ack) or non-acknowledge (Nack) the interrupt based on attempting to deliver the interrupt to processors to which the cluster interrupt controller is coupled. In a soft iteration, the cluster interrupt controller may attempt to deliver the interrupt to processors that are powered on, without attempting to power on processors that are powered off. If the soft iteration does not result in an Ack response from one of the plurality of cluster interrupt controllers, a hard iteration may be performed in which the powered-off processors may be powered on.
Type: Application
Filed: May 24, 2024
Publication date: September 19, 2024
Inventors: Jeffrey E. Gonion, Charles E. Tucker, Tal Kuzi, Richard F. Russo, Mridul Agarwal, Christopher M. Tsay, Gideon N. Levinsky, Shih-Chieh Wen, Lior Zimet
-
Patent number: 12067398
Abstract: Techniques are disclosed relating to load value prediction. In some embodiments, a processor includes learning table circuitry that is shared for both address and value prediction. Loads may be trained for value prediction when they are eligible for both value and address prediction. Entries in the learning table may be promoted to an address prediction table or a load value prediction table for prediction, e.g., when they reach a threshold confidence level in the learning table. In some embodiments, the learning table stores a hash of a predicted load value and control circuitry uses a probing load to retrieve the actual predicted load value for the value prediction table.
Type: Grant
Filed: April 29, 2022
Date of Patent: August 20, 2024
Assignee: Apple Inc.
Inventors: Yuan C. Chou, Debasish Chandra, Mridul Agarwal, Haoyan Jia
-
Publication number: 20240248844
Abstract: In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.
Type: Application
Filed: February 26, 2024
Publication date: July 25, 2024
Inventors: Francesco Spadini, Gideon Levinsky, Mridul Agarwal
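An atomicity size smaller than the operation size means a single load is ordered as several independent chunks, so another agent's store may become visible between them. In this sketch memory is a hypothetical dict keyed by chunk address, and the remote store is injected after the first chunk to show an outcome that a fully atomic 16-byte load could not produce.

```python
def atomic_units(base_addr, op_size, atomicity_size):
    """Split an access into independently ordered (addr, size) chunks."""
    if op_size % atomicity_size:
        raise ValueError("op_size must be a multiple of atomicity_size")
    return [(base_addr + off, atomicity_size)
            for off in range(0, op_size, atomicity_size)]


def load_with_interleaved_store(memory, units, store_between):
    """Read each chunk separately; a remote store lands after the first chunk."""
    result = []
    for idx, (addr, _size) in enumerate(units):
        result.append(memory[addr])
        if idx == 0:
            memory.update(store_between)  # simulated store from another processor
    return result
```

For example, a two-register 16-byte load with an 8-byte (register-size) atomicity size yields two chunks, and may observe one old and one new value.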
-
Patent number: 12045615
Abstract: A system, e.g., a system on a chip (SOC), may include one or more processors. A processor may execute an instruction synchronization barrier (ISB) instruction to enforce an ordering constraint on instructions. To execute the ISB instruction, the processor may determine whether contexts of the processor required for execution of instructions older than the ISB instruction are consumed for the older instructions. Responsive to determining that the contexts are consumed for the older instructions, the processor may initiate fetching of an instruction younger than the ISB instruction, without waiting for the older instructions to retire.
Type: Grant
Filed: September 16, 2022
Date of Patent: July 23, 2024
Assignee: Apple Inc.
Inventors: Deepankar Duggal, Kulin N Kothari, Mridul Agarwal, Chang Xu, Yanran Yang, Richard F Russo, Yuan C Chou, Douglas C Holman
-
Patent number: 12007920
Abstract: An interrupt delivery mechanism for a system includes an interrupt controller and a plurality of cluster interrupt controllers coupled to respective pluralities of processors in an embodiment. The interrupt controller may serially transmit an interrupt request to respective cluster interrupt controllers, which may acknowledge (Ack) or non-acknowledge (Nack) the interrupt based on attempting to deliver the interrupt to processors to which the cluster interrupt controller is coupled. In a soft iteration, the cluster interrupt controller may attempt to deliver the interrupt to processors that are powered on, without attempting to power on processors that are powered off. If the soft iteration does not result in an Ack response from one of the plurality of cluster interrupt controllers, a hard iteration may be performed in which the powered-off processors may be powered on.
Type: Grant
Filed: April 17, 2023
Date of Patent: June 11, 2024
Assignee: Apple Inc.
Inventors: Jeffrey E. Gonion, Charles E. Tucker, Tal Kuzi, Richard F. Russo, Mridul Agarwal, Christopher M. Tsay, Gideon N. Levinsky, Shih-Chieh Wen, Lior Zimet
-
Patent number: 11914511
Abstract: In an embodiment, a processor implements a different atomicity size (for memory consistency order) than the operation size. More particularly, the processor may implement a smaller atomicity size than the operation size. For example, for multiple register loads, the atomicity size may be the register size. In another example, the vector element size may be the atomicity size for vector load instructions. In yet another example, multiple contiguous vector elements, but fewer than all the vector elements in a vector register, may be the atomicity size for vector load instructions.
Type: Grant
Filed: June 22, 2020
Date of Patent: February 27, 2024
Assignee: Apple Inc.
Inventors: Francesco Spadini, Gideon Levinsky, Mridul Agarwal
-
Patent number: 11829763
Abstract: A system and method for efficiently reducing the latency of load operations. In various embodiments, logic of a processor accesses a prediction table after fetching instructions. For a prediction table hit, the logic executes a load instruction with a retrieved predicted address from the prediction table. For a prediction table miss, when the logic determines the address of the load instruction and hits in a learning table, the logic updates a level of confidence indication to indicate a higher level of confidence when a stored address matches the determined address. When the logic determines the level of confidence indication stored in a given table entry of the learning table meets a threshold, the logic allocates, in the prediction table, information stored in the given entry. Therefore, the predicted address is available during the next lookup of the prediction table.
Type: Grant
Filed: August 13, 2019
Date of Patent: November 28, 2023
Assignee: Apple Inc.
Inventors: Yuan C. Chou, Viney Gautam, Wei-Han Lien, Kulin N. Kothari, Mridul Agarwal
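The two-table flow (prediction table lookup at fetch, learning table training once the real address is known) can be sketched as below. `AddrPredictor` is hypothetical; keying by PC, resetting confidence on a mismatch, and allocating a learning entry on a learning-table miss are assumptions the abstract does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class AddrPredictor:
    threshold: int = 2
    prediction: dict = field(default_factory=dict)  # pc -> predicted address
    learning: dict = field(default_factory=dict)    # pc -> [address, confidence]

    def lookup(self, pc):
        """At fetch: return a predicted address on a prediction-table hit, else None."""
        return self.prediction.get(pc)

    def resolve(self, pc, actual_addr):
        """After a prediction-table miss, once the real address is computed."""
        entry = self.learning.get(pc)
        if entry is None:
            self.learning[pc] = [actual_addr, 0]  # assumption: allocate on miss
            return
        if entry[0] == actual_addr:
            entry[1] += 1  # stored address matched: raise confidence
        else:
            entry[0], entry[1] = actual_addr, 0
        if entry[1] >= self.threshold:
            self.prediction[pc] = entry[0]  # available on the next lookup
```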
-
Publication number: 20230333851
Abstract: Techniques are disclosed relating to data synchronization barrier operations. A system includes a first processor that may receive a data barrier operation request from a second processor included in the system. Based on receiving that data barrier operation request from the second processor, the first processor may ensure that outstanding load/store operations executed by the first processor that are directed to addresses outside of an exclusion region have been completed. The first processor may respond to the second processor that the data barrier operation request is complete at the first processor, even in the case that one or more load/store operations that are directed to addresses within the exclusion region are outstanding and not complete when the first processor responds that the data barrier operation request is complete.
Type: Application
Filed: June 16, 2023
Publication date: October 19, 2023
Inventors: Jeff Gonion, John H. Kelm, James Vash, Pradeep Kanapathipillai, Mridul Agarwal, Gideon N. Levinsky, Richard F. Russo, Christopher M. Tsay
-
Publication number: 20230251985
Abstract: An interrupt delivery mechanism for a system includes an interrupt controller and a plurality of cluster interrupt controllers coupled to respective pluralities of processors in an embodiment. The interrupt controller may serially transmit an interrupt request to respective cluster interrupt controllers, which may acknowledge (Ack) or non-acknowledge (Nack) the interrupt based on attempting to deliver the interrupt to processors to which the cluster interrupt controller is coupled. In a soft iteration, the cluster interrupt controller may attempt to deliver the interrupt to processors that are powered on, without attempting to power on processors that are powered off. If the soft iteration does not result in an Ack response from one of the plurality of cluster interrupt controllers, a hard iteration may be performed in which the powered-off processors may be powered on.
Type: Application
Filed: April 17, 2023
Publication date: August 10, 2023
Inventors: Jeffrey E. Gonion, Charles E. Tucker, Tal Kuzi, Richard F. Russo, Mridul Agarwal, Christopher M. Tsay, Gideon N. Levinsky, Shih-Chieh Wen, Lior Zimet
-
Patent number: 11720360
Abstract: Techniques are disclosed relating to data synchronization barrier operations. A system includes a first processor that may receive a data barrier operation request from a second processor included in the system. Based on receiving that data barrier operation request from the second processor, the first processor may ensure that outstanding load/store operations executed by the first processor that are directed to addresses outside of an exclusion region have been completed. The first processor may respond to the second processor that the data barrier operation request is complete at the first processor, even in the case that one or more load/store operations that are directed to addresses within the exclusion region are outstanding and not complete when the first processor responds that the data barrier operation request is complete.
Type: Grant
Filed: September 8, 2021
Date of Patent: August 8, 2023
Assignee: Apple Inc.
Inventors: Jeff Gonion, John H. Kelm, James Vash, Pradeep Kanapathipillai, Mridul Agarwal, Gideon N. Levinsky, Richard F. Russo, Christopher M. Tsay
-
Patent number: 11630789
Abstract: An interrupt delivery mechanism for a system includes an interrupt controller and a plurality of cluster interrupt controllers coupled to respective pluralities of processors in an embodiment. The interrupt controller may serially transmit an interrupt request to respective cluster interrupt controllers, which may acknowledge (Ack) or non-acknowledge (Nack) the interrupt based on attempting to deliver the interrupt to processors to which the cluster interrupt controller is coupled. In a soft iteration, the cluster interrupt controller may attempt to deliver the interrupt to processors that are powered on, without attempting to power on processors that are powered off. If the soft iteration does not result in an Ack response from one of the plurality of cluster interrupt controllers, a hard iteration may be performed in which the powered-off processors may be powered on.
Type: Grant
Filed: April 30, 2021
Date of Patent: April 18, 2023
Assignee: Apple Inc.
Inventors: Jeffrey E. Gonion, Charles E. Tucker, Tal Kuzi, Richard F. Russo, Mridul Agarwal, Christopher M. Tsay, Gideon N. Levinsky, Shih-Chieh Wen, Lior Zimet
-
Patent number: 11500638
Abstract: A method and system for compressing and decompressing data is disclosed. A compression command may initiate the prefetching of first data, which may be stored in a first buffer. Multiple words of the first data may be read from the first buffer and used to generate a plurality of compressed packets, each of which includes a command specifying a type of packet. The compressed packets may be combined into a group and multiple groups may be combined and stored in a second buffer. A decompression command may initiate the prefetching of second data, which is stored in the first buffer. A portion of the second data may be read from the first buffer and used to generate a group of compressed packets. Multiple output words may be generated dependent upon the group of compressed packets.
Type: Grant
Filed: January 10, 2020
Date of Patent: November 15, 2022
Assignee: Apple Inc.
Inventors: Aditya Kesiraju, James Vash, Pradeep Kanapathipillai, Mridul Agarwal, Zhaoming Hu, Tyler Huberty, Charles Tucker
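The packet-and-group structure can be illustrated with a small word-level codec. The abstract does not list the packet types, so the `ZERO`, `REPEAT`, and `LITERAL` commands here are hypothetical stand-ins, as is the fixed group size; the point is only that each packet carries a command saying how to rebuild its word, and packets are bundled into groups.

```python
def compress(words, group_size=4):
    """Encode each word as a typed packet, then bundle packets into groups."""
    packets, prev = [], None
    for w in words:
        if w == 0:
            packets.append(("ZERO",))        # hypothetical type: all-zero word
        elif w == prev:
            packets.append(("REPEAT",))      # hypothetical type: same as previous
        else:
            packets.append(("LITERAL", w))   # hypothetical type: raw word follows
        prev = w
    return [packets[i:i + group_size] for i in range(0, len(packets), group_size)]


def decompress(groups):
    """Regenerate output words from groups of typed packets."""
    out, prev = [], None
    for group in groups:
        for pkt in group:
            cmd = pkt[0]
            if cmd == "ZERO":
                w = 0
            elif cmd == "REPEAT":
                w = prev
            else:
                w = pkt[1]
            out.append(w)
            prev = w
    return out
```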