Patents by Inventor Pradeep Kanapathipillai

Pradeep Kanapathipillai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Latency management in synchronization events

Patent number: 11914524

Abstract: An electronic device includes one or more processors for executing one or more virtual machines. In response to a request for initiating a synchronization event, a processor identifies a subset of speculative memory access requests in one or more memory access request queues. Automatically and in accordance with the identifying, the processor purges translations associated with the subset of speculative memory access requests. Subsequent to the purging, the processor initiates the synchronization event. In some implementations, memory access completion is forced in response to a context synchronization event that corresponds to a termination of a first application, a termination of a first virtual machine, or a system call for updating a system register. Alternatively, in some implementations, memory access completion is forced in an operating system level or an application level in response to a data synchronization event that is initiated on a hypervisor layer or a firmware layer.

Type: Grant

Filed: March 1, 2022

Date of Patent: February 27, 2024

Assignee: QUALCOMM Incorporated

Inventors: Adrian Montero, Huzefa Sanjeliwala, Paul Kitchin, Prarthna Santhanakrishnan, Conrado Blasco, Pradeep Kanapathipillai
Dynamic voltage and frequency scaling (DVFS) within processor clusters

Patent number: 11797045

Abstract: An electronic system has a plurality of processing clusters including a first processing cluster. The first processing cluster further includes a plurality of processors and a power management processor. The power management processor obtains performance information about the plurality of processors, executes power instructions to transition a first processor of the plurality of processors from a first performance state to a second performance state different from the first performance state, and executes one or more debug instructions to perform debugging of a respective processor of the plurality of processors. The power instructions are executed in accordance with the obtained performance information and independently of respective performance states of other processors in the plurality of processors of the first processing cluster.

Type: Grant

Filed: February 7, 2022

Date of Patent: October 24, 2023

Assignee: QUALCOMM Incorporated

Inventors: Jonathan Masters, Pradeep Kanapathipillai, Manu Gulati, Nitin Makhija
DSB Operation with Excluded Region

Publication number: 20230333851

Abstract: Techniques are disclosed relating to data synchronization barrier operations. A system includes a first processor that may receive a data barrier operation request from a second processor include in the system. Based on receiving that data barrier operation request from the second processor, the first processor may ensure that outstanding load/store operations executed by the first processor that are directed to addresses outside of an exclusion region have been completed. The first processor may respond to the second processor that the data barrier operation request is complete at the first processor, even in the case that one or more load/store operations that are directed to addresses within the exclusion region are outstanding and not complete when the first processor responds that the data barrier operation request is complete.

Type: Application

Filed: June 16, 2023

Publication date: October 19, 2023

Inventors: Jeff Gonion, John H. Kelm, James Vash, Pradeep Kanapathipillai, Mridul Agarwal, Gideon N. Levinsky, Richard F. Russo, Christopher M. Tsay
Latency Management in Synchronization Events

Publication number: 20230281133

Abstract: An electronic device includes one or more processors for executing one or more virtual machines. In response to a request for initiating a synchronization event, a processor identifies a subset of speculative memory access requests in one or more memory access request queues. Automatically and in accordance with the identifying, the processor purges translations associated with the subset of speculative memory access requests. Subsequent to the purging, the processor initiates the synchronization event. In some implementations, memory access completion is forced in response to a context synchronization event that corresponds to a termination of a first application, a termination of a first virtual machine, or a system call for updating a system register. Alternatively, in some implementations, memory access completion is forced in an operating system level or an application level in response to a data synchronization event that is initiated on a hypervisor layer or a firmware layer.

Type: Application

Filed: March 1, 2022

Publication date: September 7, 2023

Inventors: Adrian MONTERO, Huzefa SANJELIWALA, Paul KITCHIN, Prarthna SANTHANAKRISHNAN, Conrado BLASCO, Pradeep KANAPATHIPILLAI
DSB operation with excluded region

Patent number: 11720360

Abstract: Techniques are disclosed relating to data synchronization barrier operations. A system includes a first processor that may receive a data barrier operation request from a second processor include in the system. Based on receiving that data barrier operation request from the second processor, the first processor may ensure that outstanding load/store operations executed by the first processor that are directed to addresses outside of an exclusion region have been completed. The first processor may respond to the second processor that the data barrier operation request is complete at the first processor, even in the case that one or more load/store operations that are directed to addresses within the exclusion region are outstanding and not complete when the first processor responds that the data barrier operation request is complete.

Type: Grant

Filed: September 8, 2021

Date of Patent: August 8, 2023

Assignee: Apple Inc.

Inventors: Jeff Gonion, John H. Kelm, James Vash, Pradeep Kanapathipillai, Mridul Agarwal, Gideon N. Levinsky, Richard F. Russo, Christopher M. Tsay
Dynamic Voltage and Frequency Scaling (DVFS) within Processor Clusters

Publication number: 20230093426

Abstract: An electronic system has a plurality of processing clusters including a first processing cluster. The first processing cluster further includes a plurality of processors and a power management processor. The power management processor obtains performance information about the plurality of processors, executes power instructions to transition a first processor of the plurality of processors from a first performance state to a second performance state different from the first performance state, and executes one or more debug instructions to perform debugging of a respective processor of the plurality of processors. The power instructions are executed in accordance with the obtained performance information and independently of respective performance states of other processors in the plurality of processors of the first processing cluster.

Type: Application

Filed: February 7, 2022

Publication date: March 23, 2023

Inventors: Jonathan MASTERS, Pradeep KANAPATHIPILLAI, Manu GULATI, Nitin MAKHIJA
SYSTEM AND METHODS FOR INVALIDATING TRANSLATION INFORMATION IN CACHES

Publication number: 20230064603

Abstract: An electronic device includes a plurality of processors for executing one or more virtual machines. A processor of the plurality of processors is associated with a translation cache and one or more filters corresponding to the translation cache. The one or more filters include a virtual machine identifier filter, and the processor is configured to receive a translation invalidation instruction to invalidate one or more entries in the translation cache. In accordance with a determination that the translation invalidation specifies a respective virtual machine identifier, the processor queries the virtual machine identifier filter associated with the translation cache to determine whether the respective virtual machine identifier is stored in the virtual machine identifier filter.

Type: Application

Filed: February 18, 2022

Publication date: March 2, 2023

Inventors: Conrado BLASCO, Adrian MONTERO, Paul KITCHIN, Pradeep KANAPATHIPILLAI, Huzefa SANJELIWALA
Hardware compression and decompression engine

Patent number: 11500638

Abstract: A method and system for compressing and decompressing data is disclosed. A compression command may initiate the prefetching of first data, which may be stored in a first buffer. Multiple words of the first data may be read from the first buffer and used to generate a plurality of compressed packets, each of which includes a command specifying a type of packet. The compressed packets may be combined into a group and multiple groups may be combined and stored in a second buffer. A decompression command may initiate the prefetching of second data, which is stored in the first buffer. A portion of the second data may be read from the first buffer and used to generate a group of compressed packets. Multiple output words may be generated dependent upon the group of compressed packets.

Type: Grant

Filed: January 10, 2020

Date of Patent: November 15, 2022

Assignee: Apple Inc.

Inventors: Aditya Kesiraju, James Vash, Pradeep Kanapathipillai, Mridul Agarwal, Zhaoming Hu, Tyler Huberty, Charles Tucker
Coprocessors with Bypass Optimization, Variable Grid Architecture, and Fused Vector Operations

Publication number: 20220358082

Abstract: In an embodiment, a coprocessor may include a bypass indication which identifies execution circuitry that is not used by a given processor instruction, and thus may be bypassed. The corresponding circuitry may be disabled during execution, preventing evaluation when the output of the circuitry will not be used for the instruction. In another embodiment, the coprocessor may implement a grid of processing elements in rows and columns, where a given coprocessor instruction may specify an operation that causes up to all of the processing elements to operate on vectors of input operands to produce results. Implementations of the coprocessor may implement a portion of the processing elements. The coprocessor control circuitry may be designed to operate with the full grid or partial grid, reissuing instructions in the partial grid case to perform the requested operation. In still another embodiment, the coprocessor may be able to fuse vector mode operations.

Type: Application

Filed: July 20, 2022

Publication date: November 10, 2022

Inventors: Aditya Kesiraju, Andrew J. Beaumont-Smith, Boris S. Alvarez-Heredia, Pradeep Kanapathipillai, Ran A. Chachick
DSB Operation with Excluded Region

Publication number: 20220083338

Abstract: Techniques are disclosed relating to data synchronization barrier operations. A system includes a first processor that may receive a data barrier operation request from a second processor include in the system. Based on receiving that data barrier operation request from the second processor, the first processor may ensure that outstanding load/store operations executed by the first processor that are directed to addresses outside of an exclusion region have been completed. The first processor may respond to the second processor that the data barrier operation request is complete at the first processor, even in the case that one or more load/store operations that are directed to addresses within the exclusion region are outstanding and not complete when the first processor responds that the data barrier operation request is complete.

Type: Application

Filed: September 8, 2021

Publication date: March 17, 2022

Inventors: Jeff Gonion, John H. Kelm, James Vash, Pradeep Kanapathipillai, Mridul Agarwal, Gideon N. Levinsky, Richard F. Russo, Christopher M. Tsay
Unified address translation

Patent number: 11221962

Abstract: A system and method for efficiently transferring address mappings and data access permissions corresponding to the address mappings. A computing system includes at least one processor and memory for storing a page table. In response to receiving a memory access operation comprising a first address, the address translation unit is configured to identify a data access permission based on a permission index corresponding to the first address, and access data stored in a memory location of the memory identified by a second address in a manner defined by the retrieved data access permission. The address translation unit is configured to access a table to identify the data access permission, and is configured to determine the permission index and the second address based on the first address. A single permission index may correspond to different permissions for different entities within the system.

Type: Grant

Filed: May 15, 2020

Date of Patent: January 11, 2022

Assignee: Apple Inc.

Inventors: Jeffry E. Gonion, Bernard Joseph Semeria, Michael J. Swift, Pradeep Kanapathipillai, David J. Williamson
Architected state retention for a frequent operating state switching processor

Patent number: 10990159

Abstract: Systems, apparatuses, and methods for retaining architected state for relatively frequent switching between sleep and active operating states are described. A processor receives an indication to transition from an active state to a sleep state. The processor stores a copy of a first subset of the architected state information in on-die storage elements capable of retaining storage after power is turned off. The processor supports programmable input/output (PIO) access of particular stored information during the sleep state. When a wakeup event is detected, circuitry within the processor is powered up again. A boot sequence and recovery of architected state from off-chip memory are not performed. Rather than fetch from a memory location pointed to by a reset base address register, the processor instead fetches an instruction from a memory location pointed to by a restored program counter of the retained subset of the architected state information.

Type: Grant

Filed: April 25, 2017

Date of Patent: April 27, 2021

Assignee: Apple Inc.

Inventors: Bernard Joseph Semeria, John H. Mylius, Pradeep Kanapathipillai, Richard F. Russo, Shih-Chieh Wen, Richard H. Larson
UNIFIED ADDRESS TRANSLATION

Publication number: 20210064539

Abstract: A system and method for efficiently transferring address mappings and data access permissions corresponding to the address mappings. A computing system includes at least one processor and memory for storing a page table. In response to receiving a memory access operation comprising a first address, the address translation unit is configured to identify a data access permission based on a permission index corresponding to the first address, and access data stored in a memory location of the memory identified by a second address in a manner defined by the retrieved data access permission. The address translation unit is configured to access a table to identify the data access permission, and is configured to determine the permission index and the second address based on the first address. A single permission index may correspond to different permissions for different entities within the system.

Type: Application

Filed: May 15, 2020

Publication date: March 4, 2021

Inventors: Jeffry E. Gonion, Bernard Joseph Semeria, Michael J. Swift, Pradeep Kanapathipillai, David J. Williamson
Coprocessors with Bypass Optimization, Variable Grid Architecture, and Fused Vector Operations

Publication number: 20200272597

Abstract: In an embodiment, a coprocessor may include a bypass indication which identifies execution circuitry that is not used by a given processor instruction, and thus may be bypassed. The corresponding circuitry may be disabled during execution, preventing evaluation when the output of the circuitry will not be used for the instruction. In another embodiment, the coprocessor may implement a grid of processing elements in rows and columns, where a given coprocessor instruction may specify an operation that causes up to all of the processing elements to operate on vectors of input operands to produce results. Implementations of the coprocessor may implement a portion of the processing elements. The coprocessor control circuitry may be designed to operate with the full grid or partial grid, reissuing instructions in the partial grid case to perform the requested operation. In still another embodiment, the coprocessor may be able to fuse vector mode operations.

Type: Application

Filed: February 26, 2019

Publication date: August 27, 2020

Inventors: Aditya Kesiraju, Andrew J. Beaumont-Smith, Boris S. Alvarez-Heredia, Pradeep Kanapathipillai, Ran A. Chachick, Srikanth Balasubramanian
Translation lookaside buffer invalidation by range

Patent number: 10725928

Abstract: A system and method for efficiently performing maintenance on a cache. In various embodiments, control logic in a cache controller or elsewhere receives an indication for invalidating a range of virtual-to-physical mappings in a given translation lookaside buffer (TLB). The logic determines a first latency to invalidate entries of the TLB based on a number of addresses in the range and a number of supported page sizes simultaneously stored in the TLB. The logic determines a second latency based on a number of entries in the TLB. If the first latency is greater, then the logic traverses through each TLB entry and invalidates TLB entries storing a virtual address within the range. If the first latency is smaller, then the logic traverses through each address in the range and invalidates TLB entries storing a virtual address within the range.

Type: Grant

Filed: January 9, 2019

Date of Patent: July 28, 2020

Assignee: Apple Inc.

Inventors: Brian R. Mestan, Pradeep Kanapathipillai, Joshua William Smith
Translation Lookaside Buffer Invalidation By Range

Publication number: 20200218663

Abstract: A system and method for efficiently performing maintenance on a cache. In various embodiments, control logic in a cache controller or elsewhere receives an indication for invalidating a range of virtual-to-physical mappings in a given translation lookaside buffer (TLB). The logic determines a first latency to invalidate entries of the TLB based on a number of addresses in the range and a number of supported page sizes simultaneously stored in the TLB. The logic determines a second latency based on a number of entries in the TLB. If the first latency is greater, then the logic traverses through each TLB entry and invalidates TLB entries storing a virtual address within the range. If the first latency is smaller, then the logic traverses through each address in the range and invalidates TLB entries storing a virtual address within the range.

Type: Application

Filed: January 9, 2019

Publication date: July 9, 2020

Inventors: Brian R. Mestan, Pradeep Kanapathipillai, Joshua William Smith
Unified prefetch circuit for multi-level caches

Patent number: 10621100

Abstract: In an embodiment, a processor may implement an access map-pattern match (AMPM)-based prefetch circuit for a multi-level cache system. The access patterns that are matched to the access maps may include prefetches for different cache levels. Centralizing the generation of prefetches into one prefetch circuit may provide better observability and controllability of prefetching at various levels of the cache hierarchy, in an embodiment. Prefetches at different levels may be controlled individually based on the accuracy of those prefetches, in an embodiment. Additionally, in an embodiment, access patterns that are longer that a given threshold may have the granularity of the prefetches change so that more data is prefetched and the prefetches occur farther in advance, in some embodiments.

Type: Grant

Filed: December 5, 2018

Date of Patent: April 14, 2020

Assignee: Apple Inc.

Inventors: Stephan G. Meier, Tyler J. Huberty, Gerard R. Williams, III, Pradeep Kanapathipillai
Load/store dependency predictor optimization for replayed loads

Patent number: 10437595

Abstract: Systems, apparatuses, and methods for optimizing a load-store dependency predictor (LSDP). When a younger load instruction is issued before an older store instruction and the younger load is dependent on the older store, the LSDP is trained on this ordering violation. A replay/flush indicator is stored in a corresponding entry in the LSDP to indicate whether the ordering violation resulted in a flush or replay. On subsequent executions, a dependency may be enforced for the load-store pair if a confidence counter is above a threshold, with the threshold varying based on the status of the replay/flush indicator. If a given load matches on multiple entries in the LSDP, and if at least one of the entries has a flush indicator, then the given load may be marked as a multimatch case and forced to wait to issue until all older stores have issued.

Type: Grant

Filed: March 15, 2016

Date of Patent: October 8, 2019

Assignee: Apple Inc.

Inventors: Pradeep Kanapathipillai, Stephan G. Meier, Gerard R. Williams, III, Mridul Agarwal, Kulin N. Kothari
Out of order store commit

Patent number: 10228951

Abstract: Systems, apparatuses, and methods for committing store instructions out of order from a store queue are described. A processor may store a first store instruction and a second store instruction in the store queue, wherein the first store instruction is older than the second store instruction. In response to determining the second store instruction is ready to commit to the memory hierarchy, the processor may allow the second store instruction to commit before the first store instruction, in response to determining that all store instructions in the store queue older than the second store instruction are non-speculative. However, if it is determined that at least one store instruction in the store queue older than the second store instruction is speculative, the processor may prevent the second store instruction from committing to the memory hierarchy before the first store instruction.

Type: Grant

Filed: August 20, 2015

Date of Patent: March 12, 2019

Assignee: Apple Inc.

Inventors: Kulin N. Kothari, Mridul Agarwal, Pradeep Kanapathipillai
Unified prefetch circuit for multi-level caches

Patent number: 10180905

Abstract: In an embodiment, a processor may implement an access map-pattern match (AMPM)-based prefetch circuit for a multi-level cache system. The access patterns that are matched to the access maps may include prefetches for different cache levels. Centralizing the generation of prefetches into one prefetch circuit may provide better observability and controllability of prefetching at various levels of the cache hierarchy, in an embodiment. Prefetches at different levels may be controlled individually based on the accuracy of those prefetches, in an embodiment. Additionally, in an embodiment, access patterns that are longer that a given threshold may have the granularity of the prefetches change so that more data is prefetched and the prefetches occur farther in advance, in some embodiments.

Type: Grant

Filed: April 7, 2016

Date of Patent: January 15, 2019

Assignee: Apple Inc.

Inventors: Stephan G. Meier, Tyler J. Huberty, Gerard R. Williams, III, Pradeep Kanapathipillai

1 2 3 next