Patents by Inventor Pradeep Kanapathipillai
Pradeep Kanapathipillai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11914524Abstract: An electronic device includes one or more processors for executing one or more virtual machines. In response to a request for initiating a synchronization event, a processor identifies a subset of speculative memory access requests in one or more memory access request queues. Automatically and in accordance with the identifying, the processor purges translations associated with the subset of speculative memory access requests. Subsequent to the purging, the processor initiates the synchronization event. In some implementations, memory access completion is forced in response to a context synchronization event that corresponds to a termination of a first application, a termination of a first virtual machine, or a system call for updating a system register. Alternatively, in some implementations, memory access completion is forced in an operating system level or an application level in response to a data synchronization event that is initiated on a hypervisor layer or a firmware layer.Type: GrantFiled: March 1, 2022Date of Patent: February 27, 2024Assignee: QUALCOMM IncorporatedInventors: Adrian Montero, Huzefa Sanjeliwala, Paul Kitchin, Prarthna Santhanakrishnan, Conrado Blasco, Pradeep Kanapathipillai
-
Patent number: 11797045Abstract: An electronic system has a plurality of processing clusters including a first processing cluster. The first processing cluster further includes a plurality of processors and a power management processor. The power management processor obtains performance information about the plurality of processors, executes power instructions to transition a first processor of the plurality of processors from a first performance state to a second performance state different from the first performance state, and executes one or more debug instructions to perform debugging of a respective processor of the plurality of processors. The power instructions are executed in accordance with the obtained performance information and independently of respective performance states of other processors in the plurality of processors of the first processing cluster.Type: GrantFiled: February 7, 2022Date of Patent: October 24, 2023Assignee: QUALCOMM IncorporatedInventors: Jonathan Masters, Pradeep Kanapathipillai, Manu Gulati, Nitin Makhija
-
Publication number: 20230333851Abstract: Techniques are disclosed relating to data synchronization barrier operations. A system includes a first processor that may receive a data barrier operation request from a second processor include in the system. Based on receiving that data barrier operation request from the second processor, the first processor may ensure that outstanding load/store operations executed by the first processor that are directed to addresses outside of an exclusion region have been completed. The first processor may respond to the second processor that the data barrier operation request is complete at the first processor, even in the case that one or more load/store operations that are directed to addresses within the exclusion region are outstanding and not complete when the first processor responds that the data barrier operation request is complete.Type: ApplicationFiled: June 16, 2023Publication date: October 19, 2023Inventors: Jeff Gonion, John H. Kelm, James Vash, Pradeep Kanapathipillai, Mridul Agarwal, Gideon N. Levinsky, Richard F. Russo, Christopher M. Tsay
-
Publication number: 20230281133Abstract: An electronic device includes one or more processors for executing one or more virtual machines. In response to a request for initiating a synchronization event, a processor identifies a subset of speculative memory access requests in one or more memory access request queues. Automatically and in accordance with the identifying, the processor purges translations associated with the subset of speculative memory access requests. Subsequent to the purging, the processor initiates the synchronization event. In some implementations, memory access completion is forced in response to a context synchronization event that corresponds to a termination of a first application, a termination of a first virtual machine, or a system call for updating a system register. Alternatively, in some implementations, memory access completion is forced in an operating system level or an application level in response to a data synchronization event that is initiated on a hypervisor layer or a firmware layer.Type: ApplicationFiled: March 1, 2022Publication date: September 7, 2023Inventors: Adrian MONTERO, Huzefa SANJELIWALA, Paul KITCHIN, Prarthna SANTHANAKRISHNAN, Conrado BLASCO, Pradeep KANAPATHIPILLAI
-
Patent number: 11720360Abstract: Techniques are disclosed relating to data synchronization barrier operations. A system includes a first processor that may receive a data barrier operation request from a second processor include in the system. Based on receiving that data barrier operation request from the second processor, the first processor may ensure that outstanding load/store operations executed by the first processor that are directed to addresses outside of an exclusion region have been completed. The first processor may respond to the second processor that the data barrier operation request is complete at the first processor, even in the case that one or more load/store operations that are directed to addresses within the exclusion region are outstanding and not complete when the first processor responds that the data barrier operation request is complete.Type: GrantFiled: September 8, 2021Date of Patent: August 8, 2023Assignee: Apple Inc.Inventors: Jeff Gonion, John H. Kelm, James Vash, Pradeep Kanapathipillai, Mridul Agarwal, Gideon N. Levinsky, Richard F. Russo, Christopher M. Tsay
-
Publication number: 20230093426Abstract: An electronic system has a plurality of processing clusters including a first processing cluster. The first processing cluster further includes a plurality of processors and a power management processor. The power management processor obtains performance information about the plurality of processors, executes power instructions to transition a first processor of the plurality of processors from a first performance state to a second performance state different from the first performance state, and executes one or more debug instructions to perform debugging of a respective processor of the plurality of processors. The power instructions are executed in accordance with the obtained performance information and independently of respective performance states of other processors in the plurality of processors of the first processing cluster.Type: ApplicationFiled: February 7, 2022Publication date: March 23, 2023Inventors: Jonathan MASTERS, Pradeep KANAPATHIPILLAI, Manu GULATI, Nitin MAKHIJA
-
Publication number: 20230064603Abstract: An electronic device includes a plurality of processors for executing one or more virtual machines. A processor of the plurality of processors is associated with a translation cache and one or more filters corresponding to the translation cache. The one or more filters include a virtual machine identifier filter, and the processor is configured to receive a translation invalidation instruction to invalidate one or more entries in the translation cache. In accordance with a determination that the translation invalidation specifies a respective virtual machine identifier, the processor queries the virtual machine identifier filter associated with the translation cache to determine whether the respective virtual machine identifier is stored in the virtual machine identifier filter.Type: ApplicationFiled: February 18, 2022Publication date: March 2, 2023Inventors: Conrado BLASCO, Adrian MONTERO, Paul KITCHIN, Pradeep KANAPATHIPILLAI, Huzefa SANJELIWALA
-
Patent number: 11500638Abstract: A method and system for compressing and decompressing data is disclosed. A compression command may initiate the prefetching of first data, which may be stored in a first buffer. Multiple words of the first data may be read from the first buffer and used to generate a plurality of compressed packets, each of which includes a command specifying a type of packet. The compressed packets may be combined into a group and multiple groups may be combined and stored in a second buffer. A decompression command may initiate the prefetching of second data, which is stored in the first buffer. A portion of the second data may be read from the first buffer and used to generate a group of compressed packets. Multiple output words may be generated dependent upon the group of compressed packets.Type: GrantFiled: January 10, 2020Date of Patent: November 15, 2022Assignee: Apple Inc.Inventors: Aditya Kesiraju, James Vash, Pradeep Kanapathipillai, Mridul Agarwal, Zhaoming Hu, Tyler Huberty, Charles Tucker
-
Publication number: 20220358082Abstract: In an embodiment, a coprocessor may include a bypass indication which identifies execution circuitry that is not used by a given processor instruction, and thus may be bypassed. The corresponding circuitry may be disabled during execution, preventing evaluation when the output of the circuitry will not be used for the instruction. In another embodiment, the coprocessor may implement a grid of processing elements in rows and columns, where a given coprocessor instruction may specify an operation that causes up to all of the processing elements to operate on vectors of input operands to produce results. Implementations of the coprocessor may implement a portion of the processing elements. The coprocessor control circuitry may be designed to operate with the full grid or partial grid, reissuing instructions in the partial grid case to perform the requested operation. In still another embodiment, the coprocessor may be able to fuse vector mode operations.Type: ApplicationFiled: July 20, 2022Publication date: November 10, 2022Inventors: Aditya Kesiraju, Andrew J. Beaumont-Smith, Boris S. Alvarez-Heredia, Pradeep Kanapathipillai, Ran A. Chachick
-
Publication number: 20220083338Abstract: Techniques are disclosed relating to data synchronization barrier operations. A system includes a first processor that may receive a data barrier operation request from a second processor include in the system. Based on receiving that data barrier operation request from the second processor, the first processor may ensure that outstanding load/store operations executed by the first processor that are directed to addresses outside of an exclusion region have been completed. The first processor may respond to the second processor that the data barrier operation request is complete at the first processor, even in the case that one or more load/store operations that are directed to addresses within the exclusion region are outstanding and not complete when the first processor responds that the data barrier operation request is complete.Type: ApplicationFiled: September 8, 2021Publication date: March 17, 2022Inventors: Jeff Gonion, John H. Kelm, James Vash, Pradeep Kanapathipillai, Mridul Agarwal, Gideon N. Levinsky, Richard F. Russo, Christopher M. Tsay
-
Patent number: 11221962Abstract: A system and method for efficiently transferring address mappings and data access permissions corresponding to the address mappings. A computing system includes at least one processor and memory for storing a page table. In response to receiving a memory access operation comprising a first address, the address translation unit is configured to identify a data access permission based on a permission index corresponding to the first address, and access data stored in a memory location of the memory identified by a second address in a manner defined by the retrieved data access permission. The address translation unit is configured to access a table to identify the data access permission, and is configured to determine the permission index and the second address based on the first address. A single permission index may correspond to different permissions for different entities within the system.Type: GrantFiled: May 15, 2020Date of Patent: January 11, 2022Assignee: Apple Inc.Inventors: Jeffry E. Gonion, Bernard Joseph Semeria, Michael J. Swift, Pradeep Kanapathipillai, David J. Williamson
-
Patent number: 10990159Abstract: Systems, apparatuses, and methods for retaining architected state for relatively frequent switching between sleep and active operating states are described. A processor receives an indication to transition from an active state to a sleep state. The processor stores a copy of a first subset of the architected state information in on-die storage elements capable of retaining storage after power is turned off. The processor supports programmable input/output (PIO) access of particular stored information during the sleep state. When a wakeup event is detected, circuitry within the processor is powered up again. A boot sequence and recovery of architected state from off-chip memory are not performed. Rather than fetch from a memory location pointed to by a reset base address register, the processor instead fetches an instruction from a memory location pointed to by a restored program counter of the retained subset of the architected state information.Type: GrantFiled: April 25, 2017Date of Patent: April 27, 2021Assignee: Apple Inc.Inventors: Bernard Joseph Semeria, John H. Mylius, Pradeep Kanapathipillai, Richard F. Russo, Shih-Chieh Wen, Richard H. Larson
-
Publication number: 20210064539Abstract: A system and method for efficiently transferring address mappings and data access permissions corresponding to the address mappings. A computing system includes at least one processor and memory for storing a page table. In response to receiving a memory access operation comprising a first address, the address translation unit is configured to identify a data access permission based on a permission index corresponding to the first address, and access data stored in a memory location of the memory identified by a second address in a manner defined by the retrieved data access permission. The address translation unit is configured to access a table to identify the data access permission, and is configured to determine the permission index and the second address based on the first address. A single permission index may correspond to different permissions for different entities within the system.Type: ApplicationFiled: May 15, 2020Publication date: March 4, 2021Inventors: Jeffry E. Gonion, Bernard Joseph Semeria, Michael J. Swift, Pradeep Kanapathipillai, David J. Williamson
-
Publication number: 20200272597Abstract: In an embodiment, a coprocessor may include a bypass indication which identifies execution circuitry that is not used by a given processor instruction, and thus may be bypassed. The corresponding circuitry may be disabled during execution, preventing evaluation when the output of the circuitry will not be used for the instruction. In another embodiment, the coprocessor may implement a grid of processing elements in rows and columns, where a given coprocessor instruction may specify an operation that causes up to all of the processing elements to operate on vectors of input operands to produce results. Implementations of the coprocessor may implement a portion of the processing elements. The coprocessor control circuitry may be designed to operate with the full grid or partial grid, reissuing instructions in the partial grid case to perform the requested operation. In still another embodiment, the coprocessor may be able to fuse vector mode operations.Type: ApplicationFiled: February 26, 2019Publication date: August 27, 2020Inventors: Aditya Kesiraju, Andrew J. Beaumont-Smith, Boris S. Alvarez-Heredia, Pradeep Kanapathipillai, Ran A. Chachick, Srikanth Balasubramanian
-
Patent number: 10725928Abstract: A system and method for efficiently performing maintenance on a cache. In various embodiments, control logic in a cache controller or elsewhere receives an indication for invalidating a range of virtual-to-physical mappings in a given translation lookaside buffer (TLB). The logic determines a first latency to invalidate entries of the TLB based on a number of addresses in the range and a number of supported page sizes simultaneously stored in the TLB. The logic determines a second latency based on a number of entries in the TLB. If the first latency is greater, then the logic traverses through each TLB entry and invalidates TLB entries storing a virtual address within the range. If the first latency is smaller, then the logic traverses through each address in the range and invalidates TLB entries storing a virtual address within the range.Type: GrantFiled: January 9, 2019Date of Patent: July 28, 2020Assignee: Apple Inc.Inventors: Brian R. Mestan, Pradeep Kanapathipillai, Joshua William Smith
-
Publication number: 20200218663Abstract: A system and method for efficiently performing maintenance on a cache. In various embodiments, control logic in a cache controller or elsewhere receives an indication for invalidating a range of virtual-to-physical mappings in a given translation lookaside buffer (TLB). The logic determines a first latency to invalidate entries of the TLB based on a number of addresses in the range and a number of supported page sizes simultaneously stored in the TLB. The logic determines a second latency based on a number of entries in the TLB. If the first latency is greater, then the logic traverses through each TLB entry and invalidates TLB entries storing a virtual address within the range. If the first latency is smaller, then the logic traverses through each address in the range and invalidates TLB entries storing a virtual address within the range.Type: ApplicationFiled: January 9, 2019Publication date: July 9, 2020Inventors: Brian R. Mestan, Pradeep Kanapathipillai, Joshua William Smith
-
Patent number: 10621100Abstract: In an embodiment, a processor may implement an access map-pattern match (AMPM)-based prefetch circuit for a multi-level cache system. The access patterns that are matched to the access maps may include prefetches for different cache levels. Centralizing the generation of prefetches into one prefetch circuit may provide better observability and controllability of prefetching at various levels of the cache hierarchy, in an embodiment. Prefetches at different levels may be controlled individually based on the accuracy of those prefetches, in an embodiment. Additionally, in an embodiment, access patterns that are longer that a given threshold may have the granularity of the prefetches change so that more data is prefetched and the prefetches occur farther in advance, in some embodiments.Type: GrantFiled: December 5, 2018Date of Patent: April 14, 2020Assignee: Apple Inc.Inventors: Stephan G. Meier, Tyler J. Huberty, Gerard R. Williams, III, Pradeep Kanapathipillai
-
Patent number: 10437595Abstract: Systems, apparatuses, and methods for optimizing a load-store dependency predictor (LSDP). When a younger load instruction is issued before an older store instruction and the younger load is dependent on the older store, the LSDP is trained on this ordering violation. A replay/flush indicator is stored in a corresponding entry in the LSDP to indicate whether the ordering violation resulted in a flush or replay. On subsequent executions, a dependency may be enforced for the load-store pair if a confidence counter is above a threshold, with the threshold varying based on the status of the replay/flush indicator. If a given load matches on multiple entries in the LSDP, and if at least one of the entries has a flush indicator, then the given load may be marked as a multimatch case and forced to wait to issue until all older stores have issued.Type: GrantFiled: March 15, 2016Date of Patent: October 8, 2019Assignee: Apple Inc.Inventors: Pradeep Kanapathipillai, Stephan G. Meier, Gerard R. Williams, III, Mridul Agarwal, Kulin N. Kothari
-
Patent number: 10228951Abstract: Systems, apparatuses, and methods for committing store instructions out of order from a store queue are described. A processor may store a first store instruction and a second store instruction in the store queue, wherein the first store instruction is older than the second store instruction. In response to determining the second store instruction is ready to commit to the memory hierarchy, the processor may allow the second store instruction to commit before the first store instruction, in response to determining that all store instructions in the store queue older than the second store instruction are non-speculative. However, if it is determined that at least one store instruction in the store queue older than the second store instruction is speculative, the processor may prevent the second store instruction from committing to the memory hierarchy before the first store instruction.Type: GrantFiled: August 20, 2015Date of Patent: March 12, 2019Assignee: Apple Inc.Inventors: Kulin N. Kothari, Mridul Agarwal, Pradeep Kanapathipillai
-
Patent number: 10180905Abstract: In an embodiment, a processor may implement an access map-pattern match (AMPM)-based prefetch circuit for a multi-level cache system. The access patterns that are matched to the access maps may include prefetches for different cache levels. Centralizing the generation of prefetches into one prefetch circuit may provide better observability and controllability of prefetching at various levels of the cache hierarchy, in an embodiment. Prefetches at different levels may be controlled individually based on the accuracy of those prefetches, in an embodiment. Additionally, in an embodiment, access patterns that are longer that a given threshold may have the granularity of the prefetches change so that more data is prefetched and the prefetches occur farther in advance, in some embodiments.Type: GrantFiled: April 7, 2016Date of Patent: January 15, 2019Assignee: Apple Inc.Inventors: Stephan G. Meier, Tyler J. Huberty, Gerard R. Williams, III, Pradeep Kanapathipillai