Patents by Inventor Shakti Kapoor

Shakti Kapoor has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240028518
    Abstract: A processor includes a load and store unit (LSU) and a cache memory, and transfers data information from a store queue in the LSU to the cache memory. The cache memory requests an information packet from the LSU when the cache memory determines that an available entry exists in a store queue within the cache memory. The LSU acknowledges the request and transfers an information packet to the cache memory. The LSU anticipates that an additional available entry exists in the cache memory, transmits an additional acknowledgement to the cache memory, and transfers an additional information packet, before receiving an additional request from the cache memory. The cache memory stores the additional information packet if an additional available entry exists in the cache store queue. The cache memory rejects the additional information packet if no additional available entries exist in the cache store queue.
    Type: Application
    Filed: July 21, 2022
    Publication date: January 25, 2024
    Inventors: Shakti Kapoor, Nelson Wu, Manoj Dusanapudi
  • Publication number: 20230195649
    Abstract: Systems and methods for invalidating page translation entries are described. A processing element may apply a delay to a drain cycle of a store reorder queue (SRQ) of a processing element. The processing element may drain the SRQ under the delayed drain cycle. The processing element may receive a translation lookaside buffer invalidation (TLBI) instruction from an interconnect connecting the plurality of processing elements. The TLBI instruction may be an instruction to invalidate a translation lookaside buffer (TLB) entry corresponding to at least one of a virtual memory page and a physical memory frame. The TLBI instruction may be broadcasted by another processing element. The application of the delay to the drain cycle of the SRQ may decrease a difference between the drain cycle of the SRQ and an invalidation cycle associated with the TLBI.
    Type: Application
    Filed: February 23, 2023
    Publication date: June 22, 2023
    Inventors: Shakti Kapoor, Nelson Wu, Manoj Dusanapudi
  • Publication number: 20230195981
    Abstract: A system, mechanism, tool, programming product, processor, and/or method for generating a hazard in a processor includes: identifying one or more cache lines to invalidate in a second level memory of a processing core in the processor; invalidating, in response to identifying one or more cache lines to invalidate in the second level cache, the one or more identified cache lines in the second level memory; and invalidating, in response to invalidating the one or more identified cache lines in the second level memory, the corresponding one or more cache lines in a first level memory. In an aspect the hazard generating mechanism is triggered, preferably on demand, and includes in an approach searching for cache lines in the second level memory that are also in the first level memory.
    Type: Application
    Filed: December 17, 2021
    Publication date: June 22, 2023
    Inventors: Shakti Kapoor, Nelson Wu, Manoj Dusanapudi
  • Publication number: 20230122466
    Abstract: Methods and systems for validating cache coherence in a data processing system are described. A processing element may detect a load instruction requesting the processing element to transfer data from a global memory location to a local memory location. The processing element may apply, in response to detecting the load instruction requesting the processing element to transfer data from the global memory location to the local memory location, a delay to the transfer of the data from the global memory location to the local memory location. The processing element may execute the load instruction and transferring the data from the global memory location to the local memory location with the applied delay. The processing element may validate, in response to executing the load instruction and transferring the data with the applied delay, a cache coherence of the data processing system.
    Type: Application
    Filed: October 20, 2021
    Publication date: April 20, 2023
    Inventors: Shakti Kapoor, Manoj Dusanapudi, Nelson Wu
  • Publication number: 20230105945
    Abstract: Systems and methods for invalidating page translation entries are described. A processing element may apply a delay to a drain cycle of a store reorder queue (SRQ) of a processing element. The processing element may drain the SRQ under the delayed drain cycle. The processing element may receive a translation lookaside buffer invalidation (TLBI) instruction from an interconnect connecting the plurality of processing elements. The TLBI instruction may be an instruction to invalidate a translation lookaside buffer (TLB) entry corresponding to at least one of a virtual memory page and a physical memory frame. The TLBI instruction may be broadcasted by another processing element. The application of the delay to the drain cycle of the SRQ may decrease a difference between the drain cycle of the SRQ and an invalidation cycle associated with the TLBI.
    Type: Application
    Filed: October 4, 2021
    Publication date: April 6, 2023
    Inventors: Shakti Kapoor, Nelson Wu, Manoj Dusanapudi
  • Patent number: 11620235
    Abstract: Systems and methods for invalidating page translation entries are described. A processing element may apply a delay to a drain cycle of a store reorder queue (SRQ) of a processing element. The processing element may drain the SRQ under the delayed drain cycle. The processing element may receive a translation lookaside buffer invalidation (TLBI) instruction from an interconnect connecting the plurality of processing elements. The TLBI instruction may be an instruction to invalidate a translation lookaside buffer (TLB) entry corresponding to at least one of a virtual memory page and a physical memory frame. The TLBI instruction may be broadcasted by another processing element. The application of the delay to the drain cycle of the SRQ may decrease a difference between the drain cycle of the SRQ and an invalidation cycle associated with the TLBI.
    Type: Grant
    Filed: October 4, 2021
    Date of Patent: April 4, 2023
    Assignee: International Business Machines Corporation
    Inventors: Shakti Kapoor, Nelson Wu, Manoj Dusanapudi
  • Patent number: 11501046
    Abstract: A system is provided to validate a computer processor. The system includes a computing system configured to obtain core dump data including executable instructions corresponding to a code stored in a legacy processor. An instruction-level simulator is installed in the computing system and is configured to simulate the executable instructions to generate a plurality of instruction traces. The system further includes a pre-silicon chip model simulator configured to execute the instruction traces to generate performance data. The computer processor is verified based at least in part on the performance data.
    Type: Grant
    Filed: March 24, 2020
    Date of Patent: November 15, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Nelson Wu, Daniel Isaac Rodriguez, Miguel Gomez Gonzalez, Shakti Kapoor
  • Patent number: 11347505
    Abstract: A processor includes a performance monitor that logs reservation losses, and additionally logs reasons for the reservation losses. By logging reasons for the reservation losses, the performance monitor provides data that can be used to determine whether the reservation losses were due to valid programming, such as two threads competing for the same lock, or whether the reservation losses were due to bad programming. When the reservation losses are due to bad programming, the information can be used to improve the programming to obtain better performance.
    Type: Grant
    Filed: January 15, 2020
    Date of Patent: May 31, 2022
    Assignee: International Business Machines Corporation
    Inventors: Shakti Kapoor, Karen E. Yokum, John A. Schumann
  • Patent number: 11163704
    Abstract: Disclosed is a method, apparatus, and/or computer program product for reducing latency in a processor with regard to the execution of noncacheable operations that includes receiving noncacheable operations from one or both of the level 2 cache and a level 3 cache, sending the noncacheable operations to a noncacheable unit (NCU) associated with a core of the processor, executing the noncacheable operations by the NCU, and sending results of the executed noncacheable operations to a host bridge for output to an input/out device. The noncacheable operations bypass the core of the processor.
    Type: Grant
    Filed: October 9, 2019
    Date of Patent: November 2, 2021
    Assignee: International Business Machines Corporation
    Inventor: Shakti Kapoor
  • Patent number: 11151011
    Abstract: A computing system includes a core system and an uncore system. The core system includes a packet generator unit configured to generate a data packet having a plurality of bytes defining a target packet size, and to output a first byte among the plurality of bytes at a packet delivery start time. The uncore system includes an input/output (I/O) bridge configured to connect an I/O component to the core system, and a packet monitor unit configured to monitor the bytes delivered from the packet generator unit to the I/O component. The packet monitor unit further determines a packet delivery end time after detecting a last byte of the data packet. The computing system determines a latency attributed to the uncore system and the I/O bridge based on the packet delivery start time and the packet delivery end time.
    Type: Grant
    Filed: October 1, 2019
    Date of Patent: October 19, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Shakti Kapoor, Daniel Isaac Rodriguez, Miguel Gomez Gonzalez, Anatoli Andreev
  • Publication number: 20210303766
    Abstract: A system is provided to validate a computer processor. The system includes a computing system configured to obtain core dump data including executable instructions corresponding to a code stored in a legacy processor. An instruction-level simulator is installed in the computing system and is configured to simulate the executable instructions to generate a plurality of instruction traces. The system further includes a pre-silicon chip model simulator configured to execute the instruction traces to generate performance data. The computer processor is verified based at least in part on the performance data.
    Type: Application
    Filed: March 24, 2020
    Publication date: September 30, 2021
    Inventors: Nelson Wu, Daniel Isaac Rodriguez, Miguel Gomez Gonzalez, Shakti Kapoor
  • Patent number: 11094391
    Abstract: A processor memory is stress tested with a variable list insertion depth using list insertion test segments with non-naturally aligned data boundaries. List insertion test segments are interspersed into test code of a processor memory tests to change the list insertion depth without changing results of the test code. The list insertion test segments are the same structure as the segments of the test code and have non-naturally aligned boundaries. The list insertion test segments include list insertion segments and load/store segments. The list insertion segments locate a current memory location using a fixed segment at a known location. The load/store segments load and store list elements in memory.
    Type: Grant
    Filed: June 6, 2019
    Date of Patent: August 17, 2021
    Assignee: International Business Machines Corporation
    Inventors: Manoj Dusanapudi, Shakti Kapoor, Nelson Wu
  • Patent number: 11061821
    Abstract: Disclosed is a system, method and/or computer product that includes generating translation requests that are identical but have different expected results, transmitting the translation requests from a MMU tester to a non-core MMU disposed on a processor chip, where the non-core MMU is external to a processing core of the processor chip, and where the MMU tester is disposed on a computing component external to the processor chip. The method also includes receiving memory translation results from the non-core MMU at the MMU tester, comparing the results to determine if there is a flaw in the non-core MMU.
    Type: Grant
    Filed: November 25, 2019
    Date of Patent: July 13, 2021
    Assignee: International Business Machines Corporation
    Inventors: Manoj Dusanapudi, Shakti Kapoor, Nelson Wu
  • Patent number: 10983798
    Abstract: Embodiments of the invention are directed to methods for handling cache. The method includes retrieving a plurality of instructions from a cache. The method further includes placing the plurality of instructions into an instruction fetch buffer. The method includes retrieving a first instruction of the plurality of instructions from the instruction fetch buffer. The method includes executing the first instruction. The method includes retrieving a second instruction from the plurality of instructions from the instruction fetch buffer unless a back invalidate is received from the cache. Thereafter executing the second instruction without refreshing the instruction fetch buffer from the cache.
    Type: Grant
    Filed: November 15, 2017
    Date of Patent: April 20, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Shakti Kapoor
  • Patent number: 10977043
    Abstract: Embodiments of the invention are directed to methods for handling cache. The method includes retrieving a plurality of instructions from a cache. The method further includes placing the plurality of instructions into an instruction fetch buffer. The method includes retrieving a first instruction of the plurality of instructions from the instruction fetch buffer. The method includes executing the first instruction. The method includes retrieving a second instruction from the plurality of instructions from the instruction fetch buffer unless a back invalidate is received from the cache. Thereafter executing the second instruction without refreshing the instruction fetch buffer from the cache.
    Type: Grant
    Filed: September 14, 2017
    Date of Patent: April 13, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Shakti Kapoor
  • Publication number: 20210099373
    Abstract: A computing system includes a core system and an uncore system. The core system includes a packet generator unit configured to generate a data packet having a plurality of bytes defining a target packet size, and to output a first byte among the plurality of bytes at a packet delivery start time. The uncore system includes an input/output (I/O) bridge configured to connect an I/O component to the core system, and a packet monitor unit configured to monitor the bytes delivered from the packet generator unit to the I/O component. The packet monitor unit further determines a packet delivery end time after detecting a last byte of the data packet. The computing system determines a latency attributed to the uncore system and the I/O bridge based on the packet delivery start time and the packet delivery end time.
    Type: Application
    Filed: October 1, 2019
    Publication date: April 1, 2021
    Inventors: Shakti Kapoor, Daniel Isaac Rodriguez, Miguel Gomez Gonzalez, Anatoli Andreev
  • Patent number: 10877864
    Abstract: A processor memory is stress tested with a variable link stack depth using test code segments and link stack test segments on non-naturally aligned data boundaries. Link stack test segments are interspersed into test code segments of a processor memory test to change the link stack depth without changing results of the test code. The link stack test segments include branch to target, push/pop, push and pop segments. The depth of the link stack is varied independent of the memory test code by changing the number to branches in the branch to target segment and varying the number of the push/pop segments. The link stack test segments and test segments may be placed randomly with a recursive algorithm to intersperse the link stack test segments in the test code segments and to reduce the amount of data to be saved and restored for all subroutine calls, push and pop segments.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: December 29, 2020
    Assignee: International Business Machines Corporation
    Inventors: Shakti Kapoor, Manoj Dusanapudi
  • Patent number: 10748637
    Abstract: A system comprising a computer processor comprising a plurality of registers, a load-store unit configured to load data in at least one of the plurality of registers, and a memory. The memory includes a memory location mapped to a first virtual memory address and a second virtual memory address. Issuance of a load from the memory location via the first virtual memory address causes execution of a side effect. The memory also includes a computer program containing programming instructions that, when executed by the computer processor, performs an operation that includes storing a predetermined data value at the memory location, and testing the memory for errors during load operations.
    Type: Grant
    Filed: July 26, 2018
    Date of Patent: August 18, 2020
    Assignee: International Business Machines Corporation
    Inventors: Nelson Wu, Manoj Dusanapudi, Shakti Kapoor, Nandhini Rajaiah
  • Patent number: 10713179
    Abstract: Efficiently generating effective address translations for memory management test cases including obtaining a first set of EAs, wherein each EA comprises an effective segment ID and a page, wherein each effective segment ID of each EA in the first set of EAs is mapped to a same first effective segment; obtaining a set of virtual address corresponding to the first set of EAs; translating the first set of EAs by applying a hash function to each virtual address in the set of virtual addresses to obtain a first set of PTEG addresses mapped to a first set of PTEGs; and generating a translation for a second set of EAs to obtain a second set of PTEG addresses mapped to the first set of PTEGs.
    Type: Grant
    Filed: April 2, 2019
    Date of Patent: July 14, 2020
    Assignee: International Business Machines Corporation
    Inventors: Manoj Dusanapudi, Shakti Kapoor
  • Publication number: 20200201728
    Abstract: A processor memory is stress tested with a variable link stack depth using test code segments and link stack test segments on non-naturally aligned data boundaries. Link stack test segments are interspersed into test code segments of a processor memory test to change the link stack depth without changing results of the test code. The link stack test segments include branch to target, push/pop, push and pop segments. The depth of the link stack is varied independent of the memory test code by changing the number to branches in the branch to target segment and varying the number of the push/pop segments. The link stack test segments and test segments may be placed randomly with a recursive algorithm to intersperse the link stack test segments in the test code segments and to reduce the amount of data to be saved and restored for all subroutine calls, push and pop segments.
    Type: Application
    Filed: December 20, 2018
    Publication date: June 25, 2020
    Inventors: Shakti Kapoor, Manoj Dusanapudi