Patents by Inventor Shakti Kapoor
Shakti Kapoor has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240028518
Abstract: A processor includes a load and store unit (LSU) and a cache memory, and transfers data information from a store queue in the LSU to the cache memory. The cache memory requests an information packet from the LSU when the cache memory determines that an available entry exists in a store queue within the cache memory. The LSU acknowledges the request and transfers an information packet to the cache memory. The LSU anticipates that an additional available entry exists in the cache memory, transmits an additional acknowledgement to the cache memory, and transfers an additional information packet, before receiving an additional request from the cache memory. The cache memory stores the additional information packet if an additional available entry exists in the cache store queue. The cache memory rejects the additional information packet if no additional available entries exist in the cache store queue.
Type: Application
Filed: July 21, 2022
Publication date: January 25, 2024
Inventors: Shakti Kapoor, Nelson Wu, Manoj Dusanapudi
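The request-then-speculate handshake in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware design: the names `CacheStoreQueue`, `offer`, and `transfer` are hypothetical, and leaving a rejected packet in the LSU queue for a later round is an assumption the abstract does not state.

```python
from collections import deque


class CacheStoreQueue:
    """Hypothetical model of the store queue inside the cache memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def has_free_entry(self):
        return len(self.entries) < self.capacity

    def offer(self, packet):
        """Store the packet if an entry is free; otherwise reject it."""
        if self.has_free_entry():
            self.entries.append(packet)
            return True
        return False


def transfer(lsu_queue, cache_queue):
    """Run one round: a requested packet, then one speculative packet.

    Returns the list of packets the cache accepted in this round.
    """
    accepted = []
    # The cache only issues a request when it has an available entry.
    if not lsu_queue or not cache_queue.has_free_entry():
        return accepted
    # Requested transfer: the LSU acknowledges and sends its oldest packet.
    packet = lsu_queue.popleft()
    cache_queue.offer(packet)
    accepted.append(packet)
    # Speculative transfer: the LSU sends another packet without a new request.
    if lsu_queue:
        extra = lsu_queue[0]
        if cache_queue.offer(extra):
            lsu_queue.popleft()
            accepted.append(extra)
        # Assumption: a rejected packet stays in the LSU queue for a later round.
    return accepted
```

With a two-entry cache queue, one round moves two packets for a single request; with a one-entry queue, the speculative packet is rejected and remains with the LSU.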
-
Publication number: 20230195649
Abstract: Systems and methods for invalidating page translation entries are described. A processing element may apply a delay to a drain cycle of a store reorder queue (SRQ) of a processing element. The processing element may drain the SRQ under the delayed drain cycle. The processing element may receive a translation lookaside buffer invalidation (TLBI) instruction from an interconnect connecting the plurality of processing elements. The TLBI instruction may be an instruction to invalidate a translation lookaside buffer (TLB) entry corresponding to at least one of a virtual memory page and a physical memory frame. The TLBI instruction may be broadcasted by another processing element. The application of the delay to the drain cycle of the SRQ may decrease a difference between the drain cycle of the SRQ and an invalidation cycle associated with the TLBI.
Type: Application
Filed: February 23, 2023
Publication date: June 22, 2023
Inventors: Shakti Kapoor, Nelson Wu, Manoj Dusanapudi
-
Publication number: 20230195981
Abstract: A system, mechanism, tool, programming product, processor, and/or method for generating a hazard in a processor includes: identifying one or more cache lines to invalidate in a second level memory of a processing core in the processor; invalidating, in response to identifying one or more cache lines to invalidate in the second level cache, the one or more identified cache lines in the second level memory; and invalidating, in response to invalidating the one or more identified cache lines in the second level memory, the corresponding one or more cache lines in a first level memory. In an aspect the hazard generating mechanism is triggered, preferably on demand, and includes in an approach searching for cache lines in the second level memory that are also in the first level memory.
Type: Application
Filed: December 17, 2021
Publication date: June 22, 2023
Inventors: Shakti Kapoor, Nelson Wu, Manoj Dusanapudi
-
Publication number: 20230122466
Abstract: Methods and systems for validating cache coherence in a data processing system are described. A processing element may detect a load instruction requesting the processing element to transfer data from a global memory location to a local memory location. The processing element may apply, in response to detecting the load instruction requesting the processing element to transfer data from the global memory location to the local memory location, a delay to the transfer of the data from the global memory location to the local memory location. The processing element may execute the load instruction and transferring the data from the global memory location to the local memory location with the applied delay. The processing element may validate, in response to executing the load instruction and transferring the data with the applied delay, a cache coherence of the data processing system.
Type: Application
Filed: October 20, 2021
Publication date: April 20, 2023
Inventors: Shakti Kapoor, Manoj Dusanapudi, Nelson Wu
-
Publication number: 20230105945
Abstract: Systems and methods for invalidating page translation entries are described. A processing element may apply a delay to a drain cycle of a store reorder queue (SRQ) of a processing element. The processing element may drain the SRQ under the delayed drain cycle. The processing element may receive a translation lookaside buffer invalidation (TLBI) instruction from an interconnect connecting the plurality of processing elements. The TLBI instruction may be an instruction to invalidate a translation lookaside buffer (TLB) entry corresponding to at least one of a virtual memory page and a physical memory frame. The TLBI instruction may be broadcasted by another processing element. The application of the delay to the drain cycle of the SRQ may decrease a difference between the drain cycle of the SRQ and an invalidation cycle associated with the TLBI.
Type: Application
Filed: October 4, 2021
Publication date: April 6, 2023
Inventors: Shakti Kapoor, Nelson Wu, Manoj Dusanapudi
-
Patent number: 11620235
Abstract: Systems and methods for invalidating page translation entries are described. A processing element may apply a delay to a drain cycle of a store reorder queue (SRQ) of a processing element. The processing element may drain the SRQ under the delayed drain cycle. The processing element may receive a translation lookaside buffer invalidation (TLBI) instruction from an interconnect connecting the plurality of processing elements. The TLBI instruction may be an instruction to invalidate a translation lookaside buffer (TLB) entry corresponding to at least one of a virtual memory page and a physical memory frame. The TLBI instruction may be broadcasted by another processing element. The application of the delay to the drain cycle of the SRQ may decrease a difference between the drain cycle of the SRQ and an invalidation cycle associated with the TLBI.
Type: Grant
Filed: October 4, 2021
Date of Patent: April 4, 2023
Assignee: International Business Machines Corporation
Inventors: Shakti Kapoor, Nelson Wu, Manoj Dusanapudi
-
Patent number: 11501046
Abstract: A system is provided to validate a computer processor. The system includes a computing system configured to obtain core dump data including executable instructions corresponding to a code stored in a legacy processor. An instruction-level simulator is installed in the computing system and is configured to simulate the executable instructions to generate a plurality of instruction traces. The system further includes a pre-silicon chip model simulator configured to execute the instruction traces to generate performance data. The computer processor is verified based at least in part on the performance data.
Type: Grant
Filed: March 24, 2020
Date of Patent: November 15, 2022
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Nelson Wu, Daniel Isaac Rodriguez, Miguel Gomez Gonzalez, Shakti Kapoor
-
Patent number: 11347505
Abstract: A processor includes a performance monitor that logs reservation losses, and additionally logs reasons for the reservation losses. By logging reasons for the reservation losses, the performance monitor provides data that can be used to determine whether the reservation losses were due to valid programming, such as two threads competing for the same lock, or whether the reservation losses were due to bad programming. When the reservation losses are due to bad programming, the information can be used to improve the programming to obtain better performance.
Type: Grant
Filed: January 15, 2020
Date of Patent: May 31, 2022
Assignee: International Business Machines Corporation
Inventors: Shakti Kapoor, Karen E. Yokum, John A. Schumann
-
Patent number: 11163704
Abstract: Disclosed is a method, apparatus, and/or computer program product for reducing latency in a processor with regard to the execution of noncacheable operations that includes receiving noncacheable operations from one or both of the level 2 cache and a level 3 cache, sending the noncacheable operations to a noncacheable unit (NCU) associated with a core of the processor, executing the noncacheable operations by the NCU, and sending results of the executed noncacheable operations to a host bridge for output to an input/out device. The noncacheable operations bypass the core of the processor.
Type: Grant
Filed: October 9, 2019
Date of Patent: November 2, 2021
Assignee: International Business Machines Corporation
Inventor: Shakti Kapoor
-
Patent number: 11151011
Abstract: A computing system includes a core system and an uncore system. The core system includes a packet generator unit configured to generate a data packet having a plurality of bytes defining a target packet size, and to output a first byte among the plurality of bytes at a packet delivery start time. The uncore system includes an input/output (I/O) bridge configured to connect an I/O component to the core system, and a packet monitor unit configured to monitor the bytes delivered from the packet generator unit to the I/O component. The packet monitor unit further determines a packet delivery end time after detecting a last byte of the data packet. The computing system determines a latency attributed to the uncore system and the I/O bridge based on the packet delivery start time and the packet delivery end time.
Type: Grant
Filed: October 1, 2019
Date of Patent: October 19, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Shakti Kapoor, Daniel Isaac Rodriguez, Miguel Gomez Gonzalez, Anatoli Andreev
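The start-time and end-time measurement in this abstract can be sketched in software as follows. This is an illustrative model only, assuming abstract integer timestamps and treating the latency as the simple difference between the two times; `PacketMonitor`, `observe`, and `delivery_latency` are hypothetical names, not taken from the patent.

```python
class PacketMonitor:
    """Hypothetical monitor that counts delivered bytes of one packet."""

    def __init__(self, target_size):
        self.target_size = target_size  # number of bytes in the packet
        self.seen = 0
        self.end_time = None

    def observe(self, byte_time):
        """Record one delivered byte; latch the time of the last byte."""
        self.seen += 1
        if self.seen == self.target_size:
            self.end_time = byte_time


def delivery_latency(start_time, arrival_times, target_size):
    """Return end time minus start time for one packet.

    start_time is when the generator output the first byte;
    arrival_times are the times the monitor saw each byte delivered.
    """
    monitor = PacketMonitor(target_size)
    for t in arrival_times:
        monitor.observe(t)
    if monitor.end_time is None:
        raise ValueError("last byte of the packet was never observed")
    return monitor.end_time - start_time
```

For example, a four-byte packet started at time 100 whose last byte is observed at time 135 yields a latency of 35 time units.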
-
Publication number: 20210303766
Abstract: A system is provided to validate a computer processor. The system includes a computing system configured to obtain core dump data including executable instructions corresponding to a code stored in a legacy processor. An instruction-level simulator is installed in the computing system and is configured to simulate the executable instructions to generate a plurality of instruction traces. The system further includes a pre-silicon chip model simulator configured to execute the instruction traces to generate performance data. The computer processor is verified based at least in part on the performance data.
Type: Application
Filed: March 24, 2020
Publication date: September 30, 2021
Inventors: Nelson Wu, Daniel Isaac Rodriguez, Miguel Gomez Gonzalez, Shakti Kapoor
-
Patent number: 11094391
Abstract: A processor memory is stress tested with a variable list insertion depth using list insertion test segments with non-naturally aligned data boundaries. List insertion test segments are interspersed into test code of a processor memory tests to change the list insertion depth without changing results of the test code. The list insertion test segments are the same structure as the segments of the test code and have non-naturally aligned boundaries. The list insertion test segments include list insertion segments and load/store segments. The list insertion segments locate a current memory location using a fixed segment at a known location. The load/store segments load and store list elements in memory.
Type: Grant
Filed: June 6, 2019
Date of Patent: August 17, 2021
Assignee: International Business Machines Corporation
Inventors: Manoj Dusanapudi, Shakti Kapoor, Nelson Wu
-
Patent number: 11061821
Abstract: Disclosed is a system, method and/or computer product that includes generating translation requests that are identical but have different expected results, transmitting the translation requests from a MMU tester to a non-core MMU disposed on a processor chip, where the non-core MMU is external to a processing core of the processor chip, and where the MMU tester is disposed on a computing component external to the processor chip. The method also includes receiving memory translation results from the non-core MMU at the MMU tester, comparing the results to determine if there is a flaw in the non-core MMU.
Type: Grant
Filed: November 25, 2019
Date of Patent: July 13, 2021
Assignee: International Business Machines Corporation
Inventors: Manoj Dusanapudi, Shakti Kapoor, Nelson Wu
-
Patent number: 10983798
Abstract: Embodiments of the invention are directed to methods for handling cache. The method includes retrieving a plurality of instructions from a cache. The method further includes placing the plurality of instructions into an instruction fetch buffer. The method includes retrieving a first instruction of the plurality of instructions from the instruction fetch buffer. The method includes executing the first instruction. The method includes retrieving a second instruction from the plurality of instructions from the instruction fetch buffer unless a back invalidate is received from the cache. Thereafter executing the second instruction without refreshing the instruction fetch buffer from the cache.
Type: Grant
Filed: November 15, 2017
Date of Patent: April 20, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Shakti Kapoor
-
Patent number: 10977043
Abstract: Embodiments of the invention are directed to methods for handling cache. The method includes retrieving a plurality of instructions from a cache. The method further includes placing the plurality of instructions into an instruction fetch buffer. The method includes retrieving a first instruction of the plurality of instructions from the instruction fetch buffer. The method includes executing the first instruction. The method includes retrieving a second instruction from the plurality of instructions from the instruction fetch buffer unless a back invalidate is received from the cache. Thereafter executing the second instruction without refreshing the instruction fetch buffer from the cache.
Type: Grant
Filed: September 14, 2017
Date of Patent: April 13, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Shakti Kapoor
-
Publication number: 20210099373
Abstract: A computing system includes a core system and an uncore system. The core system includes a packet generator unit configured to generate a data packet having a plurality of bytes defining a target packet size, and to output a first byte among the plurality of bytes at a packet delivery start time. The uncore system includes an input/output (I/O) bridge configured to connect an I/O component to the core system, and a packet monitor unit configured to monitor the bytes delivered from the packet generator unit to the I/O component. The packet monitor unit further determines a packet delivery end time after detecting a last byte of the data packet. The computing system determines a latency attributed to the uncore system and the I/O bridge based on the packet delivery start time and the packet delivery end time.
Type: Application
Filed: October 1, 2019
Publication date: April 1, 2021
Inventors: Shakti Kapoor, Daniel Isaac Rodriguez, Miguel Gomez Gonzalez, Anatoli Andreev
-
Patent number: 10877864
Abstract: A processor memory is stress tested with a variable link stack depth using test code segments and link stack test segments on non-naturally aligned data boundaries. Link stack test segments are interspersed into test code segments of a processor memory test to change the link stack depth without changing results of the test code. The link stack test segments include branch to target, push/pop, push and pop segments. The depth of the link stack is varied independent of the memory test code by changing the number to branches in the branch to target segment and varying the number of the push/pop segments. The link stack test segments and test segments may be placed randomly with a recursive algorithm to intersperse the link stack test segments in the test code segments and to reduce the amount of data to be saved and restored for all subroutine calls, push and pop segments.
Type: Grant
Filed: December 20, 2018
Date of Patent: December 29, 2020
Assignee: International Business Machines Corporation
Inventors: Shakti Kapoor, Manoj Dusanapudi
-
Patent number: 10748637
Abstract: A system comprising a computer processor comprising a plurality of registers, a load-store unit configured to load data in at least one of the plurality of registers, and a memory. The memory includes a memory location mapped to a first virtual memory address and a second virtual memory address. Issuance of a load from the memory location via the first virtual memory address causes execution of a side effect. The memory also includes a computer program containing programming instructions that, when executed by the computer processor, performs an operation that includes storing a predetermined data value at the memory location, and testing the memory for errors during load operations.
Type: Grant
Filed: July 26, 2018
Date of Patent: August 18, 2020
Assignee: International Business Machines Corporation
Inventors: Nelson Wu, Manoj Dusanapudi, Shakti Kapoor, Nandhini Rajaiah
-
Patent number: 10713179
Abstract: Efficiently generating effective address translations for memory management test cases including obtaining a first set of EAs, wherein each EA comprises an effective segment ID and a page, wherein each effective segment ID of each EA in the first set of EAs is mapped to a same first effective segment; obtaining a set of virtual address corresponding to the first set of EAs; translating the first set of EAs by applying a hash function to each virtual address in the set of virtual addresses to obtain a first set of PTEG addresses mapped to a first set of PTEGs; and generating a translation for a second set of EAs to obtain a second set of PTEG addresses mapped to the first set of PTEGs.
Type: Grant
Filed: April 2, 2019
Date of Patent: July 14, 2020
Assignee: International Business Machines Corporation
Inventors: Manoj Dusanapudi, Shakti Kapoor
-
Publication number: 20200201728
Abstract: A processor memory is stress tested with a variable link stack depth using test code segments and link stack test segments on non-naturally aligned data boundaries. Link stack test segments are interspersed into test code segments of a processor memory test to change the link stack depth without changing results of the test code. The link stack test segments include branch to target, push/pop, push and pop segments. The depth of the link stack is varied independent of the memory test code by changing the number to branches in the branch to target segment and varying the number of the push/pop segments. The link stack test segments and test segments may be placed randomly with a recursive algorithm to intersperse the link stack test segments in the test code segments and to reduce the amount of data to be saved and restored for all subroutine calls, push and pop segments.
Type: Application
Filed: December 20, 2018
Publication date: June 25, 2020
Inventors: Shakti Kapoor, Manoj Dusanapudi