Patents by Inventor Sudarshan Kadambi

Sudarshan Kadambi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7493451
    Abstract: In one embodiment, a processor comprises a prefetch unit coupled to a data cache. The prefetch unit is configured to concurrently maintain a plurality of separate, active prefetch streams. Each prefetch stream is either software initiated via execution by the processor of a dedicated prefetch instruction or hardware initiated via detection of a data cache miss by one or more load/store memory operations. The prefetch unit is further configured to generate prefetch requests responsive to the plurality of prefetch streams to prefetch data into the data cache.
    Type: Grant
    Filed: June 15, 2006
    Date of Patent: February 17, 2009
    Assignee: P.A. Semi, Inc.
    Inventors: Sudarshan Kadambi, Puneet Kumar, Po-Yung Chang
  • Patent number: 7487296
    Abstract: A multi-stride prefetcher includes a recurring prefetch table that in turn includes a stream table and an index table. The stream table includes a valid field and a tag field. The stream table also includes a thread number field to help support multi-threaded processor cores. The tag field stores a tag from an address associated with a cache miss. The index table includes fields for storing information characterizing a state machine. The fields include a learning bit. The multi-stride prefetcher prefetches data into a cache for a plurality of streams of cache misses, each stream having a plurality of strides.
    Type: Grant
    Filed: February 17, 2005
    Date of Patent: February 3, 2009
    Assignee: Sun Microsystems, Inc.
    Inventors: Sorin Iacobovici, Sudarshan Kadambi, Yuan C. Chou
  • Publication number: 20080307166
    Abstract: In one embodiment, a processor may be configured to write ECC granular stores into the data cache, while non-ECC granular stores may be merged with cache data in a memory request buffer. In one embodiment, a processor may be configured to detect that a victim block writeback hits one or more stores in a memory request buffer (or vice versa) and may convert the victim block writeback to a fill. In one embodiment, a processor may speculatively issue stores that are subsequent to a load from a load/store queue, but prevent the update for the stores in response to a snoop hit on the load.
    Type: Application
    Filed: June 5, 2007
    Publication date: December 11, 2008
    Inventors: Ramesh Gunna, Po-Yung Chang, Sudarshan Kadambi
  • Publication number: 20080307167
    Abstract: In one embodiment, a processor may be configured to write ECC granular stores into the data cache, while non-ECC granular stores may be merged with cache data in a memory request buffer. In one embodiment, a processor may be configured to detect that a victim block writeback hits one or more stores in a memory request buffer (or vice versa) and may convert the victim block writeback to a fill. In one embodiment, a processor may speculatively issue stores that are subsequent to a load from a load/store queue, but prevent the update for the stores in response to a snoop hit on the load.
    Type: Application
    Filed: June 5, 2007
    Publication date: December 11, 2008
    Inventors: Ramesh Gunna, Sudarshan Kadambi
  • Publication number: 20080177988
    Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
    Type: Application
    Filed: March 25, 2008
    Publication date: July 24, 2008
    Inventors: Sudarshan Kadambi, Po-Yung Chang, Eric Hao
  • Patent number: 7376817
    Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
    Type: Grant
    Filed: August 10, 2005
    Date of Patent: May 20, 2008
    Assignee: P.A. Semi, Inc.
    Inventors: Sudarshan Kadambi, Po-Yung Chang, Eric Hao
  • Publication number: 20070294482
    Abstract: In one embodiment, a processor comprises a prefetch unit coupled to a data cache. The prefetch unit is configured to concurrently maintain a plurality of separate, active prefetch streams. Each prefetch stream is either software initiated via execution by the processor of a dedicated prefetch instruction or hardware initiated via detection of a data cache miss by one or more load/store memory operations. The prefetch unit is further configured to generate prefetch requests responsive to the plurality of prefetch streams to prefetch data into the data cache.
    Type: Application
    Filed: June 15, 2006
    Publication date: December 20, 2007
    Applicant: P.A. Semi, Inc.
    Inventors: Sudarshan Kadambi, Puneet Kumar, Po-Yung Chang
  • Patent number: 7237096
    Abstract: If a consumer instruction specifies a 64 bit source register comprised of results provided by two 32 bit producer instructions, the number of dependencies that must be tracked per source register can be decreased by transforming one or more of the 32 bit producer instructions so that rather than simply storing its result in a 32 bit destination register, the transformed instruction stores its result into a 64 bit logical register along with another 32 bit value held in another 32 bit register.
    Type: Grant
    Filed: April 5, 2004
    Date of Patent: June 26, 2007
    Assignee: Sun Microsystems, Inc.
    Inventors: Julian A. Prabhu, Atul Kalambur, Sudarshan Kadambi, Daniel L. Liebholz, Julie M. Staraitis
  • Publication number: 20070113020
    Abstract: In one embodiment, a processor comprises a core configured to execute a data cache block write instruction and an interface unit coupled to the core and to an interconnect on which the processor is configured to communicate. The core is configured to transmit a request to the interface unit in response to the data cache block write instruction. If the request is speculative, the interface unit is configured to issue a first transaction on the interconnect. On the other hand, if the request is non-speculative, the interface unit is configured to issue a second transaction on the interconnect. The second transaction is different from the first transaction. For example, the second transaction may be an invalidate transaction and the first transaction may be a probe transaction. In some embodiments, the processor may be in a system including the interconnect and one or more caching agents.
    Type: Application
    Filed: November 17, 2005
    Publication date: May 17, 2007
    Applicant: P.A. Semi, Inc.
    Inventors: Ramesh Gunna, Sudarshan Kadambi, Peter Bannon
  • Publication number: 20070038846
    Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
    Type: Application
    Filed: August 10, 2005
    Publication date: February 15, 2007
    Applicant: P.A. Semi, Inc.
    Inventors: Sudarshan Kadambi, Po-Yung Chang, Eric Hao
  • Publication number: 20060248319
    Abstract: A processor avoids or eliminates repetitive replay conditions and frequent instruction resteering through various techniques including resteering the fetch after the branch instruction retires, and delaying branch resolution. A processor resolves conditional branches and avoids repetitive resteering by delaying branch resolution. The processor has an instruction pipeline with inserted delay in branch condition and replay control pathways. For example, an instruction sequence that includes a load instruction followed by a subtract instruction then a conditional branch, delays branch resolution to allow time for analysis to determine whether the conditional branch has resolved correctly. Eliminating incorrect branch resolutions prevents flushing of correctly predicted branches.
    Type: Application
    Filed: July 10, 2006
    Publication date: November 2, 2006
    Applicant: Sun Microsystems, Inc.
    Inventor: Sudarshan Kadambi
  • Patent number: 7076640
    Abstract: A processor avoids or eliminates repetitive replay conditions and frequent instruction resteering through various techniques including resteering the fetch after the branch instruction retires, and delaying branch resolution. A processor resolves conditional branches and avoids repetitive resteering by delaying branch resolution. The processor has an instruction pipeline with inserted delay in branch condition and replay control pathways. For example, an instruction sequence that includes a load instruction followed by a subtract instruction then a conditional branch, delays branch resolution to allow time for analysis to determine whether the conditional branch has resolved correctly. Eliminating incorrect branch resolutions prevents flushing of correctly predicted branches.
    Type: Grant
    Filed: March 11, 2002
    Date of Patent: July 11, 2006
    Assignee: Sun Microsystems, Inc.
    Inventor: Sudarshan Kadambi
  • Patent number: 7055021
    Abstract: A pipelined processor includes a dependency scoreboard that tracks dependency for replay of instructions capable of executing out-of-order. Early instructions are termed “producers” that produce data for later dependent instructions. The subsequent instructions are “consumers” that consume the data produced by the producer instructions. The dependency scoreboard is a table of storage cells that tracks producers and consumers and designates whether a particular instruction is dependent on a producer. Active instructions are allocated storage elements for all active instructions. For example, a dependency scoreboard for tracking N active instructions will have N dependency storage cells for ones of the N active instructions. The storage cells for an active instruction may be set for each active instruction that is a “producer” instruction and all levels of dependency are tracked in each cycle.
    Type: Grant
    Filed: March 11, 2002
    Date of Patent: May 30, 2006
    Assignee: Sun Microsystems, Inc.
    Inventor: Sudarshan Kadambi
  • Patent number: 7010648
    Abstract: A cache pollution avoidance unit includes a dynamic memory dependency table for storing a dependency state condition between a first load instruction and a sequentially later second load instruction, which may depend on the completion of execution of the first load instruction for operand data. The cache pollution avoidance unit logically ANDs the dependency state condition stored in the dynamic memory dependency table with a cache memory “miss” state condition returned by the cache pollution avoidance unit for operand data produced by the first load instruction and required by the second load instruction. If the logical ANDing is true, memory access to the second load instruction is squashed and the execution of the second load instruction is re-scheduled.
    Type: Grant
    Filed: September 8, 2003
    Date of Patent: March 7, 2006
    Assignee: Sun Microsystems, Inc.
    Inventors: Sudarshan Kadambi, Vijay Balakrishnan
  • Patent number: 6976125
    Abstract: One embodiment of the present invention provides a system for predicting hot spots in a cache memory. Upon receiving a memory operation at the cache, the system determines a target location within the cache for the memory operation. Once the target location is determined, the system increments a counter associated with the target location. If the counter reaches a pre-determined threshold value, the system generates a signal indicating that the target location is a hot spot in the cache memory.
    Type: Grant
    Filed: January 29, 2003
    Date of Patent: December 13, 2005
    Assignee: Sun Microsystems, Inc.
    Inventors: Sudarshan Kadambi, Vijay Balakrishnan, Wayne I. Yamamoto
  • Patent number: 6948032
    Abstract: One embodiment of the present invention provides a system that uses a hot spot cache to alleviate the performance problems caused by hot spots in cache memories, wherein the hot spot cache stores lines that are evicted from hot spots in the cache. Upon receiving a memory operation at the cache, the system performs a lookup for the memory operation in both the cache and the hot spot cache in parallel. If the memory operation is a read operation that causes a miss in the cache and a hit in the hot spot cache, the system reads a data line for the read operation from the hot spot cache, writes the data line to the cache, performs the read operation on the data line in the cache, and then evicts the data line from the hot spot cache.
    Type: Grant
    Filed: January 29, 2003
    Date of Patent: September 20, 2005
    Assignee: Sun Microsystems, Inc.
    Inventors: Sudarshan Kadambi, Vijay Balakrishnan, Wayne I. Yamamoto
  • Patent number: 6934830
    Abstract: One embodiment of the present invention provides a system that reduces the time required to access registers from a register file within a processor. During operation, the system receives an instruction to be executed, wherein the instruction identifies at least one operand to be accessed from the register file. Next, the system looks up the operands in a register pane, wherein the register pane is smaller and faster than the register file and contains copies of a subset of registers from the register file. If the lookup is successful, the system retrieves the operands from the register pane to execute the instruction. Otherwise, if the lookup is not successful, the system retrieves the operands from the register file, and stores the operands into the register pane. This triggers the system to reissue the instruction to be executed again, so that the re-issued instruction retrieves the operands from the register pane.
    Type: Grant
    Filed: September 26, 2002
    Date of Patent: August 23, 2005
    Assignee: Sun Microsystems, Inc.
    Inventors: Sudarshan Kadambi, Adam R. Talcott, Wayne I. Yamamoto
  • Publication number: 20050055533
    Abstract: A cache pollution avoidance unit includes a dynamic memory dependency table for storing a dependency state condition between a first load instruction and a sequentially later second load instruction, which may depend on the completion of execution of the first load instruction for operand data. The cache pollution avoidance unit logically ANDs the dependency state condition stored in the dynamic memory dependency table with a cache memory “miss” state condition returned by the cache pollution avoidance unit for operand data produced by the first load instruction and required by the second load instruction. If the logical ANDing is true, memory access to the second load instruction is squashed and the execution of the second load instruction is re-scheduled.
    Type: Application
    Filed: September 8, 2003
    Publication date: March 10, 2005
    Inventors: Sudarshan Kadambi, Vijay Balakrishnan
  • Publication number: 20040148469
    Abstract: One embodiment of the present invention provides a system for predicting hot spots in a cache memory. Upon receiving a memory operation at the cache, the system determines a target location within the cache for the memory operation. Once the target location is determined, the system increments a counter associated with the target location. If the counter reaches a pre-determined threshold value, the system generates a signal indicating that the target location is a hot spot in the cache memory.
    Type: Application
    Filed: January 29, 2003
    Publication date: July 29, 2004
    Inventors: Sudarshan Kadambi, Vijay Balakrishnan, Wayne I. Yamamoto
  • Publication number: 20040148465
    Abstract: One embodiment of the present invention provides a system that uses a hot spot cache to alleviate the performance problems caused by hot spots in cache memories, wherein the hot spot cache stores lines that are evicted from hot spots in the cache. Upon receiving a memory operation at the cache, the system performs a lookup for the memory operation in both the cache and the hot spot cache in parallel. If the memory operation is a read operation that causes a miss in the cache and a hit in the hot spot cache, the system reads a data line for the read operation from the hot spot cache, writes the data line to the cache, performs the read operation on the data line in the cache, and then evicts the data line from the hot spot cache.
    Type: Application
    Filed: January 29, 2003
    Publication date: July 29, 2004
    Inventors: Sudarshan Kadambi, Vijay Balakrishnan, Wayne I. Yamamoto
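
The hot spot prediction scheme summarized in the abstracts for patent 6976125 and publication 20040148469 (determine a target location for each memory operation, increment a counter for that location, and signal a hot spot when the counter reaches a threshold) can be illustrated with a minimal software sketch. This is not the patented hardware design. The class name `HotSpotPredictor`, the choice of a cache set as the "target location", the modulo set-index mapping, and the default line size and threshold are all assumptions made for illustration; the abstracts do not specify them, nor do they say how or when counters are reset, so that is omitted here.

```python
class HotSpotPredictor:
    """Toy model of a per-location access counter with a hot spot threshold.

    Assumption: the "target location" is a cache set chosen by a simple
    modulo mapping of the line address. The abstract does not say this.
    """

    def __init__(self, num_sets, line_size=64, threshold=4):
        self.num_sets = num_sets      # hypothetical number of cache sets
        self.line_size = line_size    # hypothetical line size in bytes
        self.threshold = threshold    # hypothetical pre-determined threshold
        self.counters = [0] * num_sets

    def target_set(self, address):
        # Determine the target location within the cache for this operation.
        return (address // self.line_size) % self.num_sets

    def access(self, address):
        # Increment the counter associated with the target location and
        # report whether that location is now considered a hot spot.
        idx = self.target_set(address)
        self.counters[idx] += 1
        return self.counters[idx] >= self.threshold


predictor = HotSpotPredictor(num_sets=4, line_size=64, threshold=3)
# Addresses 0, 256, and 512 all map to set 0 under the assumed mapping.
signals = [predictor.access(addr) for addr in (0, 256, 512)]
```

In this sketch the third access to the same set raises the hot spot signal, while accesses to other sets are counted separately. A real design would also need some policy for clearing or aging the counters, which the abstracts leave unspecified.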