Patents by Inventor Sudarshan Kadambi

Sudarshan Kadambi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Prefetch unit

Patent number: 7493451

Abstract: In one embodiment, a processor comprises a prefetch unit coupled to a data cache. The prefetch unit is configured to concurrently maintain a plurality of separate, active prefetch streams. Each prefetch stream is either software initiated via execution by the processor of a dedicated prefetch instruction or hardware initiated via detection of a data cache miss by one or more load/store memory operations. The prefetch unit is further configured to generate prefetch requests responsive to the plurality of prefetch streams to prefetch data into the data cache.

Type: Grant

Filed: June 15, 2006

Date of Patent: February 17, 2009

Assignee: P.A. Semi, Inc.

Inventors: Sudarshan Kadambi, Puneet Kumar, Po-Yung Chang

Multi-stride prefetcher with a recurring prefetch table

Patent number: 7487296

Abstract: A multi-stride prefetcher includes a recurring prefetch table that in turn includes a stream table and an index table. The stream table includes a valid field and a tag field. The stream table also includes a thread number field to help support multi-threaded processor cores. The tag field stores a tag from an address associated with a cache miss. The index table includes fields for storing information characterizing a state machine. The fields include a learning bit. The multi-stride prefetcher prefetches data into a cache for a plurality of streams of cache misses, each stream having a plurality of strides.

Type: Grant

Filed: February 17, 2005

Date of Patent: February 3, 2009

Assignee: Sun Microsystems, Inc.

Inventors: Sorin Iacobovici, Sudarshan Kadambi, Yuan C. Chou

Store Handling in a Processor

Publication number: 20080307166

Abstract: In one embodiment, a processor may be configured to write ECC granular stores into the data cache, while non-ECC granular stores may be merged with cache data in a memory request buffer. In one embodiment, a processor may be configured to detect that a victim block writeback hits one or more stores in a memory request buffer (or vice versa) and may convert the victim block writeback to a fill. In one embodiment, a processor may speculatively issue stores that are subsequent to a load from a load/store queue, but prevent the update for the stores in response to a snoop hit on the load.

Type: Application

Filed: June 5, 2007

Publication date: December 11, 2008

Inventors: Ramesh Gunna, Po-Yung Chang, Sudarshan Kadambi

Converting Victim Writeback to a Fill

Publication number: 20080307167

Abstract: In one embodiment, a processor may be configured to write ECC granular stores into the data cache, while non-ECC granular stores may be merged with cache data in a memory request buffer. In one embodiment, a processor may be configured to detect that a victim block writeback hits one or more stores in a memory request buffer (or vice versa) and may convert the victim block writeback to a fill. In one embodiment, a processor may speculatively issue stores that are subsequent to a load from a load/store queue, but prevent the update for the stores in response to a snoop hit on the load.

Type: Application

Filed: June 5, 2007

Publication date: December 11, 2008

Inventors: Ramesh Gunna, Sudarshan Kadambi

Partial Load/Store Forward Prediction

Publication number: 20080177988

Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
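
The PSTLF condition defined in the abstract can be illustrated with a minimal Python sketch. It only models the byte-overlap test, not the prediction circuit; the function name `is_pstlf` and its address/size parameters are assumptions made for illustration and do not come from the patent.

```python
def is_pstlf(load_addr: int, load_size: int, store_addr: int, store_size: int) -> bool:
    """Return True if a load is only partially covered by an earlier uncommitted store.

    Hypothetical helper: addresses and sizes are given in bytes.
    """
    load_bytes = set(range(load_addr, load_addr + load_size))
    store_bytes = set(range(store_addr, store_addr + store_size))
    updated = load_bytes & store_bytes       # bytes the store has updated
    not_updated = load_bytes - store_bytes   # bytes the store has not updated
    # A PSTLF event needs at least one byte of each kind.
    return bool(updated) and bool(not_updated)
```

For instance, an 8-byte load at 0x100 that follows a 4-byte store at 0x104 overlaps the store in only four of its bytes, so `is_pstlf(0x100, 8, 0x104, 4)` is true, while a load fully covered by the store is not a PSTLF event.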

Type: Application

Filed: March 25, 2008

Publication date: July 24, 2008

Inventors: Sudarshan Kadambi, Po-Yung Chang, Eric Hao

Partial load/store forward prediction

Patent number: 7376817

Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.

Type: Grant

Filed: August 10, 2005

Date of Patent: May 20, 2008

Assignee: P.A. Semi, Inc.

Inventors: Sudarshan Kadambi, Po-Yung Chang, Eric Hao

Prefetch unit

Publication number: 20070294482

Abstract: In one embodiment, a processor comprises a prefetch unit coupled to a data cache. The prefetch unit is configured to concurrently maintain a plurality of separate, active prefetch streams. Each prefetch stream is either software initiated via execution by the processor of a dedicated prefetch instruction or hardware initiated via detection of a data cache miss by one or more load/store memory operations. The prefetch unit is further configured to generate prefetch requests responsive to the plurality of prefetch streams to prefetch data into the data cache.

Type: Application

Filed: June 15, 2006

Publication date: December 20, 2007

Applicant: P.A. Semi, Inc.

Inventors: Sudarshan Kadambi, Puneet Kumar, Po-Yung Chang

Storing results of producer instructions to facilitate consumer instruction dependency tracking

Patent number: 7237096

Abstract: If a consumer instruction specifies a 64 bit source register comprised of results provided by two 32 bit producer instructions, the number of dependencies that must be tracked per source register can be decreased by transforming one or more of the 32 bit producer instructions so that rather than simply storing its result in a 32 bit destination register, the transformed instruction stores its result into a 64 bit logical register along with another 32 bit value held in another 32 bit register.

Type: Grant

Filed: April 5, 2004

Date of Patent: June 26, 2007

Assignee: Sun Microsystems, Inc.

Inventors: Julian A. Prabhu, Atul Kalambur, Sudarshan Kadambi, Daniel L. Liebholz, Julie M. Staraitis

Data cache block zero implementation

Publication number: 20070113020

Abstract: In one embodiment, a processor comprises a core configured to execute a data cache block write instruction and an interface unit coupled to the core and to an interconnect on which the processor is configured to communicate. The core is configured to transmit a request to the interface unit in response to the data cache block write instruction. If the request is speculative, the interface unit is configured to issue a first transaction on the interconnect. On the other hand, if the request is non-speculative, the interface unit is configured to issue a second transaction on the interconnect. The second transaction is different from the first transaction. For example, the second transaction may be an invalidate transaction and the first transaction may be a probe transaction. In some embodiments, the processor may be in a system including the interconnect and one or more caching agents.

Type: Application

Filed: November 17, 2005

Publication date: May 17, 2007

Applicant: P.A. Semi, Inc.

Inventors: Ramesh Gunna, Sudarshan Kadambi, Peter Bannon

Partial load/store forward prediction

Publication number: 20070038846

Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.

Type: Application

Filed: August 10, 2005

Publication date: February 15, 2007

Applicant: P.A. Semi, Inc.

Inventors: Sudarshan Kadambi, Po-Yung Chang, Eric Hao

Validating branch resolution to avoid mis-steering instruction fetch

Publication number: 20060248319

Abstract: A processor avoids or eliminates repetitive replay conditions and frequent instruction resteering through various techniques including resteering the fetch after the branch instruction retires, and delaying branch resolution. A processor resolves conditional branches and avoids repetitive resteering by delaying branch resolution. The processor has an instruction pipeline with inserted delay in branch condition and replay control pathways. For example, for an instruction sequence that includes a load instruction followed by a subtract instruction and then a conditional branch, the processor delays branch resolution to allow time for analysis to determine whether the conditional branch has resolved correctly. Eliminating incorrect branch resolutions prevents flushing of correctly predicted branches.

Type: Application

Filed: July 10, 2006

Publication date: November 2, 2006

Applicant: Sun Microsystems, Inc.

Inventor: Sudarshan Kadambi

Processor that eliminates mis-steering instruction fetch resulting from incorrect resolution of mis-speculated branch instructions

Patent number: 7076640

Abstract: A processor avoids or eliminates repetitive replay conditions and frequent instruction resteering through various techniques including resteering the fetch after the branch instruction retires, and delaying branch resolution. A processor resolves conditional branches and avoids repetitive resteering by delaying branch resolution. The processor has an instruction pipeline with inserted delay in branch condition and replay control pathways. For example, for an instruction sequence that includes a load instruction followed by a subtract instruction and then a conditional branch, the processor delays branch resolution to allow time for analysis to determine whether the conditional branch has resolved correctly. Eliminating incorrect branch resolutions prevents flushing of correctly predicted branches.

Type: Grant

Filed: March 11, 2002

Date of Patent: July 11, 2006

Assignee: Sun Microsystems, Inc.

Inventor: Sudarshan Kadambi

Out-of-order processor that reduces mis-speculation using a replay scoreboard

Patent number: 7055021

Abstract: A pipelined processor includes a dependency scoreboard that tracks dependency for replay of instructions capable of executing out-of-order. Early instructions are termed “producers” that produce data for later dependent instructions. The subsequent instructions are “consumers” that consume the data produced by the producer instructions. The dependency scoreboard is a table of storage cells that tracks producers and consumers and designates whether a particular instruction is dependent on a producer. Active instructions are allocated storage elements for all active instructions. For example, a dependency scoreboard for tracking N active instructions will have N dependency storage cells for ones of the N active instructions. The storage cells for an active instruction may be set for each active instruction that is a “producer” instruction and all levels of dependency are tracked in each cycle.
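
One way to picture an N-by-N dependency scoreboard is the Python sketch below. It is an illustration under stated assumptions, not the patented circuit: the class name `ReplayScoreboard`, the slot numbering, and the choice to propagate dependencies at allocation time are all hypothetical.

```python
class ReplayScoreboard:
    """Toy N x N dependency table; row i marks every producer that slot i depends on."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.depends = [[False] * n for _ in range(n)]

    def allocate(self, slot: int, producers: list[int]) -> None:
        """Record a new active instruction in `slot` with its direct producers."""
        row = [False] * self.n
        for p in producers:
            row[p] = True
            # Inherit the producer's own dependencies so that all levels are tracked.
            for q in range(self.n):
                row[q] = row[q] or self.depends[p][q]
        self.depends[slot] = row

    def dependents_of(self, producer: int) -> list[int]:
        """Slots that would need to replay if `producer` replays."""
        return [i for i in range(self.n) if self.depends[i][producer]]
```

With slot 1 consuming from slot 0 and slot 2 consuming from slot 1, `dependents_of(0)` lists both slots 1 and 2, because the indirect dependency is carried forward when slot 2 is allocated.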

Type: Grant

Filed: March 11, 2002

Date of Patent: May 30, 2006

Assignee: Sun Microsystems, Inc.

Inventor: Sudarshan Kadambi

Method and apparatus for avoiding cache pollution due to speculative memory load operations in a microprocessor

Patent number: 7010648

Abstract: A cache pollution avoidance unit includes a dynamic memory dependency table for storing a dependency state condition between a first load instruction and a sequentially later second load instruction, which may depend on the completion of execution of the first load instruction for operand data. The cache pollution avoidance unit logically ANDs the dependency state condition stored in the dynamic memory dependency table with a cache memory “miss” state condition returned by the cache pollution avoidance unit for operand data produced by the first load instruction and required by the second load instruction. If the logical ANDing is true, memory access to the second load instruction is squashed and the execution of the second load instruction is re-scheduled.
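
The squash decision described above reduces to a logical AND of two conditions, which a short Python sketch can make concrete. The class `DependencyTable` and its method names are hypothetical, and representing loads by string labels is an assumption made for illustration only.

```python
class DependencyTable:
    """Toy dynamic memory dependency table keyed by (first_load, second_load)."""

    def __init__(self) -> None:
        self._dependent_pairs: set[tuple[str, str]] = set()

    def record_dependency(self, first_load: str, second_load: str) -> None:
        """Note that second_load needs operand data produced by first_load."""
        self._dependent_pairs.add((first_load, second_load))

    def should_squash(self, first_load: str, second_load: str,
                      first_load_missed: bool) -> bool:
        """AND the stored dependency state with the first load's cache miss state."""
        depends = (first_load, second_load) in self._dependent_pairs
        return depends and first_load_missed
```

When `should_squash` returns true, the second load's memory access would be cancelled and the load re-scheduled; rescheduling itself is outside this sketch.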

Type: Grant

Filed: September 8, 2003

Date of Patent: March 7, 2006

Assignee: Sun Microsystems, Inc.

Inventors: Sudarshan Kadambi, Vijay Balakrishnan

Method and apparatus for predicting hot spots in cache memories

Patent number: 6976125

Abstract: One embodiment of the present invention provides a system for predicting hot spots in a cache memory. Upon receiving a memory operation at the cache, the system determines a target location within the cache for the memory operation. Once the target location is determined, the system increments a counter associated with the target location. If the counter reaches a pre-determined threshold value, the system generates a signal indicating that the target location is a hot spot in the cache memory.
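
The counter-and-threshold scheme in the abstract can be sketched in a few lines of Python. The class name `HotSpotPredictor` is hypothetical, the target location is treated as an opaque key such as a set index, and keeping the signal asserted once the threshold is reached is an assumption, since the abstract does not say when counters are reset.

```python
from collections import defaultdict


class HotSpotPredictor:
    """Toy per-location access counter with a fixed threshold."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.counters: defaultdict[int, int] = defaultdict(int)

    def access(self, target_location: int) -> bool:
        """Count one memory operation; return True if the location is now a hot spot."""
        self.counters[target_location] += 1
        # Assumption: the signal stays asserted at or above the threshold.
        return self.counters[target_location] >= self.threshold
```

With a threshold of 3, the first two calls to `access(5)` return false and the third returns true, while other locations remain unaffected.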

Type: Grant

Filed: January 29, 2003

Date of Patent: December 13, 2005

Assignee: Sun Microsystems, Inc.

Inventors: Sudarshan Kadambi, Vijay Balakrishnan, Wayne I. Yamamoto

Method and apparatus for reducing the effects of hot spots in cache memories

Patent number: 6948032

Abstract: One embodiment of the present invention provides a system that uses a hot spot cache to alleviate the performance problems caused by hot spots in cache memories, wherein the hot spot cache stores lines that are evicted from hot spots in the cache. Upon receiving a memory operation at the cache, the system performs a lookup for the memory operation in both the cache and the hot spot cache in parallel. If the memory operation is a read operation that causes a miss in the cache and a hit in the hot spot cache, the system reads a data line for the read operation from the hot spot cache, writes the data line to the cache, performs the read operation on the data line in the cache, and then evicts the data line from the hot spot cache.
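
The read path described in the abstract can be sketched as follows. This is a simplified Python model under stated assumptions: both structures are plain dictionaries keyed by line address, the parallel lookup is modelled as two membership tests, and the class name `CacheWithHotSpotCache` is hypothetical. Set conflicts, eviction into the hot spot cache, and the miss-in-both case are not modelled.

```python
class CacheWithHotSpotCache:
    """Toy main cache paired with a hot spot cache, both keyed by line address."""

    def __init__(self) -> None:
        self.cache: dict[int, str] = {}
        self.hot_spot_cache: dict[int, str] = {}

    def read(self, line_addr: int):
        # Both lookups are issued together (in hardware, in parallel).
        hit_cache = line_addr in self.cache
        hit_hot = line_addr in self.hot_spot_cache
        if hit_cache:
            return self.cache[line_addr]
        if hit_hot:
            line = self.hot_spot_cache[line_addr]  # read the line from the hot spot cache
            self.cache[line_addr] = line           # write the line to the cache
            result = self.cache[line_addr]         # perform the read on the cached line
            del self.hot_spot_cache[line_addr]     # evict the line from the hot spot cache
            return result
        return None  # miss in both; handling is outside this sketch
```

After a read that hits only in the hot spot cache, the line lives in the main cache alone, which matches the move-then-evict sequence in the abstract.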

Type: Grant

Filed: January 29, 2003

Date of Patent: September 20, 2005

Assignee: Sun Microsystems, Inc.

Inventors: Sudarshan Kadambi, Vijay Balakrishnan, Wayne I. Yamamoto

Method and apparatus for reducing register file access times in pipelined processors

Patent number: 6934830

Abstract: One embodiment of the present invention provides a system that reduces the time required to access registers from a register file within a processor. During operation, the system receives an instruction to be executed, wherein the instruction identifies at least one operand to be accessed from the register file. Next, the system looks up the operands in a register pane, wherein the register pane is smaller and faster than the register file and contains copies of a subset of registers from the register file. If the lookup is successful, the system retrieves the operands from the register pane to execute the instruction. Otherwise, if the lookup is not successful, the system retrieves the operands from the register file, and stores the operands into the register pane. This triggers the system to reissue the instruction to be executed again, so that the re-issued instruction retrieves the operands from the register pane.
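
The lookup, fill, and reissue flow in the abstract can be sketched in Python. The class name `RegisterPane`, the oldest-first replacement choice, and the returned issue count are assumptions for illustration; the abstract does not specify a replacement policy. The sketch also assumes the pane capacity is at least the number of operands per instruction.

```python
class RegisterPane:
    """Toy small, fast copy of a subset of the register file."""

    def __init__(self, register_file: dict[str, int], capacity: int = 4) -> None:
        self.register_file = register_file
        self.capacity = capacity
        self.pane: dict[str, int] = {}  # insertion-ordered, oldest entry first

    def _fill(self, reg: str, keep: list[str]) -> None:
        if len(self.pane) >= self.capacity:
            # Assumed policy: evict the oldest entry the current instruction does not need.
            victim = next(r for r in self.pane if r not in keep)
            del self.pane[victim]
        self.pane[reg] = self.register_file[reg]

    def issue(self, operands: list[str]) -> tuple[list[int], int]:
        """Return the operand values and how many times the instruction was issued."""
        issues = 1
        missing = [r for r in operands if r not in self.pane]
        if missing:
            for reg in missing:
                self._fill(reg, operands)  # fetch from the register file into the pane
            issues += 1                    # the instruction is reissued
        return [self.pane[r] for r in operands], issues
```

The first issue of an instruction whose operands are absent costs two issues, and a later instruction using the same operands is served from the pane in one.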

Type: Grant

Filed: September 26, 2002

Date of Patent: August 23, 2005

Assignee: Sun Microsystems, Inc.

Inventors: Sudarshan Kadambi, Adam R. Talcott, Wayne I. Yamamoto

Method and apparatus for avoiding cache pollution due to speculative memory load operations in a microprocessor

Publication number: 20050055533

Abstract: A cache pollution avoidance unit includes a dynamic memory dependency table for storing a dependency state condition between a first load instruction and a sequentially later second load instruction, which may depend on the completion of execution of the first load instruction for operand data. The cache pollution avoidance unit logically ANDs the dependency state condition stored in the dynamic memory dependency table with a cache memory “miss” state condition returned by the cache pollution avoidance unit for operand data produced by the first load instruction and required by the second load instruction. If the logical ANDing is true, memory access to the second load instruction is squashed and the execution of the second load instruction is re-scheduled.

Type: Application

Filed: September 8, 2003

Publication date: March 10, 2005

Inventors: Sudarshan Kadambi, Vijay Balakrishnan

Method and apparatus for predicting hot spots in cache memories

Publication number: 20040148469

Abstract: One embodiment of the present invention provides a system for predicting hot spots in a cache memory. Upon receiving a memory operation at the cache, the system determines a target location within the cache for the memory operation. Once the target location is determined, the system increments a counter associated with the target location. If the counter reaches a pre-determined threshold value, the system generates a signal indicating that the target location is a hot spot in the cache memory.

Type: Application

Filed: January 29, 2003

Publication date: July 29, 2004

Inventors: Sudarshan Kadambi, Vijay Balakrishnan, Wayne I. Yamamoto

Method and apparatus for reducing the effects of hot spots in cache memories

Publication number: 20040148465

Abstract: One embodiment of the present invention provides a system that uses a hot spot cache to alleviate the performance problems caused by hot spots in cache memories, wherein the hot spot cache stores lines that are evicted from hot spots in the cache. Upon receiving a memory operation at the cache, the system performs a lookup for the memory operation in both the cache and the hot spot cache in parallel. If the memory operation is a read operation that causes a miss in the cache and a hit in the hot spot cache, the system reads a data line for the read operation from the hot spot cache, writes the data line to the cache, performs the read operation on the data line in the cache, and then evicts the data line from the hot spot cache.

Type: Application

Filed: January 29, 2003

Publication date: July 29, 2004

Inventors: Sudarshan Kadambi, Vijay Balakrishnan, Wayne I. Yamamoto
