Patents by Inventor Sudarshan Kadambi
Sudarshan Kadambi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 7493451
Abstract: In one embodiment, a processor comprises a prefetch unit coupled to a data cache. The prefetch unit is configured to concurrently maintain a plurality of separate, active prefetch streams. Each prefetch stream is either software initiated via execution by the processor of a dedicated prefetch instruction or hardware initiated via detection of a data cache miss by one or more load/store memory operations. The prefetch unit is further configured to generate prefetch requests responsive to the plurality of prefetch streams to prefetch data in to the data cache.
Type: Grant
Filed: June 15, 2006
Date of Patent: February 17, 2009
Assignee: P.A. Semi, Inc.
Inventors: Sudarshan Kadambi, Puneet Kumar, Po-Yung Chang
-
Patent number: 7487296
Abstract: A multi-stride prefetcher includes a recurring prefetch table that in turn includes a stream table and an index table. The stream table includes a valid field and a tag field. The stream table also includes a thread number field to help support multi-threaded processor cores. The tag field stores a tag from an address associated with a cache miss. The index table includes fields for storing information characterizing a state machine. The fields include a learning bit. The multi-stride prefetcher prefetches data into a cache for a plurality of streams of cache misses, each stream having a plurality of strides.
Type: Grant
Filed: February 17, 2005
Date of Patent: February 3, 2009
Assignee: Sun Microsystems, Inc.
Inventors: Sorin Iacobovici, Sudarshan Kadambi, Yuan C. Chou
-
Publication number: 20080307166
Abstract: In one embodiment, a processor may be configured to write ECC granular stores into the data cache, while non-ECC granular stores may be merged with cache data in a memory request buffer. In one embodiment, a processor may be configured to detect that a victim block writeback hits one or more stores in a memory request buffer (or vice versa) and may convert the victim block writeback to a fill. In one embodiment, a processor may speculatively issue stores that are subsequent to a load from a load/store queue, but prevent the update for the stores in response to a snoop hit on the load.
Type: Application
Filed: June 5, 2007
Publication date: December 11, 2008
Inventors: Ramesh Gunna, Po-Yung Chang, Sudarshan Kadambi
-
Publication number: 20080307167
Abstract: In one embodiment, a processor may be configured to write ECC granular stores into the data cache, while non-ECC granular stores may be merged with cache data in a memory request buffer. In one embodiment, a processor may be configured to detect that a victim block writeback hits one or more stores in a memory request buffer (or vice versa) and may convert the victim block writeback to a fill. In one embodiment, a processor may speculatively issue stores that are subsequent to a load from a load/store queue, but prevent the update for the stores in response to a snoop hit on the load.
Type: Application
Filed: June 5, 2007
Publication date: December 11, 2008
Inventors: Ramesh Gunna, Sudarshan Kadambi
-
Publication number: 20080177988
Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
Type: Application
Filed: March 25, 2008
Publication date: July 24, 2008
Inventors: Sudarshan Kadambi, Po-Yung Chang, Eric Hao
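The PSTLF condition defined in this abstract can be illustrated with a small sketch. This is only an illustrative model, not the patented circuit: the function name `is_pstlf` and the representation of a load and a store as byte-address ranges are assumptions made here.

```python
def is_pstlf(load_addr: int, load_size: int, store_addr: int, store_size: int) -> bool:
    """Return True if the load sees a partial store-to-load-forward event.

    Hypothetical model: the load and the earlier uncommitted store are
    described only by their start address and size in bytes.
    """
    load_bytes = set(range(load_addr, load_addr + load_size))
    store_bytes = set(range(store_addr, store_addr + store_size))
    # At least one byte of the load was updated by the store ...
    updated = load_bytes & store_bytes
    # ... and at least one byte of the load was not updated by it.
    not_updated = load_bytes - store_bytes
    return bool(updated) and bool(not_updated)
```

For example, an 8-byte load at 0x100 following a 4-byte store at 0x104 satisfies the condition, while a 4-byte load fully covered by an 8-byte store at the same address does not.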
-
Patent number: 7376817
Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
Type: Grant
Filed: August 10, 2005
Date of Patent: May 20, 2008
Assignee: P.A. Semi, Inc.
Inventors: Sudarshan Kadambi, Po-Yung Chang, Eric Hao
-
Publication number: 20070294482
Abstract: In one embodiment, a processor comprises a prefetch unit coupled to a data cache. The prefetch unit is configured to concurrently maintain a plurality of separate, active prefetch streams. Each prefetch stream is either software initiated via execution by the processor of a dedicated prefetch instruction or hardware initiated via detection of a data cache miss by one or more load/store memory operations. The prefetch unit is further configured to generate prefetch requests responsive to the plurality of prefetch streams to prefetch data in to the data cache.
Type: Application
Filed: June 15, 2006
Publication date: December 20, 2007
Applicant: P.A. Semi, Inc.
Inventors: Sudarshan Kadambi, Puneet Kumar, Po-Yung Chang
-
Patent number: 7237096
Abstract: If a consumer instruction specifies a 64 bit source register comprised of results provided by two 32 bit producer instructions, the number of dependencies that must be tracked per source register can be decreased by transforming one or more of the 32 bit producer instructions so that rather than simply storing its result in a 32 bit destination register, the transformed instruction stores its result into a 64 bit logical register along with another 32 bit value held in another 32 bit register.
Type: Grant
Filed: April 5, 2004
Date of Patent: June 26, 2007
Assignee: Sun Microsystems, Inc.
Inventors: Julian A. Prabhu, Atul Kalambur, Sudarshan Kadambi, Daniel L. Liebholz, Julie M. Staraitis
-
Publication number: 20070113020
Abstract: In one embodiment, a processor comprises a core configured to execute a data cache block write instruction and an interface unit coupled to the core and to an interconnect on which the processor is configured to communicate. The core is configured to transmit a request to the interface unit in response to the data cache block write instruction. If the request is speculative, the interface unit is configured to issue a first transaction on the interconnect. On the other hand, if the request is non-speculative, the interface unit is configured to issue a second transaction on the interconnect. The second transaction is different from the first transaction. For example, the second transaction may be an invalidate transaction and the first transaction may be a probe transaction. In some embodiments, the processor may be in a system including the interconnect and one or more caching agents.
Type: Application
Filed: November 17, 2005
Publication date: May 17, 2007
Applicant: P.A. Semi, Inc.
Inventors: Ramesh Gunna, Sudarshan Kadambi, Peter Bannon
-
Publication number: 20070038846
Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
Type: Application
Filed: August 10, 2005
Publication date: February 15, 2007
Applicant: P.A. Semi, Inc.
Inventors: Sudarshan Kadambi, Po-Yung Chang, Eric Hao
-
Publication number: 20060248319
Abstract: A processor avoids or eliminates repetitive replay conditions and frequent instruction resteering through various techniques including resteering the fetch after the branch instruction retires, and delaying branch resolution. A processor resolves conditional branches and avoids repetitive resteering by delaying branch resolution. The processor has an instruction pipeline with inserted delay in branch condition and replay control pathways. For example, an instruction sequence that includes a load instruction followed by a subtract instruction then a conditional branch, delays branch resolution to allow time for analysis to determine whether the condition branch has resolved correctly. Eliminating incorrect branch resolutions prevents flushing of correctly predicted branches.
Type: Application
Filed: July 10, 2006
Publication date: November 2, 2006
Applicant: SUN MICROSYSTEMS, INC.
Inventor: Sudarshan Kadambi
-
Patent number: 7076640
Abstract: A processor avoids or eliminates repetitive replay conditions and frequent instruction resteering through various techniques including resteering the fetch after the branch instruction retires, and delaying branch resolution. A processor resolves conditional branches and avoids repetitive resteering by delaying branch resolution. The processor has an instruction pipeline with inserted delay in branch condition and replay control pathways. For example, an instruction sequence that includes a load instruction followed by a subtract instruction then a conditional branch, delays branch resolution to allow time for analysis to determine whether the condition branch has resolved correctly. Eliminating incorrect branch resolutions prevents flushing of correctly predicted branches.
Type: Grant
Filed: March 11, 2002
Date of Patent: July 11, 2006
Assignee: Sun Microsystems, Inc.
Inventor: Sudarshan Kadambi
-
Patent number: 7055021
Abstract: A pipelined processor includes a dependency scoreboard that tracks dependency for replay of instructions capable of executing out-of-order. Early instructions are termed “producers” that produce data for later dependent instructions. The subsequent instructions are “consumers” that consume the data produced by the producer instructions. The dependency scoreboard is a table of storage cells that tracks producers and consumers and designates whether a particular instruction is dependent on a producer. Active instructions are allocated storage elements for all active instructions. For example, a dependency scoreboard for tracking N active instructions will have N dependency storage cells for ones of the N active instructions. The storage cells for an active instruction may be set for each active instruction that is a “producer” instruction and all levels of dependency are tracked in each cycle.
Type: Grant
Filed: March 11, 2002
Date of Patent: May 30, 2006
Assignee: Sun Microsystems, Inc.
Inventor: Sudarshan Kadambi
-
Patent number: 7010648
Abstract: A cache pollution avoidance unit includes a dynamic memory dependency table for storing a dependency state condition between a first load instruction and a sequentially later second load instruction, which may depend on the completion of execution of the first load instruction for operand data. The cache pollution avoidance unit logically ANDs the dependency state condition stored in the dynamic memory dependency table with a cache memory “miss” state condition returned by the cache pollution avoidance unit for operand data produced by the first load instruction and required by the second load instruction. If the logical ANDing is true, memory access to the second load instruction is squashed and the execution of the second load instruction is re-scheduled.
Type: Grant
Filed: September 8, 2003
Date of Patent: March 7, 2006
Assignee: Sun Microsystems, Inc.
Inventors: Sudarshan Kadambi, Vijay Balakrishnan
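The squash decision described in this abstract reduces to a logical AND of two conditions. The sketch below is a loose software analogy under stated assumptions: the table is modeled as a Python dict keyed by a pair of load identifiers, and the names `schedule_second_load`, `"squash_and_reschedule"`, and `"access_memory"` are hypothetical labels, not terms from the patent.

```python
def schedule_second_load(dependency_table: dict, first: str, second: str,
                         first_load_missed: bool) -> str:
    """Decide what happens to the second load (illustrative model only)."""
    # Dependency state condition recorded for this (first, second) load pair.
    dependent = dependency_table.get((first, second), False)
    # Logical AND of the dependency state and the cache "miss" state.
    if dependent and first_load_missed:
        return "squash_and_reschedule"
    return "access_memory"


# Hypothetical table: load "ld2" depends on operand data from load "ld1".
table = {("ld1", "ld2"): True}
```

With this table, `schedule_second_load(table, "ld1", "ld2", True)` squashes the second load, while a hit on the first load or an unrecorded pair lets the access proceed.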
-
Patent number: 6976125
Abstract: One embodiment of the present invention provides a system for predicting hot spots in a cache memory. Upon receiving a memory operation at the cache, the system determines a target location within the cache for the memory operation. Once the target location is determined, the system increments a counter associated with the target location. If the counter reaches a pre-determined threshold value, the system generates a signal indicating that the target location is a hot spot in the cache memory.
Type: Grant
Filed: January 29, 2003
Date of Patent: December 13, 2005
Assignee: Sun Microsystems, Inc.
Inventors: Sudarshan Kadambi, Vijay Balakrishnan, Wayne I. Yamamoto
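The counter-and-threshold scheme in this abstract can be sketched as follows. This is a minimal illustration, not the patented design: the class name `HotSpotPredictor`, the choice of a cache set index as the "target location", and the absence of any counter reset or decay are all assumptions made here.

```python
class HotSpotPredictor:
    """Toy model: one counter per cache set, compared against a threshold."""

    def __init__(self, num_sets: int, line_size: int, threshold: int):
        self.num_sets = num_sets
        self.line_size = line_size
        self.threshold = threshold
        self.counters = [0] * num_sets

    def target_location(self, address: int) -> int:
        # Assumed mapping: conventional set indexing by line address.
        return (address // self.line_size) % self.num_sets

    def access(self, address: int) -> bool:
        """Record a memory operation; return True if its set is a hot spot."""
        index = self.target_location(address)
        self.counters[index] += 1
        return self.counters[index] >= self.threshold


predictor = HotSpotPredictor(num_sets=4, line_size=64, threshold=3)
# Addresses 0, 256, and 512 all map to set 0 under the assumed indexing.
signals = [predictor.access(addr) for addr in (0, 256, 512)]
```

Here the third access to set 0 reaches the threshold, so only the last entry of `signals` is true.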
-
Patent number: 6948032
Abstract: One embodiment of the present invention provides a system that uses a hot spot cache to alleviate the performance problems caused by hot spots in cache memories, wherein the hot spot cache stores lines that are evicted from hot spots in the cache. Upon receiving a memory operation at the cache, the system performs a lookup for the memory operation in both the cache and the hot spot cache in parallel. If the memory operation is a read operation that causes a miss in the cache and a hit in the hot spot cache, the system reads a data line for the read operation from the hot spot cache, writes the data line to the cache, performs the read operation on the data line in the cache, and then evicts the data line from the hot spot cache.
Type: Grant
Filed: January 29, 2003
Date of Patent: September 20, 2005
Assignee: Sun Microsystems, Inc.
Inventors: Sudarshan Kadambi, Vijay Balakrishnan, Wayne I. Yamamoto
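The read path described in this abstract (miss in the cache, hit in the hot spot cache) can be modeled in a few lines. This is a simplified sketch under assumptions: both structures are plain dicts keyed by line address with no capacity limits, the function name `read_line` is hypothetical, and a miss in both structures simply returns `None` because the abstract does not describe that case.

```python
def read_line(cache: dict, hot_spot_cache: dict, line_addr: int):
    """Serve a read using a main cache and a hot spot cache (toy model)."""
    # Both lookups are performed up front, standing in for the parallel lookup.
    cache_hit = line_addr in cache
    hot_spot_hit = line_addr in hot_spot_cache

    if cache_hit:
        return cache[line_addr]
    if hot_spot_hit:
        line = hot_spot_cache[line_addr]   # read the line from the hot spot cache
        cache[line_addr] = line            # write the line to the cache
        value = cache[line_addr]           # perform the read on the cache copy
        del hot_spot_cache[line_addr]      # then evict it from the hot spot cache
        return value
    return None  # assumed placeholder for a miss in both structures


cache = {}
hot_spot_cache = {0x40: b"line-data"}
result = read_line(cache, hot_spot_cache, 0x40)
```

After the call, the line lives only in the main cache, matching the move-then-evict order given in the abstract.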
-
Patent number: 6934830
Abstract: One embodiment of the present invention provides a system that reduces the time required to access registers from a register file within a processor. During operation, the system receives an instruction to be executed, wherein the instruction identifies at least one operand to be accessed from the register file. Next, the system looks up the operands in a register pane, wherein the register pane is smaller and faster than the register file and contains copies of a subset of registers from the register file. If the lookup is successful, the system retrieves the operands from the register pane to execute the instruction. Otherwise, if the lookup is not successful, the system retrieves the operands from the register file, and stores the operands into the register pane. This triggers the system to reissue the instruction to be executed again, so that the re-issued instruction retrieves the operands from the register pane.
Type: Grant
Filed: September 26, 2002
Date of Patent: August 23, 2005
Assignee: Sun Microsystems, Inc.
Inventors: Sudarshan Kadambi, Adam R. Talcott, Wayne I. Yamamoto
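The lookup, fill, and reissue flow in this abstract can be sketched as a small loop. This is an illustrative model only: the register file and register pane are assumed to be dicts, the pane is unbounded (no replacement policy is modeled), and the function name `issue_instruction` is hypothetical.

```python
def issue_instruction(operand_regs, register_file: dict, register_pane: dict):
    """Return (operand values, number of times the instruction was issued)."""
    issues = 0
    while True:
        issues += 1
        # Look up every operand in the small, fast register pane.
        if all(reg in register_pane for reg in operand_regs):
            return [register_pane[reg] for reg in operand_regs], issues
        # Lookup failed: copy the operands from the register file into
        # the pane, then loop to reissue the instruction.
        for reg in operand_regs:
            register_pane[reg] = register_file[reg]


register_file = {"r1": 10, "r2": 20, "r3": 30}
register_pane = {"r1": 10}
first = issue_instruction(["r1", "r2"], register_file, register_pane)
second = issue_instruction(["r1", "r2"], register_file, register_pane)
```

The first call misses on `r2` and is issued twice; the second call finds both operands in the pane and is issued once.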
-
Publication number: 20050055533
Abstract: A cache pollution avoidance unit includes a dynamic memory dependency table for storing a dependency state condition between a first load instruction and a sequentially later second load instruction, which may depend on the completion of execution of the first load instruction for operand data. The cache pollution avoidance unit logically ANDs the dependency state condition stored in the dynamic memory dependency table with a cache memory “miss” state condition returned by the cache pollution avoidance unit for operand data produced by the first load instruction and required by the second load instruction. If the logical ANDing is true, memory access to the second load instruction is squashed and the execution of the second load instruction is re-scheduled.
Type: Application
Filed: September 8, 2003
Publication date: March 10, 2005
Inventors: Sudarshan Kadambi, Vijay Balakrishnan
-
Publication number: 20040148469
Abstract: One embodiment of the present invention provides a system for predicting hot spots in a cache memory. Upon receiving a memory operation at the cache, the system determines a target location within the cache for the memory operation. Once the target location is determined, the system increments a counter associated with the target location. If the counter reaches a pre-determined threshold value, the system generates a signal indicating that the target location is a hot spot in the cache memory.
Type: Application
Filed: January 29, 2003
Publication date: July 29, 2004
Inventors: Sudarshan Kadambi, Vijay Balakrishnan, Wayne I. Yamamoto
-
Publication number: 20040148465
Abstract: One embodiment of the present invention provides a system that uses a hot spot cache to alleviate the performance problems caused by hot spots in cache memories, wherein the hot spot cache stores lines that are evicted from hot spots in the cache. Upon receiving a memory operation at the cache, the system performs a lookup for the memory operation in both the cache and the hot spot cache in parallel. If the memory operation is a read operation that causes a miss in the cache and a hit in the hot spot cache, the system reads a data line for the read operation from the hot spot cache, writes the data line to the cache, performs the read operation on the data line in the cache, and then evicts the data line from the hot spot cache.
Type: Application
Filed: January 29, 2003
Publication date: July 29, 2004
Inventors: Sudarshan Kadambi, Vijay Balakrishnan, Wayne I. Yamamoto