Patents by Inventor Po-Yung Chang
Po-Yung Chang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 7984274
Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
Type: Grant
Filed: June 18, 2009
Date of Patent: July 19, 2011
Assignee: Apple Inc.
Inventors: Sudarshan Kadambi, Po-Yung Chang, Eric Hao
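The PSTLF condition defined in this abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `classify_load`, its parameters, and the modelling of memory accesses as sets of byte addresses are assumptions made purely for illustration, and the sketch considers only a single earlier uncommitted store.

```python
def classify_load(load_addr, load_size, store_addr, store_size):
    """Classify a load against one earlier, uncommitted store.

    Hypothetical helper (not from the patent). Returns:
      "none"    - the store updates no byte read by the load
      "full"    - the store updates every byte read by the load
      "partial" - at least one byte is updated and at least one is not,
                  which is the PSTLF condition described in the abstract
    """
    load_bytes = set(range(load_addr, load_addr + load_size))
    store_bytes = set(range(store_addr, store_addr + store_size))
    updated = load_bytes & store_bytes
    if not updated:
        return "none"
    if updated == load_bytes:
        return "full"
    return "partial"
```

For example, under these assumptions an 8-byte load at 0x100 that follows a 4-byte uncommitted store at 0x104 is classified as "partial", because only the upper four bytes of the load come from the store.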
-
Patent number: 7962730
Abstract: In one embodiment, a processor comprises a retire unit and a load/store unit coupled thereto. The retire unit is configured to retire a first store memory operation responsive to the first store memory operation having been processed at least to a pipeline stage at which exceptions are reported for the first store memory operation. The load/store unit comprises a queue having a first entry assigned to the first store memory operation. The load/store unit is configured to retain the first store memory operation in the first entry subsequent to retirement of the first store memory operation if the first store memory operation is not complete. The queue may have multiple entries, and more than one store may be retained in the queue after being retired by the retire unit.
Type: Grant
Filed: November 25, 2008
Date of Patent: June 14, 2011
Assignee: Apple Inc.
Inventors: Wei-Han Lien, Po-Yung Chang
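The idea of keeping a store in its queue entry after retirement can be sketched roughly as follows. The class names `StoreEntry` and `StoreQueue`, the integer tags, and the rule that an entry is freed once the store is both retired and complete are all assumptions for illustration; the abstract only states that an incomplete store is retained after retirement.

```python
from dataclasses import dataclass


@dataclass
class StoreEntry:
    tag: int                # hypothetical identifier for the store
    retired: bool = False   # set when the retire unit retires the store
    complete: bool = False  # set when the store has finished its memory update


class StoreQueue:
    """Toy model of a load/store unit queue (illustrative only)."""

    def __init__(self):
        self.entries = {}

    def allocate(self, tag):
        self.entries[tag] = StoreEntry(tag)

    def retire(self, tag):
        entry = self.entries[tag]
        entry.retired = True
        self._release_if_done(entry)

    def complete(self, tag):
        entry = self.entries[tag]
        entry.complete = True
        self._release_if_done(entry)

    def _release_if_done(self, entry):
        # Assumption: an entry is only freed when retired AND complete.
        if entry.retired and entry.complete:
            del self.entries[entry.tag]

    def retired_but_pending(self):
        return sorted(t for t, e in self.entries.items()
                      if e.retired and not e.complete)
```

In this model, retiring two stores before either completes leaves both in the queue, which mirrors the statement that more than one store may be retained after retirement.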
-
Publication number: 20100268894
Abstract: In one embodiment, a processor comprises a prefetch unit coupled to a data cache. The prefetch unit is configured to concurrently maintain a plurality of separate, active prefetch streams. Each prefetch stream is either software initiated via execution by the processor of a dedicated prefetch instruction or hardware initiated via detection of a data cache miss by one or more load/store memory operations. The prefetch unit is further configured to generate prefetch requests responsive to the plurality of prefetch streams to prefetch data in to the data cache.
Type: Application
Filed: July 6, 2010
Publication date: October 21, 2010
Inventors: Sudarshan Kadambi, Puneet Kumar, Po-Yung Chang
-
Patent number: 7779208
Abstract: In one embodiment, a processor comprises a prefetch unit coupled to a data cache. The prefetch unit is configured to concurrently maintain a plurality of separate, active prefetch streams. Each prefetch stream is either software initiated via execution by the processor of a dedicated prefetch instruction or hardware initiated via detection of a data cache miss by one or more load/store memory operations. The prefetch unit is further configured to generate prefetch requests responsive to the plurality of prefetch streams to prefetch data in to the data cache.
Type: Grant
Filed: January 7, 2009
Date of Patent: August 17, 2010
Assignee: Apple Inc.
Inventors: Sudarshan Kadambi, Puneet Kumar, Po-Yung Chang
-
Publication number: 20100169619
Abstract: In one embodiment, an apparatus comprises a queue comprising a plurality of entries and a control unit coupled to the queue. The control unit is configured to allocate a first queue entry to a store memory operation, and is configured to write a first even offset, a first even mask, a first odd offset, and a first odd mask corresponding to the store memory operation to the first entry. A group of contiguous memory locations are logically divided into alternately-addressed even and odd byte ranges. A given store memory operation writes at most one even byte range and one adjacent odd byte range. The first even offset identifies a first even byte range that is potentially written by the store memory operation, and the first odd offset identifies a first odd byte range that is potentially written by the store memory operation.
Type: Application
Filed: March 10, 2010
Publication date: July 1, 2010
Inventors: Tse-yu Yeh, Daniel C. Murray, Po Yung Chang, Anup S. Mehta
-
Patent number: 7721066
Abstract: In one embodiment, an apparatus comprises a queue comprising a plurality of entries and a control unit coupled to the queue. The control unit is configured to allocate a first queue entry to a store memory operation, and is configured to write a first even offset, a first even mask, a first odd offset, and a first odd mask corresponding to the store memory operation to the first entry. A group of contiguous memory locations are logically divided into alternately-addressed even and odd byte ranges. A given store memory operation writes at most one even byte range and one adjacent odd byte range. The first even offset identifies a first even byte range that is potentially written by the store memory operation, and the first odd offset identifies a first odd byte range that is potentially written by the store memory operation.
Type: Grant
Filed: June 5, 2007
Date of Patent: May 18, 2010
Assignee: Apple Inc.
Inventors: Tse-yu Yeh, Daniel C. Murray, Po-Yung Chang, Anup S. Mehta
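One possible reading of the even/odd offset and mask encoding in this abstract is sketched below in Python. The range width of 8 bytes, the 64-byte group, the function name `encode_store`, and the exact meaning of each offset are assumptions; the abstract does not give sizes or bit layouts.

```python
RANGE_BYTES = 8   # assumed width of one byte range
GROUP_BYTES = 64  # assumed size of the group of contiguous locations


def encode_store(addr, size):
    """Return (even_offset, even_mask, odd_offset, odd_mask) for a store.

    Ranges inside the group are numbered 0, 1, 2, ...; even-numbered ranges
    are "even", odd-numbered ranges are "odd". Each offset selects one range
    of its parity, and each mask has one bit per byte written in that range.
    """
    if not 1 <= size <= RANGE_BYTES:
        raise ValueError("store must write between 1 and RANGE_BYTES bytes")
    start = addr % GROUP_BYTES
    end = start + size  # exclusive
    if end > GROUP_BYTES:
        raise ValueError("store crosses the group boundary")

    first = start // RANGE_BYTES
    # The store can touch range `first` and at most the next range, so one
    # offset per parity is enough. A range that is only "potentially"
    # written simply ends up with an all-zero mask.
    even_offset = (first + 1) // 2
    odd_offset = first // 2

    even_mask = 0
    odd_mask = 0
    for byte in range(start, end):
        bit = 1 << (byte % RANGE_BYTES)
        if (byte // RANGE_BYTES) % 2 == 0:
            even_mask |= bit
        else:
            odd_mask |= bit
    return even_offset, even_mask, odd_offset, odd_mask
```

With these assumed sizes, a 4-byte store at address 6 writes the top two bytes of even range 0 and the bottom two bytes of odd range 1, so the sketch returns `(0, 0xC0, 0, 0x03)`.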
-
Publication number: 20100064120
Abstract: In one embodiment, a processor comprises a scheduler configured to issue a first instruction operation to be executed and an execution core coupled to the scheduler. Configured to execute the first instruction operation, the execution core comprises a plurality of replay sources configured to cause a replay of the first instruction operation responsive to detecting at least one of a plurality of replay cases. The scheduler is configured to inhibit issuance of the first instruction operation subsequent to the replay for a subset of the plurality of replay cases. The scheduler is coupled to receive an acknowledgement indication corresponding to each of the plurality of replay cases in the subset, and is configured to inhibit issuance of the first instruction operation until the acknowledgement indication is asserted that corresponds to an identified replay case of the subset.
Type: Application
Filed: November 17, 2009
Publication date: March 11, 2010
Inventors: Po-Yung Chang, Wei-Han Lien, Jesse Pan, Ramesh Gunna, Tse-Yu Yeh, James B. Keller
-
Patent number: 7647518
Abstract: In one embodiment, a processor comprises a scheduler configured to issue a first instruction operation to be executed and an execution core coupled to the scheduler. Configured to execute the first instruction operation, the execution core comprises a plurality of replay sources configured to cause a replay of the first instruction operation responsive to detecting at least one of a plurality of replay cases. The scheduler is configured to inhibit issuance of the first instruction operation subsequent to the replay for a subset of the plurality of replay cases. The scheduler is coupled to receive an acknowledgement indication corresponding to each of the plurality of replay cases in the subset, and is configured to inhibit issuance of the first instruction operation until the acknowledge indication is asserted that corresponds to an identified replay case of the subset.
Type: Grant
Filed: October 10, 2006
Date of Patent: January 12, 2010
Assignee: Apple Inc.
Inventors: Po-Yung Chang, Wei-Han Lien, Jesse Pan, Ramesh Gunna, Tse-Yu Yeh, James B. Keller
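The acknowledgement-gated reissue described in this abstract might be modelled along the following lines. The `Scheduler` class, its method names, and the replay case names used in the usage note ("dcache_miss", "bank_conflict") are hypothetical; the abstract does not name specific replay cases.

```python
class Scheduler:
    """Toy scheduler that holds replayed operations until acknowledged."""

    def __init__(self, ack_cases):
        # Subset of replay cases for which reissue is inhibited until an
        # acknowledgement arrives (which cases belong here is an assumption).
        self.ack_cases = set(ack_cases)
        # Maps an inhibited operation to the replay case it is waiting on.
        self.waiting = {}

    def replay(self, op, case):
        # Only replay cases in the subset inhibit reissue of the operation.
        if case in self.ack_cases:
            self.waiting[op] = case

    def acknowledge(self, case):
        # Release every operation that was waiting on this replay case.
        self.waiting = {op: c for op, c in self.waiting.items() if c != case}

    def can_issue(self, op):
        return op not in self.waiting
```

As a usage illustration, a scheduler built with `Scheduler({"dcache_miss"})` blocks an operation replayed for "dcache_miss" until `acknowledge("dcache_miss")` is called, while an operation replayed for "bank_conflict" remains eligible to issue immediately.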
-
Publication number: 20090254734
Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
Type: Application
Filed: June 18, 2009
Publication date: October 8, 2009
Inventors: Sudarshan Kadambi, Po-Yung Chang, Eric Hao
-
Patent number: 7568087
Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
Type: Grant
Filed: March 25, 2008
Date of Patent: July 28, 2009
Assignee: Apple Inc.
Inventors: Sudarshan Kadambi, Po-Yung Chang, Eric Hao
-
Publication number: 20090119488
Abstract: In one embodiment, a processor comprises a prefetch unit coupled to a data cache. The prefetch unit is configured to concurrently maintain a plurality of separate, active prefetch streams. Each prefetch stream is either software initiated via execution by the processor of a dedicated prefetch instruction or hardware initiated via detection of a data cache miss by one or more load/store memory operations. The prefetch unit is further configured to generate prefetch requests responsive to the plurality of prefetch streams to prefetch data in to the data cache.
Type: Application
Filed: January 7, 2009
Publication date: May 7, 2009
Inventors: Sudarshan Kadambi, Puneet Kumar, Po-Yung Chang
-
Publication number: 20090077560
Abstract: In one embodiment, a processor comprises a retire unit and a load/store unit coupled thereto. The retire unit is configured to retire a first store memory operation responsive to the first store memory operation having been processed at least to a pipeline stage at which exceptions are reported for the first store memory operation. The load/store unit comprises a queue having a first entry assigned to the first store memory operation. The load/store unit is configured to retain the first store memory operation in the first entry subsequent to retirement of the first store memory operation if the first store memory operation is not complete. The queue may have multiple entries, and more than one store may be retained in the queue after being retired by the retire unit.
Type: Application
Filed: November 25, 2008
Publication date: March 19, 2009
Inventors: Wei-Han Lien, Po-Yung Chang
-
Patent number: 7493451
Abstract: In one embodiment, a processor comprises a prefetch unit coupled to a data cache. The prefetch unit is configured to concurrently maintain a plurality of separate, active prefetch streams. Each prefetch stream is either software initiated via execution by the processor of a dedicated prefetch instruction or hardware initiated via detection of a data cache miss by one or more load/store memory operations. The prefetch unit is further configured to generate prefetch requests responsive to the plurality of prefetch streams to prefetch data in to the data cache.
Type: Grant
Filed: June 15, 2006
Date of Patent: February 17, 2009
Assignee: P.A. Semi, Inc.
Inventors: Sudarshan Kadambi, Puneet Kumar, Po-Yung Chang
-
Patent number: 7472260
Abstract: In one embodiment, a processor comprises a retire unit and a load/store unit coupled thereto. The retire unit is configured to retire a first store memory operation responsive to the first store memory operation having been processed at least to a pipeline stage at which exceptions are reported for the first store memory operation. The load/store unit comprises a queue having a first entry assigned to the first store memory operation. The load/store unit is configured to retain the first store memory operation in the first entry subsequent to retirement of the first store memory operation if the first store memory operation is not complete. The queue may have multiple entries, and more than one store may be retained in the queue after being retired by the retire unit.
Type: Grant
Filed: October 10, 2006
Date of Patent: December 30, 2008
Assignee: P.A. Semi, Inc.
Inventors: Wei-Han Lien, Po-Yung Chang
-
Publication number: 20080307166
Abstract: In one embodiment, a processor may be configured to write ECC granular stores into the data cache, while non-ECC granular stores may be merged with cache data in a memory request buffer. In one embodiment, a processor may be configured to detect that a victim block writeback hits one or more stores in a memory request buffer (or vice versa) and may convert the victim block writeback to a fill. In one embodiment, a processor may speculatively issue stores that are subsequent to a load from a load/store queue, but prevent the update for the stores in response to a snoop hit on the load.
Type: Application
Filed: June 5, 2007
Publication date: December 11, 2008
Inventors: Ramesh Gunna, Po-Yung Chang, Sudarshan Kadambi
-
Publication number: 20080307173
Abstract: In one embodiment, an apparatus comprises a queue comprising a plurality of entries and a control unit coupled to the queue. The control unit is configured to allocate a first queue entry to a store memory operation, and is configured to write a first even offset, a first even mask, a first odd offset, and a first odd mask corresponding to the store memory operation to the first entry. A group of contiguous memory locations are logically divided into alternately-addressed even and odd byte ranges. A given store memory operation writes at most one even byte range and one adjacent odd byte range. The first even offset identifies a first even byte range that is potentially written by the store memory operation, and the first odd offset identifies a first odd byte range that is potentially written by the store memory operation.
Type: Application
Filed: June 5, 2007
Publication date: December 11, 2008
Inventors: Tse-yu Yeh, Daniel C. Murray, Po-Yung Chang, Anup S. Mehta
-
Publication number: 20080177988
Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
Type: Application
Filed: March 25, 2008
Publication date: July 24, 2008
Inventors: Sudarshan Kadambi, Po-Yung Chang, Eric Hao
-
Patent number: 7398361
Abstract: In one embodiment, an interface unit comprises an address buffer and a control unit coupled to the address buffer. The address buffer is configured to store addresses of processor core requests generated by a processor core and addresses of snoop requests received from an interconnect. The control unit is configured to maintain a plurality of queues, wherein at least a first queue of the plurality of queues is dedicated to snoop requests and at least a second queue of the plurality of queues is dedicated to processor core requests. Responsive to a first snoop request received by the interface unit from the interconnect, the control unit is configured to allocate a first address buffer entry of the address buffer to store the first snoop request and to store a first pointer to the first address buffer entry in the first queue.
Type: Grant
Filed: August 30, 2005
Date of Patent: July 8, 2008
Assignee: P.A. Semi, Inc.
Inventors: Ramesh Gunna, Po-Yung Chang, Sridhar P. Subramanian, James B. Keller, Tse-Yuh Yeh
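The shared address buffer with per-type pointer queues described in this abstract can be sketched as below. The class name `InterfaceUnit`, the free-list allocation policy, and the use of list indices as "pointers" are illustrative assumptions rather than details from the patent.

```python
from collections import deque


class InterfaceUnit:
    """Toy model: one shared address buffer, separate queues of pointers."""

    def __init__(self, num_entries):
        self.address_buffer = [None] * num_entries
        self.free = deque(range(num_entries))  # assumed free-list policy
        self.snoop_queue = deque()             # dedicated to snoop requests
        self.core_queue = deque()              # dedicated to core requests

    def _allocate(self, addr, queue):
        if not self.free:
            raise RuntimeError("address buffer full")
        index = self.free.popleft()
        self.address_buffer[index] = addr  # the address lives in the buffer
        queue.append(index)                # the queue holds only the pointer
        return index

    def receive_snoop(self, addr):
        return self._allocate(addr, self.snoop_queue)

    def receive_core_request(self, addr):
        return self._allocate(addr, self.core_queue)
```

Keeping only indices in the queues means both request types share one pool of address storage while their ordering is still tracked separately, which is one plausible motivation for the arrangement the abstract describes.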
-
Patent number: 7376817
Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
Type: Grant
Filed: August 10, 2005
Date of Patent: May 20, 2008
Assignee: P.A. Semi, Inc.
Inventors: Sudarshan Kadambi, Po-Yung Chang, Eric Hao
-
Publication number: 20080086623
Abstract: In one embodiment, a processor comprises a retire unit and a load/store unit coupled thereto. The retire unit is configured to retire a first store memory operation responsive to the first store memory operation having been processed at least to a pipeline stage at which exceptions are reported for the first store memory operation. The load/store unit comprises a queue having a first entry assigned to the first store memory operation. The load/store unit is configured to retain the first store memory operation in the first entry subsequent to retirement of the first store memory operation if the first store memory operation is not complete. The queue may have multiple entries, and more than one store may be retained in the queue after being retired by the retire unit.
Type: Application
Filed: October 10, 2006
Publication date: April 10, 2008
Applicant: P.A. Semi, Inc.
Inventors: Wei-Han Lien, Po-Yung Chang