Patents by Inventor Po-Yung Chang

Po-Yung Chang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7984274
    Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
    Type: Grant
    Filed: June 18, 2009
    Date of Patent: July 19, 2011
    Assignee: Apple Inc.
    Inventors: Sudarshan Kadambi, Po-Yung Chang, Eric Hao
  • Patent number: 7962730
    Abstract: In one embodiment, a processor comprises a retire unit and a load/store unit coupled thereto. The retire unit is configured to retire a first store memory operation responsive to the first store memory operation having been processed at least to a pipeline stage at which exceptions are reported for the first store memory operation. The load/store unit comprises a queue having a first entry assigned to the first store memory operation. The load/store unit is configured to retain the first store memory operation in the first entry subsequent to retirement of the first store memory operation if the first store memory operation is not complete. The queue may have multiple entries, and more than one store may be retained in the queue after being retired by the retire unit.
    Type: Grant
    Filed: November 25, 2008
    Date of Patent: June 14, 2011
    Assignee: Apple Inc.
    Inventors: Wei-Han Lien, Po-Yung Chang
  • Publication number: 20100268894
    Abstract: In one embodiment, a processor comprises a prefetch unit coupled to a data cache. The prefetch unit is configured to concurrently maintain a plurality of separate, active prefetch streams. Each prefetch stream is either software initiated via execution by the processor of a dedicated prefetch instruction or hardware initiated via detection of a data cache miss by one or more load/store memory operations. The prefetch unit is further configured to generate prefetch requests responsive to the plurality of prefetch streams to prefetch data in to the data cache.
    Type: Application
    Filed: July 6, 2010
    Publication date: October 21, 2010
    Inventors: Sudarshan Kadambi, Puneet Kumar, Po-Yung Chang
  • Patent number: 7779208
    Abstract: In one embodiment, a processor comprises a prefetch unit coupled to a data cache. The prefetch unit is configured to concurrently maintain a plurality of separate, active prefetch streams. Each prefetch stream is either software initiated via execution by the processor of a dedicated prefetch instruction or hardware initiated via detection of a data cache miss by one or more load/store memory operations. The prefetch unit is further configured to generate prefetch requests responsive to the plurality of prefetch streams to prefetch data in to the data cache.
    Type: Grant
    Filed: January 7, 2009
    Date of Patent: August 17, 2010
    Assignee: Apple Inc.
    Inventors: Sudarshan Kadambi, Puneet Kumar, Po-Yung Chang
  • Publication number: 20100169619
    Abstract: In one embodiment, an apparatus comprises a queue comprising a plurality of entries and a control unit coupled to the queue. The control unit is configured to allocate a first queue entry to a store memory operation, and is configured to write a first even offset, a first even mask, a first odd offset, and a first odd mask corresponding to the store memory operation to the first entry. A group of contiguous memory locations are logically divided into alternately-addressed even and odd byte ranges. A given store memory operation writes at most one even byte range and one adjacent odd byte range. The first even offset identifies a first even byte range that is potentially written by the store memory operation, and the first odd offset identifies a first odd byte range that is potentially written by the store memory operation.
    Type: Application
    Filed: March 10, 2010
    Publication date: July 1, 2010
    Inventors: Tse-yu Yeh, Daniel C. Murray, Po Yung Chang, Anup S. Mehta
  • Patent number: 7721066
    Abstract: In one embodiment, an apparatus comprises a queue comprising a plurality of entries and a control unit coupled to the queue. The control unit is configured to allocate a first queue entry to a store memory operation, and is configured to write a first even offset, a first even mask, a first odd offset, and a first odd mask corresponding to the store memory operation to the first entry. A group of contiguous memory locations are logically divided into alternately-addressed even and odd byte ranges. A given store memory operation writes at most one even byte range and one adjacent odd byte range. The first even offset identifies a first even byte range that is potentially written by the store memory operation, and the first odd offset identifies a first odd byte range that is potentially written by the store memory operation.
    Type: Grant
    Filed: June 5, 2007
    Date of Patent: May 18, 2010
    Assignee: Apple Inc.
    Inventors: Tse-yu Yeh, Daniel C. Murray, Po-Yung Chang, Anup S. Mehta
  • Publication number: 20100064120
    Abstract: In one embodiment, a processor comprises a scheduler configured to issue a first instruction operation to be executed and an execution core coupled to the scheduler. Configured to execute the first instruction operation, the execution core comprises a plurality of replay sources configured to cause a replay of the first instruction operation responsive to detecting at least one of a plurality of replay cases. The scheduler is configured to inhibit issuance of the first instruction operation subsequent to the replay for a subset of the plurality of replay cases. The scheduler is coupled to receive an acknowledgement indication corresponding to each of the plurality of replay cases in the subset, and is configured to inhibit issuance of the first instruction operation until the acknowledgement indication is asserted that corresponds to an identified replay case of the subset.
    Type: Application
    Filed: November 17, 2009
    Publication date: March 11, 2010
    Inventors: Po-Yung Chang, Wei-Han Lien, Jesse Pan, Ramesh Gunna, Tse-Yu Yeh, James B. Keller
  • Patent number: 7647518
    Abstract: In one embodiment, a processor comprises a scheduler configured to issue a first instruction operation to be executed and an execution core coupled to the scheduler. Configured to execute the first instruction operation, the execution core comprises a plurality of replay sources configured to cause a replay of the first instruction operation responsive to detecting at least one of a plurality of replay cases. The scheduler is configured to inhibit issuance of the first instruction operation subsequent to the replay for a subset of the plurality of replay cases. The scheduler is coupled to receive an acknowledgement indication corresponding to each of the plurality of replay cases in the subset, and is configured to inhibit issuance of the first instruction operation until the acknowledge indication is asserted that corresponds to an identified replay case of the subset.
    Type: Grant
    Filed: October 10, 2006
    Date of Patent: January 12, 2010
    Assignee: Apple Inc.
    Inventors: Po-Yung Chang, Wei-Han Lien, Jesse Pan, Ramesh Gunna, Tse-Yu Yeh, James B. Keller
  • Publication number: 20090254734
    Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
    Type: Application
    Filed: June 18, 2009
    Publication date: October 8, 2009
    Inventors: Sudarshan Kadambi, Po-Yung Chang, Eric Hao
  • Patent number: 7568087
    Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
    Type: Grant
    Filed: March 25, 2008
    Date of Patent: July 28, 2009
    Assignee: Apple Inc.
    Inventors: Sudarshan Kadambi, Po-Yung Chang, Eric Hao
  • Publication number: 20090119488
    Abstract: In one embodiment, a processor comprises a prefetch unit coupled to a data cache. The prefetch unit is configured to concurrently maintain a plurality of separate, active prefetch streams. Each prefetch stream is either software initiated via execution by the processor of a dedicated prefetch instruction or hardware initiated via detection of a data cache miss by one or more load/store memory operations. The prefetch unit is further configured to generate prefetch requests responsive to the plurality of prefetch streams to prefetch data in to the data cache.
    Type: Application
    Filed: January 7, 2009
    Publication date: May 7, 2009
    Inventors: Sudarshan Kadambi, Puneet Kumar, Po-Yung Chang
  • Publication number: 20090077560
    Abstract: In one embodiment, a processor comprises a retire unit and a load/store unit coupled thereto. The retire unit is configured to retire a first store memory operation responsive to the first store memory operation having been processed at least to a pipeline stage at which exceptions are reported for the first store memory operation. The load/store unit comprises a queue having a first entry assigned to the first store memory operation. The load/store unit is configured to retain the first store memory operation in the first entry subsequent to retirement of the first store memory operation if the first store memory operation is not complete. The queue may have multiple entries, and more than one store may be retained in the queue after being retired by the retire unit.
    Type: Application
    Filed: November 25, 2008
    Publication date: March 19, 2009
    Inventors: Wei-Han Lien, Po-Yung Chang
  • Patent number: 7493451
    Abstract: In one embodiment, a processor comprises a prefetch unit coupled to a data cache. The prefetch unit is configured to concurrently maintain a plurality of separate, active prefetch streams. Each prefetch stream is either software initiated via execution by the processor of a dedicated prefetch instruction or hardware initiated via detection of a data cache miss by one or more load/store memory operations. The prefetch unit is further configured to generate prefetch requests responsive to the plurality of prefetch streams to prefetch data in to the data cache.
    Type: Grant
    Filed: June 15, 2006
    Date of Patent: February 17, 2009
    Assignee: P.A. Semi, Inc.
    Inventors: Sudarshan Kadambi, Puneet Kumar, Po-Yung Chang
  • Patent number: 7472260
    Abstract: In one embodiment, a processor comprises a retire unit and a load/store unit coupled thereto. The retire unit is configured to retire a first store memory operation responsive to the first store memory operation having been processed at least to a pipeline stage at which exceptions are reported for the first store memory operation. The load/store unit comprises a queue having a first entry assigned to the first store memory operation. The load/store unit is configured to retain the first store memory operation in the first entry subsequent to retirement of the first store memory operation if the first store memory operation is not complete. The queue may have multiple entries, and more than one store may be retained in the queue after being retired by the retire unit.
    Type: Grant
    Filed: October 10, 2006
    Date of Patent: December 30, 2008
    Assignee: P.A. Semi, Inc.
    Inventors: Wei-Han Lien, Po-Yung Chang
  • Publication number: 20080307166
    Abstract: In one embodiment, a processor may be configured to write ECC granular stores into the data cache, while non-ECC granular stores may be merged with cache data in a memory request buffer. In one embodiment, a processor may be configured to detect that a victim block writeback hits one or more stores in a memory request buffer (or vice versa) and may convert the victim block writeback to a fill. In one embodiment, a processor may speculatively issue stores that are subsequent to a load from a load/store queue, but prevent the update for the stores in response to a snoop hit on the load.
    Type: Application
    Filed: June 5, 2007
    Publication date: December 11, 2008
    Inventors: Ramesh Gunna, Po-Yung Chang, Sudarshan Kadambi
  • Publication number: 20080307173
    Abstract: In one embodiment, an apparatus comprises a queue comprising a plurality of entries and a control unit coupled to the queue. The control unit is configured to allocate a first queue entry to a store memory operation, and is configured to write a first even offset, a first even mask, a first odd offset, and a first odd mask corresponding to the store memory operation to the first entry. A group of contiguous memory locations are logically divided into alternately-addressed even and odd byte ranges. A given store memory operation writes at most one even byte range and one adjacent odd byte range. The first even offset identifies a first even byte range that is potentially written by the store memory operation, and the first odd offset identifies a first odd byte range that is potentially written by the store memory operation.
    Type: Application
    Filed: June 5, 2007
    Publication date: December 11, 2008
    Inventors: Tse-yu Yeh, Daniel C. Murray, Po-Yung Chang, Anup S. Mehta
  • Publication number: 20080177988
    Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
    Type: Application
    Filed: March 25, 2008
    Publication date: July 24, 2008
    Inventors: Sudarshan Kadambi, Po-Yung Chang, Eric Hao
  • Patent number: 7398361
    Abstract: In one embodiment, an interface unit comprises an address buffer and a control unit coupled to the address buffer. The address buffer is configured to store addresses of processor core requests generated by a processor core and addresses of snoop requests received from an interconnect. The control unit is configured to maintain a plurality of queues, wherein at least a first queue of the plurality of queues is dedicated to snoop requests and at least a second queue of the plurality of queues is dedicated to processor core requests. Responsive to a first snoop request received by the interface unit from the interconnect, the control unit is configured to allocate a first address buffer entry of the address buffer to store the first snoop request and to store a first pointer to the first address buffer entry in the first queue.
    Type: Grant
    Filed: August 30, 2005
    Date of Patent: July 8, 2008
    Assignee: P.A. Semi, Inc.
    Inventors: Ramesh Gunna, Po-Yung Chang, Sridhar P. Subramanian, James B. Keller, Tse-Yuh Yeh
  • Patent number: 7376817
    Abstract: In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
    Type: Grant
    Filed: August 10, 2005
    Date of Patent: May 20, 2008
    Assignee: P.A. Semi, Inc.
    Inventors: Sudarshan Kadambi, Po-Yung Chang, Eric Hao
  • Publication number: 20080086623
    Abstract: In one embodiment, a processor comprises a retire unit and a load/store unit coupled thereto. The retire unit is configured to retire a first store memory operation responsive to the first store memory operation having been processed at least to a pipeline stage at which exceptions are reported for the first store memory operation. The load/store unit comprises a queue having a first entry assigned to the first store memory operation. The load/store unit is configured to retain the first store memory operation in the first entry subsequent to retirement of the first store memory operation if the first store memory operation is not complete. The queue may have multiple entries, and more than one store may be retained in the queue after being retired by the retire unit.
    Type: Application
    Filed: October 10, 2006
    Publication date: April 10, 2008
    Applicant: P.A. Semi, Inc.
    Inventors: Wei-Han Lien, Po-Yung Chang