Patents by Inventor Gerard R. Williams, III

Gerard R. Williams, III has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20140089638
    Abstract: Various techniques for processing instructions that specify multiple destinations. A first portion of a processor pipeline is configured to split a multi-destination instruction into a plurality of single-destination operations. A second portion of the pipeline is configured to process the plurality of single-destination operations. A third portion of the pipeline is configured to merge the plurality of single-destination operations into one or more multi-destination operations. The one or more multi-destination operations may be performed. The first portion of the pipeline may include a decode unit. The second portion of the pipeline may include a map unit, which may in turn include circuitry configured to maintain a list of free architectural registers and a mapping table that maps physical registers to architectural registers. The third portion of the pipeline may comprise a dispatch unit. In some embodiments, this may provide certain advantages such as reduced area and/or power consumption.
    Type: Application
    Filed: September 26, 2012
    Publication date: March 27, 2014
    Applicant: APPLE INC.
    Inventors: John H. Mylius, Gerard R. Williams III, James B. Keller, Fang Liu, Shyam Sundar
  • Publication number: 20140089647
    Abstract: In an embodiment, a processor may be configured to fetch N instruction bytes from an instruction cache (a “fetch group”), even if the fetch group crosses a cache line boundary. A branch predictor may be configured to produce branch predictions for up to M branches in the fetch group, where M is a maximum number of branches that may be included in the fetch group. In an embodiment, a branch direction predictor may be updated responsive to a misprediction and also responsive to the branch prediction being within a threshold of transitioning between predictions. To avoid a lookup to determine if the threshold update is to be performed, the branch predictor may detect the threshold update during prediction, and may transmit an indication with the branch.
    Type: Application
    Filed: September 24, 2012
    Publication date: March 27, 2014
    Inventors: Ian D. Kountanis, Gerard R. Williams, III, James B. Keller
  • Publication number: 20140089589
    Abstract: Methods and processors for enforcing an order of memory access requests in the presence of barriers in an out-of-order processor pipeline. A speculative color is assigned to instruction operations in the front-end of the processor pipeline, while the instruction operations are still in order. The instruction operations are placed in any of multiple reservation stations and then issued out-of-order from the reservation stations. When a barrier is encountered in the front-end, the speculative color is changed, and instruction operations are assigned the new speculative color. A core interface unit maintains an architectural color, and the architectural color is changed when a barrier retires. The core interface unit stalls instruction operations with a speculative color that does match the architectural color.
    Type: Application
    Filed: September 27, 2012
    Publication date: March 27, 2014
    Applicant: APPLE INC.
    Inventors: Stephan G. Meier, Gerard R. Williams, III
  • Publication number: 20140025892
    Abstract: Methods, apparatuses, and processors for reducing memory latency in the presence of barriers. When a barrier operation is executed, subsequent memory access operations are delayed until the barrier operation retires. While the memory access operation is delayed, the memory access operation is converted into a prefetch request and sent to the L2 cache. Then, data corresponding to the prefetch request is retrieved and stored in the L1 data cache. When the memory access operation wakes up, the data for the operation will already be stored in the L1 data cache, reducing the memory latency of the operation.
    Type: Application
    Filed: July 17, 2012
    Publication date: January 23, 2014
    Inventor: Gerard R. Williams III
  • Publication number: 20130339671
    Abstract: A system and method for reducing the latency of load operations. A register rename unit within a processor determines whether a decoded load instruction is eligible for conversion to a zero-cycle load operation. If so, control logic assigns a physical register identifier associated with a source operand of an older dependent store instruction to the destination operand of the load instruction. Additionally, the register rename unit marks the load instruction to prevent it from reading data associated with the source operand of the store instruction from memory. Due to the duplicate renaming, this data may be forwarded from a physical register file to instructions that are younger and dependent on the load instruction.
    Type: Application
    Filed: June 14, 2012
    Publication date: December 19, 2013
    Inventors: Gerard R. Williams, III, John H. Mylius, Conrade Blasco-Allue
  • Publication number: 20130326198
    Abstract: Methods and processors for managing load-store dependencies in an out-of-order instruction pipeline. A load store dependency predictor includes a table for storing entries for load-store pairs that have been found to be dependent and execute out of order. Each entry in the table includes hashed values to identify load and store operations. When a load or store operation is detected, the PC and an architectural register number are used to create a hashed value that can be used to uniquely identify the operation. Then, the load store dependency predictor table is searched for any matching entries with the same hashed value.
    Type: Application
    Filed: May 30, 2012
    Publication date: December 5, 2013
    Inventors: Stephan G. Meier, John H. Mylius, Gerard R. Williams, III, Suparn Vats
  • Publication number: 20130298127
    Abstract: Methods and apparatuses for managing load-store dependencies in an out-of-order processor. A load store dependency predictor may include a table for storing entries for load-store pairs that have been found to be dependent and execute out of order. Each entry in the table includes a counter to indicate a strength of the dependency prediction. If the counter is above a threshold, a dependency is enforced for the load-store pair. If the counter is below the threshold, the dependency is not enforced for the load-store pair. When a store is dispatched, the table is searched, and any matching entries in the table are armed. If a load is dispatched, matches on an armed entry, and the counter is above the threshold, then the load will wait to issue until the corresponding store issues.
    Type: Application
    Filed: May 4, 2012
    Publication date: November 7, 2013
    Inventors: Stephan G. Meier, John H. Mylius, Gerard R. Williams, III, Suparn Vats
  • Publication number: 20130290680
    Abstract: A system and method for efficiently reducing the latency of initializing registers. A register rename unit within a processor determines whether prior to an execution pipeline stage it is known a decoded given instruction writes a particular numerical value in a destination operand. An example is a move immediate instruction that writes a value of 0 in its destination operand. Other examples may also qualify. If the determination is made, a given physical register identifier is assigned to the destination operand, wherein the given physical register identifier is associated with the particular numerical value, but it is not associated with an actual physical register in a physical register file. The given instruction is marked to prevent it from proceeding to an execution pipeline stage. When the given physical register identifier is used to read the physical register file, no actual physical register is accessed.
    Type: Application
    Filed: April 30, 2012
    Publication date: October 31, 2013
    Inventors: James B. Keller, John H. Mylius, Conrado Blasco-Allue, Gerard R. Williams, III
  • Publication number: 20130290681
    Abstract: A system and method for efficiently reducing the power consumption of register file accesses. A processor is operable to execute instructions with two or more data types, each with an associated size and alignment. Data operands for a first data type use operand sizes equal to an entire width of a physical register within a physical register file. Data operands for a second data type use operand sizes less than an entire width of a physical register. Accesses of the physical register file for operands associated with a non-full-width data type do not access a full width of the physical registers. A given numerical value may be bypassed for the portion of the physical register that is not accessed.
    Type: Application
    Filed: April 30, 2012
    Publication date: October 31, 2013
    Inventors: James B. Keller, John H. Mylius, Conrado Blasco-Allue, Gerard R. Williams, III, Sandeep Gupta
  • Publication number: 20130275720
    Abstract: A system and method for reducing the latency of data move operations. A register rename unit within a processor determines whether a decoded move instruction is eligible for a zero cycle move operation. If so, control logic assigns a physical register identifier associated with a source operand of the move instruction to the destination operand of the move instruction. Additionally, the register rename unit marks the given move instruction to prevent it from proceeding in the processor pipeline. Further maintenance of the particular physical register identifier may be done by the register rename unit during commit of the given move instruction.
    Type: Application
    Filed: April 16, 2012
    Publication date: October 17, 2013
    Inventors: James B. Keller, John H. Mylius, Conrado Blasco-Allue, Gerard R. Williams, III, Suparn Vats
  • Publication number: 20130254485
    Abstract: Processors and methods for coordinating prefetch units at multiple cache levels. A single, unified training mechanism is utilized for training on streams generated by a processor core. Prefetch requests are sent from the core to lower level caches, and a packet is sent with each prefetch request. The packet identifies the stream ID of the prefetch request and includes relevant training information for the particular stream ID. The lower level caches generate prefetch requests based on the received training information.
    Type: Application
    Filed: March 20, 2012
    Publication date: September 26, 2013
    Inventors: Hari S. Kannan, Brian P. Lilly, Gerard R. Williams, III, Mahnaz Sadoughi-Yarandi, Perumal R. Subramoniam, Pradeep Kanapathipillai