Patents by Inventor Conrado Blasco

Conrado Blasco has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Optimizing register initialization operations

Patent number: 9430243

Abstract: A system and method for efficiently reducing the latency of initializing registers. A register rename unit within a processor determines whether prior to an execution pipeline stage it is known a decoded given instruction writes a particular numerical value in a destination operand. An example is a move immediate instruction that writes a value of 0 in its destination operand. Other examples may also qualify. If the determination is made, a given physical register identifier is assigned to the destination operand, wherein the given physical register identifier is associated with the particular numerical value, but it is not associated with an actual physical register in a physical register file. The given instruction is marked to prevent it from proceeding to an execution pipeline stage. When the given physical register identifier is used to read the physical register file, no actual physical register is accessed.

Type: Grant

Filed: April 30, 2012

Date of Patent: August 30, 2016

Assignee: Apple Inc.

Inventors: James B. Keller, John H. Mylius, Conrado Blasco-Allue, Gerard R. Williams, III
Next fetch predictor return address stack

Patent number: 9405544

Abstract: A system and method for efficient branch prediction. A processor includes a next fetch predictor to generate a fast branch prediction for branch instructions at an early pipeline stage. The processor also includes a main return address stack (RAS) at a later pipeline stage for predicting the target of return instructions. When a return instruction is encountered, the prediction from the next fetch predictor is replaced by the top of the main RAS. If there are any recent call or return instructions in flight toward the main RAS, then a separate prediction is generated by a mini-RAS.

Type: Grant

Filed: May 14, 2013

Date of Patent: August 2, 2016

Assignee: Apple Inc.

Inventors: Douglas C. Holman, Ramesh B. Gunna, Conrado Blasco-Allue
Fetch width predictor

Patent number: 9367471

Abstract: Various techniques for predicting instruction fetch widths. In one embodiment, a fetch prediction unit in a processor is configured to generate a fetch width that specifies a number of bits to be retrieved in a subsequent fetch from an instruction cache. The fetch prediction unit may also generate a fetch prediction that includes the fetch width in response to a current fetch request. A number of bits corresponding to the fetch width may be fetched from the instruction cache. The fetch width may correspond to a location of a predicted-taken control transfer instruction. This fetch width prediction may lead to power savings in instruction cache accesses.

Type: Grant

Filed: September 10, 2012

Date of Patent: June 14, 2016

Assignee: Apple Inc.

Inventors: Conrado Blasco-Allue, Ramesh B. Gunna
Multi-level dispatch for a superscalar processor

Patent number: 9336003

Abstract: In an embodiment, a processor includes a multi-level dispatch circuit configured to supply operations for execution by multiple parallel execution pipelines. The multi-level dispatch circuit may include multiple dispatch buffers, each of which is coupled to multiple reservation stations. Each reservation station may be coupled to a respective execution pipeline and may be configured to schedule instruction operations (ops) for execution in the respective execution pipeline. The sets of reservation stations coupled to each dispatch buffer may be non-overlapping. Thus, if a given op is to be executed in a given execution pipeline, the op may be sent to the dispatch buffer which is coupled to the reservation station that provides ops to the given execution pipeline.

Type: Grant

Filed: January 25, 2013

Date of Patent: May 10, 2016

Assignee: Apple Inc.

Inventors: John H. Mylius, Gerard R. Williams, III, Shyam Sundar Balasubramanian, Conrado Blasco-Allue
Instruction set architecture mode dependent sub-size access of register with associated status indication

Patent number: 9317285

Abstract: A system and method for efficiently reducing the power consumption of register file accesses. A processor is operable to execute instructions with two or more data types, each with an associated size and alignment. Data operands for a first data type use operand sizes equal to an entire width of a physical register within a physical register file. Data operands for a second data type use operand sizes less than an entire width of a physical register. Accesses of the physical register file for operands associated with a non-full-width data type do not access a full width of the physical registers. A given numerical value may be bypassed for the portion of the physical register that is not accessed.

Type: Grant

Filed: April 30, 2012

Date of Patent: April 19, 2016

Assignee: Apple Inc.

Inventors: Sandeep Gupta, Conrado Blasco-Allue, John H. Mylius, Gerard R. Williams, III, James B. Keller
Usefulness indication for indirect branch prediction training

Patent number: 9311100

Abstract: A circuit for implementing a branch target buffer. The branch target buffer may include a memory that stores a plurality of entries. Each entry may include a tag value, a target value, and a prediction accuracy value. A received index value corresponding to an indirect branch instruction may be used to select one of entries of the plurality of entries, and a received tag value may then be compared to the tag value of the selected entries in the memory. An entry in the memory may be selected in response to a determination that the received tag does not match the tag value of compared entries. The selected entry may be allocated to the indirect instruction branch dependent upon the prediction accuracy values of the plurality of entries.

Type: Grant

Filed: January 7, 2013

Date of Patent: April 12, 2016

Assignee: Apple Inc.

Inventors: Sandeep Gupta, Shyam Sundar, Wei-Han Lien, Gerard R. Williams, III, Conrado Blasco-Allue
Mechanism for reducing cache power consumption using cache way prediction

Patent number: 9311098

Abstract: A mechanism for reducing power consumption of a cache memory of a processor includes a processor with a cache memory that stores instruction information for one or more instruction fetch groups fetched from a system memory. The cache memory may include a number of ways that are each independently controllable. The processor also includes a way prediction unit. The way prediction unit may enable, in a next execution cycle, a given way within which instruction information corresponding to a target of a next branch instruction is stored in response to a branch taken prediction for the next branch instruction. The way prediction unit may also, in response to the branch taken prediction for the next branch instruction, enable, one at a time, each corresponding way within which instruction information corresponding to respective sequential instruction fetch groups that follow the next branch instruction are stored.

Type: Grant

Filed: May 7, 2013

Date of Patent: April 12, 2016

Assignee: Apple Inc.

Inventors: Ronald P. Hall, Conrado Blasco-Allue
RDA checkpoint optimization

Patent number: 9311084

Abstract: A system and method for efficiently performing microarchitectural checkpointing. A register rename unit within a processor determines whether a physical register number qualifies to have duplicate mappings. Information for maintenance of the duplicate mappings is stored in a register duplicate array (RDA). To reduce the penalty for misspeculation or exception recovery, control logic in the processor supports multiple checkpoints. The RDA is one of multiple data structures to have checkpoint copies of state. The RDA utilizes a content addressable memory (CAM) to store physical register numbers. The duplicate counts for both the current state and the checkpoint copies for a given physical register number are updated when instructions utilizing the given physical register number are retired. To reduce on-die real estate and power consumption, a single CAM entry is stores the physical register number and the other fields are stored in separate storage elements.

Type: Grant

Filed: July 31, 2013

Date of Patent: April 12, 2016

Assignee: Apple Inc.

Inventors: Shyam Sundar, Conrado Blasco-Allue
MECHANISM FOR ALLOWING SPECULATIVE EXECUTION OF LOADS BEYOND A WAIT FOR EVENT INSTRUCTION

Publication number: 20160092236

Abstract: A processor includes a mechanism that checks for and flushes only speculative loads and any respective dependent instructions that are younger than an executed wait for event (WEV) instruction, and which also match an address of a store instruction that has been determined to have been executed by a different processor prior to execution of the paired SEV instruction by the different processor. The mechanism may allow speculative loads that do not match the address of any store instruction that has been determined to have been executed by a different processor prior to execution of the paired SEV instruction by the different processor.

Type: Application

Filed: September 30, 2014

Publication date: March 31, 2016

Inventors: Pradeep Kanapathipaillai, Richard F. Russo, Sandeep Gupta, Conrado Blasco
IMMEDIATE BRANCH RECODE THAT HANDLES ALIASING

Publication number: 20160085550

Abstract: A system and method for efficiently indicating branch target addresses. A semiconductor chip predecodes instructions of a computer program prior to installing the instructions in an instruction cache. In response to determining a particular instruction is a control flow instruction with a displacement relative to a program counter address (PC), the chip replaces a portion of the PC relative displacement in the particular instruction with a subset of a target address. The subset of the target address is an untranslated physical subset of the full target address. When the recoded particular instruction is fetched and decoded, the remaining portion of the PC relative displacement is added to a virtual portion of the PC used to fetch the particular instruction. The result is concatenated with the portion of the target address embedded in the fetched particular instruction to form a full target address.

Type: Application

Filed: September 19, 2014

Publication date: March 24, 2016

Inventors: Shyam Sundar, Richard F. Russo, Ronald P. Hall, Conrado Blasco
EARLY LOOP BUFFER ENTRY

Publication number: 20150227374

Abstract: Systems, processors, and methods for determining when to enter loop buffer mode early for loops in an instruction stream. A processor waits until a branch history register has saturated before entering loop buffer mode for a loop if the processor has not yet determined the loop has an unpredictable exit. However, if the loop has an unpredictable exit, then the loop is allowed to enter loop buffer mode early. While in loop buffer mode, the loop is dispatched from a loop buffer, and the front-end of the processor is powered down until the loop terminates.

Type: Application

Filed: February 12, 2014

Publication date: August 13, 2015

Applicant: Apple Inc.

Inventors: Conrado Blasco, Ian D. Kountanis
Size mis-match hazard detection

Patent number: 9081581

Abstract: An out-of-order processor 4 groups program instructions together to control their commitment to complete processing. If an instruction within a group has a source operand dependent upon a plurality of destination operands of other instructions then this is identified as a size mismatch hazard. When the program instruction having the size mismatch hazard reaches a commit point within the processor, then it is flushed together with any speculatively executed succeeding program instructions. Furthermore, the group of program instructions containing the program instruction containing the program instruction having the size mismatch is divided into a plurality of groups of program instructions each containing a single program instruction which are then replayed through the processing mechanisms.

Type: Grant

Filed: November 16, 2010

Date of Patent: July 14, 2015

Assignee: ARM Limited

Inventors: James Nolan Hardage, Conrado Blasco Allue, Glen Andrew Harris
REDUCING POWER CONSUMPTION IN A PROCESSOR

Publication number: 20150169041

Abstract: A processor includes a mechanism for disabling a memory array of a branch prediction unit. The processor may include a next fetch prediction unit that may include a number of entries. Each entry may correspond to a next instruction fetch group and may store an indication of whether or not the corresponding the next fetch group includes a conditional branch instruction. In response to an indication that the next fetch group does not include a conditional branch instruction, the fetch prediction unit may be configured to disable, in a next instruction execution cycle, the memory array of the branch prediction unit.

Type: Application

Filed: December 12, 2013

Publication date: June 18, 2015

Applicant: Apple Inc.

Inventors: Conrado Blasco, Ronald P. Hall, Ramesh B. Gunna, Ian D. Kountanis, Shyam Sundar, André Seznec
Split Register File for Operands of Different Sizes

Publication number: 20150134935

Abstract: In an embodiment, a processor includes a register file having multiple widths corresponding to different operands sizes of a given data type implemented by the processor. For example, the integer register file may have 32 bit and 64 bit widths for 32 and 64 bit operand sizes. The register file may have a section of registers for each operand size, and the map unit may allocate registers from the appropriate section for each instruction operation based on the operand size of that instruction operation. The register file may consume less integrated circuit area than another register file having the same number of registers, all of which are implemented at the largest operand size. In some embodiments, only the register file and the map unit (specifically the free list management logic in the map unit) are changed to implement the multiple-width register file.

Type: Application

Filed: November 11, 2013

Publication date: May 14, 2015

Applicant: Apple Inc.

Inventor: Conrado Blasco
RDA CHECKPOINT OPTIMIZATION

Publication number: 20150039860

Abstract: A system and method for efficiently performing microarchitectural checkpointing. A register rename unit within a processor determines whether a physical register number qualifies to have duplicate mappings. Information for maintenance of the duplicate mappings is stored in a register duplicate array (RDA). To reduce the penalty for misspeculation or exception recovery, control logic in the processor supports multiple checkpoints. The RDA is one of multiple data structures to have checkpoint copies of state. The RDA utilizes a content addressable memory (CAM) to store physical register numbers. The duplicate counts for both the current state and the checkpoint copies for a given physical register number are updated when instructions utilizing the given physical register number are retired. To reduce on-die real estate and power consumption, a single CAM entry is stores the physical register number and the other fields are stored in separate storage elements.

Type: Application

Filed: July 31, 2013

Publication date: February 5, 2015

Applicant: Apple Inc.

Inventors: Shyam Sundar, Conrado Blasco-Allue
NEXT FETCH PREDICTOR RETURN ADDRESS STACK

Publication number: 20140344558

Abstract: A system and method for efficient branch prediction. A processor includes a next fetch predictor to generate a fast branch prediction for branch instructions at an early pipeline stage. The processor also includes a main return address stack (RAS) at a later pipeline stage for predicting the target of return instructions. When a return instruction is encountered, the prediction from the next fetch predictor is replaced by the top of the main RAS. If there are any recent call or return instructions in flight toward the main RAS, then a separate prediction is generated by a mini-RAS.

Type: Application

Filed: May 14, 2013

Publication date: November 20, 2014

Applicant: Apple Inc.

Inventors: Douglas C. Holman, Ramesh B. Gunna, Conrado Blasco-Allue
Mechanism for Reducing Cache Power Consumption Using Cache Way Prediction

Publication number: 20140337605

Abstract: A mechanism for reducing power consumption of a cache memory of a processor includes a processor with a cache memory that stores instruction information for one or more instruction fetch groups fetched from a system memory. The cache memory may include a number of ways that are each independently controllable. The processor also includes a way prediction unit. The way prediction unit may enable, in a next execution cycle, a given way within which instruction information corresponding to a target of a next branch instruction is stored in response to a branch taken prediction for the next branch instruction. The way prediction unit may also, in response to the branch taken prediction for the next branch instruction, enable, one at a time, each corresponding way within which instruction information corresponding to respective sequential instruction fetch groups that follow the next branch instruction are stored.

Type: Application

Filed: May 7, 2013

Publication date: November 13, 2014

Applicant: Apple Inc.

Inventors: Ronald P. Hall, Conrado Blasco-Allue
IT INSTRUCTION PRE-DECODE

Publication number: 20140244976

Abstract: Various techniques for processing and pre-decoding branches within an IT instruction block. Instructions are fetched and cached in an instruction cache, and pre-decode bits are generated to indicate the presence of an IT instruction and the likely boundaries of the IT instruction block. If an unconditional branch is detected within the likely boundaries of an IT instruction block, the unconditional branch is treated as if it were a conditional branch. The unconditional branch is sent to the branch direction predictor and the predictor generates a branch direction prediction for the unconditional branch.

Type: Application

Filed: February 22, 2013

Publication date: August 28, 2014

Applicant: APPLE INC.

Inventors: Shyam Sundar, Ian D. Kountanis, Conrado Blasco-Allue, Gerard R. Williams, III, Wei-Han Lien, Ramesh B. Gunna
Multi-Level Dispatch for a Superscalar Processor

Publication number: 20140215188

Abstract: In an embodiment, a processor includes a multi-level dispatch circuit configured to supply operations for execution by multiple parallel execution pipelines. The multi-level dispatch circuit may include multiple dispatch buffers, each of which is coupled to multiple reservation stations. Each reservation station may be coupled to a respective execution pipeline and may be configured to schedule instruction operations (ops) for execution in the respective execution pipeline. The sets of reservation stations coupled to each dispatch buffer may be non-overlapping. Thus, if a given op is to be executed in a given execution pipeline, the op may be sent to the dispatch buffer which is coupled to the reservation station that provides ops to the given execution pipeline.

Type: Application

Filed: January 25, 2013

Publication date: July 31, 2014

Applicant: Apple Inc.

Inventors: John H. Mylius, Gerard R. Williams, III, Shyam Sundar Balasubramanian, Conrado Blasco-Allue
Arithmetic Branch Fusion

Publication number: 20140208073

Abstract: A processor and method for fusing together an arithmetic instruction and a branch instruction. The processor includes an instruction fetch unit configured to fetch instructions. The processor may also include an instruction decode unit that may be configured to decode the fetched instructions into micro-operations for execution by an execution unit. The decode unit may be configured to detect an occurrence of an arithmetic instruction followed by a branch instruction in program order, wherein the branch instruction, upon execution, changes a program flow of control dependent upon a result of execution of the arithmetic instruction. In addition, the processor may further be configured to fuse together the arithmetic instruction and the branch instruction such that a single micro-operation is formed. The single micro-operation includes execution information based upon both the arithmetic instruction and the branch instruction.

Type: Application

Filed: January 23, 2013

Publication date: July 24, 2014

Applicant: APPLE INC.

Inventors: Conrado Blasco-Allue, Sandeep Gupta

prev 1 2 3 4 5 next