Superscalar Patents (Class 712/23)
  • Publication number: 20140281375
    Abstract: A method and a computer program for a processor simultaneously handle multiple instructions at a time. The method includes labeling of an instruction ending a relevant sample interval from a plurality of such instructions. Further, the method utilizes a buffer to store N more number of entries than actually required, wherein, N refers to the number of RI instructions younger than the instruction ending a sample interval. Further, the method also includes the step of recording relevant instrumentation data corresponding to the sample interval and providing the instrumentation data in response to identification of the sample interval.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Applicant: International Business Machines Corporation
    Inventors: Gregory W. Alexander, Mark S. Farrell, Wolfgang Fischer, Guenter Gerwig, Frank Lehnert, Chung-Lung Shum
  • Patent number: 8812822
    Abstract: A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for minimizing unscheduled D-cache miss pipeline stalls is provided. The design structure includes an integrated circuit device, which includes a cascaded delayed execution pipeline unit having two or more execution pipelines that begin execution of instructions in a common issue group in a delayed manner relative to each other, and circuitry. The circuitry is configured to receive an issue group of instructions, determine whether the issue group is a load instruction, and if so, schedule the load instruction in a first pipeline of the two or more execution pipelines, and schedule each remaining instruction in the issue group to be executed in remaining pipelines of the two or more pipelines, wherein execution of the load instruction in the first pipeline begins prior to beginning execution of the remaining instructions in the remaining pipelines.
    Type: Grant
    Filed: March 13, 2008
    Date of Patent: August 19, 2014
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
  • Patent number: 8775147
    Abstract: An algorithm and architecture are disclosed for performing multi-argument associative operations. The algorithm and architecture can be used to schedule operations on multiple facilities for computations or can be used in the development of a model in a modeling environment. The algorithm and architecture resulting from the algorithm use the latency of the components that are used to process the associative operations. The algorithm minimizes the number of components necessary to produce an output of multi-argument associative operations and also can minimize the number of inputs each component receives.
    Type: Grant
    Filed: May 31, 2006
    Date of Patent: July 8, 2014
    Assignee: The MathWorks, Inc.
    Inventors: Alireza Pakyari, Brian K. Ogilvie
  • Patent number: 8649508
    Abstract: A system and method for implementing the Elliptic Curve scalar multiplication method in cryptography, where the Double Base Number System is expressed in decreasing order of exponents and further on using it to determine Elliptic curve scalar multiplication over a finite elliptic curve.
    Type: Grant
    Filed: September 29, 2008
    Date of Patent: February 11, 2014
    Assignee: Tata Consultancy Services Ltd.
    Inventor: Natarajan Vijayarangan
  • Patent number: 8645714
    Abstract: A branch target address cache (BTAC) caches history information associated with branch and switch key instructions previously executed by a microprocessor. The history information includes a target address and an identifier (index into a register file) for identifying key values associated with each of the previous branch and switch key instructions. A fetch unit receives from the BTAC a prediction that the fetch unit fetched a previous branch and switch key instruction and receives the target address and identifier associated with the fetched branch and switch key instruction. The fetch unit also fetches encrypted instruction data at the associated target address and decrypts (via XOR) the fetched encrypted instruction data based on the key values identified by the identifier, in response to receiving the prediction. If the BTAC predicts correctly, a pipeline flush normally associated with the branch and switch key instruction is avoided.
    Type: Grant
    Filed: April 21, 2011
    Date of Patent: February 4, 2014
    Assignee: VIA Technologies, Inc.
    Inventors: G. Glenn Henry, Terry Parks, Brent Bean, Thomas A. Crispin
  • Patent number: 8566570
    Abstract: In a system having a plurality of processing nodes, a control node divides a task into a plurality of sub-tasks, and assigns the sub-tasks to one or more additional processing nodes which execute the assigned sub-tasks and return the results to the control node, thereby enabling a plurality of processing nodes to efficiently and quickly perform memory initialization and test of all assigned sub-tasks.
    Type: Grant
    Filed: October 10, 2012
    Date of Patent: October 22, 2013
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Oswin E. Housty
  • Publication number: 20130232359
    Abstract: Embodiments of a processing architecture are described. The architecture includes a fetch unit for fetching instructions from a data bus. A scheduler receives data from the fetch unit and creates a schedule allocates the data and schedule to a plurality of computational units. The scheduler also modifies voltage and frequency settings of the processing architecture to optimize power consumption and throughput of the system. The computational units include control units and execute units. The control units receive and decode the instructions and send the decoded instructions to execute units. The execute units then execute the instructions according to relevant software.
    Type: Application
    Filed: March 1, 2012
    Publication date: September 5, 2013
    Applicant: NXP B.V.
    Inventors: Hamed Fatemi, Ajay Kapoor, J. Pineda de Gyvez
  • Patent number: 8516223
    Abstract: A priority circuit is connected to a reservation station and a plurality of arithmetic units that processes different operations and dispatches, when it is determined that an executable flag indicating that an instruction can be executed by only a specific arithmetic unit is on, an instruction to an arithmetic unit that is different from the specific arithmetic unit and of which a queue is vacant in accordance with the input performed by an instruction decoder and the reservation station.
    Type: Grant
    Filed: June 29, 2010
    Date of Patent: August 20, 2013
    Assignee: Fujitsu Limited
    Inventors: Atsushi Fusejima, Yasunobu Akizuki, Toshio Yoshida
  • Patent number: 8468540
    Abstract: A multi-streaming processor has a plurality of streams for streaming one or more instruction threads, a set of functional resources for processing instructions from streams, and interrupt handler logic. The logic detects and maps interrupts and exceptions to one or more specific streams. In some embodiments, one interrupt or exception may be mapped to two or more streams, and in others two or more interrupts or exceptions may be mapped to one stream. Mapping may be static and determined at processor design, programmable, with data stored and amendable, or conditional and dynamic, the interrupt logic executing an algorithm sensitive to variables to determine the mapping. Interrupts may be external interrupts generated by devices external to the processor software (internal) interrupts generated by active streams, or conditional, based on variables. After interrupts are acknowledged, streams to which interrupts or exceptions are mapped are vectored to appropriate service routines.
    Type: Grant
    Filed: March 7, 2011
    Date of Patent: June 18, 2013
    Assignee: Bridge Crossing, LLC
    Inventors: Mario D. Nemirovsky, Adolfo M. Nemirovsky, Narendra Sankar
  • Patent number: 8464089
    Abstract: A tracing apparatus for tracing operational information that is output from a plurality of processing units in relation to data processing operations, the tracing apparatus comprising for each of the processing units: a counting unit configured to obtain and output a counter value for the corresponding processing unit, the counter value obtained by counting clock signals that are input to the processing unit at an operating frequency thereof; a counter value conversion unit configured to obtain and output a converted counter value for the corresponding processing unit, the converted counter value obtained by converting the counter value based on the assumption that the processing unit has a given reference operating frequency; and an adding unit configured to acquire an operational information set from the corresponding processing unit, and to add the converted counter value to the operational information set.
    Type: Grant
    Filed: June 3, 2010
    Date of Patent: June 11, 2013
    Assignee: Panasonic Corporation
    Inventors: Kazuhiro Watanabe, Takashi Hashimoto
  • Patent number: 8402256
    Abstract: The present invention is directed to realize efficient issue of a superscalar instruction in an instruction set including an instruction with a prefix. A circuit is employed which retrieves an instruction of each instruction code type other than a prefix on the basis of a determination result of decoders for determining an instruction code type, adds the immediately preceding instruction to the retrieved instruction, and outputs the resultant to instruction executing means. When an instruction of a target instruction code type is detected in a plurality of instruction units to be searched, the circuit outputs the detected instruction code and the immediately preceding instruction other than the target instruction code type as prefix code candidates. When an instruction of a target instruction code type cannot be detected at the rear end of the instruction units to be searched, the circuit outputs the instruction at the rear end as a prefix code candidate.
    Type: Grant
    Filed: August 25, 2009
    Date of Patent: March 19, 2013
    Assignee: Renesas Electronics Corporation
    Inventor: Fumio Arakawa
  • Patent number: 8307198
    Abstract: In a system having a plurality of processing nodes, a control node divides a task into a plurality of sub-tasks, and assigns the sub-tasks to one or more additional processing nodes which execute the assigned sub-tasks and return the results to the control node, thereby enabling a plurality of processing nodes to efficiently and quickly perform memory initialization and test of all assigned sub-tasks.
    Type: Grant
    Filed: November 24, 2009
    Date of Patent: November 6, 2012
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Oswin E. Housty
  • Publication number: 20120226891
    Abstract: Methods and apparatuses are provided for increased efficiency in a processor via control word prediction. The apparatus comprises an operational unit capable of determining whether an instruction will change a first control word to a second control word for processing dependent instructions. Execution units process the dependent instructions using a predicted control word and compare the second control word to the predicted control word. A scheduling unit causes the execution units to reprocess the dependent instructions when the predicted control word does not match the second control word. The method comprises determining that an instruction will change a first control word to a second control word and processing the dependent instructions using a predicted control word. The second control word is compared to the predicted control word and the dependent instructions are reprocessed using the second control word when the predicted control word does not match the second control word.
    Type: Application
    Filed: March 1, 2011
    Publication date: September 6, 2012
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Michael D. ESTLICK, Jay FLEISCHMAN, Debjit Das Sarma, Emil TALPES, Krishnan V. RAMANI, Chun LIU
  • Patent number: 8261085
    Abstract: According to some implementations methods, apparatus and systems are provided involving the use of processors having at least one core with a security component, the security component adapted to read and verify data within data blocks stored in a L1 instruction cache memory and to allow the execution of data block instructions in the core only upon the instructions being verified by the use of a cryptographic algorithm.
    Type: Grant
    Filed: September 26, 2011
    Date of Patent: September 4, 2012
    Assignee: Media Patents, S.L.
    Inventor: Álvaro Fernández Gutiérrez
  • Patent number: 8250395
    Abstract: A mechanism is provided for controlling operational parameters associated with a plurality of processors. A control system in the data processing system determines a utilization slack value of the data processing system. The utilization slack value is determined using one or more active core count values and one or more slack core count values. The control system computes a new utilization metric to be a difference between a full utilization value and the utilization slack value. The control system determines whether the new utilization metric is below a predetermined utilization threshold. Responsive to the new utilization metric being below the predetermined utilization threshold, the control system decreases a frequency of the plurality of processors.
    Type: Grant
    Filed: November 12, 2009
    Date of Patent: August 21, 2012
    Assignee: International Business Machines Corporation
    Inventors: John B. Carter, Heather L. Hanson, Karthick Rajamani, Freeman L. Rawson, III, Todd J. Rosedahl, Malcolm S. Ware
  • Patent number: 8245015
    Abstract: A processor includes a plurality of executing sections configured to simultaneously execute instructions for a plurality of threads, an instruction issuing section configured to issue instructions to the plurality of executing sections, and an instruction sync monitoring section configured to, when an instruction-synchronizing instruction is issued to one or more of the plurality of executing sections from the instruction issuing section, monitor completion of execution of the instruction-synchronizing instruction for each of the executing sections, to which the instruction-synchronizing instruction has been issued, thus detecting completion of execution of preceding instructions for the thread to which the instruction-synchronizing instruction belongs.
    Type: Grant
    Filed: July 7, 2009
    Date of Patent: August 14, 2012
    Assignee: Sony Corporation
    Inventor: Masaaki Ishii
  • Patent number: 8219784
    Abstract: A computer-implemented method and apparatus for managing an out of order dispatched instruction queue in a microprocessor. In one embodiment, the method and apparatus include assigning a group identification number and a target identification number to an instruction in an instruction stream. The group identification number and the target identification number are labeled inside an instruction fetcher unit. The group identification number and the target identification number are pre-decoded. The instruction is sent to an instruction queue. The instruction is re-ordered in the instruction stream after executing the instruction utilizing information from the pre-decoding of the group identification number and the target identification number.
    Type: Grant
    Filed: December 9, 2008
    Date of Patent: July 10, 2012
    Assignee: International Business Machines Corporation
    Inventors: Oliver Keren Ban, Xiangang Cheng, Liang Huang Lee, Katherine June Pearsall
  • Patent number: 8200950
    Abstract: A pipeline operation processor comprises a pipeline processing unit and an instruction insertion controller which inserts an instruction when access to an operation memory is requested, and corrects control information by reference to control information of stages. When a control program is in execution, on receiving an access request instruction requesting for access to the operation memory, the instruction insertion controller inserts an NOP instruction from the instruction decoding unit in place of the access request instruction. The access request instruction is executed while the pipeline processing unit executes no operation, and subsequently, the pipeline processing is continued.
    Type: Grant
    Filed: June 4, 2009
    Date of Patent: June 12, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Motohiko Okabe
  • Patent number: 8195759
    Abstract: A mechanism is provided for accessing, by an application running on a first processor, operating system services from an operating system running on a second processor by performing an assisted call. A data plane processor first constructs a parameter area based on the input and output parameters for the function that requires control processor assistance. The current values for the input parameters are copied into the parameter area. An assisted call message is generated based on a combination of a pointer to the parameter area and a specific library function opcode for the library function that is being called. The assisted call message is placed into the processor's stack immediately following a stop-and-signal instruction. The control plane processor is signaled to perform the library function corresponding to the opcode on behalf of the data plane processor by executing a stop and signal instruction.
    Type: Grant
    Filed: May 29, 2008
    Date of Patent: June 5, 2012
    Assignee: International Business Machines Corporation
    Inventors: Daniel A. Brokenshire, Mark R. Nutter
  • Patent number: 8179540
    Abstract: An image forming apparatus is provided that holds counter information obtained by integrating a consumption of a consumable that depends on usage of service provided by the image forming apparatus. A log corresponding to the usage of the service is set in job log information with a synchronization flag set off. The log in the job log information, for which the synchronization flag is set off, is set on. The counter information and the job log information are output after the synchronization flag for the log having the synchronization flag set off has been set on.
    Type: Grant
    Filed: October 29, 2008
    Date of Patent: May 15, 2012
    Assignee: Canon Kabushiki Kaisha
    Inventors: Junichi Hiruma, Nobuyuki Tonegawa
  • Patent number: 8140831
    Abstract: Disclosed are a method and system for reducing complexity of routing of instructions from an instruction issue queue to appropriate execution pipelines in a superscalar processor. In one or more embodiments, an instruction steering unit of the superscalar processor receives ordered instructions. The steering unit determines that a first instruction and a subsequent second instruction of the ordered instructions are non-branching instructions, and the steering unit stores the first and second instructions in two non-branching instruction issue queue entries of a shadow queue. The steering unit determines whether or not a third instruction the ordered instructions is a branch instruction, where the third instruction is subsequent to the second instruction. If the third instruction is a branch instruction, the steering unit stores the third instruction in a branch entry of the shadow queue; otherwise, the steering unit stores a no operation instruction in the branch entry of the shadow queue.
    Type: Grant
    Filed: March 27, 2009
    Date of Patent: March 20, 2012
    Assignee: International Business Machines Corporation
    Inventors: Anthony J. Bybell, Kenichi Tsuchlya
  • Patent number: 8074060
    Abstract: A microprocessor for improving out-of-order superscalar execution unit utilization with a relatively small in-order instruction retirement buffer. A plurality of execution units each calculate an instruction result. The instruction is either an excepting type instruction or a non-excepting type instruction. The excepting type instruction is capable of causing the microprocessor to take an exception after being issued to the execution unit, wherein the non-excepting type instruction is incapable of causing the microprocessor to take an exception after being issued. A retire unit makes a determination that an instruction is the oldest instruction in the microprocessor and that the instruction is ready to update the architectural state of the microprocessor with its result.
    Type: Grant
    Filed: November 25, 2008
    Date of Patent: December 6, 2011
    Assignee: VIA Technologies, Inc.
    Inventors: Gerard M. Col, Brent Bean, Bryan Wayne Pogor
  • Patent number: 8074052
    Abstract: A tag monitoring system for assigning tags to instructions embodied in software on a tangible computer-readable storage medium. A source supplies instructions to be executed by a functional unit. A queue having a plurality of slots containing tags which are used for tagging instructions. A register file stores information required for the execution of each instruction at a location in the register file defined by the tag assigned to that instruction. A control unit monitors the completion of executed instructions and advances the tags in the queue upon completion of an executed instruction. The register file also contains a plurality of read address enable ports and corresponding read output ports. Each of the slots from the queue is coupled to a corresponding one of the read address enable ports. Thus, the information for each instruction can be read out of the register file in program order.
    Type: Grant
    Filed: September 15, 2008
    Date of Patent: December 6, 2011
    Assignee: Seiko Epson Corporation
    Inventors: Kevin R. Iadonato, Trevor A. Deosaran, Sanjiv Garg
  • Patent number: 8068109
    Abstract: Task and data management systems methods and apparatus are disclosed. A processor event that requires more memory space than is available in a local storage of a co-processor is divided into two or more segments. Each segment has a segment size that is less than or the same as an amount of memory space available in the local storage. The segments are processed with one or more co-processors to produce two or more corresponding outputs. The two or more outputs are associated into one or more groups. Each group is less than or equal to a target data size associated with a subsequent process.
    Type: Grant
    Filed: June 8, 2010
    Date of Patent: November 29, 2011
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Richard B. Stenson, John P. Bates
  • Patent number: 8019975
    Abstract: The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit.
    Type: Grant
    Filed: April 25, 2005
    Date of Patent: September 13, 2011
    Assignee: Seiko-Epson Corporation
    Inventors: Cheryl Senter Brashears, Johannes Wang, Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Publication number: 20110167241
    Abstract: A high-speed lookup table is designed using Rapid Single Flux Quantum (RSFQ) logic elements and fabricated using superconducting integrated circuits. The lookup table is composed of an address decoder and a programmable read-only memory array (PROM). The memory array has rapid parallel pipelined readout and slower serial reprogramming of memory contents. The memory cells are constructed using standard non-destructive reset-set flip-flops (RSN cells) and data flip-flops (DFF cells). An n-bit address decoder is implemented in the same technology and closely integrated with the memory array to achieve high-speed operation as a lookup table. The circuit architecture is scalable to large two-dimensional data arrays.
    Type: Application
    Filed: March 8, 2011
    Publication date: July 7, 2011
    Applicant: HYPRES, INC.
    Inventors: Alex F. Kirichenko, Timur V. Filippov, Deepnarayan Gupta
  • Patent number: 7966478
    Abstract: A method for reducing entries searched in a load reorder queue (LRQ) when snoop instructions are executed by a processor, including checking load reorder queue (LRQ) entries located between a load_peril_snoop register and a lrq_tail register for addresses matching the address of the snoop; and setting a snooped bit in the LRQ entry for any matches found.
    Type: Grant
    Filed: July 14, 2008
    Date of Patent: June 21, 2011
    Assignee: International Business Machines Corporation
    Inventors: Erik R. Altman, Vijayalakshmi Srinivasan
  • Patent number: 7941636
    Abstract: Disclosed herein is an apparatus that implements multiple typed register sets, and applications thereof. The apparatus includes an execution unit and a register file. The execution unit is configured to execute instructions including one or more fields. The register file is configured to store operands defined by the one or more fields and is configured to store results of execution of the instructions in a destination defined by the one or more fields. The register file includes (i) a first register set having a register configured to store data of a single data type and (ii) a second register set having a register configured to store data of a plurality of data types.
    Type: Grant
    Filed: December 31, 2009
    Date of Patent: May 10, 2011
    Assignee: Intellectual Venture Funding LLC
    Inventors: Sanjiv Garg, Derek J. Lentz, Le Trong Nguyen, Sho Long Chen
  • Patent number: 7941635
    Abstract: The high-performance, RISC core based microprocessor architecture includes an instruction fetch unit for fetching instruction sets from an instruction store and an execution unit that implements the concurrent execution of a plurality of instructions through a parallel array of functional units. The fetch unit generally maintains a predetermined number of instructions in an instruction buffer. The execution unit includes an instruction selection unit, coupled to the instruction buffer, for selecting instructions for execution, and a plurality of functional units for performing instruction specified functional operations. A unified instruction scheduler, within the instruction selection unit, initiates the processing of instructions through the functional units when instructions are determined to be available for execution and for which at least one of the functional units implementing a necessary computational function is available.
    Type: Grant
    Filed: December 19, 2006
    Date of Patent: May 10, 2011
    Assignee: Seiko-Epson Corporation
    Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Patent number: 7926062
    Abstract: A multi-streaming processor has a plurality of streams for streaming one or more instruction threads, a set of functional resources for processing instructions from streams, and interrupt handler logic. The logic detects and maps interrupts and exceptions to one or more specific streams. In some embodiments, one interrupt or exception may be mapped to two or more streams, and in others two or more interrupts or exceptions may be mapped to one stream. Mapping may be static and determined at processor design, programmable, with data stored and amendable, or conditional and dynamic, the interrupt logic executing an algorithm sensitive to variables to determine the mapping. Interrupts may be external interrupts generated by devices external to the processor software (internal) interrupts generated by active streams, or conditional, based on variables. After interrupts are acknowledged, streams to which interrupts or exceptions are mapped are vectored to appropriate service routines.
    Type: Grant
    Filed: April 29, 2009
    Date of Patent: April 12, 2011
    Assignee: MIPS Technologies, Inc.
    Inventors: Mario D. Nemirovsky, Adolfo M. Nemirovsky, Narendra Sankar
  • Patent number: 7900207
    Abstract: A multi-streaming processor has a plurality of streams for streaming one or more instruction threads, a set of functional resources for processing instructions from streams, and interrupt handler logic. The logic detects and maps interrupts and exceptions to one or more specific streams. In some embodiments, one interrupt or exception may be mapped to two or more streams, and in others two or more interrupts or exceptions may be mapped to one stream. Mapping may be static and determined at processor design, programmable, with data stored and amendable, or conditional and dynamic, the interrupt logic executing an algorithm sensitive to variables to determine the mapping. Interrupts may be external interrupts generated by devices external to the processor software (internal) interrupts generated by active streams, or conditional, based on variables. After interrupts are acknowledged, streams to which interrupts or exceptions are mapped are vectored to appropriate service routines.
    Type: Grant
    Filed: November 19, 2008
    Date of Patent: March 1, 2011
    Assignee: MIPS Technologies, Inc.
    Inventors: Mario D. Nemirovsky, Adolfo M. Nemirovsky, Narendra Sankar
  • Publication number: 20110035569
    Abstract: A superscalar pipelined microprocessor includes a register set defined by its instruction set architecture, a cache memory, execution units, and a load unit, coupled to the cache memory and distinct from the other execution units. The load unit comprises an ALU. The load unit receives an instruction that specifies a memory address of a source operand, an operation to be performed on the source operand to generate a result, and a destination register of the register set to which the result is to be stored. The load unit reads the source operand from the cache memory. The ALU performs the operation on the source operand to generate the result, rather than forwarding the source operand to any of the other execution units of the microprocessor to perform the operation on the source operand to generate the result. The load unit outputs the result for subsequent retirement to the destination register.
    Type: Application
    Filed: October 30, 2009
    Publication date: February 10, 2011
    Inventors: Gerard M. Col, Colin Eddy, Rodney E. Hooker
  • Publication number: 20110035570
    Abstract: A superscalar pipelined microprocessor includes a register set defined by an instruction set architecture of the microprocessor, execution units, and a store unit, coupled to the cache memory and distinct from the other execution units of the microprocessor. The store unit comprises an ALU. The store unit receives an instruction that specifies a source register of the register set and an operation to be performed on a source operand to generate a result. The store unit reads the source operand from the source register. The ALU performs the operation on the source operand to generate the result, rather than forwarding the source operand to any of the other execution units of the microprocessor to perform the operation on the source operand to generate the result. The store unit operatively writes the result to the cache memory.
    Type: Application
    Filed: October 30, 2009
    Publication date: February 10, 2011
    Inventors: Gerard M. Col, Colin Eddy, Rodney E. Hooker
  • Publication number: 20100318772
    Abstract: A system and method for increasing processor throughput by decreasing a loop critical path. In one embodiment, a table comprises multiple stack entries, each comprising an x87 floating-point (FP) stack specifier. The combinatorial logic for operand translation of N FP instructions per clock cycle may require N instantiated copies of a combinatorial logic block. Each instantiated copy may determine a new ordering of the stack entries. Control logic may receive necessary information from the corresponding N FP instructions and determine a corresponding combined computational effect, or stack reordering, on entries within the table based on two or more instructions. Resulting control signals are conveyed to the N instantiated copies. A resulting accumulative delay from an input of the first copy to the output of the Nth copy may be less than or equal to (N?1)*time_delay versus a longer N*time_delay.
    Type: Application
    Filed: June 11, 2009
    Publication date: December 16, 2010
    Inventors: Ranganathan Sudhakar, Daryl Lieu, Debjit Das Sarma
  • Patent number: 7844797
    Abstract: The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit.
    Type: Grant
    Filed: May 6, 2009
    Date of Patent: November 30, 2010
    Assignee: Seiko Epson Corporation
    Inventors: Cheryl D. Senter, Johannes Wang
  • Patent number: 7840788
    Abstract: A process which automatically inserts commands that test for and raise exceptions indicating floating point status exceptions into a sequence of instructions to be executed, re-ordering a pipelined instructions by moving a floating point instruction from after a branch instruction to before the branch instruction, and responds to exceptions in execution of the sequence of instructions by returning execution to a point in the sequence of instructions at which correct state is known and then executing each instruction in the sequence singly to completion so that exceptions in pipelined floating point instructions can be automatically-detected and handled precisely.
    Type: Grant
    Filed: February 26, 2008
    Date of Patent: November 23, 2010
    Inventors: Guillermo J. Rozas, David Dunn, Robert F. Cmelik
  • Patent number: 7822881
    Abstract: In a data-processing method, first result data may be obtained using a plurality of configurable coarse-granular elements, the first result data may be written into a memory that includes spatially separate first and second memory areas and that is connected via a bus to the plurality of configurable coarse-granular elements, the first result data may be subsequently read out from the memory, and the first result data may be subsequently processed using the plurality of configurable coarse-granular elements. In a first configuration, the first memory area may be configured as a write memory, and the second memory area may be configured as a read memory. Subsequent to writing to and reading from the memory in accordance with the first configuration, the first memory area may be configured as a read memory, and the second memory area may be configured as a write memory.
    Type: Grant
    Filed: October 7, 2005
    Date of Patent: October 26, 2010
    Inventors: Martin Vorbach, Robert Münch
  • Patent number: 7802074
    Abstract: A register renaming system for out-of-order execution of a set of reduced instruction set computer instructions having addressable source and destination register fields, adapted for use in a computer having an instruction execution unit with a register file accessed by read address ports and for storing instruction operands. A data dependance check circuit is included for determining data dependencies between the instructions. A tag assignment circuit generates one or more tags to specify the location of operands, based on the data dependencies determined by the data dependance check circuit. A set of register file port multiplexers select the tags generated by the tag assignment circuit and pass the tags onto the read address ports of the register file for storing execution results.
    Type: Grant
    Filed: April 2, 2007
    Date of Patent: September 21, 2010
    Inventors: Sanjiv Garg, Kevin Ray Iadonato, Le Trong Nguyen, Johannes Wang
  • Patent number: 7739482
    Abstract: A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order.
    Type: Grant
    Filed: December 21, 2006
    Date of Patent: June 15, 2010
    Assignee: Seiko Epson Corporation
    Inventors: Le Trong Nguyen, Derek J. Lentz, Yoshiyuki Miyayama, Sanjiv Garg, Yasuaki Hagiwara, Johannes Wang, Te-Li Lau, Sze-Shun Wang, Quang H. Trang
  • Patent number: 7734897
    Abstract: A superscalar data processing apparatus and method are provided for processing operations, the apparatus having a plurality of execution threads and each execution thread being operable to process a sequence of operations including at least one memory access operation. The superscalar data processing apparatus comprises a plurality of execution pipelines for executing the operations, and issue logic for allocating each operation to one of the execution pipelines for execution by that execution pipeline. At least two of the execution pipelines are memory access capable pipelines which can execute memory access operations, and each memory access capable pipeline is associated with a subset of the plurality of execution threads. The issue logic is arranged, for each execution thread, to allocate any memory access operations of that execution thread to an associated memory access capable pipeline.
    Type: Grant
    Filed: December 21, 2005
    Date of Patent: June 8, 2010
    Assignee: ARM Limited
    Inventor: David Hennah Mansell
  • Patent number: 7734827
    Abstract: Secure operation of cell processors is disclosed. A cell processor receives a secure file image from a client device at a cell processor of a host device (host cell processor), wherein the secure file image includes an encrypted SPU image.
    Type: Grant
    Filed: October 24, 2005
    Date of Patent: June 8, 2010
    Assignee: Sony Computer Entertainment, Inc.
    Inventor: Tatsuya Iwamoto
  • Patent number: 7712080
    Abstract: The present invention relates generally to computer programming, and more particularly to systems and methods for parallel distributed programming. Generally, a parallel distributed program is configured to operate across multiple processors and multiple memories. In one aspect of the invention, a parallel distributed program includes a distributed shared variable located across the multiple memories and distributed programs capable of operating across multiple processors.
    Type: Grant
    Filed: May 21, 2004
    Date of Patent: May 4, 2010
    Assignee: The Regents of the University of California
    Inventors: Lei Pan, Lubomir R. Bic, Michael B. Dillencourt
  • Patent number: 7711934
    Abstract: A processor core and method for managing branch misprediction in an out-of-order processor pipeline. In one embodiment, the pipeline of the processor core includes a front-end instruction fetch portion, a back-end instruction execution portion, and pipeline control logic. Operation of the instruction fetch portion is decoupled from operation of the instruction execution portion. Following detection of a control transfer misprediction, operation of the instruction fetch portion is halted and instructions residing in the instruction fetch portion are invalidated. When the instruction associated with the misprediction reaches a selected pipeline stage, instructions residing in the instruction execution portion of the pipeline are invalidated and the flow of instructions from the instruction fetch portion to the instruction execution portion of the processor pipeline is restarted.
    Type: Grant
    Filed: October 31, 2005
    Date of Patent: May 4, 2010
    Assignee: MIPS Technologies, Inc.
    Inventors: Karagada Ramarao Kishore, Kjeld Svendsen, Vidya Rajagopalan
  • Patent number: 7694112
    Abstract: A method for executing multiple computational primitives is provided in accordance with exemplary embodiments. A first computational unit and at least a second computational unit cooperate to execute multiple computational primitives. The first computational unit independently computes other computational primitives. By virtue of arbitration for shared source operand buses or shared result buses, availability of the first and second computational units needed to execute cooperatively the multiple computational primitives is assured by a process of reservation as used for a computational primitive executed on a dedicated computational unit.
    Type: Grant
    Filed: January 31, 2008
    Date of Patent: April 6, 2010
    Assignee: International Business Machines Corporation
    Inventors: Harry S. Barowski, J. Adam Butts, Stephen V. Kosonocky, Silvia M. Mueller, Jochen Preiss
  • Patent number: 7685402
    Abstract: A register system for a data processor which operates in a plurality of modes. The register system provides multiple, identical banks of register sets, the data processor controlling access such that instructions and processes need not specify any given bank. An integer register set includes first (RA[23:0]) and second (RA[31:24]) subsets, and a shadow subset (RT[31:24]). While the data processor is in a first mode, instructions access the first and second subsets. While the data processor is in a second mode, instructions may access the first subset, but any attempts to access the second subset are re-routed to the shadow subset instead, transparently to the instructions, allowing system routines to seemingly use the second subset without having to save and restore data which user routines have written to the second subset. A re-typable register set provides integer width data and floating point width data in response to integer instructions and floating point instructions, respectively.
    Type: Grant
    Filed: January 9, 2007
    Date of Patent: March 23, 2010
    Inventors: Sanjiv Garg, Derek J. Lentz, Le Trong Nguyen, Sho Long Chen
  • Patent number: 7664935
    Abstract: A system and method for extracting complex, variable length computer instructions from a stream of complex instructions each subdivided into a variable number of instructions bytes, and aligning instruction bytes of individual ones of the complex instructions. The system receives a portion of the stream of complex instructions and extracts a first set of instruction bytes starting with the first instruction bytes, using an extract shifter. The set of instruction bytes are then passed to an align latch where they are aligned and output to a next instruction detector. The next instruction detector determines the end of the first instruction based on said set of instruction bytes. An extract shifter is used to extract and provide the next set of instruction bytes to an align shifter which aligns and outputs the next instruction. The process is then repeated for the remaining instruction bytes in the stream of complex instructions.
    Type: Grant
    Filed: March 11, 2008
    Date of Patent: February 16, 2010
    Inventors: Brett Coon, Yoshiyuki Miyayama, Le Trong Nguyen, Johannes Wang
  • Patent number: 7653804
    Abstract: A signal processing network and method for generating code for such a signal processing network are described. Pipeline blocks are each coupled to receive control signaling and associated information signaling from a scheduler. Each of the pipeline blocks respectively includes an allocation unit, a pipeline, and section controllers. The allocation unit is configured to provide a lock signal and sequence information to the section controllers in each of the pipeline blocks. The section controllers are configured to maintain in order inter-pipeline execution of the sequence responsive to the sequence information and the lock signal.
    Type: Grant
    Filed: January 26, 2006
    Date of Patent: January 26, 2010
    Assignee: Xilinx, Inc.
    Inventors: Thomas A. Lenart, Jorn W. Janneck
  • Patent number: 7590824
    Abstract: Techniques for processing transmissions in a communications (e.g., CDMA) system. A method and system for issuing and executing mixed architecture instructions in a multiple-issue digital signal processor receives in a mixed instruction listing a plurality of digital signal processor instructions. The plurality of digital signal processor instructions includes a plurality of parallel executable instructions (e.g., VLIW instructions or instruction packets) mixed among a plurality of series executable instructions (e.g., superscalar instructions). The series executable instructions are associated by various instruction dependencies. The method and system further identify in the mixed instruction listing the plurality of parallel executable instructions. Once identified, the parallel executable instructions are first executed in parallel irrespective of any such instruction's relative order in the mixed instruction listing.
    Type: Grant
    Filed: March 29, 2005
    Date of Patent: September 15, 2009
    Assignee: QUALCOMM Incorporated
    Inventors: Muhammad Ahmed, Erich Plondke, Lucian Codrescu, William C. Anderson
  • Publication number: 20090228729
    Abstract: A microelectronic device according to the present invention is made up of two or more functional units, which are all disposed on a single chip, or die. The present invention works on the strategy that all of the functional units on the die are not, and do not need to be operational at a given time in the execution of a computer program that is controlling the microelectronic device. The present invention on a very rapid basis (typically a half clock cycle), therefore, turns on and off the functional units of the microelectronic device in accordance with the requirements of the program being executed. This power down can be achieved by one of three techniques; turning off clock inputs to the functional units, interrupting the supply of power to the functional units, or deactivating input signals to the functional units.
    Type: Application
    Filed: February 19, 2009
    Publication date: September 10, 2009
    Applicant: Seiko Epson Corporation
    Inventor: Chong Ming LIN
  • Patent number: RE41293
    Abstract: Global address and data routers interconnect individual system units each having its own processors, memory, and I/O. A domain filter coupled to the routers dynamically defines groups of system units as domains and clusters of domains which have both software and hardware isolation from each other. Clusters can share dynamically definable ranges of memory with each other. The domain filter has software-loadable registers on the system units and in the global routers to set the parameters of the domains and clusters. The registers label individual inter-system transactions on the routers as invalid for system units not in the same domain or cluster as the originating unit.
    Type: Grant
    Filed: August 1, 2001
    Date of Patent: April 27, 2010
    Assignee: Sun Microsystems, Inc.
    Inventors: Daniel P. Drogichen, Andrew J. McCrocklin, Nicholas E. Aneshansley