Commitment Control Or Register Bypass Patents (Class 712/218)
  • Patent number: 10318302
    Abstract: Certain embodiments of the present disclosure support a method and apparatus for efficient multithreading on a single core microprocessor. Thread switching in the single core microprocessor presented herein is based on a reserved space in a memory allocated to each thread for storing and restoring of registers in a register file. The thread switching is achieved without full save and restore of the register file, and only those registers referenced in the memory are saved and restored during thread switching.
    Type: Grant
    Filed: June 3, 2016
    Date of Patent: June 11, 2019
    Assignee: Synopsys, Inc.
    Inventor: Thang Tran
  • Patent number: 10296316
    Abstract: A method generates a parallel program for a multicore microcomputer from processes in a single program for a single core. The method includes an extraction procedure, an association procedure, and an analysis procedure. The extraction procedure extracts (i) an extracted address of an accessed data item, which is among data items stored in a storage area together with the processes and accessed when each process is executed, and (ii) an extracted symbol name of the accessed data item. The association procedure associates an associated address in the storage area storing the accessed data item of the extracted symbol name with the extracted symbol name. The analysis procedure analyzes dependencies between the processes based on the extracted address and the associated address, and determines that two processes accessing an identical address have a dependency while determining that two processes not accessing an identical address have no dependency.
    Type: Grant
    Filed: November 14, 2016
    Date of Patent: May 21, 2019
    Assignee: DENSO CORPORATION
    Inventor: Kenichi Mineda
  • Patent number: 10277955
    Abstract: A signal receiver chip may be configured to receive a satellite signal, and when the satellite signal is partially-processed off-chip, to bypass at least a portion of processing functions applied in the signal receiver chip during processing of satellite signals. The bypassed processing functions may comprise or correspond to signal band conversions. The satellite signal chip may generate an output signal, corresponding to the satellite signal, with the output signal being configured for communication to a peer device (e.g., satellite STB). The output signal may be generated and/or configured so as to enable distributing content carried in the output signal to a plurality of client devices in a local network serviced by the peer device. The signal receiver chip may combine a plurality of portions, corresponding to a plurality of satellite signals, into the output signal.
    Type: Grant
    Filed: October 13, 2015
    Date of Patent: April 30, 2019
    Assignee: MAXLINEAR, INC.
    Inventors: Raja Pullela, Glenn Chang, Curtis Ling
  • Patent number: 10241797
    Abstract: A method for reducing a number of operations replayed in a processor includes decoding an operation to determine a memory address and a command in the operation. If data is not in a way predictor based on the memory address, a suppress wakeup signal is sent to an operation scheduler, and the operation scheduler suppresses waking up other operations that are dependent on the data.
    Type: Grant
    Filed: July 17, 2012
    Date of Patent: March 26, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Ganesh Venkataramanan, Mike Butler, Krishnan V. Ramani
  • Patent number: 10209995
    Abstract: A processor core supporting out-of-order execution (OOE) includes load-hit-store (LHS) hazard prediction at the instruction execution phase, reducing load instruction rejections and queue flushes at the dispatch phase. The instruction dispatch unit (IDU) detects likely LHS hazards by generating entries for pending stores in a LHS detection table. The entries in the table contain an address field (generally the immediate field) of the store instruction and the register number of the store. The IDU compares the address field and register number for each load with entries in the table to determine if a likely LHS hazard exists and if an LHS hazard is detected, the load is dispatched to the issue queue of the load-store unit (LSU) with a tag corresponding to the matching store instruction, causing the LSU to dispatch the load only after the corresponding store has been dispatched for execution.
    Type: Grant
    Filed: October 24, 2014
    Date of Patent: February 19, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sundeep Chadha, Richard James Eickemeyer, John Barry Griswell, Jr., Dung Quoc Nguyen
  • Patent number: 10198298
    Abstract: The technology disclosed improves existing streaming processing systems by allowing the ability to both scale up and scale down resources within an infrastructure of a stream processing system. In particular, the technology disclosed relates to a dispatch system for a stream processing system that adapts its behavior according to a computational capacity of the system based on a run-time evaluation. The technical solution includes, during run-time execution of a pipeline, comparing a count of available physical threads against a set number of logically parallel threads. When a count of available physical threads equals or exceeds the number of logically parallel threads, the solution includes concurrently processing the batches at the physical threads. Further, when there are fewer available physical threads than the number of logically parallel threads, the solution includes multiplexing the batches sequentially over the available physical threads.
    Type: Grant
    Filed: December 31, 2015
    Date of Patent: February 5, 2019
    Assignee: salesforce.com, inc.
    Inventors: Elden Gregory Bishop, Jeffrey Chao
  • Patent number: 10152396
    Abstract: A method, apparatus, and system for a time-based checkpoint target is provided for standby databases. Change records received from a primary database are applied for a standby database, creating dirty buffer queues. As the change records are applied, a mapping is maintained, which maps timestamps to logical times of change records that were most recently applied at the timestamp for the standby database. On a periodic dirty buffer queue processing interval, the mapping is used to determine a target logical time that is mapped to a target timestamp that is prior to a present timestamp by at least a checkpoint delay. The dirty buffer queues are then processed up to the target logical time, creating an incremental checkpoint. On a periodic header update interval, file headers reflecting a consistent logical time for the checkpoint are also updated. The intervals and the checkpoint delay are adjustable by user or application.
    Type: Grant
    Filed: May 5, 2014
    Date of Patent: December 11, 2018
    Assignee: Oracle International Corporation
    Inventors: Jonghyun Lee, Yunrui Li, Mahesh Baburao Girkar, Amrish Srivastava
  • Patent number: 10133582
    Abstract: A processor includes a first logic to execute an instruction stream out-of-order, the instruction stream divided into a plurality of strands, the instruction stream and each strand ordered by program order (PO). The processor also includes a second logic to determine an oldest undispatched instruction in the instruction stream and store an associated PO value of the oldest undispatched instruction as an executed instruction pointer. The instruction stream includes dispatched and undispatched instructions. The processor also includes a third logic to determine a most recently retired instruction in the instruction stream and store an associated PO value of the most recently retired instruction as a retirement pointer, a fourth logic to select a range of instructions between the retirement pointer and the executed instruction pointer, and a fifth logic to identify the range of instructions as eligible for retirement.
    Type: Grant
    Filed: December 23, 2013
    Date of Patent: November 20, 2018
    Assignee: Intel Corporation
    Inventors: Nikolay Kosarev, Sergey Y. Shishlov, Jayesh Iyer, Alexander V. Butuzov, Boris A. Babayan, Andrey Kluchnikov
  • Patent number: 10127074
    Abstract: Various embodiments include methods and apparatus structured to provide synchronization of a transaction identification between a host and a memory module using a parity check. A transaction identification can be generated at both the host and the memory module independently using incremental counters of these apparatus. Synchronization of the transaction identifications generated by the host and by a controller of the memory module can be implemented using a parity bit sequences pattern of a combination of the generated transaction identification plus the corresponding transaction command and data address. Use of transaction commands modified with respect to transaction identifications can be used in initialization of the synchronization, in message passing, and in error detection and response to errors. Additional apparatus, systems, and methods can be implemented in a variety of applications.
    Type: Grant
    Filed: January 27, 2017
    Date of Patent: November 13, 2018
    Assignee: Futurewei Technologies, Inc.
    Inventors: Xiaobing Lee, Feng Yang, Shaojie Chen
  • Patent number: 10108420
    Abstract: An apparatus including first and second reservation stations. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory, where the specified load instruction requires more than a first number of clock cycles to retrieve the operand. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand.
    Type: Grant
    Filed: November 24, 2015
    Date of Patent: October 23, 2018
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD
    Inventors: Gerard M. Col, Colin Eddy, G. Glenn Henry
  • Patent number: 10108421
    Abstract: An apparatus including first and second reservation stations. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand.
    Type: Grant
    Filed: November 24, 2015
    Date of Patent: October 23, 2018
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD
    Inventors: Gerard M. Col, Colin Eddy, G. Glenn Henry
  • Patent number: 10108426
    Abstract: Embodiments include issuing dynamic issue masks for processor hang prevention. Aspects include storing an instruction in an issue queue for execution by an execution unit, the instruction including a default issue mask. Aspects further include determining whether the instruction in the issue queue is likely to be rescinded by the execution unit. Based on determining that the instruction is not likely to be rescinded by the execution unit, aspects include issuing the instruction to the execution unit with the default issue mask. Based on determining that the instruction is likely to be rescinded by the execution unit, aspects include issuing the instruction to the execution unit with a likely to be rescinded issue mask.
    Type: Grant
    Filed: September 1, 2015
    Date of Patent: October 23, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Gregory W. Alexander, Steven R. Carlough, Lee E. Eisen, David A. Schroter
  • Patent number: 10108425
    Abstract: A computing device reorders an iteratively executed sequence of instructions such that constituent instructions that require longer execution time than other constituent instructions are grouped together. The computing device inserts, within the iteratively executed sequence of instructions, one or more additional instructions to enable parallel execution of two or more instances of the sequence of instructions such that the constituent instructions that require longer execution time will be executed concurrently with the other constituent instructions.
    Type: Grant
    Filed: October 25, 2016
    Date of Patent: October 23, 2018
    Assignee: Superpowered Inc.
    Inventors: Gabor Szanto, Alexander Patrick Vlaskovits
  • Patent number: 10108428
    Abstract: An apparatus including first and second reservation stations. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory, where the specified load instruction requires more than a first number of clock cycles to retrieve the operand. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand.
    Type: Grant
    Filed: December 14, 2014
    Date of Patent: October 23, 2018
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD
    Inventors: Gerard M. Col, Colin Eddy, G. Glenn Henry
  • Patent number: 10108429
    Abstract: An apparatus including first and second reservation stations. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand.
    Type: Grant
    Filed: December 14, 2014
    Date of Patent: October 23, 2018
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD
    Inventors: Gerard M. Col, Colin Eddy, G. Glenn Henry
  • Patent number: 10102142
    Abstract: A method for detecting an instruction ordering violation in a CPU. The method includes receiving a reordered stream of instructions and detecting whether an ordering violation has occurred by using virtual addresses. The method further includes transferring results of the reordered stream of instructions from a load store buffer into a cache and detecting whether an ordering violation has occurred by using physical addresses. Subsequently, a recovery is initiated upon detection of an ordering violation.
    Type: Grant
    Filed: December 26, 2012
    Date of Patent: October 16, 2018
    Assignee: Nvidia Corporation
    Inventors: Guillermo J. Rozas, Bharath Krishnan, James Van Zoeren
  • Patent number: 10102002
    Abstract: Embodiments include issuing dynamic issue masks for processor hang prevention. Aspects include storing an instruction in an issue queue for execution by an execution unit, the instruction including a default issue mask. Aspects further include determining whether the instruction in the issue queue is likely to be rescinded by the execution unit. Based on determining that the instruction is not likely to be rescinded by the execution unit, aspects include issuing the instruction to the execution unit with the default issue mask. Based on determining that the instruction is likely to be rescinded by the execution unit, aspects include issuing the instruction to the execution unit with a likely to be rescinded issue mask.
    Type: Grant
    Filed: September 30, 2014
    Date of Patent: October 16, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Gregory W. Alexander, Steven R. Carlough, Lee E. Eisen, David A. Schroter
  • Patent number: 10102158
    Abstract: Methods and apparatus relating to the transfer of data for processing and/or the transfer of the resulting processed data are described. Some features relate to a processing system which performs data transfers under control of a Dynamic Sequence Controller (DSC). In various embodiments a sequence of operational codes is used to control data transfer with the status of data source and destination locations taken into consideration. Modification of the op code sequence used to control the dynamic sequence controller and thus the transfer of data can be performed asynchronously to control of processing units which can be controlled via a command and control bus used to control the function of operators which process the data provided via the data bus.
    Type: Grant
    Filed: December 31, 2014
    Date of Patent: October 16, 2018
    Assignee: Accusoft Corporation
    Inventor: Robert M Nally
  • Patent number: 10095647
    Abstract: An accelerated processor structure on a programmable integrated circuit device includes a processor and a plurality of configurable digital signal processors (DSPs). Each configurable DSP includes a circuit block, which in turn includes a plurality of multipliers. The accelerated processor structure further includes a first bus to transfer data from the processor to the configurable DSPs, and a second bus to transfer data from the configurable DSPs to the processor.
    Type: Grant
    Filed: May 29, 2015
    Date of Patent: October 9, 2018
    Assignee: Altera Corporation
    Inventors: David Shippy, Martin Langhammer, Jeffrey Eastlack
  • Patent number: 10083035
    Abstract: A streaming engine employed in a digital data processor specifies fixed first and second read only data streams. Corresponding stream address generators produce addresses of the data elements of the two streams. Corresponding stream head registers store the data elements next to be supplied to functional units for use as operands. The two streams share two memory ports. A toggling preference of stream to port ensures fair allocation. The arbiters permit one stream to borrow the other's interface when the other interface is idle. Thus one stream may issue two memory requests, one from each memory port, if the other stream is idle. This spreads the bandwidth demand for each stream across both interfaces, ensuring neither interface becomes a bottleneck.
    Type: Grant
    Filed: December 20, 2016
    Date of Patent: September 25, 2018
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Joseph Zbiciak, Timothy Anderson
  • Patent number: 10025554
    Abstract: An apparatus and a corresponding method for processing a sequence of received data items are disclosed. The processing is performed by multiple processing elements. A reorder buffer comprising multiple slots is used to maintain the order of the received data items, wherein a processing element reserves a next available slot in the reorder buffer before beginning processing the next data item of the sequence of received data items. On completion of the processing a buffer change indicator value is read by the processing element when seeking to insert the processed data item into the reserved slot. If the buffer change indicator changes during the course of the insertion process, this serves as an indication to the processing element that another processing element is modifying the content of the reorder buffer in parallel. A check may be repeated for at least one subsequent already-processed data item, since this latter data item may have become ready to be retired from the reorder buffer.
    Type: Grant
    Filed: September 19, 2016
    Date of Patent: July 17, 2018
    Assignee: ARM Limited
    Inventor: Eric Ola Harald Liljedahl
  • Patent number: 9952901
    Abstract: Described herein are technologies related to enforcing thread dependency using a hybrid scoreboard. Encoded video information that includes a plurality of threads is received, a first set and a second set of threads from the plurality of threads are determined, the first and second sets of threads are assigned to hardware and software, respectively, and thread dependency in the first and second sets of threads is enforced.
    Type: Grant
    Filed: December 9, 2014
    Date of Patent: April 24, 2018
    Assignee: Intel Corporation
    Inventors: Haihua Wu, Julia A. Gould, Li-An Tang
  • Patent number: 9934039
    Abstract: Methods of predicting stack pointer values of variables stored in a stack are described. When an instruction is seen which stores a variable in the stack in a position offset from the stack pointer, an entry is added to a data structure which identifies the physical register which currently stores the stack pointer, the physical register which stores the value of the variable and the offset value. Subsequently when an instruction to load a variable from the stack from a position which is identified by reference to the stack pointer is seen, the data structure is searched to see if there is a corresponding entry which includes the same offset and the same physical register storing the stack pointer as the load instruction. If a corresponding entry is found the architectural register in the load instruction is mapped to the physical register storing the value of the variable from the entry.
    Type: Grant
    Filed: January 16, 2015
    Date of Patent: April 3, 2018
    Assignee: MIPS Tech Limited
    Inventor: Hugh Jackson
  • Patent number: 9928067
    Abstract: Systems and methods are provided in example embodiments for performing binary translation. A binary translation system converts, by a translator module, source instructions to target instructions. The binary translation system identifies a condition code block in the source instructions, where the condition code block includes a plurality of condition bits. In response to identifying the condition code block, the binary translation system provides an optimizer module to convert the condition code block. Then, the binary translation system performs a pre-execution on the condition code block to resolve the plurality of condition bits in the condition code block.
    Type: Grant
    Filed: September 21, 2012
    Date of Patent: March 27, 2018
    Assignee: Intel Corporation
    Inventors: Xueliang Zhong, Jianhui Li, Jian Ping Jane Chen, Gang Wang, Yi Qian, Huifeng Gu
  • Patent number: 9875105
    Abstract: Embodiments related to re-dispatching an instruction selected for re-execution from a buffer upon a microprocessor re-entering a particular execution location after runahead are provided. In one example, a microprocessor is provided. The example microprocessor includes fetch logic, one or more execution mechanisms for executing a retrieved instruction provided by the fetch logic, and scheduler logic for scheduling the retrieved instruction for execution. The example scheduler logic includes a buffer for storing the retrieved instruction and one or more additional instructions, the scheduler logic being configured, upon the microprocessor re-entering at a particular execution location after runahead, to re-dispatch, from the buffer, an instruction that has been previously dispatched to one of the execution mechanisms.
    Type: Grant
    Filed: May 3, 2012
    Date of Patent: January 23, 2018
    Assignee: NVIDIA CORPORATION
    Inventors: Guillermo J. Rozas, Paul Serris, Brad Hoyt, Sridharan Ramakrishnan, Hens Vanderschoot, Ross Segelken, Darrell Boggs, Magnus Ekman
  • Patent number: 9851975
    Abstract: A processor and instruction graduation unit for a processor. In one embodiment, a processor or instruction graduation unit according to the present invention includes a linked-list-based multi-threaded graduation buffer and a graduation controller. The graduation buffer stores identification values generated by an instruction decode and dispatch unit of the processor as part of one or more linked-list data structures. Each linked-list data structure formed is associated with a particular program thread running on the processor. The number of linked-list data structures formed is variable and related to the number of program threads running on the processor. The graduation controller includes linked-list head identification registers and linked-list tail identification registers that facilitate reading and writing identification values to linked-list data structures associated with particular program threads.
    Type: Grant
    Filed: September 23, 2014
    Date of Patent: December 26, 2017
    Assignee: ARM Finance Overseas Limited
    Inventor: Kjeld Svendsen
  • Patent number: 9830197
    Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
    Type: Grant
    Filed: August 16, 2016
    Date of Patent: November 28, 2017
    Assignee: NVIDIA Corporation
    Inventors: Brian Fahs, Ming Y Siu, Brett W. Coon, John R. Nickolls, Lars Nyland
  • Patent number: 9823929
    Abstract: A processor includes a queue for storing instructions processed within the context of a current value of a register field, where for some embodiments the instruction is undefined or defined, depending upon the register field at time of processing. After a write instruction (an instruction that writes to the register field) executes, the queue is searched for any entries that contain instructions that depend upon the executed write instruction. Each such entry stores the value of the register field at the time the instruction in the entry was processed. If such an entry is found in the queue and its stored value of the register field does not match the value that the write instruction wrote to the register field, then the processor flushes the pipeline and restarts at a state so as to correctly execute the instruction.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: November 21, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Daren Eugene Streett, Brian Michael Stempel, Thomas Philip Speier, Rodney Wayne Smith, Michael Scott McIlvaine, Kenneth Alan Dockser, James Norris Dieffenderfer
  • Patent number: 9754104
    Abstract: The invention relates to a virtual machine. The virtual machine is set to recognize, in addition to a set of conventional bytecodes, at least one secure bytecode functionally equivalent to one of the conventional bytecodes. It is set to process secure bytecodes with increased security, while it is set to process conventional bytecodes with increased speed. The invention also relates to a computing device comprising such a virtual machine, to a procedure for generating bytecode executable by such a virtual machine, and to an applet development tool comprising such procedure.
    Type: Grant
    Filed: December 9, 2009
    Date of Patent: September 5, 2017
    Assignee: GEMALTO SA
    Inventors: Olivier Joffray, Milan Krizenecky
  • Patent number: 9755620
    Abstract: A device for detecting and correcting timing errors and a method for designing typical-case timing using the same are disclosed. The device includes two datapath units connected with first and second multiplexers and two transition detectors. Each datapath unit receives and calculates an input signal to generate a speculation value and a correct value. Then, the speculation value and the correct value are transmitted to the first and second multiplexers, and the transition detectors determine whether the transition of the outputted speculation value is unstable. If so, the datapath unit outputting the speculation value is stalled for a period of time for correction, whereby the second multiplexer outputs the correct value. If not, the datapath unit outputs the speculation value, and the present invention then uses the undertaken timing as a setting specification to complete a circuit design. The present invention can improve system efficiency and power of the whole circuit.
    Type: Grant
    Filed: July 18, 2016
    Date of Patent: September 5, 2017
    Assignee: NATIONAL CHUNG CHENG UNIVERSITY
    Inventors: Tay-Jyi Lin, Jinn-Shyan Wang, Hong-Chih Lin, Ting-Yu Shyu
  • Patent number: 9747217
    Abstract: An approach is provided in which a computing system captures content included in a history buffer entry that corresponds to a flush ITAG. The computing system, in turn, uses an execution unit to transmit the content over a results bus to multiple registers and restore at least one of the registers accordingly.
    Type: Grant
    Filed: June 1, 2015
    Date of Patent: August 29, 2017
    Assignee: International Business Machines Corporation
    Inventors: Salma Ayub, Sundeep Chadha, Michael J. Genden, Cliff Kucharski, Dung Q. Nguyen, David R. Terry
  • Patent number: 9740620
    Abstract: An approach is provided in which a computing system captures content included in a history buffer entry that corresponds to a flush ITAG. The computing system, in turn, uses an execution unit to transmit the content over a results bus to multiple registers and restore at least one of the registers accordingly.
    Type: Grant
    Filed: May 7, 2015
    Date of Patent: August 22, 2017
    Assignee: International Business Machines Corporation
    Inventors: Salma Ayub, Sundeep Chadha, Michael J. Genden, Cliff Kucharski, Dung Q. Nguyen, David R. Terry
  • Patent number: 9715389
    Abstract: A method includes suppressing execution of at least one dependent instruction of a load instruction by a processor using stored dependency information responsive to an invalid status of the load instruction. A processor includes an execution unit to execute instructions and a scheduler. The scheduler is to select for execution in the execution unit a load instruction having at least one dependent instruction and suppress execution of the at least one dependent instruction using stored dependency information responsive to an invalid status of the load instruction.
    Type: Grant
    Filed: June 25, 2013
    Date of Patent: July 25, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Francesco Spadini, Michael Achenbach
  • Patent number: 9703567
    Abstract: In an embodiment, the present invention includes a processor having an execution logic to execute instructions and a control transfer termination (CTT) logic coupled to the execution logic. This logic is to cause a CTT fault to be raised if a target instruction of a control transfer instruction is not a CTT instruction. Other embodiments are described and claimed.
    Type: Grant
    Filed: November 30, 2012
    Date of Patent: July 11, 2017
    Assignee: Intel Corporation
    Inventors: Vedvyas Shanbhogue, Jason W. Brandt, Uday R. Savagaonkar, Ravi L. Sahita
  • Patent number: 9703565
    Abstract: Embodiments provide methods, apparatus, systems, and computer readable media associated with predicting predicates and branch targets during execution of programs using combined branch target and predicate predictions. The predictions may be made using one or more prediction control flow graphs which represent predicates in instruction blocks and branches between blocks in a program. The prediction control flow graphs may be structured as trees such that each node in the graphs is associated with a predicate instruction, and each leaf associated with a branch target which jumps to another block. During execution of a block, a prediction generator may take a control point history and generate a prediction. Following the path suggested by the prediction through the tree, both predicate values and branch targets may be predicted. Other embodiments may be described and claimed.
    Type: Grant
    Filed: March 25, 2015
    Date of Patent: July 11, 2017
    Assignee: The Board of Regents of the University of Texas System
    Inventors: Douglas C. Burger, Stephen W. Keckler
  • Patent number: 9684511
    Abstract: In an embodiment, the present invention includes a processor having a decode unit, an execution unit, and a retirement unit. The decode unit is to decode control transfer instructions and the execution unit is to execute control transfer instructions. The retirement unit is to retire a first control transfer instruction, and to raise a fault if a next instruction to be retired after the first control transfer instruction is not a second control transfer instruction and a target instruction of the first control transfer instruction is in code using the control transfer instructions.
    Type: Grant
    Filed: September 27, 2013
    Date of Patent: June 20, 2017
    Assignee: Intel Corporation
    Inventors: Vedvyas Shanbhogue, Jason Brandt, Uday Savagaonkar, Ravi Sahita
  • Patent number: 9672044
    Abstract: A processor may efficiently implement register renaming and checkpoint repair even in instruction set architectures with large numbers of wide (bit-width) registers by (i) renaming all destination operand register targets, (ii) implementing free list and architectural-to-physical mapping table as a combined array storage with unitary (or common) read, write and checkpoint pointer indexing and (iii) storing checkpoints as snapshots of the mapping table, rather than of actual register contents. In this way, uniformity (and timing simplicity) of the decode pipeline may be accentuated and architectural-to-physical mappings (or allocable mappings) may be efficiently shuttled between free-list, reorder buffer and mapping table stores in correspondence with instruction dispatch and completion as well as checkpoint creation, retirement and restoration.
    Type: Grant
    Filed: August 1, 2012
    Date of Patent: June 6, 2017
    Assignee: NXP USA, INC.
    Inventor: Thang M. Tran
  • Patent number: 9652198
    Abstract: Systems and methods are disclosed for managing data entry buffers in a data storage device. A memory of the data storage device includes one or more data input ports. The device further includes a controller configured to receive a data entry over one of the data input ports and store the data entry in a first data structure (e.g., a FIFO data structure). The data entry is stored in the first data structure among other data entries received over various data input ports. The controller stores a data entry corresponding to the data entry stored in the first data structure in a second data structure. Entries in the second data structure include a valid bit field and one or more condition fields. The controller indicates, using a valid bit field of the second data structure data entry, that the corresponding data entry stored in the first data structure is valid.
    Type: Grant
    Filed: July 29, 2015
    Date of Patent: May 16, 2017
    Assignee: Western Digital Technologies, Inc.
    Inventor: Jianxun Gao
  • Patent number: 9652244
    Abstract: A processing bypass directory system and method are disclosed. In one embodiment, a bypass directory tracking process includes setting bits in a bypass directory when a corresponding architectural register is written. The bits are selectively cleared in the bypass directory each cycle. The configuration of the bits is utilized to determine which stage of a bypass path processing information is at.
    Type: Grant
    Filed: June 25, 2012
    Date of Patent: May 16, 2017
    Assignee: Intellectual Ventures Holding 81 LLC
    Inventors: Alexander Klaiber, Guillermo Rozas
  • Patent number: 9645827
    Abstract: An apparatus including first and second reservation stations. The first reservation station dispatches a load micro instruction, and indicates on a hold bus if the load micro instruction is a specified load micro instruction directed to retrieve an operand from a prescribed resource other than on-core cache memory. The second reservation station is coupled to the hold bus, and dispatches one or more younger micro instructions therein that depend on the load micro instruction for execution after a number of clock cycles following dispatch of the first load micro instruction, and if it is indicated on the hold bus that the load micro instruction is the specified load micro instruction, the second reservation station is configured to stall dispatch of the one or more younger micro instructions until the load micro instruction has retrieved the operand.
    Type: Grant
    Filed: December 14, 2014
    Date of Patent: May 9, 2017
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: Gerard M. Col, Colin Eddy, G. Glenn Henry
  • Patent number: 9639479
    Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.
    Type: Grant
    Filed: September 22, 2010
    Date of Patent: May 2, 2017
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow
  • Patent number: 9582280
    Abstract: The description covers a system and method for operating a micro-processing system having a runahead mode of operation. In one implementation, the method includes providing, for a first portion of code, a runahead correlate. When the first portion of code is encountered by the micro-processing system, a determination is made as to whether the system is operating in the runahead mode. If so, the system branches to the runahead correlate, which is specifically configured to identify and resolve latency events likely to occur when the first portion of code is encountered outside of runahead. Branching out of the first portion of code may also be performed based on a determination that a register is poisoned.
    Type: Grant
    Filed: July 18, 2013
    Date of Patent: February 28, 2017
    Assignee: NVIDIA CORPORATION
    Inventors: Rohit Kumar, Guillermo Rozas, Magnus Ekman, Lawrence Spracklen
  • Patent number: 9569214
    Abstract: In one embodiment, in an execution pipeline having a plurality of execution subunits, a method of using a bypass network to directly forward data from a producing execution subunit to a consuming execution subunit is provided. The method includes producing output data with the producing execution subunit, consuming input data with the consuming execution subunit, for one or more intervening operations whose input is the output data from the producing execution subunit and whose output is the input data to the consuming execution subunit, evaluating those one or more intervening operations to determine whether their execution would compose an identity function, and if the one or more intervening operations would compose such an identity function, controlling the bypass network to forward the producing execution subunit's output data directly to the consuming execution subunit.
    Type: Grant
    Filed: December 27, 2012
    Date of Patent: February 14, 2017
    Assignee: NVIDIA CORPORATION
    Inventors: Gokul Govindu, Parag Gupta, Scott Pitkethly, Guillermo J. Rozas
  • Patent number: 9483273
    Abstract: A method includes suppressing execution of an operation portion of a load-operation instruction in a processor responsive to an invalid status of a load portion of the load-operation instruction. A processor includes an instruction pipeline including an execution unit operable to execute instructions and a scheduler unit. The scheduler unit includes a scheduler queue and is operable to store a load-operation in the scheduler queue. The load-operation instruction includes a load portion and an operation portion. The scheduler unit schedules the load portion for execution in the execution unit, marks the operation portion in the scheduler queue as eligible for execution responsive to scheduling the load portion, receives an indication of an invalid status of the load portion, and suppresses execution of the operation portion responsive to the indication of the invalid status.
    Type: Grant
    Filed: July 16, 2013
    Date of Patent: November 1, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Francesco Spadini, Michael Achenbach, Emil Talpes, Ganesh Venkataramanan
  • Patent number: 9454370
    Abstract: A Conditional Transaction End (CTEND) instruction is provided that allows a program executing in a nonconstrained transactional execution mode to inspect a storage location that is modified by either another central processing unit or the Input/Output subsystem. Based on the inspected data, transactional execution may be ended or aborted, or the decision to end/abort may be delayed, e.g., until a predefined event occurs. For instance, when the instruction executes, the processor is in a nonconstrained transaction execution mode, and the transaction nesting depth is one at the beginning of the instruction, a second operand of the instruction is inspected, and based on the inspected data, transaction execution may be ended or aborted, or the decision to end/abort may be delayed, e.g., until a predefined event occurs, such as the value of the second operand becomes a prespecified value or a time interval is exceeded.
    Type: Grant
    Filed: March 14, 2014
    Date of Patent: September 27, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dan F. Greiner, Christian Jacobi, Marcel Mitran, Donald W. Schmidt, Timothy J. Slegel
  • Patent number: 9436470
    Abstract: A technique for restoring a register renaming map is described. In one example, a restore table having a number of storage locations saves a copy of the register renaming map whenever a flow-risk instruction is passed to a re-order buffer. When all storage locations are full, further instructions still pass to the re-order buffer, but a copy of the map is not saved. A storage location subsequently becomes available when its associated flow-risk instruction is executed. A register renaming map state for an unrecorded flow-risk instruction passed to the re-order buffer whilst the storage locations were full is generated and stored in the available location. This is generated using the restore table entry for a previous flow-risk instruction and re-order buffer values for intervening instructions between the previous and unrecorded flow-risk instructions. The restore table can be used to restore the map if an unexpected change in instruction flow occurs.
    Type: Grant
    Filed: August 3, 2015
    Date of Patent: September 6, 2016
    Assignee: Imagination Technologies Limited
    Inventor: Hugh Jackson
  • Patent number: 9436476
    Abstract: A method for sorting elements in hardware structures is disclosed. The method comprises selecting a plurality of elements to order from an unordered input queue (UIQ) within a predetermined range in response to finding a match between at least one most significant bit of the predetermined range and corresponding bits of a respective identifier associated with each of the plurality of elements. The method further comprises presenting each of the plurality of elements to a respective multiplexer. Further the method comprises generating a select signal for an enabled multiplexer in response to finding a match between at least one least significant bit of a respective identifier associated with each of the plurality of elements and a port number of the ordered queue. Finally, the method comprises forwarding a packet associated with a selected element identifier to a matching port number of the ordered queue from the enabled multiplexer.
    Type: Grant
    Filed: October 11, 2013
    Date of Patent: September 6, 2016
    Assignee: SOFT MACHINES INC.
    Inventors: Mohammad A. Abdallah, Mandeep Singh
  • Patent number: 9430275
    Abstract: Transactional memory implementations may be extended to support transaction communicators and/or transaction condition variables for which transaction isolation is relaxed, and through which concurrent transactions can communicate and be synchronized with each other. Transactional accesses to these objects may not be isolated unless called within communicator-isolating transactions. A waiter transaction may invoke a wait method of a transaction condition variable, be added to a wait list for the variable, and be suspended pending notification of a notification event from a notify method of the variable. A notifier transaction may invoke a notify method of the variable, which may remove the waiter from the wait list, schedule the waiter transaction for resumed execution, and notify the waiter of the notification event. A waiter transaction may commit only if the corresponding notifier transaction commits. If the waiter transaction aborts, the notification may be forwarded to another waiter.
    Type: Grant
    Filed: June 27, 2011
    Date of Patent: August 30, 2016
    Assignee: Oracle International Corporation
    Inventors: Virendra J. Marathe, Victor M. Luchangco
  • Patent number: 9424035
    Abstract: A Conditional Transaction End (CTEND) instruction is provided that allows a program executing in a nonconstrained transactional execution mode to inspect a storage location that is modified by either another central processing unit or the Input/Output subsystem. Based on the inspected data, transactional execution may be ended or aborted, or the decision to end/abort may be delayed, e.g., until a predefined event occurs. For instance, when the instruction executes, the processor is in a nonconstrained transaction execution mode, and the transaction nesting depth is one at the beginning of the instruction, a second operand of the instruction is inspected, and based on the inspected data, transaction execution may be ended or aborted, or the decision to end/abort may be delayed, e.g., until a predefined event occurs, such as the value of the second operand becomes a prespecified value or a time interval is exceeded.
    Type: Grant
    Filed: November 26, 2014
    Date of Patent: August 23, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dan F. Greiner, Christian Jacobi, Marcel Mitran, Donald W. Schmidt, Timothy J. Slegel
  • Patent number: 9418043
    Abstract: A method is disclosed of utilizing a plurality of Arithmetic Logic Units (ALUs) of an array processor. It is determined that a first quantity of the ALUs are scheduled to execute a function during a given processing cycle, with each ALU being scheduled to use a respective one of a plurality of selected input vectors as an input. It is also determined that a second quantity of the ALUs are not scheduled for use during the given processing cycle. A plurality of predicted future input vectors that differ from the plurality of selected input vectors are determined. The second quantity of ALUs are scheduled to execute the function during the given processing cycle using respective ones of the plurality of predicted future input vectors as inputs. After completion of the processing cycle, function outputs received from the first and second quantity of ALUs are cached.
    Type: Grant
    Filed: March 7, 2014
    Date of Patent: August 16, 2016
    Assignee: Sony Corporation
    Inventors: Jim Rasmusson, Håkan Jonsson, Jonas Gustavsson, Anders Isberg