Patents Examined by Andrew J Cromer
-
Patent number: 10552165
Abstract: Within a processor, speculative finishes of load instructions only are tracked in a speculative finish table by maintaining an oldest load instruction of a thread in the speculative finish table after data is loaded for the oldest load instruction, wherein a particular queue index tag assigned to the oldest load instruction by an execution unit points to a particular entry in the speculative finish table, wherein the oldest load instruction is waiting to be finished dependent upon an error check code result. Responsive to a flow unit receiving the particular queue index tag with an indicator that the error check code result for data retrieved for the oldest load instruction is good, finishing the oldest load instruction in the particular entry pointed to by the queue index tag and writing an instruction tag stored in the entry for the oldest load instruction out of the speculative finish table for completion.
Type: Grant
Filed: October 19, 2015
Date of Patent: February 4, 2020
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Susan E. Eisen, David A. Hrusecky, Christopher M. Mueller, Dung Q. Nguyen, A. James Van Norstrand, Jr., Kenneth L. Ward
-
Patent number: 10545763
Abstract: Detecting data dependencies of instructions associated with threads in a simultaneous multithreading (SMT) scheme is disclosed, including: dividing a plurality of comparators of an SMT-enabled device into groups of comparators corresponding to respective ones of threads associated with the SMT-enabled device; simultaneously distributing a first set of instructions associated with a first thread of the plurality of threads to a corresponding first group of comparators from the plurality of groups of comparators and distributing a second set of instructions associated with a second thread of the plurality of threads to a corresponding second group of comparators from the plurality of groups of comparators; and simultaneously performing data dependency detection on the first set of instructions associated with the first thread using the corresponding first group of comparators and performing data dependency detection on the second set of instructions associated with the second thread using the corresponding seco
Type: Grant
Filed: May 6, 2015
Date of Patent: January 28, 2020
Assignee: Alibaba Group Holding Limited
Inventors: Ling Ma, Sihai Yao, Lei Zhang
-
Patent number: 10534606
Abstract: Approaches are described to improve database performance by implementing a RLE decompression function at a low level within a general-purpose processor or an external block. Specifically, embodiments of a hardware implementation of an instruction for RLE decompression are disclosed. The described approaches improve performance by supporting the RLE decompression function within a processor and/or external block. Specifically, a RLE decompression hardware implementation is disclosed that produces a 64-bit RLE decompression result, with an example embodiment performing the task in two pipelined execution stages with a throughput of one per cycle. According to embodiments, hardware organization of narrow-width shifters operating in parallel, controlled by computed shift counts, is used to perform the decompression.
Type: Grant
Filed: September 28, 2015
Date of Patent: January 14, 2020
Assignee: ORACLE INTERNATIONAL CORPORATION
Inventors: Jeffrey S. Brooks, Robert Golla, Albert Danysh, Shasank Chavan, Prateek Agrawal, Andrew Ewoldt, David Weaver
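As a rough software illustration of the run-length decompression idea in the abstract above, the sketch below expands a list of runs into a single 64-bit result by shifting each run into place. The abstract does not give the encoding, so the `(bit, length)` run format, the low-bits-first ordering, and the function name `rle_decompress_64` are assumptions made only for this example; the patented hardware uses parallel shifters rather than a loop.

```python
def rle_decompress_64(runs):
    """Expand (bit, length) runs into one integer of at most 64 bits.

    Assumed format (not taken from the patent): each run is a pair
    (bit, length); the first run occupies the lowest-order bits.
    """
    result = 0
    shift = 0  # running shift count: where the next run starts
    for bit, length in runs:
        if bit not in (0, 1) or length < 0:
            raise ValueError("each run must be (0 or 1, non-negative length)")
        if shift + length > 64:
            raise ValueError("runs exceed the 64-bit result width")
        if bit:
            # Build a block of `length` ones and shift it into position.
            result |= ((1 << length) - 1) << shift
        shift += length
    return result


# Three ones, two zeros, then a single one: 0b100111
print(bin(rle_decompress_64([(1, 3), (0, 2), (1, 1)])))
```

The loop computes the shift counts sequentially; a hardware design could in principle compute them up front and apply all shifts at once, which is the kind of parallelism the abstract alludes to.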
-
Patent number: 10514911
Abstract: Examples of techniques for designing processors are described herein. In one example, a design structure can be tangibly embodied in a machine readable medium for designing, manufacturing, or testing an integrated circuit. The design structure can include a logic to determine whether a received instruction is an updating fixed point instruction or a non-updating fixed point instruction. The design structure can include a first arithmetic logic unit (ALU) to execute the received instruction if the received instruction is determined to be an updating fixed point instruction and store an update value in a general register. The design structure can include a second arithmetic logic unit (ALU) to execute the received instruction if the received instruction is determined to be a non-updating fixed point instruction.
Type: Grant
Filed: November 26, 2014
Date of Patent: December 24, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Avraham Ayzenfeld, Lee E. Eisen, Brian W. Curran, Christian Jacobi
-
Patent number: 10514925
Abstract: Systems, apparatuses, and methods for managing dependencies between instruction operations when speculatively issuing load instruction operations. A processor may maintain dependency vectors for sources of instruction operations dispatched to the scheduler. The dependency vector may include a column for each cycle of the load recovery window and a row for each load execution pipeline. When a load speculatively issues, any instruction operation which is dependent on the load may have a bit set in the earliest bit position of its dependency vector to indicate the dependency. The bit may shift in the dependency vector toward the cancel bit position during each clock cycle as the load executes. If the load does not produce its data at the expected latency, an instruction operation may be canceled if there is a bit in the cancel bit position of the dependency vector row corresponding to the execution pipeline of the load.
Type: Grant
Filed: January 28, 2016
Date of Patent: December 24, 2019
Assignee: Apple Inc.
Inventor: Sean M. Reynolds
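The dependency-vector bookkeeping described in the abstract above can be modelled in a few lines. This is a minimal sketch under stated assumptions: the class name `DependencyVector`, its method names, and the choice of bit 0 as the earliest position and the highest bit as the cancel position are all hypothetical and are not taken from the patent.

```python
class DependencyVector:
    """One row per load pipeline, one bit per cycle of the recovery window.

    Assumed layout (illustrative only): bit 0 is the earliest position,
    bit (window - 1) is the cancel position.
    """

    def __init__(self, num_pipes, window):
        self.window = window
        self.rows = [0] * num_pipes

    def mark_dependency(self, pipe):
        # A load on `pipe` issued speculatively and this op depends on it.
        self.rows[pipe] |= 1

    def tick(self):
        # Each clock cycle, every bit moves one step toward the cancel
        # position; bits shifted past the window are dropped.
        mask = (1 << self.window) - 1
        self.rows = [(row << 1) & mask for row in self.rows]

    def must_cancel(self, pipe):
        # Called when the load on `pipe` misses its expected latency.
        return bool((self.rows[pipe] >> (self.window - 1)) & 1)


dv = DependencyVector(num_pipes=2, window=3)
dv.mark_dependency(pipe=1)
dv.tick()
dv.tick()
print(dv.must_cancel(1), dv.must_cancel(0))
```

Once the bit shifts out of the window, the dependent operation is no longer at risk from that load and `must_cancel` returns `False` again.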
-
Patent number: 10503503
Abstract: A method in a computer-aided design system for generating a functional design model of a processor, is described herein. The method comprises generating a functional representation of logic to determine whether an instruction is an updating instruction or a non-updating instruction. The method further comprises generating a functional representation of a first arithmetic logic unit (ALU) coupled to a general register in the processor, the first ALU to execute the instruction if the instruction is an updating instruction and store an update value in the general register, and generating a functional representation of a second ALU in the processor to execute the instruction if the instruction is a non-updating instruction.
Type: Grant
Filed: September 25, 2015
Date of Patent: December 10, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Avraham Ayzenfeld, Lee E. Eisen, Brian W. Curran, Christian Jacobi
-
Patent number: 10503506
Abstract: A mechanism is provided for improving performance when executing unaligned load instructions which load an unaligned block of data from a data store. In a first unaligned load handling mode, a final load operation of a series of load operations performed for the instruction loads a full data word extending beyond the end of the unaligned block of data to be loaded by that instruction. If an initial portion of the unaligned block of data to be loaded by a subsequent unaligned load instruction corresponds to the excess part in the stream buffer for the earlier instruction, then an initial load operation for the subsequent instruction can be suppressed. A mechanism is also described for allowing series of dependent data access operations triggered by a given instruction to be halted partway through when a stall condition arises, and resumed partway through later, by defining overlapping sequences of transactions.
Type: Grant
Filed: October 19, 2015
Date of Patent: December 10, 2019
Assignee: ARM Limited
Inventor: Max John Batley
-
Patent number: 10459723
Abstract: Systems and methods relate to performing data movement operations using single instruction multiple data (SIMD) instructions. A first SIMD instruction comprises a first input data vector having a number N of two or more data elements in corresponding N SIMD lanes and a control vector having N control elements in the corresponding N SIMD lanes. A first multi-stage cube network is controllable by the first SIMD instruction, and includes movement elements, with one movement element per SIMD lane, per stage. A movement element selects between one of two data elements based on a corresponding control element and moves the data elements across the stages of the first multi-stage cube network by a zero distance or power-of-two distance between adjacent stages to generate a first output data vector. A second multi-stage cube network can be used in conjunction to generate all possible data movement operations of the input data vector.
Type: Grant
Filed: July 20, 2015
Date of Patent: October 29, 2019
Assignee: QUALCOMM Incorporated
Inventor: Eric Wayne Mahurin
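One way to picture the multi-stage movement described in the abstract above is the sketch below. It is an assumption-laden model, not the patented design: it assumes N is a power of two, that stage s pairs lane i with lane i XOR 2^s (a hypercube-style partner), and that bit s of each lane's control element selects between the lane's own value and its partner's. The function name `cube_network` is hypothetical.

```python
def cube_network(data, control):
    """Route `data` through log2(N) stages of two-way selectors.

    Assumptions (illustrative only): at stage s, lane i chooses either
    its own current value (zero distance) or the value from lane
    i XOR 2**s (power-of-two distance), depending on bit s of control[i].
    """
    n = len(data)
    if n < 2 or n & (n - 1):
        raise ValueError("number of lanes must be a power of two, at least 2")
    if len(control) != n:
        raise ValueError("need one control element per lane")
    current = list(data)
    for stage in range(n.bit_length() - 1):
        distance = 1 << stage
        # One movement element per lane per stage: a two-way select.
        current = [
            current[lane ^ distance] if (control[lane] >> stage) & 1 else current[lane]
            for lane in range(n)
        ]
    return current


# All control bits set in both stages reverses a 4-lane vector.
print(cube_network(["a", "b", "c", "d"], [3, 3, 3, 3]))
```

A single network of this shape cannot produce every permutation, which is consistent with the abstract's remark that a second network can be used in conjunction with the first.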
-
Patent number: 10445094
Abstract: A data processing apparatus includes a multi-level memory system, one or more first processing unit coupled to the memory system at a first level and one or more second processing units each coupled to the memory system at a second level. A first reorder buffer maintains data order during execution of instructions by the first and second processing units and a second reorder buffer maintains data order during execution of the instructions by an associated second processing unit. An entry in the first reorder buffer is configured, dependent upon an indicator bit, as an entry for a single instruction or a pointer to an entry in the second reorder buffer. An entry in the second reorder buffer includes instruction block start and end addresses and indicators of input and output register. Instructions are released to a processing unit when all inputs, as indicated by the reorder buffers, are available.
Type: Grant
Filed: May 27, 2016
Date of Patent: October 15, 2019
Assignee: Arm Limited
Inventors: Jonathan Curtis Beard, Wendy Elsasser, Shibo Wang
-
Patent number: 10430191
Abstract: Methods, apparatus, systems, and articles of manufacture to compile instructions for a vector of instruction pointers (VIP) processor architecture are disclosed. An example method includes identifying a strand including a fork instruction introducing a first speculative assumption. A basing instruction to initialize a basing value of the strand before execution of a first instruction under the first speculative assumption. A determination of whether a second instruction under a second speculative assumption modifies a first memory address that is also modified by the first instruction under the first speculative assumption is made. The second instruction is not modified when the second instruction does not modify the first memory address. The second instruction is modified based on the basing value when the second instruction modifies the first memory address, the basing value to cause the second instruction to modify a second memory address different from the first memory address.
Type: Grant
Filed: July 20, 2015
Date of Patent: October 1, 2019
Assignee: Intel Corporation
Inventors: Yevgeniy M. Astigeyevich, Dmitry M. Maslennikov, Sergey P. Scherbinin, Marat Zakirov, Pavel G. Matveyev, Andrey Rodchenko, Andrey Chudnovets, Boris V. Shurygin
-
Patent number: 10423423
Abstract: Within a processor, speculative finishes of load instructions only are tracked in a speculative finish table by maintaining an oldest load instruction of a thread in the speculative finish table after data is loaded for the oldest load instruction, wherein a particular queue index tag assigned to the oldest load instruction by an execution unit points to a particular entry in the speculative finish table, wherein the oldest load instruction is waiting to be finished dependent upon an error check code result. Responsive to a flow unit receiving the particular queue index tag with an indicator that the error check code result for data retrieved for the oldest load instruction is good, finishing the oldest load instruction in the particular entry pointed to by the queue index tag and writing an instruction tag stored in the entry for the oldest load instruction out of the speculative finish table for completion.
Type: Grant
Filed: September 29, 2015
Date of Patent: September 24, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Susan E. Eisen, David A. Hrusecky, Christopher M. Mueller, Dung Q. Nguyen, A. James Van Norstrand, Jr., Kenneth L. Ward
-
Patent number: 10402199
Abstract: One embodiment of this invention provides two conditional execution auxiliary instructions directed to disparate subsets of the plural functional units. Depending on the conditional execution desired, only one of the two conditional execution auxiliary instructions may be required for a particular execute packet. Another embodiment of this invention employs only one of two possible register files for the condition registers. In a VLIW processor it may be advantageous to split the functional units into separate sets with corresponding register files. This limits the number of functional units that may simultaneously access the register files. In the preferred embodiment of this invention the functional units are divided into a scalar set which access scalar registers and a vector set which access vector registers. The data registers storing the conditions for both scalar and vector instructions are in the scalar data register file.
Type: Grant
Filed: October 22, 2015
Date of Patent: September 3, 2019
Assignee: TEXAS INSTRUMENTS INCORPORATED
Inventors: Timothy David Anderson, Duc Quang Bui, Joseph Raymond Michael Zbiciak
-
Patent number: 10394568
Abstract: Managing exception handling. A plurality of instruction units of an instruction stream are selected to be decoded in parallel by a plurality of instruction decode units of a processor. The plurality of instruction units includes a prefix instruction and a prefixed instruction. The prefixed instruction is an instruction to be modified by the prefix instruction. An exception condition associated with the prefixed instruction is determined. Exception handling is performed for the prefixed instruction, in which the performing includes determining an address at which to restart execution of the instruction stream. The determining the address includes adjusting the address at which to restart execution based on the prefix instruction to be separately decoded by an instruction decode unit.
Type: Grant
Filed: September 30, 2015
Date of Patent: August 27, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Michael K. Gschwind
-
Patent number: 10394569
Abstract: Managing exception handling. A plurality of instruction units of an instruction stream are selected to be decoded in parallel by a plurality of instruction decode units of a processor. The plurality of instruction units includes a prefix instruction and a prefixed instruction. The prefixed instruction is an instruction to be modified by the prefix instruction. An exception condition associated with the prefixed instruction is determined. Exception handling is performed for the prefixed instruction, in which the performing includes determining an address at which to restart execution of the instruction stream. The determining the address includes adjusting the address at which to restart execution based on the prefix instruction to be separately decoded by an instruction decode unit.
Type: Grant
Filed: November 14, 2015
Date of Patent: August 27, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Michael K. Gschwind
-
Patent number: 10387159
Abstract: Methods and apparatuses relate to emulating architectural performance monitoring in a binary translation system. In one embodiment, a processor includes an architectural performance counter to maintain an architectural value associated with instruction execution, a register to store the architectural value of the architectural performance counter, binary translation logic to embed an architectural value from the architectural performance counter into a stream of translated instructions having a transactional code region and to store the architectural value into the register, and an execution unit to execute the transactional code region of the stream of translated instructions. The binary translation logic is configured to add the architectural value from the register to the architectural performance counter upon completion of the transactional code region of the stream of translated instructions.
Type: Grant
Filed: February 4, 2015
Date of Patent: August 20, 2019
Assignee: Intel Corporation
Inventors: Jason M Agron, Polychronis Xekalakis, Paul Caprioli, Jiwei Oliver Lu, Koichi Yamada
-
Patent number: 10360153
Abstract: Embodiments relate to a system operation queue for a transaction. An aspect includes determining whether a system operation is part of an in-progress transaction of a central processing unit (CPU). Another aspect includes based on determining that the system operation is part of the in-progress transaction, storing the system operation in a system operation queue corresponding to the in-progress transaction. Yet another aspect includes, based on the in-progress transaction ending, processing the system operation in the system operation queue.
Type: Grant
Filed: September 4, 2015
Date of Patent: July 23, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Eric M. Schwarz
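The queue-then-process flow in the abstract above can be sketched as a small software model. The class `TransactionalCpu`, its method names, and the `processed` log are hypothetical names introduced only for this illustration; the abstract does not say what happens to operations issued outside a transaction, so processing them immediately is an assumption.

```python
from collections import deque


class TransactionalCpu:
    """Toy model: system operations issued inside a transaction are deferred."""

    def __init__(self):
        self.in_transaction = False
        self.pending = deque()   # system operation queue for the transaction
        self.processed = []      # order in which operations actually ran

    def begin_transaction(self):
        self.in_transaction = True

    def system_operation(self, op):
        if self.in_transaction:
            # Part of the in-progress transaction: store it, do not run it.
            self.pending.append(op)
        else:
            # Assumption: operations outside a transaction run right away.
            self.processed.append(op)

    def end_transaction(self):
        # The transaction ended: drain the queue in arrival order.
        self.in_transaction = False
        while self.pending:
            self.processed.append(self.pending.popleft())


cpu = TransactionalCpu()
cpu.system_operation("op-a")
cpu.begin_transaction()
cpu.system_operation("op-b")
print(cpu.processed)   # op-b is still queued at this point
cpu.end_transaction()
print(cpu.processed)
```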
-
Patent number: 10346170
Abstract: In one embodiment, a processor includes logic, responsive to a first instruction, to perform an operation on a first source operand and a second source operand associated with the first instruction and write a result of the operation to a destination location comprising a third source operand. The write may be a partial write of the destination location to maintain an unmodified portion of the third source operand. Other embodiments are described and claimed.
Type: Grant
Filed: May 5, 2015
Date of Patent: July 9, 2019
Assignee: Intel Corporation
Inventors: Jayesh Iyer, Jamison D. Collins, Sebastian Winkel
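A partial write that preserves part of the destination, as the abstract above describes, amounts to a masked merge. The sketch below assumes a 32-bit destination, an addition as the operation, and a caller-supplied bit mask selecting the written portion; none of these specifics come from the patent, and the name `partial_write_add` is hypothetical.

```python
def partial_write_add(dest, src1, src2, write_mask, width=32):
    """Add src1 and src2, then write only the masked bits into dest.

    Assumptions (illustrative only): the operation is an addition, the
    destination is `width` bits wide, and `write_mask` marks the bits
    that the instruction overwrites. All other bits of dest survive.
    """
    full = (1 << width) - 1
    result = (src1 + src2) & full
    keep = dest & ~write_mask & full      # unmodified portion of the destination
    update = result & write_mask & full   # portion replaced by the result
    return keep | update


# Only the low byte is rewritten; the upper bits of dest are preserved.
print(hex(partial_write_add(0xFFFF0000, 0x12, 0x34, write_mask=0xFF)))
```

Because the old destination value feeds into the new one, the destination behaves as a third source operand, which matches the wording of the abstract.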
-
Patent number: 10318289
Abstract: A compute instruction to be executed is to use a memory operand in a computation. An address associated with the memory operand is to be used to locate a portion of memory from which data is to be obtained and placed in the memory operand. A determination is made as to whether the portion of memory extends across a specified memory boundary. Based on the portion of memory extending across the specified memory boundary, the portion of memory includes a plurality of memory units and a check is made as to whether at least one specified memory unit is accessible and whether at least one specified memory unit is inaccessible.
Type: Grant
Filed: November 14, 2015
Date of Patent: June 11, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael K. Gschwind, Brett Olsson
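The boundary test and per-unit accessibility check in the abstract above can be illustrated as follows. This is a sketch under assumptions: the memory unit is taken to be a 4096-byte page, accessibility is modelled as a plain set of unit numbers, and the names `units_touched` and `classify_access` are hypothetical.

```python
def units_touched(address, length, unit_size=4096):
    """Return the memory unit numbers covered by [address, address + length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = address // unit_size
    last = (address + length - 1) // unit_size
    return list(range(first, last + 1))


def classify_access(address, length, accessible_units, unit_size=4096):
    """Report whether the operand crosses a boundary and which units are usable.

    `accessible_units` is a set of unit numbers; this stands in for the
    real access check, which the abstract does not detail.
    """
    units = units_touched(address, length, unit_size)
    crosses = len(units) > 1
    accessible = [u for u in units if u in accessible_units]
    inaccessible = [u for u in units if u not in accessible_units]
    return crosses, accessible, inaccessible


# A 16-byte operand starting 6 bytes before the end of unit 0.
print(classify_access(4090, 16, accessible_units={0}))
```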
-
Patent number: 10318430
Abstract: Embodiments relate to a system operation queue for a transaction. An aspect includes determining whether a system operation is part of an in-progress transaction of a central processing unit (CPU). Another aspect includes based on determining that the system operation is part of the in-progress transaction, storing the system operation in a system operation queue corresponding to the in-progress transaction. Yet another aspect includes, based on the in-progress transaction ending, processing the system operation in the system operation queue.
Type: Grant
Filed: June 26, 2015
Date of Patent: June 11, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Eric M. Schwarz
-
Patent number: 10318297
Abstract: A self-timed parallelized multi-core processor has an instruction decoder unit for receiving a program code instruction, determining an operating code and latency for the instruction, and assigning a loop index to the instruction. An instruction decomposer creates a primitive by decomposing the instruction, replacing the loop index with a core index, and broadcasting the primitive. Self-timed processing cores each having a unique core index compare the core index to their unique processing core index. The processing cores act on the primitive when their processing core index is within a threshold of the core index.
Type: Grant
Filed: January 30, 2015
Date of Patent: June 11, 2019
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Yiqun Ge, Wuxian Shi, Lan Hu