Superscalar Patents (Class 712/23)
  • Patent number: 10725783
    Abstract: According to one or more embodiments, an example computer-implemented method for executing one or more out-of-order instructions by a processing unit, includes decoding an instruction to be executed, and based on a determination that the instruction is a store instruction, identifying a split load-hit-store (LHS) table for the store instruction, wherein a LHS table of the processing unit includes multiple split LHS tables. Identifying the split LHS table includes determining, for the store instruction, a first split LHS table by performing a mod operation using one or more operands from the store instruction, and adding one or more parameters of the store instruction in the first split LHS table by generating an ITAG for the store instruction. The method further includes dispatching the store instruction for execution to an issue queue with the ITAG.
    Type: Grant
    Filed: November 2, 2018
    Date of Patent: July 28, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ehsan Fatehi, Richard J. Eickemeyer, Edmund J. Gieske
  • Patent number: 10691457
    Abstract: An example apparatus includes a plurality of execution units, a physical register file that includes a plurality of physical registers, an instruction buffer, and a scheduling circuit. The instruction buffer may receive a group of instructions to be performed by the plurality of execution units. The scheduling circuit may allocate a physical register of the plurality of physical registers in the physical register file to store an operand of a particular instruction of the group of instructions. The scheduling circuit may also, in response to determining that a result of the particular instruction is used as an operand for a different instruction of the group of instructions, assign a tag to the particular instruction and to the different instruction to indicate that the result of the particular instruction will be sent to the different instruction without using the physical register file.
    Type: Grant
    Filed: December 13, 2017
    Date of Patent: June 23, 2020
    Assignee: Apple Inc.
    Inventors: Ian Kountanis, Muawya Al-Otoom
  • Patent number: 10621082
    Abstract: An information processing apparatus includes a receiving unit that receives data from the outside, a first memory space to which data is written from the receiving unit, a second memory space to which a flag for synchronization is written, and an arithmetic unit. The arithmetic unit includes a synchronization control unit that instructs the receiving unit to synchronize the first memory space and the second memory space. The receiving unit includes a synchronization command issuing unit that issues a synchronization command to the first memory space and the second memory space, and a synchronization command receiving unit that receives a response indicating that data writing is guaranteed from the first memory space and a response indicating that flag writing is guaranteed from the second memory space, and responds to the arithmetic unit that synchronization is completed when writing to the first memory space and the second memory space is guaranteed.
    Type: Grant
    Filed: January 11, 2018
    Date of Patent: April 14, 2020
    Assignee: NEC CORPORATION
    Inventor: Eiichiro Kawaguchi
  • Patent number: 10572264
    Abstract: Aspects of the invention include detecting, in an out-of-order (OoO) processor, that all instructions in a first group of in-flight instructions have a status of finished. The first group of in-flight instructions is the oldest group in an entry of a global completion table (GCT). It is determined that the entry in the GCT is a merged entry that is associated with both the first group of in-flight instructions and a second group of in-flight instructions dispatched immediately subsequent to the first group of in-flight instructions. The first group of in-flight instructions and the second group of in-flight instructions are completed in a single processor cycle. The completing is based at least in part on detecting that all instructions in the first group of in-flight instructions have a status of finished. The completing includes requesting release of resources utilized by both the first and second groups of in-flight instructions.
    Type: Grant
    Filed: November 30, 2017
    Date of Patent: February 25, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Joel A. Silberman, Balaram Sinharoy
  • Patent number: 10564979
    Abstract: Aspects of the invention include detecting that all instructions in a first group of in-flight instructions have a status of finished. The first group of in-flight instructions is associated with a first allocated entry in a global completion table (GCT) which tracks a dispatch order and status of groups of in-flight instructions. The GCT includes a plurality of allocated entries including the first allocated entry and a second allocated entry. A second group of in-flight instructions dispatched immediately prior to the first group is associated with the second allocated entry in the GCT. Based at least in part on the detecting, the first allocated entry is merged into the second allocated entry to create a single merged second allocated entry in the GCT that includes completion information for both the first group of in-flight instructions and the second group of in-flight instructions. The first allocated entry is then deallocated.
    Type: Grant
    Filed: November 30, 2017
    Date of Patent: February 18, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Joel A. Silberman, Balaram Sinharoy
  • Patent number: 10552162
    Abstract: Variable latency flush filtering including receiving a first flush instruction tag (ITAG) and a second flush ITAG, wherein the first flush ITAG and the second flush ITAG are instructions to invalidate internal operation results after an internal operation identified by the first flush ITAG and the second flush ITAG; determining that the second flush ITAG is before the first flush ITAG by comparing the first flush ITAG and the second flush ITAG; determining that the first flush ITAG requires adjustment; and delaying the flush to a subsequent cycle in response to determining that the second flush ITAG is before the first flush ITAG and determining that the first flush ITAG requires adjustment.
    Type: Grant
    Filed: January 22, 2018
    Date of Patent: February 4, 2020
    Assignee: International Business Machines Corporation
    Inventors: Glenn O. Kincaid, David S. Levitan, Albert J. Van Norstrand, Jr.
  • Patent number: 10552165
    Abstract: Within a processor, speculative finishes of load instructions only are tracked in a speculative finish table by maintaining an oldest load instruction of a thread in the speculative finish table after data is loaded for the oldest load instruction, wherein a particular queue index tag assigned to the oldest load instruction by an execution unit points to a particular entry in the speculative finish table, wherein the oldest load instruction is waiting to be finished dependent upon an error check code result. Responsive to a flow unit receiving the particular queue index tag with an indicator that the error check code result for data retrieved for the oldest load instruction is good, finishing the oldest load instruction in the particular entry pointed to by the queue index tag and writing an instruction tag stored in the entry for the oldest load instruction out of the speculative finish table for completion.
    Type: Grant
    Filed: October 19, 2015
    Date of Patent: February 4, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Susan E. Eisen, David A. Hrusecky, Christopher M. Mueller, Dung Q. Nguyen, A. James Van Norstrand, Jr., Kenneth L. Ward
  • Patent number: 10534608
    Abstract: A central processing unit system includes: a pipeline configured to receive an instruction; and a register file partitioned into one or more subarrays where (i) the register file includes one or more computation elements and (ii) the one or more computation elements are directly connected to one or more subarrays.
    Type: Grant
    Filed: August 17, 2011
    Date of Patent: January 14, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Pradip Bose, Alper Buyuktosunoglu, Jeffrey Haskell Derby, Michele Martino Franceschini, Robert Kevin Montoye, Augusto J. Vega
  • Patent number: 10452434
    Abstract: Systems, apparatuses, and methods for efficiently scheduling processor instructions for execution. The reservation station in a processor stores instructions in each of a primary buffer and a secondary buffer. Control logic selects a first number of instructions with ready source operands in the primary buffer and a second number of instructions with ready source operands in the secondary buffer. If a third number of instructions to issue from the reservation station is greater than the first number of instructions, then the reservation station issues one or more instructions of the second number of instructions from the secondary buffer to the one or more execution units. Control logic selects a fourth number of instructions in the secondary buffer to transfer to the primary buffer, and cancels the transfer of a given instruction in response to determining the given instruction has issued to the one or more execution units.
    Type: Grant
    Filed: September 11, 2017
    Date of Patent: October 22, 2019
    Assignee: Apple Inc.
    Inventors: Conrado Blasco, Sean M. Reynolds
  • Patent number: 10452564
    Abstract: Format preserving encryption of object code is disclosed. One example is a system including at least one processor and a memory storing instructions executable by the at least one processor to identify object code to be secured, where the object code comprises a list of instructions, each instruction comprising an opcode and zero or more parameters. A format preserving encryption (FPE) is applied to the received object code, where the FPE is applied separately to a sub-plurality of instructions in the list of instructions, to generate an encrypted object code comprising a sub-plurality of encrypted instructions. An encrypted object code is provided to a service provider, where the encrypted object code comprises the sub-plurality of encrypted instructions, and any unencrypted portions of the object code.
    Type: Grant
    Filed: April 25, 2017
    Date of Patent: October 22, 2019
    Assignee: ENTIT SOFTWARE LLC
    Inventors: Luther Martin, Timothy Roake
  • Patent number: 10423423
    Abstract: Within a processor, speculative finishes of load instructions only are tracked in a speculative finish table by maintaining an oldest load instruction of a thread in the speculative finish table after data is loaded for the oldest load instruction, wherein a particular queue index tag assigned to the oldest load instruction by an execution unit points to a particular entry in the speculative finish table, wherein the oldest load instruction is waiting to be finished dependent upon an error check code result. Responsive to a flow unit receiving the particular queue index tag with an indicator that the error check code result for data retrieved for the oldest load instruction is good, finishing the oldest load instruction in the particular entry pointed to by the queue index tag and writing an instruction tag stored in the entry for the oldest load instruction out of the speculative finish table for completion.
    Type: Grant
    Filed: September 29, 2015
    Date of Patent: September 24, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Susan E. Eisen, David A. Hrusecky, Christopher M. Mueller, Dung Q. Nguyen, A. James Van Norstrand, Jr., Kenneth L. Ward
  • Patent number: 10379954
    Abstract: The present invention provides a method and an apparatus for cache management of transaction processing in persistent memory.
    Type: Grant
    Filed: December 28, 2015
    Date of Patent: August 13, 2019
    Assignee: TSINGHUA UNIVERSITY
    Inventors: Jiwu Shu, Youyou Lu
  • Patent number: 10318260
    Abstract: A method and system of compiling and linking source stream programs for efficient use of multi-node devices. The system includes a compiler, a linker, a loader and a runtime component. The process converts a source code stream program to a compiled object code that is used with a programmable node based computing device having a plurality of processing nodes coupled to each other. The programming modules include stream statements for input values and output values in the form of sources and destinations for at least one of the plurality of processing nodes and stream statements that determine the streaming flow of values for the at least one of the plurality of processing nodes. The compiler converts the source code stream based program to object modules, object module instances and executables. The linker matches the object module instances to at least one of the multiple cores.
    Type: Grant
    Filed: April 2, 2018
    Date of Patent: June 11, 2019
    Assignee: Cornami, Inc.
    Inventors: Frederick Furtek, Paul Master
  • Patent number: 10291391
    Abstract: A method to protect computational, in particular cryptographic, devices having multi-core processors from DPA and DFA attacks is disclosed herein. The method implies: Defining a library of execution units functionally grouped into business function related units, security function related units and scheduler function related units; Designating at random one among the plurality of processing cores on the computational device to as a master core for execution of the scheduler function related execution units; and Causing, under control of the scheduler, execution of the library of execution units, so as to result in a randomized execution flow capable of resisting security threats initiated on the computational device.
    Type: Grant
    Filed: June 4, 2014
    Date of Patent: May 14, 2019
    Assignee: GIESECKE+DEVRIENT MOBILE SECURITY GMBH
    Inventors: Sai Yanamandra, Vineet Kulkarni, Shrikanthrao Kulkarni
  • Patent number: 10180856
    Abstract: A method for performing dynamic port remapping during instruction scheduling in an out of order microprocessor is disclosed. The method comprises selecting and dispatching a plurality of instructions from a plurality of select ports in a scheduler module in first clock cycle. Next, it comprises determining if a first physical register file unit has capacity to support instructions dispatched in the first clock cycle. Further, it comprises supplying a response back to logic circuitry between the plurality of select ports and a plurality of execution ports, wherein the logic circuitry is operable to re-map select ports in the scheduler module to execution ports based on the response. Finally, responsive to a determination that the first physical register file unit is full, the method comprises re-mapping at least one select port connecting with an execution unit in the first physical register file unit to a second physical register file unit.
    Type: Grant
    Filed: July 25, 2016
    Date of Patent: January 15, 2019
    Assignee: INTEL CORPORATION
    Inventor: Nelson N. Chan
  • Patent number: 10108423
    Abstract: An approach is provided in which a mapper control unit matches a result instruction tag corresponding to an executed instruction to a history buffer entry's instruction tag. The matched history buffer entry includes multiple history buffer field sets that each include a field set state indicator. The mapper control unit identifies a subset of the history buffer field sets having a valid field set state indicator and stores result data corresponding to the result instruction tag in the identified subset of history buffer field sets. In turn, the mapper control unit restores a subset of a register's fields utilizing content from the subset of history buffer field sets.
    Type: Grant
    Filed: March 25, 2015
    Date of Patent: October 23, 2018
    Assignee: International Business Machines Corporation
    Inventors: Michael J. Genden, Dung Q. Nguyen, Kenneth L. Ward
  • Patent number: 10067788
    Abstract: A computing system can provide user interfaces and back-end operations to facilitate review and invalidation of executed jobs. The system can provide an interface that allows the operator to review quality-control information about a completed job. Once the operator identifies a job as invalid, the operator can be presented with further options, such as whether to invalidate only the reviewed job or the job and all its descendants. The operator can also review antecedent jobs to an invalid job (e.g., in order to trace the root of the problem) and can selectively invalidate antecedent jobs.
    Type: Grant
    Filed: September 2, 2016
    Date of Patent: September 4, 2018
    Assignee: Dropbox, Inc.
    Inventors: Shaunak Kishore, Karl Dray
  • Patent number: 10007522
    Abstract: A branch instruction and a corresponding branch instruction address are received at a data processing system. A first value is received and is compared to a portion of the branch instruction address. An entry at a branch target buffer corresponding to the branch instruction is selectively allocated based on a result of the comparing.
    Type: Grant
    Filed: May 20, 2014
    Date of Patent: June 26, 2018
    Assignee: NXP USA, Inc.
    Inventors: Jeffrey W. Scott, William C. Moyer
  • Patent number: 9977596
    Abstract: The speed at which files can be accessed from a remote location is increased by predicting the file access pattern based on a predictive model. The file access pattern describes the order in which blocks of data for a given file type are read by a given application. From aggregated data across many file accesses, one or more predictive models of access patterns can be built. A predictive model takes as input the application requesting the file access and the file type being requested, and outputs information describing an order of data blocks for transmitting the file to the requesting application. Accordingly, when a server receives a request for a file from an application, the server uses the predictive model to determine the order that the application is most likely to use the data blocks of the file. The data is then transmitted in that order to the client device.
    Type: Grant
    Filed: December 27, 2012
    Date of Patent: May 22, 2018
    Assignee: Dropbox, Inc.
    Inventors: Rian Hunter, Jeffrey Bartelma
  • Patent number: 9977674
    Abstract: Disclosed are an apparatus, system, and method for implementing predicated instructions using micro-operations. A micro-code engine receives an instruction, decomposes the instruction, and generates a plurality of micro-operations to implement the instruction. Each of the decomposed micro-operations indicates a single destination register. For predicated instructions, the decomposed micro-operations include “conditional move” micro-operations to select between two potential output values. Except in the case that one of the potential output values is a constant, the decomposed micro-operations for a predicated instruction also include an append instruction that saves the incoming value of a destination register in a temporary variable. For at least one embodiment, the qualifying predicate for a predicated instruction is appended to the incoming value stored in the temporary register.
    Type: Grant
    Filed: October 14, 2003
    Date of Patent: May 22, 2018
    Assignee: Intel Corporation
    Inventors: Jeffrey P. Rupley, II, Edward A. Brekelbaum, Edward T. Grochowski, Bryan P. Black
  • Patent number: 9965274
    Abstract: A computer processor is provided with a plurality of functional units that performs operations specified by the at least one instruction over the multiple machine cycles, wherein the operations produce result operands. The processor also includes circuitry that generates result tags dynamically according to the number of operations that produce result operands in a given machine cycle. A bypass network is configured to provide data paths for transfer of operand data between the plurality of functional units according to the result tags.
    Type: Grant
    Filed: October 15, 2014
    Date of Patent: May 8, 2018
    Assignee: Mill Computing, Inc.
    Inventors: Arthur David Kahlich, Roger Rawson Godard
  • Patent number: 9952870
    Abstract: An apparatus and method for filtering biased conditional branches in a branch predictor in favor of non-biased conditional branches are disclosed. Biased conditional branches, which are consistently skewed toward one direction or outcome, are filtered such that an increased number of non-biased conditional branches which resolve in both directions may be considered. As a result, more useful branches may be captured over larger distances, thereby providing correlations deeper in a global history. In addition, by tracking only the latest occurrences of non-biased conditional branches using a recency stack structure, even more distant branch correlations may be made.
    Type: Grant
    Filed: June 13, 2014
    Date of Patent: April 24, 2018
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Mikko Lipasti, Dibakar Gope
  • Patent number: 9942272
    Abstract: Processing streaming data in accordance with policies that group data by source, enforce a maximum permissible late arrival value for streaming data, a maximum permissible early arrival for data and/or a maximum degree to which data can be out of order and still be compliant with the out of order policy is described. The correct starting point for reading a data stream so as to produce correct output from a given output start time can be enabled using the early arrival policy. Using combinations of policies, output can be generated promptly (with low latency). When input from a given source is not disrupted, output can be generated with low latency. Output can be generated even when the input stops by applying a late arrival policy.
    Type: Grant
    Filed: June 5, 2015
    Date of Patent: April 10, 2018
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC.
    Inventors: Zhong Chen, Lev Novik, Boris Shulman, Clemens A. Szyperski
  • Patent number: 9864398
    Abstract: A method for transmitting a plurality of data bits and a clock signal on a return to zero (RZ) signal includes: transmitting a first voltage that is greater than a first threshold, the first voltage being decodable to first order of data bits; transmitting a second voltage that is between a second threshold and the first threshold, the second voltage being decodable to a second order of data bits; transmitting a third voltage that is between a third threshold and a fourth threshold, the third voltage being decodable to a third order of data bits; transmitting a fourth voltage that is greater in magnitude than the fourth threshold, the fourth voltage being decodable to a fourth order of data bits; and transitioning the clock signal in response to the RZ signal being between the second threshold and the third threshold.
    Type: Grant
    Filed: December 30, 2015
    Date of Patent: January 9, 2018
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Robert Floyd Payne
  • Patent number: 9851977
    Abstract: A method for executing instructions on a single-program, multiple-data processor system having a fixed number of execution lanes, including: scheduling a primary instruction for execution with a first wave of multiple data; assigning the first wave to a corresponding primary subset of the execution lanes; scheduling a secondary instruction having a second wave of multiple data, such that the second wave fits in lanes that are unused by the primary subset of lanes; assigning the second wave to a corresponding secondary subset of the lanes; fetching the primary and secondary instructions; configuring the execution lanes such that the primary subset is responsive to the primary instruction and the secondary subset is simultaneously responsive to the secondary instruction; and simultaneously executing the primary and secondary instructions in the execution lanes.
    Type: Grant
    Filed: December 6, 2012
    Date of Patent: December 26, 2017
    Assignee: KALRAY
    Inventors: Nicolas Brunie, Sylvain Collange
  • Patent number: 9798590
    Abstract: A method and apparatus for post-retire transaction access tracking is herein described. Load and store buffers are capable of storing senior entries. In the load buffer a first access is scheduled based on a load buffer entry. Tracking information associated with the load is stored in a filter field in the load buffer entry. Upon retirement, the load buffer entry is marked as a senior load entry. A scheduler schedules a post-retire access to update transaction tracking information, if the filter field does not represent that the tracking information has already been updated during a pendency of the transaction. Before evicting a line in a cache, the load buffer is snooped to ensure no load accessed the line to be evicted.
    Type: Grant
    Filed: September 7, 2006
    Date of Patent: October 24, 2017
    Assignee: Intel Corporation
    Inventors: Haitham Akkary, Ravi Rajwar, Srikanth T. Srinivasan
  • Patent number: 9766894
    Abstract: A chaining bit decoder of a computer processor receives an instruction stream. The chaining bit decoder selects a group of instructions from the instruction stream. The chaining bit decoder extracts a designated bit from each instruction of the instruction stream to produce a sequence of chaining bits. The chaining bit decoder decodes the sequence of chaining bits. The chaining bit decoder identifies zero or more instruction stream dependencies among the selected group of instructions in view of the decoded sequence of chaining bits. The chaining bit decoder outputs control signals to cause one or more pipelines stages of the processor to execute the selected group of instructions in view of the identified zero or more instruction stream dependencies among the group sequence of instructions.
    Type: Grant
    Filed: November 12, 2014
    Date of Patent: September 19, 2017
    Assignee: Optimum Semiconductor Technologies, Inc.
    Inventors: C. John Glossner, Gary J. Nacer, Murugappan Senthilvelan, Vitaly Kalashnikov, Arthur J. Hoane, Paul D'Arcy, Sabin D. Iancu, Shenghong Wang
  • Patent number: 9766895
    Abstract: A computing device determines that a current software thread of a plurality of software threads having an issuing sequence does not have a first instruction waiting to be issued to a hardware thread during a clock cycle. The computing device identifies one or more alternative software threads in the issuing sequence having instructions waiting to be issued. The computing device selects, during the clock cycle by the computing device, a second instruction from a second software thread among the one or more alternative software threads in view of determining that the second instruction has no dependencies with any other instructions among the instructions waiting to be issued. Dependencies are identified by the computing device in view of the values of a chaining bit extracted from each of the instructions waiting to be issued. The computing device issues the second instruction to the hardware thread.
    Type: Grant
    Filed: November 12, 2014
    Date of Patent: September 19, 2017
    Assignee: Optimum Semiconductor Technologies, Inc.
    Inventors: Shenghong Wang, C. John Glossner, Gary J. Nacer
  • Patent number: 9747224
    Abstract: Provided is a method of managing a register port, the method including performing scheduling on register ports that are used during a plurality of cycles to enable performing of a calculation; encoding data of the register ports according to results of the scheduling, the encoding of the data including, with respect to data of one of the register ports that does not have a schedule during one of the plurality of cycles, equally encoding the data of the one register port during the one cycle with data of an adjacent cycle of the one register port, the adjacent cycle being adjacent to the one cycle; and transmitting results of the encoding to a device that includes the register ports.
    Type: Grant
    Filed: March 11, 2015
    Date of Patent: August 29, 2017
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Tai-Song Jin, Jae-Un Park, Do-hyung Kim, Seung-won Lee
  • Patent number: 9740494
    Abstract: Instruction issue circuits are disclosed that are configured to issue multiple instructions within a superscalar pipeline of a microprocessor. The instruction issue circuit includes an instruction queue that stores instructions. A ready generation circuit is operably associated with the instruction queue and generates ready signals that indicate which instructions in the instruction queue are ready for execution. To simplify the instruction issue circuit, the instruction issue circuit has group blocks. Each group block receives a different group of the ready signals corresponding to a different group of the instructions. Each group block generates a group output indicating a group set within the corresponding group of the instructions that has a highest instruction execution priority and are ready for execution. By splitting the ready signals into groups, the groups of ready signals can be processed in parallel thereby reducing both the resulting delay and complexity of the instruction issue circuit.
    Type: Grant
    Filed: April 30, 2012
    Date of Patent: August 22, 2017
    Assignee: Arizona Board of Regents for and on behalf of Arizona State University
    Inventors: Lawrence T. Clark, Siddhesh Mhambrey, Satendra Kumar Maurya
  • Patent number: 9720675
    Abstract: Illustrated is a system and method to receive an instruction to access a set of identified data structures, where each identified data structure is associated with a version data structure that includes annotations of particular dependencies amongst at least two members of the set of identical data structures. The system and method further comprising determining, based upon the dependencies, that a version mismatch exists between the at least two members of the set of identified data structures, the dependencies used to identify a most recent version and a locally cached version of the at least two members. The system and method further comprising delaying execution of the instruction until the version mismatch between the at least two members of the set of identified data structures is resolved through an upgrade of a version of one of the at least two members of the set of identified data structures.
    Type: Grant
    Filed: October 27, 2010
    Date of Patent: August 1, 2017
    Assignee: Hewlett Packard Enterprise Development LP
    Inventor: Antonio Lain
  • Patent number: 9652371
    Abstract: A circular queue implementing a scheme for prioritized reads is disclosed. In one embodiment, a circular queue (or buffer) includes a number of storage locations each configured to store a data value. A multiplexer tree is coupled between the storage locations and a read port. A priority circuit is configured to generate and provide selection signals to each multiplexer of the multiplexer tree, based on a priority scheme. Based on the states of the selection signals, one of the storage locations is coupled to the read port via the multiplexers of the multiplexer tree.
    Type: Grant
    Filed: February 18, 2015
    Date of Patent: May 16, 2017
    Assignee: Apple Inc.
    Inventors: Rajat Goel, Hari S. Kannan, Khurram Z. Malik
  • Patent number: 9645802
    Abstract: A device compiler and linker is configured to group instructions into different strands for execution by different threads based on the dependence of those instructions on other, long-latency instructions. A thread may execute a strand that includes long-latency instructions, and then hardware resources previously allocated for the execution of that thread may be de-allocated from the thread and re-allocated to another thread. The other thread may then execute another strand while the long-latency instructions are in flight. With this approach, the other thread is not required to wait for the long-latency instructions to complete before acquiring hardware resources and initiating execution of the other strand, thereby eliminating at least a portion of the time that the other thread would otherwise spend waiting.
    Type: Grant
    Filed: August 7, 2013
    Date of Patent: May 9, 2017
    Assignee: NVIDIA Corporation
    Inventors: Mojtaba Mehrara, Michael Garland, Gregory Diamos
  • Patent number: 9632977
    Abstract: A data processor includes a packet selector. The packet selector creates an ordered list of packets, each packet corresponding to a respective communication flow, determines whether each packet in the ordered list of packets is eligible for transfer to a prefetch unit based on whether a preceding packet in the same communication flow has been transferred to the prefetch unit, and sets a selection priority for each packet based on start time constraints for the respective communication flow, and based on a processing status of a preceding packet in the communication flow.
    Type: Grant
    Filed: March 13, 2013
    Date of Patent: April 25, 2017
    Assignee: NXP USA, Inc.
    Inventors: Timothy G. Boland, Anne C. Harris, Steven D. Millman
  • Patent number: 9626190
    Abstract: The present invention provides a method and apparatus for floating-point register caching. One embodiment of the method includes mapping a first set of architected registers defined by a first instruction set to a memory outside of a plurality of physical registers. The plurality of physical registers are configured to map to the first set, a second set of architected registers defined by a second construction set, and a set of rename registers. This embodiment of the method also includes adding the physical registers corresponding to the first set of architected registers to the set of rename registers.
    Type: Grant
    Filed: October 7, 2010
    Date of Patent: April 18, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Jeff Rupley
  • Patent number: 9612835
    Abstract: A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: April 4, 2017
    Assignee: Intel Corporation
    Inventors: Salvador Palanca, Stephen A. Fischer, Subramaniam Maiyuran, Shekoufeh Qawami
  • Patent number: 9582283
    Abstract: A microcontroller includes a program memory, data memory, central processing unit, at least one register module, a memory management unit, and a transport network. Instructions are executed in one clock cycle via an instruction word. The instruction word indicates the source module from which data is to be retrieved and the destination module to which data is to be stored. The address/data capability of an instruction word may be extended via a prefix module. If an operation is performed on the data, the source module or the destination module may perform the operation during the same clock cycle in which the data is transferred.
    Type: Grant
    Filed: May 18, 2009
    Date of Patent: February 28, 2017
    Assignee: Maxim Integrated Products, Inc.
    Inventors: Jeffrey D. Owens, Edward Tangkwai Ma, Donald W. Loomis, III, Thomas Augustus Chenot
  • Patent number: 9535701
    Abstract: A pipelined processor selects an instruction fetch mode from a number of fetch modes including an executed branch fetch mode, a predicted fetch mode, and a sequential fetch mode. Each branch instruction is associated with branch delay slots, the size of which can be greater than or equal to zero, and can vary from one branch instance to another. Branch prediction is used to fetch instructions, with the source of information for predictions deriving from a last instruction in the branch delay slots. When a prediction error occurs, the executed branch fetch mode uses an address from branch instruction evaluation to fetch a next instruction.
    Type: Grant
    Filed: January 29, 2014
    Date of Patent: January 3, 2017
    Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (publ)
    Inventors: Erik Rijshouwer, Ricky Nas
  • Patent number: 9529596
    Abstract: In accordance with embodiments disclosed herein, there are provided methods, systems, and apparatuses for scheduling instructions in a multi-strand out-of-order processor. For example, an apparatus for scheduling instructions in a multi-strand out-of-order processor includes an out-of-order instruction fetch unit to retrieve a plurality of interdependent instructions for execution from a multi-strand representation of a sequential program listing; an instruction scheduling unit to schedule the execution of the plurality of interdependent instructions based at least in part on operand synchronization bits encoded within each of the plurality of interdependent instructions; and a plurality of execution units to execute at least a subset of the plurality of interdependent instructions in parallel.
    Type: Grant
    Filed: July 1, 2011
    Date of Patent: December 27, 2016
    Assignee: Intel Corporation
    Inventors: Boris A. Babayan, Vladimir M. Pentkovski, Alexander V. Butuzov, Sergey Y. Shishlov, Alexey Y. Sivtsov, Nikolay E. Kosarev
  • Patent number: 9513904
    Abstract: A computer processing system with a hierarchical memory system that associates a number of valid bits for each cache line of the hierarchical memory system. The valid bits are provided for each cache line stored in a respective cache and make explicit which bytes are semantically defined and which are not for the associated given cache line. Memory requests to the cache(s) of the hierarchical memory system can include an address specifying a requested cache line as well as a mask that includes a number of bits each corresponding to a different byte of the requested cache line. The values of the bits of the byte mask indicate which bytes of the requested cache line are to be returned from the hierarchical memory system. The memory request is processed by the top level cache of the hierarchical memory system, looking for one or more valid bytes of the requested cache line corresponding to the target address of the memory request.
    Type: Grant
    Filed: October 15, 2014
    Date of Patent: December 6, 2016
    Assignee: MILL COMPUTING, INC.
    Inventors: Roger Rawson Godard, Arthur David Kahlich
  • Patent number: 9489204
    Abstract: An example method of storing a partial target address in an instruction cache includes receiving a branch instruction. The method also includes predicting a direction of the branch instruction as being not taken. The method further includes calculating a destination address based on executing the branch instruction. The method also includes determining a partial target address using the destination address. The method further includes in response to the predicted direction of the branch instruction changing from not taken to taken, replacing an offset in an instruction cache with the partial target address.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: November 8, 2016
    Assignee: QUALCOMM Incorporated
    Inventors: Jiajin Tu, Suresh K. Venkumahanti, Brian R. Mestan
  • Patent number: 9471480
    Abstract: A data processing apparatus has a memory rename table for storing memory rename entries each identifying a mapping between a memory address of a location in memory and a mapped register of a plurality of registers. The mapped register is identified by a register number. In response to a store instruction, the store target memory address of the store instruction is mapped to a store destination register and so the data value is stored to the store destination register instead of memory. A memory rename entry is provided in the table to identify the mapping between the store target memory address and store destination target register. In response to a load instruction, if there is a hit in the memory rename table for the load target memory address then the loaded value can be read from the mapped register instead of memory.
    Type: Grant
    Filed: February 21, 2014
    Date of Patent: October 18, 2016
    Assignee: The Regents of the University of Michigan
    Inventors: Joseph Michael Pusdesris, Yiping Kang, Andrea Pellegrini, Benjamin Allen Vandersloot, Trevor Nigel Mudge
  • Patent number: 9459871
    Abstract: A method, system, and computer program product for identifying loop information corresponding to a plurality of loop instructions. The loop instructions are stored into a queue. The loop instructions are replayed from the queue for execution. Loop iteration is counted based on the identified loop information. A determination is made of whether the last iteration of the loop is done. If the last iteration is not done, then embodiments continue replaying the loop instructions, until the last iteration is done.
    Type: Grant
    Filed: December 31, 2012
    Date of Patent: October 4, 2016
    Assignee: Intel Corporation
    Inventors: Masha Lipshits, Lihu Rappaport, Shantanu Gupta, Franck Sala, Naveen Kumar, Allan D. Knies
  • Patent number: 9436464
    Abstract: In a multithread processor capable of executing a plurality of threads, in order to select a thread and instruction for increasing a throughput of the multithread processor, an instruction-issuance controlling device included in the multithread processor includes a resource management unit configured to manage stall information indicating whether or not each of threads in execution is in a stalled state; a thread selection unit configured to select a thread which is not in the stalled state among the threads in execution; and an instruction-issuance controlling unit configured to perform controlling so that simultaneously issuable instructions are issued from among the selected thread.
    Type: Grant
    Filed: December 3, 2012
    Date of Patent: September 6, 2016
    Assignee: SOCIONECT INC.
    Inventor: Tomohiro Yamana
  • Patent number: 9430243
    Abstract: A system and method for efficiently reducing the latency of initializing registers. A register rename unit within a processor determines whether prior to an execution pipeline stage it is known a decoded given instruction writes a particular numerical value in a destination operand. An example is a move immediate instruction that writes a value of 0 in its destination operand. Other examples may also qualify. If the determination is made, a given physical register identifier is assigned to the destination operand, wherein the given physical register identifier is associated with the particular numerical value, but it is not associated with an actual physical register in a physical register file. The given instruction is marked to prevent it from proceeding to an execution pipeline stage. When the given physical register identifier is used to read the physical register file, no actual physical register is accessed.
    Type: Grant
    Filed: April 30, 2012
    Date of Patent: August 30, 2016
    Assignee: Apple Inc.
    Inventors: James B. Keller, John H. Mylius, Conrado Blasco-Allue, Gerard R. Williams, III
  • Patent number: 9430240
    Abstract: Embodiments relate to pre-computation slice (p-slice) merging for prefetching in a computer processor. An aspect includes determining a plurality of p-slices corresponding to a delinquent instruction. Another aspect includes selecting a first p-slice and a second p-slice of the plurality of p-slices. Another aspect includes traversing the first p-slice and the second p-slice to determine that divergent instructions exist between the first p-slice and the second p-slice. Another aspect includes, based on determining that divergent instructions exist between the first p-slice and the second p-slice, determining whether the first p-slice and the second p-slice converge after the divergent instructions. Another aspect includes, based on determining that the first p-slice and the second p-slice converge after the divergent instructions, merging the first p-slice and the second p-slice into a single merged p-slice.
    Type: Grant
    Filed: December 10, 2015
    Date of Patent: August 30, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Islam Atta, Ioana M. Baldini Soares, Kailash Gopalakrishnan, Vijayalakshmi Srinivasan
  • Patent number: 9430235
    Abstract: A method and information processing system manage load and store operations that can be executed out-of-order. At least one of a load instruction and a store instruction is executed. A determination is made that an operand store compare hazard has been encountered. An entry within an operand store compare hazard prediction table is created based on the determination. The entry includes at least an instruction address of the instruction that has been executed and a hazard indicating flag associated with the instruction. The hazard indicating flag indicates that the instruction has encountered the operand store compare hazard. When a load instruction is associated with the hazard indicating flag, the load instruction becomes dependent upon all store instructions associated with a substantially similar hazard indicating flag.
    Type: Grant
    Filed: July 29, 2013
    Date of Patent: August 30, 2016
    Assignee: International Business Machines Corporation
    Inventors: Gregory W. Alexander, Khary J. Alexander, Brian Curran, Jonathan T. Hsieh, Christian Jacobi, James R. Mitchell, Brian R. Prasky, Brian W. Thompto
  • Patent number: 9424365
    Abstract: Techniques and approaches are provided for creating indexes and column constraints on structured XML data that is stored in a relational database. Data Definition Language (DDL) Create Index and Create Constraint commands have extended syntax that allows the specification of a path-based expression instead of requiring a column and table name. A mapping created by the system when an XML Schema is registered stores the correspondence of XML data elements to automatically-created database tables and columns that are given names only useful for the internal system. When a user provides a path-based expression in a DDL when creating an index or constraint, the path-based expression is translated to the underlying database constructs using the mapping. Issues are addressed for handling path-based expressions that evaluate to more than one element. Additional index optimization is described using data type information available in the XML schema to select the optimal index type.
    Type: Grant
    Filed: October 30, 2009
    Date of Patent: August 23, 2016
    Assignee: Oracle International Corporation
    Inventors: Beda Christoph Hammerschmidt, Zhen Hua Liu, Thomas Baby
  • Patent number: 9354885
    Abstract: Processing of an instruction fetch from an instruction cache is provided, which includes: determining whether the next instruction fetch is in a same cache line of the instruction cache as a last instruction fetch; and based, at least in part, on determining that the next instruction fetch is in the same cache line, suppressing for the next instruction fetch one or more instruction cache-related directory accesses, and forcing for the next instruction an address match signal for the same cache line. The suppressing may include generating a known-to-hit signal where the next fetch is in the same cache line, and the last fetch is not a branch instruction, and issuing an instruction cache hit where a cache line segment of the same cache line having the next instruction has a valid validity bit, the valid validity bit having been retrieved and maintained based on a most-recent, instruction cache-directory-accessed fetch.
    Type: Grant
    Filed: January 8, 2016
    Date of Patent: May 31, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 9354882
    Abstract: Example methods and apparatus to manage partial commit-checkpoints are disclosed. A disclosed example method includes identifying a commit instruction associated with a region of instructions executed by a processor, identifying candidate instructions from the region of instructions, and generating a processor partial commit-checkpoint to save a current state of the processor, the checkpoint based on calculated register values associated with live instructions, and including instruction reference addresses to link the candidate instructions.
    Type: Grant
    Filed: September 30, 2013
    Date of Patent: May 31, 2016
    Assignee: Intel Corporation
    Inventors: Edson Borin, Youfeng Wu