Dynamic Instruction Dependency Checking, Monitoring Or Conflict Resolution Patents (Class 712/216)
  • Patent number: 10969995
    Abstract: Systems and methods are disclosed for monitoring processor performance. Embodiments described relate to differentiating function performance by input parameters. In one embodiment, a method includes configuring a counter contained in a processor to count occurrences of an event in the processor and to overflow upon the count of occurrences reaching a specified value, configuring a precise event based sampling (PEBS) handler circuit to generate and store a PEBS record into a PEBS memory buffer after at least one overflow, the PEBS record containing at least one stack entry read from a stack after the at least one overflow, enabling the PEBS handler circuit to generate and store the PEBS record after the at least one overflow, generating and storing the PEBS record into the PEBS memory buffer after the at least one overflow; and storing contents of the PEBS memory buffer to a PEBS trace file in a memory.
    Type: Grant
    Filed: October 16, 2018
    Date of Patent: April 6, 2021
    Assignee: Intel Corporation
    Inventors: Ahmad Yasin, Stanislav Bratanov
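    Illustrative sketch (not from the patent): the counter-overflow sampling flow in the abstract above can be modeled loosely in software. The class name `SamplingCounter`, the record layout, and the choice to record only the top stack entry are assumptions made for illustration; real PEBS is implemented in processor hardware.

```python
class SamplingCounter:
    """Toy model: count events, and on overflow store a record with a stack entry."""

    def __init__(self, threshold):
        self.threshold = threshold  # the "specified value" that triggers overflow
        self.count = 0
        self.buffer = []            # stands in for the PEBS memory buffer

    def event(self, stack):
        """Register one occurrence of the monitored event."""
        self.count += 1
        if self.count >= self.threshold:
            self.count = 0  # counter overflows and restarts
            # Assumption: the record keeps only the top-of-stack entry.
            self.buffer.append({"stack_top": stack[-1] if stack else None})


counter = SamplingCounter(threshold=3)
for i in range(7):
    counter.event(stack=[i])
```

    With a threshold of 3, the seven events above overflow the counter twice, so the buffer ends up holding two records.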
  • Patent number: 10963389
    Abstract: An apparatus to facilitate data prefetching is disclosed. The apparatus includes a cache, one or more execution units (EUs) to execute program code, prefetch logic to maintain tracking information of memory instructions in the program code that trigger a cache miss and compiler logic to receive the tracking information, insert one or more pre-fetch instructions in updated program code to prefetch data from a memory for execution of one or more of the memory instructions that triggered a cache miss and download the updated program code for execution by the one or more EUs.
    Type: Grant
    Filed: February 11, 2020
    Date of Patent: March 30, 2021
    Assignee: Intel Corporation
    Inventors: Vasileios Porpodas, Guei-Yuan Lueh, Subramaniam Maiyuran, Wei-Yu Chen
  • Patent number: 10949205
    Abstract: A computer system includes a dispatch routing network to dispatch a plurality of instructions, and a processor in signal communication with the dispatch routing network. The processor determines a move instruction from the plurality of instructions to move data produced by an older second instruction, and copies a slice target file (STF) tag from a source register of the move instruction to a destination register of the move instruction without physically copying data in a slice target register and without assigning a new STF tag destination to the move instruction.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: March 16, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Joshua Bowman, Dung Q. Nguyen, Hung Le, Brian Thompto, Maureen A. Delaney, Cliff Kucharski, Steven J Battle
  • Patent number: 10922159
    Abstract: A method for performing a data dump includes detecting an error in a segmented application having an address space and a buffer. In response to detecting the error, the method quiesces the address space and copies content of the address space to another location while the address space is quiesced. The method reactivates the address space after the content of the address space is completely copied. The method suspends write access to the buffer and copies content of the buffer to another location while write access to the buffer is suspended. While write access to the buffer is suspended, the method redirects writes intended for the buffer to a temporary storage area, and directs reads intended for the buffer to one of the buffer and the temporary storage area, depending on where valid data is stored. A corresponding system and computer program product are also disclosed.
    Type: Grant
    Filed: April 16, 2019
    Date of Patent: February 16, 2021
    Assignee: International Business Machines Corporation
    Inventors: Thomas C. Reed, David C. Reed
  • Patent number: 10838733
    Abstract: A load request to restore a plurality of architected registers is obtained. Based on obtaining the load request, one or more architected registers of the plurality of architected registers are restored. The restoring uses a snapshot that maps architected registers to physical registers to replace one or more physical registers currently assigned to the one or more architected registers with one or more physical registers of the snapshot corresponding to the one or more architected registers.
    Type: Grant
    Filed: April 18, 2017
    Date of Patent: November 17, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura, Chung-Lung K. Shum, Timothy J. Slegel
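    Illustrative sketch (not from the patent): the snapshot-based restore in the abstract above amounts to swapping mapping entries rather than copying register contents. The function name `restore_registers` and the dictionary representation of the mapping are assumptions for illustration only.

```python
def restore_registers(current_map, snapshot, regs):
    """Point each architected register in `regs` back at the physical
    register recorded in `snapshot`. No register data is copied."""
    for reg in regs:
        current_map[reg] = snapshot[reg]
    return current_map


# Architected register name -> physical register number (hypothetical values).
snapshot = {"r1": 10, "r2": 11, "r3": 12}
current = {"r1": 20, "r2": 21, "r3": 12}
restore_registers(current, snapshot, ["r1", "r2"])
```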
  • Patent number: 10838725
    Abstract: Techniques are disclosed relating to fetching items from a compute command stream that includes compute kernels. In some embodiments, stream fetch circuitry sequentially pre-fetches items from the stream and stores them in a buffer. In some embodiments, fetch parse circuitry iterates through items in the buffer using a fetch parse pointer to detect indirect-data-access items and/or redirect items in the buffer. The fetch parse circuitry may send detected indirect data accesses to indirect-fetch circuitry, which may buffer requests. In some embodiments, execute parse circuitry iterates through items in the buffer using an execute parse pointer (e.g., which may trail the fetch parse pointer) and outputs both item data from the buffer and indirect-fetch results from indirect-fetch circuitry for execution. In various embodiments, the disclosed techniques may reduce fetch latency for compute kernels.
    Type: Grant
    Filed: September 26, 2018
    Date of Patent: November 17, 2020
    Assignee: Apple Inc.
    Inventors: Andrew M. Havlir, Jeffrey T. Brady
  • Patent number: 10824430
    Abstract: Managing program instruction execution by receiving a first OSC (operand store compare) instruction, the first OSC instruction comprising a first itag and a first instruction address and creating a first OSC table entry according to the first itag and first instruction address. Further, receiving a second OSC instruction, the second OSC instruction comprising a second itag and a second instruction address and creating a second OSC table entry according to the second itag and an itag delta between the first itag and the second itag, then appending the second OSC table entry according to an itag delta between the second itag and a third itag, and providing an itag delta from the second OSC table entry to an instruction sequencing unit (ISU).
    Type: Grant
    Filed: April 25, 2019
    Date of Patent: November 3, 2020
    Assignee: International Business Machines Corporation
    Inventors: Ehsan Fatehi, Brian W. Thompto
  • Patent number: 10776125
    Abstract: In an embodiment, at least one CPU processor and at least one coprocessor are included in a system. The CPU processor may issue operations to the coprocessor to perform, including load/store operations. The CPU processor may generate the addresses that are accessed by the coprocessor load/store operations, as well as executing its own CPU load/store operations. The CPU processor may include a memory ordering table configured to track at least one memory region within which there are outstanding coprocessor load/store memory operations that have not yet completed. The CPU processor may delay CPU load/store operations until the outstanding coprocessor load/store operations are complete. In this fashion, the proper ordering of CPU load/store operations and coprocessor load/store operations may be maintained.
    Type: Grant
    Filed: December 5, 2018
    Date of Patent: September 15, 2020
    Assignee: Apple Inc.
    Inventors: Aditya Kesiraju, Brett S. Feero, Nikhil Gupta
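    Illustrative sketch (not from the patent): one way to picture the memory ordering table in the abstract above is a per-region count of outstanding coprocessor operations that CPU loads and stores must check before issuing. The 4 KiB region size, the class name `MemoryOrderingTable`, and its method names are assumptions for illustration.

```python
REGION_BITS = 12  # assumed region granularity: 4 KiB


class MemoryOrderingTable:
    """Tracks regions that still have outstanding coprocessor load/store ops."""

    def __init__(self):
        self.outstanding = {}  # region number -> count of in-flight coprocessor ops

    @staticmethod
    def region(addr):
        return addr >> REGION_BITS

    def coproc_issue(self, addr):
        r = self.region(addr)
        self.outstanding[r] = self.outstanding.get(r, 0) + 1

    def coproc_complete(self, addr):
        r = self.region(addr)
        self.outstanding[r] -= 1
        if self.outstanding[r] == 0:
            del self.outstanding[r]

    def cpu_may_issue(self, addr):
        """A CPU load/store is delayed while its region has outstanding ops."""
        return self.region(addr) not in self.outstanding


mot = MemoryOrderingTable()
mot.coproc_issue(0x1000)
blocked = not mot.cpu_may_issue(0x1FF8)   # same region as 0x1000
unrelated = mot.cpu_may_issue(0x2000)     # different region
mot.coproc_complete(0x1000)
released = mot.cpu_may_issue(0x1FF8)
```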
  • Patent number: 10776897
    Abstract: Embodiments described herein provide an apparatus comprising a processor to configure a plurality of contexts of a command engine to execute a graphics workload comprising a plurality of walkers, allocate, from a pool of execution units of a graphics processor, a subset of execution units to each walker in the plurality of walkers based at least in part on the predetermined number of walkers configured for the context, for each context in the plurality of contexts, dispatch one or more walkers of the plurality of walkers to the execution units, and upon dispatch of the one or more walkers of the plurality of walkers, write an opcode to a computer-readable memory indicating that the dispatch of the walker is complete, wherein the opcode comprises dependency data for the one or more walkers of the plurality of walkers. Other embodiments may be described and claimed.
    Type: Grant
    Filed: March 8, 2019
    Date of Patent: September 15, 2020
    Assignee: INTEL CORPORATION
    Inventors: James Valerio, Vasanth Ranganathan, Joydeep Ray, Abhishek R. Appu, Ben J. Ashbaugh, Brandon Fliflet, Jeffery S. Boles, Srinivasan Embar Raghukrishnan, Rahul Kulkarni
  • Patent number: 10761854
    Abstract: Preventing hazard flushes in an instruction sequencing unit of a multi-slice processor including receiving a load instruction in a load reorder queue, wherein the load instruction is an instruction to load data from a memory location; subsequent to receiving the load instruction, receiving a store instruction in a store reorder queue, wherein the store instruction is an instruction to store data in the memory location; determining that the store instruction causes a hazard against the load instruction; preventing a flush of the load reorder queue based on a state of the load instruction; and re-executing the load instruction.
    Type: Grant
    Filed: April 19, 2016
    Date of Patent: September 1, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Robert A. Cordes, David A. Hrusecky, Elizabeth A. McGlone
  • Patent number: 10761594
    Abstract: In an embodiment, a processor includes a first core and a power management agent (PMA), coupled to the first core, to include a static table that stores a list of operations, and a plurality of columns each to specify a corresponding flow that includes a corresponding subset of the operations. Execution of each flow is associated with a corresponding state of the first core. The PMA includes a control register (CR) that includes a plurality of storage elements to receive one of a first value and a second value. The processor includes execution logic, responsive to a command to place the first core into a first state, to execute an operation of a first flow when a corresponding storage element stores the first value and to refrain from execution of an operation of the first flow when the corresponding element stores the second value. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 15, 2017
    Date of Patent: September 1, 2020
    Assignee: Intel Corporation
    Inventors: Israel Diamand, Asaf Rubinstein, Arik Gihon, Tal Kuzi, Tomer Ziv, Nadav Shulman
  • Patent number: 10762458
    Abstract: Systems and methods are provided for scheduling objects having pair-wise and cumulative constraints. The systems and methods presented can utilize a directed acyclic graph to increase or maximize a utilization function. The objects can comprise satellites in a constellation of satellites. In some implementations, the satellites are imaging satellites, and the systems and methods for scheduling can use human collaboration to determine events of interest for acquisition of images. In some implementations, dominant edges are removed from the directed acyclic graph. In some implementations, dynamic weights are assigned to nodes associated with downlink events in the directed acyclic graph.
    Type: Grant
    Filed: October 24, 2014
    Date of Patent: September 1, 2020
    Assignee: Planet Labs, Inc.
    Inventor: Sean Augenstein
  • Patent number: 10747539
    Abstract: Systems, apparatuses, and methods for instruction next fetch prediction. A scan-on-fill target predictor in a processor generates a predicted next fetch address for the instruction fetch unit. When a group of instructions is used to fill an instruction cache but is not currently being retrieved from the instruction cache for processing by other pipeline stages, the group of instructions are scanned to identify exit points of basic blocks within the group. An entry of a table in the scan-on-fill target predictor is allocated for an instruction in a basic block in the group when the basic block has an exit point with a target address that can be resolved within a single clock cycle. The scan-on-fill target predictor may perform a lookup of the table with the current fetch address. The prediction may be compared to a main branch predictor at a later pipeline stage for training purposes.
    Type: Grant
    Filed: November 14, 2016
    Date of Patent: August 18, 2020
    Assignee: Apple Inc.
    Inventors: James Robert Howard Hakewill, Constantin Pistol
  • Patent number: 10740269
    Abstract: Arbitration circuitry is provided for allocating up to M resources to N requesters, where M≥2. The arbitration circuitry comprises group allocation circuitry to control a group allocation in which the N requesters are allocated to M groups of requesters, with each requester allocated to one of the groups; and M arbiters each corresponding to a respective one of the M groups. Each arbiter selects a winning requester from the corresponding group, which is to be allocated a corresponding resource of the M resources. In response to a given requester being selected as the winning requester by the arbiter for a given group, the group allocation is changed so that in a subsequent arbitration cycle the given requester is in a different group to the given group.
    Type: Grant
    Filed: July 17, 2018
    Date of Patent: August 11, 2020
    Assignee: ARM Limited
    Inventor: Andrew David Tune
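    Illustrative sketch (not from the patent): the grouped arbitration in the abstract above can be modeled as M per-group arbiters followed by a reallocation step that moves each winner to another group. The function name `arbitrate`, the first-in-group-order selection policy, and the move-to-next-group rule are assumptions; the abstract does not specify either policy.

```python
def arbitrate(groups, requesting):
    """Run one arbitration cycle.

    groups: list of M lists of requester ids (each id in exactly one group).
    requesting: set of ids that currently want a resource.
    Returns one winner per group (None if no requester in that group is active).
    """
    m = len(groups)
    # Each arbiter picks from its own group (assumed policy: first active id).
    winners = [next((r for r in g if r in requesting), None) for g in groups]
    # Change the group allocation so each winner sits in a different group next cycle.
    for i, w in enumerate(winners):
        if w is not None:
            groups[i].remove(w)
            groups[(i + 1) % m].append(w)
    return winners


groups = [[0, 1], [2, 3]]
first = arbitrate(groups, {0, 1, 2, 3})
second = arbitrate(groups, {0, 1, 2, 3})
```

    Because M is at least 2, `(i + 1) % m` never equals `i`, so a winner always lands in a group other than the one it just won in.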
  • Patent number: 10740107
    Abstract: Operation of a multi-slice processor that includes a plurality of execution slices and an instruction sequencing unit. Operation of such a multi-slice processor includes: receiving, at the instruction sequencing unit, a load instruction indicating load address data and a load data length; determining a previous store instruction in an issue queue such that store address data for the previous store instruction corresponds to the load address data, wherein the previous store instruction corresponds to a store data length; and generating, in dependence upon the store data length matching the load data length, an indication in the issue queue that indicates a dependency between the load instruction and the previous store instruction.
    Type: Grant
    Filed: June 1, 2016
    Date of Patent: August 11, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Salma Ayub, Joshua W. Bowman, Jeffrey C. Brownscheidle, Kurt A. Feiste, Dung Q. Nguyen, Salim A. Shah, Brian W. Thompto
  • Patent number: 10684834
    Abstract: Embodiments of the present invention disclose a method and an apparatus for detecting inter-instruction data dependency. The method comprises: comparing a thread number corresponding to a historical access operation with a thread number corresponding to a write access operation, if the thread number corresponding to the write access operation is less than the thread number corresponding to the historical access operation, which indicates existence of data dependency for a to-be-detected instruction, terminating the detection; or comparing a thread number corresponding to a historical write access operation with a thread number corresponding to a read access operation, if the thread number corresponding to the read access operation is less than the thread number corresponding to the historical write access operation, which indicates existence of data dependency for the to-be-detected instruction, terminating the detection.
    Type: Grant
    Filed: March 21, 2019
    Date of Patent: June 16, 2020
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Hongyuan Liu, Cho-Li Wang, KingTin Lam, Huanxin Lin, Bin Zhang, Junchao Ma
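    Illustrative sketch (not from the patent): the two thread-number comparisons in the abstract above can be written as a single check over the access history of one memory location. The function name `has_dependency` and the `(kind, thread_no)` history format are assumptions for illustration.

```python
def has_dependency(history, op_kind, thread_no):
    """history: earlier accesses to the same location as (kind, thread_no) pairs,
    where kind is "read" or "write". Returns True if a dependency is indicated."""
    for kind, hist_thread in history:
        # A write by a lower-numbered thread than any historical access.
        if op_kind == "write" and thread_no < hist_thread:
            return True
        # A read by a lower-numbered thread than a historical write.
        if op_kind == "read" and kind == "write" and thread_no < hist_thread:
            return True
    return False


write_after_read = has_dependency([("read", 5)], "write", 3)
read_after_write = has_dependency([("write", 4)], "read", 2)
read_after_read = has_dependency([("read", 4)], "read", 2)
```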
  • Patent number: 10684859
    Abstract: Providing memory dependence prediction in block-atomic dataflow architectures is provided, in one aspect, by a memory dependence prediction circuit. The memory dependence prediction circuit comprises a predictor table configured to store multiple predictor table entries, each comprising a store instruction identifier, a block reach set, and a load set. Using this data, the memory dependence prediction circuit determines, upon a fetch of an instruction block by an execution pipeline, whether the instruction block contains store instructions that reach dependent load instructions. If so, the store instructions are marked as having dependent load instructions to wake. In some aspects, the memory dependence prediction circuit is configured to determine whether the instruction block contains dependent load instructions reached by store instructions. If so, the memory dependence prediction circuit delays execution of the dependent load instructions.
    Type: Grant
    Filed: September 19, 2016
    Date of Patent: June 16, 2020
    Assignee: QUALCOMM Incorporated
    Inventors: Chen-Han Ho, Gregory Michael Wright
  • Patent number: 10678700
    Abstract: A computer processor includes an instruction processing pipeline that interfaces to a hierarchical memory system employing an address space. The instruction processing pipeline includes execution logic that executes at least one thread in different protection domains over time, wherein the different protection domains are defined by descriptors each including first data specifying a memory region of the address space employed by the hierarchical memory system and second data specifying permissions for accessing the associated memory region. The address space can be a virtual address space or a physical address space. The protection domains can be associated with different turfs each representing a collection of descriptors. A given thread can execute in a particular protection domain (turf), one protection domain (turf) at a time with the particular protection domain (turf) selectively configured to change over time.
    Type: Grant
    Filed: March 21, 2016
    Date of Patent: June 9, 2020
    Assignee: Mill Computing, Inc.
    Inventors: Roger Rawson Godard, Arthur David Kahlich, Jan Schukat, William Edwards
  • Patent number: 10635487
    Abstract: Methods and systems are disclosed for executing tasks in a partially out-of-order execution environment. Input is received indicating a task and task type for execution within an environment. Functions associated with the task and type of task may be selected. An instruction may be generated for each function indicating that the function is configured for static scheduling or dynamic scheduling. A schedule for instantiating each function may be generated, where functions configured for static scheduling are scheduled for instantiation according to a position of the function within the list and functions configured for dynamic scheduling are scheduled for instantiation at runtime based on an environment in which the function is instantiated and a position of the function of the subset of the set of functions within the list. A thread specification may then be generated using the functions and list. The thread specification may be transmitted to remote devices.
    Type: Grant
    Filed: August 20, 2018
    Date of Patent: April 28, 2020
    Assignee: Oracle International Corporation
    Inventors: Andrew J. Giampetro, Russell Ashley Broom, Ricarda Heuss
  • Patent number: 10635442
    Abstract: A processor includes a front end, an execution unit, a retirement stage, a counter, and a performance monitoring unit. The front end includes logic to receive an event instruction to enable supervision of a front end event that will delay execution of instructions. The execution unit includes logic to set a register with parameters for supervision of the front end event. The front end further includes logic to receive a candidate instruction and match the candidate instruction to the front end event. The counter includes logic to generate the front end event upon retirement of the candidate instruction.
    Type: Grant
    Filed: March 12, 2018
    Date of Patent: April 28, 2020
    Assignee: Intel Corporation
    Inventor: Ahmad Yasin
  • Patent number: 10628125
    Abstract: A method of generating a hardware design to calculate a modulo value for any input value in a target input range with respect to a constant value d using one or more range reduction stages. The hardware design is generated through an iterative process that selects the optimum component for mapping successively increasing input ranges to the target output range until a component is selected that maps the target input range to the target output range. Each iteration includes generating hardware design components for mapping the input range to the target output range using each of a plurality of modulo preserving range reduction methods, synthesizing the generated hardware design components, and selecting one of the generated hardware design components based on the results of the synthesis.
    Type: Grant
    Filed: January 18, 2019
    Date of Patent: April 21, 2020
    Assignee: Imagination Technologies Limited
    Inventor: Samuel Lee
  • Patent number: 10628164
    Abstract: A system and method for efficiently handling speculative execution. A load store unit (LSU) of a processor stores a commit candidate pointer, which points to a given store instruction buffered in the store queue. The given store instruction is an oldest store instruction not currently permitted to commit to the data cache. The LSU receives a first pointer from the mapping unit, which points to an oldest instruction of non-dispatched branches and unresolved system instructions. The LSU receives a second pointer from the execution unit, which points to an oldest unresolved, issued branch instruction. When the LSU determines the commit candidate pointer is older than each of the first pointer and the second pointer, the commit candidate pointer is updated to point to an oldest store instruction younger than the given store instruction stored in the store queue. The given store instruction is permitted to commit to the data cache.
    Type: Grant
    Filed: July 30, 2018
    Date of Patent: April 21, 2020
    Assignee: Apple Inc.
    Inventors: Kulin N. Kothari, Mridul Agarwal, Aditya Kesiraju, Deepankar Duggal, Sean M. Reynolds
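    Illustrative sketch (not from the patent): the commit-candidate update in the abstract above can be modeled with sequence numbers, where a smaller number means an older instruction. The function name `advance_commit` and the integer age encoding are assumptions for illustration.

```python
def advance_commit(store_queue, candidate, first_ptr, second_ptr):
    """store_queue: sequence numbers of buffered stores (smaller = older).
    candidate: oldest store not yet permitted to commit.
    first_ptr: oldest non-dispatched branch or unresolved system instruction.
    second_ptr: oldest unresolved, issued branch.
    Returns (store now permitted to commit or None, new candidate)."""
    if candidate < first_ptr and candidate < second_ptr:
        younger = [s for s in store_queue if s > candidate]
        new_candidate = min(younger) if younger else None
        return candidate, new_candidate
    # Candidate is still speculative: nothing commits, pointer stays put.
    return None, candidate


committed, next_candidate = advance_commit([3, 8, 12], 3, first_ptr=5, second_ptr=6)
stalled, same_candidate = advance_commit([3, 8, 12], 8, first_ptr=5, second_ptr=9)
```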
  • Patent number: 10620955
    Abstract: Predicting a Table of Contents (TOC) pointer value responsive to branching to a subroutine. A subroutine is called from a calling module executing on a processor. Based on calling the subroutine, a value of a pointer to a reference data structure, such as a TOC, is predicted. The predicting is performed prior to executing a sequence of one or more instructions in the subroutine to compute the value. The value that is predicted is used to access the reference data structure to obtain a variable value for a variable of the subroutine.
    Type: Grant
    Filed: September 19, 2017
    Date of Patent: April 14, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 10606591
    Abstract: Technical solutions are described for issuing, by a load-store unit (LSU), a plurality of instructions from an out-of-order (OoO) window. The issuing includes, in response to determining a first effective address being used by a first instruction, the first effective address corresponding to a first real address, creating an effective real table (ERT) entry in an ERT, the ERT entry mapping the first effective address to the first real address. Further, the execution includes in response to determining an effective address synonym used by a second instruction, the effective address synonym being a second effective address that is also corresponding to said first real address: creating a synonym detection table (SDT) entry in an SDT, wherein the SDT entry maps the second effective address to the ERT entry, and relaunching the second instruction by replacing the second effective address in the second instruction with the first effective address.
    Type: Grant
    Filed: October 6, 2017
    Date of Patent: March 31, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bryan Lloyd, Balaram Sinharoy
  • Patent number: 10606592
    Abstract: Technical solutions are described for issuing, by a load-store unit (LSU), a plurality of instructions from an out-of-order (OoO) window. The issuing includes, in response to determining a first effective address being used by a first instruction, the first effective address corresponding to a first real address, creating an effective real table (ERT) entry in an ERT, the ERT entry mapping the first effective address to the first real address. Further, the execution includes in response to determining an effective address synonym used by a second instruction, the effective address synonym being a second effective address that is also corresponding to said first real address: creating a synonym detection table (SDT) entry in an SDT, wherein the SDT entry maps the second effective address to the ERT entry, and relaunching the second instruction by replacing the second effective address in the second instruction with the first effective address.
    Type: Grant
    Filed: November 29, 2017
    Date of Patent: March 31, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bryan Lloyd, Balaram Sinharoy
  • Patent number: 10599571
    Abstract: An apparatus to facilitate data prefetching is disclosed. The apparatus includes a cache, one or more execution units (EUs) to execute program code, prefetch logic to maintain tracking information of memory instructions in the program code that trigger a cache miss and compiler logic to receive the tracking information, insert one or more pre-fetch instructions in updated program code to prefetch data from a memory for execution of one or more of the memory instructions that triggered a cache miss and download the updated program code for execution by the one or more EUs.
    Type: Grant
    Filed: August 7, 2017
    Date of Patent: March 24, 2020
    Assignee: Intel Corporation
    Inventors: Vasileios Porpodas, Guei-Yuan Lueh, Subramaniam Maiyuran, Wei-Yu Chen
  • Patent number: 10579387
    Abstract: Technical solutions are described for executing one or more out-of-order (OoO) instructions by a processing unit. The execution includes detecting, by a load-store unit (LSU), a load-hit-store (LHS) in an out-of-order execution of the instructions, the detecting based only on effective addresses. The detecting includes determining an effective address associated with an operand of a load instruction. The detecting further includes determining whether a store instruction entry using said effective address to store a data value is present in a store reorder queue, and indicating that an LHS has been detected based at least in part on determining that store instruction entry using said effective address is present in the store reorder queue. In response to detecting the LHS, a store forwarding is performed that includes forwarding data from the store instruction to the load instruction.
    Type: Grant
    Filed: October 6, 2017
    Date of Patent: March 3, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Christopher Gonzalez, Bryan Lloyd, Balaram Sinharoy
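    Illustrative sketch (not from the patent): load-hit-store detection by effective address, followed by store forwarding, can be pictured as a search of the store reorder queue before falling back to memory. The function name `execute_load`, the list-of-tuples queue, and the choice to forward from the youngest matching store are assumptions for illustration.

```python
def execute_load(store_reorder_queue, load_ea, memory):
    """store_reorder_queue: list of (effective_address, data), oldest first.
    Returns (value, lhs_detected)."""
    # Search youngest-first so the most recent matching store supplies the data.
    for ea, data in reversed(store_reorder_queue):
        if ea == load_ea:
            return data, True  # load-hit-store: forward the store's data
    return memory.get(load_ea), False  # no match: read from memory


srq = [(0x100, 1), (0x200, 2), (0x100, 3)]
hit = execute_load(srq, 0x100, {0x100: 9})
miss = execute_load(srq, 0x300, {0x300: 7})
```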
  • Patent number: 10572256
    Abstract: Technical solutions are described for issuing, by a load-store unit (LSU), a plurality of instructions from an out-of-order (OoO) window. The issuing includes, in response to determining a first effective address (EA) being used by a first instruction, the first EA corresponding to a first real address (RA), creating a first effective real translation (ERT) table entry in an ERT table, the ERT entry mapping the first EA to the first RA. Further, in response to determining an EA synonym used by a second instruction, the execution includes replacing the first ERT entry with a second ERT entry, wherein the second ERT entry maps the second EA with the first RA, and creating an ERT eviction (ERTE) table entry in an ERTE table, wherein the ERTE entry maps the first RA to the first EA, the ERTE table entry maintains the relationship between the first EA and the first RA.
    Type: Grant
    Filed: October 6, 2017
    Date of Patent: February 25, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bryan Lloyd, Balaram Sinharoy
  • Patent number: 10572179
    Abstract: A lower level cache receives, from a processor core, a plurality of copy-type requests and a plurality of paste-type requests that together indicate a memory move to be performed, as well as a barrier request that requests ordering of memory access requests prior to and after the barrier request. The barrier request precedes a copy-type request and a paste-type request of the memory move in program order. Prior to completion of processing of the barrier request, the lower level cache allocates first and second state machines to service the copy-type and paste-type requests. The first state machine speculatively reads a data granule identified by a source real address of the copy-type request into a non-architected buffer. After processing of the barrier request is complete, the second state machine writes the data granule from the non-architected buffer to a storage location identified by a destination real address of the paste-type request.
    Type: Grant
    Filed: July 19, 2018
    Date of Patent: February 25, 2020
    Assignee: International Business Machines Corporation
    Inventors: Guy L. Guthrie, Derek E. Williams
  • Patent number: 10572257
    Abstract: Technical solutions are described for issuing, by a load-store unit (LSU), a plurality of instructions from an out-of-order (OoO) window. The issuing includes, in response to determining a first effective address (EA) being used by a first instruction, the first EA corresponding to a first real address (RA), creating a first effective real translation (ERT) table entry in an ERT table, the ERT entry mapping the first EA to the first RA. Further, in response to determining an EA synonym used by a second instruction, the execution includes replacing the first ERT entry with a second ERT entry, wherein the second ERT entry maps the second EA with the first RA, and creating an ERT eviction (ERTE) table entry in an ERTE table, wherein the ERTE entry maps the first RA to the first EA, the ERTE table entry maintains the relationship between the first EA and the first RA.
    Type: Grant
    Filed: November 29, 2017
    Date of Patent: February 25, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bryan Lloyd, Balaram Sinharoy
  • Patent number: 10558418
    Abstract: A technique for implementing synchronization monitors on an accelerated processing device (“APD”) is provided. Work on an APD includes workgroups that include one or more wavefronts. All wavefronts of a workgroup execute on a single compute unit. A monitor is a synchronization construct that allows workgroups to stall until a particular condition is met. Responsive to all wavefronts of a workgroup executing a wait instruction, the monitor coordinator records the workgroup in an “entry queue.” The workgroup begins saving its state to a general APD memory and, when such saving is complete, the monitor coordinator moves the workgroup to a “condition queue.” When the condition specified by the wait instruction is met, the monitor coordinator moves the workgroup to a “ready queue,” and, when sufficient resources are available on a compute unit, the APD schedules the ready workgroup for execution on a compute unit.
    Type: Grant
    Filed: July 27, 2017
    Date of Patent: February 11, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Alexandru Dutu, Bradford M. Beckmann
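    Illustrative sketch (not taken from the patent): the queue transitions in the abstract above (entry queue, then condition queue, then ready queue) can be modeled as a small state machine. The class `MonitorCoordinator`, its method names, and the use of Python callables as wait conditions are assumptions for illustration.

```python
from collections import deque


class MonitorCoordinator:
    """Toy model of the entry -> condition -> ready queue flow."""

    def __init__(self):
        self.entry_queue = deque()      # workgroups still saving their state
        self.condition_queue = deque()  # state saved, condition not yet met
        self.ready_queue = deque()      # condition met, awaiting resources

    def wait(self, workgroup, condition):
        # Called once all wavefronts of the workgroup executed the wait.
        self.entry_queue.append((workgroup, condition))

    def state_saved(self, workgroup):
        # The workgroup finished saving its state to memory.
        for item in list(self.entry_queue):
            if item[0] == workgroup:
                self.entry_queue.remove(item)
                self.condition_queue.append(item)

    def poll(self):
        # Move every workgroup whose condition now holds to the ready queue.
        for item in list(self.condition_queue):
            workgroup, condition = item
            if condition():
                self.condition_queue.remove(item)
                self.ready_queue.append(workgroup)

    def schedule(self, free_slots):
        # Hand out ready workgroups while compute-unit slots remain.
        scheduled = []
        while free_slots > 0 and self.ready_queue:
            scheduled.append(self.ready_queue.popleft())
            free_slots -= 1
        return scheduled
```

    A typical sequence in this model is `wait`, then `state_saved`, then repeated `poll` calls until the condition holds, followed by `schedule` when a compute unit has capacity.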
  • Patent number: 10546075
    Abstract: A system and method for a synthetic trace model includes providing a first system model, the first system model comprising a plurality of subsystem models, each of the plurality of subsystem models having a trace format, generating a first plurality of traces from an overall pool of trace instructions, each of the first plurality of traces generated for respective ones of the plurality of subsystem models, according to the trace format of the subsystem model, executing the traces on each of the subsystem models, and evaluating execution characteristics for each trace executed on the first system model.
    Type: Grant
    Filed: April 27, 2016
    Date of Patent: January 28, 2020
    Assignee: FUTUREWEI TECHNOLOGIES, INC.
    Inventors: YwhPyng Harn, Fa Yin, Xiaotao Chen
  • Patent number: 10545765
    Abstract: Embodiments include systems, methods, and computer program products for using a multi-level history buffer (HB) for a speculative transaction. One method includes after dispatching a first instruction indicating start of the speculative transaction, marking one or more register file (RF) entries as pre-transaction memory (PTM), and after dispatching a second instruction targeting one of the marked RF entries, moving data from the marked RF entry to a first level HB entry and marking the first level HB entry as PTM. The method also includes upon detecting a write back to the first level HB entry, moving data from the first level HB entry to a second level HB entry and marking the second level HB entry as PTM. The method further includes upon determining that the second level HB entry has been completed, moving data from the second level HB entry to a third level HB entry.
    Type: Grant
    Filed: May 17, 2017
    Date of Patent: January 28, 2020
    Assignee: International Business Machines Corporation
    Inventors: Brian D. Barrick, Steven J. Battle, Joshua W. Bowman, Hung Q. Le, Dung Q. Nguyen, David R. Terry, Albert J. Van Norstrand, Jr.
  • Patent number: 10540156
    Abstract: A computer generates a parallel program, based on an analysis of a single program that includes a plurality of tasks written for a single-core microcomputer, by parallelizing parallelizable tasks for a multi-core processor having multiple cores. The computer includes a macro task (MT) group extractor that identifies a resource commonly accessed by the plurality of tasks, and extracts a plurality of MTs showing access to that commonly-accessed resource. Then, the computer uses an allocation restriction determiner to allocate the extracted MTs to the same core in the multi-core processor. With the parallelization method described above, overhead in the execution time of the parallel program on the multi-core processor is reduced, and an in-vehicle device is enabled to execute each of the MTs in the program optimally.
    Type: Grant
    Filed: June 8, 2017
    Date of Patent: January 21, 2020
    Assignee: DENSO CORPORATION
    Inventor: Kenichi Mineda
  • Patent number: 10534616
    Abstract: Technical solutions are described for executing one or more out-of-order instructions by a load-store unit (LSU) by detecting a load-hit-load (LHL) case based only on effective addresses (EA). An example method includes, in response to receiving a first load instruction, creating an entry in a LHL table. Further, in response to receiving a second load instruction in the load reorder queue, and in response to the predetermined number of bits from a second EA used by the second load instruction matching the predetermined number of bits from the first EA, comparing the first EA and the second EA. Further, a first thread identifier for the first load instruction is compared with a second thread identifier for the second load instruction. In response to the first EA matching the second EA, and the first thread identifier matching the second thread identifier, the method includes flushing the first load instruction.
    Type: Grant
    Filed: October 6, 2017
    Date of Patent: January 14, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Christopher Gonzalez, Bryan Lloyd, Balaram Sinharoy
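    Illustrative sketch (not taken from the patent): the two-stage comparison in the abstract above, a cheap partial-EA filter followed by a full EA and thread-identifier compare, might look like the following. `PARTIAL_BITS`, `LhlTable`, and `add_load` are hypothetical names, and the sketch ignores the program-order checks a real load reorder queue would also perform.

```python
PARTIAL_BITS = 12  # assumed width of the quick-filter field (low-order EA bits)


def partial(ea):
    """Return the low-order bits of an effective address used for filtering."""
    return ea & ((1 << PARTIAL_BITS) - 1)


class LhlTable:
    """Toy load-hit-load table keyed only on effective addresses."""

    def __init__(self):
        self.entries = []  # list of (load_id, ea, thread_id)

    def add_load(self, load_id, ea, thread_id):
        """Check an incoming load against recorded loads, then record it.
        Returns the ids of earlier-recorded loads that should be flushed."""
        to_flush = []
        for old_id, old_ea, old_tid in self.entries:
            if partial(old_ea) != partial(ea):
                continue  # quick filter: partial bits differ, skip full compare
            if old_ea == ea and old_tid == thread_id:
                to_flush.append(old_id)  # same EA, same thread: LHL case
        self.entries = [e for e in self.entries if e[0] not in to_flush]
        self.entries.append((load_id, ea, thread_id))
        return to_flush
```

    The partial compare only decides whether the full compare is worth doing; a partial match with a different full EA or a different thread identifier does not trigger a flush in this model.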
  • Patent number: 10528355
    Abstract: An apparatus has processing circuitry, register rename circuitry and control circuitry which selects one of first and second move handling techniques for handling a move instruction specifying a source logical register and a destination logical register. In the first technique, the register rename circuitry maps the destination logical register of the move to the same physical register as the source logical register. In the second technique, the processing circuitry writes a data value read from a physical register corresponding to the source logical register to a different physical register corresponding to the destination logical register. The second technique is selected when the move instruction specifies, as its source logical register, one of the source and destination logical registers of an earlier move instruction handled according to the first technique, and the register mapping used for that register when handling the earlier move instruction is still current.
    Type: Grant
    Filed: December 24, 2015
    Date of Patent: January 7, 2020
    Assignee: ARM Limited
    Inventors: Chris Abernathy, Florent Begon
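    Illustrative sketch (not taken from the patent): the selection between the two move handling techniques in the abstract above can be approximated with a rename map and a set of logical registers that took part in an eliminated move. `RenamerSketch` and its fields are hypothetical, and the tracking here is deliberately conservative: a register stays marked until it is itself overwritten, which may force a copy more often than real hardware would.

```python
class RenamerSketch:
    """Toy register renamer that chooses between move elimination
    ("rename") and a real data copy ("copy")."""

    def __init__(self, num_logical):
        self.map = {r: r for r in range(num_logical)}  # logical -> physical
        self.next_phys = num_logical
        # Logical registers whose current mapping was involved in an
        # eliminated move (assumption: one marker per logical register).
        self.shared = set()

    def _alloc(self):
        phys = self.next_phys
        self.next_phys += 1
        return phys

    def write(self, dst):
        """An ordinary instruction writes dst: its old mapping is no longer current."""
        self.map[dst] = self._alloc()
        self.shared.discard(dst)

    def move(self, dst, src):
        if src in self.shared:
            # Second technique: copy the value into a fresh physical register.
            self.map[dst] = self._alloc()
            self.shared.discard(dst)
            return "copy"
        # First technique: point dst at the same physical register as src.
        self.map[dst] = self.map[src]
        self.shared.update({src, dst})
        return "rename"
```

    Limiting sharing this way keeps at most two logical registers mapped to one physical register, which is one plausible motivation for the selection rule; the abstract itself does not state the reason.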
  • Patent number: 10503512
    Abstract: Apparatus for data processing and a method of data processing are provided, according to which the processing circuitry of the apparatus can access a memory system and execute data processing instructions in one context of multiple contexts which it supports. When the processing circuitry executes a barrier instruction, the resulting access ordering constraint may be limited to being enforced for accesses which have been initiated by the processing circuitry when operating in an identified context, which may for example be the context in which the barrier instruction has been executed. This provides a separation between the operation of the processing circuitry in its multiple possible contexts and in particular avoids delays in the completion of the access ordering constraint, for example relating to accesses to high latency regions of memory, from affecting the timing sensitivities of other contexts.
    Type: Grant
    Filed: November 3, 2015
    Date of Patent: December 10, 2019
    Assignee: ARM Limited
    Inventors: Simon John Craske, Alexander Alfred Hornung, Max John Batley, Kauser Yakub Johar
  • Patent number: 10504270
    Abstract: Techniques are disclosed relating to synchronizing access to pixel resources. Examples of pixel resources include color attachments, a stencil buffer, and a depth buffer. In some embodiments, hardware registers are used to track status of assigned pixel resources and pixel wait and pixel release instructions are used to synchronize access to the pixel resources. In some embodiments, other accesses to the pixel resources may occur out of program order. Relative to tracking and ordering pass groups, this weak ordering and explicit synchronization may improve performance and reduce power consumption. Disclosed techniques may also facilitate coordination between fragment rendering threads and auxiliary mid-render compute tasks.
    Type: Grant
    Filed: December 22, 2016
    Date of Patent: December 10, 2019
    Assignee: Apple Inc.
    Inventors: Terence M. Potter, Richard W. Schreyer, James J. Ding, Alexander K. Kan, Michael Imbrogno
  • Patent number: 10489704
    Abstract: Aspects for supporting operation data of different bit widths in neural networks are described herein. The aspects may include a processing module that includes one or more processors. The processor may be capable of processing data of one or more respective bit-widths. Further, the aspects may include a determiner module configured to receive one or more instructions that include one or more operands and one or more width fields. The operands may correspond to one or more operand types and each of the width fields may indicate an operand bit-width of one operand type. The determiner module may be further configured to identify at least one operand bit-width that is greater than each of the bit-widths. In addition, the aspects may include a processor combiner configured to designate a combination of two or more of the processors to process the operands.
    Type: Grant
    Filed: February 5, 2019
    Date of Patent: November 26, 2019
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Tianshi Chen, Qi Guo, Zidong Du
  • Patent number: 10489273
    Abstract: Reusing a related thread's cache during tracing. An embodiment includes executing a first thread at a processing unit while recording a trace to a first buffer. During execution, a context switch from the first thread to a second thread at the same processing unit is detected. Based on the context switch, it is determined that the second thread is related to the first thread, and that it is being traced to a separate second buffer. Based on this determination, a cache of the first thread is reused. The reuse includes recording a first identifier in the first buffer, and recording a second identifier in the second buffer. The first and second identifiers provide a linkage between the first buffer and the second buffer. Execution of the second thread is then initiated, while recording a trace to the second buffer, and without invalidating logging state of a cache.
    Type: Grant
    Filed: May 24, 2017
    Date of Patent: November 26, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Jordi Mola
  • Patent number: 10467010
    Abstract: A method for performing memory disambiguation in an out-of-order microprocessor pipeline is disclosed. The method comprises storing a tag with a load operation, wherein the tag is an identification number representing a store instruction nearest to the load operation, wherein the store instruction is older with respect to the load operation and wherein the store has potential to result in a RAW violation in conjunction with the load operation. The method also comprises issuing the load operation from an instruction scheduling module. Further, the method comprises acquiring data for the load operation speculatively after the load operation has arrived at a load store queue module. Finally, the method comprises determining if an identification number associated with a last contiguous issued store with respect to the load operation is equal to or greater than the tag and gating a validation process for the load operation in response to the determination.
    Type: Grant
    Filed: March 13, 2014
    Date of Patent: November 5, 2019
    Assignee: Intel Corporation
    Inventors: Mohammad A. Abdallah, Mandeep Singh
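    Illustrative sketch (not taken from the patent): the gating condition in the abstract above compares a load's tag with the identifier of the last contiguous issued store. The sketch assumes store identifiers are consecutive integers starting at 0 in program order with no wraparound; the function names are hypothetical.

```python
def last_contiguous_issued(issued_store_ids):
    """Return the highest id N such that every store 0..N has issued,
    or -1 if store 0 has not issued yet."""
    n = -1
    while (n + 1) in issued_store_ids:
        n += 1
    return n


def may_validate(load_tag, issued_store_ids):
    """A load tagged with its nearest older conflicting store may be validated
    once all stores up to and including that tag have issued."""
    return last_contiguous_issued(issued_store_ids) >= load_tag


issued = {0, 1, 3}                 # store 2 has not issued yet
blocked = may_validate(2, issued)  # False: the contiguous run stops at store 1
issued.add(2)
allowed = may_validate(2, issued)  # True: stores 0..3 have now all issued
```

    Tracking only the end of the contiguous run means a single comparison decides whether a speculatively loaded value can be checked, at the cost of delaying validation while any older store is outstanding.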
  • Patent number: 10402263
    Abstract: A method for handling load faults in an out-of-order processor is described. The method includes detecting, by a memory ordering buffer of the out-of-order processor, a load fault corresponding to a load instruction that was executed out-of-order by the out-of-order processor; determining, by the memory ordering buffer, whether instant reclamation is available for resolving the load fault of the load instruction; and performing, in response to determining that instant reclamation is available for resolving the load fault of the load instruction, instant reclamation to re-fetch the load instruction for execution prior to attempting to retire the load instruction.
    Type: Grant
    Filed: December 4, 2017
    Date of Patent: September 3, 2019
    Assignee: Intel Corporation
    Inventors: Zeev Sperber, Stanislav Shwartsman, Jared W. Stark, IV, Lihu Rappoport, Igor Yanover, George Leifman
  • Patent number: 10402201
    Abstract: A method and apparatus for detecting potential memory conflicts in a parallel computing environment by executing two parallel program threads. The parallel program threads include special operands that are used by a processing core to identify memory addresses that have the potential for conflict. These memory addresses are combined into a composite access record for each thread. The composite access records are compared to each other in order to detect a potential memory conflict.
    Type: Grant
    Filed: March 9, 2017
    Date of Patent: September 3, 2019
    Inventors: Joel Kevin Jones, Ananth Jasty
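    Illustrative sketch (not taken from the patent): one way to picture a "composite access record" is a fixed-width bit signature into which each flagged address is hashed, so that two threads' records can be compared with a single AND. The abstract does not say how the record is encoded; the signature width, the hash, and the function names below are all assumptions.

```python
SIGNATURE_BITS = 64  # assumed record width


def add_address(record, address):
    """Fold an address into the record by setting one bit chosen by a simple
    hash (8-byte granularity, modulo the signature width)."""
    return record | (1 << ((address >> 3) % SIGNATURE_BITS))


def build_record(addresses):
    """Combine all flagged addresses of one thread into a composite record."""
    record = 0
    for address in addresses:
        record = add_address(record, address)
    return record


def may_conflict(record_a, record_b):
    """Overlapping bits mean the threads may have touched the same address.
    Aliasing can produce false positives but never false negatives."""
    return (record_a & record_b) != 0


thread_a = build_record([0x1000, 0x1008])
thread_b = build_record([0x1010])
thread_c = build_record([0x1008, 0x1020])
```

    Here `may_conflict(thread_a, thread_c)` is true because both threads touched `0x1008`, while `may_conflict(thread_a, thread_b)` is false. A signature of this kind can only report potential conflicts, which matches the abstract's wording.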
  • Patent number: 10375038
    Abstract: Disclosed aspects relate to symmetric multiprocessing (SMP) management. A first SMP topology may be identified by a service processor firmware. The first SMP topology may indicate a first set of connection paths for a plurality of processor chips of a multi-node server. A second SMP topology may be identified by the service processor firmware. The second SMP topology may indicate a second set of connection paths for the plurality of processor chips of the multi-node server. The second SMP topology may differ from the first SMP topology. An error event related to the first SMP topology may be detected. A set of traffic may be routed using the second SMP topology. The set of traffic may be routed by the service processor firmware in response to detecting the error event related to the first SMP topology.
    Type: Grant
    Filed: November 30, 2016
    Date of Patent: August 6, 2019
    Assignee: International Business Machines Corporation
    Inventors: Deepak Kodihalli, Venkatesh Sainath, Dhruvaraj Subhashchandran
  • Patent number: 10360654
    Abstract: Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler provided software scoreboard information. In one embodiment the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to facilitate software dependency handling and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware.
    Type: Grant
    Filed: May 25, 2018
    Date of Patent: July 23, 2019
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Supratim Pal, Jorge E. Parra, Chandra S. Gurram, Ashwin J. Shivani, Ashutosh Garg, Brent A. Schwartz, Jorge F. Garcia Pabon, Darin M. Starkey, Shubh B. Shah, Guei-Yuan Lueh, Kaiyu Chen, Konrad Trifunovic, Buqi Cheng, Weiyu Chen
  • Patent number: 10360030
    Abstract: Embodiments of the present disclosure relate to processing a microprocessor instruction by receiving a microprocessor instruction for processing by a microprocessor, and processing the microprocessor instruction in a multi-cycle operation by acquiring a unit of data having a plurality of ordered bits, where the acquiring is performed by the microprocessor during a first clock cycle, and shifting the unit of data by a number of bits, where the shifting is performed by the microprocessor during a second clock cycle subsequent to the first clock cycle.
    Type: Grant
    Filed: December 20, 2017
    Date of Patent: July 23, 2019
    Assignee: International Business Machines Corporation
    Inventors: Eyal Naor, Martin Recktenwald, Christian Zoellin, Aaron Tsai
  • Patent number: 10353707
    Abstract: Embodiments of the present disclosure relate to processing a microprocessor instruction by receiving a microprocessor instruction for processing by a microprocessor, and processing the microprocessor instruction in a multi-cycle operation by acquiring a unit of data having a plurality of ordered bits, where the acquiring is performed by the microprocessor during a first clock cycle, and shifting the unit of data by a number of bits, where the shifting is performed by the microprocessor during a second clock cycle subsequent to the first clock cycle.
    Type: Grant
    Filed: July 12, 2017
    Date of Patent: July 16, 2019
    Assignee: International Business Machines Corporation
    Inventors: Eyal Naor, Martin Recktenwald, Christian Zoellin, Aaron Tsai
  • Patent number: 10318299
    Abstract: A read operation is initiated to obtain a wide input operand. Based on the initiating, a determination is made as to whether the wide input operand is available in a wide register or in two narrow registers. Based on determining the wide input operand is not available in the wide register, merging at least a portion of contents of the two narrow registers to obtain merged contents, writing the merged contents into the wide register, and continuing the read operation to obtain the wide input operand. Based on determining the wide input operand is available in the wide register, obtaining the wide input operand from the wide register.
    Type: Grant
    Filed: October 31, 2013
    Date of Patent: June 11, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Michael K. Gschwind
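    Illustrative sketch (not taken from the patent): the read path in the abstract above checks whether the wide operand is already available in the wide register and, if not, merges two narrow registers into it first. The pairing of wide register `w` with narrow registers `2w` (high half) and `2w + 1` (low half), the 64-bit narrow width, and the class name are assumptions.

```python
class RegisterFileSketch:
    """Toy register file with narrow registers and lazily merged wide registers."""

    NARROW_BITS = 64  # assumed narrow register width

    def __init__(self):
        self.narrow = {}         # narrow index -> value
        self.wide = {}           # wide index -> merged value
        self.wide_valid = set()  # wide registers whose contents are current

    def write_narrow(self, index, value):
        mask = (1 << self.NARROW_BITS) - 1
        self.narrow[index] = value & mask
        # The overlapping wide register no longer holds current contents.
        self.wide_valid.discard(index // 2)

    def read_wide(self, w):
        if w not in self.wide_valid:
            # Operand not available in the wide register: merge the pair,
            # write the result into the wide register, then continue the read.
            high = self.narrow.get(2 * w, 0)
            low = self.narrow.get(2 * w + 1, 0)
            self.wide[w] = (high << self.NARROW_BITS) | low
            self.wide_valid.add(w)
        return self.wide[w]
```

    After the first `read_wide`, later reads of the same wide register skip the merge until one of its narrow halves is written again.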
  • Patent number: 10296346
    Abstract: A method which includes, in a processor that processes instructions of program code, processing one or more of the instructions in a first segment of the instructions by a first hardware thread. Upon detecting that an instruction defined as a parallelization point has been fetched for the first thread, a second hardware thread is invoked to process at least one of the instructions in a second segment of the instructions, at least partially in parallel with processing of the instructions of the first segment by the first hardware thread, in accordance with a specification of register access that is indicative of data dependencies between the first and second segments.
    Type: Grant
    Filed: March 31, 2015
    Date of Patent: May 21, 2019
    Assignee: CENTIPEDE SEMI LTD.
    Inventors: Noam Mizrahi, Alberto Mandler, Shay Koren, Jonathan Friedmann
  • Patent number: 10296350
    Abstract: A method which includes, in a processor that processes instructions of program code, processing one or more of the instructions by a first hardware thread. Upon detecting that an instruction defined as a parallelization point has been fetched for the first thread, a second hardware thread is invoked to process at least one of the instructions at least partially in parallel with processing of the instructions by the first hardware thread.
    Type: Grant
    Filed: March 31, 2015
    Date of Patent: May 21, 2019
    Assignee: CENTIPEDE SEMI LTD.
    Inventors: Noam Mizrahi, Alberto Mandler, Shay Koren, Jonathan Friedmann