Instruction Issuing Patents (Class 712/214)
  • Patent number: 11150961
    Abstract: Methods, systems and apparatuses for graph processing are disclosed. One graph streaming processor includes a thread manager, wherein the thread manager is operative to dispatch operation of the plurality of threads of a plurality of thread processors before dependencies of the dependent threads have been resolved, maintain a scorecard of operation of the plurality of threads of the plurality of thread processors, and provide an indication to at least one of the plurality of thread processors when a dependency between the at least one of the plurality of threads that a request has or has not been satisfied. Further, a producer thread provides a response to the dependency when the dependency has been satisfied, and each of the plurality of thread processors is operative to provide processing updates to the thread manager, and provide queries to the thread manager upon reaching a dependency.
    Type: Grant
    Filed: February 8, 2019
    Date of Patent: October 19, 2021
    Assignee: Blaize, Inc.
    Inventors: Lokesh Agarwal, Sarvendra Govindammagari, Venkata Ganapathi Puppala, Satyaki Koneru
  • Patent number: 11144317
    Abstract: An AC parallelization circuit includes a transmitting circuit configured to transmit a stop signal to instruct a device for executing calculation in an iteration immediately preceding an iteration for which a concerned device is responsible to stop the calculation in loop-carried dependency calculation; and an estimating circuit configured to generate, as a result of executing the calculation in the preceding iteration, an estimated value to be provided to an arithmetic circuit when the transmitting circuit transmits the stop signal.
    Type: Grant
    Filed: August 20, 2020
    Date of Patent: October 12, 2021
    Assignee: FUJITSU LIMITED
    Inventor: Hisanao Akima
  • Patent number: 11119774
    Abstract: A system and/or method for processing information is disclosed that has at least one processor; a register file associated with the processor, the register file sliced into a plurality of STF blocks having a plurality of STF entries, and in an embodiment, each STF block is further partitioned into a plurality of sub-blocks, each sub-block having a different portion of the plurality of STF entries; and a plurality of execution units configured to read data from and write data to the register file, where the plurality of execution units are arranged in one or more execution slices. In one or more embodiments, the system is configured so that each execution slice has a plurality of STF blocks, and alternatively or additionally, each of the plurality of execution units in a single execution slice is assigned to write to one, and preferably only one, of the plurality of STF blocks.
    Type: Grant
    Filed: September 6, 2019
    Date of Patent: September 14, 2021
    Assignee: International Business Machines Corporation
    Inventors: Brian W. Thompto, Dung Q. Nguyen, Hung Q. Le, Sam Gat-Shang Chu
  • Patent number: 11112846
    Abstract: Embodiments of the present disclosure relate to detecting undervoltage conditions at a subcircuit. A power supply current of a first subcircuit is determined over a first number of previous clock cycles. A cross current flowing between the first subcircuit and a second subcircuit is determined over the first number of previous clock cycles. An estimated momentary supply voltage present at the first subcircuit is then determined based on the power supply current of the first subcircuit over the first number of previous clock cycles and the cross current flowing between the first subcircuit and the second subcircuit over the first number of previous clock cycles.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: September 7, 2021
    Assignee: International Business Machines Corporation
    Inventors: Thomas Strach, Preetham M. Lobo, Tobias Webel
  • Patent number: 11115964
    Abstract: A system and method of auto-detection of WLAN packets includes transmitting in a 60 GHz frequency band a wireless packet comprising a first header, a second header, a payload, and a training field, the first header carrying a plurality of bits, a logical value of a subset of the plurality of bits in the first header indicating the presence of the second header in the wireless packet.
    Type: Grant
    Filed: February 12, 2016
    Date of Patent: September 7, 2021
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Yan Xin, Osama Aboul-Magd, Jung Hoon Suh
  • Patent number: 11100390
    Abstract: A deep neural network (DNN) processor is configured to execute layer descriptors in layer descriptor lists. The descriptors define instructions for performing a forward pass of a DNN by the DNN processor. The layer descriptors can also be utilized to manage the flow of descriptors through the DNN module. For example, layer descriptors can define dependencies upon other descriptors. Descriptors defining a dependency will not execute until the descriptors upon which they are dependent have completed. Layer descriptors can also define a “fence,” or barrier, function that can be used to prevent the processing of upstream layer descriptors until the processing of all downstream layer descriptors is complete. The fence bit guarantees that there are no other layer descriptors in the DNN processing pipeline before the layer descriptor that has the fence to be asserted is processed.
    Type: Grant
    Filed: April 11, 2018
    Date of Patent: August 24, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Chad Balling McBride, Amol Ashok Ambardekar, Kent D. Cedola, George Petre, Larry Marvin Wall, Boris Bobrov
  • Patent number: 11093245
    Abstract: A computer system and a memory access technology are provided. In the computer system, when load/store instructions having a dependency relationship is processed, dependency information between a producer load/store instruction and a consumer load/store instruction can be obtained from a processor. A consumer load/store request is sent to a memory controller in the computer system based on the obtained dependency information, so that the memory controller can terminate a dependency relationship between load/store requests in the memory controller locally based on the dependency information in the received consumer load/store request, and execute the consumer load/store request.
    Type: Grant
    Filed: June 12, 2019
    Date of Patent: August 17, 2021
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Lei Fang, Xi Chen, Weiguang Cai
  • Patent number: 11086628
    Abstract: A system and method for load queue (LDQ) and store queue (STQ) entry allocations at address generation time that maintains age-order of instructions is described. In particular, writing LDQ and STQ entries are delayed until address generation time. This allows the load and store operations to dispatch, and younger operations (which may not be store and load operations) to also dispatch and execute their instructions. The address generation of the load or store operation is held at an address generation scheduler queue (AGSQ) until a load or store queue entry is available for the operation. The tracking of load queue entries or store queue entries is effectively being done in the AGSQ instead of at the decode engine. The LDQ and STQ depth is not visible from a decode engine's perspective, and increases the effective processing and queue depth.
    Type: Grant
    Filed: August 15, 2016
    Date of Patent: August 10, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventor: John M. King
  • Patent number: 11080055
    Abstract: Techniques are disclosed relating to arbitration among register file accesses. In some embodiments, an apparatus includes a register file configured to store operands for multiple client circuits and arbitration circuitry configured to select from among multiple received requests to access the register file. In some embodiments, the apparatus includes first interface circuitry configured to provide access requests from a first client circuit to the arbitration circuitry and supplemental interface circuitry configured to receive unsuccessful requests from the first client circuit and provide the received unsuccessful requests to the arbitration circuitry. The supplemental interface circuitry may provide additional catch-up bandwidth to clients that lose arbitration, which may result in fairness during bandwidth shortages.
    Type: Grant
    Filed: August 22, 2019
    Date of Patent: August 3, 2021
    Assignee: Apple Inc.
    Inventors: Robert D. Kenney, Terence M. Potter
  • Patent number: 11074079
    Abstract: A method of providing instructions to computer processing apparatus for improved event handling comprises the following. Instructions for execution on the computer processing apparatus are provided to an event processor generator. These instructions comprise a plurality of functional steps, a set of dependencies between the functional steps, and configuration data. The event processor generator creates instances of the functional steps from the instructions and represents the instances as directed acyclic graphs. The event processor generator identifies a plurality of event types and topologically sort is the directed acyclic graphs to determine a topologically ordered event path for each event type. The event processor generator then provides a revised set of instructions for execution on the computer processing apparatus in which original instructions have been replaced by instructions requiring each event type to be executed according to its topologically ordered event path.
    Type: Grant
    Filed: November 17, 2017
    Date of Patent: July 27, 2021
    Inventor: Greg Higgins
  • Patent number: 11036514
    Abstract: A method and apparatus for performing an indexed data dependency instruction wakeup is disclosed. A scheduler may issue one or more instruction operations from a number of entries therein, including a first instruction operation. In a second entry, a comparison operation may be performed between a dependency index and an index of the first instruction operation. A match between the index of the first instruction and the dependency index in the second entry indicates a dependency of the corresponding instruction on the first instruction, and further indicates that the first instruction operation has issued. The dependency may be determined based solely on the match between the dependency index and the index of the first instruction. Responsive to determining that the first instruction operation has issued in the second entry, an indication that a corresponding second instruction operation is ready to issue may be provided.
    Type: Grant
    Filed: August 23, 2016
    Date of Patent: June 15, 2021
    Assignee: Apple Inc.
    Inventors: Sean M. Reynolds, Gokul V. Ganesan
  • Patent number: 11023243
    Abstract: Latency-based instruction reservation clustering in a scheduler circuit in a processor is disclosed. The scheduler circuit includes a plurality of latency-based reservation circuits each having an assigned producer instruction cycle latency. Producer instructions with the same cycle latency can be clustered in the same latency-based reservation circuit. Thus, the number of reservation entries is distributed among the plurality of latency-based reservation circuits to avoid or reduce an increase in the number of scheduling path connections and complexity in each reservation circuit to avoid or reduce an increase in scheduling latency. The scheduling path connections are reduced for a given number of reservation entries over a non-clustered pick circuit, because signals (e.g., wake-up signals, pick-up signals) used for scheduling instructions in each latency-based reservation circuit do not have to have the same clock cycle latency so as to not impact performance.
    Type: Grant
    Filed: July 22, 2019
    Date of Patent: June 1, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yusuf Cagatay Tekmen, Shivam Priyadarshi, Rodney Wayne Smith
  • Patent number: 10996994
    Abstract: A plurality of ordered lists of dispatch queues corresponding to a plurality of processing entities are maintained, wherein each dispatch queue includes one or more task control blocks or is empty. A determination is made as to whether a primary dispatch queue of a processing entity is empty in an ordered list of dispatch queues for the processing entity. In response to determining that the primary dispatch queue of the processing entity is empty, a task control block is selected for processing by the processing entity from another dispatch queue of the ordered list of dispatch queues for the processing entity, wherein the another dispatch queue from which the task control block is selected meets a threshold criteria for the processing entity.
    Type: Grant
    Filed: February 22, 2019
    Date of Patent: May 4, 2021
    Assignee: International Business Machines Corporation
    Inventors: Seamus J. Burke, Trung N. Nguyen, Louis A. Rasor
  • Patent number: 10996954
    Abstract: By including a storing device that stores a plurality of memory access instructions decoded by a decoder and outputs the memory access instruction stored therein to a cache memory, a determiner that determines whether the storing device is afford to store the plurality of memory access instructions; and an inhibitor that inhibits, when the determiner determines that the storing device is not afford to store a first memory access instruction included in the plurality of memory access instructions, execution of a second memory access instruction being included in the plurality of memory access instructions and being subsequent to the first memory access instruction for a predetermined time period, regardless of a result of determination made on the second memory access instruction by the determiner, the calculation processing apparatus inhibits a switch of the order of a store instruction and a load instruction.
    Type: Grant
    Filed: October 7, 2019
    Date of Patent: May 4, 2021
    Assignee: FUJITSU LIMITED
    Inventors: Sota Sakashita, Yasunobu Akizuki
  • Patent number: 10990406
    Abstract: An instruction execution device includes a processor. The processor includes an instruction translator, a reorder buffer, an architecture register, and an execution unit. The instruction translator receives a macro-instruction and translates the macro-instruction into a first micro-instruction, a second micro-instruction and a third micro-instruction. The instruction translator marks the first micro-instruction and the second micro-instruction with the same atomic operation flag. The execution unit executes the first micro-instruction to generate a first execution result and to store the first execution result in a temporary register. The execution unit executes the second micro-instruction to generate a second execution result and to store the second execution result in the architecture register. The execution unit executes the third micro-instruction to read the first execution result from the temporary register and to store the first execution result in the architecture register.
    Type: Grant
    Filed: September 26, 2019
    Date of Patent: April 27, 2021
    Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.
    Inventors: Penghao Zou, Zhi Zhang
  • Patent number: 10983800
    Abstract: A processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues provides flexible and efficient use of internal resources. The configuration of the execution slices is selectable so that capabilities of the processor core can be adjusted according to execution requirements for the instruction streams. A plurality of load-store slices coupled to the execution slices provides access to a plurality of cache slices that partition the lowest level of cache memory among the load-store slices.
    Type: Grant
    Filed: June 6, 2018
    Date of Patent: April 20, 2021
    Assignee: International Business Machines Corporation
    Inventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
  • Patent number: 10977762
    Abstract: One embodiment provides for a general-purpose graphics processing unit multiple processing elements having a single instruction, multiple thread (SIMT) architecture configured to perform hardware multithreading during execution of a plurality of thread groups. The plurality of thread groups can include one or more sub-groups of threads, with a first sub-group is associated with a first thread group and a second sub-group associated with a second thread group. Data dependencies can be used to trigger the launch of threads, such that when a first thread in the second sub-group has a data dependency upon a first thread in the first sub-group, circuitry in the general-purpose graphics processing unit can launch at least the first thread in the second sub-group to execute in response to satisfaction of the data dependency.
    Type: Grant
    Filed: March 30, 2020
    Date of Patent: April 13, 2021
    Assignee: Intel Corporation
    Inventors: Balaji Vembu, Altug Koker, Joydeep Ray
  • Patent number: 10936402
    Abstract: Aspects include copying a plurality of input data into a buffer of a processor configured to perform speculatively executing pipelined streaming of the input data. A bit counter maintains a difference in a number of input bits from the input data entering a pipeline of the processor and a number of the input bits consumed in the pipeline. The pipeline is flushed based on detecting an error. A portion of the input data is recirculated from the buffer into the pipeline based on a value of the bit counter.
    Type: Grant
    Filed: November 26, 2018
    Date of Patent: March 2, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bulent Abali, Bartholomew Blaner, John J. Reilly
  • Patent number: 10915328
    Abstract: An apparatus and method for offloading iterative, parallel work to a data parallel cluster. For example, one embodiment of a processor comprises: a host processor to execute a primary thread; a data parallel cluster coupled to the host processor over a high speed interconnect, the data parallel cluster comprising a plurality of execution lanes to perform parallel execution of one or more secondary threads related to the primary thread; and a data parallel cluster controller integral to the host processor to offload processing of the one or more secondary threads to the data parallel cluster in response to one of the cores executing a parallel processing call instruction from the primary thread.
    Type: Grant
    Filed: December 14, 2018
    Date of Patent: February 9, 2021
    Assignee: Intel Corporation
    Inventors: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr
  • Patent number: 10915317
    Abstract: The present disclosure relates to a computing device with a multiple pipeline architecture. The multiple pipeline architecture comprises a first and second pipeline for which are concurrently running, where the first pipeline runs at least one cycle ahead of the second pipeline. Special number detection is utilized on the first pipeline, where a special number is a numerical value which yields a predictable result. Upon the detection of a special number, a computation is optimized.
    Type: Grant
    Filed: December 10, 2018
    Date of Patent: February 9, 2021
    Assignee: ALIBABA GROUP HOLDING LIMITED
    Inventors: Liang Han, Xiaowei Jiang
  • Patent number: 10908880
    Abstract: An integrated circuit for processing audio signals from a microphone assembly, combinations thereof and methods therefor, including a multi-issue processor configured to execute multiple instructions concurrently and connectable to a memory with a plurality of locations each represented by a corresponding index. Bit-reversal is performed on a sequence of audio data bits stored in memory by concurrently performing a load or store operation related to a first index and determining whether to perform a load operation for a second index.
    Type: Grant
    Filed: October 18, 2019
    Date of Patent: February 2, 2021
    Assignee: Knowles Electronics, LLC
    Inventor: Leonardo Rub
  • Patent number: 10884797
    Abstract: A method of scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task in an active state, and checking, by an instruction controller, if an ALU targeted by the decoded instruction is a primary instruction pipeline. If the targeted ALU is a primary instruction pipeline, a list associated with the primary instruction pipeline is checked to determine whether the scheduled task is already included in the list. If the scheduled task is already included in the list, the decoded instruction is sent to the primary instruction pipeline.
    Type: Grant
    Filed: June 18, 2018
    Date of Patent: January 5, 2021
    Assignee: Imagination Technologies Limited
    Inventors: Simon Nield, Yoong-Chert Foo, Adam de Grasse, Luca Iuliano
  • Patent number: 10884753
    Abstract: Aspects include monitoring a number of instructions of a first type dispatched to a first shared port of an issue queue of a processor and determining whether the number of instructions of the first type dispatched to the first shared port exceeds a port selection threshold. An instruction of a third type is dispatched to a second shared port of the issue queue associated with a plurality of instructions of a second type based on determining that the number of instructions of the first type dispatched to the first shared port exceeds the port selection threshold. The instruction of the third type is dispatched to the first shared port of the issue queue associated with a plurality of instructions of the first type based on determining that the number of instructions of the first type dispatched to the first shared port does not exceed the port selection threshold.
    Type: Grant
    Filed: November 30, 2017
    Date of Patent: January 5, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Balaram Sinharoy, Joel A. Silberman, Brian W. Thompto
  • Patent number: 10860370
    Abstract: A method of synchronizing a group of scheduled tasks within a parallel processing unit into a known state is described. The method uses a synchronization instruction in a scheduled task which triggers, in response to decoding of the instruction, an instruction decoder to place the scheduled task into a non-active state and forward the decoded synchronization instruction to an atomic ALU for execution. When the atomic ALU executes the decoded synchronization instruction, the atomic ALU performs an operation and check on data assigned to the group ID of the scheduled task and if the check is passed, all scheduled tasks having the particular group ID are removed from the non-active state.
    Type: Grant
    Filed: June 18, 2018
    Date of Patent: December 8, 2020
    Assignee: Imagination Technologies Limited
    Inventors: Ollie Mower, Yoong-Chert Foo
  • Patent number: 10846095
    Abstract: A system and method for a virtual load queue is described. Load micro-operations are processed through an instruction pipeline without requiring an entry in a load queue (LDQ). An address generation scheduler queue (AGSQ) entry is allocated to the load micro-operation and a LDQ entry is not allocated to the load micro-operation. The LDQ entries are reserved for the N oldest load micro-operations, where N is the depth of the LDQ. Deallocation of the AGSQ entry is done if the load micro-operation is one of the N oldest load micro-operations, or upon successful completion of the load micro-operation. Deallocation of the AGSQ entry is not done if the load micro-operation gets a bad status and is not one of the N oldest micro-operations. Consequently, the AGSQ acts as a virtual queue for the LDQ and mitigates the limiting effect of the LDQ depth.
    Type: Grant
    Filed: November 28, 2017
    Date of Patent: November 24, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventor: John M. King
  • Patent number: 10838883
    Abstract: An arbiter that performs accelerated arbitration by approximating relative ages including a memory, blur logic, and grant logic. Multiple entries arbitrate for one or more resources. The memory stores age values each providing a relative age between each pair of entries, and further stores blurred age values. The entries are divided into subsets in which each entry belongs to only one subset. The blur logic determines each blurred age value to indicate a relative age between an entry of a first subset and an entry of a different subset for each pair of subsets. The grant logic grants access by an entry to a resource based on relative age using corresponding age values when comparing relative age between entries within a common subset, and using corresponding blurred age values when comparing relative age between entries in different subsets. Each blurred age value represents multiple age values to simplify arbitration.
    Type: Grant
    Filed: August 19, 2016
    Date of Patent: November 17, 2020
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventor: Nikhil A. Patil
  • Patent number: 10831492
    Abstract: Systems, methods, and computer program products are disclosed that control issuing branch instructions in a simultaneous multi-threading (SMT) system. An embodiment system includes an SMT processor circuit that receives, from one of a plurality of threads, a branch instruction having a favor bit. The SMT processor circuit schedules the branch instruction to issue, relative to branch instructions received from other threads in the plurality of threads, based on the favor bit. When the favor bit has a first value, the branch instruction is scheduled to have a higher priority to issue before the branch instructions received from other threads in the plurality of threads. When the favor bit has a second value, the branch instruction is scheduled to issue based an age of the branch instruction relative to respective ages of the branch instructions received from other threads in the plurality of threads.
    Type: Grant
    Filed: July 5, 2018
    Date of Patent: November 10, 2020
    Assignee: International Business Machines Corporation
    Inventors: Salma Ayub, Glenn O. Kincaid, Christopher M. Mueller, Dung Q. Nguyen, Eula Faye Abalos Tolentino, Albert J. Van Norstrand, Jr., Kenneth L. Ward
  • Patent number: 10810343
    Abstract: A language disclosed herein includes a loop construct that maps to a circuit implementation. The circuit implementation may be used to design or program a synchronous digital circuit. The circuit implementation includes a hardware pipeline that implements a body of a loop and a condition associated with the loop. The circuit implementation also includes the hardware first-in-first-out (FIFO) queues that marshal threads (i.e. collections of local variables) into, around, and out of the hardware pipeline. A pipeline policy circuit limits a number of threads allowed within the hardware pipeline to a capacity of the hardware FIFO queues.
    Type: Grant
    Filed: January 14, 2019
    Date of Patent: October 20, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Blake D. Pelton, Adrian Michael Caulfield
  • Patent number: 10713049
    Abstract: A processor including a stunt box with an intermediate storage, including a plurality of registers, configured to store a plurality of execution pipe results as a plurality of intermediate results; a storage, communicatively coupled to the intermediate storage, configured to store a plurality of storage results which may include one or more of the plurality of intermediate results; and an arbiter, communicatively coupled to the intermediate storage and the storage, configured to receive the plurality of execution pipe results, the plurality of intermediate results, and the plurality of storage results and to select an output to retire from of the plurality of results, the plurality of intermediate results, and the plurality of storage results.
    Type: Grant
    Filed: October 31, 2014
    Date of Patent: July 14, 2020
    Assignee: AVAGO TECHNOLOGIES INTERNATIONAL SALES PTE. LIMITED
    Inventors: Sophie Wilson, Tariq Kurd
  • Patent number: 10678549
    Abstract: Examples of techniques for executing instructions out of order are described herein. An example computer-implemented method includes receiving, via a processor, a plurality of instructions to be executed. The method includes sending, via the processor, an instruction to a minimal dependency queue in response to detecting the instruction includes a minimally dependent instruction. The method also includes selecting, via the processor, an instruction from a set of instructions that are eligible to be executed based on a scheme. The method further includes executing, via the processor, the instruction.
    Type: Grant
    Filed: November 28, 2017
    Date of Patent: June 9, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Avraham Ayzenfeld, Eyal Naor, Amir Turi
  • Patent number: 10649779
    Abstract: Techniques disclosed herein describe a variable latency pipe for interleaving instruction tags in a processor. According to one embodiment presented herein, an instruction tag is associated with an instruction upon issue of the instruction from the issue queue. One of a plurality of positions in the latency pipe is determined. The pipe stores one or more instruction tags, each associated with a respective instruction. The pipe also stores the instruction tags in a respective position based on the latency of each respective instruction. The instruction tag is stored at the determined position in the pipe.
    Type: Grant
    Filed: March 17, 2016
    Date of Patent: May 12, 2020
    Assignee: International Business Machines Corporation
    Inventors: Salma Ayub, Josh Bowman, Sundeep Chadha, Dhivya Jeganathan, Cliff Kucharski, Dung Q. Nguyen
  • Patent number: 10649780
    Abstract: A data processing apparatus and method are provided for executing a stream of instructions out-of-order with respect to original program order. At least some of the instructions in the stream identify one or more architectural registers from a set of architectural registers. The apparatus comprises a plurality of out-of-order components configured to manage execution of a first subset of instructions out-of-order, the plurality of out-of-order components being configured to remove false dependencies between instructions in the first subset. The plurality of out-of-order components include a first issue queue into which the instructions in the first subset are buffered prior to execution. A second issue queue is used to buffer a second subset of instructions prior to execution, the second subset of instructions being constrained to execute in order.
    Type: Grant
    Filed: March 18, 2015
    Date of Patent: May 12, 2020
    Assignee: The Regents of the University of Michigan
    Inventors: Faissal Mohamad Sleiman, Thomas Friedrich Wenisch
  • Patent number: 10635335
    Abstract: A storage system and method for adaptive scheduling of background operations are provided. In one embodiment, after a storage system completes a host operation in the memory, the storage system remains in a high power mode for a period of time, after which the storage system enters a low-power mode. The storage system estimates whether there will be enough time to perform a background operation in the memory during the period of time without the background operation being interrupted by another host operation. In response to estimating that there will be enough time to perform the background operation in the memory without the background operation being interrupted by another host operation, the storage system performs the background operation in the memory.
    Type: Grant
    Filed: June 21, 2018
    Date of Patent: April 28, 2020
    Assignee: Western Digital Technologies, Inc.
    Inventors: Yuval Grossman, Alexander Bazarsky, Tomer Eliash
  • Patent number: 10623281
    Abstract: One or more dynamically occurring events in a distributed data streaming system are monitored. The occurrence of at least one of the one or more dynamically occurring events is evaluated. A checkpoint operation is initiated in the distributed data streaming system based on the evaluation of the occurrence of the at least one dynamically occurring event.
    Type: Grant
    Filed: April 18, 2017
    Date of Patent: April 14, 2020
    Assignee: EMC IP Holding Company LLC
    Inventors: Junping Zhao, Kevin Xu
  • Patent number: 10606603
    Abstract: Embodiments provide methods and apparatus for facilitating a memory mis-speculation recovery. The method includes detecting a speculative load and a set of speculative load dependent instructions and creating a Load Recovery Metadata (LRM) including an information for instruction re-execution for the speculative load. The method includes creating a set of Dependent Recovery Metadata (DRM). Each DRM of the set of DRM corresponds to each speculative load dependent instruction. A DRM includes an information for instruction re-execution for a speculative load dependent instruction. The method includes creating a Load Dependency Data (LDD) including information of the LRM and information of the set of DRM. The method includes detecting the speculative load as a mis-speculated load. The method includes re-executing the mis-speculated load and the set of speculative load dependent instructions. The re-execution includes utilizing the LDD to fetch the information of the LRM and the information of the set of DRM.
    Type: Grant
    Filed: April 8, 2019
    Date of Patent: March 31, 2020
    Inventor: Ye Tao
  • Patent number: 10585782
    Abstract: A debugging capability that enables the efficient debugging of code that has prefixes, referred to herein as prefixed code. To debug application code, in which the application code includes a prefixed instruction to be modified by a prefix, a trap is provided. The trap is configured to report a presence of the prefix, but to otherwise perform the trap functions absent the prefix; i.e., the prefix is otherwise ignored in the processing of the trap.
    Type: Grant
    Filed: February 26, 2019
    Date of Patent: March 10, 2020
    Assignee: International Business Machines Corporation
    Inventor: Michael K. Gschwind
  • Patent number: 10554386
    Abstract: A flexible aes instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for aes encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible aes instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers.
    Type: Grant
    Filed: August 29, 2013
    Date of Patent: February 4, 2020
    Assignee: Intel Corporation
    Inventors: Shay Gueron, Wajdi K. Feghali, Vinodh Gopal, Raghunandan Makaram, Martin G. Dixon, Srinivas Chennupaty, Michael E. Kounavis
  • Patent number: 10521208
    Abstract: A mechanism for generating optimized native code for a program having dynamic behavior uses a static analysis of the program to predict the likelihood that different elements of the program are likely to be used when the program executes. The static analysis is performed prior to execution of the program and marks certain elements of the program with confidence indicators that classify the elements with either a high level of confidence or a low level of confidence. The confidence indicators are then used by an ahead-of-time native compiler to generate native code and to optimize the code for faster execution and/or a smaller-sized native code.
    Type: Grant
    Filed: June 23, 2017
    Date of Patent: December 31, 2019
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC.
    Inventors: Morgan Asher Brown, David Charles Wrighton, Mei-Chin Tsai, Shah Mohammad Faizur Rahman, Yi Zhang, Ian M. Bearman, Erdembilegt Janchivdorj, David Adam Hartglass, David Mitford Gillies
  • Patent number: 10496449
    Abstract: A computer-implemented method, computerized apparatus and computer program product for verification of atomic memory operations are disclosed. The method comprising: independently generating for each of a plurality of threads at least one instruction for performing an atomic memory operation of a predetermined type on an allocated shared memory location accessed by the plurality of threads; and, determining an evaluation function over arguments comprising values operated on or obtained in performing the atomic memory operation of the predetermined type on the allocated shared memory location by each of the plurality of threads; wherein the evaluation function is determined based on the atomic memory operation of the predetermined type such that a result thereof is not effected by an order in which each of the plurality of threads performs the atomic memory operation of the predetermined type on the allocated shared memory location.
    Type: Grant
    Filed: June 11, 2018
    Date of Patent: December 3, 2019
    Assignee: International Business Machines Corporation
    Inventors: Tom Har'el Kolan, Hillel Mendelson, Vitali Sokhin
  • Patent number: 10481662
    Abstract: A semiconductor circuit including a first subcircuit, at least a second subcircuit, and power management circuitry. The power management circuitry is operable for estimating a metric indicative of a momentary supply voltage present at the first subcircuit based on a power supply current of the first subcircuit and a cross current flowing between the first subcircuit and the second subcircuit.
    Type: Grant
    Filed: November 21, 2016
    Date of Patent: November 19, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Preetham M. Lobo, Thomas Strach, Tobias Webel
  • Patent number: 10467118
    Abstract: Techniques and apparatus for performance analysis of a program are described. In one embodiment, for example, an apparatus may include at least one memory, and logic, at least a portion of comprised in hardware coupled to the at least one memory, to access a program for performance analysis, the program comprising at least one producer instruction and at least one consumer instruction for the at least one producer instruction, and generate an analysis program based on the program, the analysis program comprising a stall time instruction set to determine a stall time of the at least one producer instruction, the stall time instruction set comprising a first time stamp instruction immediately preceding a consumer instruction, a second time stamp instruction immediately following the consumer instruction, and a stall time instruction to determine the stall time as the difference between the second time stamp and the first time stamp. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 28, 2017
    Date of Patent: November 5, 2019
    Assignee: INTEL CORPORATION
    Inventors: Konstantin Levit-Gurevich, Michael Berezalsky, Noam Itzhaki, Arik Narkis
  • Patent number: 10445100
    Abstract: Methods and apparatus for transmitting data between execution slices of a multi-slice processor including receiving, by an execution slice, a broadcast message comprising an instruction tag (ITAG) for a producer instruction, a latency, and a source identifier; determining that an issue queue in the execution slice comprises an ITAG for a consumer instruction, wherein the consumer instruction depends on result data from the producer instruction; calculating a cycle countdown using the latency and the source identifier; determining that the cycle countdown has expired; and in response to determining that the cycle countdown has expired, reading the result data from the producer instruction.
    Type: Grant
    Filed: June 9, 2016
    Date of Patent: October 15, 2019
    Assignee: International Business Machines Corporation
    Inventors: Salma Ayub, Joshua W. Bowman, Jeffrey C. Brownscheidle, Sundeep Chadha, Dhivya Jeganathan, Dung Q. Nguyen, Salim A. Shah, Brian W. Thompto
  • Patent number: 10437756
    Abstract: Operation of a multi-slice processor implementing datapath steering, where the multi-slice processor includes a plurality of execution slices. Operation of such a multi-slice processor includes: identifying, from a set of instructions, a second instruction that is dependent upon a first instruction in the set of instructions; and responsive to the second instruction being dependent upon the first instruction in the set of instructions, issuing each of the instructions in the set of instructions to a particular set of execution slices configured with bypass logic between execution slices that reduces execution latencies between dependent instructions.
    Type: Grant
    Filed: July 27, 2016
    Date of Patent: October 8, 2019
    Assignee: International Business Machines Corporation
    Inventors: Steven R. Carlough, Kurt A. Feiste, Brian W. Thompto, Phillip G. Williams
  • Patent number: 10423330
    Abstract: Data collection is facilitated by a multi-threaded processor. One thread of the processor obtains data placed in a buffer by another thread of the processor. The thread placing the data in the buffer is an execution thread executing a customer application and the one thread obtaining the data from the buffer is an assist thread. The assist thread stores the data obtained from the buffer in a selected location, such as a cache, main memory, a measurement control block, a persistent storage device or a network.
    Type: Grant
    Filed: July 29, 2015
    Date of Patent: September 24, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Christine Axnix, Ute Gaertner, Jakob C. Lang, Angel Nunez Mencias
  • Patent number: 10417152
    Abstract: Operation of a multi-slice processor implementing datapath steering, where the multi-slice processor includes a plurality of execution slices. Operation of such a multi-slice processor includes: identifying, from a set of instructions, a second instruction that is dependent upon a first instruction in the set of instructions; and responsive to the second instruction being dependent upon the first instruction in the set of instructions, issuing each of the instructions in the set of instructions to a particular set of execution slices configured with bypass logic between execution slices that reduces execution latencies between dependent instructions.
    Type: Grant
    Filed: June 3, 2016
    Date of Patent: September 17, 2019
    Assignee: International Business Machines Corporation
    Inventors: Steven R. Carlough, Kurt A. Feiste, Brian W. Thompto, Phillip G. Williams
  • Patent number: 10365990
    Abstract: A debugging capability that enables the efficient debugging of code that has prefixes, referred to herein as prefixed code. To debug application code, in which the application code includes a prefixed instruction to be modified by a prefix, a trap is provided. The trap is configured to report a presence of the prefix, but to otherwise perform the trap functions absent the prefix; i.e., the prefix is otherwise ignored in the processing of the trap.
    Type: Grant
    Filed: December 14, 2017
    Date of Patent: July 30, 2019
    Assignee: International Business Machines Corporation
    Inventor: Michael K. Gschwind
  • Patent number: 10360033
    Abstract: A Conditional Transaction End (CTEND) instruction is provided that allows a program executing in a nonconstrained transactional execution mode to inspect a storage location that is modified by either another central processing unit or the Input/Output subsystem. Based on the inspected data, transactional execution may be ended or aborted, or the decision to end/abort may be delayed, e.g., until a predefined event occurs. For instance, when the instruction executes, the processor is in a nonconstrained transaction execution mode, and the transaction nesting depth is one at the beginning of the instruction, a second operand of the instruction is inspected, and based on the inspected data, transaction execution may be ended or aborted, or the decision to end/abort may be delayed, e.g., until a predefined event occurs, such as the value of the second operand becomes a prespecified value or a time interval is exceeded.
    Type: Grant
    Filed: June 26, 2018
    Date of Patent: July 23, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dan F. Greiner, Christian Jacobi, Marcel Mitran, Donald W. Schmidt, Timothy J. Slegel
  • Patent number: 10332232
    Abstract: Techniques to dispatch threads of a graphics kernel for execution to increase the interval between dependent threads and the associated are disclosed. The dispatch interval may be increased by dispatching associated threads, followed by threads without any dependencies, followed by threads dependent on the earlier dispatched associated threads. As such, the interval between dependent threads and their associated threads can be increased, leading to increased parallelism.
    Type: Grant
    Filed: November 20, 2017
    Date of Patent: June 25, 2019
    Assignee: INTEL CORPORATION
    Inventors: Julia A. Gould, Haihua Wu
  • Patent number: 10331445
    Abstract: A processor circuit is provided that includes an input terminal and an output terminal, a plurality of vector processor operation circuits, a selector circuit coupled to the input terminal, the output terminal, and each of the vector processor operation circuits, and a scheduler circuit adapted to control the selector circuit to configure a vector processing pipeline comprising zero, one or more of the vector processor operation circuits in any order between the input terminal and the output terminal.
    Type: Grant
    Filed: May 24, 2017
    Date of Patent: June 25, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jeremy Halden Fowers, Ming Gang Liu, Kalin Ovtcharov, Steven Karl Reinhardt, Eric Sen Chung
  • Patent number: 10318303
    Abstract: A method and apparatus for performing branch prediction is disclosed. A branch predictor includes a history buffer configured to store a branch history table indicative of a history of a plurality of previously fetched branch instructions. The branch predictor also includes a branch target cache (BTC) configured to store branch target addresses for fetch addresses that have been identified as including branch instructions but have not yet been predicted. A hash circuit is configured to form a hash of a fetch address, history information received from the history buffer, and hit information received from the BTC, wherein the fetch address includes a branch instruction. A branch prediction unit (BPU) configured to generate a branch prediction for the branch instruction included in the fetch address based on the hash formed from the fetch address, history information, and BTC hit information.
    Type: Grant
    Filed: March 28, 2017
    Date of Patent: June 11, 2019
    Assignee: Oracle International Corporation
    Inventors: Manish Shah, Jared Smolens