Simultaneous Issuance Of Multiple Instructions Patents (Class 712/215)
  • Patent number: 10324724
    Abstract: Methods and apparatuses relating to a fusion manager to fuse instructions are described. In one embodiment, a hardware processor includes a hardware binary translator to translate an instruction stream into a translated instruction stream, a hardware fusion manager to fuse multiple instructions of the translated instruction stream into a single fused instruction, a hardware decode unit to decode the single fused instruction into a decoded, single fused instruction, and a hardware execution unit to execute the decoded, single fused instruction.
    Type: Grant
    Filed: December 16, 2015
    Date of Patent: June 18, 2019
    Assignee: Intel Corporation
    Inventors: Patrick P. Lai, Tyler N. Sondag, Sebastian Winkel, Polychronis Xekalakis, Ethan Schuchman, Jayesh Iyer
  • Patent number: 10282207
    Abstract: Operation of a multi-slice processor that includes execution slices and load/store slices coupled via a results bus includes: receiving, by an execution slice, a producer instruction, including: storing, in an entry of an issue queue, the producer instruction; and storing, in a register, an issue queue entry identifier representing the entry of the issue queue in which the producer instruction is stored; receiving, by the execution slice, a source instruction, the source instruction dependent upon the result of the producer instruction, including: storing, in another entry of the issue queue, the source instruction and the issue queue entry identifier of the producer instruction; determining in dependence upon the issue queue entry identifier of the producer instruction that the producer instruction has issued from the issue queue; and responsive to the determination that the producer instruction has issued from the issue queue, issuing the source instruction from the issue queue.
    Type: Grant
    Filed: February 18, 2016
    Date of Patent: May 7, 2019
    Assignee: International Business Machines Corporation
    Inventors: Brian D. Barrick, Sundeep Chadha, Michael J. Genden, Jerry Y. Lu, Dung Q. Nguyen, Nasrin Sultana, David R. Terry, David S. Walder
  • Patent number: 10268482
    Abstract: Operation of a multi-slice processor that includes execution slices and load/store slices coupled via a results bus includes: receiving, by an execution slice, a producer instruction, including: storing, in an entry of an issue queue, the producer instruction; and storing, in a register, an issue queue entry identifier representing the entry of the issue queue in which the producer instruction is stored; receiving, by the execution slice, a source instruction, the source instruction dependent upon the result of the producer instruction, including: storing, in another entry of the issue queue, the source instruction and the issue queue entry identifier of the producer instruction; determining in dependence upon the issue queue entry identifier of the producer instruction that the producer instruction has issued from the issue queue; and responsive to the determination that the producer instruction has issued from the issue queue, issuing the source instruction from the issue queue.
    Type: Grant
    Filed: December 15, 2015
    Date of Patent: April 23, 2019
    Assignee: International Business Machines Corporation
    Inventors: Brian D. Barrick, Sundeep Chadha, Michael J. Genden, Jerry Y. Lu, Dung Q. Nguyen, Nasrin Sultana, David R. Terry, David S. Walder
  • Patent number: 10269088
    Abstract: A mechanism is described for facilitating thread execution arbitration for thread scheduling relating to graphics processors at computing devices. A method of embodiments, as described herein, includes assigning priority levels to threads based on stall signals communicated from the one or more shared function units to one or more execution units of a processor including a graphics processor, and selecting a first thread to be scheduled and a second thread to be ignored based on the stall signals.
    Type: Grant
    Filed: April 21, 2017
    Date of Patent: April 23, 2019
    Assignee: INTEL CORPORATION
    Inventors: Joydeep Ray, Abhishek R. Appu, Subramaniam M. Maiyuran, Eric J. Hoekstra, Prasoonkumar Surti, Balaji Vembu, Altug Koker
  • Patent number: 10248421
    Abstract: Operation of a multi-slice processor that includes execution slices and load/store slices coupled via a results bus, including: for a target instruction targeting a logical register, determining whether an entry in a general purpose register representing the logical register is pending a flush; if the entry in the general purpose register representing the logical register is pending a flush: cancelling the flush in the entry of the general purpose register; storing the target instruction in the entry of the general purpose register representing the logical register, and if an entry in a history buffer targeting the logical register is pending a restore, cancelling the restore for the entry of the history buffer.
    Type: Grant
    Filed: February 16, 2016
    Date of Patent: April 2, 2019
    Assignee: International Business Machines Corporation
    Inventors: Salma Ayub, Brian D. Barrick, Joshua W. Bowman, Sundeep Chadha, Cliff Kucharski, Dung Q. Nguyen, David R. Terry, Jing Zhang
  • Patent number: 10241557
    Abstract: A processor includes a mechanism for disabling a memory array of a branch prediction unit. The processor may include a next fetch prediction unit that may include a number of entries. Each entry may correspond to a next instruction fetch group and may store an indication of whether or not the corresponding the next fetch group includes a conditional branch instruction. In response to an indication that the next fetch group does not include a conditional branch instruction, the fetch prediction unit may be configured to disable, in a next instruction execution cycle, the memory array of the branch prediction unit.
    Type: Grant
    Filed: December 12, 2013
    Date of Patent: March 26, 2019
    Assignee: Apple Inc.
    Inventors: Conrado Blasco, Ronald P Hall, Ramesh B Gunna, Ian D Kountanis, Shyam Sundar, André Seznec
  • Patent number: 10241790
    Abstract: Operation of a multi-slice processor that includes execution slices and load/store slices coupled via a results bus, including: for a target instruction targeting a logical register, determining whether an entry in a general purpose register representing the logical register is pending a flush; if the entry in the general purpose register representing the logical register is pending a flush: cancelling the flush in the entry of the general purpose register; storing the target instruction in the entry of the general purpose register representing the logical register, and if an entry in a history buffer targeting the logical register is pending a restore, cancelling the restore for the entry of the history buffer.
    Type: Grant
    Filed: December 15, 2015
    Date of Patent: March 26, 2019
    Assignee: International Business Machines Corporation
    Inventors: Salma Ayub, Brian D. Barrick, Joshua W. Bowman, Sundeep Chadha, Cliff Kucharski, Dung Q. Nguyen, David R. Terry, Jing Zhang
  • Patent number: 10235181
    Abstract: An out-of-order (OOO) processor includes ready logic that provides a signal indicating an instruction is ready when all operands for the instruction are ready, or when all operands are either ready or are marked back-to-back to a current instruction. By marking a second instruction that consumes an operand as ready when it is back-to-back with a first instruction that produces the operand, but the first instruction has not yet produced the operand, latency due to missed cycles in executing back-to-back instructions is minimized.
    Type: Grant
    Filed: February 3, 2017
    Date of Patent: March 19, 2019
    Assignee: International Business Machines Corporation
    Inventor: Brian W. Thompto
  • Patent number: 10235232
    Abstract: A processor includes an indicator configured to indicate a first mode or a second mode and a functional unit configured to perform computations with a full degree of accuracy when the indicator indicates the first mode and to perform computations with less than the full degree of accuracy when the indicator indicates the second mode.
    Type: Grant
    Filed: October 23, 2014
    Date of Patent: March 19, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD
    Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker
  • Patent number: 10229066
    Abstract: A data processing apparatus is provided including queue circuitry to respond to control signals each associated with a memory access instruction, and to queue a plurality of requests for data, each associated with a reference to a storage location. Resolution circuitry acquires a request for data, and issues the request for data, the resolution circuitry having a resolution circuitry limit. When a current capacity of the resolution circuitry is below the resolution circuitry limit, the resolution circuitry acquires the request for data by receiving the request for data from the queue circuitry, stores the request for data in association with the storage location, issues the request for data, and causes a result of issuing the request for data to be provided to said storage location.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: March 12, 2019
    Assignee: ARM Limited
    Inventors: Miles Robert Dooley, Matthew Andrew Rafacz, Huzefa Moiz Sanjeliwala, Michael Filippo
  • Patent number: 10228982
    Abstract: A mechanism is provided for allocating a hyper-threaded processor to nodes of multi-tenant distributed software systems. Responsive to receiving a request to provision a node of the multi-tenant distributed software system on the host data processing system, a cluster to which the node belongs is identified. Responsive to the node being a second type of node, responsive to determining that another second type of node in the same cluster has been provisioned on the host data processing system, and responsive to the number of unallocated VPs on different physical processors from that of the other second type of node being greater than or equal to the requested number of VPs for the second type of node, the requested number of VPs for the second type of node is allocated each to a different physical processor from that of the other second type of node.
    Type: Grant
    Filed: January 25, 2018
    Date of Patent: March 12, 2019
    Assignee: International Business Machines Corporation
    Inventors: Rachit Arora, Dharmesh K. Jain, Padmanabhan Krishnan, Shrinivas S. Kulkarni, Subin Shekhar
  • Patent number: 10223126
    Abstract: An out-of-order (OOO) processor includes ready logic that provides a signal indicating an instruction is ready when all operands for the instruction are ready, or when all operands are either ready or are marked back-to-back to a current instruction. By marking a second instruction that consumes an operand as ready when it is back-to-back with a first instruction that produces the operand, but the first instruction has not yet produced the operand, latency due to missed cycles in executing back-to-back instructions is minimized.
    Type: Grant
    Filed: January 6, 2017
    Date of Patent: March 5, 2019
    Assignee: International Business Machines Corporation
    Inventor: Brian W. Thompto
  • Patent number: 10216547
    Abstract: A mechanism is provided for allocating a hyper-threaded processor to nodes of multi-tenant distributed software systems. Responsive to receiving a request to provision a node of the multi-tenant distributed software system on the host data processing system, a cluster to which the node belongs is identified. Responsive to the node being a second type of node, responsive to determining that another second type of node in the same cluster has been provisioned on the host data processing system, and responsive to the number of unallocated VPs on different physical processors from that of the other second type of node being greater than or equal to the requested number of VPs for the second type of node, the requested number of VPs for the second type of node is allocated each to a different physical processor from that of the other second type of node.
    Type: Grant
    Filed: November 22, 2016
    Date of Patent: February 26, 2019
    Assignee: International Business Machines Corporation
    Inventors: Rachit Arora, Dharmesh K. Jain, Padmanabhan Krishnan, Shrinivas S. Kulkarni, Subin Shekhar
  • Patent number: 10169187
    Abstract: A performance monitor including a saturating counter provides a relative measure of event frequency without requiring a minimum polling rate or periodic reset to avoid or account for counter overflow. The saturating counter is incremented upon detection of an event and decremented if an event is not detected within a predetermined period. The period of detecting may be programmable and may be determined by real time clock, processor or instruction cycles. Multiple event types may be selected from for detection and input to a single counter, or alternatively multiple event counters may be provided for various event types. The saturating counter may additionally be periodically reset in a selected operating mode, in combination with the decrementing action performed on the counter.
    Type: Grant
    Filed: August 18, 2010
    Date of Patent: January 1, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Venkat Rajeev Indukuru, Alexander Erik Mericas
  • Patent number: 10140129
    Abstract: A processor having one or more processing cores is described. Each of the one or more processing cores has front end logic circuitry and a plurality of processing units. The front end logic circuitry is to fetch respective instructions of threads and decode the instructions into respective micro-code and input operand and resultant addresses of the instructions. Each of the plurality of processing units is to be assigned at least one of the threads, is coupled to said front end unit, and has a respective buffer to receive and store microcode of its assigned at least one of the threads.
    Type: Grant
    Filed: December 28, 2012
    Date of Patent: November 27, 2018
    Assignee: Intel Corporation
    Inventors: Ilan Pardo, Dror Markovich, Oren Ben-Kiki, Yuval Yosef
  • Patent number: 10114644
    Abstract: A decoding logic method is arranged to execute a zero-overhead loop in an embedded digital signal processor (DSP). In the method, instruction data is fetched from a memory, and a plurality of instruction tokens, which are derived from the instruction data, are stored in a token buffer. A first portion of one or more instruction tokens from the token buffer are passed to a first decode module, which may be an instruction decode module, and a second portion of the one or more instruction tokens from the token buffer are passed to a second decode module, which may be a loop decode module. The second decode module detects a special loop instruction token, and based on the detection of the special loop instruction token, a loop counter is conditionally tested. Using the first decode module, at least one instruction token of an iterative algorithm is assembled into a single instruction, which is executable in a single execution cycle.
    Type: Grant
    Filed: July 26, 2016
    Date of Patent: October 30, 2018
    Assignee: STMICROELECTRONICS (BEIJING) R&D CO. LTD
    Inventors: PengFei Zhu, Xiao Kang Jiao
  • Patent number: 10114638
    Abstract: In one embodiment, command message generation and execution using a machine code-instruction is performed. One embodiment includes a particular machine executing a single machine-code instruction including a reference into a command-message-building data structure stored in memory. This executing the single machine-code instruction includes generating a command message and initiating communication of the command message to a hardware accelerator, including copying command information from the command-message-building data structure based on the reference into the command message. The hardware accelerator receives and executes the command message. In one embodiment, the command message is message-switched from a processor to a hardware accelerator, such as, but not limited to, a memory controller, a table lookup unit, or a prefix lookup unit. In one embodiment, a plurality of threads share the command-message-building data structure.
    Type: Grant
    Filed: December 15, 2014
    Date of Patent: October 30, 2018
    Assignee: Cisco Technology, Inc.
    Inventor: Donald Edward Steiss
  • Patent number: 10095637
    Abstract: Techniques for improving execution of a lock instruction are provided herein. A lock instruction and younger instructions are allowed to speculatively retire prior to the store portion of the lock instruction committing its value to memory. These instructions thus do not have to wait for the lock instruction to complete before retiring. In the event that the processor detects a violation of the atomic or fencing properties of the lock instruction prior to committing the value of the lock instruction, the processor rolls back state and executes the lock instruction in a slow mode in which younger instructions are not allowed to retire until the stored value of the lock instruction is committed. Speculative retirement of these instructions results in increased processing speed, as instructions no longer need to wait to retire after execution of a lock instruction.
    Type: Grant
    Filed: September 15, 2016
    Date of Patent: October 9, 2018
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Gregory W. Smaus, John M. King, Michael D. Achenbach, Kevin M. Lepak, Matthew A. Rafacz, Noah Bamford
  • Patent number: 10089081
    Abstract: A method and apparatus for generating a signal processing pipeline based, at least in part, on manipulation of a GUI, the signal processing pipeline comprising one or more signal processing operations.
    Type: Grant
    Filed: December 31, 2014
    Date of Patent: October 2, 2018
    Assignee: Excalibur IP, LLC
    Inventors: Guangxin Yang, Ji Zhou, Shuo Yang, Yan Xia, Delu Zhu
  • Patent number: 10083039
    Abstract: A processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues by a dispatch routing network provides flexible and efficient use of internal resources. The configuration of the execution slices is selectable so that capabilities of the processor core can be adjusted according to execution requirements for the instruction streams. Two or more execution slices can be combined as super-slices to handle wider data, wider operands and/or vector operations, according to one or more mode control signal that also serves as a configuration control signal. The mode control signal is also used to partition clusters of the execution slices within the processor core according to whether single-threaded or multi-threaded operation is selected, and additionally according to a number of hardware threads that are active.
    Type: Grant
    Filed: January 30, 2018
    Date of Patent: September 25, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
  • Patent number: 10078516
    Abstract: Techniques are disclosed for back-to-back issue of instructions in a processor. A first instruction is stored in a queue position in an issue queue. The issue queue stores instructions in a corresponding queue position. The first instruction includes a target instruction tag and at least a source instruction tag. The target instruction tag is stored in a table storing a plurality of target instruction tags associated with a corresponding instruction. Each stored target instruction tag specifies a logical register that stores a target operand. Upon determining, based on the source instruction tag associated with the first instruction and the target instruction tag associated with a second instruction, that the first instruction is dependent on the second instruction, a pointer to the first instruction is associated with the second instruction. The pointer is used to wake up the first instruction upon issue of the second instruction.
    Type: Grant
    Filed: August 24, 2015
    Date of Patent: September 18, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jeffrey C. Brownscheidle, Sundeep Chadha, Maureen A. Delaney, Dung Q. Nguyen
  • Patent number: 9983875
    Abstract: Operation of a multi-slice processor that includes a plurality of execution slices, a plurality of load/store slices, and an instruction sequencing unit, where operation includes: receiving, at a load/store slice, a load instruction to be issued; determining, at the load/store slice, that the load instruction has not completed and is to be reissued; and responsive to determining that the load instruction is to be reissued, delaying a signal, from the load/store slice to the instruction sequencing unit, that allows the instruction sequencing unit to issue one or more instructions dependent upon the load instruction.
    Type: Grant
    Filed: March 4, 2016
    Date of Patent: May 29, 2018
    Assignee: International Business Machines Corporation
    Inventors: Sundeep Chadha, David A. Hrusecky, Elizabeth A. McGlone, Jennifer L. Molnar
  • Patent number: 9971601
    Abstract: Dynamic resource allocation is provided in which additional resources, such as additional architected registers, are provided to an instruction, if it is determined that resources in addition to those configured to be provided to the instruction are to be used for the particular instruction. An instruction to be executed is dispatched on a pipe of a pipeline and that pipe is configured to have a set number of architected registers for use by the instruction. However, if one or more other architected registers are needed, those additional architected registers are dynamically allocated to the instruction by assigning one or more source ports of an additional pipe to the instruction.
    Type: Grant
    Filed: February 13, 2015
    Date of Patent: May 15, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Gregory W. Alexander, Brian D. Barrick, Fadi Y. Busaba, Wen H. Li, Edward T. Malley
  • Patent number: 9971600
    Abstract: Techniques are disclosed for back-to-back issue of instructions in a processor. A first instruction is stored in a queue position in an issue queue. The issue queue stores instructions in a corresponding queue position. The first instruction includes a target instruction tag and at least a source instruction tag. The target instruction tag is stored in a table storing a plurality of target instruction tags associated with a corresponding instruction. Each stored target instruction tag specifies a logical register that stores a target operand. Upon determining, based on the source instruction tag associated with the first instruction and the target instruction tag associated with a second instruction, that the first instruction is dependent on the second instruction, a pointer to the first instruction is associated with the second instruction. The pointer is used to wake up the first instruction upon issue of the second instruction.
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: May 15, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jeffrey C. Brownscheidle, Sundeep Chadha, Maureen A. Delaney, Dung Q. Nguyen
  • Patent number: 9870229
    Abstract: Embodiments of the present invention provide systems and methods for mapping the architected state of one or more threads to a set of distributed physical register files to enable independent execution of one or more threads in a multiple slice processor. In one embodiment, a system is disclosed including a plurality of dispatch queues which receive instructions from one or more threads and an even number of parallel execution slices, each parallel execution slice containing a register file. A routing network directs an output from the dispatch queues to the parallel execution slices and the parallel execution slices independently execute the one or more threads.
    Type: Grant
    Filed: September 29, 2015
    Date of Patent: January 16, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sam G. Chu, Markus Kaltenbach, Hung Q. Le, Jentje Leenstra, Jose E. Moreira, Dung Q. Nguyen, Brian W. Thompto
  • Patent number: 9811343
    Abstract: A method, system, and computer program product synchronize a group of workitems executing an instruction stream on a processor. The processor is yielded by a first workitem responsive to a synchronization instruction in the instruction stream. A first one of a plurality of program counters is updated to point to a next instruction following the synchronization instruction in the instruction stream to be executed by the first workitem. A second workitem is run on the processor after the yielding.
    Type: Grant
    Filed: May 26, 2017
    Date of Patent: November 7, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Lee W. Howes, Benedict R. Gaster, Michael C. Houston
  • Patent number: 9811377
    Abstract: A method for executing multithreaded instructions grouped into blocks. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein the instructions of the instruction blocks are interleaved with multiple threads; scheduling the instructions of the instruction block to execute in accordance with the multiple threads; and tracking execution of the multiple threads to enforce fairness in an execution pipeline.
    Type: Grant
    Filed: March 14, 2014
    Date of Patent: November 7, 2017
    Assignee: INTEL CORPORATION
    Inventor: Mohammad Abdallah
  • Patent number: 9747238
    Abstract: A computer processor including a plurality of functional units that performs operations that produce result operands at different characteristic latencies over multiple cycles. An interconnect network provides data paths for transfer of operand data between functional units. The interconnect network includes first and second crossbar parts. The first crossbar part is configured to route result operands produced with the lowest characteristic latency to any other functional unit. The second crossbar part is configured to route result operands with higher characteristic latency relative to the lowest characteristic latency to the first crossbar part where such result operands are in turn routed to any functional unit. In another aspect, the functional units can be organized as multiple slots where each slot can produce multiple result operands of different characteristic latencies in the same cycle, and wherein each slot employs separate result registers for each characteristic latency present on the slot.
    Type: Grant
    Filed: June 23, 2014
    Date of Patent: August 29, 2017
    Assignee: MILL COMPUTING, INC.
    Inventors: Roger Rawson Godard, Arthur David Kahlich, Sebastien Paul Maurice Mirolo, David Arthur Yost
  • Patent number: 9720697
    Abstract: In an embodiment, a method is provided. The method includes managing user-level threads on a first instruction sequencer in response to executing user-level instructions on a second instruction sequencer that is under control of an application level program. A first user-level thread is run on the second instruction sequencer and contains one or more user level instructions. A first user level instruction has at least 1) a field that makes reference to one or more instruction sequencers or 2) implicitly references with a pointer to code that specifically addresses one or more instruction sequencers when the code is executed.
    Type: Grant
    Filed: September 10, 2012
    Date of Patent: August 1, 2017
    Assignee: INTEL CORPORATION
    Inventors: Hong Wang, John Shen, Ed Grochowski, James Paul Held, Bryant Bigbee, Shivnandan D. Kaushik, Gautham Chinya, Xiang Zou, Per Hammarlund, Xinmin Tian, Anil Aggarwal, Scott Dion Rodgers, Prashant Sethi, Baiju V. Patel, Richard Andrew Hankins
  • Patent number: 9710274
    Abstract: Various methods tightly couple together decode logic associated with multiple types of execution units and having varying priorities to enable instructions that are decoded as valid instructions for multiple types of execution units to be forwarded to a highest priority type of execution unit among the multiple types of execution units. Among other benefits, when an auxiliary execution unit is coupled to a general purpose processing core with the decode logic for the auxiliary execution unit tightly coupled with the decode logic for the general purpose processing core, the auxiliary execution unit may be used to effectively overlay new functionality for an existing instruction that is normally executed by the general purpose processing core, e.g., to patch a design flaw in the general purpose processing core or to provide improved performance for specialized applications.
    Type: Grant
    Filed: April 11, 2016
    Date of Patent: July 18, 2017
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 9697004
    Abstract: A Very Long Instruction Word (VLIW) processor having an instruction set with a reduced size resulting in a small number of bits being necessary to specify registers. The VLIW processor includes a register file, and first through third operation units, and executes a very long instruction word. Further, the very long instruction word includes a register specifying field which specifies a least one of the registers in the register file and a plurality of instructions. The operand of each instruction includes bits src1, src2, and dst, which indicate whether or not the registers specified by the register specifying field are to be used as the source register and the destination register.
    Type: Grant
    Filed: April 8, 2014
    Date of Patent: July 4, 2017
    Assignee: SOCIONEXT INC.
    Inventors: Takahiro Kageyama, Hideshi Nishida, Takeshi Tanaka, Kouji Nakajima
  • Patent number: 9684671
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for streaming external data in parallel from a second distributed system to a first distributed system. One of the methods includes receiving a query that requests a join of first rows of a first table in a first distributed system with second rows of an external table, the external table representing data in a second distributed system. Each of the segment nodes communicates with a respective extension service that obtains fragments from one or more data nodes of the second distributed system according to location information for the respective fragments, and provides to the segment node a stream of data corresponding to second rows of the external table. Each of the segment nodes computes joined rows between the first rows of the first table and the stream of data corresponding to second rows of the external table.
    Type: Grant
    Filed: February 28, 2014
    Date of Patent: June 20, 2017
    Assignee: Pivotal Software, Inc.
    Inventors: Dov Yaron Dorin, Alon Goldshuv, Alex Shacked
  • Patent number: 9633160
    Abstract: A method and system are provided for deriving a resultant compiled software code with increased compatibility for placement and routing of a dynamically reconfigurable processor.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: April 25, 2017
    Inventor: Robert Keith Mykland
  • Patent number: 9606798
    Abstract: A processor, includes a first comparison operation unit; a second comparison operation unit; a first operation unit; a second operation unit; a third operation unit; and a register, wherein the first comparison operation unit receives a first comparison operation signal, a first input signal, and a second input signal, performs a comparison operation indicated by the first comparison operation signal on the first input signal and the second input signal, and outputs a result of the comparison operation, the second comparison operation unit receives a second comparison operation signal, a third input signal, and a fourth input signal, performs a comparison operation indicated by the second comparison operation signal on the third input signal and the fourth input signal, and outputs a result of the comparison operation, the first operation unit receives the comparison result of the first comparison operation unit.
    Type: Grant
    Filed: January 6, 2016
    Date of Patent: March 28, 2017
    Assignee: Renesas Electronics Corporation
    Inventor: Yuki Kobayashi
  • Patent number: 9594562
    Abstract: Various circuit arrangements tightly couple together decode logic associated with multiple types of execution units and having varying priorities to enable instructions that are decoded as valid instructions for multiple types of execution units to be forwarded to a highest priority type of execution unit among the multiple types of execution units. Among other benefits, when an auxiliary execution unit is coupled to a general purpose processing core with the decode logic for the auxiliary execution unit tightly coupled with the decode logic for the general purpose processing core, the auxiliary execution unit may be used to effectively overlay new functionality for an existing instruction that is normally executed by the general purpose processing core, e.g., to patch a design flaw in the general purpose processing core or to provide improved performance for specialized applications.
    Type: Grant
    Filed: April 11, 2016
    Date of Patent: March 14, 2017
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 9575766
    Abstract: Some implementations provide techniques and arrangements for causing an interrupt in a processor in response to an occurrence of a number of events. A first event counter counts the occurrences of a type of event within the processor and outputs a signal to activate a second event counter in response to reaching a first predefined count. The second event counter counts the occurrences of the type of event within the processor and causes an interrupt of the processor in response to reaching a second predefined count.
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: February 21, 2017
    Assignee: Intel Corporation
    Inventors: Ahmad Yasin, Peggy J. Irelan, Ofer Levy, Emile Ziedan, Grant Zhou
  • Patent number: 9519538
    Abstract: An instruction processing pipeline having error detection and error recovery circuitry associated with one or more of the pipeline stages. If an error is detected within a signal value within that pipeline stage, then it can be repaired. Part of the error recovery may be to flush upstream program instructions from the instruction pipeline. When multi-threading, only those instructions from a thread including an instruction which has been lost as a consequence of the error recovery need be flushed from the instruction pipeline. The instruction pipeline may additionally/alternatively be provided with more than one main storage element associated with each signal value with these main storage elements used in an alternating fashion such that if a signal value has been erroneously captured and needs to be repaired, there is still available a main storage element to properly capture the signal value corresponding to the following program instruction.
    Type: Grant
    Filed: June 6, 2011
    Date of Patent: December 13, 2016
    Assignee: ARM Limited
    Inventors: Emre Özer, Shidhartha Das, David Michael Bull
  • Patent number: 9471506
    Abstract: For data processing in a computing storage environment by a processor device, the environment incorporating at least high-speed and lower-speed caches, and managed tiered levels of storage, groups of data segments are migrated between the tiered levels of storage such that clumped uniformly hot ones of the groups of data segments are migrated to use a Solid State Drive (SSD) portion of the tiered levels of storage; uniformly hot groups of data segments are determined using a first, largest granulated, heat map for a selected one of the group of the data segments; a second heat map, which is smaller than the first and having the largest granularity of the first heat map, is used to determine the clumped hot groups; and sparsely hot groups are determined when neither the first heat map nor the second heat map are hotter than the first and second predetermined thresholds, respectively.
    Type: Grant
    Filed: April 22, 2015
    Date of Patent: October 18, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael T. Benhase, Lokesh M. Gupta, Cheng-Chung Song
  • Patent number: 9430244
    Abstract: A method includes processing a sequence of instructions of program code that are specified using one or more architectural registers, by a hardware-implemented pipeline that renames the architectural registers in the instructions so as to produce operations specified using one or more physical registers. At least first and second segments of the sequence of instructions are selected, wherein the second segment occurs later in the sequence than the first segment. One or more of the architectural registers in the instructions of the second segment are renamed, before completing renaming the architectural registers in the instructions of the first segment, by pre-allocating one or more of the physical registers to one or more of the architectural registers.
    Type: Grant
    Filed: October 28, 2015
    Date of Patent: August 30, 2016
    Assignee: CENTIPEDE SEMI LTD.
    Inventors: Omri Tennenhaus, Alberto Mandler, Noam Mizrahi
  • Patent number: 9389867
    Abstract: In a processor core, high latency operations are tracked in entries of a data structure associated with an execution unit of the processor core. In the execution unit, execution of an instruction dependent on a high latency operation tracked by an entry of the data structure is speculatively finished prior to completion of the high latency operation. Speculatively finishing the instruction includes reporting an identifier of the entry to completion logic of the processor core and removing the instruction from an execution pipeline of the execution unit. The completion logic records dependence of the instruction on the high latency operation and commits execution results of the instruction to an architected state of the processor only after successful completion of the high latency operation.
    Type: Grant
    Filed: August 31, 2015
    Date of Patent: July 12, 2016
    Assignee: International Business Machines Corporation
    Inventors: Sundeep Chadha, Bryan Lloyd, Dung Q. Nguyen, David S. Ray, Benjamin W. Stolt
  • Patent number: 9384002
    Abstract: In a processor core, high latency operations are tracked in entries of a data structure associated with an execution unit of the processor core. In the execution unit, execution of an instruction dependent on a high latency operation tracked by an entry of the data structure is speculatively finished prior to completion of the high latency operation. Speculatively finishing the instruction includes reporting an identifier of the entry to completion logic of the processor core and removing the instruction from an execution pipeline of the execution unit. The completion logic records dependence of the instruction on the high latency operation and commits execution results of the instruction to an architected state of the processor only after successful completion of the high latency operation.
    Type: Grant
    Filed: November 16, 2012
    Date of Patent: July 5, 2016
    Assignee: International Business Machines Corporation
    Inventors: Sundeep Chadha, Bryan Lloyd, Dung Q. Nguyen, David S. Ray, Benjamin W. Stolt
  • Patent number: 9378069
    Abstract: A method, system and computer-usable medium are disclosed for a lock-spin-wait operation for managing multi-threaded applications in a multi-core computing environment. A target processor core, referred to as a “spin-wait core” (SWC), is assigned (or reserved) for primarily running spin-waiting threads. Threads operating in the multi-core computing environment that are identified as spin-waiting are then moved to a run queue associated with the SWC to acquire a lock. The spin-waiting threads are then allocated a lock response time that is less than the default lock response time of the operating system (OS) associated with the SWC. If a spin-waiting fails to acquire a lock within the allocated lock response time, the SWC is relinquished, ceding its availability for other spin-waiting threads in the run queue to acquire a lock. Once a spin-waiting thread acquires a lock, it is migrated to its original, or an available, processor core.
    Type: Grant
    Filed: March 5, 2014
    Date of Patent: June 28, 2016
    Assignee: International Business Machines Corporation
    Inventors: Men-Chow Chiang, Ken V. Vu
  • Patent number: 9372777
    Abstract: Methods and arrangements for enhancing a ticket relative to user interaction with a system. An information technology ticket related to user interaction with an information technology system is received, and a system trace is activated, wherein additional input related to the user interaction with the information technology system is accepted. Information derived from the trace of the information technology system is associated with the information technology ticket. Other variants and embodiments are broadly contemplated herein.
    Type: Grant
    Filed: February 28, 2013
    Date of Patent: June 21, 2016
    Assignee: International Business Machines Corporation
    Inventors: Pankaj Dhoolia, Diptikalyan Saha, Ram Viswanathan
  • Patent number: 9330011
    Abstract: A microprocessor includes an instruction cache and a hardware state machine configured to detect a no operation (NOP) slide by counting a continuous sequence of NOP instructions within a stream of instructions fetched from the instruction cache. The microprocessor is configured to suspend execution of the stream of instructions, and transfer control to another routine, in response to detecting the NOP slide.
    Type: Grant
    Filed: October 10, 2013
    Date of Patent: May 3, 2016
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventor: Terry Parks
  • Patent number: 9286069
    Abstract: Within a processing pipeline 14, issue control circuitry 12 serves to arbitrate write port availability when floating point multiplication instructions are issued into a floating point pipeline 14. If not operating in a flush-to-zero mode, then depending upon the output operands generated denormal handling may or may not be required. A pessimistic assumption is made upon issue that denormal handling will be required and accordingly the write port reserved is a first predetermined number of processing cycles after the start cycle to take account of use of the denormal handling pipeline stage 20. Partway along the processing pipeline 14, state becomes available which indicates whether or not denormal handling is actually required. If denormal handling is not required and a write port is available one processing cycle earlier, then bypass circuitry 22 serves to bypass the denormal handling pipeline stage 20 such that the output operand will be written to the register bank 16 one processing cycle earlier.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: March 15, 2016
    Assignee: ARM Limited
    Inventors: Cédric Denis Robert Airaud, Luca Scalabrino, Frederic Jean Denis Arsanto, Guillaume Schon
  • Patent number: 9251014
    Abstract: A method for detecting a software-race condition in a program includes copying a state of a transaction of the program from a first core of a multi-core processor to at least one additional core of the multi-core processor, running the transaction, redundantly, on the first core and the at least one additional core given the state, outputting a result of the first core and the at least one additional core, and detecting a difference in the results between the first core and the at least one additional core, wherein the difference indicates the software-race condition.
    Type: Grant
    Filed: August 8, 2013
    Date of Patent: February 2, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Harold W. Cain, III, David M. Daly, Michael C. Huang, Kattamuri Ekanadham, Jose E. Moreira, Mauricio J. Serrano
  • Patent number: 9250898
    Abstract: In a processor, a first operation unit outputs as a first operation result, an output of a first comparison operation unit, or an AND or OR of the output and a value already held in a register according to a first control signal. A second operation unit outputs, as a second operation result, an output of a second comparison operation unit, or an AND or OR of the output and a value already held in the register according to a second control signal. A third operation unit outputs, as an execution result, the first operation result, or an AND or OR of the first operation result and the second operation result to the register according to a third control signal. The register newly holds and outputs the execution result from the third operation unit.
    Type: Grant
    Filed: November 27, 2012
    Date of Patent: February 2, 2016
    Assignee: Renesas Electronics Corporation
    Inventor: Yuki Kobayashi
  • Patent number: 9201801
    Abstract: A computing device includes: an instruction cache storing primary execution unit instructions and auxiliary execution unit instructions in a sequential order; a primary execution unit configured to receive and execute the primary execution unit instructions from the instruction cache; an auxiliary execution unit configured to receive and execute only the auxiliary execution unit instructions from the instruction cache in a manner independent from and asynchronous to the primary execution unit; and completion circuitry configured to coordinate completion of the primary execution unit instructions by the primary execution unit and the auxiliary execution unit instructions according to the sequential order.
    Type: Grant
    Filed: September 15, 2010
    Date of Patent: December 1, 2015
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bechara F. Boury, Michael Bryan Mitchell, Paul Michael Steinmetz, Kenichi Tsuchiya
  • Patent number: 9176738
    Abstract: Method and apparatus for fast decoding of microinstructions are disclosed. An integrated circuit is disclosed wherein microinstructions are queued for execution in an execution unit having multiple pipelines where each pipeline is configured to execute a set of supported microinstructions. The execution unit receives microinstruction data including an operation code (opcode) or a complex opcode. The execution unit executes the microinstruction multiple times wherein the microinstruction is executed at least once to get an address value and at least once to get a result of an operation. The execution unit processes complex opcodes by utilizing both a load/store support and a simple opcode support by splitting the complex opcode into load/store and simple opcode components and creating an internal source/destination between the two components.
    Type: Grant
    Filed: January 12, 2011
    Date of Patent: November 3, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Ganesh Venkataramanan, Emil Talpes
  • Patent number: 9087409
    Abstract: This disclosure describes techniques for reducing memory access bandwidth in a graphics processing system based on destination alpha values. The techniques may include retrieving a destination alpha value from a bin buffer, the destination alpha value being generated in response to processing a first pixel associated with a first primitive. The techniques may further include determining, based on the destination alpha value, whether to perform an action that causes one or more texture values for a second pixel to not be retrieved from a texture buffer. In some examples, the action may include discarding the second pixel from a pixel processing pipeline prior to the second pixel arriving at a texture mapping stage of the pixel processing pipeline. The second pixel may be associated with a second primitive different than the first primitive.
    Type: Grant
    Filed: March 1, 2012
    Date of Patent: July 21, 2015
    Assignee: QUALCOMM Incorporated
    Inventor: Andrew Gruber