Simultaneous Issuance Of Multiple Instructions Patents (Class 712/215)
-
Patent number: 10324724
Abstract: Methods and apparatuses relating to a fusion manager to fuse instructions are described. In one embodiment, a hardware processor includes a hardware binary translator to translate an instruction stream into a translated instruction stream, a hardware fusion manager to fuse multiple instructions of the translated instruction stream into a single fused instruction, a hardware decode unit to decode the single fused instruction into a decoded, single fused instruction, and a hardware execution unit to execute the decoded, single fused instruction.
Type: Grant
Filed: December 16, 2015
Date of Patent: June 18, 2019
Assignee: Intel Corporation
Inventors: Patrick P. Lai, Tyler N. Sondag, Sebastian Winkel, Polychronis Xekalakis, Ethan Schuchman, Jayesh Iyer
-
Patent number: 10282207
Abstract: Operation of a multi-slice processor that includes execution slices and load/store slices coupled via a results bus includes: receiving, by an execution slice, a producer instruction, including: storing, in an entry of an issue queue, the producer instruction; and storing, in a register, an issue queue entry identifier representing the entry of the issue queue in which the producer instruction is stored; receiving, by the execution slice, a source instruction, the source instruction dependent upon the result of the producer instruction, including: storing, in another entry of the issue queue, the source instruction and the issue queue entry identifier of the producer instruction; determining in dependence upon the issue queue entry identifier of the producer instruction that the producer instruction has issued from the issue queue; and responsive to the determination that the producer instruction has issued from the issue queue, issuing the source instruction from the issue queue.
Type: Grant
Filed: February 18, 2016
Date of Patent: May 7, 2019
Assignee: International Business Machines Corporation
Inventors: Brian D. Barrick, Sundeep Chadha, Michael J. Genden, Jerry Y. Lu, Dung Q. Nguyen, Nasrin Sultana, David R. Terry, David S. Walder
-
Patent number: 10268482
Abstract: Operation of a multi-slice processor that includes execution slices and load/store slices coupled via a results bus includes: receiving, by an execution slice, a producer instruction, including: storing, in an entry of an issue queue, the producer instruction; and storing, in a register, an issue queue entry identifier representing the entry of the issue queue in which the producer instruction is stored; receiving, by the execution slice, a source instruction, the source instruction dependent upon the result of the producer instruction, including: storing, in another entry of the issue queue, the source instruction and the issue queue entry identifier of the producer instruction; determining in dependence upon the issue queue entry identifier of the producer instruction that the producer instruction has issued from the issue queue; and responsive to the determination that the producer instruction has issued from the issue queue, issuing the source instruction from the issue queue.
Type: Grant
Filed: December 15, 2015
Date of Patent: April 23, 2019
Assignee: International Business Machines Corporation
Inventors: Brian D. Barrick, Sundeep Chadha, Michael J. Genden, Jerry Y. Lu, Dung Q. Nguyen, Nasrin Sultana, David R. Terry, David S. Walder
-
Patent number: 10269088
Abstract: A mechanism is described for facilitating thread execution arbitration for thread scheduling relating to graphics processors at computing devices. A method of embodiments, as described herein, includes assigning priority levels to threads based on stall signals communicated from the one or more shared function units to one or more execution units of a processor including a graphics processor, and selecting a first thread to be scheduled and a second thread to be ignored based on the stall signals.
Type: Grant
Filed: April 21, 2017
Date of Patent: April 23, 2019
Assignee: INTEL CORPORATION
Inventors: Joydeep Ray, Abhishek R. Appu, Subramaniam M. Maiyuran, Eric J. Hoekstra, Prasoonkumar Surti, Balaji Vembu, Altug Koker
-
Patent number: 10248421
Abstract: Operation of a multi-slice processor that includes execution slices and load/store slices coupled via a results bus, including: for a target instruction targeting a logical register, determining whether an entry in a general purpose register representing the logical register is pending a flush; if the entry in the general purpose register representing the logical register is pending a flush: cancelling the flush in the entry of the general purpose register; storing the target instruction in the entry of the general purpose register representing the logical register, and if an entry in a history buffer targeting the logical register is pending a restore, cancelling the restore for the entry of the history buffer.
Type: Grant
Filed: February 16, 2016
Date of Patent: April 2, 2019
Assignee: International Business Machines Corporation
Inventors: Salma Ayub, Brian D. Barrick, Joshua W. Bowman, Sundeep Chadha, Cliff Kucharski, Dung Q. Nguyen, David R. Terry, Jing Zhang
-
Patent number: 10241557
Abstract: A processor includes a mechanism for disabling a memory array of a branch prediction unit. The processor may include a next fetch prediction unit that may include a number of entries. Each entry may correspond to a next instruction fetch group and may store an indication of whether or not the corresponding next fetch group includes a conditional branch instruction. In response to an indication that the next fetch group does not include a conditional branch instruction, the fetch prediction unit may be configured to disable, in a next instruction execution cycle, the memory array of the branch prediction unit.
Type: Grant
Filed: December 12, 2013
Date of Patent: March 26, 2019
Assignee: Apple Inc.
Inventors: Conrado Blasco, Ronald P Hall, Ramesh B Gunna, Ian D Kountanis, Shyam Sundar, André Seznec
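The gating decision described in this abstract can be illustrated with a minimal Python sketch. It is a software model only, not the patented circuit: the class `NextFetchPredictor`, its methods, and the choice to keep the array enabled for unknown fetch groups are all assumptions made for illustration.

```python
class NextFetchPredictor:
    """Hypothetical model: one entry per next fetch group, holding a
    flag that says whether the group contains a conditional branch."""

    def __init__(self):
        self.has_conditional_branch = {}  # fetch group address -> bool

    def record(self, fetch_group, has_branch):
        # Remember what was learned about this fetch group.
        self.has_conditional_branch[fetch_group] = has_branch

    def branch_array_enabled(self, next_fetch_group):
        # Assumption: with no entry for the group, keep the branch
        # prediction array enabled, since a branch cannot be ruled out.
        return self.has_conditional_branch.get(next_fetch_group, True)


predictor = NextFetchPredictor()
predictor.record(0x1000, has_branch=False)
predictor.record(0x1040, has_branch=True)
```

With these entries, `predictor.branch_array_enabled(0x1000)` reports that the array could be disabled for the next cycle, while the other group, and any group never recorded, leaves it enabled.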
-
Patent number: 10241790
Abstract: Operation of a multi-slice processor that includes execution slices and load/store slices coupled via a results bus, including: for a target instruction targeting a logical register, determining whether an entry in a general purpose register representing the logical register is pending a flush; if the entry in the general purpose register representing the logical register is pending a flush: cancelling the flush in the entry of the general purpose register; storing the target instruction in the entry of the general purpose register representing the logical register, and if an entry in a history buffer targeting the logical register is pending a restore, cancelling the restore for the entry of the history buffer.
Type: Grant
Filed: December 15, 2015
Date of Patent: March 26, 2019
Assignee: International Business Machines Corporation
Inventors: Salma Ayub, Brian D. Barrick, Joshua W. Bowman, Sundeep Chadha, Cliff Kucharski, Dung Q. Nguyen, David R. Terry, Jing Zhang
-
Patent number: 10235181
Abstract: An out-of-order (OOO) processor includes ready logic that provides a signal indicating an instruction is ready when all operands for the instruction are ready, or when all operands are either ready or are marked back-to-back to a current instruction. By marking a second instruction that consumes an operand as ready when it is back-to-back with a first instruction that produces the operand, but the first instruction has not yet produced the operand, latency due to missed cycles in executing back-to-back instructions is minimized.
Type: Grant
Filed: February 3, 2017
Date of Patent: March 19, 2019
Assignee: International Business Machines Corporation
Inventor: Brian W. Thompto
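The ready condition stated in this abstract reduces to a simple predicate, sketched below in Python. The `Operand` type and the `instruction_ready` function are hypothetical names introduced here; real ready logic is combinational hardware, and how an operand comes to be marked back-to-back is outside this sketch.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    ready: bool = False         # the value has already been produced
    back_to_back: bool = False  # the producer is the instruction issuing right now


def instruction_ready(operands):
    """Ready when every operand is either ready or marked back-to-back."""
    return all(op.ready or op.back_to_back for op in operands)


# One operand is available and the other comes from the current instruction.
consumer = [Operand(ready=True), Operand(back_to_back=True)]
# One operand is neither available nor back-to-back.
stalled = [Operand(ready=True), Operand()]
```

Here `instruction_ready(consumer)` holds even though one value has not been produced yet, which is the case the abstract credits with removing the missed cycle, while `instruction_ready(stalled)` does not hold.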
-
Patent number: 10235232
Abstract: A processor includes an indicator configured to indicate a first mode or a second mode and a functional unit configured to perform computations with a full degree of accuracy when the indicator indicates the first mode and to perform computations with less than the full degree of accuracy when the indicator indicates the second mode.
Type: Grant
Filed: October 23, 2014
Date of Patent: March 19, 2019
Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD
Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker
-
Patent number: 10229066
Abstract: A data processing apparatus is provided including queue circuitry to respond to control signals each associated with a memory access instruction, and to queue a plurality of requests for data, each associated with a reference to a storage location. Resolution circuitry acquires a request for data, and issues the request for data, the resolution circuitry having a resolution circuitry limit. When a current capacity of the resolution circuitry is below the resolution circuitry limit, the resolution circuitry acquires the request for data by receiving the request for data from the queue circuitry, stores the request for data in association with the storage location, issues the request for data, and causes a result of issuing the request for data to be provided to said storage location.
Type: Grant
Filed: September 30, 2016
Date of Patent: March 12, 2019
Assignee: ARM Limited
Inventors: Miles Robert Dooley, Matthew Andrew Rafacz, Huzefa Moiz Sanjeliwala, Michael Filippo
-
Patent number: 10228982
Abstract: A mechanism is provided for allocating a hyper-threaded processor to nodes of multi-tenant distributed software systems. Responsive to receiving a request to provision a node of the multi-tenant distributed software system on the host data processing system, a cluster to which the node belongs is identified. Responsive to the node being a second type of node, responsive to determining that another second type of node in the same cluster has been provisioned on the host data processing system, and responsive to the number of unallocated VPs on different physical processors from that of the other second type of node being greater than or equal to the requested number of VPs for the second type of node, the requested number of VPs for the second type of node is allocated each to a different physical processor from that of the other second type of node.
Type: Grant
Filed: January 25, 2018
Date of Patent: March 12, 2019
Assignee: International Business Machines Corporation
Inventors: Rachit Arora, Dharmesh K. Jain, Padmanabhan Krishnan, Shrinivas S. Kulkarni, Subin Shekhar
-
Patent number: 10223126
Abstract: An out-of-order (OOO) processor includes ready logic that provides a signal indicating an instruction is ready when all operands for the instruction are ready, or when all operands are either ready or are marked back-to-back to a current instruction. By marking a second instruction that consumes an operand as ready when it is back-to-back with a first instruction that produces the operand, but the first instruction has not yet produced the operand, latency due to missed cycles in executing back-to-back instructions is minimized.
Type: Grant
Filed: January 6, 2017
Date of Patent: March 5, 2019
Assignee: International Business Machines Corporation
Inventor: Brian W. Thompto
-
Patent number: 10216547
Abstract: A mechanism is provided for allocating a hyper-threaded processor to nodes of multi-tenant distributed software systems. Responsive to receiving a request to provision a node of the multi-tenant distributed software system on the host data processing system, a cluster to which the node belongs is identified. Responsive to the node being a second type of node, responsive to determining that another second type of node in the same cluster has been provisioned on the host data processing system, and responsive to the number of unallocated VPs on different physical processors from that of the other second type of node being greater than or equal to the requested number of VPs for the second type of node, the requested number of VPs for the second type of node is allocated each to a different physical processor from that of the other second type of node.
Type: Grant
Filed: November 22, 2016
Date of Patent: February 26, 2019
Assignee: International Business Machines Corporation
Inventors: Rachit Arora, Dharmesh K. Jain, Padmanabhan Krishnan, Shrinivas S. Kulkarni, Subin Shekhar
-
Patent number: 10169187
Abstract: A performance monitor including a saturating counter provides a relative measure of event frequency without requiring a minimum polling rate or periodic reset to avoid or account for counter overflow. The saturating counter is incremented upon detection of an event and decremented if an event is not detected within a predetermined period. The period of detecting may be programmable and may be determined by real time clock, processor or instruction cycles. Multiple event types may be selected from for detection and input to a single counter, or alternatively multiple event counters may be provided for various event types. The saturating counter may additionally be periodically reset in a selected operating mode, in combination with the decrementing action performed on the counter.
Type: Grant
Filed: August 18, 2010
Date of Patent: January 1, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Venkat Rajeev Indukuru, Alexander Erik Mericas
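The increment and decrement behavior described in this abstract can be sketched as a small Python model. The class name `SaturatingEventCounter`, the `tick` method, the per-cycle sampling, and the default limit and period are assumptions chosen for illustration; the abstract leaves the period programmable and does not fix these details.

```python
class SaturatingEventCounter:
    """Hypothetical model of a saturating event-frequency counter."""

    def __init__(self, max_value=15, period=4):
        self.max_value = max_value  # upper saturation limit
        self.period = period        # cycles without an event before a decrement
        self.value = 0
        self.idle_cycles = 0

    def tick(self, event_detected):
        """Advance one cycle and return the current count."""
        if event_detected:
            # Increment on an event, but never past the saturation limit.
            self.value = min(self.value + 1, self.max_value)
            self.idle_cycles = 0
        else:
            self.idle_cycles += 1
            if self.idle_cycles >= self.period:
                # No event within the period: decay, but never below zero.
                self.value = max(self.value - 1, 0)
                self.idle_cycles = 0
        return self.value


counter = SaturatingEventCounter(max_value=3, period=2)
for _ in range(5):
    counter.tick(True)           # five events; the count saturates at 3
saturated = counter.value
counter.tick(False)
decayed = counter.tick(False)    # two idle cycles complete one period
```

Because the count clamps at both ends, a reader can sample it at any rate and still obtain a relative measure of how frequent the event has been recently, which is the property the abstract emphasizes.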
-
Patent number: 10140129
Abstract: A processor having one or more processing cores is described. Each of the one or more processing cores has front end logic circuitry and a plurality of processing units. The front end logic circuitry is to fetch respective instructions of threads and decode the instructions into respective micro-code and input operand and resultant addresses of the instructions. Each of the plurality of processing units is to be assigned at least one of the threads, is coupled to said front end unit, and has a respective buffer to receive and store microcode of its assigned at least one of the threads.
Type: Grant
Filed: December 28, 2012
Date of Patent: November 27, 2018
Assignee: Intel Corporation
Inventors: Ilan Pardo, Dror Markovich, Oren Ben-Kiki, Yuval Yosef
-
Patent number: 10114644
Abstract: A decoding logic method is arranged to execute a zero-overhead loop in an embedded digital signal processor (DSP). In the method, instruction data is fetched from a memory, and a plurality of instruction tokens, which are derived from the instruction data, are stored in a token buffer. A first portion of one or more instruction tokens from the token buffer are passed to a first decode module, which may be an instruction decode module, and a second portion of the one or more instruction tokens from the token buffer are passed to a second decode module, which may be a loop decode module. The second decode module detects a special loop instruction token, and based on the detection of the special loop instruction token, a loop counter is conditionally tested. Using the first decode module, at least one instruction token of an iterative algorithm is assembled into a single instruction, which is executable in a single execution cycle.
Type: Grant
Filed: July 26, 2016
Date of Patent: October 30, 2018
Assignee: STMICROELECTRONICS (BEIJING) R&D CO. LTD
Inventors: PengFei Zhu, Xiao Kang Jiao
-
Patent number: 10114638
Abstract: In one embodiment, command message generation and execution using a machine-code instruction is performed. One embodiment includes a particular machine executing a single machine-code instruction including a reference into a command-message-building data structure stored in memory. This executing the single machine-code instruction includes generating a command message and initiating communication of the command message to a hardware accelerator, including copying command information from the command-message-building data structure based on the reference into the command message. The hardware accelerator receives and executes the command message. In one embodiment, the command message is message-switched from a processor to a hardware accelerator, such as, but not limited to, a memory controller, a table lookup unit, or a prefix lookup unit. In one embodiment, a plurality of threads share the command-message-building data structure.
Type: Grant
Filed: December 15, 2014
Date of Patent: October 30, 2018
Assignee: Cisco Technology, Inc.
Inventor: Donald Edward Steiss
-
Patent number: 10095637
Abstract: Techniques for improving execution of a lock instruction are provided herein. A lock instruction and younger instructions are allowed to speculatively retire prior to the store portion of the lock instruction committing its value to memory. These instructions thus do not have to wait for the lock instruction to complete before retiring. In the event that the processor detects a violation of the atomic or fencing properties of the lock instruction prior to committing the value of the lock instruction, the processor rolls back state and executes the lock instruction in a slow mode in which younger instructions are not allowed to retire until the stored value of the lock instruction is committed. Speculative retirement of these instructions results in increased processing speed, as instructions no longer need to wait to retire after execution of a lock instruction.
Type: Grant
Filed: September 15, 2016
Date of Patent: October 9, 2018
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Gregory W. Smaus, John M. King, Michael D. Achenbach, Kevin M. Lepak, Matthew A. Rafacz, Noah Bamford
-
Patent number: 10089081
Abstract: A method and apparatus for generating a signal processing pipeline based, at least in part, on manipulation of a GUI, the signal processing pipeline comprising one or more signal processing operations.
Type: Grant
Filed: December 31, 2014
Date of Patent: October 2, 2018
Assignee: Excalibur IP, LLC
Inventors: Guangxin Yang, Ji Zhou, Shuo Yang, Yan Xia, Delu Zhu
-
Patent number: 10083039
Abstract: A processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues by a dispatch routing network provides flexible and efficient use of internal resources. The configuration of the execution slices is selectable so that capabilities of the processor core can be adjusted according to execution requirements for the instruction streams. Two or more execution slices can be combined as super-slices to handle wider data, wider operands and/or vector operations, according to one or more mode control signal that also serves as a configuration control signal. The mode control signal is also used to partition clusters of the execution slices within the processor core according to whether single-threaded or multi-threaded operation is selected, and additionally according to a number of hardware threads that are active.
Type: Grant
Filed: January 30, 2018
Date of Patent: September 25, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
-
Patent number: 10078516
Abstract: Techniques are disclosed for back-to-back issue of instructions in a processor. A first instruction is stored in a queue position in an issue queue. The issue queue stores instructions in a corresponding queue position. The first instruction includes a target instruction tag and at least a source instruction tag. The target instruction tag is stored in a table storing a plurality of target instruction tags associated with a corresponding instruction. Each stored target instruction tag specifies a logical register that stores a target operand. Upon determining, based on the source instruction tag associated with the first instruction and the target instruction tag associated with a second instruction, that the first instruction is dependent on the second instruction, a pointer to the first instruction is associated with the second instruction. The pointer is used to wake up the first instruction upon issue of the second instruction.
Type: Grant
Filed: August 24, 2015
Date of Patent: September 18, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jeffrey C. Brownscheidle, Sundeep Chadha, Maureen A. Delaney, Dung Q. Nguyen
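The pointer-based wake-up described in this abstract can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions: the `IssueQueue` class, its dictionary-based entries, and the use of register names as tags are all hypothetical, and entries are never retired or reused here.

```python
class IssueQueue:
    """Hypothetical model: each producer keeps pointers to the queue
    positions of the instructions that depend on it."""

    def __init__(self):
        self.entries = {}       # queue position -> entry state
        self.target_table = {}  # target tag -> queue position of its producer
        self.next_position = 0

    def insert(self, name, target, sources):
        position = self.next_position
        self.next_position += 1
        entry = {"name": name, "waiting_on": 0, "wake_list": [], "issued": False}
        for tag in sources:
            producer = self.target_table.get(tag)
            if producer is not None and not self.entries[producer]["issued"]:
                # Dependency found: attach a pointer to this instruction
                # to the producer's entry.
                self.entries[producer]["wake_list"].append(position)
                entry["waiting_on"] += 1
        self.entries[position] = entry
        self.target_table[target] = position  # newest producer of this tag
        return position

    def is_ready(self, position):
        entry = self.entries[position]
        return not entry["issued"] and entry["waiting_on"] == 0

    def issue(self, position):
        """Issue an instruction and return the positions it wakes up."""
        entry = self.entries[position]
        entry["issued"] = True
        woken = []
        for dependent in entry["wake_list"]:
            self.entries[dependent]["waiting_on"] -= 1
            if self.entries[dependent]["waiting_on"] == 0:
                woken.append(dependent)
        return woken


queue = IssueQueue()
load = queue.insert("load", target="r1", sources=[])
add = queue.insert("add", target="r2", sources=["r1"])
ready_before = queue.is_ready(add)
woken = queue.issue(load)
```

The consumer is not ready until the producer issues, and issuing the producer follows the stored pointer directly rather than searching the whole queue for dependents.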
-
Patent number: 9983875
Abstract: Operation of a multi-slice processor that includes a plurality of execution slices, a plurality of load/store slices, and an instruction sequencing unit, where operation includes: receiving, at a load/store slice, a load instruction to be issued; determining, at the load/store slice, that the load instruction has not completed and is to be reissued; and responsive to determining that the load instruction is to be reissued, delaying a signal, from the load/store slice to the instruction sequencing unit, that allows the instruction sequencing unit to issue one or more instructions dependent upon the load instruction.
Type: Grant
Filed: March 4, 2016
Date of Patent: May 29, 2018
Assignee: International Business Machines Corporation
Inventors: Sundeep Chadha, David A. Hrusecky, Elizabeth A. McGlone, Jennifer L. Molnar
-
Patent number: 9971601
Abstract: Dynamic resource allocation is provided in which additional resources, such as additional architected registers, are provided to an instruction, if it is determined that resources in addition to those configured to be provided to the instruction are to be used for the particular instruction. An instruction to be executed is dispatched on a pipe of a pipeline and that pipe is configured to have a set number of architected registers for use by the instruction. However, if one or more other architected registers are needed, those additional architected registers are dynamically allocated to the instruction by assigning one or more source ports of an additional pipe to the instruction.
Type: Grant
Filed: February 13, 2015
Date of Patent: May 15, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Gregory W. Alexander, Brian D. Barrick, Fadi Y. Busaba, Wen H. Li, Edward T. Malley
-
Patent number: 9971600
Abstract: Techniques are disclosed for back-to-back issue of instructions in a processor. A first instruction is stored in a queue position in an issue queue. The issue queue stores instructions in a corresponding queue position. The first instruction includes a target instruction tag and at least a source instruction tag. The target instruction tag is stored in a table storing a plurality of target instruction tags associated with a corresponding instruction. Each stored target instruction tag specifies a logical register that stores a target operand. Upon determining, based on the source instruction tag associated with the first instruction and the target instruction tag associated with a second instruction, that the first instruction is dependent on the second instruction, a pointer to the first instruction is associated with the second instruction. The pointer is used to wake up the first instruction upon issue of the second instruction.
Type: Grant
Filed: June 26, 2015
Date of Patent: May 15, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jeffrey C. Brownscheidle, Sundeep Chadha, Maureen A. Delaney, Dung Q. Nguyen
-
Patent number: 9870229
Abstract: Embodiments of the present invention provide systems and methods for mapping the architected state of one or more threads to a set of distributed physical register files to enable independent execution of one or more threads in a multiple slice processor. In one embodiment, a system is disclosed including a plurality of dispatch queues which receive instructions from one or more threads and an even number of parallel execution slices, each parallel execution slice containing a register file. A routing network directs an output from the dispatch queues to the parallel execution slices and the parallel execution slices independently execute the one or more threads.
Type: Grant
Filed: September 29, 2015
Date of Patent: January 16, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Sam G. Chu, Markus Kaltenbach, Hung Q. Le, Jentje Leenstra, Jose E. Moreira, Dung Q. Nguyen, Brian W. Thompto
-
Patent number: 9811343
Abstract: A method, system, and computer program product synchronize a group of workitems executing an instruction stream on a processor. The processor is yielded by a first workitem responsive to a synchronization instruction in the instruction stream. A first one of a plurality of program counters is updated to point to a next instruction following the synchronization instruction in the instruction stream to be executed by the first workitem. A second workitem is run on the processor after the yielding.
Type: Grant
Filed: May 26, 2017
Date of Patent: November 7, 2017
Assignee: Advanced Micro Devices, Inc.
Inventors: Lee W. Howes, Benedict R. Gaster, Michael C. Houston
-
Patent number: 9811377
Abstract: A method for executing multithreaded instructions grouped into blocks. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein the instructions of the instruction blocks are interleaved with multiple threads; scheduling the instructions of the instruction block to execute in accordance with the multiple threads; and tracking execution of the multiple threads to enforce fairness in an execution pipeline.
Type: Grant
Filed: March 14, 2014
Date of Patent: November 7, 2017
Assignee: INTEL CORPORATION
Inventor: Mohammad Abdallah
-
Patent number: 9747238
Abstract: A computer processor including a plurality of functional units that performs operations that produce result operands at different characteristic latencies over multiple cycles. An interconnect network provides data paths for transfer of operand data between functional units. The interconnect network includes first and second crossbar parts. The first crossbar part is configured to route result operands produced with the lowest characteristic latency to any other functional unit. The second crossbar part is configured to route result operands with higher characteristic latency relative to the lowest characteristic latency to the first crossbar part where such result operands are in turn routed to any functional unit. In another aspect, the functional units can be organized as multiple slots where each slot can produce multiple result operands of different characteristic latencies in the same cycle, and wherein each slot employs separate result registers for each characteristic latency present on the slot.
Type: Grant
Filed: June 23, 2014
Date of Patent: August 29, 2017
Assignee: MILL COMPUTING, INC.
Inventors: Roger Rawson Godard, Arthur David Kahlich, Sebastien Paul Maurice Mirolo, David Arthur Yost
-
Patent number: 9720697
Abstract: In an embodiment, a method is provided. The method includes managing user-level threads on a first instruction sequencer in response to executing user-level instructions on a second instruction sequencer that is under control of an application level program. A first user-level thread is run on the second instruction sequencer and contains one or more user level instructions. A first user level instruction has at least 1) a field that makes reference to one or more instruction sequencers or 2) implicitly references with a pointer to code that specifically addresses one or more instruction sequencers when the code is executed.
Type: Grant
Filed: September 10, 2012
Date of Patent: August 1, 2017
Assignee: INTEL CORPORATION
Inventors: Hong Wang, John Shen, Ed Grochowski, James Paul Held, Bryant Bigbee, Shivnandan D. Kaushik, Gautham Chinya, Xiang Zou, Per Hammarlund, Xinmin Tian, Anil Aggarwal, Scott Dion Rodgers, Prashant Sethi, Baiju V. Patel, Richard Andrew Hankins
-
Patent number: 9710274
Abstract: Various methods tightly couple together decode logic associated with multiple types of execution units and having varying priorities to enable instructions that are decoded as valid instructions for multiple types of execution units to be forwarded to a highest priority type of execution unit among the multiple types of execution units. Among other benefits, when an auxiliary execution unit is coupled to a general purpose processing core with the decode logic for the auxiliary execution unit tightly coupled with the decode logic for the general purpose processing core, the auxiliary execution unit may be used to effectively overlay new functionality for an existing instruction that is normally executed by the general purpose processing core, e.g., to patch a design flaw in the general purpose processing core or to provide improved performance for specialized applications.
Type: Grant
Filed: April 11, 2016
Date of Patent: July 18, 2017
Assignee: International Business Machines Corporation
Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
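The priority rule in this abstract, where an instruction valid for several execution unit types goes to the highest-priority one, can be sketched in Python as follows. The function `select_execution_unit`, the unit names, the opcode sets, and the convention that a larger number means higher priority are all assumptions made for this illustration.

```python
def select_execution_unit(instruction, decoders):
    """Return the name of the highest-priority unit whose decoder accepts
    the instruction, or None if no decoder accepts it.

    decoders: iterable of (priority, unit_name, accepts), where accepts is
    a callable returning True when the instruction is valid for that unit.
    """
    valid = [(priority, name)
             for priority, name, accepts in decoders
             if accepts(instruction)]
    if not valid:
        return None
    return max(valid)[1]


# Hypothetical setup: an auxiliary unit overlays "mul", which the general
# purpose core would otherwise execute.
decoders = [
    (0, "general_purpose_core", lambda op: op in {"add", "mul"}),
    (1, "auxiliary_unit", lambda op: op == "mul"),
]
```

In this setup `select_execution_unit("mul", decoders)` picks the auxiliary unit even though the core also decodes `mul` as valid, which mirrors the overlay use case the abstract mentions, while `add` still goes to the core.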
-
Patent number: 9697004
Abstract: A Very Long Instruction Word (VLIW) processor having an instruction set with a reduced size resulting in a small number of bits being necessary to specify registers. The VLIW processor includes a register file, and first through third operation units, and executes a very long instruction word. Further, the very long instruction word includes a register specifying field which specifies at least one of the registers in the register file and a plurality of instructions. The operand of each instruction includes bits src1, src2, and dst, which indicate whether or not the registers specified by the register specifying field are to be used as the source register and the destination register.
Type: Grant
Filed: April 8, 2014
Date of Patent: July 4, 2017
Assignee: SOCIONEXT INC.
Inventors: Takahiro Kageyama, Hideshi Nishida, Takeshi Tanaka, Kouji Nakajima
-
Patent number: 9684671
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for streaming external data in parallel from a second distributed system to a first distributed system. One of the methods includes receiving a query that requests a join of first rows of a first table in a first distributed system with second rows of an external table, the external table representing data in a second distributed system. Each of the segment nodes communicates with a respective extension service that obtains fragments from one or more data nodes of the second distributed system according to location information for the respective fragments, and provides to the segment node a stream of data corresponding to second rows of the external table. Each of the segment nodes computes joined rows between the first rows of the first table and the stream of data corresponding to second rows of the external table.
Type: Grant
Filed: February 28, 2014
Date of Patent: June 20, 2017
Assignee: Pivotal Software, Inc.
Inventors: Dov Yaron Dorin, Alon Goldshuv, Alex Shacked
-
Patent number: 9633160
Abstract: A method and system are provided for deriving a resultant compiled software code with increased compatibility for placement and routing of a dynamically reconfigurable processor.
Type: Grant
Filed: March 15, 2013
Date of Patent: April 25, 2017
Inventor: Robert Keith Mykland
-
Patent number: 9606798
Abstract: A processor includes a first comparison operation unit; a second comparison operation unit; a first operation unit; a second operation unit; a third operation unit; and a register, wherein the first comparison operation unit receives a first comparison operation signal, a first input signal, and a second input signal, performs a comparison operation indicated by the first comparison operation signal on the first input signal and the second input signal, and outputs a result of the comparison operation, the second comparison operation unit receives a second comparison operation signal, a third input signal, and a fourth input signal, performs a comparison operation indicated by the second comparison operation signal on the third input signal and the fourth input signal, and outputs a result of the comparison operation, the first operation unit receives the comparison result of the first comparison operation unit.
Type: Grant
Filed: January 6, 2016
Date of Patent: March 28, 2017
Assignee: Renesas Electronics Corporation
Inventor: Yuki Kobayashi
-
Patent number: 9594562Abstract: Various circuit arrangements tightly couple together decode logic associated with multiple types of execution units and having varying priorities to enable instructions that are decoded as valid instructions for multiple types of execution units to be forwarded to a highest priority type of execution unit among the multiple types of execution units. Among other benefits, when an auxiliary execution unit is coupled to a general purpose processing core with the decode logic for the auxiliary execution unit tightly coupled with the decode logic for the general purpose processing core, the auxiliary execution unit may be used to effectively overlay new functionality for an existing instruction that is normally executed by the general purpose processing core, e.g., to patch a design flaw in the general purpose processing core or to provide improved performance for specialized applications.Type: GrantFiled: April 11, 2016Date of Patent: March 14, 2017Assignee: International Business Machines CorporationInventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
-
Patent number: 9575766Abstract: Some implementations provide techniques and arrangements for causing an interrupt in a processor in response to an occurrence of a number of events. A first event counter counts the occurrences of a type of event within the processor and outputs a signal to activate a second event counter in response to reaching a first predefined count. The second event counter counts the occurrences of the type of event within the processor and causes an interrupt of the processor in response to reaching a second predefined count.Type: GrantFiled: December 29, 2011Date of Patent: February 21, 2017Assignee: Intel CorporationInventors: Ahmad Yasin, Peggy J. Irelan, Ofer Levy, Emile Ziedan, Grant Zhou
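The cascaded counting described in this abstract can be sketched in software as follows. This is only an illustrative model, not the patented circuit: the class name, the attribute names, and the choice to stop the first counter once it arms the second are all assumptions made for the example.

```python
class CascadedEventCounter:
    """Illustrative software model of two chained event counters."""

    def __init__(self, first_count, second_count):
        self.first_count = first_count      # first predefined count: arms the second counter
        self.second_count = second_count    # second predefined count: raises the interrupt
        self.first = 0
        self.second = 0
        self.second_active = False
        self.interrupt_raised = False

    def record_event(self):
        """Count one occurrence of the monitored event type."""
        if self.interrupt_raised:
            return  # assumption: nothing further is counted after the interrupt
        if not self.second_active:
            self.first += 1
            if self.first >= self.first_count:
                # The first counter signals the second counter to start counting.
                self.second_active = True
        else:
            self.second += 1
            if self.second >= self.second_count:
                self.interrupt_raised = True


counter = CascadedEventCounter(first_count=3, second_count=2)
for _ in range(5):
    counter.record_event()
print(counter.interrupt_raised)
```

Under these assumptions the interrupt fires only after first_count + second_count events, which is how chaining extends the range of a single counter.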
-
Patent number: 9519538Abstract: An instruction processing pipeline having error detection and error recovery circuitry associated with one or more of the pipeline stages. If an error is detected within a signal value within that pipeline stage, then it can be repaired. Part of the error recovery may be to flush upstream program instructions from the instruction pipeline. When multi-threading, only those instructions from a thread including an instruction which has been lost as a consequence of the error recovery need be flushed from the instruction pipeline. The instruction pipeline may additionally/alternatively be provided with more than one main storage element associated with each signal value with these main storage elements used in an alternating fashion such that if a signal value has been erroneously captured and needs to be repaired, there is still available a main storage element to properly capture the signal value corresponding to the following program instruction.Type: GrantFiled: June 6, 2011Date of Patent: December 13, 2016Assignee: ARM LimitedInventors: Emre Özer, Shidhartha Das, David Michael Bull
-
Patent number: 9471506Abstract: For data processing in a computing storage environment by a processor device, the environment incorporating at least high-speed and lower-speed caches, and managed tiered levels of storage, groups of data segments are migrated between the tiered levels of storage such that clumped uniformly hot ones of the groups of data segments are migrated to use a Solid State Drive (SSD) portion of the tiered levels of storage; uniformly hot groups of data segments are determined using a first, largest granulated, heat map for a selected one of the group of the data segments; a second heat map, which is smaller than the first and having the largest granularity of the first heat map, is used to determine the clumped hot groups; and sparsely hot groups are determined when neither the first heat map nor the second heat map are hotter than the first and second predetermined thresholds, respectively.Type: GrantFiled: April 22, 2015Date of Patent: October 18, 2016Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Michael T. Benhase, Lokesh M. Gupta, Cheng-Chung Song
-
Patent number: 9430244Abstract: A method includes processing a sequence of instructions of program code that are specified using one or more architectural registers, by a hardware-implemented pipeline that renames the architectural registers in the instructions so as to produce operations specified using one or more physical registers. At least first and second segments of the sequence of instructions are selected, wherein the second segment occurs later in the sequence than the first segment. One or more of the architectural registers in the instructions of the second segment are renamed, before completing renaming the architectural registers in the instructions of the first segment, by pre-allocating one or more of the physical registers to one or more of the architectural registers.Type: GrantFiled: October 28, 2015Date of Patent: August 30, 2016Assignee: CENTIPEDE SEMI LTD.Inventors: Omri Tennenhaus, Alberto Mandler, Noam Mizrahi
-
Patent number: 9389867Abstract: In a processor core, high latency operations are tracked in entries of a data structure associated with an execution unit of the processor core. In the execution unit, execution of an instruction dependent on a high latency operation tracked by an entry of the data structure is speculatively finished prior to completion of the high latency operation. Speculatively finishing the instruction includes reporting an identifier of the entry to completion logic of the processor core and removing the instruction from an execution pipeline of the execution unit. The completion logic records dependence of the instruction on the high latency operation and commits execution results of the instruction to an architected state of the processor only after successful completion of the high latency operation.Type: GrantFiled: August 31, 2015Date of Patent: July 12, 2016Assignee: International Business Machines CorporationInventors: Sundeep Chadha, Bryan Lloyd, Dung Q. Nguyen, David S. Ray, Benjamin W. Stolt
-
Patent number: 9384002Abstract: In a processor core, high latency operations are tracked in entries of a data structure associated with an execution unit of the processor core. In the execution unit, execution of an instruction dependent on a high latency operation tracked by an entry of the data structure is speculatively finished prior to completion of the high latency operation. Speculatively finishing the instruction includes reporting an identifier of the entry to completion logic of the processor core and removing the instruction from an execution pipeline of the execution unit. The completion logic records dependence of the instruction on the high latency operation and commits execution results of the instruction to an architected state of the processor only after successful completion of the high latency operation.Type: GrantFiled: November 16, 2012Date of Patent: July 5, 2016Assignee: International Business Machines CorporationInventors: Sundeep Chadha, Bryan Lloyd, Dung Q. Nguyen, David S. Ray, Benjamin W. Stolt
-
Patent number: 9378069Abstract: A method, system and computer-usable medium are disclosed for a lock-spin-wait operation for managing multi-threaded applications in a multi-core computing environment. A target processor core, referred to as a “spin-wait core” (SWC), is assigned (or reserved) for primarily running spin-waiting threads. Threads operating in the multi-core computing environment that are identified as spin-waiting are then moved to a run queue associated with the SWC to acquire a lock. The spin-waiting threads are then allocated a lock response time that is less than the default lock response time of the operating system (OS) associated with the SWC. If a spin-waiting thread fails to acquire a lock within the allocated lock response time, the SWC is relinquished, ceding its availability for other spin-waiting threads in the run queue to acquire a lock. Once a spin-waiting thread acquires a lock, it is migrated to its original, or an available, processor core.Type: GrantFiled: March 5, 2014Date of Patent: June 28, 2016Assignee: International Business Machines CorporationInventors: Men-Chow Chiang, Ken V. Vu
-
Patent number: 9372777Abstract: Methods and arrangements for enhancing a ticket relative to user interaction with a system. An information technology ticket related to user interaction with an information technology system is received, and a system trace is activated, wherein additional input related to the user interaction with the information technology system is accepted. Information derived from the trace of the information technology system is associated with the information technology ticket. Other variants and embodiments are broadly contemplated herein.Type: GrantFiled: February 28, 2013Date of Patent: June 21, 2016Assignee: International Business Machines CorporationInventors: Pankaj Dhoolia, Diptikalyan Saha, Ram Viswanathan
-
Patent number: 9330011Abstract: A microprocessor includes an instruction cache and a hardware state machine configured to detect a no operation (NOP) slide by counting a continuous sequence of NOP instructions within a stream of instructions fetched from the instruction cache. The microprocessor is configured to suspend execution of the stream of instructions, and transfer control to another routine, in response to detecting the NOP slide.Type: GrantFiled: October 10, 2013Date of Patent: May 3, 2016Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.Inventor: Terry Parks
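The counting idea behind NOP-slide detection can be sketched as a small state machine over a fetched instruction stream. This is a software illustration only: the function name, the threshold parameter, and the use of the x86 single-byte NOP opcode (0x90) as the stand-in for "a NOP instruction" are assumptions, and the abstract does not specify what run length triggers detection.

```python
NOP = 0x90  # assumption: x86 single-byte NOP stands in for any NOP instruction


def find_nop_slide(instructions, threshold):
    """Return the index at which a run of `threshold` consecutive NOPs
    completes, or None if no such run occurs."""
    run = 0
    for i, op in enumerate(instructions):
        # A non-NOP instruction resets the count, so only a continuous
        # sequence of NOPs can reach the threshold.
        run = run + 1 if op == NOP else 0
        if run >= threshold:
            return i
    return None


stream = [0x01, 0x90, 0x90, 0x02, 0x90, 0x90, 0x90]
print(find_nop_slide(stream, threshold=3))
```

In the abstract, reaching the threshold causes the processor to suspend the instruction stream and transfer control to another routine; here the function simply reports where that would happen.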
-
Patent number: 9286069Abstract: Within a processing pipeline 14, issue control circuitry 12 serves to arbitrate write port availability when floating point multiplication instructions are issued into a floating point pipeline 14. If not operating in a flush-to-zero mode, then depending upon the output operands generated denormal handling may or may not be required. A pessimistic assumption is made upon issue that denormal handling will be required and accordingly the write port reserved is a first predetermined number of processing cycles after the start cycle to take account of use of the denormal handling pipeline stage 20. Partway along the processing pipeline 14, state becomes available which indicates whether or not denormal handling is actually required. If denormal handling is not required and a write port is available one processing cycle earlier, then bypass circuitry 22 serves to bypass the denormal handling pipeline stage 20 such that the output operand will be written to the register bank 16 one processing cycle earlier.Type: GrantFiled: December 21, 2012Date of Patent: March 15, 2016Assignee: ARM LimitedInventors: Cédric Denis Robert Airaud, Luca Scalabrino, Frederic Jean Denis Arsanto, Guillaume Schon
-
Patent number: 9251014Abstract: A method for detecting a software-race condition in a program includes copying a state of a transaction of the program from a first core of a multi-core processor to at least one additional core of the multi-core processor, running the transaction, redundantly, on the first core and the at least one additional core given the state, outputting a result of the first core and the at least one additional core, and detecting a difference in the results between the first core and the at least one additional core, wherein the difference indicates the software-race condition.Type: GrantFiled: August 8, 2013Date of Patent: February 2, 2016Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Harold W. Cain, III, David M. Daly, Michael C. Huang, Kattamuri Ekanadham, Jose E. Moreira, Mauricio J. Serrano
-
Patent number: 9250898Abstract: In a processor, a first operation unit outputs as a first operation result, an output of a first comparison operation unit, or an AND or OR of the output and a value already held in a register according to a first control signal. A second operation unit outputs, as a second operation result, an output of a second comparison operation unit, or an AND or OR of the output and a value already held in the register according to a second control signal. A third operation unit outputs, as an execution result, the first operation result, or an AND or OR of the first operation result and the second operation result to the register according to a third control signal. The register newly holds and outputs the execution result from the third operation unit.Type: GrantFiled: November 27, 2012Date of Patent: February 2, 2016Assignee: Renesas Electronics CorporationInventor: Yuki Kobayashi
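The data flow in this abstract (two comparison results, each optionally combined with the register, then combined with each other and written back) can be sketched as below. The control-signal encoding ("pass", "and", "or") and the function names are hypothetical; the abstract only says that each operation unit selects among these behaviors according to its control signal.

```python
def combine(control, a, b):
    """Select a, (a AND b), or (a OR b) according to a control signal."""
    if control == "pass":
        return a
    if control == "and":
        return a and b
    if control == "or":
        return a or b
    raise ValueError(f"unknown control signal: {control}")


def step(register, cmp1, cmp2, ctl1, ctl2, ctl3):
    """Compute the new register value from two comparison outputs."""
    first = combine(ctl1, cmp1, register)    # first operation unit
    second = combine(ctl2, cmp2, register)   # second operation unit
    return combine(ctl3, first, second)      # third operation unit: execution result


# Example: accumulate "(previous OR cmp1) AND cmp2" into the register.
register = True
register = step(register, cmp1=False, cmp2=True, ctl1="or", ctl2="pass", ctl3="and")
print(register)
```

Modelling the signals as Python booleans keeps the sketch short; a hardware description would use single-bit wires and multiplexers instead.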
-
Patent number: 9201801Abstract: A computing device includes: an instruction cache storing primary execution unit instructions and auxiliary execution unit instructions in a sequential order; a primary execution unit configured to receive and execute the primary execution unit instructions from the instruction cache; an auxiliary execution unit configured to receive and execute only the auxiliary execution unit instructions from the instruction cache in a manner independent from and asynchronous to the primary execution unit; and completion circuitry configured to coordinate completion of the primary execution unit instructions by the primary execution unit and the auxiliary execution unit instructions according to the sequential order.Type: GrantFiled: September 15, 2010Date of Patent: December 1, 2015Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Bechara F. Boury, Michael Bryan Mitchell, Paul Michael Steinmetz, Kenichi Tsuchiya
-
Patent number: 9176738Abstract: Method and apparatus for fast decoding of microinstructions are disclosed. An integrated circuit is disclosed wherein microinstructions are queued for execution in an execution unit having multiple pipelines where each pipeline is configured to execute a set of supported microinstructions. The execution unit receives microinstruction data including an operation code (opcode) or a complex opcode. The execution unit executes the microinstruction multiple times wherein the microinstruction is executed at least once to get an address value and at least once to get a result of an operation. The execution unit processes complex opcodes by utilizing both a load/store support and a simple opcode support by splitting the complex opcode into load/store and simple opcode components and creating an internal source/destination between the two components.Type: GrantFiled: January 12, 2011Date of Patent: November 3, 2015Assignee: Advanced Micro Devices, Inc.Inventors: Ganesh Venkataramanan, Emil Talpes
-
Patent number: 9087409Abstract: This disclosure describes techniques for reducing memory access bandwidth in a graphics processing system based on destination alpha values. The techniques may include retrieving a destination alpha value from a bin buffer, the destination alpha value being generated in response to processing a first pixel associated with a first primitive. The techniques may further include determining, based on the destination alpha value, whether to perform an action that causes one or more texture values for a second pixel to not be retrieved from a texture buffer. In some examples, the action may include discarding the second pixel from a pixel processing pipeline prior to the second pixel arriving at a texture mapping stage of the pixel processing pipeline. The second pixel may be associated with a second primitive different than the first primitive.Type: GrantFiled: March 1, 2012Date of Patent: July 21, 2015Assignee: QUALCOMM IncorporatedInventor: Andrew Gruber