Patents Examined by Daniel Pan
-
Patent number: 9665372
Abstract: A processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues by a dispatch routing network provides flexible and efficient use of internal resources. The dispatch routing network is controlled to dynamically vary the relationship between the slices and instruction streams according to execution requirements for the instruction streams and the availability of resources in the instruction execution slices. The instruction execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution and ordinary instruction execution on a per-instruction basis, permitting the mixture of those instruction types. Instructions having an operand width greater than the width of a single instruction execution slice may be processed by multiple instruction execution slices configured to act in concert for the particular instructions.
Type: Grant
Filed: May 12, 2014
Date of Patent: May 30, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
-
Patent number: 9658853
Abstract: A technique for operating a processor includes storing a first result to a writeback buffer, in response to a first execution unit of the processor attempting to write the first result of a first completed instruction to a register file of the processor at a same processor time as a second execution unit of the processor is attempting to write a second result of a second completed instruction to the register file. The writeback buffer is positioned in a dataflow between the first execution unit and the register file. A buffer full indicator logic is used to detect that the writeback buffer is unavailable. A buffer unavailable signal is transmitted, from the buffer full indicator logic, in response to detecting the writeback buffer is unavailable. In response to receiving the buffer unavailable signal, a buffer retrieving logic writes the first result from the writeback buffer to the register file.
Type: Grant
Filed: July 31, 2014
Date of Patent: May 23, 2017
Assignee: GLOBALFOUNDRIES INC
Inventors: Harry Barowski, Tim Niggemeier
-
Patent number: 9652240
Abstract: Methods and apparatus for predicting the value of a stack pointer which store data when an instruction is seen which grows the stack. The information which is stored includes a size parameter which indicates by how much the stack is grown and one or both of: the register ID currently holding the stack pointer value or the current stack pointer value. When a subsequent instruction shrinking the stack is seen, the stored data is searched for one or more entries which have a corresponding size parameter. If such an entry is identified, the other information stored in that entry is used to predict the value of the stack pointer instead of using the instruction to calculate the new stack pointer value. Where register renaming is used, the information in the entry is used to remap the stack pointer to a different physical register.
Type: Grant
Filed: January 14, 2015
Date of Patent: May 16, 2017
Assignee: Imagination Technologies Limited
Inventor: Hugh Jackson
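The grow/shrink bookkeeping described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the class name `StackPointerPredictor`, the method names, the register labels, and the simplification of checking only the most recent entry are all assumptions made for illustration.

```python
class StackPointerPredictor:
    """Toy model: remembers where the stack pointer lived before each growth."""

    def __init__(self):
        # Each entry pairs a growth size with the (hypothetical) physical
        # register ID that held the stack pointer before the stack grew.
        self.entries = []

    def on_grow(self, size, sp_register_id):
        """Record an instruction that grows the stack by `size` bytes."""
        self.entries.append((size, sp_register_id))

    def on_shrink(self, size):
        """Return the predicted register ID, or None if no entry matches.

        Assumption: only the most recent entry is checked for a matching size.
        """
        if self.entries and self.entries[-1][0] == size:
            _, sp_register_id = self.entries.pop()
            return sp_register_id
        return None  # no prediction; the new value has to be calculated


predictor = StackPointerPredictor()
predictor.on_grow(16, "P3")   # outer frame: SP was held in register P3
predictor.on_grow(32, "P7")   # inner frame: SP was held in register P7
inner = predictor.on_shrink(32)
outer = predictor.on_shrink(16)
miss = predictor.on_shrink(8)
```

A matching shrink returns the register recorded at the matching grow, which stands in for remapping the stack pointer under register renaming; a shrink with no matching entry yields no prediction.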
-
Patent number: 9652241
Abstract: Apparatus comprises a processor configured for operation under a sequence of instructions from an instruction set, wherein said processor comprises: means for conditionally inhibiting at least one type of trap, interrupt or exception (TIE) event, wherein, when operating under a sequence of instructions, said inhibition means is inaccessible by said instructions to inhibit the or each type of TIE event, without interrupting said sequence. A data processing apparatus includes a processor adapted to operate under control of program code comprising instructions selected from an instruction set, the apparatus comprising: a predefined memory space providing a predefined addressable memory for storing program code and data, a larger memory space providing a larger addressable memory, means for accessing program code and data within the predefined memory space, and means for controlling the access means so as to enable the access means to access program code located within the larger memory space.
Type: Grant
Filed: April 10, 2007
Date of Patent: May 16, 2017
Assignee: Cambridge Consultants Ltd.
Inventors: Alistair G. Morfey, Karl Leighton Swepson, Neil Edward Johnson
-
Patent number: 9633409
Abstract: Techniques are disclosed relating to predication. In one embodiment, a graphics processing unit is disclosed that includes a first set of architecturally-defined registers configured to store predication information. The graphics processing unit further includes a second set of registers configured to mirror the first set of registers and an execution pipeline configured to discontinue execution of an instruction sequence based on predication information in the second set of registers. In one embodiment, the second set of registers includes one or more registers proximal to an output of the execution pipeline. In some embodiments, the execution pipeline writes back a predicate value determined for a predicate writer to the second set of registers. The first set of architecturally-defined registers is then updated with the predicate value written back to the second set of registers. In some embodiments, the execution pipeline discontinues execution of the instruction sequence without stalling.
Type: Grant
Filed: August 26, 2013
Date of Patent: April 25, 2017
Assignee: Apple Inc.
Inventors: Andrew M. Havlir, Brian K. Reynolds, Michael A. Geary
-
Patent number: 9626188
Abstract: Embodiments relate to a method and computer program product for relative offset branching in a reduced instruction set computing (RISC) architecture. One aspect is a method that includes fetching a branch instruction from an instruction stream having a fixed instruction width. A relative offset value is acquired from the instruction stream. The relative offset value is formatted as an offset relative to a program counter value and sized as a multiple of the fixed instruction width. The relative offset value is added with the program counter value to form a branch target address value. The branch target address value is loaded into a program counter based on the branch instruction. Execution of the instruction stream is redirected to a next instruction based on the branch target address value in the program counter.
Type: Grant
Filed: September 5, 2014
Date of Patent: April 18, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Michael K. Gschwind
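The target calculation in this abstract (offset read from the instruction stream, then added to the program counter) might be sketched as follows. Everything concrete here is an assumption for illustration and does not come from the patent: a 32-bit fixed instruction width, an offset stored in the single word immediately after the branch, a signed two's-complement encoding, an offset measured from the branch's own address, and the placeholder opcode value.

```python
WIDTH_BYTES = 4          # assumed fixed instruction width (32 bits)
MASK32 = 0xFFFFFFFF


def to_signed32(word):
    """Interpret a 32-bit word as a two's-complement signed integer."""
    return word - (1 << 32) if word & 0x80000000 else word


def branch_target(stream, pc):
    """Compute the target of a branch located at byte address `pc`.

    `stream` maps byte addresses to 32-bit words. The offset is assumed to
    occupy exactly one instruction-width word following the branch.
    """
    offset = to_signed32(stream[pc + WIDTH_BYTES])
    return (pc + offset) & MASK32


HYPOTHETICAL_BRANCH_OPCODE = 0xB0000000  # placeholder, not a real encoding

backward = {0x1000: HYPOTHETICAL_BRANCH_OPCODE, 0x1004: 0xFFFFFFF0}  # offset -16
forward = {0x1000: HYPOTHETICAL_BRANCH_OPCODE, 0x1004: 0x00000040}   # offset +64

backward_target = branch_target(backward, 0x1000)
forward_target = branch_target(forward, 0x1000)
```

Loading the returned value into the program counter is what redirects execution; the sketch stops at computing the address.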
-
Patent number: 9619226
Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed horizontal add or subtract of packed data elements in response to a single vector packed horizontal add or subtract instruction that includes a destination vector register operand, a source vector register operand, and an opcode are described.
Type: Grant
Filed: December 23, 2011
Date of Patent: April 11, 2017
Assignee: Intel Corporation
Inventors: Mostafa Hagog, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber
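The abstract does not spell out the element pairing, so the following Python sketch only illustrates one common meaning of "horizontal" add and subtract: combining adjacent elements within a single source vector. The function names and the adjacent-pair semantics are assumptions, not the patented instruction's definition.

```python
def horizontal_add(src):
    """Add each adjacent pair of elements: [a0, a1, a2, a3] -> [a0+a1, a2+a3]."""
    if len(src) % 2:
        raise ValueError("source vector must hold an even number of elements")
    return [src[i] + src[i + 1] for i in range(0, len(src), 2)]


def horizontal_sub(src):
    """Subtract within each adjacent pair: [a0, a1, a2, a3] -> [a0-a1, a2-a3]."""
    if len(src) % 2:
        raise ValueError("source vector must hold an even number of elements")
    return [src[i] - src[i + 1] for i in range(0, len(src), 2)]


sums = horizontal_add([1, 2, 3, 4])
diffs = horizontal_sub([5, 2, 9, 4])
```

The contrast with an ordinary ("vertical") packed add is that the operands come from neighbouring lanes of one register rather than from the same lane of two registers.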
-
Patent number: 9606802
Abstract: A processor system is adapted to carry out a predicate swap instruction of an instruction set to swap, via a data pathway, predicate data in a first predicate data location of a predicate register with data in a corresponding additional predicate data location of a first additional predicate data container and to swap, via a data pathway, predicate data in a second predicate storage location of the predicate register with data in a corresponding additional predicate data location in a second additional predicate data container.
Type: Grant
Filed: March 25, 2011
Date of Patent: March 28, 2017
Assignee: NXP USA, INC.
Inventors: Yuval Peled, Itzhak Barak, Uri Dayan, Amir Kleen, Idan Rozenberg
-
Patent number: 9606803
Abstract: This invention implements a range of interesting technologies in a single block. Each DSP CPU has a streaming engine. The streaming engines include: a SE to L2 interface that can request 512 bits/cycle from L2; a loose binding between SE and L2 interface, to allow a single stream to peak at 1024 bits/cycle; one-way coherence where the SE sees all earlier writes cached in system, but not writes that occur after stream opens; full protection against single-bit data errors within its internal storage via single-bit parity with semi-automatic restart on parity error.
Type: Grant
Filed: July 15, 2014
Date of Patent: March 28, 2017
Assignee: TEXAS INSTRUMENTS INCORPORATED
Inventors: Timothy D. Anderson, Joseph Zbiciak, Duc Quang Bui, Abhijeet A. Chachad, Kai Chirca, Naveen Bhoria, Matthew D. Pierson, Daniel Wu, Ramakrishnan Venkatasubramanian
-
Patent number: 9606804
Abstract: Embodiments relate to a method and computer program product for absolute address branching in a reduced instruction set computing (RISC) architecture. One aspect is a method that includes fetching a branch instruction from an instruction stream having a fixed instruction width. A branch target address value is acquired from the instruction stream. The branch target address value represents a target address of the branch instruction. The branch target address value is formatted as an absolute address and sized as a multiple of the fixed instruction width. The branch target address value is loaded into a program counter based on the branch instruction. Execution of the instruction stream is redirected to a next instruction based on the branch target address value in the program counter.
Type: Grant
Filed: September 5, 2014
Date of Patent: March 28, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Michael K. Gschwind
-
Patent number: 9600194
Abstract: An address and a data size are provided to a rotator. The rotator stores, based on the address and the data size, a data element in a location having a defined number of positions. The data element includes one or more data units and the one or more data units are aligned correctly in one or more positions of the location based on a predefined position in the location to receive a selected data unit of the one or more data units. The rotator replicates a value of a chosen data unit of the one or more data units to one or more other positions of the location.
Type: Grant
Filed: November 25, 2015
Date of Patent: March 21, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Michael K. Gschwind
-
Patent number: 9582323
Abstract: A method for scheduling the execution of a computer instruction includes receiving an entitlement processor resource percentage for a logical partition on a computer system. The logical partition is associated with a hardware thread of a processor of the computer system. The entitlement processor resource percentage for the logical partition is stored in a register of the hardware thread associated with the logical partition. An instruction is received from the logical partition of the computer system and the processor dispatches the instruction based on the entitlement processor resource percentage stored in the register of the hardware thread associated with the logical partition.
Type: Grant
Filed: June 19, 2014
Date of Patent: February 28, 2017
Assignee: International Business Machines Corporation
Inventors: Nitin Gupta, Mehulkumar J. Patel, Deepak C. Shetty
-
Patent number: 9582464
Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector double block packed sum of absolute differences (SAD) in response to a single vector double block packed sum of absolute differences instruction that includes a destination vector register operand, first and second source operands, an immediate, and an opcode are described.
Type: Grant
Filed: December 23, 2011
Date of Patent: February 28, 2017
Assignee: Intel Corporation
Inventors: Elmoustapha Ould-Ahmed-Vall, Mostafa Hagog, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber
-
Patent number: 9582274
Abstract: Corruption of call stacks is detected by using guard words placed in the call stacks. A store guard word instruction is used to store a guard word on a stack frame of a caller routine, and a verify guard word instruction issued by one or more callee routines is used to verify the guard word is an expected value. If the guard word is an unexpected value, corruption is indicated.
Type: Grant
Filed: January 6, 2016
Date of Patent: February 28, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Michael K. Gschwind
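The store-then-verify pattern in this abstract can be mimicked in software with a short Python sketch. The patent describes dedicated instructions; the list-based stack, the function names, the slot layout, and the guard value below are all hypothetical stand-ins chosen for illustration.

```python
GUARD_WORD = 0xC0FFEE42  # hypothetical expected value


def store_guard_word(stack, slot, guard=GUARD_WORD):
    """Caller side: place the guard word in a slot of the caller's frame."""
    stack[slot] = guard


def verify_guard_word(stack, slot, expected=GUARD_WORD):
    """Callee side: return True if the guard word is intact, False if corrupted."""
    return stack[slot] == expected


# A toy stack of 8 words; slot 4 is assumed to be the caller's guard slot,
# and slots 0..3 are assumed to be a callee-local buffer just below it.
stack = [0] * 8
store_guard_word(stack, 4)
intact_before = verify_guard_word(stack, 4)

# Simulate a buffer overrun: the callee writes 5 words into a 4-word buffer.
for i in range(5):
    stack[i] = 0x41414141
intact_after = verify_guard_word(stack, 4)
```

In the sketch the overrun clobbers the guard slot, so the second check fails, which corresponds to the abstract's "corruption is indicated" outcome.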
-
Patent number: 9575815
Abstract: In a multithreaded data processing system including a plurality of processor cores, storage-modifying requests of a plurality of concurrently executing hardware threads are received in a shared queue. The storage-modifying requests include a translation invalidation request of an initiating hardware thread. The translation invalidation request is removed from the shared queue and buffered in sidecar logic in one of a plurality of sidecars each associated with a respective one of the plurality of hardware threads. While the translation invalidation request is buffered in the sidecar, the sidecar logic broadcasts the translation invalidation request so that it is received and processed by the plurality of processor cores. In response to confirmation of completion of processing of the translation invalidation request by the initiating processor core, the sidecar logic removes the translation invalidation request from the sidecar.
Type: Grant
Filed: December 22, 2015
Date of Patent: February 21, 2017
Assignee: International Business Machines Corporation
Inventors: Guy L. Guthrie, Hugh Shen, Derek E. Williams
-
Patent number: 9575541
Abstract: A microprocessor includes a plurality of processing cores, wherein each of the plurality of processing cores instantiates a respective architecturally-visible storage resource. A first core of the plurality of processing cores is configured to encounter an architectural instruction that instructs the first core to update the respective architecturally-visible storage resource of the first core with a value specified by the architectural instruction. The first core is further configured to, in response to encountering the architectural instruction, provide the value to each of the other of the plurality of processing cores and update the respective architecturally-visible storage resource of the first core with the value. Each core of the plurality of processing cores other than the first core is configured to update the respective architecturally-visible storage resource of the core with the value provided by the first core without encountering the architectural instruction.
Type: Grant
Filed: May 19, 2014
Date of Patent: February 21, 2017
Assignee: VIA TECHNOLOGIES, INC.
Inventors: G. Glenn Henry, Stephan Gaskins
-
Patent number: 9563430
Abstract: Embodiments relate to multithreaded branch prediction. An aspect includes a system for dynamically evaluating how to share entries of a multithreaded branch prediction structure. The system includes a first-level branch target buffer coupled to a processor circuit. The processor circuit is configured to perform a method. The method includes receiving a search request to locate branch prediction information associated with the search request, and searching for an entry corresponding to the search request in the first-level branch prediction structure. The entry is not allowed based on a thread state of the entry indicating that the entry has caused a problem on a thread associated with the thread state.
Type: Grant
Filed: March 19, 2014
Date of Patent: February 7, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: James J. Bonanno, Daniel Lipetz, Brian R. Prasky, Anthony Saporito
-
Patent number: 9563427
Abstract: Embodiments relate to a system for relative offset branching in a reduced instruction set computing (RISC) architecture. One aspect is a system that includes memory and a processing circuit communicatively coupled to the memory. The system is configured to perform a method that includes fetching a branch instruction from an instruction stream having a fixed instruction width. A relative offset value is acquired from the instruction stream. The relative offset value is formatted as an offset relative to a program counter value and sized as a multiple of the fixed instruction width. The relative offset value is added with the program counter value to form a branch target address value. The branch target address value is loaded into a program counter based on the branch instruction. Execution of the instruction stream is redirected to a next instruction based on the branch target address value in the program counter.
Type: Grant
Filed: May 30, 2014
Date of Patent: February 7, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Michael K. Gschwind
-
Patent number: 9563558
Abstract: A technique for operating a cache memory of a data processing system includes creating respective pollution vectors to track which of multiple concurrent threads executed by an associated processor core are currently polluted by a store operation resident in the cache memory. Dependencies in a dependency data structure of a store queue of the cache memory are set based on the pollution vectors to reduce unnecessary ordering effects. Store operations are dispatched from the store queue in accordance with the dependencies indicated by the dependency data structure.
Type: Grant
Filed: August 28, 2014
Date of Patent: February 7, 2017
Assignee: International Business Machines Corporation
Inventors: Guy L. Guthrie, Hugh Shen, William J. Starke, Derek E. Williams
-
Patent number: 9552205
Abstract: A processor including a decode unit to receive a vector indexed load plus arithmetic and/or logical (A/L) operation plus store instruction. The instruction is to indicate a source packed memory indices operand that is to have a plurality of packed memory indices. The instruction is also to indicate a source packed data operand that is to have a plurality of packed data elements. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the instruction, is to load a plurality of data elements from memory locations corresponding to the plurality of packed memory indices, perform A/L operations on the plurality of packed data elements of the source packed data operand and the loaded plurality of data elements, and store a plurality of result data elements in the memory locations corresponding to the plurality of packed memory indices.
Type: Grant
Filed: September 27, 2013
Date of Patent: January 24, 2017
Assignee: Intel Corporation
Inventors: Igor Ermolaev, Bret L. Toll, Robert Valentine, Jesus Corbal San Adrian, Gautam B. Doshi, Rama Kishan V. Malladi, Prasenjit Chakraborty
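The three phases named in this abstract (indexed load, arithmetic or logical operation, indexed store) can be laid out in a small Python sketch. The function name, the use of a Python list as memory, the operand order passed to the operation, and the handling of repeated indices (the later lane's store overwrites the earlier one) are assumptions for illustration, not details from the patent.

```python
import operator


def indexed_load_op_store(memory, indices, data, op=operator.add):
    """Apply `op` lane by lane to memory[indices[i]] and data[i], in place."""
    if len(indices) != len(data):
        raise ValueError("indices and data must have the same number of lanes")
    # Phase 1: load one element per packed memory index.
    loaded = [memory[i] for i in indices]
    # Phase 2: combine each loaded element with the matching packed data element.
    results = [op(m, d) for m, d in zip(loaded, data)]
    # Phase 3: store each result back to the location it was loaded from.
    for i, r in zip(indices, results):
        memory[i] = r
    return memory


memory = [10, 20, 30, 40]
indexed_load_op_store(memory, indices=[3, 0], data=[1, 2])

masked = [0b1111, 0b1010]
indexed_load_op_store(masked, indices=[0, 1], data=[0b0101, 0b0110], op=operator.and_)
```

A single call replaces what would otherwise be a separate gather, a vector operation, and a scatter, which is the consolidation the abstract describes.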