Patents Examined by Daniel H. Pan

Architecture emulation in a parallel processing environment

Patent number: 10013391

Abstract: An integrated circuit includes a plurality of processor cores. Processing instructions in the integrated circuit includes: managing a plurality of sets of processor cores, each set including one or more processor cores assigned to a function associated with executing instructions; and reconfiguring the number of processor cores assigned to at least one of the sets during execution based on characteristics associated with executing the instructions.

Type: Grant

Filed: August 19, 2013

Date of Patent: July 3, 2018

Assignee: Massachusetts Institute of Technology

Inventors: Anant Agarwal, David M. Wentzlaff
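
A minimal sketch of the reconfiguration step, not the patented mechanism. It assumes each set is named by its function, that the "characteristic" driving reconfiguration is a per-function demand figure (for example, recent queue occupancy), and that every set keeps at least one core. The function `rebalance` and the set names are hypothetical.

```python
def rebalance(total_cores, demand):
    """Split total_cores across functions in proportion to demand.

    Each function keeps at least one core; cores left over after integer
    rounding go to the functions with the largest fractional share.
    """
    names = list(demand)
    spare = total_cores - len(names)
    if spare < 0:
        raise ValueError("need at least one core per function")
    total_demand = sum(demand.values())
    if total_demand == 0:                      # no signal: treat functions equally
        demand = {n: 1 for n in names}
        total_demand = len(names)
    shares = {n: spare * demand[n] / total_demand for n in names}
    alloc = {n: 1 + int(shares[n]) for n in names}
    leftover = total_cores - sum(alloc.values())
    by_remainder = sorted(names, key=lambda n: shares[n] - int(shares[n]), reverse=True)
    for n in by_remainder[:leftover]:
        alloc[n] += 1
    return alloc


# Re-run rebalance whenever the observed demand changes during execution.
print(rebalance(16, {"decode": 1, "execute": 6, "memory": 1}))   # compute-heavy phase
print(rebalance(16, {"decode": 1, "execute": 1, "memory": 6}))   # memory-heavy phase
```

Largest-remainder rounding keeps the total fixed at `total_cores`, so a reconfiguration only moves cores between sets and never creates or loses one.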
Single instruction array index computation

Patent number: 10013258

Abstract: Embodiments are directed to a method of adjusting an index, wherein the index identifies a location of an element within an array. The method includes executing, by a computer, a single instruction that adjusts a first parameter of the index to match a parameter of an array address. The single instruction further adjusts a second parameter of the index to match a parameter of the array element. The adjustment of the first parameter includes a sign extension.

Type: Grant

Filed: September 29, 2014

Date of Patent: July 3, 2018

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Michael K. Gschwind
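
The abstract folds two adjustments into one instruction. As a software analogy only (not IBM's instruction definition), assume the index is a 32-bit signed value, the array address is 64 bits wide, and the "parameter of the array element" is its size in bytes; the two adjustments are then a sign extension followed by scaling. Both helper names are hypothetical.

```python
def sign_extend(value, from_bits):
    """Interpret the low from_bits of value as a two's-complement integer."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    return (value ^ sign_bit) - sign_bit


def index_to_offset(index32, element_size, address_bits=64):
    """Widen a 32-bit index to the address width (sign extension), scale it
    by the element size, and return the address-width bit pattern."""
    offset = sign_extend(index32, 32) * element_size
    return offset & ((1 << address_bits) - 1)


base = 0x1000
print(hex(base + index_to_offset(5, 8)))   # element 5 of an array of 8-byte elements
```

Without the sign extension, an index of -1 (`0xFFFFFFFF`) would be read as roughly four billion and the scaled offset would point far past the array instead of one element before it.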
Hardware-based run-time mitigation of conditional branches

Patent number: 10013255

Abstract: A method includes, in a processor, processing a sequence of pre-compiled instructions by an instruction pipeline of the processor. A first block of instructions is identified in the instructions flowing via the pipeline. The first block includes a conditional branch instruction that conditionally diverges execution of the instructions into at least first and second flow-control traces that differ from one another in multiple instructions and converge at a given instruction that is again common to the first and second flow-control traces. A second block of instructions, which is logically equivalent to the first block but replaces the first and second flow-control traces by a single flow-control trace, is created by the processor at runtime. The pipeline is caused to execute the second block instead of the first block.

Type: Grant

Filed: March 23, 2016

Date of Patent: July 3, 2018

Assignee: CENTIPEDE SEMI LTD.

Inventors: Jonathan Friedmann, Ido Goren, Shay Koren, Noam Mizrahi, Alberto Mandler
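
The replacement block resembles what compilers call if-conversion. The sketch below is an analogy under that assumption, not Centipede's hardware: `branchy` has two flow-control traces that reconverge at the `return`, while the hypothetical `single_trace` computes both sides and selects with a 0/1 condition flag, so there is one trace and no branch to predict.

```python
def branchy(x, y):
    # First block: the conditional branch splits control flow into two traces.
    if x > y:
        r = x - y
        s = 1
    else:
        r = y - x
        s = -1
    return r + s              # both traces converge at this common point


def single_trace(x, y):
    # Logically equivalent block with a single trace: evaluate both sides
    # and select the results with a 0/1 flag instead of branching.
    c = int(x > y)            # plays the role of a condition-code/predicate bit
    r = c * (x - y) + (1 - c) * (y - x)
    s = c * 1 + (1 - c) * -1
    return r + s


assert all(branchy(x, y) == single_trace(x, y)
           for x in range(-5, 6) for y in range(-5, 6))
```

The trade-off is extra work on both paths in exchange for never mispredicting, which pays off when the two traces are short and the branch is hard to predict.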
System and method for selectively allocating entries at a branch target buffer

Patent number: 10007522

Abstract: A branch instruction and a corresponding branch instruction address are received at a data processing system. A first value is received and is compared to a portion of the branch instruction address. An entry at a branch target buffer corresponding to the branch instruction is selectively allocated based on a result of the comparing.

Type: Grant

Filed: May 20, 2014

Date of Patent: June 26, 2018

Assignee: NXP USA, Inc.

Inventors: Jeffrey W. Scott, William C. Moyer
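
A minimal sketch of the allocation filter. It assumes the "first value" is compared against a fixed bit field of the branch address (here the top 4 bits of a 32-bit address, used as a region tag) and that the buffer is a small table with FIFO replacement. The field position, width, and class name are hypothetical.

```python
class BranchTargetBuffer:
    def __init__(self, num_entries, match_value, field_shift, field_bits):
        self.entries = {}                  # branch address -> target (insertion-ordered)
        self.num_entries = num_entries
        self.match_value = match_value     # the programmable "first value"
        self.field_shift = field_shift
        self.field_mask = (1 << field_bits) - 1

    def should_allocate(self, branch_addr):
        # compare a portion of the branch address with the programmed value
        field = (branch_addr >> self.field_shift) & self.field_mask
        return field == self.match_value

    def update(self, branch_addr, target):
        if branch_addr in self.entries:    # existing entry: just refresh the target
            self.entries[branch_addr] = target
            return True
        if not self.should_allocate(branch_addr):
            return False                   # allocation suppressed by the comparison
        if len(self.entries) >= self.num_entries:
            self.entries.pop(next(iter(self.entries)))   # evict the oldest entry
        self.entries[branch_addr] = target
        return True


btb = BranchTargetBuffer(num_entries=2, match_value=0x1, field_shift=28, field_bits=4)
btb.update(0x1000_0040, 0x1000_0100)       # region 0x1: allocated
btb.update(0x2000_0040, 0x2000_0100)       # region 0x2: not allocated
```

Filtering keeps branches outside the region of interest from evicting useful entries in a small buffer.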
Power management of branch predictors in a computer processor

Patent number: 9996351

Abstract: A computer processor includes a branch prediction unit that includes a local branch predictor and one or more global branch predictors. Managing power consumption in such a computer processor includes, for each of a plurality of branch instructions: performing, by the local branch predictor, a local branch prediction; performing, by each of the global branch predictors, a global branch prediction; determining to utilize the local branch prediction over the global branch predictions as a branch prediction for the branch instruction; incrementing a value of a counter; determining whether the value of the counter exceeds a predetermined threshold; and if the value of the counter exceeds the predetermined threshold, powering down at least one of the global branch predictors and configuring the branch prediction unit to bypass the powered down global branch predictor for branch predictions of subsequent branch instructions.

Type: Grant

Filed: May 26, 2016

Date of Patent: June 12, 2018

Assignee: International Business Machines Corporation

Inventors: David S. Levitan, Nicholas R. Orzol, Robert A. Philhower
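
A sketch of the counter-and-threshold logic, assuming predictors are plain callables from a branch address to a taken/not-taken guess and that the unit reports, per branch, whether it chose the local prediction. Resetting the counter after a power-down and ignoring branches where a global prediction wins are assumptions of this sketch, not details stated in the abstract.

```python
class BranchPredictionUnit:
    """Local predictor plus several global predictors that can be powered down."""

    def __init__(self, local, global_predictors, threshold):
        self.local = local                          # callable: pc -> taken?
        self.global_predictors = list(global_predictors)
        self.powered = [True] * len(self.global_predictors)
        self.threshold = threshold
        self.counter = 0

    def predict(self, pc):
        local_pred = self.local(pc)
        # powered-down predictors are bypassed entirely
        global_preds = [g(pc) for g, on in zip(self.global_predictors, self.powered) if on]
        return local_pred, global_preds

    def record_choice(self, used_local):
        """Call once per branch after the unit picks which prediction to use."""
        if not used_local:
            return                                  # this sketch only counts local wins
        self.counter += 1
        if self.counter > self.threshold and True in self.powered:
            self.powered[self.powered.index(True)] = False   # power down one global predictor
            self.counter = 0                        # assumption: restart counting
```

For a workload where local history alone predicts well, the global tables burn power without improving accuracy, which is the case the counter is meant to detect.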
Instruction definition to implement load store reordering and optimization

Patent number: 9990198

Abstract: A method for forwarding data from store instructions to a corresponding load instruction in an out-of-order processor. The method includes accessing an incoming sequence of instructions and, within that sequence, splitting store instructions into a store address instruction and a store data instruction, wherein the store address instruction performs address calculation and fetch, and wherein the store data instruction writes register contents to a memory address. The method further includes, within that sequence, splitting load instructions into a load address instruction and a load data instruction, wherein the load address instruction performs address calculation and fetch, and wherein the load data instruction loads the contents of a memory address into a register, and reordering the store address and load address instructions earlier in the instruction sequence and further away from the corresponding LD/SD instructions to enable earlier dispatch and execution of the loads and the stores.

Type: Grant

Filed: December 11, 2014

Date of Patent: June 5, 2018

Assignee: Intel Corporation

Inventors: Mohammad A. Abdallah, Gregory A. Woods
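
A rough sketch of split-and-hoist on a toy instruction list. It assumes each instruction is an `(op, dst, srcs)` tuple and that an address half may move above any earlier instruction that does not write one of its source registers. The mnemonics `LA`/`LDD`/`SA`/`SD` are hypothetical labels for the load-address, load-data, store-address, and store-data parts; memory-ordering checks between the data halves are deliberately left out.

```python
def split_memory_ops(program):
    """Split each LD/ST into an address half and a data half."""
    out = []
    for op, dst, srcs in program:
        if op == "LD":                                  # LD dst <- [srcs[0]]
            out += [("LA", None, (srcs[0],)), ("LDD", dst, ())]
        elif op == "ST":                                # ST [srcs[0]] <- srcs[1]
            out += [("SA", None, (srcs[0],)), ("SD", None, (srcs[1],))]
        else:
            out.append((op, dst, srcs))
    return out


def hoist_address_ops(seq):
    """Move each address half as early as its register dependences allow."""
    seq = list(seq)
    for i in range(len(seq)):
        if seq[i][0] not in ("LA", "SA"):
            continue
        j = i
        # stop just below the instruction that produces one of our sources
        while j > 0 and seq[j - 1][1] not in seq[j][2]:
            seq[j - 1], seq[j] = seq[j], seq[j - 1]
            j -= 1
    return seq


prog = [("ADD", "r1", ("r2", "r3")), ("LD", "r4", ("r5",)), ("ST", None, ("r1", "r4"))]
print(hoist_address_ops(split_memory_ops(prog)))
```

Here the load address on `r5` rises to the top, while the store address on `r1` stops right after the `ADD` that writes `r1`.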
Method and apparatus for improved thread selection

Patent number: 9983880

Abstract: An apparatus and method are described for improved thread selection. For example, one embodiment of a processor comprises: first logic to maintain a history table comprising a plurality of entries, each entry in the table associated with an instruction and including history data indicating prior hits and/or misses to a cache level and/or a translation lookaside buffer (TLB) for that instruction; and second logic to select a particular thread for execution at a particular processor pipeline stage based on the history data.

Type: Grant

Filed: September 26, 2014

Date of Patent: May 29, 2018

Assignee: Intel Corporation

Inventors: Rekai Gonzalez-Alberquilla, Tanausu Ramirez, Josep M. Codina, Enric Gibert Codina
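
A minimal sketch of history-guided thread selection, assuming each table entry keeps the last few hit/miss outcomes of one instruction (keyed by its address), that "likely hit" means a majority of recorded outcomes were hits, and that ties go to the lowest thread number. `HistoryTable` and `select_thread` are hypothetical names.

```python
from collections import defaultdict, deque


class HistoryTable:
    """Per-instruction record of recent cache/TLB hits (True) and misses (False)."""

    def __init__(self, depth=4):
        self.hist = defaultdict(lambda: deque(maxlen=depth))

    def record(self, pc, hit):
        self.hist[pc].append(bool(hit))

    def likely_hit(self, pc):
        h = self.hist.get(pc)
        if not h:
            return True                    # no history yet: optimistically assume a hit
        return 2 * sum(h) >= len(h)        # majority vote over the recorded outcomes


def select_thread(next_pc_by_thread, table):
    """Pick the lowest-numbered thread whose next instruction is predicted to hit."""
    for tid in sorted(next_pc_by_thread):
        if table.likely_hit(next_pc_by_thread[tid]):
            return tid
    return min(next_pc_by_thread)          # every candidate looks like a miss: fall back
```

Skipping a thread whose next instruction usually misses avoids occupying the pipeline stage with work that will stall anyway.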
Operation of a multi-slice processor preventing early dependent instruction wakeup

Patent number: 9983875

Abstract: Operation of a multi-slice processor that includes a plurality of execution slices, a plurality of load/store slices, and an instruction sequencing unit, where operation includes: receiving, at a load/store slice, a load instruction to be issued; determining, at the load/store slice, that the load instruction has not completed and is to be reissued; and responsive to determining that the load instruction is to be reissued, delaying a signal, from the load/store slice to the instruction sequencing unit, that allows the instruction sequencing unit to issue one or more instructions dependent upon the load instruction.

Type: Grant

Filed: March 4, 2016

Date of Patent: May 29, 2018

Assignee: International Business Machines Corporation

Inventors: Sundeep Chadha, David A. Hrusecky, Elizabeth A. McGlone, Jennifer L. Molnar
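
A toy model of the delayed wakeup, assuming dependents wait in the sequencer keyed by the tag of the load they consume and that the load/store slice learns at execution time whether the load completed. Class and method names are hypothetical.

```python
class InstructionSequencer:
    def __init__(self):
        self.waiting = {}                      # dependent instruction -> load tag
        self.issued = []

    def add_dependent(self, name, load_tag):
        self.waiting[name] = load_tag

    def wakeup(self, load_tag):
        """Signal from the load/store slice: dependents of load_tag may issue."""
        ready = [n for n, t in self.waiting.items() if t == load_tag]
        for n in ready:
            del self.waiting[n]
            self.issued.append(n)


class LoadStoreSlice:
    def __init__(self, isu):
        self.isu = isu
        self.held = set()                      # loads whose wakeup signal is delayed

    def execute_load(self, tag, completed):
        if completed:
            self.held.discard(tag)
            self.isu.wakeup(tag)               # data available: release dependents
        else:
            self.held.add(tag)                 # load will be reissued: hold the signal
```

Holding the signal keeps dependents from issuing against data that is not there yet, which would otherwise force them to be cancelled and reissued as well.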
Operation of a multi-slice processor implementing dynamic switching of instruction issuance order

Patent number: 9983879

Abstract: Operation of a multi-slice processor that includes execution slices implementing dynamic switching of instruction issuance order. Operation of such a multi-slice processor includes determining a current issuance order for a plurality of instructions and a change in an operating condition of the multi-slice processor; responsive to determining the change in the operating condition, determining an alternate issuance order for the plurality of instructions; and responsive to determining the alternate issuance order, switching from the current issuance order for the plurality of instructions to the alternate issuance order for the plurality of instructions.

Type: Grant

Filed: March 3, 2016

Date of Patent: May 29, 2018

Assignee: International Business Machines Corporation

Inventors: Jeffrey C. Brownscheidle, Sundeep Chadha, Maureen A. Delaney, Dhivya Jeganathan, Dung Q. Nguyen, Salim A. Shah
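
A sketch of switching between two issue policies, assuming the monitored operating condition is a cache miss rate and the alternate order prioritizes loads. Both policies, the threshold, and the names are hypothetical; the abstract does not say which orders or conditions are used.

```python
def oldest_first(instrs):
    return sorted(instrs, key=lambda i: -i["age"])               # larger age = older


def loads_first(instrs):
    return sorted(instrs, key=lambda i: (i["op"] != "load", -i["age"]))


class IssueQueue:
    def __init__(self, threshold):
        self.policy = oldest_first                               # current issuance order
        self.threshold = threshold

    def observe(self, miss_rate):
        # on a change in the operating condition, switch to the alternate order
        self.policy = loads_first if miss_rate > self.threshold else oldest_first

    def issue_order(self, instrs):
        return [i["id"] for i in self.policy(instrs)]


window = [{"id": "a", "op": "add", "age": 3},
          {"id": "b", "op": "load", "age": 1},
          {"id": "c", "op": "mul", "age": 2}]
q = IssueQueue(threshold=0.1)
print(q.issue_order(window))
q.observe(miss_rate=0.3)
print(q.issue_order(window))
```

Starting loads earlier when misses are frequent gives long-latency memory accesses more time to overlap with other work.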
Method and apparatus for SIMD structured branching

Patent number: 9983884

Abstract: An apparatus and method for SIMD structured branching. For example, one embodiment of a processor comprises: an execution unit having a plurality of channels to execute instructions; and a branch unit to process control flow instructions and to maintain a per channel count for each channel and a control instruction count for the control flow instructions, the branch unit to enable and disable the channels based at least on the per channel count.

Type: Grant

Filed: September 26, 2014

Date of Patent: May 29, 2018

Assignee: Intel Corporation

Inventors: Subramaniam Maiyuran, Darin M. Starkey, Thomas A. Piazza
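
A sketch of counter-based channel masking for nested `if`/`else`/`endif`, a well-known alternative to a mask stack. It assumes a channel is enabled exactly when its count is zero and treats the "control instruction count" as the current nesting depth. This is one plausible reading, not Intel's documented design.

```python
class BranchUnit:
    """Counter-based channel masking for nested if/else/endif on a SIMD unit."""

    def __init__(self, num_channels):
        self.count = [0] * num_channels   # per-channel count; 0 means enabled
        self.depth = 0                    # control-instruction (nesting) count

    def enabled(self):
        return [c == 0 for c in self.count]

    def if_(self, cond):
        self.depth += 1
        # already-disabled channels sink one level deeper; enabled channels
        # whose condition is false become disabled at this level
        self.count = [c + 1 if c > 0 or not p else c for c, p in zip(self.count, cond)]

    def else_(self):
        # swap channels enabled at this level (0) with those disabled by the matching if (1)
        self.count = [1 if c == 0 else 0 if c == 1 else c for c in self.count]

    def endif(self):
        self.depth -= 1
        self.count = [c - 1 if c > 0 else 0 for c in self.count]
```

A single small counter per channel replaces a stack of masks: the count records how many enclosing levels currently disable the channel.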
Optimize control-flow convergence on SIMD engine using divergence depth

Patent number: 9952876

Abstract: There are provided a system, a method and a computer program product for selecting an active data stream (a lane) while running SPMD (Single Program Multiple Data) code on a SIMD (Single Instruction Multiple Data) machine. The machine runs an instruction stream over input data streams. The machine increments lane depth counters of all active lanes upon the thread-PC reaching a branch operation. The machine updates the lane-PC of each active lane according to targets of the branch operation. The machine selects an active lane and activates only lanes whose lane-PCs match the thread-PC. The machine decrements the lane depth counters of the selected active lanes and updates the lane-PC of each active lane upon the instruction stream reaching a first instruction. The machine assigns the lane-PC of the lane with the largest lane depth counter value to the thread-PC and activates all lanes whose lane-PCs match the thread-PC.

Type: Grant

Filed: August 26, 2014

Date of Patent: April 24, 2018

Assignee: International Business Machines Corporation

Inventors: Gheorghe Almasi, Jose Moreira, Jessica H. Tseng, Peng Wu
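
A toy model of depth-guided lane selection. It assumes the "first instruction" is the reconvergence point after an if/else, and that after a branch the same rule (deepest lane, lowest index on ties) picks the thread-PC. The class and method names are hypothetical.

```python
class Warp:
    def __init__(self, num_lanes, pc=0):
        self.lane_pc = [pc] * num_lanes
        self.depth = [0] * num_lanes           # lane depth counters
        self.thread_pc = pc
        self.active = [True] * num_lanes

    def branch(self, taken, target, fallthrough):
        for i in range(len(self.lane_pc)):
            if self.active[i]:
                self.depth[i] += 1             # one level deeper in divergence
                self.lane_pc[i] = target if taken[i] else fallthrough
        self._select()

    def reach_convergence(self, join_pc):
        # active lanes finished their path: step out one level and park at the join
        for i in range(len(self.lane_pc)):
            if self.active[i]:
                self.depth[i] -= 1
                self.lane_pc[i] = join_pc
        self._select()

    def _select(self):
        # the lane with the largest depth counter supplies the thread-PC
        best = max(range(len(self.depth)), key=lambda i: self.depth[i])
        self.thread_pc = self.lane_pc[best]
        self.active = [pc == self.thread_pc for pc in self.lane_pc]
```

Preferring the deepest lanes finishes inner divergent paths first, so lanes parked at the join point wait until all of their siblings arrive and the warp reconverges at full width.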
Accessing data in multi-dimensional tensors using adders

Patent number: 9946539

Abstract: Methods, systems, and apparatus, including an apparatus for accessing an N-dimensional tensor, the apparatus including, for each dimension of the N-dimensional tensor, a partial address offset value element that stores a partial address offset value for the dimension based at least on an initial value for the dimension, a step value for the dimension, and a number of iterations of a loop for the dimension. The apparatus includes a hardware adder and a processor. The processor obtains an instruction to access a particular element of the N-dimensional tensor. The N-dimensional tensor has multiple elements arranged across each of the N dimensions, where N is an integer that is equal to or greater than one. The processor determines, using the partial address offset value elements and the hardware adder, an address of the particular element and outputs data indicating the determined address for accessing the particular element of the N-dimensional tensor.

Type: Grant

Filed: May 23, 2017

Date of Patent: April 17, 2018

Assignee: Google LLC

Inventors: Olivier Temam, Harshit Khaitan, Ravi Narayanaswami, Dong Hyuk Woo
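
A sketch of addition-only address generation for a nested-loop tensor walk, assuming one partial offset per dimension that grows by that dimension's step on every iteration and resets to its initial value when the loop wraps. The generator name and parameters are hypothetical.

```python
def addresses_with_adders(shape, steps, initial=None, base=0):
    """Yield element addresses for a loop nest over `shape` using only additions."""
    n = len(shape)
    if any(s <= 0 for s in shape):
        return                               # empty tensor: nothing to visit
    initial = list(initial) if initial is not None else [0] * n
    partial = list(initial)                  # partial address offset per dimension
    index = [0] * n                          # loop iteration count per dimension
    while True:
        addr = base
        for p in partial:                    # the adder sums the partial offsets
            addr += p
        yield addr
        d = n - 1                            # advance the innermost loop first
        while d >= 0:
            index[d] += 1
            partial[d] += steps[d]           # step by addition, no multiply
            if index[d] < shape[d]:
                break
            index[d] = 0                     # loop wrapped: reset and carry outward
            partial[d] = initial[d]
            d -= 1
        if d < 0:
            return


# 2 x 3 tensor of 4-byte elements, row-major: row step 12 bytes, column step 4 bytes
print(list(addresses_with_adders((2, 3), (12, 4))))
```

Each address costs a handful of additions instead of one multiply per dimension, which is the point of keeping the partial offsets in dedicated elements.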
Reuse of decoded instructions

Patent number: 9940136

Abstract: Systems and methods are disclosed for reusing fetched and decoded instructions in block-based processor architectures. In one example of the disclosed technology, a system includes a plurality of block-based processor cores and an instruction scheduler. A respective core is capable of executing one or more instruction blocks of a program. The instruction scheduler can be configured to identify a given instruction block of the program that is resident on a first processor core of the processor cores and is to be executed again. The instruction scheduler can be configured to adjust a mapping of instruction blocks in flight so that the given instruction block is re-executed on the first processor core without re-fetching the given instruction block.

Type: Grant

Filed: June 26, 2015

Date of Patent: April 10, 2018

Assignee: Microsoft Technology Licensing, LLC

Inventors: Douglas Christopher Burger, Aaron Smith
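
A minimal scheduler sketch, assuming each core holds at most one decoded block, that a block is identified by a string, and that new blocks are placed round-robin. `BlockScheduler` is a hypothetical name; the point is only that a resident block is mapped back to its core instead of being fetched again.

```python
class BlockScheduler:
    def __init__(self, num_cores):
        self.resident = [None] * num_cores   # decoded block currently held by each core
        self.fetches = 0
        self.next_core = 0

    def schedule(self, block_id):
        if block_id in self.resident:
            return self.resident.index(block_id)        # re-execute without re-fetching
        core = self.next_core                           # otherwise place round-robin
        self.next_core = (self.next_core + 1) % len(self.resident)
        self.resident[core] = block_id
        self.fetches += 1                               # fetch and decode into that core
        return core
```

A loop body that runs many times is fetched and decoded once, then re-executed on the same core on every later iteration.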
Decoding a complex program instruction corresponding to multiple micro-operations

Patent number: 9934037

Abstract: A data processing apparatus 2 has processing circuitry 4 which can process multiple parallel threads of processing. A shared instruction decoder 30 decodes program instructions to generate micro-operations to be processed by the processing circuitry 4. The instructions include at least one complex instruction which has multiple micro-operations. Multiple fetch units 8 are provided for fetching the micro-operations generated by the decoder 30 for processing by the processing circuitry 4. Each fetch unit 8 is associated with at least one of the threads. The decoder 30 generates the micro-operations of a complex instruction individually in response to separate decode requests 24 triggered by a fetch unit 8, each decode request 24 identifying which micro-operation of the complex instruction is to be generated by the decoder 30 in response to the decode request 24.

Type: Grant

Filed: August 22, 2014

Date of Patent: April 3, 2018

Assignee: ARM Limited

Inventor: Rune Holm
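
A sketch of per-micro-op decode requests, assuming a hypothetical expansion table and one fetch unit per thread that tracks which micro-op to ask for next. The decoder keeps no per-thread state, so requests from different fetch units can interleave freely. The instruction strings are illustrative only.

```python
# hypothetical complex-instruction -> micro-op expansion table
MICRO_OPS = {
    "LDM r0,{r1,r2,r3}": ["LD r1,[r0]", "LD r2,[r0,#4]", "LD r3,[r0,#8]"],
    "ADD r1,r2,r3": ["ADD r1,r2,r3"],
}


def decode(instr, uop_index):
    """Shared decoder: return only the requested micro-op and whether it is the last."""
    uops = MICRO_OPS[instr]
    return uops[uop_index], uop_index == len(uops) - 1


class FetchUnit:
    def __init__(self, program):
        self.program = program
        self.pc = 0
        self.uop_index = 0           # which micro-op of the current instruction to request

    def fetch(self):
        uop, last = decode(self.program[self.pc], self.uop_index)
        if last:
            self.pc += 1             # move to the next instruction
            self.uop_index = 0
        else:
            self.uop_index += 1
        return uop
```

Because each request names the micro-op it wants, a fetch unit that stalls halfway through a complex instruction simply resumes later with the next index.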
Translation entry invalidation in a multithreaded data processing system

Patent number: 9928119

Abstract: In a multithreaded data processing system including a plurality of processor cores, storage-modifying requests of a plurality of concurrently executing hardware threads are received in a shared queue. The storage-modifying requests include a translation invalidation request of an initiating hardware thread. The translation invalidation request is removed from the shared queue and buffered in sidecar logic in one of a plurality of sidecars each associated with a respective one of the plurality of hardware threads. While the translation invalidation request is buffered in the sidecar, the sidecar logic broadcasts the translation invalidation request so that it is received and processed by the plurality of processor cores. In response to confirmation of completion of processing of the translation invalidation request by the initiating processor core, the sidecar logic removes the translation invalidation request from the sidecar.

Type: Grant

Filed: March 28, 2016

Date of Patent: March 27, 2018

Assignee: International Business Machines Corporation

Inventors: Guy L. Guthrie, Hugh Shen, Derek E. Williams
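
A simplified model of the sidecar flow, assuming translation entries map virtual pages to physical pages, that ordinary stores are out of scope, and that completion is confirmed by a separate call from the initiating core. All names are hypothetical.

```python
from collections import deque


class Core:
    def __init__(self):
        self.tlb = {}                                # virtual page -> physical page

    def process_invalidate(self, vpage):
        self.tlb.pop(vpage, None)


class SharedQueueWithSidecars:
    def __init__(self, cores, num_threads):
        self.cores = cores
        self.queue = deque()                         # storage-modifying requests, all threads
        self.sidecars = [None] * num_threads         # one sidecar per hardware thread

    def submit(self, thread, initiating_core, kind, payload):
        self.queue.append((thread, initiating_core, kind, payload))

    def process(self):
        """Move each invalidation into its thread's sidecar and broadcast it."""
        while self.queue:
            thread, core_id, kind, payload = self.queue.popleft()
            if kind != "tlbi":
                continue                             # ordinary stores are not modelled here
            self.sidecars[thread] = (core_id, payload)
            for c in self.cores:
                c.process_invalidate(payload)

    def confirm(self, thread, core_id):
        """The initiating core reports completion: release the sidecar."""
        entry = self.sidecars[thread]
        if entry is not None and entry[0] == core_id:
            self.sidecars[thread] = None
```

Parking the invalidation in a per-thread sidecar frees the shared queue while the broadcast is still outstanding, so other threads' requests need not wait behind it.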
Apparatus and method for vector compression

Patent number: 9929745

Abstract: An apparatus and method are described for performing vector compression. For example, one embodiment of a processor comprises: vector compression logic to compress a source vector comprising a plurality of valid data elements and invalid data elements to generate a destination vector in which valid data elements are stored contiguously on one side of the destination vector, the vector compression logic to utilize a bit mask associated with the source vector and comprising a plurality of bits, each bit corresponding to one of the plurality of data elements of the source vector and indicating whether the data element comprises a valid data element or an invalid data element, the vector compression logic to utilize indices of the bit mask and associated bit values of the bit mask to generate a control vector; and shuffle logic to shuffle/permute the data elements of the source vector to the destination vector in accordance with the control vector.

Type: Grant

Filed: September 26, 2014

Date of Patent: March 27, 2018

Assignee: Intel Corporation

Inventors: Simon Rubanovich, David M. Russinoff, Amit Gradstein, John W. O'Leary, Zeev Sperber
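
A scalar sketch of the two stages (building a control vector from the bit mask, then shuffling), assuming bit `i` of an integer mask marks element `i` as valid, valid elements are packed toward index 0, and unused destination slots are zero-filled. AVX-512 `VPCOMPRESSD` has a similar effect in shipping hardware; the function names here are hypothetical.

```python
def build_control(mask, n):
    """For each destination slot, the index of the source element it reads."""
    control = [None] * n
    pos = 0
    for i in range(n):
        if (mask >> i) & 1:          # valid element: it goes to the next free slot
            control[pos] = i
            pos += 1
    return control


def compress(src, mask):
    control = build_control(mask, len(src))
    # shuffle/permute step driven entirely by the control vector
    return [src[c] if c is not None else 0 for c in control]


print(compress([10, 11, 12, 13, 14, 15, 16, 17], 0b10110010))
```

Separating control-vector generation from the shuffle lets an existing general permute unit do the data movement.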
Accessing data in multi-dimensional tensors

Patent number: 9875104

Abstract: Methods, systems, and apparatus, including an apparatus for processing an instruction for accessing an N-dimensional tensor, the apparatus including multiple tensor index elements and multiple dimension multiplier elements, where each of the dimension multiplier elements has a corresponding tensor index element. The apparatus includes one or more processors configured to obtain an instruction to access a particular element of an N-dimensional tensor, where the N-dimensional tensor has multiple elements arranged across each of the N dimensions, and where N is an integer that is equal to or greater than one; determine, using one or more tensor index elements of the multiple tensor index elements and one or more dimension multiplier elements of the multiple dimension multiplier elements, an address of the particular element; and output data indicating the determined address for accessing the particular element of the N-dimensional tensor.

Type: Grant

Filed: February 3, 2016

Date of Patent: January 23, 2018

Assignee: Google LLC

Inventors: Dong Hyuk Woo, Andrew Everett Phelps
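
The address computation reduces to a dot product of tensor indices and per-dimension multipliers. A sketch assuming row-major layout, where each multiplier is the byte stride of its dimension; both helper names are hypothetical.

```python
def row_major_multipliers(shape, elem_size):
    """Byte stride of each dimension for a row-major tensor."""
    mult = [elem_size] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        mult[d] = mult[d + 1] * shape[d + 1]
    return mult


def tensor_address(base, index, multipliers):
    """address = base + sum(index[d] * multipliers[d]) over the N dimensions."""
    if len(index) != len(multipliers):
        raise ValueError("one multiplier per dimension")
    addr = base
    for i, m in zip(index, multipliers):
        addr += i * m
    return addr


mult = row_major_multipliers((2, 3, 4), elem_size=4)
print(mult, hex(tensor_address(0x1000, (1, 2, 3), mult)))
```

Storing the multipliers in dedicated elements lets the same instruction serve any layout (row-major, column-major, padded) by loading different values.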
Method and circuits for early detection of a full queue

Patent number: 8090930

Abstract: In a pipelined computer architecture in which instructions may be removed from the instruction queue out of sequence, instruction queue status at a cycle K is determined by adding together the number of invalid instructions or free rows in the queue during cycle K-2, the number of instructions issued for cycle K-1, and the number of instructions speculatively issued in cycle K-1 that have produced a cache hit, and subtracting from the sum the number of instructions enqueued for cycle K-1. The result indicates the number of invalid instructions in the queue at cycle K. The number of invalid entries, the number of issued instructions, and the number of enqueued instructions are preferably represented as flat vectors, so that adding is performed by shifting in one direction, while subtracting is performed by shifting in the opposite direction. The result is compared either with the number of instructions to be enqueued in the present cycle, which number is encoded, or with a predetermined value.

Type: Grant

Filed: January 31, 2003

Date of Patent: January 3, 2012

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Timothy Charles Fischer, Daniel Lawrence Leibholz, James Arthur Farrell
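
The arithmetic is free(K) = free(K-2) + issued(K-1) + spec_hits(K-1) - enqueued(K-1). The sketch computes it twice: once with ordinary integers, and once assuming "flat vector" means a one-hot code in which bit position n stands for the count n, so that adding is a left shift and subtracting a right shift. That encoding and all names are assumptions.

```python
def free_entries(free_km2, issued_km1, spec_hits_km1, enqueued_km1):
    """Plain-integer form of the queue-status formula."""
    return free_km2 + issued_km1 + spec_hits_km1 - enqueued_km1


def one_hot(n):
    return 1 << n


def free_entries_flat(free_km2, issued_km1, spec_hits_km1, enqueued_km1):
    v = one_hot(free_km2)
    v <<= issued_km1            # add by shifting one way
    v <<= spec_hits_km1
    v >>= enqueued_km1          # subtract by shifting the other way
    return v                    # still one-hot: bit position = free entries at cycle K


def queue_full(free_vec, to_enqueue):
    # stall if fewer free entries remain than instructions we want to enqueue now
    return free_vec < one_hot(to_enqueue)
```

With one-hot values, comparing two counts is just comparing the encoded integers, so the full-queue check needs no decoding step.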
Method and system for fast linked processor in a system on a chip (SoC)

Patent number: 7340585

Abstract: A fast linked multiprocessor network (22) including a plurality of processing modules (24, 26, 28, 30, 32, and 34) implemented on a field programmable gate array (10) and a plurality of configurable uni-directional links (21, 23, 25, 27, 29, 31) coupled among at least two of the plurality of processing modules, providing a streaming communication channel between at least two of the plurality of processing modules.

Type: Grant

Filed: August 27, 2002

Date of Patent: March 4, 2008

Assignee: Xilinx, Inc.

Inventors: Satish R. Ganesan, Goran Bilski, Usha Prabhu, Ralph D. Wittig
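
A behavioral sketch of uni-directional streaming links, assuming each link is a bounded FIFO that stalls its writer when full and its reader when empty. The three-module pipeline (square, then add one) is a hypothetical example, not taken from the patent.

```python
from collections import deque


class StreamLink:
    """Uni-directional FIFO channel from one module (writer) to another (reader)."""

    def __init__(self, depth):
        self.fifo = deque()
        self.depth = depth

    def write(self, word):
        if len(self.fifo) >= self.depth:
            return False                 # full: the writer must stall
        self.fifo.append(word)
        return True

    def read(self):
        if not self.fifo:
            return None                  # empty: the reader must stall
        return self.fifo.popleft()


def run_pipeline(inputs, depth=2):
    a_to_b, b_to_out = StreamLink(depth), StreamLink(depth)
    outputs, pending = [], list(inputs)
    while pending or a_to_b.fifo or b_to_out.fifo:
        word = b_to_out.read()           # sink module drains its input link
        if word is not None:
            outputs.append(word)
        if a_to_b.fifo and len(b_to_out.fifo) < b_to_out.depth:
            b_to_out.write(a_to_b.read() + 1)            # middle module: transform, forward
        if pending and a_to_b.write(pending[0] * pending[0]):
            pending.pop(0)               # source module: push the next word if there is room
    return outputs
```

Point-to-point FIFOs avoid shared-bus arbitration: each pair of modules streams data at its own pace, and back-pressure is just a full FIFO.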
Non-temporal memory reference control mechanism

Patent number: 7328328

Abstract: An apparatus and method are provided for extending a microprocessor instruction set to specify non-temporal memory references at the instruction level. The apparatus includes translation logic and extended execution logic. The translation logic translates an extended instruction into a micro instruction sequence. The extended instruction has an extended prefix and an extended prefix tag. The extended prefix specifies a non-temporal access for a memory reference prescribed by the extended instruction, where the non-temporal access cannot be specified by an existing instruction from an existing instruction set. The extended prefix tag indicates the extended prefix, where the extended prefix tag is an otherwise architecturally specified opcode within the existing instruction set. The extended execution logic is coupled to the translation logic. The extended execution logic receives the micro instruction sequence, and executes the non-temporal access to perform the memory reference.

Type: Grant

Filed: August 22, 2002

Date of Patent: February 5, 2008

Assignee: IP-First, LLC

Inventors: G. Glenn Henry, Rodney E. Hooker, Terry Parks
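
A toy decoder for the prefix scheme, assuming a one-byte tag (an existing opcode repurposed) followed by a one-byte extended prefix whose low bit requests a non-temporal access. All three byte values are hypothetical, and operand bytes are omitted from the toy encoding.

```python
EXT_TAG = 0xF1    # hypothetical: an existing opcode reused as the extended-prefix tag
NT_BIT = 0x01     # hypothetical: extended-prefix bit requesting a non-temporal access
LOAD = 0x8B       # hypothetical load opcode (operand bytes omitted)


def translate(code):
    """Translate a toy byte stream into (micro_op, non_temporal) pairs."""
    uops, i = [], 0
    while i < len(code):
        non_temporal = False
        if code[i] == EXT_TAG:                   # tag: the next byte is the extended prefix
            if i + 1 >= len(code):
                raise ValueError("extended prefix tag at end of stream")
            non_temporal = bool(code[i + 1] & NT_BIT)
            i += 2
            if i >= len(code):
                raise ValueError("extended prefix without an instruction")
        opcode = code[i]
        i += 1
        name = "load" if opcode == LOAD else "op_%02x" % opcode
        uops.append((name, non_temporal))        # execution logic sees the NT flag
    return uops
```

Reusing an already-defined opcode as the tag lets the extension coexist with the existing instruction set without claiming a new opcode byte.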
