Patents Examined by Daniel H. Pan
  • Patent number: 10013391
    Abstract: An integrated circuit includes a plurality of processor core. Processing instructions in the integrated circuit includes: managing a plurality of sets of processor cores, each set including one or more processor cores assigned to a function associated with executing instructions; and reconfiguring the number of processor cores assigned to at least one of the sets during execution based on characteristics associated with executing the instructions.
    Type: Grant
    Filed: August 19, 2013
    Date of Patent: July 3, 2018
    Assignee: Massachusetts Institute of Technology
    Inventors: Anant Agarwal, David M. Wentzlaff
  • Patent number: 10013258
    Abstract: Embodiments are directed to a method of adjusting an index, wherein the index identifies a location of an element within an array. The method includes executing, by a computer, a single instruction that adjusts a first parameter of the index to match a parameter of an array address. The single instruction further adjusts a second parameter of the index to match a parameter of the array element. The adjustment of the first parameter includes a sign extension.
    Type: Grant
    Filed: September 29, 2014
    Date of Patent: July 3, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Michael K. Gschwind
  • Patent number: 10013255
    Abstract: A method includes, in a processor, processing a sequence of pre-compiled instructions by an instruction pipeline of the processor. A first block of instructions is identified in the instructions flowing via the pipeline. The first block includes a conditional branch instruction that conditionally diverges execution of the instructions into at least first and second flow-control traces that differ from one another in multiple instructions and converge at a given instruction that is again common to the first and second flow-control traces. A second block of instructions, which is logically equivalent to the first block but replaces the first and second flow-control traces by a single flow-control trace, is created by the processor at runtime. The pipeline is caused to execute the second block instead of the first block.
    Type: Grant
    Filed: March 23, 2016
    Date of Patent: July 3, 2018
    Assignee: CENTIPEDE SEMI LTD.
    Inventors: Jonathan Friedmann, Ido Goren, Shay Koren, Noam Mizrahi, Alberto Mandler
  • Patent number: 10007522
    Abstract: A branch instruction and a corresponding branch instruction address are received at a data processing system. A first value is received and is compared to a portion of the branch instruction address. An entry at a branch target buffer corresponding to the branch instruction is selectively allocated based on a result of the comparing.
    Type: Grant
    Filed: May 20, 2014
    Date of Patent: June 26, 2018
    Assignee: NXP USA, Inc.
    Inventors: Jeffrey W. Scott, William C. Moyer
  • Patent number: 9996351
    Abstract: A computer processor includes a branch prediction unit that includes a local branch predictor and a global branch predictor. Managing power consumption in such a computer processor includes, for each of a plurality of branch instructions: performing, by the local branch predictor, a local branch prediction; performing, by each of the global branch predictors, a global branch prediction; determining to utilize the local branch prediction over the global branch predictions as a branch prediction for the branch instruction; incrementing a value of a counter; determining whether the value of the counter exceeds a predetermined threshold; and if the value of the counter exceeds the predetermined threshold, powering down at least one of the global branch predictors and configuring the branch prediction unit to bypass the powered down global branch predictor for branch predictions of subsequent branch instructions.
    Type: Grant
    Filed: May 26, 2016
    Date of Patent: June 12, 2018
    Assignee: International Business Machines Corporation
    Inventors: David S. Levitan, Nicholas R. Orzol, Robert A. Philhower
  • Patent number: 9990198
    Abstract: A method for forwarding data from the store instructions to a corresponding load instruction in an out of order processor. The method includes accessing an incoming sequence of instructions, and of said sequence of instructions, splitting store instructions into a store address instruction and a store data instruction, wherein the store address performs address calculation and fetch, and wherein the store data performs a load of register contents to a memory address. The method further includes, of said sequence of instructions, splitting load instructions into a load address instruction and a load data instruction, wherein the load address performs address calculation and fetch, and wherein the load data performs a load of memory address contents into a register, and reordering the store address and load address instructions earlier and further away from LD/SD the instruction sequence to enable earlier dispatch and execution of the loads and the stores.
    Type: Grant
    Filed: December 11, 2014
    Date of Patent: June 5, 2018
    Assignee: Intel Corporation
    Inventors: Mohammad A. Abdallah, Gregory A. Woods
  • Patent number: 9983880
    Abstract: An apparatus and method are described for improved thread selection. For example, one embodiment of a processor comprises: first logic to maintain a history table comprising a plurality of entries, each entry in the table associated with an instruction and including history data indicating prior hits and/or misses to a cache level and/or a translation lookaside buffer (TLB) for that instruction; and second logic to select a particular thread for execution at a particular processor pipeline stage based on the history data.
    Type: Grant
    Filed: September 26, 2014
    Date of Patent: May 29, 2018
    Assignee: Intel Corporation
    Inventors: Rekai Gonzalez-Alberquilla, Tanausu Ramirez, Josep M. Codina, Enric Gibert Codina
  • Patent number: 9983875
    Abstract: Operation of a multi-slice processor that includes a plurality of execution slices, a plurality of load/store slices, and an instruction sequencing unit, where operation includes: receiving, at a load/store slice, a load instruction to be issued; determining, at the load/store slice, that the load instruction has not completed and is to be reissued; and responsive to determining that the load instruction is to be reissued, delaying a signal, from the load/store slice to the instruction sequencing unit, that allows the instruction sequencing unit to issue one or more instructions dependent upon the load instruction.
    Type: Grant
    Filed: March 4, 2016
    Date of Patent: May 29, 2018
    Assignee: International Business Machines Corporation
    Inventors: Sundeep Chadha, David A. Hrusecky, Elizabeth A. McGlone, Jennifer L. Molnar
  • Patent number: 9983879
    Abstract: Operation of a multi-slice processor that includes execution slices implementing dynamic switching of instruction issuance order. Such a multi-slice processor includes determining a current issuance order for a plurality of instructions and a change in an operating condition of the multi-slice processor; responsive to determining the change in the operating condition, determining an alternate issuance order for the plurality of instructions; and responsive to determining the alternate issuance order, switching from the current issuance order for the plurality of instructions to the alternate issuance order for the plurality of instructions.
    Type: Grant
    Filed: March 3, 2016
    Date of Patent: May 29, 2018
    Assignee: International Business Machines Corporation
    Inventors: Jeffrey C. Brownscheidle, Sundeep Chadha, Maureen A. Delaney, Dhivya Jeganathan, Dung Q. Nguyen, Salim A. Shah
  • Patent number: 9983884
    Abstract: An apparatus and method for a SIMD structured branching. For example, one embodiment of a processor comprises: an execution unit having a plurality of channels to execute instructions; and a branch unit to process control flow instructions and to maintain a per channel count for each channel and a control instruction count for the control flow instructions, the branch unit to enable and disable the channels based at least on the per channel count.
    Type: Grant
    Filed: September 26, 2014
    Date of Patent: May 29, 2018
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Darin M. Starkey, Thomas A. Piazza
  • Patent number: 9952876
    Abstract: There are provided a system, a method and a computer program product for selecting an active data stream (a lane) while running SPMD (Single Program Multiple Data) code on SIMD (Single Instruction Multiple Data) machine. The machine runs an instruction stream over input data streams. The machine increments lane depth counters of all active lanes upon the thread-PC reaching a branch operation. The machine updates the lane-PC of each active lane according to targets of the branch operation. The machine selects an active lane and activates only lanes whose lane-PCs match the thread-PC. The machine decrements the lane depth counters of the selected active lanes and updates the lane-PC of each active lane upon the instruction stream reaching a first instruction. The machine assigns the lane-PC of a lane with a largest lane depth counter value to the thread-PC and activates all lanes whose lane-PCs match the thread-PC.
    Type: Grant
    Filed: August 26, 2014
    Date of Patent: April 24, 2018
    Assignee: International Business Machines Corporation
    Inventors: Gheorghe Almasi, Jose Moreira, Jessica H. Tseng, Peng Wu
  • Patent number: 9946539
    Abstract: Methods, systems, and apparatus, including an apparatus for accessing a N-dimensional tensor, the apparatus including, for each dimension of the N-dimensional tensor, a partial address offset value element that stores a partial address offset value for the dimension based at least on an initial value for the dimension, a step value for the dimension, and a number of iterations of a loop for the dimension. The apparatus includes a hardware adder and a processor. The processor obtains an instruction to access a particular element of the N-dimensional tensor. The N-dimensional tensor has multiple elements arranged across each of the N dimensions, where N is an integer that is equal to or greater than one. The processor determines, using the partial address offset value elements and the hardware adder, an address of the particular element and outputs data indicating the determined address for accessing the particular element of the N-dimensional tensor.
    Type: Grant
    Filed: May 23, 2017
    Date of Patent: April 17, 2018
    Assignee: Google LLC
    Inventors: Olivier Temam, Harshit Khaitan, Ravi Narayanaswami, Dong Hyuk Woo
  • Patent number: 9940136
    Abstract: Systems and methods are disclosed for reusing fetched and decoded instructions in block-based processor architectures. In one example of the disclosed technology, a system includes a plurality of block-based processor cores and an instruction scheduler. A respective core is capable of executing one or more instruction blocks of a program. The instruction scheduler can be configured to identify a given instruction block of the program that is resident on a first processor core of the processor cores and is to be executed again. The instruction scheduler can be configured to adjust a mapping of instruction blocks in flight so that the given instruction block is re-executed on the first processor core without re-fetching the given instruction block.
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: April 10, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas Christopher Burger, Aaron Smith
  • Patent number: 9934037
    Abstract: A data processing apparatus 2 has processing circuitry 4 which can process multiple parallel threads of processing. A shared instruction decoder 30 decodes program instructions to generate micro-operations to be processed by the processing circuitry 4. The instructions include at least one complex instruction which has multiple micro-operations. Multiple fetch units 8 are provided for fetching the micro-operations generated by the decoder 30 for processing by the processing circuitry 4. Each fetch unit 8 is associated with at least one of the threads. The decoder 30 generates the micro-operations of a complex instruction individually in response to separate decode requests 24 triggered by a fetch unit 8, each decode request 24 identifying which micro-operation of the complex instruction is to be generated by the decoder 30 in response to the decode request 24.
    Type: Grant
    Filed: August 22, 2014
    Date of Patent: April 3, 2018
    Assignee: ARM Limited
    Inventor: Rune Holm
  • Patent number: 9928119
    Abstract: In a multithreaded data processing system including a plurality of processor cores, storage-modifying requests of a plurality of concurrently executing hardware threads are received in a shared queue. The storage-modifying requests include a translation invalidation request of an initiating hardware thread. The translation invalidation request is removed from the shared queue and buffered in sidecar logic in one of a plurality of sidecars each associated with a respective one of the plurality of hardware threads. While the translation invalidation request is buffered in the sidecar, the sidecar logic broadcasts the translation invalidation request so that it is received and processed by the plurality of processor cores. In response to confirmation of completion of processing of the translation invalidation request by the initiating processor core, the sidecar logic removes the translation invalidation request from the sidecar.
    Type: Grant
    Filed: March 28, 2016
    Date of Patent: March 27, 2018
    Assignee: International Business Machines Corporation
    Inventors: Guy L. Guthrie, Hugh Shen, Derek E. Williams
  • Patent number: 9929745
    Abstract: An apparatus and method are described for performing vector compression. For example, one embodiment of a processor comprises: vector compression logic to compress a source vector comprising a plurality of valid data elements and invalid data elements to generate a destination vector in which valid data elements are stored contiguously on one side of the destination vector, the vector compression logic to utilize a bit mask associated with the source vector and comprising a plurality of bits, each bit corresponding to one of the plurality of data elements of the source vector and indicating whether the data element comprises a valid data element or an invalid data element, the vector compression logic to utilize indices of the bit mask and associated bit values of the bit mask to generate a control vector; and shuffle logic to shuffle/permute the data elements of the source vector to the destination vector in accordance with the control vector.
    Type: Grant
    Filed: September 26, 2014
    Date of Patent: March 27, 2018
    Assignee: Intel Corporation
    Inventors: Simon Rubanovich, David M. Russinoff, Amit Gradstein, John W. O'Leary, Zeev Sperber
  • Patent number: 9875104
    Abstract: Methods, systems, and apparatus, including an apparatus for processing an instruction for accessing a N-dimensional tensor, the apparatus including multiple tensor index elements and multiple dimension multiplier elements, where each of the dimension multiplier elements has a corresponding tensor index element. The apparatus includes one or more processors configured to obtain an instruction to access a particular element of a N-dimensional tensor, where the N-dimensional tensor has multiple elements arranged across each of the N dimensions, and where N is an integer that is equal to or greater than one; determine, using one or more tensor index elements of the multiple tensor index elements and one or more dimension multiplier elements of the multiple dimension multiplier elements, an address of the particular element; and output data indicating the determined address for accessing the particular element of the N-dimensional tensor.
    Type: Grant
    Filed: February 3, 2016
    Date of Patent: January 23, 2018
    Assignee: Google LLC
    Inventors: Dong Hyuk Woo, Andrew Everett Phelps
  • Patent number: 8090930
    Abstract: In a pipelined computer architecture in which instructions may be removed from the instruction queue out of sequence, instruction queue status at a cycle K is determined by adding together the number of invalid instructions or free rows in the queue during cycle K?2, the number of instructions issued for cycle K?1 and the number of instructions speculatively issued in cycle K?1 that have produced a cache hit, and subtracting from the sum the number of instructions enqueued for cycle K?1. The result indicates the number of invalid instructions in the queue cycle K. The number of invalid entries instructions, the number of issued instructions, and the number of enqueued instructions are preferably represented as flat vectors, so that adding is performed by shifting in one direction, while subtracting is performed by shifting in the opposite direction. The result is compared with either the number of instructions to be enqueued in the present cycle, which number is encoded, or with a predetermined value.
    Type: Grant
    Filed: January 31, 2003
    Date of Patent: January 3, 2012
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Timothy Charles Fischer, Daniel Lawrence Leibholz, James Arthur Farrell
  • Patent number: 7340585
    Abstract: A fast linked multiprocessor network (22) including a plurality of processing modules (24, 26, 28, 30, 32, and 34) implemented on a field programmable gate array (10) and a plurality of configurable uni-directional links (21, 23, 25, 27, 29, 31) coupled among at least two of the plurality processing modules providing a streaming communication channel between at least two of the plurality of processing modules.
    Type: Grant
    Filed: August 27, 2002
    Date of Patent: March 4, 2008
    Assignee: Xilinx, Inc.
    Inventors: Satish R. Ganesan, Goran Bilski, Usha Prabhu, Ralph D. Wittig
  • Patent number: 7328328
    Abstract: An apparatus and method are provided for extending a microprocessor instruction set to specify non-temporal memory references at the instruction level. The apparatus includes translation logic and extended execution logic. The translation logic translates an extended instruction into a micro instruction sequence. The extended instruction has an extended prefix and an extended prefix tag. The extended prefix specifies a non-temporal access for a memory reference prescribed by the extended instruction, where the non-temporal access cannot be specified by an existing instruction from an existing instruction set. The extended prefix tag indicates the extended prefix, where the extended prefix tag is an otherwise architecturally specified opcode within the existing instruction set. The extended execution logic is coupled to the translation logic. The extended execution logic receives the micro instruction sequence, and executes the non-temporal access to perform the memory reference.
    Type: Grant
    Filed: August 22, 2002
    Date of Patent: February 5, 2008
    Assignee: IP-First, LLC
    Inventors: G. Glenn Henry, Rodney E. Hooker, Terry Parks