Patents Examined by Daniel H. Pan
-
Patent number: 10013391
Abstract: An integrated circuit includes a plurality of processor cores. Processing instructions in the integrated circuit includes: managing a plurality of sets of processor cores, each set including one or more processor cores assigned to a function associated with executing instructions; and reconfiguring the number of processor cores assigned to at least one of the sets during execution based on characteristics associated with executing the instructions.
Type: Grant
Filed: August 19, 2013
Date of Patent: July 3, 2018
Assignee: Massachusetts Institute of Technology
Inventors: Anant Agarwal, David M. Wentzlaff
-
Patent number: 10013258
Abstract: Embodiments are directed to a method of adjusting an index, wherein the index identifies a location of an element within an array. The method includes executing, by a computer, a single instruction that adjusts a first parameter of the index to match a parameter of an array address. The single instruction further adjusts a second parameter of the index to match a parameter of the array element. The adjustment of the first parameter includes a sign extension.
Type: Grant
Filed: September 29, 2014
Date of Patent: July 3, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Michael K. Gschwind
-
Patent number: 10013255
Abstract: A method includes, in a processor, processing a sequence of pre-compiled instructions by an instruction pipeline of the processor. A first block of instructions is identified in the instructions flowing via the pipeline. The first block includes a conditional branch instruction that conditionally diverges execution of the instructions into at least first and second flow-control traces that differ from one another in multiple instructions and converge at a given instruction that is again common to the first and second flow-control traces. A second block of instructions, which is logically equivalent to the first block but replaces the first and second flow-control traces by a single flow-control trace, is created by the processor at runtime. The pipeline is caused to execute the second block instead of the first block.
Type: Grant
Filed: March 23, 2016
Date of Patent: July 3, 2018
Assignee: CENTIPEDE SEMI LTD.
Inventors: Jonathan Friedmann, Ido Goren, Shay Koren, Noam Mizrahi, Alberto Mandler
-
Patent number: 10007522
Abstract: A branch instruction and a corresponding branch instruction address are received at a data processing system. A first value is received and is compared to a portion of the branch instruction address. An entry at a branch target buffer corresponding to the branch instruction is selectively allocated based on a result of the comparing.
Type: Grant
Filed: May 20, 2014
Date of Patent: June 26, 2018
Assignee: NXP USA, Inc.
Inventors: Jeffrey W. Scott, William C. Moyer
-
Patent number: 9996351
Abstract: A computer processor includes a branch prediction unit that includes a local branch predictor and a global branch predictor. Managing power consumption in such a computer processor includes, for each of a plurality of branch instructions: performing, by the local branch predictor, a local branch prediction; performing, by each of the global branch predictors, a global branch prediction; determining to utilize the local branch prediction over the global branch predictions as a branch prediction for the branch instruction; incrementing a value of a counter; determining whether the value of the counter exceeds a predetermined threshold; and if the value of the counter exceeds the predetermined threshold, powering down at least one of the global branch predictors and configuring the branch prediction unit to bypass the powered down global branch predictor for branch predictions of subsequent branch instructions.
Type: Grant
Filed: May 26, 2016
Date of Patent: June 12, 2018
Assignee: International Business Machines Corporation
Inventors: David S. Levitan, Nicholas R. Orzol, Robert A. Philhower
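Illustration (not part of the patent record): the counter-and-threshold policy described in this abstract can be sketched in a few lines of Python. The class name, the `record` method, and the single-global-predictor simplification are assumptions made for this sketch; the abstract does not specify how the counter is reset or how predictors are powered back up, so neither is modeled.

```python
class PredictorPowerManager:
    """Toy model of powering down a global branch predictor (hypothetical names)."""

    def __init__(self, threshold):
        self.threshold = threshold      # predetermined threshold from the abstract
        self.local_wins = 0             # counter of branches where the local prediction was used
        self.global_powered = True      # whether the global predictor is still powered

    def record(self, used_local):
        """Record one branch; return True while the global predictor stays powered."""
        if used_local and self.global_powered:
            self.local_wins += 1
            if self.local_wins > self.threshold:
                # Counter exceeded the threshold: power down and bypass from now on.
                self.global_powered = False
        return self.global_powered


manager = PredictorPowerManager(threshold=2)
states = [manager.record(True) for _ in range(3)]
```

With a threshold of 2, the third consecutive branch that uses the local prediction pushes the counter past the threshold, so the sketch reports the global predictor as powered down from that point on.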
-
Patent number: 9990198
Abstract: A method for forwarding data from the store instructions to a corresponding load instruction in an out of order processor. The method includes accessing an incoming sequence of instructions, and of said sequence of instructions, splitting store instructions into a store address instruction and a store data instruction, wherein the store address performs address calculation and fetch, and wherein the store data performs a load of register contents to a memory address. The method further includes, of said sequence of instructions, splitting load instructions into a load address instruction and a load data instruction, wherein the load address performs address calculation and fetch, and wherein the load data performs a load of memory address contents into a register, and reordering the store address and load address instructions earlier and further away from the load data/store data (LD/SD) instructions in the instruction sequence to enable earlier dispatch and execution of the loads and the stores.
Type: Grant
Filed: December 11, 2014
Date of Patent: June 5, 2018
Assignee: Intel Corporation
Inventors: Mohammad A. Abdallah, Gregory A. Woods
-
Patent number: 9983880
Abstract: An apparatus and method are described for improved thread selection. For example, one embodiment of a processor comprises: first logic to maintain a history table comprising a plurality of entries, each entry in the table associated with an instruction and including history data indicating prior hits and/or misses to a cache level and/or a translation lookaside buffer (TLB) for that instruction; and second logic to select a particular thread for execution at a particular processor pipeline stage based on the history data.
Type: Grant
Filed: September 26, 2014
Date of Patent: May 29, 2018
Assignee: Intel Corporation
Inventors: Rekai Gonzalez-Alberquilla, Tanausu Ramirez, Josep M. Codina, Enric Gibert Codina
-
Patent number: 9983875
Abstract: Operation of a multi-slice processor that includes a plurality of execution slices, a plurality of load/store slices, and an instruction sequencing unit, where operation includes: receiving, at a load/store slice, a load instruction to be issued; determining, at the load/store slice, that the load instruction has not completed and is to be reissued; and responsive to determining that the load instruction is to be reissued, delaying a signal, from the load/store slice to the instruction sequencing unit, that allows the instruction sequencing unit to issue one or more instructions dependent upon the load instruction.
Type: Grant
Filed: March 4, 2016
Date of Patent: May 29, 2018
Assignee: International Business Machines Corporation
Inventors: Sundeep Chadha, David A. Hrusecky, Elizabeth A. McGlone, Jennifer L. Molnar
-
Patent number: 9983879
Abstract: Operation of a multi-slice processor that includes execution slices implementing dynamic switching of instruction issuance order. Operation of such a multi-slice processor includes determining a current issuance order for a plurality of instructions and a change in an operating condition of the multi-slice processor; responsive to determining the change in the operating condition, determining an alternate issuance order for the plurality of instructions; and responsive to determining the alternate issuance order, switching from the current issuance order for the plurality of instructions to the alternate issuance order for the plurality of instructions.
Type: Grant
Filed: March 3, 2016
Date of Patent: May 29, 2018
Assignee: International Business Machines Corporation
Inventors: Jeffrey C. Brownscheidle, Sundeep Chadha, Maureen A. Delaney, Dhivya Jeganathan, Dung Q. Nguyen, Salim A. Shah
-
Patent number: 9983884
Abstract: An apparatus and method for SIMD structured branching. For example, one embodiment of a processor comprises: an execution unit having a plurality of channels to execute instructions; and a branch unit to process control flow instructions and to maintain a per channel count for each channel and a control instruction count for the control flow instructions, the branch unit to enable and disable the channels based at least on the per channel count.
Type: Grant
Filed: September 26, 2014
Date of Patent: May 29, 2018
Assignee: Intel Corporation
Inventors: Subramaniam Maiyuran, Darin M. Starkey, Thomas A. Piazza
-
Patent number: 9952876
Abstract: There are provided a system, a method and a computer program product for selecting an active data stream (a lane) while running SPMD (Single Program Multiple Data) code on a SIMD (Single Instruction Multiple Data) machine. The machine runs an instruction stream over input data streams. The machine increments lane depth counters of all active lanes upon the thread-PC reaching a branch operation. The machine updates the lane-PC of each active lane according to targets of the branch operation. The machine selects an active lane and activates only lanes whose lane-PCs match the thread-PC. The machine decrements the lane depth counters of the selected active lanes and updates the lane-PC of each active lane upon the instruction stream reaching a first instruction. The machine assigns the lane-PC of a lane with a largest lane depth counter value to the thread-PC and activates all lanes whose lane-PCs match the thread-PC.
Type: Grant
Filed: August 26, 2014
Date of Patent: April 24, 2018
Assignee: International Business Machines Corporation
Inventors: Gheorghe Almasi, Jose Moreira, Jessica H. Tseng, Peng Wu
-
Patent number: 9946539
Abstract: Methods, systems, and apparatus, including an apparatus for accessing an N-dimensional tensor, the apparatus including, for each dimension of the N-dimensional tensor, a partial address offset value element that stores a partial address offset value for the dimension based at least on an initial value for the dimension, a step value for the dimension, and a number of iterations of a loop for the dimension. The apparatus includes a hardware adder and a processor. The processor obtains an instruction to access a particular element of the N-dimensional tensor. The N-dimensional tensor has multiple elements arranged across each of the N dimensions, where N is an integer that is equal to or greater than one. The processor determines, using the partial address offset value elements and the hardware adder, an address of the particular element and outputs data indicating the determined address for accessing the particular element of the N-dimensional tensor.
Type: Grant
Filed: May 23, 2017
Date of Patent: April 17, 2018
Assignee: Google LLC
Inventors: Olivier Temam, Harshit Khaitan, Ravi Narayanaswami, Dong Hyuk Woo
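Illustration (not part of the patent record): one way to read this abstract is that each dimension keeps a running partial offset that starts at its initial value, advances by its step value on each loop iteration, and resets when its loop wraps, while an adder sums the partial offsets to form the element address. The Python sketch below follows that reading; the function name, the `(initial, step, iterations)` tuple layout, the `base` parameter, and the innermost-first update order are all assumptions of the sketch, not details taken from the patent.

```python
def tensor_addresses(base, dims):
    """Yield element addresses for nested loops over a tensor.

    dims is a list of (initial, step, iterations) tuples, outermost dimension
    first (hypothetical layout). Every iterations value is assumed to be >= 1.
    """
    offsets = [initial for initial, _, _ in dims]  # one partial offset per dimension
    counts = [0] * len(dims)                       # loop counters per dimension
    while True:
        # The "adder": the address is the base plus the sum of partial offsets.
        yield base + sum(offsets)
        d = len(dims) - 1                          # advance the innermost loop first
        while d >= 0:
            initial, step, iterations = dims[d]
            counts[d] += 1
            if counts[d] < iterations:
                offsets[d] += step                 # next iteration of this loop
                break
            counts[d] = 0                          # loop wrapped: reset and carry outward
            offsets[d] = initial
            d -= 1
        if d < 0:
            return                                 # outermost loop finished


# A 2x3 row-major tensor starting at address 100.
addresses = list(tensor_addresses(100, [(0, 3, 2), (0, 1, 3)]))
```

Only additions are needed per access in this scheme, which is presumably the appeal of keeping per-dimension partial offsets rather than recomputing index-times-stride products.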
-
Patent number: 9940136
Abstract: Systems and methods are disclosed for reusing fetched and decoded instructions in block-based processor architectures. In one example of the disclosed technology, a system includes a plurality of block-based processor cores and an instruction scheduler. A respective core is capable of executing one or more instruction blocks of a program. The instruction scheduler can be configured to identify a given instruction block of the program that is resident on a first processor core of the processor cores and is to be executed again. The instruction scheduler can be configured to adjust a mapping of instruction blocks in flight so that the given instruction block is re-executed on the first processor core without re-fetching the given instruction block.
Type: Grant
Filed: June 26, 2015
Date of Patent: April 10, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Douglas Christopher Burger, Aaron Smith
-
Patent number: 9934037
Abstract: A data processing apparatus 2 has processing circuitry 4 which can process multiple parallel threads of processing. A shared instruction decoder 30 decodes program instructions to generate micro-operations to be processed by the processing circuitry 4. The instructions include at least one complex instruction which has multiple micro-operations. Multiple fetch units 8 are provided for fetching the micro-operations generated by the decoder 30 for processing by the processing circuitry 4. Each fetch unit 8 is associated with at least one of the threads. The decoder 30 generates the micro-operations of a complex instruction individually in response to separate decode requests 24 triggered by a fetch unit 8, each decode request 24 identifying which micro-operation of the complex instruction is to be generated by the decoder 30 in response to the decode request 24.
Type: Grant
Filed: August 22, 2014
Date of Patent: April 3, 2018
Assignee: ARM Limited
Inventor: Rune Holm
-
Patent number: 9928119
Abstract: In a multithreaded data processing system including a plurality of processor cores, storage-modifying requests of a plurality of concurrently executing hardware threads are received in a shared queue. The storage-modifying requests include a translation invalidation request of an initiating hardware thread. The translation invalidation request is removed from the shared queue and buffered in sidecar logic in one of a plurality of sidecars each associated with a respective one of the plurality of hardware threads. While the translation invalidation request is buffered in the sidecar, the sidecar logic broadcasts the translation invalidation request so that it is received and processed by the plurality of processor cores. In response to confirmation of completion of processing of the translation invalidation request by the initiating processor core, the sidecar logic removes the translation invalidation request from the sidecar.
Type: Grant
Filed: March 28, 2016
Date of Patent: March 27, 2018
Assignee: International Business Machines Corporation
Inventors: Guy L. Guthrie, Hugh Shen, Derek E. Williams
-
Patent number: 9929745
Abstract: An apparatus and method are described for performing vector compression. For example, one embodiment of a processor comprises: vector compression logic to compress a source vector comprising a plurality of valid data elements and invalid data elements to generate a destination vector in which valid data elements are stored contiguously on one side of the destination vector, the vector compression logic to utilize a bit mask associated with the source vector and comprising a plurality of bits, each bit corresponding to one of the plurality of data elements of the source vector and indicating whether the data element comprises a valid data element or an invalid data element, the vector compression logic to utilize indices of the bit mask and associated bit values of the bit mask to generate a control vector; and shuffle logic to shuffle/permute the data elements of the source vector to the destination vector in accordance with the control vector.
Type: Grant
Filed: September 26, 2014
Date of Patent: March 27, 2018
Assignee: Intel Corporation
Inventors: Simon Rubanovich, David M. Russinoff, Amit Gradstein, John W. O'Leary, Zeev Sperber
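Illustration (not part of the patent record): the two-stage idea in this abstract, deriving a control vector from the bit mask and then permuting the source elements according to it, can be modeled in software as follows. The function names, the list-based vectors, the choice to pack valid elements toward index 0, and the `fill` value for unused destination slots are assumptions of this sketch; the abstract does not describe the hardware at this level.

```python
def control_vector(mask):
    """Return the source indices of valid elements, in order (mask bit 1 = valid)."""
    return [index for index, bit in enumerate(mask) if bit]


def compress(source, mask, fill=0):
    """Pack the valid elements of source contiguously at the low end of the result."""
    if len(source) != len(mask):
        raise ValueError("source and mask must have the same length")
    control = control_vector(mask)
    # Shuffle step: destination slot j takes the source element at control[j].
    destination = [source[index] for index in control]
    # Remaining slots hold no valid data; pad them with the assumed fill value.
    destination += [fill] * (len(source) - len(destination))
    return destination


packed = compress([10, 20, 30, 40], [1, 0, 1, 1])
```

Separating the control-vector computation from the shuffle mirrors the split between the compression logic and the shuffle logic named in the abstract.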
-
Patent number: 9875104
Abstract: Methods, systems, and apparatus, including an apparatus for processing an instruction for accessing an N-dimensional tensor, the apparatus including multiple tensor index elements and multiple dimension multiplier elements, where each of the dimension multiplier elements has a corresponding tensor index element. The apparatus includes one or more processors configured to obtain an instruction to access a particular element of an N-dimensional tensor, where the N-dimensional tensor has multiple elements arranged across each of the N dimensions, and where N is an integer that is equal to or greater than one; determine, using one or more tensor index elements of the multiple tensor index elements and one or more dimension multiplier elements of the multiple dimension multiplier elements, an address of the particular element; and output data indicating the determined address for accessing the particular element of the N-dimensional tensor.
Type: Grant
Filed: February 3, 2016
Date of Patent: January 23, 2018
Assignee: Google LLC
Inventors: Dong Hyuk Woo, Andrew Everett Phelps
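Illustration (not part of the patent record): pairing each tensor index with a dimension multiplier suggests the familiar strided-address calculation, in which the address is the sum of index-times-multiplier products. The sketch below assumes exactly that; the function name and the `base` parameter are hypothetical, and the abstract does not say whether a base address is added.

```python
def element_address(indices, multipliers, base=0):
    """Combine per-dimension indices with their multipliers into one address."""
    if len(indices) != len(multipliers):
        raise ValueError("each index needs a corresponding multiplier")
    return base + sum(index * multiplier for index, multiplier in zip(indices, multipliers))


# Element [1][2] of a 2x3 row-major tensor (multipliers 3 and 1) stored at address 100.
address = element_address((1, 2), (3, 1), base=100)
```

For a row-major layout the multipliers are the usual strides, so the example above resolves to address 105.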
-
Patent number: 8090930
Abstract: In a pipelined computer architecture in which instructions may be removed from the instruction queue out of sequence, instruction queue status at a cycle K is determined by adding together the number of invalid instructions or free rows in the queue during cycle K-2, the number of instructions issued for cycle K-1 and the number of instructions speculatively issued in cycle K-1 that have produced a cache hit, and subtracting from the sum the number of instructions enqueued for cycle K-1. The result indicates the number of invalid instructions in the queue in cycle K. The number of invalid instructions, the number of issued instructions, and the number of enqueued instructions are preferably represented as flat vectors, so that adding is performed by shifting in one direction, while subtracting is performed by shifting in the opposite direction. The result is compared with either the number of instructions to be enqueued in the present cycle, which number is encoded, or with a predetermined value.
Type: Grant
Filed: January 31, 2003
Date of Patent: January 3, 2012
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Timothy Charles Fischer, Daniel Lawrence Leibholz, James Arthur Farrell
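Illustration (not part of the patent record): the bookkeeping in this abstract amounts to free(K) = free(K-2) + issued(K-1) + speculative hits(K-1) - enqueued(K-1). The sketch below computes that directly and then again with shifts, under the assumption that a "flat vector" is a one-hot encoding in which a count of n is represented by bit n being set. That encoding, and all function and parameter names, are assumptions of this sketch rather than details confirmed by the abstract.

```python
def free_rows(free_k2, issued_k1, spec_hits_k1, enqueued_k1):
    """Plain-integer version of the queue-status sum for cycle K."""
    return free_k2 + issued_k1 + spec_hits_k1 - enqueued_k1


def free_rows_flat(free_k2, issued_k1, spec_hits_k1, enqueued_k1):
    """Same result using one-hot vectors: add by shifting left, subtract by shifting right."""
    vector = 1 << free_k2        # one-hot encoding of the free-row count (assumed encoding)
    vector <<= issued_k1         # adding issued instructions
    vector <<= spec_hits_k1      # adding speculative issues that hit in the cache
    vector >>= enqueued_k1       # subtracting newly enqueued instructions
    # Decode the one-hot vector; a result of -1 means the set bit was shifted out.
    return vector.bit_length() - 1


direct = free_rows(5, 2, 1, 3)
shifted = free_rows_flat(5, 2, 1, 3)
```

Both calls give 5 for these inputs. The shift-based form avoids a carry-propagating adder, which is a plausible motivation for the flat-vector representation, though the abstract does not state it.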
-
Patent number: 7340585
Abstract: A fast linked multiprocessor network (22) including a plurality of processing modules (24, 26, 28, 30, 32, and 34) implemented on a field programmable gate array (10) and a plurality of configurable uni-directional links (21, 23, 25, 27, 29, 31) coupled among at least two of the plurality of processing modules providing a streaming communication channel between at least two of the plurality of processing modules.
Type: Grant
Filed: August 27, 2002
Date of Patent: March 4, 2008
Assignee: Xilinx, Inc.
Inventors: Satish R. Ganesan, Goran Bilski, Usha Prabhu, Ralph D. Wittig
-
Patent number: 7328328
Abstract: An apparatus and method are provided for extending a microprocessor instruction set to specify non-temporal memory references at the instruction level. The apparatus includes translation logic and extended execution logic. The translation logic translates an extended instruction into a micro instruction sequence. The extended instruction has an extended prefix and an extended prefix tag. The extended prefix specifies a non-temporal access for a memory reference prescribed by the extended instruction, where the non-temporal access cannot be specified by an existing instruction from an existing instruction set. The extended prefix tag indicates the extended prefix, where the extended prefix tag is an otherwise architecturally specified opcode within the existing instruction set. The extended execution logic is coupled to the translation logic. The extended execution logic receives the micro instruction sequence, and executes the non-temporal access to perform the memory reference.
Type: Grant
Filed: August 22, 2002
Date of Patent: February 5, 2008
Assignee: IP-First, LLC
Inventors: G. Glenn Henry, Rodney E. Hooker, Terry Parks