Patents Examined by Daniel H. Pan
  • Patent number: 10157064
    Abstract: A method of managing instruction execution for multiple instruction streams using a processor core having multiple parallel instruction execution slices. An event is detected indicating that either resource requirement or resource availability for a subsequent instruction of an instruction stream will not be met by the instruction execution slice currently executing the instruction stream. In response to detecting the event, dispatch of at least a portion of the subsequent instruction is made to another instruction execution slice. The event may be a compiler-inserted directive, may be an event detected by logic in the processor core, or may be determined by a thread sequencer. The instruction execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution, ordinary instruction execution, wide instruction execution.
    Type: Grant
    Filed: February 27, 2017
    Date of Patent: December 18, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
  • Patent number: 10146538
    Abstract: Suspendable load address tracking inside transactions is disclosed. An example processing device of implementations of the disclosure includes a transactional memory (TM) read set tracking component circuitry to identify a suspend read tracking instruction within a transaction executed by the processing device, mark load instructions occurring in the transaction subsequent to the identified suspend read tracking instruction with a suspend attribute, wherein the addresses corresponding to the marked load instructions are excluded from a read set maintained for the transaction, identify a resume read tracking instruction within the transaction, and stop marking the load instructions occurring subsequent to the identified resume read tracking instruction with the suspend attribute.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: December 4, 2018
    Assignee: Intel Corporation
    Inventors: Raanan Sade, Roman Dementiev, Ravi Rajwar, Ady Tal, Alex Gerber
  • Patent number: 10140138
    Abstract: Methods for supporting wide and efficient front-end operation with guest architecture emulation are disclosed. As a part of a method for supporting wide and efficient front-end operation, upon receiving a request to fetch a first far taken branch instruction, a cache line that includes the first far taken branch instruction, a next cache line and a cache line located at the target of the first far taken branch instruction is read. Based on information that is accessed from a data table, the cache line and either the next cache line or the cache line located at the target is fetched in a single cycle.
    Type: Grant
    Filed: March 17, 2014
    Date of Patent: November 27, 2018
    Assignee: Intel Corporation
    Inventors: Mohammad Abdallah, Ankur Groen, Erika Gunadi, Mandeep Singh, Ravishankar Rao
  • Patent number: 10126976
    Abstract: An address and a data size are provided to a rotator. The rotator stores, based on the address and the data size, a data element in a location having a defined number of positions. The data element includes one or more data units and the one or more data units are aligned correctly in one or more positions of the location based on a predefined position in the location to receive a selected data unit of the one or more data units. The rotator replicates a value of a chosen data unit of the one or more data units to one or more other positions of the location.
    Type: Grant
    Filed: February 17, 2017
    Date of Patent: November 13, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Michael K. Gschwind
  • Patent number: 10114673
    Abstract: A method for scheduling the execution of a computer instruction, receive an entitlement processor resource percentage for a logical partition on a computer system. The logical partition is associated with a hardware thread of a processor of the computer system. The entitlement processor resource percentage for the logical partition is stored in a register of the hardware thread associated with the logical partition. An instruction is received from the logical partition of the computer system and the processor dispatches the instruction based on the entitlement processor resource percentage stored in the register of the hardware thread associated with the logical partition.
    Type: Grant
    Filed: September 29, 2016
    Date of Patent: October 30, 2018
    Assignee: International Business Machines Corporation
    Inventors: Nitin Gupta, Mehulkumar J. Patel, Deepak C. Shetty
  • Patent number: 10108419
    Abstract: Systems and methods for dependency-prediction include executing instructions in an instruction pipeline of a processor and detecting a conditionality-imposing control instruction, such as an If-Then (IT) instruction, which imposes dependent behavior on a conditionality block size of one or more dependent instructions. Prior to executing a first instruction, a dependency-prediction is made to determine if the first instruction is a dependent instruction of the conditionality-imposing control instruction, based on the conditionality block size and one or more parameters of the instruction pipeline. The first instruction is executed based on the dependency-prediction. When the first instruction is dependency-mispredicted, an associated dependency-misprediction penalty is mitigated. If the first instruction is a branch instruction, the mitigation involves training a branch prediction tracking mechanism to correctly dependency-predict future occurrences of the first instruction.
    Type: Grant
    Filed: September 26, 2014
    Date of Patent: October 23, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Brian Michael Stempel, James Norris Dieffenderfer, Michael Scott McIlvaine, Melinda Joyce Brown
  • Patent number: 10095519
    Abstract: Apparatus and methods are disclosed for controlling instruction flow in block-based processor architectures. In one example of the disclosed technology, an instruction block address register stores an index address to a memory storing a plurality of instructions for an instruction block, the indexed address being inaccessible when the processor is in one or more unprivileged operational modes, one or more execution units configured to execute instructions for the instruction block, and a control unit configured to fetch and decode two or more of the plurality of instructions from the memory based on the indexed address.
    Type: Grant
    Filed: March 3, 2016
    Date of Patent: October 9, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Aaron L. Smith
  • Patent number: 10095522
    Abstract: A processor includes a core, a memory subsystem, a predictor module, and a memory rename module. The predictor module may include a first logic to identify a dependency between a store instruction and a load instruction, and a second logic to assign a memory renaming (MRN) register to the store instruction and the load instruction based on the identified dependency. Further, the memory rename module may include a third logic to copy, based on the assigned MRN register, information in a first logical register associated with the store instruction directly to a second logical register associated with the load instruction.
    Type: Grant
    Filed: December 23, 2014
    Date of Patent: October 9, 2018
    Assignee: Intel Corporation
    Inventors: Kamil Garifullin, Stanislav Shwartsman, Lihu Rappoport, Zeev Sperber, Pavel I. Kryukov, Andrey Kluchnikov, Igor Yanover, George Leifman, Alex Gerber, Jared W. Stark
  • Patent number: 10083035
    Abstract: A streaming engine employed in a digital data processor specifies fixed first and second read only data streams. Corresponding stream address generator produces address of data elements of the two streams. Corresponding steam head registers stores data elements next to be supplied to functional units for use as operands. The two streams share two memory ports. A toggling preference of stream to port ensures fair allocation. The arbiters permit one stream to borrow the other's interface when the other interface is idle. Thus one stream may issue two memory requests, one from each memory port, if the other stream is idle. This spreads the bandwidth demand for each stream across both interfaces, ensuring neither interface becomes a bottleneck.
    Type: Grant
    Filed: December 20, 2016
    Date of Patent: September 25, 2018
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Joseph Zbiciak, Timothy Anderson
  • Patent number: 10078551
    Abstract: This invention is a streaming engine employed in a digital signal processor. A fixed data stream sequence including plural nested loops is specified by a control register. The streaming engine includes an address generator producing addresses of data elements and a steam head register storing data elements next to be supplied as operands. The streaming engine fetches stream data ahead of use by the central processing unit core in a stream buffer. Parity bits are formed upon storage of data in the stream buffer which are stored with the corresponding data. Upon transfer to the stream head register a second parity is calculated and compared with the stored parity. The streaming engine signals a parity fault if the parities do not match. The streaming engine preferably restarts fetching the data stream at the data element generating a parity fault.
    Type: Grant
    Filed: December 20, 2016
    Date of Patent: September 18, 2018
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Joseph Zbiciak, Timothy Anderson
  • Patent number: 10073696
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements. A steam head register stores data elements next to be supplied to functional units for use as operands. The streaming engine fetches stream data ahead of use by the central processing unit core in a stream buffer constructed like a cache. The stream buffer cache includes plural cache lines, each includes tag bits, at least one valid bit and data bits. Cache lines are allocated to store newly fetched stream data. Cache lines are deallocated upon consumption of the data by a central processing unit core functional unit. Instructions preferably include operand fields with a first subset of codings corresponding to registers, a stream read only operand coding and a stream read and advance operand coding.
    Type: Grant
    Filed: December 20, 2016
    Date of Patent: September 11, 2018
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Joseph Zbiciak
  • Patent number: 10061584
    Abstract: Apparatus and methods are disclosed for nullifying memory store instructions identified in a target field of a nullification instruction. In some examples of the disclosed technology, an apparatus can include memory and one or more block-based processor cores configured to fetch and execute a plurality of instruction blocks. One of the cores can include a control unit configured, based at least in part on receiving a nullification instruction, to obtain an instruction identification for a memory access instruction of a plurality of memory access instructions, based on a target field of the nullification instruction. The memory access instruction associated with the instruction identification is nullified. The memory access instruction is in a first instruction block of the plurality of instruction blocks. Based on the nullified memory access instruction, a subsequent memory access instruction from the first instruction block is executed.
    Type: Grant
    Filed: March 3, 2016
    Date of Patent: August 28, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Aaron L. Smith
  • Patent number: 10061675
    Abstract: This invention is a streaming engine employed in a digital signal processor. A fixed data stream sequence is specified by a control register. The streaming engine fetches stream data ahead of use by a central processing unit and stores it in a stream buffer. Upon occurrence of a fault reading data from memory, the streaming engine identifies the data element triggering the fault preferably storing this address in a fault address register. The streaming engine defers signaling the fault to the central processing unit until this data element is used as an operand. If the data element is never used by the central processing unit, the streaming engine never signals the fault. The streaming engine preferably stores data identifying the fault in a fault source register. The fault address register and the fault source register are preferably extended control registers accessible only via a debugger.
    Type: Grant
    Filed: December 20, 2016
    Date of Patent: August 28, 2018
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Joseph Zbiciak, Timothy D. Anderson, Duc Bui, Kai Chirca
  • Patent number: 10061746
    Abstract: A processor includes a front end to fetch an instruction. The instruction is to calculate a data point using inputs from a plurality of adjacent source data in a plurality of dimensions. The processor includes a decoder to decode the instruction. The processor also includes a core to, based on the decoded instruction, perform a plurality of tabular vector read operations to read the plurality of adjacent source data and perform a tabular vector calculation to execute the instruction. The tabular vector calculation is based upon results of performing the plurality of tabular vector read operations. The core is further to write results of the tabular vector calculation.
    Type: Grant
    Filed: September 26, 2014
    Date of Patent: August 28, 2018
    Assignee: Intel Corporation
    Inventor: Charles R. Yount
  • Patent number: 10042640
    Abstract: A data processing system 2 includes multiple out-of-order issue queues 8, 10. A master serialization instruction MSI received by a first issue queue 8 is detected by slave generation circuitry 24 which generates a slave serialization instruction SSI added to a second issue queue 10. The master serialization instruction MSI manages serialization relative to the instructions within the first issue queue 8. The slave serialization instruction SSI manages serialization relative to the instructions within the second issue queue 10. The master serialization instruction MSI and the slave serialization instruction SSI are removed when both have met their serialization conditions and are respectively the oldest instructions within their issue queues.
    Type: Grant
    Filed: March 22, 2016
    Date of Patent: August 7, 2018
    Assignee: ARM Limited
    Inventors: Luca Scalabrino, Frederic Jean Denis Arsanto, Thomas Gilles Tarridec, Cedric Denis Robert Airaud
  • Patent number: 10037207
    Abstract: A computer processor includes a branch prediction unit that includes a local branch predictor and a global branch predictor. Managing power consumption in such a computer processor includes, for each of a plurality of branch instructions: performing, by the local branch predictor, a local branch prediction; performing, by each of the global branch predictors, a global branch prediction; determining to utilize the local branch prediction over the global branch predictions as a branch prediction for the branch instruction; incrementing a value of a counter; determining whether the value of the counter exceeds a predetermined threshold; and if the value of the counter exceeds the predetermined threshold, powering down at least one of the global branch predictors and configuring the branch prediction unit to bypass the powered down global branch predictor for branch predictions of subsequent branch instructions.
    Type: Grant
    Filed: July 27, 2016
    Date of Patent: July 31, 2018
    Assignee: International Business Machines Corporation
    Inventors: David S. Levitan, Nicholas R. Orzol, Robert A. Philhower
  • Patent number: 10037211
    Abstract: Operation of a multi-slice processor that includes a plurality of execution slices and a plurality of load/store slices, where each load/store slice includes a load miss queue and a load reorder queue, includes: receiving, at a load reorder queue, a load instruction requesting data; responsive to the data not being stored in a data cache, determining whether a previous load instruction is pending a fetch of a cache line comprising the data; if the cache line does not comprise the data, allocating an entry for the load instruction in the load miss queue; and if the cache line does comprise the data: merging, in the load reorder queue, the load instruction with an entry for the previous load instruction.
    Type: Grant
    Filed: March 22, 2016
    Date of Patent: July 31, 2018
    Assignee: International Business Machines Corporation
    Inventors: Kimberly M. Fernsler, David A. Hrusecky, Hung Q. Le, Elizabeth A. McGlone, Brian W. Thompto
  • Patent number: 10031756
    Abstract: Apparatus and methods are disclosed for nullifying memory store instructions and one or more registers identified in a target field of a nullification instruction. In some examples of the disclosed technology, an apparatus can include memory and one or more block-based processor cores configured to fetch and execute a plurality of instruction blocks. One of the cores can include a control unit configured, based at least in part on receiving a nullification instruction, to obtain an instruction identification for a memory access instruction of a plurality of memory access instructions and a register identification of at least one of a plurality of registers, based on a first and second target fields of the nullification instruction. The at least one register and the memory access instruction associated with the instruction identification are nullified. Based on the nullified memory access instruction, a subsequent memory access instruction is executed.
    Type: Grant
    Filed: March 3, 2016
    Date of Patent: July 24, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Aaron L. Smith
  • Patent number: 10025752
    Abstract: Disclosed are a data processing method, a processor, and a data processing device. The method comprises: an arbiter sends data D(a,1) to a first processing circuit; the first processing circuit processes the data D(a,1) to obtain data D(1,2), the first processing circuit being a processing circuit among m processing circuits; the first processing circuit sends the data D(1,2) to a second processing circuit; the second processing circuit to an mth processing circuit separately process the received data; and the arbiter receives data D(m,a) sent by the mth processing circuit. The processor comprises an arbiter and a first processing circuit to an (m+1)th processing circuit. Each processing circuit in the first processing circuit to the (m+1)th processing circuit can receive first data to be processed sent by the arbiter, and process the first data to be processed. The scheme is helpful to improve efficiency of data processing.
    Type: Grant
    Filed: October 28, 2016
    Date of Patent: July 17, 2018
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Nan Li, Linchun Wang, Hongfei Chen
  • Patent number: 10019263
    Abstract: In a processor, a disambiguation-free out of order load store queue method. The method includes implementing a memory resource that can be accessed by a plurality of asynchronous cores; implementing a store retirement buffer, wherein stores from a store queue have entries in the store retirement buffer in original program order; and implementing speculative execution, wherein results of speculative execution can be saved in the store retirement/reorder buffer as a speculative state. The method further includes, upon dispatch of a subsequent load from a load queue, searching the store retirement buffer for address matching; and, in cases where there are a plurality of address matches, locating a correct forwarding entry by scanning for the store retirement buffer for a first match, and forwarding data from the first match to the subsequent load. Once speculative outcomes are known, the speculative state is retired to memory.
    Type: Grant
    Filed: December 12, 2014
    Date of Patent: July 10, 2018
    Assignee: Intel Corporation
    Inventor: Mohammad A. Abdallah