Patents Examined by Corey S Faherty
  • Patent number: 10235178
    Abstract: Embodiments relate to improving user experiences when executing binary code that has been translated from other binary code. Binary code (instructions) for a source instruction set architecture (ISA) cannot natively execute on a processor that implements a target ISA. The instructions in the source ISA are binary-translated to instructions in the target ISA and are executed on the processor. The overhead of performing binary translation and/or the overhead of executing binary-translated code are compensated for by increasing the speed at which the translated code is executed, relative to non-translated code. Translated code may be executed on hardware that has one or more power-performance parameters of the processor set to increase the performance of the processor with respect to the translated code. The increase in power-performance for translated code may be proportional to the degree of translation overhead.
    Type: Grant
    Filed: June 2, 2017
    Date of Patent: March 19, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Hee jun Park, Mehmet Iyigun
  • Patent number: 10235180
    Abstract: A scheduler implementing a dependency matrix having restricted entries is disclosed. A processing device of the disclosure includes a decode unit to decode an instruction and a scheduler communicably coupled to the decode unit. In one embodiment, the scheduler is configured to receive the decoded instruction, determine that the decoded instruction qualifies for allocation as a restricted reservation station (RS) entry type in a dependency matrix maintained by the scheduler, identify RS entries in the dependency matrix that are free for allocation, allocate one of the identified free RS entries with information of the decoded instruction in the dependency matrix, and update a row of the dependency matrix corresponding to the claimed RS entry with source dependency information of the decoded instruction.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: March 19, 2019
    Assignee: Intel Corporation
    Inventors: Srikanth T. Srinivasan, Matthew C. Merten, Bambang Sutanto, Rahul R. Kulkarni, Justin M. Deinlein, James D. Hadley
  • Patent number: 10223113
    Abstract: A processor of an aspect includes a decode unit to decode an instruction indicating a first source packed data operand including at least four data elements, a source mask including at least four mask elements, and a destination storage location. An execution unit, in response to the instruction, stores a result packed data operand having a series of at least two unmasked result data elements. Each of the unmasked result data elements stores a value of a different one of at least two consecutive data elements of the first source packed data operand in a relative order. All masked result elements, which are between a nearest corresponding pair of unmasked result data elements, have a same value as an unmasked result data element of the corresponding pair, which is closest to a first end of the result packed data operand. The masked result data elements correspond to masked mask elements.
    Type: Grant
    Filed: March 27, 2014
    Date of Patent: March 5, 2019
    Assignee: Intel Corporation
    Inventor: Mikhail Plotnikov
  • Patent number: 10223119
    Abstract: A processor of an aspect includes a decode unit to decode an instruction that indicates a first source packed data operand including a first plurality of data elements, a source mask including a plurality of mask elements, and a destination storage location. An execution unit, in response to the instruction, stores a result packed data operand. The result packed data operand has at least two unmasked result data elements corresponding to unmasked mask elements of the source mask. Each of the unmasked result data elements has a value of a corresponding data element of the first source packed data operand in a same relative position. All masked result data elements, between each nearest pair of unmasked result data elements, have a same value as an unmasked result data element of the pair closest to a first end of the result packed data operand.
    Type: Grant
    Filed: March 28, 2014
    Date of Patent: March 5, 2019
    Assignee: Intel Corporation
    Inventor: Mikhail Plotnikov
  • Patent number: 10223112
    Abstract: A method of an aspect includes receiving an instruction. The instruction indicates an integer stride, indicates an integer offset, and indicates a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four integers in numerical order with a smallest one of the at least four integers differing from zero by the integer offset and with all integers of the sequence in consecutive positions differing by the integer stride. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: September 30, 2017
    Date of Patent: March 5, 2019
    Inventors: Seth Abraham, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Zeev Sperber, Amit Gradstein
  • Patent number: 10223111
    Abstract: A method of an aspect includes receiving an instruction. The instruction indicates an integer stride, indicates an integer offset, and indicates a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four integers in numerical order with a smallest one of the at least four integers differing from zero by the integer offset and with all integers of the sequence in consecutive positions differing by the integer stride. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: September 30, 2017
    Date of Patent: March 5, 2019
    Assignee: Intel Corporation
    Inventors: Seth Abraham, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Zeev Sperber, Amit Gradstein
  • Patent number: 10216515
    Abstract: Circuitry may be configured to identify a particular element position of a bit vector stored in a register, where a value of the element occupying the particular element position matches a first predetermined value, and determine an address value dependent upon the particular element position of the bit vector and a base address. The circuitry may be further configured to load data from a memory dependent upon the address value.
    Type: Grant
    Filed: October 18, 2016
    Date of Patent: February 26, 2019
    Assignee: Oracle International Corporation
    Inventors: Erik Schlanger, Charles Roth, Daniel Fowler
  • Patent number: 10191740
    Abstract: A method performed by a processor includes receiving an instruction. The instruction indicating a source operand, indicating a stride, indicating at least one set of strided data element positions out of all sets of strided data element positions for the indicated stride, and indicating at least one destination packed data register. The method also includes storing, in response to the instruction, for each of the indicated at least one set of strided data element positions, a corresponding result packed data operand, in a corresponding destination packed data register of the processor. Each result packed data operand including a plurality of data elements, which are from the corresponding indicated set of strided data element positions of the source operand. The strided data element positions of the set are separated from one another by integer multiples of the indicated stride. Other methods, processors, systems, and machine readable media are also disclosed.
    Type: Grant
    Filed: February 28, 2017
    Date of Patent: January 29, 2019
    Assignee: Intel Corporation
    Inventors: Mikhail Plotnikov, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 10191746
    Abstract: A method for accelerating code optimization a microprocessor. The method includes fetching an incoming microinstruction sequence using an instruction fetch component and transferring the fetched macroinstructions to a decoding component for decoding into microinstructions. Optimization processing is performed by reordering the microinstruction sequence into an optimized microinstruction sequence comprising a plurality of dependent code groups. The plurality of dependent code groups are then output to a plurality of engines of the microprocessor for execution in parallel. A copy of the optimized microinstruction sequence is stored into a sequence cache for subsequent use upon a subsequent hit optimized microinstruction sequence.
    Type: Grant
    Filed: November 22, 2011
    Date of Patent: January 29, 2019
    Assignee: INTEL CORPORATION
    Inventor: Mohammad Abdallah
  • Patent number: 10175986
    Abstract: A processor includes a logic for stateless capture of data linear addresses (DLA) during precise event based sampling (PEBS) for an out-of-order execution engine. The engine may include a PEBS unit with logic to increment a counter each time an instance of a designated micro-op is retired a reorder buffer, capture output DLA referenced by an instance of the micro-op that executes after the counter overflows, set a captured bit associated with a reorder buffer identifier for the instance of the micro-op, and store a PEBS record in a debug storage when the instance of the micro-op is retired from the reorder buffer. The designated micro-op references a DLA of a memory accessible to the processor.
    Type: Grant
    Filed: May 8, 2017
    Date of Patent: January 8, 2019
    Assignee: Intel Corporation
    Inventors: Roger Gramunt, Ramon Matas, Benjamin C. Chaffin, Neal S. Moyer, Rammohan Padmanabhan, Alexey P. Suprun, Matthew G. Smith
  • Patent number: 10157062
    Abstract: A method is described for operating a microprocessor, in which a conversion software executed in the microprocessor carries out a binary translation, in the course of which a source instruction that is encoded according to a first instruction-set architecture is translated into a target instruction in a binary manner, which is encoded according to a second instruction-set architecture, and the target instruction translated by the translation software into the second instruction-set architecture being replicated, and in this replicated target instruction a memory area which is to be accessed in the course of the execution of the target instruction is replaced by a second memory area, and the target instruction and the copied target instruction is executed by the microprocessor.
    Type: Grant
    Filed: February 28, 2017
    Date of Patent: December 18, 2018
    Assignee: ROBERT BOSCH GMBH
    Inventor: Jaroslaw Topp
  • Patent number: 10146736
    Abstract: A data processing system comprising: a processor comprising a plurality of cores, each core comprising a first processing pipeline and a second processing pipeline, the second processing pipeline having a different architecture to the first processing pipeline; a framework configured to manage the processing resources of the data processing system including the processor; and an interface configured to present to the framework each of the processing pipelines as a core.
    Type: Grant
    Filed: September 18, 2015
    Date of Patent: December 4, 2018
    Assignee: Imagination Technologies Limited
    Inventor: Jung-Wook Park
  • Patent number: 10146541
    Abstract: Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block is made. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.
    Type: Grant
    Filed: November 12, 2015
    Date of Patent: December 4, 2018
    Assignee: Intel Corporation
    Inventors: Julien Sebot, William W. Macy, Jr., Eric L. Debes, Huy V. Nguyen
  • Patent number: 10146542
    Abstract: Methods and apparatuses relating to converting encoding formats are described. In one embodiment, a hardware processor includes a decode circuit to decode an instruction comprising a state operand, a source vector operand, a destination vector operand, and a control operand, and an execution circuit to execute the instruction to convert elements from the source vector operand in a first encoding format to a second encoding format, store the elements in the second encoding format in the destination vector operand, store a total length of the elements in the second encoding format in the state operand, and set a stream completion indication in the control operand when the elements from the source vector operand are a last elements in a data stream.
    Type: Grant
    Filed: December 29, 2015
    Date of Patent: December 4, 2018
    Assignee: Intel Corporation
    Inventors: Yevgeny Y. Rouban, Daniil Y. Sokolov
  • Patent number: 10146535
    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.
    Type: Grant
    Filed: October 20, 2016
    Date of Patent: December 4, 2018
    Assignee: Intel Corporatoin
    Inventors: Jesus Corbal, Robert Valentine, Roman S. Dubtsov, Nikita A. Shustrov, Mark J. Charney, Dennis R. Bradford, Milind B. Girkar, Edward T. Grochowski, Thomas D. Fletcher, Warren E. Ferguson
  • Patent number: 10140128
    Abstract: A parallelized multiple dispatch ordered queue including an ordered queue, qualify logic, ordered select logic, and dispatch logic. The ordered queue stores candidates in order from oldest to youngest into multiple entries. The ordered queue is divided into N groups in which an i'th group includes every i'th entry of every N entries of the ordered queue, wherein i is an integer less than or equal to N. The qualify logic determines whether any candidate is ready to be dispatched. The ordered select logic respectively determines the oldest candidate in each group that is ready to be dispatched. The dispatch logic dispatches the oldest ready candidates in parallel. The shift logic shifts the stored candidates in the ordered queue to fill any vacant entries between remaining ones of the stored candidates without changing an order of the remaining ones of the stored candidates in the ordered queue. The ordered queue may have any size or depth and N is any suitable integer determining the number of candidates (e.g.
    Type: Grant
    Filed: March 10, 2015
    Date of Patent: November 27, 2018
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: Qianli Di, Jianbin Wang, Weili Li, Xiaoyuan Yu, Xin Yu Gao
  • Patent number: 10127043
    Abstract: A method and system for implementing very long instruction words (VLIW), the system operable to: receive a first very long instruction word (VLIW) including a set of slot instructions corresponding to a set of functional units, where: each slot instruction includes an opcode identifying an operation to be performed by the set of functional units and value fields related to the operation, where a dedicated subset of the value fields include dedicated bits dedicated to the slot instruction and an allocable subset of the value fields include allocable bits allocable to other slot instructions; identify the opcodes of each slot instruction; determine, based on the opcodes, which allocable bits are allocated to which slot instructions; and instruct each functional unit to perform an operation identified by a corresponding slot instruction using the corresponding dedicated bits and any allocable bits determined to be allocated to the slot instruction.
    Type: Grant
    Filed: October 19, 2016
    Date of Patent: November 13, 2018
    Assignee: Rex Computing, Inc.
    Inventors: Paul Michael Sebexen, Thomas Rex Sohmers
  • Patent number: 10127059
    Abstract: A multi-tenant virtual machine infrastructure (MTVMI) allows multiple tenants to independently access and use a plurality of virtual computing resources via the Internet. Within the MTVMI, different tenants may define unique configurations of virtual computing resources and unique rules to govern the use of the virtual computing resources. The MTVMI may be configured to provide valuable services for tenants and users associated with the tenants.
    Type: Grant
    Filed: May 2, 2009
    Date of Patent: November 13, 2018
    Assignee: Skytap
    Inventors: Nicholas Luis Astete, Aaron Benjamin Brethorst, Joseph Michael Goldberg, Matthew Hanlon, Anthony A. Hutchinson, Jr., Gopalakrishnan Janakiraman, Alexander Kotelnikov, Petr Novodvorskiy, David William Richardson, Roxanne Camille Skelly, Nikolai Slioussar, Jonathan Weeks
  • Patent number: 10120685
    Abstract: An apparatus and method for supporting simultaneous multiple iterations (SMI) in a course grained reconfigurable architecture (CGRA). In support of SMI, the apparatus includes: Hardware structures that connect all of multiple processing engines (PEs) to a load-store unit (LSU) configured to keep track of which compiled program code iterations have completed, which ones are in flight and which are yet to begin, and a control unit including hardware structures that are used to maintain synchronization and initiate and terminate loops within the PEs. SMI permits execution of the next instruction within any iteration (in flight). If instructions from multiple iterations are ready for execution (and are pre-decoded), then the hardware selects the lowest iteration number ready for execution. If in a particular clock cycle, a loop iteration with a lower iteration number is stalled (i.e.
    Type: Grant
    Filed: November 4, 2015
    Date of Patent: November 6, 2018
    Assignee: International Business Machines Corporation
    Inventors: Chia-yu Chen, Kailash Gopalakrishnan, Jinwook Oh, Sunil K. Shukla, Vijayalakshmi Srinivasan
  • Patent number: 10114639
    Abstract: An arithmetic device which controls a parallel arithmetic operation includes a global memory, a plurality of compute units, each of the compute units including a local memory and a plurality of processing elements, and each of the processing elements including a private memory and processing data blocks stored in the private memory, an attribute group holding unit which includes a specific attribute which includes a parameter indicative of a size of the data block, an arithmetic attribute which includes a parameter indicating whether the data block is a data relevant to processing, and indicating a transfer order when the data block is data relevant to processing, and a policy attribute which includes a parameter indicative of how to execute a transfer of the data block and how to execute processing of the data block.
    Type: Grant
    Filed: April 28, 2017
    Date of Patent: October 30, 2018
    Assignee: RENESAS ELECTRONICS CORPORATION
    Inventor: Shorin Kyo