Patents Examined by Kenneth Kim
  • Patent number: 9268566
    Abstract: Multiple sets of character data having termination characters are compared using parallel processing and without causing unwarranted exceptions. Each set of character data to be compared is loaded within one or more vector registers. In particular, in one embodiment, for each set of character data to be compared, an instruction is used that loads data in a vector register to a specified boundary, and provides a way to determine the number of characters loaded. Further, an instruction is used to find the index of the first delimiter character, i.e., the first zero or null character, or the index of unequal characters. Using these instructions, a location of the end of one of the sets of data or a location of an unequal character is efficiently provided.
    Type: Grant
    Filed: March 15, 2012
    Date of Patent: February 23, 2016
    Assignee: International Business Machines Corporation
    Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Timothy J. Slegel
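    Illustrative sketch (not taken from the patent): the comparison described in the abstract above can be modeled in Python. The 16-byte vector width, the 64-byte boundary, and the helper names load_to_boundary, first_zero_or_unequal, and compare_strings are all assumptions made for this example. It shows why a load that stops at a boundary never reads past the block holding the terminator, and how the "first zero or unequal" index ends the loop.

```python
VECTOR_BYTES = 16     # assumed vector register width, in bytes
BLOCK_BOUNDARY = 64   # assumed boundary that a single load must not cross


def load_to_boundary(mem: bytes, addr: int) -> bytes:
    """Load up to VECTOR_BYTES bytes from addr, stopping at the next boundary.

    The length of the returned bytes is the number of characters loaded.
    """
    to_boundary = BLOCK_BOUNDARY - (addr % BLOCK_BOUNDARY)
    count = min(VECTOR_BYTES, to_boundary, len(mem) - addr)
    return mem[addr:addr + count]


def first_zero_or_unequal(a: bytes, b: bytes) -> int:
    """Index of the first zero byte or mismatch; min(len(a), len(b)) if none."""
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] == 0 or b[i] == 0 or a[i] != b[i]:
            return i
    return n


def compare_strings(mem: bytes, addr_a: int, addr_b: int) -> int:
    """strcmp-style result (<0, 0, >0) for two null-terminated strings in mem."""
    offset = 0
    while True:
        a = load_to_boundary(mem, addr_a + offset)
        b = load_to_boundary(mem, addr_b + offset)
        n = min(len(a), len(b))
        if n == 0:
            raise ValueError("string is not null-terminated")
        i = first_zero_or_unequal(a, b)
        if i < n:
            # Either a terminator or a mismatch was found at index i.
            return a[i] - b[i]
        offset += n  # advance only by the characters both loads covered
```

    A real implementation would perform each helper as a single vector instruction; the loops here only model the behavior.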
  • Patent number: 9256439
    Abstract: A data processing apparatus causes multiple processors to process in parallel input data that is arrayed two-dimensionally, and stores the data of the processing results in a cache line of a cache memory, where the data of the processing results includes a plurality of pieces of data of a predetermined width that is smaller than a cache line width of the cache memory. The data stored in the cache memory is then transferred together to a main memory as in the cache line.
    Type: Grant
    Filed: April 10, 2012
    Date of Patent: February 9, 2016
    Assignee: Canon Kabushiki Kaisha
    Inventor: Hirokazu Takahashi
  • Patent number: 9256429
    Abstract: This disclosure describes techniques for selectively activating a resume check operation in a single instruction, multiple data (SIMD) processing system. A processor is described that is configured to selectively enable or disable a resume check operation for a particular instruction based on information included in the instruction that indicates whether a resume check operation is to be performed for the instruction. A compiler is also described that is configured to generate compiled code which, when executed, causes a resume check operation to be selectively enabled or disabled for particular instructions. The compiled code may include one or more instructions that each specify whether a resume check operation is to be performed for the respective instruction. The techniques of this disclosure may be used to reduce the power consumption of and/or improve the performance of a SIMD system that utilizes a resume check operation to manage the reactivation of deactivated threads.
    Type: Grant
    Filed: September 21, 2012
    Date of Patent: February 9, 2016
    Assignee: QUALCOMM Incorporated
    Inventors: Lin Chen, Yun Du, Andrew Gruber
  • Patent number: 9256426
    Abstract: A system and method for controlling processor instruction execution. In one example, a method for controlling a total number of instructions executed by a processor includes instructing the processor to iteratively execute instructions via multiple iterations until a predetermined time period has elapsed. A number of instructions executed in each iteration of the iterations is less than a number of instructions executed in a prior iteration of the iterations. The method also includes determining the total number of instructions executed during the predetermined time period.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: February 9, 2016
    Assignee: General Electric Company
    Inventors: William David Smith, II, Jon Marc Diekema, Joshua Nathaniel Edmison, Safayet Nizam Uddin Ahmed
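    Illustrative sketch (not taken from the patent): the iteration scheme in the abstract above can be modeled as a loop whose batch size shrinks on every pass. The halving rule, the function name run_until_deadline, and the injectable execute and clock callables are assumptions for this example; the abstract only requires that each iteration execute fewer instructions than the one before it.

```python
import time


def run_until_deadline(period, initial_batch, execute, clock=time.monotonic):
    """Run shrinking batches until `period` has elapsed; return the total executed.

    `execute(n)` is a hypothetical callback that runs n instructions.
    The loop also stops if the batch size reaches zero before the deadline.
    """
    deadline = clock() + period
    batch = initial_batch
    total = 0
    while clock() < deadline and batch > 0:
        execute(batch)
        total += batch
        batch //= 2  # assumed shrink rule: each batch is smaller than the last
    return total
```

    Shrinking the batch limits how far the final iteration can overshoot the deadline, which is one plausible reason for the decreasing counts.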
  • Patent number: 9244686
    Abstract: An instruction translator receives a conditional load/store instruction that specifies a condition, destination/data register, base register, offset source, and memory addressing mode. The instruction instructs the microprocessor to load data from a memory location into the destination register (conditional load) or store data to the memory location from the data register (conditional store) only if the condition flags satisfy the condition. The offset source specifies whether the offset is an immediate value or a value in an offset register. The addressing mode specifies whether the base register is updated when the condition flags satisfy the condition. The instruction translator translates the conditional load instruction into a number of microinstructions, which varies as a function of the offset source, addressing mode, and whether the conditional instruction is a conditional load or store instruction.
    Type: Grant
    Filed: April 6, 2012
    Date of Patent: January 26, 2016
    Assignee: VIA Technologies, Inc.
    Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker, Gerard M. Col, Colin Eddy
  • Patent number: 9223577
    Abstract: Various techniques for processing instructions that specify multiple destinations. A first portion of a processor pipeline is configured to split a multi-destination instruction into a plurality of single-destination operations. A second portion of the pipeline is configured to process the plurality of single-destination operations. A third portion of the pipeline is configured to merge the plurality of single-destination operations into one or more multi-destination operations. The one or more multi-destination operations may be performed. The first portion of the pipeline may include a decode unit. The second portion of the pipeline may include a map unit, which may in turn include circuitry configured to maintain a list of free architectural registers and a mapping table that maps physical registers to architectural registers. The third portion of the pipeline may comprise a dispatch unit. In some embodiments, this may provide certain advantages such as reduced area and/or power consumption.
    Type: Grant
    Filed: September 26, 2012
    Date of Patent: December 29, 2015
    Assignee: Apple Inc.
    Inventors: John H. Mylius, Gerard R. Williams, III, James B. Keller, Fang Liu, Shyam Sundar
  • Patent number: 9223572
    Abstract: An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.
    Type: Grant
    Filed: December 29, 2012
    Date of Patent: December 29, 2015
    Assignee: Intel Corporation
    Inventors: Alexander D. Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
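    Illustrative sketch (not taken from the patent): the element ordering described in the abstract above amounts to interleaving the two sources. The function name unpack_interleave and the use of Python lists for packed registers are assumptions for this example, and it ignores element widths and which half of each register a real instruction would read.

```python
def unpack_interleave(src1, src2):
    """Interleave two equal-length packed sources as [a0, b0, a1, b1, ...]."""
    if len(src1) != len(src2):
        raise ValueError("sources must hold the same number of elements")
    dest = []
    for a, b in zip(src1, src2):
        dest.extend((a, b))
    return dest
```

    With the first and third elements in src1 and the second and fourth in src2, the destination holds them in the order first, second, third, fourth.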
  • Patent number: 9201658
    Abstract: In an embodiment, a processor may be configured to fetch N instruction bytes from an instruction cache (a “fetch group”), even if the fetch group crosses a cache line boundary. A branch predictor may be configured to produce branch predictions for up to M branches in the fetch group, where M is a maximum number of branches that may be included in the fetch group. In an embodiment, branch prediction values from multiple entries in each table may be read and respective branch prediction values may be combined to form branch predictions for up to M branches in the fetch group.
    Type: Grant
    Filed: September 24, 2012
    Date of Patent: December 1, 2015
    Assignee: Apple Inc.
    Inventors: Ian D. Kountanis, Gerard R. Williams, III, James B. Keller
  • Patent number: 9182983
    Abstract: A processor of an aspect includes a register file including a first register to hold a first packed data including a first low data element and a first high data element, a second register to hold a second packed data including a second low data element and a second high data element, and a third register. The processor also includes a decoder to decode an unpack instruction. The processor also includes a functional unit coupled with the decoder and the register file. The functional unit, in response to the decoder decoding the unpack instruction, is to transfer the first low data element to a high position of the third register and the second low data element to a low position of the third register.
    Type: Grant
    Filed: December 29, 2012
    Date of Patent: November 10, 2015
    Assignee: Intel Corporation
    Inventors: Alexander D. Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
  • Patent number: 9170979
    Abstract: An integrated circuit includes one or more transaction data sources and one or more transaction data destinations connected via interconnect circuitry comprising a plurality of interconnect nodes. Within the interconnect nodes there are one or more converging interconnect nodes. A converging interconnect node includes prediction data generation circuitry for reading characteristics of a current item of transaction data from the converging interconnect node and generating associated prediction data for a future item of transaction data which will be returned to the converging interconnect node at a predetermined time in the future. This prediction data is stored within prediction data storage circuitry and is read by prediction data evaluation circuitry to control processing of a future item of transaction data corresponding to that prediction data when it is returned to the converging interconnect node. The interconnect circuitry may have a branching network topology or recirculating ring based topology.
    Type: Grant
    Filed: March 23, 2012
    Date of Patent: October 27, 2015
    Assignee: ARM Limited
    Inventors: Sean James Salisbury, Andrew David Tune
  • Patent number: 9164769
    Abstract: A reconfigurable array is provided. The reconfigurable array includes a Very Long Instruction Word (VLIW) mode and a Coarse-Grained Array (CGA) mode. When the VLIW mode is converted to the CGA mode, instead of sharing a central register file between the VLIW mode and the CGA mode, live data to be used in the CGA mode is copied from the central register file to local register files.
    Type: Grant
    Filed: December 8, 2010
    Date of Patent: October 20, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Won-Sub Kim, Tai-Song Jin, Dong-Hoon Yoo, Bernhard Egger, Jin-Seok Lee
  • Patent number: 9164772
    Abstract: A queuing apparatus having a hierarchy of queues, in one of a number of aspects, is configured to control backpressure between processors in a multiprocessor system. A fetch queue is coupled to an instruction cache and configured to store first instructions for a first processor and second instructions for a second processor in an order fetched from the instruction cache. An in-order queue is coupled to the fetch queue and configured to store the second instructions accepted from the fetch queue in response to a write indication. An out-of-order queue is coupled to the fetch queue and to the in-order queue and configured to store the second instructions accepted from the fetch queue in response to an indication that space is available in the out-of-order queue, wherein the second instructions may be accessed out-of-order with respect to other second instructions executing on different execution pipelines.
    Type: Grant
    Filed: January 25, 2012
    Date of Patent: October 20, 2015
    Assignee: QUALCOMM Incorporated
    Inventors: Kenneth Alan Dockser, Yusuf Cagatay Tekmen
  • Patent number: 9158539
    Abstract: A microprocessor, a method for enhanced precision sum-of-products calculation and a video decoding device are provided, in which at least one general-purpose-register is arranged to provide a number of destination bits to a multiply unit, and a control unit is adapted to provide at least a multiply-high instruction and a multiply-high-and-accumulate instruction to the multiply unit. The multiply unit is arranged to receive at least first and second source operands having an associated number of source bits, a sum of source bits exceeding the number of destination bits, connected to a register-extension cache comprising at least one cache entry arranged to store a number of precision-enhancement bits, and adapted to store a destination portion of a result operand in the general-purpose-register and a precision enhancement portion in the cache entry. The result operand is generated by a multiply-high operation or by a multiply-high-and-accumulate operation, depending on the received instructions.
    Type: Grant
    Filed: November 30, 2009
    Date of Patent: October 13, 2015
    Assignee: RACORS GmbH
    Inventor: Martin Raubuch
  • Patent number: 9158545
    Abstract: A bytecode branch processor is provided. It assists branch prediction by a host processor, reducing branch mispredictions and improving performance. The bytecode branch processor includes an interpreter configured to process a program in a bytecode format in a virtual machine; a branch information generator configured to obtain, while a predefined number of bytecodes are read prior to a current bytecode being processed by the interpreter, a branch address and a target address of a predicted path of a branch corresponding to a preceding bytecode, the branch address being of a branch code included in a preceding handler that processes the preceding bytecode, and the target address being of a current handler that processes the current bytecode to which the preceding handler branches; and a branch target buffer updater configured to update a branch target buffer in the bytecode branch processor with the obtained branch address and target address.
    Type: Grant
    Filed: October 18, 2011
    Date of Patent: October 13, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Kue-Hwan Sihn, Seung-Mo Cho
  • Patent number: 9152421
    Abstract: A method of encapsulating a long instruction in a set of short instructions for execution on a processor, the long instruction having k bits and each short instruction having l bits where l<k, includes assembling a first portion of the long instruction and a first identifier to form a first instruction of the set of short instructions; and assembling a second portion of the long instruction and a second identifier to form a second instruction of the set of short instructions; wherein at least one of the first and second identifiers is for identifying to the processor that the set of short instructions encapsulates the long instruction.
    Type: Grant
    Filed: December 21, 2011
    Date of Patent: October 6, 2015
    Assignee: QUALCOMM Incorporated
    Inventors: Peter Smith, David Richard Hargreaves
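    Illustrative sketch (not taken from the patent): the encapsulation described in the abstract above can be modeled with bit fields. The sizes (k = 24, l = 16), the 4-bit identifier field, the identifier values 0xE and 0xF, and the function names are all assumptions for this example.

```python
LONG_BITS = 24                        # k: assumed long-instruction width
SHORT_BITS = 16                       # l: assumed short-instruction width (l < k)
ID_BITS = 4                           # assumed identifier field width
PAYLOAD_BITS = SHORT_BITS - ID_BITS   # bits of the long instruction per short one
FIRST_ID = 0xE                        # hypothetical identifier for the first portion
SECOND_ID = 0xF                       # hypothetical identifier for the second portion
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1


def encapsulate(long_instr: int) -> tuple:
    """Split a long instruction into two short ones, each tagged with an identifier."""
    if not 0 <= long_instr < (1 << LONG_BITS):
        raise ValueError("long instruction does not fit in LONG_BITS")
    high = long_instr >> PAYLOAD_BITS
    low = long_instr & PAYLOAD_MASK
    first = (FIRST_ID << PAYLOAD_BITS) | high
    second = (SECOND_ID << PAYLOAD_BITS) | low
    return first, second


def decapsulate(first: int, second: int) -> int:
    """Check both identifiers, then reassemble the long instruction."""
    if first >> PAYLOAD_BITS != FIRST_ID or second >> PAYLOAD_BITS != SECOND_ID:
        raise ValueError("short instructions do not encapsulate a long instruction")
    return ((first & PAYLOAD_MASK) << PAYLOAD_BITS) | (second & PAYLOAD_MASK)
```

    In this model the decoder recognizes an encapsulated pair from the identifier fields alone, so ordinary short instructions must not use those identifier values.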
  • Patent number: 9152426
    Abstract: A method of data processing includes a processor of a data processing system executing a controlling thread of a program and detecting occurrence of a particular asynchronous event during execution of the controlling thread of the program. In response to occurrence of the particular asynchronous event during execution of the controlling thread of the program, the processor initiates execution of an assist thread of the program such that the processor simultaneously executes the assist thread and controlling thread of the program.
    Type: Grant
    Filed: April 16, 2012
    Date of Patent: October 6, 2015
    Assignee: International Business Machines Corporation
    Inventors: Giles R. Frazier, Venkat R. Indukuru
  • Patent number: 9152427
    Abstract: The present invention discloses a single chip sequential processor comprising at least one ALU-Block wherein said sequential processor is capable of maintaining its op-codes while processing data such as to overcome the necessity of requiring a new instruction in every clock cycle.
    Type: Grant
    Filed: October 15, 2009
    Date of Patent: October 6, 2015
    Assignee: Hyperion Core, Inc.
    Inventors: Martin Vorbach, Frank May, Markus Weinhardt
  • Patent number: 9141387
    Abstract: An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.
    Type: Grant
    Filed: December 29, 2012
    Date of Patent: September 22, 2015
    Assignee: Intel Corporation
    Inventors: Alexander Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
  • Patent number: 9135006
    Abstract: In accordance with the teachings described herein, systems and methods are provided for advanced execution of branch instructions in a microprocessor pipeline. In one embodiment, a branch instruction of an assembly language program code is executed that includes (i) a condition operand, (ii) a branch destination operand, and (iii) a program count operand. It is determined whether a current program count matches a stored program count operand. After determining that a condition was met when the branch instruction was executed, and in response to determining that the current program count matches the stored program count operand, a destination instruction specified by the stored branch destination operand is fetched.
    Type: Grant
    Filed: September 5, 2012
    Date of Patent: September 15, 2015
    Assignee: Marvell International Ltd.
    Inventors: Li Sha, Ching-Han Tsai, Chi-Kuang Chen, Tzun-Wei Lee
  • Patent number: 9135010
    Abstract: Systems and methods are disclosed for processing data. In accordance with one implementation, a processor may include an arithmetic logic unit (ALU). The processor may also include pipeline circuitry to, in a non-error correction code (ECC) operating mode, execute a sequence of single-cycle instructions in the ALU in a first execution stage, and in an ECC operating mode, execute the same sequence of single-cycle instructions in the ALU in a second execution stage instead of the first execution stage. Further, the processor may include mode control signaling to configure the pipeline circuitry between the non-ECC and ECC operating modes.
    Type: Grant
    Filed: January 25, 2013
    Date of Patent: September 15, 2015
    Assignee: Rambus Inc.
    Inventors: William C. Moyer, Jeffrey W. Scott