Patents Examined by Andrew Caldwell
  • Patent number: 9710265
    Abstract: A computing unit is disclosed, comprising a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs one or more computations associated with at least one element of a data array, the one or more computations being performed by the MAC operator and comprising, in part, a multiply operation of the input activation received from the data bus and a parameter received from the second memory bank.
    Type: Grant
    Filed: March 17, 2017
    Date of Patent: July 18, 2017
    Assignee: Google Inc.
    Inventors: Olivier Temam, Ravi Narayanaswami, Harshit Khaitan, Dong Hyuk Woo
  • Patent number: 9703559
    Abstract: When the branch condition of a branch command for a loop process is satisfied and enters the loop mode, the relative branch address is saved in a branch relative address save circuit that points to the branch command for loop processing, and the loop state flag is set in a loop state save circuit. When the loop state flag is set, if the absolute value of the value outputted by a command code counter circuit matches the absolute value of the relative branch address outputted by the branch relative address save circuit, a program counter sum value switching circuit outputs the relative branch address to an program counter adder. If the absolute values do not match, the program counter sum value switching circuit outputs the value ‘1’ to the program counter adder. With this, the branch penalty during loop processing is eliminated even with little hardware.
    Type: Grant
    Filed: November 2, 2012
    Date of Patent: July 11, 2017
    Inventor: Hiroyuki Igura
  • Patent number: 9704602
    Abstract: A random number generation circuit may include a memory block. The random number generation circuit may include a fuse block configured to store an address of a failed memory cell from a memory cell array of the memory block, as a repair address, and generate a match signal by comparing the repair address with a normal address inputted from an exterior. The random number generation circuit may include a register configured to output a true random number by latching an address corresponding to activation timing of the match signal among normal addresses.
    Type: Grant
    Filed: October 28, 2015
    Date of Patent: July 11, 2017
    Assignee: SK hynix Inc.
    Inventors: Kyung Hoon Kim, In Sik Yoon
  • Patent number: 9703565
    Abstract: Embodiments provide methods, apparatus, systems, and computer readable media associated with predicting predicates and branch targets during execution of programs using combined branch target and predicate predictions. The predictions may be made using one or more prediction control flow graphs which represent predicates in instruction blocks and branches between blocks in a program. The prediction control flow graphs may be structured as trees such that each node in the graphs is associated with a predicate instruction, and each leaf associated with a branch target which jumps to another block. During execution of a block, a prediction generator may take a control point history and generate a prediction. Following the path suggested by the prediction through the tree, both predicate values and branch targets may be predicted. Other embodiments may be described and claimed.
    Type: Grant
    Filed: March 25, 2015
    Date of Patent: July 11, 2017
    Assignee: The Board of Regents of the University of Texas System
    Inventors: Douglas C. Burger, Stephen W. Keckler
  • Patent number: 9697005
    Abstract: In an example, there is disclosed a digital signal processor having a register containing a modular integer configured for use as a thread offset counter. In a multi-stage, pipelined loop, which may be implemented in microcode, the main body of the loop has only one repeating stage. On each stage, the operation executed by each thread of the single repeating stage is identified by the sum of a fixed integer and the thread offset counter. After each pass through the loop, the thread offset counter is incremented, thus maintaining pipelined operation of the single repeating stage.
    Type: Grant
    Filed: December 4, 2013
    Date of Patent: July 4, 2017
    Assignee: ANALOG DEVICES, INC.
    Inventor: Boris Lerner
  • Patent number: 9684489
    Abstract: Methods, apparatuses, and computer program products for squaring an operand include identifying a fixed-point value with a fixed word size and a substring size for substrings of the fixed-point value, wherein the fixed-point value comprises a binary bit string. A square of the fixed-point value can be determined using the fixed point value, the substring size, and least significant bits of the fixed-point value equal to the substring size.
    Type: Grant
    Filed: August 31, 2012
    Date of Patent: June 20, 2017
    Assignee: Southern Methodist University
    Inventors: Mitchell A. Thornton, Saurabh Gupta
  • Patent number: 9684509
    Abstract: Vector processing engines (VPEs) employing merging circuitry in data flow paths between execution units and vector data memory to provide in-flight merging of output vector data stored to vector data memory are disclosed. Related vector processing instructions, systems, and methods are also disclosed. Merging circuitry is provided in data flow paths between execution units and vector data memory in the VPE. The merging circuitry is configured to merge an output vector data sample set from execution units as a result of performing vector processing operations in-flight while the output vector data sample set is being provided over the output data flow paths from the execution units to the vector data memory to be stored. The merged output vector data sample set is stored in a merged form in the vector data memory without requiring additional post-processing steps, which may delay subsequent vector processing operations to be performed in execution units.
    Type: Grant
    Filed: November 15, 2013
    Date of Patent: June 20, 2017
    Assignee: QUALCOMM Incorporated
    Inventor: Raheel Khan
  • Patent number: 9672164
    Abstract: Embodiments include processing systems that determine, based on an instruction address range indicator stored in a first register, whether a next instruction fetch address corresponds to a location within a first memory region associated with a current privilege state or within a second memory region associated with a different privilege state. When the next instruction fetch address is not within the first memory region, the next instruction is allowed to be fetched only when a transition to the different privilege state is legal. In a further embodiment, when a data access address is generated for an instruction, a determination is made, based on a data address range indicator stored in a second register, whether access to a memory location corresponding to the data access address is allowed. The access is allowed when the current privilege state is a privilege state in which access to the memory location is allowed.
    Type: Grant
    Filed: May 31, 2012
    Date of Patent: June 6, 2017
    Assignee: NXP USA, INC.
    Inventors: Daniel M. McCarthy, Joseph C. Circello, Kristen A. Hausman
  • Patent number: 9665377
    Abstract: A processing apparatus, comprising at least a first processing unit and a second processing unit, is proposed. The first processing unit comprises a set of first stateful elements, the second processing unit comprises a set of second stateful elements. A set of synchronization data lines may connect the first stateful elements to the second stateful elements in a pairwise manner. A control unit may control the first processing unit, the second processing unit and the synchronization data lines so as to copy the states of the first stateful elements in parallel via the synchronization data lines to the second stateful elements in response to a synchronization request. A method of synchronizing the processing units is also proposed.
    Type: Grant
    Filed: July 20, 2011
    Date of Patent: May 30, 2017
    Assignee: NXP USA, Inc.
    Inventors: Vladimir Litovtchenko, Harald Luepken, Markus Regner
  • Patent number: 9658877
    Abstract: The disclosure relates generally to techniques, methods and apparatus for controlling context switching at a central processing unit. Alternatively, methods and apparatus are provided for providing security to memory blocks. Alternatively, methods and apparatus are provided for enabling transactional processing using a multi-core device.
    Type: Grant
    Filed: August 23, 2010
    Date of Patent: May 23, 2017
    Inventor: James Barwick
  • Patent number: 9652242
    Abstract: An apparatus and method for calculating flag bits is disclosed. The flag bits may be used in a processor utilizing branch predication. More particularly, the apparatus and method may be used to calculate a predicate that can be used by a branch unit to evaluate whether a branch is to be taken. In one embodiment, the apparatus is coupled to receive a condition code associated with an instruction, and flag bits generated responsive to execution of the instruction. The condition code is indicative of a condition to be checked resulting from execution of the instruction. The apparatus may then provide an indication of whether the condition is true.
    Type: Grant
    Filed: May 2, 2012
    Date of Patent: May 16, 2017
    Assignee: Apple Inc.
    Inventors: Rajat Goel, Sandeep Gupta, Yamini Modukuru
  • Patent number: 9645972
    Abstract: A butterfly channelizer includes at least two stages. Each stage includes at least one dual-channel module configured to convert an input time domain signal into a second time domain signal of lower bandwidth. At least one clock is configured to generate a clock signal that drives the at least two stages. A first stage has a first number of dual-channel modules and a second stage following the first stage has a second number of dual-channel modules greater than the first number.
    Type: Grant
    Filed: July 1, 2014
    Date of Patent: May 9, 2017
    Inventors: Harry B. Marr, Daniel Thompson
  • Patent number: 9645819
    Abstract: A computer system, a computer processor and a method executable on a computer processor involve placing each sequence of a plurality of sequences of computer instructions being scheduled for execution in the processor into a separate queue. The head instruction from each queue is stored into a first storage unit prior to determining whether the head instruction is ready for scheduling. For each instruction in the first storage unit that is determined to be ready, the instruction is moved from the first storage unit to a second storage unit. During a first processor cycle, each instruction in the first storage unit that is determined to be not ready is retained in the first storage unit, and the determining of whether the instruction is ready is repeated during the next processor cycle. Scheduling logic performs scheduling of instructions contained in the second storage unit.
    Type: Grant
    Filed: June 15, 2012
    Date of Patent: May 9, 2017
    Assignee: Intel Corporation
    Inventors: Jayesh Iyer, Nikolay Kosarev, Sergey Shishlov, Alexey Sivtsov, Alexander Butuzov, Boris A. Babayan, Vladimir Penkovski
  • Patent number: 9645866
    Abstract: This disclosure describes communication techniques that may be used within a multiple-processor computing platform. The techniques may, in some examples, provide software interfaces that may be used to support message passing within a multiple-processor computing platform that initiates tasks using command queues. The techniques may, in additional examples, provide software interfaces that may be used for shared memory inter-processor communication within a multiple-processor computing platform. In further examples, the techniques may provide a graphics processing unit (GPU) that includes hardware for supporting message passing and/or shared memory communication between the GPU and a host CPU.
    Type: Grant
    Filed: September 16, 2011
    Date of Patent: May 9, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Alexei V. Bourd, Colin Christopher Sharp, David Rigel Garcia Garcia, Chihong Zhang
  • Patent number: 9645949
    Abstract: Embodiments of the invention relate to a data processing apparatus including a processor adapted to operate under control of an executable comprising instructions, and in any of a plurality of operating modes including a non-privileged mode and a privileged mode, the apparatus comprising: means for storing a plurality of stacks; a first stack pointer register for storing a pointer to an address in a first of said stacks; a second stack pointer register for storing a pointer to an address in a second of said stacks, wherein said processing apparatus is adapted to use said second stack pointer when said processor is operating in either the non-privileged mode or the privileged mode; and means for transferring operation of said processor from the non-privileged mode to the privileged mode in response to at least one of said instructions. Embodiments of the invention also relate to a method of operating a data processing apparatus.
    Type: Grant
    Filed: May 27, 2009
    Date of Patent: May 9, 2017
    Assignee: Cambridge Consultants Ltd.
    Inventors: Alistair G. Morfey, Karl Leighton Swepson, Peter Giles Lloyd
  • Patent number: 9639371
    Abstract: A system and method for efficiently processing instructions in hardware parallel execution lanes within a processor. In response to a given divergent point within an identified loop, a compiler generates code wherein when executed determines a size of a next very large instruction world (VLIW) to process and determine multiple pointer values to store in multiple corresponding PC registers in a target processor. The updated PC registers point to instructions intermingled from different basic blocks between the given divergence point and a corresponding convergence point. The target processor includes a single instruction multiple data (SIMD) micro-architecture. The assignment for a given lane is based on branch direction found at runtime for the given lane at the given divergent point. The processor includes a vector register for mapping PC registers to execution lanes.
    Type: Grant
    Filed: January 29, 2013
    Date of Patent: May 2, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Reza Yazdani
  • Patent number: 9639360
    Abstract: A data processing system is used to evaluate a data processing function by executing a sequence of program instructions including an intermediate value generating instruction Inst0 and an intermediate value consuming instruction Inst1. In dependence upon one or more input operands to the evaluation, an embedded opcode within the intermediate value passed between the intermediate value generating instruction and the intermediate value consuming instruction may be set to have a value indicating that a substitute instruction should be used in place of the intermediate value consuming instruction. The instructions may be floating point instructions, such as a floating point power instruction evaluating the data processing function ab.
    Type: Grant
    Filed: February 26, 2014
    Date of Patent: May 2, 2017
    Assignee: ARM Limited
    Inventor: Jorn Nystad
  • Patent number: 9639451
    Abstract: Debugger system, method and computer program product for debugging instructions.
    Type: Grant
    Filed: January 25, 2010
    Date of Patent: May 2, 2017
    Assignee: NXP USA, INC.
    Inventors: Constantin Tudor, Sorin Babeanu
  • Patent number: 9632789
    Abstract: Branch prediction using a correlating event, such as an unconditional branch that calls a routine including the branch, instead of the branch itself, to predict the behavior of the branch. The circumstances in which the branch is employed, and not the actual branch itself, is used to predict how strongly taken or not taken the branch is to behave. An anchor point associated with the branch (e.g., an address of the instruction calling a routine that includes the branch), an address of the branch, and a value that represents the number of selected branch instructions between the anchor point and the branch are used to select information to be used to predict the direction of the branch.
    Type: Grant
    Filed: November 25, 2014
    Date of Patent: April 25, 2017
    Inventors: James J. Bonanno, Richard J. Moore, Brian R. Prasky
  • Patent number: 9632781
    Abstract: Techniques are provided for executing a vector alignment instruction. A scalar register file in a first processor is configured to share one or more register values with a second processor, the one or more register values accessed from the scalar register file according to an Rt address specified in a vector alignment instruction, wherein a start location is determined from one of the shared register values. An alignment circuit in the second processor is configured to align data identified between the start location within a beginning Vu register of a vector register file (VRF) and an end location of a last Vu register of the VRF according to the vector alignment instruction. A store circuit is configured to select the aligned data from the alignment circuit and store the aligned data in the vector register file according to an alignment store address specified by the vector alignment instruction.
    Type: Grant
    Filed: February 26, 2013
    Date of Patent: April 25, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Ajay A. Ingle, Marc M. Hoffman, Jose Fridman, Lucian Codrescu