Patents Examined by Shawn Doman
  • Patent number: 11281464
    Abstract: A method is provided that includes performing, by a processor in response to a vector sort instruction, sorting of values stored in lanes of a vector to generate a sorted vector, wherein the values in a first portion of the lanes are sorted in a first order indicated by the vector sort instruction and the values in a second portion of the lanes are sorted in a second order indicated by the vector sort instruction; and storing the sorted vector in a storage location.
    Type: Grant
    Filed: September 30, 2019
    Date of Patent: March 22, 2022
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Timothy David Anderson, Mujibur Rahman
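The behavior described in this abstract can be illustrated with a minimal software sketch. It is not the patented hardware or its instruction encoding: the function name `vector_sort`, the `split` parameter, and the two boolean order flags are hypothetical stand-ins for whatever the vector sort instruction actually encodes.

```python
from typing import List


def vector_sort(
    lanes: List[int],
    split: int,
    first_descending: bool,
    second_descending: bool,
) -> List[int]:
    """Sort two portions of a vector independently (illustrative model only).

    Assumption: the "first portion" is lanes[0:split] and the "second portion"
    is lanes[split:]; the abstract does not say how the portions are chosen.
    """
    first = sorted(lanes[:split], reverse=first_descending)
    second = sorted(lanes[split:], reverse=second_descending)
    # "Storing the sorted vector" is modeled by returning a new list.
    return first + second


# Lower four lanes ascending, upper four lanes descending.
sorted_vec = vector_sort(
    [5, 1, 4, 2, 8, 7, 3, 6],
    split=4,
    first_descending=False,
    second_descending=True,
)
print(sorted_vec)  # [1, 2, 4, 5, 8, 7, 6, 3]
```

Sorting the halves in opposite orders produces a bitonic sequence, which is one plausible reason an instruction might offer per-portion ordering, although the abstract itself does not state the motivation.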
  • Patent number: 11275584
    Abstract: A universal floating-point Instruction Set Architecture (ISA) implemented entirely in hardware. Using a single instruction, the universal floating-point ISA has the ability, in hardware, to compute directly with dual decimal character sequences up to IEEE 754-2008 “H=20” in length, without first having to explicitly perform a conversion-to-binary-format process in software before computing with these human-readable floating-point or integer representations. The ISA does not employ opcodes, but rather pushes and pulls “gobs” of data without the encumbering opcode fetch, decode, and execute bottleneck. Instead, the ISA employs stand-alone, memory-mapped operators, complete with their own pipeline that is completely decoupled from the processor's primary push-pull pipeline.
    Type: Grant
    Filed: July 30, 2020
    Date of Patent: March 15, 2022
    Inventor: Jerry D. Harthcock
  • Patent number: 11269806
    Abstract: A time deterministic computer is architected so that exchange code compiled for one set of tiles, e.g., a column, can be reused on other sets.
    Type: Grant
    Filed: May 22, 2019
    Date of Patent: March 8, 2022
    Assignee: Graphcore Limited
    Inventors: Stephen Felix, Simon Christian Knowles
  • Patent number: 11269644
    Abstract: A system and corresponding method enforce strong load ordering in a processor. The system comprises an ordering ring that stores entries corresponding to in-flight memory instructions associated with a program order, scanning logic, and recovery logic. The scanning logic scans the ordering ring in response to execution or completion of a given load instruction of the in-flight memory instructions and detects an ordering violation in an event at least one entry of the entries indicates that a younger load instruction has completed and is associated with an invalidated cache line. In response to the ordering violation, the recovery logic allows the given load instruction to complete, flushes the younger load instruction, and restarts execution of the processor after the given load instruction in the program order, causing data returned by the given and younger load instructions to be returned consistent with execution according to the program order to satisfy strong load ordering.
    Type: Grant
    Filed: July 29, 2019
    Date of Patent: March 8, 2022
    Assignee: MARVELL ASIA PTE, LTD.
    Inventors: David A. Carlson, Shubhendu S. Mukherjee, Wilson P. Snyder, II
  • Patent number: 11249764
    Abstract: A microprocessor is shown, in which a branch predictor and an instruction cache are decoupled by a fetch-target queue (FTQ). The FTQ stores at least an instruction address whose branch prediction has been finished by the branch predictor. The instruction address queued in the FTQ is to be read out later as an instruction-fetching address for the instruction cache. The instruction address that is input into the branch predictor and used for branch prediction leads the instruction-fetching address.
    Type: Grant
    Filed: October 13, 2020
    Date of Patent: February 15, 2022
    Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.
    Inventors: Fangong Gong, Mengchen Yang
  • Patent number: 11210097
    Abstract: A streaming engine employed in a digital signal processor specifies a fixed read only data stream. Once fetched the data stream is stored in two head registers for presentation to functional units in the fixed order. Data use by the functional unit is preferably controlled using the input operand fields of the corresponding instruction. A first read only operand coding supplies data from the first head register. A first read/advance operand coding supplies data from the first head register and also advances the stream to the next sequential data elements. Corresponding second read only operand coding and second read/advance operand coding operate similarly with the second head register. A third read only operand coding supplies double width data from both head registers.
    Type: Grant
    Filed: July 1, 2019
    Date of Patent: December 28, 2021
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Joseph Zbiciak
  • Patent number: 11204889
    Abstract: A method of processing partitions of a tensor in a target order includes receiving, by a reorder unit and from two or more producer units, a plurality of partitions of a tensor in a first order that is different from the target order, storing the plurality of partitions in the reorder unit, and providing, from the reorder unit, the plurality of partitions in the target order to one or more consumer units. In an example, the one or more consumer units process the plurality of partitions in the target order.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: December 21, 2021
    Assignee: SambaNova Systems, Inc.
    Inventors: Raghu Prabhakar, Nathan Francis Sheeley, Matheen Musaddiq, Scott Layson Burson, Sitanshu Gupta, Sumti Jairath, Pramod Nataraja, Ajit Punj
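A rough software analogy for the reorder unit in this abstract is a buffer that holds early arrivals until the next partition in the target order is available. The sketch below assumes each partition carries an integer identifier and that partitions are released as soon as possible; the abstract specifies neither, and the names `reorder`, `arrivals`, and `target_order` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def reorder(
    arrivals: Iterable[Tuple[int, str]],
    target_order: List[int],
) -> Iterator[str]:
    """Yield partition payloads in target_order, buffering early arrivals.

    arrivals: (partition_id, payload) pairs in the order producers emit them.
    target_order: partition ids in the order consumers expect them.
    """
    buffer: Dict[int, str] = {}  # models storage inside the reorder unit
    next_pos = 0                 # index into target_order of the next id owed

    for partition_id, payload in arrivals:
        buffer[partition_id] = payload
        # Drain every partition that is now available in target order.
        while next_pos < len(target_order) and target_order[next_pos] in buffer:
            yield buffer.pop(target_order[next_pos])
            next_pos += 1


# Two producers interleave their output; the consumer wants ids 0, 1, 2, 3.
out_of_order = [(2, "p2"), (0, "p0"), (3, "p3"), (1, "p1")]
in_order = list(reorder(out_of_order, target_order=[0, 1, 2, 3]))
print(in_order)  # ['p0', 'p1', 'p2', 'p3']
```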
  • Patent number: 11194584
    Abstract: Retiring instructions out-of-order includes: receiving processor instructions comprising two or more and fewer than all processor instructions generated based on a program, where the processor instructions include a first instruction and a second instruction such that the first instruction precedes the second instruction in a program order of the program; receiving a start instruction that immediately precedes the processor instructions and indicates that the processor instructions are to be retired out-of-order; receiving a stop instruction that immediately succeeds the processor instructions and indicates a stop to out-of-order instruction retirement; and, in response to completing execution of the second instruction before completing execution of the first instruction, retiring the second instruction before retiring the first instruction.
    Type: Grant
    Filed: April 30, 2020
    Date of Patent: December 7, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventor: Shubhendu Sekhar Mukherjee
  • Patent number: 11188337
    Abstract: Micro-architecture designs and methods are provided. A computer processing architecture may include an instruction cache for storing producer instructions, a half-instruction cache for storing half instructions, and eager shelves for storing a result of a first producer instruction. The computer processing architecture may fetch the first producer instruction and a first half instruction; send the first half instruction to the eager shelves; based on execution of the first producer instruction, send a second half instruction to the eager shelves; assemble the first producer instruction in the eager shelves based on the first half instruction and the second half instruction; and dispatch the first producer instruction for execution.
    Type: Grant
    Filed: September 30, 2019
    Date of Patent: November 30, 2021
    Assignees: The Florida State University Research Foundation, Inc., Michigan Technological University
    Inventors: David Whalley, Soner Onder
  • Patent number: 11175915
    Abstract: Systems and methods related to implementing vector registers in memory. A memory system for implementing vector registers in memory can include an array of memory cells, where a plurality of rows in the array serve as a plurality of vector registers as defined by an instruction set architecture. The memory system for implementing vector registers in memory can also include a processing resource configured to, responsive to receiving a command to perform a particular vector operation on a particular vector register, access a particular row of the array serving as the particular register to perform the vector operation.
    Type: Grant
    Filed: October 10, 2018
    Date of Patent: November 16, 2021
    Assignee: Micron Technology, Inc.
    Inventors: Timothy P Finkbeiner, Troy D. Larsen
  • Patent number: 11163574
    Abstract: A method for managing tasks in a computer system comprising a processor and a memory, the method includes performing a first task by the processor, the first task comprising task-relating branch instructions and task-independent branch instructions and executing the branch prediction method, the execution resulting in task-relating branch prediction data in the branch prediction history table. In response to determining that the first task is to be interrupted or terminated, the method includes storing the task-relating branch prediction data of the first task in the task structure of the first task. In response to determining that a second task is to be continued, the method includes reading task-relating branch prediction data of the second task from the task structure of the second task and storing the task-relating branch prediction data of the second task in the branch prediction history table.
    Type: Grant
    Filed: July 31, 2019
    Date of Patent: November 2, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Wolfgang Gellerich, Peter M. Held, Martin Schwidefsky, Chung-Lung K. Shum
  • Patent number: 11132196
    Abstract: Address collisions are managed when performing vector operations. A register store stores vector operands. Execution circuitry performs memory access operations to move the vector operands between the register store and memory and data processing operations using the vector operands. The execution circuitry may iteratively execute a vector loop, where during each iteration the execution circuitry executes a sequence of instructions to implement the vector loop. The sequence includes a check instruction identifying a plurality of memory addresses. The execution circuitry responds to the check instruction to determine whether an address hazard condition exists among the plurality of memory addresses. For each iteration of the vector loop, the execution circuitry responds to the check instruction determining an absence of the address hazard condition to employ a default level of vectorization when executing the sequence of instructions to implement the vector loop.
    Type: Grant
    Filed: April 6, 2017
    Date of Patent: September 28, 2021
    Assignee: Arm Limited
    Inventors: Mbou Eyole, Jacob Eapen, Alejandro Martinez Vicente
  • Patent number: 11119781
    Abstract: A data processing system includes multiple processing units all having access to a shared memory. A processing unit of the data processing system includes a processor core including an upper level cache, core reservation logic that records addresses in the shared memory for which the processor core has obtained reservations, and an execution unit that executes memory access instructions including a fronting load instruction. Execution of the fronting load instruction generates a load request that specifies a load target address. The processing unit further includes lower level cache that, responsive to receipt of the load request and based on the load request indicating an address match for the load target address in the core reservation logic, protects the load target address against access by any conflicting memory access request during a protection interval following servicing of the load request.
    Type: Grant
    Filed: December 11, 2018
    Date of Patent: September 14, 2021
    Assignee: International Business Machines Corporation
    Inventors: Derek E. Williams, Guy L. Guthrie, Hugh Shen, Sanjeev Ghai
  • Patent number: 11093250
    Abstract: An apparatus and method for efficiently processing invariant operations on a parallel execution engine.
    Type: Grant
    Filed: September 29, 2018
    Date of Patent: August 17, 2021
    Assignee: Intel Corporation
    Inventors: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jaewoong Sim, Andrey Ayupov
  • Patent number: 11086815
    Abstract: Supporting multiple clients on a single programmable integrated circuit (IC) can include implementing a first image within the programmable IC in response to a first request for processing to be performed by the programmable IC, wherein the request is from a first process executing in a host data processing system coupled to the programmable IC, receiving, using a processor of the host data processing system, a second request for processing to be performed on the programmable IC from a second and different process executing in the host data processing system while the programmable IC still implements the first image, comparing, using the processor, a second image specified by the second request to the first image, and, in response to determining that the second image matches the first image based on the comparing, granting, using the processor, the second request for processing to be performed by the programmable IC.
    Type: Grant
    Filed: April 15, 2019
    Date of Patent: August 10, 2021
    Assignee: Xilinx, Inc.
    Inventors: Sonal Santan, Soren T. Soe, Cheng Zhen
  • Patent number: 11080060
    Abstract: Managing application execution by receiving a store instruction, including a store instruction itag and store instruction address, creating a hash of the store instruction address, receiving a load instruction and matching a hash of a store instruction address associated with the load instruction with the hash of the store instruction address associated with the store instruction. The store instruction itag is sent to an instruction sequencing unit (ISU). The ISU delays execution of the load instruction according to the received itag.
    Type: Grant
    Filed: April 23, 2019
    Date of Patent: August 3, 2021
    Assignee: International Business Machines Corporation
    Inventors: Ehsan Fatehi, Brian W. Thompto, John B. Griswell, Jr.
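The hash-matching step in this abstract can be sketched in software as a small table keyed by an address hash. Everything concrete here is an assumption made for illustration: the 6-bit XOR-fold hash, the `StoreHashTable` class, and its method names do not come from the patent, which does not describe the hash function or table organization in its abstract.

```python
from typing import Dict, Optional

HASH_BITS = 6  # hypothetical hash width


def address_hash(addr: int) -> int:
    """XOR-fold an address into HASH_BITS bits (an assumed hash function)."""
    mask = (1 << HASH_BITS) - 1
    folded = 0
    while addr:
        folded ^= addr & mask
        addr >>= HASH_BITS
    return folded


class StoreHashTable:
    """Maps a store-address hash to the itag of the most recent such store."""

    def __init__(self) -> None:
        self._itag_by_hash: Dict[int, int] = {}

    def record_store(self, itag: int, addr: int) -> None:
        self._itag_by_hash[address_hash(addr)] = itag

    def lookup_load(self, addr: int) -> Optional[int]:
        """Return the itag the load should wait on, or None if no hash matches."""
        return self._itag_by_hash.get(address_hash(addr))


table = StoreHashTable()
table.record_store(itag=7, addr=0x1000)

# A load whose address hash matches would be delayed behind the store with itag 7.
print(table.lookup_load(0x1000))  # 7
# A load whose address hash differs is not delayed.
print(table.lookup_load(0x1008))  # None
```

Because different addresses can share a hash, a match in this model may delay a load that does not actually depend on the store; the sketch treats that as a conservative outcome rather than an error.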
  • Patent number: 11061672
    Abstract: A microprocessor is configured for unchained and chained modes of split execution of a fused compound arithmetic operation. In both modes of split execution, a first execution unit executes only a first part of the fused compound arithmetic operation and produces an intermediate result thereof, and a second instruction execution unit receives the intermediate result and executes a second part of the fused compound arithmetic operation to produce a final result. In the unchained mode, execution is accomplished by dispatching separate split-execution microinstructions to the first and second instruction execution units. In the chained mode, execution is accomplished by dispatching a single split-execution microinstruction to the first instruction execution unit and sending a chaining control signal or signal group to the second execution unit, causing it to execute its part of the fused arithmetic operation without needing an instruction.
    Type: Grant
    Filed: July 5, 2016
    Date of Patent: July 13, 2021
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: Thomas Elmer, Nikhil A. Patil
  • Patent number: 11061673
    Abstract: An example core for a data processing engine (DPE) includes a first register file configured to provide a first plurality of output lanes, and a processor, coupled to the first register file, including: a multiply-accumulate (MAC) circuit, a first permute circuit coupled between the first register file and the MAC circuit, and a second permute circuit coupled between the first register file and the MAC circuit. The first permute circuit is configured to generate a first vector by selecting a first set of output lanes from the first plurality of output lanes. The second permute circuit is configured to generate a second vector by selecting a second set of output lanes from the first plurality of output lanes.
    Type: Grant
    Filed: April 3, 2018
    Date of Patent: July 13, 2021
    Assignee: XILINX, INC.
    Inventors: Baris Ozgul, Jan Langer, Juan J. Noguera Serra, Goran H. K. Bilski, Richard L. Walke
  • Patent number: 11061679
    Abstract: A processor comprising an execution unit, memory and one or more register files. The execution unit is configured to execute instances of machine code instructions from an instruction set. The types of instruction defined in the instruction set include a double-load instruction for loading from the memory to at least one of the one or more register files. The execution unit is configured so as, when the load instruction is executed, to perform a first load operation strided by a fixed stride, and a second load operation strided by a variable stride, the variable stride being specified in a variable stride register in one of the one or more register files.
    Type: Grant
    Filed: April 19, 2019
    Date of Patent: July 13, 2021
    Assignee: Graphcore Limited
    Inventors: Alan Graham Alexander, Simon Christian Knowles, Mrudula Chidambar Gore
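As a loose illustration of the double-load idea in this abstract, the sketch below models one instruction that performs two loads and then advances one address by a fixed stride and the other by a stride read from a register. The post-increment addressing, the function name `double_load`, and the flat list standing in for memory are all assumptions; the abstract does not describe the addressing mode in this detail.

```python
from typing import List, Tuple


def double_load(
    memory: List[int],
    addr_a: int,
    addr_b: int,
    fixed_stride: int,
    stride_reg: int,
) -> Tuple[int, int, int, int]:
    """Model one double-load: two loads plus two address updates.

    Returns (value_a, value_b, next_addr_a, next_addr_b).
    Assumption: addresses are post-incremented after the loads.
    """
    value_a = memory[addr_a]           # first load, fixed-stride stream
    value_b = memory[addr_b]           # second load, variable-stride stream
    next_a = addr_a + fixed_stride     # stride fixed by the instruction
    next_b = addr_b + stride_reg       # stride taken from a register value
    return value_a, value_b, next_a, next_b


memory = list(range(100, 116))  # sixteen words holding 100..115
a, b = 0, 1
loaded = []
for _ in range(3):
    va, vb, a, b = double_load(memory, a, b, fixed_stride=1, stride_reg=4)
    loaded.append((va, vb))
print(loaded)  # [(100, 101), (101, 105), (102, 109)]
```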
  • Patent number: 11036500
    Abstract: Processing circuitry performs processing operations specified by program instructions. An instruction decoder decodes an atomic-add-with-carry instruction AADDC to control the processing circuitry to perform an atomic operation of an add of an addend operand value and a data value stored in a memory to generate a result value stored in the memory and a carry value indicative of whether or not the add generated a carry out. The atomic-add-with-carry instructions may be used within systems which accumulate a local sum value prior to a data value being returned into a local cache memory at which time the local sum value is added to the return data value. The atomic-add-with-carry instructions may also be used in embodiments comprising a coalescing tree of respective processing apparatus where the carry out values generated from local sums produced at each node are returned early to higher nodes within the hierarchy thereby releasing them to commence other processing.
    Type: Grant
    Filed: October 23, 2019
    Date of Patent: June 15, 2021
    Assignee: Arm Limited
    Inventor: Andreas Due Engh-Halstvedt
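The atomic add-with-carry operation in this abstract can be modeled in a few lines. This is a sketch under stated assumptions, not the AADDC instruction itself: the `Memory` class, its configurable word width, and the use of a lock to stand in for hardware atomicity are all hypothetical.

```python
import threading
from typing import List


class Memory:
    """Word-addressed memory with an atomic add-with-carry (illustrative model)."""

    def __init__(self, size: int, width_bits: int = 32) -> None:
        self._cells: List[int] = [0] * size
        self._width = width_bits
        self._mask = (1 << width_bits) - 1
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def store(self, addr: int, value: int) -> None:
        with self._lock:
            self._cells[addr] = value & self._mask

    def load(self, addr: int) -> int:
        with self._lock:
            return self._cells[addr]

    def atomic_add_with_carry(self, addr: int, addend: int) -> int:
        """Add addend to memory[addr] in place and return the carry out (0 or 1)."""
        with self._lock:
            total = self._cells[addr] + (addend & self._mask)
            self._cells[addr] = total & self._mask  # result value kept in memory
            return total >> self._width             # carry value returned to caller


mem = Memory(size=1, width_bits=8)
mem.store(0, 250)
carry = mem.atomic_add_with_carry(0, 10)  # 250 + 10 = 260 overflows 8 bits
print(carry, mem.load(0))  # 1 4
```

Returning only the carry means the caller never needs the summed value itself, which matches the abstract's description of carry values being passed up a hierarchy while the result stays in memory.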