Patents Examined by Eric Coleman
  • Patent number: 11048508
    Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
    Type: Grant
    Filed: April 29, 2019
    Date of Patent: June 29, 2021
    Assignee: Intel Corporation
    Inventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
  • Patent number: 11042403
    Abstract: A computing platform, including: an execution unit to execute a program, the program including a first phase and a second phase; and a quick response module (QRM) to: receive a program phase signature for the first phase; store the program phase signature in a pattern match action (PMA) table; identify entry of the program into the first phase via the PMA; and apply an optimization to the computing platform.
    Type: Grant
    Filed: July 10, 2017
    Date of Patent: June 22, 2021
    Assignee: Intel Corporation
    Inventors: Christopher B. Wilkerson, Karl I. Taht, Ren Wang, James J. Greensky, Tsung-Yuan C. Tai
  • Patent number: 11036503
    Abstract: Processing circuitry selectively applies vector processing operations to one or more data items of one or more data vectors. Each data vector comprises a plurality of data items at respective vector positions in the data vector according to the state of respective predicate indicators associated with the vector positions. Predicate generation circuitry apply a processing operation to generate a set of predicate indicators, each associated with a respective one of the vector positions, to generate a count value indicative of the number of predicate indicators in the set having a given state, and to store the generated set of predicate indicators and the count value in a predicate store.
    Type: Grant
    Filed: August 15, 2016
    Date of Patent: June 15, 2021
    Assignee: ARM LIMITED
    Inventors: Gary Alan Gorman, Lee Evan Eisen, Neil Burgess, Daniel Arulraj
  • Patent number: 11036512
    Abstract: A processor element in a processor-based system is configured to fetch one or more instructions associated with a program binary, where the one or more instructions include an instruction having an immediate operand. The processor element is configured to determine if the immediate operand is a reference to a wide immediate operand. In response to determining that the immediate operand is a reference to a wide immediate operand, the processor element is configured to retrieve the wide immediate operand from a common intermediate lookup table (CILT) in the program binary, where the immediate operand indexes the wide immediate operand in the CILT. The processor element is then configured to process the instruction having the immediate operand such that the immediate operand is replaced with the wide immediate operand from the CILT.
    Type: Grant
    Filed: September 23, 2019
    Date of Patent: June 15, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Arthur Perais, Rodney Wayne Smith, Shivam Priyadarshi, Rami Mohammad Al Sheikh, Vignyan Reddy Kothinti Naresh
  • Patent number: 11029953
    Abstract: Disclosed embodiments relate to the usage of a branch prediction unit in service of performance sensitive microcode flows. In one example, a processor includes a branch prediction unit (BPU) and a pipeline including a fetch stage to fetch an instruction specifying an opcode, an operand, and a loop condition based on the operand, wherein the BPU is to generate a hint reflecting a predicted result of the loop condition, a decode stage to generate either a first or a second micro-operation flow as per the hint, the pipeline to begin executing the generated micro-operation flow; a read stage to read the operand and resolve the loop condition; and execution circuitry to continue the generated micro-operation flow if the prediction was correct, and, otherwise, to flush the pipeline, update the prediction, and switch from the generated micro-operation flow to the other of the first and second micro-operation flows.
    Type: Grant
    Filed: June 26, 2019
    Date of Patent: June 8, 2021
    Assignee: Intel Corporation
    Inventors: Michael Mishaeli, Ido Ouziel, Jared Warner Stark, IV
  • Patent number: 11029958
    Abstract: Systems, methods, and apparatuses relating to configurable operand size operation circuitry in an operation configurable spatial accelerator are described.
    Type: Grant
    Filed: December 28, 2019
    Date of Patent: June 8, 2021
    Assignee: Intel Corporation
    Inventors: Chuanjun Zhang, Kermin E. Chofleming
  • Patent number: 11029961
    Abstract: Various embodiments are described herein that relate to computer programs and computer-implemented techniques for predicting when jobs in the queue of a batch scheduler will be completed. More specifically, various embodiments are described herein that relate to mechanisms for predicting the wait time and/or the estimated time to completion for jobs that are to be executed by a software asset management platform. For example, heuristics and algorithms could be used to discover when execution of a job is likely to begin and/or end. The estimated time to completion for a given job can be estimated by summing the expected execution time of the given job and the expected execution times of any jobs to be executed prior to the given job, while the wait time for a given job can be estimated by summing the expected execution times of any jobs to be executed prior to the given job.
    Type: Grant
    Filed: September 17, 2018
    Date of Patent: June 8, 2021
    Assignee: Flexera Software LLC
    Inventor: Rajeesh Chirayath Kuttan
  • Patent number: 11029951
    Abstract: Examples of the present disclosure provide apparatuses and methods for smallest value element or largest value element determination in memory. An example method comprises: storing an elements vector comprising a plurality of elements in a group of memory cells coupled to an access line of an array; performing, using sensing circuitry coupled to the array, a logical operation using a first vector and a second vector as inputs, with a result of the logical operation being stored in the array as a result vector; updating the result vector responsive to performing a plurality of subsequent logical operations using the sensing circuitry; and providing an indication of which of the plurality of elements have one of a smallest value and a largest value.
    Type: Grant
    Filed: August 15, 2016
    Date of Patent: June 8, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Sanjay Tiwari
  • Patent number: 11029955
    Abstract: Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively.
    Type: Grant
    Filed: June 29, 2019
    Date of Patent: June 8, 2021
    Assignee: Intel Corporation
    Inventors: Michael A. Julier, Jeffrey D. Gray, Srinivas Chennupaty, Sean P. Mirkes, Mark P. Seconi
  • Patent number: 11023236
    Abstract: Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively.
    Type: Grant
    Filed: June 29, 2019
    Date of Patent: June 1, 2021
    Assignee: Intel Corporation
    Inventors: Michael A. Julier, Jeffrey D. Gray, Srinivas Chennupaty, Sean P. Mirkes, Mark P. Seconi
  • Patent number: 11023413
    Abstract: A method of operating a system comprising multiple processor tiles divided into a plurality of domains wherein within each domain the tiles are connected to one another via a respective instance of a time-deterministic interconnect and between domains the tiles are connected to one another via a non-time-deterministic interconnect. The method comprises: performing a compute stage, then performing a respective internal barrier synchronization within each domain, then performing an internal exchange phase within each domain, then performing an external barrier synchronization to synchronize between different domains, then performing an external exchange phase between the domains.
    Type: Grant
    Filed: December 23, 2019
    Date of Patent: June 1, 2021
    Assignee: GRAPHCORE LIMITED
    Inventors: Daniel John Pelham Wilkinson, Stephen Felix, Richard Luke Southwell Osborne, Simon Christian Knowles, Alan Graham Alexander, Ian James Quinn
  • Patent number: 11016773
    Abstract: Embodiments described herein provide for a computing device comprising a hardware processor including a processor trace module to generate trace data indicative of an order of instructions executed by the processor, wherein the processor trace module is configurable to selectively output a processor trace packet associated with execution of a selected non-deterministic control flow transfer instruction.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: May 25, 2021
    Assignee: INTEL CORPORATION
    Inventors: Salmin Sultana, Beeman Strong, Ravi Sahita
  • Patent number: 11010163
    Abstract: Disclosed herein is an apparatus which comprises a plurality of execution units, and a first general register file (GRF) communicatively couple to the plurality of execution units, wherein the first GRF is shared by the plurality of execution units.
    Type: Grant
    Filed: July 30, 2019
    Date of Patent: May 18, 2021
    Assignee: INTEL CORPORATION
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Kamal Sinha, Kiran C. Veernapu, Subramaniam Maiyuran, Prasoonkumar Surti, Guei-Yuan Lueh, David Puffer, Supratim Pal, Eric J. Hoekstra, Travis T. Schluessler, Linda L. Hurd
  • Patent number: 11003620
    Abstract: An integrated circuit that is capable of performing sequence alignment via dynamic programming methods is provided. The integrated circuit may include a linear systolic array having series-connected processing engines, each of which has a n-stage deep pipeline. The systolic array may align first and second sequences, wherein the first sequence is divided into multiple segments equal to the internal depth of the pipeline. The systolic array may compute matrix scores for these segments in parallel until the entire sequence matrix score is computed. The internal pipeline structure and a loopback memory within the systolic array are configured to take care of any required data dependencies in the computation of the matrix scores.
    Type: Grant
    Filed: December 22, 2017
    Date of Patent: May 11, 2021
    Assignee: Intel Corporation
    Inventors: Saurabh Patil, Srajudheen Makkadayil, Rekha Manjunath, Tarjinder Singh, Vikram Sharma Mailthody
  • Patent number: 10996953
    Abstract: A computer processing system is provided. The computer processing system includes a processor configured to execute a record form instruction cracked into two internal instructions. A first one of the two internal instructions executes out-of-order to compute a target register and a second one of the two internal instructions executes in-order to compute a condition register (CR) to improve a processing speed of the record form instruction.
    Type: Grant
    Filed: September 5, 2019
    Date of Patent: May 4, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Brian J. D. Barrick, Maarten J. Boersma, Niels Fricke, Michael J. Genden
  • Patent number: 10990394
    Abstract: An integrated circuit may include a mixed instruction multiple data (xIMD) computing system. The xIMD computing system may include a plurality of data processors, each data processor representative of a lane of a single instruction multiple data (SIMD) computing system, wherein the plurality of data processors are configured to use a first dominant lane for instruction execution and to fork a second dominant lane when a data dependency instruction that does not share a taken/not-taken state with the first dominant lane is encountered during execution of a program by the xIMD computing system.
    Type: Grant
    Filed: September 28, 2017
    Date of Patent: April 27, 2021
    Assignee: Intel Corporation
    Inventor: Jeffrey L. Nye
  • Patent number: 10990569
    Abstract: A sorter sorts a list of elements using a plurality of registers. Each register stores a value of at most one element. Each register receives an input from a previous one of the registers indicating whether the previous one of the registers is storing a value of a list element before storing a value of a list element. Each register supplies an indication to a next register whether the register is storing a list element value. A register sends a stored value and the register identification to a register stack. The register stack uses the value as an index to store a pointer to the register. In that way a sorted list is created in the register stack. A register stores list location information for one or more occurrences of a value stored by the register. Overflow of list location information is handled in a duplicate values stack.
    Type: Grant
    Filed: May 16, 2019
    Date of Patent: April 27, 2021
    Assignees: AT&T INTELLECTUAL PROPERTY I, L.P., AT&T MOBILITY II LLC
    Inventors: Sheldon K. Meredith, William C. Cottrill
  • Patent number: 10977042
    Abstract: A technique for using expedited RCU grace periods to avoid avoiding out-of-memory conditions for offloaded RCU callbacks. In an example embodiment, one or more processors in a computer system may be designated as no-callbacks (No-CBs) processors that do not perform read-copy update (RCU) callback processing. One or more RCU callback offload kernel threads (rcuo kthreads) may be spawned to perform RCU callback management for RCU callbacks generated by workloads running on the No-CBs processors. The rcuo kthreads may run on processors that are not No-CBs processors. The rcuo kthreads may perform RCU grace period waiting as part of their RCU callback management. The RCU grace period waiting may include selectively invoking either an RCU expedited grace period or waiting for a normal RCU grace period to elapse.
    Type: Grant
    Filed: July 26, 2019
    Date of Patent: April 13, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Paul E. McKenney
  • Patent number: 10977037
    Abstract: In one embodiment, a synchronization instruction causes a processor to ensure that specified threads included within a warp concurrently execute a single subsequent instruction. The specified threads include at least a first thread and a second thread. In operation, the first thread arrives at the synchronization instruction. The processor determines that the second thread has not yet arrived at the synchronization instruction and configures the first thread to stop executing instructions. After issuing at least one instruction for the second thread, the processor determines that all the specified threads have arrived at the synchronization instruction. The processor then causes all the specified threads to execute the subsequent instruction. Advantageously, unlike conventional approaches to synchronizing threads, the synchronization instruction enables the processor to reliably and properly execute code that includes complex control flows and/or instructions that presuppose that threads are converged.
    Type: Grant
    Filed: October 7, 2019
    Date of Patent: April 13, 2021
    Assignee: NVIDIA Corporation
    Inventors: Ajay Sudarshan Tirumala, Olivier Giroux, Peter Nelson, Jack Choquette
  • Patent number: 10963259
    Abstract: Implementing processor instrumentation in a processor pipeline includes determining a pipeline depth of each micro-operator for an instruction group used in an execution phase of the processor pipeline. The pipeline depth corresponds with a duration of execution, each micro-operator performs a type of functional operation in the execution phase, and the instruction group includes all the micro-operators required for the execution phase. A targeted micro-operator is identified for which the processor instrumentation is being performed, and the pipeline depth corresponding with the targeted micro-operator is used to determine and report a performance of the targeted micro-operator as part of the processor instrumentation. Problems indicated by the processor instrumentation are diagnosed and addressed based on the performance of the targeted micro-operator.
    Type: Grant
    Filed: June 6, 2019
    Date of Patent: March 30, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Avery Francois, Gregory William Alexander, Jonathan Ting Hsieh