Patents Examined by Michael J Metzger
  • Patent number: 11281467
    Abstract: Circuitry comprises a prediction register having one or more entries each storing prediction data; prediction circuitry configured to map a value of the stored prediction data to a prediction of whether or not a branch represented by a given branch instruction is predicted to be taken, according to a data mapping; and control circuitry configured to selectively vary the data mapping between the prediction and the value of the stored prediction data.
    Type: Grant
    Filed: October 16, 2019
    Date of Patent: March 22, 2022
    Assignee: Arm Limited
    Inventors: Houdhaifa Bouzguarrou, Guillaume Bolbenes, Vincenzo Consales
  • Patent number: 11281466
    Abstract: A floating point unit includes a non-pickable scheduler queue (NSQ) that offers a load operation concurrently with a load store unit retrieving load data for an operand that is to be loaded by the load operation. The floating point unit also includes a renamer that renames architectural registers used by the load operation and allocates physical register numbers to the load operation in response to receiving the load operation from the NSQ. The floating point unit further includes a set of pickable scheduler queues that receive the load operation from the renamer and store the load operation prior to execution. A physical register file is implemented in the floating point unit and a free list is used to store physical register numbers of entries in the physical register file that are available for allocation.
    Type: Grant
    Filed: October 22, 2019
    Date of Patent: March 22, 2022
    Assignees: Advanced Micro Devices, Inc., ATI TECHNOLOGIES ULC
    Inventors: Arun A. Nair, Michael Estlick, Erik Swanson, Sneha V. Desai, Donglin Ji
  • Patent number: 11275583
    Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group.
    Type: Grant
    Filed: August 3, 2017
    Date of Patent: March 15, 2022
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
  • Patent number: 11275589
    Abstract: A processor interacts with a memory set including a cache memory, a first memory storing at least a first piece of information in a first information group, and a second memory storing at least a second piece of information in a second information group. In response to a first cache miss and following a first request from the processor for the first piece of information, the first piece of information obtained from the first memory is supplied to the processor. After a second request from the processor for the second piece of information, the second piece of information obtained from the second memory is supplied to the processor, even if the first information group is currently being transferred from the first memory for loading into the cache memory.
    Type: Grant
    Filed: September 17, 2019
    Date of Patent: March 15, 2022
    Assignees: STMicroelectronics (Rousset) SAS, STMicroelectronics (Grenoble 2) SAS
    Inventors: Sebastien Metzger, Silvia Brini
  • Patent number: 11269690
    Abstract: A circuit arrangement and program product for dynamically providing a status of a hardware thread/hardware resource independent of the operation of the hardware thread/hardware resource using an inter-thread communication protocol. A master hardware thread may be configured to communicate status requests to associated slave hardware threads and/or hardware resources. Each slave hardware thread/hardware resource may be configured with hardware logic configured to automatically determine status information for the slave hardware thread/hardware resource and communicate a status response to the master hardware thread without interrupting processing of the slave hardware thread/hardware resource.
    Type: Grant
    Filed: November 5, 2019
    Date of Patent: March 8, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jamie R. Kuesel, Mark G. Kupferschmidt, Paul E. Schardt, Robert A. Shearer
  • Patent number: 11269651
    Abstract: A system for processing instructions with extended results includes a first instruction execution unit having a first result bus for execution of processor instructions. The system further includes a second instruction execution unit having a second result bus for execution of processor instructions. The first instruction execution unit is configured to selectively send a portion of results calculated by the first instruction execution unit to the second instruction execution unit during prosecution of a processor instruction if the second instruction execution unit is not used for executing the processor instruction and if the received processor instruction produces a result having a data width greater than the width of the first result bus. The second instruction execution unit is configured to receive the portion of results calculated by the first instruction execution unit and put the received results on the second results bus.
    Type: Grant
    Filed: September 10, 2019
    Date of Patent: March 8, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael Klein, Nicol Hofmann, Cedric Lichtenau, Osher Yifrach
  • Patent number: 11263009
    Abstract: Disclosed embodiments relate to systems and methods for performing 16-bit floating-point vector dot product instructions. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first source, second source, and destination vectors, the opcode to indicate execution circuitry is to multiply N pairs of 16-bit floating-point formatted elements of the specified first and second sources, and accumulate the resulting products with previous contents of a corresponding single-precision element of the specified destination, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.
    Type: Grant
    Filed: February 4, 2021
    Date of Patent: March 1, 2022
    Assignee: Intel Corporation
    Inventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
  • Patent number: 11263018
    Abstract: A vector processor is disclosed. The vector processor includes a plurality of register files provided to each of a plurality of single instruction multiple data (SIMD) lanes, storing each of a plurality of pieces of data, and respectively outputting input data to be used in a current cycle among the plurality of pieces of data, a shuffle unit for receiving a plurality of pieces of input data outputted from the plurality of register files, and performing shuffling such that the received plurality of pieces of input data respectively correspond to the plurality of SIMD lanes and outputting the same; and a command execution unit for performing a parallel operation by receiving input data outputted from the shuffle unit.
    Type: Grant
    Filed: October 23, 2017
    Date of Patent: March 1, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ki-seok Kwon, Jae-un Park, Dong-kwan Suh, Kang-jin Yoon
  • Patent number: 11249762
    Abstract: An apparatus and method are provided for handling incorrect branch direction predictions. The apparatus has processing circuitry for executing instructions, branch prediction circuitry for making branch direction predictions in respect of branch instructions, and fetch circuitry for fetching instructions from an instruction cache in dependence on the branch direction predictions and for forwarding the fetched instructions to the processing circuitry for execution. A cache location buffer stores cache location information for a given branch instruction for which accuracy of the branch direction predictions made by the branch prediction circuitry is below a determined threshold. The cache location information identifies where within the instruction cache one or more instructions are stored that will need to be executed in the event that a subsequent branch direction prediction made for the given branch instruction is incorrect.
    Type: Grant
    Filed: October 24, 2019
    Date of Patent: February 15, 2022
    Assignee: Arm Limited
    Inventors: Houdhaifa Bouzguarrou, Guillaume Bolbenes, Thibaut Elie Lanois
  • Patent number: 11243773
    Abstract: A computer system, includes a store queue that holds store entries and a load queue that holds load entries sleeping on a store entry. A processor detects a store drain merge operation call and generates a pair of store tags comprising a first store tag corresponding to a first store entry to be drained and a second store tag corresponding to a second store entry to be drained. The processor determines the pair of store tags an even-type store tag or an odd-type store tag. The processor disables the odd store tag included in the even-type store tag pair when detecting the even-type store tag pair, and wakes up a first load entry dependent on the even store tag and a second load entry dependent on the odd store tag based on the even store tag included in the even-type store tag pair while the odd store tag is disabled.
    Type: Grant
    Filed: December 14, 2020
    Date of Patent: February 8, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bryan Lloyd, David Campbell, Brian Chen, Robert A. Cordes
  • Patent number: 11231933
    Abstract: A method and apparatus for controlling pre-fetching in a processor. A processor includes an execution pipeline and an instruction pre-fetch unit. The execution pipeline is configured to execute instructions. The instruction pre-fetch unit is coupled to the execution pipeline. The instruction pre-fetch unit includes instruction storage to store pre-fetched instructions, and pre-fetch control logic. The pre-fetch control logic is configured to fetch instructions from memory and store the fetched instructions in the instruction storage. The pre-fetch control logic is also configured to provide instructions stored in the instruction storage to the execution pipeline for execution. The pre-fetch control logic is further configured set a maximum number of instruction words to be pre-fetched for execution subsequent to execution of an instruction currently being executed in the execution pipeline.
    Type: Grant
    Filed: April 9, 2020
    Date of Patent: January 25, 2022
    Assignee: Texas Instruments Incorporated
    Inventors: Christian Wiencke, Johann Zipperer
  • Patent number: 11226926
    Abstract: Multi-level hierarchical routing matrices for pattern-recognition processors are provided. One such routing matrix may include one or more programmable and/or non-programmable connections in and between levels of the matrix. The connections may couple routing lines to feature cells, groups, rows, blocks, or any other arrangement of components of the pattern-recognition processor.
    Type: Grant
    Filed: May 27, 2020
    Date of Patent: January 18, 2022
    Assignee: Micron Technology, Inc.
    Inventors: Harold B Noyes, David R. Brown
  • Patent number: 11226927
    Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.
    Type: Grant
    Filed: September 13, 2019
    Date of Patent: January 18, 2022
    Assignee: AZURENGINE TECHNOLOGIES ZHUHAI INC.
    Inventors: Yuan Li, Jianbin Zhu
  • Patent number: 11216281
    Abstract: Various embodiments are provided for facilitating data processing by one or more processors in a computing system. An instruction to be executed may be obtained. The instruction is a single instruction multiple data (SIMD) reduction operation of an operand vector with a plurality of vector elements. The SIMD reduction operation may be executed to produce a result vector with a plurality of alternative vector elements. One or more reduction functions may be performed on each of a pair of vector elements from the plurality of vector elements of the operand vector and a result of the one or more reduction functions may be placed in a corresponding vector element of the result vector.
    Type: Grant
    Filed: May 14, 2019
    Date of Patent: January 4, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bruce Fleischer, Kailash Gopalakrishnan, Jinwook Oh, Sunil Shukla, Silvia Mueller
  • Patent number: 11204771
    Abstract: Aspects of the present disclosure relate to an apparatus comprising decode circuitry to receive an instruction and identify the received instruction as a load instruction, and prediction circuitry to predict, based on a prediction scheme, a target address of the load instruction, and trigger a speculative memory access in respect of the predicted target address.
    Type: Grant
    Filed: October 24, 2019
    Date of Patent: December 21, 2021
    Assignee: Arm Limited
    Inventors: Alexander Alfred Hornung, Jose Gonzalez-Gonzalez
  • Patent number: 11204765
    Abstract: A graphics processing unit (GPU) utilizes block general purpose registers (bGPRs) to load multiple waves of samples for an instruction group into a processing pipeline and receive processed samples from the pipeline. The GPU acquires a credit for the bGPR for execution of the instruction group for a first wave using a persistent GPR and the bGPR. The GPU refunds the credit upon loading the first wave into the pipeline. The GPU executes a subsequent wave for the instruction group to load samples to the pipeline when at least one credit is available and the pipeline is processing the first wave. The GPU stores an indication of each wave that has been loaded into the pipeline in a queue. The GPU returns samples for a next wave in the queue from the pipeline to the bGPR for further processing when the physical slot of the bGPR is available.
    Type: Grant
    Filed: August 26, 2020
    Date of Patent: December 21, 2021
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Fei Wei, Gang Zhong, Minjie Huang, Jian Jiang, Zilin Ying, Baoguang Yang, Yang Xia, Jing Han, Liangxiao Hu, Chihong Zhang, Chun Yu, Andrew Evan Gruber, Eric Demers
  • Patent number: 11188326
    Abstract: Selected installed function of a multi-function instruction is hidden such that even though a processor is capable of performing the hidden installed function, the availability of the hidden function is hidden such that responsive to the multi-function instruction querying the availability of functions, only functions not hidden are reported as installed.
    Type: Grant
    Filed: March 18, 2020
    Date of Patent: November 30, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dan F. Greiner, Damian L. Osisek, Timothy J. Slegel
  • Patent number: 11188334
    Abstract: Obsoleting values stored in registers in a processor based on processing obsolescent register-encoded instructions is disclosed. The processor is configured to support execution of read and/or write instructions that include obsolescence encoding indicating that one or more of its source and/or target register operands are to be obsoleted by the processor. A register encoded as obsolescent means the data value stored in such register will not be used by subsequent instructions in an instruction stream, and thus does not need to be retained. Thus, such register can be set as being in an obsolescent state so that the data value stored in such register can be ignored to improve performance. As one example, data values for registers having an obsolescent state can be ignored and thus not stored in a saved context for a process being switched out, thus conserving memory and improving processing time for a process switch.
    Type: Grant
    Filed: December 2, 2019
    Date of Patent: November 30, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Thomas Andrew Sartorius, Thomas Philip Speier, Michael Scott McIlvaine, James Norris Dieffenderfer, Rodney Wayne Smith
  • Patent number: 11188332
    Abstract: A method, processor and/or system for processing data is disclosed that in an aspect includes providing a physical register file with one or more register file entries for storing data; identifying each physical register file entry with a row identifier to identify the entry row in the physical register file; enabling one or more columns within a target entry row of the physical register file; and revising data in the columns enabled within the target entry row of the physical register file. In an aspect, each physical register file entry is partitioned into a plurality of columns having a bit width and a column mask preferably is used to enable the one or more columns within the target row of the physical register file, and data is revised in only the columns enabled by the column mask.
    Type: Grant
    Filed: May 10, 2019
    Date of Patent: November 30, 2021
    Assignee: International Business Machines Corporation
    Inventors: Steven J. Battle, Salma Ayub, Brian D. Barrick, Joshua W. Bowman, Susan E. Eisen, Brandon Goddard, Cliff Kucharski, Dung Q. Nguyen
  • Patent number: 11182159
    Abstract: Methods, systems, and apparatus, including computer-readable media, are described for performing vector reductions using a shared scratchpad memory of a hardware circuit having processor cores that communicate with the shared memory. For each of the processor cores, a respective vector of values is generated based on computations performed at the processor core. The shared memory receives the respective vectors of values from respective resources of the processor cores using a direct memory access (DMA) data path of the shared memory. The shared memory performs an accumulation operation on the respective vectors of values using an operator unit coupled to the shared memory. The operator unit is configured to accumulate values based on arithmetic operations encoded at the operator unit. A result vector is generated based on performing the accumulation operation using the respective vectors of values.
    Type: Grant
    Filed: August 31, 2020
    Date of Patent: November 23, 2021
    Assignee: Google LLC
    Inventors: Thomas Norrie, Gurushankar Rajamani, Andrew Everett Phelps, Matthew Leever Hedlund, Norman Paul Jouppi