Patents Examined by Michael J Metzger
-
Patent number: 11281467Abstract: Circuitry comprises a prediction register having one or more entries each storing prediction data; prediction circuitry configured to map a value of the stored prediction data to a prediction of whether or not a branch represented by a given branch instruction is predicted to be taken, according to a data mapping; and control circuitry configured to selectively vary the data mapping between the prediction and the value of the stored prediction data.Type: GrantFiled: October 16, 2019Date of Patent: March 22, 2022Assignee: Arm LimitedInventors: Houdhaifa Bouzguarrou, Guillaume Bolbenes, Vincenzo Consales
-
Patent number: 11281466Abstract: A floating point unit includes a non-pickable scheduler queue (NSQ) that offers a load operation concurrently with a load store unit retrieving load data for an operand that is to be loaded by the load operation. The floating point unit also includes a renamer that renames architectural registers used by the load operation and allocates physical register numbers to the load operation in response to receiving the load operation from the NSQ. The floating point unit further includes a set of pickable scheduler queues that receive the load operation from the renamer and store the load operation prior to execution. A physical register file is implemented in the floating point unit and a free list is used to store physical register numbers of entries in the physical register file that are available for allocation.Type: GrantFiled: October 22, 2019Date of Patent: March 22, 2022Assignees: Advanced Micro Devices, Inc., ATI TECHNOLOGIES ULCInventors: Arun A. Nair, Michael Estlick, Erik Swanson, Sneha V. Desai, Donglin Ji
-
Patent number: 11275583Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group.Type: GrantFiled: August 3, 2017Date of Patent: March 15, 2022Assignee: Intel CorporationInventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
-
Patent number: 11275589Abstract: A processor interacts with a memory set including a cache memory, a first memory storing at least a first piece of information in a first information group, and a second memory storing at least a second piece of information in a second information group. In response to a first cache miss and following a first request from the processor for the first piece of information, the first piece of information obtained from the first memory is supplied to the processor. After a second request from the processor for the second piece of information, the second piece of information obtained from the second memory is supplied to the processor, even if the first information group is currently being transferred from the first memory for loading into the cache memory.Type: GrantFiled: September 17, 2019Date of Patent: March 15, 2022Assignees: STMicroelectronics (Rousset) SAS, STMicroelectronics (Grenoble 2) SASInventors: Sebastien Metzger, Silvia Brini
-
Patent number: 11269690Abstract: A circuit arrangement and program product for dynamically providing a status of a hardware thread/hardware resource independent of the operation of the hardware thread/hardware resource using an inter-thread communication protocol. A master hardware thread may be configured to communicate status requests to associated slave hardware threads and/or hardware resources. Each slave hardware thread/hardware resource may be configured with hardware logic configured to automatically determine status information for the slave hardware thread/hardware resource and communicate a status response to the master hardware thread without interrupting processing of the slave hardware thread/hardware resource.Type: GrantFiled: November 5, 2019Date of Patent: March 8, 2022Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Jamie R. Kuesel, Mark G. Kupferschmidt, Paul E. Schardt, Robert A. Shearer
-
Patent number: 11269651Abstract: A system for processing instructions with extended results includes a first instruction execution unit having a first result bus for execution of processor instructions. The system further includes a second instruction execution unit having a second result bus for execution of processor instructions. The first instruction execution unit is configured to selectively send a portion of results calculated by the first instruction execution unit to the second instruction execution unit during prosecution of a processor instruction if the second instruction execution unit is not used for executing the processor instruction and if the received processor instruction produces a result having a data width greater than the width of the first result bus. The second instruction execution unit is configured to receive the portion of results calculated by the first instruction execution unit and put the received results on the second results bus.Type: GrantFiled: September 10, 2019Date of Patent: March 8, 2022Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Michael Klein, Nicol Hofmann, Cedric Lichtenau, Osher Yifrach
-
Patent number: 11263009Abstract: Disclosed embodiments relate to systems and methods for performing 16-bit floating-point vector dot product instructions. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first source, second source, and destination vectors, the opcode to indicate execution circuitry is to multiply N pairs of 16-bit floating-point formatted elements of the specified first and second sources, and accumulate the resulting products with previous contents of a corresponding single-precision element of the specified destination, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.Type: GrantFiled: February 4, 2021Date of Patent: March 1, 2022Assignee: Intel CorporationInventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
-
Patent number: 11263018Abstract: A vector processor is disclosed. The vector processor includes a plurality of register files provided to each of a plurality of single instruction multiple data (SIMD) lanes, storing each of a plurality of pieces of data, and respectively outputting input data to be used in a current cycle among the plurality of pieces of data, a shuffle unit for receiving a plurality of pieces of input data outputted from the plurality of register files, and performing shuffling such that the received plurality of pieces of input data respectively correspond to the plurality of SIMD lanes and outputting the same; and a command execution unit for performing a parallel operation by receiving input data outputted from the shuffle unit.Type: GrantFiled: October 23, 2017Date of Patent: March 1, 2022Assignee: SAMSUNG ELECTRONICS CO., LTD.Inventors: Ki-seok Kwon, Jae-un Park, Dong-kwan Suh, Kang-jin Yoon
-
Patent number: 11249762Abstract: An apparatus and method are provided for handling incorrect branch direction predictions. The apparatus has processing circuitry for executing instructions, branch prediction circuitry for making branch direction predictions in respect of branch instructions, and fetch circuitry for fetching instructions from an instruction cache in dependence on the branch direction predictions and for forwarding the fetched instructions to the processing circuitry for execution. A cache location buffer stores cache location information for a given branch instruction for which accuracy of the branch direction predictions made by the branch prediction circuitry is below a determined threshold. The cache location information identifies where within the instruction cache one or more instructions are stored that will need to be executed in the event that a subsequent branch direction prediction made for the given branch instruction is incorrect.Type: GrantFiled: October 24, 2019Date of Patent: February 15, 2022Assignee: Arm LimitedInventors: Houdhaifa Bouzguarrou, Guillaume Bolbenes, Thibaut Elie Lanois
-
Patent number: 11243773Abstract: A computer system, includes a store queue that holds store entries and a load queue that holds load entries sleeping on a store entry. A processor detects a store drain merge operation call and generates a pair of store tags comprising a first store tag corresponding to a first store entry to be drained and a second store tag corresponding to a second store entry to be drained. The processor determines the pair of store tags an even-type store tag or an odd-type store tag. The processor disables the odd store tag included in the even-type store tag pair when detecting the even-type store tag pair, and wakes up a first load entry dependent on the even store tag and a second load entry dependent on the odd store tag based on the even store tag included in the even-type store tag pair while the odd store tag is disabled.Type: GrantFiled: December 14, 2020Date of Patent: February 8, 2022Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Bryan Lloyd, David Campbell, Brian Chen, Robert A. Cordes
-
Patent number: 11231933Abstract: A method and apparatus for controlling pre-fetching in a processor. A processor includes an execution pipeline and an instruction pre-fetch unit. The execution pipeline is configured to execute instructions. The instruction pre-fetch unit is coupled to the execution pipeline. The instruction pre-fetch unit includes instruction storage to store pre-fetched instructions, and pre-fetch control logic. The pre-fetch control logic is configured to fetch instructions from memory and store the fetched instructions in the instruction storage. The pre-fetch control logic is also configured to provide instructions stored in the instruction storage to the execution pipeline for execution. The pre-fetch control logic is further configured set a maximum number of instruction words to be pre-fetched for execution subsequent to execution of an instruction currently being executed in the execution pipeline.Type: GrantFiled: April 9, 2020Date of Patent: January 25, 2022Assignee: Texas Instruments IncorporatedInventors: Christian Wiencke, Johann Zipperer
-
Patent number: 11226926Abstract: Multi-level hierarchical routing matrices for pattern-recognition processors are provided. One such routing matrix may include one or more programmable and/or non-programmable connections in and between levels of the matrix. The connections may couple routing lines to feature cells, groups, rows, blocks, or any other arrangement of components of the pattern-recognition processor.Type: GrantFiled: May 27, 2020Date of Patent: January 18, 2022Assignee: Micron Technology, Inc.Inventors: Harold B Noyes, David R. Brown
-
Patent number: 11226927Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.Type: GrantFiled: September 13, 2019Date of Patent: January 18, 2022Assignee: AZURENGINE TECHNOLOGIES ZHUHAI INC.Inventors: Yuan Li, Jianbin Zhu
-
Patent number: 11216281Abstract: Various embodiments are provided for facilitating data processing by one or more processors in a computing system. An instruction to be executed may be obtained. The instruction is a single instruction multiple data (SIMD) reduction operation of an operand vector with a plurality of vector elements. The SIMD reduction operation may be executed to produce a result vector with a plurality of alternative vector elements. One or more reduction functions may be performed on each of a pair of vector elements from the plurality of vector elements of the operand vector and a result of the one or more reduction functions may be placed in a corresponding vector element of the result vector.Type: GrantFiled: May 14, 2019Date of Patent: January 4, 2022Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Bruce Fleischer, Kailash Gopalakrishnan, Jinwook Oh, Sunil Shukla, Silvia Mueller
-
Patent number: 11204771Abstract: Aspects of the present disclosure relate to an apparatus comprising decode circuitry to receive an instruction and identify the received instruction as a load instruction, and prediction circuitry to predict, based on a prediction scheme, a target address of the load instruction, and trigger a speculative memory access in respect of the predicted target address.Type: GrantFiled: October 24, 2019Date of Patent: December 21, 2021Assignee: Arm LimitedInventors: Alexander Alfred Hornung, Jose Gonzalez-Gonzalez
-
Patent number: 11204765Abstract: A graphics processing unit (GPU) utilizes block general purpose registers (bGPRs) to load multiple waves of samples for an instruction group into a processing pipeline and receive processed samples from the pipeline. The GPU acquires a credit for the bGPR for execution of the instruction group for a first wave using a persistent GPR and the bGPR. The GPU refunds the credit upon loading the first wave into the pipeline. The GPU executes a subsequent wave for the instruction group to load samples to the pipeline when at least one credit is available and the pipeline is processing the first wave. The GPU stores an indication of each wave that has been loaded into the pipeline in a queue. The GPU returns samples for a next wave in the queue from the pipeline to the bGPR for further processing when the physical slot of the bGPR is available.Type: GrantFiled: August 26, 2020Date of Patent: December 21, 2021Assignee: QUALCOMM IncorporatedInventors: Yun Du, Fei Wei, Gang Zhong, Minjie Huang, Jian Jiang, Zilin Ying, Baoguang Yang, Yang Xia, Jing Han, Liangxiao Hu, Chihong Zhang, Chun Yu, Andrew Evan Gruber, Eric Demers
-
Patent number: 11188326Abstract: Selected installed function of a multi-function instruction is hidden such that even though a processor is capable of performing the hidden installed function, the availability of the hidden function is hidden such that responsive to the multi-function instruction querying the availability of functions, only functions not hidden are reported as installed.Type: GrantFiled: March 18, 2020Date of Patent: November 30, 2021Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Dan F. Greiner, Damian L. Osisek, Timothy J. Slegel
-
Patent number: 11188334Abstract: Obsoleting values stored in registers in a processor based on processing obsolescent register-encoded instructions is disclosed. The processor is configured to support execution of read and/or write instructions that include obsolescence encoding indicating that one or more of its source and/or target register operands are to be obsoleted by the processor. A register encoded as obsolescent means the data value stored in such register will not be used by subsequent instructions in an instruction stream, and thus does not need to be retained. Thus, such register can be set as being in an obsolescent state so that the data value stored in such register can be ignored to improve performance. As one example, data values for registers having an obsolescent state can be ignored and thus not stored in a saved context for a process being switched out, thus conserving memory and improving processing time for a process switch.Type: GrantFiled: December 2, 2019Date of Patent: November 30, 2021Assignee: Microsoft Technology Licensing, LLCInventors: Thomas Andrew Sartorius, Thomas Philip Speier, Michael Scott McIlvaine, James Norris Dieffenderfer, Rodney Wayne Smith
-
Patent number: 11188332Abstract: A method, processor and/or system for processing data is disclosed that in an aspect includes providing a physical register file with one or more register file entries for storing data; identifying each physical register file entry with a row identifier to identify the entry row in the physical register file; enabling one or more columns within a target entry row of the physical register file; and revising data in the columns enabled within the target entry row of the physical register file. In an aspect, each physical register file entry is partitioned into a plurality of columns having a bit width and a column mask preferably is used to enable the one or more columns within the target row of the physical register file, and data is revised in only the columns enabled by the column mask.Type: GrantFiled: May 10, 2019Date of Patent: November 30, 2021Assignee: International Business Machines CorporationInventors: Steven J. Battle, Salma Ayub, Brian D. Barrick, Joshua W. Bowman, Susan E. Eisen, Brandon Goddard, Cliff Kucharski, Dung Q. Nguyen
-
Patent number: 11182159Abstract: Methods, systems, and apparatus, including computer-readable media, are described for performing vector reductions using a shared scratchpad memory of a hardware circuit having processor cores that communicate with the shared memory. For each of the processor cores, a respective vector of values is generated based on computations performed at the processor core. The shared memory receives the respective vectors of values from respective resources of the processor cores using a direct memory access (DMA) data path of the shared memory. The shared memory performs an accumulation operation on the respective vectors of values using an operator unit coupled to the shared memory. The operator unit is configured to accumulate values based on arithmetic operations encoded at the operator unit. A result vector is generated based on performing the accumulation operation using the respective vectors of values.Type: GrantFiled: August 31, 2020Date of Patent: November 23, 2021Assignee: Google LLCInventors: Thomas Norrie, Gurushankar Rajamani, Andrew Everett Phelps, Matthew Leever Hedlund, Norman Paul Jouppi