Patents Examined by Keith E Vicary
  • Patent number: 11966742
    Abstract: Systems, methods, and apparatuses relating to instructions to reset software thread runtime property histories in a hardware processor are described. In one embodiment, a hardware processor includes a hardware guide scheduler comprising a plurality of software thread runtime property histories; a decoder to decode a single instruction into a decoded single instruction, the single instruction having a field that identifies a model-specific register; and an execution circuit to execute the decoded single instruction to check that an enable bit of the model-specific register is set, and when the enable bit is set, to reset the plurality of software thread runtime property histories of the hardware guide scheduler.
    Type: Grant
    Filed: May 3, 2023
    Date of Patent: April 23, 2024
    Assignee: Intel Corporation
    Inventors: Eliezer Weissmann, Mark Charney, Michael Mishaeli, Robert Valentine, Itai Ravid, Jason W. Brandt, Gilbert Neiger, Baruch Chaikin, Efraim Rotem
  • Patent number: 11954496
    Abstract: In various examples, systems and methods for reducing written requirements in a system on chip (SoC) are described herein. For instance, a total number of iterations may be determined for processing data, such as data representing an array. In some circumstances, a set of iterations may include a first number of iterations that is less than a second number of iterations. As such, and during execution of the set of iterations, a predicate flag corresponding to an excess iteration of the set of iterations may be generated, where the excess iteration corresponds to an iteration that is part of a number of excess iterations that is associated with a difference between the first number of iterations and the second number of iterations. Based on the predicate flag, one or more first values corresponding to the iteration may be prevented from being written to memory.
    Type: Grant
    Filed: August 2, 2021
    Date of Patent: April 9, 2024
    Assignee: NVIDIA Corporation
    Inventors: Ching-Yu Hung, Ravi P Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani
  • Patent number: 11941397
    Abstract: Techniques to take advantage of the single-instruction-multiple-data (SIMD) capabilities of a processor to process data blocks can include implementing an instruction to fuse the data blocks together. The fuse input instruction can have a first input vector, a second input vector, a select input, a first output vector, and a second output vector. The fuse input instruction selects a portion of the first input vector and a portion of the second input vector based on the select input, sign extends the selected portion of the first input vector and the selected portion of the second input vector, and shuffles data elements of the sign extended portion of the first input vector with data elements of the sign extended portion of the second input vector to generate the first and second output vectors.
    Type: Grant
    Filed: May 31, 2022
    Date of Patent: March 26, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Xiaodan Tan, Paul Gilbert Meyer
  • Patent number: 11941399
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read-only data stream. Once fetched, data elements in the data stream are disposed in lanes in a stream head register in the fixed order. Some lanes may be invalid, for example when the number of remaining data elements is less than the number of lanes in the stream head register. The streaming engine automatically produces a valid data word stored in a stream valid register indicating lanes holding valid data. The data in the stream valid register may be automatically stored in a predicate register or otherwise made available. This data can be used to control vector SIMD operations or may be combined with other predicate register data.
    Type: Grant
    Filed: March 7, 2022
    Date of Patent: March 26, 2024
    Assignee: Texas Instruments Incorporated
    Inventors: Joseph Zbiciak, Son H. Tran
  • Patent number: 11928474
    Abstract: Selectively updating branch predictors for loops executed from loop buffers is disclosed herein. In some aspects, a branch predictor update circuit of a processor is configured to detect a loop comprising a plurality of loop instructions in an instruction stream, and to determine that the loop is stored within a loop buffer circuit of the processor. The branch predictor update circuit is further configured to determine a count of potential history register updates to the history register for the plurality of loop instructions, and to determine whether the count of potential history register updates exceeds a size of the history register. The branch predictor update circuit is also configured to, responsive to determining that the count of potential history register updates does not exceed the size of the history register, update a branch predictor of the branch predictor circuit based on the plurality of loop instructions.
    Type: Grant
    Filed: June 3, 2022
    Date of Patent: March 12, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Rami Mohammad Al Sheikh, Saransh Jain, Michael Scott McIlvaine, Daren Eugene Streett
  • Patent number: 11907713
    Abstract: Systems, methods, and apparatuses relating to a sign modification field for fused operations in a configurable spatial accelerator are described.
    Type: Grant
    Filed: December 28, 2019
    Date of Patent: February 20, 2024
    Assignee: Intel Corporation
    Inventors: Kermin E. Chofleming, Chuanjun Zhang, Daniel Towner, Simon C. Steely, Jr., Benjamin Keen
  • Patent number: 11907726
    Abstract: Systems and methods for virtually partitioning an integrated circuit may include identifying dimensional attributes of a target input dataset and selecting a data partitioning scheme from a plurality of distinct data partitioning schemes for the target input dataset based on the dimensional attributes of the target dataset and architectural attributes of an integrated circuit. The methods described herein may also include disintegrating the target dataset into a plurality of distinct subsets of data based on the selected data partitioning scheme and identifying a virtual processing core partitioning scheme from a plurality of distinct processing core partitioning schemes for an architecture of the integrated circuit based on the disintegration of the target input dataset.
    Type: Grant
    Filed: October 17, 2022
    Date of Patent: February 20, 2024
    Assignee: quadric.io, Inc.
    Inventors: Nigel Drego, Aman Sikka, Mrinalini Ravichandran, Robert Daniel Firu, Veerbhan Kheterpal
  • Patent number: 11893393
    Abstract: A microprocessor system comprises a computational array and a hardware arbiter. The computational array includes a plurality of computation units. Each of the plurality of computation units operates on a corresponding value addressed from memory. The hardware arbiter is configured to control issuing of at least one memory request for one or more of the corresponding values addressed from the memory for the computation units. The hardware arbiter is also configured to schedule a control signal to be issued based on the issuing of the memory requests.
    Type: Grant
    Filed: October 22, 2021
    Date of Patent: February 6, 2024
    Assignee: Tesla, Inc.
    Inventors: Emil Talpes, Peter Joseph Bannon, Kevin Altair Hurd
  • Patent number: 11886877
    Abstract: A processor may include a plurality of data memories storing operands that may be operated upon by the processor. Load/store operations may specify a memory location in one of the data memories to be accessed using a memory select value that selects the data memory and an address within the selected data memory. The memory select values may be mapped from virtual memory select values associated with the load/store operations to physical memory select values that may be used to access the data memory.
    Type: Grant
    Filed: December 13, 2021
    Date of Patent: January 30, 2024
    Assignee: Apple Inc.
    Inventors: Richard T. Witek, Peter C. Eastty, Rajarshi Mukherjee
  • Patent number: 11880687
    Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. In a representative embodiment, a system includes an interconnection network, a processor, a host interface, and a configurable circuit cluster. The configurable circuit cluster may include a plurality of configurable circuits arranged in an array; an asynchronous packet network and a synchronous network coupled to each configurable circuit of the array; and a memory interface circuit and a dispatch interface circuit coupled to the asynchronous packet network and to the interconnection network. Each configurable circuit includes instruction or configuration memories for selection of a current data path configuration, a master synchronous network input, and a data path configuration for a next configurable circuit.
    Type: Grant
    Filed: January 26, 2023
    Date of Patent: January 23, 2024
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 11868306
    Abstract: A processing system includes a processing unit and a memory device. The memory device includes a processing-in-memory (PIM) module that performs processing operations on behalf of the processing unit. An instruction set architecture (ISA) of the PIM module has fewer instructions than an ISA of the processing unit. Instructions received from the processing unit are translated such that processing resources of the PIM module are virtualized. As a result, the PIM module concurrently performs processing operations for multiple threads or applications of the processing unit.
    Type: Grant
    Filed: September 13, 2022
    Date of Patent: January 9, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael L. Chu, Ashwin Aji, Muhammad Amber Hassaan
  • Patent number: 11868779
    Abstract: Aspects of the invention include a computer-implemented method of updating metadata prediction tables. The computer-implemented method includes establishing, in the metadata prediction tables, a prediction of how a set of instructions will resolve and identifying that the set of instructions is completed. The computer-implemented method also includes determining, upon completion of the set of instructions, whether prediction update queues (PUQs) associated with the set of instructions indicate that the set of instructions resolved in one of a plurality of prescribed manners relative to the prediction and deciding that the metadata prediction tables are candidates to be updated based on the PUQs indicating that the set of instructions resolved in one of the plurality of prescribed manners.
    Type: Grant
    Filed: September 9, 2021
    Date of Patent: January 9, 2024
    Assignee: International Business Machines Corporation
    Inventors: James Raymond Cuffney, Adam Benjamin Collura, James Bonanno, Brian Robert Prasky, Edward Thomas Malley, Suman Amugothu
  • Patent number: 11861220
    Abstract: Methods of memory allocation in which registers referenced by different groups of instances of the same task are mapped to individual logical memories. Other example methods describe the mapping of registers referenced by a task to different banks within a single logical memory and in various examples this mapping may take into consideration which bank is likely to be the dominant bank for the particular task and the allocation for one or more other tasks.
    Type: Grant
    Filed: February 14, 2020
    Date of Patent: January 2, 2024
    Assignee: Imagination Technologies Limited
    Inventors: Isuru Herath, Richard Broadhurst
  • Patent number: 11847456
    Abstract: Livelock recovery circuits configured to detect livelock in a processor, and cause the processor to transition to a known safe state when livelock is detected. The livelock recovery circuits include detection logic configured to detect that the processor is in livelock when the processor has illegally repeated an instruction; and transition logic configured to cause the processor to transition to a safe state when livelock has been detected by the detection logic.
    Type: Grant
    Filed: October 6, 2022
    Date of Patent: December 19, 2023
    Assignee: Imagination Technologies Limited
    Inventors: Ashish Darbari, Iain Singleton
  • Patent number: 11822923
    Abstract: A load/store unit includes a first queue including a first entry for a store operation and a second queue including a second entry for a load operation that includes a return instruction that redirects a program flow to a location indicated by the return instruction. The load/store unit also includes a processor to determine that the store operation matches the load operation and selectively perform store-to-load forwarding (STLF) of a return address for the return instruction from the first entry to the second entry based on whether the store operation is associated with a call instruction. The load/store unit forwards the return address to the second entry in response to the store operation being associated with the call instruction. The load/store unit blocks forwarding until the store operation retires in response to the store operation not being associated with the call instruction.
    Type: Grant
    Filed: June 25, 2019
    Date of Patent: November 21, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventor: David Kaplan
  • Patent number: 11809874
    Abstract: A processor may include an instruction distribution circuit and a plurality of execution pipelines. The instruction distribution circuit may distribute a conditional instruction to a first execution pipeline for execution when the conditional instruction is associated with a prediction of a high confidence level, or to a second execution pipeline for execution when the conditional instruction is associated with a prediction of a low confidence level. The second execution pipeline, not the first execution pipeline, may directly instruct the processor to obtain an instruction from a target address for execution, when the conditional instruction is mispredicted. Thus, when the conditional instruction is distributed to the first execution pipeline for execution and determined to be mispredicted, the first execution pipeline may cause the conditional instruction to be re-executed in the second execution pipeline to cause the instruction from the correct target address to be obtained for execution.
    Type: Grant
    Filed: February 1, 2022
    Date of Patent: November 7, 2023
    Assignee: Apple Inc.
    Inventors: Ethan R Schuchman, Niket K Choudhary, Kulin N Kothari, Haoyan Jia, Ian D Kountanis, Douglas C Holman, Wei-Han Lien, Pruthivi Vuyyuru
  • Patent number: 11789742
    Abstract: Techniques related to executing a plurality of instructions by a processor are described, including a method for executing a plurality of instructions by a processor. The method comprises detecting a pipeline hazard based on one or more instructions provided for execution by an instruction execution pipeline, beginning execution of an instruction, of the one or more instructions, on the instruction execution pipeline, stalling a portion of the instruction execution pipeline based on the detected pipeline hazard, storing a register state associated with the execution of the instruction based on the stalling, determining that the pipeline hazard has been resolved, and restoring the register state to the instruction execution pipeline based on the determination.
    Type: Grant
    Filed: March 7, 2022
    Date of Patent: October 17, 2023
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy D. Anderson, Duc Bui, Joseph Zbiciak, Reid E. Tatge
  • Patent number: 11775305
    Abstract: Aspects of the present disclosure relate to an apparatus comprising fetch circuitry. The fetch circuitry comprises a pointer-based fetch queue for queuing processing instructions retrieved from a storage, and pointer storage for storing a pointer identifying a current fetch queue element. The apparatus comprises decode circuitry having a plurality of decode units, and fetch queue extraction circuitry to, based on the pointer, extract the content of a plurality of elements of the fetch queue; apply combinatorial logic to speculatively produce, from the content of said fetch queue entries, a plurality of speculative potential instructions; and transmit each speculative potential instruction to a corresponding one of said decode units. Each decode unit is configured to decode the corresponding speculative potential instruction.
    Type: Grant
    Filed: December 23, 2021
    Date of Patent: October 3, 2023
    Assignee: Arm Limited
    Inventors: Adrian Viorel Popescu, Remus-Gabriel Vultur, Jatin Bhartia
  • Patent number: 11755330
    Abstract: Processors and methods related to tracking exact convergence to guide the recovery process in response to a mispredicted branch are provided. An example processor includes a pipeline having a frontend and a backend. The processor further includes a state table for maintaining information related to at least a subset of branches corresponding to instructions being processed by the processor. The processor further includes state logic configured to access the state table and track locations of any exact convergence points associated with branches corresponding to the instructions being processed by the processor. The state logic is further configured to identify a first recovery method for recovering from a misprediction associated with a branch if a location of an exact convergence point associated with the branch is determined to be in the frontend of the pipeline, else identify a second recovery method for recovering from the misprediction associated with the branch.
    Type: Grant
    Filed: September 13, 2022
    Date of Patent: September 12, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Vignyan Reddy Kothinti Naresh, Shivam Priyadarshi
  • Patent number: 11755324
    Abstract: A computer system, processor, programming instructions and/or method for managing operations of a gather buffer for a processor core load storage unit are described. The processor core includes a processing pipeline having one or more execution units for processing unaligned load instructions, each of which executes in two phases to be satisfied. A buffer storage element is provided having a plurality of entries for temporarily collecting partial writeback results retrieved from the memory that are associated with first phase accesses for each of a plurality of unaligned load instructions. An associated logic controller device tracks two parts of the unaligned load to be gathered at independent times, wherein said partial result stored at said buffer storage element comprises a first part of an unaligned load. The second phase load access for the same instruction is independently accessed and later merged with the first part of the load data at byte granularity to satisfy the load.
    Type: Grant
    Filed: August 31, 2021
    Date of Patent: September 12, 2023
    Assignee: International Business Machines Corporation
    Inventors: Kimberly M. Fernsler, Bryan Lloyd, David A. Hrusecky, David A. Campbell
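
The two-phase unaligned load in the abstract of patent 11755324 can be pictured with a small software model. The Python sketch below is illustrative only and is not the patented design: the 8-byte line size, the `GatherBuffer` class, and every function and parameter name are assumptions made for the example. It also assumes a load is no wider than one line, so it spans at most two aligned lines.

```python
# Illustrative model only; all names and sizes are hypothetical.
LINE_SIZE = 8  # assumed width of one aligned memory access, in bytes


def aligned_read(memory: bytes, line_index: int) -> bytes:
    """Return one aligned LINE_SIZE-byte chunk of memory."""
    start = line_index * LINE_SIZE
    return memory[start:start + LINE_SIZE]


class GatherBuffer:
    """Holds the first-phase partial result of each in-flight unaligned load."""

    def __init__(self) -> None:
        self.entries: dict[int, bytes] = {}  # load id -> first-phase bytes

    def first_phase(self, load_id: int, memory: bytes, addr: int, size: int) -> None:
        """Read the bytes that fall in the first aligned line and park them."""
        line, offset = divmod(addr, LINE_SIZE)
        self.entries[load_id] = aligned_read(memory, line)[offset:offset + size]

    def second_phase(self, load_id: int, memory: bytes, addr: int, size: int) -> bytes:
        """Read the rest from the next line and merge it byte by byte."""
        first = self.entries.pop(load_id)  # frees the buffer entry
        remaining = size - len(first)
        second = aligned_read(memory, addr // LINE_SIZE + 1)[:remaining]
        return first + second


memory = bytes(range(32))
buf = GatherBuffer()
buf.first_phase(0, memory, addr=6, size=4)   # load 0 starts
buf.first_phase(1, memory, addr=15, size=2)  # load 1 starts before load 0 finishes
print(buf.second_phase(1, memory, addr=15, size=2) == memory[15:17])
print(buf.second_phase(0, memory, addr=6, size=4) == memory[6:10])
```

Keying the buffer by a load identifier is what lets the two phases of different loads complete at independent times, which is the behavior the abstract attributes to the logic controller.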