Patents Examined by Keith E Vicary
-
Patent number: 11966742Abstract: Systems, methods, and apparatuses relating to instructions to reset software thread runtime property histories in a hardware processor are described. In one embodiment, a hardware processor includes a hardware guide scheduler comprising a plurality of software thread runtime property histories; a decoder to decode a single instruction into a decoded single instruction, the single instruction having a field that identifies a model-specific register; and an execution circuit to execute the decoded single instruction to check that an enable bit of the model-specific register is set, and when the enable bit is set, to reset the plurality of software thread runtime property histories of the hardware guide scheduler.Type: GrantFiled: May 3, 2023Date of Patent: April 23, 2024Assignee: Intel CorporationInventors: Eliezer Weissmann, Mark Charney, Michael Mishaeli, Robert Valentine, Itai Ravid, Jason W. Brandt, Gilbert Neiger, Baruch Chaikin, Efraim Rotem
-
Patent number: 11954496Abstract: In various examples, systems and methods for reducing written requirements in a system on chip (SoC) are described herein. For instance, a total number of iterations may be determined for processing data, such as data representing an array. In some circumstances, a set of iterations may include a first number of iterations that is less than a second number of iterations. As such, and during execution of the set of iterations, a predicate flag corresponding to an excess iteration of the set of iterations may be generated, where the excess iteration corresponds to an iteration that is part of a number of excess iterations that is associated with a difference between the first number of iterations and the second number of iterations. Based on the predicate flag, one or more first values corresponding to the iteration may be prevented from being written to memory.Type: GrantFiled: August 2, 2021Date of Patent: April 9, 2024Assignee: NVIDIA CorporationInventors: Ching-Yu Hung, Ravi P Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani
-
Patent number: 11941397Abstract: Techniques to take advantage of the single-instruction-multiple-data (SIMD) capabilities of a processor to process data blocks can include implementing an instruction to fuse the data blocks together. The fuse input instruction can have a first input vector, a second input vector, a select input, a first output vector, and a second output vector. The fuse input instruction selects a portion of the first input vector and a portion of the second input vector based on the select input, sign extends the selected portion of the first input vector and the selected portion of the second input vector, and shuffles data elements of the sign extended portion of the first input vector with data elements of the sign extended portion of the second input vector to generate the first and second output vectors.Type: GrantFiled: May 31, 2022Date of Patent: March 26, 2024Assignee: Amazon Technologies, Inc.Inventors: Xiaodan Tan, Paul Gilbert Meyer
-
Patent number: 11941399Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream. Once fetched data elements in the data stream are disposed in lanes in a stream head register in the fixed order. Some lanes may be invalid, for example when the number of remaining data elements are less than the number of lanes in the stream head register. The streaming engine automatically produces a valid data word stored in a stream valid register indicating lanes holding valid data. The data in the stream valid register may be automatically stored in a predicate register or otherwise made available. This data can be used to control vector SIMD operations or may be combined with other predicate register data.Type: GrantFiled: March 7, 2022Date of Patent: March 26, 2024Assignee: Texas Instruments IncorporatedInventors: Joseph Zbiciak, Son H. Tran
-
Patent number: 11928474Abstract: Selectively updating branch predictors for loops executed from loop buffers is disclosed herein. In some aspects, a branch predictor update circuit of a processor is configured to detect a loop comprising a plurality of loop instructions in an instruction stream, and to determine that the loop is stored within a loop buffer circuit of the processor. The branch predictor update circuit is further configured to determine a count of potential history register updates to the history register for the plurality of loop instructions, and to determine whether the count of potential history register updates exceeds a size of the history register. The branch predictor update circuit is also configured to, responsive to determining that the count of potential history register updates does not exceed the size of the history register, update a branch predictor of the branch predictor circuit based on the plurality of loop instructions.Type: GrantFiled: June 3, 2022Date of Patent: March 12, 2024Assignee: Microsoft Technology Licensing, LLCInventors: Rami Mohammad Al Sheikh, Saransh Jain, Michael Scott McIlvaine, Daren Eugene Streett
-
Patent number: 11907713Abstract: Systems, methods, and apparatuses relating to a sign modification field for fused operations in a configurable spatial accelerator are described.Type: GrantFiled: December 28, 2019Date of Patent: February 20, 2024Assignee: Intel CorporationInventors: Kermin E. Chofleming, Chuanjun Zhang, Daniel Towner, Simon C. Steely, Jr., Benjamin Keen
-
Patent number: 11907726Abstract: Systems and methods for virtually partitioning an integrated circuit may include identifying dimensional attributes of a target input dataset and selecting a data partitioning scheme from a plurality of distinct data partitioning schemes for the target input dataset based on the dimensional attributes of the target dataset and architectural attributes of an integrated circuit. The methods described herein may also include disintegrating the target dataset into a plurality of distinct subsets of data based on the selected data partitioning scheme and identifying a virtual processing core partitioning scheme from a plurality of distinct processing core partitioning schemes for an architecture of the integrated circuit based on the disintegration of the target input dataset.Type: GrantFiled: October 17, 2022Date of Patent: February 20, 2024Assignee: quadric.io, Inc.Inventors: Nigel Drego, Aman Sikka, Mrinalini Ravichandran, Robert Daniel Firu, Veerbhan Kheterpal
-
Patent number: 11893393Abstract: A microprocessor system comprises a computational array and a hardware arbiter. The computational array includes a plurality of computation units. Each of the plurality of computation units operates on a corresponding value addressed from memory. The hardware arbiter is configured to control issuing of at least one memory request for one or more of the corresponding values addressed from the memory for the computation units. The hardware arbiter is also configured to schedule a control signal to be issued based on the issuing of the memory requests.Type: GrantFiled: October 22, 2021Date of Patent: February 6, 2024Assignee: Tesla, Inc.Inventors: Emil Talpes, Peter Joseph Bannon, Kevin Altair Hurd
-
Patent number: 11886877Abstract: A processor may include a plurality of data memories storing operands that may be operated upon by the processor. Load/store operations may specify a memory location in one of the data memories to be accessed using a memory select value that selects the data memory and an address within the selected data memory. The memory select values may be mapped from virtual memory select values associated with the load/store operations to physical memory select values that may be used to access the data memory.Type: GrantFiled: December 13, 2021Date of Patent: January 30, 2024Assignee: Apple Inc.Inventors: Richard T. Witek, Peter C. Eastty, Rajarshi Mukherjee
-
Patent number: 11880687Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. In a representative embodiment, a system includes an interconnection network, a processor, a host interface, and a configurable circuit cluster. The configurable circuit cluster may include a plurality of configurable circuits arranged in an array; an asynchronous packet network and a synchronous network coupled to each configurable circuit of the array; and a memory interface circuit and a dispatch interface circuit coupled to the asynchronous packet network and to the interconnection network. Each configurable circuit includes instruction or configuration memories for selection of a current data path configuration, a master synchronous network input, and a data path configuration for a next configurable circuit.Type: GrantFiled: January 26, 2023Date of Patent: January 23, 2024Assignee: Micron Technology, Inc.Inventor: Tony M. Brewer
-
Patent number: 11868306Abstract: A processing system includes a processing unit and a memory device. The memory device includes a processing-in-memory (PIM) module that performs processing operations on behalf of the processing unit. An instruction set architecture (ISA) of the PIM module has fewer instructions than an ISA of the processing unit. Instructions received from the processing unit are translated such that processing resources of the PIM module are virtualized. As a result, the PIM module concurrently performs processing operations for multiple threads or applications of the processing unit.Type: GrantFiled: September 13, 2022Date of Patent: January 9, 2024Assignee: Advanced Micro Devices, Inc.Inventors: Michael L. Chu, Ashwin Aji, Muhammad Amber Hassaan
-
Patent number: 11868779Abstract: Aspects of the invention include a computer-implemented method of updating metadata prediction tables. The computer-implemented method includes establishing, in the metadata prediction tables, a prediction of how a set of instructions will resolve and identifying that the set of instructions is completed. The computer-implemented method also includes determining, upon completion of the set of instructions, whether prediction update queues (PUQs) associated with the set of instructions indicate that the set of instructions resolved in one of a plurality of prescribed manners relative to the prediction and deciding that the metadata predictions tables are candidates to be updated based on the PUQs indicating that the set of instructions resolved in one of the plurality of prescribed manners.Type: GrantFiled: September 9, 2021Date of Patent: January 9, 2024Assignee: International Business Machines CorporationInventors: James Raymond Cuffney, Adam Benjamin Collura, James Bonanno, Brian Robert Prasky, Edward Thomas Malley, Suman Amugothu
-
Patent number: 11861220Abstract: Methods of memory allocation in which registers referenced by different groups of instances of the same task are mapped to individual logical memories. Other example methods describe the mapping of registers referenced by a task to different banks within a single logical memory and in various examples this mapping may take into consideration which bank is likely to be the dominant bank for the particular task and the allocation for one or more other tasks.Type: GrantFiled: February 14, 2020Date of Patent: January 2, 2024Assignee: Imagination Technologies LimitedInventors: Isuru Herath, Richard Broadhurst
-
Patent number: 11847456Abstract: Livelock recovery circuits configured to detect livelock in a processor, and cause the processor to transition to a known safe state when livelock is detected. The livelock recovery circuits include detection logic configured to detect that the processor is in livelock when the processor has illegally repeated an instruction; and transition logic configured to cause the processor to transition to a safe state when livelock has been detected by the detection logic.Type: GrantFiled: October 6, 2022Date of Patent: December 19, 2023Assignee: Imagination Technologies LimitedInventors: Ashish Darbari, Iain Singleton
-
Patent number: 11822923Abstract: A load/store unit includes a first queue including a first entry for a store operation and a second queue including a second entry for a load operation that includes a return instruction that redirects a program flow to a location indicated by the return instruction. The load/store unit also includes a processor to determine that the store operation matches the load operation and selectively perform store-to-load forwarding (STLF) of a return address for the return instruction from the first entry to the second entry based on whether the store operation is associated with a call instruction. The load/store unit forwards the return address to the second entry in response to the store operation being associated with the call instruction. The load/store unit blocks forwarding until the store operation retires in response to the store operation not being associated with the call instruction.Type: GrantFiled: June 25, 2019Date of Patent: November 21, 2023Assignee: Advanced Micro Devices, Inc.Inventor: David Kaplan
-
Patent number: 11809874Abstract: A processor may include an instruction distribution circuit and a plurality of execution pipelines. The instruction distribution circuit may distribute a conditional instruction to a first execution pipeline for execution when the conditional instruction is associated with a prediction of a high confidence level, or to a second execution pipeline for execution when the conditional instruction is associated with a prediction of a low confidence level. The second execution pipeline, not the first execution pipeline, may directly instruct the processor to obtain an instruction from a target address for execution, when the conditional instruction is mispredicted. Thus, when the conditional instruction is distributed to the first execution pipeline for execution and determined to be mispredicted, the first execution pipeline may cause the conditional instruction to be re-executed in the second execution pipeline to cause the instruction from the correct target address to be obtained for execution.Type: GrantFiled: February 1, 2022Date of Patent: November 7, 2023Assignee: Apple Inc.Inventors: Ethan R Schuchman, Niket K Choudhary, Kulin N Kothari, Haoyan Jia, Ian D Kountanis, Douglas C Holman, Wei-Han Lien, Pruthivi Vuyyuru
-
Patent number: 11789742Abstract: Techniques related to executing a plurality of instructions by a processor comprising a method for executing a plurality of instructions by a processor. The method comprises detecting a pipeline hazard based on one or more instructions provided for execution by an instruction execution pipeline, beginning execution of an instruction, of the one or more instructions on the instruction execution pipeline, stalling a portion of the instruction execution pipeline based on the detected pipeline hazard, storing a register state associated with the execution of the instruction based on the stalling, determining that the pipeline hazard has been resolved, and restoring the register state to the instruction execution pipeline based on the determination.Type: GrantFiled: March 7, 2022Date of Patent: October 17, 2023Assignee: Texas Instruments IncorporatedInventors: Timothy D. Anderson, Duc Bui, Joseph Zbiciak, Reid E. Tatge
-
Patent number: 11775305Abstract: Aspects of the present disclosure relate to an apparatus comprising fetch circuitry. The fetch circuitry comprises a pointer-based fetch queue for queuing processing instructions retrieved from a storage, and pointer storage for storing a pointer identifying a current fetch queue element. The apparatus comprises decode circuitry having a plurality of decode units, and fetch queue extraction circuitry to, based on the pointer, extract the content of a plurality of elements of the fetch queue; apply combinatorial logic to speculatively produce, from the content of said fetch queue entries, a plurality of speculative potential instructions; and transmit each speculative potential instruction to a corresponding one of said decode units. Each decode unit is configured to decode the corresponding speculative potential instruction.Type: GrantFiled: December 23, 2021Date of Patent: October 3, 2023Assignee: Arm LimitedInventors: Adrian Viorel Popescu, Remus-Gabriel Vultur, Jatin Bhartia
-
Patent number: 11755330Abstract: Processors and methods related to tracking exact convergence to guide the recovery process in response to a mispredicted branch are provided. An example processor includes a pipeline having a frontend and a backend. The processor further includes a state table for maintaining information related to at least a subset of branches corresponding to instructions being processed by the processor. The processor further includes state logic configured to access the state table and track locations of any exact convergence points associated with branches corresponding to the instructions being processed by the processor. The state logic is further configured to identify a first recovery method for recovering from a misprediction associated with a branch if a location of an exact convergence point associated with the branch is determined to be in the frontend of the pipeline, else identify a second recovery method for recovering from the misprediction associated with the branch.Type: GrantFiled: September 13, 2022Date of Patent: September 12, 2023Assignee: Microsoft Technology Licensing, LLCInventors: Vignyan Reddy Kothinti Naresh, Shivam Priyadarshi
-
Patent number: 11755324Abstract: A computer system, processor, programming instructions and/or method for managing operations of a gather buffer for a processor core load storage unit. The processor core includes a processing pipeline having one or more execution units for processing unaligned load instructions that executes in two phases to satisfy. A buffer storage element is provided having a plurality of entries for temporarily collecting partial writeback results retrieved from the memory that are associated with first phase accesses for each of a plurality of unaligned load instructions. An associated logic controller device tracks two parts of the unaligned load to be gathered at independent times, wherein said partial result stored at said buffer storage element comprises a first part of an unaligned load. The second phase load access for the same instruction is independently accessed and later merged with first part of the load data at byte granularity to satisfy the load.Type: GrantFiled: August 31, 2021Date of Patent: September 12, 2023Assignee: International Business Machines CorporationInventors: Kimberly M. Fernsler, Bryan Lloyd, David A. Hrusecky, David A. Campbell