Patents Examined by Keith E Vicary
  • Patent number: 12645463
    Abstract: Techniques and architectures for determining a target of a branch instruction. In an embodiment, a processor core detects a fall through event wherein multiple fetched instructions comprise one or more branch instructions. Based on the fall through event, a repository is provided with respective branch information for each of the one or more branch instructions. The repository functions as a cache that is available to an evaluation circuit at an instruction fetch stage of the processor core. Branch information at the repository is accessible to facilitate a relatively early identification of an instruction as being of a branch instruction type. In another embodiment, the early identification enables re-steering of a speculative execution sequence.
    Type: Grant
    Filed: September 26, 2024
    Date of Patent: June 2, 2026
    Assignee: Intel Corporation
    Inventors: Baishik Biswas, Anant Vithal Nori, Sreenivas Subramoney
  • Patent number: 12639074
    Abstract: An apparatus comprises fetch circuitry configured to fetch instructions for processing by processing circuitry, and prediction circuitry configured to identify instructions to be fetched by the fetch circuitry.
    Type: Grant
    Filed: June 17, 2024
    Date of Patent: May 26, 2026
    Assignee: Arm Limited
    Inventors: Simon Alastair Hartley, Juan José García-Castro Crespo
  • Patent number: 12632258
    Abstract: Techniques are disclosed involving fusing instruction pairs and executing corresponding fused instruction operations. A processor includes fusion detection circuitry to detect a pair of fetched instructions and fuse the instructions into a fused instruction operation, and execution circuitry to execute the fused instruction operation. In one embodiment, a first instruction is executable to perform an operation and a second instruction is executable to adjust a sign of a result of the operation. In another embodiment, the first instruction is executable to perform an operation and the second instruction is executable to find a maximum or minimum, as compared to a comparison operand, of a result of the operation. In another embodiment, the first instruction is executable to perform a vector operation and the second instruction is executable to read a first element of the vector result and overwrite one or more additional elements of the vector result.
    Type: Grant
    Filed: March 22, 2023
    Date of Patent: May 19, 2026
    Assignee: Apple Inc.
    Inventors: Francesco Spadini, Skanda K. Srinivasa, Zhaoxiang Jin
  • Patent number: 12608336
    Abstract: Processors, systems and methods are provided for executing multiple instances of a kernel by multiple segments of columns of a processor. A method may include retrieving kernel information of a kernel by a sequencer, retrieving instructions of the kernel from an instruction cache by the sequencer, repeatedly executing decoded scalar instructions of each instruction group of the kernel successively according to the number of instances of the kernel, repeatedly decoding vector instructions of each instruction group successively according to the number of instances of the kernel, and dispatching a respective configuration to a respective column after the first vector instruction decoding and each repetition of decoding with an amount of space between two subsequent dispatches equal to the number of columns that the kernel occupies.
    Type: Grant
    Filed: July 10, 2023
    Date of Patent: April 21, 2026
    Assignee: AzurEngine Technologies Zhuhai Inc.
    Inventors: Toshio Nagata, Yuan Li, Jianbin Zhu
  • Patent number: 12608208
    Abstract: Various embodiments include techniques for performing memory synchronization operations between processors in a multiprocessor computing system. A first processor transfers data by issuing memory operations to store the data to a shared memory. The first processor issues an asynchronous release operation to a load store unit. In response, the load store unit issues a memory synchronization operation to ensure that the data associated with the memory operations is visible in the shared memory. While the asynchronous release operation is pending, the first processor is able to issue further instructions and perform other operations. When the data associated with the memory operations is visible in the shared memory, the memory synchronization operation completes and the load store unit writes a flag to a separate memory location. Upon detecting that the flag has been written, a second thread, and/or other threads, can reliably read the data stored in the shared memory.
    Type: Grant
    Filed: March 1, 2024
    Date of Patent: April 21, 2026
    Assignee: NVIDIA CORPORATION
    Inventors: Manan Patel, Ivan Tanasic, Daniel Marcovitch, Michael Allen Parker, Srinivas Santosh Kumar Madugula, Raghuram L, Wishwesh Anil Gandhi, Olivier Giroux
  • Patent number: 12602349
    Abstract: In some aspects, a program is executed on a coarse-grained reconfigurable (CGR) processor. The CGR determines that the program produces an output that includes a variable length tensor, determines a maximum size of the variable length tensor and sets, based on the maximum size, a maximum of a counter associated with the program. The counter is set to an initial value of zero. The CGR initiates execution of the program, causing the program to receive an input tensor. Based on determining that the program is operating on a first portion of the input tensor, the CGR performs an update to the counter, to create an updated counter, and communicates the updated counter to one or more consumers within the program. After determining that the program has completed operating on the input tensor, a final size of the output is communicated to one or more downstream consumers external to the program.
    Type: Grant
    Filed: June 23, 2023
    Date of Patent: April 14, 2026
    Assignee: SambaNova Systems, Inc.
    Inventors: Abhishek Srivastava, Matthew Vilim, Raghu Prabhakar, Sankar Rachuru, Zhekun Zhang, Matheen Musaddiq, Apurv Vivek, Sitanshu Gupta
  • Patent number: 12572360
    Abstract: A stream of data is accessed from a memory system using a stream of addresses generated in a first mode of operating a streaming engine in response to executing a first stream instruction. A block cache preload operation is performed on a cache in the memory using a block of addresses generated in a second mode of operating the streaming engine in response to executing a second stream instruction.
    Type: Grant
    Filed: April 14, 2022
    Date of Patent: March 10, 2026
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Joseph Raymond Michael Zbiciak, Timothy David Anderson, Jonathan (Son) Hung Tran, Kai Chirca, Daniel Wu, Abhijeet Ashok Chachad, David M. Thompson
  • Patent number: 12554507
    Abstract: Provided is a method for performing computations near memory. The method includes receiving, at a storage device, first data associated with a first data set, the first data having a first format. The method further includes receiving, at a processor core of the storage device, a request to perform a function on the first data, the function including a first operation and a second operation. The method further includes performing, by a first processor-core acceleration engine of the storage device, the first operation on the first data, based on first processor-core custom instructions, to generate first result data. The method further includes performing, by a first extra-processor-core circuit of the storage device, the second operation on the first result data, based on the first processor-core custom instructions.
    Type: Grant
    Filed: June 2, 2023
    Date of Patent: February 17, 2026
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jonghyeon Kim, Soogil Jeong
  • Patent number: 12554494
    Abstract: Systems, methods, and apparatuses relating to instructions to reset software thread runtime property histories in a hardware processor are described. In one embodiment, a hardware processor includes a hardware guide scheduler comprising a plurality of software thread runtime property histories; a decoder to decode a single instruction into a decoded single instruction, the single instruction having a field that identifies a model-specific register; and an execution circuit to execute the decoded single instruction to check that an enable bit of the model-specific register is set, and when the enable bit is set, to reset the plurality of software thread runtime property histories of the hardware guide scheduler.
    Type: Grant
    Filed: April 4, 2024
    Date of Patent: February 17, 2026
    Assignee: Intel Corporation
    Inventors: Eliezer Weissmann, Mark Charney, Michael Mishaeli, Robert Valentine, Itai Ravid, Jason W. Brandt, Gilbert Neiger, Baruch Chaikin, Efraim Rotem
  • Patent number: 12547401
    Abstract: Techniques are disclosed that relate to executing fused instructions. A processor may include a decoder circuit and a load/store circuit. The decoder circuit may detect a load/store instruction to load a value from a memory and detect a non-load/store instruction that depends on the value to be loaded. The decoder circuit may fuse the load/store instruction and the non-load/store instruction such that one or more operations that the non-load/store instruction is defined to perform are to be executed within the load/store circuit. The load/store circuit may receive an indication of the fused load/store and non-load/store instructions and then execute one or more operations of the load/store instruction and the one or more operations of the non-load/store instruction using a circuit included in the load/store circuit.
    Type: Grant
    Filed: June 10, 2024
    Date of Patent: February 10, 2026
    Assignee: Apple Inc.
    Inventors: John D. Pape, Skanda K. Srinivasa, Francesco Spadini, Brian T. Mokrzycki
  • Patent number: 12530313
    Abstract: An accelerator circuit includes a control interface to receive a stream of instructions, a first memory to store an input data, and an engine circuit. The engine circuit includes a dispatch circuit to decode an instruction of the stream of instructions into a plurality of commands and a plurality of queue circuits. Each of the plurality of queue circuits supports a queue data structure to store a respective one of the plurality of commands decoded from the instruction, and a plurality of command execution circuits. Each of the plurality of command execution circuits is to receive and execute a command extracted from a corresponding one of the plurality of queues.
    Type: Grant
    Filed: July 3, 2019
    Date of Patent: January 20, 2026
    Assignee: Huaxia General Processor Technologies Inc.
    Inventors: Lei Wang, Shaobo Shi, Zhaonan Meng
  • Patent number: 12530196
    Abstract: Software instructions are executed on a processor within a computer system to configure a streaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array, a null vector count (N), and a selected dimension. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. N null stream vectors are inserted into the stream of vectors for the selected dimension without fetching respective null data from the memory.
    Type: Grant
    Filed: February 22, 2022
    Date of Patent: January 20, 2026
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Asheesh Bhardwaj, William Franklin Leven, Son Hung Tran, Timothy David Anderson
  • Patent number: 12511250
    Abstract: A memory architecture includes processing circuits co-located with memory subarrays for performing computations within the memory architecture. The memory architecture includes a plurality of decoders in hierarchical levels that include a multicast capability for distributing data or compute operations to individual subarrays. The multicast may be configurable with respect to individual fan-outs at each hierarchical level. A computation workflow may be organized into a compute supertile representing one or more “supertiles” of input data to be processed in the compute supertile. The individual data tiles of the input data supertile may be used by multiple compute tiles executed by the processing circuits of the subarrays, and the data tiles multicast to the respective processing circuits for efficient data loading and parallel computation.
    Type: Grant
    Filed: December 23, 2021
    Date of Patent: December 30, 2025
    Assignee: Intel Corporation
    Inventors: Om Ji Omer, Gurpreet Singh Kalsi, Anirud Thyagharajan, Saurabh Jain, Kamlesh R. Pillai, Sreenivas Subramoney, Avishaii Abuhatzera
  • Patent number: 12498932
    Abstract: Techniques are disclosed relating to physical register sharing. In some embodiments, a processor includes, in a register file, physical registers of a largest architected register size for a given operand type defined in an instruction set architecture (ISA). A register rename circuit of the processor is configured to assign a first physical register to an architected register of the largest architected register size. The register rename circuit is also configured to assign a first portion of a second physical register to a second ISA-defined architected register of a smaller size than the largest architected register size and a second portion of the second physical register to a third ISA-defined architected register of the smaller size. In the second assignment, the second and third architected registers of the smaller size are separate and distinct registers in the ISA.
    Type: Grant
    Filed: December 7, 2023
    Date of Patent: December 16, 2025
    Assignee: Apple Inc.
    Inventors: Deepankar Duggal, Richard F. Russo, Haoyan Jia
  • Patent number: 12481505
    Abstract: A graphflow apparatus includes an information buffer (IB) and a load queue (LQ). The IB is configured to cache an instruction queue. The LQ is used to cache a read instruction queue. The IB includes a speculative bit and a speculative identity (ID) field. The speculative bit indicates whether a current instruction is a speculatively-executable instruction. The speculative ID field stores a speculative ID of one speculative operation on the current instruction.
    Type: Grant
    Filed: May 4, 2023
    Date of Patent: November 25, 2025
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Fan Zhu, Ruoyu Zhou, Wenbo Sun, Xiping Zhou
  • Patent number: 12450063
    Abstract: Apparatus and methods for maintaining approximate uniformity of aging of equivalent processing circuits in a pipeline stage(s) in a processor are disclosed. The processor includes one or more processing units that include one or more pipelines. A pipeline includes a series of pipeline stages each of which performs a particular function. In this regard, the processor also includes an age management circuit (AMC) configured to store performance factors indicative of aging of the equivalent processing circuits in a pipeline stage. In response to work input into a given pipeline stage, the AMC is further configured to route the work input to one of the equivalent processing circuits in the pipeline stage based on the stored performance factors. In so doing, the AMC controls the frequency of use of the equivalent processing circuits in its pipeline stage to substantially maintain uniform aging of the equivalent processing circuits.
    Type: Grant
    Filed: December 8, 2023
    Date of Patent: October 21, 2025
    Assignee: QUALCOMM Incorporated
    Inventors: Hithesh Hassan Lepaksha, Darshan Kumar Nandanwar, Linga Achuta Ram Kumar Nimmala
  • Patent number: 12423109
    Abstract: There is provided an apparatus comprising decoder circuitry to decode store instructions and load instructions. The apparatus also includes prediction circuitry to store load predictions and store predictions. Each load prediction is indexed based on a program counter value of one of the load instructions and comprises information indicative of a predicted store instruction predicted to store data to memory to be subsequently loaded from the memory by that load instruction. Each store prediction is indexed based on the program counter value of one of the store instructions and comprises information indicative of the store instruction being predicted to be indicated as the predicted store instruction in at least one of the plurality of load predictions. Conditions for maintaining the load predictions are different from conditions for maintaining the store predictions.
    Type: Grant
    Filed: February 27, 2024
    Date of Patent: September 23, 2025
    Assignee: Arm Limited
    Inventors: Alexander Cole Shulyak, Zachary Allen Kingsbury, Bipin Prasad Heremagalur Ramaprasad, Abhishek Raja
  • Patent number: 12417104
    Abstract: When a predicted branch type for a given address is a first branch type corresponding to a predicated-loop instruction for triggering processing circuitry to perform a variable number of iterations of a predicated loop body, branch prediction circuitry generates a first type of branch prediction indicative of a predicted number of iterations for the predicated-loop instruction, and omits speculatively updating history information based on the predicted number of iterations. The processing circuitry is able to tolerate at least one unnecessary iteration of the predicated loop body being processed when the predicted number of iterations is too large. For a second branch type, the branch prediction circuitry generates a second type of branch prediction and, at least when a taken branch is predicted in the second type of branch prediction, speculatively updates the history information.
    Type: Grant
    Filed: February 27, 2024
    Date of Patent: September 16, 2025
    Assignee: Arm Limited
    Inventors: Guillaume Bolbenes, Thibaut Elie Lanois, Houdhaifa Bouzguarrou
  • Patent number: 12411694
    Abstract: A system and method for reducing pipeline latency. In one embodiment, a processing system includes a processing pipeline. The processing pipeline includes a plurality of processing stages. Each stage is configured to further processing provided by a previous stage. A first of the stages is configured to perform a first function in a pipeline cycle. A second of the stages is disposed downstream of the first of the stages, and is configured to perform, in a pipeline cycle, a second function that is different from the first function. The first of the stages is further configured to selectably perform the first function and the second function in a pipeline cycle, and bypass the second of the stages.
    Type: Grant
    Filed: May 9, 2023
    Date of Patent: September 9, 2025
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Christian Wiencke, Shrey Sudhir Bhatia, Jeroen Vliegen
  • Patent number: 12411688
    Abstract: A computer system, processor, programming instructions and/or method for managing operations of a gather buffer for a processor core load storage unit. The processor core includes a processing pipeline having one or more execution units for processing unaligned load instructions, each of which is satisfied in two phases. A buffer storage element is provided having a plurality of entries for temporarily collecting partial writeback results retrieved from the memory that are associated with first phase accesses for each of a plurality of unaligned load instructions. An associated logic controller device tracks two parts of the unaligned load to be gathered at independent times, wherein said partial result stored at said buffer storage element comprises a first part of an unaligned load. The second phase load access for the same instruction is independently accessed and later merged with the first part of the load data at byte granularity to satisfy the load.
    Type: Grant
    Filed: July 25, 2023
    Date of Patent: September 9, 2025
    Assignee: International Business Machines Corporation
    Inventors: Kimberly M. Fernsler, Bryan Lloyd, David A. Hrusecky, David Campbell
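To make the first embodiment of patent 12632258 more concrete, the Python sketch below models a decode-time fusion pass: an operation followed by an instruction that only adjusts the sign of that operation's result is collapsed into one fused operation. The instruction representation, the mnemonics (`fneg`, `fabs`, `fmul_neg`, and so on) and the fusion rules are hypothetical; the abstract does not say which operations qualify or how a fused operation is encoded.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    op: str
    dst: str
    srcs: tuple[str, ...]


# Hypothetical table: (producer op, sign-adjusting op) -> fused op name.
FUSABLE = {
    ("fmul", "fneg"): "fmul_neg",
    ("fadd", "fneg"): "fadd_neg",
    ("fmul", "fabs"): "fmul_abs",
}


def fuse_sign_pairs(stream: list[Insn]) -> list[Insn]:
    """Fuse an operation with an immediately following sign adjustment of its result."""
    out: list[Insn] = []
    i = 0
    while i < len(stream):
        cur = stream[i]
        nxt = stream[i + 1] if i + 1 < len(stream) else None
        if (
            nxt is not None
            and (cur.op, nxt.op) in FUSABLE
            and nxt.srcs == (cur.dst,)  # second insn reads only the first insn's result
            and nxt.dst == cur.dst      # and overwrites it, so no later insn sees the unadjusted value
        ):
            out.append(Insn(FUSABLE[(cur.op, nxt.op)], cur.dst, cur.srcs))
            i += 2  # both instructions consumed by one fused operation
        else:
            out.append(cur)
            i += 1
    return out


prog = [
    Insn("fmul", "d0", ("d1", "d2")),
    Insn("fneg", "d0", ("d0",)),
    Insn("fadd", "d3", ("d0", "d4")),
]
print(fuse_sign_pairs(prog))
```

Requiring the sign-adjusting instruction to overwrite the producer's destination is a simplifying assumption: it means the fused operation only has to write one result.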
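As a rough illustration of the routing idea in patent 12450063, the Python sketch below models the age management circuit (AMC) as a table of per-unit wear counters and sends each piece of work to the least-worn equivalent circuit. The class name, the use of cumulative busy cycles as the "performance factor" and the least-worn selection policy are assumptions for illustration; the abstract does not specify how the factors are measured or compared.

```python
from dataclasses import dataclass, field


@dataclass
class AgeManagementCircuit:
    """Toy model: route work among equivalent units using stored wear counters."""

    num_units: int
    # Hypothetical performance factor: cumulative busy cycles per unit.
    wear: list[int] = field(init=False)

    def __post_init__(self) -> None:
        self.wear = [0] * self.num_units

    def route(self, cost: int = 1) -> int:
        """Pick the least-worn unit (lowest index on ties) and charge it the work's cost."""
        unit = min(range(self.num_units), key=lambda i: self.wear[i])
        self.wear[unit] += cost
        return unit


amc = AgeManagementCircuit(num_units=3)
chosen = [amc.route() for _ in range(7)]
print(chosen)    # [0, 1, 2, 0, 1, 2, 0]
print(amc.wear)  # [3, 2, 2]
```

A least-worn policy is only one way to act on the stored factors; real hardware might compare coarse counters or thresholds instead of exact values.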
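The two-phase flow in patent 12411688 can be sketched as a gather-buffer entry that collects the bytes of one unaligned load from two independent accesses and merges them at byte granularity. The 8-byte access granule, the per-byte valid mask and all names below are assumptions for illustration, not details from the abstract; the sketch also handles only loads no wider than one granule, so at most two accesses are needed.

```python
from dataclasses import dataclass, field

GRANULE = 8  # assumed access width in bytes (hypothetical)


def split_unaligned(addr: int, size: int) -> list[tuple[int, int]]:
    """Split a load into (address, length) accesses that each stay inside one granule."""
    if size > GRANULE:
        raise ValueError("sketch only supports loads up to one granule wide")
    first_len = min(size, GRANULE - addr % GRANULE)
    parts = [(addr, first_len)]
    if first_len < size:
        parts.append((addr + first_len, size - first_len))
    return parts


@dataclass
class GatherEntry:
    """One gather-buffer entry collecting the bytes of one unaligned load."""

    addr: int
    size: int
    data: bytearray = field(init=False)
    valid: list[bool] = field(init=False)

    def __post_init__(self) -> None:
        self.data = bytearray(self.size)
        self.valid = [False] * self.size

    def write_part(self, offset: int, chunk: bytes) -> None:
        """Merge a partial writeback into the entry at a byte offset."""
        for i, b in enumerate(chunk):
            self.data[offset + i] = b
            self.valid[offset + i] = True

    def complete(self) -> bool:
        return all(self.valid)


memory = bytes(range(32))
entry = GatherEntry(addr=6, size=4)  # bytes 6..9 straddle the 8-byte boundary
(a1, n1), (a2, n2) = split_unaligned(entry.addr, entry.size)
# Phase 1: the first access returns bytes 6..7, which are parked in the entry.
entry.write_part(a1 - entry.addr, memory[a1:a1 + n1])
# Phase 2 is issued independently; its bytes are merged when they arrive.
entry.write_part(a2 - entry.addr, memory[a2:a2 + n2])
print(entry.complete(), list(entry.data))  # True [6, 7, 8, 9]
```

Because each partial result is written at its own byte offset, the two phases can complete in either order, which matches the abstract's point that the parts are gathered at independent times.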