Patents Examined by Corey S Faherty
-
Patent number: 12650841
Abstract: Systems, methods, and apparatuses for implementing capability informed prefetches are described.
Type: Grant
Filed: April 2, 2022
Date of Patent: June 9, 2026
Assignee: Intel Corporation
Inventor: Scott D. Constable
-
Patent number: 12645454
Abstract: Techniques for shared data prefetch are described. An exemplary instruction for shared data prefetch includes at least one field for an opcode, at least one field for a source operand to provide a memory address of at least a byte of data, wherein the opcode is to indicate that circuitry is to fetch a line of data from memory at the provided address that contains the byte specified with the source operand and store that byte in at least a cache local to a requester, wherein the byte of data is to be stored in a shared state.
Type: Grant
Filed: September 25, 2021
Date of Patent: June 2, 2026
Assignee: Intel Corporation
Inventors: Christopher Hughes, Zhe Wang, Dan Baum, Alexander Heinecke, Evangelos Georganas, Lingxiang Xiang, Joseph Nuzman, Ritu Gupta
-
Patent number: 12639256
Abstract: Embodiments herein describe a hardware accelerator with an array of data processing engines (DPEs) which includes a controller (e.g., a microcontroller) for multiple columns of the array. The controllers can be hardened circuitry that executes software code (or firmware) that controls the hardware accelerator. In one embodiment, the task of the controller is to control and orchestrate the functions performed by the hardware accelerator.
Type: Grant
Filed: July 23, 2024
Date of Patent: May 26, 2026
Assignee: XILINX, INC.
Inventors: Juan J. Noguera Serra, David Patrick Clarke, Javier Cabezas Rodriguez, Mikhail Asiatici, Patrick Schlangen
-
Patent number: 12632259
Abstract: Techniques for using soft-barrier hints are described. An example includes a synchronous microthreading (SyMT) co-processor coupled to a logical processor to execute a plurality of microthreads, with each microthread having an independent register state, upon an execution of an instruction to enter into SyMT mode, wherein the SyMT co-processor is further to support a soft-barrier hint instruction in code which when processed by a microthread is to pause execution of the microthread to be resumed based at least in part on a data structure having at least one entry, the entry to include an instruction pointer of the soft-barrier hint instruction and a count of microthreads that have encountered the soft-barrier hint instruction at the instruction pointer.
Type: Grant
Filed: April 2, 2022
Date of Patent: May 19, 2026
Assignee: Intel Corporation
Inventors: Shreesha Srinath, Jonathan Pearce, David B. Sheffield, Ching-Kai Liang, Jeffrey Cook
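The tracking structure in this abstract (an entry holding an instruction pointer and a count of microthreads that reached it) can be illustrated with a small software model. This is only a sketch: `SoftBarrierTable`, `arrive`, and the release policy (resume once every microthread has arrived) are assumptions made for illustration, not details taken from the patent, which says only that resumption is based "at least in part" on the data structure.

```python
from dataclasses import dataclass, field


@dataclass
class SoftBarrierTable:
    """Hypothetical model: one entry per soft-barrier instruction pointer."""

    total_microthreads: int
    counts: dict = field(default_factory=dict)  # instruction pointer -> arrivals
    paused: dict = field(default_factory=dict)  # instruction pointer -> thread ids

    def arrive(self, thread_id: int, ip: int) -> list:
        """Record an arrival and return the threads to resume (empty if waiting)."""
        self.counts[ip] = self.counts.get(ip, 0) + 1
        self.paused.setdefault(ip, []).append(thread_id)
        # Assumed release policy: resume once every microthread has arrived.
        if self.counts[ip] == self.total_microthreads:
            del self.counts[ip]
            return self.paused.pop(ip)
        return []
```

With three microthreads, the first two calls to `arrive` for the same instruction pointer return an empty list, and the third returns all three thread ids and clears the entry.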
-
Patent number: 12625706
Abstract: Embodiments of this application disclose an instruction translation method. The method includes: obtaining a return instruction of a function call instruction; obtaining a first address mapping result based on a second address indicated in the return instruction; storing the first address mapping result in a running stack space; and obtaining a first translation result of the return instruction, where the first translation result is a binary translation result of the return instruction, and the second translation result indicates to obtain, from a target location, an instruction indicated by the first address mapping result and execute the instruction. In this application, a running stack space of a source program is reused, thereby saving a storage space. In addition, an address of a return instruction does not need to be checked each time the return instruction is translated, thereby reducing overheads during translation and increasing program running efficiency.
Type: Grant
Filed: September 26, 2024
Date of Patent: May 12, 2026
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Xianzhe Liu, Jianjiang Zeng, Yandong Lv
-
Patent number: 12613702
Abstract: A system for Multi-Party Computation (MPC) includes a general-purpose programming language, an MPC processor, an MPC Instruction Set Architecture (ISA), and a compiler. The general-purpose programming language is used for writing an MPC application. The MPC processor executes the MPC application. The MPC ISA corresponds to the MPC processor. The compiler generates an intermediate representation for the MPC application and generates machine code for the intermediate representation by mapping and assembling MPC ISA instructions.
Type: Grant
Filed: August 21, 2024
Date of Patent: April 28, 2026
Assignee: Regents of the University of Minnesota
Inventors: John Mario Sartori, Shashank Ganapathi Hegde
-
Patent number: 12608202
Abstract: Detailed herein are examples of instructions and their hardware support for floating-point comparison that makes use of the distinction between signed integer comparison and unsigned integer comparison to make an analogous distinction between floating-point relationships including unordered and those that do not. These instructions may reduce the number of instructions required to compare and conditionally execute operations in a program, including instructions to load values and instructions to explicitly test for the unordered condition.
Type: Grant
Filed: July 1, 2023
Date of Patent: April 21, 2026
Assignee: Intel Corporation
Inventors: John Morgan, Deepti Aggarwal, Michael Espig, H. Peter Anvin
-
Patent number: 12591430
Abstract: A system for efficient element-wise and cross-vector maximum operations. One example system includes an input bus, an output bus, and a memory configured to store N/K elements of an N-element vector in a corresponding row of the memory. A K-wide FMAX( ) comparator has a first set of K-wide inputs of the K-wide FMAX( ) comparator coupled to a read port of the memory and a second set of K-wide inputs of the K-wide FMAX( ) comparator coupled to the input bus, and a set of K-wide outputs of the K-wide FMAX( ) comparator coupled to a write port of the memory. A tree of FMAX( ) comparators comprises an input and an output, the input of the tree coupled to the set of K-wide outputs of the K-wide FMAX( ) comparator and the output of the tree coupled to the output bus.
Type: Grant
Filed: June 1, 2023
Date of Patent: March 31, 2026
Assignee: International Business Machines Corporation
Inventors: Geoffrey Burr, Shubham Jain, Yasuteru Kohda
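The two stages named in this abstract, a K-wide element-wise maximum followed by a comparator tree, can be modeled in software roughly as follows. The function names `fmax_accumulate` and `fmax_tree` are hypothetical, and Python's built-in `max` stands in for the hardware FMAX( ) comparator, so NaN handling is not modeled.

```python
def fmax_accumulate(row, incoming):
    """Element-wise max of a stored K-wide row and K values from the input bus."""
    return [max(a, b) for a, b in zip(row, incoming)]


def fmax_tree(values):
    """Pairwise reduction tree over a non-empty list: halve it until one max remains."""
    level = list(values)
    while len(level) > 1:
        # Compare neighbours (0,1), (2,3), ...; an odd leftover passes through.
        nxt = [max(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

In this model the result of `fmax_accumulate` would be written back to the memory row, and `fmax_tree` applied to that row yields the single cross-vector maximum sent to the output bus.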
-
Patent number: 12585600
Abstract: A system receives, by a network interface card (NIC), inputs including an instruction to read or write a payload of a message, a tracker state indicating a round of processing, and a datatype descriptor defining organization of the message payload. The system identifies a current context and a processing state for the instruction. If the datatype descriptor indicates a first type, the system: obtains the current context associated with the first type from a host memory or a cache of the NIC; and creates direct memory access (DMA) instructions corresponding to the received instruction by executing operations in a nested loop. If the datatype descriptor indicates a second type, the system: obtains the current context associated with the second type by fetching vector entries from a buffer of the NIC; and creates the DMA instructions corresponding to the received instruction based on addresses and lengths in the vector entries.
Type: Grant
Filed: October 24, 2024
Date of Patent: March 24, 2026
Assignee: Hewlett Packard Enterprise Development LP
Inventor: Christopher M. Brueggen
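The two descriptor cases in this abstract can be contrasted with a short sketch. Everything concrete here is assumed for illustration: the two-level strided layout, the `(address, length)` tuple used as a DMA instruction, and the function names `dma_from_strided` and `dma_from_vector` do not come from the patent.

```python
def dma_from_strided(base, counts, strides, block_len):
    """First type (assumed strided layout): emit DMA ops from a nested loop."""
    ops = []
    for i in range(counts[0]):          # outer dimension
        for j in range(counts[1]):      # inner dimension
            address = base + i * strides[0] + j * strides[1]
            ops.append((address, block_len))
    return ops


def dma_from_vector(entries):
    """Second type: each vector entry already carries an address and a length."""
    return [(address, length) for address, length in entries]
```

For example, `dma_from_strided(0x1000, (2, 2), (256, 64), 16)` produces four 16-byte operations, while `dma_from_vector` simply turns each fetched entry into one operation.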
-
Patent number: 12578992
Abstract: Systems, apparatuses, and methods for implementing a hierarchical scheduler. In various implementations, a processor includes a global scheduler, and a plurality of independent local schedulers with each of the local schedulers coupled to a plurality of processors. In one implementation, the processor is a graphics processing unit and the processors are computation units. The processor further includes a shared cache that is shared by the plurality of local schedulers. Each of the local schedulers also includes a local cache used by the local scheduler and processors coupled to the local scheduler. To schedule work items for execution, the global scheduler is configured to store one or more work items in the shared cache and convey an indication to a first local scheduler of the plurality of local schedulers which causes the first local scheduler to retrieve the one or more work items from the shared cache.
Type: Grant
Filed: September 29, 2022
Date of Patent: March 17, 2026
Assignee: Advanced Micro Devices, Inc.
Inventors: Matthäus G. Chajdas, Michael J. Mantor, Rex Eldon McCrary, Christopher J. Brennan, Robert Martin, Dominik Baumeister, Fabian Robert Sebastian Wildgrube
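The hand-off described in this abstract (the global scheduler stores work items in the shared cache, then signals one local scheduler to retrieve them) can be sketched as below. The class names, the dictionary used as the shared cache, and the `notify` method are hypothetical stand-ins, not the patented design.

```python
from collections import deque


class LocalScheduler:
    """Hypothetical local scheduler with its own queue of retrieved work items."""

    def __init__(self, shared_cache):
        self.shared_cache = shared_cache
        self.local_queue = deque()

    def notify(self, key):
        # On the global scheduler's indication, pull the items from the shared cache.
        self.local_queue.extend(self.shared_cache.pop(key))


class GlobalScheduler:
    """Hypothetical global scheduler that publishes work and signals one target."""

    def __init__(self, shared_cache, local_schedulers):
        self.shared_cache = shared_cache
        self.local_schedulers = local_schedulers

    def schedule(self, key, work_items, target):
        self.shared_cache[key] = list(work_items)   # store in the shared cache
        self.local_schedulers[target].notify(key)   # convey the indication
```

Only the signalled local scheduler touches the shared cache entry; the others stay idle until they receive their own indication.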
-
Patent number: 12561279
Abstract: Embodiments are directed to a deterministic streaming system with one or more deterministic streaming processors each having an array of processing elements and a first deterministic memory coupled to the processing elements. The deterministic streaming system further includes a second deterministic memory with multiple data banks having a global memory address space, and a controller. The controller initiates retrieval of first data from the data banks of the second deterministic memory as a first plurality of streams, each stream of the first plurality of streams streaming toward a respective group of processing elements of the array of processing elements. The controller further initiates writing of second data to the data banks of the second deterministic memory as a second plurality of streams, each stream of the second plurality of streams streaming from the respective group of processing elements toward a respective data bank of the second deterministic memory.
Type: Grant
Filed: June 3, 2024
Date of Patent: February 24, 2026
Assignee: Groq, Inc.
Inventor: Dennis Charles Abts
-
Patent number: 12561141
Abstract: An electronic, digital data processor is adapted dynamically to develop for selected instructions received for execution in a pipelined execution unit a predetermined transform of the selected instructions. Depending on predetermined static or dynamic conditions that may exist in the pipelined execution unit, execution of the transform may be allowed or suppressed. The transform may be a fusion, a fracture or a binary transformation. A method is also disclosed for implementing this selective suppression of developed transforms.
Type: Grant
Filed: April 15, 2024
Date of Patent: February 24, 2026
Assignee: Condor Computing Corporation
Inventor: Jeffrey L Nye
-
Patent number: 12561145
Abstract: An apparatus comprises decoding circuitry configured to decode instructions; processing circuitry configured to perform data processing operations in response to the instructions decoded by the decoding circuitry; extension processing circuitry configured to perform a low-precision computation extension task asynchronously with respect to other data processing operations performed by the processing circuitry, the low-precision computation extension task comprising processing one or more sets of data elements for which at least one of the sets of data elements comprises low-precision data represented in a low-precision number format with lower precision than a single-precision floating-point format; and an extension task offload interface separate from an interface by which the processing circuitry issues a memory system request to a memory system, wherein the extension task offload interface is responsive to at least one task offloading instruction decoded by the decoding circuitry to offload the low-precisio
Type: Grant
Filed: October 1, 2024
Date of Patent: February 24, 2026
Assignee: Arm Limited
Inventors: Mbou Eyole, Mariam Rakka
-
Patent number: 12561144
Abstract: Circuitry and methods for implementing conditional fence instructions are described. In certain examples, a hardware processor (e.g., core) includes a branch predictor to predict one of a taken path and a not taken path for a conditional branch instruction; decoder circuitry to decode an instruction into a decoded instruction, the instruction comprising a field that indicates a condition to be set by execution of another instruction, and an opcode that indicates execution circuitry is to, in response to the condition being satisfied, implement an execution fence to delay execution of the instruction until prior instructions in program order execute and delay execution of instructions after the instruction in program order until the instruction executes; and the execution circuitry to execute the decoded instruction according to the opcode.
Type: Grant
Filed: September 27, 2024
Date of Patent: February 24, 2026
Assignee: Intel Corporation
Inventors: Fangfei Liu, Scott Constable, Thomas Unterluggauer, Joseph Nuzman, Carlos Rozas
-
Patent number: 12554504
Abstract: Apparatus and methods for dependency tracking, chaining, and/or fusing for vector instructions. A system, processor, or integrated circuit includes a renamer to generate a valid bit mask for each micro-operation decoded from a first vector instruction, where the valid bit mask indicates what portion of a mask register to write and generate a dependency bit mask for each micro-operation decoded from a second vector instruction, where the dependency bit mask is based on a relationship between the first vector instruction and the second vector instruction, and an issue queue configured to issue for execution each micro-operation from the second vector instruction when an associated dependency bit mask is cleared based on execution of appropriate micro-operations from the first vector instruction.
Type: Grant
Filed: April 26, 2023
Date of Patent: February 17, 2026
Assignee: SiFive, Inc.
Inventors: Bradley Gene Burgess, David Kravitz
-
Patent number: 12554551
Abstract: A method may include creating an association identifier based on an association between a computational device function and a compute engine of a computational device, and invoking an execute command to perform an execution of the computational device function using the compute engine, wherein the execute command uses the association identifier. The compute engine may be a first compute engine, and the association may be further between the computational device function and a second compute engine of the computational device. The execute command may perform an execution of the computational device function using the second compute engine. The execution of the computational device function using the first compute engine and the execution of the computational device function using the second compute engine may overlap. The execute command may include the association identifier. The creating the association identifier may include invoking a create association command.
Type: Grant
Filed: May 20, 2024
Date of Patent: February 17, 2026
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: William Martin, Oscar P. Pinto
-
Patent number: 12547454
Abstract: Disclosed herein is a graph streaming processing system comprising a thread scheduler comprising a first component and a second component. The first component is configured to schedule a first set of threads of a first node to a first processor associated with the first node and initialize status of a completion pointer to an initial value. The completion pointer is associated with a command buffer of the first node. The first component is configured to detect the execution of the first set of threads and generation of a data unit and update the status of the completion pointer to an updated value indicating execution of the first set of threads in response to the generation of the data unit. The second component is configured to schedule a second set of threads of a plurality of second nodes to a second processor based on the status of the completion pointer.
Type: Grant
Filed: July 1, 2023
Date of Patent: February 10, 2026
Assignee: Blaize, Inc.
Inventors: Venkata Ganapathi Puppala, Kota Vamsi Darsi, Matthew Fortune
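A rough software analogue of the completion-pointer gating in this abstract is shown below. It assumes, purely for illustration, that the pointer is an integer advanced once per generated data unit and that each downstream thread names the pointer value it needs; `CommandBuffer`, `run_first_node`, and `schedulable_second_threads` are hypothetical names.

```python
INITIAL_VALUE = 0


class CommandBuffer:
    """Hypothetical command buffer of the first node with its completion pointer."""

    def __init__(self):
        self.completion_pointer = INITIAL_VALUE
        self.data_units = []


def run_first_node(buffer, threads):
    """First component: run threads and advance the pointer per data unit."""
    for thread in threads:
        buffer.data_units.append(thread())
        buffer.completion_pointer += 1  # assumed: one step per generated data unit


def schedulable_second_threads(buffer, pending):
    """Second component: pick threads whose required pointer value is reached."""
    return [name for needed, name in pending if buffer.completion_pointer >= needed]
```

Threads of the second nodes stay unscheduled while the pointer holds its initial value and become eligible as the first node produces data units.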
-
Patent number: 12547406
Abstract: A digital data processor includes a multi-stage butterfly network, which is configured to, in response to a look up table read instruction, receive look up table data from an intermediate register, reorder the look up table data based on control signals comprising look up table configuration register data, and write the reordered look up table data to a destination register specified by the look up table read instruction.
Type: Grant
Filed: September 29, 2023
Date of Patent: February 10, 2026
Assignee: Texas Instruments Incorporated
Inventors: Naveen Bhoria, Duc Bui, Dheera Balasubramanian Samudrala, Rama Venkatasubramanian
-
Patent number: 12547402
Abstract: An example method for distributing packet flow by an electronic device including a plurality of processor cores may include receiving a plurality of packet flows for validation; validating the plurality of packet flows by a processor core(s) of the plurality of processor cores based on a validation parameter(s); sequentially distributing the plurality of validated packet flows among the plurality of processor cores for processing based on a combination of a try-lock and a ticket-lock or a core parameter; and transmitting the plurality of distributed packet flows to an electronic entity by the processor core of the plurality of processor cores.
Type: Grant
Filed: November 8, 2023
Date of Patent: February 10, 2026
Assignee: Samsung Electronics Co., Ltd.
Inventors: Nayan Ostwal, Eunchul Jang, Srihari Das Sunkada Gopinath, Kusung Lim
-
Patent number: 12530192
Abstract: Deployments of microservices executing in a cloud are automatically managed. Some microservices are deployed on dedicated nodes, others in serverless configurations. Rates of invocation and runtime data of microservices are monitored. Responsive to the monitored rate of invocation of a microservice running serverless exceeding a given threshold, the microservice is automatically redeployed to a dedicated node. A microservice executing on a dedicated node may be redeployed serverless if the infrequency with which it is called is sufficient. Microservices can be automatically redeployed between different dedicated nodes with different capacities based on monitored usage. The underlying cloud service provider may be automatically monitored for changes in serverless support functionality. Responsive to these changes, the thresholds at which microservices are redeployed can be automatically adjusted.
Type: Grant
Filed: January 8, 2024
Date of Patent: January 20, 2026
Assignee: Crowd Strike, Inc.
Inventors: Akshay Dongaonkar, Prashant Pathak, Sourabh Satish
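The threshold rule in this abstract can be written as a small decision function. This is a minimal sketch under stated assumptions: the function name `choose_deployment`, the calls-per-minute metric, and the use of two separate thresholds (`promote_at`, `demote_at`) are illustrative choices, not details confirmed by the patent.

```python
def choose_deployment(current, calls_per_minute, promote_at, demote_at):
    """Return the deployment mode a microservice should run in next.

    current          -- "serverless" or "dedicated"
    calls_per_minute -- monitored invocation rate (assumed metric)
    promote_at       -- rate above which a serverless service moves to a node
    demote_at        -- rate below which a dedicated service moves to serverless
    """
    if current == "serverless" and calls_per_minute > promote_at:
        return "dedicated"
    if current == "dedicated" and calls_per_minute < demote_at:
        return "serverless"
    return current
```

Keeping `demote_at` well below `promote_at` leaves a band in which the service stays put, which avoids redeploying back and forth when the rate hovers near a single threshold. Adjusting either value models the automatic threshold changes the abstract mentions.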