Patents Examined by Corey S Faherty
-
Patent number: 12106115
Abstract: A computer-implemented method, system and computer program product for effectively searching for values in a multi-byte array of elements using an n-byte search instruction. Multiple values to be searched in an N-byte array of elements in a loop are received. The loop is optimized by searching the received search values at the starting address of the N-byte array of elements using the n-byte search instruction. A successful search is performed if the received return address points to an address found in the lowest n-bytes of the N-byte array of elements and an element of the address corresponds to a search value. Otherwise, a subsequent search for the search values at the address of the next element in the N-byte array of elements is performed if there are additional elements in the N-byte array of elements to be searched.
Type: Grant
Filed: January 26, 2023
Date of Patent: October 1, 2024
Assignee: International Business Machines Corporation
Inventor: Motohiro Kawahito
-
Patent number: 12106103
Abstract: An information processing device that executes an arithmetic process includes a first processing circuit and a second processing circuit. The first processing circuit executes the arithmetic process N times consecutively. The second processing circuit executes the arithmetic process N times consecutively. N is an integer of 2 or more. The first processing circuit and the second processing circuit continue to operate according to a match between at least one result among the results of the N arithmetic processes executed by the first processing circuit and at least one result among the results of the N arithmetic processes executed by the second processing circuit. As a result, it is possible to suppress an increase in cost required for hardware and to suppress a temporary stop due to a temporary failure.
Type: Grant
Filed: April 7, 2020
Date of Patent: October 1, 2024
Assignee: OMRON CORPORATION
Inventors: Daisuke Yagi, Toru Murata, Atsushi Kamimura, Yasuo Muneta
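The matching rule in this abstract can be illustrated with a minimal sketch. It assumes results are plain integers and that "a match" means simple equality; the function names `run_redundant` and `may_continue` are hypothetical and do not come from the patent.

```python
from typing import Callable, List


def run_redundant(process: Callable[[int], int], x: int, n: int) -> List[int]:
    """Run the same arithmetic process n times in a row and collect the results."""
    if n < 2:
        raise ValueError("N must be an integer of 2 or more")
    return [process(x) for _ in range(n)]


def may_continue(results_a: List[int], results_b: List[int]) -> bool:
    """True if at least one result from circuit A equals at least one from circuit B."""
    return any(a == b for a in results_a for b in results_b)


# Hypothetical data: circuit A suffers one transient fault, circuit B none.
results_a = [42, 41, 42]  # the second run is corrupted
results_b = run_redundant(lambda v: v * 2, 21, 3)
print(may_continue(results_a, results_b))  # prints True
```

Under this reading, a single corrupted run does not stop the pair, because the remaining runs still agree across the two circuits.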
-
Patent number: 12106098
Abstract: A semiconductor device including a first processor having a first register, the first processor configured to perform region of interest (ROI) calculations using the first register; and a second processor having a second register, the second processor configured to perform arithmetic calculations using the second register. The first register is shared with the second processor, and the second register is shared with the first processor.
Type: Grant
Filed: March 31, 2023
Date of Patent: October 1, 2024
Assignee: Samsung Electronics Co., Ltd.
Inventors: Hyun Pil Kim, Hyun Woo Sim, Seong Woo Ahn
-
Patent number: 12099844
Abstract: An exemplary branch predictor apparatus comprises a Pattern History Table (PHT) configured with a PHT allocation multiplexer/demultiplexer (PAMD) configurable to output a prediction logically selected from a portion of the PHT entries selectively allocated among a plurality of threads. The PHT entries may be allocated among a plurality of threads based on control bits read from a Control and Status Register (CSR) at system initialization. The branch predictor may govern a plurality of threads fetching instructions from an address selected from a Branch Target Buffer (BTB) entry indexed based on a per-thread Program Counter (PC) or a PHT entry indexed based on a per-thread Global History Register (GBHR). The PHT entries may be saturating binary counters. The saturating counters may be two-bit counters. An exemplary implementation may permit reduced misprediction rate, increased throughput, or reduced energy consumption resulting from increased allocation of PHT entries to more branch-intensive threads.
Type: Grant
Filed: May 30, 2022
Date of Patent: September 24, 2024
Assignee: Ceremorphic, Inc.
Inventors: Somya Dashora, Kalash Bhavin Shah, Prakhar Kumar
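As a rough software illustration of a pattern history table built from two-bit saturating counters and partitioned among threads, consider the sketch below. The class name, the `shares` parameter (standing in for the CSR control bits), and the modulo indexing are all assumptions made for the example; the patent describes hardware and does not specify this interface.

```python
class PatternHistoryTable:
    """Two-bit saturating counters, with a contiguous slice of entries per thread."""

    def __init__(self, num_entries, shares):
        # shares[t] is the hypothetical number of entries allocated to thread t.
        if sum(shares) != num_entries or min(shares) < 1:
            raise ValueError("shares must be positive and add up to num_entries")
        self.counters = [1] * num_entries  # every counter starts weakly not-taken
        self.size = list(shares)
        self.base = []
        start = 0
        for share in shares:
            self.base.append(start)
            start += share

    def _index(self, thread, history):
        # Fold the per-thread history into that thread's own slice of the table.
        return self.base[thread] + (history % self.size[thread])

    def predict(self, thread, history):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not taken.
        return self.counters[self._index(thread, history)] >= 2

    def update(self, thread, history, taken):
        i = self._index(thread, history)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


# Thread 0 is assumed to be more branch-intensive, so it gets 12 of 16 entries.
pht = PatternHistoryTable(16, [12, 4])
pht.update(0, 5, taken=True)
print(pht.predict(0, 5))  # prints True
print(pht.predict(1, 5))  # prints False
```

Because each thread indexes only its own slice, training by one thread does not disturb the counters of another, which is the effect the allocation is meant to achieve.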
-
Patent number: 12099462
Abstract: Methods, systems, and apparatus, including medium-encoded computer program products, for implementing dynamic processor architectures include, in one or more aspects of the subject matter described in this specification, an apparatus including: switches coupled with computing elements in a hardware processor to enable selective formation of one or more cores from the computing elements in the hardware processor; and means for dynamically determining how many of the one or more cores to form in the hardware processor, by provision of control signals to the switches, to execute instructions of one or more computer programs based on (i) a current set of the instructions to be executed and (ii) a current set of the computing elements available for processing instructions.
Type: Grant
Filed: December 8, 2023
Date of Patent: September 24, 2024
Assignee: Chariot Technologies Lab, Inc.
Inventor: Timur Ryspekov
-
Patent number: 12099847
Abstract: A data processing apparatus comprises: execution circuitry to execute instructions in order to perform data processing operations specified by those instructions; a plurality of registers to store data values for access by the execution circuitry when performing the data processing operations, each register having an associated physical register identifier; register rename circuitry to select physical register identifiers to associate with architectural register identifiers specified by the instructions; and rename storage having a plurality of entries, each entry being associated with one of the architectural register identifiers and used by the register rename circuitry to indicate a physical register identifier selected for association with that one of the architectural register identifiers; the register rename circuitry comprising an execute unit, and being responsive to detection of an early execute condition for a given instruction, the early execute condition requiring at least detection that each sour
Type: Grant
Filed: January 26, 2023
Date of Patent: September 24, 2024
Assignee: Arm Limited
Inventors: Quentin Éric Nouvel, Luca Nassi, Adrien Pesle
-
Patent number: 12086653
Abstract: A processor is described. The processor includes model specific register space that is visible to software above a BIOS level. The model specific register space is to specify a granularity of a processing entity of a lock-step group. The processor also includes logic circuitry to support dynamic entry/exit of the lock-step group's processing entities to/from lock-step mode including: i) termination of lock-step execution by the processing entities before the program code to be executed in lock-step is fully executed; and, ii) as part of the exit from the lock-step mode, restoration of a state of a shadow processing entity of the processing entities as the state existed before the shadow processing entity entered the lock-step mode and began lock-step execution of the program code.
Type: Grant
Filed: December 24, 2020
Date of Patent: September 10, 2024
Assignee: Intel Corporation
Inventors: Vedvyas Shanbhogue, Jeff A. Huxel, Jeffrey G. Wiedemeier, James D. Allen, Arvind Raman, Krishnakumar Ganapathy
-
Patent number: 12086647
Abstract: A method for dynamically generating and executing tasks includes executing a worker execution stream, where the worker execution stream includes multiple execution threads associated with a workflow of the workflow service, receiving, by the worker execution stream, from a workflow service, a definition of a task, and responsive to determining that the definition of the task satisfies a predefined criterion, dividing the task into a set of sub-tasks. The method further includes generating a definition of a sub-task workflow for the set of sub-tasks, and causing the workflow service to distribute, based on the definition of the sub-task workflow, the sub-tasks of the set to one or more workers for execution.
Type: Grant
Filed: December 16, 2022
Date of Patent: September 10, 2024
Assignee: ABBYY Development Inc.
Inventors: Vladimir Demidov, Vladimir Bukin, Vladimir Yunev, Alexander Subbotin
-
Patent number: 12079631
Abstract: One aspect provides a system for hardware-assisted pre-execution. During operation, the system determines a pre-execution code region comprising one or more instructions. The system increments a global counter upon initiating the one or more instructions. The system issues a first instruction, which involves setting, in a first entry for the first instruction in a data structure, a first prefetch region identifier with a current value of the global counter. Responsive to a head pointer of the data structure reaching the first entry, the system: determines, based on a non-zero value for the first prefetch region identifier, that the first entry is not available to be allocated; and advances the head pointer to a next entry in the data structure, which renders a load associated with the first entry as a non-blocking load. The system resets the global counter upon completing the one or more instructions.
Type: Grant
Filed: June 2, 2023
Date of Patent: September 3, 2024
Assignee: Hewlett Packard Enterprise Development LP
Inventor: Sanyam Mehta
-
Patent number: 12079627
Abstract: A processor-implemented method for executing a hardware intrinsic programming instruction includes performing one or more Boolean operations in combination with one or more permutation operations in response to the hardware intrinsic programming instruction being a single predicated compare-exchange-shuffle programming instruction. The method also includes outputting a sub-sorted list after the performing of the one or more Boolean operations in combination with the one or more permutation operations.
Type: Grant
Filed: March 23, 2023
Date of Patent: September 3, 2024
Assignee: QUALCOMM Incorporated
Inventors: Himanshu Pradeep Aswani, Mithil Ramteke, Venkata Prema Sai Sravan Patchala, Sridhar Kandimalla
-
Patent number: 12073214
Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.
Type: Grant
Filed: September 23, 2022
Date of Patent: August 27, 2024
Assignee: Intel Corporation
Inventors: Jesus Corbal, Robert Valentine, Roman S. Dubtsov, Nikita A. Shustrov, Mark J. Charney, Dennis R. Bradford, Milind B. Girkar, Edward T. Grochowski, Thomas D. Fletcher, Warren E. Ferguson
-
Patent number: 12073231
Abstract: A reconfigurable data processor includes an array of configurable units. The array includes two or more sub-arrays of configurable units, and sub-arrays of configurable units in the plurality of sub-arrays of configurable units are configurable to separately execute different programs. The reconfigurable data processor also includes a force-quit controller connected to the array. The force-quit controller can stop execution of a particular program on a particular sub-array of configurable units and reset the particular sub-array of configurable units, while remaining sub-arrays of configurable units continue execution of their respective programs.
Type: Grant
Filed: October 26, 2022
Date of Patent: August 27, 2024
Assignee: SambaNova Systems, Inc.
Inventor: Manish K. Shah
-
Patent number: 12050915
Abstract: In an embodiment, a processor includes a fetch circuit to fetch instructions, the instructions including a code prefetch instruction; a decode circuit to decode the code prefetch instruction and provide the decoded code prefetch instruction to a memory circuit, the memory circuit to execute the decoded code prefetch instruction to prefetch a first set of code blocks into a first cache and to prefetch a second set of code blocks into a second cache. Other embodiments are described and claimed.
Type: Grant
Filed: December 22, 2020
Date of Patent: July 30, 2024
Assignee: Intel Corporation
Inventors: Wim Heirman, Stijn Eyerman, Ibrahim Hur
-
Patent number: 12045154
Abstract: A technique for collecting state information of an apparatus comprising a processing pipeline for executing a sequence of instructions, and interesting instruction designation circuitry for identifying at least one of the instructions in the sequence as being an interesting instruction. Each interesting instruction is an instruction for which given state information of the apparatus associated with execution of that interesting instruction is to be collected. The interesting instruction designation circuitry is arranged, for each identified interesting instruction, to apply defined selection criteria to determine a further instruction later in the sequence of instructions than the interesting instruction, and to mark that further instruction as having a synchronous exception associated therewith. The processing pipeline is responsive to the further instruction, which causes the processing pipeline to execute a given exception handling routine in order to collect the given state information.
Type: Grant
Filed: May 13, 2021
Date of Patent: July 23, 2024
Assignee: Arm Limited
Inventors: John Michael Horley, Michael John Williams, Mark Salling Rutland, Alasdair Grant
-
Patent number: 12045620
Abstract: A data processing apparatus is provided that comprises rename circuitry for performing a register rename stage of a pipeline in respect of a stream of operations. Move elimination circuitry performs a move elimination operation on the stream of operations in which a move operation is eliminated and the register rename stage performs an adjustment of an identity of registers in the stream of operations to compensate for the move operation being eliminated and demotion circuitry reverses or inhibits the adjustment in response to one or more conditions being met.
Type: Grant
Filed: December 17, 2021
Date of Patent: July 23, 2024
Assignee: Arm Limited
Inventors: Yasuo Ishii, Muhammad Umar Farooq, William Elton Burky, Michael Brian Schinzler, Jason Lee Setter, David Gum Lim
-
Patent number: 12045612
Abstract: An efficient pipelined implementation of digital scaling, offset and aggregation operation supports element-by-element programmable scale and offset factors. The method includes time-multiplexed parallel pipelining of a plurality of digital data words, each of the plurality of digital data words encoding an N-bit signed integer, from one of a plurality of receive-registers through a datapath that can either (1) store the plurality of digital data words directly in a dedicated first memory, (2) store the plurality of digital data words directly in a dedicated second memory, or (3) direct the plurality of digital data words into a parallel set of fused-multiply-add units. The method further includes multiplying each digital data word by a corresponding data-word retrieved from the dedicated first memory to form product data words and adding the product data words to a corresponding data-word retrieved from the dedicated second memory to form output sum-and-product data words.
Type: Grant
Filed: September 12, 2022
Date of Patent: July 23, 2024
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Geoffrey Burr, Shubham Jain, Milos Stanisavljevic, Yasuteru Kohda
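The arithmetic at the core of this abstract, an element-by-element multiply followed by an add, can be sketched in a few lines. The sketch assumes the scale factors play the role of the first memory and the offsets the role of the second memory; the function name `scale_and_offset` is hypothetical, and the pipelining and time multiplexing of the hardware are not modeled.

```python
def scale_and_offset(words, scales, offsets):
    """Element-by-element multiply-add: out[i] = words[i] * scales[i] + offsets[i]."""
    if not (len(words) == len(scales) == len(offsets)):
        raise ValueError("all three inputs must have the same length")
    # scales stands in for the first memory, offsets for the second memory.
    return [w * s + o for w, s, o in zip(words, scales, offsets)]


print(scale_and_offset([3, -2, 7], [2, 4, -1], [10, 0, 5]))  # prints [16, -8, -2]
```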
-
Patent number: 12045621
Abstract: A method for improving an accuracy of a loop branch prediction algorithm by a bypass circuit, comprising: adding a bypass circuit to a loop branch prediction algorithm; and for three pcs entering a pipeline, enabling pc1 fetched in an if0 stage to enter a hybrid branch predictor, and registering pc1; obtaining branch prediction information in an if1 stage, and making a comparison in an if2 stage to obtain a prediction result, registering the prediction result obtained in the if2 stage, and processing pc2 and pc3 in a same way.
Type: Grant
Filed: February 28, 2023
Date of Patent: July 23, 2024
Assignees: Jiangsu Huachuang Microsystem Company Limited, Nanjing Research Institute of Electronics Technology
Inventors: Jiong Lou, Shiping Li, Sibo Yang, Ming Li, Wenjun Han, Zhiyong Lei
-
Patent number: 12039331
Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.
Type: Grant
Filed: October 17, 2022
Date of Patent: July 16, 2024
Assignee: Intel Corporation
Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
-
Patent number: 12032959
Abstract: Latch-based multiply-accumulate (MAC) operations implemented on the die of a non-volatile memory (NVM) array are disclosed. The exemplary latch-based MAC procedures described herein are linear procedures that do not require logic branches. In one example, the MAC operation uses a set of linear MAC stages, wherein each linear stage processes MAC operations corresponding to one bit of a first multi-bit multiplicand being multiplied against a second multi-bit multiplicand. Examples are provided wherein the MAC procedures are performed as part of a neural network feedforward procedure where the first multiplicand is a synaptic weight and the second multiplicand is an activation value. Multiple plane and multiple die NVM array implementations are also described for massive parallel processing.
Type: Grant
Filed: June 22, 2022
Date of Patent: July 9, 2024
Assignee: Western Digital Technologies, Inc.
Inventors: Daniel Joseph Linnen, Ramanathan Muthiah, Kirubakaran Periyannan
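The idea of a branch-free multiply that handles one bit of the first multiplicand per stage can be shown with a small software sketch. It assumes unsigned operands and a fixed bit width, and uses an AND mask in place of an if statement; the names `bit_serial_multiply` and `mac` are hypothetical, and the latch-level implementation described in the patent is not modeled.

```python
def bit_serial_multiply(weight, activation, bits=8):
    """Multiply two unsigned integers, one weight bit per stage, without data-dependent branches."""
    if not 0 <= weight < (1 << bits) or activation < 0:
        raise ValueError("operands must be unsigned and weight must fit in `bits` bits")
    acc = 0
    for stage in range(bits):        # fixed number of stages, independent of the data
        bit = (weight >> stage) & 1  # one bit of the first multiplicand
        mask = -bit                  # 0 when the bit is 0, all ones (-1) when it is 1
        acc += (activation & mask) << stage
    return acc


def mac(weights, activations, bits=8):
    """Accumulate the bit-serial products of paired weights and activations."""
    return sum(bit_serial_multiply(w, a, bits) for w, a in zip(weights, activations))


print(mac([3, 5], [4, 6]))  # prints 42
```

Every stage performs the same AND, shift, and add regardless of the bit value, which is what makes the procedure linear in the sense of the abstract.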
-
Patent number: 12026543
Abstract: The present invention discloses a cooperative computing device, wherein a task dispatching module receives a plurality of original image frames and dynamically dispatches the original image frames as a first amount of original image frames and a second amount of original image frames based on a loading result. A first computing module and a second computing module, which are of different types, respectively receive the first amount and the second amount of original image frames and respectively generate a first amount and a second amount of processed image frames. An image sorting module receives the first amount and the second amount of processed image frames, sorts and recovers the processed image frames based on a first timing sequence, and generates the loading result. The present invention also discloses a cooperative computing method which corresponds to the cooperative computing device.
Type: Grant
Filed: November 3, 2021
Date of Patent: July 2, 2024
Assignee: AVERMEDIA TECHNOLOGIES, INC.
Inventor: Chao-Tung Hu
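The dispatch-then-reorder flow in this abstract can be sketched as follows. The sketch assumes each frame is a `(timestamp, payload)` tuple, that the loading result is a pair of numeric load figures, and that frames are split in inverse proportion to load; the function names and the split rule are hypothetical and are not taken from the patent.

```python
def dispatch(frames, load_a, load_b):
    """Split frames between two workers in inverse proportion to their reported load."""
    total = load_a + load_b
    if total == 0:
        share_a = len(frames) // 2
    else:
        # The busier module A is relative to B, the fewer frames A receives.
        share_a = round(len(frames) * load_b / total)
    return frames[:share_a], frames[share_a:]


def restore_order(processed):
    """Recover the original timing sequence from frames that finished out of order."""
    return sorted(processed, key=lambda frame: frame[0])


frames = [(t, f"frame{t}") for t in range(6)]
first, second = dispatch(frames, load_a=1.0, load_b=2.0)
# Simulate the second module finishing first and in reverse order.
processed = [(t, d.upper()) for t, d in reversed(second)] + [(t, d + "!") for t, d in first]
print([t for t, _ in restore_order(processed)])  # prints [0, 1, 2, 3, 4, 5]
```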