Patents Examined by Shawn Doman
  • Patent number: 11842423
    Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.
    Type: Grant
    Filed: December 15, 2020
    Date of Patent: December 12, 2023
    Assignee: Intel Corporation
    Inventors: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray
  • Patent number: 11816487
    Abstract: An instruction conversion device, an instruction conversion method, an instruction conversion system, and a processor are provided. The instruction conversion device includes a monitor for determining whether a ready-for-execution instruction is an instruction that belongs to a new instruction set or an extended instruction set, wherein the new instruction set and the extended instruction set have the same type of instruction set architecture as that of the processor. If the ready-for-execution instruction is determined to be an extended instruction, this extended instruction is converted into a converted instruction sequence by means of the conversion system, and this converted instruction sequence is then sent to the processor for execution, thereby extending the lifespans of the electronic devices embodied with old-version processors.
    Type: Grant
    Filed: September 10, 2021
    Date of Patent: November 14, 2023
    Assignee: Shanghai Zhaoxin Semiconductor Co., Ltd.
    Inventors: Weilin Wang, Yingbing Guan, Mengchen Yang
  • Patent number: 11816488
    Abstract: There are provided methods and devices for dynamically simplifying processor instructions. A method includes receiving, at a computing device, processor instructions and determining, by the computing device, if instruction simplification is enabled for an instruction being processed. The method further includes determining, by the computing device, from an instruction simplification table if the instruction is capable of being simplified and scheduling, by the computing device, a simplified instruction based on the determination from the instruction simplification table. A device includes a processor, and a non-transient computer readable memory having stored thereon instructions which when executed by the processor configure the device to execute the methods disclosed herein.
    Type: Grant
    Filed: November 10, 2021
    Date of Patent: November 14, 2023
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Henry Fangli Kao, Shehab Yomn Abdellatif Elsayed, Tomasz Sebastian Czajkowski, Reza Azimi, Ehsan Amiri
  • Patent number: 11803377
    Abstract: A computer comprising one or more processors offering vector instructions may implement a direct convolution on a source data set. The source data set may be one-dimensional or multi-dimensional. For a given vector width, w, of the vector instructions, w consecutive data elements of the output data set are computed in parallel using vector instructions. For multi-dimensional data sets, multiple vectors of the output data set are computed for a single load of a set of vectors from the source data set. New vector instructions are disclosed to improve the performance of the convolution and to enable full utilization of the arithmetic logic units within the one or more processors.
    Type: Grant
    Filed: March 30, 2018
    Date of Patent: October 31, 2023
    Assignee: Oracle International Corporation
    Inventors: Jeffrey R. Diamond, Avadh P. Patel
  • Patent number: 11803386
    Abstract: A branch prediction system includes a neuron cache and logic coupled to the neuron cache. The neuron cache includes one or more weights of a neural network model trained for one or more selected code sections, and the logic is to be used with the neuron cache to predict a target address for a branch instruction of the one or more selected code sections.
    Type: Grant
    Filed: September 16, 2021
    Date of Patent: October 31, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Satish Kumar Sadasivam, Shruti Saxena, Puneeth A. H. Bhat
  • Patent number: 11783169
    Abstract: Systems, apparatus, and methods for thread-based scheduling within a multicore processor. Neural networking uses a network of connected nodes (aka neurons) to loosely model the neuro-biological functionality found in the human brain. Various embodiments of the present disclosure use thread dependency graph analysis to decouple scheduling across many distributed cores. Rather than using thread dependency graphs to generate a sequential ordering for a centralized scheduler, the individual thread dependencies define a count value for each thread at compile-time. Threads and their thread dependency count are distributed to each core at run-time. Thereafter, each core can dynamically determine which threads to execute based on fulfilled thread dependencies without requiring a centralized scheduler.
    Type: Grant
    Filed: January 2, 2023
    Date of Patent: October 10, 2023
    Assignee: Femtosense, Inc.
    Inventors: Sam Brian Fok, Alexander Smith Neckar
  • Patent number: 11782710
    Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an asynchronous packet network having a plurality of data transmission lines forming a data path transmitting operand data; a synchronous mesh communication network; a plurality of configurable circuits arranged in an array, each configurable circuit of the plurality of configurable circuits coupled to the asynchronous packet network and to the synchronous mesh communication network, each configurable circuit of the plurality of configurable circuits adapted to perform a plurality of computations; each configurable circuit of the plurality of configurable circuits comprising: a memory storing operand data; and an execution or write mask generator adapted to generate an execution mask or a write mask identifying valid bits or bytes transmitted on the data path or stored in the memory for a current or next computation.
    Type: Grant
    Filed: September 13, 2021
    Date of Patent: October 10, 2023
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 11782725
    Abstract: A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. The tiles can be arranged in an array or grid and can be communicatively coupled. In an example, a first node can include a tile cluster of N memory-compute tiles, and the N memory-compute tiles can be coupled using a first portion of a synchronous compute fabric. Operations performed by the respective processing and storage elements of the N memory-compute tiles can be selectively enabled or disabled based on information in a mask field of data propagated through the first portion of the synchronous compute fabric.
    Type: Grant
    Filed: August 16, 2021
    Date of Patent: October 10, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Bryan Hornung, Skyler Arron Windh
  • Patent number: 11775810
    Abstract: Systems, apparatus, and methods for thread-based scheduling within a multicore processor. Neural networking uses a network of connected nodes (aka neurons) to loosely model the neuro-biological functionality found in the human brain. Various embodiments of the present disclosure use thread dependency graph analysis to decouple scheduling across many distributed cores. Rather than using thread dependency graphs to generate a sequential ordering for a centralized scheduler, the individual thread dependencies define a count value for each thread at compile-time. Threads and their thread dependency count are distributed to each core at run-time. Thereafter, each core can dynamically determine which threads to execute based on fulfilled thread dependencies without requiring a centralized scheduler.
    Type: Grant
    Filed: January 2, 2023
    Date of Patent: October 3, 2023
    Assignee: Femtosense, Inc.
    Inventors: Sam Brian Fok, Alexander Smith Neckar
  • Patent number: 11762660
    Abstract: A unified queue configured to perform decoupled prediction and fetch operations, and related apparatuses, systems, methods, and computer-readable media, is disclosed. The unified queue has a plurality of entries, where each entry is configured to store information associated with at least one instruction, and where the information comprises an identifier portion, a prediction information portion, and a tag information portion. The unified queue is configured to update the prediction information portion of each entry responsive to a prediction block, and to update the tag information portion of each entry responsive to a tag and TLB block. The prediction information may be updated more than once, and the unified queue is configured to take corrective action where a later prediction conflicts with an earlier prediction.
    Type: Grant
    Filed: June 23, 2020
    Date of Patent: September 19, 2023
    Assignee: Ampere Computing LLC
    Inventors: Brett Alan Ireland, Michael Stephen Chin, Stephan Jean Jourdan
  • Patent number: 11755327
    Abstract: Delivering immediate values by using program counter (PC)-relative load instructions to fetch literal data in processor-based devices is disclosed. In this regard, a processing element (PE) of a processor-based device provides an execution pipeline circuit that comprises an instruction processing portion and a data access portion. Using a literal data access logic circuit, the PE detects a PC-relative load instruction within a fetch window that includes multiple fetched instructions. The PE determines that the PC-relative load instruction can be serviced using literal data that is available to the instruction processing portion of the execution pipeline circuit (e.g., located within the fetch window containing the PC-relative load instruction, or stored in a literal pool buffer). The PE then retrieves the literal data within the instruction processing portion of the execution pipeline circuit, and executes the PC-relative load instruction using the literal data.
    Type: Grant
    Filed: March 2, 2020
    Date of Patent: September 12, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Melinda Joyce Brown, Michael Scott Mcilvaine
  • Patent number: 11755331
    Abstract: A processor includes a processing pipeline, a plurality of result-storage elements, and writeback logic. The processing pipeline is configured to process program operations and to write, to a result storage, up to a predefined maximal number of results of the processed program operations per clock cycle. The result-storage elements are configured to store respective ones of the results. The writeback logic is configured to (i) detect a writeback conflict event in which the processing pipeline produces simultaneous results that exceed the predefined maximal number of results, for writing to the result storage, in a same clock cycle, (ii) in response to detecting the writeback conflict event, to temporarily store at least a given result, from among the simultaneous results, in a given result-storage element, and (iii) to subsequently write the temporarily-stored given result from the given result-storage element to the result storage.
    Type: Grant
    Filed: July 11, 2021
    Date of Patent: September 12, 2023
    Assignee: APPLE INC.
    Inventors: Skanda K Srinivasa, Christopher S Thomas
  • Patent number: 11755325
    Abstract: A computer system, processor, and method for processing information is disclosed that includes at least one computer processor; a main register file associated with the at least one processor, the main register file having a plurality of entries for storing data, one or more write ports to write data to the main register file entries, and one or more read ports to read data from the main register file entries; one or more execution units including a dense math execution unit; and at least one accumulator register file having a plurality of entries for storing data. The results of the dense math execution unit in an aspect are written to the accumulator register file, preferably to the same accumulator register file entry multiple times, and the data from the accumulator register file is written to the main register file.
    Type: Grant
    Filed: August 27, 2021
    Date of Patent: September 12, 2023
    Assignee: International Business Machines Corporation
    Inventors: Brian W. Thompto, Maarten J. Boersma, Andreas Wagner, Jose E. Moreira, Hung Q. Le, Silvia Melitta Mueller, Dung Q. Nguyen
  • Patent number: 11755484
    Abstract: Apparatus and methods are disclosed for throttling processor operation in block-based processor architectures. In one example of the disclosed technology, a block-based instruction set architecture processor includes a plurality of processing cores configured to fetch and execute a sequence of instruction blocks. Each of the processing cores includes functional resources for performing operations specified by the instruction blocks. The processor further includes a core scheduler configured to allocate functional resources for performing the operations. The functional resources are allocated for executing the instruction blocks based, at least in part, on a performance metric. The performance metric can be generated dynamically or statically based on branch prediction accuracy, energy usage tolerance, and other suitable metrics.
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: September 12, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jan S. Gray, Douglas C. Burger, Aaron L. Smith
  • Patent number: 11748109
    Abstract: A system and corresponding method enforce strong load ordering in a processor. The system comprises an ordering ring that stores entries corresponding to in-flight memory instructions associated with a program order, scanning logic, and recovery logic. The scanning logic scans the ordering ring in response to execution or completion of a given load instruction of the in-flight memory instructions and detects an ordering violation in the event that at least one entry of the entries indicates that a younger load instruction has completed and is associated with an invalidated cache line. In response to the ordering violation, the recovery logic allows the given load instruction to complete, flushes the younger load instruction, and restarts execution of the processor after the given load instruction in the program order, causing data returned by the given and younger load instructions to be returned consistent with execution according to the program order to satisfy strong load ordering.
    Type: Grant
    Filed: December 2, 2022
    Date of Patent: September 5, 2023
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: David A. Carlson, Shubhendu S. Mukherjee, Wilson P. Snyder, II
  • Patent number: 11741349
    Abstract: When performing a matrix-vector multiply operation for neural network processing, a set of one or more input vectors to be multiplied by a matrix of data values is scanned to identify data positions of the input vector(s) for which the data value is non-zero in at least one of the input vectors. For each of the data positions identified as having a non-zero value in at least one of the input vectors, the set of data values from the matrix of data values for that data position is fetched from memory and the matrix-vector multiply operation is performed using the data values for the input vectors for the data positions identified as being non-zero and the fetched set(s) of data values from the matrix of data values for those data position(s).
    Type: Grant
    Filed: October 31, 2019
    Date of Patent: August 29, 2023
    Assignee: Arm Limited
    Inventors: Rune Holm, John Wakefield Brothers, III
  • Patent number: 11740906
    Abstract: A method and hardware system to remove the overhead caused by having stream handling instructions in nested loops. Where code contains inner loops, nested in outer loops, a compiler pass identifies qualified nested streams and generates ISA specific instructions for transferring stream information linking an inner loop stream with an outer loop stream, to hardware components of a co-designed prefetcher. The hardware components include a frontend able to decode and execute instructions for a stream linking information transfer mechanism, a stream engine unit with a streams configuration table (SCT) having a field for allowing a subordinate stream to stay pending for values from its master stream, and a stream prefetch manager with buffers for storing values of current elements of a master stream, and with a nested streams control unit for reconfiguring and iterating the streams.
    Type: Grant
    Filed: February 22, 2022
    Date of Patent: August 29, 2023
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Maziar Goudarzi, Zhizhao Qian, Reza Azimi, Billy Mengxuan Cai, Man Pok Ho
  • Patent number: 11740908
    Abstract: In a particular implementation, a method includes: receiving, at a computing device, first and second instructions of a plurality of instructions obtained from a memory, where the first instruction corresponds to a preceding instruction of a second instruction, and where the second instruction corresponds to a succeeding instruction of the first instruction; determining a dependency of the first and second instructions; sending the first and second instructions to an issue queue of the computing device; executing, at the computing device, the first and second instructions; and completing, at the computing device, the first and second instructions.
    Type: Grant
    Filed: December 21, 2020
    Date of Patent: August 29, 2023
    Assignee: Arm Limited
    Inventors: Wei Wang, Thomas Edward Shull
  • Patent number: 11740907
    Abstract: In a particular implementation, a method includes: receiving, at a central processing unit (CPU), first and second instructions of a plurality of instructions obtained from a memory, where the first instruction corresponds to a preceding instruction of a second instruction, and where the second instruction corresponds to a succeeding instruction of the first instruction; determining a dependency of the first and second instructions; sending the first and second instructions to an issue queue of the CPU; executing, at the CPU, the first and second instructions; and completing, at the CPU, the first and second instructions.
    Type: Grant
    Filed: March 16, 2020
    Date of Patent: August 29, 2023
    Assignee: Arm Limited
    Inventor: Thomas Edward Shull
  • Patent number: 11709796
    Abstract: Various examples are directed to systems and methods in which a first flow controller of a first synchronous flow may receive an instruction to execute a first loop using the first synchronous flow. The first flow controller may determine a first iteration index for a first iteration of the first loop. The first flow controller may send, to a first compute element of the first synchronous flow, a first synchronous message to initiate a first synchronous flow thread for executing the first iteration of the first loop. The first synchronous message may comprise the iteration index. The first compute element may execute an input/output operation at a first location of a first compute element memory indicated by the first iteration index.
    Type: Grant
    Filed: August 16, 2021
    Date of Patent: July 25, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Bryan Hornung, Douglas Vanesko
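
Illustrative note: the abstracts of patents 11783169 and 11775810 listed above describe scheduling in which each thread receives a dependency count at compile time and each core then decides locally which threads to run. The Python sketch below is only a rough software illustration of that idea, not the patented implementation. The function names (dependency_counts, run_on_cores), the example graph, and the thread-to-core assignment are all hypothetical, and the single shared counts dictionary is a simplification standing in for whatever mechanism the cores would use to learn that a dependency has been fulfilled.

```python
def dependency_counts(deps):
    """'Compile-time' step: count how many threads each thread waits on."""
    return {thread: len(parents) for thread, parents in deps.items()}


def run_on_cores(deps, assignment, num_cores):
    """Simulate cores that each pick ready threads from their own set.

    deps:       {thread: [threads it depends on]}   (hypothetical input format)
    assignment: {thread: core index}                (hypothetical input format)
    Returns the execution order as a list of (core, thread) pairs.
    """
    counts = dependency_counts(deps)

    # Reverse map: which threads are waiting on a given thread.
    dependents = {thread: [] for thread in deps}
    for thread, parents in deps.items():
        for parent in parents:
            dependents[parent].append(thread)

    # Each core only looks at the threads distributed to it.
    pending = [set() for _ in range(num_cores)]
    for thread, core in assignment.items():
        pending[core].add(thread)

    order = []
    while len(order) < len(deps):
        progressed = False
        for core in range(num_cores):
            # A thread is ready once its remaining dependency count is zero.
            ready = sorted(t for t in pending[core] if counts[t] == 0)
            if not ready:
                continue
            thread = ready[0]
            pending[core].remove(thread)
            order.append((core, thread))
            # Completing a thread fulfils one dependency of each dependent.
            for waiting in dependents[thread]:
                counts[waiting] -= 1
            progressed = True
        if not progressed:
            raise ValueError("dependency cycle: no thread can make progress")
    return order


# Hypothetical diamond-shaped graph: b and c wait on a, d waits on b and c.
deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
assignment = {"a": 0, "b": 1, "c": 0, "d": 1}
order = run_on_cores(deps, assignment, num_cores=2)
print(order)
```

No step in the sketch computes a global sequential ordering. Each core only checks whether the counts of its own threads have reached zero, which is the property the abstracts attribute to the count-based approach.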