Patents Examined by Shawn Doman

Dot product operations on sparse matrix elements

Patent number: 11842423

Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.
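The abstract does not define the compressed bitstream format. As a rough illustration of why compression helps sparse dot products, the Python sketch below assumes a hypothetical format: a bitmask (bit i set when element i is non-zero) plus a packed list of the non-zero values. Only positions set in both masks are multiplied. The names `compress`, `popcount`, and `sparse_dot` are illustrative and do not come from the patent.

```python
def compress(dense):
    """Pack a dense vector into (bitmask, values): bit i is set iff dense[i] != 0."""
    mask = 0
    values = []
    for i, x in enumerate(dense):
        if x != 0:
            mask |= 1 << i
            values.append(x)
    return mask, values


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def sparse_dot(a, b):
    """Dot product of two compressed vectors; zero positions are never multiplied."""
    mask_a, vals_a = a
    mask_b, vals_b = b
    both = mask_a & mask_b  # positions where both operands are non-zero
    total = 0
    bit = 0
    while both >> bit:
        if (both >> bit) & 1:
            below = (1 << bit) - 1
            # Index into a packed list = number of set mask bits below this position.
            total += vals_a[popcount(mask_a & below)] * vals_b[popcount(mask_b & below)]
        bit += 1
    return total
```

For example, `sparse_dot(compress([1, 0, 2, 0, 3]), compress([4, 5, 0, 0, 6]))` performs only two multiplications (positions 0 and 4) instead of five.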

Type: Grant

Filed: December 15, 2020

Date of Patent: December 12, 2023

Assignee: Intel Corporation

Inventors: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray

Method of converting extended instructions based on an emulation flag and retirement of corresponding microinstructions, device and system using the same

Patent number: 11816487

Abstract: An instruction conversion device, an instruction conversion method, an instruction conversion system, and a processor are provided. The instruction conversion device includes a monitor for determining whether a ready-for-execution instruction belongs to a new instruction set or an extended instruction set, wherein the new instruction set and the extended instruction set are of the same instruction set architecture type as the processor. If the ready-for-execution instruction is determined to be an extended instruction, the conversion system converts it into a converted instruction sequence, which is then sent to the processor for execution, thereby extending the lifespan of electronic devices built with old-version processors.
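A minimal sketch of the convert-then-dispatch flow, assuming a hypothetical extended `SWAP` instruction that is rewritten into three native `MOV` instructions through a scratch register `t0`. The opcode names, the tuple encoding, and the `CONVERTERS` table are illustrative assumptions, not details from the patent.

```python
# Hypothetical native opcodes that the older processor can execute directly.
NATIVE_OPCODES = {"MOV", "ADD", "SUB"}


def convert_swap(a, b):
    """Rewrite a hypothetical extended SWAP a, b into native MOVs via scratch register t0."""
    return [("MOV", "t0", a), ("MOV", a, b), ("MOV", b, "t0")]


# Conversion table: extended opcode -> function producing a native sequence.
CONVERTERS = {"SWAP": convert_swap}


def dispatch(instr):
    """Return the instruction sequence actually sent to the processor."""
    opcode, *operands = instr
    if opcode in NATIVE_OPCODES:
        return [instr]  # already executable, pass through unchanged
    if opcode in CONVERTERS:
        return CONVERTERS[opcode](*operands)  # extended: convert first
    raise ValueError(f"unsupported instruction: {opcode}")
```

Keeping the conversion rules in a table means supporting another extended instruction only requires adding an entry, without touching the dispatch logic.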

Type: Grant

Filed: September 10, 2021

Date of Patent: November 14, 2023

Assignee: Shanghai Zhaoxin Semiconductor Co., Ltd.

Inventors: Weilin Wang, Yingbing Guan, Mengchen Yang

Method and apparatus for dynamically simplifying processor instructions

Patent number: 11816488

Abstract: Methods and devices are provided for dynamically simplifying processor instructions. A method includes receiving, at a computing device, processor instructions and determining, by the computing device, if instruction simplification is enabled for an instruction being processed. The method further includes determining, by the computing device, from an instruction simplification table if the instruction is capable of being simplified and scheduling, by the computing device, a simplified instruction based on the determination from the instruction simplification table. A device includes a processor, and a non-transient computer readable memory having stored thereon instructions which when executed by the processor configure the device to execute the methods disclosed herein.
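A minimal sketch of a table-driven simplification step, assuming instructions are tuples of `(opcode, dst, src, imm)` and that the table is keyed on opcode and immediate value. The specific rules (such as `x * 2` becoming `x + x`) are illustrative assumptions; the abstract does not list which simplifications its table contains.

```python
# Hypothetical simplification table keyed by (opcode, immediate).
# Each entry builds a cheaper, equivalent instruction from (dst, src).
SIMPLIFICATION_TABLE = {
    ("ADD", 0): lambda dst, src: ("MOV", dst, src),       # x + 0 -> x
    ("MUL", 1): lambda dst, src: ("MOV", dst, src),       # x * 1 -> x
    ("MUL", 2): lambda dst, src: ("ADD", dst, src, src),  # x * 2 -> x + x
}


def schedule(instr, simplification_enabled=True):
    """Return the instruction to schedule, simplified when a table entry matches."""
    if not simplification_enabled:
        return instr  # simplification disabled: schedule the original instruction
    opcode, dst, src, imm = instr
    rule = SIMPLIFICATION_TABLE.get((opcode, imm))
    return rule(dst, src) if rule else instr
```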

Type: Grant

Filed: November 10, 2021

Date of Patent: November 14, 2023

Assignee: HUAWEI TECHNOLOGIES CO., LTD.

Inventors: Henry Fangli Kao, Shehab Yomn Abdellatif Elsayed, Tomasz Sebastian Czajkowski, Reza Azimi, Ehsan Amiri

Efficient direct convolution using SIMD instructions

Patent number: 11803377

Abstract: A computer comprising one or more processors offering vector instructions may implement a direct convolution on a source data set. The source data set may be one-dimensional or multi-dimensional. For a given vector width, w, of the vector instructions, w consecutive data elements of the output data set are computed in parallel using vector instructions. For multi-dimensional data sets, multiple vectors of the output data set are computed for a single load of a set of vectors from the source data set. New vector instructions are disclosed to improve the performance of the convolution and to enable full utilization of the arithmetic logic units within the one or more processors.
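The core idea of computing w consecutive outputs at once can be sketched in plain Python for the one-dimensional case, emulating each w-wide vector register with a short list. This is an illustrative sketch that assumes a "valid" (no padding) convolution in the cross-correlation form common in neural networks; it does not model the new vector instructions the patent discloses.

```python
def direct_conv1d_vectorized(src, filt, w=4):
    """'Valid' 1-D direct convolution, computing up to w consecutive outputs per step."""
    n_out = len(src) - len(filt) + 1
    out = []
    for base in range(0, n_out, w):
        lanes = min(w, n_out - base)  # the last block may be only partially filled
        acc = [0] * lanes  # one accumulator per vector lane
        for k, coeff in enumerate(filt):
            vec = src[base + k : base + k + lanes]  # one vector load, shifted by k
            # Broadcast the filter tap and multiply-accumulate across all lanes.
            acc = [a + coeff * x for a, x in zip(acc, vec)]
        out.extend(acc)
    return out
```

Each source vector load is reused across all lanes of one output block, which is the property the abstract exploits to keep the arithmetic units busy.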

Type: Grant

Filed: March 30, 2018

Date of Patent: October 31, 2023

Assignee: Oracle International Corporation

Inventors: Jeffrey R. Diamond, Avadh P. Patel

Neuron cache-based hardware branch prediction

Patent number: 11803386

Abstract: A branch prediction system includes a neuron cache and logic coupled to the neuron cache. The neuron cache includes one or more weights of a neural network model trained for one or more selected code sections, and the logic is to be used with the neuron cache to predict a target address for a branch instruction of the one or more selected code sections.

Type: Grant

Filed: September 16, 2021

Date of Patent: October 31, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Satish Kumar Sadasivam, Shruti Saxena, Puneeth A. H. Bhat

Methods and apparatus for thread-based scheduling in multicore neural networks

Patent number: 11783169

Abstract: Systems, apparatus, and methods for thread-based scheduling within a multicore processor. Neural networking uses a network of connected nodes (aka neurons) to loosely model the neuro-biological functionality found in the human brain. Various embodiments of the present disclosure use thread dependency graph analysis to decouple scheduling across many distributed cores. Rather than using thread dependency graphs to generate a sequential ordering for a centralized scheduler, the individual thread dependencies define a count value for each thread at compile-time. Threads and their thread dependency count are distributed to each core at run-time. Thereafter, each core can dynamically determine which threads to execute based on fulfilled thread dependencies without requiring a centralized scheduler.
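A small single-process simulation of count-based scheduling, assuming the compile-time dependency graph is given as a dict mapping each thread to the threads it waits on, and that each thread is statically assigned to a core. Cores are modeled here as per-core ready queues visited round-robin; in hardware, decrementing a dependent's count would be a message between cores. The function names are hypothetical.

```python
from collections import deque


def dependency_counts(deps):
    """Compile-time step: each thread's count is the number of threads it waits on."""
    return {t: len(parents) for t, parents in deps.items()}


def run(deps, core_of):
    """Run-time step: each core executes its own threads once their count reaches zero."""
    counts = dependency_counts(deps)
    children = {t: [] for t in deps}
    for t, parents in deps.items():
        for p in parents:
            children[p].append(t)
    # Per-core ready queues; no central scheduler chooses a global order.
    ready = {c: deque() for c in set(core_of.values())}
    for t in deps:
        if counts[t] == 0:
            ready[core_of[t]].append(t)
    order = []
    while any(ready.values()):
        for core in sorted(ready):
            if ready[core]:
                t = ready[core].popleft()
                order.append(t)
                # Completing t fulfils one dependency of each of its children.
                for child in children[t]:
                    counts[child] -= 1
                    if counts[child] == 0:
                        ready[core_of[child]].append(child)
    return order
```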

Type: Grant

Filed: January 2, 2023

Date of Patent: October 10, 2023

Assignee: Femtosense, Inc.

Inventors: Sam Brian Fok, Alexander Smith Neckar

Execution or write mask generation for data selection in a multi-threaded, self-scheduling reconfigurable computing fabric

Patent number: 11782710

Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an asynchronous packet network having a plurality of data transmission lines forming a data path transmitting operand data; a synchronous mesh communication network; a plurality of configurable circuits arranged in an array, each configurable circuit of the plurality of configurable circuits coupled to the asynchronous packet network and to the synchronous mesh communication network, each configurable circuit of the plurality of configurable circuits adapted to perform a plurality of computations; each configurable circuit of the plurality of configurable circuits comprising: a memory storing operand data; and an execution or write mask generator adapted to generate an execution mask or a write mask identifying valid bits or bytes transmitted on the data path or stored in the memory for a current or next computation.

Type: Grant

Filed: September 13, 2021

Date of Patent: October 10, 2023

Assignee: Micron Technology, Inc.

Inventor: Tony M. Brewer

Mask field propagation among memory-compute tiles in a reconfigurable architecture

Patent number: 11782725

Abstract: A reconfigurable compute fabric can include multiple nodes, and each node can include multiple tiles with respective processing and storage elements. The tiles can be arranged in an array or grid and can be communicatively coupled. In an example, a first node can include a tile cluster of N memory-compute tiles, and the N memory-compute tiles can be coupled using a first portion of a synchronous compute fabric. Operations performed by the respective processing and storage elements of the N memory-compute tiles can be selectively enabled or disabled based on information in a mask field of data propagated through the first portion of the synchronous compute fabric.

Type: Grant

Filed: August 16, 2021

Date of Patent: October 10, 2023

Assignee: Micron Technology, Inc.

Inventors: Bryan Hornung, Skyler Arron Windh

Methods and apparatus for thread-based scheduling in multicore neural networks

Patent number: 11775810

Abstract: Systems, apparatus, and methods for thread-based scheduling within a multicore processor. Neural networking uses a network of connected nodes (aka neurons) to loosely model the neuro-biological functionality found in the human brain. Various embodiments of the present disclosure use thread dependency graph analysis to decouple scheduling across many distributed cores. Rather than using thread dependency graphs to generate a sequential ordering for a centralized scheduler, the individual thread dependencies define a count value for each thread at compile-time. Threads and their thread dependency count are distributed to each core at run-time. Thereafter, each core can dynamically determine which threads to execute based on fulfilled thread dependencies without requiring a centralized scheduler.

Type: Grant

Filed: January 2, 2023

Date of Patent: October 3, 2023

Assignee: Femtosense, Inc.

Inventors: Sam Brian Fok, Alexander Smith Neckar

Virtual 3-way decoupled prediction and fetch

Patent number: 11762660

Abstract: A unified queue configured to perform decoupled prediction and fetch operations, and related apparatuses, systems, methods, and computer-readable media, is disclosed. The unified queue has a plurality of entries, where each entry is configured to store information associated with at least one instruction, and where the information comprises an identifier portion, a prediction information portion, and a tag information portion. The unified queue is configured to update the prediction information portion of each entry responsive to a prediction block, and to update the tag information portion of each entry responsive to a tag and TLB block. The prediction information may be updated more than once, and the unified queue is configured to take corrective action where a later prediction conflicts with an earlier prediction.

Type: Grant

Filed: June 23, 2020

Date of Patent: September 19, 2023

Assignee: Ampere Computing LLC

Inventors: Brett Alan Ireland, Michael Stephen Chin, Stephan Jean Jourdan

Delivering immediate values by using program counter (PC)-relative load instructions to fetch literal data in processor-based devices

Patent number: 11755327

Abstract: Delivering immediate values by using program counter (PC)-relative load instructions to fetch literal data in processor-based devices is disclosed. In this regard, a processing element (PE) of a processor-based device provides an execution pipeline circuit that comprises an instruction processing portion and a data access portion. Using a literal data access logic circuit, the PE detects a PC-relative load instruction within a fetch window that includes multiple fetched instructions. The PE determines that the PC-relative load instruction can be serviced using literal data that is available to the instruction processing portion of the execution pipeline circuit (e.g., located within the fetch window containing the PC-relative load instruction, or stored in a literal pool buffer). The PE then retrieves the literal data within the instruction processing portion of the execution pipeline circuit, and executes the PC-relative load instruction using the literal data.
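A minimal sketch of the decision the literal data access logic makes, assuming the load address is simply `pc + offset`, that data is little-endian, and that the literal pool buffer is a dict from address to value. Real ISAs define PC-relative addressing with their own offsets and alignment rules, so treat these, and the function name, as illustrative assumptions.

```python
def service_pc_relative_load(pc, offset, size, window_base, window_bytes, literal_pool):
    """Return the loaded value if literal data is available early, else None."""
    addr = pc + offset  # assumed PC-relative address calculation
    window_end = window_base + len(window_bytes)
    if window_base <= addr and addr + size <= window_end:
        start = addr - window_base
        # Literal lies inside the fetch window: read it from the instruction bytes.
        return int.from_bytes(window_bytes[start:start + size], "little")
    if addr in literal_pool:
        return literal_pool[addr]  # hit in the literal pool buffer
    return None  # fall back to a normal load through the data access portion
```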

Type: Grant

Filed: March 2, 2020

Date of Patent: September 12, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Melinda Joyce Brown, Michael Scott Mcilvaine

Writeback hazard elimination using a plurality of temporary result-storage elements

Patent number: 11755331

Abstract: A processor includes a processing pipeline, a plurality of result-storage elements, and writeback logic. The processing pipeline is configured to process program operations and to write, to a result storage, up to a predefined maximal number of results of the processed program operations per clock cycle. The result-storage elements are configured to store respective ones of the results. The writeback logic is configured to (i) detect a writeback conflict event in which the processing pipeline produces simultaneous results that exceed the predefined maximal number of results, for writing to the result storage, in a same clock cycle, (ii) in response to detecting the writeback conflict event, to temporarily store at least a given result, from among the simultaneous results, in a given result-storage element, and (iii) to subsequently write the temporarily-stored given result from the given result-storage element to the result storage.
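A cycle-level sketch of the idea, assuming the pipeline's output is given as a list of results per clock cycle and that buffered results are written before newly produced ones. The ordering policy and the unbounded buffer are simplifying assumptions; the abstract only says that excess results are held in temporary result-storage elements and written later.

```python
def writeback_schedule(results_per_cycle, max_writes):
    """Return the results written to the result storage in each clock cycle."""
    if max_writes < 1:
        raise ValueError("max_writes must be at least 1")
    pending = []  # models the temporary result-storage elements
    schedule = []
    cycle = 0
    while cycle < len(results_per_cycle) or pending:
        produced = results_per_cycle[cycle] if cycle < len(results_per_cycle) else []
        candidates = pending + list(produced)  # older, buffered results go first
        schedule.append(candidates[:max_writes])  # at most max_writes per cycle
        pending = candidates[max_writes:]  # writeback conflict: hold the rest
        cycle += 1
    return schedule
```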

Type: Grant

Filed: July 11, 2021

Date of Patent: September 12, 2023

Assignee: APPLE INC.

Inventors: Skanda K Srinivasa, Christopher S Thomas

Instruction handling for accumulation of register results in a microprocessor

Patent number: 11755325

Abstract: A computer system, processor, and method for processing information are disclosed that include at least one computer processor; a main register file associated with the at least one processor, the main register file having a plurality of entries for storing data, one or more write ports to write data to the main register file entries, and one or more read ports to read data from the main register file entries; one or more execution units including a dense math execution unit; and at least one accumulator register file having a plurality of entries for storing data. In an aspect, the results of the dense math execution unit are written to the accumulator register file, preferably to the same accumulator register file entry multiple times, and the data from the accumulator register file is then written to the main register file.

Type: Grant

Filed: August 27, 2021

Date of Patent: September 12, 2023

Assignee: International Business Machines Corporation

Inventors: Brian W. Thompto, Maarten J. Boersma, Andreas Wagner, Jose E. Moreira, Hung Q. Le, Silvia Melitta Mueller, Dung Q. Nguyen

Instruction block allocation

Patent number: 11755484

Abstract: Apparatus and methods are disclosed for throttling processor operation in block-based processor architectures. In one example of the disclosed technology, a block-based instruction set architecture processor includes a plurality of processing cores configured to fetch and execute a sequence of instruction blocks. Each of the processing cores includes functional resources for performing operations specified by the instruction blocks. The processor further includes a core scheduler configured to allocate functional resources for performing the operations. The functional resources are allocated for executing the instruction blocks based, at least in part, on a performance metric. The performance metric can be generated dynamically or statically based on branch prediction accuracy, energy usage tolerance, and other suitable metrics.

Type: Grant

Filed: June 26, 2015

Date of Patent: September 12, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jan S. Gray, Douglas C. Burger, Aaron L. Smith

System and method for implementing strong load ordering in a processor using a circular ordering ring

Patent number: 11748109

Abstract: A system and corresponding method enforce strong load ordering in a processor. The system comprises an ordering ring that stores entries corresponding to in-flight memory instructions associated with a program order, scanning logic, and recovery logic. The scanning logic scans the ordering ring in response to execution or completion of a given load instruction of the in-flight memory instructions and detects an ordering violation in the event that at least one of the entries indicates that a younger load instruction has completed and is associated with an invalidated cache line. In response to the ordering violation, the recovery logic allows the given load instruction to complete, flushes the younger load instruction, and restarts execution of the processor after the given load instruction in the program order, causing data returned by the given and younger load instructions to be returned consistent with execution according to the program order to satisfy strong load ordering.
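A simplified sketch of the scan-and-recover step, assuming the ring is a fixed-size Python list with a `head` index for the oldest entry and a `count` of valid entries, and that each entry records whether it is a load, whether it has completed, and whether its cache line was invalidated. Restarting is represented by returning the PC of the instruction that follows the given load; the `Entry` fields and `check_ordering` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    pc: int
    is_load: bool
    completed: bool = False
    invalidated: bool = False  # cache line invalidated after the load read it


def check_ordering(ring, head, count, pos):
    """Called when the load at ring index `pos` executes or completes.

    Returns (restart_pc, new_count); restart_pc is None when no violation is found.
    """
    size = len(ring)
    age = (pos - head) % size  # program-order position of the given load
    ring[pos].completed = True  # the given load is always allowed to complete
    for k in range(age + 1, count):  # scan only younger entries, wrapping around
        entry = ring[(head + k) % size]
        if entry.is_load and entry.completed and entry.invalidated:
            # Violation: flush everything younger than the given load and
            # restart at the instruction that follows it in program order.
            restart_pc = ring[(head + age + 1) % size].pc
            return restart_pc, age + 1
    return None, count
```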

Type: Grant

Filed: December 2, 2022

Date of Patent: September 5, 2023

Assignee: Marvell Asia Pte, Ltd.

Inventors: David A. Carlson, Shubhendu S. Mukherjee, Wilson P. Snyder, II

Performing matrix-vector multiply operations for neural networks on electronic devices

Patent number: 11741349

Abstract: When performing a matrix-vector multiply operation for neural network processing, a set of one or more input vectors to be multiplied by a matrix of data values is scanned to identify data positions of the input vector(s) for which the data value is non-zero in at least one of the input vectors. For each of the data positions identified as having a non-zero value in at least one of the input vectors, the set of data values from the matrix of data values for that data position is fetched from memory and the matrix-vector multiply operation is performed using the data values for the input vectors for the data positions identified as being non-zero and the fetched set(s) of data values from the matrix of data values for those data position(s).
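A minimal sketch of the scan-then-fetch approach, assuming the matrix is stored as one row of weights per input data position (so each output is a weighted sum of rows) and that all input vectors have the same length. The function also returns the positions whose rows were fetched, to show that rows for positions that are zero in every input vector are never read. The function name is hypothetical.

```python
def sparse_matvec(matrix, inputs):
    """matrix[p] holds the weights for input position p; returns (outputs, fetched positions)."""
    n_pos = len(matrix)
    n_out = len(matrix[0])
    # Scan: keep positions that are non-zero in at least one input vector.
    active = [p for p in range(n_pos) if any(x[p] != 0 for x in inputs)]
    outputs = [[0] * n_out for _ in inputs]
    for p in active:
        row = matrix[p]  # the only matrix data fetched from memory
        for v, x in enumerate(inputs):
            if x[p] != 0:  # skip vectors whose value is zero at this position
                for j in range(n_out):
                    outputs[v][j] += x[p] * row[j]
    return outputs, active
```

Because one fetched row is shared by every input vector in the set, batching several vectors amortizes the memory traffic for each non-zero position.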

Type: Grant

Filed: October 31, 2019

Date of Patent: August 29, 2023

Assignee: Arm Limited

Inventors: Rune Holm, John Wakefield Brothers, III

Methods and systems for nested stream prefetching for general purpose central processing units

Patent number: 11740906

Abstract: A method and hardware system to remove the overhead caused by having stream handling instructions in nested loops. Where code contains inner loops, nested in outer loops, a compiler pass identifies qualified nested streams and generates ISA specific instructions for transferring stream information linking an inner loop stream with an outer loop stream, to hardware components of a co-designed prefetcher. The hardware components include a frontend able to decode and execute instructions for a stream linking information transfer mechanism, a stream engine unit with a streams configuration table (SCT) having a field for allowing a subordinate stream to stay pending for values from its master stream, and a stream prefetch manager with buffers for storing values of current elements of a master stream, and with a nested streams control unit for reconfiguring and iterating the streams.

Type: Grant

Filed: February 22, 2022

Date of Patent: August 29, 2023

Assignee: HUAWEI TECHNOLOGIES CO., LTD.

Inventors: Maziar Goudarzi, Zhizhao Qian, Reza Azimi, Billy Mengxuan Cai, Man Pok Ho

Systems and methods for defining a dependency of preceding and succeeding instructions

Patent number: 11740908

Abstract: In a particular implementation, a method includes: receiving, at a computing device, first and second instructions of a plurality of instructions obtained from a memory, where the first instruction corresponds to a preceding instruction of a second instruction, and where the second instruction corresponds to a succeeding instruction of the first instruction; determining a dependency of the first and second instructions; sending the first and second instructions to an issue queue of the computing device; executing, at the computing device, the first and second instructions; and completing, at the computing device, the first and second instructions.
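The abstract does not say how the dependency is determined. For illustration only, the sketch below swaps in the textbook register-based check (read-after-write, write-after-read, write-after-write) between a preceding and a succeeding instruction; the `Instr` type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dests: frozenset  # registers written
    srcs: frozenset   # registers read


def dependencies(first, second):
    """Classify dependencies of `second` (succeeding) on `first` (preceding)."""
    kinds = set()
    if first.dests & second.srcs:
        kinds.add("RAW")  # second reads a value that first produces
    if first.srcs & second.dests:
        kinds.add("WAR")  # second overwrites a register that first still reads
    if first.dests & second.dests:
        kinds.add("WAW")  # both write the same register
    return kinds
```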

Type: Grant

Filed: December 21, 2020

Date of Patent: August 29, 2023

Assignee: Arm Limited

Inventors: Wei Wang, Thomas Edward Shull

Systems and methods for determining a dependency of instructions

Patent number: 11740907

Abstract: In a particular implementation, a method includes: receiving, at a central processing unit (CPU), first and second instructions of a plurality of instructions obtained from a memory, where the first instruction corresponds to a preceding instruction of a second instruction, and where the second instruction corresponds to a succeeding instruction of the first instruction; determining a dependency of the first and second instructions; sending the first and second instructions to an issue queue of the CPU; executing, at the CPU, the first and second instructions; and completing, at the CPU, the first and second instructions.

Type: Grant

Filed: March 16, 2020

Date of Patent: August 29, 2023

Assignee: Arm Limited

Inventor: Thomas Edward Shull

Data input/output operations during loop execution in a reconfigurable compute fabric

Patent number: 11709796

Abstract: Various examples are directed to systems and methods in which a first flow controller of a first synchronous flow may receive an instruction to execute a first loop using the first synchronous flow. The first flow controller may determine a first iteration index for a first iteration of the first loop. The first flow controller may send, to a first compute element of the first synchronous flow, a first synchronous message to initiate a first synchronous flow thread for executing the first iteration of the first loop. The first synchronous message may comprise the first iteration index. The first compute element may execute an input/output operation at a first location of a first compute element memory indicated by the first iteration index.

Type: Grant

Filed: August 16, 2021

Date of Patent: July 25, 2023

Assignee: Micron Technology, Inc.

Inventors: Bryan Hornung, Douglas Vanesko