Patents Examined by Corey S Faherty
  • Patent number: 11635958
    Abstract: Embodiments of the present disclosure provide a multi-port register file, including: a plurality of single-bit data registers for receiving and storing input data; a read path coupled to an output of each of the plurality of data registers; a plurality of AND gates, wherein an output of each of the plurality of data registers is coupled to an input of a respective AND gate of the plurality of AND gates; an input gating signal coupled to another input of each of the plurality of AND gates; a plurality of multi-bit registers, wherein an output of each of the plurality of AND gates is coupled to each of the plurality of multi-bit registers; and a write disable circuit coupled to the input gating signal for disabling a write signal applied to each of the plurality of multi-bit registers.
    Type: Grant
    Filed: January 3, 2022
    Date of Patent: April 25, 2023
    Assignee: GLOBALFOUNDRIES U.S. Inc.
    Inventors: Vivek Raj, Gregory A. Northrop, Shashank Nemawarkar, Shivraj Gurpadappa Dharne
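    Illustrative sketch: a minimal C model of the gated write path described above. Each data-register output is ANDed with a shared input gating signal, and the same signal derives the write disable for the downstream registers; the width and names are assumptions, not the patent's.
      #include <stdint.h>
      #include <stdio.h>

      #define NBITS 8   /* hypothetical register-file width */

      /* One update cycle: per-bit AND with the gating signal on the read
         path; when the gate is low, the write disable holds the multi-bit
         register contents unchanged. */
      void cycle(const uint8_t data_regs[NBITS], int gate, uint8_t multi_reg[NBITS])
      {
          int write_enable = gate;            /* write disabled when gate == 0 */
          for (int i = 0; i < NBITS; i++) {
              uint8_t gated = data_regs[i] & (uint8_t)gate;  /* AND gate per bit */
              if (write_enable)
                  multi_reg[i] = gated;       /* capture the gated value */
              /* else: register keeps its previous contents */
          }
      }

      int main(void)
      {
          uint8_t src[NBITS] = {1,0,1,1,0,0,1,0}, dst[NBITS] = {0};
          cycle(src, 1, dst);                 /* gate high: data written */
          cycle(src, 0, dst);                 /* gate low: dst unchanged */
          for (int i = 0; i < NBITS; i++) printf("%d", dst[i]);
          putchar('\n');
          return 0;
      }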
  • Patent number: 11630671
    Abstract: A device includes a circular buffer, which, in operation, is organized into a plurality of subsets of buffers, and control circuitry coupled to the circular buffer. The control circuitry, in operation, receives a memory load command to load a set of data into the circular buffer. The memory load command has an offset parameter indicating a data offset and a subset parameter indicating a subset of the plurality of subsets into which the circular buffer is organized. The control circuitry responds to the command by identifying a set of buffer addresses of the circular buffer based on a value of the offset parameter and a value of the subset parameter, and loading the set of data into the circular buffer using the identified set of buffer addresses.
    Type: Grant
    Filed: January 21, 2022
    Date of Patent: April 18, 2023
    Assignees: STMICROELECTRONICS (BEIJING) R&D CO., LTD., STMICROELECTRONICS S.r.l.
    Inventors: Xiao Kang Jiao, Fabio Giuseppe De Ambroggi
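    Illustrative sketch: one plausible C reading of how the command's offset and subset parameters could resolve to buffer addresses, namely by indexing into the selected subset and wrapping within it. Subset count, subset size, and names are assumptions.
      #include <stdint.h>
      #include <stdio.h>

      #define NUM_SUBSETS 4      /* hypothetical organization */
      #define SUBSET_SIZE 8
      #define BUF_SIZE    (NUM_SUBSETS * SUBSET_SIZE)

      static uint8_t circ_buf[BUF_SIZE];

      /* Resolve (subset, offset) to circular-buffer addresses and load. */
      void load_into_subset(unsigned subset, unsigned offset,
                            const uint8_t *data, unsigned len)
      {
          unsigned base = subset * SUBSET_SIZE;
          for (unsigned i = 0; i < len; i++) {
              unsigned addr = base + (offset + i) % SUBSET_SIZE; /* wrap in subset */
              circ_buf[addr] = data[i];
          }
      }

      int main(void)
      {
          uint8_t payload[5] = {10, 20, 30, 40, 50};
          load_into_subset(2, 6, payload, 5);   /* wraps past end of subset 2 */
          for (int i = 0; i < BUF_SIZE; i++)
              printf("%3d%c", circ_buf[i], (i % SUBSET_SIZE == 7) ? '\n' : ' ');
          return 0;
      }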
  • Patent number: 11620133
    Abstract: Systems and methods for reusing load instructions by a processor without accessing a data cache include a load store execution unit (LSU) of the processor, the LSU being configured to determine if a prior execution of a first load instruction loaded data from a first cache line of the data cache and determine if a current execution of a second load instruction will load the data from the first cache line of the data cache. Further, the LSU also determines if a reuse of the data from the prior execution of the first load instruction for the current execution of the second load instruction will lead to functional errors. If there are no functional errors, the data from the prior execution of the first load instruction is reused for the current execution of the second load instruction, without accessing the data cache for the current execution of the second load instruction.
    Type: Grant
    Filed: March 28, 2019
    Date of Patent: April 4, 2023
    Assignee: Qualcomm Incorporated
    Inventor: Vignyan Reddy Kothinti Naresh
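    Illustrative sketch: a heavily simplified C model of the reuse decision. It remembers the cache line and data of the prior load, invalidates on a hazard such as an intervening store to that line, and reuses only on a safe same-line hit; the line size, the hazard set, and the line-granular data are assumptions.
      #include <stdint.h>
      #include <stdbool.h>
      #include <stdio.h>

      #define LINE_SHIFT 6   /* hypothetical 64-byte cache line */

      typedef struct {
          uint64_t last_line;   /* cache line touched by the prior load */
          uint64_t last_data;   /* data it produced */
          bool     valid;       /* cleared when reuse would be unsafe */
      } reuse_entry;

      /* Invalidate on anything that could make reuse functionally wrong,
         e.g. an intervening store to the same line. */
      void on_store(reuse_entry *e, uint64_t addr)
      {
          if (e->valid && (addr >> LINE_SHIFT) == e->last_line)
              e->valid = false;
      }

      /* Returns true and yields data without a cache access when the new
         load hits the same line and no hazard intervened. */
      bool try_reuse(reuse_entry *e, uint64_t addr, uint64_t *out)
      {
          if (e->valid && (addr >> LINE_SHIFT) == e->last_line) {
              *out = e->last_data;
              return true;      /* data cache not accessed */
          }
          return false;         /* fall back to a normal cache access */
      }

      int main(void)
      {
          reuse_entry e = { 0x10, 0xdeadbeef, true };
          uint64_t v;
          printf("reuse: %d\n", try_reuse(&e, 0x400, &v));  /* 0x400>>6 == 0x10 */
          on_store(&e, 0x410);                              /* same line: hazard */
          printf("reuse: %d\n", try_reuse(&e, 0x400, &v));  /* now 0 */
          return 0;
      }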
  • Patent number: 11614938
    Abstract: Disclosed herein is a method for managing NOP instructions in a microcontroller, the method comprising duplicating all jump instructions causing a NOP instruction to form a new instruction set; inserting an internal NOP instruction into each of the jump instructions; when a jump instruction is executed, executing a subsequent instruction of the new instruction set; and executing the internal NOP instruction when an execution of the subsequent instruction is skipped.
    Type: Grant
    Filed: September 2, 2021
    Date of Patent: March 28, 2023
    Assignee: SK hynix Inc.
    Inventors: Giulio Martinozzi, Federica Arosio, Lorenzo Di Lalla
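    Illustrative sketch: a C transformation in the spirit of the method, duplicating every jump so one copy carries an internal NOP. Which copy the hardware fetches would depend on whether the following instruction is to be skipped; the opcode encoding is invented.
      #include <stdio.h>

      /* Hypothetical opcode encoding; the patent does not define one. */
      typedef enum { OP_ALU, OP_JMP, OP_JMP_NOP } opcode;

      typedef struct { opcode op; int target; } insn;

      /* Rewrite a program so every jump has a duplicate carrying an
         internal NOP (OP_JMP_NOP): the plain jump is used when the
         subsequent instruction should run, the NOP-carrying copy when
         it must be skipped. */
      int duplicate_jumps(const insn *in, int n, insn *out)
      {
          int m = 0;
          for (int i = 0; i < n; i++) {
              out[m++] = in[i];
              if (in[i].op == OP_JMP)
                  out[m++] = (insn){ OP_JMP_NOP, in[i].target }; /* duplicate */
          }
          return m;
      }

      int main(void)
      {
          insn prog[3] = { {OP_ALU,0}, {OP_JMP,7}, {OP_ALU,0} };
          insn out[6];
          int m = duplicate_jumps(prog, 3, out);
          for (int i = 0; i < m; i++)
              printf("%d: op=%d target=%d\n", i, out[i].op, out[i].target);
          return 0;
      }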
  • Patent number: 11609760
    Abstract: The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a storage unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
    Type: Grant
    Filed: September 3, 2018
    Date of Patent: March 21, 2023
    Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD
    Inventors: Yao Zhang, Bingrui Wang
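    Illustrative sketch: the abstract's key point is representing data as fixed point, so a minimal C example of Q-format conversion and multiplication shows why the operations stay in fast integer hardware. The Q8 scaling is an assumption.
      #include <stdint.h>
      #include <stdio.h>

      #define FRAC_BITS 8   /* assumed Q8 fixed-point format */

      static int32_t to_fixed(float x)   { return (int32_t)(x * (1 << FRAC_BITS)); }
      static float   to_float(int32_t q) { return (float)q / (1 << FRAC_BITS); }

      /* Fixed-point multiply: a plain integer multiply plus a shift,
         which is the speed/efficiency point the abstract makes. */
      static int32_t fx_mul(int32_t a, int32_t b)
      {
          return (int32_t)(((int64_t)a * b) >> FRAC_BITS);
      }

      int main(void)
      {
          int32_t a = to_fixed(1.5f), b = to_fixed(-2.25f);
          printf("1.5 * -2.25 = %f\n", to_float(fx_mul(a, b)));  /* -3.375 */
          return 0;
      }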
  • Patent number: 11609761
    Abstract: A method, computer readable medium, and processor are described herein for inline data inspection by using a decoder to decode a load instruction that includes a signal to cause a circuit in a processor to indicate whether data loaded by the load instruction exceeds a threshold value. Moreover, an indication of whether the data loaded by the load instruction exceeds the threshold value may be stored.
    Type: Grant
    Filed: December 9, 2019
    Date of Patent: March 21, 2023
    Assignee: NVIDIA CORPORATION
    Inventors: Jeffrey Michael Pool, Andrew Kerr, John Tran, Ming Y. Siu, Stuart Oberman
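    Illustrative sketch: a software stand-in for the hardware path, a load variant that returns the data and records whether it exceeded a threshold in the same step. The function names and the threshold source are assumptions.
      #include <stdint.h>
      #include <stdbool.h>
      #include <stdio.h>

      typedef struct {
          uint32_t data;
          bool     exceeded;   /* the stored indication */
      } inspected_load;

      /* The ordinary load plus the inline inspection, no separate pass. */
      inspected_load load_and_inspect(const uint32_t *addr, uint32_t threshold)
      {
          inspected_load r;
          r.data = *addr;
          r.exceeded = (r.data > threshold);
          return r;
      }

      int main(void)
      {
          uint32_t mem[2] = { 5, 500 };
          for (int i = 0; i < 2; i++) {
              inspected_load r = load_and_inspect(&mem[i], 100);
              printf("data=%u exceeded=%d\n", r.data, r.exceeded);
          }
          return 0;
      }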
  • Patent number: 11609763
    Abstract: Embodiments relate to improving user experiences when executing binary code that has been translated from other binary code. Binary code (instructions) for a source instruction set architecture (ISA) cannot natively execute on a processor that implements a target ISA. The instructions in the source ISA are binary-translated to instructions in the target ISA and are executed on the processor. The overhead of performing binary translation and/or the overhead of executing binary-translated code are compensated for by increasing the speed at which the translated code is executed, relative to non-translated code. Translated code may be executed on hardware that has one or more power-performance parameters of the processor set to increase the performance of the processor with respect to the translated code. The increase in power-performance for translated code may be proportional to the degree of translation overhead.
    Type: Grant
    Filed: October 25, 2021
    Date of Patent: March 21, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Hee Jun Park, Mehmet Iyigun
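    Illustrative sketch: the policy described above reduced to arithmetic; a power-performance level is scaled in proportion to the measured translation overhead and clamped to a cap. All numbers and names are assumptions.
      #include <stdio.h>

      /* overhead_ratio: translated cycles / native cycles, >= 1.0 */
      double boosted_perf_level(double base_level, double overhead_ratio,
                                double max_level)
      {
          double level = base_level * overhead_ratio;   /* proportional boost */
          return level > max_level ? max_level : level; /* clamp to the cap */
      }

      int main(void)
      {
          printf("%.2f\n", boosted_perf_level(1.0, 1.35, 1.5)); /* 1.35 */
          printf("%.2f\n", boosted_perf_level(1.0, 2.00, 1.5)); /* clamped 1.50 */
          return 0;
      }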
  • Patent number: 11604758
    Abstract: Systems and methods for automated systolic array design from a high-level program are disclosed. One implementation of a systolic array design supporting a convolutional neural network includes a two-dimensional array of reconfigurable processing elements arranged in rows and columns. Each processing element has an associated SIMD vector and is connected through a local connection to at least one other processing element. An input feature map buffer having a double buffer is configured to store input feature maps, and an interconnect system is configured to pass data to neighboring processing elements in accordance with a processing element scheduler. A CNN computation is mapped onto the two-dimensional array of reconfigurable processing elements using an automated system configured to determine suitable reconfigurable processing element parameters.
    Type: Grant
    Filed: November 12, 2020
    Date of Patent: March 14, 2023
    Assignee: Xilinx, Inc.
    Inventors: Peng Zhang, Cody Hao Yu, Xuechao Wei, Peichen Pan
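    Illustrative sketch: a cycle-by-cycle C simulation of an output-stationary systolic matrix multiply, the kind of computation such an array executes. The skewed schedule models operands flowing to right and bottom neighbours; the 3x3 size is an assumption.
      #include <stdio.h>

      #define N 3   /* hypothetical 3x3 PE array computing C = A * B */

      /* At time t, PE (i,j) sees A[i][t-i-j] arriving from the left and
         B[t-i-j][j] arriving from the top, multiplies, and accumulates;
         each product is visited exactly once, at cycle t = i + j + k. */
      int main(void)
      {
          int A[N][N] = {{1,2,3},{4,5,6},{7,8,9}};
          int B[N][N] = {{9,8,7},{6,5,4},{3,2,1}};
          int C[N][N] = {0};

          for (int t = 0; t < 3 * N - 2; t++)
              for (int i = 0; i < N; i++)
                  for (int j = 0; j < N; j++) {
                      int k = t - i - j;
                      if (k >= 0 && k < N)
                          C[i][j] += A[i][k] * B[k][j];  /* MAC in PE (i,j) */
                  }

          for (int i = 0; i < N; i++, puts(""))
              for (int j = 0; j < N; j++)
                  printf("%4d", C[i][j]);
          return 0;
      }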
  • Patent number: 11604752
    Abstract: A data processing system comprising a plurality of processing units. Each processing unit comprises a set of plural functional units and an internal communications network that routes communications between the functional units in a particular sequence order of the functional units. Each processing unit is connected to at least one other processing unit via a communications bridge that has at least two connections, a first connection that routes communications between a first pair of network nodes of the pair of processing units, and a separate, second connection that routes communications between a second, different pair of network nodes of the pair of processing units. Each connected pair of network nodes comprises network nodes having different positions in the internal communications network sequence order of the network nodes and/or network nodes associated with functional units of different types.
    Type: Grant
    Filed: January 29, 2021
    Date of Patent: March 14, 2023
    Assignee: Arm Limited
    Inventors: Akshay Vijayashekar, Jussi Tuomas Pennala, Sebastian Marc Blasius
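    Illustrative sketch: the claimed topology expressed as plain C data structures; a bridge is two connections, each joining network nodes at different positions in the two processing units. All names are invented.
      #include <stdio.h>

      typedef struct { int unit; int node; } endpoint;    /* node in a unit */
      typedef struct { endpoint a, b; } connection;       /* one route */
      typedef struct { connection first, second; } bridge;

      int main(void)
      {
          /* The first connection joins node 0 of unit 0 to node 2 of unit 1;
             the second joins a different pair of nodes, per the claim. */
          bridge br = { { {0, 0}, {1, 2} }, { {0, 3}, {1, 1} } };
          printf("conn1: u%d.n%d <-> u%d.n%d\n",
                 br.first.a.unit, br.first.a.node,
                 br.first.b.unit, br.first.b.node);
          printf("conn2: u%d.n%d <-> u%d.n%d\n",
                 br.second.a.unit, br.second.a.node,
                 br.second.b.unit, br.second.b.node);
          return 0;
      }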
  • Patent number: 11604649
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Grant
    Filed: June 30, 2021
    Date of Patent: March 14, 2023
    Assignee: NVIDIA Corporation
    Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
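    Illustrative sketch: a conceptual C contrast between the conventional global-to-register-to-shared path and the patented direct transfer. Plain C cannot express the real register-file bypass, so memcpy stands in for the single direct-copy instruction.
      #include <string.h>
      #include <stdio.h>

      enum { TILE = 8 };   /* hypothetical block size */

      /* Conventional path: every value bounces through a register. */
      void staged_copy(const float *global, float *shared, int n)
      {
          for (int i = 0; i < n; i++) {
              float reg = global[i];
              shared[i] = reg;
          }
      }

      /* Direct path: the block moves without a register round trip. */
      void direct_copy(const float *global, float *shared, int n)
      {
          memcpy(shared, global, n * sizeof *global);
      }

      int main(void)
      {
          float g[TILE] = {1,2,3,4,5,6,7,8}, s[TILE];
          staged_copy(g, s, TILE);
          direct_copy(g, s, TILE);
          printf("%g %g\n", s[0], s[7]);
          return 0;
      }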
  • Patent number: 11599363
    Abstract: A computer comprising a plurality of processors, each of which is configured to perform operations on data during a compute phase for the computer and, following a pre-compiled synchronisation barrier, exchange data with at least one other of the processors during an exchange phase for the computer, wherein each of the processors in the computer is indexed and the data exchange operations carried out by each processor in the exchange phase depend upon its index value.
    Type: Grant
    Filed: April 6, 2020
    Date of Patent: March 7, 2023
    Assignee: GRAPHCORE LIMITED
    Inventors: Richard Osborne, Matthew Fyles
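    Illustrative sketch: what index-dependent exchange can look like in C; after the barrier, each processor derives its peer purely from its own index, so no runtime negotiation is needed. The rotation schedule and processor count are assumptions.
      #include <stdio.h>

      #define NPROC 4   /* hypothetical processor count */

      /* The exchange schedule is a pure function of the index value. */
      int exchange_partner(int index, int step)
      {
          return (index + step) % NPROC;
      }

      int main(void)
      {
          /* ... compute phase, then the pre-compiled barrier, then: */
          for (int step = 1; step < NPROC; step++) {
              printf("step %d:", step);
              for (int p = 0; p < NPROC; p++)
                  printf("  %d->%d", p, exchange_partner(p, step));
              printf("\n");
          }
          return 0;
      }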
  • Patent number: 11593113
    Abstract: Unaligned atomic memory operations on a processor using a load-store instruction set architecture (ISA) that requires aligned accesses are performed by widening the memory access to an aligned address by the next larger power of two (e.g., 4-byte access is widened to 8 bytes, and 8-byte access is widened to 16 bytes). Data processing operations supported by the load-store ISA including shift, rotate, and bitfield manipulation are utilized to modify only the bytes in the original unaligned address so that the atomic memory operations are aligned to the widened access address. The aligned atomic memory operations using the widened accesses avoid the faulting exceptions associated with unaligned access for most 4-byte and 8-byte accesses. Exception handling is performed in cases in which memory access spans a 16-byte boundary.
    Type: Grant
    Filed: October 4, 2021
    Date of Patent: February 28, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Darek Mihocka, Arun Upadhyaya Kishan, Pedro Miguel Sequeira De Justo Teixeira
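    Illustrative sketch: the widening technique in C11 for a 4-byte atomic add at an unaligned address that stays within one aligned 8-byte word. Shifts and masks confine the update to the original four bytes; little-endian layout is assumed, and the pointer cast is a sketch-level liberty. Accesses spanning the 16-byte boundary would still take the exception-handling path the abstract mentions.
      #include <stdatomic.h>
      #include <stdint.h>
      #include <stdio.h>

      /* Widened 4-byte atomic add at unaligned p, valid as long as the
         four bytes sit inside one aligned 8-byte word. */
      uint32_t unaligned_atomic_add32(uint8_t *p, uint32_t inc)
      {
          uintptr_t addr  = (uintptr_t)p;
          uintptr_t base  = addr & ~(uintptr_t)7;         /* aligned word */
          unsigned  shift = (unsigned)(addr - base) * 8;  /* bit offset */
          _Atomic uint64_t *word = (_Atomic uint64_t *)base;
          uint64_t mask = (uint64_t)UINT32_MAX << shift;

          uint64_t old = atomic_load(word), neu;
          do {   /* modify only the masked bytes, leave neighbours intact */
              uint32_t cur = (uint32_t)((old & mask) >> shift);
              neu = (old & ~mask) | ((uint64_t)(cur + inc) << shift);
          } while (!atomic_compare_exchange_weak(word, &old, neu));
          return (uint32_t)((old & mask) >> shift);       /* previous value */
      }

      int main(void)
      {
          _Alignas(8) uint8_t buf[16] = {0};
          uint32_t prev = unaligned_atomic_add32(buf + 2, 7);
          uint32_t now = (uint32_t)buf[2] | (uint32_t)buf[3] << 8
                       | (uint32_t)buf[4] << 16 | (uint32_t)buf[5] << 24;
          printf("prev=%u now=%u\n", prev, now);
          return 0;
      }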
  • Patent number: 11586483
    Abstract: A processing system comprising an arrangement of tiles and an interconnect between the tiles. The interconnect comprises synchronization logic for coordinating a barrier synchronization to be performed between a group of the tiles. The instruction set comprises a synchronization instruction taking an operand which selects one of a plurality of available modes each specifying a different membership of the group. Execution of the synchronization instruction causes a synchronization request to be transmitted from the respective tile to the synchronization logic, and instruction issue to be suspended on the respective tile pending a synchronization acknowledgement being received back from the synchronization logic. In response to receiving the synchronization request from all the tiles in the group as specified by the operand of the synchronization instruction, the synchronization logic returns the synchronization acknowledgment to the tiles in the specified group.
    Type: Grant
    Filed: May 14, 2021
    Date of Patent: February 21, 2023
    Assignee: GRAPHCORE LIMITED
    Inventors: Daniel John Pelham Wilkinson, Simon Christian Knowles, Matthew David Fyles, Alan Graham Alexander, Stephen Felix
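    Illustrative sketch: the synchronisation logic reduced to a bitmask in C; the sync instruction's mode selects a group mask, and the acknowledgement fires only once every tile in that mask has raised a request. Masks, counts, and names are assumptions.
      #include <stdio.h>
      #include <stdbool.h>

      typedef struct { unsigned requested; } sync_logic;

      /* Tile t executes SYNC with a mode selecting group_mask; issue on
         the tile would stall until this returns true (the ack). */
      bool sync_request(sync_logic *s, int t, unsigned group_mask)
      {
          s->requested |= 1u << t;
          if ((s->requested & group_mask) == group_mask) {
              s->requested &= ~group_mask;   /* reset the group */
              return true;                   /* ack broadcast to the group */
          }
          return false;                      /* still waiting on other tiles */
      }

      int main(void)
      {
          sync_logic s = {0};
          unsigned group = 0x0F;             /* mode selects tiles 0..3 */
          for (int t = 0; t < 4; t++)
              printf("tile %d -> ack=%d\n", t, sync_request(&s, t, group));
          return 0;
      }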
  • Patent number: 11579879
    Abstract: An apparatus 2 has a processing pipeline 4 supporting at least a first processing mode and a second processing mode with different energy consumption or performance characteristics. A storage structure 22, 30, 36, 50, 40, 64, 44 is accessible in both the first and second processing modes. When the second processing mode is selected, control circuitry 70 triggers a subset 102 of the entries of the storage structure to be placed in a power saving state.
    Type: Grant
    Filed: April 7, 2021
    Date of Patent: February 14, 2023
    Assignee: ARM LIMITED
    Inventors: Max John Batley, Simon John Craske, Ian Michael Caulfield, Peter Richard Greenhalgh, Allan John Skillman, Antony John Penton
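    Illustrative sketch: the mode switch as a C routine that puts a subset of a storage structure's entries into a power-saving state on entry to the second mode. Which entries form the subset (here the upper half) is an assumption.
      #include <stdio.h>
      #include <stdbool.h>

      #define ENTRIES 16

      typedef struct { bool powered[ENTRIES]; } storage;

      /* In the low-power mode only half the entries stay powered; the
         structure remains accessible in both modes. */
      void set_mode(storage *s, bool low_power)
      {
          for (int i = 0; i < ENTRIES; i++)
              s->powered[i] = low_power ? (i < ENTRIES / 2) : true;
      }

      int main(void)
      {
          storage s;
          set_mode(&s, true);    /* second (low-power) mode */
          int on = 0;
          for (int i = 0; i < ENTRIES; i++) on += s.powered[i];
          printf("%d of %d entries powered\n", on, ENTRIES);
          return 0;
      }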
  • Patent number: 11580056
    Abstract: A processing system comprises a control bus and a plurality of logic units. The control bus is configurable by configuration data to form signal routes in a control barrier network coupled to processing units in an array of processing units. The plurality of logic units has inputs and outputs connected to the control bus and to the array of processing units. A logic unit in the plurality of logic units is operatively coupled to a processing unit in the array of processing units and is configurable by the configuration data to consume source tokens and a status signal from the processing unit on the inputs and to produce barrier tokens and an enable signal on the outputs based on the source tokens and the status signal on the inputs.
    Type: Grant
    Filed: October 1, 2021
    Date of Patent: February 14, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Raghu Prabhakar, Manish K. Shah, Ram Sivaramakrishnan, Pramod Nataraja, David Brian Jackson, Gregory Frederick Grohoski
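    Illustrative sketch: one logic unit of the control barrier network modelled in C; it consumes source tokens and a status signal and produces a barrier token plus an enable once all have arrived. Token counts are assumptions, and the real routing would come from the configuration data.
      #include <stdio.h>
      #include <stdbool.h>

      typedef struct {
          int tokens_needed;   /* configured expectation */
          int tokens_seen;     /* source tokens consumed so far */
      } barrier_unit;

      /* Returns true (enable + barrier token produced) when every expected
         source token has arrived and the processing unit reports ready. */
      bool step(barrier_unit *u, int new_tokens, bool status_ready)
      {
          u->tokens_seen += new_tokens;             /* consume source tokens */
          if (status_ready && u->tokens_seen >= u->tokens_needed) {
              u->tokens_seen -= u->tokens_needed;   /* tokens are consumed */
              return true;                          /* barrier token + enable */
          }
          return false;
      }

      int main(void)
      {
          barrier_unit u = { .tokens_needed = 2, .tokens_seen = 0 };
          printf("%d\n", step(&u, 1, true));   /* 0: one token short */
          printf("%d\n", step(&u, 1, true));   /* 1: barrier fires */
          return 0;
      }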
  • Patent number: 11573921
    Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
    Type: Grant
    Filed: August 2, 2021
    Date of Patent: February 7, 2023
    Assignee: NVIDIA Corporation
    Inventors: Ahmad Itani, Yen-Te Shih, Jagadeesh Sankaran, Ravi P Singh, Ching-Yu Hung
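    Illustrative sketch: one of the listed features, the transposed load with a stride parameter, written out in C as a strided gather of a matrix column into a contiguous vector. Sizes and names are assumptions.
      #include <stdio.h>

      #define ROWS 4
      #define COLS 4

      /* A strided gather: with stride equal to the row width, one load
         pulls a column into a contiguous (vector-register-like) array. */
      void transposed_load(const int *base, int stride, int n, int *vreg)
      {
          for (int i = 0; i < n; i++)
              vreg[i] = base[i * stride];
      }

      int main(void)
      {
          int mat[ROWS * COLS];
          for (int i = 0; i < ROWS * COLS; i++) mat[i] = i;
          int col[ROWS];
          transposed_load(&mat[1], COLS, ROWS, col);  /* column 1: 1 5 9 13 */
          for (int i = 0; i < ROWS; i++) printf("%d ", col[i]);
          printf("\n");
          return 0;
      }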
  • Patent number: 11573796
    Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array.
    Type: Grant
    Filed: August 11, 2021
    Date of Patent: February 7, 2023
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 11567775
    Abstract: Some embodiments provide a non-transitory machine-readable medium that stores a program. The program observes a parameter associated with a computing system. Upon receiving a change associated with the parameter, the program further determines a routine definition from a set of routine definitions associated with the parameter. Each routine definition in the set of routine definitions specifies a set of instructions associated with a particular parameter associated with the computing system. The program also executes the set of instructions specified in the determined routine definition.
    Type: Grant
    Filed: October 25, 2021
    Date of Patent: January 31, 2023
    Assignee: SAP SE
    Inventors: Debashis Banerjee, Paresh Rathod, Kavitha Krishnan, Prateek Agarwal, Hemanth Basrur
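    Illustrative sketch: the observe-and-dispatch flow in C; a parameter change is matched against a table of routine definitions and the matching routine's instructions are executed. The parameters and routines are invented.
      #include <stdio.h>
      #include <string.h>

      typedef struct {
          const char *parameter;
          void      (*routine)(void);    /* the "set of instructions" */
      } routine_def;

      static void on_cpu_high(void)  { puts("scale out compute"); }
      static void on_disk_full(void) { puts("archive old logs"); }

      static const routine_def defs[] = {
          { "cpu_load",  on_cpu_high  },
          { "disk_used", on_disk_full },
      };

      /* Called when a change in an observed parameter is received. */
      void on_parameter_change(const char *parameter)
      {
          for (size_t i = 0; i < sizeof defs / sizeof defs[0]; i++)
              if (strcmp(defs[i].parameter, parameter) == 0)
                  defs[i].routine();     /* execute the matched definition */
      }

      int main(void)
      {
          on_parameter_change("disk_used");   /* change observed */
          return 0;
      }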
  • Patent number: 11567776
    Abstract: In one embodiment, a microprocessor, comprising: first logic configured to dynamically adjust a maximum prefetch count based on a total count of predicted taken branches over a predetermined quantity of cache lines; and second logic configured to prefetch instructions based on the adjusted maximum prefetch count.
    Type: Grant
    Filed: November 3, 2020
    Date of Patent: January 31, 2023
    Assignee: CENTAUR TECHNOLOGY, INC.
    Inventors: Thomas C. McDonald, Brent Bean
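    Illustrative sketch: one plausible mapping from the branch count to the prefetch limit; the maximum prefetch count shrinks as more predicted-taken branches appear in the window, since taken branches make far-ahead prefetches wasteful. The ceiling and the linear mapping are assumptions.
      #include <stdio.h>

      /* First-logic stand-in: derive the adjusted maximum prefetch count
         from the total predicted-taken branches over the window. */
      int adjust_max_prefetch(int taken_branches_in_window)
      {
          const int ceiling = 8;                         /* assumed bound */
          int count = ceiling - taken_branches_in_window;
          return count < 1 ? 1 : count;                  /* always allow one */
      }

      int main(void)
      {
          for (int b = 0; b <= 10; b += 2)
              printf("taken=%d -> max_prefetch=%d\n", b, adjust_max_prefetch(b));
          return 0;
      }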
  • Patent number: 11556338
    Abstract: A Very Long Instruction Word (VLIW) digital signal processor particularly adapted for single instruction multiple data (SIMD) operation on various operand widths and data sizes. A vector compare instruction compares first and second operands and stores compare bits. A companion vector conditional instruction performs conditional operations based upon the state of a corresponding predicate data register bit. A predicate unit performs data processing operations on data in at least one predicate data register including unary operations and binary operations. The predicate unit may also transfer data between a general data register file and the predicate data register file.
    Type: Grant
    Filed: April 20, 2020
    Date of Patent: January 17, 2023
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy David Anderson, Duc Quang Bui, Mujibur Rahman, Joseph Raymond Michael Zbiciak, Eric Biscondi, Peter Dent, Jelena Milanovic, Ashish Shrivastava
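    Illustrative sketch: the compare/conditional pair in C; a vector compare writes one predicate bit per SIMD lane, and a companion conditional operation selects per lane from those bits. Lane count and operations are assumptions.
      #include <stdio.h>

      #define LANES 8

      /* Vector compare: one predicate bit per lane. */
      unsigned vcmp_gt(const int *a, const int *b)
      {
          unsigned pred = 0;
          for (int i = 0; i < LANES; i++)
              if (a[i] > b[i]) pred |= 1u << i;
          return pred;
      }

      /* Companion conditional operation: per-lane select on the bits. */
      void vselect(unsigned pred, const int *a, const int *b, int *out)
      {
          for (int i = 0; i < LANES; i++)
              out[i] = (pred >> i & 1) ? a[i] : b[i];
      }

      int main(void)
      {
          int a[LANES] = {1,9,2,8,3,7,4,6}, b[LANES] = {5,5,5,5,5,5,5,5};
          int out[LANES];
          unsigned p = vcmp_gt(a, b);       /* predicate register bits */
          vselect(p, a, b, out);            /* elementwise max(a, b) */
          for (int i = 0; i < LANES; i++) printf("%d ", out[i]);
          printf("\n");
          return 0;
      }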