Patents Examined by Corey S Faherty
  • Patent number: 11635958
    Abstract: Embodiments of the present disclosure provide a multi-port register file, including: a plurality of single-bit data registers for receiving and storing input data; a read path coupled to an output of each of the plurality of data registers; a plurality of AND gates, wherein an output of each of the plurality of data registers is coupled to an input of a respective AND gate of the plurality of AND gates; an input gating signal coupled to another input of each of the plurality of AND gates; a plurality of multi-bit registers, wherein an output of each of the plurality of AND gates is coupled to each of the plurality of multi-bit registers; and a write disable circuit coupled to the input gating signal for disabling a write signal applied to each of the plurality of multi-bit registers.
    Type: Grant
    Filed: January 3, 2022
    Date of Patent: April 25, 2023
    Assignee: GLOBALFOUNDRIES U.S. Inc.
    Inventors: Vivek Raj, Gregory A. Northrop, Shashank Nemawarkar, Shivraj Gurpadappa Dharne
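    Illustrative sketch: a minimal C model of the gated write path described above. Each data-register output is ANDed with a shared input gating signal, and the same signal derives the write disable for the downstream registers; the width and names are assumptions, not the patent's.
      #include <stdint.h>
      #include <stdio.h>

      #define NBITS 8   /* hypothetical register-file width */

      /* One update cycle: per-bit AND with the gating signal on the read
         path; when the gate is low, the write disable holds the multi-bit
         register contents unchanged. */
      void cycle(const uint8_t data_regs[NBITS], int gate, uint8_t multi_reg[NBITS])
      {
          int write_enable = gate;            /* write disabled when gate == 0 */
          for (int i = 0; i < NBITS; i++) {
              uint8_t gated = data_regs[i] & (uint8_t)gate;  /* AND gate per bit */
              if (write_enable)
                  multi_reg[i] = gated;       /* capture the gated value */
              /* else: register keeps its previous contents */
          }
      }

      int main(void)
      {
          uint8_t src[NBITS] = {1,0,1,1,0,0,1,0}, dst[NBITS] = {0};
          cycle(src, 1, dst);                 /* gate high: data written */
          cycle(src, 0, dst);                 /* gate low: dst unchanged */
          for (int i = 0; i < NBITS; i++) printf("%d", dst[i]);
          putchar('\n');
          return 0;
      }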
  • Patent number: 11630671
    Abstract: A device includes a circular buffer, which, in operation, is organized into a plurality of subsets of buffers, and control circuitry coupled to the circular buffer. The control circuitry, in operation, receives a memory load command to load a set of data into the circular buffer. The memory load command has an offset parameter indicating a data offset and a subset parameter indicating a subset of the plurality of subsets into which the circular buffer is organized. The control circuitry responds to the command by identifying a set of buffer addresses of the circular buffer based on a value of the offset parameter and a value of the subset parameter, and loading the set of data into the circular buffer using the identified set of buffer addresses.
    Type: Grant
    Filed: January 21, 2022
    Date of Patent: April 18, 2023
    Assignees: STMICROELECTRONICS (BEIJING) R&D CO., LTD., STMICROELECTRONICS S.r.l.
    Inventors: Xiao Kang Jiao, Fabio Giuseppe De Ambroggi
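    Illustrative sketch: one plausible C reading of how the command's offset and subset parameters could resolve to buffer addresses, namely by indexing into the selected subset and wrapping within it. Subset count, subset size, and names are assumptions.
      #include <stdint.h>
      #include <stdio.h>

      #define NUM_SUBSETS 4      /* hypothetical organization */
      #define SUBSET_SIZE 8
      #define BUF_SIZE    (NUM_SUBSETS * SUBSET_SIZE)

      static uint8_t circ_buf[BUF_SIZE];

      /* Resolve (subset, offset) to circular-buffer addresses and load. */
      void load_into_subset(unsigned subset, unsigned offset,
                            const uint8_t *data, unsigned len)
      {
          unsigned base = subset * SUBSET_SIZE;
          for (unsigned i = 0; i < len; i++) {
              unsigned addr = base + (offset + i) % SUBSET_SIZE; /* wrap in subset */
              circ_buf[addr] = data[i];
          }
      }

      int main(void)
      {
          uint8_t payload[5] = {10, 20, 30, 40, 50};
          load_into_subset(2, 6, payload, 5);   /* wraps past end of subset 2 */
          for (int i = 0; i < BUF_SIZE; i++)
              printf("%3d%c", circ_buf[i], (i % SUBSET_SIZE == 7) ? '\n' : ' ');
          return 0;
      }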
  • Patent number: 11620133
    Abstract: Systems and methods for reusing load instructions by a processor without accessing a data cache include a load store execution unit (LSU) of the processor, the LSU being configured to determine if a prior execution of a first load instruction loaded data from a first cache line of the data cache and determine if a current execution of a second load instruction will load the data from the first cache line of the data cache. Further, the LSU also determines if a reuse of the data from the prior execution of the first load instruction for the current execution of the second load instruction will lead to functional errors. If there are no functional errors, the data from the prior execution of the first load instruction is reused for the current execution of the second load instruction, without accessing the data cache for the current execution of the second load instruction.
    Type: Grant
    Filed: March 28, 2019
    Date of Patent: April 4, 2023
    Assignee: Qualcomm Incorporated
    Inventor: Vignyan Reddy Kothinti Naresh
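    Illustrative sketch: a heavily simplified C model of the reuse decision. It remembers the cache line and data of the prior load, invalidates on a hazard such as an intervening store to that line, and reuses only on a safe same-line hit; the line size, the hazard set, and the line-granular data are assumptions.
      #include <stdint.h>
      #include <stdbool.h>
      #include <stdio.h>

      #define LINE_SHIFT 6   /* hypothetical 64-byte cache line */

      typedef struct {
          uint64_t last_line;   /* cache line touched by the prior load */
          uint64_t last_data;   /* data it produced */
          bool     valid;       /* cleared when reuse would be unsafe */
      } reuse_entry;

      /* Invalidate on anything that could make reuse functionally wrong,
         e.g. an intervening store to the same line. */
      void on_store(reuse_entry *e, uint64_t addr)
      {
          if (e->valid && (addr >> LINE_SHIFT) == e->last_line)
              e->valid = false;
      }

      /* Returns true and yields data without a cache access when the new
         load hits the same line and no hazard intervened. */
      bool try_reuse(reuse_entry *e, uint64_t addr, uint64_t *out)
      {
          if (e->valid && (addr >> LINE_SHIFT) == e->last_line) {
              *out = e->last_data;
              return true;      /* data cache not accessed */
          }
          return false;         /* fall back to a normal cache access */
      }

      int main(void)
      {
          reuse_entry e = { 0x10, 0xdeadbeef, true };
          uint64_t v;
          printf("reuse: %d\n", try_reuse(&e, 0x400, &v));  /* 0x400>>6 == 0x10 */
          on_store(&e, 0x410);                              /* same line: hazard */
          printf("reuse: %d\n", try_reuse(&e, 0x400, &v));  /* now 0 */
          return 0;
      }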
  • Patent number: 11614938
    Abstract: Disclosed herein is a method for managing NOP instructions in a microcontroller, the method comprising duplicating all jump instructions causing a NOP instruction to form a new instruction set; inserting an internal NOP instruction into each of the jump instructions; when a jump instruction is executed, executing a subsequent instruction of the new instruction set; and executing the internal NOP instruction when an execution of the subsequent instruction is skipped.
    Type: Grant
    Filed: September 2, 2021
    Date of Patent: March 28, 2023
    Assignee: SK hynix Inc.
    Inventors: Giulio Martinozzi, Federica Arosio, Lorenzo Di Lalla
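    Illustrative sketch: a C transformation in the spirit of the method, duplicating every jump so one copy carries an internal NOP. Which copy the hardware fetches would depend on whether the following instruction is to be skipped; the opcode encoding is invented.
      #include <stdio.h>

      /* Hypothetical opcode encoding; the patent does not define one. */
      typedef enum { OP_ALU, OP_JMP, OP_JMP_NOP } opcode;

      typedef struct { opcode op; int target; } insn;

      /* Rewrite a program so every jump has a duplicate carrying an
         internal NOP (OP_JMP_NOP): the plain jump is used when the
         subsequent instruction should run, the NOP-carrying copy when
         it must be skipped. */
      int duplicate_jumps(const insn *in, int n, insn *out)
      {
          int m = 0;
          for (int i = 0; i < n; i++) {
              out[m++] = in[i];
              if (in[i].op == OP_JMP)
                  out[m++] = (insn){ OP_JMP_NOP, in[i].target }; /* duplicate */
          }
          return m;
      }

      int main(void)
      {
          insn prog[3] = { {OP_ALU,0}, {OP_JMP,7}, {OP_ALU,0} };
          insn out[6];
          int m = duplicate_jumps(prog, 3, out);
          for (int i = 0; i < m; i++)
              printf("%d: op=%d target=%d\n", i, out[i].op, out[i].target);
          return 0;
      }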
  • Patent number: 11609760
    Abstract: The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a storage unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
    Type: Grant
    Filed: September 3, 2018
    Date of Patent: March 21, 2023
    Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD
    Inventors: Yao Zhang, Bingrui Wang
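    Illustrative sketch: the abstract's key point is representing data as fixed point, so a minimal C example of Q-format conversion and multiplication shows why the operations stay in fast integer hardware. The Q8 scaling is an assumption.
      #include <stdint.h>
      #include <stdio.h>

      #define FRAC_BITS 8   /* assumed Q8 fixed-point format */

      static int32_t to_fixed(float x)   { return (int32_t)(x * (1 << FRAC_BITS)); }
      static float   to_float(int32_t q) { return (float)q / (1 << FRAC_BITS); }

      /* Fixed-point multiply: a plain integer multiply plus a shift,
         which is the speed/efficiency point the abstract makes. */
      static int32_t fx_mul(int32_t a, int32_t b)
      {
          return (int32_t)(((int64_t)a * b) >> FRAC_BITS);
      }

      int main(void)
      {
          int32_t a = to_fixed(1.5f), b = to_fixed(-2.25f);
          printf("1.5 * -2.25 = %f\n", to_float(fx_mul(a, b)));  /* -3.375 */
          return 0;
      }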
  • Patent number: 11609761
    Abstract: A method, computer readable medium, and processor are described herein for inline data inspection by using a decoder to decode a load instruction that includes a signal to cause a circuit in a processor to indicate whether data loaded by the load instruction exceeds a threshold value. Moreover, an indication of whether the data loaded by the load instruction exceeds the threshold value may be stored.
    Type: Grant
    Filed: December 9, 2019
    Date of Patent: March 21, 2023
    Assignee: NVIDIA CORPORATION
    Inventors: Jeffrey Michael Pool, Andrew Kerr, John Tran, Ming Y. Siu, Stuart Oberman
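    Illustrative sketch: a software stand-in for the hardware path, a load variant that returns the data and records whether it exceeded a threshold in the same step. The function names and the threshold source are assumptions.
      #include <stdint.h>
      #include <stdbool.h>
      #include <stdio.h>

      typedef struct {
          uint32_t data;
          bool     exceeded;   /* the stored indication */
      } inspected_load;

      /* The ordinary load plus the inline inspection, no separate pass. */
      inspected_load load_and_inspect(const uint32_t *addr, uint32_t threshold)
      {
          inspected_load r;
          r.data = *addr;
          r.exceeded = (r.data > threshold);
          return r;
      }

      int main(void)
      {
          uint32_t mem[2] = { 5, 500 };
          for (int i = 0; i < 2; i++) {
              inspected_load r = load_and_inspect(&mem[i], 100);
              printf("data=%u exceeded=%d\n", r.data, r.exceeded);
          }
          return 0;
      }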
  • Patent number: 11609763
    Abstract: Embodiments relate to improving user experiences when executing binary code that has been translated from other binary code. Binary code (instructions) for a source instruction set architecture (ISA) cannot natively execute on a processor that implements a target ISA. The instructions in the source ISA are binary-translated to instructions in the target ISA and are executed on the processor. The overhead of performing binary translation and/or the overhead of executing binary-translated code are compensated for by increasing the speed at which the translated code is executed, relative to non-translated code. Translated code may be executed on hardware that has one or more power-performance parameters of the processor set to increase the performance of the processor with respect to the translated code. The increase in power-performance for translated code may be proportional to the degree of translation overhead.
    Type: Grant
    Filed: October 25, 2021
    Date of Patent: March 21, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Hee Jun Park, Mehmet Iyigun
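    Illustrative sketch: the policy described above reduced to arithmetic; a power-performance level is scaled in proportion to the measured translation overhead and clamped to a cap. All numbers and names are assumptions.
      #include <stdio.h>

      /* overhead_ratio: translated cycles / native cycles, >= 1.0 */
      double boosted_perf_level(double base_level, double overhead_ratio,
                                double max_level)
      {
          double level = base_level * overhead_ratio;   /* proportional boost */
          return level > max_level ? max_level : level; /* clamp to the cap */
      }

      int main(void)
      {
          printf("%.2f\n", boosted_perf_level(1.0, 1.35, 1.5)); /* 1.35 */
          printf("%.2f\n", boosted_perf_level(1.0, 2.00, 1.5)); /* clamped 1.50 */
          return 0;
      }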
  • Patent number: 11604758
    Abstract: Systems and methods for automated systolic array design from a high-level program are disclosed. One implementation of a systolic array design supporting a convolutional neural network includes a two-dimensional array of reconfigurable processing elements arranged in rows and columns. Each processing element has an associated SIMD vector and is connected through a local connection to at least one other processing element. An input feature map buffer having a double buffer is configured to store input feature maps, and an interconnect system is configured to pass data to neighboring processing elements in accordance with a processing element scheduler. A CNN computation is mapped onto the two-dimensional array of reconfigurable processing elements using an automated system configured to determine suitable reconfigurable processing element parameters.
    Type: Grant
    Filed: November 12, 2020
    Date of Patent: March 14, 2023
    Assignee: Xilinx, Inc.
    Inventors: Peng Zhang, Cody Hao Yu, Xuechao Wei, Peichen Pan
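    Illustrative sketch: a cycle-by-cycle C simulation of an output-stationary systolic matrix multiply, the kind of computation such an array executes. The skewed schedule models operands flowing to right and bottom neighbours; the 3x3 size is an assumption.
      #include <stdio.h>

      #define N 3   /* hypothetical 3x3 PE array computing C = A * B */

      /* At time t, PE (i,j) sees A[i][t-i-j] arriving from the left and
         B[t-i-j][j] arriving from the top, multiplies, and accumulates;
         each product is visited exactly once, at cycle t = i + j + k. */
      int main(void)
      {
          int A[N][N] = {{1,2,3},{4,5,6},{7,8,9}};
          int B[N][N] = {{9,8,7},{6,5,4},{3,2,1}};
          int C[N][N] = {0};

          for (int t = 0; t < 3 * N - 2; t++)
              for (int i = 0; i < N; i++)
                  for (int j = 0; j < N; j++) {
                      int k = t - i - j;
                      if (k >= 0 && k < N)
                          C[i][j] += A[i][k] * B[k][j];  /* MAC in PE (i,j) */
                  }

          for (int i = 0; i < N; i++, puts(""))
              for (int j = 0; j < N; j++)
                  printf("%4d", C[i][j]);
          return 0;
      }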
  • Patent number: 11604752
    Abstract: A data processing system comprising a plurality of processing units. Each processing unit comprises a set of plural functional units and an internal communications network that routes communications between the functional units in a particular sequence order of the functional units. Each processing unit is connected to at least one other processing unit via a communications bridge that has at least two connections, a first connection that routes communications between a first pair of network nodes of the pair of processing units, and a separate, second connection that routes communications between a second, different pair of network nodes of the pair of processing units. Each connected pair of network nodes comprises network nodes having different positions in the internal communications network sequence order of the network nodes and/or network nodes associated with functional units of different types.
    Type: Grant
    Filed: January 29, 2021
    Date of Patent: March 14, 2023
    Assignee: Arm Limited
    Inventors: Akshay Vijayashekar, Jussi Tuomas Pennala, Sebastian Marc Blasius
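    Illustrative sketch: the claimed topology expressed as plain C data structures; a bridge is two connections, each joining network nodes at different positions in the two processing units. All names are invented.
      #include <stdio.h>

      typedef struct { int unit; int node; } endpoint;    /* node in a unit */
      typedef struct { endpoint a, b; } connection;       /* one route */
      typedef struct { connection first, second; } bridge;

      int main(void)
      {
          /* The first connection joins node 0 of unit 0 to node 2 of unit 1;
             the second joins a different pair of nodes, per the claim. */
          bridge br = { { {0, 0}, {1, 2} }, { {0, 3}, {1, 1} } };
          printf("conn1: u%d.n%d <-> u%d.n%d\n",
                 br.first.a.unit, br.first.a.node,
                 br.first.b.unit, br.first.b.node);
          printf("conn2: u%d.n%d <-> u%d.n%d\n",
                 br.second.a.unit, br.second.a.node,
                 br.second.b.unit, br.second.b.node);
          return 0;
      }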
  • Patent number: 11604649
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Grant
    Filed: June 30, 2021
    Date of Patent: March 14, 2023
    Assignee: NVIDIA Corporation
    Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
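    Illustrative sketch: a conceptual C contrast between the conventional global-to-register-to-shared path and the patented direct transfer. Plain C cannot express the real register-file bypass, so memcpy stands in for the single direct-copy instruction.
      #include <string.h>
      #include <stdio.h>

      enum { TILE = 8 };   /* hypothetical block size */

      /* Conventional path: every value bounces through a register. */
      void staged_copy(const float *global, float *shared, int n)
      {
          for (int i = 0; i < n; i++) {
              float reg = global[i];
              shared[i] = reg;
          }
      }

      /* Direct path: the block moves without a register round trip. */
      void direct_copy(const float *global, float *shared, int n)
      {
          memcpy(shared, global, n * sizeof *global);
      }

      int main(void)
      {
          float g[TILE] = {1,2,3,4,5,6,7,8}, s[TILE];
          staged_copy(g, s, TILE);
          direct_copy(g, s, TILE);
          printf("%g %g\n", s[0], s[7]);
          return 0;
      }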
  • Patent number: 11599363
    Abstract: A computer comprising a plurality of processors, each of which is configured to perform operations on data during a compute phase for the computer and, following a pre-compiled synchronisation barrier, exchange data with at least one other of the processors during an exchange phase for the computer, wherein each of the processors in the computer is indexed and the data exchange operations carried out by each processor in the exchange phase depend upon its index value.
    Type: Grant
    Filed: April 6, 2020
    Date of Patent: March 7, 2023
    Assignee: GRAPHCORE LIMITED
    Inventors: Richard Osborne, Matthew Fyles
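    Illustrative sketch: what index-dependent exchange can look like in C; after the barrier, each processor derives its peer purely from its own index, so no runtime negotiation is needed. The rotation schedule and processor count are assumptions.
      #include <stdio.h>

      #define NPROC 4   /* hypothetical processor count */

      /* The exchange schedule is a pure function of the index value. */
      int exchange_partner(int index, int step)
      {
          return (index + step) % NPROC;
      }

      int main(void)
      {
          /* ... compute phase, then the pre-compiled barrier, then: */
          for (int step = 1; step < NPROC; step++) {
              printf("step %d:", step);
              for (int p = 0; p < NPROC; p++)
                  printf("  %d->%d", p, exchange_partner(p, step));
              printf("\n");
          }
          return 0;
      }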
  • Patent number: 11593113
    Abstract: Unaligned atomic memory operations on a processor using a load-store instruction set architecture (ISA) that requires aligned accesses are performed by widening the memory access to an aligned address by the next larger power of two (e.g., 4-byte access is widened to 8 bytes, and 8-byte access is widened to 16 bytes). Data processing operations supported by the load-store ISA including shift, rotate, and bitfield manipulation are utilized to modify only the bytes in the original unaligned address so that the atomic memory operations are aligned to the widened access address. The aligned atomic memory operations using the widened accesses avoid the faulting exceptions associated with unaligned access for most 4-byte and 8-byte accesses. Exception handling is performed in cases in which memory access spans a 16-byte boundary.
    Type: Grant
    Filed: October 4, 2021
    Date of Patent: February 28, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Darek Mihocka, Arun Upadhyaya Kishan, Pedro Miguel Sequeira De Justo Teixeira
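    Illustrative sketch: the widening technique in C11 for a 4-byte atomic add at an unaligned address that stays within one aligned 8-byte word. Shifts and masks confine the update to the original four bytes; little-endian layout is assumed, and the pointer cast is a sketch-level liberty. Accesses spanning the 16-byte boundary would still take the exception-handling path the abstract mentions.
      #include <stdatomic.h>
      #include <stdint.h>
      #include <stdio.h>

      /* Widened 4-byte atomic add at unaligned p, valid as long as the
         four bytes sit inside one aligned 8-byte word. */
      uint32_t unaligned_atomic_add32(uint8_t *p, uint32_t inc)
      {
          uintptr_t addr  = (uintptr_t)p;
          uintptr_t base  = addr & ~(uintptr_t)7;         /* aligned word */
          unsigned  shift = (unsigned)(addr - base) * 8;  /* bit offset */
          _Atomic uint64_t *word = (_Atomic uint64_t *)base;
          uint64_t mask = (uint64_t)UINT32_MAX << shift;

          uint64_t old = atomic_load(word), neu;
          do {   /* modify only the masked bytes, leave neighbours intact */
              uint32_t cur = (uint32_t)((old & mask) >> shift);
              neu = (old & ~mask) | ((uint64_t)(cur + inc) << shift);
          } while (!atomic_compare_exchange_weak(word, &old, neu));
          return (uint32_t)((old & mask) >> shift);       /* previous value */
      }

      int main(void)
      {
          _Alignas(8) uint8_t buf[16] = {0};
          uint32_t prev = unaligned_atomic_add32(buf + 2, 7);
          uint32_t now = (uint32_t)buf[2] | (uint32_t)buf[3] << 8
                       | (uint32_t)buf[4] << 16 | (uint32_t)buf[5] << 24;
          printf("prev=%u now=%u\n", prev, now);
          return 0;
      }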
  • Patent number: 11586483
    Abstract: A processing system comprising an arrangement of tiles and an interconnect between the tiles. The interconnect comprises synchronization logic for coordinating a barrier synchronization to be performed between a group of the tiles. The instruction set comprises a synchronization instruction taking an operand which selects one of a plurality of available modes each specifying a different membership of the group. Execution of the synchronization instruction causes a synchronization request to be transmitted from the respective tile to the synchronization logic, and instruction issue to be suspended on the respective tile pending a synchronization acknowledgement being received back from the synchronization logic. In response to receiving the synchronization request from all the tiles in the group as specified by the operand of the synchronization instruction, the synchronization logic returns the synchronization acknowledgment to the tiles in the specified group.
    Type: Grant
    Filed: May 14, 2021
    Date of Patent: February 21, 2023
    Assignee: GRAPHCORE LIMITED
    Inventors: Daniel John Pelham Wilkinson, Simon Christian Knowles, Matthew David Fyles, Alan Graham Alexander, Stephen Felix
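    Illustrative sketch: the synchronisation logic reduced to a bitmask in C; the sync instruction's mode selects a group mask, and the acknowledgement fires only once every tile in that mask has raised a request. Masks, counts, and names are assumptions.
      #include <stdio.h>
      #include <stdbool.h>

      typedef struct { unsigned requested; } sync_logic;

      /* Tile t executes SYNC with a mode selecting group_mask; issue on
         the tile would stall until this returns true (the ack). */
      bool sync_request(sync_logic *s, int t, unsigned group_mask)
      {
          s->requested |= 1u << t;
          if ((s->requested & group_mask) == group_mask) {
              s->requested &= ~group_mask;   /* reset the group */
              return true;                   /* ack broadcast to the group */
          }
          return false;                      /* still waiting on other tiles */
      }

      int main(void)
      {
          sync_logic s = {0};
          unsigned group = 0x0F;             /* mode selects tiles 0..3 */
          for (int t = 0; t < 4; t++)
              printf("tile %d -> ack=%d\n", t, sync_request(&s, t, group));
          return 0;
      }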
  • Patent number: 11579879
    Abstract: An apparatus 2 has a processing pipeline 4 supporting at least a first processing mode and a second processing mode with different energy consumption or performance characteristics. A storage structure 22, 30, 36, 50, 40, 64, 44 is accessible in both the first and second processing modes. When the second processing mode is selected, control circuitry 70 triggers a subset 102 of the entries of the storage structure to be placed in a power saving state.
    Type: Grant
    Filed: April 7, 2021
    Date of Patent: February 14, 2023
    Assignee: ARM LIMITED
    Inventors: Max John Batley, Simon John Craske, Ian Michael Caulfield, Peter Richard Greenhalgh, Allan John Skillman, Antony John Penton
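    Illustrative sketch: the mode switch as a C routine that puts a subset of a storage structure's entries into a power-saving state on entry to the second mode. Which entries form the subset (here the upper half) is an assumption.
      #include <stdio.h>
      #include <stdbool.h>

      #define ENTRIES 16

      typedef struct { bool powered[ENTRIES]; } storage;

      /* In the low-power mode only half the entries stay powered; the
         structure remains accessible in both modes. */
      void set_mode(storage *s, bool low_power)
      {
          for (int i = 0; i < ENTRIES; i++)
              s->powered[i] = low_power ? (i < ENTRIES / 2) : true;
      }

      int main(void)
      {
          storage s;
          set_mode(&s, true);    /* second (low-power) mode */
          int on = 0;
          for (int i = 0; i < ENTRIES; i++) on += s.powered[i];
          printf("%d of %d entries powered\n", on, ENTRIES);
          return 0;
      }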
  • Patent number: 11580056
    Abstract: A processing system comprises a control bus and a plurality of logic units. The control bus is configurable by configuration data to form signal routes in a control barrier network coupled to processing units in an array of processing units. The plurality of logic units has inputs and outputs connected to the control bus and to the array of processing units. A logic unit in the plurality of logic units is operatively coupled to a processing unit in the array of processing units and is configurable by the configuration data to consume source tokens and a status signal from the processing unit on the inputs and to produce barrier tokens and an enable signal on the outputs based on the source tokens and the status signal on the inputs.
    Type: Grant
    Filed: October 1, 2021
    Date of Patent: February 14, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Raghu Prabhakar, Manish K. Shah, Ram Sivaramakrishnan, Pramod Nataraja, David Brian Jackson, Gregory Frederick Grohoski
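    Illustrative sketch: one logic unit of the control barrier network modelled in C; it consumes source tokens and a status signal and produces a barrier token plus an enable once all have arrived. Token counts are assumptions, and the real routing would come from the configuration data.
      #include <stdio.h>
      #include <stdbool.h>

      typedef struct {
          int tokens_needed;   /* configured expectation */
          int tokens_seen;     /* source tokens consumed so far */
      } barrier_unit;

      /* Returns true (enable + barrier token produced) when every expected
         source token has arrived and the processing unit reports ready. */
      bool step(barrier_unit *u, int new_tokens, bool status_ready)
      {
          u->tokens_seen += new_tokens;             /* consume source tokens */
          if (status_ready && u->tokens_seen >= u->tokens_needed) {
              u->tokens_seen -= u->tokens_needed;   /* tokens are consumed */
              return true;                          /* barrier token + enable */
          }
          return false;
      }

      int main(void)
      {
          barrier_unit u = { .tokens_needed = 2, .tokens_seen = 0 };
          printf("%d\n", step(&u, 1, true));   /* 0: one token short */
          printf("%d\n", step(&u, 1, true));   /* 1: barrier fires */
          return 0;
      }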
  • Patent number: 11573921
    Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
    Type: Grant
    Filed: August 2, 2021
    Date of Patent: February 7, 2023
    Assignee: NVIDIA Corporation
    Inventors: Ahmad Itani, Yen-Te Shih, Jagadeesh Sankaran, Ravi P Singh, Ching-Yu Hung
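    Illustrative sketch: one of the listed features, the transposed load with a stride parameter, written out in C as a strided gather of a matrix column into a contiguous vector. Sizes and names are assumptions.
      #include <stdio.h>

      #define ROWS 4
      #define COLS 4

      /* A strided gather: with stride equal to the row width, one load
         pulls a column into a contiguous (vector-register-like) array. */
      void transposed_load(const int *base, int stride, int n, int *vreg)
      {
          for (int i = 0; i < n; i++)
              vreg[i] = base[i * stride];
      }

      int main(void)
      {
          int mat[ROWS * COLS];
          for (int i = 0; i < ROWS * COLS; i++) mat[i] = i;
          int col[ROWS];
          transposed_load(&mat[1], COLS, ROWS, col);  /* column 1: 1 5 9 13 */
          for (int i = 0; i < ROWS; i++) printf("%d ", col[i]);
          printf("\n");
          return 0;
      }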
  • Patent number: 11573796
    Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array.
    Type: Grant
    Filed: August 11, 2021
    Date of Patent: February 7, 2023
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 11567775
    Abstract: Some embodiments provide a non-transitory machine-readable medium that stores a program. The program observes a parameter associated with a computing system. Upon receiving a change associated with the parameter, the program further determines a routine definition from a set of routine definitions associated with the parameter. Each routine definition in the set of routine definitions specifies a set of instructions associated with a particular parameter associated with the computing system. The program also executes the set of instructions specified in the determined routine definition.
    Type: Grant
    Filed: October 25, 2021
    Date of Patent: January 31, 2023
    Assignee: SAP SE
    Inventors: Debashis Banerjee, Paresh Rathod, Kavitha Krishnan, Prateek Agarwal, Hemanth Basrur
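    Illustrative sketch: the observe-and-dispatch flow in C; a parameter change is matched against a table of routine definitions and the matching routine's instructions are executed. The parameters and routines are invented.
      #include <stdio.h>
      #include <string.h>

      typedef struct {
          const char *parameter;
          void      (*routine)(void);    /* the "set of instructions" */
      } routine_def;

      static void on_cpu_high(void)  { puts("scale out compute"); }
      static void on_disk_full(void) { puts("archive old logs"); }

      static const routine_def defs[] = {
          { "cpu_load",  on_cpu_high  },
          { "disk_used", on_disk_full },
      };

      /* Called when a change in an observed parameter is received. */
      void on_parameter_change(const char *parameter)
      {
          for (size_t i = 0; i < sizeof defs / sizeof defs[0]; i++)
              if (strcmp(defs[i].parameter, parameter) == 0)
                  defs[i].routine();     /* execute the matched definition */
      }

      int main(void)
      {
          on_parameter_change("disk_used");   /* change observed */
          return 0;
      }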
  • Patent number: 11567776
    Abstract: In one embodiment, a microprocessor, comprising: first logic configured to dynamically adjust a maximum prefetch count based on a total count of predicted taken branches over a predetermined quantity of cache lines; and second logic configured to prefetch instructions based on the adjusted maximum prefetch count.
    Type: Grant
    Filed: November 3, 2020
    Date of Patent: January 31, 2023
    Assignee: CENTAUR TECHNOLOGY, INC.
    Inventors: Thomas C. McDonald, Brent Bean
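    Illustrative sketch: one plausible mapping from the branch count to the prefetch limit; the maximum prefetch count shrinks as more predicted-taken branches appear in the window, since taken branches make far-ahead prefetches wasteful. The ceiling and the linear mapping are assumptions.
      #include <stdio.h>

      /* First-logic stand-in: derive the adjusted maximum prefetch count
         from the total predicted-taken branches over the window. */
      int adjust_max_prefetch(int taken_branches_in_window)
      {
          const int ceiling = 8;                         /* assumed bound */
          int count = ceiling - taken_branches_in_window;
          return count < 1 ? 1 : count;                  /* always allow one */
      }

      int main(void)
      {
          for (int b = 0; b <= 10; b += 2)
              printf("taken=%d -> max_prefetch=%d\n", b, adjust_max_prefetch(b));
          return 0;
      }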
  • Patent number: 11556338
    Abstract: A Very Long Instruction Word (VLIW) digital signal processor particularly adapted for single instruction multiple data (SIMD) operation on various operand widths and data sizes. A vector compare instruction compares first and second operands and stores compare bits. A companion vector conditional instruction performs conditional operations based upon the state of a corresponding predicate data register bit. A predicate unit performs data processing operations on data in at least one predicate data register including unary operations and binary operations. The predicate unit may also transfer data between a general data register file and the predicate data register file.
    Type: Grant
    Filed: April 20, 2020
    Date of Patent: January 17, 2023
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy David Anderson, Duc Quang Bui, Mujibur Rahman, Joseph Raymond Michael Zbiciak, Eric Biscondi, Peter Dent, Jelena Milanovic, Ashish Shrivastava
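    Illustrative sketch: the compare/conditional pair in C; a vector compare writes one predicate bit per SIMD lane, and a companion conditional operation selects per lane from those bits. Lane count and operations are assumptions.
      #include <stdio.h>

      #define LANES 8

      /* Vector compare: one predicate bit per lane. */
      unsigned vcmp_gt(const int *a, const int *b)
      {
          unsigned pred = 0;
          for (int i = 0; i < LANES; i++)
              if (a[i] > b[i]) pred |= 1u << i;
          return pred;
      }

      /* Companion conditional operation: per-lane select on the bits. */
      void vselect(unsigned pred, const int *a, const int *b, int *out)
      {
          for (int i = 0; i < LANES; i++)
              out[i] = (pred >> i & 1) ? a[i] : b[i];
      }

      int main(void)
      {
          int a[LANES] = {1,9,2,8,3,7,4,6}, b[LANES] = {5,5,5,5,5,5,5,5};
          int out[LANES];
          unsigned p = vcmp_gt(a, b);       /* predicate register bits */
          vselect(p, a, b, out);            /* elementwise max(a, b) */
          for (int i = 0; i < LANES; i++) printf("%d ", out[i]);
          printf("\n");
          return 0;
      }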