Patents Examined by Shawn Doman
  • Patent number: 11556339
    Abstract: Systems and methods related to implementing vector registers in memory. A memory system for implementing vector registers in memory can include an array of memory cells, where a plurality of rows in the array serve as a plurality of vector registers as defined by an instruction set architecture. The memory system for implementing vector registers in memory can also include a processing resource configured to, responsive to receiving a command to perform a particular vector operation on a particular vector register, access a particular row of the array serving as the particular register to perform the vector operation.
    Type: Grant
    Filed: November 9, 2021
    Date of Patent: January 17, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Timothy P. Finkbeiner, Troy D. Larsen
  • Patent number: 11556494
    Abstract: A device architecture includes a spatially reconfigurable array of processors, such as configurable units of a CGRA, having spare homogenous subarrays, and a parameter store on the device which stores parameters that tag one or more elements as unusable. Configuration data is distributed using a statically reconfigurable bus system, to implement the pattern of placement of configuration data, in dependence on the tagged elements. As a result, a spatially reconfigurable array having unusable elements can be repaired.
    Type: Grant
    Filed: July 16, 2021
    Date of Patent: January 17, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Gregory F. Grohoski, Manish K. Shah, Kin Hing Leung
  • Patent number: 11550590
    Abstract: A system and corresponding method enforce strong load ordering in a processor. The system comprises an ordering ring that stores entries corresponding to in-flight memory instructions associated with a program order, scanning logic, and recovery logic. The scanning logic scans the ordering ring in response to execution or completion of a given load instruction of the in-flight memory instructions and detects an ordering violation in the event that at least one entry of the entries indicates that a younger load instruction has completed and is associated with an invalidated cache line. In response to the ordering violation, the recovery logic allows the given load instruction to complete, flushes the younger load instruction, and restarts execution of the processor after the given load instruction in the program order, causing data returned by the given and younger load instructions to be returned consistent with execution according to the program order to satisfy strong load ordering.
    Type: Grant
    Filed: January 28, 2022
    Date of Patent: January 10, 2023
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: David A. Carlson, Shubhendu S. Mukherjee, Wilson P. Snyder, II
  • Patent number: 11544214
    Abstract: A computer processor comprising a vector unit is disclosed. The vector unit may comprise a vector register file comprising at least one register to hold a varying number of elements. The vector unit may further comprise a vector length register file comprising at least one register to specify the number of operations of a vector instruction to be performed on the varying number of elements in the at least one register of the vector register file. The computer processor may be implemented as a monolithic integrated circuit.
    Type: Grant
    Filed: May 12, 2015
    Date of Patent: January 3, 2023
    Assignee: Optimum Semiconductor Technologies, Inc.
    Inventors: Mayan Moudgill, Gary J. Nacer, C. John Glossner, Arthur Joseph Hoane, Paul Hurtley, Murugappan Senthilvelan, Pablo Balzola, Vitaly Kalashnikov, Sitij Agrawal
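    Sketch (illustrative only, not taken from the patent): the abstract above describes a vector length register that specifies how many element operations a vector instruction performs. A minimal software model of that idea, assuming a hypothetical `vector_add` instruction and plain Python lists standing in for vector registers, might look like this:

```python
def vector_add(dst, src_a, src_b, vector_length):
    """Model a vector add whose element count comes from a vector length register.

    `vector_length` plays the role of the vector length register value
    (hypothetical name). Elements past that count are left untouched.
    """
    capacity = min(len(dst), len(src_a), len(src_b))
    if not 0 <= vector_length <= capacity:
        raise ValueError("vector length exceeds register capacity")
    for i in range(vector_length):
        dst[i] = src_a[i] + src_b[i]
    return dst


# Only the first three elements are processed; the fourth keeps its old value.
result = vector_add([0, 0, 0, 0], [1, 2, 3, 4], [10, 20, 30, 40], 3)
```

    Passing the length as a separate operand, rather than fixing it in the instruction, is what lets one register hold a varying number of elements in this model.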
  • Patent number: 11526357
    Abstract: Systems and methods for controlling machine operations are provided. A number of data entries are organized into a stack. Each data entry includes a type, a flag, a length, and a value or pointer entry. For each data entry in the stack, the type of data is determined from the type entry, the presence of an address or value is determined by the respective flag entry, and a length of the address or value is determined from the respective length entry. The data to be utilized or an address for the same at a particular electronic storage area is provided at the respective value or pointer entry, which may be specified by a space definition pushed onto the stack.
    Type: Grant
    Filed: January 25, 2021
    Date of Patent: December 13, 2022
    Assignee: Rankin Labs, LLC
    Inventor: John Rankin
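    Sketch (illustrative only, not taken from the patent): the abstract above describes stack entries that each carry a type, a flag, a length, and a value or pointer. One way to picture such an entry, assuming hypothetical field names and a `bytearray` standing in for the electronic storage area, is:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class DataEntry:
    type: str                             # kind of data held by the entry
    is_pointer: bool                      # flag: True = address, False = inline value
    length: int                           # length of the value or of the addressed data
    value_or_pointer: Union[bytes, int]   # inline bytes, or an offset into storage


def resolve(entry: DataEntry, storage: bytearray) -> bytes:
    """Return the data an entry refers to, following the pointer if the flag is set."""
    if entry.is_pointer:
        start = entry.value_or_pointer
        return bytes(storage[start:start + entry.length])
    return entry.value_or_pointer[:entry.length]


storage = bytearray(b"hello world")
stack = [
    DataEntry("text", False, 3, b"abc"),  # value stored directly in the entry
    DataEntry("text", True, 5, 6),        # pointer to offset 6 in storage
]
resolved = [resolve(entry, storage) for entry in stack]
```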
  • Patent number: 11520591
    Abstract: Processing data in an information handling system is disclosed that includes: in response to an event that triggers a flushing operation, calculate a finish ratio, wherein the finish ratio is a number of finished operations to a number of at least one of the group consisting of in-flight instructions, instructions pending in a processor pipeline, instructions issued to an issue queue, and instructions being processed in a processor execution unit; compare the calculated finish ratio to a threshold; and if the finish ratio is greater than the threshold, then do not perform the flushing operation. Also disclosed is moving the flush point.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: December 6, 2022
    Assignee: International Business Machines Corporation
    Inventors: Ehsan Fatehi, Richard J. Eickemeyer, John B. Griswell, Jr.
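    Sketch (illustrative only, not taken from the patent): the flush decision in the abstract above reduces to comparing a ratio against a threshold. A minimal model, assuming in-flight instructions as the denominator and a hypothetical function name, could be:

```python
def should_flush(finished: int, in_flight: int, threshold: float) -> bool:
    """Decide whether to perform a triggered flush.

    The finish ratio is finished operations divided by in-flight instructions.
    If the ratio is greater than the threshold, the flush is skipped.
    """
    if in_flight <= 0:
        # Assumption of this sketch: with nothing in flight, flushing is harmless.
        return True
    finish_ratio = finished / in_flight
    return not finish_ratio > threshold


skip_case = should_flush(finished=9, in_flight=10, threshold=0.75)   # ratio 0.9
flush_case = should_flush(finished=3, in_flight=10, threshold=0.75)  # ratio 0.3
```

    The abstract lists several possible denominators (in-flight instructions, instructions pending in the pipeline, instructions in the issue queue, instructions in an execution unit); the sketch picks one for simplicity.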
  • Patent number: 11513805
    Abstract: A computer architecture employs multiple special-purpose processors having different affinities for program execution to execute substantial portions of general-purpose programs to provide improved performance with respect to a general-purpose processor executing the general-purpose program alone.
    Type: Grant
    Filed: August 19, 2016
    Date of Patent: November 29, 2022
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki
  • Patent number: 11507374
    Abstract: Disclosed herein are vector index registers for storing or loading indexes of true and/or false results of comparison operations in vector processors. Each of the vector index registers stores multiple addresses for accessing multiple positions in operand vectors.
    Type: Grant
    Filed: May 20, 2019
    Date of Patent: November 22, 2022
    Assignee: Micron Technology, Inc.
    Inventor: Steven Jeffrey Wallach
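    Sketch (illustrative only, not taken from the patent): the abstract above describes registers that hold the indexes of true and false comparison results, which are then used to access positions in operand vectors. A minimal model, assuming a less-than comparison and hypothetical function names, might be:

```python
def compare_to_index_registers(a, b):
    """Compare two operand vectors element by element.

    Returns two index lists standing in for vector index registers:
    positions where a[i] < b[i] is true, and positions where it is false.
    """
    vir_true, vir_false = [], []
    for i, (x, y) in enumerate(zip(a, b)):
        (vir_true if x < y else vir_false).append(i)
    return vir_true, vir_false


def gather(vector, index_register):
    """Read the operand vector only at the positions held in an index register."""
    return [vector[i] for i in index_register]


a = [1, 5, 2, 8]
b = [3, 4, 6, 7]
vir_true, vir_false = compare_to_index_registers(a, b)
selected = gather(a, vir_true)
```

    Later operations can then run only over the stored positions instead of carrying a full-length mask.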
  • Patent number: 11500639
    Abstract: An arithmetic processing apparatus includes a memory, a first processor coupled to the memory, and a second processor coupled to the memory. The first processor is configured to consecutively issue a plurality of load instructions for reading respective data with respect to the memory. The first processor is configured to determine whether an ordering property is guaranteed, based on values included in the data loaded from the memory. The second processor is configured to issue a store instruction during an execution of the plurality of load instructions with respect to the memory.
    Type: Grant
    Filed: May 22, 2019
    Date of Patent: November 15, 2022
    Assignee: FUJITSU LIMITED
    Inventor: Hideyuki Takano
  • Patent number: 11500680
    Abstract: The present disclosure relates to an accelerator for systolic array-friendly data placement. The accelerator may include: a systolic array comprising a plurality of operation units, wherein the systolic array is configured to receive staged input data and perform operations using the staged input to generate staged output data, the staged output data comprising a number of segments; a controller configured to execute one or more instructions to generate a pattern generation signal; a data mask generator; and a memory configured to store the staged output data using the generated masks. The data mask generator may include circuitry configured to: receive the pattern generation signal from the controller, and, based on the received signal, generate a mask corresponding to each segment of the staged output data.
    Type: Grant
    Filed: April 24, 2020
    Date of Patent: November 15, 2022
    Assignee: Alibaba Group Holding Limited
    Inventors: Yuhao Wang, Xiaoxin Fan, Dimin Niu, Chunsheng Liu, Wei Han
  • Patent number: 11461107
    Abstract: One embodiment provides for a general-purpose graphics processing unit comprising a streaming multiprocessor having a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The streaming multiprocessor comprises multiple processing blocks including multiple processing cores. The processing cores include independent integer and floating-point data paths that are configurable to concurrently execute multiple independent instructions. A memory is coupled with the multiple processing blocks.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: October 4, 2022
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Barath Lakshmanan, Tatiana Shpeisman, Joydeep Ray, Ping T. Tang, Michael Strickland, Xiaoming Chen, Anbang Yao, Ben J. Ashbaugh, Linda L. Hurd, Liwei Ma
  • Patent number: 11461104
    Abstract: Apparatus for data processing and a method of data processing are provided. Data processing operations are performed in response to data processing instructions. An error exception condition is set if a data processing operation has not been successful. It is determined if an error memory barrier condition exists and an error memory barrier procedure is performed in dependence on whether the error memory barrier condition exists. The error memory barrier procedure comprises, if the error exception condition is set and if an error mask condition is set: setting a deferred error exception condition and clearing the error exception condition.
    Type: Grant
    Filed: November 25, 2015
    Date of Patent: October 4, 2022
    Assignee: ARM LIMITED
    Inventors: Michael John Williams, Richard Roy Grisenthwaite, Simon John Craske
  • Patent number: 11455171
    Abstract: A fast and frugal item-state tracking scoreboard circuit is disclosed. The scoreboard maintains per-item partial states across multiple memory circuits, enabling multiple lookups per clock cycle and multiple state updates per clock cycle. In an embodiment a scoreboard is used to schedule instructions in an out-of-order processor. Each clock cycle the scoreboard indicates the busy state of an instruction's registers and may update the busy state of the destination registers of issuing instructions and completing instructions. Applications include register tracking, function-unit tracking, and cache-line state tracking, in embodiments including processor cores (including superscalar, superpipelined, and multithreaded processors), accelerators, memory systems, and networks. In an embodiment, a register-busy scoreboard circuit is implemented using FPGA LUT RAM memory.
    Type: Grant
    Filed: May 29, 2020
    Date of Patent: September 27, 2022
    Assignee: Gray Research LLC
    Inventor: Jan Stephen Gray
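    Sketch (illustrative only, not taken from the patent): the abstract above mentions a register-busy scoreboard that is consulted at issue and updated on issue and completion. The model below shows only that busy-bit behavior with a single Python list; it does not attempt the multi-memory partial-state organization the abstract describes, and all names are hypothetical:

```python
class RegisterScoreboard:
    """Track one busy bit per register."""

    def __init__(self, num_registers: int):
        self.busy = [False] * num_registers

    def can_issue(self, sources, dest) -> bool:
        # An instruction may issue only if none of its registers is busy.
        return not any(self.busy[r] for r in (*sources, dest))

    def issue(self, dest: int) -> None:
        # The destination stays busy until the instruction completes.
        self.busy[dest] = True

    def complete(self, dest: int) -> None:
        self.busy[dest] = False


sb = RegisterScoreboard(8)
first_ok = sb.can_issue([1, 2], 3)     # nothing busy yet
sb.issue(3)
dependent_ok = sb.can_issue([3], 4)    # reads r3, which is still busy
sb.complete(3)
after_complete_ok = sb.can_issue([3], 4)
```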
  • Patent number: 11442731
    Abstract: A data processor includes an execution unit that executes instructions to perform data processing operations, a register file operable to store data values for use by and produced by the execution unit, and a buffer intermediate between the register file for providing data values from the register file to the execution unit for use when executing an instruction, and to receive output data values from the execution unit for writing to the register file. Instructions to be executed by the execution unit of the data processor have associated buffer eviction priority indications representative of a priority for eviction from the buffer of an output data value that will be generated when executing the instruction. The buffer eviction priority indications are then used when selecting data values to evict from the buffer.
    Type: Grant
    Filed: October 17, 2019
    Date of Patent: September 13, 2022
    Assignee: Arm Limited
    Inventors: John David Robson, Sean Tristram LeGuay Ellis, William Robert Stoye
  • Patent number: 11429389
    Abstract: A method for a plurality of pipelines, each having a processing element having first and second inputs and first and second lines, wherein at least one of the pipelines includes first and second logic operable to select a respective line so that data is received at the first and second inputs respectively. A first mode is selected and for the at least one pipeline, the first and second lines of that pipeline are selected such that the processing element of that pipeline receives data via the first and second lines of that pipeline, the first line being capable of supplying data that is different to the second line. A second mode is selected and for the at least one pipeline a line of another pipeline is selected, the second line of the at least one pipeline is selected and the same data at the second line is supplied as the first line.
    Type: Grant
    Filed: November 25, 2020
    Date of Patent: August 30, 2022
    Assignee: Imagination Technologies Limited
    Inventors: Simon Nield, Thomas Rose
  • Patent number: 11416258
    Abstract: A method for debugging a processor which is executing vertices of a software application is described. Each vertex is assigned to a programming thread of the processor. The processor has debug hardware for raising exceptions in certain break conditions. The method comprises inspecting a vertex identifier, comparing the vertex identifier and raising an instruction exception event for the programming thread if the vertex identifier assigned to the thread matches the vertex break identifier in the debug hardware. Exceptions are raised based on identified vertices, rather than just individual instructions or instruction addresses.
    Type: Grant
    Filed: May 22, 2019
    Date of Patent: August 16, 2022
    Assignee: Graphcore Limited
    Inventors: Alan Graham Alexander, Richard Luke Southwell Osborne, Matthew David Fyles
  • Patent number: 11409537
    Abstract: One embodiment provides for a graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising an instruction cache to store a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, including a multi-dimensional floating-point operation, and the second instruction to cause the GPU to perform an integer operation; and a general-purpose graphics compute unit having a single instruction, multiple thread (SIMT) architecture, the general-purpose graphics compute unit to simultaneously execute the first instruction and the second instruction, wherein the integer operation corresponds to a memory address calculation.
    Type: Grant
    Filed: November 21, 2017
    Date of Patent: August 9, 2022
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Barath Lakshmanan, Tatiana Shpeisman, Joydeep Ray, Ping T. Tang, Michael Strickland, Xiaoming Chen, Anbang Yao, Ben J. Ashbaugh, Linda L. Hurd, Liwei Ma
  • Patent number: 11403103
    Abstract: A microprocessor is shown, in which a branch predictor and an instruction cache are decoupled by a fetch-target queue (FTQ). The branch predictor performs branch prediction for N instruction addresses in parallel in the same cycle, wherein N is an integer greater than 1. In the current cycle, the branch predictor finishes branch prediction for N instruction addresses in parallel and, among the N instruction addresses with finished branch prediction, those that are not bypassed and do not overlap previously-predicted instruction addresses are pushed into the fetch-target queue, to be read out later as an instruction-fetching address for the instruction cache. The previously-predicted instruction addresses are pushed into the fetch-target queue in a previous cycle.
    Type: Grant
    Filed: October 13, 2020
    Date of Patent: August 2, 2022
    Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.
    Inventors: Fangong Gong, Mengchen Yang
  • Patent number: 11354123
    Abstract: A computing in memory method for a memory device is provided. The computing in memory method includes: based on a stride parameter, unfolding a kernel into a plurality of sub-kernels and a plurality of complement sub-kernels; based on the sub-kernels and the complement sub-kernels, writing a plurality of weights into a plurality of target memory cells of a memory array of the memory device; inputting an input data into a selected word line of the memory array; performing a stride operation in the memory array; temporarily storing a plurality of partial sums; and summing the stored partial sums into a stride operation result when all operation cycles are completed.
    Type: Grant
    Filed: September 21, 2020
    Date of Patent: June 7, 2022
    Assignee: MACRONIX INTERNATIONAL CO., LTD.
    Inventors: Hung-Sheng Chang, Han-Wen Hu, Yueh-Han Wu, Tse-Yuan Wang, Yuan-Hao Chang, Tei-Wei Kuo
  • Patent number: 11288076
    Abstract: An integrated circuit including configurable multiplier-accumulator circuitry, wherein, during processing operations, a plurality of the multiplier-accumulator circuits are serially connected into pipelines to perform concatenated multiply and accumulate operations. The integrated circuit includes a first memory and a second memory, and a switch interconnect network, including configurable multiplexers arranged in a plurality of switch matrices. The first and second memories are configurable as either a dedicated read memory or a dedicated write memory and connected to a given pipeline, via the switch interconnect network, during a processing operation performed thereby; wherein, during a first processing operation, the first memory is dedicated to write data to a first pipeline and the second memory is dedicated to read data therefrom and, during a second processing operation, the first memory is dedicated to read data from a second pipeline and the second memory is dedicated to write data thereto.
    Type: Grant
    Filed: September 12, 2020
    Date of Patent: March 29, 2022
    Assignee: Flex Logix Technologies, Inc.
    Inventor: Cheng C. Wang