Patents Examined by Daniel H. Pan
  • Patent number: 11016801
    Abstract: A system to support a machine learning (ML) operation comprises an array-based inference engine comprising a plurality of processing tiles each comprising at least one or more of an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform one or more computation tasks on the data in the OCM by executing a set of task instructions. The system also comprises a data streaming engine configured to stream data between a memory and the OCMs and an instruction streaming engine configured to distribute said set of task instructions to the corresponding processing tiles to control their operations and to synchronize said set of task instructions to be executed by each processing tile, respectively, to wait current certain task at each processing tile to finish before starting a new one.
    Type: Grant
    Filed: May 22, 2019
    Date of Patent: May 25, 2021
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Avinash Sodani, Senad Durakovic, Gopal Nalamalapu
  • Patent number: 11010164
    Abstract: Predicting a Table of Contents (TOC) pointer value responsive to branching to a subroutine. A subroutine is called from a calling module executing on a processor. Based on calling the subroutine, a value of a pointer to a reference data structure, such as a TOC, is predicted. The predicting is performed prior to executing a sequence of one or more instructions in the subroutine to compute the value. The value that is predicted is used to access the reference data structure to obtain a variable value for a variable of the subroutine.
    Type: Grant
    Filed: October 2, 2019
    Date of Patent: May 18, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 11010162
    Abstract: An electronic device can execute instructions, comprising: a processing circuit; a first storage device, coupled to the processing circuit, configured to store at least one instruction and first operation data; and a second storage device, coupled to the processing circuit. The processing circuit reads at least one of the instruction and the first operation data corresponding to the read instruction from the first storage device, and the second storage device does not store the first operation data corresponding to the read instruction, the processing circuit backs up the read first operation data to the second storage device.
    Type: Grant
    Filed: September 17, 2019
    Date of Patent: May 18, 2021
    Assignee: Realtek Semiconductor Corp.
    Inventor: Yen-Ju Lu
  • Patent number: 11010161
    Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array.
    Type: Grant
    Filed: March 31, 2019
    Date of Patent: May 18, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 11003451
    Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array.
    Type: Grant
    Filed: March 31, 2019
    Date of Patent: May 11, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 10990390
    Abstract: An instruction generates a value for use in processing within a computing environment. The instruction obtains a sign control associated with the instruction, and shifts an input value of the instruction in a specified direction by a selected amount to provide a result. The result is placed in a first designated location in a register, and the sign, which is based on the sign control, is placed in a second designated location of the register. The result and the sign provide a signed value to be used in processing within the computing environment.
    Type: Grant
    Filed: August 19, 2019
    Date of Patent: April 27, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Reid T. Copeland, Silvia Melitta Mueller
  • Patent number: 10990392
    Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array.
    Type: Grant
    Filed: March 31, 2019
    Date of Patent: April 27, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 10990391
    Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array.
    Type: Grant
    Filed: March 31, 2019
    Date of Patent: April 27, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 10990398
    Abstract: Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction for execution on an instruction execution pipeline, beginning execution of the first instruction, receiving one or more second instructions for execution on the instruction execution pipeline, the one or more second instructions associated with a higher priority task than the first instruction, storing a register state associated with the execution of the first instruction in one or more registers of a capture queue associated with the instruction execution pipeline, copying the register state from the capture queue to a memory, determining that the one or more second instructions have been executed, copying the register state from the memory to the one or more registers of the capture queue, and restoring the register state to the instruction execution pipeline from the capture queue.
    Type: Grant
    Filed: April 15, 2019
    Date of Patent: April 27, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy D. Anderson, Joseph Zbiciak, Kai Chirca
  • Patent number: 10990401
    Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.
    Type: Grant
    Filed: April 1, 2020
    Date of Patent: April 27, 2021
    Assignee: Apple Inc.
    Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
  • Patent number: 10976961
    Abstract: Techniques and mechanisms for circuitry of a processor to automatically provide, and perform an operation based on, metadata indicating an uninitialized memory block. In an embodiment, processor circuitry detects a software instruction which specifies a first operation to be performed based on some data at a memory block. Metadata corresponding to said data comprises an identifier of whether the data is based on an uninitialized memory condition. Processing of the instruction, includes the processor circuitry automatically performing a second operation based on the identifier. The second operation is performed independent of any instruction of the application which specifies the second operation. In another embodiment, execution of the instruction (if any) is conditional upon an evaluation which is based on the state identifier, or the second operation is automatically performed based on an execution of the first instruction.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: April 13, 2021
    Assignee: Intel Corporation
    Inventors: Ron Gabor, Tomer Stark, Joseph Nuzman, Ady Tal
  • Patent number: 10977047
    Abstract: Technical solutions are described for hazard detection of out-of-order execution of load and store instructions without using real addresses in a processing unit. An example includes an out-of-order load-store unit (LSU) for transferring data between memory and registers. The LSU detects a store-hit-load (SHL) in an out-of-order execution of instructions based only on effective addresses by: determining an effective address associated with a store instruction; determining whether a load instruction entry using said effective address is present in a load reorder queue; and indicating that a SHL has been detected based at least in part on determining that load instruction entry using said effective address is present in the load reorder queue. The LSU, in response to detecting the SHL, flushes instructions starting from a load instruction corresponding to the load instruction entry.
    Type: Grant
    Filed: June 24, 2019
    Date of Patent: April 13, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bryan Lloyd, Balaram Sinharoy, Shih-Hsiung S. Tung
  • Patent number: 10963255
    Abstract: Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction configured to cause the processor to output a first data value to a first address in a first data cache, outputting, by the processor, the first data value to a second address in a second data cache, receiving a second instruction configured to cause a streaming engine associated with the processor to prefetch data from the first data cache, determining that the first data value has not been outputted from the second data cache to the first data cache, stalling execution of the second instruction, receiving an indication, from the second data cache, that the first data value has been output from the second data cache to the first data cache, and resuming execution of the second instruction based on the received indication.
    Type: Grant
    Filed: March 11, 2019
    Date of Patent: March 30, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Naveen Bhoria, Kai Chirca, Timothy D. Anderson, Duc Bui, Abhijeet A. Chachad, Son Hung Tran
  • Patent number: 10963251
    Abstract: There is provided an apparatus that includes a set of vector registers, each of the vector registers being arranged to store a vector comprising a plurality of portions. The set of vector registers is logically divided into a plurality of columns, each of the columns being arranged to store a same portion of each vector. The apparatus also includes register access circuitry that comprises a plurality of access blocks. Each access block is arranged to access a portion in a different column when accessing one of the vector registers than when accessing at least one other of the vector registers. The register access circuitry is arranged to simultaneously access portions in any one of: the vector registers and the columns.
    Type: Grant
    Filed: June 15, 2017
    Date of Patent: March 30, 2021
    Assignee: ARM Limited
    Inventor: Thomas Christopher Grocutt
  • Patent number: 10963254
    Abstract: A steaming engine in a system receives a first set of stream parameters into a queue to define a first stream along with an indication of either a queue mode of operation or a speculative mode of operation for the first stream. Acquisition of the first stream then begins. At some point, a second set of stream parameters is received into the queue to define a second stream. When the queue mode of operation was specified for the first stream, the second set of parameters is queued and acquisition of the second stream is delayed until completion of acquisition of the first stream. When the speculative mode of operation was specified for the first stream, acquisition of the first stream is canceled upon receipt of the second set of stream parameters and acquisition of the second stream begins immediately.
    Type: Grant
    Filed: March 2, 2019
    Date of Patent: March 30, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy David Anderson, Jonathan (Son) Hung Tran, Joseph Raymond Michael Zbiciak
  • Patent number: 10949211
    Abstract: Execution of multiple execution streams is scheduled on a plurality of coprocessors. A software layer located logically between applications and the coprocessors determines dependencies within the execution streams, each said dependency being a condition in one of the execution streams that must be satisfied in order for execution of at least one other of the execution streams to proceed on corresponding ones of the coprocessors. The dependencies are then represented in a data structure and an optimized execution schedule is determined for the execution streams according to the dependencies. Simultaneous execution of a plurality of the execution streams is then dynamically reordered according to the optimized execution schedule.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: March 16, 2021
    Assignee: VMware, Inc.
    Inventors: Mazhar Memon, Aidan Cully
  • Patent number: 10949206
    Abstract: Software instructions are executed on a processor within a computer system to configure a steaming engine to operate in either a linear mode or a transpose mode. A stream of addresses is generated using an address generator, in which the stream of addresses includes consecutive nested loop iterations for at least a first loop and a second loop. While in the linear mode, the first loop is treated as an inner loop. While in the transpose mode, the second loop is treated as the inner loop. A matrix can be fetched from memory in the linear mode to provide row-wise vectors. A matrix can be fetched from the memory in the transpose mode to provide column wise vectors.
    Type: Grant
    Filed: February 22, 2019
    Date of Patent: March 16, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Jonathan (Son) Hung Tran, Joseph Raymond Michael Zbiciak
  • Patent number: 10949251
    Abstract: Embodiments described herein provide a system, method, and apparatus to accelerate reduce operations in a graphics processor. One embodiment provides an apparatus including one or more processors, the one or more processors including a first logic unit to perform a merged write, barrier, and read operation in response to a barrier synchronization request from a set of threads in a work group, synchronize the set of threads, and report a result of an operation specified in association with the barrier synchronization request.
    Type: Grant
    Filed: April 1, 2016
    Date of Patent: March 16, 2021
    Assignee: INTEL CORPORATION
    Inventors: Yong Jiang, Yuanyuan Li, Jianghong Du, Kuilin Chen, Thomas A. Tetzlaff
  • Patent number: 10942741
    Abstract: Software instructions are executed on a processor within a computer system to configure a steaming engine to operate in either a linear mode or a transpose mode. A stream of addresses is generated using an address generator, in which the stream of addresses includes consecutive nested loop iterations for at least a first loop and a second loop. While in the linear mode, the first loop is treated as an inner loop. While in the transpose mode, the second loop is treated as the inner loop. A matrix can be fetched from memory in the linear mode to provide row-wise vectors. A matrix can be fetched from the memory in the transpose mode to provide column wise vectors. Local storage on the streaming engine is organized as sectors based on the number of rows in the matrix to allow overlapping transposition processing and to minimize memory accesses.
    Type: Grant
    Filed: February 22, 2019
    Date of Patent: March 9, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Jonathan (Son) Hung Tran, Joseph Raymond Michael Zbiciak
  • Patent number: 10936315
    Abstract: In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
    Type: Grant
    Filed: December 31, 2018
    Date of Patent: March 2, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Duc Quang Bui, Joseph Raymond Michael Zbiciak