Patents Examined by Daniel H. Pan

Architecture to support color scheme-based synchronization for machine learning

Patent number: 11016801

Abstract: A system to support a machine learning (ML) operation comprises an array-based inference engine comprising a plurality of processing tiles, each comprising one or more of: an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile, and one or more processing units configured to perform one or more computation tasks on the data in the OCM by executing a set of task instructions. The system also comprises a data streaming engine configured to stream data between a memory and the OCMs, and an instruction streaming engine configured to distribute said set of task instructions to the corresponding processing tiles to control their operations and to synchronize the set of task instructions executed by each processing tile, so that each tile waits for its current task to finish before starting a new one.

Type: Grant

Filed: May 22, 2019

Date of Patent: May 25, 2021

Assignee: Marvell Asia Pte, Ltd.

Inventors: Avinash Sodani, Senad Durakovic, Gopal Nalamalapu

Predicting a table of contents pointer value responsive to branching to a subroutine

Patent number: 11010164

Abstract: Predicting a Table of Contents (TOC) pointer value responsive to branching to a subroutine. A subroutine is called from a calling module executing on a processor. Based on calling the subroutine, a value of a pointer to a reference data structure, such as a TOC, is predicted. The predicting is performed prior to executing a sequence of one or more instructions in the subroutine to compute the value. The value that is predicted is used to access the reference data structure to obtain a variable value for a variable of the subroutine.

Type: Grant

Filed: October 2, 2019

Date of Patent: May 18, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Michael K. Gschwind, Valentina Salapura
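
As a rough illustration of the prediction idea, the Python sketch below keeps a hypothetical table keyed by subroutine entry address. A prediction is offered as soon as the call is seen, and the table is trained once the subroutine's own TOC-computing sequence produces the real value. The class and method names are invented for this example; the abstract does not say how predictions are stored or verified.

```python
from typing import Dict, Optional


class TocPredictor:
    """Hypothetical TOC-pointer predictor keyed by subroutine entry address."""

    def __init__(self) -> None:
        self.table: Dict[int, int] = {}

    def on_call(self, target: int) -> Optional[int]:
        # Predict the TOC value as soon as the branch to the subroutine is seen,
        # before the subroutine's own TOC-computing instructions execute.
        return self.table.get(target)

    def on_computed(self, target: int, actual: int) -> bool:
        # When the computing sequence finishes, check the earlier prediction
        # (if any) and train the table with the actual value.
        predicted = self.table.get(target)
        self.table[target] = actual
        return predicted == actual
```

The first call to a subroutine gets no prediction. Later calls can use the learned value to reach the TOC early.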

Electronic apparatus can execute instruction and instruction executing method

Patent number: 11010162

Abstract: An electronic device that can execute instructions comprises: a processing circuit; a first storage device, coupled to the processing circuit, configured to store at least one instruction and first operation data; and a second storage device, coupled to the processing circuit. When the processing circuit reads at least one of the instruction and the first operation data corresponding to the read instruction from the first storage device, and the second storage device does not store the first operation data corresponding to the read instruction, the processing circuit backs up the read first operation data to the second storage device.

Type: Grant

Filed: September 17, 2019

Date of Patent: May 18, 2021

Assignee: Realtek Semiconductor Corp.

Inventor: Yen-Ju Lu

Multiple types of thread identifiers for a multi-threaded, self-scheduling reconfigurable computing fabric

Patent number: 11010161

Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array.

Type: Grant

Filed: March 31, 2019

Date of Patent: May 18, 2021

Assignee: Micron Technology, Inc.

Inventor: Tony M. Brewer

Execution control of a multi-threaded, self-scheduling reconfigurable computing fabric

Patent number: 11003451

Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array.

Type: Grant

Filed: March 31, 2019

Date of Patent: May 11, 2021

Assignee: Micron Technology, Inc.

Inventor: Tony M. Brewer

Decimal load immediate instruction

Patent number: 10990390

Abstract: An instruction generates a value for use in processing within a computing environment. The instruction obtains a sign control associated with the instruction, and shifts an input value of the instruction in a specified direction by a selected amount to provide a result. The result is placed in a first designated location in a register, and the sign, which is based on the sign control, is placed in a second designated location of the register. The result and the sign provide a signed value to be used in processing within the computing environment.

Type: Grant

Filed: August 19, 2019

Date of Patent: April 27, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jonathan D. Bradbury, Reid T. Copeland, Silvia Melitta Mueller
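
The shift-then-sign steps in this abstract can be sketched in Python. The sketch makes several assumptions that the abstract does not state. It assumes a packed-decimal register image of 31 digit nibbles followed by one sign nibble, and it uses 0xC and 0xD, the conventional preferred packed-decimal plus and minus codes. The function name and the boolean sign control are also hypothetical.

```python
POSITIVE_SIGN = 0xC  # conventional preferred packed-decimal plus sign
NEGATIVE_SIGN = 0xD  # conventional preferred packed-decimal minus sign
DIGITS = 31          # assumed layout: 31 digit nibbles + 1 sign nibble = 128 bits


def decimal_load_immediate(value: int, shift: int, left: bool, negative: bool) -> int:
    """Return a packed-decimal register image (hypothetical layout)."""
    if value < 0:
        raise ValueError("immediate is an unsigned digit string; sign comes from the sign control")
    # Shift the input in the specified direction by the selected number of digits.
    result = value * 10**shift if left else value // 10**shift
    if result >= 10**DIGITS:
        raise OverflowError("result does not fit in the digit field")
    # First designated location: the digit nibbles, most significant first.
    register = 0
    for digit in str(result).zfill(DIGITS):
        register = (register << 4) | int(digit)
    # Second designated location: the rightmost nibble holds the sign.
    sign = NEGATIVE_SIGN if negative else POSITIVE_SIGN
    return (register << 4) | sign
```

For example, loading 12 shifted left by one digit with a positive sign control yields the image 0x120C.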

Efficient loop execution for a multi-threaded, self-scheduling reconfigurable computing fabric

Patent number: 10990392

Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array.

Type: Grant

Filed: March 31, 2019

Date of Patent: April 27, 2021

Assignee: Micron Technology, Inc.

Inventor: Tony M. Brewer

Backpressure control using a stop signal for a multi-threaded, self-scheduling reconfigurable computing fabric

Patent number: 10990391

Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array.

Type: Grant

Filed: March 31, 2019

Date of Patent: April 27, 2021

Assignee: Micron Technology, Inc.

Inventor: Tony M. Brewer

Mechanism for interrupting and resuming execution on an unprotected pipeline processor

Patent number: 10990398

Abstract: Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction for execution on an instruction execution pipeline, beginning execution of the first instruction, receiving one or more second instructions for execution on the instruction execution pipeline, the one or more second instructions associated with a higher priority task than the first instruction, storing a register state associated with the execution of the first instruction in one or more registers of a capture queue associated with the instruction execution pipeline, copying the register state from the capture queue to a memory, determining that the one or more second instructions have been executed, copying the register state from the memory to the one or more registers of the capture queue, and restoring the register state to the instruction execution pipeline from the capture queue.

Type: Grant

Filed: April 15, 2019

Date of Patent: April 27, 2021

Assignee: Texas Instruments Incorporated

Inventors: Timothy D. Anderson, Joseph Zbiciak, Kai Chirca
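
The save/restore sequence in this abstract can be modeled in Python. Register state is a plain dictionary and the capture queue is a list. This is a behavioral sketch only: the function name, the "saved_state" memory key, and the dictionary representation are assumptions, and the pipeline timing that makes the hardware mechanism interesting is not modeled.

```python
from typing import Callable, Dict, List

Registers = Dict[str, int]


def run_with_preemption(pipeline: Registers,
                        high_priority: Callable[[Registers], None],
                        memory: Dict[str, Registers]) -> Registers:
    """Hypothetical model of interrupting and resuming an unprotected pipeline."""
    capture_queue: List[Registers] = []
    # Store the in-flight register state of the first instruction in the capture queue.
    capture_queue.append(dict(pipeline))
    # Copy the captured state from the capture queue to memory.
    memory["saved_state"] = capture_queue.pop()
    # Execute the higher-priority instructions; they may overwrite the registers.
    high_priority(pipeline)
    # Copy the state from memory back into the capture queue...
    capture_queue.append(memory.pop("saved_state"))
    # ...and restore it into the pipeline so the first instruction can resume.
    pipeline.clear()
    pipeline.update(capture_queue.pop())
    return pipeline
```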

Computation engine with strided dot product

Patent number: 10990401

Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.

Type: Grant

Filed: April 1, 2020

Date of Patent: April 27, 2021

Assignee: Apple Inc.

Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
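
The strided selection in this abstract can be sketched in Python. The first operand is treated as a flattened vector of equal-length sub-vectors. A stride equal to the sub-vector length, plus an offset, picks one specified element from each sub-vector. The function name and the `offset` parameter are illustrative assumptions, not the patented interface.

```python
from typing import Sequence


def strided_dot(first: Sequence[float], second: Sequence[float],
                stride: int, offset: int = 0) -> float:
    """Dot product of every `stride`-th element of `first`, starting at `offset`,
    with each element of `second`."""
    subset = first[offset::stride]  # skip stride - 1 elements between selections
    if len(subset) != len(second):
        raise ValueError("strided subset and second operand must have equal length")
    return sum(a * b for a, b in zip(subset, second))
```

With `first = [1, 10, 2, 20, 3, 30]`, viewed as three 2-element sub-vectors, stride 2 and offset 1 select `[10, 20, 30]`.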

Device, system and method to detect an uninitialized memory read

Patent number: 10976961

Abstract: Techniques and mechanisms for circuitry of a processor to automatically provide, and perform an operation based on, metadata indicating an uninitialized memory block. In an embodiment, processor circuitry detects a software instruction which specifies a first operation to be performed based on some data at a memory block. Metadata corresponding to said data comprises an identifier of whether the data is based on an uninitialized memory condition. Processing of the instruction includes the processor circuitry automatically performing a second operation based on the identifier. The second operation is performed independent of any instruction of the application which specifies the second operation. In another embodiment, execution of the instruction (if any) is conditional upon an evaluation which is based on the state identifier, or the second operation is automatically performed based on an execution of the first instruction.

Type: Grant

Filed: December 20, 2018

Date of Patent: April 13, 2021

Assignee: Intel Corporation

Inventors: Ron Gabor, Tomer Stark, Joseph Nuzman, Ady Tal
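
A toy Python model shows the core idea of this abstract: per-block metadata marks data as uninitialized, and a load automatically triggers a second operation that no program instruction asked for. Here the second operation simply records the block. The class name, the one-bit-per-block granularity, and the choice of "record the block" as the second operation are all assumptions made for illustration.

```python
from typing import List


class TrackedMemory:
    """Hypothetical memory whose blocks carry an 'uninitialized' metadata bit."""

    def __init__(self, num_blocks: int) -> None:
        self.data: List[int] = [0] * num_blocks
        self.uninitialized: List[bool] = [True] * num_blocks  # cleared on first write
        self.reports: List[int] = []  # where the automatic second operation logs events

    def store(self, block: int, value: int) -> None:
        self.data[block] = value
        self.uninitialized[block] = False

    def load(self, block: int) -> int:
        # First operation: the read requested by the software instruction.
        value = self.data[block]
        # Second operation: driven purely by the metadata, with no
        # instruction in the program specifying it.
        if self.uninitialized[block]:
            self.reports.append(block)
        return value
```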

Hazard detection of out-of-order execution of load and store instructions in processors without using real addresses

Patent number: 10977047

Abstract: Technical solutions are described for hazard detection of out-of-order execution of load and store instructions without using real addresses in a processing unit. An example includes an out-of-order load-store unit (LSU) for transferring data between memory and registers. The LSU detects a store-hit-load (SHL) in an out-of-order execution of instructions based only on effective addresses by: determining an effective address associated with a store instruction; determining whether a load instruction entry using said effective address is present in a load reorder queue; and indicating that an SHL has been detected based at least in part on determining that the load instruction entry using said effective address is present in the load reorder queue. The LSU, in response to detecting the SHL, flushes instructions starting from a load instruction corresponding to the load instruction entry.

Type: Grant

Filed: June 24, 2019

Date of Patent: April 13, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bryan Lloyd, Balaram Sinharoy, Shih-Hsiung S. Tung
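
The detection steps in this abstract can be sketched in Python. Only effective addresses are compared, and no translation to real addresses takes place. The abstract does not spell out the age check. This sketch assumes the usual store-hit-load definition, where the offending load is younger in program order than the store and has already executed. The program-order tags and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoadEntry:
    itag: int  # program-order tag; a smaller tag means an older instruction
    ea: int    # effective address used by the load


class LoadStoreUnit:
    """Hypothetical out-of-order LSU that checks store-hit-load using effective addresses only."""

    def __init__(self) -> None:
        self.load_reorder_queue: List[LoadEntry] = []

    def execute_load(self, itag: int, ea: int) -> None:
        # Loads may execute ahead of older stores; remember each one.
        self.load_reorder_queue.append(LoadEntry(itag, ea))

    def execute_store(self, itag: int, ea: int) -> Optional[int]:
        """Return the tag to flush from if a store-hit-load is detected, else None."""
        # Hazard: a younger load to the same effective address already executed.
        hits = [e.itag for e in self.load_reorder_queue if e.ea == ea and e.itag > itag]
        if not hits:
            return None
        flush_from = min(hits)  # flush starting at the oldest offending load
        # Flushed loads (and everything younger) leave the queue and will re-execute.
        self.load_reorder_queue = [e for e in self.load_reorder_queue if e.itag < flush_from]
        return flush_from
```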

Implied fence on stream open

Patent number: 10963255

Abstract: Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction configured to cause the processor to output a first data value to a first address in a first data cache, outputting, by the processor, the first data value to a second address in a second data cache, receiving a second instruction configured to cause a streaming engine associated with the processor to prefetch data from the first data cache, determining that the first data value has not been outputted from the second data cache to the first data cache, stalling execution of the second instruction, receiving an indication, from the second data cache, that the first data value has been output from the second data cache to the first data cache, and resuming execution of the second instruction based on the received indication.

Type: Grant

Filed: March 11, 2019

Date of Patent: March 30, 2021

Assignee: Texas Instruments Incorporated

Inventors: Naveen Bhoria, Kai Chirca, Timothy D. Anderson, Duc Bui, Abhijeet A. Chachad, Son Hung Tran

Vector register access

Patent number: 10963251

Abstract: There is provided an apparatus that includes a set of vector registers, each of the vector registers being arranged to store a vector comprising a plurality of portions. The set of vector registers is logically divided into a plurality of columns, each of the columns being arranged to store a same portion of each vector. The apparatus also includes register access circuitry that comprises a plurality of access blocks. Each access block is arranged to access a portion in a different column when accessing one of the vector registers than when accessing at least one other of the vector registers. The register access circuitry is arranged to simultaneously access portions in any one of: the vector registers and the columns.

Type: Grant

Filed: June 15, 2017

Date of Patent: March 30, 2021

Assignee: ARM Limited

Inventor: Thomas Christopher Grocutt
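
One layout that is consistent with this description is a rotated (skewed) mapping. Under it, portion `p` of register `r` is served by access block `(r + p) mod N`. Reading a whole register, or a whole column across `N` registers, then touches every access block exactly once, so either access can proceed in parallel. This mapping is an assumption chosen for illustration; the patent's actual arrangement may differ. The sketch also assumes `N` portions, `N` access blocks, and `N` registers per group.

```python
from typing import List

N = 4  # assumed: 4 portions per vector, 4 access blocks, 4 registers per group


def access_block(register: int, portion: int, n: int = N) -> int:
    """Access block serving `portion` (column) of `register` under a rotated layout."""
    return (register + portion) % n


def blocks_for_register(register: int, n: int = N) -> List[int]:
    # Blocks used when reading every portion of one vector register.
    return [access_block(register, p, n) for p in range(n)]


def blocks_for_column(column: int, n: int = N) -> List[int]:
    # Blocks used when reading the same portion from every register in the group.
    return [access_block(r, column, n) for r in range(n)]
```

Because each block serves a different column for each register, neither access pattern has two portions competing for the same block.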

Mechanism to queue multiple streams to run on streaming engine

Patent number: 10963254

Abstract: A streaming engine in a system receives a first set of stream parameters into a queue to define a first stream along with an indication of either a queue mode of operation or a speculative mode of operation for the first stream. Acquisition of the first stream then begins. At some point, a second set of stream parameters is received into the queue to define a second stream. When the queue mode of operation was specified for the first stream, the second set of parameters is queued and acquisition of the second stream is delayed until completion of acquisition of the first stream. When the speculative mode of operation was specified for the first stream, acquisition of the first stream is canceled upon receipt of the second set of stream parameters and acquisition of the second stream begins immediately.

Type: Grant

Filed: March 2, 2019

Date of Patent: March 30, 2021

Assignee: Texas Instruments Incorporated

Inventors: Timothy David Anderson, Jonathan (Son) Hung Tran, Joseph Raymond Michael Zbiciak
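
The two modes in this abstract can be captured by a small Python state machine. The mode of the currently active stream decides whether a newly opened stream waits in the queue or replaces the active one at once. Stream parameters are reduced to opaque labels. The class, method, and attribute names are hypothetical.

```python
from collections import deque
from enum import Enum
from typing import Any, Deque, List, Optional, Tuple


class Mode(Enum):
    QUEUE = "queue"
    SPECULATIVE = "speculative"


class StreamingEngine:
    """Hypothetical model of queue-mode versus speculative-mode stream opening."""

    def __init__(self) -> None:
        self.active: Optional[Tuple[Any, Mode]] = None  # stream being acquired
        self.pending: Deque[Tuple[Any, Mode]] = deque()  # queued parameter sets
        self.canceled: List[Any] = []

    def open(self, params: Any, mode: Mode) -> None:
        if self.active is None:
            self.active = (params, mode)
        elif self.active[1] is Mode.QUEUE:
            # Queue mode: hold the new stream until the current one completes.
            self.pending.append((params, mode))
        else:
            # Speculative mode: cancel the current stream and start the new one now.
            self.canceled.append(self.active[0])
            self.active = (params, mode)

    def complete(self) -> None:
        """Called when acquisition of the active stream finishes."""
        self.active = self.pending.popleft() if self.pending else None
```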

Intelligent scheduling of coprocessor execution

Patent number: 10949211

Abstract: Execution of multiple execution streams is scheduled on a plurality of coprocessors. A software layer located logically between applications and the coprocessors determines dependencies within the execution streams, each said dependency being a condition in one of the execution streams that must be satisfied in order for execution of at least one other of the execution streams to proceed on corresponding ones of the coprocessors. The dependencies are then represented in a data structure and an optimized execution schedule is determined for the execution streams according to the dependencies. Simultaneous execution of a plurality of the execution streams is then dynamically reordered according to the optimized execution schedule.

Type: Grant

Filed: December 20, 2018

Date of Patent: March 16, 2021

Assignee: VMware, Inc.

Inventors: Mazhar Memon, Aidan Cully
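
A minimal Python sketch of the dependency-to-schedule step follows. Dependencies between execution streams are held in a graph, and streams are grouped into waves whose members can run concurrently. It uses a plain topological levelization from the standard `graphlib` module (Python 3.9+). That is a stand-in chosen for illustration, not the optimization the patent describes, and the stream names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def schedule(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group execution streams into waves. Each stream maps to the streams it
    depends on, and every stream in a wave has all of its dependencies satisfied
    by earlier waves."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # also raises CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For example, if streams B and C both wait on A, and D waits on both, the sketch runs A first, then B and C together, then D.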

Transposing a matrix using a streaming engine

Patent number: 10949206

Abstract: Software instructions are executed on a processor within a computer system to configure a streaming engine to operate in either a linear mode or a transpose mode. A stream of addresses is generated using an address generator, in which the stream of addresses includes consecutive nested loop iterations for at least a first loop and a second loop. While in the linear mode, the first loop is treated as an inner loop. While in the transpose mode, the second loop is treated as the inner loop. A matrix can be fetched from memory in the linear mode to provide row-wise vectors. A matrix can be fetched from the memory in the transpose mode to provide column-wise vectors.

Type: Grant

Filed: February 22, 2019

Date of Patent: March 16, 2021

Assignee: Texas Instruments Incorporated

Inventors: Jonathan (Son) Hung Tran, Joseph Raymond Michael Zbiciak
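
The loop-swapping idea in this abstract can be sketched in Python. The address generator walks the same two nested loops over a row-major matrix. In linear mode the column loop is innermost, producing row-wise order. In transpose mode the row loop becomes innermost, producing column-wise order. The function name, the row-major layout, and the element-size parameter are assumptions for this example.

```python
from typing import List


def stream_addresses(base: int, rows: int, cols: int, elem_size: int,
                     transpose: bool = False) -> List[int]:
    """Byte addresses of a row-major rows x cols matrix in stream order."""
    addresses = []
    if not transpose:
        # Linear mode: the first loop (columns within a row) is the inner loop.
        for r in range(rows):
            for c in range(cols):
                addresses.append(base + (r * cols + c) * elem_size)
    else:
        # Transpose mode: the second loop (rows) is treated as the inner loop.
        for c in range(cols):
            for r in range(rows):
                addresses.append(base + (r * cols + c) * elem_size)
    return addresses
```

For a 2 x 3 matrix of 4-byte elements at address 0, linear mode yields 0, 4, 8, 12, 16, 20. Transpose mode yields 0, 12, 4, 16, 8, 20.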

System and method to accelerate reduce operations in graphics processor

Patent number: 10949251

Abstract: Embodiments described herein provide a system, method, and apparatus to accelerate reduce operations in a graphics processor. One embodiment provides an apparatus including one or more processors, the one or more processors including a first logic unit to perform a merged write, barrier, and read operation in response to a barrier synchronization request from a set of threads in a work group, synchronize the set of threads, and report a result of an operation specified in association with the barrier synchronization request.

Type: Grant

Filed: April 1, 2016

Date of Patent: March 16, 2021

Assignee: INTEL CORPORATION

Inventors: Yong Jiang, Yuanyuan Li, Jianghong Du, Kuilin Chen, Thomas A. Tetzlaff
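
The write/barrier/read pattern that this abstract merges into one operation can be illustrated in software. The Python sketch below uses OS threads to stand in for the threads of a work group and a `threading.Barrier` action to compute the reduction. It shows only the semantics (every thread sees the same reduced result after the barrier), not the hardware merging. The function name and `op` parameter are hypothetical.

```python
import threading
from typing import Callable, List, Sequence


def barrier_reduce(values: Sequence[int],
                   op: Callable[[List[int]], int] = sum) -> List[int]:
    """Each worker writes its value, waits at a barrier, then reads the reduced result."""
    n = len(values)
    slots: List[int] = [0] * n
    outputs: List[int] = [0] * n
    result = {}

    def combine() -> None:
        # Runs once, in one of the threads, after all parties reach the barrier.
        result["value"] = op(slots)

    barrier = threading.Barrier(n, action=combine)

    def worker(i: int) -> None:
        slots[i] = values[i]          # write
        barrier.wait()                # barrier synchronization
        outputs[i] = result["value"]  # read the reported result

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outputs
```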

Storage organization for transposing a matrix using a streaming engine

Patent number: 10942741

Abstract: Software instructions are executed on a processor within a computer system to configure a streaming engine to operate in either a linear mode or a transpose mode. A stream of addresses is generated using an address generator, in which the stream of addresses includes consecutive nested loop iterations for at least a first loop and a second loop. While in the linear mode, the first loop is treated as an inner loop. While in the transpose mode, the second loop is treated as the inner loop. A matrix can be fetched from memory in the linear mode to provide row-wise vectors. A matrix can be fetched from the memory in the transpose mode to provide column-wise vectors. Local storage on the streaming engine is organized as sectors based on the number of rows in the matrix to allow overlapping transposition processing and to minimize memory accesses.

Type: Grant

Filed: February 22, 2019

Date of Patent: March 9, 2021

Assignee: Texas Instruments Incorporated

Inventors: Jonathan (Son) Hung Tran, Joseph Raymond Michael Zbiciak

Tracking streaming engine vector predicates to control processor execution

Patent number: 10936315

Abstract: In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.

Type: Grant

Filed: December 31, 2018

Date of Patent: March 2, 2021

Assignee: Texas Instruments Incorporated

Inventors: Duc Quang Bui, Joseph Raymond Michael Zbiciak
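
The loop control in this abstract can be sketched in Python. A hypothetical stream yields fixed-width vectors together with a per-lane validity predicate. The consuming loop keeps iterating while any lane is valid and exits on the first all-false predicate, so no separate trip count is needed. The generator, the zero padding of invalid lanes, and the summing workload are assumptions made for illustration.

```python
from typing import Iterator, List, Tuple


def stream_vectors(data: List[int], width: int) -> Iterator[Tuple[List[int], List[bool]]]:
    """Yield (vector, predicate) pairs; after the data runs out, predicates are all False."""
    i = 0
    while True:
        chunk = data[i:i + width]
        predicate = [lane < len(chunk) for lane in range(width)]
        vector = chunk + [0] * (width - len(chunk))  # pad invalid lanes
        yield vector, predicate
        i += width


def sum_stream(data: List[int], width: int = 4) -> int:
    total = 0
    for vector, predicate in stream_vectors(data, width):
        if not any(predicate):
            break  # no valid elements: exit the instruction loop
        # At least one valid element: process the valid lanes and repeat.
        total += sum(v for v, valid in zip(vector, predicate) if valid)
    return total
```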