Patents Examined by Daniel H. Pan
-
Patent number: 11016801
Abstract: A system to support a machine learning (ML) operation comprises an array-based inference engine comprising a plurality of processing tiles, each comprising at least one or more of an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform one or more computation tasks on the data in the OCM by executing a set of task instructions. The system also comprises a data streaming engine configured to stream data between a memory and the OCMs, and an instruction streaming engine configured to distribute said set of task instructions to the corresponding processing tiles to control their operations and to synchronize said set of task instructions to be executed by each processing tile, respectively, so that each processing tile waits for its current task to finish before starting a new one.
Type: Grant
Filed: May 22, 2019
Date of Patent: May 25, 2021
Assignee: Marvell Asia Pte, Ltd.
Inventors: Avinash Sodani, Senad Durakovic, Gopal Nalamalapu
-
Patent number: 11010164
Abstract: Predicting a Table of Contents (TOC) pointer value responsive to branching to a subroutine. A subroutine is called from a calling module executing on a processor. Based on calling the subroutine, a value of a pointer to a reference data structure, such as a TOC, is predicted. The predicting is performed prior to executing a sequence of one or more instructions in the subroutine to compute the value. The value that is predicted is used to access the reference data structure to obtain a variable value for a variable of the subroutine.
Type: Grant
Filed: October 2, 2019
Date of Patent: May 18, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael K. Gschwind, Valentina Salapura
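The prediction idea in this abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical prediction table keyed by the subroutine's entry address that remembers the TOC value last computed for it; the class name, the table, and the `load_variable` helper are illustrative assumptions, not the mechanism claimed in the patent.

```python
class TocPredictor:
    """Hypothetical table: subroutine entry address -> last computed TOC pointer."""

    def __init__(self):
        self.table = {}

    def predict(self, entry_addr):
        # Prediction made at call time, before the subroutine computes its TOC.
        return self.table.get(entry_addr)

    def resolve(self, entry_addr, computed_toc):
        # Called once the subroutine's own instructions produce the real value.
        # Updates the table and reports whether the earlier prediction was right.
        predicted = self.table.get(entry_addr)
        self.table[entry_addr] = computed_toc
        return predicted == computed_toc


def load_variable(predictor, entry_addr, memory, name):
    # Use the predicted TOC pointer to reach the reference data structure early.
    toc = predictor.predict(entry_addr)
    if toc is None:
        return None  # no prediction: hardware would wait for the computed value
    return memory[toc][name]
```

A misprediction would in practice require discarding work done with the wrong pointer; the sketch only reports it through the return value of `resolve`.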
-
Patent number: 11010162
Abstract: An electronic device capable of executing instructions comprises: a processing circuit; a first storage device, coupled to the processing circuit, configured to store at least one instruction and first operation data; and a second storage device, coupled to the processing circuit. When the processing circuit reads at least one of the instruction and the first operation data corresponding to the read instruction from the first storage device, and the second storage device does not store the first operation data corresponding to the read instruction, the processing circuit backs up the read first operation data to the second storage device.
Type: Grant
Filed: September 17, 2019
Date of Patent: May 18, 2021
Assignee: Realtek Semiconductor Corp.
Inventor: Yen-Ju Lu
-
Patent number: 11010161
Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array.
Type: Grant
Filed: March 31, 2019
Date of Patent: May 18, 2021
Assignee: Micron Technology, Inc.
Inventor: Tony M. Brewer
-
Patent number: 11003451
Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array.
Type: Grant
Filed: March 31, 2019
Date of Patent: May 11, 2021
Assignee: Micron Technology, Inc.
Inventor: Tony M. Brewer
-
Patent number: 10990390
Abstract: An instruction generates a value for use in processing within a computing environment. The instruction obtains a sign control associated with the instruction, and shifts an input value of the instruction in a specified direction by a selected amount to provide a result. The result is placed in a first designated location in a register, and the sign, which is based on the sign control, is placed in a second designated location of the register. The result and the sign provide a signed value to be used in processing within the computing environment.
Type: Grant
Filed: August 19, 2019
Date of Patent: April 27, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jonathan D. Bradbury, Reid T. Copeland, Silvia Melitta Mueller
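As a rough illustration of the shift-with-sign-control behavior described above, the Python sketch below assumes a hypothetical 64-bit register layout in which the shifted result occupies the upper 60 bits and a 4-bit sign code occupies the low 4 bits. The sign codes 0xC (positive) and 0xD (negative) follow the packed-decimal convention and are an assumption; the abstract does not specify the layout or encoding.

```python
POSITIVE, NEGATIVE = 0xC, 0xD  # assumed packed-decimal-style sign codes
SIGN_BITS = 4                  # assumed width of the sign field


def shift_with_sign(value, amount, left, sign_negative, width=64):
    """Shift `value`, then pack result and sign into one hypothetical register."""
    result_mask = (1 << (width - SIGN_BITS)) - 1
    # Shift in the requested direction; bits shifted past the result field are lost.
    result = (value << amount) if left else (value >> amount)
    result &= result_mask
    # The sign comes from the sign control, not from the input value.
    sign = NEGATIVE if sign_negative else POSITIVE
    # First designated location: upper bits; second: low SIGN_BITS bits.
    return (result << SIGN_BITS) | sign
```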
-
Patent number: 10990392
Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array.
Type: Grant
Filed: March 31, 2019
Date of Patent: April 27, 2021
Assignee: Micron Technology, Inc.
Inventor: Tony M. Brewer
-
Patent number: 10990391
Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. A representative system includes an interconnection network; a processor; and a plurality of configurable circuit clusters. Each configurable circuit cluster includes a plurality of configurable circuits arranged in an array; a synchronous network coupled to each configurable circuit of the array; and an asynchronous packet network coupled to each configurable circuit of the array.
Type: Grant
Filed: March 31, 2019
Date of Patent: April 27, 2021
Assignee: Micron Technology, Inc.
Inventor: Tony M. Brewer
-
Patent number: 10990398
Abstract: Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction for execution on an instruction execution pipeline, beginning execution of the first instruction, receiving one or more second instructions for execution on the instruction execution pipeline, the one or more second instructions associated with a higher priority task than the first instruction, storing a register state associated with the execution of the first instruction in one or more registers of a capture queue associated with the instruction execution pipeline, copying the register state from the capture queue to a memory, determining that the one or more second instructions have been executed, copying the register state from the memory to the one or more registers of the capture queue, and restoring the register state to the instruction execution pipeline from the capture queue.
Type: Grant
Filed: April 15, 2019
Date of Patent: April 27, 2021
Assignee: Texas Instruments Incorporated
Inventors: Timothy D. Anderson, Joseph Zbiciak, Kai Chirca
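The save/restore sequence in this abstract can be traced with a small Python model. It is a sketch under simplifying assumptions: the register state is a plain dictionary, the capture queue is a `deque`, and "memory" is a list used as a stack. The `Pipeline` class and its methods are hypothetical names chosen for illustration.

```python
from collections import deque


class Pipeline:
    """Hypothetical pipeline model with a capture queue for preemption."""

    def __init__(self):
        self.registers = {}          # in-flight state of the running task
        self.capture_queue = deque() # staging area between pipeline and memory
        self.memory = []             # backing store for saved states

    def preempt(self):
        # 1. Capture the low-priority task's register state into the queue.
        self.capture_queue.append(dict(self.registers))
        # 2. Copy it from the capture queue out to memory.
        self.memory.append(self.capture_queue.popleft())
        # The pipeline is now free for the higher-priority instructions.
        self.registers = {}

    def resume(self):
        # 3. After the high-priority work, reload the state into the queue...
        self.capture_queue.append(self.memory.pop())
        # 4. ...and restore it to the pipeline from the capture queue.
        self.registers = self.capture_queue.popleft()
```

Usage: set `registers`, call `preempt()`, run the higher-priority work with the now-empty register state, then call `resume()` to get the original state back.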
-
Patent number: 10990401
Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.
Type: Grant
Filed: April 1, 2020
Date of Patent: April 27, 2021
Assignee: Apple Inc.
Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
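The strided selection described in this abstract can be shown in a few lines of Python. The sketch assumes the first operand is stored as a flat list of equal-length sub-vectors, so a stride equal to the sub-vector length plus an `offset` picks the same element of each sub-vector. The function name and the `offset` parameter are illustrative assumptions.

```python
def strided_dot(a, b, stride, offset=0):
    """Dot product of every `stride`-th element of `a` (from `offset`) with all of `b`."""
    if stride < 1:
        raise ValueError("stride must be positive")
    selected = a[offset::stride]  # skip stride - 1 elements between picks
    if len(selected) != len(b):
        raise ValueError("selected elements of a must match the length of b")
    return sum(x * y for x, y in zip(selected, b))
```

For example, with `a = [1, 2, 3, 4, 5, 6]` viewed as three 2-element sub-vectors, `stride=2, offset=0` uses elements 1, 3, 5 and `offset=1` uses 2, 4, 6.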
-
Patent number: 10976961
Abstract: Techniques and mechanisms for circuitry of a processor to automatically provide, and perform an operation based on, metadata indicating an uninitialized memory block. In an embodiment, processor circuitry detects a software instruction which specifies a first operation to be performed based on some data at a memory block. Metadata corresponding to said data comprises an identifier of whether the data is based on an uninitialized memory condition. Processing of the instruction includes the processor circuitry automatically performing a second operation based on the identifier. The second operation is performed independent of any instruction of the application which specifies the second operation. In another embodiment, execution of the instruction (if any) is conditional upon an evaluation which is based on the state identifier, or the second operation is automatically performed based on an execution of the first instruction.
Type: Grant
Filed: December 20, 2018
Date of Patent: April 13, 2021
Assignee: Intel Corporation
Inventors: Ron Gabor, Tomer Stark, Joseph Nuzman, Ady Tal
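A minimal Python sketch of per-block "uninitialized" metadata follows. It assumes one metadata flag per memory word and models the automatically performed second operation as a callback invoked on load; `TaggedMemory` and `on_uninit` are hypothetical names, and real hardware would track metadata at its own granularity.

```python
class TaggedMemory:
    """Hypothetical memory whose words carry an 'uninitialized' metadata flag."""

    def __init__(self, size):
        self.data = [0] * size
        self.uninit = [True] * size  # every word starts uninitialized

    def store(self, addr, value):
        self.data[addr] = value
        self.uninit[addr] = False    # a write clears the metadata flag

    def load(self, addr, on_uninit):
        # First operation: the load requested by the program.
        # Second operation: triggered by the metadata alone, without any
        # instruction in the program asking for it.
        if self.uninit[addr]:
            on_uninit(addr)
        return self.data[addr]
```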
-
Patent number: 10977047
Abstract: Technical solutions are described for hazard detection of out-of-order execution of load and store instructions without using real addresses in a processing unit. An example includes an out-of-order load-store unit (LSU) for transferring data between memory and registers. The LSU detects a store-hit-load (SHL) in an out-of-order execution of instructions based only on effective addresses by: determining an effective address associated with a store instruction; determining whether a load instruction entry using said effective address is present in a load reorder queue; and indicating that an SHL has been detected based at least in part on determining that a load instruction entry using said effective address is present in the load reorder queue. The LSU, in response to detecting the SHL, flushes instructions starting from a load instruction corresponding to the load instruction entry.
Type: Grant
Filed: June 24, 2019
Date of Patent: April 13, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Bryan Lloyd, Balaram Sinharoy, Shih-Hsiung S. Tung
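The effective-address check in this abstract can be sketched in Python. Beyond what the abstract states, the sketch assumes each load reorder queue entry carries a program-order tag, so that only loads younger than the store (which may have read stale data) count as hits. `LoadEntry`, `detect_shl`, and `flush_from` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LoadEntry:
    tag: int  # program-order sequence number (assumed; larger = younger)
    ea: int   # effective address used by the load


def detect_shl(load_reorder_queue, store_tag, store_ea):
    """Return the tag of the oldest younger load the store hits, or None."""
    hits = [e.tag for e in load_reorder_queue
            if e.ea == store_ea and e.tag > store_tag]
    return min(hits) if hits else None


def flush_from(in_flight_tags, tag):
    # Flush everything from the offending load onward; keep older instructions.
    return [t for t in in_flight_tags if t < tag]
```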
-
Patent number: 10963255
Abstract: Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction configured to cause the processor to output a first data value to a first address in a first data cache, outputting, by the processor, the first data value to a second address in a second data cache, receiving a second instruction configured to cause a streaming engine associated with the processor to prefetch data from the first data cache, determining that the first data value has not been outputted from the second data cache to the first data cache, stalling execution of the second instruction, receiving an indication, from the second data cache, that the first data value has been output from the second data cache to the first data cache, and resuming execution of the second instruction based on the received indication.
Type: Grant
Filed: March 11, 2019
Date of Patent: March 30, 2021
Assignee: Texas Instruments Incorporated
Inventors: Naveen Bhoria, Kai Chirca, Timothy D. Anderson, Duc Bui, Abhijeet A. Chachad, Son Hung Tran
-
Patent number: 10963251
Abstract: There is provided an apparatus that includes a set of vector registers, each of the vector registers being arranged to store a vector comprising a plurality of portions. The set of vector registers is logically divided into a plurality of columns, each of the columns being arranged to store a same portion of each vector. The apparatus also includes register access circuitry that comprises a plurality of access blocks. Each access block is arranged to access a portion in a different column when accessing one of the vector registers than when accessing at least one other of the vector registers. The register access circuitry is arranged to simultaneously access portions in any one of: the vector registers and the columns.
Type: Grant
Filed: June 15, 2017
Date of Patent: March 30, 2021
Assignee: ARM Limited
Inventor: Thomas Christopher Grocutt
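One way to see why access blocks that reach a different column for different registers allow both row-wise and column-wise access is a skewed (diagonal) mapping. The Python sketch below assumes a square N x N arrangement in which access block `b` serves column `(b + r) mod N` of register `r`; this mapping is an illustrative assumption, not necessarily the arrangement claimed in the patent.

```python
N = 4  # assumed number of vector registers and of columns


def block_column(block, reg):
    # Assumed skewed mapping: which column access block `block` serves in register `reg`.
    return (block + reg) % N


def row_access(reg):
    # Read all portions of one register: each block hits a distinct column.
    return [(reg, block_column(b, reg)) for b in range(N)]


def column_access(col):
    # Read one column across registers: block b must use the register r with
    # (b + r) % N == col, so every block targets a different register.
    return [((col - b) % N, col) for b in range(N)]
```

Because each block serves exactly one (register, column) pair in either pattern, no two blocks conflict, so a full register or a full column can be accessed in one step.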
-
Patent number: 10963254
Abstract: A streaming engine in a system receives a first set of stream parameters into a queue to define a first stream along with an indication of either a queue mode of operation or a speculative mode of operation for the first stream. Acquisition of the first stream then begins. At some point, a second set of stream parameters is received into the queue to define a second stream. When the queue mode of operation was specified for the first stream, the second set of parameters is queued and acquisition of the second stream is delayed until completion of acquisition of the first stream. When the speculative mode of operation was specified for the first stream, acquisition of the first stream is canceled upon receipt of the second set of stream parameters and acquisition of the second stream begins immediately.
Type: Grant
Filed: March 2, 2019
Date of Patent: March 30, 2021
Assignee: Texas Instruments Incorporated
Inventors: Timothy David Anderson, Jonathan (Son) Hung Tran, Joseph Raymond Michael Zbiciak
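The difference between the two modes can be captured in a small Python state machine. It is a sketch only: stream parameters are opaque values, and the `StreamingEngine` class with its `define_stream` and `finish_active` methods are hypothetical names, not a TI API.

```python
from collections import deque


class StreamingEngine:
    """Hypothetical model of queue-mode vs speculative-mode stream handling."""

    def __init__(self):
        self.active = None      # (params, mode) of the stream being acquired
        self.pending = deque()  # stream definitions waiting behind a queue-mode stream

    def define_stream(self, params, mode):
        if self.active is None:
            self.active = (params, mode)           # start acquisition immediately
        elif self.active[1] == "queue":
            self.pending.append((params, mode))    # wait for the current stream
        else:
            # Speculative mode: cancel the current stream, start the new one now.
            self.active = (params, mode)

    def finish_active(self):
        # Current acquisition completed; start the next queued stream, if any.
        self.active = self.pending.popleft() if self.pending else None
```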
-
Patent number: 10949211
Abstract: Execution of multiple execution streams is scheduled on a plurality of coprocessors. A software layer located logically between applications and the coprocessors determines dependencies within the execution streams, each said dependency being a condition in one of the execution streams that must be satisfied in order for execution of at least one other of the execution streams to proceed on corresponding ones of the coprocessors. The dependencies are then represented in a data structure and an optimized execution schedule is determined for the execution streams according to the dependencies. Simultaneous execution of a plurality of the execution streams is then dynamically reordered according to the optimized execution schedule.
Type: Grant
Filed: December 20, 2018
Date of Patent: March 16, 2021
Assignee: VMware, Inc.
Inventors: Mazhar Memon, Aidan Cully
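A dependency-driven schedule of the kind described above can be sketched with Python's standard `graphlib` module (Python 3.9+). The sketch assumes dependencies are given as a mapping from each execution stream to the streams it waits on, and it groups streams into "waves" that could run simultaneously. A plain topological layering stands in for the patent's optimized schedule, which the abstract does not detail.

```python
from graphlib import TopologicalSorter


def schedule_waves(deps):
    """Group streams into waves; every stream's dependencies finish in earlier waves.

    `deps` maps a stream name to the set of streams it depends on.
    """
    ts = TopologicalSorter(deps)
    ts.prepare()  # also raises graphlib.CycleError on circular dependencies
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # all streams whose dependencies are met
        waves.append(ready)
        ts.done(*ready)                 # mark the wave complete
    return waves
```

For instance, if stream C waits on A and B, and D waits on C, the waves are A and B together, then C, then D.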
-
Patent number: 10949206
Abstract: Software instructions are executed on a processor within a computer system to configure a streaming engine to operate in either a linear mode or a transpose mode. A stream of addresses is generated using an address generator, in which the stream of addresses includes consecutive nested loop iterations for at least a first loop and a second loop. While in the linear mode, the first loop is treated as an inner loop. While in the transpose mode, the second loop is treated as the inner loop. A matrix can be fetched from memory in the linear mode to provide row-wise vectors. A matrix can be fetched from the memory in the transpose mode to provide column-wise vectors.
Type: Grant
Filed: February 22, 2019
Date of Patent: March 16, 2021
Assignee: Texas Instruments Incorporated
Inventors: Jonathan (Son) Hung Tran, Joseph Raymond Michael Zbiciak
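The linear/transpose distinction comes down to which loop of the address generator is innermost. The Python sketch below assumes a row-major matrix with a fixed row pitch; `count0`, `count1`, and `pitch1` are hypothetical parameter names for the first loop's iteration count, the second loop's iteration count, and the second loop's address step.

```python
def stream_addresses(base, count0, count1, pitch1, elem_size=1, transpose=False):
    """Yield addresses for a two-level nested loop over a row-major matrix.

    Loop 0 steps by `elem_size` along a row; loop 1 steps by `pitch1` between rows.
    """
    if not transpose:
        # Linear mode: loop 0 is the inner loop, giving row-wise order.
        for j in range(count1):
            for i in range(count0):
                yield base + j * pitch1 + i * elem_size
    else:
        # Transpose mode: loop 1 is the inner loop, giving column-wise order.
        for i in range(count0):
            for j in range(count1):
                yield base + j * pitch1 + i * elem_size
```

For a 2 x 3 matrix at address 0 with a row pitch of 3, linear mode yields 0, 1, 2, 3, 4, 5, and transpose mode yields 0, 3, 1, 4, 2, 5.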
-
Patent number: 10949251
Abstract: Embodiments described herein provide a system, method, and apparatus to accelerate reduce operations in a graphics processor. One embodiment provides an apparatus including one or more processors, the one or more processors including a first logic unit to perform a merged write, barrier, and read operation in response to a barrier synchronization request from a set of threads in a work group, synchronize the set of threads, and report a result of an operation specified in association with the barrier synchronization request.
Type: Grant
Filed: April 1, 2016
Date of Patent: March 16, 2021
Assignee: INTEL CORPORATION
Inventors: Yong Jiang, Yuanyuan Li, Jianghong Du, Kuilin Chen, Thomas A. Tetzlaff
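The merged write, barrier, and read pattern can be imitated in software with Python's `threading.Barrier`, whose optional `action` callable runs in one thread after all parties arrive and before any is released. This is a CPU-side analogy under that assumption, not the graphics-processor logic unit described in the patent; `barrier_reduce` is a hypothetical helper.

```python
import threading


def barrier_reduce(values, op):
    """Each thread writes a value, meets at a barrier, then reads the reduced result."""
    n = len(values)
    slots = [None] * n
    result = {}
    outputs = [None] * n

    def combine():
        # Runs exactly once, after every thread has written its slot.
        acc = slots[0]
        for v in slots[1:]:
            acc = op(acc, v)
        result["value"] = acc

    barrier = threading.Barrier(n, action=combine)

    def worker(i):
        slots[i] = values[i]            # write
        barrier.wait()                  # barrier (reduction happens here)
        outputs[i] = result["value"]    # read the shared result

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outputs
```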
-
Patent number: 10942741
Abstract: Software instructions are executed on a processor within a computer system to configure a streaming engine to operate in either a linear mode or a transpose mode. A stream of addresses is generated using an address generator, in which the stream of addresses includes consecutive nested loop iterations for at least a first loop and a second loop. While in the linear mode, the first loop is treated as an inner loop. While in the transpose mode, the second loop is treated as the inner loop. A matrix can be fetched from memory in the linear mode to provide row-wise vectors. A matrix can be fetched from the memory in the transpose mode to provide column-wise vectors. Local storage on the streaming engine is organized as sectors based on the number of rows in the matrix to allow overlapping transposition processing and to minimize memory accesses.
Type: Grant
Filed: February 22, 2019
Date of Patent: March 9, 2021
Assignee: Texas Instruments Incorporated
Inventors: Jonathan (Son) Hung Tran, Joseph Raymond Michael Zbiciak
-
Patent number: 10936315
Abstract: In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
Type: Grant
Filed: December 31, 2018
Date of Patent: March 2, 2021
Assignee: Texas Instruments Incorporated
Inventors: Duc Quang Bui, Joseph Raymond Michael Zbiciak
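The predicate-controlled loop exit can be illustrated with a Python sketch that sums a list in fixed-width "vectors". It assumes a hypothetical vector width of 4 and builds each predicate as a per-lane validity mask; the loop keeps iterating while any lane is valid and exits as soon as a vector has no valid elements, with no separate trip count.

```python
VECTOR_WIDTH = 4  # assumed number of lanes per data vector


def predicated_sum(data):
    """Sum `data` vector by vector, exiting when a vector has no valid lanes."""
    total = 0
    offset = 0
    while True:
        chunk = data[offset:offset + VECTOR_WIDTH]
        # Vector predicate: True for lanes that hold valid data elements.
        predicate = [lane < len(chunk) for lane in range(VECTOR_WIDTH)]
        if not any(predicate):
            break  # no valid elements: exit the loop
        # Pad invalid lanes; the predicate masks them out of the computation.
        lanes = chunk + [0] * (VECTOR_WIDTH - len(chunk))
        total += sum(x for x, valid in zip(lanes, predicate) if valid)
        offset += VECTOR_WIDTH
    return total
```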