Patents Examined by Daniel H. Pan

Computation engine with strided dot product

Patent number: 10642620

Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.

Type: Grant

Filed: April 5, 2018

Date of Patent: May 5, 2020

Assignee: Apple Inc.

Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
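The strided selection described in the abstract above can be illustrated with a short sketch. It is not taken from the patent: the function name `strided_dot`, the flat list layout, and the parameters `inner_len` and `select` are assumptions made for illustration. The first operand is treated as a flat vector holding inner vectors of length `inner_len`, and the stride picks element `select` of each inner vector.

```python
from typing import Sequence


def strided_dot(first: Sequence[float], second: Sequence[float],
                inner_len: int, select: int) -> float:
    """Dot product of `second` with one element from each inner vector of `first`.

    `first` is assumed to be a flat sequence of inner vectors, each of
    length `inner_len`; `select` is the index taken from every inner vector.
    """
    if not 0 <= select < inner_len:
        raise ValueError("select must index into an inner vector")
    # The stride skips inner_len - 1 elements between selected elements.
    subset = first[select::inner_len]
    if len(subset) != len(second):
        raise ValueError("second operand must match the number of inner vectors")
    return sum(a * b for a, b in zip(subset, second))


# Four inner vectors of length 2: (1, 2), (3, 4), (5, 6), (7, 8).
first_operand = [1, 2, 3, 4, 5, 6, 7, 8]
second_operand = [1, 1, 1, 1]
result = strided_dot(first_operand, second_operand, inner_len=2, select=1)
```

Here `result` sums the second element of each inner vector (2 + 4 + 6 + 8), while every element of the second operand takes part in the product.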
Scatter reduction instruction

Patent number: 10635447

Abstract: Single Instruction, Multiple Data (SIMD) technologies are described. A processing device can include a processor core and a memory. The processor core can receive, from a software application, a request to perform an operation on a first set of variables that includes a first input value and a register value and perform the operation on a second set of variables that includes a second input value and the first register value. The processor core can vectorize the operation on the first set of variables and the second set of variables. The processor core can perform the operation on the first set of variables and the second set of variables in parallel to obtain a first operation value and a second operation value. The processor core can perform a horizontal add operation on the first operation value and the second operation value and write the result to memory.

Type: Grant

Filed: December 20, 2018

Date of Patent: April 28, 2020

Assignee: Intel Corporation

Inventors: Jun Jin, Elmoustapha Ould-Ahmed-Vall
Executing load-store operations without address translation hardware per load-store unit port

Patent number: 10628158

Abstract: Technical solutions are described for out-of-order (OoO) execution of one or more instructions by a processing unit, including receiving, by a load-store unit (LSU) of the processing unit, an OoO window of instructions including a plurality of instructions to be executed OoO, and issuing, by the LSU, instructions from the OoO window. The issuing includes selecting an instruction from the OoO window, the instruction using an effective address. Further, in response to the instruction being a load instruction, it is determined whether the effective address is present in an effective address directory (EAD). In response to the effective address being present in the EAD, the load instruction is issued using the effective address. Further, in response to the instruction being a store instruction, a real address mapped to the effective address is determined from an effective-real translation (ERT) table, and the store instruction is issued using the real address.

Type: Grant

Filed: November 29, 2017

Date of Patent: April 21, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Christopher Gonzalez, Bryan Lloyd, Balaram Sinharoy
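The issue decision in the abstract above can be sketched as a small function. This is an illustrative model only: the `Instruction` class, the `issue` function, and the use of a Python set for the EAD and a dict for the ERT table are all assumptions. The abstract does not say what happens on an EAD or ERT miss, so the sketch simply returns `None` in those cases.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class Instruction:
    kind: str               # "load" or "store" (hypothetical encoding)
    effective_address: int


def issue(instr: Instruction, ead: Set[int],
          ert: Dict[int, int]) -> Optional[Tuple[str, str, int]]:
    """Return (kind, address type, address) for an issued instruction."""
    if instr.kind == "load":
        if instr.effective_address in ead:
            # Load hits in the EAD: issue with the effective address itself.
            return ("load", "effective", instr.effective_address)
        return None  # EAD miss: handling is not described in the abstract
    if instr.kind == "store":
        real = ert.get(instr.effective_address)
        if real is None:
            return None  # ERT miss: handling is not described in the abstract
        # Store: issue with the real address mapped in the ERT table.
        return ("store", "real", real)
    raise ValueError(f"unknown instruction kind: {instr.kind}")


load_issue = issue(Instruction("load", 0x10), ead={0x10}, ert={})
store_issue = issue(Instruction("store", 0x20), ead=set(), ert={0x20: 0x9000})
```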
Dynamic acceleration of data processor operations using data-flow analysis

Patent number: 10620954

Abstract: A method and apparatus are provided for dynamically determining when an operation, specified by one or more instructions in a data processing system, is suitable for accelerated execution. Data indicators are maintained, for data registers of the system, that indicate when data-flow from a register derives from a restricted source. In addition, instruction predicates are provided for instructions to indicate which instructions are capable of accelerated execution. From the data indicators and the instruction predicates, the microarchitecture of the data processing system determines, dynamically, when the operation is a thread-restricted function and suitable for accelerated execution in a hardware accelerator. The thread-restricted function may be executed on a hardware processor, such as a vector, neuromorphic or other processor.

Type: Grant

Filed: March 29, 2018

Date of Patent: April 14, 2020

Assignee: Arm Limited

Inventors: Jonathan Curtis Beard, Curtis Glenn Dunham, Alejandro Rico Carro
Predicting a table of contents pointer value responsive to branching to a subroutine

Patent number: 10620955

Abstract: Predicting a Table of Contents (TOC) pointer value responsive to branching to a subroutine. A subroutine is called from a calling module executing on a processor. Based on calling the subroutine, a value of a pointer to a reference data structure, such as a TOC, is predicted. The predicting is performed prior to executing a sequence of one or more instructions in the subroutine to compute the value. The value that is predicted is used to access the reference data structure to obtain a variable value for a variable of the subroutine.

Type: Grant

Filed: September 19, 2017

Date of Patent: April 14, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Michael K. Gschwind, Valentina Salapura
Apparatus and method for vector compression

Patent number: 10623015

Abstract: An apparatus and method are described for performing vector compression. For example, one embodiment of a processor comprises: vector compression logic to compress a source vector comprising a plurality of valid data elements and invalid data elements to generate a destination vector in which valid data elements are stored contiguously on one side of the destination vector, the vector compression logic to utilize a bit mask associated with the source vector and comprising a plurality of bits, each bit corresponding to one of the plurality of data elements of the source vector and indicating whether the data element comprises a valid data element or an invalid data element, the vector compression logic to utilize indices of the bit mask and associated bit values of the bit mask to generate a control vector; and shuffle logic to shuffle/permute the data elements of the source vector to the destination vector in accordance with the control vector.

Type: Grant

Filed: March 15, 2018

Date of Patent: April 14, 2020

Assignee: Intel Corporation

Inventors: Simon Rubanovich, David M. Russinoff, Amit Gradstein, John W. O'Leary, Zeev Sperber
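A software sketch of the two stages named in the abstract above, control vector generation from the bit mask followed by a shuffle, might look like the following. The function names, the choice of the low-index side as the packed side, and the `fill` value for unused destination slots are assumptions, not details from the patent.

```python
from typing import List, Sequence


def build_control_vector(mask: Sequence[int]) -> List[int]:
    """Collect the indices of mask bits that mark valid elements."""
    return [index for index, bit in enumerate(mask) if bit]


def compress(source: Sequence[int], mask: Sequence[int], fill: int = 0) -> List[int]:
    """Pack valid elements of `source` contiguously at the start of the result."""
    if len(source) != len(mask):
        raise ValueError("mask needs one bit per source element")
    control = build_control_vector(mask)
    # Shuffle step: each destination slot reads the source element named
    # by the control vector; remaining slots receive the fill value.
    packed = [source[index] for index in control]
    return packed + [fill] * (len(source) - len(packed))


compressed = compress([10, 20, 30, 40], mask=[1, 0, 1, 1])
```

With this input the control vector is `[0, 2, 3]`, so the three valid elements end up adjacent and the last slot holds the fill value.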
Method for forming constant extensions in the same execute packet in a VLIW processor

Patent number: 10620957

Abstract: In a very long instruction word (VLIW) central processing unit, instructions are grouped into execute packets that execute in parallel. A constant may be specified or extended by bits in a constant extension instruction in the same execute packet. If an instruction includes an indication of constant extension, the decoder employs bits of a constant extension instruction to extend the constant of an immediate field. Two or more constant extension slots are permitted in each execute packet, each extending constants for a different predetermined subset of functional unit instructions. In an alternative embodiment, more than one functional unit may have constants extended from the same constant extension instruction employing the same extended bits. A long extended constant may be formed using the extension bits of two constant extension instructions.

Type: Grant

Filed: October 22, 2015

Date of Patent: April 14, 2020

Assignee: TEXAS INSTRUMENTS INCORPORATED

Inventors: Timothy David Anderson, Duc Quang Bui, Joseph Raymond Zbiciak
Branch target address provision

Patent number: 10613869

Abstract: An apparatus and method of operating an apparatus are provided. The apparatus comprises execution circuitry to perform data processing operations specified by instructions and instruction retrieval circuitry to retrieve the instructions from memory, wherein the instructions comprise branch instructions. The instruction retrieval circuitry comprises branch target storage to store target instruction addresses for the branch instructions and branch target prefetch circuitry to prepopulate the branch target storage with predicted target instruction addresses for the branch instructions. An improved hit rate in the branch target storage may thereby be supported.

Type: Grant

Filed: March 29, 2018

Date of Patent: April 7, 2020

Assignee: ARM Limited

Inventors: Peter Richard Greenhalgh, Frederic Claude Marie Piry, Jose Gonzalez-Gonzalez
Dual data streams sharing dual level two cache access ports to maximize bandwidth utilization

Patent number: 10606598

Abstract: A streaming engine employed in a digital data processor specifies fixed first and second read only data streams. Corresponding stream address generators produce addresses of data elements of the two streams. Corresponding stream head registers store data elements next to be supplied to functional units for use as operands. The two streams share two memory ports. A toggling preference of stream to port ensures fair allocation. The arbiters permit one stream to borrow the other's interface when the other interface is idle. Thus one stream may issue two memory requests, one from each memory port, if the other stream is idle. This spreads the bandwidth demand for each stream across both interfaces, ensuring neither interface becomes a bottleneck.

Type: Grant

Filed: September 24, 2018

Date of Patent: March 31, 2020

Assignee: TEXAS INSTRUMENTS INCORPORATED

Inventors: Joseph Zbiciak, Timothy Anderson
Cache preload operations using streaming engine

Patent number: 10606596

Abstract: A stream of data is accessed from a memory system using a stream of addresses generated in a first mode of operating a streaming engine in response to executing a first stream instruction. A block cache preload operation is performed on a cache in the memory using a block of addresses generated in a second mode of operating the streaming engine in response to executing a second stream instruction.

Type: Grant

Filed: November 28, 2018

Date of Patent: March 31, 2020

Assignee: TEXAS INSTRUMENTS INCORPORATED

Inventors: Joseph Raymond Michael Zbiciak, Timothy David Anderson, Jonathan (Son) Hung Tran, Kai Chirca, Daniel Wu, Abhijeet Ashok Chachad, David M. Thompson
Real time stack protection

Patent number: 10606771

Abstract: Methods, circuitries, and systems for real-time protection of a stack are provided. A stack protection circuitry includes interface circuitry and computation circuitry. The interface circuitry is configured to receive a return instruction from a central processing unit (CPU). The computation circuitry is configured to, in response to the return instruction, generate protection data that i) identifies a new topmost return address location that is below a current protected topmost return address location and ii) specifies read only access for the new topmost return address location. The interface circuitry is configured to provide the protection data to a memory protection unit to cause the memory protection unit to enforce a read only access restriction on the new topmost return address location.

Type: Grant

Filed: January 22, 2018

Date of Patent: March 31, 2020

Assignee: Infineon Technologies AG

Inventors: Sanjay Trivedi, Ramesh Babu
Apparatus and methods for vector operations

Patent number: 10599745

Abstract: Aspects for vector operations in a neural network are described herein. The aspects may include a vector caching unit configured to store a first vector and a second vector, wherein the first vector includes one or more first elements and the second vector includes one or more second elements. The aspects may further include one or more adders and a combiner. The one or more adders may be configured to respectively add each of the first elements to a corresponding one of the second elements to generate one or more addition results. The combiner may be configured to combine the one or more addition results into an output vector.

Type: Grant

Filed: October 26, 2018

Date of Patent: March 24, 2020

Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED

Inventors: Jinhua Tao, Tian Zhi, Shaoli Liu, Tianshi Chen, Yunji Chen
Managing backend resources via frontend steering or stalls

Patent number: 10599431

Abstract: Embodiments of the present invention provide a system for balancing a global completion table (GCT) in a microprocessor via frontend steering or stalls. A non-limiting example of the system includes an instruction dispatch unit (IDU) that includes an instruction queue and the system includes an instruction sequencing unit (ISU) that includes a GCT having a first area and a second area. The IDU is configured to determine whether a full group of instructions exists in the instruction queue and to determine whether additional instructions will be received by the instruction queue in a subsequent cycle. The IDU is configured to stall the instruction queue for at least one cycle until a full group of instructions is accumulated at the instruction queue upon determining that additional instructions will be received by the instruction queue in a subsequent cycle.

Type: Grant

Filed: July 17, 2017

Date of Patent: March 24, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Gregory W. Alexander, David S. Hutton, Christian Jacobi, Edward T. Malley, Anthony Saporito
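The stall condition in the abstract above reduces to a two-input decision, sketched here under stated assumptions: the function name `should_stall`, the integer `group_size`, and the boolean `more_expected` are hypothetical stand-ins for whatever signals the IDU actually uses.

```python
def should_stall(queued: int, group_size: int, more_expected: bool) -> bool:
    """Decide whether to hold the instruction queue for another cycle.

    queued        -- instructions currently in the queue
    group_size    -- instructions needed for a full group (assumed parameter)
    more_expected -- whether more instructions arrive in a subsequent cycle
    """
    has_full_group = queued >= group_size
    # Stall only when a full group is not yet available and waiting can
    # actually help, because more instructions are on the way.
    return (not has_full_group) and more_expected


stall_partial = should_stall(queued=2, group_size=4, more_expected=True)
stall_full = should_stall(queued=4, group_size=4, more_expected=True)
```

In this sketch a partial group with nothing more on the way is not stalled, since waiting would not fill the group.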
Cache management operations using streaming engine

Patent number: 10599433

Abstract: A stream of data is accessed from a memory system using a stream of addresses generated in a first mode of operating a streaming engine in response to executing a first stream instruction. A block cache management operation is performed on a cache in the memory using a block of addresses generated in a second mode of operating the streaming engine in response to executing a second stream instruction.

Type: Grant

Filed: November 28, 2018

Date of Patent: March 24, 2020

Assignee: TEXAS INSTRUMENTS INCORPORATED

Inventors: Joseph Raymond Michael Zbiciak, Timothy David Anderson, Jonathan (Son) Hung Tran, Kai Chirca, Daniel Wu, Abhijeet Ashok Chachad, David M. Thompson
Apparatus and methods for vector operations

Patent number: 10592582

Abstract: Aspects for vector operations in a neural network are described herein. The aspects may include a vector caching unit configured to store a first vector and a second vector, wherein the first vector includes one or more first elements and the second vector includes one or more second elements. The aspects may further include one or more adders and a combiner. The one or more adders may be configured to respectively add each of the first elements to a corresponding one of the second elements to generate one or more addition results. The combiner may be configured to combine the one or more addition results into an output vector.

Type: Grant

Filed: October 26, 2018

Date of Patent: March 17, 2020

Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED

Inventors: Jinhua Tao, Tian Zhi, Shaoli Liu, Tianshi Chen, Yunji Chen
Streaming engine with cache-like stream data storage and lifetime tracking

Patent number: 10592243

Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores data elements next to be supplied to functional units for use as operands. The streaming engine fetches stream data ahead of use by the central processing unit core in a stream buffer constructed like a cache. The stream buffer cache includes plural cache lines, each including tag bits, at least one valid bit and data bits. Cache lines are allocated to store newly fetched stream data. Cache lines are deallocated upon consumption of the data by a central processing unit core functional unit. Instructions preferably include operand fields with a first subset of codings corresponding to registers, a stream read only operand coding and a stream read and advance operand coding.

Type: Grant

Filed: September 10, 2018

Date of Patent: March 17, 2020

Assignee: TEXAS INSTRUMENTS INCORPORATED

Inventor: Joseph Zbiciak
Apparatus and methods for matrix multiplication

Patent number: 10592241

Abstract: Aspects for matrix multiplication in a neural network are described herein. The aspects may include a master computation module configured to receive a first matrix and transmit a row vector of the first matrix. In addition, the aspects may include one or more slave computation modules respectively configured to store a column vector of a second matrix, receive the row vector of the first matrix, and multiply the row vector of the first matrix with the stored column vector of the second matrix to generate a result element. Further, the aspects may include an interconnection unit configured to combine the one or more result elements generated respectively by the one or more slave computation modules to generate a row vector of a result matrix and transmit the row vector of the result matrix to the master computation module.

Type: Grant

Filed: October 25, 2018

Date of Patent: March 17, 2020

Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED

Inventors: Xiao Zhang, Shaoli Liu, Tianshi Chen, Yunji Chen
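The data flow in the abstract above (a master sending rows, slaves each holding one column, and an interconnection step gathering result elements) can be mimicked in software. The class `SlaveModule` and the function `multiply` are hypothetical names, and the sequential loop stands in for hardware that would run the slaves in parallel.

```python
from typing import List, Sequence


class SlaveModule:
    """Holds one column of the second matrix and produces one result element."""

    def __init__(self, column: Sequence[float]) -> None:
        self.column = list(column)

    def compute(self, row: Sequence[float]) -> float:
        # Multiply the received row vector with the stored column vector.
        return sum(r * c for r, c in zip(row, self.column))


def multiply(first: List[List[float]], second: List[List[float]]) -> List[List[float]]:
    """Row-by-row product of `first` and `second` using one slave per column."""
    slaves = [SlaveModule(column) for column in zip(*second)]
    result = []
    for row in first:  # the master transmits one row vector at a time
        # Interconnection step: gather one element per slave into a result row.
        result.append([slave.compute(row) for slave in slaves])
    return result


product = multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```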
Method and system for implementing recovery from speculative forwarding miss-predictions/errors resulting from load store reordering and optimization

Patent number: 10592300

Abstract: A method for forwarding data from store instructions to a corresponding load instruction in an out of order processor. The method includes accessing an incoming sequence of instructions; reordering the instructions in accordance with processor resources for dispatch and execution; and ensuring a closest earlier store in machine order to a corresponding load, by determining that: if said store has an actual age but said corresponding load does not have an actual age, then said store is earlier than said corresponding load; if said corresponding load has an actual age but said store does not have an actual age, then said corresponding load is earlier than said store; if neither said corresponding load nor said store has an actual age, then a virtual identifier table is used to determine which is earlier; and if both said corresponding load and said store have actual ages, then the actual ages are used to determine which is earlier.

Type: Grant

Filed: February 14, 2018

Date of Patent: March 17, 2020

Assignee: Intel Corporation

Inventor: Mohammad Abdallah
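The four ordering cases listed in the abstract above map directly onto a small comparison function. The sketch below is an assumption-laden illustration: `None` stands for "no actual age", smaller numbers are taken to mean earlier, and the virtual identifier table is modeled as a dict from hypothetical identifiers to positions.

```python
from typing import Dict, Optional


def store_is_earlier(store_age: Optional[int], load_age: Optional[int],
                     store_id: str, load_id: str,
                     virtual_table: Dict[str, int]) -> bool:
    """Return True when the store precedes the load in machine order."""
    if store_age is not None and load_age is None:
        return True   # only the store has an actual age: store is earlier
    if load_age is not None and store_age is None:
        return False  # only the load has an actual age: load is earlier
    if store_age is None and load_age is None:
        # Neither has an actual age: consult the virtual identifier table.
        return virtual_table[store_id] < virtual_table[load_id]
    # Both have actual ages: compare them directly.
    return store_age < load_age


only_store_aged = store_is_earlier(5, None, "s0", "l0", {})
both_virtual = store_is_earlier(None, None, "s0", "l0", {"s0": 1, "l0": 2})
```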
Streaming engine with error detection, correction and restart

Patent number: 10592339

Abstract: Disclosed embodiments relate to a streaming engine employed in, for example, a digital signal processor. A fixed data stream sequence including plural nested loops is specified by a control register. The streaming engine includes an address generator producing addresses of data elements and a stream head register storing data elements next to be supplied as operands. The streaming engine fetches stream data ahead of use by the central processing unit core in a stream buffer. Parity bits are formed upon storage of data in the stream buffer and are stored with the corresponding data. Upon transfer to the stream head register a second parity is calculated and compared with the stored parity. The streaming engine signals a parity fault if the parities do not match. The streaming engine preferably restarts fetching the data stream at the data element generating a parity fault.

Type: Grant

Filed: September 17, 2018

Date of Patent: March 17, 2020

Assignee: TEXAS INSTRUMENTS INCORPORATED

Inventors: Joseph Zbiciak, Timothy Anderson
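The store-time and transfer-time parity check in the abstract above can be modeled in a few lines. Everything named here is hypothetical: the `StreamBuffer` class, the single even-parity bit per word, and the `flip_bit` helper, which exists only to simulate corruption. The restart of fetching after a fault is not modeled.

```python
from typing import Dict, Tuple


class ParityFault(Exception):
    """Raised when the recomputed parity differs from the stored parity."""


def parity(word: int) -> int:
    """Parity bit of a non-negative integer: 1 if it has an odd number of set bits."""
    return bin(word).count("1") & 1


class StreamBuffer:
    def __init__(self) -> None:
        self._entries: Dict[int, Tuple[int, int]] = {}

    def store(self, index: int, word: int) -> None:
        # Parity is formed on storage and kept alongside the data.
        self._entries[index] = (word, parity(word))

    def flip_bit(self, index: int, bit: int) -> None:
        # Illustration-only helper: corrupt the data, leave stored parity alone.
        word, stored = self._entries[index]
        self._entries[index] = (word ^ (1 << bit), stored)

    def transfer_to_head(self, index: int) -> int:
        word, stored = self._entries[index]
        # A second parity is computed on transfer and compared with the stored one.
        if parity(word) != stored:
            raise ParityFault(f"parity mismatch at element {index}")
        return word


buffer = StreamBuffer()
buffer.store(0, 0b1011)
clean_word = buffer.transfer_to_head(0)
```

A single parity bit detects any odd number of flipped bits in a word; an even number of flips would pass unnoticed in this simplified model.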
Reconfigurable interconnected programmable processors

Patent number: 10592444

Abstract: A plurality of software programmable processors is disclosed. The software programmable processors are controlled by rotating circular buffers. A first processor and a second processor within the plurality of software programmable processors are individually programmable. The first processor within the plurality of software programmable processors is coupled to neighbor processors within the plurality of software programmable processors. The first processor sends and receives data from the neighbor processors. The first processor and the second processor are configured to operate on a common instruction cycle. An output of the first processor from a first instruction cycle is an input to the second processor on a subsequent instruction cycle.

Type: Grant

Filed: March 3, 2017

Date of Patent: March 17, 2020

Assignee: Wave Computing, Inc.

Inventors: Christopher John Nicol, Samit Chaudhuri, Radoslav Danilak
