Patents Examined by Daniel H. Pan
  • Patent number: 10642620
    Abstract: In an embodiment, a computation engine may perform dot product computations on input vectors. The dot product operation may have a first operand and a second operand, and the dot product may be performed on a subset of the vector elements in the first operand and each of the vector elements in the second operand. The subset of vector elements may be separated in the first operand by a stride that skips one or more elements between each element to which the dot product operation is applied. More particularly, in an embodiment, the input operands of the dot product operation may be a first vector having second vectors as elements, and the stride may select a specified element of each second vector.
    Type: Grant
    Filed: April 5, 2018
    Date of Patent: May 5, 2020
    Assignee: Apple Inc.
    Inventors: Tal Uliel, Eric Bainville, Jeffry E. Gonion, Ali Sazegari
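The strided selection described in the abstract above can be illustrated with a minimal sketch. It assumes the first operand is stored as a flat list of equal-length sub-vectors; the function name `strided_dot` and the `offset` parameter are hypothetical and do not come from the patent.

```python
def strided_dot(first, second, stride, offset=0):
    """Dot product of a strided subset of `first` with all of `second`.

    `first` is assumed to be a flat list of sub-vectors, each `stride`
    elements long; `offset` picks which element of each sub-vector is used.
    """
    selected = first[offset::stride]  # skip stride-1 elements between picks
    if len(selected) != len(second):
        raise ValueError("selected subset and second operand differ in length")
    return sum(a * b for a, b in zip(selected, second))


# Three sub-vectors of two elements each: (1, 10), (2, 20), (3, 30).
first = [1, 10, 2, 20, 3, 30]
second = [1, 1, 1]
print(strided_dot(first, second, stride=2, offset=1))
```

With `offset=1` the call uses only the second element of each sub-vector; every element of `second` takes part in the product.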
  • Patent number: 10635447
    Abstract: Single Instruction, Multiple Data (SIMD) technologies are described. A processing device can include a processor core and a memory. The processor core can receive, from a software application, a request to perform an operation on a first set of variables that includes a first input value and a register value and perform the operation on a second set of variables that includes a second input value and the first register value. The processor core can vectorize the operation on the first set of variables and the second set of variables. The processor core can perform the operation on the first set of variables and the second set of variables in parallel to obtain a first operation value and a second operation value. The processor core can perform a horizontal add operation on the first operation value and the second operation value and write the result to memory.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: April 28, 2020
    Assignee: Intel Corporation
    Inventors: Jun Jin, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 10628158
    Abstract: Technical solutions are described for out-of-order (OoO) execution of one or more instructions by a processing unit, including receiving, by a load-store unit (LSU) of the processing unit, an OoO window of instructions including a plurality of instructions to be executed OoO, and issuing, by the LSU, instructions from the OoO window. The issuing includes selecting an instruction from the OoO window, the instruction using an effective address. Further, in response to the instruction being a load instruction, it is determined whether the effective address is present in an effective address directory (EAD). In response to the effective address being present in the EAD, the load instruction is issued using the effective address. Further, in response to the instruction being a store instruction, a real address mapped to the effective address is determined from an effective-real translation (ERT) table, and the store instruction is issued using the real address.
    Type: Grant
    Filed: November 29, 2017
    Date of Patent: April 21, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Christopher Gonzalez, Bryan Lloyd, Balaram Sinharoy
  • Patent number: 10620954
    Abstract: A method and apparatus are provided for dynamically determining when an operation, specified by one or more instructions in a data processing system, is suitable for accelerated execution. Data indicators are maintained, for data registers of the system, that indicate when data-flow from a register derives from a restricted source. In addition, instruction predicates are provided for instructions to indicate which instructions are capable of accelerated execution. From the data indicators and the instruction predicates, the microarchitecture of the data processing system determines, dynamically, when the operation is a thread-restricted function and suitable for accelerated execution in a hardware accelerator. The thread-restricted function may be executed on a hardware processor, such as a vector, neuromorphic or other processor.
    Type: Grant
    Filed: March 29, 2018
    Date of Patent: April 14, 2020
    Assignee: Arm Limited
    Inventors: Jonathan Curtis Beard, Curtis Glenn Dunham, Alejandro Rico Carro
  • Patent number: 10620955
    Abstract: Predicting a Table of Contents (TOC) pointer value responsive to branching to a subroutine. A subroutine is called from a calling module executing on a processor. Based on calling the subroutine, a value of a pointer to a reference data structure, such as a TOC, is predicted. The predicting is performed prior to executing a sequence of one or more instructions in the subroutine to compute the value. The value that is predicted is used to access the reference data structure to obtain a variable value for a variable of the subroutine.
    Type: Grant
    Filed: September 19, 2017
    Date of Patent: April 14, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 10623015
    Abstract: An apparatus and method are described for performing vector compression. For example, one embodiment of a processor comprises: vector compression logic to compress a source vector comprising a plurality of valid data elements and invalid data elements to generate a destination vector in which valid data elements are stored contiguously on one side of the destination vector, the vector compression logic to utilize a bit mask associated with the source vector and comprising a plurality of bits, each bit corresponding to one of the plurality of data elements of the source vector and indicating whether the data element comprises a valid data element or an invalid data element, the vector compression logic to utilize indices of the bit mask and associated bit values of the bit mask to generate a control vector; and shuffle logic to shuffle/permute the data elements of the source vector to the destination vector in accordance with the control vector.
    Type: Grant
    Filed: March 15, 2018
    Date of Patent: April 14, 2020
    Assignee: Intel Corporation
    Inventors: Simon Rubanovich, David M. Russinoff, Amit Gradstein, John W. O'Leary, Zeev Sperber
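As a rough software analogy for the mask-driven compression described above, the sketch below builds a control vector from the indices of the set mask bits and then permutes the source with it. The helper names and the zero fill value for unused destination slots are assumptions made for illustration, not details from the patent.

```python
def build_control_vector(mask):
    """Indices of valid elements, in source order (assumed control format)."""
    return [index for index, bit in enumerate(mask) if bit]


def compress(source, mask, fill=0):
    """Pack the valid elements of `source` contiguously at the low end."""
    if len(source) != len(mask):
        raise ValueError("mask must have one bit per source element")
    control = build_control_vector(mask)
    packed = [source[index] for index in control]  # shuffle/permute step
    # Remaining slots hold a fill value; the abstract does not specify one.
    return packed + [fill] * (len(source) - len(packed))


print(compress([10, 20, 30, 40], [1, 0, 1, 1]))
```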
  • Patent number: 10620957
    Abstract: In a very long instruction word (VLIW) central processing unit, instructions are grouped into execute packets that execute in parallel. A constant may be specified or extended by bits in a constant extension instruction in the same execute packet. If an instruction includes an indication of constant extension, the decoder employs bits of a constant extension instruction to extend the constant of an immediate field. Two or more constant extension slots are permitted in each execute packet, each extending constants for a different predetermined subset of functional unit instructions. In an alternative embodiment, more than one functional unit may have constants extended from the same constant extension instruction employing the same extended bits. A long extended constant may be formed using the extension bits of two constant extension instructions.
    Type: Grant
    Filed: October 22, 2015
    Date of Patent: April 14, 2020
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Timothy David Anderson, Duc Quang Bui, Joseph Raymond Zbiciak
  • Patent number: 10613869
    Abstract: An apparatus and method of operating an apparatus are provided. The apparatus comprises execution circuitry to perform data processing operations specified by instructions and instruction retrieval circuitry to retrieve the instructions from memory, wherein the instructions comprise branch instructions. The instruction retrieval circuitry comprises branch target storage to store target instruction addresses for the branch instructions and branch target prefetch circuitry to prepopulate the branch target storage with predicted target instruction addresses for the branch instructions. An improved hit rate in the branch target storage may thereby be supported.
    Type: Grant
    Filed: March 29, 2018
    Date of Patent: April 7, 2020
    Assignee: ARM Limited
    Inventors: Peter Richard Greenhalgh, Frederic Claude Marie Piry, Jose Gonzalez-Gonzalez
  • Patent number: 10606598
    Abstract: A streaming engine employed in a digital data processor specifies fixed first and second read only data streams. Corresponding stream address generators produce addresses of data elements of the two streams. Corresponding stream head registers store data elements next to be supplied to functional units for use as operands. The two streams share two memory ports. A toggling preference of stream to port ensures fair allocation. The arbiters permit one stream to borrow the other's interface when the other interface is idle. Thus, one stream may issue two memory requests, one from each memory port, if the other stream is idle. This spreads the bandwidth demand for each stream across both interfaces, ensuring neither interface becomes a bottleneck.
    Type: Grant
    Filed: September 24, 2018
    Date of Patent: March 31, 2020
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Joseph Zbiciak, Timothy Anderson
  • Patent number: 10606596
    Abstract: A stream of data is accessed from a memory system using a stream of addresses generated in a first mode of operating a streaming engine in response to executing a first stream instruction. A block cache preload operation is performed on a cache in the memory using a block of addresses generated in a second mode of operating the streaming engine in response to executing a second stream instruction.
    Type: Grant
    Filed: November 28, 2018
    Date of Patent: March 31, 2020
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Joseph Raymond Michael Zbiciak, Timothy David Anderson, Jonathan (Son) Hung Tran, Kai Chirca, Daniel Wu, Abhijeet Ashok Chachad, David M. Thompson
  • Patent number: 10606771
    Abstract: Methods, circuitries, and systems for real-time protection of a stack are provided. A stack protection circuitry includes interface circuitry and computation circuitry. The interface circuitry is configured to receive a return instruction from a central processing unit (CPU). The computation circuitry is configured to, in response to the return instruction, generate protection data that i) identifies a new topmost return address location that is below a current protected topmost return address location and ii) specifies read only access for the new topmost return address location. The interface circuitry is configured to provide the protection data to a memory protection unit to cause the memory protection unit to enforce a read only access restriction on the new topmost return address location.
    Type: Grant
    Filed: January 22, 2018
    Date of Patent: March 31, 2020
    Assignee: Infineon Technologies AG
    Inventors: Sanjay Trivedi, Ramesh Babu
  • Patent number: 10599745
    Abstract: Aspects for vector operations in a neural network are described herein. The aspects may include a vector caching unit configured to store a first vector and a second vector, wherein the first vector includes one or more first elements and the second vector includes one or more second elements. The aspects may further include one or more adders and a combiner. The one or more adders may be configured to respectively add each of the first elements to a corresponding one of the second elements to generate one or more addition results. The combiner may be configured to combine the one or more addition results into an output vector.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: March 24, 2020
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Jinhua Tao, Tian Zhi, Shaoli Liu, Tianshi Chen, Yunji Chen
  • Patent number: 10599431
    Abstract: Embodiments of the present invention provide a system for balancing a global completion table (GCT) in a microprocessor via frontend steering or stalls. A non-limiting example of the system includes an instruction dispatch unit (IDU) that includes an instruction queue and an instruction sequencing unit (ISU) that includes a GCT having a first area and a second area. The IDU is configured to determine whether a full group of instructions exists in the instruction queue and to determine whether additional instructions will be received by the instruction queue in a subsequent cycle. The IDU is configured to stall the instruction queue for at least one cycle until a full group of instructions is accumulated at the instruction queue upon determining that additional instructions will be received by the instruction queue in a subsequent cycle.
    Type: Grant
    Filed: July 17, 2017
    Date of Patent: March 24, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Gregory W. Alexander, David S. Hutton, Christian Jacobi, Edward T. Malley, Anthony Saporito
  • Patent number: 10599433
    Abstract: A stream of data is accessed from a memory system using a stream of addresses generated in a first mode of operating a streaming engine in response to executing a first stream instruction. A block cache management operation is performed on a cache in the memory using a block of addresses generated in a second mode of operating the streaming engine in response to executing a second stream instruction.
    Type: Grant
    Filed: November 28, 2018
    Date of Patent: March 24, 2020
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Joseph Raymond Michael Zbiciak, Timothy David Anderson, Jonathan (Son) Hung Tran, Kai Chirca, Daniel Wu, Abhijeet Ashok Chachad, David M. Thompson
  • Patent number: 10592582
    Abstract: Aspects for vector operations in a neural network are described herein. The aspects may include a vector caching unit configured to store a first vector and a second vector, wherein the first vector includes one or more first elements and the second vector includes one or more second elements. The aspects may further include one or more adders and a combiner. The one or more adders may be configured to respectively add each of the first elements to a corresponding one of the second elements to generate one or more addition results. The combiner may be configured to combine the one or more addition results into an output vector.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: March 17, 2020
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Jinhua Tao, Tian Zhi, Shaoli Liu, Tianshi Chen, Yunji Chen
  • Patent number: 10592243
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores data elements next to be supplied to functional units for use as operands. The streaming engine fetches stream data ahead of use by the central processing unit core in a stream buffer constructed like a cache. The stream buffer cache includes plural cache lines, each including tag bits, at least one valid bit and data bits. Cache lines are allocated to store newly fetched stream data. Cache lines are deallocated upon consumption of the data by a central processing unit core functional unit. Instructions preferably include operand fields with a first subset of codings corresponding to registers, a stream read only operand coding and a stream read and advance operand coding.
    Type: Grant
    Filed: September 10, 2018
    Date of Patent: March 17, 2020
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Joseph Zbiciak
  • Patent number: 10592241
    Abstract: Aspects for matrix multiplication in a neural network are described herein. The aspects may include a master computation module configured to receive a first matrix and transmit a row vector of the first matrix. In addition, the aspects may include one or more slave computation modules respectively configured to store a column vector of a second matrix, receive the row vector of the first matrix, and multiply the row vector of the first matrix with the stored column vector of the second matrix to generate a result element. Further, the aspects may include an interconnection unit configured to combine the one or more result elements generated respectively by the one or more slave computation modules to generate a row vector of a result matrix and transmit the row vector of the result matrix to the master computation module.
    Type: Grant
    Filed: October 25, 2018
    Date of Patent: March 17, 2020
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Xiao Zhang, Shaoli Liu, Tianshi Chen, Yunji Chen
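The division of work in the abstract above (a master sending rows, slaves each holding one column, an interconnection unit assembling result rows) can be mimicked in software. This is only a sequential sketch; the function names are hypothetical, and real hardware would run the slave computations in parallel.

```python
def slave_multiply(row, column):
    """One slave module: row vector times its stored column vector."""
    return sum(a * b for a, b in zip(row, column))


def master_multiply(first, second):
    """Multiply two matrices given as lists of rows."""
    # Each slave is assumed to store one column of the second matrix.
    columns = [list(column) for column in zip(*second)]
    result = []
    for row in first:  # the master transmits one row vector at a time
        elements = [slave_multiply(row, column) for column in columns]
        # The interconnection unit combines the elements into a result row.
        result.append(elements)
    return result


print(master_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```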
  • Patent number: 10592300
    Abstract: A method for forwarding data from store instructions to a corresponding load instruction in an out-of-order processor. The method includes accessing an incoming sequence of instructions; reordering the instructions in accordance with processor resources for dispatch and execution; and ensuring a closest earlier store in machine order for a corresponding load by determining that: if said store has an actual age but said corresponding load does not have an actual age, then said store is earlier than said corresponding load; if said corresponding load has an actual age but said store does not have an actual age, then said corresponding load is earlier than said store; if neither said corresponding load nor said store has an actual age, then a virtual identifier table is used to determine which is earlier; and if both said corresponding load and said store have actual ages, then the actual ages are used to determine which is earlier.
    Type: Grant
    Filed: February 14, 2018
    Date of Patent: March 17, 2020
    Assignee: Intel Corporation
    Inventor: Mohammad Abdallah
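The four ordering cases listed in the abstract above can be written as a small decision function. This is a sketch under stated assumptions: a missing actual age is modelled as `None`, a smaller age means earlier in machine order, and the virtual identifier table is a plain mapping from a hypothetical virtual ID to a position. None of these representations come from the patent.

```python
def store_is_earlier(store_age, load_age, store_vid, load_vid, vid_table):
    """Return True if the store precedes the load in machine order."""
    if store_age is not None and load_age is None:
        return True   # only the store has an actual age: store is earlier
    if load_age is not None and store_age is None:
        return False  # only the load has an actual age: load is earlier
    if store_age is None and load_age is None:
        # Neither has an actual age: consult the virtual identifier table.
        return vid_table[store_vid] < vid_table[load_vid]
    # Both have actual ages (assumed: smaller value means earlier).
    return store_age < load_age


vid_table = {"s0": 3, "l0": 7}  # hypothetical virtual IDs and positions
print(store_is_earlier(None, None, "s0", "l0", vid_table))
```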
  • Patent number: 10592339
    Abstract: Disclosed embodiments relate to a streaming engine employed in, for example, a digital signal processor. A fixed data stream sequence including plural nested loops is specified by a control register. The streaming engine includes an address generator producing addresses of data elements and a stream head register storing data elements next to be supplied as operands. The streaming engine fetches stream data ahead of use by the central processing unit core in a stream buffer. Parity bits are formed upon storage of data in the stream buffer which are stored with the corresponding data. Upon transfer to the stream head register a second parity is calculated and compared with the stored parity. The streaming engine signals a parity fault if the parities do not match. The streaming engine preferably restarts fetching the data stream at the data element generating a parity fault.
    Type: Grant
    Filed: September 17, 2018
    Date of Patent: March 17, 2020
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Joseph Zbiciak, Timothy Anderson
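The store-then-recheck parity scheme in the abstract above can be sketched as follows. The sketch assumes a single parity bit per data word and a dictionary standing in for the stream buffer; the class and exception names are hypothetical, and the restart-on-fault behaviour is left out.

```python
class ParityFault(Exception):
    """Raised when the recomputed parity differs from the stored parity."""


def parity(word):
    """Parity bit of a non-negative integer: 1 if it has an odd number of 1 bits."""
    return bin(word).count("1") & 1


class StreamBuffer:
    def __init__(self):
        self._slots = {}  # index -> (data word, stored parity bit)

    def store(self, index, word):
        # Parity is formed on storage and kept alongside the data.
        self._slots[index] = (word, parity(word))

    def transfer(self, index):
        # On transfer to the stream head register, recompute and compare.
        word, stored = self._slots[index]
        if parity(word) != stored:
            raise ParityFault(f"parity mismatch at element {index}")
        return word


buffer = StreamBuffer()
buffer.store(0, 0b1011)
print(buffer.transfer(0))
```

A caller that catches `ParityFault` could refetch from the faulting element, which is the behaviour the abstract describes as preferred.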
  • Patent number: 10592444
    Abstract: A plurality of software programmable processors is disclosed. The software programmable processors are controlled by rotating circular buffers. A first processor and a second processor within the plurality of software programmable processors are individually programmable. The first processor within the plurality of software programmable processors is coupled to neighbor processors within the plurality of software programmable processors. The first processor sends and receives data from the neighbor processors. The first processor and the second processor are configured to operate on a common instruction cycle. An output of the first processor from a first instruction cycle is an input to the second processor on a subsequent instruction cycle.
    Type: Grant
    Filed: March 3, 2017
    Date of Patent: March 17, 2020
    Assignee: Wave Computing, Inc.
    Inventors: Christopher John Nicol, Samit Chaudhuri, Radoslav Danilak