Patents Examined by Daniel H. Pan
  • Patent number: 10339057
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template specifies loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently enabling trade off between the number of loops supported and the size of the loop counts and loop dimensions.
    Type: Grant
    Filed: December 20, 2016
    Date of Patent: July 2, 2019
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Joseph Zbiciak
  • Patent number: 10331447
    Abstract: Providing efficient recursion handling using compressed return address stacks (CRASs) in processor-based systems is disclosed. In one aspect, a processor-based system provides a branch prediction circuit including a CRAS. Each CRAS entry within the CRAS includes an address field and a counter field. When a call instruction is encountered, a return address of the call instruction is compared to the address field of a top CRAS entry indicated by a CRAS top-of-stack (TOS) index. If the return address matches the top CRAS entry, the counter field of the top CRAS entry is incremented instead of adding a new CRAS entry for the return address. When a return instruction is subsequently encountered in the instruction stream, the counter field of the top CRAS entry is decremented if its value is greater than zero (0), or, if not, the top CRAS entry is removed from the CRAS.
    Type: Grant
    Filed: August 30, 2017
    Date of Patent: June 25, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Vignyan Reddy Kothinti Naresh, Anil Krishna
  • Patent number: 10318433
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template register independently specifies a linear address or a circular address mode for each of the nested loops.
    Type: Grant
    Filed: December 20, 2016
    Date of Patent: June 11, 2019
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Joseph Zbiciak
  • Patent number: 10318303
    Abstract: A method and apparatus for performing branch prediction is disclosed. A branch predictor includes a history buffer configured to store a branch history table indicative of a history of a plurality of previously fetched branch instructions. The branch predictor also includes a branch target cache (BTC) configured to store branch target addresses for fetch addresses that have been identified as including branch instructions but have not yet been predicted. A hash circuit is configured to form a hash of a fetch address, history information received from the history buffer, and hit information received from the BTC, wherein the fetch address includes a branch instruction. A branch prediction unit (BPU) configured to generate a branch prediction for the branch instruction included in the fetch address based on the hash formed from the fetch address, history information, and BTC hit information.
    Type: Grant
    Filed: March 28, 2017
    Date of Patent: June 11, 2019
    Assignee: Oracle International Corporation
    Inventors: Manish Shah, Jared Smolens
  • Patent number: 10310855
    Abstract: Embodiments relate to non-default instruction handling within a transaction. An aspect includes entering a transaction, the transaction comprising a first plurality of instructions and a second plurality of instructions, wherein a default manner of handling of instructions in the transaction is one of atomic and non-atomic. Another aspect includes encountering a non-default specification instruction in the transaction, wherein the non-default specification instruction comprises a single instruction that specifies the second plurality of instructions of the transaction for handling in a non-default manner comprising one of atomic and non-atomic, wherein the non-default manner is different from the default manner. Another aspect includes handling the first plurality of instructions in the default manner. Yet another aspect includes handling the second plurality of instructions in the non-default manner.
    Type: Grant
    Filed: September 4, 2015
    Date of Patent: June 4, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Maged M. Michael, Eric M. Schwarz, Valentina Salapura, Chung-Lung K. Shum
  • Patent number: 10310862
    Abstract: Data processing circuitry comprises out-of-order instruction execution circuitry to execute program instructions in an instruction execution order; a data store, to store information on a set of instructions for which execution has been initiated, the data store providing ordering information indicating the relative position of each instruction in the set of instructions with respect to a program code order; commit circuitry to commit the results of instructions executed by the instruction execution circuitry; one or more cumulative status registers configured to be set in response to a respective condition generated by execution of an instruction and then to remain set until an unset instruction is executed; and an identifier store, to store for at least those of the one or more cumulative status registers which are not currently set, an identifier of an instruction which is earliest in the program code order in the set of instructions and which generated a condition to set that cumulative status register.
    Type: Grant
    Filed: March 21, 2017
    Date of Patent: June 4, 2019
    Assignee: ARM Limited
    Inventors: Robert Greg McDonald, Michael Filippo, Glen Andrew Harris
  • Patent number: 10310546
    Abstract: The present invention provides an arbitrary waveform generator based on instruction architecture. To deal with the feature that the instructions and waveform data of the AWG are coupled in the prior art, an instruction set based waveform synthesis controller is employed, and substitutes for the sequence wave generator in the present invention, i.e. an arbitrary waveform generator based on instruction architecture. Thus the time-sharing scheduling in reading the waveform synthesis instruction and the segment waveform data is realized, and the complexity of the hardware is reduced, so that the AWG in present invention can synthesize and generate a complex sequence wave rapidly and efficiently.
    Type: Grant
    Filed: October 11, 2017
    Date of Patent: June 4, 2019
    Assignee: UNIVERSITY OF ELECTRONIC SCIENCE AND TECHNOLOGY OF CHINA
    Inventors: Yindong Xiao, Guangkun Guo, Ke Liu, Junwu Zhang, Houjun Wang, Jianguo Huang, Shulin Tian
  • Patent number: 10303474
    Abstract: Embodiments of the present invention provide a data read/write method and apparatus, a storage device, and a computer system, so as to reduce completion time of a data read/write operation in a multi-core computer system. The method includes: determining, by a host device, N cores used for executing a target process, where the N cores are in a one-to-one correspondence with N execution threads included in the target process; grouping the N execution threads to determine M execution thread groups, and allocating an indication identifier to each execution thread group; and sending M data read/write instructions to a storage device, where each data read/write instruction includes an indication identifier of a corresponding execution thread group.
    Type: Grant
    Filed: March 17, 2017
    Date of Patent: May 28, 2019
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Huilian Yang, Lei Lu, Dai Shi
  • Patent number: 10303482
    Abstract: A dynamic processor frequency selection system includes a memory and a processor in communication with the memory. The processor includes a dynamic processor frequency selection module, a branch predictor module, a measurement module, and a power module. The measurement module measures a value according to a looping prediction function, which represents a quantity of cycles spent waiting for a type of contended resource within an instruction sequence. Additionally, the processor retains the value and branch history information, which is used to predict a waiting period associated with a potential loop. Then, the dynamic processor frequency selection module predicts the potential loop in a subsequent instruction according to the type of contended resource. The power module dynamically reduces a processor frequency during the waiting period from a first frequency state to a second frequency state according to the potential loop prediction. Then, the processor resumes operation at the first frequency state.
    Type: Grant
    Filed: March 7, 2017
    Date of Patent: May 28, 2019
    Assignee: Red Hat, Inc.
    Inventor: Jonathan Charles Masters
  • Patent number: 10303471
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector double block packed sum of absolute differences (SAD) in response to a single vector double block packed sum of absolute differences instruction that includes a destination vector register operand, first and second source operands, an immediate, and an opcode are described.
    Type: Grant
    Filed: February 28, 2017
    Date of Patent: May 28, 2019
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Mostafa Hagog, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber
  • Patent number: 10296351
    Abstract: An apparatus includes a processor and a coprocessor. The processor may be configured to generate a command to run a directed acyclic graph. The coprocessor may be configured to (i) receive the command from the processor, (ii) parse the directed acyclic graph into a data flow including one or more operators, (iii) schedule the operators in one or more data paths and (iv) generate one or more output vectors by processing one or more input vectors in the data paths. The data paths may be implemented with a plurality of hardware engines. The hardware engines may operate in parallel to each other. The coprocessor may be implemented solely in hardware.
    Type: Grant
    Filed: March 15, 2017
    Date of Patent: May 21, 2019
    Assignee: Ambarella, Inc.
    Inventors: Leslie D. Kohn, Robert C. Kunz
  • Patent number: 10296343
    Abstract: A processing device including a first shadow register, a second shadow register, and an instruction execution circuit, communicatively coupled to the first shadow register and the second shadow register, to receive a sequence of instructions comprising a first local commit marker, a first global commit marker, and a first register access instruction referencing an architectural register, speculatively execute the first register access instruction to generate a speculative register state value associated with a physical register, responsive to identifying the first local commit marker, store, in the first shadow register, the speculative register state value, and responsive to identifying the first global commit marker, store, in the second shadow register, the speculative register state value.
    Type: Grant
    Filed: March 30, 2017
    Date of Patent: May 21, 2019
    Assignee: Intel Corporation
    Inventors: Vineeth Mekkat, Jason M. Agron, Youfeng Wu
  • Patent number: 10275254
    Abstract: A Spin Loop Delay instruction. The instruction has a field associated therewith that indicates one or more conditions to be checked. Dispatching of the instruction is initially delayed. The instruction is subsequently dispatched based on a timeout, provided the instruction has not been previously dispatched based on meeting at least one condition of the one or more conditions to be checked.
    Type: Grant
    Filed: March 8, 2017
    Date of Patent: April 30, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Fadi Y. Busaba, Christian Jacobi, Anthony Saporito, Eric M. Schwarz, Timothy J. Slegel
  • Patent number: 10275250
    Abstract: An apparatus comprises processing circuitry for executing instructions of two or more threads of processing, hardware registers to store context data for the two or more threads concurrently, and commit circuitry to commit results of executed instructions of the threads, where for each thread the commit circuitry commits the instructions of that thread in program order. At least one defer buffer is provided to buffer at least one blocked instruction for which execution by the processing circuitry is complete but execution of an earlier instruction of the same thread in the program order is incomplete. This can help to resolve inter-thread blocking and hence improve performance.
    Type: Grant
    Filed: March 6, 2017
    Date of Patent: April 30, 2019
    Assignee: ARM Limited
    Inventors: Jose Alberto Joao, Ziqiang Huang, Alejandro Rico Carro
  • Patent number: 10261789
    Abstract: A data processing apparatus and a method of controlling performance of speculative vector operations are provided. The apparatus comprises processing circuitry for performing a sequence of speculative vector operations on vector operands, each vector operand comprising a plurality of vector elements, and speculation control circuitry for maintaining a speculation width indication indicating the number of vector elements of each vector operand to be subjected to the speculative vector operations. The speculation width indication is set to an initial value prior to performance of the sequence of speculative vector operations. The processing circuitry generates progress indications during performance of the sequence of speculative vector operations, and the speculation control circuitry detects, with reference to the progress indications and speculation reduction criteria, presence of a speculation reduction condition.
    Type: Grant
    Filed: August 18, 2014
    Date of Patent: April 16, 2019
    Assignee: ARM Limited
    Inventors: Alastair David Reid, Daniel Kershaw
  • Patent number: 10261786
    Abstract: A vector processing unit is described, and includes processor units that each include multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.
    Type: Grant
    Filed: March 9, 2017
    Date of Patent: April 16, 2019
    Assignee: Google LLC
    Inventors: William Lacy, Gregory Michael Thorson, Christopher Aaron Clark, Norman Paul Jouppi, Thomas Norrie, Andrew Everett Phelps
  • Patent number: 10255068
    Abstract: A selected boundary of memory to be used in processing an instruction is dynamically selected, based on a predictor. The instruction is decoded, and the decoding provides a sequence of operations to perform a specified operation. The sequence of operations includes a load to boundary operation to load data up to the selected boundary of memory. The data is loaded as part of the specified operation.
    Type: Grant
    Filed: March 3, 2017
    Date of Patent: April 9, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Michael K. Gschwind
  • Patent number: 10241789
    Abstract: An apparatus includes a binary translator to hoist a load instruction in a branch of a conditional statement above the conditional statement and insert a speculation control of load (SCL) instruction in a complementary branch of the conditional statement, where the SCL instruction provides an indication of a real program order (RPO) of the load instruction before the load instruction was hoisted. The apparatus further includes an execution circuit to execute the load instruction to perform a load and cause an entry for the load instruction to be inserted in an ordering buffer, and where the execution circuit is to execute the SCL instruction to locate the entry for the load instruction in the ordering buffer using the RPO of the load instruction provided by the SCL instruction and discard the entry for the load instruction from the ordering buffer.
    Type: Grant
    Filed: December 27, 2016
    Date of Patent: March 26, 2019
    Assignee: INTEL CORPORATION
    Inventors: Alexander Y. Ostanevich, Sergey P. Scherbinin, Jayesh Iyer, Dmitry M. Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
  • Patent number: 10241801
    Abstract: An apparatus includes a register file and a binary translator to create a plurality of strands and a plurality of iteration windows, where each iteration window of the plurality of iteration windows is allocated a set of continuous registers of the register file. The apparatus further includes a buffer to store strand documentation for a strand from the plurality of strands, where the strand documentation for the strand is to include an indication of a current register base for the strand. The apparatus further includes an execution circuit to execute an instruction to update the current register base for the strand in the strand documentation for the strand based on a fixed step value and an iteration window size.
    Type: Grant
    Filed: December 23, 2016
    Date of Patent: March 26, 2019
    Assignee: INTEL CORPORATION
    Inventors: Jayesh Iyer, Sergey P. Scherbinin, Alexander Y. Ostanevich, Dmitry M. Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan
  • Patent number: 10241794
    Abstract: Embodiments described herein generally relate to the field of multi-strand out-of-order loop processing, and, more specifically, to apparatus and methods to support counted loop exits in a multi-strand loop processor. In one embodiment, a processor includes a loop accelerator comprising a strand documentation buffer and a plurality of strand execution circuits; and a binary translator to receive a plurality of loop instructions, divide the plurality of loop instructions into a plurality of strands, and store a strand documentation for each of the plurality of strands into the strand documentation buffer, each strand documentation indicating at least a number of iterations; wherein the binary translator further causes the loop accelerator to execute the plurality of strands asynchronously and in parallel using the plurality of strand execution circuits, wherein each of the strand execution circuits repeats the strand for the number of iterations indicated in the strand documentation associated with the strand.
    Type: Grant
    Filed: December 27, 2016
    Date of Patent: March 26, 2019
    Assignee: Intel Corporation
    Inventors: Sergey P. Scherbinin, Jayesh Iyer, Alexander Y. Ostanevich, Dmitry Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan