Patents Examined by Daniel H. Pan

Streaming engine with flexible streaming engine template supporting differing number of nested loops with corresponding loop counts and loop offsets

Patent number: 10339057

Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores data elements next to be supplied to functional units for use as operands. A stream template specifies loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently, enabling a trade-off between the number of loops supported and the size of the loop counts and loop dimensions.
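
The idea of a format field that re-partitions the same template bits can be sketched as follows. This is only an illustration: the bit layout, the field widths, the two formats, and the names FORMATS and decode_template are assumptions made for the example and are not taken from the patent.

```python
# Hypothetical layout (an assumption, not the patented one):
#   bits 0..31  hold the loop counts, innermost loop first
#   bits 32..63 hold the loop dimensions, innermost loop first
#   bits 64 and up hold the format field
PAYLOAD_BITS = 64

# format value -> (number of loops, bits per count/dimension field)
FORMATS = {
    0: (2, 16),  # fewer loops, wider fields
    1: (4, 8),   # more loops, narrower fields
}


def decode_template(template):
    """Split one template word into loop counts and loop dimensions."""
    fmt = template >> PAYLOAD_BITS
    num_loops, width = FORMATS[fmt]
    mask = (1 << width) - 1
    counts = [(template >> (i * width)) & mask for i in range(num_loops)]
    dims = [(template >> (32 + i * width)) & mask for i in range(num_loops)]
    return counts, dims


# The same payload bits, read under the two hypothetical formats.
payload = (0x00400008 << 32) | 0x00030004
two_loops = decode_template((0 << PAYLOAD_BITS) | payload)
four_loops = decode_template((1 << PAYLOAD_BITS) | payload)
```

Under format 0 the payload yields two loops with 16-bit fields; under format 1 the identical bits yield four loops with 8-bit fields, which is the trade-off the abstract describes.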

Type: Grant

Filed: December 20, 2016

Date of Patent: July 2, 2019

Assignee: TEXAS INSTRUMENTS INCORPORATED

Inventor: Joseph Zbiciak

Providing efficient recursion handling using compressed return address stacks (CRASs) in processor-based systems

Patent number: 10331447

Abstract: Providing efficient recursion handling using compressed return address stacks (CRASs) in processor-based systems is disclosed. In one aspect, a processor-based system provides a branch prediction circuit including a CRAS. Each CRAS entry within the CRAS includes an address field and a counter field. When a call instruction is encountered, a return address of the call instruction is compared to the address field of a top CRAS entry indicated by a CRAS top-of-stack (TOS) index. If the return address matches the top CRAS entry, the counter field of the top CRAS entry is incremented instead of adding a new CRAS entry for the return address. When a return instruction is subsequently encountered in the instruction stream, the counter field of the top CRAS entry is decremented if its value is greater than zero (0), or, if not, the top CRAS entry is removed from the CRAS.
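
The call and return handling described in this abstract can be modeled in a few lines. The sketch below is a software illustration only; the class and method names are hypothetical, capacity is unbounded, a new entry starts with a counter of zero, and the predicted return target is assumed to be the address field of the top entry.

```python
class CompressedReturnAddressStack:
    """Toy model of a CRAS: each entry is [address, counter]."""

    def __init__(self):
        self.entries = []  # the last element plays the role of the TOS entry

    def on_call(self, return_address):
        # A repeated return address bumps the counter instead of pushing.
        if self.entries and self.entries[-1][0] == return_address:
            self.entries[-1][1] += 1
        else:
            self.entries.append([return_address, 0])

    def on_return(self):
        # Returns the predicted return address, or None if the stack is empty.
        if not self.entries:
            return None
        address, counter = self.entries[-1]
        if counter > 0:
            self.entries[-1][1] -= 1
        else:
            self.entries.pop()
        return address


cras = CompressedReturnAddressStack()
cras.on_call(0x200)          # outer call
for _ in range(3):
    cras.on_call(0x104)      # three recursive calls, same return address
depth_after_calls = len(cras.entries)
predictions = [cras.on_return() for _ in range(5)]
```

Three recursive calls occupy a single entry here, where a conventional return address stack would use three.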

Type: Grant

Filed: August 30, 2017

Date of Patent: June 25, 2019

Assignee: QUALCOMM Incorporated

Inventors: Vignyan Reddy Kothinti Naresh, Anil Krishna

Streaming engine with multi dimensional circular addressing selectable at each dimension

Patent number: 10318433

Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores data elements next to be supplied to functional units for use as operands. A stream template register independently specifies a linear address or a circular address mode for each of the nested loops.
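
A simplified software model of nested-loop address generation with a per-loop addressing mode might look like the sketch below. The wrap rule (each circular loop's offset is reduced modulo one shared block size), the tuple layout, and the name stream_addresses are assumptions for illustration; the abstract does not specify them.

```python
from itertools import product


def stream_addresses(base, loops, block_size):
    """Yield element addresses for nested loops.

    loops is a list of (count, stride_in_bytes, circular) tuples,
    innermost loop first. This layout is hypothetical.
    """
    counts = [count for count, _, _ in loops]
    # product() varies its last argument fastest, so pass the outermost first.
    for indices in product(*(range(c) for c in reversed(counts))):
        offset = 0
        for (_, stride, circular), i in zip(loops, reversed(indices)):
            step = i * stride
            if circular:
                step %= block_size  # assumed wrap rule for circular mode
            offset += step
        yield base + offset


# Inner loop circular within an 8-byte block, outer loop linear.
mixed = list(stream_addresses(0x100, [(4, 4, True), (2, 32, False)], 8))
# Same loops with both dimensions linear, for comparison.
linear = list(stream_addresses(0x100, [(4, 4, False), (2, 32, False)], 8))
```

With the inner loop in circular mode the inner addresses wrap after two elements, while the outer loop still advances linearly.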

Type: Grant

Filed: December 20, 2016

Date of Patent: June 11, 2019

Assignee: TEXAS INSTRUMENTS INCORPORATED

Inventor: Joseph Zbiciak

Method and apparatus for augmentation and disambiguation of branch history in pipelined branch predictors

Patent number: 10318303

Abstract: A method and apparatus for performing branch prediction is disclosed. A branch predictor includes a history buffer configured to store a branch history table indicative of a history of a plurality of previously fetched branch instructions. The branch predictor also includes a branch target cache (BTC) configured to store branch target addresses for fetch addresses that have been identified as including branch instructions but have not yet been predicted. A hash circuit is configured to form a hash of a fetch address, history information received from the history buffer, and hit information received from the BTC, wherein the fetch address includes a branch instruction. A branch prediction unit (BPU) is configured to generate a branch prediction for the branch instruction included in the fetch address based on the hash formed from the fetch address, history information, and BTC hit information.
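
One way to picture hashing the three inputs into a prediction is the sketch below. The XOR-based hash, the table size, the 2-bit saturating counters, and all names are assumptions borrowed from common textbook predictor designs; the abstract does not state which hash or prediction structure is used.

```python
TABLE_BITS = 10  # hypothetical prediction table size: 1024 entries


def prediction_index(fetch_address, history, btc_hit):
    """Fold fetch address, branch history, and BTC hit info into an index."""
    mixed = (fetch_address >> 2) ^ history ^ (int(btc_hit) << (TABLE_BITS - 1))
    return mixed & ((1 << TABLE_BITS) - 1)


class BranchPredictor:
    """Table of 2-bit saturating counters indexed by the hash (an assumption)."""

    def __init__(self):
        self.counters = [1] * (1 << TABLE_BITS)  # start weakly not-taken

    def predict(self, fetch_address, history, btc_hit):
        return self.counters[prediction_index(fetch_address, history, btc_hit)] >= 2

    def update(self, fetch_address, history, btc_hit, taken):
        i = prediction_index(fetch_address, history, btc_hit)
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)


bpu = BranchPredictor()
before = bpu.predict(0x1000, 0b1011, False)
bpu.update(0x1000, 0b1011, False, taken=True)
after = bpu.predict(0x1000, 0b1011, False)
```

Because the BTC hit bit takes part in the hash, the same fetch address and history map to different table entries depending on the hit information.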

Type: Grant

Filed: March 28, 2017

Date of Patent: June 11, 2019

Assignee: Oracle International Corporation

Inventors: Manish Shah, Jared Smolens

Non-default instruction handling within transaction

Patent number: 10310855

Abstract: Embodiments relate to non-default instruction handling within a transaction. An aspect includes entering a transaction, the transaction comprising a first plurality of instructions and a second plurality of instructions, wherein a default manner of handling of instructions in the transaction is one of atomic and non-atomic. Another aspect includes encountering a non-default specification instruction in the transaction, wherein the non-default specification instruction comprises a single instruction that specifies the second plurality of instructions of the transaction for handling in a non-default manner comprising one of atomic and non-atomic, wherein the non-default manner is different from the default manner. Another aspect includes handling the first plurality of instructions in the default manner. Yet another aspect includes handling the second plurality of instructions in the non-default manner.

Type: Grant

Filed: September 4, 2015

Date of Patent: June 4, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Maged M. Michael, Eric M. Schwarz, Valentina Salapura, Chung-Lung K. Shum

Data processing

Patent number: 10310862

Abstract: Data processing circuitry comprises out-of-order instruction execution circuitry to execute program instructions in an instruction execution order; a data store, to store information on a set of instructions for which execution has been initiated, the data store providing ordering information indicating the relative position of each instruction in the set of instructions with respect to a program code order; commit circuitry to commit the results of instructions executed by the instruction execution circuitry; one or more cumulative status registers configured to be set in response to a respective condition generated by execution of an instruction and then to remain set until an unset instruction is executed; and an identifier store, to store for at least those of the one or more cumulative status registers which are not currently set, an identifier of an instruction which is earliest in the program code order in the set of instructions and which generated a condition to set that cumulative status register.

Type: Grant

Filed: March 21, 2017

Date of Patent: June 4, 2019

Assignee: ARM Limited

Inventors: Robert Greg McDonald, Michael Filippo, Glen Andrew Harris

Arbitrary waveform generator based on instruction architecture

Patent number: 10310546

Abstract: The present invention provides an arbitrary waveform generator based on instruction architecture. To deal with the feature that the instructions and waveform data of the AWG are coupled in the prior art, an instruction-set-based waveform synthesis controller is employed and substitutes for the sequence wave generator in the present invention, i.e. an arbitrary waveform generator based on instruction architecture. Thus time-sharing scheduling in reading the waveform synthesis instruction and the segment waveform data is realized, and the complexity of the hardware is reduced, so that the AWG in the present invention can synthesize and generate a complex sequence wave rapidly and efficiently.

Type: Grant

Filed: October 11, 2017

Date of Patent: June 4, 2019

Assignee: UNIVERSITY OF ELECTRONIC SCIENCE AND TECHNOLOGY OF CHINA

Inventors: Yindong Xiao, Guangkun Guo, Ke Liu, Junwu Zhang, Houjun Wang, Jianguo Huang, Shulin Tian

Data read/write method and apparatus, storage device, and computer system

Patent number: 10303474

Abstract: Embodiments of the present invention provide a data read/write method and apparatus, a storage device, and a computer system, so as to reduce completion time of a data read/write operation in a multi-core computer system. The method includes: determining, by a host device, N cores used for executing a target process, where the N cores are in a one-to-one correspondence with N execution threads included in the target process; grouping the N execution threads to determine M execution thread groups, and allocating an indication identifier to each execution thread group; and sending M data read/write instructions to a storage device, where each data read/write instruction includes an indication identifier of a corresponding execution thread group.

Type: Grant

Filed: March 17, 2017

Date of Patent: May 28, 2019

Assignee: Huawei Technologies Co., Ltd.

Inventors: Huilian Yang, Lei Lu, Dai Shi

Dynamic processor frequency selection

Patent number: 10303482

Abstract: A dynamic processor frequency selection system includes a memory and a processor in communication with the memory. The processor includes a dynamic processor frequency selection module, a branch predictor module, a measurement module, and a power module. The measurement module measures a value according to a looping prediction function, which represents a quantity of cycles spent waiting for a type of contended resource within an instruction sequence. Additionally, the processor retains the value and branch history information, which is used to predict a waiting period associated with a potential loop. Then, the dynamic processor frequency selection module predicts the potential loop in a subsequent instruction according to the type of contended resource. The power module dynamically reduces a processor frequency during the waiting period from a first frequency state to a second frequency state according to the potential loop prediction. Then, the processor resumes operation at the first frequency state.

Type: Grant

Filed: March 7, 2017

Date of Patent: May 28, 2019

Assignee: Red Hat, Inc.

Inventor: Jonathan Charles Masters

Systems, apparatuses, and methods for performing a double blocked sum of absolute differences

Patent number: 10303471

Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector double block packed sum of absolute differences (SAD) in response to a single vector double block packed sum of absolute differences instruction that includes a destination vector register operand, first and second source operands, an immediate, and an opcode are described.
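
For readers unfamiliar with the underlying primitive, a sum of absolute differences over two equal-length blocks is shown below. This sketch covers only the basic arithmetic; it does not model the instruction's double block operand layout, its immediate, or its destination packing, none of which are detailed in the abstract. The function name is hypothetical.

```python
def sum_of_absolute_differences(block_a, block_b):
    """Return sum(|a - b|) over two equal-length sequences of integers."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same length")
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


# |10-12| + |20-18| + |30-33| + |40-40| = 2 + 2 + 3 + 0
sad = sum_of_absolute_differences([10, 20, 30, 40], [12, 18, 33, 40])
```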

Type: Grant

Filed: February 28, 2017

Date of Patent: May 28, 2019

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Mostafa Hagog, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber

Computer vision processing in hardware data paths

Patent number: 10296351

Abstract: An apparatus includes a processor and a coprocessor. The processor may be configured to generate a command to run a directed acyclic graph. The coprocessor may be configured to (i) receive the command from the processor, (ii) parse the directed acyclic graph into a data flow including one or more operators, (iii) schedule the operators in one or more data paths and (iv) generate one or more output vectors by processing one or more input vectors in the data paths. The data paths may be implemented with a plurality of hardware engines. The hardware engines may operate in parallel to each other. The coprocessor may be implemented solely in hardware.

Type: Grant

Filed: March 15, 2017

Date of Patent: May 21, 2019

Assignee: Ambarella, Inc.

Inventors: Leslie D. Kohn, Robert C. Kunz

Hybrid atomicity support for a binary translation based microprocessor

Patent number: 10296343

Abstract: A processing device including a first shadow register, a second shadow register, and an instruction execution circuit, communicatively coupled to the first shadow register and the second shadow register, to receive a sequence of instructions comprising a first local commit marker, a first global commit marker, and a first register access instruction referencing an architectural register, speculatively execute the first register access instruction to generate a speculative register state value associated with a physical register, responsive to identifying the first local commit marker, store, in the first shadow register, the speculative register state value, and responsive to identifying the first global commit marker, store, in the second shadow register, the speculative register state value.

Type: Grant

Filed: March 30, 2017

Date of Patent: May 21, 2019

Assignee: Intel Corporation

Inventors: Vineeth Mekkat, Jason M. Agron, Youfeng Wu

Spin loop delay instruction

Patent number: 10275254

Abstract: A Spin Loop Delay instruction. The instruction has a field associated therewith that indicates one or more conditions to be checked. Dispatching of the instruction is initially delayed. The instruction is subsequently dispatched based on a timeout, provided the instruction has not been previously dispatched based on meeting at least one condition of the one or more conditions to be checked.

Type: Grant

Filed: March 8, 2017

Date of Patent: April 30, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Fadi Y. Busaba, Christian Jacobi, Anthony Saporito, Eric M. Schwarz, Timothy J. Slegel

Defer buffer

Patent number: 10275250

Abstract: An apparatus comprises processing circuitry for executing instructions of two or more threads of processing, hardware registers to store context data for the two or more threads concurrently, and commit circuitry to commit results of executed instructions of the threads, where for each thread the commit circuitry commits the instructions of that thread in program order. At least one defer buffer is provided to buffer at least one blocked instruction for which execution by the processing circuitry is complete but execution of an earlier instruction of the same thread in the program order is incomplete. This can help to resolve inter-thread blocking and hence improve performance.

Type: Grant

Filed: March 6, 2017

Date of Patent: April 30, 2019

Assignee: ARM Limited

Inventors: Jose Alberto Joao, Ziqiang Huang, Alejandro Rico Carro

Data processing apparatus and method for controlling performance of speculative vector operations

Patent number: 10261789

Abstract: A data processing apparatus and a method of controlling performance of speculative vector operations are provided. The apparatus comprises processing circuitry for performing a sequence of speculative vector operations on vector operands, each vector operand comprising a plurality of vector elements, and speculation control circuitry for maintaining a speculation width indication indicating the number of vector elements of each vector operand to be subjected to the speculative vector operations. The speculation width indication is set to an initial value prior to performance of the sequence of speculative vector operations. The processing circuitry generates progress indications during performance of the sequence of speculative vector operations, and the speculation control circuitry detects, with reference to the progress indications and speculation reduction criteria, presence of a speculation reduction condition.

Type: Grant

Filed: August 18, 2014

Date of Patent: April 16, 2019

Assignee: ARM Limited

Inventors: Alastair David Reid, Daniel Kershaw

Vector processing unit

Patent number: 10261786

Abstract: A vector processing unit is described, and includes processor units that each include multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.

Type: Grant

Filed: March 9, 2017

Date of Patent: April 16, 2019

Assignee: Google LLC

Inventors: William Lacy, Gregory Michael Thorson, Christopher Aaron Clark, Norman Paul Jouppi, Thomas Norrie, Andrew Everett Phelps

Dynamically selecting a memory boundary to be used in performing operations

Patent number: 10255068

Abstract: A selected boundary of memory to be used in processing an instruction is dynamically selected, based on a predictor. The instruction is decoded, and the decoding provides a sequence of operations to perform a specified operation. The sequence of operations includes a load to boundary operation to load data up to the selected boundary of memory. The data is loaded as part of the specified operation.
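
The arithmetic behind a load that stops at a memory boundary can be illustrated as below. The predictor that chooses the boundary is not modeled, the boundary is simply passed in, and the function names and the max_bytes parameter are hypothetical.

```python
def bytes_to_boundary(address, boundary):
    """Number of bytes from address up to the next multiple of boundary."""
    return boundary - (address % boundary)


def load_to_boundary(memory, address, boundary, max_bytes):
    """Load at most max_bytes, never crossing the selected boundary."""
    length = min(max_bytes, bytes_to_boundary(address, boundary))
    return memory[address:address + length]


memory = bytes(range(64))
short_load = load_to_boundary(memory, 13, 16, 8)  # stops at address 16
full_load = load_to_boundary(memory, 16, 16, 8)   # aligned, limited by max_bytes
```

An address that already sits on the boundary yields a full boundary-sized span, so only the max_bytes limit applies in that case.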

Type: Grant

Filed: March 3, 2017

Date of Patent: April 9, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Michael K. Gschwind

Method to do control speculation on loads in a high performance strand-based loop accelerator

Patent number: 10241789

Abstract: An apparatus includes a binary translator to hoist a load instruction in a branch of a conditional statement above the conditional statement and insert a speculation control of load (SCL) instruction in a complementary branch of the conditional statement, where the SCL instruction provides an indication of a real program order (RPO) of the load instruction before the load instruction was hoisted. The apparatus further includes an execution circuit to execute the load instruction to perform a load and cause an entry for the load instruction to be inserted in an ordering buffer, and where the execution circuit is to execute the SCL instruction to locate the entry for the load instruction in the ordering buffer using the RPO of the load instruction provided by the SCL instruction and discard the entry for the load instruction from the ordering buffer.

Type: Grant

Filed: December 27, 2016

Date of Patent: March 26, 2019

Assignee: INTEL CORPORATION

Inventors: Alexander Y. Ostanevich, Sergey P. Scherbinin, Jayesh Iyer, Dmitry M. Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan

Method and apparatus to create register windows for parallel iterations to achieve high performance in HW-SW codesigned loop accelerator

Patent number: 10241801

Abstract: An apparatus includes a register file and a binary translator to create a plurality of strands and a plurality of iteration windows, where each iteration window of the plurality of iteration windows is allocated a set of continuous registers of the register file. The apparatus further includes a buffer to store strand documentation for a strand from the plurality of strands, where the strand documentation for the strand is to include an indication of a current register base for the strand. The apparatus further includes an execution circuit to execute an instruction to update the current register base for the strand in the strand documentation for the strand based on a fixed step value and an iteration window size.

Type: Grant

Filed: December 23, 2016

Date of Patent: March 26, 2019

Assignee: INTEL CORPORATION

Inventors: Jayesh Iyer, Sergey P. Scherbinin, Alexander Y. Ostanevich, Dmitry M. Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan

Apparatus and methods to support counted loop exits in a multi-strand loop processor

Patent number: 10241794

Abstract: Embodiments described herein generally relate to the field of multi-strand out-of-order loop processing, and, more specifically, to apparatus and methods to support counted loop exits in a multi-strand loop processor. In one embodiment, a processor includes a loop accelerator comprising a strand documentation buffer and a plurality of strand execution circuits; and a binary translator to receive a plurality of loop instructions, divide the plurality of loop instructions into a plurality of strands, and store a strand documentation for each of the plurality of strands into the strand documentation buffer, each strand documentation indicating at least a number of iterations; wherein the binary translator further causes the loop accelerator to execute the plurality of strands asynchronously and in parallel using the plurality of strand execution circuits, wherein each of the strand execution circuits repeats the strand for the number of iterations indicated in the strand documentation associated with the strand.

Type: Grant

Filed: December 27, 2016

Date of Patent: March 26, 2019

Assignee: Intel Corporation

Inventors: Sergey P. Scherbinin, Jayesh Iyer, Alexander Y. Ostanevich, Dmitry Maslennikov, Denis G. Motin, Alexander V. Ermolovich, Andrey Chudnovets, Sergey A. Rozhkov, Boris A. Babayan