Patents Examined by Daniel H. Pan
  • Patent number: 11113063
    Abstract: According to one general aspect, an apparatus may include a main-branch target buffer (BTB). The apparatus may include a micro-BTB separate from and smaller than the main-BTB, and configured to produce prediction information associated with a branching instruction. The apparatus may include a micro-BTB confidence counter configured to measure a correctness of the prediction information produced by the micro-BTB. The apparatus may further include a micro-BTB misprediction rate counter configured to measure a rate of mispredictions produced by the micro-BTB. The apparatus may also include a micro-BTB enablement circuit configured to enable a usage of the micro-BTB's prediction information, based, at least in part, upon the values of the micro-BTB confidence counter and the micro-BTB misprediction rate counter.
    Type: Grant
    Filed: September 9, 2019
    Date of Patent: September 7, 2021
    Inventors: James David Dundas, Xiaoxin Fan, Shashank Nemawarkar, Madhu Saravana Sibi Govindan
  • Patent number: 11099933
    Abstract: Disclosed embodiments relate to a streaming engine employed in, for example, a digital signal processor. A fixed data stream sequence including plural nested loops is specified by a control register. The streaming engine includes an address generator producing addresses of data elements and a steam head register storing data elements next to be supplied as operands. The streaming engine fetches stream data ahead of use by the central processing unit core in a stream buffer. Parity bits are formed upon storage of data in the stream buffer which are stored with the corresponding data. Upon transfer to the stream head register a second parity is calculated and compared with the stored parity. The streaming engine signals a parity fault if the parities do not match. The streaming engine preferably restarts fetching the data stream at the data element generating a parity fault.
    Type: Grant
    Filed: March 4, 2020
    Date of Patent: August 24, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Joseph Zbiciak, Timothy Anderson
  • Patent number: 11086626
    Abstract: Circuitry comprises decode circuitry to decode program instructions including producer instructions and consumer instructions, a consumer instruction requiring, as an input operand, a result generated by execution of a producer instruction; and execution circuitry to execute the program instructions; in which: the decode circuitry is configured to control operation of the execution circuitry in response to hint data associated with a given producer instruction and indicating, for the given producer instruction, a number of consumer instructions which require, as an input operand, a result generated by the given producer instruction.
    Type: Grant
    Filed: October 24, 2019
    Date of Patent: August 10, 2021
    Assignee: Arm Limited
    Inventors: Roko Grubisic, Giacomo Gabrielli, Matthew James Horsnell, Syed Ali Mustafa Zaidi
  • Patent number: 11086628
    Abstract: A system and method for load queue (LDQ) and store queue (STQ) entry allocations at address generation time that maintains age-order of instructions is described. In particular, writing LDQ and STQ entries are delayed until address generation time. This allows the load and store operations to dispatch, and younger operations (which may not be store and load operations) to also dispatch and execute their instructions. The address generation of the load or store operation is held at an address generation scheduler queue (AGSQ) until a load or store queue entry is available for the operation. The tracking of load queue entries or store queue entries is effectively being done in the AGSQ instead of at the decode engine. The LDQ and STQ depth is not visible from a decode engine's perspective, and increases the effective processing and queue depth.
    Type: Grant
    Filed: August 15, 2016
    Date of Patent: August 10, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventor: John M. King
  • Patent number: 11086631
    Abstract: Techniques are disclosed relating to the handling of exceptions generated by illegal instructions in a processor. In an embodiment, a processor may be configured to fetch instructions defined according to an instruction set architecture (ISA). The ISA may include a set of uncompressed instructions and a set of compressed instructions. The processor may further be configured to, upon detecting a given one of the set of compressed instructions, cause a copy of the given compressed instruction to be saved and convert the given compressed instruction to a corresponding given uncompressed instruction. The processor may also be configured to detect that the given uncompressed instruction is illegal and was converted from the given compressed instruction, and based at least in part on these, cause an illegal instruction exception to be generated using the copy of the given compressed instruction.
    Type: Grant
    Filed: October 23, 2019
    Date of Patent: August 10, 2021
    Assignee: Western Digital Technologies, Inc.
    Inventors: Robert T. Golla, Matthew B. Smittle
  • Patent number: 11080046
    Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.
    Type: Grant
    Filed: February 5, 2021
    Date of Patent: August 3, 2021
    Assignee: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Patent number: 11080063
    Abstract: A processing device includes an instruction extractor that extracts target instructions intended for a loop process that is repeatedly performed, from instructions decoded by an instruction decoder, and a loop buffer including entries where each of the target instructions extracted by an instruction extractor are stored. An instruction processor stores a target instruction into one of the entries of the loop buffer, and combines target instructions into one target instruction in a case where resources of an instruction execution circuit used by the target instructions do not overlap, to store the one instruction in one of the entries of the loop buffer, and a selector selects the instruction output from the instruction decoder or the target instruction output from the loop buffer, and outputs the selected instruction to the instruction execution circuit.
    Type: Grant
    Filed: October 28, 2019
    Date of Patent: August 3, 2021
    Assignee: FUJITSU LIMITED
    Inventor: Ryohei Okazaki
  • Patent number: 11080057
    Abstract: A processing device includes an instruction decode circuit including decoders that decode instructions respectively assigned an instruction number that is determined for every one of the decoders, an instruction execution circuit that executes the instructions decoded by the instruction decode circuit, an instruction complete holding circuit including hold blocks provided in correspondence with each of the decoders and respectively including hold regions assigned the instruction number, and used for an instruction complete process, and an instruction complete controller that stores instruction information that is generated by decoding the instructions by the decoders, in one of the hold regions of the hold block corresponding to the decoder that decodes the instruction, based on the instruction number, and obtain, in order, the instruction information corresponding to the instructions executed by the instruction execution circuit from the instruction complete holding circuit, to perform the instruction comple
    Type: Grant
    Filed: November 4, 2019
    Date of Patent: August 3, 2021
    Assignee: FUJITSU LIMITED
    Inventor: Atushi Fusejima
  • Patent number: 11080049
    Abstract: Aspects for matrix multiplication in neural network are described herein. The aspects may include a controller unit configured to receive a matrix-multiply-matrix (MM) instruction that includes a first starting address of a first matrix, a first size of the first matrix, a second starting address of a second matrix, and a second size of the second matrix; multiple computation modules configured to respectively multiply, in response to the MM instruction, row vectors of the first matrix with column vectors of the second matrix to generate one or more result elements; and an interconnection unit configured to combine the result elements to generate one or more row vectors of a result matrix.
    Type: Grant
    Filed: October 17, 2019
    Date of Patent: August 3, 2021
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Xiao Zhang, Shaoli Liu, Tianshi Chen, Yunji Chen
  • Patent number: 11074193
    Abstract: A method is provided that includes performing, by a processor in response to a vector permutation instruction, permutation of values stored in lanes of a vector to generate a permuted vector, wherein the permutation is responsive to a control storage location storing permute control input for each lane of the permuted vector, wherein the permute control input corresponding to each lane of the permuted vector indicates a value to be stored in the lane of the permuted vector, wherein the permute control input for at least one lane of the permuted vector indicates a value of a selected lane of the vector is to be stored in the at least one lane, and storing the permuted vector in a storage location indicated by an operand of the vector permutation instruction.
    Type: Grant
    Filed: August 26, 2019
    Date of Patent: July 27, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy David Anderson, Mujibur Rahman, Dheera Balasubramanian Samudrala, Peter Richard Dent, Duc Quang Bui
  • Patent number: 11068264
    Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store a source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers, the execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.
    Type: Grant
    Filed: August 9, 2019
    Date of Patent: July 20, 2021
    Assignee: Intel Corporation
    Inventors: William C. Hasenplaugh, Chris J. Newburn, Simon C. Steely, Jr., Samantika S. Sury
  • Patent number: 11061678
    Abstract: In one embodiment, a method for improving a performance of an integrated circuit includes implementing one or more computing devices executing a compiler program that: (i) evaluates a target instruction set intended for execution by an integrated circuit; (ii) identifies one or more nested loop instructions within the target instruction set based on the evaluation; (iii) evaluates whether a most inner loop body within the one or more nested loop instructions comprises a candidate inner loop body that requires a loop optimization that mitigates an operational penalty to the integrated circuit based on one or more executional properties of the most inner loop instruction; and (iv) implements the loop optimization that modifies the target instruction set to include loop optimization instructions to control, at runtime, an execution and a termination of the most inner loop body thereby mitigating the operational penalty to the integrated circuit.
    Type: Grant
    Filed: December 18, 2020
    Date of Patent: July 13, 2021
    Assignee: qaudric.io Inc
    Inventors: Nigel Drego, Mrinalini Ravichandran, Jianman Chang, Daniel Firu, Veerbhan Kheterpal
  • Patent number: 11048513
    Abstract: Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction for execution on an instruction execution pipeline, wherein the instruction execution pipeline is in a first execution mode, beginning execution of the first instruction on the instruction execution pipeline, receiving an execution mode instruction to switch the instruction execution pipeline to a second execution mode, switching the instruction execution pipeline to the second execution mode based on the received execution mode instruction, annulling the first instruction based on the execution mode instruction, receiving a second instruction for execution on the instruction execution pipeline, the second instruction, and executing the second instruction.
    Type: Grant
    Filed: April 15, 2019
    Date of Patent: June 29, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy D. Anderson, Joseph Zbiciak, Duc Bui, Mel Alan Phipps, Todd T. Hahn
  • Patent number: 11042462
    Abstract: Identifying computer program execution characteristics for determine relevance of pattern instruction executions to determine characteristics of a computer program. Filters are utilized to determine which subsequent occurrences of execution of at least one computer instruction are relevant to a counter based on execution characteristics of the at least one computer instruction where the counter counts the subsequent occurrences of execution of at least one computer instruction following prior executions of the same at least one computer instruction.
    Type: Grant
    Filed: September 4, 2019
    Date of Patent: June 22, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Anthony Thomas Sofia, Peter Sutton, Robert W. St. John, Matthias Klein
  • Patent number: 11036516
    Abstract: A parallel distributed processing control system used in production distribution planning includes: a storage unit storing step information of steps constituting a production distribution process of a product, CPU information of CPUs that calculate a value of a simulation result for the step, and a constraint value in the production distribution process; a divided model generation unit generating a divided model by grouping the steps; a CPU allocation unit allocating the divided model to the plurality of CPUs; an engine execution unit enabling the CPU to calculate the value for the step constituting the divided model; and a constraint monitoring unit determining whether the value satisfies a condition specified by the constraint value. An output information generation unit generates result information using the value satisfying the condition; and the CPU allocation unit allocates the divided model so that processing loads of the plurality of CPUs are equalized.
    Type: Grant
    Filed: June 11, 2019
    Date of Patent: June 15, 2021
    Assignee: HITACHI, LTD.
    Inventors: Atsuki Kiuchi, Tazu Nomoto, Yasuo Bakke, Takahiro Ogura
  • Patent number: 11036648
    Abstract: Disclosed embodiments include a data processing apparatus having a processing core, a memory, and a streaming engine. The streaming engine is configured to receive a plurality of data elements stored in the memory and to provide the plurality of data elements as a data stream to the processing core, and includes an address generator to generate addresses corresponding to locations in the memory, a buffer to store the data elements received from the locations in the memory corresponding to the generated addresses, and an output to supply the data elements received from the memory to the processing core as the data stream.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: June 15, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy D. Anderson, Joseph Zbiciak, Duc Quang Bui, Abhijeet Chachad, Kai Chirca, Naveen Bhoria, Matthew D. Pierson, Daniel Wu, Ramakrishnan Venkatasubramanian
  • Patent number: 11029997
    Abstract: Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction for execution on an instruction execution pipeline, wherein the instruction execution pipeline is in a first execution mode, and wherein the first instruction is configured to utilize a first memory location, begin execution of the first instruction on the instruction execution pipeline, receiving an execution mode instruction to switch the instruction execution pipeline to a second execution mode, switching the instruction execution pipeline to the second execution mode based on the received execution mode instruction, receiving a second instruction for execution on the instruction execution pipeline, the second instruction configured to utilize the first memory location, determining that the first instruction and the second instruction utilize the first memory location, and stalling execution of the second instruction based on the determining.
    Type: Grant
    Filed: April 15, 2019
    Date of Patent: June 8, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Duc Bui, Timothy D. Anderson
  • Patent number: 11016810
    Abstract: A system and method for a computing tile of a multi-tiled integrated circuit includes a plurality of distinct tile computing circuits, wherein each of the plurality of distinct tile computing circuits is configured to receive fixed-length instructions; a token-informed task scheduler that: tracks one or more of a plurality of distinct tokens emitted by one or more of the plurality of distinct tile computing circuits; and selects a distinct computation task of a plurality of distinct computation tasks based on the tracking; and a work queue buffer that: contains a plurality of distinct fixed-length instructions, wherein each one of the fixed-length instructions is associated with one of the plurality of distinct computation tasks; and transmits one of the plurality of distinct fixed-length instructions to one or more of the plurality of distinct tile computing circuits based on the selection of the distinct computation task by the token-informed task scheduler.
    Type: Grant
    Filed: November 24, 2020
    Date of Patent: May 25, 2021
    Assignee: Mythic, Inc.
    Inventors: Malav Parikh, Sergio Schuler, Vimal Reddy, Zainab Zaidi, Paul Toth, Adam Caughron, Bryant Sorensen, Alex Dang-Tran, Scott Johnson, Raul Garibay, Andrew Morten, David Fick
  • Patent number: 11016764
    Abstract: A vector processing unit is described, and includes processor units that each include multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.
    Type: Grant
    Filed: April 8, 2020
    Date of Patent: May 25, 2021
    Assignee: Google LLC
    Inventors: William Lacy, Gregory Michael Thorson, Christopher Aaron Clark, Norman Paul Jouppi, Thomas Norrie, Andrew Everett Phelps
  • Patent number: 11016776
    Abstract: The present disclosure provides systems and methods for executing instructions. The system can include: processing unit having a core configured to execute instructions; and a host unit configured to: compile computer code into a plurality of instructions that includes a set of instructions that are determined to be executed in parallel on the core, wherein the set of instructions each includes an operation instruction and an indication bit and wherein the indication bit is set to identify the last instruction of the set of instructions, and provide the set of instructions to the core.
    Type: Grant
    Filed: December 21, 2018
    Date of Patent: May 25, 2021
    Assignee: ALIBABA GROUP HOLDING LIMITED
    Inventors: Liang Han, Xiaowei Jiang