Patents Examined by Daniel H. Pan
  • Patent number: 10585973
    Abstract: Aspects for vector operations in neural network are described herein. The aspects may include a vector caching unit configured to store a first vector and a second vector, wherein the first vector includes one or more first elements and the second vector includes one or more second elements. The aspects may further include one or more adders and a combiner. The one or more adders may be configured to respectively add each of the first elements to a corresponding one of the second elements to generate one or more addition results. The combiner may be configured to combine a combiner configured to combine the one or more addition results into an output vector.
    Type: Grant
    Filed: October 26, 2018
    Date of Patent: March 10, 2020
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Jinhua Tao, Tian Zhi, Shaoli Liu, Tianshi Chen, Yunji Chen
  • Patent number: 10579387
    Abstract: Technical solutions are described for executing one or more out-of-order (OoO) instructions by a processing unit. The execution includes detecting, by a load-store unit (LSU), a load-hit-store (LHS) in an out-of-order execution of the instructions, the detecting based only on effective addresses. The detecting includes determining an effective address associated with an operand of a load instruction. The detecting further includes determining whether a store instruction entry using said effective address to store a data value is present in a store reorder queue, and indicating that an LHS has been detected based at least in part on determining that store instruction entry using said effective address is present in the store reorder queue. In response to detecting the LHS, a store forwarding is performed that includes forwarding data from the store instruction to the load instruction.
    Type: Grant
    Filed: October 6, 2017
    Date of Patent: March 3, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Christopher Gonzalez, Bryan Lloyd, Balaram Sinharoy
  • Patent number: 10565117
    Abstract: Techniques relate to handling outstanding cache miss prefetches. A processor pipeline recognizes that a prefetch cancelling instruction is being executed. In response to recognizing that the prefetch cancelling instruction is being executed, all outstanding prefetches are evaluated according to a criterion as set forth by the prefetch cancelling instruction in order to select qualified prefetches. In response to evaluating, a cache subsystem is communicated with to cause cancelling of the qualified prefetches that fit the criterion. In response to successful cancelling of the qualified prefetches, a local cache is prevented from being updated from the qualified prefetches.
    Type: Grant
    Filed: July 31, 2018
    Date of Patent: February 18, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael Karl Gschwind, Maged M. Michael, Valentina Salapura, Eric M. Schwarz, Chung-Lung K. Shum
  • Patent number: 10564978
    Abstract: Operation of a multi-slice processor that includes a plurality of execution slices and a plurality of load/store slices, where each load/store slice includes a load miss queue and a load reorder queue, includes: receiving, at a load reorder queue, a load instruction requesting data; responsive to the data not being stored in a data cache, determining whether a previous load instruction is pending a fetch of a cache line comprising the data; if the cache line does not comprise the data, allocating an entry for the load instruction in the load miss queue; and if the cache line does comprise the data: merging, in the load reorder queue, the load instruction with an entry for the previous load instruction.
    Type: Grant
    Filed: June 8, 2018
    Date of Patent: February 18, 2020
    Assignee: International Business Machines Corporation
    Inventors: Kimberly M. Fernsler, David A. Hrusecky, Hung Q. Le, Elizabeth A. McGlone, Brian W. Thompto
  • Patent number: 10553316
    Abstract: A system for generating an alimentary instruction set based on vibrant constitutional guidance using artificial intelligence includes a diagnostic engine operating on at least a server and configured to receive at least a biological extraction from a user and generate a diagnostic output, based on the at least a biological extraction. The system includes a plan generation module operating on the at least a server, the plan generation module designed and configured to generate, based on the diagnostic output, a comprehensive instruction set associated with the user. The system is further configured to generate an alimentary instruction set associated with the user based on the comprehensive instruction set, wherein the alimentary instruction set is configured to interact with a plurality of processes and services based on components of the alimentary instruction set.
    Type: Grant
    Filed: April 4, 2019
    Date of Patent: February 4, 2020
    Assignee: KPN Innovations, LLC
    Inventor: Kenneth Neumann
  • Patent number: 10552159
    Abstract: A computer processor includes a branch prediction unit that includes a local branch predictor and a global branch predictor. Managing power consumption in such a computer processor includes, for each of a plurality of branch instructions: performing, by the local branch predictor, a local branch prediction; performing, by each of the global branch predictors, a global branch prediction; determining to utilize the local branch prediction over the global branch predictions as a branch prediction for the branch instruction; incrementing a value of a counter; determining whether the value of the counter exceeds a predetermined threshold; and if the value of the counter exceeds the predetermined threshold, powering down at least one of the global branch predictors and configuring the branch prediction unit to bypass the powered down global branch predictor for branch predictions of subsequent branch instructions.
    Type: Grant
    Filed: June 1, 2018
    Date of Patent: February 4, 2020
    Assignee: International Business Machines Corporation
    Inventors: David S. Levitan, Nicholas R. Orzol, Robert A. Philhower
  • Patent number: 10540512
    Abstract: A parallel processing method, system, and/or computer program product for performing data parallel wide accesses on an unstructured text is provided. The parallel processing includes creating a pointer that points to a beginning of the unstructured text and loading into a vector register a string segment of the unstructured text based on the pointer. Then, access permissions of a first byte of the string segment are automatically tested. In turn, a determination is made as to whether the string segment includes an end indication, and a remaining portion of the unstructured text is validated by accessing and loading a last character identified by the end indication into the vector register when the string segment is determined to include the end indication.
    Type: Grant
    Filed: September 29, 2015
    Date of Patent: January 21, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Brett Olsson
  • Patent number: 10534607
    Abstract: Methods, systems, and apparatus, including an apparatus for accessing a N-dimensional tensor, the apparatus including, for each dimension of the N-dimensional tensor, a partial address offset value element that stores a partial address offset value for the dimension based at least on an initial value for the dimension, a step value for the dimension, and a number of iterations of a loop for the dimension. The apparatus includes a hardware adder and a processor. The processor obtains an instruction to access a particular element of the N-dimensional tensor. The N-dimensional tensor has multiple elements arranged across each of the N dimensions, where N is an integer that is equal to or greater than one. The processor determines, using the partial address offset value elements and the hardware adder, an address of the particular element and outputs data indicating the determined address for accessing the particular element of the N-dimensional tensor.
    Type: Grant
    Filed: February 23, 2018
    Date of Patent: January 14, 2020
    Assignee: Google LLC
    Inventors: Olivier Temam, Harshit Khaitan, Ravi Narayanaswami, Dong Hyuk Woo
  • Patent number: 10534616
    Abstract: Technical solutions are described for executing one or more out-of-order instructions by a load-store unit (LSU) by detecting a load-hit-load (LHL) case based only on effective addresses (EA). An example method includes, in response to receiving a first load instruction, creating an entry in a LHL table. Further, in response to receiving a second load instruction in the load reorder queue, and in response to the predetermined number of bits from a second EA used by the second load instruction matching the predetermined number of bits from the first EA, comparing the first EA and the second EA. Further, a first thread identifier for the first load instruction is compared with a second thread identifier for the second load instruction. In response to the first EA matching the second EA, and the first thread identifier matching the second thread identifier, the method includes flushing the first load instruction.
    Type: Grant
    Filed: October 6, 2017
    Date of Patent: January 14, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Christopher Gonzalez, Bryan Lloyd, Balaram Sinharoy
  • Patent number: 10521228
    Abstract: The present disclosure provides a data read-write scheduler and a reservation station for vector operations. The data read-write scheduler suspends the instruction execution by providing a read instruction cache module and a write instruction cache module and detecting conflict instructions based on the two modules. After the time is satisfied, instructions are re-executed, thereby solving the read-after-write conflict and the write-after-read conflict between instructions and guaranteeing that correct data are provided to a vector operations component. Therefore, the subject disclosure has more values for promotion and application.
    Type: Grant
    Filed: November 7, 2018
    Date of Patent: December 31, 2019
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Dong Han, Shaoli Liu, Yunji Chen, Tianshi Chen
  • Patent number: 10521237
    Abstract: A method and a device that includes a set of multiple pipeline stages, wherein the set of multiple pipeline stages is arranged to execute a first thread of instructions; multiple memristor based registers that are arranged to store a state of another thread of instructions that differs from the first thread of instructions; and a control circuit that is arranged to control a thread switch between the first thread of instructions and the other thread of instructions by controlling a storage of a state of the first thread of instructions at the multiple memristor based registers and by controlling a provision of the state of the other thread of instructions by the set of multiple pipeline stages; wherein the set of multiple pipeline stages is arranged to execute the other thread of instructions upon a reception of the state of the other thread of instructions.
    Type: Grant
    Filed: March 19, 2014
    Date of Patent: December 31, 2019
    Assignee: TECHNION RESEARCH AND DEVELOPMENT FOUNDATION LTD
    Inventors: Avinoam Kolodny, Uri Weiser, Shahar Kvatinsky
  • Patent number: 10514929
    Abstract: Embodiments of the present application disclose a computer instruction processing method, a coprocessor, and a system. The computer instruction processing method includes: receiving, by a coprocessor, a first instruction set migrated by a central processing unit CPU; acquiring, according to the first instruction set that is applicable to the CPU for execution, a second instruction set for execution in the coprocessor; and executing binary codes in the second instruction set. In this way, the coprocessor that executes the second instruction set substitutes for the CPU that executes the first instruction set, CPU load is reduced, and usage of the coprocessor is improved.
    Type: Grant
    Filed: December 15, 2017
    Date of Patent: December 24, 2019
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Yunwei Gao, Xinlong Lin, Jianfeng Zhan
  • Patent number: 10509651
    Abstract: A processor of an aspect includes a plurality of registers, and a decode unit to decode an instruction. The instruction is to indicate at least one storage location that is to store a first integer, a second integer, and a modulus. An execution unit is coupled with the decode unit, and coupled with the plurality of registers. The execution unit, in response to the instruction, is to store a Montgomery multiplication product corresponding to the first integer, the second integer, and the modulus, in a destination storage location. Other processors, methods, systems, and instructions are disclosed.
    Type: Grant
    Filed: December 22, 2016
    Date of Patent: December 17, 2019
    Assignee: Intel Corporation
    Inventor: Vinodh Gopal
  • Patent number: 10496404
    Abstract: The present disclosure provides a data read-write scheduler and a reservation station for vector operations. The data read-write scheduler suspends the instruction execution by providing a read instruction cache module and a write instruction cache module and detecting conflict instructions based on the two modules. After the time is satisfied, instructions are re-executed, thereby solving the read-after-write conflict and the write-after-read conflict between instructions and guaranteeing that correct data are provided to a vector operations component. Therefore, the subject disclosure has more values for promotion and application.
    Type: Grant
    Filed: November 7, 2018
    Date of Patent: December 3, 2019
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Dong Han, Shaoli Liu, Yunji Chen, Tianshi Chen
  • Patent number: 10489155
    Abstract: Systems and methods relate to a mixed-width single instruction multiple data (SIMD) instruction which has at least a source vector operand comprising data elements of a first bit-width and a destination vector operand comprising data elements of a second bit-width, wherein the second bit-width is either half of or twice the first bit-width. Correspondingly, one of the source or destination vector operands is expressed as a pair of registers, a first register and a second register. The other vector operand is expressed as a single register. Data elements of the first register correspond to even-numbered data elements of the other vector operand expressed as a single register, and data elements of the second register correspond to data elements of the other vector operand expressed as a single register.
    Type: Grant
    Filed: July 21, 2015
    Date of Patent: November 26, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Eric Wayne Mahurin, Ajay Anant Ingle
  • Patent number: 10489154
    Abstract: An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiply-accumulate of a first complex number, a second complex number, and a third complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder to decode an instruction to generate the decoded instruction and a first source register, a second source register, and a source and destination register to provide the first complex number, the second complex number, and the third complex number, respectively.
    Type: Grant
    Filed: November 28, 2017
    Date of Patent: November 26, 2019
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Roman S. Dubtsov
  • Patent number: 10481911
    Abstract: Method and apparatus are provided for synchronizing execution of a plurality of threads on a multi-threaded processor. A program executed by a thread can have a number of synchronization points corresponding to points where execution is to be synchronized with another thread. Execution of a thread is paused when it reaches a synchronization point until at least one other thread with which it is intended to be synchronized reaches a corresponding synchronization point. Execution is subsequently resumed. A control core maintains status data for threads and can cause a thread that is ready to run to use execution resources that were occupied by a thread that is waiting for a synchronization event.
    Type: Grant
    Filed: February 11, 2014
    Date of Patent: November 19, 2019
    Assignee: Imagination Technologies Limited
    Inventor: Yoong Chert Foo
  • Patent number: 10459731
    Abstract: A first register has a lane storing first input data and a second register has a lane storing second input data elements. A width of the lane of the second register is equal to a width of the lane of the first register. A single-instruction-multiple-data (SIMD) lane has a lane width equal to the width of the lane of the first register. The SIMD lane is configured to perform a sliding window operation on the first input data elements in the lane of the first register and the second input data elements in the lane of the second register. Performing the sliding window operation includes determining a result based on a first input data element stored in a first position of the first register and a second input data element stored in a second position of the second register. The second position is different from the first position.
    Type: Grant
    Filed: July 20, 2015
    Date of Patent: October 29, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Eric Mahurin, Jakub Pawel Golab
  • Patent number: 10459726
    Abstract: Described herein is a system and method for store fusion that fuses small store operations into fewer, larger store operations. The system detects that a pair of adjacent operations are consecutive store operations, where the adjacent micro-operations refers to micro-operations flowing through adjacent dispatch slots and the consecutive store micro-operations refers to both of the adjacent micro-operations being store micro-operations. The consecutive store operations are then reviewed to determine if the data sizes are the same and if the store operation addresses are consecutive. The two store operations are then fused together to form one store operation with twice the data size and one store data HI operation.
    Type: Grant
    Filed: November 27, 2017
    Date of Patent: October 29, 2019
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventor: John M. King
  • Patent number: 10452394
    Abstract: An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiplication of a first complex number and a second complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder, a first source register, and a second source register. The decoder is to decode an instruction to generate the decoded instruction. The first source register is to provide the first complex number and the second source register is to provide the second complex number.
    Type: Grant
    Filed: November 28, 2017
    Date of Patent: October 22, 2019
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Roman S. Dubstov