Patents Examined by Daniel H. Pan

Apparatus and methods for vector operations

Patent number: 10585973

Abstract: Aspects for vector operations in a neural network are described herein. The aspects may include a vector caching unit configured to store a first vector and a second vector, wherein the first vector includes one or more first elements and the second vector includes one or more second elements. The aspects may further include one or more adders and a combiner. The one or more adders may be configured to respectively add each of the first elements to a corresponding one of the second elements to generate one or more addition results. The combiner may be configured to combine the one or more addition results into an output vector.

Type: Grant

Filed: October 26, 2018

Date of Patent: March 10, 2020

Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED

Inventors: Jinhua Tao, Tian Zhi, Shaoli Liu, Tianshi Chen, Yunji Chen
Efficient store-forwarding with partitioned FIFO store-reorder queue in out-of-order processor

Patent number: 10579387

Abstract: Technical solutions are described for executing one or more out-of-order (OoO) instructions by a processing unit. The execution includes detecting, by a load-store unit (LSU), a load-hit-store (LHS) in an out-of-order execution of the instructions, the detecting based only on effective addresses. The detecting includes determining an effective address associated with an operand of a load instruction. The detecting further includes determining whether a store instruction entry using said effective address to store a data value is present in a store reorder queue, and indicating that an LHS has been detected based at least in part on determining that store instruction entry using said effective address is present in the store reorder queue. In response to detecting the LHS, a store forwarding is performed that includes forwarding data from the store instruction to the load instruction.

Type: Grant

Filed: October 6, 2017

Date of Patent: March 3, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Christopher Gonzalez, Bryan Lloyd, Balaram Sinharoy
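
To make the load-hit-store idea in the abstract above (patent 10579387) concrete, here is a minimal sketch of store forwarding keyed only on effective addresses. The class and method names are hypothetical, and the youngest-first search order is an assumption for illustration, not something the abstract states.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoreEntry:
    effective_address: int
    data: int


class StoreReorderQueue:
    """Toy FIFO of in-flight stores, oldest first (hypothetical model)."""

    def __init__(self) -> None:
        self.entries: List[StoreEntry] = []

    def push_store(self, effective_address: int, data: int) -> None:
        self.entries.append(StoreEntry(effective_address, data))

    def forward_to_load(self, load_address: int) -> Optional[int]:
        # Assumption: search youngest to oldest so the load sees the
        # most recent store to the same effective address.
        for entry in reversed(self.entries):
            if entry.effective_address == load_address:
                return entry.data  # load-hit-store: forward the store data
        return None  # no match; the load would read from the cache instead
```

A load to an address with two pending stores receives the data of the later store, while a load to an address with no pending store gets `None`.
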
Instruction to cancel outstanding cache prefetches

Patent number: 10565117

Abstract: Techniques relate to handling outstanding cache miss prefetches. A processor pipeline recognizes that a prefetch cancelling instruction is being executed. In response to recognizing that the prefetch cancelling instruction is being executed, all outstanding prefetches are evaluated according to a criterion as set forth by the prefetch cancelling instruction in order to select qualified prefetches. In response to evaluating, a cache subsystem is communicated with to cause cancelling of the qualified prefetches that fit the criterion. In response to successful cancelling of the qualified prefetches, a local cache is prevented from being updated from the qualified prefetches.

Type: Grant

Filed: July 31, 2018

Date of Patent: February 18, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Michael Karl Gschwind, Maged M. Michael, Valentina Salapura, Eric M. Schwarz, Chung-Lung K. Shum
Operation of a multi-slice processor with an expanded merge fetching queue

Patent number: 10564978

Abstract: Operation of a multi-slice processor that includes a plurality of execution slices and a plurality of load/store slices, where each load/store slice includes a load miss queue and a load reorder queue, includes: receiving, at a load reorder queue, a load instruction requesting data; responsive to the data not being stored in a data cache, determining whether a previous load instruction is pending a fetch of a cache line comprising the data; if the cache line does not comprise the data, allocating an entry for the load instruction in the load miss queue; and if the cache line does comprise the data: merging, in the load reorder queue, the load instruction with an entry for the previous load instruction.

Type: Grant

Filed: June 8, 2018

Date of Patent: February 18, 2020

Assignee: International Business Machines Corporation

Inventors: Kimberly M. Fernsler, David A. Hrusecky, Hung Q. Le, Elizabeth A. McGlone, Brian W. Thompto
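
The allocate-or-merge decision in the abstract above (patent 10564978) can be sketched roughly as follows. This is a simplified software model: the 128-byte line size, the class name, and the dictionary used to track waiting loads are all assumptions made for illustration.

```python
from typing import Dict, List

CACHE_LINE_BYTES = 128  # assumed line size, not taken from the patent


class LoadStoreSlice:
    """Toy model of one load/store slice (all names are hypothetical)."""

    def __init__(self) -> None:
        # Load miss queue: cache lines with a fetch in flight.
        self.load_miss_queue: List[int] = []
        # Load reorder queue view: loads waiting on each pending line.
        self.waiting_loads: Dict[int, List[str]] = {}

    def handle_cache_miss(self, load_id: str, address: int) -> str:
        line = address // CACHE_LINE_BYTES
        if line in self.waiting_loads:
            # A previous load is already fetching this line: merge with it.
            self.waiting_loads[line].append(load_id)
            return "merged"
        # No pending fetch covers the data: allocate a miss-queue entry.
        self.load_miss_queue.append(line)
        self.waiting_loads[line] = [load_id]
        return "allocated"
```

Two misses to the same line consume a single miss-queue entry, which is the resource saving the merge is meant to provide.
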
Systems and methods for generating alimentary instruction sets based on vibrant constitutional guidance

Patent number: 10553316

Abstract: A system for generating an alimentary instruction set based on vibrant constitutional guidance using artificial intelligence includes a diagnostic engine operating on at least a server and configured to receive at least a biological extraction from a user and generate a diagnostic output, based on the at least a biological extraction. The system includes a plan generation module operating on the at least a server, the plan generation module designed and configured to generate, based on the diagnostic output, a comprehensive instruction set associated with the user. The system is further configured to generate an alimentary instruction set associated with the user based on the comprehensive instruction set, wherein the alimentary instruction set is configured to interact with a plurality of processes and services based on components of the alimentary instruction set.

Type: Grant

Filed: April 4, 2019

Date of Patent: February 4, 2020

Assignee: KPN Innovations, LLC

Inventor: Kenneth Neumann
Power management of branch predictors in a computer processor

Patent number: 10552159

Abstract: A computer processor includes a branch prediction unit that includes a local branch predictor and a global branch predictor. Managing power consumption in such a computer processor includes, for each of a plurality of branch instructions: performing, by the local branch predictor, a local branch prediction; performing, by each of the global branch predictors, a global branch prediction; determining to utilize the local branch prediction over the global branch predictions as a branch prediction for the branch instruction; incrementing a value of a counter; determining whether the value of the counter exceeds a predetermined threshold; and if the value of the counter exceeds the predetermined threshold, powering down at least one of the global branch predictors and configuring the branch prediction unit to bypass the powered down global branch predictor for branch predictions of subsequent branch instructions.

Type: Grant

Filed: June 1, 2018

Date of Patent: February 4, 2020

Assignee: International Business Machines Corporation

Inventors: David S. Levitan, Nicholas R. Orzol, Robert A. Philhower
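
The counter-and-threshold policy in the abstract above (patent 10552159) might look like the sketch below. It models a single global predictor, and the class name and fields are hypothetical; the abstract does not say when, or whether, the predictor is powered back up, so that is left out.

```python
class PredictorPowerManager:
    """Toy policy: power down the global predictor once the local one
    has been preferred more than `threshold` times (hypothetical model)."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.local_preferred_count = 0
        self.global_powered = True

    def record_branch(self, used_local_prediction: bool) -> None:
        if not self.global_powered:
            return  # global predictor is already bypassed
        if used_local_prediction:
            self.local_preferred_count += 1
            if self.local_preferred_count > self.threshold:
                # Counter exceeded the threshold: power down and bypass.
                self.global_powered = False
```

With a threshold of 2, the global predictor stays powered after two locally predicted branches and is powered down on the third.
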
Exception preserving parallel data processing of string and unstructured text

Patent number: 10540512

Abstract: A parallel processing method, system, and/or computer program product for performing data parallel wide accesses on an unstructured text is provided. The parallel processing includes creating a pointer that points to a beginning of the unstructured text and loading into a vector register a string segment of the unstructured text based on the pointer. Then, access permissions of a first byte of the string segment are automatically tested. In turn, a determination is made as to whether the string segment includes an end indication, and a remaining portion of the unstructured text is validated by accessing and loading a last character identified by the end indication into the vector register when the string segment is determined to include the end indication.

Type: Grant

Filed: September 29, 2015

Date of Patent: January 21, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Michael K. Gschwind, Brett Olsson
Accessing data in multi-dimensional tensors using adders

Patent number: 10534607

Abstract: Methods, systems, and apparatus, including an apparatus for accessing a N-dimensional tensor, the apparatus including, for each dimension of the N-dimensional tensor, a partial address offset value element that stores a partial address offset value for the dimension based at least on an initial value for the dimension, a step value for the dimension, and a number of iterations of a loop for the dimension. The apparatus includes a hardware adder and a processor. The processor obtains an instruction to access a particular element of the N-dimensional tensor. The N-dimensional tensor has multiple elements arranged across each of the N dimensions, where N is an integer that is equal to or greater than one. The processor determines, using the partial address offset value elements and the hardware adder, an address of the particular element and outputs data indicating the determined address for accessing the particular element of the N-dimensional tensor.

Type: Grant

Filed: February 23, 2018

Date of Patent: January 14, 2020

Assignee: Google LLC

Inventors: Olivier Temam, Harshit Khaitan, Ravi Narayanaswami, Dong Hyuk Woo
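
As an illustration of the address computation in the abstract above (patent 10534607), the sketch below sums one partial offset per dimension using additions only. The formula `initial + step * iterations` is an assumed reading of "based at least on" those three values, and the `base` address parameter is hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DimensionState:
    initial: int     # initial value for the dimension
    step: int        # step value (stride) for the dimension
    iterations: int  # loop iterations completed for the dimension

    def partial_offset(self) -> int:
        # Assumed formula; the abstract only names the inputs.
        return self.initial + self.step * self.iterations


def element_address(base: int, dims: Sequence[DimensionState]) -> int:
    """Add the per-dimension partial offsets to a base address."""
    address = base
    for dim in dims:
        address += dim.partial_offset()  # the role of the hardware adder
    return address
```

For a row-major 2x3 tensor (steps 3 and 1), element (1, 2) at base 100 is `element_address(100, [DimensionState(0, 3, 1), DimensionState(0, 1, 2)])`, which evaluates to 105.
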
Load-hit-load detection in an out-of-order processor

Patent number: 10534616

Abstract: Technical solutions are described for executing one or more out-of-order instructions by a load-store unit (LSU) by detecting a load-hit-load (LHL) case based only on effective addresses (EA). An example method includes, in response to receiving a first load instruction, creating an entry in a LHL table. Further, in response to receiving a second load instruction in the load reorder queue, and in response to the predetermined number of bits from a second EA used by the second load instruction matching the predetermined number of bits from the first EA, comparing the first EA and the second EA. Further, a first thread identifier for the first load instruction is compared with a second thread identifier for the second load instruction. In response to the first EA matching the second EA, and the first thread identifier matching the second thread identifier, the method includes flushing the first load instruction.

Type: Grant

Filed: October 6, 2017

Date of Patent: January 14, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Christopher Gonzalez, Bryan Lloyd, Balaram Sinharoy
Data read-write scheduler and reservation station for vector operations

Patent number: 10521228

Abstract: The present disclosure provides a data read-write scheduler and a reservation station for vector operations. The data read-write scheduler suspends the instruction execution by providing a read instruction cache module and a write instruction cache module and detecting conflict instructions based on the two modules. After the time is satisfied, instructions are re-executed, thereby solving the read-after-write conflict and the write-after-read conflict between instructions and guaranteeing that correct data are provided to a vector operations component. Therefore, the subject disclosure has more values for promotion and application.

Type: Grant

Filed: November 7, 2018

Date of Patent: December 31, 2019

Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED

Inventors: Dong Han, Shaoli Liu, Yunji Chen, Tianshi Chen
Memristor based multithreading

Patent number: 10521237

Abstract: A method and a device that includes a set of multiple pipeline stages, wherein the set of multiple pipeline stages is arranged to execute a first thread of instructions; multiple memristor based registers that are arranged to store a state of another thread of instructions that differs from the first thread of instructions; and a control circuit that is arranged to control a thread switch between the first thread of instructions and the other thread of instructions by controlling a storage of a state of the first thread of instructions at the multiple memristor based registers and by controlling a provision of the state of the other thread of instructions by the set of multiple pipeline stages; wherein the set of multiple pipeline stages is arranged to execute the other thread of instructions upon a reception of the state of the other thread of instructions.

Type: Grant

Filed: March 19, 2014

Date of Patent: December 31, 2019

Assignee: TECHNION RESEARCH AND DEVELOPMENT FOUNDATION LTD

Inventors: Avinoam Kolodny, Uri Weiser, Shahar Kvatinsky
Computer instruction processing method, coprocessor, and system

Patent number: 10514929

Abstract: Embodiments of the present application disclose a computer instruction processing method, a coprocessor, and a system. The computer instruction processing method includes: receiving, by a coprocessor, a first instruction set migrated by a central processing unit CPU; acquiring, according to the first instruction set that is applicable to the CPU for execution, a second instruction set for execution in the coprocessor; and executing binary codes in the second instruction set. In this way, the coprocessor that executes the second instruction set substitutes for the CPU that executes the first instruction set, CPU load is reduced, and usage of the coprocessor is improved.

Type: Grant

Filed: December 15, 2017

Date of Patent: December 24, 2019

Assignee: HUAWEI TECHNOLOGIES CO., LTD.

Inventors: Yunwei Gao, Xinlong Lin, Jianfeng Zhan
Montgomery multiplication processors, methods, systems, and instructions

Patent number: 10509651

Abstract: A processor of an aspect includes a plurality of registers, and a decode unit to decode an instruction. The instruction is to indicate at least one storage location that is to store a first integer, a second integer, and a modulus. An execution unit is coupled with the decode unit, and coupled with the plurality of registers. The execution unit, in response to the instruction, is to store a Montgomery multiplication product corresponding to the first integer, the second integer, and the modulus, in a destination storage location. Other processors, methods, systems, and instructions are disclosed.

Type: Grant

Filed: December 22, 2016

Date of Patent: December 17, 2019

Assignee: Intel Corporation

Inventor: Vinodh Gopal
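
For readers unfamiliar with the operation named in the abstract above (patent 10509651), the following is a textbook software sketch of a Montgomery multiplication product, a * b * R^-1 mod n with R = 2^r_bits. It is not the patented instruction; the function name and parameters are chosen for illustration, and it needs Python 3.8 or later for the modular inverse via `pow`.

```python
def montgomery_multiply(a: int, b: int, modulus: int, r_bits: int) -> int:
    """Return a * b * R^-1 mod modulus, where R = 2**r_bits.

    Requires an odd modulus smaller than R and 0 <= a, b < modulus.
    """
    r = 1 << r_bits
    if modulus % 2 == 0 or modulus >= r:
        raise ValueError("modulus must be odd and smaller than R")
    if not (0 <= a < modulus and 0 <= b < modulus):
        raise ValueError("operands must be reduced modulo the modulus")
    # n' = -modulus^-1 mod R (would be precomputed in practice).
    n_prime = (-pow(modulus, -1, r)) % r
    t = a * b
    # m makes t + m * modulus divisible by R, so the shift below is exact.
    m = ((t & (r - 1)) * n_prime) & (r - 1)
    u = (t + m * modulus) >> r_bits
    return u - modulus if u >= modulus else u
```

To multiply ordinary integers, operands are first converted to Montgomery form (x * R mod n), multiplied, and then converted back by a final Montgomery multiplication with 1.
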
Data read-write scheduler and reservation station for vector operations

Patent number: 10496404

Abstract: The present disclosure provides a data read-write scheduler and a reservation station for vector operations. The data read-write scheduler suspends the instruction execution by providing a read instruction cache module and a write instruction cache module and detecting conflict instructions based on the two modules. After the time is satisfied, instructions are re-executed, thereby solving the read-after-write conflict and the write-after-read conflict between instructions and guaranteeing that correct data are provided to a vector operations component. Therefore, the subject disclosure has more values for promotion and application.

Type: Grant

Filed: November 7, 2018

Date of Patent: December 3, 2019

Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED

Inventors: Dong Han, Shaoli Liu, Yunji Chen, Tianshi Chen
Mixed-width SIMD operations using even/odd register pairs for wide data elements

Patent number: 10489155

Abstract: Systems and methods relate to a mixed-width single instruction multiple data (SIMD) instruction which has at least a source vector operand comprising data elements of a first bit-width and a destination vector operand comprising data elements of a second bit-width, wherein the second bit-width is either half of or twice the first bit-width. Correspondingly, one of the source or destination vector operands is expressed as a pair of registers, a first register and a second register. The other vector operand is expressed as a single register. Data elements of the first register correspond to even-numbered data elements of the other vector operand expressed as a single register, and data elements of the second register correspond to odd-numbered data elements of the other vector operand expressed as a single register.

Type: Grant

Filed: July 21, 2015

Date of Patent: November 26, 2019

Assignee: QUALCOMM Incorporated

Inventors: Eric Wayne Mahurin, Ajay Anant Ingle
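
The even/odd register pairing named in the title of patent 10489155 can be illustrated with a small sketch. Squaring is an arbitrary example operation, the 8-bit and 16-bit widths are assumptions, and Python lists stand in for vector registers.

```python
from typing import List, Sequence, Tuple


def widening_square(src: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Square 8-bit elements into 16-bit results held in a register pair."""
    if len(src) % 2 != 0:
        raise ValueError("source must hold an even number of elements")
    if any(not 0 <= x <= 0xFF for x in src):
        raise ValueError("source elements must fit in 8 bits")
    # First register of the pair: results for even-numbered source elements.
    even_register = [x * x for x in src[0::2]]
    # Second register of the pair: results for odd-numbered source elements.
    odd_register = [x * x for x in src[1::2]]
    return even_register, odd_register


def narrowing_pack(even_register: Sequence[int],
                   odd_register: Sequence[int]) -> List[int]:
    """Truncate 16-bit pair elements to 8 bits in a single register."""
    out: List[int] = []
    for even, odd in zip(even_register, odd_register):
        out.append(even & 0xFF)  # lands in an even-numbered slot
        out.append(odd & 0xFF)   # lands in an odd-numbered slot
    return out
```

Because each wide result stays in the same lane position as its narrow input, neither direction needs a cross-lane shuffle in this model.
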
Apparatus and method for complex multiply and accumulate

Patent number: 10489154

Abstract: An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiply-accumulate of a first complex number, a second complex number, and a third complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder to decode an instruction to generate the decoded instruction and a first source register, a second source register, and a source and destination register to provide the first complex number, the second complex number, and the third complex number, respectively.

Type: Grant

Filed: November 28, 2017

Date of Patent: November 26, 2019

Assignee: Intel Corporation

Inventors: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Roman S. Dubtsov
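
The two-operation split described in the abstract above (patent 10489154) can be worked through in a few lines. Which product terms belong to the "first" and "second" operation is an assumption here; the abstract only says each operation yields one term of the real component and one term of the imaginary component.

```python
from typing import Tuple

Complex = Tuple[float, float]  # (real, imaginary)


def complex_multiply_accumulate(acc: Complex, a: Complex, b: Complex) -> Complex:
    """Return acc + a * b, computed as two partial operations."""
    acc_re, acc_im = acc
    a_re, a_im = a
    b_re, b_im = b
    # First operation: terms that use the real part of `a`.
    acc_re += a_re * b_re
    acc_im += a_re * b_im
    # Second operation: terms that use the imaginary part of `a`.
    acc_re -= a_im * b_im
    acc_im += a_im * b_re
    return acc_re, acc_im
```

For example, accumulating (1 + 2i)(3 + 4i) = -5 + 10i onto 1 + 1i gives -4 + 11i.
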
Synchronization of execution threads on a multi-threaded processor

Patent number: 10481911

Abstract: Method and apparatus are provided for synchronizing execution of a plurality of threads on a multi-threaded processor. A program executed by a thread can have a number of synchronization points corresponding to points where execution is to be synchronized with another thread. Execution of a thread is paused when it reaches a synchronization point until at least one other thread with which it is intended to be synchronized reaches a corresponding synchronization point. Execution is subsequently resumed. A control core maintains status data for threads and can cause a thread that is ready to run to use execution resources that were occupied by a thread that is waiting for a synchronization event.

Type: Grant

Filed: February 11, 2014

Date of Patent: November 19, 2019

Assignee: Imagination Technologies Limited

Inventor: Yoong Chert Foo
Sliding window operation

Patent number: 10459731

Abstract: A first register has a lane storing first input data elements and a second register has a lane storing second input data elements. A width of the lane of the second register is equal to a width of the lane of the first register. A single-instruction-multiple-data (SIMD) lane has a lane width equal to the width of the lane of the first register. The SIMD lane is configured to perform a sliding window operation on the first input data elements in the lane of the first register and the second input data elements in the lane of the second register. Performing the sliding window operation includes determining a result based on a first input data element stored in a first position of the first register and a second input data element stored in a second position of the second register. The second position is different from the first position.

Type: Grant

Filed: July 20, 2015

Date of Patent: October 29, 2019

Assignee: QUALCOMM Incorporated

Inventors: Eric Mahurin, Jakub Pawel Golab
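
One way to picture the sliding window operation in the abstract above (patent 10459731) is a window that runs off the end of the first lane into the start of the second. The sketch below uses a plain sum as the window function and Python lists as lanes; both are assumptions, as is the function name.

```python
from typing import List, Sequence


def sliding_window_sum(first_lane: Sequence[int],
                       second_lane: Sequence[int],
                       window: int) -> List[int]:
    """Sum `window` consecutive elements starting at each first-lane position."""
    if len(first_lane) != len(second_lane):
        raise ValueError("lanes must have equal width")
    if not 1 <= window <= len(second_lane) + 1:
        raise ValueError("window does not fit in the two lanes")
    # Treat the second lane as the continuation of the first.
    combined = list(first_lane) + list(second_lane)
    return [sum(combined[i:i + window]) for i in range(len(first_lane))]
```

With lanes `[1, 2, 3, 4]` and `[5, 6, 7, 8]` and a window of 2, the last result combines position 3 of the first lane with position 0 of the second, so the two contributing positions differ as the abstract describes.
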
System and method for store fusion

Patent number: 10459726

Abstract: Described herein is a system and method for store fusion that fuses small store operations into fewer, larger store operations. The system detects that a pair of adjacent operations are consecutive store operations, where the adjacent micro-operations refers to micro-operations flowing through adjacent dispatch slots and the consecutive store micro-operations refers to both of the adjacent micro-operations being store micro-operations. The consecutive store operations are then reviewed to determine if the data sizes are the same and if the store operation addresses are consecutive. The two store operations are then fused together to form one store operation with twice the data size and one store data HI operation.

Type: Grant

Filed: November 27, 2017

Date of Patent: October 29, 2019

Assignee: ADVANCED MICRO DEVICES, INC.

Inventor: John M. King
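
The fusion check in the abstract above (patent 10459726) can be sketched as below. This simplified model assumes little-endian byte order and ascending addresses, and it folds the separate "store data HI" operation into the fused data value; the type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoreOp:
    address: int
    size: int  # data size in bytes
    data: int  # value to store, assumed to fit in `size` bytes


def try_fuse(first: StoreOp, second: StoreOp) -> Optional[StoreOp]:
    """Fuse two adjacent stores into one of twice the size, if eligible."""
    if first.size != second.size:
        return None  # data sizes must match
    if second.address != first.address + first.size:
        return None  # addresses must be consecutive
    # Little-endian assumption: the higher address holds the upper bytes.
    fused_data = first.data | (second.data << (8 * first.size))
    return StoreOp(first.address, 2 * first.size, fused_data)
```

Two 4-byte stores to 0x1000 and 0x1004 fuse into one 8-byte store at 0x1000, while stores of different sizes or with a gap between them are left unfused.
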
Apparatus and method for complex multiplication

Patent number: 10452394

Abstract: An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiplication of a first complex number and a second complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder, a first source register, and a second source register. The decoder is to decode an instruction to generate the decoded instruction. The first source register is to provide the first complex number and the second source register is to provide the second complex number.

Type: Grant

Filed: November 28, 2017

Date of Patent: October 22, 2019

Assignee: Intel Corporation

Inventors: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Roman S. Dubtsov
