Patents Examined by Michael J Metzger
  • Patent number: 11966737
    Abstract: Systems and methods for an efficient and robust multiprocessor-coprocessor interface that may be used between a streaming multiprocessor and an acceleration coprocessor in a GPU are provided. According to an example implementation, in order to perform an acceleration of a particular operation using the coprocessor, the multiprocessor: issues a series of write instructions to write input data for the operation into coprocessor-accessible storage locations; issues an operation instruction to cause the coprocessor to execute the particular operation; and then issues a series of read instructions to read result data of the operation from coprocessor-accessible storage locations to multiprocessor-accessible storage locations.
    Type: Grant
    Filed: September 2, 2021
    Date of Patent: April 23, 2024
    Assignee: NVIDIA CORPORATION
    Inventors: Ronald Charles Babich, Jr., John Burgess, Jack Choquette, Tero Karras, Samuli Laine, Ignacio Llamas, Gregory Muthler, William Parsons Newhall, Jr.
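    A minimal C sketch of the three-step protocol in this abstract: write the operands into coprocessor-visible storage, issue the operation, then read the result back. The coproc_t type, the slot layout, and the placeholder dot-product operation are invented for illustration and are not NVIDIA's actual interface.

        #include <stdio.h>

        #define COPROC_SLOTS 4

        typedef struct {
            double slots[COPROC_SLOTS];            /* coprocessor-accessible storage */
        } coproc_t;

        /* Step 1: write instructions place input data in coprocessor storage. */
        static void coproc_write(coproc_t *c, int slot, double v) { c->slots[slot] = v; }

        /* Step 2: the operation instruction triggers the accelerated computation
         * (here a stand-in dot product over the slots). */
        static void coproc_execute(coproc_t *c) {
            double acc = 0.0;
            for (int i = 0; i < COPROC_SLOTS; i += 2)
                acc += c->slots[i] * c->slots[i + 1];
            c->slots[0] = acc;                     /* result stays in coprocessor storage */
        }

        /* Step 3: read instructions move results to multiprocessor storage. */
        static double coproc_read(const coproc_t *c, int slot) { return c->slots[slot]; }

        int main(void) {
            coproc_t c;
            coproc_write(&c, 0, 1.0); coproc_write(&c, 1, 2.0);
            coproc_write(&c, 2, 3.0); coproc_write(&c, 3, 4.0);
            coproc_execute(&c);
            printf("result = %g\n", coproc_read(&c, 0));   /* 1*2 + 3*4 = 14 */
            return 0;
        }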
  • Patent number: 11947963
    Abstract: Computing resource management is improved during fast sorting using vector instructions. The process includes: determining a pivot value and a pivot position in a data set (e.g., by sampling with vectors and determining the sample median), determining whether moving data in the sampled portion may be avoided (e.g., if it is constant-valued or already sorted), and, leveraging that determination to possibly avoid unnecessary data movement, sorting the data set. Some examples further determine the microarchitecture version of the computing device performing the sorting and select an implementation of the sorting instructions that is tuned for that microarchitecture version (e.g., based on the number of vector registers and motherboard cache configuration). Some examples leverage a soft 3-way quicksort by finding data elements adjacent to the pivot position that also have the pivot value and adding a partition boundary at the end of the set of same-valued data elements.
    Type: Grant
    Filed: April 4, 2022
    Date of Patent: April 2, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Conor John Cunningham, Thierry Fevrier
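    A scalar C sketch of the "soft 3-way quicksort" idea from this abstract: after a standard partition, pivot-valued elements are gathered next to the partition point and a second boundary is placed at the end of that same-valued run, so it is excluded from further recursion. The pivot choice and function names are illustrative; the patented implementation samples with vector instructions and selects microarchitecture-tuned variants.

        #include <stdio.h>

        static int partition_lt(int *a, int lo, int hi, int pivot) {
            int i = lo;
            for (int j = lo; j < hi; j++)          /* move values < pivot left */
                if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
            return i;                              /* first index with a[j] >= pivot */
        }

        static void soft_quicksort(int *a, int lo, int hi) {
            if (hi - lo < 2) return;
            int pivot = a[lo + (hi - lo) / 2];     /* stand-in for the sampled median */
            int mid = partition_lt(a, lo, hi, pivot);
            int end = mid;
            for (int j = mid; j < hi; j++)         /* gather pivot-valued elements */
                if (a[j] == pivot) { int t = a[end]; a[end] = a[j]; a[j] = t; end++; }
            soft_quicksort(a, lo, mid);            /* equal run [mid, end) needs no work */
            soft_quicksort(a, end, hi);
        }

        int main(void) {
            int v[] = {5, 3, 5, 1, 5, 2, 5, 4};
            soft_quicksort(v, 0, 8);
            for (int i = 0; i < 8; i++) printf("%d ", v[i]);
            printf("\n");                          /* 1 2 3 4 5 5 5 5 */
            return 0;
        }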
  • Patent number: 11940946
    Abstract: A vector reduction circuit configured to reduce an input vector of elements comprises a plurality of cells, wherein each of the plurality of cells other than a designated first cell that receives a designated first element of the input vector is configured to receive a particular element of the input vector, receive, from another of the one or more cells, a temporary reduction element, perform a reduction operation using the particular element and the temporary reduction element, and provide, as a new temporary reduction element, a result of performing the reduction operation using the particular element and the temporary reduction element. The vector reduction circuit also comprises an output circuit configured to provide, for output as a reduction of the input vector, a new temporary reduction element corresponding to a result of performing the reduction operation using a last element of the input vector.
    Type: Grant
    Filed: June 22, 2021
    Date of Patent: March 26, 2024
    Assignee: Google LLC
    Inventors: Gregory Michael Thorson, Andrew Everett Phelps, Olivier Temam
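    A software model of the cell chain in this abstract: the designated first cell takes the designated first element, each later cell combines its element with the temporary reduction passed along the chain, and the output circuit emits the temporary produced by the last cell. The choice of max as the reduction operation is arbitrary for the sketch.

        #include <stdio.h>

        static float reduce_op(float elem, float tmp) { return elem > tmp ? elem : tmp; }

        static float vector_reduce(const float *in, int n) {
            float tmp = in[0];                     /* designated first cell */
            for (int i = 1; i < n; i++)            /* each remaining cell */
                tmp = reduce_op(in[i], tmp);       /* new temporary reduction element */
            return tmp;                            /* output circuit: last temporary */
        }

        int main(void) {
            float v[] = {3.f, 1.f, 4.f, 1.f, 5.f};
            printf("%g\n", vector_reduce(v, 5));   /* 5 */
            return 0;
        }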
  • Patent number: 11941482
    Abstract: In some aspects, a heterogeneous computing system includes a quantum processor unit and a classical processor unit. In some instances, variables defined by a computer program are stored in a classical memory in the heterogeneous computing system. The computer program is executed in the heterogeneous computing system by operation of the quantum processor unit and the classical processor unit. Instructions are generated for the quantum processor by a host processor unit based on values of the variables stored in the classical memory. The instructions are configured to cause the quantum processor unit to perform a data processing task defined by the computer program. The values of the variables are updated in the classical memory based on output values generated by the quantum processor unit. The classical processor unit processes the updated values of the variables.
    Type: Grant
    Filed: March 9, 2021
    Date of Patent: March 26, 2024
    Assignee: Rigetti & Co, LLC
    Inventors: Chad Tyler Rigetti, William J. Zeng, Dane Christoffer Thompson
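    A schematic C loop for the classical/quantum interplay described here, with the quantum processor mocked out: a classical variable drives the instructions generated for the QPU, and the QPU's output updates the variable in classical memory. The qpu_run function and the update rule are stand-ins; Rigetti's actual stack compiles and runs real quantum programs.

        #include <stdio.h>

        /* Mock QPU: in reality, instructions generated from theta would run on
         * the quantum processor unit and a measured value would come back. */
        static double qpu_run(double theta) { return theta * 0.5; }

        int main(void) {
            double theta = 1.0;                    /* variable in classical memory */
            for (int step = 0; step < 5; step++) {
                double out = qpu_run(theta);       /* quantum data processing task */
                theta -= 0.1 * out;                /* classical update of the variable */
                printf("step %d: theta = %f\n", step, theta);
            }
            return 0;
        }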
  • Patent number: 11934826
    Abstract: Methods, systems, and apparatus, including computer-readable media, are described for performing vector reductions using a shared scratchpad memory of a hardware circuit having processor cores that communicate with the shared memory. For each of the processor cores, a respective vector of values is generated based on computations performed at the processor core. The shared memory receives the respective vectors of values from respective resources of the processor cores using a direct memory access (DMA) data path of the shared memory. The shared memory performs an accumulation operation on the respective vectors of values using an operator unit coupled to the shared memory. The operator unit is configured to accumulate values based on arithmetic operations encoded at the operator unit. A result vector is generated based on performing the accumulation operation using the respective vectors of values.
    Type: Grant
    Filed: November 19, 2021
    Date of Patent: March 19, 2024
    Assignee: Google LLC
    Inventors: Thomas Norrie, Gurushankar Rajamani, Andrew Everett Phelps, Matthew Leever Hedlund, Norman Paul Jouppi
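    A plain-C model of the accumulation path in this abstract: each core's result vector arrives at the shared scratchpad (standing in for the DMA data path), where an operator unit adds it into a running result vector. The core count, vector length, and the add operator are illustrative.

        #include <stdio.h>

        #define CORES 4
        #define VLEN  8

        int main(void) {
            float core_vec[CORES][VLEN];
            float shared_acc[VLEN] = {0};          /* lives in the shared memory */

            for (int c = 0; c < CORES; c++)        /* per-core computations */
                for (int i = 0; i < VLEN; i++)
                    core_vec[c][i] = (float)(c + i);

            for (int c = 0; c < CORES; c++)        /* "DMA" each vector in */
                for (int i = 0; i < VLEN; i++)
                    shared_acc[i] += core_vec[c][i];   /* operator unit: accumulate */

            for (int i = 0; i < VLEN; i++) printf("%g ", shared_acc[i]);
            printf("\n");
            return 0;
        }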
  • Patent number: 11934870
    Abstract: A method for scheduling computing tasks on a supercomputer includes offline reinforcement learning (OFRL) of a scheduler on a database (LDB). The database includes at least one execution history (HIST) that includes the state (LHPCS) of a learning supercomputer at several moments (T, T-1); the actions (LACT) related to the scheduling of learning tasks on the learning supercomputer at those moments (T); and a reward (REW) related to each task. The method also includes the use of the trained scheduler on the computing tasks to be scheduled.
    Type: Grant
    Filed: September 13, 2022
    Date of Patent: March 19, 2024
    Assignees: BULL SAS, LE COMMISSARIAT À L'ÉNERGIE ATOMIQUE ET AUX ÉNERGIES ALTERNATIVES
    Inventor: Pierre Seroul
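    A toy tabular illustration of the offline-RL setup in this abstract: a value table is fitted from a fixed execution history of (state, action, reward) tuples, and the trained table then picks scheduling actions. The two-state, two-action encoding and the reward-averaging update are invented for the sketch; the patent does not prescribe them.

        #include <stdio.h>

        #define STATES  2                          /* e.g. cluster load: low/high */
        #define ACTIONS 2                          /* e.g. schedule now / defer */
        #define HIST_N  5

        int main(void) {
            /* Execution history (HIST): state at each moment, action, reward. */
            int    hs[HIST_N] = {0, 0, 1, 1, 0};
            int    ha[HIST_N] = {0, 1, 0, 1, 0};
            double hr[HIST_N] = {1.0, 0.2, 0.1, 0.8, 0.9};

            double q[STATES][ACTIONS] = {{0}};
            int    n[STATES][ACTIONS] = {{0}};
            for (int i = 0; i < HIST_N; i++) {     /* offline training pass */
                n[hs[i]][ha[i]]++;
                q[hs[i]][ha[i]] += (hr[i] - q[hs[i]][ha[i]]) / n[hs[i]][ha[i]];
            }
            for (int s = 0; s < STATES; s++)       /* trained scheduler in use */
                printf("state %d -> action %d\n", s, q[s][1] > q[s][0]);
            return 0;
        }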
  • Patent number: 11934835
    Abstract: Validation of correct derivation of a parallel program from a sequential program for deployment of the parallel program to a plurality of processing units is described. The system receives the program code of the sequential program and the program code of the parallel program. A static analysis component computes a first control flow graph, and determines dependencies within the sequential program code. It further computes a further control flow graph for each thread or process of the parallel program and determines dependencies within the further control flow graphs. A checking component checks if the sequential program and the derived parallel program are semantically equivalent by comparing the respective first and further control flow graphs and respective dependencies. A release component declares a correct derivation state for the parallel program to qualify the parallel program for deployment if the derived parallel program and the sequential program are semantically equivalent.
    Type: Grant
    Filed: January 18, 2023
    Date of Patent: March 19, 2024
    Assignee: emmtrix Technologies GmbH
    Inventor: Timo Stripf
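    A drastically simplified version of the checking/release gate in this abstract: two control flow graphs, here adjacency matrices assumed to share a node numbering (an assumption the real tool does not need), are compared edge-for-edge before the parallel program is declared correctly derived.

        #include <stdbool.h>
        #include <stdio.h>

        #define NODES 3

        static bool cfg_equal(bool a[NODES][NODES], bool b[NODES][NODES]) {
            for (int i = 0; i < NODES; i++)
                for (int j = 0; j < NODES; j++)
                    if (a[i][j] != b[i][j]) return false;
            return true;
        }

        int main(void) {
            bool seq[NODES][NODES] = {{0,1,0},{0,0,1},{0,0,0}};  /* sequential CFG */
            bool par[NODES][NODES] = {{0,1,0},{0,0,1},{0,0,0}};  /* one thread's CFG */
            puts(cfg_equal(seq, par) ? "release: correct derivation state"
                                     : "reject: not semantically equivalent");
            return 0;
        }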
  • Patent number: 11915003
    Abstract: Disclosed are a process parasitism-based branch prediction method and device for serverless computing, an electronic device, and a readable storage medium. The method includes: receiving a user's call request for a target function; when capacity expansion is required, scheduling a container that executes the target function onto a new server that has not executed the target function within a preset period of time, a parasitic process having been pre-added to the container's base image; triggering the parasitic process when the container is initialized on the new server, where the parasitic process initiates a system call that causes the system kernel to select a target template function according to the type of the target function and copy it N times; and using execution data of the N copied template functions as training data to train a branch predictor on the new server.
    Type: Grant
    Filed: August 31, 2023
    Date of Patent: February 27, 2024
    Assignee: SHENZHEN INSTITUTES OF ADVANCED TECHNOLOGY
    Inventors: Kejiang Ye, Yanying Lin, Chengzhong Xu
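    A conceptual sketch of the warm-up effect in this abstract: before the target function runs on a new server, a template function with a similar branch pattern is executed repeatedly so the hardware branch predictor accumulates representative history. The template body, the iteration count, and the training-by-execution shortcut are all stand-ins for the kernel-driven template copying the patent describes.

        #include <stdio.h>

        /* Template function mimicking the target function's branch behavior. */
        static int template_fn(const int *data, int len) {
            int acc = 0;
            for (int i = 0; i < len; i++)
                if (data[i] & 1) acc += data[i];   /* data-dependent branch */
                else             acc -= data[i];
            return acc;
        }

        int main(void) {
            int sample[8] = {1, 2, 3, 4, 5, 6, 7, 8};
            enum { N = 1000 };                     /* N copied/executed templates */
            volatile int sink = 0;
            for (int i = 0; i < N; i++)
                sink += template_fn(sample, 8);    /* predictor training data */
            printf("branch predictor warmed by %d runs (sink=%d)\n", N, sink);
            return 0;
        }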
  • Patent number: 11914487
    Abstract: Distributed processors and methods for compiling code for execution by distributed processors are disclosed. In one implementation, a distributed processor may include a substrate; a memory array disposed on the substrate; and a processing array disposed on the substrate. The memory array may include a plurality of discrete memory banks, and the processing array may include a plurality of processor subunits, each one of the processor subunits being associated with a corresponding, dedicated one of the plurality of discrete memory banks. The distributed processor may further include a first plurality of buses, each connecting one of the plurality of processor subunits to its corresponding, dedicated memory bank, and a second plurality of buses, each connecting one of the plurality of processor subunits to another of the plurality of processor subunits.
    Type: Grant
    Filed: August 9, 2021
    Date of Patent: February 27, 2024
    Assignee: NeuroBlade Ltd.
    Inventors: Elad Sity, Eliad Hillel
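    A structural sketch in C of the layout this abstract describes: each processor subunit owns a dedicated memory bank (reached over the first plurality of buses) and is wired to another subunit (the second plurality). The ring topology and bank size are illustrative choices.

        #include <stdio.h>

        #define SUBUNITS   4
        #define BANK_WORDS 16

        typedef struct subunit {
            int bank[BANK_WORDS];      /* corresponding, dedicated memory bank */
            struct subunit *neighbor;  /* bus to another processor subunit */
        } subunit_t;

        int main(void) {
            subunit_t s[SUBUNITS];
            for (int i = 0; i < SUBUNITS; i++)
                s[i].neighbor = &s[(i + 1) % SUBUNITS];  /* subunit-to-subunit buses */
            s[0].bank[0] = 42;                     /* local access over dedicated bus */
            s[0].neighbor->bank[0] = s[0].bank[0]; /* hand data to the next subunit */
            printf("%d\n", s[1].bank[0]);
            return 0;
        }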
  • Patent number: 11907716
    Abstract: The present disclosure discloses a method for vector reading and writing, a vector-register system, a device, and a medium. When a vector-write instruction is received, a vector-register controller converts the address space of the vector to be written into a bit address in the vector register file; for a nonstandard vector, a nonstandard-vector converting unit first converts the vector, and the write is then performed, so that vector data of any format can be saved. When a vector-read instruction is received, the vector-register controller converts the address space of the vector to be read into a bit address in the vector register file according to the width and length to be read, and the read is then performed, so that vector data of any format can be read.
    Type: Grant
    Filed: April 28, 2022
    Date of Patent: February 20, 2024
    Assignee: INSPUR SUZHOU INTELLIGENT TECHNOLOGY CO., LTD.
    Inventors: Lingjun Kong, Zhaochun Pang, Qi Song
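    A small sketch of the address conversion at the heart of this abstract: a vector's element index and element width (which need not be a standard power of two) map to a flat bit address in the register file. The linear packed mapping is an assumption made for illustration.

        #include <stdint.h>
        #include <stdio.h>

        /* Convert (element index, element width in bits) to a register-file
         * bit address under a simple packed layout. */
        static uint64_t to_bit_addr(uint64_t elem_index, uint32_t width_bits) {
            return elem_index * (uint64_t)width_bits;
        }

        int main(void) {
            uint32_t width = 24;                   /* a "nonstandard" element width */
            uint64_t bit = to_bit_addr(5, width);
            printf("element 5 starts at bit %llu (byte %llu, offset %llu)\n",
                   (unsigned long long)bit,
                   (unsigned long long)(bit / 8),
                   (unsigned long long)(bit % 8));
            return 0;
        }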
  • Patent number: 11900114
    Abstract: Disclosed embodiments relate to systems and methods to skip inconsequential matrix operations. In one example, a processor includes decode circuitry to decode an instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode indicating that the processor is to multiply each element at row M and column K of the first source matrix with a corresponding element at row K and column N of the second source matrix, and accumulate a resulting product with previous contents of a corresponding element at row M and column N of the destination matrix, the processor to skip multiplications that, based on detected values of corresponding multiplicands, would generate inconsequential results; scheduling circuitry to schedule execution of the instruction; and execution circuitry to execute the instruction as per the opcode.
    Type: Grant
    Filed: August 1, 2022
    Date of Patent: February 13, 2024
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, William Rash, Subramaniam Maiyuran, Varghese George, Rajesh Sankaran
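    A scalar C rendering of the skip logic in this abstract: each multiply-accumulate into element (M,N) of the destination is skipped when a multiplicand is zero, because the product cannot change the accumulator. Tile shapes and values are arbitrary; in the patented design the detection happens in hardware execution circuitry, not in a software loop.

        #include <stdio.h>

        enum { M = 2, K = 3, N = 2 };

        int main(void) {
            int a[M][K] = {{1, 0, 2}, {0, 0, 3}};
            int b[K][N] = {{4, 5}, {6, 7}, {0, 1}};
            int c[M][N] = {{0}};
            int skipped = 0;
            for (int m = 0; m < M; m++)
                for (int k = 0; k < K; k++)
                    for (int n = 0; n < N; n++) {
                        if (a[m][k] == 0 || b[k][n] == 0) { skipped++; continue; }
                        c[m][n] += a[m][k] * b[k][n];  /* consequential products only */
                    }
            printf("c = {{%d,%d},{%d,%d}}, skipped %d of %d multiplies\n",
                   c[0][0], c[0][1], c[1][0], c[1][1], skipped, M * K * N);
            return 0;
        }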
  • Patent number: 11893389
    Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.
    Type: Grant
    Filed: March 27, 2023
    Date of Patent: February 6, 2024
    Assignee: Intel Corporation
    Inventors: Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Raanan Sade, Menachem Adelman, Zeev Sperber, Amit Gradstein, Simon Rubanovich
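    A scalar model of one accumulation step of the tile instruction in this abstract: each 32-bit (doubleword) source element is treated as eight 4-bit lanes, corresponding lanes are multiplied to form eight products, and the products are accumulated into the 32-bit destination element with saturation. Unsigned nibbles are assumed here; signed variants also exist.

        #include <stdint.h>
        #include <stdio.h>

        static int32_t sat_add32(int32_t acc, int64_t add) {
            int64_t s = (int64_t)acc + add;
            if (s > INT32_MAX) return INT32_MAX;
            if (s < INT32_MIN) return INT32_MIN;
            return (int32_t)s;
        }

        /* One (m,n,k) step: dot product of the eight nibbles of a and b. */
        static int32_t dp_nibbles(int32_t dst, uint32_t a, uint32_t b) {
            int64_t sum = 0;
            for (int lane = 0; lane < 8; lane++) {
                uint32_t an = (a >> (4 * lane)) & 0xF;
                uint32_t bn = (b >> (4 * lane)) & 0xF;
                sum += (int64_t)an * bn;           /* eight products */
            }
            return sat_add32(dst, sum);            /* accumulate and saturate */
        }

        int main(void) {
            printf("%d\n", dp_nibbles(0, 0x11111111u, 0x22222222u)); /* 8*(1*2) = 16 */
            return 0;
        }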
  • Patent number: 11886884
    Abstract: In an embodiment, a processor includes a branch prediction circuit and a plurality of processing engines. The branch prediction circuit is to: detect a coherence operation associated with a first memory address; identify a first branch instruction associated with the first memory address; and predict a direction for the identified branch instruction based on the detected coherence operation. Other embodiments are described and claimed.
    Type: Grant
    Filed: November 12, 2019
    Date of Patent: January 30, 2024
    Assignee: Intel Corporation
    Inventors: Christopher Wilkerson, Binh Pham, Patrick Lu, Jared Warner Stark, IV
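    A toy single-entry model of the predictor in this abstract, under one plausible use case: a spin-wait branch keeps testing a flag and is predicted to stay in the loop until a coherence operation for the flag's address arrives (another core wrote that line), at which point the prediction flips to the loop exit. The table layout and flip policy are assumptions for the sketch.

        #include <stdbool.h>
        #include <stdio.h>

        typedef struct {
            unsigned long addr;    /* first memory address the branch depends on */
            bool predict_taken;    /* current direction prediction */
        } coherence_predictor_t;

        static void on_coherence_op(coherence_predictor_t *p, unsigned long addr) {
            if (p->addr == addr)                   /* branch tied to this address */
                p->predict_taken = !p->predict_taken;  /* value likely changed */
        }

        int main(void) {
            coherence_predictor_t p = {0x1000, true};  /* predict: keep spinning */
            on_coherence_op(&p, 0x1000);           /* invalidation for the flag line */
            printf("predict taken: %s\n", p.predict_taken ? "yes" : "no");
            return 0;
        }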
  • Patent number: 11886880
    Abstract: The present disclosure provides a data processing apparatus and related products. The products include a control module comprising an instruction caching unit, an instruction processing unit, and a storage queue unit. The instruction caching unit is configured to store computation instructions associated with an artificial neural network operation; the instruction processing unit is configured to parse the computation instructions to obtain a plurality of operation instructions; and the storage queue unit is configured to store an instruction queue, where the instruction queue includes a plurality of operation instructions or computation instructions to be executed in queue order. By adopting the above-mentioned method, the present disclosure can improve the operation efficiency of related products when performing operations of a neural network model.
    Type: Grant
    Filed: June 24, 2022
    Date of Patent: January 30, 2024
    Assignee: CAMBRICON TECHNOLOGIES CORPORATION LIMITED
    Inventors: Shaoli Liu, Bingrui Wang, Xiaoyong Zhou, Yimin Zhuang, Huiying Lan, Jun Liang, Hongbo Zeng
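    A compact model of the control-module flow in this abstract: cached computation instructions are parsed into finer-grained operation instructions that enter a queue and execute in queue order. The specific parse rule (each computation instruction expands into two operations) is invented for the sketch.

        #include <stdio.h>

        #define QCAP 8

        int main(void) {
            const char *cached[] = {"CONV", "POOL"};   /* instruction caching unit */
            const char *queue[QCAP];                   /* storage queue unit */
            int head = 0, tail = 0;

            for (int i = 0; i < 2; i++) {              /* instruction processing unit */
                queue[tail++] = cached[i];             /* parsed operation #1 */
                queue[tail++] = "WRITEBACK";           /* parsed operation #2 */
            }
            while (head < tail)                        /* execute in queue order */
                printf("exec %s\n", queue[head++]);
            return 0;
        }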
  • Patent number: 11886882
    Abstract: Described herein are systems and methods for secure multithread execution. For example, some methods include fetching an instruction of a first thread from a memory into a processor pipeline that is configured to execute instructions from two or more threads in parallel using execution units of the processor pipeline; detecting that the instruction has been designated as a sensitive instruction; responsive to detection of the sensitive instruction, disabling execution of instructions of threads other than the first thread in the processor pipeline during execution of the sensitive instruction by an execution unit of the processor pipeline; executing the sensitive instruction using an execution unit of the processor pipeline; and, responsive to completion of execution of the sensitive instruction, enabling execution of instructions of threads other than the first thread in the processor pipeline.
    Type: Grant
    Filed: April 5, 2022
    Date of Patent: January 30, 2024
    Assignee: Marvell Asia Pte, Ltd.
    Inventor: Shubhendu Sekhar Mukherjee
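    A one-file model of the thread gating in this abstract: while a sensitive instruction from one thread occupies an execution unit, issue from other threads is disabled, then re-enabled when it completes. Each instruction here takes one step and a stalled instruction is simply dropped rather than replayed, a deliberate simplification of a real pipeline.

        #include <stdbool.h>
        #include <stdio.h>

        typedef struct { int thread; bool sensitive; const char *op; } instr_t;

        int main(void) {
            instr_t stream[] = {
                {0, false, "add"}, {1, false, "mul"},
                {0, true,  "keyload"},             /* designated sensitive instruction */
                {1, false, "sub"}, {0, false, "store"},
            };
            bool gate = false;                     /* only the sensitive thread may issue */
            for (int i = 0; i < 5; i++) {
                if (gate && stream[i].thread != 0) {
                    printf("stall t%d %s\n", stream[i].thread, stream[i].op);
                    continue;                      /* other threads disabled */
                }
                printf("exec  t%d %s\n", stream[i].thread, stream[i].op);
                gate = stream[i].sensitive;        /* raised during the sensitive op,
                                                      cleared once it has completed */
            }
            return 0;
        }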
  • Patent number: 11880768
    Abstract: A processor-implemented data processing method includes encoding a plurality of weights of a filter of a neural network using an inverted two's complement fixed-point format; generating weight data based on values of the encoded weights corresponding to same filter positions of a plurality of filters; and performing an operation on the weight data and input activation data using a bit-serial scheme to control when to perform an activation function with respect to the weight data and input activation data.
    Type: Grant
    Filed: March 1, 2023
    Date of Patent: January 23, 2024
    Assignees: Samsung Electronics Co., Ltd., Seoul National University R&DB Foundation
    Inventors: Seungwon Lee, Dongwoo Lee, Kiyoung Choi, Sungbum Kang
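    A bit-level illustration of the weight encoding named in this abstract, under one reading of "inverted two's complement": a negative weight is stored as the bitwise complement of its two's complement pattern, so small-magnitude negative values gain leading zeros that a bit-serial engine can skip. That reading is an assumption for the sketch, not a claim about the exact patented format.

        #include <stdint.h>
        #include <stdio.h>

        static uint8_t encode_weight(int8_t w) {
            uint8_t bits = (uint8_t)w;              /* two's complement pattern */
            return (w < 0) ? (uint8_t)~bits : bits; /* invert negative weights */
        }

        int main(void) {
            int8_t ws[] = {-3, 0, 3};
            for (int i = 0; i < 3; i++)             /* -3: 0xFD -> 0x02 */
                printf("w=%4d -> 0x%02X\n", ws[i], encode_weight(ws[i]));
            return 0;
        }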
  • Patent number: 11875151
    Abstract: Inter-process serving of machine learning features from mapped memory for machine learning models is described. ML features are populated in a data structure that is serialized. State data is stored that indicates that reader process(es) are to read from a first memory mapped data file and not a second memory mapped data file. The serialized bytes are stored in the second memory mapped data file and the state data is updated to indicate that the reader process(es) are to read from the second memory mapped data file. A request is received and parsed to prepare keys from attributes of the request. Based on the state data, the serialized bytes are read from the second memory mapped data file that correspond to the keys. The serialized bytes are deserialized and copied to a data structure available to an inference algorithm.
    Type: Grant
    Filed: July 26, 2023
    Date of Patent: January 16, 2024
    Assignee: CLOUDFLARE, INC.
    Inventor: Oleksandr Bocharov
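    A compressed sketch of the double-buffered handoff in this abstract: the writer serializes features into whichever file is not currently marked live, then flips the state so readers switch files. Plain arrays stand in for the two memory mapped data files, and the state flip is not actually atomic here as real cross-process code would require.

        #include <stdio.h>
        #include <string.h>

        #define FILE_SZ 64

        static char file_a[FILE_SZ], file_b[FILE_SZ]; /* the two "mapped" files */
        static int  live = 0;                         /* state data: 0 -> file_a */

        static void writer_publish(const char *serialized) {
            char *standby = live ? file_a : file_b;   /* write the non-live file */
            strncpy(standby, serialized, FILE_SZ - 1);
            live = !live;                             /* update state: readers switch */
        }

        static const char *reader_fetch(void) {
            return live ? file_b : file_a;            /* read per current state */
        }

        int main(void) {
            writer_publish("user42:ctr=0.031");       /* serialized ML features */
            printf("reader sees: %s\n", reader_fetch());
            return 0;
        }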
  • Patent number: 11868770
    Abstract: Embodiments detailed herein relate to arithmetic operations on floating-point values. An exemplary processor includes decoding circuitry to decode an instruction, where the instruction specifies locations of a plurality of operands whose values are in a floating-point format. The exemplary processor further includes execution circuitry to execute the decoded instruction, where the execution includes to: convert the values for each operand, each value being converted into a plurality of lower precision values, where an exponent is to be stored for each operand; perform arithmetic operations among lower precision values converted from values for the plurality of the operands; and generate a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and store the floating-point value.
    Type: Grant
    Filed: December 29, 2022
    Date of Patent: January 9, 2024
    Assignee: INTEL CORPORATION
    Inventors: Gregory Henry, Alexander Heinecke
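    A worked C sketch of the conversion in this abstract: each double operand is split into a leading float plus a float residual, products are formed among the lower-precision parts, and the accumulated sum is returned in the original format. The two-way split is a textbook variant chosen for illustration; the patent covers splitting into a plurality of lower precision values with a stored exponent per operand.

        #include <stdio.h>

        static void split(double x, float *hi, float *lo) {
            *hi = (float)x;                        /* lower-precision leading part */
            *lo = (float)(x - (double)*hi);        /* residual */
        }

        static double mul_split(double a, double b) {
            float ah, al, bh, bl;
            split(a, &ah, &al);
            split(b, &bh, &bl);
            /* products among lower-precision parts, accumulated in double */
            return (double)ah * bh + (double)ah * bl
                 + (double)al * bh + (double)al * bl;
        }

        int main(void) {
            double a = 1.0 / 3.0, b = 3.0;
            printf("split product: %.17g\n", mul_split(a, b));
            printf("plain float  : %.17g\n", (double)((float)a * (float)b));
            return 0;
        }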
  • Patent number: 11868775
    Abstract: Methods of encoding and decoding are described which use a variable number of instruction words to encode instructions from an instruction set, such that different instructions within the instruction set may be encoded using different numbers of instruction words. To encode an instruction, the bits within the instruction are re-ordered and formed into instruction words based upon their variance as determined using empirical or simulation data. The bits in the instruction words are compared to corresponding predicted values and some or all of the instruction words that match the predicted values are omitted from the encoded instruction.
    Type: Grant
    Filed: May 5, 2022
    Date of Patent: January 9, 2024
    Assignee: Imagination Technologies Limited
    Inventors: Simon Thomas Nield, James McCarthy
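    A sketch of the omission step in this abstract: the instruction is cut into fixed-size words, each word is compared with its predicted value, and matching words are dropped, leaving a presence mask plus only the mismatching words. The variance-based bit re-ordering the abstract also describes is omitted for brevity.

        #include <stdint.h>
        #include <stdio.h>

        #define WORDS 4

        int main(void) {
            uint16_t instr[WORDS]     = {0xA000, 0x0000, 0x1234, 0xFFFF};
            uint16_t predicted[WORDS] = {0xA000, 0x0000, 0x0000, 0xFFFF};
            uint16_t encoded[WORDS];
            uint8_t  mask = 0;                     /* 1 bit per word: stored or omitted */
            int n = 0;

            for (int i = 0; i < WORDS; i++)        /* keep only mismatching words */
                if (instr[i] != predicted[i]) {
                    mask |= (uint8_t)(1u << i);
                    encoded[n++] = instr[i];
                }

            printf("mask=0x%X, stored:", mask);
            for (int i = 0; i < n; i++) printf(" 0x%04X", encoded[i]);
            printf(" (%d of %d words)\n", n, WORDS);
            return 0;
        }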
  • Patent number: 11868781
    Abstract: In one embodiment, a method includes accessing a loaded but paused source process executable and disassembling the source process executable to identify a system call to be instrumented and an adjacent relocatable instruction. Instrumenting the system call includes building a trampoline for the system call that includes a check flag instruction at or near an entry point to the trampoline and two areas of the trampoline that are selectively executed according to results of the check flag instruction. Building a first area of the trampoline includes providing instructions to execute a relocated copy of the adjacent relocatable instruction and return flow to an address immediately following the adjacent relocatable instruction. Building a second area of the trampoline includes providing instructions to invoke at least one handler associated with executing a relocated copy of the system call and return flow to an address immediately following the system call.
    Type: Grant
    Filed: March 24, 2022
    Date of Patent: January 9, 2024
    Assignee: Sysdig, Inc.
    Inventor: Loris Degioanni
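    A functional C model of the trampoline in this abstract: a single entry point checks a flag and branches either to the area that runs the relocated adjacent instruction and resumes after it, or to the area that invokes the handler around the relocated system call. Real instrumentation patches machine code in a paused process; ordinary functions stand in here.

        #include <stdbool.h>
        #include <stdio.h>

        static bool via_adjacent;                  /* the flag checked at entry */

        static void relocated_adjacent(void)  { puts("run relocated adjacent instruction"); }
        static void handler_and_syscall(void) { puts("invoke handler + relocated syscall"); }

        static void trampoline(void) {
            if (via_adjacent)                      /* check-flag instruction */
                relocated_adjacent();              /* area 1: return past instruction */
            else
                handler_and_syscall();             /* area 2: return past system call */
        }

        int main(void) {
            via_adjacent = true;  trampoline();
            via_adjacent = false; trampoline();
            return 0;
        }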