Vector Processor Operation Patents (Class 712/7)
  • Patent number: 12008378
    Abstract: A parallel processing (PP) level coherence directory, also referred to as a Processing In-Memory Probe Filter (PimPF), is added to a coherence directory controller. When the coherence directory controller receives a broadcast PIM command from a host, or a PIM command that is directed to multiple memory banks in parallel, the PimPF accelerates processing of the PIM command by maintaining a directory for cache coherence that is separate from existing system level directories in the coherence directory controller. The PimPF maintains a directory according to address signatures that define the memory addresses affected by a broadcast PIM command. Two implementations are described: a lightweight implementation that accelerates PIM loads into registers, and a heavyweight implementation that accelerates both PIM loads into registers and PIM stores into memory.
    Type: Grant
    Filed: April 10, 2023
    Date of Patent: June 11, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Varun Agrawal, Yasuko Eckert
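A loose illustration (not the patented design) of the filtering idea in the entry above: a small directory keyed by an address signature records which caches may hold lines in the region a broadcast PIM command touches, so only those caches receive coherence probes. The signature function, widths, and class names below are invented for this sketch.
```python
# Toy model of a PIM probe filter keyed by address signatures (hypothetical layout).

SIG_BITS = 12  # hypothetical signature width

def address_signature(addr: int) -> int:
    """Map an address to the signature covering the bank-parallel region it falls in."""
    return (addr >> 16) & ((1 << SIG_BITS) - 1)

class PimProbeFilter:
    def __init__(self):
        # signature -> set of cache IDs that may hold copies in that region
        self.directory: dict[int, set[int]] = {}

    def record_cache_fill(self, cache_id: int, addr: int) -> None:
        self.directory.setdefault(address_signature(addr), set()).add(cache_id)

    def caches_to_probe(self, pim_addr: int) -> set[int]:
        # Only these caches need flush/invalidate probes before the broadcast PIM command.
        return self.directory.get(address_signature(pim_addr), set())

pf = PimProbeFilter()
pf.record_cache_fill(cache_id=0, addr=0x1234_0000)
pf.record_cache_fill(cache_id=2, addr=0x1234_8000)
print(pf.caches_to_probe(0x1234_4000))  # {0, 2}: probes limited to two caches
```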
  • Patent number: 12001841
    Abstract: A content-addressable processing engine, also referred to herein as CAPE, is provided. Processing-in-memory (PIM) architectures attempt to overcome the von Neumann bottleneck by combining computation and storage logic into a single component. CAPE provides a general-purpose PIM microarchitecture that provides acceleration of vector operations while being programmable with standard reduced instruction set computing (RISC) instructions, such as RISC-V instructions with standard vector extensions. CAPE can be implemented as a standalone core that specializes in associative computing, and that can be integrated in a tiled multicore chip alongside other types of compute engines. Certain embodiments of CAPE achieve average speedups of 14× (up to 254×) over an area-equivalent out-of-order processor core tile with three levels of caches across a diverse set of representative applications.
    Type: Grant
    Filed: September 22, 2022
    Date of Patent: June 4, 2024
    Assignee: Cornell University
    Inventors: José F. Martínez, Helena Caminal, Kailin Yang, Khalid Al-Hawaj, Christopher Batten
  • Patent number: 11995030
    Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may include a plurality of columns of vector processing units arranged in a two-dimensional column array with a plurality of column stacks placed side-by-side in a first direction and each column stack having two columns stacked in a second direction and a temporary storage buffer. Each column may include a processing element (PE) that has a vector Arithmetic Logic Unit (ALU) to perform arithmetic operations in parallel threads. At a first end of the column array in the first direction, two columns in the column stack are coupled to the temporary storage buffer for one-way data flow. At a second end of the column array in the first direction, two columns are coupled to each other for one-way data flow. The column array and the temporary storage buffer may form a one-way circular data path.
    Type: Grant
    Filed: November 10, 2022
    Date of Patent: May 28, 2024
    Assignee: AzurEngine Technologies, Inc.
    Inventors: Ryan Braidwood, Yuan Li, Jianbin Zhu, Toshio Nagata
  • Patent number: 11956179
    Abstract: Duplicated physical layer convergence protocol (PLCP) protocol data unit (PPDU) transmission is described for a wireless device with reduced peak-to-average power ratios (PAPR). One example includes obtaining a first sub-PPDU from a PPDU that includes a data field with data content. A second sub-PPDU may also be obtained by duplicating the PPDU, including the data content of the PPDU. At least one of a phase rotation, a phase offset, or a phase ramp is applied to at least a portion of a second set of sub-carriers of a wideband channel. The first sub-PPDU is transmitted on a first set of sub-carriers of the wideband channel and the second sub-PPDU is transmitted on the second set of sub-carriers of the wideband channel.
    Type: Grant
    Filed: July 21, 2021
    Date of Patent: April 9, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Kanke Wu, Jialing Li Chen, Bin Tian
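A minimal numerical sketch of the phase handling described in the entry above: the duplicated sub-PPDU's tones receive a per-subcarrier phase ramp (a constant offset plus a slope across tones) before transmission on the second set of sub-carriers. The ramp parameters and tone values are arbitrary examples, not values from the patent.
```python
import cmath

def apply_phase_ramp(symbols, phase_offset, phase_slope):
    """Rotate each frequency-domain symbol: tone k gets phase offset + slope * k."""
    return [s * cmath.exp(1j * (phase_offset + phase_slope * k))
            for k, s in enumerate(symbols)]

# First sub-PPDU occupies the lower set of sub-carriers unchanged;
# the duplicate on the upper set is phase-rotated to lower the combined PAPR.
lower = [1 + 0j, -1 + 0j, 1 + 0j, 1 + 0j]          # illustrative BPSK tones
upper = apply_phase_ramp(lower, cmath.pi / 2, cmath.pi / 8)
wideband = lower + upper                            # transmitted across both halves
print([(round(abs(s), 2), round(cmath.phase(s), 2)) for s in wideband])
```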
  • Patent number: 11947967
    Abstract: An example system implementing a processing-in-memory pipeline includes: a memory array to store a plurality of look-up tables (LUTs) and data; a control block coupled to the memory array, the control block to control a computational pipeline by activating one or more LUTs of the plurality of LUTs; and a logic array coupled to the memory array and the control block, the logic array to perform, based on control inputs received from the control block, logic operations on the activated LUTs and the data.
    Type: Grant
    Filed: August 1, 2022
    Date of Patent: April 2, 2024
    Inventor: Dmitri Yudanov
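A small sketch of the LUT-driven pipeline idea from the entry above: a control block activates one look-up table per stage, and a logic array evaluates the active LUT against the data, feeding each stage's result into the next. LUTs are modeled as plain 2-input truth tables; the in-memory organization is not modeled.
```python
# 2-input LUTs stored as 4-entry truth tables, indexed by (a << 1) | b.
LUTS = {
    "AND":  [0, 0, 0, 1],
    "XOR":  [0, 1, 1, 0],
    "NAND": [1, 1, 1, 0],
}

def run_pipeline(stages, a_bits, b_bits):
    """Each stage activates one LUT; stage i consumes the previous stage's output and b_bits."""
    current = a_bits
    for lut_name in stages:                 # control block: which LUT is active this stage
        lut = LUTS[lut_name]
        current = [lut[(x << 1) | y]        # logic array: apply the active LUT bitwise
                   for x, y in zip(current, b_bits)]
    return current

print(run_pipeline(["XOR", "NAND"], a_bits=[1, 0, 1, 1], b_bits=[1, 1, 0, 0]))
```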
  • Patent number: 11915131
    Abstract: In an approach to improve the efficiency of solving problem instances, a machine learning model is utilized to solve a sequential optimization problem. Embodiments of the present invention receive a sequential optimization problem for solving and utilize a random initialization to solve a first instance of the sequential optimization problem. Embodiments of the present invention learn, by a computing device, a machine learning model based on a previously stored solution to the first instance of the sequential optimization problem. Additionally, embodiments of the present invention generate, by the machine learning model, one or more subsequent approximate solutions to the sequential optimization problem; and output, by a user interface on the computing device, the one or more subsequent approximate solutions to the sequential optimization problem.
    Type: Grant
    Filed: November 23, 2020
    Date of Patent: February 27, 2024
    Assignee: International Business Machines Corporation
    Inventors: Kartik Ahuja, Amit Dhurandhar, Karthikeyan Shanmugam, Kush Raj Varshney
  • Patent number: 11907158
    Abstract: A vector processor with a vector first and multi-lane configuration. A vector operation for a vector processor can include a single vector or multiple vectors as input. Multiple lanes for the input can be used to accelerate the operation in parallel. A vector first configuration can further enhance the multiple lanes by reducing the number of elements accessed in the lanes to perform the operation in parallel.
    Type: Grant
    Filed: December 28, 2020
    Date of Patent: February 20, 2024
    Assignee: Micron Technology, Inc.
    Inventor: Steven Jeffrey Wallach
  • Patent number: 11847452
    Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address; and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile is a set of 2-dimensional registers.
    Type: Grant
    Filed: June 28, 2021
    Date of Patent: December 19, 2023
    Assignee: Intel Corporation
    Inventors: Menachem Adelman, Robert Valentine, Zeev Sperber, Mark J. Charney, Bret L. Toll, Rinat Rappoport, Jesus Corbal, Dan Baum, Alexander F. Heinecke, Elmoustapha Ould-Ahmed-Vall, Yuri Gebil, Raanan Sade
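A sketch of what "set a tile configuration based on a description retrieved from the memory address" can look like, using an invented two-bytes-per-tile descriptor of (rows, columns in bytes). The real descriptor layout is architecture-defined and is not reproduced here.
```python
import struct

def load_tile_config(descriptor: bytes, num_tiles: int = 4):
    """Parse a hypothetical tile-config descriptor: num_tiles pairs of (rows, cols_in_bytes)."""
    config = []
    for t in range(num_tiles):
        rows, colsb = struct.unpack_from("BB", descriptor, offset=2 * t)
        config.append({"tile": t, "rows": rows, "cols_bytes": colsb})
    return config

# "Memory" holding the description the instruction's memory operand points at.
descriptor = bytes([16, 64, 16, 64, 8, 32, 0, 0])   # tiles 0-1: 16x64B, tile 2: 8x32B, tile 3 unused
for entry in load_tile_config(descriptor):
    print(entry)
```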
  • Patent number: 11829754
    Abstract: A vector load instruction generating unit of a compile device generates an instruction to load a "first group of data units", which is used as an element A[i] in iterative calculation processing, from a memory into a first vector register in a state of being packed in units of 1-word. Each data unit is (1/2)^k word. The vector load instruction generating unit generates an instruction to load a second group of data units, which is used as an element A[i+2^k], into a second vector register. A vector shift double instruction generating unit generates an instruction to cause a part of a data string, which is obtained by shifting data of the first vector register and the second vector register by (1/2)^k word as a series of data string, to be stored in a third vector register in a state of being packed in units of 1-word.
    Type: Grant
    Filed: October 11, 2019
    Date of Patent: November 28, 2023
    Assignee: NEC CORPORATION
    Inventor: Koichi Masuda
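A scalar emulation of the load/shift-double pattern in the entry above, assuming each 1-word register packs several sub-word data units: two packed loads plus a funnel shift of their concatenation by one data unit yield the unaligned element sequence a later iteration needs, without a second unaligned memory access. The packing factor and helper names are illustrative.
```python
UNITS_PER_WORD = 4  # e.g. k = 2, so each data unit is (1/2)^k = 1/4 word

def vector_load(memory, start, num_words):
    """Load num_words * UNITS_PER_WORD consecutive data units, packed word by word."""
    n = num_words * UNITS_PER_WORD
    return memory[start:start + n]

def shift_double(reg_a, reg_b, shift_units):
    """Treat reg_a ++ reg_b as one long data string and take a window shifted by shift_units."""
    combined = reg_a + reg_b
    return combined[shift_units:shift_units + len(reg_a)]

A = list(range(100, 132))                  # memory image of array A
v1 = vector_load(A, 0, num_words=2)        # A[0..7]   (first group of data units)
v2 = vector_load(A, 8, num_words=2)        # A[8..15]  (second group, one register ahead)
v3 = shift_double(v1, v2, shift_units=1)   # A[1..8], assembled without an unaligned load
print(v1, v3, sep="\n")
```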
  • Patent number: 11829300
    Abstract: A method for sorting of a vector in a processor is provided that includes performing, by the processor in response to a vector sort instruction, generating a control input vector for vector permutation logic comprised in the processor based on values in lanes of the vector and a sort order for the vector indicated by the vector sort instruction and storing the control input vector in a storage location.
    Type: Grant
    Filed: October 3, 2022
    Date of Patent: November 28, 2023
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy David Anderson, Mujibur Rahman
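A minimal sketch of the two-step idea in the entry above: the vector sort instruction derives a permutation control vector from the lane values and the requested sort order, and a separate permutation step applies it. Plain Python sorting stands in for the hardware logic.
```python
def make_sort_control(lanes, descending=False):
    """Control input vector: for each output lane, the index of the source lane to take."""
    return sorted(range(len(lanes)), key=lambda i: lanes[i], reverse=descending)

def vector_permute(lanes, control):
    """Permutation logic: route source lanes to output lanes per the control vector."""
    return [lanes[src] for src in control]

lanes = [42, 7, 99, 13]
ctrl = make_sort_control(lanes)            # e.g. stored to a register by the sort instruction
print(ctrl)                                # [1, 3, 0, 2]
print(vector_permute(lanes, ctrl))         # [7, 13, 42, 99]
```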
  • Patent number: 11755490
    Abstract: Methods, systems, and devices for unmap operation techniques are described. A memory system may include a volatile memory device and a non-volatile memory device. The memory system may receive a set of unmap commands that each include a logical block address associated with unused data. The memory system may determine whether one or more parameters associated with the set of unmap commands satisfy a threshold. If the one or more parameters satisfy the threshold, the memory system may select a first procedure for performing the set of unmap commands different from a second procedure (e.g., a default procedure) for performing the set of unmap commands and may perform the set of unmap commands using the first procedure. If the one or more parameters do not satisfy the threshold, the memory system may perform the set of unmap commands using the second procedure.
    Type: Grant
    Filed: December 15, 2020
    Date of Patent: September 12, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Giuseppe Cariello, Luca Porzio, Roberto Izzi, Jonathan S. Parry
  • Patent number: 11748098
    Abstract: A processor is provided with a register file comprising a plurality of vector registers, and an execution core coupled to the register file, where the execution core is configured to execute a set of checksum instructions with a first checksum instruction to specify a first vector operand, a second vector operand, and a result vector operand, where the first vector operand is in a first vector register of the plurality of vector registers, the second vector operand is in a second register of the plurality of vector registers, and the result vector operand is to be written to a third vector register of the plurality of vector registers. To execute the first checksum instruction, the execution core is configured to accumulate bytes from the first vector operand and the second vector operand into a first portion of the result vector operand and add the accumulated bytes from the first vector operand and the second vector operand to a second portion of the result vector operand to generate the second portion.
    Type: Grant
    Filed: May 5, 2021
    Date of Patent: September 5, 2023
    Assignee: Apple Inc.
    Inventors: Ali Sazegari, Chris Cheng-Chieh Lee
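The two-accumulator structure described above resembles a Fletcher/Adler-style checksum: one portion of the result accumulates the raw bytes, the other accumulates the running sums. The sketch below shows that generic structure, not the exact instruction semantics; lane widths, the modulus, and the chaining convention are assumptions.
```python
def checksum_step(bytes_a, bytes_b, sum_lo=0, sum_hi=0, modulus=65521):
    """Fold two vector operands' bytes into (sum of bytes, sum of running sums)."""
    for b in list(bytes_a) + list(bytes_b):
        sum_lo = (sum_lo + b) % modulus      # first portion of the result vector
        sum_hi = (sum_hi + sum_lo) % modulus # second portion: accumulated sums
    return sum_lo, sum_hi

lo, hi = checksum_step(b"vector o", b"perand A")
lo, hi = checksum_step(b"vector o", b"perand B", lo, hi)   # chained across instructions
print(hex(lo), hex(hi))
```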
  • Patent number: 11741015
    Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.
    Type: Grant
    Filed: August 18, 2022
    Date of Patent: August 29, 2023
    Assignee: NVIDIA Corporation
    Inventors: Jerome F. Duluk, Jr., Cameron Buschardt, Sherry Cheung, James Leroy Deming, Samuel H. Duncan, Lucien Dunning, Robert George, Arvind Gopalakrishnan, Mark Hairgrove, Chenghuan Jia, John Mashey
  • Patent number: 11740880
    Abstract: Aspects of the invention include a compiler detecting an expression in a loop that includes elements of mixed data types. The compiler then promotes elements of a sub-expression of the expression to a same intermediate data type. The compiler then calculates the sub-expression using the elements of the same intermediate data type.
    Type: Grant
    Filed: September 8, 2021
    Date of Patent: August 29, 2023
    Assignee: International Business Machines Corporation
    Inventors: Biplob Mishra, Satish Kumar Sadasivam, Puneeth A. H. Bhat
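A compact illustration of the promotion rule described above: operands of a mixed-type sub-expression are widened to a single intermediate type before the arithmetic is performed, so intermediate results do not wrap at the narrower width. The width-selection rule here is illustrative, not the patented heuristic.
```python
def intermediate_bits(*operand_bits):
    """Pick one intermediate width wide enough for a product of the widest operands."""
    w = 2 * max(operand_bits)
    return max(w, 32)                      # never narrower than the assumed machine int width

def wrap(value, bits):
    """Model fixed-width signed arithmetic (two's complement wraparound)."""
    m = 1 << bits
    v = value % m
    return v - m if v >= (1 << (bits - 1)) else v

a, b = 120, 100                            # int8 elements inside a loop
naive   = wrap(a * b, 8)                             # evaluated at int8: wraps to -32
widened = wrap(a * b, intermediate_bits(8, 8))       # promoted to a 32-bit intermediate: 12000
print(naive, widened)
```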
  • Patent number: 11663005
    Abstract: Examples of the present disclosure provide apparatuses and methods for determining a vector population count in a memory. An example method comprises determining, using sensing circuitry, a vector population count of a number of fixed length elements of a vector stored in a memory array.
    Type: Grant
    Filed: January 15, 2021
    Date of Patent: May 30, 2023
    Assignee: Micron Technology, Inc.
    Inventor: Sanjay Tiwari
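A software sketch of a vector population count over fixed-length elements: the packed vector is split into equal-width lanes and each lane's set bits are counted, giving the per-element result the sensing circuitry produces in memory. The element width is arbitrary.
```python
def vector_popcount(packed: int, element_bits: int, num_elements: int):
    """Return the population count of each fixed-length element of a packed vector."""
    mask = (1 << element_bits) - 1
    return [bin((packed >> (i * element_bits)) & mask).count("1")
            for i in range(num_elements)]

vec = 0xF0F1_00FF_0001_8001              # four 16-bit elements packed into one value
print(vector_popcount(vec, element_bits=16, num_elements=4))   # [2, 1, 8, 9]
```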
  • Patent number: 11663488
    Abstract: An online system trains a transformer architecture by an initialization method which allows the transformer architecture to be trained without normalization layers or learning rate warmup, resulting in significant improvements in computational efficiency for transformer architectures. Specifically, an attention block included in an encoder or a decoder of the transformer architecture generates the set of attention representations by applying a key matrix to the input key, a query matrix to the input query, and a value matrix to the input value to generate an output, and applying an output matrix to the output to generate the set of attention representations. The initialization method may be performed by scaling the parameters of the value matrix and the output matrix with a factor that is inverse to a number of the set of encoders or a number of the set of decoders.
    Type: Grant
    Filed: February 5, 2021
    Date of Patent: May 30, 2023
    Assignee: THE TORONTO-DOMINION BANK
    Inventors: Maksims Volkovs, Xiao Shi Huang, Juan Felipe Perez Vallejo
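A bare-bones sketch of the initialization step described above: after ordinary initialization, the value-projection and output-projection parameters of each attention block are scaled by a factor that shrinks with the number of encoder (or decoder) blocks. The exact scaling function is not reproduced; 1/N is used purely as a placeholder.
```python
import random

def init_matrix(rows, cols, scale=1.0):
    """Plain uniform initialization, optionally rescaled."""
    return [[scale * random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def init_attention_block(d_model, num_blocks):
    factor = 1.0 / num_blocks          # placeholder for "inverse to the number of blocks"
    return {
        "W_q": init_matrix(d_model, d_model),
        "W_k": init_matrix(d_model, d_model),
        "W_v": init_matrix(d_model, d_model, scale=factor),   # value matrix scaled down
        "W_o": init_matrix(d_model, d_model, scale=factor),   # output matrix scaled down
    }

blocks = [init_attention_block(d_model=8, num_blocks=6) for _ in range(6)]
print(len(blocks), max(abs(x) for row in blocks[0]["W_v"] for x in row))
```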
  • Patent number: 11640297
    Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes systolic dot product circuitry to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
    Type: Grant
    Filed: June 15, 2021
    Date of Patent: May 2, 2023
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Guei-Yuan Lueh, Supratim Pal, Ashutosh Garg, Chandra S. Gurram, Jorge E. Parra, Junjie Gu, Konrad Trifunovic, Hong Bin Liao, Mike B. MacPherson, Shubh B. Shah, Shubra Marwaha, Stephen Junkins, Timothy R. Bauer, Varghese George, Weiyu Chen
  • Patent number: 11593106
    Abstract: Vector sort circuits that can be used to accelerate sorting operations in a vector processor. When a new data element is received, the vector sort circuit can read multiple existing data elements from a vector-sort database in parallel, compare metrics of the existing data elements to a metric of the new data element, and output updated data elements to the vector-sort database based on the metrics. Depending on implementation, the vector-sort database can be maintained in sorted order, or the data elements can have assigned ranks indicating the sort order and the elements need not be stored in sorted order. A vector sort circuit can be incorporated into a vector sort functional unit of a microprocessor, and the instruction set of the microprocessor can include instructions that are executed by the vector sort functional unit using the vector sort circuit.
    Type: Grant
    Filed: September 24, 2021
    Date of Patent: February 28, 2023
    Assignee: Apple Inc.
    Inventors: On Wa Yeung, Seydou N. Ba
  • Patent number: 11531549
    Abstract: A system and corresponding method map instructions in an out-of-order (OoO) processor. The system comprises a mapper, integer snapshot circuitry, and floating-point (FP) snapshot circuitry. The mapper maps instructions by mapping integer and FP architectural registers (ARs) of the instructions to integer and FP physical registers of the OoO processor, respectively. The mapper records, via at least one present FP indicator, presence of FP ARs used as destinations in the instructions. The mapper copies, periodically, the integer mapper state to the integer snapshot circuitry and copies, intermittently, based on the at least one FP present indicator, the FP mapper state to the FP snapshot circuitry. Copies of the integer and FP mapper state in the integer and FP snapshot circuitry, respectively, improve performance for instruction unwinding caused, for example, by an exception, branch/jump mispredict, etc. By copying the FP mapper state, intermittently, power efficiency of the OoO processor is improved.
    Type: Grant
    Filed: March 31, 2021
    Date of Patent: December 20, 2022
    Assignee: Marvell Asia Pte, Ltd.
    Inventor: David A. Carlson
  • Patent number: 11455171
    Abstract: A fast and frugal item-state tracking scoreboard circuit is disclosed. The scoreboard maintains per-item partial states across multiple memory circuits, enabling multiple lookups per clock cycle and multiple state updates per clock cycle. In an embodiment a scoreboard is used to schedule instructions in an out-of-order processor. Each clock cycle the scoreboard indicates the busy state of an instruction's registers and may update the busy state of the destination registers of issuing instructions and completing instructions. Applications include register tracking, function-unit tracking, and cache-line state tracking, in embodiments including processor cores (including superscalar, superpipelined, and multithreaded processors), accelerators, memory systems, and networks. In an embodiment, a register-busy scoreboard circuit is implemented using FPGA LUT RAM memory.
    Type: Grant
    Filed: May 29, 2020
    Date of Patent: September 27, 2022
    Assignee: Gray Research LLC
    Inventor: Jan Stephen Gray
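A toy register-busy scoreboard in the spirit of the entry above: per-register busy bits are looked up for an instruction's sources, set for its destination at issue, and cleared at completion. The real circuit performs several such lookups and updates per clock across multiple memories; only the state transitions are shown here.
```python
class Scoreboard:
    def __init__(self, num_regs=32):
        self.busy = [False] * num_regs

    def can_issue(self, srcs, dst):
        """An instruction may issue only if no source (or its destination) is still busy."""
        return not any(self.busy[r] for r in (*srcs, dst))

    def issue(self, dst):
        self.busy[dst] = True              # destination becomes busy until writeback

    def complete(self, dst):
        self.busy[dst] = False             # writeback clears the busy bit

sb = Scoreboard()
sb.issue(dst=5)                            # r5 = ...  (long-latency op in flight)
print(sb.can_issue(srcs=[5, 2], dst=7))    # False: r5 still busy (RAW hazard)
sb.complete(dst=5)
print(sb.can_issue(srcs=[5, 2], dst=7))    # True
```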
  • Patent number: 11442713
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve loop optimization with predictable recurring memory reads (PRMRs). An example apparatus includes memory, and first processor circuitry to execute first instructions to at least identify one or more optimizations to convert a first loop into a second loop based on converting PRMRs of the first loop into loop-invariant PRMRs, the converting of the PRMRs in response to a quantity of the PRMRs satisfying a threshold, the second loop to execute in a single iteration corresponding to a quantity of iterations of the first loop, determine one or more optimization parameters based on the one or more optimizations, and compile second instructions based on the first processor circuitry processing the first loop based on the one or more optimization parameters associated with the one or more optimizations, the second instructions to be executed by the first or second processor circuitry.
    Type: Grant
    Filed: October 19, 2020
    Date of Patent: September 13, 2022
    Assignee: Intel Corporation
    Inventors: Diego Luis Caballero de Gea, Hideki Ido, Eric N. Garcia
  • Patent number: 11367160
    Abstract: A parallel processing unit (e.g., a GPU), in some examples, includes a hardware scheduler and hardware arbiter that launch graphics and compute work for simultaneous execution on a SIMD/SIMT processing unit. Each processing unit (e.g., a streaming multiprocessor) of the parallel processing unit operates in a graphics-greedy mode or a compute-greedy mode at respective times. The hardware arbiter, in response to a result of a comparison of at least one monitored performance or utilization metric to a user-configured threshold, can selectively cause the processing unit to run one or more compute work items from a compute queue when the processing unit is operating in the graphics-greedy mode, and cause the processing unit to run one or more graphics work items from a graphics queue when the processing unit is operating in the compute-greedy mode. Associated methods and systems are also described.
    Type: Grant
    Filed: August 2, 2018
    Date of Patent: June 21, 2022
    Assignee: NVIDIA CORPORATION
    Inventors: Rajballav Dash, Gregory Palmer, Gentaro Hirota, Lacky Shah, Jack Choquette, Emmett Kilgariff, Sriharsha Niverty, Milton Lei, Shirish Gadre, Omkar Paranjape, Lei Yang, Rouslan Dimitrov
  • Patent number: 11354126
    Abstract: Data processing apparatus comprises vector processing circuitry to selectively apply vector processing operations defined by vector processing instructions to generate one or more data elements of a data vector comprising a plurality of data elements at respective data element positions of the data vector, according to the state of respective predicate flags associated with the positions of the data vector; and generator circuitry to generate instruction sample data indicative of processing activities of the vector processing circuitry for selected ones of the vector processing instructions, the instruction sample data indicating at least the state of the predicate flags at execution of the selected vector processing instructions.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: June 7, 2022
    Assignee: Arm Limited
    Inventors: Michael John Williams, Nigel John Stephens
  • Patent number: 11219426
    Abstract: A method and system for determining an irradiation dose. The method for determining the irradiation dose includes determining a pixel having a biological feature in a region of interest in a radiotherapy simulated locating image by using a retrospective label; extracting a local radiomics feature based on the pixel having the biological feature, in which the local radiomics feature includes a grayscale histogram intensity, a tumor shape feature, a textural feature, a Laplacian of Gaussian filtering feature, and a wavelet feature; acquiring the local radiomics features to be measured; identifying a positive region having the local radiomics features to be measured based on the local radiomics features; performing three-dimensional reconstruction for the peripheral boundary of the positive region to determine a three-dimensional image.
    Type: Grant
    Filed: July 18, 2018
    Date of Patent: January 11, 2022
    Assignees: Tumor Hospital of Shandong First Medical University (Shandong Cancer Hospital and Institute)
    Inventors: Jian Zhu, Zhen Hou, Zhenjiang Li, Haining Yu, Tong Bai, Yong Yin, Baosheng Li
  • Patent number: 11141127
    Abstract: A method and system for determining an irradiation dose. The method for determining the irradiation dose includes determining a pixel having a biological feature in a region of interest in a radiotherapy simulated locating image by using a retrospective label; extracting a local radiomics feature based on the pixel having the biological feature, in which the local radiomics feature includes a grayscale histogram intensity, a tumor shape feature, a textural feature, a Laplacian of Gaussian filtering feature, and a wavelet feature; acquiring the local radiomics features to be measured; identifying a positive region having the local radiomics features to be measured based on the local radiomics features; performing three-dimensional reconstruction for the peripheral boundary of the positive region to determine a three-dimensional image.
    Type: Grant
    Filed: July 18, 2018
    Date of Patent: October 12, 2021
    Assignees: Tumor Hospital of Shandong First Medical University
    Inventors: Jian Zhu, Zhen Hou, Zhenjiang Li, Haining Yu, Tong Bai, Yong Yin, Baosheng Li
  • Patent number: 11126430
    Abstract: A vector processor includes a grouping memory functional unit coupled to grouping memory having multiple bins. The vector processor also includes a bitformatting functional unit that performs bit-level data arrangements using any suitable technique or network, such as a Benes network. The vector processor receives and reads an input vector of data that includes portions (e.g., bits) of multiple data streams, and writes each portion corresponding to a respective data stream to a respective bin in parallel using the bitformatting functional unit to align the data. The vector processor also or alternatively receives and reads multiple outgoing data streams, writes portions of the data streams in respective bins of the grouping memory, and intersperses the portions in an outgoing vector of data in parallel, using the bitformatting functional unit to align the data.
    Type: Grant
    Filed: December 27, 2019
    Date of Patent: September 21, 2021
    Assignee: Intel Corporation
    Inventors: Parakalan Venkataraghavan, Thomas W. Smith, Silpa Naidu Chirumavilla, Ravi Shekhar
  • Patent number: 11042378
    Abstract: Data processing apparatus comprises processing circuitry to selectively apply a vector processing operation to data items at positions within data vectors according to the states of a set of respective predicate flags associated with the positions, the data vectors having a data vector processing order, each data vector comprising a plurality of data items having a data item order, the processing circuitry comprising: instruction decoder circuitry to decode program instructions; and instruction processing circuitry to execute instructions decoded by the instruction decoder circuitry; wherein the instruction decoder circuitry is responsive to a propagation instruction to control the instruction processing circuitry to derive a set of predicate flags applicable to a current data vector in dependence upon a set of predicate flags applicable to a preceding data vector in the data vector processing order, wherein when one or more last-most predicate flags of the set applicable to the preceding data vector are inac
    Type: Grant
    Filed: July 28, 2016
    Date of Patent: June 22, 2021
    Assignee: ARM Limited
    Inventors: Nigel John Stephens, Mbou Eyole, Alejandro Martinez Vicente
  • Patent number: 11003449
    Abstract: A swizzle pattern generator is provided to reduce an overhead due to execution of a swizzle instruction in vector processing. The swizzle pattern generator is configured to provide swizzle patterns with respect to data sets of at least one vector register or vector processing unit. The swizzle pattern generator may be reconfigurable to generate various swizzle patterns for different vector operations.
    Type: Grant
    Filed: January 24, 2019
    Date of Patent: May 11, 2021
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Moo-Kyoung Chung, Woong Seo, Ho-Young Kim, Soo-Jung Ryu, Dong-Hoon Yoo, Jin-Seok Lee, Yeon-Gon Cho, Chang-Moo Kim, Seung-Hun Jin
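A small sketch of the pattern-generator idea from the entry above: rather than encoding a full swizzle pattern in every instruction, a reconfigurable generator produces per-lane source indices for a named rearrangement and the datapath simply applies them. The pattern names are illustrative.
```python
def make_swizzle_pattern(kind, lanes):
    """Reconfigurable generator: return, for each output lane, the source-lane index."""
    if kind == "broadcast0":
        return [0] * lanes
    if kind == "reverse":
        return list(range(lanes - 1, -1, -1))
    if kind == "pairswap":
        return [i ^ 1 for i in range(lanes)]
    raise ValueError(kind)

def swizzle(vector, pattern):
    return [vector[src] for src in pattern]

v = [10, 11, 12, 13]
for kind in ("broadcast0", "reverse", "pairswap"):
    print(kind, swizzle(v, make_swizzle_pattern(kind, len(v))))
```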
  • Patent number: 10990396
    Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: April 27, 2021
    Assignee: Intel Corporation
    Inventors: Bret Toll, Christopher J. Hughes, Dan Baum, Elmoustapha Ould-Ahmed-Vall, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
  • Patent number: 10942847
    Abstract: Technologies for efficiently performing scatter-gather operations include a device with circuitry configured to associate, with a template identifier, a set of non-contiguous memory locations of a memory having a cross point architecture. The circuitry is additionally configured to access, in response to a request that identifies the non-contiguous memory locations by the template identifier, the memory locations.
    Type: Grant
    Filed: December 18, 2018
    Date of Patent: March 9, 2021
    Assignee: Intel Corporation
    Inventors: Jawad B. Khan, Richard Coulson
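A sketch of the template mechanism as described above: a set of non-contiguous locations is registered once under a template identifier, and later gather/scatter requests carry only that identifier. The storage model and method names are invented stand-ins for the device's command interface.
```python
class TemplateMemory:
    def __init__(self, size):
        self.cells = [0] * size
        self.templates: dict[int, list[int]] = {}

    def define_template(self, template_id, locations):
        self.templates[template_id] = list(locations)   # non-contiguous addresses

    def gather(self, template_id):
        return [self.cells[a] for a in self.templates[template_id]]

    def scatter(self, template_id, values):
        for addr, v in zip(self.templates[template_id], values):
            self.cells[addr] = v

mem = TemplateMemory(64)
mem.define_template(7, [3, 17, 42, 61])    # registered once
mem.scatter(7, [100, 200, 300, 400])       # later requests only carry the template id
print(mem.gather(7))
```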
  • Patent number: 10936315
    Abstract: In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
    Type: Grant
    Filed: December 31, 2018
    Date of Patent: March 2, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Duc Quang Bui, Joseph Raymond Michael Zbiciak
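A scalar rendering of the loop structure in the entry above: each iteration fetches a data vector together with its predicate, and the loop exits as soon as the predicate reports no valid lanes, so a variable-length array is consumed without a separate trip-count computation. The vector width and data are illustrative.
```python
VLEN = 4

def fetch_vector(data, i):
    """Return (vector, predicate): predicate lanes are True only for in-bounds elements."""
    chunk = data[i:i + VLEN]
    vec = chunk + [0] * (VLEN - len(chunk))
    pred = [k < len(chunk) for k in range(VLEN)]
    return vec, pred

data, i, total = [3, 1, 4, 1, 5, 9, 2], 0, 0
while True:
    vec, pred = fetch_vector(data, i)
    if not any(pred):                      # current vector holds no valid elements: exit loop
        break
    total += sum(v for v, p in zip(vec, pred) if p)
    i += VLEN
print(total)                               # 25
```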
  • Patent number: 10929133
    Abstract: Systems, methods, and apparatuses relating to element sorting of vectors are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction; and an execution unit to execute the decoded instruction to: provide storage for a comparison matrix to store a comparison value for each element of an input vector compared against the other elements of the input vector, perform a comparison operation on elements of the input vector corresponding to storage of comparison values above a main diagonal of the comparison matrix, perform a different operation on elements of the input vector corresponding to storage of comparison values below the main diagonal of the comparison matrix, and store results of the comparison operation and the different operation in the comparison matrix.
    Type: Grant
    Filed: January 16, 2019
    Date of Patent: February 23, 2021
    Assignee: Intel Corporation
    Inventors: Mikhail Plotnikov, Igor Ermolaev
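A software rendering of the comparison-matrix approach described above, under one common convention: strictly-greater comparisons above the main diagonal and greater-or-equal comparisons below it (the "different operation"), so equal elements still receive distinct ranks; each element's output position is the sum of its row. The choice of operations is an assumption, not a quote of the patent.
```python
def sort_via_comparison_matrix(v):
    n = len(v)
    # Build the comparison matrix: row i records how element i compares to every other element.
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j > i:                      # above the main diagonal: strict comparison
                C[i][j] = int(v[i] > v[j])
            elif j < i:                    # below the main diagonal: the tie-breaking comparison
                C[i][j] = int(v[i] >= v[j])
    ranks = [sum(row) for row in C]        # rank = number of elements this one goes after
    out = [None] * n
    for i, r in enumerate(ranks):
        out[r] = v[i]
    return out

print(sort_via_comparison_matrix([5, 2, 5, 1]))   # [1, 2, 5, 5]
```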
  • Patent number: 10896042
    Abstract: Examples of the present disclosure provide apparatuses and methods for determining a vector population count in a memory. An example method comprises determining, using sensing circuitry, a vector population count of a number of fixed length elements of a vector stored in a memory array.
    Type: Grant
    Filed: December 3, 2018
    Date of Patent: January 19, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Sanjay Tiwari
  • Patent number: 10877925
    Abstract: A vector processor with a vector first and multi-lane configuration. A vector operation for a vector processor can include a single vector or multiple vectors as input. Multiple lanes for the input can be used to accelerate the operation in parallel. A vector first configuration can further enhance the multiple lanes by reducing the number of elements accessed in the lanes to perform the operation in parallel.
    Type: Grant
    Filed: March 18, 2019
    Date of Patent: December 29, 2020
    Assignee: Micron Technology, Inc.
    Inventor: Steven Jeffrey Wallach
  • Patent number: 10853043
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve loop optimization with predictable recurring memory reads (PRMRs). An example apparatus includes an optimizer including an optimization scenario manager to generate an optimization plan associated with a loop and corresponding optimization parameters, the optimization plan including a set of one or more optimizations, an optimization scenario analyzer to identify the optimization plan as a candidate optimization plan when a quantity of PRMRs included in the loop is greater than a threshold, and a parameter calculator to determine the optimization parameters based on the candidate optimization plan, and a code generator to generate instructions to be executed by a processor, the instructions based on processing the loop with the one or more optimizations included in the candidate optimization plan.
    Type: Grant
    Filed: September 11, 2018
    Date of Patent: December 1, 2020
    Assignee: INTEL CORPORATION
    Inventors: Diego Luis Caballero de Gea, Hideki Ido, Eric N. Garcia
  • Patent number: 10817291
    Abstract: Systems, methods, and apparatuses relating to swizzle operations and disable operations in a configurable spatial accelerator (CSA) are described. Certain embodiments herein provide for an encoding system for a specific set of swizzle primitives across a plurality of packed data elements in a CSA.
    Type: Grant
    Filed: March 30, 2019
    Date of Patent: October 27, 2020
    Assignee: Intel Corporation
    Inventors: Jesus Corbal, Rohan Sharma, Simon Steely, Jr., Chinmay Ashok, Kent D. Glossop, Dennis Bradford, Paul Caprioli, Louise Huot, Kermin ChoFleming, Barry Tannenbaum
  • Patent number: 10684860
    Abstract: This invention provides a high performance processor system and method based on a common general purpose unit; it may be configured into a variety of different processor architectures. Before the processor executes instructions, the instructions are filled into an instruction read buffer that is directly accessed by the processor core; the instruction read buffer then actively provides instructions to the processor core for execution, achieving a high cache hit rate.
    Type: Grant
    Filed: July 6, 2018
    Date of Patent: June 16, 2020
    Assignee: SHANGHAI XINHAO MICROELECTRONICS CO. LTD.
    Inventor: Kenneth Chenghao Lin
  • Patent number: 10671583
    Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, it is determined whether to perform a database operation using one or more vectorized instructions or without using any vectorized instructions. This determination may comprise estimating a first cost of performing the database operation using one or more vectorized instructions and estimating a second cost of performing the database operation without using any vectorized instructions. Multiple factors may be used to determine which approach to follow, such as the number of data elements that may fit into a SIMD register, a number of vectorized instructions in the vectorized approach, a number of data movement instructions that involve moving data from a SIMD register to a non-SIMD register and/or vice versa, a size of a cache, and a projected size of a hash table.
    Type: Grant
    Filed: August 24, 2017
    Date of Patent: June 2, 2020
    Assignee: Oracle International Corporation
    Inventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
  • Patent number: 10572409
    Abstract: A memory arrangement can store a matrix of matrix data elements specified as index-value pairs that indicate row and column indices and associated values. First split-and-merge circuitry is coupled between the memory arrangement and a first set of FIFO buffers for reading the matrix data elements from the memory arrangement and putting the matrix data elements in the first set of FIFO buffers based on column indices. A pairing circuit is configured to read vector data elements, pair the vector data elements with the matrix data elements, and put the paired matrix and vector data elements in a second set of FIFO buffers based on column indices. Second split-and-merge circuitry is configured to read paired matrix and vector data elements from the second set of FIFO buffers and put the paired matrix and vector data elements in a third set of FIFO buffers based on row indices.
    Type: Grant
    Filed: May 10, 2018
    Date of Patent: February 25, 2020
    Assignee: XILINX, INC.
    Inventors: Jindrich Zejda, Ling Liu, Yifei Zhou, Ashish Sirasao
  • Patent number: 10564964
    Abstract: Systems and methods are provided for executing an instruction. The method may include loading a first vector into a first location, the first vector including a plurality of first data elements, and loading a second vector into a second location, the second vector including a plurality of second data elements. The method may further include comparing the plurality of first data elements of the first vector to the plurality of second data elements of the second vector and performing one or more operations on the plurality of first and second data elements based on at least one vector cross-compare instruction. The one or more operations include counting a number of data elements of the plurality of first and second data elements that satisfy at least one condition, counting a number of times specified values occur in the plurality of first and second data elements, and generating sequence counts for duplicated values.
    Type: Grant
    Filed: August 23, 2016
    Date of Patent: February 18, 2020
    Assignee: International Business Machines Corporation
    Inventors: Jeffrey H. Derby, Robert K. Montoye, Dheeraj Sreedhar
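A plain-Python sketch of the three cross-compare outcomes the abstract lists: counting elements that satisfy a condition, counting occurrences of specified values, and assigning sequence counts to duplicated values. The condition and the value set are arbitrary examples.
```python
from collections import Counter

first  = [4, 7, 4, 9, 2, 7, 7, 1]
second = [7, 3, 4, 4, 8, 1, 7, 5]
both = first + second

# 1) Count elements satisfying a condition (here: greater than 4).
satisfy = sum(1 for x in both if x > 4)

# 2) Count how many times specified values occur across both operands.
occurrences = {v: both.count(v) for v in (4, 7)}

# 3) Sequence counts for duplicated values: the k-th occurrence of a value is numbered k.
seen = Counter()
sequence = []
for x in both:
    seen[x] += 1
    sequence.append(seen[x])

print(satisfy, occurrences, sequence)
```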
  • Patent number: 10459843
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores data elements next to be supplied to functional units for use as operands. An element duplication unit optionally duplicates a data element an instruction specified number of times. A vector masking unit limits data elements received from the element duplication unit to least significant bits within an instruction specified vector length. If the vector length is less than a stream head register size, the vector masking unit stores all 0's in excess lanes of the stream head register (group duplication disabled) or stores duplicate copies of the least significant bits in excess lanes of the stream head register (group duplication enabled).
    Type: Grant
    Filed: December 30, 2016
    Date of Patent: October 29, 2019
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Joseph Zbiciak
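A behavioral sketch of the element-duplication and vector-masking stages described above: each streamed element is repeated an instruction-specified number of times, and the result is then fitted to the stream head register, either zero-filling the excess lanes or repeating the group, depending on whether group duplication is enabled. The sizes are illustrative.
```python
def duplicate_elements(elements, dup):
    """Element duplication unit: repeat each data element `dup` times."""
    return [e for e in elements for _ in range(dup)]

def mask_to_register(lanes, vector_length, register_lanes, group_duplicate):
    """Vector masking unit: keep `vector_length` lanes, then fill the excess lanes."""
    group = lanes[:vector_length]
    if group_duplicate:                          # repeat the group across the whole register
        reps = -(-register_lanes // vector_length)
        return (group * reps)[:register_lanes]
    return group + [0] * (register_lanes - vector_length)   # zero-fill the excess lanes

stream = duplicate_elements([10, 20, 30], dup=2)             # [10, 10, 20, 20, 30, 30]
print(mask_to_register(stream, vector_length=4, register_lanes=8, group_duplicate=False))
print(mask_to_register(stream, vector_length=4, register_lanes=8, group_duplicate=True))
```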
  • Patent number: 10394891
    Abstract: A novel distributed graph database is provided that is designed for efficient graph data storage and processing on modern computing architectures. In particular, a single node graph database and a runtime & communication layer allow for composing a distributed graph database from multiple single node instances.
    Type: Grant
    Filed: August 5, 2016
    Date of Patent: August 27, 2019
    Assignee: International Business Machines Corporation
    Inventors: Chun-Fu Chen, Jason L. Crawford, Ching-Yung Lin, Jie Lu, Mark R. Nutter, Toyotaro Suzumura, Ilie G. Tanase, Danny L. Yeh
  • Patent number: 10235398
    Abstract: An object of the present invention is to efficiently perform a data load process or a data store process between a memory and a storage unit in a processor. The processor includes: a plurality of storage units associated with a plurality of data elements included in a data set; and a control unit that reads the plurality of data elements stored in adjacent storage areas from a memory, in which a plurality of the data sets is stored, collectively for respective data sets, sorts the respective read data elements to a storage unit corresponding to the data element among the plurality of storage units, and writes the data elements to the respective data sets.
    Type: Grant
    Filed: April 23, 2015
    Date of Patent: March 19, 2019
    Assignee: Renesas Electronics Corporation
    Inventor: Masayuki Kimura
  • Patent number: 10026458
    Abstract: Memories and methods for performing an atomic memory operation are disclosed, including a memory having a memory store, operation logic, and a command decoder. Operation logic can be configured to receive data and perform operations thereon in accordance with internal control signals. A command decoder can be configured to receive command packets having at least a memory command portion in which a memory command is provided and a data configuration portion in which configuration information related to data associated with a command packet is provided. The command decoder is further configured to generate a command control signal based at least in part on the memory command and to generate a control signal based at least in part on the configuration information.
    Type: Grant
    Filed: October 21, 2010
    Date of Patent: July 17, 2018
    Assignee: Micron Technology, Inc.
    Inventor: David Resnick
  • Patent number: 9766858
    Abstract: A data processing system supports vector operands with components representing different bit significance portions of an integer number. Processing circuitry performs a processing operation specified by a program instruction in dependence upon a number of components comprising the vector as specified by metadata for the vector.
    Type: Grant
    Filed: December 24, 2014
    Date of Patent: September 19, 2017
    Assignee: ARM Limited
    Inventors: David Raymond Lutz, Neil Burgess, Christopher Neal Hinds
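A quick sketch of the representation this entry describes: a wide integer is carried as a vector whose components hold successive bit-significance portions (limbs), and metadata giving the number of valid components drives the arithmetic. The limb width and metadata encoding are assumptions.
```python
LIMB_BITS = 32

def to_limbed_vector(value, max_limbs=8):
    """Split an integer into little-endian limbs plus metadata giving the valid component count."""
    limbs = [(value >> (LIMB_BITS * i)) & ((1 << LIMB_BITS) - 1) for i in range(max_limbs)]
    count = max(1, (value.bit_length() + LIMB_BITS - 1) // LIMB_BITS)
    return limbs, count                      # metadata = number of components in use

def from_limbed_vector(limbs, count):
    return sum(l << (LIMB_BITS * i) for i, l in enumerate(limbs[:count]))

x = 2**100 + 12345
limbs, count = to_limbed_vector(x)
print(count, from_limbed_vector(limbs, count) == x)   # 4 True
```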
  • Patent number: 9696994
    Abstract: A data processing apparatus includes a comparison unit configured to perform an element comparison process performing a comparison of a first data element at a first index in the first vector with a second data element at a second index in the second vector. A hazard vector generation unit is configured to populate a hazard vector at an index determined by the first index with a value determined by the second index. The comparison unit performs the element comparison process by iteratively comparing data elements of the first vector with each element of a subset of the second vector. It then determines the subset of the second vector as those data elements at indices in the second vector which are less than a current index of the first vector and which are greater than previously determined values of the second index for which the match condition was true.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: July 4, 2017
    Assignee: ARM LIMITED
    Inventor: Alastair David Reid
  • Patent number: 9411842
    Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, it is determined whether to perform a database operation using one or more vectorized instructions or without using any vectorized instructions. This determination may comprise estimating a first cost of performing the database operation using one or more vectorized instructions and estimating a second cost of performing the database operation without using any vectorized instructions. Multiple factors may be used to determine which approach to follow, such as the number of data elements that may fit into a SIMD register, a number of vectorized instructions in the vectorized approach, a number of data movement instructions that involve moving data from a SIMD register to a non-SIMD register and/or vice versa, a size of a cache, and a projected size of a hash table.
    Type: Grant
    Filed: August 1, 2013
    Date of Patent: August 9, 2016
    Assignee: Oracle International Corporation
    Inventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
  • Patent number: 9335997
    Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an operand vector, a selection vector, and a control vector are disclosed. The executed instructions may also cause the processor to perform a wrapping rotate previous operation dependent upon the input vectors.
    Type: Grant
    Filed: September 28, 2012
    Date of Patent: May 10, 2016
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 9218182
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor a data element shuffle and an operation on the shuffled data elements in response to a single data element shuffle and operation instruction that includes a destination vector register operand, first and second source vector register operands, an immediate value, and an opcode are described.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: December 22, 2015
    Assignee: Intel Corporation
    Inventors: Igor Ermolaev, Elmoustapha Ould-Ahmed-Vall, Bret Toll, Jesus Corbal, Andrey Naraikin
  • Patent number: 9210480
    Abstract: A video processing system includes a video encoder that encodes a video stream into an independent video layer stream and a first dependent video layer stream based on a motion vector data or grayscale and color data.
    Type: Grant
    Filed: December 20, 2007
    Date of Patent: December 8, 2015
    Assignee: BROADCOM CORPORATION
    Inventors: Stephen E. Gordon, Sherman Chen, Michael Dove, David Rosmann, Thomas J. Quigley, Jeyhan Karaoguz