Vector Processor Operation Patents (Class 712/7)
  • Patent number: 12147338
    Abstract: In accordance with the described techniques for leveraging processing in memory registers as victim buffers, a computing device includes a memory, a processing in memory component having registers for data storage, and a memory controller having a victim address table that includes at least one address of a row of the memory that is stored in the registers. The memory controller receives a request to access the row of the memory and accesses data of the row from the registers based on the address of the row being included in the victim address table.
    Type: Grant
    Filed: December 27, 2022
    Date of Patent: November 19, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Jagadish B Kotra, Dong Kai Wang
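    A minimal Python sketch of the victim-buffer lookup described in the abstract above: the victim address table maps a row address to the processing-in-memory register that currently holds that row's data. All names are illustrative, not taken from the patent.
      def read_row(address, dram_rows, pim_registers, victim_address_table):
          # Serve the access from the PIM registers when the row address is
          # recorded in the victim address table; otherwise go to memory.
          if address in victim_address_table:
              return pim_registers[victim_address_table[address]]
          return dram_rows[address]

      pim_regs = {0: b"evicted row data"}
      vat = {0x40: 0}              # row 0x40 currently lives in PIM register 0
      print(read_row(0x40, {}, pim_regs, vat))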
  • Patent number: 12112164
    Abstract: A processing device comprising a plurality of operand registers, wherein a first subset of the operand registers is configured to store state information for a plurality of bins, comprising a range of values and a bin count associated with each respective bin, wherein a second subset of the operand registers is configured to store a vector of floating-point values; and an execution unit configured to execute a first instruction taking the state information for the plurality of bins and the vector of floating-point values as operands, and in response to execution of the first instruction, for each of the floating-point values: identify, based on an exponent of the respective floating-point value, each one of the plurality of bins for which the respective floating-point value falls within the associated range of values; and increment the bin count associated with the identified bins.
    Type: Grant
    Filed: February 28, 2023
    Date of Patent: October 8, 2024
    Assignee: GRAPHCORE LIMITED
    Inventors: Alan Alexander, Simon Knowles, Godfrey Da Costa, Badreddine Noune
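    A software sketch of the binning instruction's effect, assuming each bin is a half-open range over the base-2 exponent of the float; the patent's exact range encoding may differ and the function name is illustrative.
      import math

      def histogram_by_exponent(values, bin_edges):
          # bin i covers exponents in [bin_edges[i], bin_edges[i+1]); every bin
          # whose range contains a value's exponent has its count incremented.
          counts = [0] * (len(bin_edges) - 1)
          for v in values:
              exp = math.frexp(v)[1]
              for i, (lo, hi) in enumerate(zip(bin_edges, bin_edges[1:])):
                  if lo <= exp < hi:
                      counts[i] += 1
          return counts

      print(histogram_by_exponent([0.5, 2.0, 3.5, 1024.0], [-4, 0, 4, 16]))  # [0, 3, 1]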
  • Patent number: 12112167
    Abstract: Embodiments for gathering and scattering matrix data by row are disclosed. In an embodiment, a processor includes a storage matrix, a decoder, and execution circuitry. The decoder is to decode an instruction having a format including an opcode field to specify an opcode and a first operand field to specify a set of irregularly spaced memory locations. The execution circuitry is to, in response to the decoded instruction, calculate a set of addresses corresponding to the set of irregularly spaced memory locations and transfer a set of rows of data between the storage and the set of irregularly spaced memory locations.
    Type: Grant
    Filed: June 27, 2020
    Date of Patent: October 8, 2024
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Alexander F. Heinecke, Robert Valentine, Menachem Adelman, Evangelos Georganas, Mark J. Charney, Nikita A. Shustrov, Sara Baghsorkhi
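    The row-wise gather and scatter can be pictured with flat-list addressing; a sketch assuming each row is row_len consecutive elements starting at an irregularly spaced base address (illustrative only, not the patent's encoding).
      def gather_rows(memory, base_addresses, row_len):
          # Gather: read one row of row_len elements from each irregular address.
          return [memory[a:a + row_len] for a in base_addresses]

      def scatter_rows(memory, base_addresses, rows):
          # Scatter: write each row back to its irregular base address.
          for a, row in zip(base_addresses, rows):
              memory[a:a + len(row)] = row

      mem = list(range(32))
      rows = gather_rows(mem, [0, 9, 20], row_len=4)   # [[0..3], [9..12], [20..23]]
      scatter_rows(mem, [0, 9, 20], [[x * 10 for x in r] for r in rows])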
  • Patent number: 12079724
    Abstract: Embodiments of the present disclosure relate to a texture unit circuit in a neural processor circuit. The neural processor circuit includes a tensor access operation circuit with the texture unit circuit, a data processor circuit, and at least one neural engine circuit. The texture unit circuit fetches a source tensor from a system memory by referencing an index tensor in the system memory representing indexing information into the source tensor. The data processor circuit stores an output version of the source tensor obtained from the tensor access operation circuit and sends the output version of the source tensor as multiple units of input data to the at least one neural engine circuit. The at least one neural engine circuit performs at least convolution operations on the units of input data and at least one kernel to generate output data.
    Type: Grant
    Filed: October 10, 2023
    Date of Patent: September 3, 2024
    Assignee: APPLE INC.
    Inventor: Christopher L. Mills
  • Patent number: 12051145
    Abstract: The present invention teaches a real-time hybrid ray tracing system for non-planar specular reflections. The high complexity of a non-planar surface is reduced to the low complexity of multiple small planar surfaces. Advantage is taken of the planar nature of the triangles that form the building blocks of a non-planar surface. All secondary rays bouncing from a given surface triangle toward object triangles keep a close direction to each other. A collective control of secondary rays is enabled by this closeness and by decoupling secondary rays from primary rays. The result is a high coherence of secondary rays.
    Type: Grant
    Filed: March 14, 2022
    Date of Patent: July 30, 2024
    Assignee: Snap Inc.
    Inventors: Reuven Bakalash, Ron Weitzman, Elad Haviv
  • Patent number: 12008378
    Abstract: A parallel processing (PP) level coherence directory, also referred to as a Processing In-Memory Probe Filter (PimPF), is added to a coherence directory controller. When the coherence directory controller receives a broadcast PIM command from a host, or a PIM command that is directed to multiple memory banks in parallel, the PimPF accelerates processing of the PIM command by maintaining a directory for cache coherence that is separate from existing system level directories in the coherence directory controller. The PimPF maintains a directory according to address signatures that define the memory addresses affected by a broadcast PIM command. Two implementations are described: a lightweight implementation that accelerates PIM loads into registers, and a heavyweight implementation that accelerates both PIM loads into registers and PIM stores into memory.
    Type: Grant
    Filed: April 10, 2023
    Date of Patent: June 11, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Varun Agrawal, Yasuko Eckert
  • Patent number: 12001841
    Abstract: A content-addressable processing engine, also referred to herein as CAPE, is provided. Processing-in-memory (PIM) architectures attempt to overcome the von Neumann bottleneck by combining computation and storage logic into a single component. CAPE provides a general-purpose PIM microarchitecture that provides acceleration of vector operations while being programmable with standard reduced instruction set computing (RISC) instructions, such as RISC-V instructions with standard vector extensions. CAPE can be implemented as a standalone core that specializes in associative computing, and that can be integrated in a tiled multicore chip alongside other types of compute engines. Certain embodiments of CAPE achieve average speedups of 14× (up to 254×) over an area-equivalent out-of-order processor core tile with three levels of caches across a diverse set of representative applications.
    Type: Grant
    Filed: September 22, 2022
    Date of Patent: June 4, 2024
    Assignee: Cornell University
    Inventors: José F. Martínez, Helena Caminal, Kailin Yang, Khalid Al-Hawaj, Christopher Batten
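    The heart of associative computing is comparing a key against every stored word at once; a toy sketch of that query pattern (sequential here, parallel in CAPE):
      def associative_match(stored_words, key):
          # Return a match vector: 1 where the stored word equals the key.
          return [int(word == key) for word in stored_words]

      print(associative_match([3, 7, 7, 1], 7))   # [0, 1, 1, 0]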
  • Patent number: 11995030
    Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may include a plurality of columns of vector processing units arranged in a two-dimensional column array with a plurality of column stacks placed side-by-side in a first direction and each column stack having two columns stacked in a second direction and a temporary storage buffer. Each column may include a processing element (PE) that has a vector Arithmetic Logic Unit (ALU) to perform arithmetic operations in parallel threads. At a first end of the column array in the first direction, two columns in the column stack are coupled to the temporary storage buffer for one-way data flow. At a second end of the column array in the first direction, two columns are coupled to each other for one-way data flow. The column array and the temporary storage buffer may form a one-way circular data path.
    Type: Grant
    Filed: November 10, 2022
    Date of Patent: May 28, 2024
    Assignee: AzurEngine Technologies, Inc.
    Inventors: Ryan Braidwood, Yuan Li, Jianbin Zhu, Toshio Nagata
  • Patent number: 11956179
    Abstract: Duplicated physical layer convergence protocol (PLCP) protocol data unit (PPDU) transmission is described for a wireless device with reduced peak-to-average power ratios (PAPR). One example includes obtaining a first sub-PPDU from a PPDU that includes a data field with data content. A second sub-PPDU may also be obtained by duplicating the PPDU, including the data content of the PPDU. At least one of a phase rotation, a phase offset, or a phase ramp is applied to at least a portion of a second set of sub-carriers of a wideband channel. The first sub-PPDU is transmitted on a first set of sub-carriers of the wideband channel and the second sub-PPDU is transmitted on the second set of sub-carriers of the wideband channel.
    Type: Grant
    Filed: July 21, 2021
    Date of Patent: April 9, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Kanke Wu, Jialing Li Chen, Bin Tian
  • Patent number: 11947967
    Abstract: An example system implementing a processing-in-memory pipeline includes: a memory array to store a plurality of look-up tables (LUTs) and data; a control block coupled to the memory array, the control block to control a computational pipeline by activating one or more LUTs of the plurality of LUTs; and a logic array coupled to the memory array and the control block, the logic array to perform, based on control inputs received from the control block, logic operations on the activated LUTs and the data.
    Type: Grant
    Filed: August 1, 2022
    Date of Patent: April 2, 2024
    Inventor: Dmitri Yudanov
  • Patent number: 11915131
    Abstract: In an approach to improve the efficiency of solving problem instances, a machine learning model is utilized to solve a sequential optimization problem. Embodiments of the present invention receive a sequential optimization problem for solving and utilize a random initialization to solve a first instance of the sequential optimization problem. Embodiments of the present invention learn, by a computing device, a machine learning model based on a previously stored solution to the first instance of the sequential optimization problem. Additionally, embodiments of the present invention generate, by the machine learning model, one or more subsequent approximate solutions to the sequential optimization problem and output, by a user interface on the computing device, the one or more subsequent approximate solutions to the sequential optimization problem.
    Type: Grant
    Filed: November 23, 2020
    Date of Patent: February 27, 2024
    Assignee: International Business Machines Corporation
    Inventors: Kartik Ahuja, Amit Dhurandhar, Karthikeyan Shanmugam, Kush Raj Varshney
  • Patent number: 11907158
    Abstract: A vector processor with a vector first and multi-lane configuration. A vector operation for a vector processor can include a single vector or multiple vectors as input. Multiple lanes for the input can be used to accelerate the operation in parallel. And, a vector first configuration can enhance the multiple lanes by reducing the number of elements accessed in the lanes to perform the operation in parallel.
    Type: Grant
    Filed: December 28, 2020
    Date of Patent: February 20, 2024
    Assignee: Micron Technology, Inc.
    Inventor: Steven Jeffrey Wallach
  • Patent number: 11847452
    Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address, and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile is a set of 2-dimensional registers, are discussed.
    Type: Grant
    Filed: June 28, 2021
    Date of Patent: December 19, 2023
    Assignee: Intel Corporation
    Inventors: Menachem Adelman, Robert Valentine, Zeev Sperber, Mark J. Charney, Bret L. Toll, Rinat Rappoport, Jesus Corbal, Dan Baum, Alexander F. Heinecke, Elmoustapha Ould-Ahmed-Vall, Yuri Gebil, Raanan Sade
  • Patent number: 11829754
    Abstract: A vector load instruction generating unit of a compile device generates an instruction to load a "first group of data units", which is used as an element A[i] in iterative calculation processing, from a memory into a first vector register in a state of being packed in units of 1-word. Each data unit is (1/2)^k word. The vector load instruction generating unit generates an instruction to load a second group of data units, which is used as an element A[i+2^k], into a second vector register. A vector shift double instruction generating unit generates an instruction to cause a part of a data string, which is obtained by shifting the data of the first vector register and the second vector register by (1/2)^k word as a series of data strings, to be stored in a third vector register in a state of being packed in units of 1-word.
    Type: Grant
    Filed: October 11, 2019
    Date of Patent: November 28, 2023
    Assignee: NEC CORPORATION
    Inventor: Koichi Masuda
  • Patent number: 11829300
    Abstract: A method for sorting a vector in a processor is provided that includes, by the processor in response to a vector sort instruction, generating a control input vector for vector permutation logic included in the processor based on values in lanes of the vector and a sort order for the vector indicated by the vector sort instruction, and storing the control input vector in a storage location.
    Type: Grant
    Filed: October 3, 2022
    Date of Patent: November 28, 2023
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy David Anderson, Mujibur Rahman
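    A sketch of the two steps in the abstract: derive a permute control vector from the lane values and the requested sort order, then feed it to the permutation logic. Function names are illustrative.
      def make_sort_control(lanes, descending=False):
          # Control entry i names the source lane whose value belongs at position i.
          return sorted(range(len(lanes)), key=lambda i: lanes[i], reverse=descending)

      def permute(lanes, control):
          # Apply the control vector the way the vector permutation logic would.
          return [lanes[src] for src in control]

      v = [7, 2, 9, 4]
      ctrl = make_sort_control(v)      # [1, 3, 0, 2]
      print(permute(v, ctrl))          # [2, 4, 7, 9]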
  • Patent number: 11755490
    Abstract: Methods, systems, and devices for unmap operation techniques are described. A memory system may include a volatile memory device and a non-volatile memory device. The memory system may receive a set of unmap commands that each include a logical block address associated with unused data. The memory system may determine whether one or more parameters associated with the set of unmap commands satisfy a threshold. If the one or more parameters satisfy the threshold, the memory system may select a first procedure for performing the set of unmap commands different from a second procedure (e.g., a default procedure) for performing the set of unmap commands and may perform the set of unmap commands using the first procedure. If the one or more parameters do not satisfy the threshold, the memory system may perform the set of unmap commands using the second procedure.
    Type: Grant
    Filed: December 15, 2020
    Date of Patent: September 12, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Giuseppe Cariello, Luca Porzio, Roberto Izzi, Jonathan S. Parry
  • Patent number: 11748098
    Abstract: A processor is provided with a register file comprising a plurality of vector registers, and an execution core coupled to the register file, where the execution core is configured to execute a set of checksum instructions with a first checksum instruction to specify a first vector operand, a second vector operand, and a result vector operand, where the first vector operand is in a first vector register of the plurality of vector registers, the second vector operand is in a second register of the plurality of vector registers, and the result vector operand is to be written to a third vector register of the plurality of vector registers, and to execute the first checksum instruction, the execution core is configured to accumulate bytes from the first vector operand and the second vector operand into a first portion of the result vector operand and add the accumulated bytes from the first vector operand and the second vector operand to a second portion of the result vector operand to generate the second portion of the result vector operand.
    Type: Grant
    Filed: May 5, 2021
    Date of Patent: September 5, 2023
    Assignee: Apple Inc.
    Inventors: Ali Sazegari, Chris Cheng-Chieh Lee
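    One reading of the two-portion accumulation is a Fletcher-style sum / sum-of-sums pair; a hedged sketch under that assumption (lane widths and wraparound behavior are chosen for illustration, not taken from the patent):
      def checksum_step(vec_a, vec_b, part_lo, part_hi):
          # Accumulate the bytes of both operands into the first result portion,
          # then fold that running sum into the second portion.
          part_lo = (part_lo + sum(vec_a) + sum(vec_b)) & 0xFFFFFFFF
          part_hi = (part_hi + part_lo) & 0xFFFFFFFF
          return part_lo, part_hi

      lo, hi = checksum_step(b"hello", b"world", 0, 0)
      print(hex(lo), hex(hi))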
  • Patent number: 11741015
    Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.
    Type: Grant
    Filed: August 18, 2022
    Date of Patent: August 29, 2023
    Assignee: NVIDIA Corporation
    Inventors: Jerome F. Duluk, Jr., Cameron Buschardt, Sherry Cheung, James Leroy Deming, Samuel H. Duncan, Lucien Dunning, Robert George, Arvind Gopalakrishnan, Mark Hairgrove, Chenghuan Jia, John Mashey
  • Patent number: 11740880
    Abstract: Aspects of the invention include a compiler detecting an expression in a loop that includes elements of mixed data types. The compiler then promotes elements of a sub-expression of the expression to a same intermediate data type. The compiler then calculates the sub-expression using the elements of the same intermediate data type.
    Type: Grant
    Filed: September 8, 2021
    Date of Patent: August 29, 2023
    Assignee: International Business Machines Corporation
    Inventors: Biplob Mishra, Satish Kumar Sadasivam, Puneeth A. H. Bhat
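    The promotion step can be shown with NumPy: elements of mixed types inside one loop expression are widened to a single intermediate type before the sub-expression is evaluated. float64 is an illustrative choice, not the patent's promotion rule.
      import numpy as np

      a = np.arange(4, dtype=np.int16)                # mixed data types in
      b = np.linspace(0.0, 1.0, 4, dtype=np.float32)  # one loop expression
      intermediate = a.astype(np.float64) + b.astype(np.float64)  # promoted sub-expression
      result = intermediate * 2.0
      print(result.dtype, result)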
  • Patent number: 11663488
    Abstract: An online system trains a transformer architecture by an initialization method which allows the transformer architecture to be trained without normalization layers or learning rate warmup, resulting in significant improvements in computational efficiency for transformer architectures. Specifically, an attention block included in an encoder or a decoder of the transformer architecture generates the set of attention representations by applying a key matrix to the input key, a query matrix to the input query, and a value matrix to the input value to generate an output, and applying an output matrix to the output to generate the set of attention representations. The initialization method may be performed by scaling the parameters of the value matrix and the output matrix with a factor that is inverse to a number of the set of encoders or a number of the set of decoders.
    Type: Grant
    Filed: February 5, 2021
    Date of Patent: May 30, 2023
    Assignee: THE TORONTO-DOMINION BANK
    Inventors: Maksims Volkovs, Xiao Shi Huang, Juan Felipe Perez Vallejo
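    A sketch of the initialization idea, assuming the depth-dependent factor is simply 1/N over N encoder (or decoder) layers; the patent's exact scaling factor may differ and the function is illustrative.
      import numpy as np

      def init_attention_weights(d_model, num_layers, seed=0):
          rng = np.random.default_rng(seed)
          std = d_model ** -0.5
          w_k = rng.normal(0.0, std, (d_model, d_model))
          w_q = rng.normal(0.0, std, (d_model, d_model))
          scale = 1.0 / num_layers        # factor inverse to the number of layers
          w_v = rng.normal(0.0, std, (d_model, d_model)) * scale
          w_o = rng.normal(0.0, std, (d_model, d_model)) * scale
          return w_k, w_q, w_v, w_o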
  • Patent number: 11663005
    Abstract: Examples of the present disclosure provide apparatuses and methods for determining a vector population count in a memory. An example method comprises determining, using sensing circuitry, a vector population count of a number of fixed length elements of a vector stored in a memory array.
    Type: Grant
    Filed: January 15, 2021
    Date of Patent: May 30, 2023
    Assignee: Micron Technology, Inc.
    Inventor: Sanjay Tiwari
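    The operation itself is simple to state in software; a sketch that splits a stored bit string into fixed-length elements and counts the set bits in each (the patent's sensing circuitry does this inside the memory array rather than on Python strings):
      def vector_population_count(bits, element_width):
          # One popcount per fixed-length element of the stored vector.
          return [bin(int(bits[i:i + element_width], 2)).count("1")
                  for i in range(0, len(bits), element_width)]

      print(vector_population_count("1011000111110000", 8))   # [4, 4]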
  • Patent number: 11640297
    Abstract: Embodiments described herein provide for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes systolic dot product circuitry to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
    Type: Grant
    Filed: June 15, 2021
    Date of Patent: May 2, 2023
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Guei-Yuan Lueh, Supratim Pal, Ashutosh Garg, Chandra S. Gurram, Jorge E. Parra, Junjie Gu, Konrad Trifunovic, Hong Bin Liao, Mike B. MacPherson, Shubh B. Shah, Shubra Marwaha, Stephen Junkins, Timothy R. Bauer, Varghese George, Weiyu Chen
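    A scalar model of the systolic flow: each layer's multipliers and adders extend the partial dot product handed down from the previous layer. The layer count and chunking are illustrative.
      def systolic_dot(a, b, layers=2):
          # Assumes len(a) == len(b) and len(a) is a multiple of `layers`.
          chunk = len(a) // layers
          acc = 0.0
          for layer in range(layers):
              lo, hi = layer * chunk, (layer + 1) * chunk
              # the partial dot product from this layer is forwarded to the next
              acc += sum(x * y for x, y in zip(a[lo:hi], b[lo:hi]))
          return acc

      print(systolic_dot([1, 2, 3, 4], [5, 6, 7, 8]))   # 70.0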
  • Patent number: 11593106
    Abstract: Vector sort circuits that can be used to accelerate sorting operations in a vector processor. When a new data element is received, the vector sort circuit can read multiple existing data elements from a vector-sort database in parallel, compare metrics of the existing data elements to a metric of the new data element, and output updated data elements to the vector-sort database based on the metrics. Depending on implementation, the vector-sort database can be maintained in sorted order, or the data elements can have assigned ranks indicating the sort order and the elements need not be stored in sorted order. A vector sort circuit can be incorporated into a vector sort functional unit of a microprocessor, and the instruction set of the microprocessor can include instructions that are executed by the vector sort functional unit using the vector sort circuit.
    Type: Grant
    Filed: September 24, 2021
    Date of Patent: February 28, 2023
    Assignee: Apple Inc.
    Inventors: On Wa Yeung, Seydou N. Ba
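    The per-element update can be modeled as a rank computed from parallel compares followed by an insert; a sketch with an explicit metric function (names illustrative):
      def insert_by_metric(database, new_elem, metric=lambda x: x):
          # In hardware all compares happen in one step; a sum plays that role here.
          rank = sum(1 for e in database if metric(e) <= metric(new_elem))
          database.insert(rank, new_elem)
          return database

      print(insert_by_metric([1, 4, 9], 6))   # [1, 4, 6, 9]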
  • Patent number: 11531549
    Abstract: A system and corresponding method map instructions in an out-of-order (OoO) processor. The system comprises a mapper, integer snapshot circuitry, and floating-point (FP) snapshot circuitry. The mapper maps instructions by mapping integer and FP architectural registers (ARs) of the instructions to integer and FP physical registers of the OoO processor, respectively. The mapper records, via at least one FP present indicator, the presence of FP ARs used as destinations in the instructions. The mapper copies, periodically, the integer mapper state to the integer snapshot circuitry and copies, intermittently, based on the at least one FP present indicator, the FP mapper state to the FP snapshot circuitry. Copies of the integer and FP mapper state in the integer and FP snapshot circuitry, respectively, improve performance for instruction unwinding caused, for example, by an exception, branch/jump mispredict, etc. By copying the FP mapper state intermittently, power efficiency of the OoO processor is improved.
    Type: Grant
    Filed: March 31, 2021
    Date of Patent: December 20, 2022
    Assignee: Marvell Asia Pte, Ltd.
    Inventor: David A. Carlson
  • Patent number: 11455171
    Abstract: A fast and frugal item-state tracking scoreboard circuit is disclosed. The scoreboard maintains per-item partial states across multiple memory circuits, enabling multiple lookups per clock cycle and multiple state updates per clock cycle. In an embodiment a scoreboard is used to schedule instructions in an out-of-order processor. Each clock cycle the scoreboard indicates the busy state of an instruction's registers and may update the busy state of the destination registers of issuing instructions and completing instructions. Applications include register tracking, function-unit tracking, and cache-line state tracking, in embodiments including processor cores (including superscalar, superpipelined, and multithreaded processors), accelerators, memory systems, and networks. In an embodiment, a register-busy scoreboard circuit is implemented using FPGA LUT RAM memory.
    Type: Grant
    Filed: May 29, 2020
    Date of Patent: September 27, 2022
    Assignee: Gray Research LLC
    Inventor: Jan Stephen Gray
  • Patent number: 11442713
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve loop optimization with predictable recurring memory reads (PRMRs). An example apparatus includes memory, and first processor circuitry to execute first instructions to at least identify one or more optimizations to convert a first loop into a second loop based on converting PRMRs of the first loop into loop-invariant PRMRs, the converting of the PRMRs in response to a quantity of the PRMRs satisfying a threshold, the second loop to execute in a single iteration corresponding to a quantity of iterations of the first loop, determine one or more optimization parameters based on the one or more optimizations, and compile second instructions based on the first processor circuitry processing the first loop based on the one or more optimization parameters associated with the one or more optimizations, the second instructions to be executed by the first or second processor circuitry.
    Type: Grant
    Filed: October 19, 2020
    Date of Patent: September 13, 2022
    Assignee: Intel Corporation
    Inventors: Diego Luis Caballero de Gea, Hideki Ido, Eric N. Garcia
  • Patent number: 11367160
    Abstract: A parallel processing unit (e.g., a GPU), in some examples, includes a hardware scheduler and hardware arbiter that launch graphics and compute work for simultaneous execution on a SIMD/SIMT processing unit. Each processing unit (e.g., a streaming multiprocessor) of the parallel processing unit operates in a graphics-greedy mode or a compute-greedy mode at respective times. The hardware arbiter, in response to a result of a comparison of at least one monitored performance or utilization metric to a user-configured threshold, can selectively cause the processing unit to run one or more compute work items from a compute queue when the processing unit is operating in the graphics-greedy mode, and cause the processing unit to run one or more graphics work items from a graphics queue when the processing unit is operating in the compute-greedy mode. Associated methods and systems are also described.
    Type: Grant
    Filed: August 2, 2018
    Date of Patent: June 21, 2022
    Assignee: NVIDIA CORPORATION
    Inventors: Rajballav Dash, Gregory Palmer, Gentaro Hirota, Lacky Shah, Jack Choquette, Emmett Kilgariff, Sriharsha Niverty, Milton Lei, Shirish Gadre, Omkar Paranjape, Lei Yang, Rouslan Dimitrov
  • Patent number: 11354126
    Abstract: Data processing apparatus comprises vector processing circuitry to selectively apply vector processing operations defined by vector processing instructions to generate one or more data elements of a data vector comprising a plurality of data elements at respective data element positions of the data vector, according to the state of respective predicate flags associated with the positions of the data vector; and generator circuitry to generate instruction sample data indicative of processing activities of the vector processing circuitry for selected ones of the vector processing instructions, instruction sample data indicating at least the state of the predicate flags at execution of the selected vector processing instructions.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: June 7, 2022
    Assignee: Arm Limited
    Inventors: Michael John Williams, Nigel John Stephens
  • Patent number: 11219426
    Abstract: A method and system for determining an irradiation dose. The method for determining the irradiation dose includes determining a pixel having a biological feature in a region of interest in a radiotherapy simulated locating image by using a retrospective label; extracting a local radiomics feature based on the pixel having the biological feature, in which the local radiomics feature includes a grayscale histogram intensity, a tumor shape feature, a textural feature, a Laplacian of Gaussian filtering feature, and a wavelet feature; acquiring the local radiomics features to be measured; identifying a positive region having the local radiomics features to be measured based on the local radiomics features; performing three-dimensional reconstruction for the peripheral boundary of the positive region to determine a three-dimensional image.
    Type: Grant
    Filed: July 18, 2018
    Date of Patent: January 11, 2022
    Assignees: Tumor Hospital of Shandong First Medical University (Shandong Cancer Hospital and Institute)
    Inventors: Jian Zhu, Zhen Hou, Zhenjiang Li, Haining Yu, Tong Bai, Yong Yin, Baosheng Li
  • Patent number: 11141127
    Abstract: A method and system for determining an irradiation dose. The method for determining the irradiation dose includes determining a pixel having a biological feature in a region of interest in a radiotherapy simulated locating image by using a retrospective label; extracting a local radiomics feature based on the pixel having the biological feature, in which the local radiomics feature includes a grayscale histogram intensity, a tumor shape feature, a textural feature, a Laplacian of Gaussian filtering feature, and a wavelet feature; acquiring the local radiomics features to be measured; identifying a positive region having the local radiomics features to be measured based on the local radiomics features; performing three-dimensional reconstruction for the peripheral boundary of the positive region to determine a three-dimensional image.
    Type: Grant
    Filed: July 18, 2018
    Date of Patent: October 12, 2021
    Assignees: Tumor Hospital of Shandong First Medical University
    Inventors: Jian Zhu, Zhen Hou, Zhenjiang Li, Haining Yu, Tong Bai, Yong Yin, Baosheng Li
  • Patent number: 11126430
    Abstract: A vector processor includes a grouping memory functional unit coupled to grouping memory having multiple bins. The vector processor also includes a bitformatting functional unit that performs bit-level data arrangements using any suitable technique or network, such as a Benes network. The vector processor receives and reads an input vector of data that includes portions (e.g., bits) of multiple data streams, and writes each portion corresponding to a respective data stream to a respective bin in parallel using the bitformatting functional unit to align the data. The vector processor also or alternatively receives and reads multiple outgoing data streams, writes portions of the data streams in respective bins of the grouping memory, and intersperses the portions in an outgoing vector of data in parallel, using the bitformatting functional unit to align the data.
    Type: Grant
    Filed: December 27, 2019
    Date of Patent: September 21, 2021
    Assignee: Intel Corporation
    Inventors: Parakalan Venkataraghavan, Thomas W. Smith, Silpa Naidu Chirumavilla, Ravi Shekhar
  • Patent number: 11042378
    Abstract: Data processing apparatus comprises processing circuitry to selectively apply a vector processing operation to data items at positions within data vectors according to the states of a set of respective predicate flags associated with the positions, the data vectors having a data vector processing order, each data vector comprising a plurality of data items having a data item order, the processing circuitry comprising: instruction decoder circuitry to decode program instructions; and instruction processing circuitry to execute instructions decoded by the instruction decoder circuitry; wherein the instruction decoder circuitry is responsive to a propagation instruction to control the instruction processing circuitry to derive a set of predicate flags applicable to a current data vector in dependence upon a set of predicate flags applicable to a preceding data vector in the data vector processing order, wherein when one or more last-most predicate flags of the set applicable to the preceding data vector are inac
    Type: Grant
    Filed: July 28, 2016
    Date of Patent: June 22, 2021
    Assignee: ARM Limited
    Inventors: Nigel John Stephens, Mbou Eyole, Alejandro Martinez Vicente
  • Patent number: 11003449
    Abstract: A swizzle pattern generator is provided to reduce an overhead due to execution of a swizzle instruction in vector processing. The swizzle pattern generator is configured to provide swizzle patterns with respect to data sets of at least one vector register or vector processing unit. The swizzle pattern generator may be reconfigurable to generate various swizzle patterns for different vector operations.
    Type: Grant
    Filed: January 24, 2019
    Date of Patent: May 11, 2021
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Moo-Kyoung Chung, Woong Seo, Ho-Young Kim, Soo-Jung Ryu, Dong-Hoon Yoo, Jin-Seok Lee, Yeon-Gon Cho, Chang-Moo Kim, Seung-Hun Jin
  • Patent number: 10990396
    Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: April 27, 2021
    Assignee: Intel Corporation
    Inventors: Bret Toll, Christopher J. Hughes, Dan Baum, Elmoustapha Ould-Ahmed-Vall, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
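    A NumPy sketch of the move semantics: the instruction names a group of the 2D tile and moves it to or from the 1D vector. Group selection here is simplified to whole rows and columns.
      import numpy as np

      def tile_to_vector(tile, group, index):
          # Move the selected row or column of the 2D tile out to a 1D vector.
          return np.array(tile[index, :] if group == "row" else tile[:, index])

      def vector_to_tile(tile, group, index, vec):
          # Reverse move: write the 1D vector back into the selected group.
          if group == "row":
              tile[index, :] = vec
          else:
              tile[:, index] = vec

      tile = np.arange(12).reshape(3, 4)
      v = tile_to_vector(tile, "column", 2)              # [2, 6, 10]
      vector_to_tile(tile, "row", 1, np.array([9, 9, 9, 9]))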
  • Patent number: 10942847
    Abstract: Technologies for efficiently performing scatter-gather operations include a device with circuitry configured to associate, with a template identifier, a set of non-contiguous memory locations of a memory having a cross point architecture. The circuitry is additionally configured to access, in response to a request that identifies the non-contiguous memory locations by the template identifier, the memory locations.
    Type: Grant
    Filed: December 18, 2018
    Date of Patent: March 9, 2021
    Assignee: Intel Corporation
    Inventors: Jawad B. Khan, Richard Coulson
  • Patent number: 10936315
    Abstract: In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
    Type: Grant
    Filed: December 31, 2018
    Date of Patent: March 2, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Duc Quang Bui, Joseph Raymond Michael Zbiciak
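    The loop-control idea in miniature: keep iterating while the current vector predicate reports at least one valid lane, and exit as soon as it reports none. Predicates are plain boolean lists in this sketch.
      def process_stream(data_vectors, predicates, work):
          for vec, pred in zip(data_vectors, predicates):
              if not any(pred):
                  break                # no valid data elements: exit the loop
              work([v for v, p in zip(vec, pred) if p])

      process_stream([[1, 2], [3, 4], [5, 6]],
                     [[True, True], [True, False], [False, False]],
                     print)            # prints [1, 2] then [3]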
  • Patent number: 10929133
    Abstract: Systems, methods, and apparatuses relating to element sorting of vectors are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction; and an execution unit to execute the decoded instruction to: provide storage for a comparison matrix to store a comparison value for each element of an input vector compared against the other elements of the input vector, perform a comparison operation on elements of the input vector corresponding to storage of comparison values above a main diagonal of the comparison matrix, perform a different operation on elements of the input vector corresponding to storage of comparison values below the main diagonal of the comparison matrix, and store results of the comparison operation and the different operation in the comparison matrix.
    Type: Grant
    Filed: January 16, 2019
    Date of Patent: February 23, 2021
    Assignee: Intel Corporation
    Inventors: Mikhail Plotnikov, Igor Ermolaev
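    A small model of the comparison-matrix approach: greater-than compares fill the entries above the main diagonal, the complementary compares fill the entries below it, and each element's rank is its row sum. Tie-breaking by index is an assumption of this sketch.
      def sort_via_comparison_matrix(v):
          n = len(v)
          cmp = [[0] * n for _ in range(n)]
          for i in range(n):
              for j in range(n):
                  if i < j:
                      cmp[i][j] = int(v[i] > v[j])    # above the main diagonal
                  elif i > j:
                      cmp[i][j] = int(v[i] >= v[j])   # below: complementary compare
          ranks = [sum(row) for row in cmp]           # rank = elements this one exceeds
          out = [None] * n
          for i, r in enumerate(ranks):
              out[r] = v[i]
          return out

      print(sort_via_comparison_matrix([3, 1, 2]))    # [1, 2, 3]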
  • Patent number: 10896042
    Abstract: Examples of the present disclosure provide apparatuses and methods for determining a vector population count in a memory. An example method comprises determining, using sensing circuitry, a vector population count of a number of fixed length elements of a vector stored in a memory array.
    Type: Grant
    Filed: December 3, 2018
    Date of Patent: January 19, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Sanjay Tiwari
  • Patent number: 10877925
    Abstract: A vector processor with a vector first and multi-lane configuration. A vector operation for a vector processor can include a single vector or multiple vectors as input. Multiple lanes for the input can be used to accelerate the operation in parallel. And, a vector first configuration can enhance the multiple lanes by reducing the number of elements accessed in the lanes to perform the operation in parallel.
    Type: Grant
    Filed: March 18, 2019
    Date of Patent: December 29, 2020
    Assignee: Micron Technology, Inc.
    Inventor: Steven Jeffrey Wallach
  • Patent number: 10853043
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve loop optimization with predictable recurring memory reads (PRMRs). An example apparatus includes an optimizer including an optimization scenario manager to generate an optimization plan associated with a loop and corresponding optimization parameters, the optimization plan including a set of one or more optimizations, an optimization scenario analyzer to identify the optimization plan as a candidate optimization plan when a quantity of PRMRs included in the loop is greater than a threshold, and a parameter calculator to determine the optimization parameters based on the candidate optimization plan, and a code generator to generate instructions to be executed by a processor, the instructions based on processing the loop with the one or more optimizations included in the candidate optimization plan.
    Type: Grant
    Filed: September 11, 2018
    Date of Patent: December 1, 2020
    Assignee: INTEL CORPORATION
    Inventors: Diego Luis Caballero de Gea, Hideki Ido, Eric N. Garcia
  • Patent number: 10817291
    Abstract: Systems, methods, and apparatuses relating to swizzle operations and disable operations in a configurable spatial accelerator (CSA) are described. Certain embodiments herein provide for an encoding system for a specific set of swizzle primitives across a plurality of packed data elements in a CSA.
    Type: Grant
    Filed: March 30, 2019
    Date of Patent: October 27, 2020
    Assignee: Intel Corporation
    Inventors: Jesus Corbal, Rohan Sharma, Simon Steely, Jr., Chinmay Ashok, Kent D. Glossop, Dennis Bradford, Paul Caprioli, Louise Huot, Kermin ChoFleming, Barry Tannenbaum
  • Patent number: 10684860
    Abstract: This invention provides a high performance processor system and a method based on a common general purpose unit; it may be configured into a variety of different processor architectures. Before the processor executes instructions, the instructions are filled into the instruction read buffer, which is directly accessed by the processor core; the instruction read buffer then actively provides instructions to the processor core for execution, achieving a high cache hit rate.
    Type: Grant
    Filed: July 6, 2018
    Date of Patent: June 16, 2020
    Assignee: SHANGHAI XINHAO MICROELECTRONICS CO. LTD.
    Inventor: Kenneth Chenghao Lin
  • Patent number: 10671583
    Abstract: Techniques for performing database operations using vectorized instructions are provided. In one technique, it is determined whether to perform a database operation using one or more vectorized instructions or without using any vectorized instructions. This determination may comprise estimating a first cost of performing the database operation using one or more vectorized instructions and estimating a second cost of performing the database operation without using any vectorized instructions. Multiple factors may be used to determine which approach to follow, such as the number of data elements that may fit into a SIMD register, the number of vectorized instructions in the vectorized approach, the number of data movement instructions that involve moving data from a SIMD register to a non-SIMD register and/or vice versa, the size of a cache, and the projected size of a hash table.
    Type: Grant
    Filed: August 24, 2017
    Date of Patent: June 2, 2020
    Assignee: Oracle International Corporation
    Inventors: Rajkumar Sen, Sam Idicula, Nipun Agarwal
  • Patent number: 10572409
    Abstract: A memory arrangement can store a matrix of matrix data elements specified as index-value pairs that indicate row and column indices and associated values. First split-and-merge circuitry is coupled between the memory arrangement and a first set of FIFO buffers for reading the matrix data elements from the memory arrangement and putting the matrix data elements in the first set of FIFO buffers based on column indices. A pairing circuit is configured to read vector data elements, pair the vector data elements with the matrix data elements, and put the paired matrix and vector data elements in a second set of FIFO buffers based on column indices. Second split-and-merge circuitry is configured to read paired matrix and vector data elements from the second set of FIFO buffers and put the paired matrix and vector data elements in a third set of FIFO buffers based on row indices.
    Type: Grant
    Filed: May 10, 2018
    Date of Patent: February 25, 2020
    Assignee: XILINX, INC.
    Inventors: Jindrich Zejda, Ling Liu, Yifei Zhou, Ashish Sirasao
  • Patent number: 10564964
    Abstract: Systems and methods are provided for executing an instruction. The method may include loading a first vector into a first location, the first vector including a plurality of first data elements and loading a second vector into a second location, the second vector including a plurality of second data elements. The method may further include comparing the plurality of first data elements of the first vector to the plurality of second data elements of the second vector and performing one or more operations on the plurality of first and second data elements based on at least one vector cross-compare instruction. The one or more operations include counting a number of data elements of the plurality of first and second data elements that satisfy at least one condition, counting a number of times specified values occur in the plurality of first and second data elements, and generating sequence counts for duplicated values.
    Type: Grant
    Filed: August 23, 2016
    Date of Patent: February 18, 2020
    Assignee: International Business Machines Corporation
    Inventors: Jeffrey H. Derby, Robert K. Montoye, Dheeraj Sreedhar
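    A sketch of the three operation families the abstract lists: counting elements that satisfy a condition, counting occurrences of specified values, and generating running sequence counts for duplicated values. The condition and value set are illustrative.
      def cross_compare(vec_a, vec_b, specified_values, condition=lambda x: x > 0):
          elems = list(vec_a) + list(vec_b)
          satisfying = sum(1 for e in elems if condition(e))
          occurrences = {v: elems.count(v) for v in specified_values}
          seen, seq_counts = {}, []
          for e in elems:                  # k-th appearance of a value gets count k
              seen[e] = seen.get(e, 0) + 1
              seq_counts.append(seen[e])
          return satisfying, occurrences, seq_counts

      print(cross_compare([1, -2, 3], [3, 3, 0], [3]))  # (4, {3: 3}, [1, 1, 1, 2, 3, 1])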
  • Patent number: 10459843
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements. A stream head register stores the data elements next to be supplied to functional units for use as operands. An element duplication unit optionally duplicates each data element an instruction-specified number of times. A vector masking unit limits data elements received from the element duplication unit to the least significant bits within an instruction-specified vector length. If the vector length is less than the stream head register size, the vector masking unit stores all 0's in excess lanes of the stream head register (group duplication disabled) or stores duplicate copies of the least significant bits in excess lanes of the stream head register (group duplication enabled).
    Type: Grant
    Filed: December 30, 2016
    Date of Patent: October 29, 2019
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Joseph Zbiciak
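    The element duplication and vector masking stages can be modeled directly; in this sketch dup_count, vector_length and lane_count stand in for the instruction-specified values, and lengths are counted in elements rather than bits.
      def format_stream_head(elements, dup_count, vector_length, lane_count,
                             group_duplicate=False):
          # Duplicate each element, cut to the vector length, then fill the
          # excess lanes with zeros or with duplicate copies of the kept lanes.
          dup = [e for e in elements for _ in range(dup_count)][:vector_length]
          if group_duplicate:
              lanes = (dup * (lane_count // len(dup) + 1))[:lane_count]
          else:
              lanes = dup + [0] * (lane_count - len(dup))
          return lanes

      print(format_stream_head([7, 8], dup_count=2, vector_length=4, lane_count=8))
      # [7, 7, 8, 8, 0, 0, 0, 0]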
  • Patent number: 10394891
    Abstract: A novel distributed graph database is provided that is designed for efficient graph data storage and processing on modern computing architectures. In particular, a single-node graph database and a runtime & communication layer allow for composing a distributed graph database from multiple single-node instances.
    Type: Grant
    Filed: August 5, 2016
    Date of Patent: August 27, 2019
    Assignee: International Business Machines Corporation
    Inventors: Chun-Fu Chen, Jason L. Crawford, Ching-Yung Lin, Jie Lu, Mark R. Nutter, Toyotaro Suzumura, Ilie G. Tanase, Danny L. Yeh
  • Patent number: 10235398
    Abstract: An object of the present invention is to efficiently perform a data load process or a data store process between a memory and a storage unit in a processor. The processor includes: a plurality of storage units associated with a plurality of data elements included in a data set; and a control unit that reads the plurality of data elements stored in adjacent storage areas from a memory, in which a plurality of the data sets is stored, collectively for respective data sets, sorts the respective read data elements to a storage unit corresponding to the data element among the plurality of storage units, and writes the data elements to the respective data sets.
    Type: Grant
    Filed: April 23, 2015
    Date of Patent: March 19, 2019
    Assignee: Renesas Electronics Corporation
    Inventor: Masayuki Kimura
  • Patent number: 10026458
    Abstract: Memories and methods for performing an atomic memory operation are disclosed, including a memory having a memory store, operation logic, and a command decoder. The operation logic can be configured to receive data and perform operations thereon in accordance with internal control signals. The command decoder can be configured to receive command packets having at least a memory command portion in which a memory command is provided and a data configuration portion in which configuration information related to data associated with a command packet is provided. The command decoder is further configured to generate a command control signal based at least in part on the memory command and further configured to generate a control signal based at least in part on the configuration information.
    Type: Grant
    Filed: October 21, 2010
    Date of Patent: July 17, 2018
    Assignee: Micron Technology, Inc.
    Inventor: David Resnick
  • Patent number: 9766858
    Abstract: A data processing system supports vector operands with components representing different bit significance portions of an integer number. Processing circuitry performs a processing operation specified by a program instruction in dependence upon a number of components comprising the vector as specified by metadata for the vector.
    Type: Grant
    Filed: December 24, 2014
    Date of Patent: September 19, 2017
    Assignee: ARM Limited
    Inventors: David Raymond Lutz, Neil Burgess, Christopher Neal Hinds
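    The component layout can be pictured as integer limbs of increasing bit significance plus a metadata count; a sketch with an illustrative limb width.
      def to_components(value, limb_bits, limb_count):
          # Split one integer into vector components of increasing bit significance;
          # the metadata records how many components the operation should use.
          mask = (1 << limb_bits) - 1
          limbs = [(value >> (i * limb_bits)) & mask for i in range(limb_count)]
          metadata = max((i + 1 for i, l in enumerate(limbs) if l), default=1)
          return limbs, metadata

      print(to_components(0x12345678, limb_bits=16, limb_count=4))
      # ([22136, 4660, 0, 0], 2), i.e. [0x5678, 0x1234, 0, 0] with 2 components in use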