Patents Examined by Eric Coleman
  • Patent number: 12045617
    Abstract: Software instructions are executed on a processor within a computer system to configure a streaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array and a specified width for two selected dimensions of the array. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. When either selected dimension in the stream of vectors exceeds a respective specified width, the streaming engine inserts null elements into each portion of a respective vector for the selected dimension that exceeds the specified width in the stream of vectors. Stream vectors that are completely null are formed by the streaming engine without accessing the system memory for respective data.
    Type: Grant
    Filed: February 14, 2022
    Date of Patent: July 23, 2024
    Assignee: Texas Instruments Incorporated
    Inventors: William Franklin Leven, Asheesh Bhardwaj, Son Hung Tran, Timothy David Anderson
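    Illustration (not taken from the patent): the null-padding behavior in the abstract above can be sketched in software for a single dimension. The function name `stream_vectors`, its parameters, and the use of 0 as the null element are assumptions made for this sketch; the patent describes hardware and its actual parameter encoding is not given in this listing.

```python
def stream_vectors(memory, base, dim_size, width, vec_len, null=0):
    """Form fixed-length vectors over one dimension of `dim_size` elements.

    Only the first `width` elements are read from `memory`; every position
    past `width` (or past `dim_size`, in the final vector) is filled with
    `null`. Returns the vectors and the number of memory reads performed.
    """
    vectors = []
    reads = 0
    valid = min(width, dim_size)
    for start in range(0, dim_size, vec_len):
        vec = []
        for i in range(start, start + vec_len):
            if i < valid:
                vec.append(memory[base + i])  # real data: one memory access
                reads += 1
            else:
                vec.append(null)  # null element: no memory access
        vectors.append(vec)
    return vectors, reads


# Hypothetical example: 8-element dimension, width 3, 4-lane vectors.
memory = list(range(1, 17))
vectors, reads = stream_vectors(memory, base=0, dim_size=8, width=3, vec_len=4)
```

    In this example the second vector lies entirely past the width, so it is formed without touching `memory` at all, which mirrors the last sentence of the abstract.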
  • Patent number: 12039334
    Abstract: A neural processor and method for assigning config ID for neural core included in the same are provided. The neural processor includes a core array comprising a first neural core, a second neural core, a first data line connecting the first neural core and the second neural core in series, and a config line connecting the first neural core and the second neural core in series, an ID config manager configured to assign a first config ID to the first neural core and a second config ID to the second neural core via the config line, and a memory configured to input and output data to and from the core array via the first data line.
    Type: Grant
    Filed: October 26, 2023
    Date of Patent: July 16, 2024
    Assignee: Rebellions Inc.
    Inventors: Wongyu Shin, Juyeong Yoon, Sangeun Je
  • Patent number: 12039001
    Abstract: Described herein is a graphics processor including a plurality of processing clusters coupled with a host interface, each processing cluster comprising a plurality of multiprocessors, the plurality of multiprocessors interconnected via a data interconnect, and each multiprocessor comprising sparse matrix multiply acceleration hardware including a systolic processing array with feedback inputs.
    Type: Grant
    Filed: April 17, 2023
    Date of Patent: July 16, 2024
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Ashutosh Garg, Shubra Marwaha, Chandra Gurram, Darin Starkey, Durgesh Borkar, Varghese George
  • Patent number: 12032490
    Abstract: A method for sorting of a vector in a processor is provided that includes performing, by the processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values are sorted in an order indicated by the vector sort instruction, and storing the sorted vector in a storage location.
    Type: Grant
    Filed: December 1, 2022
    Date of Patent: July 9, 2024
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy David Anderson, Mujibur Rahman
  • Patent number: 12032966
    Abstract: A processor with instruction storage configured to store processor instructions, data storage configured to store processor data representing an array, the array including plural data elements, a controller, and an instruction pipeline. The instruction pipeline includes: a load stage circuit configured to load an array element from the data storage, a compare stage circuit configured to compare the array element to a reference value, a store stage circuit configured to store a set of results that includes a result of the comparison of the array element to the reference value, and a loop hit detect stage circuit configured to determine whether any of the set of results is associated with a hit on the reference value.
    Type: Grant
    Filed: September 30, 2022
    Date of Patent: July 9, 2024
    Assignee: Texas Instruments Incorporated
    Inventors: Alan Davis, Venkatesh Natarajan, Alexander Tessarolo
  • Patent number: 12033033
    Abstract: A quantum processor comprises a plurality of tiles, the plurality of tiles arranged in a first grid, and where a first tile of the plurality of tiles comprises a number of qubits (e.g., superconducting qubits). The quantum processor further comprises a shift register comprising at least one shift register stage communicatively coupled to a frequency-multiplexed resonant (FMR) readout, a qubit readout device, a plurality of digital-to-analog converter (DAC) buffer stages, and a plurality of shift-register-loadable DACs arranged in a second grid. The quantum processor may further include a transmission line comprising at least one transmission line inductance, a superconducting resonator, and a coupling capacitance that communicatively couples the superconducting resonator to the transmission line. A digital processor may program at least one of the plurality of shift-register-loadable DACs. Programming the first tile may be performed in parallel with programming a second tile of the plurality of tiles.
    Type: Grant
    Filed: June 11, 2020
    Date of Patent: July 9, 2024
    Assignee: D-WAVE SYSTEMS INC.
    Inventor: Kelly T. R. Boothby
  • Patent number: 12026607
    Abstract: A neural network accelerator executes instructions to: load a first weight data element of an array of weight data elements from a memory into a systolic array; extract, from the instructions, information indicating a first number of input data elements to be obtained from a first address of the memory and a second number of input data elements to be skipped between adjacent input data elements to be obtained, the first address being based on first coordinates of the first weight data element, and the first and second numbers being based on a stride of a convolution operation; based on the information, obtain first input data elements from the first address of the memory; and control the systolic array to perform first computations based on the first weight data element and the first input data elements to generate first output data elements of an output data array.
    Type: Grant
    Filed: October 12, 2022
    Date of Patent: July 2, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeffrey T. Huynh, Ron Diamant
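    Illustration (not taken from the patent): the abstract above describes fetching, for each weight element, a run of input elements that starts at an address derived from the weight's coordinates and skips a stride-dependent number of elements between fetches. A one-dimensional sketch of that access pattern follows. The function name and the choice of a 1-D, no-padding convolution are assumptions for this sketch.

```python
def strided_conv1d_by_weight(x, w, stride):
    """1-D strided convolution (no padding), processed one weight at a time.

    For the weight at coordinate r, the inputs it needs are
    x[r], x[r + stride], x[r + 2*stride], ...
    i.e. `count` elements starting at address r, skipping `stride - 1`
    elements between adjacent fetched elements.
    """
    n_out = (len(x) - len(w)) // stride + 1
    out = [0] * n_out
    for r, weight in enumerate(w):
        start = r            # first address, from the weight's coordinate
        count = n_out        # number of input elements to obtain
        skip = stride - 1    # elements skipped between fetched elements
        inputs = x[start : start + count * (skip + 1) : skip + 1]
        for j, value in enumerate(inputs):
            out[j] += weight * value  # accumulate partial products
    return out


def direct_conv1d(x, w, stride):
    """Reference implementation used only to check the sketch above."""
    n_out = (len(x) - len(w)) // stride + 1
    return [sum(w[r] * x[j * stride + r] for r in range(len(w)))
            for j in range(n_out)]
```

    Processing weight by weight lets each weight stay loaded while all of its inputs stream past, which is the usual reason for this ordering on a systolic array.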
  • Patent number: 12020028
    Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described.
    Type: Grant
    Filed: December 26, 2020
    Date of Patent: June 25, 2024
    Assignee: Intel Corporation
    Inventors: Naveen Mellempudi, Alexander F. Heinecke, Robert Valentine, Mark J. Charney, Christopher J. Hughes, Evangelos Georganas, Zeev Sperber, Amit Gradstein, Simon Rubanovich
  • Patent number: 12020034
    Abstract: An instruction execution method for a microprocessor is provided. The microprocessor includes a model specific register (MSR). And, the instruction execution method includes the following steps. A target instruction is received using an instruction cache. The target instruction is decoded using an instruction translator to determine whether the target instruction is a specific instruction. When the target instruction is the specific instruction, a model specific register index of the target instruction is obtained to directly read or write the model specific register.
    Type: Grant
    Filed: November 4, 2022
    Date of Patent: June 25, 2024
    Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.
    Inventors: Weilin Wang, Yingbing Guan, Long Cheng, Lei Yi
  • Patent number: 12014181
    Abstract: An instruction configuration and execution method includes the following steps. A target instruction is received through an instruction cache. The target instruction is decoded by an instruction translator. It is determined whether the target instruction has the authority to read or write the model specific register in an unprivileged state. It is determined whether the model specific register index of the specific instruction corresponds to a specific model specific register, so as to order the microprocessor to perform an instruction serialization operation.
    Type: Grant
    Filed: November 4, 2022
    Date of Patent: June 18, 2024
    Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.
    Inventors: Weilin Wang, Yingbing Guan, Lei Yi, Long Cheng
  • Patent number: 12013811
    Abstract: A method of workload management for distributing workload over a single instruction and multiple data (SIMD) based parallel processing architecture includes pre-processing of very large ontology file on the host processor. The outcome of the preprocessing step is a set of arrays which are loaded on the SIMD based parallel processing architecture to process the ontology file over input text documents of any kind and generate the tagged outcome. The method provides the maximum granularity to facilitate allocation of the maximum number of software threads on the SIMD based parallel processing architecture to achieve minimum document processing latency.
    Type: Grant
    Filed: December 1, 2022
    Date of Patent: June 18, 2024
    Assignee: INNOPLEXUS AG
    Inventor: Adarsh Jain
  • Patent number: 12007937
    Abstract: In a system with control logic and a processing element array, two modes of operation may be provided. In the first mode of operation, the control logic may configure the system to perform matrix multiplication or 1×1 convolution. In the second mode of operation, the control logic may configure the system to perform 3×3 convolution. The processing element array may include an array of processing elements. Each of the processing elements may be configured to compute the dot product of two vectors in a single clock cycle, and further may accumulate the dot products that are sequentially computed over time.
    Type: Grant
    Filed: November 29, 2023
    Date of Patent: June 11, 2024
    Assignee: Recogni Inc.
    Inventors: Jian hui Huang, Gary S. Goldman
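    Illustration (not taken from the patent): the abstract above describes processing elements that each compute a dot product of two vectors per clock cycle and accumulate successive results. A software sketch of matrix multiplication on such an array follows. The class and function names, and the `chunk` parameter standing in for the per-cycle vector length, are assumptions for this sketch.

```python
class ProcessingElement:
    """Accumulates dot products that arrive one per (simulated) cycle."""

    def __init__(self):
        self.acc = 0

    def step(self, a, b):
        # One cycle: dot product of two short vectors, added to the accumulator.
        self.acc += sum(x * y for x, y in zip(a, b))
        return self.acc


def matmul_on_pe_array(A, B, chunk):
    """Multiply A (rows x inner) by B (inner x cols) with one PE per output.

    The inner dimension is consumed `chunk` elements per cycle, so each PE
    needs ceil(inner / chunk) cycles to finish its output element.
    """
    rows, inner, cols = len(A), len(B), len(B[0])
    pes = [[ProcessingElement() for _ in range(cols)] for _ in range(rows)]
    for k in range(0, inner, chunk):  # one simulated cycle per chunk
        stop = min(k + chunk, inner)
        for i in range(rows):
            for j in range(cols):
                column_slice = [B[t][j] for t in range(k, stop)]
                pes[i][j].step(A[i][k:stop], column_slice)
    return [[pe.acc for pe in row] for row in pes]
```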
  • Patent number: 12001311
    Abstract: An apparatus for computing functions using polynomial-based approximation, comprising one or more processing circuitries configured for computing a polynomial-based approximant approximating a function by executing one or more iterations. Each iteration comprising computing the polynomial-based approximant using scaled fixed-point unit(s) according to a constructed set of coefficients, minimizing an approximation error of the computed polynomial-based approximant compared to the function while complying with one or more constraints selected from a group comprising at least: an accuracy, a compute graph size, a computation complexity, and a hardware utilization of the processing circuitry(s), adjusting one or more of the coefficients in case the approximation error is incompliant with the constraint(s) and initiating another iteration.
    Type: Grant
    Filed: January 6, 2022
    Date of Patent: June 4, 2024
    Assignee: Next Silicon Ltd
    Inventor: Daniel Khankin
  • Patent number: 12001841
    Abstract: A content-addressable processing engine, also referred to herein as CAPE, is provided. Processing-in-memory (PIM) architectures attempt to overcome the von Neumann bottleneck by combining computation and storage logic into a single component. CAPE provides a general-purpose PIM microarchitecture that provides acceleration of vector operations while being programmable with standard reduced instruction set computing (RISC) instructions, such as RISC-V instructions with standard vector extensions. CAPE can be implemented as a standalone core that specializes in associative computing, and that can be integrated in a tiled multicore chip alongside other types of compute engines. Certain embodiments of CAPE achieve average speedups of 14× (up to 254×) over an area-equivalent out-of-order processor core tile with three levels of caches across a diverse set of representative applications.
    Type: Grant
    Filed: September 22, 2022
    Date of Patent: June 4, 2024
    Assignee: Cornell University
    Inventors: José F. Martínez, Helena Caminal, Kailin Yang, Khalid Al-Hawaj, Christopher Batten
  • Patent number: 11995441
    Abstract: Systems and methods for instruction decoding using hash tables. An example method of constructing a decoding tree comprises: generating an aggregated vector of differentiating bit scores representing at least a subset of a set of processor instructions; identifying, based on the aggregated vector of differentiating bit scores, one or more opcode bit positions; and constructing a hash table implementing a current level of a decoding tree representing the subset of the set of processor instructions, wherein the hash table is indexed by one or more opcode bits identified by the one or more opcode bit positions.
    Type: Grant
    Filed: November 11, 2022
    Date of Patent: May 28, 2024
    Assignee: Parallels International GmbH
    Inventors: Alexey Koryakin, Nikolay Dobrovolskiy
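    Illustration (not taken from the patent): the abstract above describes scoring opcode bit positions, picking the differentiating ones, and building one hash table per level of a decoding tree. A small sketch of that idea follows. The scoring rule used here (a bit scores the size of the smaller half of the split it produces, and zero if any instruction leaves it unspecified) is an assumption, as are the instruction encodings and all names; the listing does not state the patent's actual scoring.

```python
def bit_scores(insns, width):
    """Aggregated score per bit position for (name, mask, value) patterns."""
    scores = []
    for b in range(width):
        if all((mask >> b) & 1 for _, mask, _ in insns):
            ones = sum((value >> b) & 1 for _, _, value in insns)
            scores.append(min(ones, len(insns) - ones))  # 0 if bit is constant
        else:
            scores.append(0)  # bit is not fixed by every pattern
    return scores


def build_tree(insns, width):
    """Return a name (leaf) or (bit_positions, hash_table) for this level."""
    if len(insns) == 1:
        return insns[0][0]
    bits = [b for b, s in enumerate(bit_scores(insns, width)) if s > 0]
    if not bits:
        raise ValueError("instructions cannot be told apart")
    groups = {}
    for insn in insns:
        key = tuple((insn[2] >> b) & 1 for b in bits)
        groups.setdefault(key, []).append(insn)
    return (bits, {key: build_tree(g, width) for key, g in groups.items()})


def decode(tree, word):
    """Walk the tree; returns None when no table entry matches.

    A leaf is returned without re-checking its remaining fixed bits.
    """
    while isinstance(tree, tuple):
        bits, table = tree
        tree = table.get(tuple((word >> b) & 1 for b in bits))
    return tree


# Hypothetical 8-bit encodings: (name, mask of fixed bits, required value).
INSNS = [
    ("ADD", 0b11110000, 0b00010000),
    ("SUB", 0b11110000, 0b00100000),
    ("LDI", 0b11000000, 0b01000000),
]
TREE = build_tree(INSNS, 8)
```

    With these encodings the first level is indexed by bit 6 alone, and a second-level table indexed by bits 4 and 5 separates ADD from SUB.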
  • Patent number: 11989580
    Abstract: Embodiments described herein provide a system, method, and apparatus to accelerate reduce operations in a graphics processor. One embodiment provides an apparatus including one or more processors, the one or more processors including a first logic unit to perform a merged write, barrier, and read operation in response to a barrier synchronization request from a set of threads in a work group, synchronize the set of threads, and report a result of an operation specified in association with the barrier synchronization request.
    Type: Grant
    Filed: March 10, 2021
    Date of Patent: May 21, 2024
    Assignee: Intel Corporation
    Inventors: Yong Jiang, Yuanyuan Li, Jianghong Du, Kuilin Chen, Thomas A. Tetzlaff
  • Patent number: 11989155
    Abstract: A computing device includes an array of processing elements mutually connected to perform single instruction multiple data (SIMD) operations, memory cells connected to each processing element to store data related to the SIMD operations, and a cache connected to each processing element to cache data related to the SIMD operations. Caches of adjacent processing elements are connected. The same or another computing device includes rows of mutually connected processing elements to share data. The computing device further includes a row arithmetic logic unit (ALU) at each row of processing elements. The row ALU of a respective row is configured to perform an operation with processing elements of the respective row.
    Type: Grant
    Filed: September 12, 2022
    Date of Patent: May 21, 2024
    Assignee: UNTETHER AI CORPORATION
    Inventors: William Martin Snelgrove, Jonathan Scobbie
  • Patent number: 11983536
    Abstract: Systems and methods herein address power for one or more processing units, using one of a plurality of power profiles during execution of a group of real-time instructions, the one of the plurality of power profiles determined based in part on a relationship determined between the one of the plurality of power profiles and a power profile of the group of real-time instructions, the relationship limited by a threshold, and the plurality of power profiles are associated with a plurality of groups of reference instructions.
    Type: Grant
    Filed: October 26, 2022
    Date of Patent: May 14, 2024
    Assignee: Nvidia Corporation
    Inventors: Michael Houston, Ryan Kelsey Albright, Benjamin Goska, Siddha Ganju, Elad Mentovich
  • Patent number: 11977885
    Abstract: An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.
    Type: Grant
    Filed: November 30, 2020
    Date of Patent: May 7, 2024
    Assignee: INTEL CORPORATION
    Inventors: Subramaniam Maiyuran, Jorge Parra, Ashutosh Garg, Chandra Gurram, Chunhui Mei, Durgesh Borkar, Shubra Marwaha, Supratim Pal, Varghese George, Wei Xiong, Yan Li, Yongsheng Liu, Dipankar Das, Sasikanth Avancha, Dharma Teja Vooturi, Naveen K. Mellempudi
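    Illustration (not taken from the patent): the abstract above describes multiplying packed, structured-sparse data with the portions of an unpacked operand that metadata selects. A sketch using a 2-of-4 pattern follows. The 2-of-4 group shape, the metadata layout (one in-group index per kept value), and all names are assumptions for this sketch.

```python
GROUP = 4  # elements per group in the unpacked layout (assumed)
KEEP = 2   # values kept per group after packing (assumed)


def pack(sparse):
    """Pack a vector with at most KEEP nonzeros per GROUP elements.

    Returns (values, meta): the kept values and, for each one, its index
    within its group. Unused slots are padded with value 0 at index 0.
    """
    values, meta = [], []
    for g in range(0, len(sparse), GROUP):
        nonzero = [(i, v) for i, v in enumerate(sparse[g:g + GROUP]) if v != 0]
        if len(nonzero) > KEEP:
            raise ValueError("group violates the structured sparsity pattern")
        nonzero += [(0, 0)] * (KEEP - len(nonzero))
        for i, v in nonzero:
            meta.append(i)
            values.append(v)
    return values, meta


def sparse_dot(values, meta, dense):
    """Dot product that touches only the dense elements named by `meta`."""
    total = 0
    for slot, (v, i) in enumerate(zip(values, meta)):
        group = slot // KEEP
        total += v * dense[group * GROUP + i]  # metadata picks the operand
    return total
```

    Only `KEEP` multiplications are performed per group instead of `GROUP`, which is where the savings of structured sparsity come from in this simplified model.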
  • Patent number: 11977886
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information.
    Type: Grant
    Filed: March 28, 2022
    Date of Patent: May 7, 2024
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Menachem Adelman, Elmoustapha Ould-Ahmed-Vall, Bret L. Toll, Milind B. Girkar, Zeev Sperber, Mark J. Charney, Rinat Rappoport, Jesus Corbal, Stanislav Shwartsman, Igor Yanover, Alexander F. Heinecke, Barukh Ziv, Dan Baum, Yuri Gebil, Raanan Sade