Patents Examined by Cheng-Yuan Tseng
  • Patent number: 12367046
    Abstract: A method of activating scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task in an active state and checking, by an instruction controller, if a swap flag is set in the decoded instruction. If the swap flag in the decoded instruction is set, a scheduler is triggered to de-activate the scheduled task by changing the scheduled task from the active state to a non-active state.
    Type: Grant
    Filed: December 5, 2022
    Date of Patent: July 22, 2025
    Assignee: Imagination Technologies Limited
    Inventors: Simon Nield, Yoong-Chert Foo, Adam de Grasse, Luca Iuliano
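The swap-flag mechanism in the abstract above can be modeled in a few lines. This is a hypothetical sketch, not the patent's actual hardware interface: the class and field names (`DecodedInstruction`, `swap_flag`, `ScheduledTask`) are illustrative assumptions.

```python
# Illustrative model: when a decoded instruction carries a set swap flag,
# the scheduler moves the scheduled task from ACTIVE to a non-active state.
from dataclasses import dataclass

ACTIVE, NOT_ACTIVE = "active", "not_active"

@dataclass
class DecodedInstruction:
    opcode: str
    swap_flag: bool = False   # checked by the instruction controller

@dataclass
class ScheduledTask:
    task_id: int
    state: str = ACTIVE

def check_and_swap(task: ScheduledTask, instr: DecodedInstruction) -> ScheduledTask:
    """Instruction-controller check: trigger de-activation if the swap flag is set."""
    if instr.swap_flag and task.state == ACTIVE:
        task.state = NOT_ACTIVE   # scheduler de-activates the task
    return task

task = ScheduledTask(task_id=7)
check_and_swap(task, DecodedInstruction("ADD"))           # flag clear: stays active
check_and_swap(task, DecodedInstruction("YIELD", True))   # flag set: de-activated
```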
  • Patent number: 12367047
    Abstract: Systems and methods are disclosed for debug path profiling. For example, a processor pipeline may execute instructions. A debug trace circuitry may, responsive to an indication of a non-sequential execution of an instruction by the processor pipeline, generate a record including an address pair and one or more counter values. The address pair may include a first address corresponding to a first instruction before the non-sequential execution and a second address corresponding to a second instruction resulting in the non-sequential execution. The one or more counter values may indicate, for example, a count of instructions executed, a type of instruction executed, cache misses, cycles consumed by cache misses, translation lookaside buffer misses, cycles consumed by translation lookaside buffer misses, and/or processor stalls.
    Type: Grant
    Filed: November 6, 2023
    Date of Patent: July 22, 2025
    Assignee: SiFive, Inc.
    Inventor: Bruce Ableidinger
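The trace record described above pairs two addresses with counter snapshots. A minimal sketch, assuming illustrative field names (`addr_before`, `addr_target`, counter keys) that are not taken from the patent:

```python
# Illustrative model of a debug-trace record: on a non-sequential control
# transfer, capture the (address before, branch-target address) pair plus
# one or more counter values.
from dataclasses import dataclass, field

@dataclass
class TraceRecord:
    addr_before: int   # first instruction before the non-sequential execution
    addr_target: int   # instruction resulting in the non-sequential execution
    counters: dict = field(default_factory=dict)

def emit_record(pc_before, pc_target, instret, cache_misses, tlb_misses):
    """Build one record as the debug trace circuitry might on a taken branch."""
    return TraceRecord(pc_before, pc_target,
                       {"instret": instret,
                        "cache_misses": cache_misses,
                        "tlb_misses": tlb_misses})

rec = emit_record(0x1000, 0x2400, instret=37, cache_misses=2, tlb_misses=0)
```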
  • Patent number: 12360767
    Abstract: A data processing apparatus comprises processing circuitry to execute processing instructions, the processing circuitry comprising: a set of physical registers; instruction decoder circuitry to decode processing instructions; detector circuitry to detect groups of instructions which comply with a conflict condition, in which a group of instructions complies with the conflict condition at least when a given storage element is written to by a maximum of one instruction of that group of instructions; instruction issue circuitry to issue decoded instructions for execution; and instruction execution circuitry to execute instructions decoded by the instruction decoder circuitry.
    Type: Grant
    Filed: March 3, 2023
    Date of Patent: July 15, 2025
    Assignee: Arm Limited
    Inventors: Michael Jean Sole, Cedric Denis Robert Airaud
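The conflict condition above (each storage element written by at most one instruction in the group) is easy to express as a predicate. A sketch under the assumption that each instruction is modeled as an (opcode, destination, sources) tuple; the representation is illustrative only:

```python
# Illustrative detector: a group complies with the conflict condition when
# no destination register is written by more than one instruction.
from collections import Counter

def complies_with_conflict_condition(group):
    """group: list of (opcode, dest_reg, src_regs) tuples."""
    writes = Counter(dest for _, dest, _ in group)
    return all(count <= 1 for count in writes.values())

ok_group  = [("add", "r1", ("r2", "r3")), ("mul", "r4", ("r1", "r5"))]
bad_group = [("add", "r1", ("r2", "r3")), ("sub", "r1", ("r4", "r5"))]  # r1 written twice
```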
  • Patent number: 12360805
    Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may include a sequencer and a plurality of columns of vector processing units coupled to the sequencer. The sequencer may include a scalar instruction decoder, a vector instruction decoder, and a plurality of scalar processors configured to concurrently execute one or more scalar instructions decoded by the scalar instruction decoder to generate one or more vectors of parameters. The vector instruction decoder may be configured to decode one or more vector instructions to generate a set of configurations with the one or more vectors of parameters embedded as one or more vectors of immediate values and send the configurations to a target column. Each column of vector processing units may be configured to repeatedly execute the vector operations in the configurations using one or more respective elements of one or more vectors of immediate values per repetition.
    Type: Grant
    Filed: July 10, 2023
    Date of Patent: July 15, 2025
    Assignee: AzurEngine Technologies Zhuhai Inc.
    Inventors: Toshio Nagata, Yuan Li, Jianbin Zhu
  • Patent number: 12346797
    Abstract: This specification describes methods and systems for accessing attribute data in graph neural network (GNN) processing. An example system includes: a plurality of cores, each of the plurality of cores comprises a key-value fetcher and a filter, and is programmable using a software interface to support a plurality of data formats of the GNN attribute data, wherein: the key-value fetcher is programmable using the software interface to perform key-value fetching associated with accessing the GNN attribute data, and the filter of at least one of the plurality of cores is programmable using the software interface to sample node identifiers associated with accessing the GNN attribute data; and a first memory communicatively coupled with the plurality of cores, wherein the first memory is configured to store data shared by the plurality of cores.
    Type: Grant
    Filed: January 12, 2022
    Date of Patent: July 1, 2025
    Assignee: Alibaba Damo (Hangzhou) Technology Co., Ltd.
    Inventors: Heng Liu, Shuangchen Li, Tianchan Guan, Hongzhong Zheng
  • Patent number: 12333418
    Abstract: A neural network device including an on-chip buffer memory that stores an input feature map of a first layer of a neural network, a computational circuit that receives the input feature map of the first layer through a single port of the on-chip buffer memory and performs a neural network operation on the input feature map of the first layer to output an output feature map of the first layer corresponding to the input feature map of the first layer, and a controller that transmits the output feature map of the first layer to the on-chip buffer memory through the single port to store the output feature map of the first layer and the input feature map of the first layer together in the on-chip buffer memory.
    Type: Grant
    Filed: October 18, 2023
    Date of Patent: June 17, 2025
    Assignees: Samsung Electronics Co., Ltd., UNIST (ULSAN NATIONAL INSTITUTE OF SCIENCE AND TECHNOLOGY)
    Inventors: Hyeongseok Yu, Hyeonuk Sim, Jongeun Lee
  • Patent number: 12321744
    Abstract: A computer-implemented method for hardware gather optimization can include identifying, by at least one processor, one or more gather instructions that retrieve data from contiguous memory locations. The method can additionally include converting, by the at least one processor, the one or more gather instructions into one or more strided load instructions in response to the identification. The method can also include loading, by the at least one processor, data retrieved using the one or more strided load instructions into one or more vector registers. Various other methods, systems, and computer-readable media are also disclosed.
    Type: Grant
    Filed: June 27, 2023
    Date of Patent: June 3, 2025
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Ashish Jha
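The gather-to-strided-load conversion above can be sketched in plain Python: if the gather's indices address contiguous (unit-stride) locations, a single contiguous load replaces the element-by-element gather. Function names here are illustrative, not AMD's implementation:

```python
# Illustrative model of the hardware gather optimization: detect gathers
# over contiguous memory locations and convert them to a contiguous load.
def is_contiguous(indices):
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))

def gather(memory, base, indices):
    """Element-by-element gather from arbitrary offsets."""
    return [memory[base + i] for i in indices]

def load_vector(memory, base, indices):
    """Use one contiguous load when the indices are unit-stride; else gather."""
    if is_contiguous(indices):
        start = base + indices[0]
        return memory[start:start + len(indices)]
    return gather(memory, base, indices)

mem = list(range(100))
```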
  • Patent number: 12321298
    Abstract: A physical layer module and a network module are provided. The network module includes the physical layer module and a media access control module. The physical layer module includes a group decoder, an input selection module, and a device module. The group decoder decodes a common input data signal generated according to a management data input/output signal to generate a group selection signal. The input selection module includes X input circuits being classified into M groups. The X input circuits generate X device input data according to the common input data signal and the group selection signal. The device module includes K physical layer devices classified into M groups. The K physical layer devices receive the X device input data from the X input circuits. An m-th group corresponds to at least one input circuit and N[m] physical layer devices.
    Type: Grant
    Filed: February 5, 2024
    Date of Patent: June 3, 2025
    Assignee: FARADAY TECHNOLOGY CORPORATION
    Inventor: Chun-Yuan Lai
  • Patent number: 12321747
    Abstract: A method is described herein. The method generally includes fetching a set of data from a memory coupled to a memory controller. The method generally includes determining a first subset of data from the set of data. The method generally includes determining a second subset of data from the set of data. The method generally includes determining a first element from the set of data. The method generally includes providing a vector including the first subset, the first element, and the second subset, wherein each element of the first subset is disposed in one portion of the vector and each element of the second subset is disposed in another portion of the vector. The method generally includes storing the vector into a register of the memory controller.
    Type: Grant
    Filed: February 6, 2023
    Date of Patent: June 3, 2025
    Assignee: Texas Instruments Incorporated
    Inventors: Asheesh Bhardwaj, Burton Adrik Copeland, Tim Anderson
  • Patent number: 12314715
    Abstract: Apparatus and methods for tracking sub-micro-operations and groups thereof are described. An integrated circuit includes a load store unit configured to receive store micro-operations cracked from a vector store instruction. The load store unit is configured to unroll multiple store sub-micro-operations from each of the store micro-operations. The load store unit includes an issue status vector to track issuance of each sub-micro-operation, an unroll status vector to track unrolling of each sub-micro-operation associated with a group of sub-micro-operations, and a replay status vector to track a replayability of sub-micro-operations associated with the group of sub-micro-operations.
    Type: Grant
    Filed: June 15, 2023
    Date of Patent: May 27, 2025
    Assignee: SiFive, Inc.
    Inventors: Yueh Chi Wu, Yohann Rabefarihy
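The three status vectors above map naturally onto per-group bit vectors. A minimal sketch; the class, method names, and replay policy are assumptions for illustration:

```python
# Illustrative bit-vector tracker for unrolled store sub-micro-operations:
# issue_status tracks issuance, unroll_status tracks unrolling, and
# replay_status tracks replayability within one group.
class SubUopTracker:
    def __init__(self, num_sub_uops):
        self.n = num_sub_uops
        self.issue_status = 0    # bit i set -> sub-uop i issued
        self.unroll_status = 0   # bit i set -> sub-uop i unrolled
        self.replay_status = 0   # bit i set -> sub-uop i replayable

    def unroll(self, i):
        self.unroll_status |= 1 << i

    def issue(self, i, replayable=False):
        self.issue_status |= 1 << i
        if replayable:
            self.replay_status |= 1 << i

    def all_issued(self):
        return self.issue_status == (1 << self.n) - 1
```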
  • Patent number: 12314201
    Abstract: Disclosed herein is a method for distributed training of an AI model in a channel-sharing network environment. The method includes determining whether data parallel processing is applied, calculating a computation time and a communication time when input data is evenly distributed across multiple computation devices, and unevenly distributing the input data across the multiple computation devices based on the computation time and the communication time.
    Type: Grant
    Filed: June 30, 2023
    Date of Patent: May 27, 2025
    Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Ki-Dong Kang, Hong-Yeon Kim, Baik-Song An, Myung-Hoon Cha
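One simple policy consistent with the abstract above is to give each device a share inversely proportional to its per-sample time (computation plus communication). This is an illustrative assumption, not the patent's exact formula:

```python
# Illustrative uneven distribution: faster devices (lower per-sample time)
# receive proportionally more of the input data.
def uneven_split(total_samples, per_sample_times):
    """per_sample_times: seconds per sample (compute + comm) for each device."""
    speeds = [1.0 / t for t in per_sample_times]
    total_speed = sum(speeds)
    shares = [int(total_samples * s / total_speed) for s in speeds]
    shares[0] += total_samples - sum(shares)   # absorb rounding remainder
    return shares
```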
  • Patent number: 12293186
    Abstract: Embodiments detailed herein relate to systems and methods to store a tile register pair to memory. In one example, a processor includes: decode circuitry to decode a store matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded store matrix pair instruction to store every element of left and right tiles of the identified source matrix to corresponding element positions of left and right tiles of the identified destination matrix, respectively, wherein the executing stores a chunk of C elements of one row of the identified source matrix at a time.
    Type: Grant
    Filed: November 2, 2023
    Date of Patent: May 6, 2025
    Assignee: Intel Corporation
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
  • Patent number: 12292842
    Abstract: Examples described herein relate to network layer 7 (L7) offload to an infrastructure processing unit (IPU) for a service mesh. An apparatus described herein includes an IPU comprising an IPU memory to store a routing table for a service mesh, the routing table to map shared memory address spaces of the IPU and a host device executing one or more microservices, wherein the service mesh provides an infrastructure layer for the one or more microservices executing on the host device; and one or more IPU cores communicably coupled to the IPU memory, the one or more IPU cores to: host a network L7 proxy endpoint for the service mesh, and communicate messages between the network L7 proxy endpoint and an L7 interface device of the one or more microservices by copying data between the shared memory address spaces of the IPU and the host device based on the routing table.
    Type: Grant
    Filed: September 27, 2021
    Date of Patent: May 6, 2025
    Assignee: Intel Corporation
    Inventors: Mrittika Ganguli, Anjali Jain, Reshma Lal, Edwin Verplanke, Priya Autee, Chih-Jen Chang, Abhirupa Layek, Nupur Jain
  • Patent number: 12282525
    Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address, and execution circuitry to execute the decoded instruction to store configuration information about usage of storage for two-dimensional data structures at the memory address.
    Type: Grant
    Filed: November 3, 2023
    Date of Patent: April 22, 2025
    Assignee: Intel Corporation
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
  • Patent number: 12284122
    Abstract: A circuit and corresponding method perform resource arbitration. The circuit comprises a pending arbiter (PA) that outputs a PA selection for accessing a resource. The PA selection is based on PA input. The PA input represents respective pending-state of requesters of the resource. The circuit further comprises a valid arbiter (VA) that outputs a VA selection for accessing the resource. The VA selection is based on VA input. The VA input represents respective valid-state of the requesters. The circuit performs a validity check on the PA selection output. The circuit outputs a final selection for accessing the resource by selecting, based on the validity check performed, the PA selection output or VA selection output. The circuit addresses arbitration fairness issues that may result when multiple requesters are arbitrating to be selected for access to a shared resource and such requesters require a credit (token) to be eligible for arbitration.
    Type: Grant
    Filed: February 6, 2024
    Date of Patent: April 22, 2025
    Assignee: Marvell Asia Pte Ltd
    Inventors: Joseph Featherston, Aadeetya Shreedhar
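The two-arbiter scheme above can be modeled with a small fallback rule: use the pending arbiter's pick when it passes the validity (credit) check, else fall back to the valid arbiter's pick. Round-robin selection is an assumed policy for illustration:

```python
# Illustrative resource arbitration: a pending arbiter (PA) and a valid
# arbiter (VA) each pick a requester; the PA pick wins only if it passes
# the validity check.
def round_robin_pick(mask, last):
    """Pick the next set bit in mask after index `last`, wrapping around."""
    n = len(mask)
    for i in range(1, n + 1):
        idx = (last + i) % n
        if mask[idx]:
            return idx
    return None

def arbitrate(pending, valid, last=0):
    pa_sel = round_robin_pick(pending, last)   # pending arbiter selection
    va_sel = round_robin_pick(valid, last)     # valid arbiter selection
    if pa_sel is not None and valid[pa_sel]:   # validity check on PA output
        return pa_sel
    return va_sel                              # final selection falls back to VA
```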
  • Patent number: 12277074
    Abstract: Techniques are disclosed pertaining to utilizing a communication fabric via multiple ports. An agent circuit includes a plurality of command-and-data ports that couple the agent circuit to a communication fabric coupled to a plurality of hardware components that includes a plurality of memory controller circuits that facilitate access to a memory. The agent circuit can execute an instruction that involves issuing a command for data stored at the memory. The agent circuit may perform a hash operation on a memory address associated with the command to determine which one of the plurality of memory controller circuits to which to issue the command. The agent circuit issues the command to the determined memory controller circuit on a particular one of the plurality of command-and-data ports that is designated to the memory controller circuit. The agent circuit may issue all commands destined to that memory controller circuit on that port.
    Type: Grant
    Filed: September 25, 2023
    Date of Patent: April 15, 2025
    Assignee: Apple Inc.
    Inventors: Sergio Kolor, Sandeep Gupta, James Vash
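The hash-based routing above amounts to mapping a memory address onto one of N memory controllers, then issuing on that controller's designated port. The XOR-fold hash below is an illustrative choice, not the actual function:

```python
# Illustrative address-hash routing: fold high address bits into low bits,
# then reduce modulo the number of memory controllers; each controller has
# one designated command-and-data port.
def controller_for_address(addr: int, num_controllers: int) -> int:
    folded = addr ^ (addr >> 12) ^ (addr >> 24)   # fold higher bits in
    return folded % num_controllers

def port_for_command(addr, num_controllers, port_map):
    """port_map: controller index -> designated command-and-data port."""
    return port_map[controller_for_address(addr, num_controllers)]

ports = {0: "p0", 1: "p1", 2: "p2", 3: "p3"}
```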
  • Patent number: 12265488
    Abstract: An apparatus includes a first die connected to a second die through a die-to-die (D2D) interface. The first die includes a first interconnect configured to provide first lanes communicating with the second die to the D2D interface, the first interconnect includes a first logic circuit configured to indicate a correlation between a number of chiplet dies connected to the first lanes and connected signal pins from among a plurality of signal pins of the connected chiplet dies. The second die includes the number of connected chiplet dies each including a second interconnect configured to provide second lanes to the D2D interface from each of the connected chiplet dies. The second lanes are configured to be set according to a number of the connected signal pins of the connected chiplet dies.
    Type: Grant
    Filed: October 31, 2023
    Date of Patent: April 1, 2025
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Wangyong Im, Byoungkon Jo, Gyesik Oh, Duksung Kim, Jangseok Choi
  • Patent number: 12260222
    Abstract: This application discloses an exception handling method, which may be applied to a processor. The method includes: The processor calls a second function according to a call instruction of a first function, where the first function is a high-level language function, and the second function is a runtime function. When an exception occurs in a process of executing the second function, the processor executes a return operation of the second function, where the return operation of the second function includes restoring a status of a first register used when the second function is executed to a status before the first function calls the second function. The processor performs exception handling based on the status of the first register. The method can improve running performance of the processor.
    Type: Grant
    Filed: August 24, 2023
    Date of Patent: March 25, 2025
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventor: Ning Chu
  • Patent number: 12260906
    Abstract: A hardware/software co-compressed computing method for a static random access memory (SRAM) computing-in-memory-based (CIM-based) processing unit includes performing a data dividing step, a sparsity step, an address assigning step and a hardware decoding and calculating step. The data dividing step is performed to divide a plurality of kernels into a plurality of weight groups. The sparsity step includes performing a weight setting step. The weight setting step is performed to set each of the weight groups to one of a zero weight group and a non-zero weight group. The address assigning step is performed to assign a plurality of index codes to a plurality of the non-zero weight groups, respectively. The hardware decoding and calculating step is performed to execute an inner product on the non-zero weight groups and their corresponding input feature data group to generate the output feature data group.
    Type: Grant
    Filed: September 3, 2021
    Date of Patent: March 25, 2025
    Assignee: NATIONAL TSING HUA UNIVERSITY
    Inventors: Kea-Tiong Tang, Syuan-Hao Sie, Jye-Luen Lee
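The sparsity scheme above can be sketched as two steps: assign index codes only to non-zero weight groups, then run the inner product over just those groups. This is a hypothetical software model; the data layout and function names are assumptions:

```python
# Illustrative co-compression: all-zero weight groups are skipped, non-zero
# groups receive consecutive index codes, and the inner product runs only
# over the indexed (non-zero) groups.
def partition_and_index(weight_groups):
    """Assign index codes to non-zero weight groups only."""
    indexed = {}
    for group in weight_groups:
        if any(w != 0 for w in group):
            indexed[len(indexed)] = group   # next free index code
    return indexed

def sparse_inner_product(indexed_groups, inputs_by_code):
    """Inner product of each indexed weight group with its input feature data."""
    total = 0
    for code, group in indexed_groups.items():
        x = inputs_by_code[code]
        total += sum(w * v for w, v in zip(group, x))
    return total

groups = [[0, 0], [1, 2], [0, 3]]   # first group is all-zero, gets no code
```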
  • Patent number: 12248814
    Abstract: Provided is an apparatus for accelerating graph neural network (GNN) pre-processing, the apparatus including a set-partitioning accelerator configured to sort each edge of an original graph stored in a coordinate list (COO) format by a node number, perform radix sorting based on a vertex identification (VID) to generate a COO array of a preset length, and perform uniform random sampling on some nodes of a given node array, a merger configured to merge the COO array of the preset length to generate one sorted COO array, a re-indexer configured to assign new consecutive VIDs respectively to the nodes selected through the uniform random sampling, and a compressed sparse row (CSR) converter configured to convert the edges sorted by the node number into a CSR format.
    Type: Grant
    Filed: August 22, 2023
    Date of Patent: March 11, 2025
    Assignee: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY
    Inventors: Myoungsoo Jung, Seungkwan Kang, Donghyun Gouk, Miryeong Kwon, Hyunkyu Choi, Junhyeok Jang
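The sort-then-convert pipeline above ends in a standard COO-to-CSR conversion. A plain-Python sketch standing in for the hardware sorter and converter (the sort here plays the role of the radix sort by VID):

```python
# Illustrative COO -> CSR conversion: sort edges by source vertex ID, count
# edges per row, prefix-sum the counts into row pointers.
def coo_to_csr(edges, num_nodes):
    """edges: list of (src, dst) pairs in COO format."""
    edges = sorted(edges)                # hardware would radix-sort by VID
    row_ptr = [0] * (num_nodes + 1)
    col_idx = []
    for src, dst in edges:
        row_ptr[src + 1] += 1            # count edges per source node
        col_idx.append(dst)
    for i in range(num_nodes):           # prefix sum -> row pointers
        row_ptr[i + 1] += row_ptr[i]
    return row_ptr, col_idx
```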