Coprocessor Patents (Class 712/34)
  • Patent number: 11947804
    Abstract: A system includes hardware circuitry having a device coupled with one or more external memory devices. The device is to detect an input/output (I/O) request associated with an external memory device of the one or more external memory devices. The device is to record a first timestamp in response to detecting the I/O request transmitted to the external memory device. The device is further to detect an indication from the external memory device of a completion of the I/O request associated with the external memory device and record a second timestamp in response to detecting the indication. The device is also to determine a latency associated with the I/O request based on the first timestamp and the second timestamp.
    Type: Grant
    Filed: April 6, 2022
    Date of Patent: April 2, 2024
    Assignee: NVIDIA Corporation
    Inventors: Shridhar Rasal, Oren Duer, Aviv Kfir, Liron Mula
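The two-timestamp scheme above reduces to straightforward bookkeeping. A minimal software sketch of the idea (the `LatencyMonitor` name and structure are illustrative, not from the patent):

```python
import time

class LatencyMonitor:
    """Records a timestamp when an I/O request is issued and another when
    its completion indication is observed, then derives the latency."""

    def __init__(self):
        self.start = {}  # request id -> first timestamp

    def on_request(self, req_id):
        # First timestamp: request transmitted to the external memory device.
        self.start[req_id] = time.monotonic_ns()

    def on_completion(self, req_id):
        # Second timestamp: completion indication detected.
        end = time.monotonic_ns()
        return end - self.start.pop(req_id)  # latency in nanoseconds

monitor = LatencyMonitor()
monitor.on_request(42)
# ... the device services the request ...
print(f"latency: {monitor.on_completion(42)} ns")
```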
  • Patent number: 11934295
    Abstract: The present disclosure provides for synchronization of multi-core systems by monitoring a plurality of debug trace data streams for a redundantly operating system including a corresponding plurality of cores performing a task in parallel; in response to detecting a state difference on one debug trace data stream of the plurality of debug trace data streams relative to other debug trace data streams of the plurality of debug trace data streams: marking a given core associated with the one debug trace data stream as an affected core; and restarting the affected core.
    Type: Grant
    Filed: November 9, 2021
    Date of Patent: March 19, 2024
    Assignee: THE BOEING COMPANY
    Inventors: David P. Haldeman, Eric J. Miller
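The divergence-detection step can be pictured as a vote over per-core trace states. A hedged sketch, assuming a simple majority policy (the abstract only specifies detecting a state difference relative to the other streams):

```python
from collections import Counter

def check_lockstep(trace_states):
    """Given the latest state word from each core's debug trace stream,
    return the index of a core whose state differs from the majority,
    or None if all cores agree. (Illustrative majority-vote policy.)"""
    majority_state, count = Counter(trace_states).most_common(1)[0]
    if count == len(trace_states):
        return None
    # Mark the first core that disagrees with the majority as affected.
    return next(i for i, s in enumerate(trace_states) if s != majority_state)

affected = check_lockstep([0xA5, 0xA5, 0x5A])  # core 2 diverged
if affected is not None:
    print(f"restarting affected core {affected}")
```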
  • Patent number: 11928471
    Abstract: Embodiments for a metadata predictor are disclosed. An index pipeline generates indices in an index buffer, where the indices are used for reading out a memory device. A prediction cache is populated with metadata of instructions read from the memory device. A prediction pipeline generates a prediction using the metadata of the instructions from the prediction cache, the populating of the prediction cache with the metadata being performed asynchronously to the operation of the prediction pipeline.
    Type: Grant
    Filed: August 19, 2021
    Date of Patent: March 12, 2024
    Assignee: International Business Machines Corporation
    Inventors: Edward Thomas Malley, Adam Benjamin Collura, Brian Robert Prasky, James Bonanno, Dominic Ditomaso
  • Patent number: 11915015
    Abstract: Systems and methods provide isolated workspaces operating on an IHS (Information Handling System) with use of pre-boot resources of the IHS that are not directly accessible by the workspaces. Upon notification of a workspace initialization, a segregated variable space, such as a segregated memory utilized by a UEFI (Unified Extensible Firmware Interface) of the IHS, is specified for use by the workspace. The segregated variable space is initialized and populated with pre-boot variables, such as UEFI variables, that are allowed for configuration by the workspace. Upon a workspace issuing a request to configure a pre-boot variable, the segregated variable space is identified that was mapped for use by the workspace. The requested pre-boot variable configuration is allowed based on whether the pre-boot variable is populated in the segregated variable space. When the requested pre-boot variable configuration is allowed, the pre-boot variable is configured on behalf of the workspace.
    Type: Grant
    Filed: August 27, 2021
    Date of Patent: February 27, 2024
    Assignee: Dell Products, L.P.
    Inventors: Balasingh P. Samuel, Vivek Viswanathan Iyer
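The gatekeeping logic amounts to checking a per-workspace variable map before honoring a configuration request. A rough sketch under that reading (class and variable names are hypothetical, not Dell's implementation):

```python
class PrebootVariableBroker:
    """Maps each workspace to a segregated variable space holding only the
    pre-boot (e.g. UEFI) variables that workspace may configure."""

    def __init__(self):
        self.spaces = {}

    def init_workspace(self, workspace_id, allowed_vars):
        # Populate the segregated space with the permitted variables.
        self.spaces[workspace_id] = dict(allowed_vars)

    def configure(self, workspace_id, name, value):
        space = self.spaces[workspace_id]
        if name not in space:
            raise PermissionError(f"{name!r} not exposed to {workspace_id}")
        space[name] = value  # configured on behalf of the workspace

broker = PrebootVariableBroker()
broker.init_workspace("ws-1", {"BootOrder": "0001,0002"})
broker.configure("ws-1", "BootOrder", "0002,0001")   # allowed
# broker.configure("ws-1", "SecureBoot", "0")        # would raise
```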
  • Patent number: 11899613
    Abstract: A packaging technology to improve performance of an AI processing system, resulting in an ultra-high-bandwidth system. An IC package is provided which comprises: a substrate; a first die on the substrate; and a second die stacked over the first die. The first die can be a first logic die (e.g., a compute chip, CPU, GPU, etc.) while the second die can be a compute chiplet comprising ferroelectric or paraelectric logic. Both dies can include ferroelectric or paraelectric logic. The ferroelectric/paraelectric logic may include AND gates, OR gates, complex gates, majority, minority, and/or threshold gates, sequential logic, etc. The IC package can be in a 3D or 2.5D configuration that implements logic-on-logic stacking. The 3D or 2.5D packaging configurations have chips or chiplets designed for time-distributed or spatially distributed processing. The logic of the chips or chiplets is segregated so that only one chip in a 3D or 2.5D stacking arrangement is hot at a time.
    Type: Grant
    Filed: August 20, 2021
    Date of Patent: February 13, 2024
    Assignee: KEPLER COMPUTING INC.
    Inventors: Amrita Mathuriya, Christopher B. Wilkerson, Rajeev Kumar Dokania, Debo Olaosebikan, Sasikanth Manipatruni
  • Patent number: 11868780
    Abstract: An electronic device that includes a central processor and a coprocessor coupled to the central processor. The central processor includes a plurality of registers and is configured to decode a first set of instructions. The first set of instructions includes a command instruction and an identity of a destination register. The coprocessor is configured to receive the command instruction from the central processor, execute the command instruction, and write a result of the command instruction in the destination register. The central processor is further configured to set a register tag for the destination register at the time the central processor decodes the first set of instructions and to clear the register tag at the time the result is written in the destination register.
    Type: Grant
    Filed: August 26, 2021
    Date of Patent: January 9, 2024
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Christian Wiencke, Armin Stingl, Jeroen Vliegen
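The register-tag protocol is essentially a per-register scoreboard: the tag is set at decode and cleared at writeback, and readers wait on it. An illustrative software model (a sketch, not Texas Instruments' implementation):

```python
class RegisterScoreboard:
    """Per-register busy tags: set when a coprocessor command naming the
    register as destination is decoded, cleared when the result lands."""

    def __init__(self, num_regs):
        self.busy = [False] * num_regs

    def decode_command(self, dest_reg):
        self.busy[dest_reg] = True      # tag set at decode time

    def writeback(self, dest_reg, regfile, result):
        regfile[dest_reg] = result
        self.busy[dest_reg] = False     # tag cleared when result is written

    def can_read(self, reg):
        return not self.busy[reg]       # consumers stall while tag is set

regs = [0] * 16
sb = RegisterScoreboard(16)
sb.decode_command(3)
assert not sb.can_read(3)   # dependent instruction must wait
sb.writeback(3, regs, 0xBEEF)
assert sb.can_read(3)
```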
  • Patent number: 11809869
    Abstract: Embodiments detailed herein relate to systems and methods to store a tile register pair to memory. In one example, a processor includes: decode circuitry to decode a store matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded store matrix pair instruction to store every element of left and right tiles of the identified source matrix to corresponding element positions of left and right tiles of the identified destination matrix, respectively, wherein the executing stores a chunk of C elements of one row of the identified source matrix at a time.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: November 7, 2023
    Assignee: Intel Corporation
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
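The chunked store can be mimicked in software to make the access pattern concrete. A sketch assuming C = 4 elements per chunk (the abstract leaves C unspecified):

```python
import numpy as np

def store_tile_pair(src_left, src_right, dst_left, dst_right, chunk=4):
    """Copy a tile pair row by row, moving `chunk` elements at a time,
    mimicking the chunked store described in the abstract."""
    for src, dst in ((src_left, dst_left), (src_right, dst_right)):
        rows, cols = src.shape
        for r in range(rows):
            for c in range(0, cols, chunk):
                dst[r, c:c + chunk] = src[r, c:c + chunk]

left = np.arange(16).reshape(2, 8)
right = np.arange(16, 32).reshape(2, 8)
out_l, out_r = np.zeros_like(left), np.zeros_like(right)
store_tile_pair(left, right, out_l, out_r)
assert (out_l == left).all() and (out_r == right).all()
```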
  • Patent number: 11797856
    Abstract: Presented herein are framework embodiments that allow the representation of complex systems and processes and that are suitable for resource-efficient machine learning and inference. Also disclosed are new reinforcement learning techniques that are capable of learning to plan and optimize dynamic and nuanced systems and processes. Different embodiments comprising combinations of one or more neural networks, reinforcement learning, and linear programming are discussed to learn representations and models, even for complex systems and methods. Furthermore, the introduction of neural field embodiments and of methods to compute a Deep Argmax, as well as to invert neural networks and neural fields with linear programming, provides the ability to create and train models that are accurate and very resource-efficient, using less memory, fewer computations, less time, and, as a result, less energy.
    Type: Grant
    Filed: June 11, 2020
    Date of Patent: October 24, 2023
    Assignee: System AI, Inc.
    Inventor: Tuna Oezer
  • Patent number: 11797473
    Abstract: An accelerated processor structure on a programmable integrated circuit device includes a processor and a plurality of configurable digital signal processors (DSPs). Each configurable DSP includes a circuit block, which in turn includes a plurality of multipliers. The accelerated processor structure further includes a first bus to transfer data from the processor to the configurable DSPs, and a second bus to transfer data from the configurable DSPs to the processor.
    Type: Grant
    Filed: October 8, 2018
    Date of Patent: October 24, 2023
    Assignee: Altera Corporation
    Inventors: David Shippy, Martin Langhammer, Jeffrey Eastlack
  • Patent number: 11782722
    Abstract: A complex computing device, a complex computing method, an artificial intelligence chip and an electronic apparatus are provided. An input interface receives complex computing instructions and arbitrates each complex computing instruction to a corresponding computing component according to the computing type in that instruction. Each computing component is connected to the input interface, acquires a source operand from a complex computing instruction to perform complex computing, and generates a computing result instruction to feed back to an output interface. The output interface arbitrates the computing result in each computing result instruction to the corresponding instruction source, according to the instruction source identifier in each computing result instruction.
    Type: Grant
    Filed: January 14, 2021
    Date of Patent: October 10, 2023
    Assignees: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD., KUNLUNXIN TECHNOLOGY (BEIJING) COMPANY LIMITED
    Inventors: Baofu Zhao, Xueliang Du, Kang An, Yingnan Xu, Chao Tang
  • Patent number: 11768689
    Abstract: The present application discloses a computing device that can provide a low-power, highly capable computing platform for computational imaging. The computing device can include one or more processing units, for example one or more vector processors and one or more hardware accelerators, an intelligent memory fabric, a peripheral device, and a power management module. The computing device can communicate with external devices, such as one or more image sensors, an accelerometer, a gyroscope, or any other suitable sensor devices.
    Type: Grant
    Filed: November 12, 2021
    Date of Patent: September 26, 2023
    Assignee: Movidius Limited
    Inventors: Brendan Barry, Richard Richmond, Fergal Connor, David Moloney
  • Patent number: 11726701
    Abstract: A memory expander includes a memory device that stores a plurality of task data. A controller controls the memory device. The controller receives metadata and a management request from an external central processing unit (CPU) through a compute express link (CXL) interface and operates in a management mode in response to the management request. In the management mode, the controller receives a read request and a first address from an accelerator through the CXL interface and transmits one of the plurality of task data to the accelerator based on the metadata in response to the read request.
    Type: Grant
    Filed: October 25, 2021
    Date of Patent: August 15, 2023
    Inventors: Chon Yong Lee, Jae-Gon Lee, Kyunghan Lee
  • Patent number: 11720475
    Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums that use parallel hardware execution with software co-simulation to enable more advanced debugging operations on data flow architectures. Upon a halt to execution of a program thread, the state of the tiles that are executing the thread is saved and offloaded from the HTF (hybrid threading fabric) to a host system. A developer may then examine this state on the host system to debug their program. Additionally, the state may be loaded into a software simulator that simulates the HTF hardware. This simulator allows the developer to step through the code and examine values to find bugs.
    Type: Grant
    Filed: November 21, 2022
    Date of Patent: August 8, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Skyler Arron Windh, Tony M. Brewer, Patrick Estep
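The halt-snapshot-simulate flow might look roughly like this in software; the snapshot format and the `TileSimulator` API are invented for illustration:

```python
import json

def snapshot_tiles(tiles):
    """On a halt, capture the state of every tile running the thread so it
    can be offloaded to the host (serialized here as JSON for transport)."""
    return json.dumps([{"tile": i, "regs": t["regs"], "pc": t["pc"]}
                       for i, t in enumerate(tiles)])

class TileSimulator:
    """Host-side stand-in for the fabric: reload the saved state and let a
    developer single-step from exactly where the hardware halted."""
    def load(self, snapshot):
        self.tiles = json.loads(snapshot)
    def step(self, tile_idx):
        self.tiles[tile_idx]["pc"] += 1  # placeholder for real semantics

snap = snapshot_tiles([{"regs": [1, 2], "pc": 100}])
sim = TileSimulator()
sim.load(snap)
sim.step(0)
print(sim.tiles[0]["pc"])  # 101
```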
  • Patent number: 11714992
    Abstract: Systems and methods for providing executable instructions to a neural network processor are provided. In one example, a system comprises a database that stores a plurality of executable instructions and a plurality of subgraph identifiers, each subgraph identifier of the plurality of subgraph identifiers being associated with a subset of instructions of the plurality of executable instructions.
    Type: Grant
    Filed: December 13, 2018
    Date of Patent: August 1, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Richard John Heaton, Randy Renfu Huang, Ron Diamant
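The subgraph-to-instructions database is, at its core, a keyed lookup that lets pre-generated code be reused. A toy sketch (the identifiers and instruction strings are made up):

```python
# Hypothetical lookup: a compiled-subgraph database keyed by a subgraph
# identifier, returning the pre-generated executable instructions.
instruction_db = {
    "conv3x3-relu": ["LOAD ifm", "CONV k3", "RELU", "STORE ofm"],
    "matmul-bias":  ["LOAD a", "LOAD b", "MATMUL", "ADD bias", "STORE out"],
}

def instructions_for(graph):
    """Assemble a program by looking up each subgraph's identifier."""
    program = []
    for subgraph_id in graph:
        program.extend(instruction_db[subgraph_id])  # reuse cached code
    return program

print(instructions_for(["conv3x3-relu", "matmul-bias"]))
```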
  • Patent number: 11714649
    Abstract: A RISC-V-based 3D interconnected multi-core processor architecture and a working method thereof. The RISC-V-based 3D interconnected multi-core processor architecture includes a main control layer, a micro core array layer and an accelerator layer. The main control layer includes a plurality of main cores which are RISC-V instruction set CPU cores; the micro core array layer includes a plurality of micro unit groups, each including a micro core, a data storage unit, an instruction storage unit and a linking controller, wherein the micro core is a RISC-V instruction set CPU core that executes partial functions of the main core. The accelerator layer is configured to optimize running speed and space utilization for accelerators meeting specific requirements. Some main cores in the main control layer perform data interaction with the accelerator layer, while the other main cores interact with the micro core array layer.
    Type: Grant
    Filed: December 1, 2021
    Date of Patent: August 1, 2023
    Assignee: SHANDONG LINGNENG ELECTRONIC TECHNOLOGY CO., LTD.
    Inventors: Gang Wang, Jinzheng Mou, Yang An, Moujun Xie, Benyang Wu, Zesheng Zhang, Wenyong Hou, Yongwei Wang, Zixuan Qiu, Xintan Li
  • Patent number: 11682109
    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for configurable aprons for expanded binning. Aspects of the present disclosure include identifying one or more pixel tiles in at least one bin and determining edge information for each pixel tile of the one or more pixel tiles. The edge information may be associated with one or more pixels adjacent to each pixel tile. The present disclosure further describes determining whether at least one adjacent bin is visible based on the edge information for each pixel tile, where the at least one adjacent bin may be adjacent to the at least one bin.
    Type: Grant
    Filed: October 16, 2020
    Date of Patent: June 20, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Kalyan Kumar Bhiravabhatla, Krishnaiah Gummidipudi, Ankit Kumar Singh, Andrew Evan Gruber, Pavan Kumar Akkaraju, Srihari Babu Alla, Jonnala Gadda Nagendra Kumar, Vishwanath Shashikant Nikam
  • Patent number: 11663001
    Abstract: Systems, apparatuses, and methods for implementing a family of lossy sparse load single instruction, multiple data (SIMD) instructions are disclosed. A lossy sparse load unit (LSLU) loads a plurality of values from one or more input vector operands and determines how many non-zero values are included in one or more input vector operands of a given instruction. If the one or more input vector operands have less than a threshold number of non-zero values, then the LSLU causes an instruction for processing the one or more input vector operands to be skipped. In this case, the processing of the instruction of the one or more input vector operands is deemed to be redundant. If the one or more input vector operands have greater than or equal to the threshold number of non-zero values, then the LSLU causes an instruction for processing the input vector operand(s) to be executed.
    Type: Grant
    Filed: November 19, 2018
    Date of Patent: May 30, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sanchari Sen, Derrick Allen Aguren, Joseph Lee Greathouse
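The skip decision is a simple threshold test on the operands' non-zero counts. A hedged sketch of that policy (the real LSLU operates on SIMD registers, not NumPy arrays):

```python
import numpy as np

def lossy_sparse_apply(op, operands, threshold):
    """Execute `op` only if the operands carry at least `threshold`
    non-zero values; otherwise treat the work as redundant and skip it."""
    nonzeros = sum(int(np.count_nonzero(v)) for v in operands)
    if nonzeros < threshold:
        return None  # instruction skipped, result deemed negligible
    return op(*operands)

a = np.array([0.0, 0.0, 0.0, 0.1])
b = np.array([0.0, 0.0, 0.0, 2.0])
print(lossy_sparse_apply(np.dot, (a, b), threshold=3))  # skipped -> None
print(lossy_sparse_apply(np.dot, (a, b), threshold=2))  # executed -> 0.2
```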
  • Patent number: 11507493
    Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums that use parallel hardware execution with software co-simulation to enable more advanced debugging operations on data flow architectures. Upon a halt to execution of a program thread, the state of the tiles that are executing the thread is saved and offloaded from the HTF to a host system. A developer may then examine this state on the host system to debug their program. Additionally, the state may be loaded into a software simulator that simulates the HTF hardware. This simulator allows the developer to step through the code and examine values to find bugs.
    Type: Grant
    Filed: August 18, 2021
    Date of Patent: November 22, 2022
    Assignee: Micron Technology, Inc.
    Inventors: Skyler Arron Windh, Tony M. Brewer, Patrick Estep
  • Patent number: 11429855
    Abstract: A method for accelerating a neural network includes identifying neural network layers that meet a locality constraint. Code is generated to implement depth-first processing for different hardware based on the identified neural network layers. The generated code is then used to perform the depth-first processing on the neural network.
    Type: Grant
    Filed: February 6, 2018
    Date of Patent: August 30, 2022
    Assignee: NEC CORPORATION
    Inventors: Nicolas Weber, Felipe Huici, Mathias Niepert
  • Patent number: 11422815
    Abstract: Binary translation may be performed by a field programmable gate array (FPGA) integrated with a processor as a single integrated circuit. The FPGA contains multiple blocks of logic for performing different binary translations. The processor may offload the binary translation to the FPGA. The FPGA may use historical logging to skip the binary translation of source instructions that have been previously translated into target instructions.
    Type: Grant
    Filed: March 1, 2018
    Date of Patent: August 23, 2022
    Assignee: Dell Products L.P.
    Inventors: Mukund P. Khatri, Ramesh Radhakrishnan
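The historical-logging optimization is effectively memoization keyed on the source block. An illustrative model (the `fpga_translate` callable stands in for the hardware path; names are invented):

```python
class TranslationOffload:
    """Models the historical log: source blocks translated once are served
    from the log instead of being re-sent to the FPGA translator."""

    def __init__(self, fpga_translate):
        self.fpga_translate = fpga_translate  # the slow hardware path
        self.log = {}                         # source block -> target block

    def translate(self, source_block):
        if source_block in self.log:          # previously translated: skip
            return self.log[source_block]
        target = self.fpga_translate(source_block)
        self.log[source_block] = target
        return target

xlat = TranslationOffload(lambda src: f"target({src})")
xlat.translate("mov eax, 1")   # goes to the FPGA
xlat.translate("mov eax, 1")   # served from the historical log
```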
  • Patent number: 11403250
    Abstract: Examples in this application disclose an operation accelerator, a switch, and a processing system. One example operation accelerator includes a shunt circuit directly connected to a first peripheral component interconnect express (PCIe) device through a PCIe link. The shunt circuit is configured to receive first data sent by the first PCIe device through the PCIe link, and transmit the first data through an internal bus. A first address carried in the first data is located in a first range. In some examples of this application, the first PCIe device directly communicates with the operation accelerator through the shunt circuit in the operation accelerator.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: August 2, 2022
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Chuanning Cheng, Shengyong Peng
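The shunt circuit's routing rule reduces to an address-range test. A sketch under that reading (range bounds and handler names are illustrative):

```python
def shunt(first_range, internal_bus, fallback):
    """Return a receive handler: data whose address falls in `first_range`
    is forwarded on the accelerator's internal bus, bypassing the host."""
    lo, hi = first_range
    def on_receive(address, payload):
        if lo <= address < hi:
            internal_bus(address, payload)   # direct device-to-accelerator path
        else:
            fallback(address, payload)       # e.g. forward toward the host
    return on_receive

recv = shunt((0x1000, 0x2000),
             internal_bus=lambda a, p: print(f"internal {a:#x}"),
             fallback=lambda a, p: print(f"host {a:#x}"))
recv(0x1800, b"...")  # internal 0x1800
recv(0x3000, b"...")  # host 0x3000
```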
  • Patent number: 11392513
    Abstract: A graph-based data flow control system includes a control plane system coupled to SCP subsystems. The control plane system identifies a workload, and identifies service(s) on the SCP subsystems for manipulating/exchanging data to perform the workload. The control plane system generates a respective SCP-local data flow control graph for each SCP subsystem that defines how their service(s) will manipulate/exchange data within that SCP subsystem, and generates inter-SCP data flow control graph(s) that define how service(s) provided by at least one SCP subsystem will manipulate/exchange data with service(s) provided by at least one other SCP subsystem. The control plane system then transmits each respective SCP-local data flow control graph to each of the SCP subsystems, and the inter-SCP data flow control graph(s) to at least one SCP subsystem, for use by the SCP subsystems in causing their service(s) to manipulate/exchange data to perform the workload.
    Type: Grant
    Filed: October 15, 2020
    Date of Patent: July 19, 2022
    Assignee: Dell Products L.P.
    Inventors: Gaurav Chawla, Mark Steven Sanders, Elie Jreij, Jimmy D. Pike, Robert W. Hormuth, William Price Dawkins
  • Patent number: 11366662
    Abstract: A high-level synthesis multiprocessor system enables sophisticated algorithms to be realized easily with nearly the smallest possible circuit. A shared memory is divided into a plurality of banks. The memory banks are connected to processors, respectively. Each processor receives an instruction code and an operand from its connected memory bank. After the operation executes, the processor sends the result to its adjacent processor element to set it as the accumulator value at the time of execution of the next instruction. The software program to be executed is fixed. The processor to execute each instruction in the software program is uniquely identified. Each processor has a function for executing its instructions out of all executable instructions in the multiprocessor system, and does not have a function for executing an instruction that the processor is not to execute. A circuit configuration with unused instructions deleted is thus provided.
    Type: Grant
    Filed: August 22, 2018
    Date of Patent: June 21, 2022
    Assignee: El Amina Inc.
    Inventor: Hideki Tanuma
  • Patent number: 11354592
    Abstract: Systems and methods for intelligent computation acceleration transform to allow applications to be executed by accelerated processing units such as graphic processing units (GPUs) or field programmable gate arrays (FPGAs) are disclosed. In an embodiment, a computational profile is generated for an application based on execution metrics of the application for the CPU and the accelerated processing unit, and a genetic algorithm (GA) prediction model is applied to predict execution speedup on an accelerated processing unit for the application. In an embodiment, upon identification of speedup, computational steps are arbitrated among various processing units according to compute availability to achieve optimal completion time for the compute job.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: June 7, 2022
    Assignee: Morgan Stanley Services Group Inc.
    Inventors: Michael A. Dobrovolsky, Kwokhin Chu, Pankaj Parashar
  • Patent number: 11354315
    Abstract: Method and apparatus for stress management in a searchable data service. The searchable data service may provide a searchable index to a backend data store, and an interface to build and query the searchable index, that enables client applications to search for and retrieve locators for stored entities in the backend data store. Embodiments of the searchable data service may implement a distributed stress management mechanism that may provide functionality including, but not limited to, the automated monitoring of critical resources, analysis of resource usage, and decisions on and performance of actions to keep resource usage within comfort zones. In one embodiment, in response to usage of a particular resource being detected as out of the comfort zone on a node, an action may be performed to transfer at least part of the resource usage for the local resource to another node that provides a similar resource.
    Type: Grant
    Filed: May 22, 2020
    Date of Patent: June 7, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Patrick W. Ransil, Aleksey Martynov, James Larson, James R. Collette, Robert Wai-Chi Chu, Partha Saha
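One plausible reading of the comfort-zone mechanism, as a sketch: monitor each resource against fixed bounds and hand excess load to the peer with the most headroom. The bounds and the peer-selection policy here are assumptions:

```python
COMFORT_ZONES = {"cpu": (0.0, 0.80), "disk": (0.0, 0.90)}  # illustrative bounds

def manage_stress(local_usage, peers):
    """For any resource outside its comfort zone, pick the peer node with
    the most headroom for that resource and hand part of the load to it."""
    actions = []
    for resource, usage in local_usage.items():
        lo, hi = COMFORT_ZONES[resource]
        if not (lo <= usage <= hi):
            target = min(peers, key=lambda p: p["usage"][resource])
            actions.append((resource, target["name"]))
    return actions

peers = [{"name": "node-b", "usage": {"cpu": 0.30, "disk": 0.50}},
         {"name": "node-c", "usage": {"cpu": 0.60, "disk": 0.20}}]
print(manage_stress({"cpu": 0.95, "disk": 0.40}, peers))  # [('cpu', 'node-b')]
```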
  • Patent number: 11321144
    Abstract: Apparatus and method for selectively saving and restoring execution state components in an inter-core work offload environment. For example, one embodiment of a processor comprises: a plurality of cores; an interconnect coupling the plurality of cores; and offload circuitry to transfer work from a first core of the plurality of cores to a second core of the plurality of cores without operating system (OS) intervention, wherein the second core is to reach a first execution state upon completing the offload work and to store results in a first memory location or register; the second core comprising: a decoder to decode a first instruction comprising at least one operand to identify one or more components of the first execution state; and execution circuitry to execute the first instruction to save the one or more components of the first execution state to a specified region in memory.
    Type: Grant
    Filed: June 29, 2019
    Date of Patent: May 3, 2022
    Assignee: INTEL CORPORATION
    Inventor: ElMoustapha Ould-Ahmed-Vall
  • Patent number: 11263014
    Abstract: Data processing apparatuses, methods of data processing, and non-transitory computer-readable media on which computer-readable code is stored defining logical configurations of processing devices are disclosed. In an apparatus, fetch circuitry retrieves a sequence of instructions and execution circuitry performs data processing operations with respect to data values in a set of registers. An auxiliary execution circuitry interface and a coprocessor interface to provide a connection to a coprocessor outside the apparatus are provided.
    Type: Grant
    Filed: August 5, 2019
    Date of Patent: March 1, 2022
    Assignee: Arm Limited
    Inventors: Frederic Claude Marie Piry, Thomas Christopher Grocutt, Simon John Craske, Carlo Dario Fanara, Jean Sébastien Leroy
  • Patent number: 11256516
    Abstract: A system comprising a data memory, a first processor with a first execution pipeline, and a co-processor with a second execution pipeline branching from the first pipeline via an inter-processor interface. The first pipeline can decode instructions from an instruction set comprising first and second instruction subsets. The first subset comprises a load instruction which loads data from the memory into a register file, and a compute instruction of a first type which performs a compute operation on such loaded data. The second subset includes a compute instruction of a second type which does not require a separate load instruction to first load data from memory into a register file, but instead reads data from the memory directly and performs a compute operation on that data, this reading being performed in a pipeline stage of the second pipeline that is aligned with the memory access stage of the first pipeline.
    Type: Grant
    Filed: December 17, 2018
    Date of Patent: February 22, 2022
    Assignee: XMOS LTD
    Inventors: Henk Lambertus Muller, Peter Hedinger
  • Patent number: 11250341
    Abstract: A system comprising a classical computing subsystem to perform classical operations in a three-dimensional (3D) classical space unit using decomposed stopping points along a consecutive sequence of stopping points of sub-cells, along a vector with a shortest path between two points of the 3D classical space unit. The system includes a quantum computing subsystem to perform quantum operations in a 3D quantum space unit using decomposed stopping points along a consecutive sequence of stopping points of sub-cells, along a vector selected to have a shortest path between two points of the 3D quantum space unit. The system includes a control subsystem to decompose classical subproblems and quantum subproblems into the decomposed points and provide computing instructions and state information to the classical computing subsystem to perform the classical operations to the quantum computing subsystem to perform the quantum operations. A method and computer readable medium are provided.
    Type: Grant
    Filed: September 7, 2018
    Date of Patent: February 15, 2022
    Assignee: LOCKHEED MARTIN CORPORATION
    Inventors: Edward H. Allen, Luke A. Uribarri, Kristen L. Pudenz
  • Patent number: 11170025
    Abstract: A system for caching includes an interface to receive a portion of a hypercube to evaluate. The hypercube includes cells with a set of the cells having a formula. The system includes a processor to determine term(s) in the formula for each cell of the set of cells; remove from consideration a time dimension and/or a primary dimension for the term(s) in the formula for each cell of the set of cells; determine a set of distinct terms using the term(s); determine whether a total number of terms in the set of cells is larger than a number of distinct terms in the set of distinct terms; and in response to determining that the total number of terms in the set of cells is larger than the number of distinct terms in the set of distinct terms, indicate to cache the set of distinct terms during evaluation.
    Type: Grant
    Filed: April 29, 2019
    Date of Patent: November 9, 2021
    Assignee: Workday, Inc.
    Inventors: Ngoc Nguyen, Darren Kermit Lee, Shuyuan Chen, Ritu Jain, Francis Wang
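The caching decision compares the total term count against the distinct term count after dropping the time and primary dimensions. A compact sketch (the term encoding is invented):

```python
def should_cache(cells, ignore_dims=("time", "primary")):
    """Decide whether to cache: strip the ignored dimensions from every
    cell's terms, then cache iff total terms outnumber distinct terms."""
    def strip(term):
        return tuple((dim, val) for dim, val in term if dim not in ignore_dims)

    all_terms = [strip(t) for cell in cells for t in cell["terms"]]
    distinct = set(all_terms)
    return distinct, len(all_terms) > len(distinct)

cells = [
    {"terms": [(("time", 2023), ("acct", "rent")), (("acct", "sales"),)]},
    {"terms": [(("time", 2024), ("acct", "rent"))]},  # same term, later year
]
distinct, cache = should_cache(cells)
print(cache)  # True: three terms collapse to two distinct terms
```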
  • Patent number: 11144290
    Abstract: A method includes analyzing a dataflow graph representing data dependencies between operators of a dataflow application to identify a plurality of candidate groups of the operators. Based on characteristics of a given hardware accelerator and the operators of a given candidate group of the plurality of candidate groups, determining whether the operators of the given candidate group are to be combined. In response to determining that the operators of the given candidate group are to be combined, retrieving executable binary code segments corresponding to the operators of the given candidate group, generating a unit of binary code including the executable binary code segments and metadata representing an execution control flow among the executable binary code segments, and dispatching the unit of code to the given hardware accelerator for execution of the unit of code.
    Type: Grant
    Filed: September 13, 2019
    Date of Patent: October 12, 2021
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Reza Azimi, Cheng Xiang Feng, Kai-Ting Amy Wang, Yaoqing Gao, Ye Tian, Xiang Wang
  • Patent number: 11093247
    Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: August 17, 2021
    Assignee: Intel Corporation
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
  • Patent number: 11010308
    Abstract: Embodiments of the present disclosure include a method for optimizing an internal memory for calculation of a convolutional layer of a convolutional neural network (CNN), the method including determining a computation cost of calculating the convolutional layer using each combination of a memory management scheme of a plurality of memory management schemes and data partition sizes of input feature map (IFM) data, kernel data, and output feature map (OFM) data to be loaded in the internal memory; identifying the one combination of a memory management scheme and data partition sizes having the lowest computation cost for the convolutional layer; and implementing the CNN to use that combination for calculation of the convolutional layer.
    Type: Grant
    Filed: May 31, 2019
    Date of Patent: May 18, 2021
    Assignee: LG ELECTRONICS INC.
    Inventors: Jaewon Kim, Thi Huong Giang Nguyen
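The optimization is a search over (memory management scheme, partition size) combinations for the cheapest one. A sketch with a toy cost model; the real cost function is not given in the abstract, so `toy_cost` and the scheme names are assumptions:

```python
from itertools import product

def pick_configuration(schemes, partition_sizes, cost_fn):
    """Exhaustively cost every (scheme, IFM/kernel/OFM partition) combination
    and return the cheapest one for this convolutional layer."""
    return min(product(schemes, partition_sizes, partition_sizes, partition_sizes),
               key=lambda combo: cost_fn(*combo))

# Toy cost model: penalize reloading whichever tensor the scheme keeps resident.
def toy_cost(scheme, ifm_part, kernel_part, ofm_part):
    reload_penalty = {"ifm-stationary": kernel_part, "weight-stationary": ifm_part}
    return reload_penalty[scheme] + ifm_part + kernel_part + ofm_part

print(pick_configuration(["ifm-stationary", "weight-stationary"],
                         [16, 32, 64], toy_cost))
```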
  • Patent number: 11004500
    Abstract: Disclosed herein are apparatuses and methods related to an artificial intelligence accelerator in memory. An apparatus can include a number of registers configured to enable the apparatus to operate in an artificial intelligence mode to perform artificial intelligence operations, and an artificial intelligence (AI) accelerator configured to perform the artificial intelligence operations using data stored in a number of memory arrays. The AI accelerator can include hardware, software, and/or firmware configured to perform operations associated with AI operations. The hardware can include circuitry configured as an adder and/or multiplier to perform operations, such as logic operations, associated with AI operations.
    Type: Grant
    Filed: August 28, 2019
    Date of Patent: May 11, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Alberto Troia
  • Patent number: 10990398
    Abstract: Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction for execution on an instruction execution pipeline, beginning execution of the first instruction, receiving one or more second instructions for execution on the instruction execution pipeline, the one or more second instructions associated with a higher priority task than the first instruction, storing a register state associated with the execution of the first instruction in one or more registers of a capture queue associated with the instruction execution pipeline, copying the register state from the capture queue to a memory, determining that the one or more second instructions have been executed, copying the register state from the memory to the one or more registers of the capture queue, and restoring the register state to the instruction execution pipeline from the capture queue.
    Type: Grant
    Filed: April 15, 2019
    Date of Patent: April 27, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy D. Anderson, Joseph Zbiciak, Kai Chirca
  • Patent number: 10983796
    Abstract: Embodiments involving core-to-core offload are detailed herein. For example, a method includes decoding an instruction having fields for at least an opcode to indicate that the end of a task offload operation is to be performed, and executing the decoded instruction to cause a transmission of an offload end indication to the second core, the indication including one or more of: an identifier of the second core, a location where the second core can find the results of the offload, the results of execution of the offloaded task, an instruction pointer in the original code of the second source, a requesting core state, and a requesting core state location.
    Type: Grant
    Filed: June 29, 2019
    Date of Patent: April 20, 2021
    Assignee: Intel Corporation
    Inventor: Elmoustapha Ould-Ahmed-Vall
  • Patent number: 10970076
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions specifying ternary tile operations. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction specifying a ternary tile operation, and locations of destination and first, second, and third source matrices, each of the matrices having M rows by N columns; and execution circuitry to respond to the decoded instruction by, for each equal-sized group of K elements of the specified first, second, and third source matrices, generate K results by performing the ternary tile operation in parallel on K corresponding elements of the specified first, second, and third source matrices, and store each of the K results to a corresponding element of the specified destination matrix, wherein corresponding elements of the specified source and destination matrices occupy a same relative position within their associated matrix.
    Type: Grant
    Filed: September 14, 2018
    Date of Patent: April 6, 2021
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Christopher J. Hughes, Bret Toll, Dan Baum, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
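The execution pattern, groups of K corresponding elements processed across three sources and written to the destination, can be modeled directly. A sketch using fused multiply-add as an example ternary operation (the patent covers ternary tile operations generally):

```python
import numpy as np

def ternary_tile_op(op, a, b, c, k=4):
    """Apply a three-input elementwise operation over groups of K
    corresponding elements, writing each group to the destination."""
    assert a.shape == b.shape == c.shape
    dst = np.empty_like(a)
    flat = [m.reshape(-1) for m in (a, b, c, dst)]
    for i in range(0, flat[0].size, k):
        s = slice(i, i + k)
        flat[3][s] = op(flat[0][s], flat[1][s], flat[2][s])  # one group of K
    return dst

a, b, c = (np.arange(8.0).reshape(2, 4) for _ in range(3))
# Example ternary op: fused multiply-add, dst = a * b + c.
print(ternary_tile_op(lambda x, y, z: x * y + z, a, b, c))
```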
  • Patent number: 10970078
    Abstract: In an embodiment, a computation engine may perform computations on input vectors having vector elements of a first precision and data type. The computation engine may convert the vector elements from the first precision to a second precision and may also interleave the vector elements as specified by an instruction issued by the processor to the computation engine. The interleave may be based on a ratio of a result precision and the second precision. An extract instruction may be supported to extract results from the computations and convert and deinterleave the vector elements to provide a compact result in a desired order.
    Type: Grant
    Filed: April 5, 2018
    Date of Patent: April 6, 2021
    Assignee: Apple Inc.
    Inventors: Eric Bainville, Tal Uliel, Jeffry E. Gonion, Ali Sazegari, Erik K. Norden
  • Patent number: 10943673
    Abstract: A method of medical data auto-collection, segmentation and analysis includes collecting, from a plurality of sources, unstructured medical data in a plurality of formats, recognizing a medical name entity in each piece of the unstructured medical data using a medical dictionary, and performing semantic text segmentation on each piece of the unstructured medical data so that each piece is partitioned into groups sharing a same topic. The method further includes generating, as structured medical data, each piece of the unstructured medical data of which the medical name entity is recognized and which is partitioned into the groups, and indexing the structured medical data into elastic search clusters.
    Type: Grant
    Filed: April 10, 2019
    Date of Patent: March 9, 2021
    Assignee: TENCENT AMERICA LLC
    Inventors: Shangqing Zhang, Min Tu, Nan Du, Yusheng Xie, Yaliang Li, Tao Yang, Wei Fan
  • Patent number: 10922140
    Abstract: A physical Graphics Processing Unit (GPU) resource scheduling system and method between virtual machines are provided. An agent is inserted between a physical GPU instruction dispatch and a physical GPU interface through a hooking method, for delaying the sending of instructions and data from the physical GPU instruction dispatch to the physical GPU interface, monitoring a set of GPU conditions of a guest application executing in the virtual machine and the use of physical GPU hardware resources, and then providing feedback to a GPU resource scheduling algorithm based on time or a time sequence. With the agent, the method requires no modification to the guest application of the virtual machine, the host operating system, the virtual machine operating system, the GPU driver or the virtual machine manager.
    Type: Grant
    Filed: June 19, 2013
    Date of Patent: February 16, 2021
    Assignee: SHANGHAI JIAOTONG UNIVERSITY
    Inventors: Miao Yu, Zhengwei Qi, Haibing Guan, Yin Wang
  • Patent number: 10891156
    Abstract: Systems and methods are provided to implement intelligent data coordination for accelerated computing in a distributed computing environment. For example, a method includes executing a task on a computing node, monitoring requests issued by the executing task, intercepting requests issued by the executing task which correspond to data flow operations to be performed as part of the task execution, and asynchronously executing the intercepted requests at scheduled times to coordinate data flow between resources on the computing node.
    Type: Grant
    Filed: April 26, 2017
    Date of Patent: January 12, 2021
    Assignee: EMC IP Holding Company LLC
    Inventors: Junping Zhao, Yifan Sun, Layne Peng, Jie Bao, Kun Wang
  • Patent number: 10846091
    Abstract: In an embodiment, a coprocessor includes multiple processing elements arranged in a grid of one or more rows and one or more columns. A given processing element includes an arithmetic/logic unit (ALU) circuit configured to perform an ALU operation specified by an instruction executable by the coprocessor, wherein the ALU circuit is configured to produce a result. The given processing element further comprises a first memory coupled to the execute circuit. The first memory is configured to store results generated by the given processing element. The first memory includes a portion of a result memory implemented by the coprocessor, wherein locations in the result memory are specifiable as destination operands of instructions executable by the coprocessor. The portion of the result memory implemented by the first memory is the portion of the result memory that the given processing element is capable of updating.
    Type: Grant
    Filed: February 26, 2019
    Date of Patent: November 24, 2020
    Assignee: Apple Inc.
    Inventors: Aditya Kesiraju, Andrew J. Beaumont-Smith, Deepankar Duggal, Ran A. Chachick
  • Patent number: 10831794
    Abstract: In one embodiment, a method for providing alternate keys in a keyed index includes creating a first base record in a keyed index of a database, the first base record including a first unique key and a first data record, wherein the first data record includes at least one sub key and at least one first value, each sub key being correlated with a different one of the at least one first value in a sub key/value pair, and creating one or more alternate key records in the database, each of the alternate key records including one of the at least one sub key which is correlated with the first base record and the first unique key of the first base record. The database adheres to virtual storage access method (VSAM) in some approaches. In other approaches, a number of alternate key records created is equal to a number of first sub keys in the first data record.
    Type: Grant
    Filed: July 17, 2018
    Date of Patent: November 10, 2020
    Assignee: International Business Machines Corporation
    Inventor: Terri A. Menendez
  • Patent number: 10831488
    Abstract: In an embodiment, a computation engine may offload work from a processor (e.g. a CPU) and efficiently perform computations such as those used in LSTM and other workloads at high performance. In an embodiment, the computation engine may perform computations on input vectors from input memories in the computation engine, and may accumulate results in an output memory within the computation engine. The input memories may be loaded with initial vector data from memory, incurring the memory latency that may be associated with reading the operands. Compute instructions may be performed on the operands, generating results in an output memory. One or more extract instructions may be supported to move data from the output memory to the input memory, permitting additional computation on the data in the output memory without moving the results to main memory.
    Type: Grant
    Filed: August 20, 2018
    Date of Patent: November 10, 2020
    Assignee: Apple Inc.
    Inventors: Eric Bainville, Jeffry E. Gonion, Ali Sazegari, Gerard R. Williams, III, Andrew J. Beaumont-Smith
  • Patent number: 10824423
    Abstract: A reconfigurable arithmetic device includes a plurality of processor elements configured to perform first arithmetic processes corresponding to a first type of instruction and second arithmetic processes corresponding to a second type of instruction, a random-access memory (RAM), and a control unit. The first type of instruction is written into the RAM at a first address, data for the first type of instruction is written into the RAM at a second address, and data for the second type of instruction is written into the RAM at a third address. When the first type of instruction is written at the first address, the control unit decodes the first type of instruction and configures the processor elements to perform the first arithmetic processes. When data for the second type of instruction is written at the third address, the control unit configures the processor elements to perform the second arithmetic processes.
    Type: Grant
    Filed: September 28, 2015
    Date of Patent: November 3, 2020
    Assignee: Cypress Semiconductor Corporation
    Inventors: Hiroshi Furukawa, Ichiro Kasama
  • Patent number: 10761900
    Abstract: A method for distributed processing includes receiving a job bundle at a command center comprising a processor, a network interface, and a memory. The method includes determining a value of a dimension of the job bundle, determining, based on a predetermined rule applied to the determined value of the dimension of the job bundle, an aggregate processing cost for the job bundle and identifying one or more available member devices communicatively connected to the command center via the network interface. Additionally, the method includes the operations of splitting the job bundle into one or more threads based on at least one of the determined value of the dimension, the aggregate processing cost or the available member devices, apportioning a thread of the one or more threads to a member device and transmitting, via the network interface, the apportioned thread to a secure processing environment of the member device.
    Type: Grant
    Filed: April 9, 2018
    Date of Patent: September 1, 2020
    Assignee: V2Com S.A.
    Inventors: Guilherme Spina, Leonardo de Moura Rocha Lima
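The dispatch flow (measure a dimension, apply a cost rule, split, apportion) can be sketched as follows; the per-unit cost rule and the striped split are assumptions for illustration:

```python
def dispatch_bundle(job_bundle, members, cost_per_unit=2.5):
    """Cost the bundle from one of its dimensions via a fixed rule, split it
    into per-member threads, and pair each thread with a member device."""
    size = len(job_bundle["items"])          # the measured dimension
    aggregate_cost = size * cost_per_unit    # predetermined rule
    n = min(len(members), size)
    threads = [job_bundle["items"][i::n] for i in range(n)]
    return aggregate_cost, list(zip(members, threads))

cost, plan = dispatch_bundle({"items": list(range(10))},
                             members=["dev-a", "dev-b", "dev-c"])
print(cost)   # 25.0
for member, thread in plan:
    print(member, thread)  # each thread goes to one device's secure environment
```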
  • Patent number: 10740152
    Abstract: Technologies for dynamic acceleration of general-purpose code include a computing device having a general-purpose processor core and one or more hardware accelerators. The computing device identifies an acceleration candidate in an application that is targeted to the processor core. The acceleration candidate may be a long-running computation of the application. The computing device translates the acceleration candidate into a translated executable targeted to the hardware accelerator. The computing device determines whether to offload execution of the acceleration candidate and, if so, executes the translated executable with the hardware accelerator. The computing device may translate the acceleration candidate into multiple translated executables, each targeted to a different hardware accelerator. The computing device may select among the translated executables in response to determining to offload execution.
    Type: Grant
    Filed: December 6, 2016
    Date of Patent: August 11, 2020
    Assignee: Intel Corporation
    Inventors: Jayaram Bobba, Niranjan K. Soundararajan
  • Patent number: 10740097
    Abstract: Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.
    Type: Grant
    Filed: May 20, 2016
    Date of Patent: August 11, 2020
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Philip Heidelberger, Robert M. Senger, Valentina Salapura, Burkhard Steinmacher-Burow, Yutaka Sugawara, Todd E. Takken
  • Patent number: 10664310
    Abstract: A method of configuring a System on Chip (SoC) to execute a CNN process comprising CNN layers, the method comprising, for each schedule: determining memory access amount information describing how many memory accesses are required; expressing the memory access amount information as relationships describing reusability of data; combining the relationships with a cost of writing to and reading from external memory, to form memory access information; determining a memory allocation for on-chip memory of the SoC for the input feature maps (FMs) and the output FMs; and determining, dependent upon the memory access information and the memory allocation for each schedule, a schedule which minimises the external memory access for the CNN layer of the CNN process, and a memory allocation associated with the determined schedule.
    Type: Grant
    Filed: December 14, 2018
    Date of Patent: May 26, 2020
    Assignee: Canon Kabushiki Kaisha
    Inventors: Haseeb Bokhari, Jorgen Peddersen, Sridevan Parameswaran, Iftekhar Ahmed, Yusuke Yachide
  • Patent number: 10635445
    Abstract: An apparatus and method of operating an apparatus are disclosed. The apparatus has a program counter permitted range storage element defining a permitted range of program counter values for the sequence of instructions it executes. Branch prediction circuitry predicts target instruction addresses for branch instructions. In response to a program counter modifying event, a program counter speculative range storage element is updated corresponding to each speculatively executed instruction after a branch instruction. Program counter permitted range verification circuitry is responsive to resolution of a modification of the program counter permitted range indication resulting from the program counter modifying event to determine whether the speculatively executed program counter range satisfies the permitted range of program counter values. A branch mis-prediction mechanism may support the response of the apparatus if the permitted range of program counter values is violated.
    Type: Grant
    Filed: May 29, 2018
    Date of Patent: April 28, 2020
    Assignee: Arm Limited
    Inventors: Rémi Marius Teyssier, Albin Pierrick Tonnerre, Cédric Denis Robert Airaud, Luca Nassi, Guillaume Bolbenes, Francois Donati, Lee Evan Eisen, Pasquale Ranone
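The verification step boils down to a range check over the speculatively executed program counter values, with a violation handled via the existing mis-prediction path. A minimal sketch under that reading:

```python
def verify_speculative_range(permitted, speculative_pcs):
    """After the permitted-range update resolves, check every speculatively
    executed PC; any violation is handled like a branch mis-prediction."""
    lo, hi = permitted
    spec_lo, spec_hi = min(speculative_pcs), max(speculative_pcs)
    if spec_lo < lo or spec_hi > hi:
        return "flush-and-refetch"   # reuse the mis-prediction mechanism
    return "commit"

print(verify_speculative_range((0x1000, 0x2000), [0x1004, 0x1008]))  # commit
print(verify_speculative_range((0x1000, 0x2000), [0x1004, 0x2400]))  # flush
```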