Coprocessor Patents (Class 712/34)
  • Patent number: 12248395
    Abstract: A data storage device and method are provided for predictable low latency in a time-sensitive environment. In one embodiment, a data storage device is provided comprising a memory and a controller configured to communicate with the memory. The controller is further configured to: receive, from a host, an indication of a logical block address range that the host will later read; and in response to receiving the indication: read data from the logical block address range; and perform an action on the data to reduce a read latency when the host later reads the logical block address range. Other embodiments are disclosed.
    Type: Grant
    Filed: July 26, 2023
    Date of Patent: March 11, 2025
    Assignee: Sandisk Technologies, Inc.
    Inventors: Devika Nair, Amit Sharma
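A minimal sketch of the read-ahead idea in the abstract above: the host hints at an LBA range it will read later, and the controller pre-reads that range into a cache so the later read avoids media access. All names here (`Controller`, `prefetch_hint`, `read`) are illustrative assumptions, not the patent's actual interface.

```python
class Controller:
    def __init__(self, media):
        self.media = media          # dict: LBA -> data block (stands in for NAND)
        self.cache = {}             # prefetched blocks, served without media access
        self.media_reads = 0        # count of slow-path accesses

    def prefetch_hint(self, start_lba, count):
        """Host indicates an LBA range it will later read."""
        for lba in range(start_lba, start_lba + count):
            self.cache[lba] = self.media[lba]   # action taken to reduce read latency

    def read(self, lba):
        if lba in self.cache:       # low-latency path: no media access needed
            return self.cache[lba]
        self.media_reads += 1       # slow path: media access counted
        return self.media[lba]

ctrl = Controller({lba: f"block{lba}" for lba in range(16)})
ctrl.prefetch_hint(4, 4)            # host: "I will read LBAs 4..7 soon"
data = [ctrl.read(lba) for lba in range(4, 8)]
```

After the hint, the four reads are served entirely from the cache and `media_reads` stays at zero.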
  • Patent number: 12242653
    Abstract: Systems, apparatuses, and methods related to securing domain crossing using domain access tables are described. For example, a computer processor can have registers configured to store locations of domain access tables respectively for predefined, non-hierarchical domains. Each respective domain access table can be pre-associated with a respective domain and can have entries configured to identify entry points of the respective domain. The processor is configured to enforce domain crossing in instruction execution using the domain access tables and to prevent arbitrary and/or unauthorized domain crossing.
    Type: Grant
    Filed: October 27, 2021
    Date of Patent: March 4, 2025
    Assignee: Micron Technology, Inc.
    Inventor: Steven Jeffrey Wallach
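The domain-crossing enforcement described above can be sketched as a table lookup: each non-hierarchical domain has a pre-associated access table of valid entry points, and a crossing is permitted only to a listed entry point. The domain names and addresses below are invented for illustration.

```python
DOMAIN_ACCESS_TABLES = {
    "kernel": {0x1000, 0x1040},     # entry points callers may jump to
    "driver": {0x2000},
}

def cross_domain(target_domain, target_address):
    """Enforce domain crossing: only registered entry points are reachable."""
    table = DOMAIN_ACCESS_TABLES.get(target_domain)
    if table is None or target_address not in table:
        raise PermissionError("arbitrary/unauthorized domain crossing blocked")
    return ("enter", target_domain, target_address)

ok = cross_domain("kernel", 0x1000)     # registered entry point: allowed
try:
    cross_domain("kernel", 0x1234)      # not an entry point: rejected
    blocked = False
except PermissionError:
    blocked = True
```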
  • Patent number: 12222892
    Abstract: A system, and associated method, includes a plurality of data processing units, a target CPU, and an interconnect unit that is separate from the target CPU and configured to receive a data payload and a prefix that includes a sequentially ordered list of the processing units that will perform the data operations and the sets of parameters to be used by each of the processing units. Based on the sequentially ordered list, the interconnect unit sends the data payload to a first processing unit and receives back processed data, then sends the processed data to the subsequent processing unit and receives back further processed data, and so forth until all of the data operations have been performed by the processing units set forth in the sequentially ordered list.
    Type: Grant
    Filed: June 16, 2022
    Date of Patent: February 11, 2025
    Assignee: Eidetic Communications Inc.
    Inventors: Sean Gregory Gibb, Saeed Fouladi Fard
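The interconnect behavior above amounts to folding a payload through an ordered list of units, each with its own parameters. A toy model, with `scale` and `offset` standing in for hypothetical processing units:

```python
def scale(data, factor):
    return [x * factor for x in data]

def offset(data, amount):
    return [x + amount for x in data]

def interconnect_dispatch(payload, prefix):
    """prefix: sequentially ordered list of (unit, params) pairs."""
    data = payload
    for unit, params in prefix:
        data = unit(data, **params)     # send to unit, receive processed data back
    return data

result = interconnect_dispatch([1, 2, 3],
                               [(scale, {"factor": 10}),
                                (offset, {"amount": 1})])
```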
  • Patent number: 12223322
    Abstract: A method and apparatus for embedding a microprocessor in a programmable logic device (PLD), where the microprocessor has a logic unit that can operate in two modes. A first mode is a general purpose mode running at least one general purpose process related to the PLD, and a second mode is a fixed function mode emulating a fixed function for use by logic configured into a fabric of the PLD (fabric). A memory unit is coupled to the logic unit and to the fabric, and the fabric is operable for transferring signals with the logic unit in relation to the fixed function.
    Type: Grant
    Filed: June 28, 2022
    Date of Patent: February 11, 2025
    Assignee: Microchip Technology Inc.
    Inventors: Aaron Severance, Jonathan W. Greene, Joel Vandergriendt
  • Patent number: 12204931
    Abstract: Disclosed herein are systems and methods for restoring a process. An exemplary method may include detecting a crash of an operating system (OS) on a computing device; collecting a memory state of at least one page of physical memory of the OS on the computing device; generating a checkpoint file that includes information related to one or more processes from the collected memory state, wherein the information comprises a state for each of the one or more processes at a time of the crash; for each respective process of the one or more processes, creating, on the computing device or another computing device, a new process corresponding to the respective process; and restoring, based on the checkpoint file, a state of the respective process at the time of the crash such that the new process initiates execution from the restored state.
    Type: Grant
    Filed: December 8, 2021
    Date of Patent: January 21, 2025
    Assignee: Virtuozzo International GmbH
    Inventor: Vasily Averin
  • Patent number: 12197954
    Abstract: The present technology augments the GPU compute model to provide system-provided data marshalling characteristics of graphics pipelining to increase efficiency and reduce overhead. A simple scheduling model based on scalar counters (e.g., semaphores) abstracts the availability of hardware resources. Resource releases can be done programmatically, and a system scheduler only needs to track the states of such counters/semaphores to make work launch decisions. Semantics of the counters/semaphores are defined by an application, which can use the counters/semaphores to represent the availability of free space in a memory buffer, the amount of cache pressure induced by the data flow in the network, or the presence of work items to be processed.
    Type: Grant
    Filed: March 17, 2021
    Date of Patent: January 14, 2025
    Assignee: NVIDIA Corporation
    Inventors: Yury Uralsky, Henry Moreton, Matthijs de Smedt, Lei Yang
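The counter-based launch decision described above can be reduced to a loop over application-defined counters: here, a hypothetical pair tracking free buffer space and pending work items. This is an assumed simplification for illustration, not NVIDIA's implementation.

```python
counters = {"free_space": 2, "work_items": 3}   # semantics defined by the app

def can_launch():
    # the scheduler's decision uses only scalar counter state
    return counters["free_space"] > 0 and counters["work_items"] > 0

launched = 0
while can_launch():
    counters["free_space"] -= 1     # consume a buffer slot
    counters["work_items"] -= 1     # consume a work item
    launched += 1
counters["free_space"] += 1         # programmatic resource release
```

With two free slots and three work items, exactly two launches occur before the free-space counter gates further work; the release then makes one slot available again.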
  • Patent number: 12197321
    Abstract: A storage device includes a controller and nonvolatile memories. The controller receives write commands having virtual stream identifiers (IDs), receives discard commands having the virtual stream IDs, and determines a lifetime of write data to which each of the virtual stream IDs is assigned. The nonvolatile memories are accessed by the controller depending on physical stream IDs. The controller maps the virtual stream IDs and the physical stream IDs based on the lifetime of the write data.
    Type: Grant
    Filed: December 2, 2022
    Date of Patent: January 14, 2025
    Assignees: Samsung Electronics Co., Ltd., Research & Business Foundation Sungkyunkwan University
    Inventors: Hwanjin Yong, Jin-Soo Kim
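One plausible reading of the mapping step above: bucket virtual stream IDs by their estimated write-data lifetime onto the smaller set of physical stream IDs the memories expose. The bucketing policy below (sort by lifetime, split into equal groups) is an assumption for the sketch.

```python
def map_streams(lifetimes, physical_ids):
    """lifetimes: virtual_id -> estimated lifetime of its write data."""
    ordered = sorted(lifetimes, key=lifetimes.get)      # short-lived first
    mapping = {}
    per_bucket = -(-len(ordered) // len(physical_ids))  # ceiling division
    for i, vid in enumerate(ordered):
        mapping[vid] = physical_ids[i // per_bucket]    # similar lifetimes share a stream
    return mapping

lifetimes = {"v0": 5, "v1": 900, "v2": 12, "v3": 1000}  # e.g. from write/discard intervals
mapping = map_streams(lifetimes, physical_ids=["p0", "p1"])
```

Grouping data with similar lifetimes onto the same physical stream tends to reduce garbage-collection cost, since blocks of a stream become invalid at roughly the same time.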
  • Patent number: 12174911
    Abstract: An apparatus and method for complex matrix multiplication. For example, one embodiment of a processor comprises: a decoder to decode a first complex matrix multiplication instruction; execution circuitry to execute the first complex matrix multiplication instruction, the execution circuitry comprising parallel multiplication circuitry to multiply real values from the first plurality of real and imaginary values with corresponding real values from the second plurality of real and imaginary values to generate a first plurality of real products, to multiply imaginary values from the first plurality of real and imaginary values with corresponding imaginary values from the second plurality of real and imaginary values to generate a second plurality of real products; and addition/subtraction circuitry to subtract each real product in the second plurality of real products from a corresponding real product in the first plurality of real products to produce a corresponding real value in the result matrix.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: December 24, 2024
    Assignee: Intel Corporation
    Inventors: Menachem Adelman, Robert Valentine, Daniel Towner, Amit Gradstein, Mark Jay Charney
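The execution circuitry above computes the real part of a complex product from two parallel sets of real products, exploiting Re((a+bi)(c+di)) = a*c - b*d. A pure-Python stand-in for the circuitry, with matrices of (real, imag) tuples:

```python
def real_part_matmul(A, B):
    """A: n x k, B: k x m, entries are (real, imag) tuples; returns real-valued matrix."""
    n, k, m = len(A), len(B), len(B[0])
    result = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            real_products = [A[i][p][0] * B[p][j][0] for p in range(k)]  # a*c terms
            imag_products = [A[i][p][1] * B[p][j][1] for p in range(k)]  # b*d terms
            # subtract each imaginary product from the corresponding real product
            result[i][j] = sum(r - s for r, s in zip(real_products, imag_products))
    return result

A = [[(1, 2), (3, 4)]]          # 1x2 matrix: 1+2i, 3+4i
B = [[(5, 6)], [(7, 8)]]        # 2x1 matrix: 5+6i, 7+8i
R = real_part_matmul(A, B)      # Re((1+2i)(5+6i)) + Re((3+4i)(7+8i)) = -7 + -11
```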
  • Patent number: 12165030
    Abstract: A system and method include an accelerator circuit comprising an input circuit block, a filter circuit block, a post-processing circuit block, and an output circuit block, and a processor to initialize the accelerator circuit, determine tasks of a neural network application to be performed by at least one of the input circuit block, the filter circuit block, the post-processing circuit block, or the output circuit block, assign each of the tasks to a corresponding one of the input circuit block, the filter circuit block, the post-processing circuit block, or the output circuit block, instruct the accelerator circuit to perform the tasks, and execute the neural network application based on results received from the accelerator circuit upon completion of the tasks.
    Type: Grant
    Filed: June 18, 2021
    Date of Patent: December 10, 2024
    Inventors: Mayan Moudgill, John Glossner
  • Patent number: 12160369
    Abstract: A compute device can access local or remote accelerator devices for use in processing a received packet. The received packet can be processed by any combination of local accelerator devices and remote accelerator devices. In some cases, the received packet can be encapsulated in an encapsulating packet and sent to a remote accelerator device for processing. The encapsulating packet can indicate a priority level for processing the received packet and its associated processing task. The priority level can override a priority level that would otherwise be assigned to the received packet and its associated processing task. The remote accelerator device can specify a fullness of an input queue to the compute device. Other information can be conveyed by packets transmitted between and among compute devices and remote accelerator devices to assist in determining an accelerator to use or other uses.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: December 3, 2024
    Assignee: Intel Corporation
    Inventors: Chih-Jen Chang, Daniel Christian Biederman, Matthew James Webb, Wing Cheung, Jose Niell, Robert Hathaway
  • Patent number: 12147813
    Abstract: A method for handling an exception or interrupt in a heterogeneous instruction set architecture is provided. A physical host to which the method is applied can support two instruction set architectures. When a secondary architecture virtual machine triggers an exception or interrupt, a virtual machine monitor may translate code of the exception or interrupt in a secondary instruction set architecture into code of the exception or interrupt in a primary instruction set architecture. The virtual machine monitor may identify the code of the exception or interrupt in the primary instruction set architecture. The virtual machine monitor identifies, based on the translated code, a type of the exception or interrupt triggered by the secondary architecture virtual machine, to handle the exception or interrupt.
    Type: Grant
    Filed: December 12, 2022
    Date of Patent: November 19, 2024
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Yifei Jiang, Siqi Zhao, Bo Wan
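The translation step above is essentially a table from secondary-ISA exception codes to primary-ISA codes, after which the primary ISA's handler classification applies. The code values and handler names below are invented for the sketch, not taken from any real ISA.

```python
SECONDARY_TO_PRIMARY = {      # secondary-ISA exception code -> primary-ISA code
    0x04: 0x0E,               # e.g. page fault
    0x07: 0x06,               # e.g. illegal instruction
}

PRIMARY_HANDLERS = {          # primary-ISA code -> handler (names are hypothetical)
    0x0E: "page_fault_handler",
    0x06: "illegal_instruction_handler",
}

def handle_secondary_exception(secondary_code):
    """Monitor's path: translate the code, then identify and dispatch the type."""
    primary_code = SECONDARY_TO_PRIMARY[secondary_code]
    return PRIMARY_HANDLERS[primary_code]

handler = handle_secondary_exception(0x04)
```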
  • Patent number: 12136411
    Abstract: A technique for training a model is disclosed. A training sample including an input sequence of observations and a target sequence of symbols having a length different from that of the input sequence of observations is obtained. The input sequence of observations is fed into the model to obtain a sequence of predictions. The sequence of predictions is shifted by an amount with respect to the input sequence of observations. The model is updated based on a loss using the shifted sequence of predictions and the target sequence of symbols.
    Type: Grant
    Filed: April 3, 2020
    Date of Patent: November 5, 2024
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Gakuto Kurata, Kartik Audhkhasi
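The shift-then-score step above can be illustrated with a toy alignment: drop the first `shift` predictions, then compare the remainder against the shorter target sequence. The squared-error loss, the shift value, and the 0/1 "predictions" are all assumptions for illustration.

```python
def shifted_loss(predictions, targets, shift):
    """Align predictions[shift:] with targets and compute a squared-error loss."""
    aligned = predictions[shift:shift + len(targets)]
    return sum((p - t) ** 2 for p, t in zip(aligned, targets))

predictions = [9, 9, 1, 0, 1]   # first `shift` outputs are ignored by the loss
targets = [1, 0, 1]
loss = shifted_loss(predictions, targets, shift=2)   # perfect match after the shift
```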
  • Patent number: 12093801
    Abstract: Systems and methods for providing executable instructions to a neural network processor are provided. In one example, a system comprises a database that stores a plurality of executable instructions and a plurality of subgraph identifiers, each subgraph identifier of the plurality of subgraph identifiers being associated with a subset of instructions of the plurality of executable instructions.
    Type: Grant
    Filed: May 3, 2023
    Date of Patent: September 17, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Richard John Heaton, Randy Renfu Huang, Ron Diamant
  • Patent number: 12013804
    Abstract: An integrated circuit, and a data processing device and method are provided. The integrated circuit includes a processor circuit and an accelerator circuit. The processor circuit includes a processor, a first data storage section, and a first data input/output interface. The accelerator circuit includes an accelerator and a second data input/output interface. The second data input/output interface is electrically connected to the first data input/output interface, so that the accelerator circuit can perform information interaction with the first data storage section.
    Type: Grant
    Filed: May 5, 2022
    Date of Patent: June 18, 2024
    Assignee: Lemon Inc.
    Inventors: Yimin Chen, Shan Lu, Junmou Zhang, Chuang Zhang, Yuanlin Cheng, Jian Wang
  • Patent number: 12001845
    Abstract: An apparatus comprises first instruction execution circuitry, second instruction execution circuitry, and a decoupled access buffer. Instructions of an ordered sequence of instructions are issued to one of the first and second instruction execution circuitry for execution in dependence on whether the instruction has a first type label or a second type label. An instruction with the first type label is an access-related instruction which determines at least one characteristic of a load operation to retrieve a data value from a memory address. Instruction execution by the first instruction execution circuitry of instructions having the first type label is prioritised over instruction execution by the second instruction execution circuitry of instructions having the second type label. Data values retrieved from memory as a result of execution of the first type instructions are stored in the decoupled access buffer.
    Type: Grant
    Filed: October 15, 2020
    Date of Patent: June 4, 2024
    Assignee: Arm Limited
    Inventors: Mbou Eyole, Stefanos Kaxiras
  • Patent number: 11947804
    Abstract: A system includes hardware circuitry having a device coupled with one or more external memory devices. The device is to detect an input/output (I/O) request associated with an external memory device of the one or more external memory devices. The device is to record a first timestamp in response to detecting the I/O request transmitted to the external memory device. The device is further to detect an indication from the external memory device of a completion of the I/O request associated with the external memory device and record a second timestamp in response to detecting the indication. The device is also to determine a latency associated with the I/O request based on the first timestamp and the second timestamp.
    Type: Grant
    Filed: April 6, 2022
    Date of Patent: April 2, 2024
    Assignee: NVIDIA Corporation
    Inventors: Shridhar Rasal, Oren Duer, Aviv Kfir, Liron Mula
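The two-timestamp measurement above reduces to a simple pattern; here `time.monotonic` stands in for the device's timestamp source, and the sleeping lambda stands in for an I/O request running to completion.

```python
import time

def measure_io_latency(issue_io):
    t_issue = time.monotonic()      # first timestamp: request detected
    issue_io()                      # request runs to completion
    t_done = time.monotonic()       # second timestamp: completion detected
    return t_done - t_issue

latency = measure_io_latency(lambda: time.sleep(0.01))
```

A monotonic clock matters here: a wall clock can jump (NTP adjustments), which would corrupt the difference.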
  • Patent number: 11934295
    Abstract: The present disclosure provides for synchronization of multi-core systems by monitoring a plurality of debug trace data streams for a redundantly operating system including a corresponding plurality of cores performing a task in parallel; in response to detecting a state difference on one debug trace data stream of the plurality of debug trace data streams relative to other debug trace data streams of the plurality of debug trace data streams: marking a given core associated with the one debug trace data stream as an affected core; and restarting the affected core.
    Type: Grant
    Filed: November 9, 2021
    Date of Patent: March 19, 2024
    Assignee: THE BOEING COMPANY
    Inventors: David P. Haldeman, Eric J. Miller
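A sketch of the divergence check above: compare per-core debug trace streams entry by entry, and mark any core whose state differs from the others for restart. Majority voting across three redundant cores is an assumption for the sketch; the abstract only requires detecting a difference relative to the other streams.

```python
from collections import Counter

def find_affected_cores(trace_streams):
    """trace_streams: list of equal-length trace state lists, one per core."""
    affected = set()
    for step in zip(*trace_streams):
        majority, _ = Counter(step).most_common(1)[0]
        for core, state in enumerate(step):
            if state != majority:           # state difference on this stream
                affected.add(core)          # mark the core for restart
    return affected

streams = [["a", "b", "c"],
           ["a", "b", "c"],
           ["a", "x", "c"]]                 # core 2 diverges at step 1
affected = find_affected_cores(streams)
```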
  • Patent number: 11928471
    Abstract: Embodiments for a metadata predictor. An index pipeline generates indices in an index buffer, and the indices are used for reading out a memory device. A prediction cache is populated with metadata of instructions read from the memory device. A prediction pipeline generates a prediction using the metadata of the instructions from the prediction cache, the populating of the prediction cache with the metadata of the instructions being performed asynchronously with the operation of the prediction pipeline.
    Type: Grant
    Filed: August 19, 2021
    Date of Patent: March 12, 2024
    Assignee: International Business Machines Corporation
    Inventors: Edward Thomas Malley, Adam Benjamin Collura, Brian Robert Prasky, James Bonanno, Dominic Ditomaso
  • Patent number: 11915015
    Abstract: Systems and methods provide isolated workspaces operating on an IHS (Information Handling System) with use of pre-boot resources of the IHS that are not directly accessible by the workspaces. Upon notification of a workspace initialization, a segregated variable space, such as a segregated memory utilized by a UEFI (Unified Extensible Firmware Interface) of the IHS, is specified for use by the workspace. The segregated variable space is initialized and populated with pre-boot variables, such as UEFI variables, that are allowed for configuration by the workspace. Upon a workspace issuing a request to configure a pre-boot variable, the segregated variable space that was mapped for use by the workspace is identified. The requested pre-boot variable configuration is allowed based on whether the pre-boot variable is populated in the segregated variable space. When the requested pre-boot variable configuration is allowed, the pre-boot variable is configured on behalf of the workspace.
    Type: Grant
    Filed: August 27, 2021
    Date of Patent: February 27, 2024
    Assignee: Dell Products, L.P.
    Inventors: Balasingh P. Samuel, Vivek Viswanathan Iyer
  • Patent number: 11899613
    Abstract: A packaging technology to improve performance of an AI processing system resulting in an ultra-high bandwidth system. An IC package is provided which comprises: a substrate; a first die on the substrate, and a second die stacked over the first die. The first die can be a first logic die (e.g., a compute chip, CPU, GPU, etc.) while the second die can be a compute chiplet comprising ferroelectric or paraelectric logic. Both dies can include ferroelectric or paraelectric logic. The ferroelectric/paraelectric logic may include AND gates, OR gates, complex gates, majority, minority, and/or threshold gates, sequential logic, etc. The IC package can be in a 3D or 2.5D configuration that implements a logic-on-logic stacking configuration. The 3D or 2.5D packaging configurations have chips or chiplets designed to have time distributed or spatially distributed processing. The logic of chips or chiplets is segregated so that only one chip in a 3D or 2.5D stacking arrangement is hot at a time.
    Type: Grant
    Filed: August 20, 2021
    Date of Patent: February 13, 2024
    Assignee: KEPLER COMPUTING INC.
    Inventors: Amrita Mathuriya, Christopher B. Wilkerson, Rajeev Kumar Dokania, Debo Olaosebikan, Sasikanth Manipatruni
  • Patent number: 11868780
    Abstract: An electronic device that includes a central processor and a coprocessor coupled to the central processor. The central processor includes a plurality of registers and is configured to decode a first set of instructions. The first set of instructions includes a command instruction and an identity of a destination register. The coprocessor is configured to receive the command instruction from the central processor, execute the command instruction, and write a result of the command instruction in the destination register. The central processor is further configured to set a register tag for the destination register at the time the central processor decodes the first set of instructions and to clear the register tag at the time the result is written in the destination register.
    Type: Grant
    Filed: August 26, 2021
    Date of Patent: January 9, 2024
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Christian Wiencke, Armin Stingl, Jeroen Vliegen
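The register-tag mechanism above is a small scoreboard: the CPU sets a tag on the destination register when it decodes the coprocessor command, and the tag is cleared when the coprocessor writes the result back; a read of a still-tagged register must stall. The class and method names are invented for this sketch.

```python
class RegisterFile:
    def __init__(self):
        self.regs = {}
        self.tags = set()           # registers whose results are still in flight

    def decode_coprocessor_op(self, dest):
        self.tags.add(dest)         # tag set at decode time

    def coprocessor_writeback(self, dest, value):
        self.regs[dest] = value
        self.tags.discard(dest)     # tag cleared when the result is written

    def read(self, reg):
        if reg in self.tags:
            raise RuntimeError("result pending: stall required")
        return self.regs[reg]

rf = RegisterFile()
rf.decode_coprocessor_op("r3")      # CPU decodes command targeting r3
try:
    rf.read("r3")                   # would return stale data, so it is blocked
    stalled = False
except RuntimeError:
    stalled = True
rf.coprocessor_writeback("r3", 42)  # coprocessor finishes and writes the result
value = rf.read("r3")               # now safe to read
```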
  • Patent number: 11809869
    Abstract: Embodiments detailed herein relate to systems and methods to store a tile register pair to memory. In one example, a processor includes: decode circuitry to decode a store matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded store matrix pair instruction to store every element of left and right tiles of the identified source matrix to corresponding element positions of left and right tiles of the identified destination matrix, respectively, wherein the executing stores a chunk of C elements of one row of the identified source matrix at a time.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: November 7, 2023
    Assignee: Intel Corporation
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
  • Patent number: 11797473
    Abstract: An accelerated processor structure on a programmable integrated circuit device includes a processor and a plurality of configurable digital signal processors (DSPs). Each configurable DSP includes a circuit block, which in turn includes a plurality of multipliers. The accelerated processor structure further includes a first bus to transfer data from the processor to the configurable DSPs, and a second bus to transfer data from the configurable DSPs to the processor.
    Type: Grant
    Filed: October 8, 2018
    Date of Patent: October 24, 2023
    Assignee: Altera Corporation
    Inventors: David Shippy, Martin Langhammer, Jeffrey Eastlack
  • Patent number: 11797856
    Abstract: Presented herein are framework embodiments that allow the representation of complex systems and processes that are suitable for resource-efficient machine learning and inference. Furthermore, disclosed are new reinforcement learning techniques that are capable of learning to plan and optimize dynamic and nuanced systems and processes. Different embodiments comprising combinations of one or more neural networks, reinforcement learning, and linear programming are discussed to learn representations and models, even for complex systems and methods. Furthermore, the introduction of neural field embodiments and methods to compute a Deep Argmax, as well as to invert neural networks and neural fields with linear programming, provides the ability to create and train models that are accurate and very resource efficient, using less memory, less computation, less time, and, as a result, less energy.
    Type: Grant
    Filed: June 11, 2020
    Date of Patent: October 24, 2023
    Assignee: System AI, Inc.
    Inventor: Tuna Oezer
  • Patent number: 11782722
    Abstract: A complex computing device, a complex computing method, an artificial intelligence chip and an electronic apparatus are provided. An input interface receives complex computing instructions and arbitrates each complex computing instruction to a corresponding computing component, according to the computing type in each complex computing instruction. Each computing component is connected to the input interface, acquires a source operand from a complex computing instruction to perform complex computing, and generates a computing result instruction to feed back to an output interface. The output interface arbitrates the computing result in each computing result instruction to the corresponding instruction source, according to the instruction source identifier in each computing result instruction.
    Type: Grant
    Filed: January 14, 2021
    Date of Patent: October 10, 2023
    Assignees: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD., KUNLUNXIN TECHNOLOGY (BEIJING) COMPANY LIMITED
    Inventors: Baofu Zhao, Xueliang Du, Kang An, Yingnan Xu, Chao Tang
  • Patent number: 11768689
    Abstract: The present application discloses a computing device that can provide a low-power, highly capable computing platform for computational imaging. The computing device can include one or more processing units, for example one or more vector processors and one or more hardware accelerators, an intelligent memory fabric, a peripheral device, and a power management module. The computing device can communicate with external devices, such as one or more image sensors, an accelerometer, a gyroscope, or any other suitable sensor devices.
    Type: Grant
    Filed: November 12, 2021
    Date of Patent: September 26, 2023
    Assignee: Movidius Limited
    Inventors: Brendan Barry, Richard Richmond, Fergal Connor, David Moloney
  • Patent number: 11726701
    Abstract: A memory expander includes a memory device that stores a plurality of task data. A controller controls the memory device. The controller receives metadata and a management request from an external central processing unit (CPU) through a compute express link (CXL) interface and operates in a management mode in response to the management request. In the management mode, the controller receives a read request and a first address from an accelerator through the CXL interface and transmits one of the plurality of task data to the accelerator based on the metadata in response to the read request.
    Type: Grant
    Filed: October 25, 2021
    Date of Patent: August 15, 2023
    Inventors: Chon Yong Lee, Jae-Gon Lee, Kyunghan Lee
  • Patent number: 11720475
    Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums that use parallel hardware execution with software co-simulation to enable more advanced debugging operations on data flow architectures. Upon a halt to execution of a program thread, the state of the tiles that are executing the thread is saved and offloaded from the HTF to a host system. A developer may then examine this state on the host system to debug their program. Additionally, the state may be loaded into a software simulator that simulates the HTF hardware. This simulator allows the developer to step through the code and to examine values to find bugs.
    Type: Grant
    Filed: November 21, 2022
    Date of Patent: August 8, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Skyler Arron Windh, Tony M. Brewer, Patrick Estep
  • Patent number: 11714992
    Abstract: Systems and methods for providing executable instructions to a neural network processor are provided. In one example, a system comprises a database that stores a plurality of executable instructions and a plurality of subgraph identifiers, each subgraph identifier of the plurality of subgraph identifiers being associated with a subset of instructions of the plurality of executable instructions.
    Type: Grant
    Filed: December 13, 2018
    Date of Patent: August 1, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Richard John Heaton, Randy Renfu Huang, Ron Diamant
  • Patent number: 11714649
    Abstract: A RISC-V-based 3D interconnected multi-core processor architecture and a working method thereof. The RISC-V-based 3D interconnected multi-core processor architecture includes a main control layer, a micro core array layer and an accelerator layer, wherein the main control layer includes a plurality of main cores which are RISC-V instruction set CPU cores, and the micro core array layer includes a plurality of micro unit groups, each including a micro core, a data storage unit, an instruction storage unit and a linking controller, wherein the micro core is a RISC-V instruction set CPU core that executes partial functions of the main core; the accelerator layer is configured to optimize running speed or space utilization for accelerators meeting specific requirements, wherein some main cores in the main control layer perform data interaction with the accelerator layer, and the other main cores interact with the micro core array layer.
    Type: Grant
    Filed: December 1, 2021
    Date of Patent: August 1, 2023
    Assignee: SHANDONG LINGNENG ELECTRONIC TECHNOLOGY CO., LTD.
    Inventors: Gang Wang, Jinzheng Mou, Yang An, Moujun Xie, Benyang Wu, Zesheng Zhang, Wenyong Hou, Yongwei Wang, Zixuan Qiu, Xintan Li
  • Patent number: 11682109
    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for configurable aprons for expanded binning. Aspects of the present disclosure include identifying one or more pixel tiles in at least one bin and determining edge information for each pixel tile of the one or more pixel tiles. The edge information may be associated with one or more pixels adjacent to each pixel tile. The present disclosure further describes determining whether at least one adjacent bin is visible based on the edge information for each pixel tile, where the at least one adjacent bin may be adjacent to the at least one bin.
    Type: Grant
    Filed: October 16, 2020
    Date of Patent: June 20, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Kalyan Kumar Bhiravabhatla, Krishnaiah Gummidipudi, Ankit Kumar Singh, Andrew Evan Gruber, Pavan Kumar Akkaraju, Srihari Babu Alla, Jonnala Gadda Nagendra Kumar, Vishwanath Shashikant Nikam
  • Patent number: 11663001
    Abstract: Systems, apparatuses, and methods for implementing a family of lossy sparse load single instruction, multiple data (SIMD) instructions are disclosed. A lossy sparse load unit (LSLU) loads a plurality of values from one or more input vector operands and determines how many non-zero values are included in one or more input vector operands of a given instruction. If the one or more input vector operands have less than a threshold number of non-zero values, then the LSLU causes an instruction for processing the one or more input vector operands to be skipped. In this case, the processing of the instruction of the one or more input vector operands is deemed to be redundant. If the one or more input vector operands have greater than or equal to the threshold number of non-zero values, then the LSLU causes an instruction for processing the input vector operand(s) to be executed.
    Type: Grant
    Filed: November 19, 2018
    Date of Patent: May 30, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sanchari Sen, Derrick Allen Aguren, Joseph Lee Greathouse
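The skip decision above can be modeled directly: count the non-zero elements across the input vector operands, skip the dependent instruction as redundant when the count falls below the threshold, and execute it otherwise. The threshold, operand values, and dot-product "instruction" below are illustrative.

```python
def lossy_sparse_execute(operands, threshold, op):
    """Count non-zeros across operand vectors; skip op below the threshold."""
    nonzeros = sum(1 for vec in operands for x in vec if x != 0)
    if nonzeros < threshold:
        return None                 # instruction skipped (deemed redundant)
    return op(operands)             # instruction executed normally

dot = lambda ops: sum(a * b for a, b in zip(*ops))
skipped = lossy_sparse_execute([[0, 0, 1], [2, 0, 0]], threshold=3, op=dot)  # 2 non-zeros
ran = lossy_sparse_execute([[1, 2, 3], [4, 5, 6]], threshold=3, op=dot)      # 6 non-zeros
```

The scheme is "lossy" because a skipped instruction may not have been exactly zero-valued work; the application accepts that approximation in exchange for skipping mostly-zero computations.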
  • Patent number: 11507493
    Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums that use parallel hardware execution with software co-simulation to enable more advanced debugging operations on data flow architectures. Upon a halt to execution of a program thread, the state of the tiles that are executing the thread is saved and offloaded from the HTF to a host system. A developer may then examine this state on the host system to debug their program. Additionally, the state may be loaded into a software simulator that simulates the HTF hardware. This simulator allows the developer to step through the code and to examine values to find bugs.
    Type: Grant
    Filed: August 18, 2021
    Date of Patent: November 22, 2022
    Assignee: Micron Technology, Inc.
    Inventors: Skyler Arron Windh, Tony M. Brewer, Patrick Estep
  • Patent number: 11429855
    Abstract: A method for accelerating a neural network includes identifying neural network layers that meet a locality constraint. Code is generated to implement depth-first processing for different hardware based on the identified neural network layers. The generated code is then used to perform the depth-first processing on the neural network.
    Type: Grant
    Filed: February 6, 2018
    Date of Patent: August 30, 2022
    Assignee: NEC CORPORATION
    Inventors: Nicolas Weber, Felipe Huici, Mathias Niepert
  • Patent number: 11422815
    Abstract: Binary translation may be performed by a field programmable gate array (FPGA) integrated with a processor as a single integrated circuit. The FPGA contains multiple blocks of logic for performing different binary translations. The processor may offload the binary translation to the FPGA. The FPGA may use historical logging to skip the binary translation of source instructions that have been previously translated into target instructions.
    Type: Grant
    Filed: March 1, 2018
    Date of Patent: August 23, 2022
    Assignee: Dell Products L.P.
    Inventors: Mukund P. Khatri, Ramesh Radhakrishnan
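The historical-logging idea can be modeled as a translation cache: blocks translated once are looked up rather than re-translated. The FPGA/processor split is simulated in software here, and all names are hypothetical.

```python
class BinaryTranslator:
    """Caches source-to-target translations so previously seen source
    blocks skip the (expensive) translation step."""
    def __init__(self):
        self.history = {}        # source block -> translated target block
        self.translations = 0    # how many real translations ran

    def translate_block(self, source):
        if source in self.history:      # historical log hit: skip translation
            return self.history[source]
        self.translations += 1          # stands in for the FPGA doing work
        target = tuple("t_" + op for op in source)
        self.history[source] = target
        return target

bt = BinaryTranslator()
first = bt.translate_block(("load", "add"))
second = bt.translate_block(("load", "add"))   # served from history
```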
  • Patent number: 11403250
    Abstract: Examples in this application disclose an operation accelerator, a switch, and a processing system. One example operation accelerator includes a shunt circuit directly connected to a first peripheral component interconnect express (PCIe) device through a PCIe link. The shunt circuit is configured to receive first data sent by the first PCIe device through the PCIe link, and transmit the first data through an internal bus. A first address carried in the first data is located in a first range. In some examples of this application, the first PCIe device directly communicates with the operation accelerator through the shunt circuit in the operation accelerator.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: August 2, 2022
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Chuanning Cheng, Shengyong Peng
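The shunt circuit's routing decision reduces to an address-range check: traffic whose address falls in the first range takes the internal bus, everything else takes the ordinary host path. The address window below is invented for the example.

```python
def route(address, first_range):
    """Return the path for a transaction: addresses inside the first
    range go over the internal bus; others take the host path."""
    lo, hi = first_range
    return "internal_bus" if lo <= address < hi else "host_path"

# hypothetical window assigned to the accelerator's shunt circuit
accelerator_range = (0x4000_0000, 0x5000_0000)
direct = route(0x4800_0000, accelerator_range)
indirect = route(0x0000_1000, accelerator_range)
```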
  • Patent number: 11392513
    Abstract: A graph-based data flow control system includes a control plane system coupled to SCP subsystems. The control plane system identifies a workload, and identifies service(s) on the SCP subsystems for manipulating/exchanging data to perform the workload. The control plane system generates a respective SCP-local data flow control graph for each SCP subsystem that defines how their service(s) will manipulate/exchange data within that SCP subsystem, and generates inter-SCP data flow control graph(s) that define how service(s) provided by at least one SCP subsystem will manipulate/exchange data with service(s) provided by at least one other SCP subsystem. The control plane system then transmits each respective SCP-local data flow control graph to each of the SCP subsystems, and the inter-SCP data flow control graph(s) to at least one SCP subsystem, for use by the SCP subsystems in causing their service(s) to manipulate/exchange data to perform the workload.
    Type: Grant
    Filed: October 15, 2020
    Date of Patent: July 19, 2022
    Assignee: Dell Products L.P.
    Inventors: Gaurav Chawla, Mark Steven Sanders, Elie Jreij, Jimmy D. Pike, Robert W. Hormuth, William Price Dawkins
  • Patent number: 11366662
    Abstract: A high-level synthesis multiprocessor system enables sophisticated algorithms to be easily realized with a nearly minimal circuit. A shared memory is divided into a plurality of banks. The memory banks are connected to processors, respectively. Each processor receives an instruction code and an operand from its connected memory bank. After the operation execution, the processor sends the result to its adjacent processor element to set it as the accumulator value at the time of execution of the next instruction. The software program to be executed is fixed. A processor to execute each instruction in the software program is uniquely identified. Each processor has a function for executing its instruction out of all executable instructions in the multiprocessor system, and does not have a function for executing an instruction that the processor is not to execute. This yields a circuit configuration with unused instructions deleted.
    Type: Grant
    Filed: August 22, 2018
    Date of Patent: June 21, 2022
    Assignee: El Amina Inc.
    Inventor: Hideki Tanuma
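A toy model of the one-processor-per-instruction structure: each processor element implements exactly one instruction of the fixed program and forwards its result as the next element's accumulator. Instruction names and the program are invented for the example.

```python
program = [("add", 3), ("mul", 2), ("sub", 1)]   # the fixed software program

def make_processor(op, operand):
    """Build an element that supports only its own instruction."""
    ops = {"add": lambda acc: acc + operand,
           "mul": lambda acc: acc * operand,
           "sub": lambda acc: acc - operand}
    return ops[op]

processors = [make_processor(op, arg) for op, arg in program]

acc = 5
for element in processors:   # accumulator flows element to element
    acc = element(acc)
```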
  • Patent number: 11354592
    Abstract: Systems and methods for intelligent computation acceleration transform to allow applications to be executed by accelerated processing units such as graphic processing units (GPUs) or field programmable gate arrays (FPGAs) are disclosed. In an embodiment, a computational profile is generated for an application based on execution metrics of the application for the CPU and the accelerated processing unit, and a genetic algorithm (GA) prediction model is applied to predict execution speedup on an accelerated processing unit for the application. In an embodiment, upon identification of speedup, computational steps are arbitrated among various processing units according to compute availability to achieve optimal completion time for the compute job.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: June 7, 2022
    Assignee: Morgan Stanley Services Group Inc.
    Inventors: Michael A. Dobrovolsky, Kwokhin Chu, Pankaj Parashar
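The arbitration step can be sketched as a greedy earliest-finish assignment: each computational step goes to whichever processing unit would complete it soonest, given per-unit speeds (standing in here for the GA model's speedup predictions). The cost model and names are assumptions.

```python
def arbitrate(steps, units):
    """steps: list of work amounts; units: {name: relative speed}.
    Greedily assign each step to the unit with the earliest finish."""
    free = {name: 0.0 for name in units}     # when each unit is next free
    plan = []
    for work in steps:
        name = min(units, key=lambda u: free[u] + work / units[u])
        free[name] += work / units[name]
        plan.append((name, free[name]))      # (chosen unit, its finish time)
    return plan, max(free.values())

plan, makespan = arbitrate([10, 10], {"cpu": 1.0, "gpu": 5.0})
```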
  • Patent number: 11354315
    Abstract: Method and apparatus for stress management in a searchable data service. The searchable data service may provide a searchable index to a backend data store, and an interface to build and query the searchable index, that enables client applications to search for and retrieve locators for stored entities in the backend data store. Embodiments of the searchable data service may implement a distributed stress management mechanism that may provide functionality including, but not limited to, the automated monitoring of critical resources, analysis of resource usage, and decisions on and performance of actions to keep resource usage within comfort zones. In one embodiment, in response to usage of a particular resource being detected as out of the comfort zone on a node, an action may be performed to transfer at least part of the resource usage for the local resource to another node that provides a similar resource.
    Type: Grant
    Filed: May 22, 2020
    Date of Patent: June 7, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Patrick W. Ransil, Aleksey Martynov, James Larson, James R. Collette, Robert Wai-Chi Chu, Partha Saha
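A minimal illustration of the comfort-zone idea: when a node's usage of a resource exceeds its comfort zone, the excess is transferred to a less-loaded peer providing the same resource. The threshold and transfer rule are invented, and the sketch assumes at least one peer sits below the zone.

```python
def rebalance(usage, comfort_hi=0.8):
    """usage: {node: utilization in [0, 1]}. Moves excess load from
    nodes above the comfort zone to the least-loaded peer.
    Returns the moves as (src, dst, amount) tuples."""
    moves = []
    for src, u in sorted(usage.items()):
        if u > comfort_hi:
            dst = min(usage, key=usage.get)      # least-loaded peer
            amount = u - comfort_hi
            usage[src] -= amount
            usage[dst] += amount
            moves.append((src, dst, round(amount, 2)))
    return moves

cluster = {"node-a": 0.9, "node-b": 0.3}
moves = rebalance(cluster)
```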
  • Patent number: 11321144
    Abstract: Apparatus and method for selectively saving and restoring execution state components in an inter-core work offload environment. For example, one embodiment of a processor comprises: a plurality of cores; an interconnect coupling the plurality of cores; and offload circuitry to transfer work from a first core of the plurality of cores to a second core of the plurality of cores without operating system (OS) intervention, wherein the second core is to reach a first execution state upon completing the offload work and to store results in a first memory location or register; the second core comprising: a decoder to decode a first instruction comprising at least one operand to identify one or more components of the first execution state; and execution circuitry to execute the first instruction to save the one or more components of the first execution state to a specified region in memory.
    Type: Grant
    Filed: June 29, 2019
    Date of Patent: May 3, 2022
    Assignee: INTEL CORPORATION
    Inventor: ElMoustapha Ould-Ahmed-Vall
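The selective-save instruction can be pictured as a bitmask operand naming which execution-state components to spill. The component names and mask layout below are illustrative assumptions.

```python
# hypothetical mapping from mask bits to execution-state components
COMPONENTS = {0: "gprs", 1: "vector_regs", 2: "flags", 3: "fp_state"}

def save_state(state, mask):
    """Save only the components whose bit is set in `mask`."""
    return {name: state[name]
            for bit, name in COMPONENTS.items() if mask & (1 << bit)}

state = {"gprs": [1, 2], "vector_regs": [0] * 4, "flags": 0b1010, "fp_state": []}
saved = save_state(state, mask=0b0101)   # bits 0 and 2: gprs + flags only
```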
  • Patent number: 11263014
    Abstract: Data processing apparatuses, methods of data processing, and non-transitory computer-readable media on which computer-readable code is stored defining logical configurations of processing devices are disclosed. In an apparatus, fetch circuitry retrieves a sequence of instructions and execution circuitry performs data processing operations with respect to data values in a set of registers. The apparatus also provides an auxiliary execution circuitry interface and a coprocessor interface that connects to a coprocessor outside the apparatus.
    Type: Grant
    Filed: August 5, 2019
    Date of Patent: March 1, 2022
    Assignee: Arm Limited
    Inventors: Frederic Claude Marie Piry, Thomas Christopher Grocutt, Simon John Craske, Carlo Dario Fanara, Jean Sébastien Leroy
  • Patent number: 11256516
    Abstract: A system comprising a data memory, a first processor with first execution pipeline, and a co-processor with second execution pipeline branching from the first pipeline via an inter-processor interface. The first pipeline can decode instructions from an instruction set comprising first and second instruction subsets. The first subset comprises a load instruction which loads data from the memory into a register file, and a compute instruction of a first type which performs a compute operation on such loaded data. The second subset includes a compute instruction of a second type which does not require a separate load instruction to first load data from memory into a register file, but instead reads data from the memory directly and performs a compute operation on that data, this reading being performed in a pipeline stage of the second pipeline that is aligned with the memory access stage of the first pipeline.
    Type: Grant
    Filed: December 17, 2018
    Date of Patent: February 22, 2022
    Assignee: XMOS LTD
    Inventors: Henk Lambertus Muller, Peter Hedinger
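The contrast between the two instruction subsets can be sketched as follows: a first-subset compute must be preceded by an explicit load into a register file, while a second-subset compute reads the data memory directly. The modeled ISA is invented purely for illustration.

```python
class Machine:
    """Toy machine contrasting load-then-compute with direct-from-memory
    compute (the second instruction subset described above)."""
    def __init__(self, mem):
        self.mem, self.regs = mem, {}

    def load(self, reg, addr):             # first subset: explicit load
        self.regs[reg] = self.mem[addr]

    def compute_type1(self, reg, k):       # operates on previously loaded data
        return self.regs[reg] * k

    def compute_type2(self, addr, k):      # second subset: reads memory directly
        return self.mem[addr] * k

m = Machine(mem=[10, 20, 30])
m.load("r0", 1)                            # extra step type-2 avoids
result1 = m.compute_type1("r0", 3)
result2 = m.compute_type2(1, 3)
```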
  • Patent number: 11250341
    Abstract: A system comprising a classical computing subsystem to perform classical operations in a three-dimensional (3D) classical space unit using decomposed stopping points along a consecutive sequence of stopping points of sub-cells, along a vector with a shortest path between two points of the 3D classical space unit. The system includes a quantum computing subsystem to perform quantum operations in a 3D quantum space unit using decomposed stopping points along a consecutive sequence of stopping points of sub-cells, along a vector selected to have a shortest path between two points of the 3D quantum space unit. The system includes a control subsystem to decompose classical subproblems and quantum subproblems into the decomposed points and provide computing instructions and state information to the classical computing subsystem to perform the classical operations and to the quantum computing subsystem to perform the quantum operations. A method and computer readable medium are provided.
    Type: Grant
    Filed: September 7, 2018
    Date of Patent: February 15, 2022
    Assignee: LOCKHEED MARTIN CORPORATION
    Inventors: Edward H. Allen, Luke A. Uribarri, Kristen L. Pudenz
  • Patent number: 11170025
    Abstract: A system for caching includes an interface to receive a portion of a hypercube to evaluate. The hypercube includes cells, a set of which have formulas. The system includes a processor to determine term(s) in the formula for each cell of the set of cells; remove from consideration a time dimension and/or a primary dimension for the term(s) in the formula for each cell of the set of cells; determine a set of distinct terms using the term(s); determine whether a total number of terms in the set of cells is larger than a number of distinct terms in the set of distinct terms; and in response to determining that the total number of terms in the set of cells is larger than the number of distinct terms in the set of distinct terms, indicate to cache the set of distinct terms during evaluation.
    Type: Grant
    Filed: April 29, 2019
    Date of Patent: November 9, 2021
    Assignee: Workday, Inc.
    Inventors: Ngoc Nguyen, Darren Kermit Lee, Shuyuan Chen, Ritu Jain, Francis Wang
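A rough model of the caching decision: collect the terms of each cell's formula, drop the time dimension from each term, and indicate caching only when the distinct-term count is smaller than the total term count. The term representation is an assumption for the example.

```python
def should_cache(cell_formulas):
    """cell_formulas: list of term lists, where a term is a
    (measure, time, region) tuple. Returns (cache?, distinct terms)."""
    total, distinct = 0, set()
    for terms in cell_formulas:
        for measure, _time, region in terms:   # remove the time dimension
            total += 1
            distinct.add((measure, region))
    return total > len(distinct), sorted(distinct)

cache, terms = should_cache([
    [("revenue", 2023, "EU"), ("cost", 2023, "EU")],
    [("revenue", 2024, "EU")],   # same distinct term once time is dropped
])
```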
  • Patent number: 11144290
    Abstract: A method includes analyzing a dataflow graph representing data dependencies between operators of a dataflow application to identify a plurality of candidate groups of the operators. Based on characteristics of a given hardware accelerator and the operators of a given candidate group of the plurality of candidate groups, determining whether the operators of the given candidate group are to be combined. In response to determining that the operators of the given candidate group are to be combined, retrieving executable binary code segments corresponding to the operators of the given candidate group, generating a unit of binary code including the executable binary code segments and metadata representing an execution control flow among the executable binary code segments, and dispatching the unit of code to the given hardware accelerator for execution of the unit of code.
    Type: Grant
    Filed: September 13, 2019
    Date of Patent: October 12, 2021
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Reza Azimi, Cheng Xiang Feng, Kai-Ting Amy Wang, Yaoqing Gao, Ye Tian, Xiang Wang
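The final packaging step can be sketched as concatenating the group's binary segments and prepending metadata that records the execution order and segment offsets. The data shapes below are hypothetical stand-ins for the unit of code the abstract describes.

```python
def build_unit(group, binaries):
    """group: ordered operator names; binaries: {name: code bytes}.
    Returns (metadata, combined code) for dispatch to the accelerator."""
    segments = [binaries[name] for name in group]
    offsets, pos = [], 0
    for seg in segments:          # control-flow metadata: order + offsets
        offsets.append(pos)
        pos += len(seg)
    metadata = {"order": list(group), "offsets": offsets}
    return metadata, b"".join(segments)

meta, blob = build_unit(["matmul", "relu"],
                        {"matmul": b"\x01\x02", "relu": b"\x03"})
```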
  • Patent number: 11093247
    Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: August 17, 2021
    Assignee: Intel Corporation
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
  • Patent number: 11010308
    Abstract: Embodiments of the present disclosure include a method for optimizing an internal memory for calculation of a convolutional layer of a convolutional neural network (CNN), the method including determining a computation cost of calculating the convolutional layer using each combination of a memory management scheme of a plurality of memory management schemes and data partition sizes of input feature map (IFM) data, kernel data, and output feature map (OFM) data to be loaded in the internal memory; identifying one combination of a memory management scheme and data partition sizes having a lowest computation cost for the convolutional layer; and implementing the CNN to use the one combination for calculation of the convolutional layer.
    Type: Grant
    Filed: May 31, 2019
    Date of Patent: May 18, 2021
    Assignee: LG ELECTRONICS INC.
    Inventors: Jaewon Kim, Thi Huong Giang Nguyen
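The described search can be illustrated as an exhaustive sweep over scheme/partition-size combinations that fit the internal memory, keeping the cheapest. The cost function below is invented purely to make the sweep runnable; it is not the patent's cost model.

```python
import itertools

def pick_combination(schemes, part_sizes, mem_bytes, cost_fn):
    """Try every (scheme, (ifm, kernel, ofm)) combination that fits in
    the internal memory and return the one with the lowest cost."""
    best, best_cost = None, float("inf")
    for scheme, (ifm, ker, ofm) in itertools.product(schemes, part_sizes):
        if ifm + ker + ofm > mem_bytes:   # must fit in internal memory
            continue
        c = cost_fn(scheme, ifm, ker, ofm)
        if c < best_cost:
            best, best_cost = (scheme, ifm, ker, ofm), c
    return best, best_cost

# toy cost: double buffering halves cost, larger partitions mean fewer reloads
cost = lambda s, i, k, o: (1000 // (i + k + o)) * (0.5 if s == "double" else 1.0)
best, best_cost = pick_combination(
    ["single", "double"], [(64, 16, 32), (128, 32, 64)],
    mem_bytes=256, cost_fn=cost)
```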
  • Patent number: 11004500
    Abstract: Disclosed herein are apparatuses and methods related to an artificial intelligence accelerator in memory. An apparatus can include a number of registers configured to enable the apparatus to operate in an artificial intelligence mode to perform artificial intelligence operations and an artificial intelligence (AI) accelerator configured to perform the artificial intelligence operations using data stored in a number of memory arrays. The AI accelerator can include hardware, software, and/or firmware that is configured to perform operations associated with AI operations. The hardware can include circuitry configured as an adder and/or multiplier to perform operations, such as logic operations, associated with AI operations.
    Type: Grant
    Filed: August 28, 2019
    Date of Patent: May 11, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Alberto Troia
  • Patent number: 10990398
    Abstract: Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction for execution on an instruction execution pipeline, beginning execution of the first instruction, receiving one or more second instructions for execution on the instruction execution pipeline, the one or more second instructions associated with a higher priority task than the first instruction, storing a register state associated with the execution of the first instruction in one or more registers of a capture queue associated with the instruction execution pipeline, copying the register state from the capture queue to a memory, determining that the one or more second instructions have been executed, copying the register state from the memory to the one or more registers of the capture queue, and restoring the register state to the instruction execution pipeline from the capture queue.
    Type: Grant
    Filed: April 15, 2019
    Date of Patent: April 27, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy D. Anderson, Joseph Zbiciak, Kai Chirca
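The capture-queue flow above can be rendered as a toy preemption model: on arrival of higher-priority work the pipeline's register state is staged in a capture queue, spilled to memory, and restored along the same path afterward. All structures are illustrative stand-ins.

```python
class Pipeline:
    """Toy pipeline with a capture queue for register-state preemption."""
    def __init__(self):
        self.regs = {}             # live register state
        self.capture_queue = None  # staging registers
        self.memory = []           # spill area

    def preempt(self):
        self.capture_queue = dict(self.regs)    # stage state in capture queue
        self.memory.append(self.capture_queue)  # copy capture queue to memory
        self.regs = {}                          # pipeline freed for new task

    def resume(self):
        self.capture_queue = self.memory.pop()  # memory -> capture queue
        self.regs = self.capture_queue          # capture queue -> pipeline

p = Pipeline()
p.regs = {"r0": 7, "r1": 9}   # state of the first (lower-priority) task
p.preempt()
p.regs = {"r0": 0}            # higher-priority instructions execute
p.resume()
```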