Coprocessor Patents (Class 712/34)
  • Patent number: 11947804
    Abstract: A system includes hardware circuitry having a device coupled with one or more external memory devices. The device is to detect an input/output (I/O) request associated with an external memory device of the one or more external memory devices. The device is to record a first timestamp in response to detecting the I/O request transmitted to the external memory device. The device is further to detect an indication from the external memory device of a completion of the I/O request associated with the external memory device and record a second timestamp in response to detecting the indication. The device is also to determine a latency associated with the I/O request based on the first timestamp and the second timestamp.
    Type: Grant
    Filed: April 6, 2022
    Date of Patent: April 2, 2024
    Assignee: NVIDIA Corporation
    Inventors: Shridhar Rasal, Oren Duer, Aviv Kfir, Liron Mula
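The two-timestamp scheme above reduces to straightforward bookkeeping. A minimal software sketch of the idea (the `LatencyMonitor` name and structure are illustrative, not from the patent):

```python
import time

class LatencyMonitor:
    """Records a timestamp when an I/O request is issued and another when
    its completion indication is observed, then derives the latency."""

    def __init__(self):
        self.start = {}  # request id -> first timestamp

    def on_request(self, req_id):
        # First timestamp: request transmitted to the external memory device.
        self.start[req_id] = time.monotonic_ns()

    def on_completion(self, req_id):
        # Second timestamp: completion indication detected.
        end = time.monotonic_ns()
        return end - self.start.pop(req_id)  # latency in nanoseconds

monitor = LatencyMonitor()
monitor.on_request(42)
# ... the device services the request ...
print(f"latency: {monitor.on_completion(42)} ns")
```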
  • Patent number: 11934295
    Abstract: The present disclosure provides for synchronization of multi-core systems by monitoring a plurality of debug trace data streams for a redundantly operating system including a corresponding plurality of cores performing a task in parallel; in response to detecting a state difference on one debug trace data stream of the plurality of debug trace data streams relative to other debug trace data streams of the plurality of debug trace data streams: marking a given core associated with the one debug trace data stream as an affected core; and restarting the affected core.
    Type: Grant
    Filed: November 9, 2021
    Date of Patent: March 19, 2024
    Assignee: THE BOEING COMPANY
    Inventors: David P. Haldeman, Eric J. Miller
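The divergence-detection step can be pictured as a vote over per-core trace states. A hedged sketch, assuming a simple majority policy (the abstract only specifies detecting a state difference relative to the other streams):

```python
from collections import Counter

def check_lockstep(trace_states):
    """Given the latest state word from each core's debug trace stream,
    return the index of a core whose state differs from the majority,
    or None if all cores agree. (Illustrative majority-vote policy.)"""
    majority_state, count = Counter(trace_states).most_common(1)[0]
    if count == len(trace_states):
        return None
    # Mark the first core that disagrees with the majority as affected.
    return next(i for i, s in enumerate(trace_states) if s != majority_state)

affected = check_lockstep([0xA5, 0xA5, 0x5A])  # core 2 diverged
if affected is not None:
    print(f"restarting affected core {affected}")
```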
  • Patent number: 11928471
    Abstract: Embodiments for a metadata predictor are disclosed. An index pipeline generates indices in an index buffer, where the indices are used for reading out a memory device. A prediction cache is populated with metadata of instructions read from the memory device. A prediction pipeline generates a prediction using the metadata of the instructions from the prediction cache, the populating of the prediction cache with the metadata being performed asynchronously to the operation of the prediction pipeline.
    Type: Grant
    Filed: August 19, 2021
    Date of Patent: March 12, 2024
    Assignee: International Business Machines Corporation
    Inventors: Edward Thomas Malley, Adam Benjamin Collura, Brian Robert Prasky, James Bonanno, Dominic Ditomaso
  • Patent number: 11915015
    Abstract: Systems and methods provide isolated workspaces operating on an IHS (Information Handling System) with use of pre-boot resources of the IHS that are not directly accessible by the workspaces. Upon notification of a workspace initialization, a segregated variable space, such as a segregated memory utilized by a UEFI (Unified Extensible Firmware Interface) of the IHS, is specified for use by the workspace. The segregated variable space is initialized and populated with pre-boot variables, such as UEFI variables, that are allowed for configuration by the workspace. Upon a workspace issuing a request to configure a pre-boot variable, the segregated variable space is identified that was mapped for use by the workspace. The requested pre-boot variable configuration is allowed based on whether the pre-boot variable is populated in the segregated variable space. When the requested pre-boot variable configuration is allowed, the pre-boot variable is configured on behalf of the workspace.
    Type: Grant
    Filed: August 27, 2021
    Date of Patent: February 27, 2024
    Assignee: Dell Products, L.P.
    Inventors: Balasingh P. Samuel, Vivek Viswanathan Iyer
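The gatekeeping logic amounts to checking a per-workspace variable map before honoring a configuration request. A rough sketch under that reading (class and variable names are hypothetical, not Dell's implementation):

```python
class PrebootVariableBroker:
    """Maps each workspace to a segregated variable space holding only the
    pre-boot (e.g. UEFI) variables that workspace may configure."""

    def __init__(self):
        self.spaces = {}

    def init_workspace(self, workspace_id, allowed_vars):
        # Populate the segregated space with the permitted variables.
        self.spaces[workspace_id] = dict(allowed_vars)

    def configure(self, workspace_id, name, value):
        space = self.spaces[workspace_id]
        if name not in space:
            raise PermissionError(f"{name!r} not exposed to {workspace_id}")
        space[name] = value  # configured on behalf of the workspace

broker = PrebootVariableBroker()
broker.init_workspace("ws-1", {"BootOrder": "0001,0002"})
broker.configure("ws-1", "BootOrder", "0002,0001")   # allowed
# broker.configure("ws-1", "SecureBoot", "0")        # would raise
```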
  • Patent number: 11899613
    Abstract: A packaging technology to improve performance of an AI processing system, resulting in an ultra-high-bandwidth system. An IC package is provided which comprises: a substrate; a first die on the substrate; and a second die stacked over the first die. The first die can be a first logic die (e.g., a compute chip, CPU, GPU, etc.) while the second die can be a compute chiplet comprising ferroelectric or paraelectric logic. Both dies can include ferroelectric or paraelectric logic. The ferroelectric/paraelectric logic may include AND gates, OR gates, complex gates, majority, minority, and/or threshold gates, sequential logic, etc. The IC package can be in a 3D or 2.5D configuration that implements logic-on-logic stacking. The 3D or 2.5D packaging configurations have chips or chiplets designed for time-distributed or spatially distributed processing. The logic of the chips or chiplets is segregated so that only one chip in a 3D or 2.5D stacking arrangement is hot at a time.
    Type: Grant
    Filed: August 20, 2021
    Date of Patent: February 13, 2024
    Assignee: KEPLER COMPUTING INC.
    Inventors: Amrita Mathuriya, Christopher B. Wilkerson, Rajeev Kumar Dokania, Debo Olaosebikan, Sasikanth Manipatruni
  • Patent number: 11868780
    Abstract: An electronic device that includes a central processor and a coprocessor coupled to the central processor. The central processor includes a plurality of registers and is configured to decode a first set of instructions. The first set of instructions includes a command instruction and an identity of a destination register. The coprocessor is configured to receive the command instruction from the central processor, execute the command instruction, and write a result of the command instruction in the destination register. The central processor is further configured to set a register tag for the destination register at the time the central processor decodes the first set of instructions and to clear the register tag at the time the result is written in the destination register.
    Type: Grant
    Filed: August 26, 2021
    Date of Patent: January 9, 2024
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Christian Wiencke, Armin Stingl, Jeroen Vliegen
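The register-tag protocol is essentially a per-register scoreboard: the tag is set at decode and cleared at writeback, and readers wait on it. An illustrative software model (a sketch, not Texas Instruments' implementation):

```python
class RegisterScoreboard:
    """Per-register busy tags: set when a coprocessor command naming the
    register as destination is decoded, cleared when the result lands."""

    def __init__(self, num_regs):
        self.busy = [False] * num_regs

    def decode_command(self, dest_reg):
        self.busy[dest_reg] = True      # tag set at decode time

    def writeback(self, dest_reg, regfile, result):
        regfile[dest_reg] = result
        self.busy[dest_reg] = False     # tag cleared when result is written

    def can_read(self, reg):
        return not self.busy[reg]       # consumers stall while tag is set

regs = [0] * 16
sb = RegisterScoreboard(16)
sb.decode_command(3)
assert not sb.can_read(3)   # dependent instruction must wait
sb.writeback(3, regs, 0xBEEF)
assert sb.can_read(3)
```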
  • Patent number: 11809869
    Abstract: Embodiments detailed herein relate to systems and methods to store a tile register pair to memory. In one example, a processor includes: decode circuitry to decode a store matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded store matrix pair instruction to store every element of left and right tiles of the identified source matrix to corresponding element positions of left and right tiles of the identified destination matrix, respectively, wherein the executing stores a chunk of C elements of one row of the identified source matrix at a time.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: November 7, 2023
    Assignee: Intel Corporation
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
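The chunked store can be mimicked in software to make the access pattern concrete. A sketch assuming C = 4 elements per chunk (the abstract leaves C unspecified):

```python
import numpy as np

def store_tile_pair(src_left, src_right, dst_left, dst_right, chunk=4):
    """Copy a tile pair row by row, moving `chunk` elements at a time,
    mimicking the chunked store described in the abstract."""
    for src, dst in ((src_left, dst_left), (src_right, dst_right)):
        rows, cols = src.shape
        for r in range(rows):
            for c in range(0, cols, chunk):
                dst[r, c:c + chunk] = src[r, c:c + chunk]

left = np.arange(16).reshape(2, 8)
right = np.arange(16, 32).reshape(2, 8)
out_l, out_r = np.zeros_like(left), np.zeros_like(right)
store_tile_pair(left, right, out_l, out_r)
assert (out_l == left).all() and (out_r == right).all()
```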
  • Patent number: 11797856
    Abstract: Presented herein are framework embodiments that allow the representation of complex systems and processes and that are suitable for resource-efficient machine learning and inference. Also disclosed are new reinforcement learning techniques that are capable of learning to plan and optimize dynamic and nuanced systems and processes. Different embodiments comprising combinations of one or more neural networks, reinforcement learning, and linear programming are discussed to learn representations and models, even for complex systems and methods. Furthermore, the introduction of neural field embodiments and of methods to compute a Deep Argmax, as well as to invert neural networks and neural fields with linear programming, provides the ability to create and train models that are accurate and very resource-efficient, using less memory, fewer computations, less time, and, as a result, less energy.
    Type: Grant
    Filed: June 11, 2020
    Date of Patent: October 24, 2023
    Assignee: System AI, Inc.
    Inventor: Tuna Oezer
  • Patent number: 11797473
    Abstract: An accelerated processor structure on a programmable integrated circuit device includes a processor and a plurality of configurable digital signal processors (DSPs). Each configurable DSP includes a circuit block, which in turn includes a plurality of multipliers. The accelerated processor structure further includes a first bus to transfer data from the processor to the configurable DSPs, and a second bus to transfer data from the configurable DSPs to the processor.
    Type: Grant
    Filed: October 8, 2018
    Date of Patent: October 24, 2023
    Assignee: Altera Corporation
    Inventors: David Shippy, Martin Langhammer, Jeffrey Eastlack
  • Patent number: 11782722
    Abstract: A complex computing device, a complex computing method, an artificial intelligence chip and an electronic apparatus are provided. An input interface receives complex computing instructions and arbitrates each complex computing instruction to a corresponding computing component according to the computing type in that instruction. Each computing component is connected to the input interface, acquires a source operand from a complex computing instruction to perform complex computing, and generates a computing result instruction to feed back to an output interface. The output interface arbitrates the computing result in each computing result instruction to the corresponding instruction source, according to the instruction source identifier in each computing result instruction.
    Type: Grant
    Filed: January 14, 2021
    Date of Patent: October 10, 2023
    Assignees: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD., KUNLUNXIN TECHNOLOGY (BEIJING) COMPANY LIMITED
    Inventors: Baofu Zhao, Xueliang Du, Kang An, Yingnan Xu, Chao Tang
  • Patent number: 11768689
    Abstract: The present application discloses a computing device that can provide a low-power, highly capable computing platform for computational imaging. The computing device can include one or more processing units, for example one or more vector processors and one or more hardware accelerators, an intelligent memory fabric, a peripheral device, and a power management module. The computing device can communicate with external devices, such as one or more image sensors, an accelerometer, a gyroscope, or any other suitable sensor devices.
    Type: Grant
    Filed: November 12, 2021
    Date of Patent: September 26, 2023
    Assignee: Movidius Limited
    Inventors: Brendan Barry, Richard Richmond, Fergal Connor, David Moloney
  • Patent number: 11726701
    Abstract: A memory expander includes a memory device that stores a plurality of task data. A controller controls the memory device. The controller receives metadata and a management request from an external central processing unit (CPU) through a compute express link (CXL) interface and operates in a management mode in response to the management request. In the management mode, the controller receives a read request and a first address from an accelerator through the CXL interface and transmits one of the plurality of task data to the accelerator based on the metadata in response to the read request.
    Type: Grant
    Filed: October 25, 2021
    Date of Patent: August 15, 2023
    Inventors: Chon Yong Lee, Jae-Gon Lee, Kyunghan Lee
  • Patent number: 11720475
    Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums that use parallel hardware execution with software co-simulation to enable more advanced debugging operations on data flow architectures. Upon a halt to execution of a program thread, the state of the tiles that are executing the thread is saved and offloaded from the HTF (hybrid threading fabric) to a host system. A developer may then examine this state on the host system to debug their program. Additionally, the state may be loaded into a software simulator that simulates the HTF hardware. This simulator allows the developer to step through the code and examine values to find bugs.
    Type: Grant
    Filed: November 21, 2022
    Date of Patent: August 8, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Skyler Arron Windh, Tony M. Brewer, Patrick Estep
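The halt-snapshot-simulate flow might look roughly like this in software; the snapshot format and the `TileSimulator` API are invented for illustration:

```python
import json

def snapshot_tiles(tiles):
    """On a halt, capture the state of every tile running the thread so it
    can be offloaded to the host (serialized here as JSON for transport)."""
    return json.dumps([{"tile": i, "regs": t["regs"], "pc": t["pc"]}
                       for i, t in enumerate(tiles)])

class TileSimulator:
    """Host-side stand-in for the fabric: reload the saved state and let a
    developer single-step from exactly where the hardware halted."""
    def load(self, snapshot):
        self.tiles = json.loads(snapshot)
    def step(self, tile_idx):
        self.tiles[tile_idx]["pc"] += 1  # placeholder for real semantics

snap = snapshot_tiles([{"regs": [1, 2], "pc": 100}])
sim = TileSimulator()
sim.load(snap)
sim.step(0)
print(sim.tiles[0]["pc"])  # 101
```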
  • Patent number: 11714992
    Abstract: Systems and methods for providing executable instructions to a neural network processor are provided. In one example, a system comprises a database that stores a plurality of executable instructions and a plurality of subgraph identifiers, each subgraph identifier of the plurality of subgraph identifiers being associated with a subset of instructions of the plurality of executable instructions.
    Type: Grant
    Filed: December 13, 2018
    Date of Patent: August 1, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Richard John Heaton, Randy Renfu Huang, Ron Diamant
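The subgraph-to-instructions database is, at its core, a keyed lookup that lets pre-generated code be reused. A toy sketch (the identifiers and instruction strings are made up):

```python
# Hypothetical lookup: a compiled-subgraph database keyed by a subgraph
# identifier, returning the pre-generated executable instructions.
instruction_db = {
    "conv3x3-relu": ["LOAD ifm", "CONV k3", "RELU", "STORE ofm"],
    "matmul-bias":  ["LOAD a", "LOAD b", "MATMUL", "ADD bias", "STORE out"],
}

def instructions_for(graph):
    """Assemble a program by looking up each subgraph's identifier."""
    program = []
    for subgraph_id in graph:
        program.extend(instruction_db[subgraph_id])  # reuse cached code
    return program

print(instructions_for(["conv3x3-relu", "matmul-bias"]))
```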
  • Patent number: 11714649
    Abstract: A RISC-V-based 3D interconnected multi-core processor architecture and a working method thereof. The RISC-V-based 3D interconnected multi-core processor architecture includes a main control layer, a micro core array layer and an accelerator layer. The main control layer includes a plurality of main cores which are RISC-V instruction set CPU cores; the micro core array layer includes a plurality of micro unit groups, each including a micro core, a data storage unit, an instruction storage unit and a linking controller, wherein the micro core is a RISC-V instruction set CPU core that executes partial functions of the main core. The accelerator layer is configured to optimize running speed and space utilization for accelerators meeting specific requirements. Some main cores in the main control layer perform data interaction with the accelerator layer, while the other main cores interact with the micro core array layer.
    Type: Grant
    Filed: December 1, 2021
    Date of Patent: August 1, 2023
    Assignee: SHANDONG LINGNENG ELECTRONIC TECHNOLOGY CO., LTD.
    Inventors: Gang Wang, Jinzheng Mou, Yang An, Moujun Xie, Benyang Wu, Zesheng Zhang, Wenyong Hou, Yongwei Wang, Zixuan Qiu, Xintan Li
  • Patent number: 11682109
    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for configurable aprons for expanded binning. Aspects of the present disclosure include identifying one or more pixel tiles in at least one bin and determining edge information for each pixel tile of the one or more pixel tiles. The edge information may be associated with one or more pixels adjacent to each pixel tile. The present disclosure further describes determining whether at least one adjacent bin is visible based on the edge information for each pixel tile, where the at least one adjacent bin may be adjacent to the at least one bin.
    Type: Grant
    Filed: October 16, 2020
    Date of Patent: June 20, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Kalyan Kumar Bhiravabhatla, Krishnaiah Gummidipudi, Ankit Kumar Singh, Andrew Evan Gruber, Pavan Kumar Akkaraju, Srihari Babu Alla, Jonnala Gadda Nagendra Kumar, Vishwanath Shashikant Nikam
  • Patent number: 11663001
    Abstract: Systems, apparatuses, and methods for implementing a family of lossy sparse load single instruction, multiple data (SIMD) instructions are disclosed. A lossy sparse load unit (LSLU) loads a plurality of values from one or more input vector operands and determines how many non-zero values are included in one or more input vector operands of a given instruction. If the one or more input vector operands have less than a threshold number of non-zero values, then the LSLU causes an instruction for processing the one or more input vector operands to be skipped. In this case, the processing of the instruction of the one or more input vector operands is deemed to be redundant. If the one or more input vector operands have greater than or equal to the threshold number of non-zero values, then the LSLU causes an instruction for processing the input vector operand(s) to be executed.
    Type: Grant
    Filed: November 19, 2018
    Date of Patent: May 30, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sanchari Sen, Derrick Allen Aguren, Joseph Lee Greathouse
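The skip decision is a simple threshold test on the operands' non-zero counts. A hedged sketch of that policy (the real LSLU operates on SIMD registers, not NumPy arrays):

```python
import numpy as np

def lossy_sparse_apply(op, operands, threshold):
    """Execute `op` only if the operands carry at least `threshold`
    non-zero values; otherwise treat the work as redundant and skip it."""
    nonzeros = sum(int(np.count_nonzero(v)) for v in operands)
    if nonzeros < threshold:
        return None  # instruction skipped, result deemed negligible
    return op(*operands)

a = np.array([0.0, 0.0, 0.0, 0.1])
b = np.array([0.0, 0.0, 0.0, 2.0])
print(lossy_sparse_apply(np.dot, (a, b), threshold=3))  # skipped -> None
print(lossy_sparse_apply(np.dot, (a, b), threshold=2))  # executed -> 0.2
```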
  • Patent number: 11507493
    Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums that use parallel hardware execution with software co-simulation to enable more advanced debugging operations on data flow architectures. Upon a halt to execution of a program thread, the state of the tiles that are executing the thread is saved and offloaded from the HTF to a host system. A developer may then examine this state on the host system to debug their program. Additionally, the state may be loaded into a software simulator that simulates the HTF hardware. This simulator allows the developer to step through the code and examine values to find bugs.
    Type: Grant
    Filed: August 18, 2021
    Date of Patent: November 22, 2022
    Assignee: Micron Technology, Inc.
    Inventors: Skyler Arron Windh, Tony M. Brewer, Patrick Estep
  • Patent number: 11429855
    Abstract: A method for accelerating a neural network includes identifying neural network layers that meet a locality constraint. Code is generated to implement depth-first processing for different hardware based on the identified neural network layers. The generated code is then used to perform the depth-first processing on the neural network.
    Type: Grant
    Filed: February 6, 2018
    Date of Patent: August 30, 2022
    Assignee: NEC CORPORATION
    Inventors: Nicolas Weber, Felipe Huici, Mathias Niepert
  • Patent number: 11422815
    Abstract: Binary translation may be performed by a field programmable gate array (FPGA) integrated with a processor as a single integrated circuit. The FPGA contains multiple blocks of logic for performing different binary translations. The processor may offload the binary translation to the FPGA. The FPGA may use historical logging to skip the binary translation of source instructions that have been previously translated into target instructions.
    Type: Grant
    Filed: March 1, 2018
    Date of Patent: August 23, 2022
    Assignee: Dell Products L.P.
    Inventors: Mukund P. Khatri, Ramesh Radhakrishnan
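The historical-logging optimization is effectively memoization keyed on the source block. An illustrative model (the `fpga_translate` callable stands in for the hardware path; names are invented):

```python
class TranslationOffload:
    """Models the historical log: source blocks translated once are served
    from the log instead of being re-sent to the FPGA translator."""

    def __init__(self, fpga_translate):
        self.fpga_translate = fpga_translate  # the slow hardware path
        self.log = {}                         # source block -> target block

    def translate(self, source_block):
        if source_block in self.log:          # previously translated: skip
            return self.log[source_block]
        target = self.fpga_translate(source_block)
        self.log[source_block] = target
        return target

xlat = TranslationOffload(lambda src: f"target({src})")
xlat.translate("mov eax, 1")   # goes to the FPGA
xlat.translate("mov eax, 1")   # served from the historical log
```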
  • Patent number: 11403250
    Abstract: Examples in this application disclose an operation accelerator, a switch, and a processing system. One example operation accelerator includes a shunt circuit directly connected to a first peripheral component interconnect express (PCIe) device through a PCIe link. The shunt circuit is configured to receive first data sent by the first PCIe device through the PCIe link, and transmit the first data through an internal bus. A first address carried in the first data is located in a first range. In some examples of this application, the first PCIe device directly communicates with the operation accelerator through the shunt circuit in the operation accelerator.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: August 2, 2022
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Chuanning Cheng, Shengyong Peng
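The shunt circuit's routing rule reduces to an address-range test. A sketch under that reading (range bounds and handler names are illustrative):

```python
def shunt(first_range, internal_bus, fallback):
    """Return a receive handler: data whose address falls in `first_range`
    is forwarded on the accelerator's internal bus, bypassing the host."""
    lo, hi = first_range
    def on_receive(address, payload):
        if lo <= address < hi:
            internal_bus(address, payload)   # direct device-to-accelerator path
        else:
            fallback(address, payload)       # e.g. forward toward the host
    return on_receive

recv = shunt((0x1000, 0x2000),
             internal_bus=lambda a, p: print(f"internal {a:#x}"),
             fallback=lambda a, p: print(f"host {a:#x}"))
recv(0x1800, b"...")  # internal 0x1800
recv(0x3000, b"...")  # host 0x3000
```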
  • Patent number: 11392513
    Abstract: A graph-based data flow control system includes a control plane system coupled to SCP subsystems. The control plane system identifies a workload, and identifies service(s) on the SCP subsystems for manipulating/exchanging data to perform the workload. The control plane system generates a respective SCP-local data flow control graph for each SCP subsystem that defines how their service(s) will manipulate/exchange data within that SCP subsystem, and generates inter-SCP data flow control graph(s) that define how service(s) provided by at least one SCP subsystem will manipulate/exchange data with service(s) provided by at least one other SCP subsystem. The control plane system then transmits each respective SCP-local data flow control graph to each of the SCP subsystems, and the inter-SCP data flow control graph(s) to at least one SCP subsystem, for use by the SCP subsystems in causing their service(s) to manipulate/exchange data to perform the workload.
    Type: Grant
    Filed: October 15, 2020
    Date of Patent: July 19, 2022
    Assignee: Dell Products L.P.
    Inventors: Gaurav Chawla, Mark Steven Sanders, Elie Jreij, Jimmy D. Pike, Robert W. Hormuth, William Price Dawkins
  • Patent number: 11366662
    Abstract: A high-level synthesis multiprocessor system enables sophisticated algorithms to be realized easily with nearly the smallest possible circuit. A shared memory is divided into a plurality of banks. The memory banks are connected to processors, respectively. Each processor receives an instruction code and an operand from its connected memory bank. After the operation executes, the processor sends the result to its adjacent processor element to set it as the accumulator value at the time of execution of the next instruction. The software program to be executed is fixed. The processor to execute each instruction in the software program is uniquely identified. Each processor has a function for executing its instructions out of all executable instructions in the multiprocessor system, and does not have a function for executing an instruction that the processor is not to execute. A circuit configuration with unused instructions deleted is thus provided.
    Type: Grant
    Filed: August 22, 2018
    Date of Patent: June 21, 2022
    Assignee: El Amina Inc.
    Inventor: Hideki Tanuma
  • Patent number: 11354592
    Abstract: Systems and methods for intelligent computation acceleration transform to allow applications to be executed by accelerated processing units such as graphic processing units (GPUs) or field programmable gate arrays (FPGAs) are disclosed. In an embodiment, a computational profile is generated for an application based on execution metrics of the application for the CPU and the accelerated processing unit, and a genetic algorithm (GA) prediction model is applied to predict execution speedup on an accelerated processing unit for the application. In an embodiment, upon identification of speedup, computational steps are arbitrated among various processing units according to compute availability to achieve optimal completion time for the compute job.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: June 7, 2022
    Assignee: Morgan Stanley Services Group Inc.
    Inventors: Michael A. Dobrovolsky, Kwokhin Chu, Pankaj Parashar
  • Patent number: 11354315
    Abstract: Method and apparatus for stress management in a searchable data service. The searchable data service may provide a searchable index to a backend data store, and an interface to build and query the searchable index, that enables client applications to search for and retrieve locators for stored entities in the backend data store. Embodiments of the searchable data service may implement a distributed stress management mechanism that may provide functionality including, but not limited to, the automated monitoring of critical resources, analysis of resource usage, and decisions on and performance of actions to keep resource usage within comfort zones. In one embodiment, in response to usage of a particular resource being detected as out of the comfort zone on a node, an action may be performed to transfer at least part of the resource usage for the local resource to another node that provides a similar resource.
    Type: Grant
    Filed: May 22, 2020
    Date of Patent: June 7, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Patrick W. Ransil, Aleksey Martynov, James Larson, James R. Collette, Robert Wai-Chi Chu, Partha Saha
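One plausible reading of the comfort-zone mechanism, as a sketch: monitor each resource against fixed bounds and hand excess load to the peer with the most headroom. The bounds and the peer-selection policy here are assumptions:

```python
COMFORT_ZONES = {"cpu": (0.0, 0.80), "disk": (0.0, 0.90)}  # illustrative bounds

def manage_stress(local_usage, peers):
    """For any resource outside its comfort zone, pick the peer node with
    the most headroom for that resource and hand part of the load to it."""
    actions = []
    for resource, usage in local_usage.items():
        lo, hi = COMFORT_ZONES[resource]
        if not (lo <= usage <= hi):
            target = min(peers, key=lambda p: p["usage"][resource])
            actions.append((resource, target["name"]))
    return actions

peers = [{"name": "node-b", "usage": {"cpu": 0.30, "disk": 0.50}},
         {"name": "node-c", "usage": {"cpu": 0.60, "disk": 0.20}}]
print(manage_stress({"cpu": 0.95, "disk": 0.40}, peers))  # [('cpu', 'node-b')]
```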
  • Patent number: 11321144
    Abstract: Apparatus and method for selectively saving and restoring execution state components in an inter-core work offload environment. For example, one embodiment of a processor comprises: a plurality of cores; an interconnect coupling the plurality of cores; and offload circuitry to transfer work from a first core of the plurality of cores to a second core of the plurality of cores without operating system (OS) intervention, wherein the second core is to reach a first execution state upon completing the offload work and to store results in a first memory location or register; the second core comprising: a decoder to decode a first instruction comprising at least one operand to identify one or more components of the first execution state; and execution circuitry to execute the first instruction to save the one or more components of the first execution state to a specified region in memory.
    Type: Grant
    Filed: June 29, 2019
    Date of Patent: May 3, 2022
    Assignee: INTEL CORPORATION
    Inventor: ElMoustapha Ould-Ahmed-Vall
  • Patent number: 11263014
    Abstract: Data processing apparatuses, methods of data processing, and non-transitory computer-readable media on which computer-readable code is stored defining logical configurations of processing devices are disclosed. In an apparatus, fetch circuitry retrieves a sequence of instructions and execution circuitry performs data processing operations with respect to data values in a set of registers. An auxiliary execution circuitry interface and a coprocessor interface to provide a connection to a coprocessor outside the apparatus are provided.
    Type: Grant
    Filed: August 5, 2019
    Date of Patent: March 1, 2022
    Assignee: Arm Limited
    Inventors: Frederic Claude Marie Piry, Thomas Christopher Grocutt, Simon John Craske, Carlo Dario Fanara, Jean Sébastien Leroy
  • Patent number: 11256516
    Abstract: A system comprising a data memory, a first processor with a first execution pipeline, and a co-processor with a second execution pipeline branching from the first pipeline via an inter-processor interface. The first pipeline can decode instructions from an instruction set comprising first and second instruction subsets. The first subset comprises a load instruction which loads data from the memory into a register file, and a compute instruction of a first type which performs a compute operation on such loaded data. The second subset includes a compute instruction of a second type which does not require a separate load instruction to first load data from memory into a register file, but instead reads data from the memory directly and performs a compute operation on that data, this reading being performed in a pipeline stage of the second pipeline that is aligned with the memory access stage of the first pipeline.
    Type: Grant
    Filed: December 17, 2018
    Date of Patent: February 22, 2022
    Assignee: XMOS LTD
    Inventors: Henk Lambertus Muller, Peter Hedinger
  • Patent number: 11250341
    Abstract: A system comprising a classical computing subsystem to perform classical operations in a three-dimensional (3D) classical space unit using decomposed stopping points along a consecutive sequence of stopping points of sub-cells, along a vector with a shortest path between two points of the 3D classical space unit. The system includes a quantum computing subsystem to perform quantum operations in a 3D quantum space unit using decomposed stopping points along a consecutive sequence of stopping points of sub-cells, along a vector selected to have a shortest path between two points of the 3D quantum space unit. The system includes a control subsystem to decompose classical subproblems and quantum subproblems into the decomposed points and provide computing instructions and state information to the classical computing subsystem to perform the classical operations to the quantum computing subsystem to perform the quantum operations. A method and computer readable medium are provided.
    Type: Grant
    Filed: September 7, 2018
    Date of Patent: February 15, 2022
    Assignee: LOCKHEED MARTIN CORPORATION
    Inventors: Edward H. Allen, Luke A. Uribarri, Kristen L. Pudenz
  • Patent number: 11170025
    Abstract: A system for caching includes an interface to receive a portion of a hypercube to evaluate. The hypercube includes cells with a set of the cells having a formula. The system includes a processor to determine term(s) in the formula for each cell of the set of cells; remove from consideration a time dimension and/or a primary dimension for the term(s) in the formula for each cell of the set of cells; determine a set of distinct terms using the term(s); determine whether a total number of terms in the set of cells is larger than a number of distinct terms in the set of distinct terms; and in response to determining that the total number of terms in the set of cells is larger than the number of distinct terms in the set of distinct terms, indicate to cache the set of distinct terms during evaluation.
    Type: Grant
    Filed: April 29, 2019
    Date of Patent: November 9, 2021
    Assignee: Workday, Inc.
    Inventors: Ngoc Nguyen, Darren Kermit Lee, Shuyuan Chen, Ritu Jain, Francis Wang
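The caching decision compares the total term count against the distinct term count after dropping the time and primary dimensions. A compact sketch (the term encoding is invented):

```python
def should_cache(cells, ignore_dims=("time", "primary")):
    """Decide whether to cache: strip the ignored dimensions from every
    cell's terms, then cache iff total terms outnumber distinct terms."""
    def strip(term):
        return tuple((dim, val) for dim, val in term if dim not in ignore_dims)

    all_terms = [strip(t) for cell in cells for t in cell["terms"]]
    distinct = set(all_terms)
    return distinct, len(all_terms) > len(distinct)

cells = [
    {"terms": [(("time", 2023), ("acct", "rent")), (("acct", "sales"),)]},
    {"terms": [(("time", 2024), ("acct", "rent"))]},  # same term, later year
]
distinct, cache = should_cache(cells)
print(cache)  # True: three terms collapse to two distinct terms
```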
  • Patent number: 11144290
    Abstract: A method includes analyzing a dataflow graph representing data dependencies between operators of a dataflow application to identify a plurality of candidate groups of the operators. Based on characteristics of a given hardware accelerator and the operators of a given candidate group of the plurality of candidate groups, determining whether the operators of the given candidate group are to be combined. In response to determining that the operators of the given candidate group are to be combined, retrieving executable binary code segments corresponding to the operators of the given candidate group, generating a unit of binary code including the executable binary code segments and metadata representing an execution control flow among the executable binary code segments, and dispatching the unit of code to the given hardware accelerator for execution of the unit of code.
    Type: Grant
    Filed: September 13, 2019
    Date of Patent: October 12, 2021
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Reza Azimi, Cheng Xiang Feng, Kai-Ting Amy Wang, Yaoqing Gao, Ye Tian, Xiang Wang
  • Patent number: 11093247
    Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: August 17, 2021
    Assignee: Intel Corporation
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
  • Patent number: 11010308
    Abstract: Embodiments of the present disclosure include a method for optimizing an internal memory for calculation of a convolutional layer of a convolutional neural network (CNN), the method including determining a computation cost of calculating the convolutional layer using each combination of a memory management scheme of a plurality of memory management schemes and data partition sizes of input feature map (IFM) data, kernel data, and output feature map (OFM) data to be loaded in the internal memory; identifying the one combination of a memory management scheme and data partition sizes having the lowest computation cost for the convolutional layer; and implementing the CNN to use that combination for calculation of the convolutional layer.
    Type: Grant
    Filed: May 31, 2019
    Date of Patent: May 18, 2021
    Assignee: LG ELECTRONICS INC.
    Inventors: Jaewon Kim, Thi Huong Giang Nguyen
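The optimization is a search over (memory management scheme, partition size) combinations for the cheapest one. A sketch with a toy cost model; the real cost function is not given in the abstract, so `toy_cost` and the scheme names are assumptions:

```python
from itertools import product

def pick_configuration(schemes, partition_sizes, cost_fn):
    """Exhaustively cost every (scheme, IFM/kernel/OFM partition) combination
    and return the cheapest one for this convolutional layer."""
    return min(product(schemes, partition_sizes, partition_sizes, partition_sizes),
               key=lambda combo: cost_fn(*combo))

# Toy cost model: penalize reloading whichever tensor the scheme keeps resident.
def toy_cost(scheme, ifm_part, kernel_part, ofm_part):
    reload_penalty = {"ifm-stationary": kernel_part, "weight-stationary": ifm_part}
    return reload_penalty[scheme] + ifm_part + kernel_part + ofm_part

print(pick_configuration(["ifm-stationary", "weight-stationary"],
                         [16, 32, 64], toy_cost))
```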
  • Patent number: 11004500
    Abstract: Disclosed herein are apparatuses and methods related to an artificial intelligence accelerator in memory. An apparatus can include a number of registers configured to enable the apparatus to operate in an artificial intelligence mode to perform artificial intelligence operations, and an artificial intelligence (AI) accelerator configured to perform the artificial intelligence operations using data stored in a number of memory arrays. The AI accelerator can include hardware, software, and/or firmware configured to perform operations associated with AI operations. The hardware can include circuitry configured as an adder and/or multiplier to perform operations, such as logic operations, associated with AI operations.
    Type: Grant
    Filed: August 28, 2019
    Date of Patent: May 11, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Alberto Troia
  • Patent number: 10990398
    Abstract: Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction for execution on an instruction execution pipeline, beginning execution of the first instruction, receiving one or more second instructions for execution on the instruction execution pipeline, the one or more second instructions associated with a higher priority task than the first instruction, storing a register state associated with the execution of the first instruction in one or more registers of a capture queue associated with the instruction execution pipeline, copying the register state from the capture queue to a memory, determining that the one or more second instructions have been executed, copying the register state from the memory to the one or more registers of the capture queue, and restoring the register state to the instruction execution pipeline from the capture queue.
    Type: Grant
    Filed: April 15, 2019
    Date of Patent: April 27, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy D. Anderson, Joseph Zbiciak, Kai Chirca
  • Patent number: 10983796
    Abstract: Embodiments involving core-to-core offload are detailed herein. For example, a method includes decoding an instruction having fields for at least an opcode to indicate that the end of a task offload operation is to be performed, and executing the decoded instruction to cause a transmission of an offload end indication to the second core, the indication including one or more of: an identifier of the second core, a location where the second core can find the results of the offload, the results of execution of the offloaded task, an instruction pointer in the original code of the second source, a requesting core state, and a requesting core state location.
    Type: Grant
    Filed: June 29, 2019
    Date of Patent: April 20, 2021
    Assignee: Intel Corporation
    Inventor: Elmoustapha Ould-Ahmed-Vall
  • Patent number: 10970076
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions specifying ternary tile operations. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction specifying a ternary tile operation, and locations of destination and first, second, and third source matrices, each of the matrices having M rows by N columns; and execution circuitry to respond to the decoded instruction by, for each equal-sized group of K elements of the specified first, second, and third source matrices, generate K results by performing the ternary tile operation in parallel on K corresponding elements of the specified first, second, and third source matrices, and store each of the K results to a corresponding element of the specified destination matrix, wherein corresponding elements of the specified source and destination matrices occupy a same relative position within their associated matrix.
    Type: Grant
    Filed: September 14, 2018
    Date of Patent: April 6, 2021
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Christopher J. Hughes, Bret Toll, Dan Baum, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
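The execution pattern, groups of K corresponding elements processed across three sources and written to the destination, can be modeled directly. A sketch using fused multiply-add as an example ternary operation (the patent covers ternary tile operations generally):

```python
import numpy as np

def ternary_tile_op(op, a, b, c, k=4):
    """Apply a three-input elementwise operation over groups of K
    corresponding elements, writing each group to the destination."""
    assert a.shape == b.shape == c.shape
    dst = np.empty_like(a)
    flat = [m.reshape(-1) for m in (a, b, c, dst)]
    for i in range(0, flat[0].size, k):
        s = slice(i, i + k)
        flat[3][s] = op(flat[0][s], flat[1][s], flat[2][s])  # one group of K
    return dst

a, b, c = (np.arange(8.0).reshape(2, 4) for _ in range(3))
# Example ternary op: fused multiply-add, dst = a * b + c.
print(ternary_tile_op(lambda x, y, z: x * y + z, a, b, c))
```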
  • Patent number: 10970078
    Abstract: In an embodiment, a computation engine may perform computations on input vectors having vector elements of a first precision and data type. The computation engine may convert the vector elements from the first precision to a second precision and may also interleave the vector elements as specified by an instruction issued by the processor to the computation engine. The interleave may be based on a ratio of a result precision and the second precision. An extract instruction may be supported to extract results from the computations and convert and deinterleave the vector elements to provide a compact result in a desired order.
    Type: Grant
    Filed: April 5, 2018
    Date of Patent: April 6, 2021
    Assignee: Apple Inc.
    Inventors: Eric Bainville, Tal Uliel, Jeffry E. Gonion, Ali Sazegari, Erik K. Norden
  • Patent number: 10943673
    Abstract: A method of medical data auto-collection, segmentation and analysis includes collecting, from a plurality of sources, unstructured medical data in a plurality of formats, recognizing a medical name entity in each piece of the unstructured medical data using a medical dictionary, and performing semantic text segmentation on each piece of the unstructured medical data so that each piece is partitioned into groups sharing a same topic. The method further includes generating, as structured medical data, each piece of the unstructured medical data of which the medical name entity is recognized and which is partitioned into the groups, and indexing the structured medical data into elastic search clusters.
    Type: Grant
    Filed: April 10, 2019
    Date of Patent: March 9, 2021
    Assignee: TENCENT AMERICA LLC
    Inventors: Shangqing Zhang, Min Tu, Nan Du, Yusheng Xie, Yaliang Li, Tao Yang, Wei Fan
  • Patent number: 10922140
    Abstract: A physical Graphics Processing Unit (GPU) resource scheduling system and method between virtual machines are provided. An agent is inserted between a physical GPU instruction dispatch and a physical GPU interface through a hooking method, for delaying the sending of instructions and data from the physical GPU instruction dispatch to the physical GPU interface, monitoring a set of GPU conditions of a guest application executing in the virtual machine and the use of physical GPU hardware resources, and then providing feedback to a GPU resource scheduling algorithm based on time or a time sequence. With the agent, the method requires no modification to the guest application of the virtual machine, the host operating system, the virtual machine operating system, the GPU driver or the virtual machine manager.
    Type: Grant
    Filed: June 19, 2013
    Date of Patent: February 16, 2021
    Assignee: SHANGHAI JIAOTONG UNIVERSITY
    Inventors: Miao Yu, Zhengwei Qi, Haibing Guan, Yin Wang
  • Patent number: 10891156
    Abstract: Systems and methods are provided to implement intelligent data coordination for accelerated computing in a distributed computing environment. For example, a method includes executing a task on a computing node, monitoring requests issued by the executing task, intercepting requests issued by the executing task which correspond to data flow operations to be performed as part of the task execution, and asynchronously executing the intercepted requests at scheduled times to coordinate data flow between resources on the computing node.
    Type: Grant
    Filed: April 26, 2017
    Date of Patent: January 12, 2021
    Assignee: EMC IP Holding Company LLC
    Inventors: Junping Zhao, Yifan Sun, Layne Peng, Jie Bao, Kun Wang
  • Patent number: 10846091
    Abstract: In an embodiment, a coprocessor includes multiple processing elements arranged in a grid of one or more rows and one or more columns. A given processing element includes an arithmetic/logic unit (ALU) circuit configured to perform an ALU operation specified by an instruction executable by the coprocessor, wherein the ALU circuit is configured to produce a result. The given processing element further comprises a first memory coupled to the execute circuit. The first memory is configured to store results generated by the given processing element. The first memory includes a portion of a result memory implemented by the coprocessor, wherein locations in the result memory are specifiable as destination operands of instructions executable by the coprocessor. The portion of the result memory implemented by the first memory is the portion of the result memory that the given processing element is capable of updating.
    Type: Grant
    Filed: February 26, 2019
    Date of Patent: November 24, 2020
    Assignee: Apple Inc.
    Inventors: Aditya Kesiraju, Andrew J. Beaumont-Smith, Deepankar Duggal, Ran A. Chachick
  • Patent number: 10831794
    Abstract: In one embodiment, a method for providing alternate keys in a keyed index includes creating a first base record in a keyed index of a database, the first base record including a first unique key and a first data record, wherein the first data record includes at least one sub key and at least one first value, each sub key being correlated with a different one of the at least one first value in a sub key/value pair, and creating one or more alternate key records in the database, each of the alternate key records including one of the at least one sub key which is correlated with the first base record and the first unique key of the first base record. The database adheres to virtual storage access method (VSAM) in some approaches. In other approaches, a number of alternate key records created is equal to a number of first sub keys in the first data record.
    Type: Grant
    Filed: July 17, 2018
    Date of Patent: November 10, 2020
    Assignee: International Business Machines Corporation
    Inventor: Terri A. Menendez
  • Patent number: 10831488
    Abstract: In an embodiment, a computation engine may offload work from a processor (e.g. a CPU) and efficiently perform computations such as those used in LSTM and other workloads at high performance. In an embodiment, the computation engine may perform computations on input vectors from input memories in the computation engine, and may accumulate results in an output memory within the computation engine. The input memories may be loaded with initial vector data from memory, incurring the memory latency that may be associated with reading the operands. Compute instructions may be performed on the operands, generating results in an output memory. One or more extract instructions may be supported to move data from the output memory to the input memory, permitting additional computation on the data in the output memory without moving the results to main memory.
    Type: Grant
    Filed: August 20, 2018
    Date of Patent: November 10, 2020
    Assignee: Apple Inc.
    Inventors: Eric Bainville, Jeffry E. Gonion, Ali Sazegari, Gerard R. Williams, III, Andrew J. Beaumont-Smith
  • Patent number: 10824423
    Abstract: A reconfigurable arithmetic device includes a plurality of processor elements configured to perform first arithmetic processes corresponding to a first type of instruction and second arithmetic processes corresponding to a second type of instruction, a random-access memory (RAM), and a control unit. The first type of instruction is written into the RAM at a first address, data for the first type of instruction is written into the RAM at a second address, and data for the second type of instruction is written into the RAM at a third address. When the first type of instruction is written at the first address, the control unit decodes the first type of instruction and configures the processor elements to perform the first arithmetic processes. When data for the second type of instruction is written at the third address, the control unit configures the processor elements to perform the second arithmetic processes.
    Type: Grant
    Filed: September 28, 2015
    Date of Patent: November 3, 2020
    Assignee: Cypress Semiconductor Corporation
    Inventors: Hiroshi Furukawa, Ichiro Kasama
  • Patent number: 10761900
    Abstract: A method for distributed processing includes receiving a job bundle at a command center comprising a processor, a network interface, and a memory. The method includes determining a value of a dimension of the job bundle, determining, based on a predetermined rule applied to the determined value of the dimension of the job bundle, an aggregate processing cost for the job bundle and identifying one or more available member devices communicatively connected to the command center via the network interface. Additionally, the method includes the operations of splitting the job bundle into one or more threads based on at least one of the determined value of the dimension, the aggregate processing cost or the available member devices, apportioning a thread of the one or more threads to a member device and transmitting, via the network interface, the apportioned thread to a secure processing environment of the member device.
    Type: Grant
    Filed: April 9, 2018
    Date of Patent: September 1, 2020
    Assignee: V2Com S.A.
    Inventors: Guilherme Spina, Leonardo de Moura Rocha Lima
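The dispatch flow (measure a dimension, apply a cost rule, split, apportion) can be sketched as follows; the per-unit cost rule and the striped split are assumptions for illustration:

```python
def dispatch_bundle(job_bundle, members, cost_per_unit=2.5):
    """Cost the bundle from one of its dimensions via a fixed rule, split it
    into per-member threads, and pair each thread with a member device."""
    size = len(job_bundle["items"])          # the measured dimension
    aggregate_cost = size * cost_per_unit    # predetermined rule
    n = min(len(members), size)
    threads = [job_bundle["items"][i::n] for i in range(n)]
    return aggregate_cost, list(zip(members, threads))

cost, plan = dispatch_bundle({"items": list(range(10))},
                             members=["dev-a", "dev-b", "dev-c"])
print(cost)   # 25.0
for member, thread in plan:
    print(member, thread)  # each thread goes to one device's secure environment
```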
  • Patent number: 10740152
    Abstract: Technologies for dynamic acceleration of general-purpose code include a computing device having a general-purpose processor core and one or more hardware accelerators. The computing device identifies an acceleration candidate in an application that is targeted to the processor core. The acceleration candidate may be a long-running computation of the application. The computing device translates the acceleration candidate into a translated executable targeted to the hardware accelerator. The computing device determines whether to offload execution of the acceleration candidate and, if so, executes the translated executable with the hardware accelerator. The computing device may translate the acceleration candidate into multiple translated executables, each targeted to a different hardware accelerator. The computing device may select among the translated executables in response to determining to offload execution.
    Type: Grant
    Filed: December 6, 2016
    Date of Patent: August 11, 2020
    Assignee: Intel Corporation
    Inventors: Jayaram Bobba, Niranjan K. Soundararajan
  • Patent number: 10740097
    Abstract: Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.
    Type: Grant
    Filed: May 20, 2016
    Date of Patent: August 11, 2020
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Philip Heidelberger, Robert M. Senger, Valentina Salapura, Burkhard Steinmacher-Burow, Yutaka Sugawara, Todd E. Takken
  • Patent number: 10664310
    Abstract: A method of configuring a System on Chip (SoC) to execute a CNN process comprising CNN layers, the method comprising, for each schedule: determining memory access amount information describing how many memory accesses are required; expressing the memory access amount information as relationships describing reusability of data; combining the relationships with a cost of writing to and reading from external memory, to form memory access information; determining a memory allocation for on-chip memory of the SoC for the input feature maps (FMs) and the output FMs; and determining, dependent upon the memory access information and the memory allocation for each schedule, a schedule which minimises the external memory access for the CNN layer of the CNN process, and a memory allocation associated with the determined schedule.
    Type: Grant
    Filed: December 14, 2018
    Date of Patent: May 26, 2020
    Assignee: Canon Kabushiki Kaisha
    Inventors: Haseeb Bokhari, Jorgen Peddersen, Sridevan Parameswaran, Iftekhar Ahmed, Yusuke Yachide
  • Patent number: 10635445
    Abstract: An apparatus and method of operating an apparatus are disclosed. The apparatus has a program counter permitted range storage element defining a permitted range of program counter values for the sequence of instructions it executes. Branch prediction circuitry predicts target instruction addresses for branch instructions. In response to a program counter modifying event, a program counter speculative range storage element is updated corresponding to each speculatively executed instruction after a branch instruction. Program counter permitted range verification circuitry is responsive to resolution of a modification of the program counter permitted range indication resulting from the program counter modifying event to determine whether the speculatively executed program counter range satisfies the permitted range of program counter values. A branch mis-prediction mechanism may support the response of the apparatus if the permitted range of program counter values is violated.
    Type: Grant
    Filed: May 29, 2018
    Date of Patent: April 28, 2020
    Assignee: Arm Limited
    Inventors: Rémi Marius Teyssier, Albin Pierrick Tonnerre, Cédric Denis Robert Airaud, Luca Nassi, Guillaume Bolbenes, Francois Donati, Lee Evan Eisen, Pasquale Ranone
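The verification step boils down to a range check over the speculatively executed program counter values, with a violation handled via the existing mis-prediction path. A minimal sketch under that reading:

```python
def verify_speculative_range(permitted, speculative_pcs):
    """After the permitted-range update resolves, check every speculatively
    executed PC; any violation is handled like a branch mis-prediction."""
    lo, hi = permitted
    spec_lo, spec_hi = min(speculative_pcs), max(speculative_pcs)
    if spec_lo < lo or spec_hi > hi:
        return "flush-and-refetch"   # reuse the mis-prediction mechanism
    return "commit"

print(verify_speculative_range((0x1000, 0x2000), [0x1004, 0x1008]))  # commit
print(verify_speculative_range((0x1000, 0x2000), [0x1004, 0x2400]))  # flush
```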