Coprocessor Patents (Class 712/34)
-
Patent number: 12248395
Abstract: A data storage device and method are provided for predictable low-latency in a time-sensitive environment. In one embodiment, a data storage device is provided comprising a memory and a controller configured to communicate with the memory. The controller is further configured to: receive, from a host, an indication of a logical block address range that the host will later read; and in response to receiving the indication: read data from the logical block address range; and perform an action on the data to reduce a read latency when the host later reads the logical block address range. Other embodiments are disclosed.
Type: Grant
Filed: July 26, 2023
Date of Patent: March 11, 2025
Assignee: Sandisk Technologies, Inc.
Inventors: Devika Nair, Amit Sharma
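The read-ahead idea in this abstract can be illustrated with a minimal sketch. This is not the patented implementation; `FlashBackend`, `Controller`, and `hint_read_range` are hypothetical names, and the RAM cache stands in for whatever latency-reducing action the controller takes.

```python
class FlashBackend:
    """Stand-in for slow NAND reads, keyed by logical block address (LBA)."""
    def __init__(self, blocks):
        self.blocks = blocks

    def read(self, lba):
        # Imagine each call here costs on the order of 100 microseconds.
        return self.blocks[lba]

class Controller:
    def __init__(self, backend):
        self.backend = backend
        self.cache = {}  # RAM-resident copies of pre-read blocks

    def hint_read_range(self, start_lba, count):
        # Host indicates it will read [start_lba, start_lba + count) soon:
        # pre-read the range into RAM so the later host read is fast.
        for lba in range(start_lba, start_lba + count):
            self.cache[lba] = self.backend.read(lba)

    def read(self, lba):
        # Hinted blocks are served from RAM; others fall back to the backend.
        if lba in self.cache:
            return self.cache[lba]
        return self.backend.read(lba)
```

A host would call `hint_read_range` ahead of a deadline-sensitive read so the subsequent `read` is a cache hit.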
-
Patent number: 12242653
Abstract: Systems, apparatuses, and methods related to securing domain crossing using domain access tables are described. For example, a computer processor can have registers configured to store locations of domain access tables respectively for predefined, non-hierarchical domains. Each respective domain access table can be pre-associated with a respective domain and can have entries configured to identify entry points of the respective domain. The processor is configured to enforce domain crossing in instruction execution using the domain access tables and to prevent arbitrary and/or unauthorized domain crossing.
Type: Grant
Filed: October 27, 2021
Date of Patent: March 4, 2025
Assignee: Micron Technology, Inc.
Inventor: Steven Jeffrey Wallach
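The enforcement rule can be sketched in a few lines. This is a simplified illustration, not the patented hardware: the table contents and addresses below are made up, and a real processor would hold the table locations in registers rather than a dictionary.

```python
# One access table per non-hierarchical domain, listing the only
# addresses at which control may enter that domain.
domain_access_tables = {
    "kernel": {0x1000, 0x1040},  # permitted entry points into "kernel"
    "driver": {0x2000},
}

def check_domain_crossing(target_domain, target_addr):
    """Allow a cross-domain transfer only if the target address is a
    registered entry point of the target domain; arbitrary jumps into
    the middle of a domain are rejected."""
    table = domain_access_tables.get(target_domain, set())
    return target_addr in table
```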
-
Patent number: 12222892
Abstract: A system, and associated method, includes a plurality of data processing units, a target CPU, an interconnect unit that is separate from the target CPU and configured to receive a data payload and a prefix that includes a sequentially ordered list of the processing units that will perform the data operations and the sets of parameters to be used by each of the processing units, and based on the sequentially ordered list, the interconnect unit sends the data payload to a first processing unit, and receives back processed data, then sends the processed data to the subsequent processing unit, and receives back further processed data, and so forth until all of the data operations have been performed by the processing units set forth in the sequentially ordered list.
Type: Grant
Filed: June 16, 2022
Date of Patent: February 11, 2025
Assignee: Eidetic Communications Inc.
Inventors: Sean Gregory Gibb, Saeed Fouladi Fard
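The dispatch loop described here can be sketched functionally. This is a hedged illustration only: the `(unit_name, params)` prefix format and the example units are assumptions, not the patent's encoding.

```python
def run_pipeline(payload, prefix, units):
    """`prefix` is a sequentially ordered list of (unit_name, params).
    The interconnect sends the payload to the first named unit, feeds
    the processed result to the next, and so on until the list is done."""
    data = payload
    for unit_name, params in prefix:
        data = units[unit_name](data, **params)
    return data

# Hypothetical processing units, each taking its own parameter set.
units = {
    "scale": lambda xs, factor: [x * factor for x in xs],
    "clip":  lambda xs, hi: [min(x, hi) for x in xs],
}
result = run_pipeline([1, 5, 9],
                      [("scale", {"factor": 2}), ("clip", {"hi": 10})],
                      units)
# result == [2, 10, 10]
```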
-
Patent number: 12223322
Abstract: A method and apparatus for embedding a microprocessor in a programmable logic device (PLD), where the microprocessor has a logic unit that can operate in two modes. A first mode is a general purpose mode running at least one general purpose process related to the PLD, and a second mode is a fixed function mode emulating a fixed function for use by logic configured into a fabric of the PLD (fabric). A memory unit is coupled to the logic unit and to the fabric, and the fabric is operable for transferring signals with the logic unit in relation to the fixed function.
Type: Grant
Filed: June 28, 2022
Date of Patent: February 11, 2025
Assignee: Microchip Technology Inc.
Inventors: Aaron Severance, Jonathan W. Greene, Joel Vandergriendt
-
Patent number: 12204931
Abstract: Disclosed herein are systems and method for restoring a process. An exemplary method may include detecting a crash of an operating system (OS) on a computing device; collecting a memory state of at least one page of physical memory of the OS on the computing device; generating a checkpoint file that includes information related to one or more processes from the collected memory state, wherein the information comprises a state for each of the one or more processes at a time of the crash; for each respective process of the one or more processes, creating, on the computing device or another computing device, a new process corresponding to the respective process; and restoring, based on the checkpoint file, a state of the respective process at the time of the crash such that the new process initiates execution from the restored state.
Type: Grant
Filed: December 8, 2021
Date of Patent: January 21, 2025
Assignee: Virtuozzo International GmbH
Inventor: Vasily Averin
-
Patent number: 12197954
Abstract: The present technology augments the GPU compute model to provide system-provided data marshalling characteristics of graphics pipelining to increase efficiency and reduce overhead. A simple scheduling model based on scalar counters (e.g., semaphores) abstracts the availability of hardware resources. Resource releases can be done programmatically, and a system scheduler only needs to track the states of such counters/semaphores to make work launch decisions. Semantics of the counters/semaphores are defined by an application, which can use the counters/semaphores to represent the availability of free space in a memory buffer, the amount of cache pressure induced by the data flow in the network, or the presence of work items to be processed.
Type: Grant
Filed: March 17, 2021
Date of Patent: January 14, 2025
Assignee: NVIDIA Corporation
Inventors: Yury Uralsky, Henry Moreton, Matthijs de Smedt, Lei Yang
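Counter-gated work launch can be shown with a minimal sketch. This assumes, as one of the application-defined semantics the abstract mentions, that the counter models free slots in an output buffer; the class and function names are illustrative.

```python
class Semaphore:
    """Scalar counter whose meaning is defined by the application,
    e.g. the number of free entries in a downstream buffer."""
    def __init__(self, value):
        self.value = value

    def acquire(self):
        if self.value > 0:
            self.value -= 1
            return True
        return False

    def release(self, n=1):
        # Releases are done programmatically by whoever frees the resource.
        self.value += n

def try_launch(sem, work, launched):
    # The scheduler only inspects counter state to make launch decisions.
    if sem.acquire():
        launched.append(work)
        return True
    return False
```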
-
Patent number: 12197321
Abstract: A storage device includes a controller and nonvolatile memories. The controller receives write commands having virtual stream identifiers (IDs), receives discard commands having the virtual stream IDs, and determines a lifetime of write data to which each of the virtual stream IDs is assigned. The nonvolatile memories are accessed by the controller depending on physical stream IDs. The controller maps the virtual stream IDs and the physical stream IDs based on the lifetime of the write data.
Type: Grant
Filed: December 2, 2022
Date of Patent: January 14, 2025
Assignees: Samsung Electronics Co., Ltd., Research & Business Foundation Sungkyunkwan University
Inventors: Hwanjin Yong, Jin-Soo Kim
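One plausible way to map many virtual streams onto few physical streams by lifetime is to bucket streams with similar lifetimes together, so data in one physical stream ages out at roughly the same time. The bucketing scheme below is an assumption for illustration, not the patent's algorithm.

```python
def map_streams(lifetimes, num_physical):
    """lifetimes: {virtual_id: estimated lifetime of its write data}.
    Sorts virtual streams by lifetime and assigns equal-sized runs of
    them to each physical stream ID, so similar-lifetime data lands
    in the same physical stream."""
    ranked = sorted(lifetimes, key=lifetimes.get)
    per_bucket = -(-len(ranked) // num_physical)  # ceiling division
    return {vid: i // per_bucket for i, vid in enumerate(ranked)}
```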
-
Patent number: 12174911
Abstract: An apparatus and method for complex matrix multiplication. For example, one embodiment of a processor comprises: a decoder to decode a first complex matrix multiplication instruction; execution circuitry to execute the first complex matrix multiplication instruction, the execution circuitry comprising parallel multiplication circuitry to multiply real values from the first plurality of real and imaginary values with corresponding real values from the second plurality of real and imaginary values to generate a first plurality of real products, to multiply imaginary values from the first plurality of real and imaginary values with corresponding imaginary values from the second plurality of real and imaginary values to generate a second plurality of real products; and addition/subtraction circuitry to subtract each real product in the second plurality of real products from a corresponding real product in the first plurality of real products to produce a corresponding real value in the result matrix.
Type: Grant
Filed: December 23, 2020
Date of Patent: December 24, 2024
Assignee: Intel Corporation
Inventors: Menachem Adelman, Robert Valentine, Daniel Towner, Amit Gradstein, Mark Jay Charney
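The arithmetic the abstract describes is the real part of a complex product: for each output element, multiply real-with-real and imaginary-with-imaginary pairs, then subtract the second set of products from the first (since i·i = −1). A scalar reference sketch, with no claim to match the instruction's actual tiling or parallelism:

```python
def complex_mm_real(A, B):
    """A, B: matrices of complex numbers (lists of lists). Returns the
    real part of A @ B, computed as sum(ar*br) - sum(ai*bi) per element."""
    n, k, m = len(A), len(B), len(B[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            first = sum(A[i][p].real * B[p][j].real for p in range(k))   # first plurality of real products
            second = sum(A[i][p].imag * B[p][j].imag for p in range(k))  # second plurality (imaginary pairs)
            out[i][j] = first - second
    return out
```

For a single element, (1 + 2j)(3 + 4j) has real part 1·3 − 2·4 = −5.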
-
Patent number: 12165030
Abstract: A system and method include an accelerator circuit comprising an input circuit block, a filter circuit block, a post-processing circuit block, and an output circuit block and a processor to initialize the accelerator circuit, determine tasks of a neural network application to be performed by at least one of the input circuit block, the filter circuit block, the post-processing circuit block, or the output circuit block, assign each of the tasks to a corresponding one of the input circuit block, the filter circuit block, the post-processing circuit block, or the output circuit block, instruct the accelerator circuit to perform the tasks, and execute the neural network application based on results received from the accelerator circuit completing performance of the tasks.
Type: Grant
Filed: June 18, 2021
Date of Patent: December 10, 2024
Inventors: Mayan Moudgill, John Glossner
-
Patent number: 12160369
Abstract: A compute device can access local or remote accelerator devices for use in processing a received packet. The received packet can be processed by any combination of local accelerator devices and remote accelerator devices. In some cases, the received packet can be encapsulated in an encapsulating packet and sent to a remote accelerator device for processing. The encapsulating packet can indicate a priority level for processing the received packet and its associated processing task. The priority level can override a priority level that would otherwise be assigned to the received packet and its associated processing task. The remote accelerator device can specify a fullness of an input queue to the compute device. Other information can be conveyed by packets transmitted between and among compute devices and remote accelerator devices to assist in determining an accelerator to use or other uses.
Type: Grant
Filed: February 15, 2019
Date of Patent: December 3, 2024
Assignee: Intel Corporation
Inventors: Chih-Jen Chang, Daniel Christian Biederman, Matthew James Webb, Wing Cheung, Jose Niell, Robert Hathaway
-
Patent number: 12147813
Abstract: A method for handling an exception or interrupt in a heterogeneous instruction set architecture is provided. A physical host to which the method is applied can support two instruction set architectures. When a secondary architecture virtual machine triggers an exception or interrupt, a virtual machine monitor may translate code of the exception or interrupt in a secondary instruction set architecture into code of the exception or interrupt in a primary instruction set architecture. The virtual machine monitor may identify the code of the exception or interrupt in the primary instruction set architecture. The virtual machine monitor identifies, based on the translated code, a type of the exception or interrupt triggered by the secondary architecture virtual machine, to handle the exception or interrupt.
Type: Grant
Filed: December 12, 2022
Date of Patent: November 19, 2024
Assignee: Huawei Technologies Co., Ltd.
Inventors: Yifei Jiang, Siqi Zhao, Bo Wan
-
Patent number: 12136411
Abstract: A technique for training a model is disclosed. A training sample including an input sequence of observations and a target sequence of symbols having length different from the input sequence of observations is obtained. The input sequence of observations is fed into the model to obtain a sequence of predictions. The sequence of predictions is shifted by an amount with respect to the input sequence of observations. The model is updated based on a loss using a shifted sequence of predictions and the target sequence of the symbols.
Type: Grant
Filed: April 3, 2020
Date of Patent: November 5, 2024
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Gakuto Kurata, Kartik Audhkhasi
-
Patent number: 12093801
Abstract: Systems and methods for providing executable instructions to a neural network processor are provided. In one example, a system comprises a database that stores a plurality of executable instructions and a plurality of subgraph identifiers, each subgraph identifier of the plurality of subgraph identifiers being associated with a subset of instructions of the plurality of executable instructions.
Type: Grant
Filed: May 3, 2023
Date of Patent: September 17, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Richard John Heaton, Randy Renfu Huang, Ron Diamant
-
Patent number: 12013804
Abstract: An integrated circuit, and a data processing device and method are provided. The integrated circuit includes a processor circuit and an accelerator circuit. The processor circuit includes a processor, a first data storage section, and a first data input/output interface. The accelerator circuit includes an accelerator and a second data input/output interface. The second data input/output interface is electrically connected to the first data input/output interface, so that the accelerator circuit can perform information interaction with the first data storage section.
Type: Grant
Filed: May 5, 2022
Date of Patent: June 18, 2024
Assignee: Lemon Inc.
Inventors: Yimin Chen, Shan Lu, Junmou Zhang, Chuang Zhang, Yuanlin Cheng, Jian Wang
-
Patent number: 12001845
Abstract: An apparatus comprises first instruction execution circuitry, second instruction execution circuitry, and a decoupled access buffer. Instructions of an ordered sequence of instructions are issued to one of the first and second instruction execution circuitry for execution in dependence on whether the instruction has a first type label or a second type label. An instruction with the first type label is an access-related instruction which determines at least one characteristic of a load operation to retrieve a data value from a memory address. Instruction execution by the first instruction execution circuitry of instructions having the first type label is prioritised over instruction execution by the second instruction execution circuitry of instructions having the second type label. Data values retrieved from memory as a result of execution of the first type instructions are stored in the decoupled access buffer.
Type: Grant
Filed: October 15, 2020
Date of Patent: June 4, 2024
Assignee: Arm Limited
Inventors: Mbou Eyole, Stefanos Kaxiras
-
Patent number: 11947804
Abstract: A system includes a hardware circuitry having a device coupled with one or more external memory devices. The device is to detect an input/output (I/O) request associated with an external memory device of the one or more external memory devices. The device is to record a first timestamp in response to detecting the I/O request transmitted to the external memory device. The device is further to detect an indication from the external memory device of a completion of the I/O request associated with the external memory device and record a second timestamp in response to detecting the indication. The device is also to determine a latency associated with the I/O request based on the first timestamp and the second timestamp.
Type: Grant
Filed: April 6, 2022
Date of Patent: April 2, 2024
Assignee: NVIDIA Corporation
Inventors: Shridhar Rasal, Oren Duer, Aviv Kfir, Liron Mula
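The two-timestamp latency measurement reduces to a simple pattern. A software sketch under stated assumptions: `submit` and `wait_completion` are hypothetical stand-ins for issuing the I/O request and observing its completion indication.

```python
import time

def timed_io(submit, wait_completion):
    """Record a timestamp when the request is detected/issued, another
    when the completion indication is detected, and report the latency
    as their difference."""
    t0 = time.monotonic()   # first timestamp: request observed
    submit()
    wait_completion()
    t1 = time.monotonic()   # second timestamp: completion observed
    return t1 - t0
```

Using a monotonic clock matters here: wall-clock adjustments between the two timestamps would corrupt the measured latency.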
-
Patent number: 11934295
Abstract: The present disclosure provides for synchronization of multi-core systems by monitoring a plurality of debug trace data streams for a redundantly operating system including a corresponding plurality of cores performing a task in parallel; in response to detecting a state difference on one debug trace data stream of the plurality of debug trace data streams relative to other debug trace data streams of the plurality of debug trace data streams: marking a given core associated with the one debug trace data stream as an affected core; and restarting the affected core.
Type: Grant
Filed: November 9, 2021
Date of Patent: March 19, 2024
Assignee: THE BOEING COMPANY
Inventors: David P. Haldeman, Eric J. Miller
-
Patent number: 11928471
Abstract: Embodiments for a metadata predictor. An index pipeline generates indices in an index buffer in which the indices are used for reading out a memory device. A prediction cache is populated with metadata of instructions read from the memory device. A prediction pipeline generates a prediction using the metadata of the instructions from the prediction cache, the populating of the prediction cache with the metadata of the instructions being performed asynchronously to the operating of the prediction pipeline.
Type: Grant
Filed: August 19, 2021
Date of Patent: March 12, 2024
Assignee: International Business Machines Corporation
Inventors: Edward Thomas Malley, Adam Benjamin Collura, Brian Robert Prasky, James Bonanno, Dominic Ditomaso
-
Patent number: 11915015
Abstract: Systems and methods provide isolated workspaces operating on an IHS (Information Handling System) with use of pre-boot resources of the IHS that are not directly accessible by the workspaces. Upon notification of a workspace initialization, a segregated variable space, such as a segregated memory utilized by a UEFI (Unified Extensible Firmware Interface) of the IHS, is specified for use by the workspace. The segregated variable space is initialized and populated with pre-boot variables, such as UEFI variables, that are allowed for configuration by the workspace. Upon a workspace issuing a request to configure a pre-boot variable, the segregated variable space is identified that was mapped for use by the workspace. The requested pre-boot variable configuration is allowed based on whether the pre-boot variable is populated in the segregated variable space. When the requested pre-boot variable configuration is allowed, the pre-boot variable is configured on behalf of the workspace.
Type: Grant
Filed: August 27, 2021
Date of Patent: February 27, 2024
Assignee: Dell Products, L.P.
Inventors: Balasingh P. Samuel, Vivek Viswanathan Iyer
-
Patent number: 11899613
Abstract: A packaging technology to improve performance of an AI processing system resulting in an ultra-high bandwidth system. An IC package is provided which comprises: a substrate; a first die on the substrate, and a second die stacked over the first die. The first die can be a first logic die (e.g., a compute chip, CPU, GPU, etc.) while the second die can be a compute chiplet comprising ferroelectric or paraelectric logic. Both dies can include ferroelectric or paraelectric logic. The ferroelectric/paraelectric logic may include AND gates, OR gates, complex gates, majority, minority, and/or threshold gates, sequential logic, etc. The IC package can be in a 3D or 2.5D configuration that implements a logic-on-logic stacking configuration. The 3D or 2.5D packaging configurations have chips or chiplets designed to have time distributed or spatially distributed processing. The logic of chips or chiplets is segregated so that only one chip in a 3D or 2.5D stacking arrangement is hot at a time.
Type: Grant
Filed: August 20, 2021
Date of Patent: February 13, 2024
Assignee: KEPLER COMPUTING INC.
Inventors: Amrita Mathuriya, Christopher B. Wilkerson, Rajeev Kumar Dokania, Debo Olaosebikan, Sasikanth Manipatruni
-
Patent number: 11868780
Abstract: An electronic device that includes a central processor and a coprocessor coupled to the central processor. The central processor includes a plurality of registers and is configured to decode a first set of instructions. The first set of instructions includes a command instruction and an identity of a destination register. The coprocessor is configured to receive the command instruction from the central processor, execute the command instruction, and write a result of the command instruction in the destination register. The central processor is further configured to set a register tag for the destination register at the time the central processor decodes the first set of instructions and to clear the register tag at the time the result is written in the destination register.
Type: Grant
Filed: August 26, 2021
Date of Patent: January 9, 2024
Assignee: TEXAS INSTRUMENTS INCORPORATED
Inventors: Christian Wiencke, Armin Stingl, Jeroen Vliegen
-
Patent number: 11809869
Abstract: Embodiments detailed herein relate to systems and methods to store a tile register pair to memory. In one example, a processor includes: decode circuitry to decode a store matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded store matrix pair instruction to store every element of left and right tiles of the identified source matrix to corresponding element positions of left and right tiles of the identified destination matrix, respectively, wherein the executing stores a chunk of C elements of one row of the identified source matrix at a time.
Type: Grant
Filed: December 29, 2017
Date of Patent: November 7, 2023
Assignee: Intel Corporation
Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
-
Patent number: 11797473
Abstract: An accelerated processor structure on a programmable integrated circuit device includes a processor and a plurality of configurable digital signal processors (DSPs). Each configurable DSP includes a circuit block, which in turn includes a plurality of multipliers. The accelerated processor structure further includes a first bus to transfer data from the processor to the configurable DSPs, and a second bus to transfer data from the configurable DSPs to the processor.
Type: Grant
Filed: October 8, 2018
Date of Patent: October 24, 2023
Assignee: Altera Corporation
Inventors: David Shippy, Martin Langhammer, Jeffrey Eastlack
-
Patent number: 11797856
Abstract: Presented herein are framework embodiments that allow the representation of complex systems and processes that are suitable for resource efficient machine learning and inference. Furthermore, disclosed are new reinforcement learning techniques that are capable of learning to plan and optimize dynamic and nuanced systems and processes. Different embodiments comprising combinations of one or more neural networks, reinforcement learning, and linear programming are discussed to learn representations and models—even for complex systems and methods. Furthermore, the introduction of neural field embodiments and methods to compute a Deep Argmax, as well as to invert neural networks and neural fields with linear programming, provide the ability to create models and train models that are accurate and very resource efficient—using less memory, less computations, less time, and, as a result, less energy.
Type: Grant
Filed: June 11, 2020
Date of Patent: October 24, 2023
Assignee: System AI, Inc.
Inventor: Tuna Oezer
-
Patent number: 11782722
Abstract: A complex computing device, a complex computing method, an artificial intelligence chip and an electronic apparatus are provided. An input interface receives complex computing instructions and arbitrates each complex computing instruction to a corresponding computing component respectively, according to the computing types in the respective complex computing instructions. Each computing component is connected to the input interface, acquires a source operand from a complex computing instruction to perform complex computing, and generates a computing result instruction to feed back to an output interface. The output interface arbitrates the computing result in each computing result instruction to the corresponding instruction source respectively, according to the instruction source identifier in each computing result instruction.
Type: Grant
Filed: January 14, 2021
Date of Patent: October 10, 2023
Assignees: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD., KUNLUNXIN TECHNOLOGY (BEIJING) COMPANY LIMITED
Inventors: Baofu Zhao, Xueliang Du, Kang An, Yingnan Xu, Chao Tang
-
Patent number: 11768689
Abstract: The present application discloses a computing device that can provide a low-power, highly capable computing platform for computational imaging. The computing device can include one or more processing units, for example one or more vector processors and one or more hardware accelerators, an intelligent memory fabric, a peripheral device, and a power management module. The computing device can communicate with external devices, such as one or more image sensors, an accelerometer, a gyroscope, or any other suitable sensor devices.
Type: Grant
Filed: November 12, 2021
Date of Patent: September 26, 2023
Assignee: Movidius Limited
Inventors: Brendan Barry, Richard Richmond, Fergal Connor, David Moloney
-
Patent number: 11726701
Abstract: A memory expander includes a memory device that stores a plurality of task data. A controller controls the memory device. The controller receives metadata and a management request from an external central processing unit (CPU) through a compute express link (CXL) interface and operates in a management mode in response to the management request. In the management mode, the controller receives a read request and a first address from an accelerator through the CXL interface and transmits one of the plurality of task data to the accelerator based on the metadata in response to the read request.
Type: Grant
Filed: October 25, 2021
Date of Patent: August 15, 2023
Inventors: Chon Yong Lee, Jae-Gon Lee, Kyunghan Lee
-
Patent number: 11720475
Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums that use parallel hardware execution with software co-simulation to enable more advanced debugging operations on data flow architectures. Upon a halt to execution of a program thread, a state of the tiles that are executing the thread is saved and offloaded from the HTF to a host system. A developer may then examine this state on the host system to debug their program. Additionally, the state may be loaded into a software simulator that simulates the HTF hardware. This simulator allows the developer to step through the code and to examine values to find bugs.
Type: Grant
Filed: November 21, 2022
Date of Patent: August 8, 2023
Assignee: Micron Technology, Inc.
Inventors: Skyler Arron Windh, Tony M. Brewer, Patrick Estep
-
Patent number: 11714992
Abstract: Systems and methods for providing executable instructions to a neural network processor are provided. In one example, a system comprises a database that stores a plurality of executable instructions and a plurality of subgraph identifiers, each subgraph identifier of the plurality of subgraph identifiers being associated with a subset of instructions of the plurality of executable instructions.
Type: Grant
Filed: December 13, 2018
Date of Patent: August 1, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Richard John Heaton, Randy Renfu Huang, Ron Diamant
-
Patent number: 11714649
Abstract: A RISC-V-based 3D interconnected multi-core processor architecture and a working method thereof. The RISC-V-based 3D interconnected multi-core processor architecture includes a main control layer, a micro core array layer and an accelerator layer, wherein the main control layer includes a plurality of main cores which are RISC-V instruction set CPU cores, the micro core array layer includes a plurality of micro unit groups including a micro core, a data storage unit, an instruction storage unit and a linking controller, wherein the micro core is a RISC-V instruction set CPU core that executes partial functions of the main core; the accelerator layer is configured to optimize a running speed of space utilization for accelerators meeting specific requirements, wherein some main cores in the main control layer perform data interaction with the accelerator layer, and the other main cores interact with the micro core array layer.
Type: Grant
Filed: December 1, 2021
Date of Patent: August 1, 2023
Assignee: SHANDONG LINGNENG ELECTRONIC TECHNOLOGY CO., LTD.
Inventors: Gang Wang, Jinzheng Mou, Yang An, Moujun Xie, Benyang Wu, Zesheng Zhang, Wenyong Hou, Yongwei Wang, Zixuan Qiu, Xintan Li
-
Patent number: 11682109
Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for configurable aprons for expanded binning. Aspects of the present disclosure include identifying one or more pixel tiles in at least one bin and determining edge information for each pixel tile of the one or more pixel tiles. The edge information may be associated with one or more pixels adjacent to each pixel tile. The present disclosure further describes determining whether at least one adjacent bin is visible based on the edge information for each pixel tile, where the at least one adjacent bin may be adjacent to the at least one bin.
Type: Grant
Filed: October 16, 2020
Date of Patent: June 20, 2023
Assignee: QUALCOMM Incorporated
Inventors: Kalyan Kumar Bhiravabhatla, Krishnaiah Gummidipudi, Ankit Kumar Singh, Andrew Evan Gruber, Pavan Kumar Akkaraju, Srihari Babu Alla, Jonnala Gadda Nagendra Kumar, Vishwanath Shashikant Nikam
-
Patent number: 11663001
Abstract: Systems, apparatuses, and methods for implementing a family of lossy sparse load single instruction, multiple data (SIMD) instructions are disclosed. A lossy sparse load unit (LSLU) loads a plurality of values from one or more input vector operands and determines how many non-zero values are included in one or more input vector operands of a given instruction. If the one or more input vector operands have less than a threshold number of non-zero values, then the LSLU causes an instruction for processing the one or more input vector operands to be skipped. In this case, the processing of the instruction of the one or more input vector operands is deemed to be redundant. If the one or more input vector operands have greater than or equal to the threshold number of non-zero values, then the LSLU causes an instruction for processing the input vector operand(s) to be executed.
Type: Grant
Filed: November 19, 2018
Date of Patent: May 30, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Sanchari Sen, Derrick Allen Aguren, Joseph Lee Greathouse
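The skip decision reduces to counting non-zero elements and comparing against a threshold. A minimal software sketch of that decision, with the function name and threshold semantics as illustrative assumptions rather than the LSLU's actual interface:

```python
def lossy_sparse_load(operands, threshold):
    """operands: list of input vector operands (lists of numbers).
    Returns (operands, execute): execute is False when fewer than
    `threshold` non-zero values are present, i.e. the consuming
    instruction is deemed redundant and is skipped."""
    nonzero = sum(1 for vec in operands for x in vec if x != 0)
    return operands, nonzero >= threshold
```

The "lossy" part is that skipping the instruction discards the contribution of the few non-zero values that were present, trading exactness for throughput on sparse data.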
-
Patent number: 11507493
Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums that use parallel hardware execution with software co-simulation to enable more advanced debugging operations on data flow architectures. Upon a halt to execution of a program thread, a state of the tiles that are executing the thread is saved and offloaded from the HTF to a host system. A developer may then examine this state on the host system to debug their program. Additionally, the state may be loaded into a software simulator that simulates the HTF hardware. This simulator allows the developer to step through the code and to examine values to find bugs.
Type: Grant
Filed: August 18, 2021
Date of Patent: November 22, 2022
Assignee: Micron Technology, Inc.
Inventors: Skyler Arron Windh, Tony M. Brewer, Patrick Estep
-
Patent number: 11429855
Abstract: A method for accelerating a neural network includes identifying neural network layers that meet a locality constraint. Code is generated to implement depth-first processing for different hardware based on the identified neural network layers. The generated code is then used to perform the depth-first processing on the neural network.
Type: Grant
Filed: February 6, 2018
Date of Patent: August 30, 2022
Assignee: NEC CORPORATION
Inventors: Nicolas Weber, Felipe Huici, Mathias Niepert
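Depth-first processing here means pushing one piece of data through all the fused layers before starting the next, rather than computing each layer in full over all data. A schematic sketch under that interpretation (the tile granularity and layer representation are assumptions, not the patent's code-generation scheme):

```python
def depth_first(tiles, layers):
    """Process each input tile through the whole chain of layers that
    meet the locality constraint before touching the next tile, so
    intermediate results stay small and cache-resident."""
    out = []
    for tile in tiles:          # outer loop over data tiles, not layers
        for layer in layers:    # fused layer chain applied per tile
            tile = layer(tile)
        out.append(tile)
    return out
```

The breadth-first alternative would materialize every layer's full output before the next layer runs, which is what the locality-aware code generation avoids.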
-
Patent number: 11422815
Abstract: Binary translation may be performed by a field programmable gate array (FPGA) integrated with a processor as a single integrated circuit. The FPGA contains multiple blocks of logic for performing different binary translations. The processor may offload the binary translation to the FPGA. The FPGA may use historical logging to skip the binary translation of source instructions that have been previously translated into target instructions.
Type: Grant
Filed: March 1, 2018
Date of Patent: August 23, 2022
Assignee: Dell Products L.P.
Inventors: Mukund P. Khatri, Ramesh Radhakrishnan
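The historical-logging idea is essentially memoization of translations. A hedged software sketch, where `translate_one` is a hypothetical stand-in for an FPGA translation block and the dictionary stands in for the historical log:

```python
def make_translator(translate_one):
    """Wraps a per-instruction translator with a history log so that
    previously translated source instructions are looked up rather
    than re-translated."""
    history = {}                       # source instruction -> target instruction
    stats = {"hits": 0, "misses": 0}

    def translate(source_instrs):
        out = []
        for ins in source_instrs:
            if ins in history:         # seen before: skip the translation work
                stats["hits"] += 1
            else:                      # first sighting: translate and log it
                history[ins] = translate_one(ins)
                stats["misses"] += 1
            out.append(history[ins])
        return out

    return translate, stats
```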
-
Patent number: 11403250
Abstract: Examples in this application disclose an operation accelerator, a switch, and a processing system. One example operation accelerator includes a shunt circuit directly connected to a first peripheral component interconnect express (PCIe) device through a PCIe link. The shunt circuit is configured to receive first data sent by the first PCIe device through the PCIe link, and transmit the first data through an internal bus. A first address carried in the first data is located in a first range. In some examples of this application, the first PCIe device directly communicates with the operation accelerator through the shunt circuit in the operation accelerator.
Type: Grant
Filed: March 29, 2021
Date of Patent: August 2, 2022
Assignee: Huawei Technologies Co., Ltd.
Inventors: Chuanning Cheng, Shengyong Peng
-
Patent number: 11392513
Abstract: A graph-based data flow control system includes a control plane system coupled to SCP subsystems. The control plane system identifies a workload, and identifies service(s) on the SCP subsystems for manipulating/exchanging data to perform the workload. The control plane system generates a respective SCP-local data flow control graph for each SCP subsystem that defines how their service(s) will manipulate/exchange data within that SCP subsystem, and generates inter-SCP data flow control graph(s) that define how service(s) provided by at least one SCP subsystem will manipulate/exchange data with service(s) provided by at least one other SCP subsystem. The control plane system then transmits each respective SCP-local data flow control graph to each of the SCP subsystems, and the inter-SCP data flow control graph(s) to at least one SCP subsystem, for use by the SCP subsystems in causing their service(s) to manipulate/exchange data to perform the workload.
Type: Grant
Filed: October 15, 2020
Date of Patent: July 19, 2022
Assignee: Dell Products L.P.
Inventors: Gaurav Chawla, Mark Steven Sanders, Elie Jreij, Jimmy D. Pike, Robert W. Hormuth, William Price Dawkins
-
Patent number: 11366662
Abstract: A high-level synthesis multiprocessor system enables sophisticated algorithms to be easily realized with nearly the smallest possible circuit. A shared memory is divided into a plurality of banks. The memory banks are connected to processors, respectively. Each processor receives an instruction code and an operand from its connected memory bank. After executing the operation, the processor sends the result to its adjacent processor element to be used as the accumulator value when the next instruction executes. The software program to be executed is fixed, and the processor that executes each instruction in the program is uniquely identified. Each processor has a function for executing its own instructions out of all executable instructions in the multiprocessor system, and does not have a function for executing instructions that the processor is not to execute. A circuit configuration with unused instructions deleted is thereby provided.
Type: Grant
Filed: August 22, 2018
Date of Patent: June 21, 2022
Assignee: El Amina Inc.
Inventor: Hideki Tanuma
-
Patent number: 11354592
Abstract: Systems and methods for intelligent computation acceleration transform to allow applications to be executed by accelerated processing units such as graphics processing units (GPUs) or field programmable gate arrays (FPGAs) are disclosed. In an embodiment, a computational profile is generated for an application based on execution metrics of the application for a CPU and the accelerated processing unit, and a genetic algorithm (GA) prediction model is applied to predict execution speedup on an accelerated processing unit for the application. In an embodiment, upon identification of speedup, computational steps are arbitrated among various processing units according to compute availability to achieve optimal completion time for the compute job.
Type: Grant
Filed: December 20, 2018
Date of Patent: June 7, 2022
Assignee: Morgan Stanley Services Group Inc.
Inventors: Michael A. Dobrovolsky, Kwokhin Chu, Pankaj Parashar
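The arbitration step at the end of the abstract can be illustrated with a simple greedy scheduler: each computational step goes to the processing unit predicted to finish it earliest, given per-unit speedup and current backlog. This is a sketch under invented assumptions (unit names, speedups, and the greedy policy itself), not the patent's actual GA-driven method.

```python
# Greedy sketch of arbitrating steps among processing units by availability.
# units: {name: (predicted_speedup, busy_until_time)}; steps: list of costs.

def arbitrate(steps, units):
    """Assign each step to the unit with the earliest predicted finish."""
    schedule = []
    busy = {name: t for name, (_, t) in units.items()}
    speed = {name: s for name, (s, _) in units.items()}
    for cost in steps:
        # Predicted completion = current backlog + cost scaled by speedup.
        best = min(busy, key=lambda u: busy[u] + cost / speed[u])
        busy[best] += cost / speed[best]
        schedule.append(best)
    return schedule, busy

# Hypothetical units: a baseline CPU and a GPU with a predicted 4x speedup.
units = {"cpu": (1.0, 0.0), "gpu": (4.0, 0.0)}
schedule, busy = arbitrate([8.0, 8.0, 2.0], units)
```

With these numbers the first two large steps land on the GPU, and the small third step goes to the otherwise-idle CPU because the GPU's backlog makes it the slower choice.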
-
Patent number: 11354315
Abstract: Method and apparatus for stress management in a searchable data service. The searchable data service may provide a searchable index to a backend data store, and an interface to build and query the searchable index, that enables client applications to search for and retrieve locators for stored entities in the backend data store. Embodiments of the searchable data service may implement a distributed stress management mechanism that may provide functionality including, but not limited to, the automated monitoring of critical resources, analysis of resource usage, and decisions on and performance of actions to keep resource usage within comfort zones. In one embodiment, in response to usage of a particular resource being detected as out of the comfort zone on a node, an action may be performed to transfer at least part of the resource usage for the local resource to another node that provides a similar resource.
Type: Grant
Filed: May 22, 2020
Date of Patent: June 7, 2022
Assignee: Amazon Technologies, Inc.
Inventors: Patrick W. Ransil, Aleksey Martynov, James Larson, James R. Collette, Robert Wai-Chi Chu, Partha Saha
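The comfort-zone mechanism amounts to a monitor-and-rebalance loop: when a node's utilization leaves its acceptable band, enough load is shed to a peer to re-enter the band. A minimal sketch; the zone bounds, node names, and the "shed the excess to the least-loaded peer" policy are all illustrative assumptions:

```python
# Sketch of comfort-zone stress management: shed load from over-used
# nodes to the least-loaded peer providing a similar resource.

COMFORT_ZONE = (0.2, 0.8)   # illustrative acceptable utilization band

def rebalance(nodes):
    """Mutate nodes ({name: utilization}) and return transfer actions."""
    actions = []
    low, high = COMFORT_ZONE
    for name, usage in list(nodes.items()):
        if usage > high:
            # Pick the least-loaded peer (min includes `name` itself, but an
            # over-the-zone node can never be the minimum).
            peer = min(nodes, key=nodes.get)
            moved = usage - high          # shed just enough to re-enter zone
            nodes[name] -= moved
            nodes[peer] += moved
            actions.append((name, peer, moved))
    return actions

nodes = {"a": 0.95, "b": 0.30, "c": 0.40}
actions = rebalance(nodes)
```

Here node "a" is over the zone, so its excess 0.15 of utilization is transferred to the least-loaded peer "b".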
-
Patent number: 11321144
Abstract: Apparatus and method for selectively saving and restoring execution state components in an inter-core work offload environment. For example, one embodiment of a processor comprises: a plurality of cores; an interconnect coupling the plurality of cores; and offload circuitry to transfer work from a first core of the plurality of cores to a second core of the plurality of cores without operating system (OS) intervention, wherein the second core is to reach a first execution state upon completing the offload work and to store results in a first memory location or register; the second core comprising: a decoder to decode a first instruction comprising at least one operand to identify one or more components of the first execution state; and execution circuitry to execute the first instruction to save the one or more components of the first execution state to a specified region in memory.
Type: Grant
Filed: June 29, 2019
Date of Patent: May 3, 2022
Assignee: INTEL CORPORATION
Inventor: ElMoustapha Ould-Ahmed-Vall
-
Patent number: 11263014
Abstract: Data processing apparatuses, methods of data processing, and non-transitory computer-readable media on which computer-readable code is stored defining logical configurations of processing devices are disclosed. In an apparatus, fetch circuitry retrieves a sequence of instructions and execution circuitry performs data processing operations with respect to data values in a set of registers. An auxiliary execution circuitry interface and a coprocessor interface to provide a connection to a coprocessor outside the apparatus are provided.
Type: Grant
Filed: August 5, 2019
Date of Patent: March 1, 2022
Assignee: Arm Limited
Inventors: Frederic Claude Marie Piry, Thomas Christopher Grocutt, Simon John Craske, Carlo Dario Fanara, Jean Sébastien Leroy
-
Patent number: 11256516
Abstract: A system comprising a data memory, a first processor with first execution pipeline, and a co-processor with second execution pipeline branching from the first pipeline via an inter-processor interface. The first pipeline can decode instructions from an instruction set comprising first and second instruction subsets. The first subset comprises a load instruction which loads data from the memory into a register file, and a compute instruction of a first type which performs a compute operation on such loaded data. The second subset includes a compute instruction of a second type which does not require a separate load instruction to first load data from memory into a register file, but instead reads data from the memory directly and performs a compute operation on that data, this reading being performed in a pipeline stage of the second pipeline that is aligned with the memory access stage of the first pipeline.
Type: Grant
Filed: December 17, 2018
Date of Patent: February 22, 2022
Assignee: XMOS LTD
Inventors: Henk Lambertus Muller, Peter Hedinger
-
Patent number: 11250341
Abstract: A system comprising a classical computing subsystem to perform classical operations in a three-dimensional (3D) classical space unit using decomposed stopping points along a consecutive sequence of stopping points of sub-cells, along a vector with a shortest path between two points of the 3D classical space unit. The system includes a quantum computing subsystem to perform quantum operations in a 3D quantum space unit using decomposed stopping points along a consecutive sequence of stopping points of sub-cells, along a vector selected to have a shortest path between two points of the 3D quantum space unit. The system includes a control subsystem to decompose classical subproblems and quantum subproblems into the decomposed points and provide computing instructions and state information to the classical computing subsystem to perform the classical operations and to the quantum computing subsystem to perform the quantum operations. A method and computer readable medium are provided.
Type: Grant
Filed: September 7, 2018
Date of Patent: February 15, 2022
Assignee: LOCKHEED MARTIN CORPORATION
Inventors: Edward H. Allen, Luke A. Uribarri, Kristen L. Pudenz
-
Patent number: 11170025
Abstract: A system for caching includes an interface to receive a portion of a hypercube to evaluate. The hypercube includes cells with a set of the cells having a formula. The system includes a processor to determine term(s) in the formula for each cell of the set of cells; remove from consideration a time dimension and/or a primary dimension for the term(s) in the formula for each cell of the set of cells; determine a set of distinct terms using the term(s); determine whether a total number of terms in the set of cells is larger than a number of distinct terms in the set of distinct terms; and in response to determining that the total number of terms in the set of cells is larger than the number of distinct terms in the set of distinct terms, indicate to cache the set of distinct terms during evaluation.
Type: Grant
Filed: April 29, 2019
Date of Patent: November 9, 2021
Assignee: Workday, Inc.
Inventors: Ngoc Nguyen, Darren Kermit Lee, Shuyuan Chen, Ritu Jain, Francis Wang
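The caching decision reduces to a simple count comparison: strip the time/primary dimensions from every term, and cache the distinct terms only if the cells reference more terms than there are distinct ones. A minimal sketch, modeling each term as an invented `(name, time, primary)` tuple:

```python
# Sketch of the distinct-term caching decision from the abstract.
# Terms are modeled as (name, time_dim, primary_dim) tuples.

def should_cache(cells):
    """cells: list of per-cell term lists.
    Returns (set of distinct terms, whether caching pays off)."""
    all_terms = [t for cell in cells for t in cell]
    # Remove the time and primary dimensions before comparing terms.
    distinct = {name for (name, _time, _primary) in all_terms}
    # Cache only when it saves work: more term references than distinct terms.
    return distinct, len(all_terms) > len(distinct)

cells = [
    [("revenue", 2023, "us"), ("cost", 2023, "us")],
    [("revenue", 2024, "us")],
]
distinct, cache = should_cache(cells)
```

Here three term references collapse to two distinct terms once the dimensions are ignored, so caching the distinct terms during evaluation is worthwhile.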
-
Patent number: 11144290
Abstract: A method includes analyzing a dataflow graph representing data dependencies between operators of a dataflow application to identify a plurality of candidate groups of the operators. Based on characteristics of a given hardware accelerator and the operators of a given candidate group of the plurality of candidate groups, determining whether the operators of the given candidate group are to be combined. In response to determining that the operators of the given candidate group are to be combined, retrieving executable binary code segments corresponding to the operators of the given candidate group, generating a unit of binary code including the executable binary code segments and metadata representing an execution control flow among the executable binary code segments, and dispatching the unit of code to the given hardware accelerator for execution of the unit of code.
Type: Grant
Filed: September 13, 2019
Date of Patent: October 12, 2021
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Reza Azimi, Cheng Xiang Feng, Kai-Ting Amy Wang, Yaoqing Gao, Ye Tian, Xiang Wang
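The unit-building step can be sketched as packing a candidate group's code segments together with control-flow metadata, so the whole group is dispatched to the accelerator once instead of operator by operator. The operator names and byte segments below are invented for illustration:

```python
# Sketch of combining a candidate group of operators into one unit of
# binary code plus control-flow metadata, dispatched as a whole.

def build_unit(group, code_segments):
    """group: ordered operator names; code_segments: name -> binary blob."""
    return {
        # The executable binary code segments for the group, in order.
        "code": [code_segments[op] for op in group],
        # Metadata recording the execution control flow among the segments.
        "control_flow": [(group[i], group[i + 1])
                         for i in range(len(group) - 1)],
    }

# Hypothetical fused group conv -> relu -> pool with toy binary segments.
segments = {"conv": b"\x01", "relu": b"\x02", "pool": b"\x03"}
unit = build_unit(["conv", "relu", "pool"], segments)
```

A single dispatch of this unit replaces three separate operator dispatches, which is the overhead the grouping is meant to eliminate.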
-
Patent number: 11093247
Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.
Type: Grant
Filed: December 29, 2017
Date of Patent: August 17, 2021
Assignee: Intel Corporation
Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
-
Patent number: 11010308
Abstract: Embodiments of the present disclosure include a method for optimizing an internal memory for calculation of a convolutional layer of a convolutional neural network (CNN), the method including determining a computation cost of calculating the convolutional layer using each combination of a memory management scheme of a plurality of memory management schemes and data partition sizes of input feature map (IFM) data, kernel data, and output feature map (OFM) data to be loaded in the internal memory; identifying one combination of a memory management scheme and data partition sizes having a lowest computation cost for the convolutional layer; and implementing the CNN to use the one combination for calculation of the convolutional layer.
Type: Grant
Filed: May 31, 2019
Date of Patent: May 18, 2021
Assignee: LG ELECTRONICS INC.
Inventors: Jaewon Kim, Thi Huong Giang Nguyen
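The selection procedure is an exhaustive search over (scheme, partition size) combinations under a cost model. A minimal sketch; the scheme names, tile sizes, and the toy cost function are all invented stand-ins for the patent's per-layer computation-cost model:

```python
# Sketch of picking the lowest-cost (memory scheme, partition size)
# combination for a convolutional layer.
from itertools import product

def pick_configuration(schemes, partitions, cost):
    """Exhaustively score every combination and return the cheapest."""
    return min(product(schemes, partitions),
               key=lambda combo: cost(*combo))

def toy_cost(scheme, tile):
    # Toy model: bigger tiles amortize per-tile overhead until they stop
    # fitting (penalty above 64); double-buffering hides half the cost.
    base = 1000 / tile + (500 if tile > 64 else 0)
    return base / 2 if scheme == "double-buffer" else base

best = pick_configuration(["single", "double-buffer"],
                          [16, 32, 64, 128], toy_cost)
```

With this toy model the search settles on double-buffering with the largest tile that still fits, which matches the intuition the abstract describes.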
-
Patent number: 11004500
Abstract: Disclosed herein are apparatuses and methods related to an artificial intelligence accelerator in memory. An apparatus can include a number of registers configured to enable the apparatus to operate in an artificial intelligence mode to perform artificial intelligence operations, and an artificial intelligence (AI) accelerator configured to perform the artificial intelligence operations using the data stored in a number of memory arrays. The AI accelerator can include hardware, software, and/or firmware that is configured to perform operations associated with AI operations. The hardware can include circuitry configured as an adder and/or multiplier to perform operations, such as logic operations, associated with AI operations.
Type: Grant
Filed: August 28, 2019
Date of Patent: May 11, 2021
Assignee: Micron Technology, Inc.
Inventor: Alberto Troia
-
Patent number: 10990398
Abstract: Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction for execution on an instruction execution pipeline, beginning execution of the first instruction, receiving one or more second instructions for execution on the instruction execution pipeline, the one or more second instructions associated with a higher priority task than the first instruction, storing a register state associated with the execution of the first instruction in one or more registers of a capture queue associated with the instruction execution pipeline, copying the register state from the capture queue to a memory, determining that the one or more second instructions have been executed, copying the register state from the memory to the one or more registers of the capture queue, and restoring the register state to the instruction execution pipeline from the capture queue.
Type: Grant
Filed: April 15, 2019
Date of Patent: April 27, 2021
Assignee: Texas Instruments Incorporated
Inventors: Timothy D. Anderson, Joseph Zbiciak, Kai Chirca
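The capture-queue flow above (capture to queue, spill to memory, run the higher-priority task, then restore through the queue) can be sketched as follows. The class and field names are invented for illustration; they model the data movement, not TI's actual hardware:

```python
# Sketch of the capture-queue preempt/resume flow for register state.

class Pipeline:
    def __init__(self):
        self.registers = {}
        self.capture_queue = None   # fast staging storage beside the pipeline
        self.memory = []            # backing store, allows nested preemption

    def preempt(self):
        # Store the live register state in the capture queue, then copy it
        # out to memory so the queue is free for the next preemption.
        self.capture_queue = dict(self.registers)
        self.memory.append(self.capture_queue)
        self.registers = {}

    def resume(self):
        # Copy the saved state from memory back into the capture queue,
        # then restore it into the pipeline's registers.
        self.capture_queue = self.memory.pop()
        self.registers = dict(self.capture_queue)

pipe = Pipeline()
pipe.registers = {"r0": 1, "r1": 2}   # low-priority task executing
pipe.preempt()                        # higher-priority instructions arrive
pipe.registers = {"r0": 99}           # higher-priority task runs
pipe.resume()                         # low-priority task continues
```

After `resume`, the low-priority task sees exactly the register state it had when it was preempted.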