Coprocessor Patents (Class 712/34)
-
Patent number: 11947804
Abstract: A system includes a hardware circuitry having a device coupled with one or more external memory devices. The device is to detect an input/output (I/O) request associated with an external memory device of the one or more external memory devices. The device is to record a first timestamp in response to detecting the I/O request transmitted to the external memory device. The device is further to detect an indication from the external memory device of a completion of the I/O request associated with the external memory device and record a second timestamp in response to detecting the indication. The device is also to determine a latency associated with the I/O request based on the first timestamp and the second timestamp.
Type: Grant
Filed: April 6, 2022
Date of Patent: April 2, 2024
Assignee: NVIDIA Corporation
Inventors: Shridhar Rasal, Oren Duer, Aviv Kfir, Liron Mula
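The two-timestamp scheme above can be sketched in software. This is a minimal illustration, not the patented hardware: the `LatencyMonitor` class and its method names are hypothetical stand-ins for the circuit's request/completion detection.

```python
import time

class LatencyMonitor:
    """Sketch of the timestamp-based I/O latency scheme: record a first
    timestamp when a request is transmitted, a second when the device
    signals completion, and report the difference as the latency."""

    def __init__(self):
        self._start = {}        # request id -> issue timestamp
        self.latencies = {}     # request id -> measured latency (seconds)

    def on_request(self, req_id):
        # First timestamp: request transmitted to the external device.
        self._start[req_id] = time.monotonic()

    def on_completion(self, req_id):
        # Second timestamp: completion indication received from the device.
        end = time.monotonic()
        self.latencies[req_id] = end - self._start.pop(req_id)

monitor = LatencyMonitor()
monitor.on_request("io-1")
monitor.on_completion("io-1")
assert monitor.latencies["io-1"] >= 0.0
```

Using a monotonic clock mirrors the hardware's free-running counter: wall-clock adjustments between the two samples cannot corrupt the measured latency.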
-
Patent number: 11934295
Abstract: The present disclosure provides for synchronization of multi-core systems by monitoring a plurality of debug trace data streams for a redundantly operating system including a corresponding plurality of cores performing a task in parallel; in response to detecting a state difference on one debug trace data stream of the plurality of debug trace data streams relative to other debug trace data streams of the plurality of debug trace data streams: marking a given core associated with the one debug trace data stream as an affected core; and restarting the affected core.
Type: Grant
Filed: November 9, 2021
Date of Patent: March 19, 2024
Assignee: THE BOEING COMPANY
Inventors: David P. Haldeman, Eric J. Miller
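The divergence check at the heart of this abstract can be sketched as a majority vote over the redundant cores' trace states. A minimal sketch, assuming trace states can be compared for equality; the function name is hypothetical.

```python
from collections import Counter

def find_affected_cores(trace_streams):
    """Sketch of the redundant-core check: compare each core's debug
    trace state against the others and mark cores whose state diverges
    from the majority; the marked cores are the ones to restart."""
    majority_state, _ = Counter(trace_streams).most_common(1)[0]
    return [i for i, state in enumerate(trace_streams) if state != majority_state]

# Core 1 diverged from the other two lockstep cores and would be restarted.
assert find_affected_cores(["s42", "s99", "s42"]) == [1]
# All cores agree: nothing to restart.
assert find_affected_cores(["s42", "s42", "s42"]) == []
```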
-
Patent number: 11928471
Abstract: Embodiments for a metadata predictor. An index pipeline generates indices in an index buffer in which the indices are used for reading out a memory device. A prediction cache is populated with metadata of instructions read from the memory device. A prediction pipeline generates a prediction using the metadata of the instructions from the prediction cache, the populating of the prediction cache with the metadata of the instructions being performed asynchronously to the operating of the prediction pipeline.
Type: Grant
Filed: August 19, 2021
Date of Patent: March 12, 2024
Assignee: International Business Machines Corporation
Inventors: Edward Thomas Malley, Adam Benjamin Collura, Brian Robert Prasky, James Bonanno, Dominic Ditomaso
-
Patent number: 11915015
Abstract: Systems and methods provide isolated workspaces operating on an IHS (Information Handling System) with use of pre-boot resources of the IHS that are not directly accessible by the workspaces. Upon notification of a workspace initialization, a segregated variable space, such as a segregated memory utilized by a UEFI (Unified Extensible Firmware Interface) of the IHS, is specified for use by the workspace. The segregated variable space is initialized and populated with pre-boot variables, such as UEFI variables, that are allowed for configuration by the workspace. Upon a workspace issuing a request to configure a pre-boot variable, the segregated variable space is identified that was mapped for use by the workspace. The requested pre-boot variable configuration is allowed based on whether the pre-boot variable is populated in the segregated variable space. When the requested pre-boot variable configuration is allowed, the pre-boot variable is configured on behalf of the workspace.
Type: Grant
Filed: August 27, 2021
Date of Patent: February 27, 2024
Assignee: Dell Products, L.P.
Inventors: Balasingh P. Samuel, Vivek Viswanathan Iyer
-
Patent number: 11899613
Abstract: A packaging technology to improve performance of an AI processing system resulting in an ultra-high bandwidth system. An IC package is provided which comprises: a substrate; a first die on the substrate, and a second die stacked over the first die. The first die can be a first logic die (e.g., a compute chip, CPU, GPU, etc.) while the second die can be a compute chiplet comprising ferroelectric or paraelectric logic. Both dies can include ferroelectric or paraelectric logic. The ferroelectric/paraelectric logic may include AND gates, OR gates, complex gates, majority, minority, and/or threshold gates, sequential logic, etc. The IC package can be in a 3D or 2.5D configuration that implements logic-on-logic stacking configuration. The 3D or 2.5D packaging configurations have chips or chiplets designed to have time distributed or spatially distributed processing. The logic of chips or chiplets is segregated so that only one chip in a 3D or 2.5D stacking arrangement is hot at a time.
Type: Grant
Filed: August 20, 2021
Date of Patent: February 13, 2024
Assignee: KEPLER COMPUTING INC.
Inventors: Amrita Mathuriya, Christopher B. Wilkerson, Rajeev Kumar Dokania, Debo Olaosebikan, Sasikanth Manipatruni
-
Patent number: 11868780
Abstract: An electronic device that includes a central processor and a coprocessor coupled to the central processor. The central processor includes a plurality of registers and is configured to decode a first set of instructions. The first set of instructions includes a command instruction and an identity of a destination register. The coprocessor is configured to receive the command instruction from the central processor, execute the command instruction, and write a result of the command instruction in the destination register. The central processor is further configured to set a register tag for the destination register at the time the central processor decodes the first set of instructions and to clear the register tag at the time the result is written in the destination register.
Type: Grant
Filed: August 26, 2021
Date of Patent: January 9, 2024
Assignee: TEXAS INSTRUMENTS INCORPORATED
Inventors: Christian Wiencke, Armin Stingl, Jeroen Vliegen
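The set-at-decode / clear-at-writeback tag protocol above is essentially a per-register scoreboard. A minimal sketch under that reading; the `RegisterFile` class and its method names are illustrative, not from the patent.

```python
class RegisterFile:
    """Sketch of the register-tag handshake: the CPU sets a tag on the
    destination register when it decodes the coprocessor command, and
    the tag is cleared when the coprocessor writes the result back."""

    def __init__(self, n_regs=16):
        self.regs = [0] * n_regs
        self.pending = [False] * n_regs   # one register tag per register

    def decode_command(self, dest):
        # CPU side: mark the destination as awaiting a coprocessor result.
        self.pending[dest] = True

    def coprocessor_writeback(self, dest, value):
        # Coprocessor side: deliver the result and clear the tag.
        self.regs[dest] = value
        self.pending[dest] = False

    def read(self, src):
        # A consumer must stall (modeled here by raising) while the tag is set.
        if self.pending[src]:
            raise RuntimeError(f"r{src} not ready")
        return self.regs[src]

rf = RegisterFile()
rf.decode_command(3)                  # tag set at decode time
rf.coprocessor_writeback(3, 0xBEEF)   # tag cleared at writeback
assert rf.read(3) == 0xBEEF
```

The tag lets the CPU keep issuing independent instructions while the coprocessor works, stalling only a consumer that actually reads the tagged register.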
-
Patent number: 11809869
Abstract: Embodiments detailed herein relate to systems and methods to store a tile register pair to memory. In one example, a processor includes: decode circuitry to decode a store matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded store matrix pair instruction to store every element of left and right tiles of the identified source matrix to corresponding element positions of left and right tiles of the identified destination matrix, respectively, wherein the executing stores a chunk of C elements of one row of the identified source matrix at a time.
Type: Grant
Filed: December 29, 2017
Date of Patent: November 7, 2023
Assignee: Intel Corporation
Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
-
Patent number: 11797856
Abstract: Presented herein are framework embodiments that allow the representation of complex systems and processes that are suitable for resource efficient machine learning and inference. Furthermore, disclosed are new reinforcement learning techniques that are capable of learning to plan and optimize dynamic and nuanced systems and processes. Different embodiments comprising combinations of one or more neural networks, reinforcement learning, and linear programming are discussed to learn representations and models—even for complex systems and methods. Furthermore, the introduction of neural field embodiments and methods to compute a Deep Argmax, as well as to invert neural networks and neural fields with linear programming, provide the ability to create models and train models that are accurate and very resource efficient—using less memory, less computation, less time, and, as a result, less energy.
Type: Grant
Filed: June 11, 2020
Date of Patent: October 24, 2023
Assignee: System AI, Inc.
Inventor: Tuna Oezer
-
Patent number: 11797473
Abstract: An accelerated processor structure on a programmable integrated circuit device includes a processor and a plurality of configurable digital signal processors (DSPs). Each configurable DSP includes a circuit block, which in turn includes a plurality of multipliers. The accelerated processor structure further includes a first bus to transfer data from the processor to the configurable DSPs, and a second bus to transfer data from the configurable DSPs to the processor.
Type: Grant
Filed: October 8, 2018
Date of Patent: October 24, 2023
Assignee: Altera Corporation
Inventors: David Shippy, Martin Langhammer, Jeffrey Eastlack
-
Patent number: 11782722
Abstract: A complex computing device, a complex computing method, an artificial intelligence chip and an electronic apparatus are provided. An input interface receives complex computing instructions and arbitrates each complex computing instruction to a corresponding computing component respectively, according to the computing types in the respective complex computing instructions. Each computing component is connected to the input interface, acquires a source operand from a complex computing instruction to perform complex computing, and generates a computing result instruction to feed back to an output interface. The output interface arbitrates the computing result in each computing result instruction to the corresponding instruction source respectively, according to the instruction source identifier in each computing result instruction.
Type: Grant
Filed: January 14, 2021
Date of Patent: October 10, 2023
Assignees: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD., KUNLUNXIN TECHNOLOGY (BEIJING) COMPANY LIMITED
Inventors: Baofu Zhao, Xueliang Du, Kang An, Yingnan Xu, Chao Tang
-
Patent number: 11768689
Abstract: The present application discloses a computing device that can provide a low-power, highly capable computing platform for computational imaging. The computing device can include one or more processing units, for example one or more vector processors and one or more hardware accelerators, an intelligent memory fabric, a peripheral device, and a power management module. The computing device can communicate with external devices, such as one or more image sensors, an accelerometer, a gyroscope, or any other suitable sensor devices.
Type: Grant
Filed: November 12, 2021
Date of Patent: September 26, 2023
Assignee: Movidius Limited
Inventors: Brendan Barry, Richard Richmond, Fergal Connor, David Moloney
-
Patent number: 11726701
Abstract: A memory expander includes a memory device that stores a plurality of task data. A controller controls the memory device. The controller receives metadata and a management request from an external central processing unit (CPU) through a compute express link (CXL) interface and operates in a management mode in response to the management request. In the management mode, the controller receives a read request and a first address from an accelerator through the CXL interface and transmits one of the plurality of task data to the accelerator based on the metadata in response to the read request.
Type: Grant
Filed: October 25, 2021
Date of Patent: August 15, 2023
Inventors: Chon Yong Lee, Jae-Gon Lee, Kyunghan Lee
-
Patent number: 11720475
Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums that use parallel hardware execution with software co-simulation to enable more advanced debugging operations on data flow architectures. Upon a halt to execution of a program thread, a state of the tiles that are executing the thread is saved and offloaded from the HTF to a host system. A developer may then examine this state on the host system to debug their program. Additionally, the state may be loaded into a software simulator that simulates the HTF hardware. This simulator allows the developer to step through the code and to examine values to find bugs.
Type: Grant
Filed: November 21, 2022
Date of Patent: August 8, 2023
Assignee: Micron Technology, Inc.
Inventors: Skyler Arron Windh, Tony M. Brewer, Patrick Estep
-
Patent number: 11714992
Abstract: Systems and methods for providing executable instructions to a neural network processor are provided. In one example, a system comprises a database that stores a plurality of executable instructions and a plurality of subgraph identifiers, each subgraph identifier of the plurality of subgraph identifiers being associated with a subset of instructions of the plurality of executable instructions.
Type: Grant
Filed: December 13, 2018
Date of Patent: August 1, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Richard John Heaton, Randy Renfu Huang, Ron Diamant
-
Patent number: 11714649
Abstract: A RISC-V-based 3D interconnected multi-core processor architecture and a working method thereof. The RISC-V-based 3D interconnected multi-core processor architecture includes a main control layer, a micro core array layer and an accelerator layer, wherein the main control layer includes a plurality of main cores which are RISC-V instruction set CPU cores, the micro core array layer includes a plurality of micro unit groups including a micro core, a data storage unit, an instruction storage unit and a linking controller, wherein the micro core is a RISC-V instruction set CPU core that executes partial functions of the main core; the accelerator layer is configured to optimize a running speed of space utilization for accelerators meeting specific requirements, wherein some main cores in the main control layer perform data interaction with the accelerator layer, and the other main cores interact with the micro core array layer.
Type: Grant
Filed: December 1, 2021
Date of Patent: August 1, 2023
Assignee: SHANDONG LINGNENG ELECTRONIC TECHNOLOGY CO., LTD.
Inventors: Gang Wang, Jinzheng Mou, Yang An, Moujun Xie, Benyang Wu, Zesheng Zhang, Wenyong Hou, Yongwei Wang, Zixuan Qiu, Xintan Li
-
Patent number: 11682109
Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for configurable aprons for expanded binning. Aspects of the present disclosure include identifying one or more pixel tiles in at least one bin and determining edge information for each pixel tile of the one or more pixel tiles. The edge information may be associated with one or more pixels adjacent to each pixel tile. The present disclosure further describes determining whether at least one adjacent bin is visible based on the edge information for each pixel tile, where the at least one adjacent bin may be adjacent to the at least one bin.
Type: Grant
Filed: October 16, 2020
Date of Patent: June 20, 2023
Assignee: QUALCOMM Incorporated
Inventors: Kalyan Kumar Bhiravabhatla, Krishnaiah Gummidipudi, Ankit Kumar Singh, Andrew Evan Gruber, Pavan Kumar Akkaraju, Srihari Babu Alla, Jonnala Gadda Nagendra Kumar, Vishwanath Shashikant Nikam
-
Patent number: 11663001
Abstract: Systems, apparatuses, and methods for implementing a family of lossy sparse load single instruction, multiple data (SIMD) instructions are disclosed. A lossy sparse load unit (LSLU) loads a plurality of values from one or more input vector operands and determines how many non-zero values are included in one or more input vector operands of a given instruction. If the one or more input vector operands have less than a threshold number of non-zero values, then the LSLU causes an instruction for processing the one or more input vector operands to be skipped. In this case, the processing of the instruction of the one or more input vector operands is deemed to be redundant. If the one or more input vector operands have greater than or equal to the threshold number of non-zero values, then the LSLU causes an instruction for processing the input vector operand(s) to be executed.
Type: Grant
Filed: November 19, 2018
Date of Patent: May 30, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Sanchari Sen, Derrick Allen Aguren, Joseph Lee Greathouse
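The skip-or-execute decision described above can be modeled in a few lines. A hedged sketch of the policy only, not the LSLU hardware; the function names and the threshold semantics (skip when the non-zero count is strictly below the threshold) follow the abstract's wording.

```python
def lossy_sparse_dispatch(operands, threshold, compute):
    """Sketch of the lossy-sparse-load policy: count the non-zero
    elements across the input vector operands; if the count is below
    the threshold, skip the compute instruction as redundant,
    otherwise execute it."""
    nonzeros = sum(1 for op in operands for x in op if x != 0)
    if nonzeros < threshold:
        return None                 # instruction skipped
    return compute(*operands)

def dot(a, b):
    # Stand-in for the SIMD compute instruction being gated.
    return sum(x * y for x, y in zip(a, b))

a = [0.0, 0.0, 1.5, 0.0]
b = [0.0, 2.0, 0.0, 0.0]
# 2 non-zeros total: below a threshold of 3, the compute is skipped.
assert lossy_sparse_dispatch([a, b], 3, dot) is None
# With a threshold of 2, the compute executes.
assert lossy_sparse_dispatch([a, b], 2, dot) == 0.0
```

The "lossy" part is visible in the first assertion: a skipped instruction produces no result at all, trading exactness for avoided work on near-zero data.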
-
Patent number: 11507493
Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums that use parallel hardware execution with software co-simulation to enable more advanced debugging operations on data flow architectures. Upon a halt to execution of a program thread, a state of the tiles that are executing the thread is saved and offloaded from the HTF to a host system. A developer may then examine this state on the host system to debug their program. Additionally, the state may be loaded into a software simulator that simulates the HTF hardware. This simulator allows the developer to step through the code and to examine values to find bugs.
Type: Grant
Filed: August 18, 2021
Date of Patent: November 22, 2022
Assignee: Micron Technology, Inc.
Inventors: Skyler Arron Windh, Tony M. Brewer, Patrick Estep
-
Patent number: 11429855
Abstract: A method for accelerating a neural network includes identifying neural network layers that meet a locality constraint. Code is generated to implement depth-first processing for different hardware based on the identified neural network layers. The generated code is then used to perform the depth-first processing on the neural network.
Type: Grant
Filed: February 6, 2018
Date of Patent: August 30, 2022
Assignee: NEC CORPORATION
Inventors: Nicolas Weber, Felipe Huici, Mathias Niepert
-
Patent number: 11422815
Abstract: Binary translation may be performed by a field programmable gate array (FPGA) integrated with a processor as a single integrated circuit. The FPGA contains multiple blocks of logic for performing different binary translations. The processor may offload the binary translation to the FPGA. The FPGA may use historical logging to skip the binary translation of source instructions that have been previously translated into target instructions.Type: Grant
Filed: March 1, 2018
Date of Patent: August 23, 2022
Assignee: Dell Products L.P.
Inventors: Mukund P. Khatri, Ramesh Radhakrishnan
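The "historical logging" idea above is, in software terms, memoization of the translation step. A minimal sketch under that reading; `make_translator` and the toy `translate` callback are hypothetical, not the FPGA logic itself.

```python
def make_translator(translate):
    """Sketch of historical logging for binary translation: cache
    source->target translations so that a source instruction seen
    before skips the (expensive) translation step on later occurrences."""
    history = {}                                   # source -> target log
    stats = {"translated": 0, "skipped": 0}

    def run(source_instr):
        if source_instr in history:
            stats["skipped"] += 1                  # hit: reuse prior result
        else:
            history[source_instr] = translate(source_instr)
            stats["translated"] += 1               # miss: translate and log
        return history[source_instr]

    return run, stats

# Toy "translation": uppercase the instruction text.
run, stats = make_translator(str.upper)
run("add r1, r2")
run("add r1, r2")      # second occurrence skips translation
run("mov r3, r1")
assert stats == {"translated": 2, "skipped": 1}
```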
-
Patent number: 11403250
Abstract: Examples in this application disclose an operation accelerator, a switch, and a processing system. One example operation accelerator includes a shunt circuit directly connected to a first peripheral component interconnect express (PCIe) device through a PCIe link. The shunt circuit is configured to receive first data sent by the first PCIe device through the PCIe link, and transmit the first data through an internal bus. A first address carried in the first data is located in a first range. In some examples of this application, the first PCIe device directly communicates with the operation accelerator through the shunt circuit in the operation accelerator.
Type: Grant
Filed: March 29, 2021
Date of Patent: August 2, 2022
Assignee: Huawei Technologies Co., Ltd.
Inventors: Chuanning Cheng, Shengyong Peng
-
Patent number: 11392513
Abstract: A graph-based data flow control system includes a control plane system coupled to SCP subsystems. The control plane system identifies a workload, and identifies service(s) on the SCP subsystems for manipulating/exchanging data to perform the workload. The control plane system generates a respective SCP-local data flow control graph for each SCP subsystem that defines how their service(s) will manipulate/exchange data within that SCP subsystem, and generates inter-SCP data flow control graph(s) that define how service(s) provided by at least one SCP subsystem will manipulate/exchange data with service(s) provided by at least one other SCP subsystem. The control plane system then transmits each respective SCP-local data flow control graph to each of the SCP subsystems, and the inter-SCP data flow control graph(s) to at least one SCP subsystem, for use by the SCP subsystems in causing their service(s) to manipulate/exchange data to perform the workload.
Type: Grant
Filed: October 15, 2020
Date of Patent: July 19, 2022
Assignee: Dell Products L.P.
Inventors: Gaurav Chawla, Mark Steven Sanders, Elie Jreij, Jimmy D. Pike, Robert W. Hormuth, William Price Dawkins
-
Patent number: 11366662
Abstract: A high-level synthesis multiprocessor system enables sophisticated algorithms to be easily realized with nearly the smallest possible circuit. A shared memory is divided into a plurality of banks. The memory banks are connected to processors, respectively. Each processor receives an instruction code and an operand from its connected memory bank. After the operation execution, the processor sends the result to its adjacent processor element to set it as an accumulator value at the time of execution of a next instruction. A software program to be executed is fixed. A processor to execute each instruction in the software program is uniquely identified. Each processor has a function for executing its instruction out of all executable instructions in the multiprocessor system, and does not have a function for executing an instruction that the processor is not to execute. The circuit configuration with unused instructions deleted is provided.
Type: Grant
Filed: August 22, 2018
Date of Patent: June 21, 2022
Assignee: El Amina Inc.
Inventor: Hideki Tanuma
-
Patent number: 11354592
Abstract: Systems and methods for intelligent computation acceleration transform to allow applications to be executed by accelerated processing units such as graphic processing units (GPUs) or field programmable gate arrays (FPGAs) are disclosed. In an embodiment, a computational profile is generated for an application based on execution metrics of the application for the CPU and the accelerated processing unit, and a genetic algorithm (GA) prediction model is applied to predict execution speedup on an accelerated processing unit for the application. In an embodiment, upon identification of speedup, computational steps are arbitrated among various processing units according to compute availability to achieve optimal completion time for the compute job.
Type: Grant
Filed: December 20, 2018
Date of Patent: June 7, 2022
Assignee: Morgan Stanley Services Group Inc.
Inventors: Michael A. Dobrovolsky, Kwokhin Chu, Pankaj Parashar
-
Patent number: 11354315
Abstract: Method and apparatus for stress management in a searchable data service. The searchable data service may provide a searchable index to a backend data store, and an interface to build and query the searchable index, that enables client applications to search for and retrieve locators for stored entities in the backend data store. Embodiments of the searchable data service may implement a distributed stress management mechanism that may provide functionality including, but not limited to, the automated monitoring of critical resources, analysis of resource usage, and decisions on and performance of actions to keep resource usage within comfort zones. In one embodiment, in response to usage of a particular resource being detected as out of the comfort zone on a node, an action may be performed to transfer at least part of the resource usage for the local resource to another node that provides a similar resource.
Type: Grant
Filed: May 22, 2020
Date of Patent: June 7, 2022
Assignee: Amazon Technologies, Inc.
Inventors: Patrick W. Ransil, Aleksey Martynov, James Larson, James R. Collette, Robert Wai-Chi Chu, Partha Saha
-
Patent number: 11321144
Abstract: Apparatus and method for selectively saving and restoring execution state components in an inter-core work offload environment. For example, one embodiment of a processor comprises: a plurality of cores; an interconnect coupling the plurality of cores; and offload circuitry to transfer work from a first core of the plurality of cores to a second core of the plurality of cores without operating system (OS) intervention, wherein the second core is to reach a first execution state upon completing the offload work and to store results in a first memory location or register; the second core comprising: a decoder to decode a first instruction comprising at least one operand to identify one or more components of the first execution state; and execution circuitry to execute the first instruction to save the one or more components of the first execution state to a specified region in memory.
Type: Grant
Filed: June 29, 2019
Date of Patent: May 3, 2022
Assignee: INTEL CORPORATION
Inventor: ElMoustapha Ould-Ahmed-Vall
-
Patent number: 11263014
Abstract: Data processing apparatuses, methods of data processing, and non-transitory computer-readable media on which computer-readable code is stored defining logical configurations of processing devices are disclosed. In an apparatus, fetch circuitry retrieves a sequence of instructions and execution circuitry performs data processing operations with respect to data values in a set of registers. An auxiliary execution circuitry interface and a coprocessor interface to provide a connection to a coprocessor outside the apparatus are provided.
Type: Grant
Filed: August 5, 2019
Date of Patent: March 1, 2022
Assignee: Arm Limited
Inventors: Frederic Claude Marie Piry, Thomas Christoper Grocutt, Simon John Craske, Carlo Dario Fanara, Jean Sébastien Leroy
-
Patent number: 11256516
Abstract: A system comprising a data memory, a first processor with first execution pipeline, and a co-processor with second execution pipeline branching from the first pipeline via an inter-processor interface. The first pipeline can decode instructions from an instruction set comprising first and second instruction subsets. The first subset comprises a load instruction which loads data from the memory into a register file, and a compute instruction of a first type which performs a compute operation on such loaded data. The second subset includes a compute instruction of a second type which does not require a separate load instruction to first load data from memory into a register file, but instead reads data from the memory directly and performs a compute operation on that data, this reading being performed in a pipeline stage of the second pipeline that is aligned with the memory access stage of the first pipeline.
Type: Grant
Filed: December 17, 2018
Date of Patent: February 22, 2022
Assignee: XMOS LTD
Inventors: Henk Lambertus Muller, Peter Hedinger
-
Patent number: 11250341
Abstract: A system comprising a classical computing subsystem to perform classical operations in a three-dimensional (3D) classical space unit using decomposed stopping points along a consecutive sequence of stopping points of sub-cells, along a vector with a shortest path between two points of the 3D classical space unit. The system includes a quantum computing subsystem to perform quantum operations in a 3D quantum space unit using decomposed stopping points along a consecutive sequence of stopping points of sub-cells, along a vector selected to have a shortest path between two points of the 3D quantum space unit. The system includes a control subsystem to decompose classical subproblems and quantum subproblems into the decomposed points and provide computing instructions and state information to the classical computing subsystem to perform the classical operations and to the quantum computing subsystem to perform the quantum operations. A method and computer readable medium are provided.
Type: Grant
Filed: September 7, 2018
Date of Patent: February 15, 2022
Assignee: LOCKHEED MARTIN CORPORATION
Inventors: Edward H. Allen, Luke A. Uribarri, Kristen L. Pudenz
-
Patent number: 11170025
Abstract: A system for caching includes an interface to receive a portion of a hypercube to evaluate. The hypercube includes cells with a set of the cells having a formula. The system includes a processor to determine term(s) in the formula for each cell of the set of cells; remove from consideration a time dimension and/or a primary dimension for the term(s) in the formula for each cell of the set of cells; determine a set of distinct terms using the term(s); determine whether a total number of terms in the set of cells is larger than a number of distinct terms in the set of distinct terms; and in response to determining that the total number of terms in the set of cells is larger than the number of distinct terms in the set of distinct terms, indicate to cache the set of distinct terms during evaluation.
Type: Grant
Filed: April 29, 2019
Date of Patent: November 9, 2021
Assignee: Workday, Inc.
Inventors: Ngoc Nguyen, Darren Kermit Lee, Shuyuan Chen, Ritu Jain, Francis Wang
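The distinct-term test above can be sketched concretely. This is an illustrative reading, not Workday's implementation: terms are modeled as dictionaries of dimension values, and `should_cache` is a hypothetical name.

```python
def should_cache(cell_terms, drop_dims=("time", "primary")):
    """Sketch of the caching decision: strip the time/primary dimensions
    from each cell's terms, then indicate caching of the distinct terms
    only when the total term count across cells exceeds the number of
    distinct terms (i.e. terms are shared between cells)."""
    stripped = [
        tuple(sorted((k, v) for k, v in term.items() if k not in drop_dims))
        for terms in cell_terms
        for term in terms
    ]
    total = len(stripped)
    distinct = set(stripped)
    return (total > len(distinct)), distinct

cells = [
    [{"measure": "revenue", "time": "2024Q1"}],
    [{"measure": "revenue", "time": "2024Q2"}],  # same term once time is dropped
]
cache, terms = should_cache(cells)
assert cache and len(terms) == 1
```

The point of dropping the time/primary dimensions first is visible here: the two cells' terms look different but collapse to one shared term, so caching that single term pays off during evaluation.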
-
Patent number: 11144290
Abstract: A method includes analyzing a dataflow graph representing data dependencies between operators of a dataflow application to identify a plurality of candidate groups of the operators. Based on characteristics of a given hardware accelerator and the operators of a given candidate group of the plurality of candidate groups, determining whether the operators of the given candidate group are to be combined. In response to determining that the operators of the given candidate group are to be combined, retrieving executable binary code segments corresponding to the operators of the given candidate group, generating a unit of binary code including the executable binary code segments and metadata representing an execution control flow among the executable binary code segments, and dispatching the unit of code to the given hardware accelerator for execution of the unit of code.
Type: Grant
Filed: September 13, 2019
Date of Patent: October 12, 2021
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Reza Azimi, Cheng Xiang Feng, Kai-Ting Amy Wang, Yaoqing Gao, Ye Tian, Xiang Wang
-
Patent number: 11093247
Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.
Type: Grant
Filed: December 29, 2017
Date of Patent: August 17, 2021
Assignee: Intel Corporation
Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
-
Patent number: 11010308
Abstract: Embodiments of the present disclosure include a method for optimizing an internal memory for calculation of a convolutional layer of a convolutional neural network (CNN), the method including determining a computation cost of calculating the convolutional layer using each combination of a memory management scheme of a plurality of memory management schemes and data partition sizes of input feature map (IFM) data, kernel data, and output feature map (OFM) data to be loaded in the internal memory; identifying one combination of a memory management scheme and data partition sizes having a lowest computation cost for the convolutional layer; and implementing the CNN to use the one combination for calculation of the convolutional layer.
Type: Grant
Filed: May 31, 2019
Date of Patent: May 18, 2021
Assignee: LG ELECTRONICS INC.
Inventors: Jaewon Kim, Thi Huong Giang Nguyen
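The selection step in this abstract is an exhaustive search over (scheme, partition-size) combinations under a cost model. A minimal sketch with a made-up cost function; `choose_configuration` and the toy `cost` model are hypothetical, standing in for the patent's per-layer computation-cost estimate.

```python
from itertools import product

def choose_configuration(schemes, partition_sizes, cost_fn):
    """Sketch of the selection step: evaluate the computation cost of
    every (memory-management scheme, IFM/kernel/OFM partition size)
    combination and return the cheapest one for the layer."""
    return min(
        product(schemes, partition_sizes),
        key=lambda combo: cost_fn(*combo),
    )

# Hypothetical cost model: scheme "B" avoids a fixed penalty, and larger
# partitions amortize per-tile overhead.
def cost(scheme, sizes):
    ifm, kernel, ofm = sizes
    penalty = 0 if scheme == "B" else 10
    return penalty + 1000 // (ifm + kernel + ofm)

combos = [(32, 8, 32), (64, 16, 64)]   # (IFM, kernel, OFM) partition sizes
assert choose_configuration(["A", "B"], combos, cost) == ("B", (64, 16, 64))
```

In practice the combination count is small (a handful of schemes times a handful of legal partitionings that fit the internal memory), so brute force is a plausible reading of "determining a computation cost ... using each combination".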
-
Patent number: 11004500
Abstract: Disclosed herein are apparatuses and methods related to an artificial intelligence accelerator in memory. An apparatus can include a number of registers configured to enable the apparatus to operate in an artificial intelligence mode to perform artificial intelligence operations and an artificial intelligence (AI) accelerator configured to perform the artificial intelligence operations using the data stored in the number of memory arrays. The AI accelerator can include hardware, software, and/or firmware that is configured to perform operations associated with AI operations. The hardware can include circuitry configured as an adder and/or multiplier to perform operations, such as logic operations, associated with AI operations.
Type: Grant
Filed: August 28, 2019
Date of Patent: May 11, 2021
Assignee: Micron Technology, Inc.
Inventor: Alberto Troia
-
Patent number: 10990398
Abstract: Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction for execution on an instruction execution pipeline, beginning execution of the first instruction, receiving one or more second instructions for execution on the instruction execution pipeline, the one or more second instructions associated with a higher priority task than the first instruction, storing a register state associated with the execution of the first instruction in one or more registers of a capture queue associated with the instruction execution pipeline, copying the register state from the capture queue to a memory, determining that the one or more second instructions have been executed, copying the register state from the memory to the one or more registers of the capture queue, and restoring the register state to the instruction execution pipeline from the capture queue.
Type: Grant
Filed: April 15, 2019
Date of Patent: April 27, 2021
Assignee: Texas Instruments Incorporated
Inventors: Timothy D. Anderson, Joseph Zbiciak, Kai Chirca
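The save/restore sequence in this abstract (snapshot register state into a capture queue, spill it to memory, run the higher-priority instructions, then refill and restore) can be sketched as a small state machine. This is an illustrative model only; the class and method names are invented for the sketch, not from the patent.

```python
class Pipeline:
    """Toy model of capture-queue preemption: register state is parked in a
    capture queue and spilled to memory on preemption, then restored from
    memory via the capture queue after the higher-priority work completes."""

    def __init__(self):
        self.registers = {}       # live pipeline register state
        self.capture_queue = None
        self.memory_backup = None

    def preempt(self):
        self.capture_queue = dict(self.registers)      # snapshot into capture queue
        self.memory_backup = dict(self.capture_queue)  # copy capture queue to memory
        self.registers = {}                            # pipeline freed for new task

    def resume(self):
        self.capture_queue = dict(self.memory_backup)  # copy memory back to capture queue
        self.registers = dict(self.capture_queue)      # restore pipeline state

p = Pipeline()
p.registers = {"r0": 7, "r1": 42}
saved = dict(p.registers)
p.preempt()
p.registers = {"r0": 99}   # higher-priority instructions use the pipeline
p.resume()                 # original task's state is back in the pipeline
```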
-
Patent number: 10983796
Abstract: Embodiments involving core-to-core offload are detailed herein. For example, a method includes decoding an instruction having fields for at least an opcode to indicate that an end-of-task offload operation is to be performed, and executing the decoded instruction to cause a transmission of an offload end indication to the second core, the indication including one or more of an identifier of the second core, a location of where the second core can find the results of the offload, the results of execution of the offloaded task, an instruction pointer in the original code of the second source, a requesting core state, and a requesting core state location.
Type: Grant
Filed: June 29, 2019
Date of Patent: April 20, 2021
Assignee: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall
-
Patent number: 10970076
Abstract: Disclosed embodiments relate to systems and methods for performing instructions specifying ternary tile operations. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction specifying a ternary tile operation, and locations of destination and first, second, and third source matrices, each of the matrices having M rows by N columns; and execution circuitry to respond to the decoded instruction by, for each equal-sized group of K elements of the specified first, second, and third source matrices, generate K results by performing the ternary tile operation in parallel on K corresponding elements of the specified first, second, and third source matrices, and store each of the K results to a corresponding element of the specified destination matrix, wherein corresponding elements of the specified source and destination matrices occupy a same relative position within their associated matrix.
Type: Grant
Filed: September 14, 2018
Date of Patent: April 6, 2021
Assignee: Intel Corporation
Inventors: Elmoustapha Ould-Ahmed-Vall, Christopher J. Hughes, Bret Toll, Dan Baum, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
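The core of the operation above is elementwise: each destination element is a ternary function of the three source elements at the same relative position. A minimal scalar sketch follows; the bitwise-select example is a common ternary-logic case chosen for illustration and is an assumption, not the patent's specified operation.

```python
def ternary_tile_op(a, b, c, op):
    """Apply a ternary elementwise op across three equal-shaped matrices,
    writing each result to the same relative position in the destination."""
    rows, cols = len(a), len(a[0])
    return [[op(a[i][j], b[i][j], c[i][j]) for j in range(cols)]
            for i in range(rows)]

# Illustrative ternary op: bitwise select (a AND b) OR (NOT a AND c),
# i.e. bits of `a` choose between `b` and `c`.
def select(a, b, c):
    return (a & b) | (~a & c & 0xFF)

dest = ternary_tile_op([[0xFF, 0x00]], [[0x12, 0x34]], [[0x56, 0x78]], select)
```

Where `a` is all-ones the destination takes `b`'s bits; where `a` is all-zeros it takes `c`'s.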
-
Patent number: 10970078
Abstract: In an embodiment, a computation engine may perform computations on input vectors having vector elements of a first precision and data type. The computation engine may convert the vector elements from the first precision to a second precision and may also interleave the vector elements as specified by an instruction issued by the processor to the computation engine. The interleave may be based on a ratio of a result precision and the second precision. An extract instruction may be supported to extract results from the computations and convert and deinterleave the vector elements to provide a compact result in a desired order.
Type: Grant
Filed: April 5, 2018
Date of Patent: April 6, 2021
Assignee: Apple Inc.
Inventors: Eric Bainville, Tal Uliel, Jeffry E. Gonion, Ali Sazegari, Erik K. Norden
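One way to picture convert-then-interleave is: widen every element, then alternate fixed-size groups from the two halves of the vector, with the group size set by the precision ratio. This grouping rule is an illustrative reading of the abstract, not the patent's defined lane mapping.

```python
def convert_and_interleave(vec, convert, ratio):
    """Convert elements to a wider precision, then interleave groups of
    `ratio` elements taken alternately from the two halves of the vector.
    The group size models the ratio of result precision to intermediate
    precision mentioned in the abstract (an assumption for this sketch)."""
    converted = [convert(x) for x in vec]
    half = len(converted) // 2
    out = []
    for i in range(0, half, ratio):
        out.extend(converted[i:i + ratio])          # group from the low half
        out.extend(converted[half + i:half + i + ratio])  # group from the high half
    return out

widened = convert_and_interleave([1, 2, 3, 4, 5, 6, 7, 8], float, ratio=2)
```

A matching deinterleave-and-narrow (the abstract's extract instruction) would simply invert this mapping to recover a compact result.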
-
Patent number: 10943673
Abstract: A method of medical data auto collection segmentation and analysis, includes collecting, from a plurality of sources, unstructured medical data in a plurality of formats, recognizing a medical name entity of each piece of the unstructured medical data, using a medical dictionary, and performing semantic text segmentation on each piece of the unstructured medical data so that each piece of the unstructured medical data is partitioned into groups sharing a same topic. The method further includes generating, as structured medical data, each piece of the unstructured medical data of which the medical name entity is recognized, each piece of the unstructured medical data being partitioned into the groups, and indexing the structured medical data into elastic search clusters.
Type: Grant
Filed: April 10, 2019
Date of Patent: March 9, 2021
Assignee: TENCENT AMERICA LLC
Inventors: Shangqing Zhang, Min Tu, Nan Du, Yusheng Xie, Yaliang Li, Tao Yang, Wei Fan
-
Patent number: 10922140
Abstract: A physical Graphics Processing Unit (GPU) resource scheduling system and method between virtual machines are provided. An agent is inserted between a physical GPU instruction dispatch and a physical GPU interface through a hooking method, for delaying sending instructions and data in the physical GPU instruction dispatch to the physical GPU interface, monitoring a set of GPU conditions of a guest application executing in the virtual machine and a use condition of physical GPU hardware resources, and then providing a feedback to a GPU resource scheduling algorithm based on time or a time sequence. With the agent, the method requires no modification to the guest application of the virtual machine, the host operating system, the virtual machine operating system, the GPU driver, or the virtual machine manager.
Type: Grant
Filed: June 19, 2013
Date of Patent: February 16, 2021
Assignee: SHANGHAI JIAOTONG UNIVERSITY
Inventors: Miao Yu, Zhengwei Qi, Haibing Guan, Yin Wang
-
Patent number: 10891156
Abstract: Systems and methods are provided to implement intelligent data coordination for accelerated computing in a distributed computing environment. For example, a method includes executing a task on a computing node, monitoring requests issued by the executing task, intercepting requests issued by the executing task which correspond to data flow operations to be performed as part of the task execution, and asynchronously executing the intercepted requests at scheduled times to coordinate data flow between resources on the computing node.
Type: Grant
Filed: April 26, 2017
Date of Patent: January 12, 2021
Assignee: EMC IP Holding Company LLC
Inventors: Junping Zhao, Yifan Sun, Layne Peng, Jie Bao, Kun Wang
-
Patent number: 10846091
Abstract: In an embodiment, a coprocessor includes multiple processing elements arranged in a grid of one or more rows and one or more columns. A given processing element includes an arithmetic/logic unit (ALU) circuit configured to perform an ALU operation specified by an instruction executable by the coprocessor, wherein the ALU circuit is configured to produce a result. The given processing element further comprises a first memory coupled to the ALU circuit. The first memory is configured to store results generated by the given processing element. The first memory includes a portion of a result memory implemented by the coprocessor, wherein locations in the result memory are specifiable as destination operands of instructions executable by the coprocessor. The portion of the result memory implemented by the first memory is the portion of the result memory that the given processing element is capable of updating.
Type: Grant
Filed: February 26, 2019
Date of Patent: November 24, 2020
Assignee: Apple Inc.
Inventors: Aditya Kesiraju, Andrew J. Beaumont-Smith, Deepankar Duggal, Ran A. Chachick
-
Patent number: 10831794
Abstract: In one embodiment, a method for providing alternate keys in a keyed index includes creating a first base record in a keyed index of a database, the first base record including a first unique key and a first data record, wherein the first data record includes at least one sub key and at least one first value, each sub key being correlated with a different one of the at least one first value in a sub key/value pair, and creating one or more alternate key records in the database, each of the alternate key records including one of the at least one sub key which is correlated with the first base record and the first unique key of the first base record. The database adheres to virtual storage access method (VSAM) in some approaches. In other approaches, a number of alternate key records created is equal to a number of first sub keys in the first data record.
Type: Grant
Filed: July 17, 2018
Date of Patent: November 10, 2020
Assignee: International Business Machines Corporation
Inventor: Terri A. Menendez
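The base-record/alternate-key-record relationship above can be sketched with two maps: one from unique key to data record, and one from each sub key back to the owning base record's unique key (one alternate record per sub key, matching the abstract's count rule). The class and key names are invented for illustration.

```python
class KeyedIndex:
    """Toy model of alternate-key records: a base record holds the unique key
    plus sub key/value pairs; each sub key also gets an alternate key record
    pointing back at the base record's unique key."""

    def __init__(self):
        self.base = {}        # unique key -> data record (sub key/value pairs)
        self.alternates = {}  # sub key -> unique key of its base record

    def create_base(self, unique_key, data_record):
        self.base[unique_key] = data_record
        for sub_key in data_record:          # one alternate key record per sub key
            self.alternates[sub_key] = unique_key

    def lookup_by_sub_key(self, sub_key):
        """Resolve a sub key through its alternate record to the base record."""
        return self.base[self.alternates[sub_key]]

idx = KeyedIndex()
idx.create_base("CUST-001", {"email:a@x.com": "v1", "phone:555-0100": "v2"})
rec = idx.lookup_by_sub_key("phone:555-0100")
```

A lookup by any sub key thus lands on the same base record as a lookup by the unique key itself.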
-
Patent number: 10831488
Abstract: In an embodiment, a computation engine may offload work from a processor (e.g. a CPU) and efficiently perform computations such as those used in LSTM and other workloads at high performance. In an embodiment, the computation engine may perform computations on input vectors from input memories in the computation engine, and may accumulate results in an output memory within the computation engine. The input memories may be loaded with initial vector data from memory, incurring the memory latency that may be associated with reading the operands. Compute instructions may be performed on the operands, generating results in an output memory. One or more extract instructions may be supported to move data from the output memory to the input memory, permitting additional computation on the data in the output memory without moving the results to main memory.
Type: Grant
Filed: August 20, 2018
Date of Patent: November 10, 2020
Assignee: Apple Inc.
Inventors: Eric Bainville, Jeffry E. Gonion, Ali Sazegari, Gerard R. Williams, III, Andrew J. Beaumont-Smith
-
Patent number: 10824423
Abstract: A reconfigurable arithmetic device includes a plurality of processor elements configured to perform first arithmetic processes corresponding to a first type of instruction and second arithmetic processes corresponding to a second type of instruction, a random-access memory (RAM), and a control unit. The first type of instruction is written into the RAM at a first address, data for the first type of instruction is written into the RAM at a second address, and data for the second type of instruction is written into the RAM at a third address. When the first type of instruction is written at the first address, the control unit decodes the first type of instruction and configures the processor elements to perform the first arithmetic processes. When data for the second type of instruction is written at the third address, the control unit configures the processor elements to perform the second arithmetic processes.
Type: Grant
Filed: September 28, 2015
Date of Patent: November 3, 2020
Assignee: Cypress Semiconductor Corporation
Inventors: Hiroshi Furukawa, Ichiro Kasama
-
Patent number: 10761900
Abstract: A method for distributed processing includes receiving a job bundle at a command center comprising a processor, a network interface, and a memory. The method includes determining a value of a dimension of the job bundle, determining, based on a predetermined rule applied to the determined value of the dimension of the job bundle, an aggregate processing cost for the job bundle, and identifying one or more available member devices communicatively connected to the command center via the network interface. Additionally, the method includes the operations of splitting the job bundle into one or more threads based on at least one of the determined value of the dimension, the aggregate processing cost or the available member devices, apportioning a thread of the one or more threads to a member device, and transmitting, via the network interface, the apportioned thread to a secure processing environment of the member device.
Type: Grant
Filed: April 9, 2018
Date of Patent: September 1, 2020
Assignee: V2Com S.A.
Inventors: Guilherme Spina, Leonardo de Moura Rocha Lima
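The rule-then-split flow above (dimension value, predetermined rule, aggregate cost, one thread per available device) is straightforward to sketch. The rule table, even cost split, and field names are assumptions made for this sketch; the patent leaves the rule unspecified here.

```python
def split_job_bundle(bundle, rules, devices):
    """Apply a predetermined rule to the bundle's dimension value to get an
    aggregate processing cost, then split the bundle into one thread per
    available member device, each apportioned an equal share of the cost."""
    dimension_value = bundle["size"]                 # the measured dimension
    aggregate_cost = rules[bundle["kind"]](dimension_value)
    per_device = aggregate_cost / len(devices)       # assumed: even apportioning
    return [{"device": d, "cost": per_device} for d in devices]

# Assumed rule: a "batch" bundle costs 3 units per item of its size dimension.
rules = {"batch": lambda size: size * 3}
threads = split_job_bundle({"kind": "batch", "size": 100}, rules, ["dev-a", "dev-b"])
```

Each resulting thread would then be transmitted to its member device's secure processing environment.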
-
Patent number: 10740152
Abstract: Technologies for dynamic acceleration of general-purpose code include a computing device having a general-purpose processor core and one or more hardware accelerators. The computing device identifies an acceleration candidate in an application that is targeted to the processor core. The acceleration candidate may be a long-running computation of the application. The computing device translates the acceleration candidate into a translated executable targeted to the hardware accelerator. The computing device determines whether to offload execution of the acceleration candidate and, if so, executes the translated executable with the hardware accelerator. The computing device may translate the acceleration candidate into multiple translated executables, each targeted to a different hardware accelerator. The computing device may select among the translated executables in response to determining to offload execution.
Type: Grant
Filed: December 6, 2016
Date of Patent: August 11, 2020
Assignee: Intel Corporation
Inventors: Jayaram Bobba, Niranjan K. Soundararajan
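The offload decision in this abstract comes down to a break-even test: offloading pays only when the accelerated runtime plus translation overhead beats staying on the CPU. The simple cost model below is an illustrative assumption; the patent does not publish its decision heuristic.

```python
def maybe_offload(candidate_runtime_ms, translation_overhead_ms, accel_speedup):
    """Decide whether offloading a long-running acceleration candidate pays
    for itself: offload only if accelerated time plus one-time translation
    overhead beats running on the general-purpose core (assumed model)."""
    cpu_time = candidate_runtime_ms
    accel_time = candidate_runtime_ms / accel_speedup + translation_overhead_ms
    return accel_time < cpu_time

# A long-running candidate amortizes the translation cost...
long_job = maybe_offload(candidate_runtime_ms=1000,
                         translation_overhead_ms=50, accel_speedup=8)
# ...while a short one does not.
short_job = maybe_offload(candidate_runtime_ms=10,
                          translation_overhead_ms=50, accel_speedup=8)
```

This is why the abstract singles out long-running computations as candidates: short ones cannot recoup the translation overhead.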
-
Patent number: 10740097
Abstract: Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.
Type: Grant
Filed: May 20, 2016
Date of Patent: August 11, 2020
Assignee: International Business Machines Corporation
Inventors: Dong Chen, Paul W. Coteus, Noel A. Eisley, Alan Gara, Philip Heidelberger, Robert M. Senger, Valentina Salapura, Burkhard Steinmacher-Burow, Yutaka Sugawara, Todd E. Takken
-
Patent number: 10664310
Abstract: A method of configuring a System on Chip (SoC) to execute a CNN process comprising CNN layers, the method comprising, for each schedule: determining memory access amount information describing how many memory accesses are required; expressing the memory access amount information as relationships describing reusability of data; combining the relationships with a cost of writing and reading from external memory, to form memory access information; determining a memory allocation for on-chip memory of the SoC for the input FMs and the output FMs; and determining, dependent upon the memory access information and the memory allocation for each schedule, a schedule which minimises the memory access information of external memory access for the CNN layer of the CNN process, and a memory allocation associated with the determined schedule.
Type: Grant
Filed: December 14, 2018
Date of Patent: May 26, 2020
Assignee: Canon Kabushiki Kaisha
Inventors: Haseeb Bokhari, Jorgen Peddersen, Sridevan Parameswaran, Iftekhar Ahmed, Yusuke Yachide
-
Patent number: 10635445
Abstract: An apparatus and method of operating an apparatus are disclosed. The apparatus has a program counter permitted range storage element defining a permitted range of program counter values for the sequence of instructions it executes. Branch prediction circuitry predicts target instruction addresses for branch instructions. In response to a program counter modifying event, a program counter speculative range storage element is updated corresponding to each speculatively executed instruction after a branch instruction. Program counter permitted range verification circuitry is responsive to resolution of a modification of the program counter permitted range indication resulting from the program counter modifying event to determine whether the speculatively executed program counter range satisfies the permitted range of program counter values. A branch mis-prediction mechanism may support the response of the apparatus if the permitted range of program counter values is violated.
Type: Grant
Filed: May 29, 2018
Date of Patent: April 28, 2020
Assignee: Arm Limited
Inventors: Rémi Marius Teyssier, Albin Pierrick Tonnerre, Cédric Denis Robert Airaud, Luca Nassi, Guillaume Bolbenes, Francois Donati, Lee Evan Eisen, Pasquale Ranone