Array Processor Operation Patents (Class 712/16)
  • Patent number: 11836489
    Abstract: A processor for sparse matrix calculation includes an on-chip memory, a cache, a gather/scatter engine, and a core. The on-chip memory stores a first matrix or vector, and the cache stores a compressed sparse second matrix data structure. The compressed sparse second matrix data structure includes a value array including non-zero element values of the sparse second matrix, where each entry includes a given number of element values; and a column index array where each entry includes the given number of offsets matching the value array. The gather/scatter engine gathers element values of the first matrix or vector using the column index array of the sparse second matrix. In a hybrid horizontal/vertical implementation, the gather/scatter engine gathers sets of element values from sets of rows and from different sub-banks within the same rows based on the column index array of the sparse matrix.
    Type: Grant
    Filed: October 25, 2022
    Date of Patent: December 5, 2023
    Assignee: Alibaba Group Holding Limited
    Inventor: Fei Sun
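The gather step this abstract describes can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the patented hardware: `values`, `col_index`, and `x` are hypothetical stand-ins for the value array, the column index array, and the first vector.

```python
import numpy as np

values = np.array([5.0, 8.0, 3.0, 6.0])   # non-zero elements of the sparse matrix
col_index = np.array([0, 1, 2, 1])        # offsets matching the value array
x = np.array([1.0, 2.0, 3.0, 4.0])        # dense first vector

gathered = x[col_index]                   # the gather/scatter engine's gather step
products = values * gathered              # element-wise products feeding the SpMV
```

In the hybrid horizontal/vertical variant, the same gather would be issued for several rows at once and additionally split across sub-banks within each row.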
  • Patent number: 11652484
    Abstract: An application specific integrated circuit (ASIC) chip includes: a systolic array of cells; and multiple controllable bus lines configured to convey data among the systolic array of cells, in which the systolic array of cells is arranged in multiple tiles, each tile of the multiple tiles including 1) a corresponding sub array of cells of the systolic array of cells, 2) a corresponding subset of controllable bus lines of the multiple controllable bus lines, and 3) memory coupled to the subarray of cells.
    Type: Grant
    Filed: August 9, 2021
    Date of Patent: May 16, 2023
    Assignee: Google LLC
    Inventors: Michial Allen Gunter, Charles Henry Leichner, IV, Tammo Spalink
  • Patent number: 11366875
    Abstract: Methods and devices, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of non-zero elements of the portion of the matrix; and identifying memory addresses of the non-zero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the non-zero elements in the first section into one or more vector registers.
    Type: Grant
    Filed: March 13, 2020
    Date of Patent: June 21, 2022
    Assignee: ALIBABA GROUP HOLDING LIMITED
    Inventors: Guoyang Chen, Yu Pu, Yongzhi Zhang, Weifeng Zhang, Yuan Xie
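A toy sketch of the classify-then-order idea: inspect the non-zero distribution of a matrix section, pick a load order based on the pattern, and list the element addresses in that order. The pattern rule and the `classify_and_order` helper are invented for illustration.

```python
import numpy as np

def classify_and_order(section):
    nz_rows, nz_cols = np.nonzero(section)
    # toy rule: a purely diagonal section gets a diagonal load order,
    # anything else is loaded row-major
    pattern = "diagonal" if np.all(nz_rows == nz_cols) else "row-major"
    addresses = sorted(zip(nz_rows.tolist(), nz_cols.tolist()))
    return pattern, addresses

section = np.array([[1, 0, 0],
                    [0, 2, 0],
                    [0, 0, 3]])
pattern, addresses = classify_and_order(section)
```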
  • Patent number: 11294709
    Abstract: A processing system including a memory, command sequencers, accelerators, and memory banks. The memory stores program code including instruction threads sequentially listed in the program code. The command sequencers include a master command sequencer and multiple slave command sequencers. The master command sequencer executes the program code including distributing the instruction threads for parallel execution among the slave command sequencers. The instruction threads may be provided inline or accessed via inline thread line pointers. Each accelerator is available to each command sequencer in which multiple command sequencers may access multiple accelerators for parallel execution. The memory banks are simultaneously available to multiple accelerators. The master command sequencer may perform implicit synchronization by waiting for completion of simultaneous execution of multiple instruction threads. A command sequencer arbiter may arbitrate among the command sequencers.
    Type: Grant
    Filed: February 18, 2020
    Date of Patent: April 5, 2022
    Assignee: NXP USA, Inc.
    Inventors: Maik Brett, Sidhartha Taneja, Christian Tuschen, Tejbal Prasad, Nikhil Tiwari, Saurabh Arora
  • Patent number: 11275892
    Abstract: A method, system, and computer program product for using a natural language processor to find nodes in a span include providing a parse tree including a trigger node, a first target node connected to the trigger node by a first edge, and a second target node connected to the first target node by a second edge, wherein the trigger node includes a first attribute and a second attribute, and wherein the first target node includes a third attribute and a fourth attribute. Further included are recording the first, second, third, and fourth attributes in a first tree table; creating a first consideration table from the first tree table, the first consideration table including the first, second, third, and fourth attributes; and evaluating the first target node to determine whether it belongs in a first span that includes the trigger node.
    Type: Grant
    Filed: April 29, 2019
    Date of Patent: March 15, 2022
    Assignee: International Business Machines Corporation
    Inventors: Joshua Cason, Kandhan Sekar, Thomas Hay Rogers
  • Patent number: 11263292
    Abstract: A method for performing a matrix multiplication operation is provided. The method includes: obtaining a matrix B1, a matrix A2, and an index matrix, wherein the index matrix comprises indexes, in a matrix A1, of elements in the matrix A2; generating m matrices B2 based on the index matrix and the matrix B1, wherein the m matrices B2 are all matrices with t rows and n columns, and each row of each matrix B2 is a row indicated in the matrix B1 by a corresponding element in the index matrix; and generating a matrix C based on the matrix A2 and the m matrices B2, wherein the matrix C is a product of the matrix A1 and the matrix B1.
    Type: Grant
    Filed: May 19, 2021
    Date of Patent: March 1, 2022
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Leijun He, Bin Xu, Kaixing Wang
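The steps in the abstract (gather rows of B1 via the index matrix, then multiply by the compressed A2) can be sketched with NumPy. The `indexed_matmul` helper and the toy shapes are assumptions, with m = 2 rows and t = 2 kept elements per row.

```python
import numpy as np

def indexed_matmul(A2, idx, B1):
    # each row i uses its own gathered matrix B2 = B1[idx[i]]
    C = np.empty((A2.shape[0], B1.shape[1]))
    for i in range(A2.shape[0]):
        C[i] = A2[i] @ B1[idx[i]]
    return C

# A1 is 2x4 with two non-zeros per row; A2 holds the values, idx their columns
A1 = np.array([[1.0, 0.0, 2.0, 0.0],
               [0.0, 3.0, 0.0, 4.0]])
A2 = np.array([[1.0, 2.0], [3.0, 4.0]])
idx = np.array([[0, 2], [1, 3]])
B1 = np.arange(8.0).reshape(4, 2)
C = indexed_matmul(A2, idx, B1)   # matches A1 @ B1
```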
  • Patent number: 11250108
    Abstract: A matrix processing method includes: determining a quantity of non-zero elements in a to-be-processed matrix, where the to-be-processed matrix is a one-dimensional matrix; generating a distribution matrix of the to-be-processed matrix, where the distribution matrix is used to indicate a position of a non-zero element in the to-be-processed matrix; combining the quantity of non-zero elements, values of all non-zero elements in the to-be-processed matrix arranged sequentially, and the distribution matrix, to obtain a compressed matrix of the to-be-processed matrix.
    Type: Grant
    Filed: May 8, 2020
    Date of Patent: February 15, 2022
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Zhenjiang Dong, Chio In Ieong, Hu Liu, Hai Chen
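A minimal sketch of the compressed form, assuming the distribution matrix is a 0/1 bitmap over the one-dimensional to-be-processed matrix; the function names are invented.

```python
import numpy as np

def compress(vec):
    mask = (vec != 0).astype(np.uint8)   # distribution matrix (position bitmap)
    values = vec[vec != 0]               # non-zero values in original order
    return int(mask.sum()), values, mask # quantity, values, distribution

def decompress(count, values, mask):
    out = np.zeros(mask.shape, dtype=values.dtype)
    out[mask == 1] = values
    return out

v = np.array([0.0, 7.0, 0.0, 0.0, 5.0])
count, vals, mask = compress(v)
```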
  • Patent number: 11232347
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes various attributes of the fabric vector: length, microthreading eligibility, number of data elements to receive, transmit, and/or process in parallel, virtual channel and task identification information, whether to terminate upon receiving a control wavelet, and whether to mark an outgoing wavelet a control wavelet.
    Type: Grant
    Filed: April 17, 2018
    Date of Patent: January 25, 2022
    Assignee: Cerebras Systems Inc.
    Inventors: Sean Lie, Michael Morrison, Michael Edwin James, Srikanth Arekapudi, Gary R. Lauterbach
  • Patent number: 11074317
    Abstract: A method includes identifying, using at least one processor, input words associated with a user query. The method also includes, for each of one or more of the input words that are contained in a high-frequency word set, retrieving pre-computed element-wise products associated with the input word from a cache. The method further includes performing, using the at least one processor, a convolution operation using the pre-computed element-wise products. In addition, the method includes generating, using the at least one processor, a response to the user query based on results of the convolution operation. The method may also include, for each of one or more of the input words that are not contained in the high-frequency word set, calculating additional element-wise products associated with the input word, and the convolution operation may be performed using the pre-computed element-wise products and the additional element-wise products.
    Type: Grant
    Filed: March 13, 2019
    Date of Patent: July 27, 2021
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Duanduan Yang
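The caching scheme can be sketched as a dictionary of pre-computed element-wise products keyed by high-frequency words, with a fallback computation for rare words. The filter shapes, embeddings, and the `conv_response` helper are all hypothetical.

```python
import numpy as np

filters = np.array([[1.0, -1.0],        # 2 filters over 2-dim word embeddings
                    [0.5, 0.5]])
embeddings = {"the": np.array([2.0, 3.0]),
              "array": np.array([1.0, 4.0])}
high_freq = {"the"}                      # the high-frequency word set

# pre-computed element-wise products for high-frequency words
cache = {w: filters * embeddings[w] for w in high_freq}

def conv_response(word):
    prod = cache.get(word)
    if prod is None:                     # rare word: compute on the fly
        prod = filters * embeddings[word]
    return prod.sum(axis=1)              # per-filter convolution response
```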
  • Patent number: 11048517
    Abstract: A processor core in an instruction block-based microarchitecture is configured so that an instruction window and operand buffers are decoupled for independent operation in which instructions in the block are not tied to resources such as control bits and operands that are maintained in the operand buffers. Instead, pointers are established among instructions in the block and the resources so that control state can be established for a refreshed instruction block (i.e., an instruction block that is reused without re-fetching it from an instruction cache) by following the pointers. Such decoupling of the instruction window from the operand space can provide greater processor efficiency, particularly in multiple core arrays where refreshing is utilized (for example when executing program code that uses tight loops), because the operands and control bits are pre-validated.
    Type: Grant
    Filed: June 24, 2019
    Date of Patent: June 29, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
  • Patent number: 11016810
    Abstract: A system and method for a computing tile of a multi-tiled integrated circuit includes a plurality of distinct tile computing circuits, wherein each of the plurality of distinct tile computing circuits is configured to receive fixed-length instructions; a token-informed task scheduler that: tracks one or more of a plurality of distinct tokens emitted by one or more of the plurality of distinct tile computing circuits; and selects a distinct computation task of a plurality of distinct computation tasks based on the tracking; and a work queue buffer that: contains a plurality of distinct fixed-length instructions, wherein each one of the fixed-length instructions is associated with one of the plurality of distinct computation tasks; and transmits one of the plurality of distinct fixed-length instructions to one or more of the plurality of distinct tile computing circuits based on the selection of the distinct computation task by the token-informed task scheduler.
    Type: Grant
    Filed: November 24, 2020
    Date of Patent: May 25, 2021
    Assignee: Mythic, Inc.
    Inventors: Malav Parikh, Sergio Schuler, Vimal Reddy, Zainab Zaidi, Paul Toth, Adam Caughron, Bryant Sorensen, Alex Dang-Tran, Scott Johnson, Raul Garibay, Andrew Morten, David Fick
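A toy model of the token-informed selection step, assuming each queued fixed-length instruction records the set of tokens it waits on; the class and method names are invented, not the Mythic hardware interface.

```python
from collections import deque

class TokenScheduler:
    def __init__(self):
        self.tokens = set()          # tokens emitted by tile computing circuits
        self.queue = deque()         # (needed_tokens, instruction) work queue

    def emit(self, token):
        self.tokens.add(token)

    def enqueue(self, needed, instruction):
        self.queue.append((frozenset(needed), instruction))

    def select(self):
        # pick the first queued task whose required tokens have all arrived
        for entry in list(self.queue):
            if entry[0] <= self.tokens:
                self.queue.remove(entry)
                return entry[1]
        return None

sched = TokenScheduler()
sched.enqueue({"matmul_done"}, "activation_task")
```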
  • Patent number: 10983878
    Abstract: Provided is an image recognition processor. The image recognition processor includes a plurality of nano cores each configured to perform a pattern recognition operation and arranged in rows and columns, an instruction memory configured to provide instructions to the plurality of nano cores in a row unit, a feature memory configured to provide input features to the plurality of nano cores in a row unit, a kernel memory configured to provide a kernel coefficient to the plurality of nano cores in a column unit, and a difference checker configured to receive a result of the pattern recognition operation of each of the plurality of nano cores, detect whether there is an error by referring to the received result, and provide a fault tolerance function that allows an error below a predefined level.
    Type: Grant
    Filed: November 25, 2019
    Date of Patent: April 20, 2021
    Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Jin Ho Han, Young-Su Kwon, Min-Seok Choi
  • Patent number: 10834226
    Abstract: Embodiments of the present invention provide methods, systems, and computer program products for container communication. In an embodiment, it is determined whether a message is going to a container on the same machine or to a container on a machine at a geographically different location. If it is determined that the message is going to a container on a machine at a geographically different location, then it is determined whether a predetermined threshold has been reached. If it is determined that the predetermined threshold has been reached, then the container is migrated from the first machine to the machine at the geographically different location. A data tracking structure is used to visually represent the migration of containers to other machines.
    Type: Grant
    Filed: July 15, 2016
    Date of Patent: November 10, 2020
    Assignee: International Business Machines Corporation
    Inventors: Rafael C. S. Folco, Breno H. Leitão, Desnes A. Nunes do Rosário, Jose F. Santiago Filho
  • Patent number: 10824429
    Abstract: Systems and methods are disclosed for executing instructions with a block-based processor. Instructions can be executed in any order as their dependencies arrive, but the individual instructions are committed in a serial fashion. Further, exception handling can be performed by storing transient state for an instruction block and resuming by restoring the transient state. This allows programmers to see intermediate state for the instruction block before the subject block has committed. In one example of the disclosed technology, a method of operating a processor executing a block-based instruction set architecture includes executing at least one instruction encoded for an instruction block, responsive to determining that an individual instruction of the instruction block can commit, advancing a commit frontier for the instruction block to include all instructions in the instruction block that can commit, and committing one or more instructions inside the advanced commit frontier.
    Type: Grant
    Filed: December 18, 2018
    Date of Patent: November 3, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Gagan Gupta, David T. Harper
  • Patent number: 10783011
    Abstract: Systems and methods are directed to efficient management of processor resources, particularly General Purpose Registers (GPRs), for example to minimize pipeline flushes and prevent deadlocks by counting GPRs instead of allocating them to specific blocks of code. Blocks of code are allowed to execute if the Free GPR count is adequate. The method contemplates counting the number of Register Writers in executing blocks of code which will write to GPRs, and counting the GPRs which are available instead of merely allocating them to dedicated use by a block of code, or an instruction in a block of code. Because blocks do not run if there are not enough GPRs available for the block, deadlocks and pipeline flushes due to lack of resources can be minimized.
    Type: Grant
    Filed: September 21, 2017
    Date of Patent: September 22, 2020
    Assignee: Qualcomm Incorporated
    Inventors: Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
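The counting discipline can be sketched as a simple counter that gates block issue; `GprCounter` and its methods are illustrative names, not the patented circuit.

```python
class GprCounter:
    def __init__(self, total):
        self.free = total                  # count of free GPRs, not a per-block map

    def try_issue(self, writer_count):
        # a block runs only if its Register Writers fit the free pool
        if writer_count > self.free:
            return False                   # block waits instead of deadlocking
        self.free -= writer_count
        return True

    def retire(self, writer_count):
        self.free += writer_count          # writers completed, GPRs return to pool

gprs = GprCounter(4)
```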
  • Patent number: 10776312
    Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) each having a plurality of arithmetic logic units (ALUs) that are configured to execute a same instruction in parallel threads and a plurality of memory ports (MPs) for the plurality of PEs to access a memory unit. Each of the plurality of MPs may comprise an address calculation unit configured to generate respective memory addresses for each thread to access a common area in the memory unit.
    Type: Grant
    Filed: March 13, 2018
    Date of Patent: September 15, 2020
    Assignee: AzurEngine Technologies Zhuhai Inc.
    Inventors: Jianbin Zhu, Yuan Li
  • Patent number: 10733139
    Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) and a plurality of memory ports (MPs) for the plurality of PEs to access a memory unit. Each PE may have a plurality of arithmetic logic units (ALUs) that are configured to execute a same instruction in parallel threads. Each of the plurality of MPs may comprise an address calculation unit configured to generate respective memory addresses for each thread to access a different memory bank in the memory unit.
    Type: Grant
    Filed: March 13, 2018
    Date of Patent: August 4, 2020
    Assignee: AzurEngine Technologies Zhuhai Inc.
    Inventors: Yuan Li, Jianbin Zhu
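One plausible address-calculation rule matching this description, assuming threads are striped across banks so that parallel threads land in different memory banks; the formula and parameter names are assumptions.

```python
def thread_addresses(base, num_threads, num_banks, bank_size):
    addrs = []
    for t in range(num_threads):
        bank = t % num_banks               # a different bank per thread
        offset = base + t // num_banks     # position inside the bank
        addrs.append(bank * bank_size + offset)
    return addrs

addrs = thread_addresses(base=0, num_threads=4, num_banks=4, bank_size=16)
```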
  • Patent number: 10613910
    Abstract: A virtual architecture generating apparatus and method, a runtime system, a multi-core system, and methods of operating the runtime system and the multi-core system may include analyzing a requirement of an application, a feature of the application, and a requirement of a system enabling an execution of the application, and include generating a virtual architecture corresponding to the application, based on a physical architecture of a reconfigurable processor, the analyzed requirements and the analyzed feature.
    Type: Grant
    Filed: July 11, 2017
    Date of Patent: April 7, 2020
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Min Young Son, Shi Hwa Lee, Seung Won Lee, Jeong Joon Yoo, Jae Don Lee, Young Sam Shin, Hee Jin Ahn
  • Patent number: 10592368
    Abstract: A method and system of imputing corrupted sequential data is provided. A plurality of input data vectors of a sequential data is received. For each input data vector of the sequential data, the input data vector is corrupted. The corrupted input data vector is mapped to a staging hidden layer to create a staging vector. The input data vector is reconstructed based on the staging vector, to provide an output data vector. An adjusted parameter of the staging hidden layer is iteratively trained until it is within a predetermined tolerance of a loss function. A next input data vector of the sequential data is predicted based on the staging vector. The predicted next input data vector is stored.
    Type: Grant
    Filed: October 26, 2017
    Date of Patent: March 17, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Shi Jing Guo, Xiang Li, Hai Feng Liu, Jing Mei, Zhi Qiao, Guo Tong Xie, Shi Wan Zhao
  • Patent number: 10529049
    Abstract: Techniques are provided herein for generating an integral image of an input image in parallel across the cores of a multi-core processor. The input image is split into a plurality of tiles, each of which is stored in a scratchpad memory associated with a distinct core. At each tile, a partial integral image of the tile is first computed over the tile, using a Single-Pass Algorithm. This is followed by aggregating partial sums belonging to subsets of tiles using a 2D Inclusive Parallel Prefix Algorithm. A summation is finally performed over the aggregated partial sums to generate the integral image over the entire input image.
    Type: Grant
    Filed: March 27, 2017
    Date of Patent: January 7, 2020
    Assignee: Oracle International Corporation
    Inventors: Venkatanathan Varadarajan, Arun Raghavan, Sam Idicula, Nipun Agarwal
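Both passes can be sketched with prefix sums. The 2-D integral below stands in for the per-tile Single-Pass step, and the 1-D `tiled_prefix` helper shows how per-tile partial sums are combined with an exclusive prefix of tile totals; both are single-node sketches of what the patent distributes across cores.

```python
import numpy as np

def integral_image(img):
    # two inclusive prefix sums give the integral image of one tile
    return np.cumsum(np.cumsum(img, axis=0), axis=1)

def tiled_prefix(x, tile):
    # per-tile pass, then add an exclusive prefix of the tile totals
    tiles = [x[i:i + tile] for i in range(0, len(x), tile)]
    partial = [np.cumsum(t) for t in tiles]
    offsets = np.cumsum([0] + [p[-1] for p in partial[:-1]])
    return np.concatenate([p + o for p, o in zip(partial, offsets)])

ii = integral_image(np.ones((3, 3)))
pref = tiled_prefix(np.arange(1.0, 7.0), tile=2)
```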
  • Patent number: 10504271
    Abstract: The invention notably relates to a computer-implemented method for simulating a 3D scene. The simulation is carried out with a set of computing resources running in parallel. The method comprises partitioning a 3D scene into a plurality of zones. Each zone is sized to satisfy a real-time computing constraint by one computing resource of the set. The method comprises assigning each zone of the plurality to a computing resource, computing an estimation of a load of each computing resource and determining whether one or more computing resources are over-loaded or under-loaded, computing, for each zone, a contribution of the zone to the load of the computing resource to which the zone is assigned, reassigning one or more zones of a computing resource that is over-loaded or under-loaded to another computing resource, the reassignment resulting from the computed contributions of the zones with a combinatorial optimization algorithm.
    Type: Grant
    Filed: September 28, 2017
    Date of Patent: December 10, 2019
    Assignee: DASSAULT SYSTEMES
    Inventors: Malika Boulkenafed, Philippe Robert Felix Belmans
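A greedy stand-in for the reassignment step, assuming per-zone load contributions are known; the patent uses a combinatorial optimization algorithm, so this single-pass heuristic is only illustrative.

```python
def rebalance(assign, zone_load, capacity):
    # assign: resource -> list of zones; zone_load: zone -> load contribution
    load = {r: sum(zone_load[z] for z in zs) for r, zs in assign.items()}
    for r in [r for r in assign if load[r] > capacity]:   # over-loaded resources
        target = min(assign, key=load.get)                # least-loaded resource
        if target == r:
            continue
        zone = min(assign[r], key=zone_load.get)          # cheapest zone to move
        assign[r].remove(zone)
        assign[target].append(zone)
        load[r] -= zone_load[zone]
        load[target] += zone_load[zone]
    return assign

zones = {"z1": 5, "z2": 1}
result = rebalance({"A": ["z1", "z2"], "B": []}, zones, capacity=4)
```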
  • Patent number: 10437743
    Abstract: The present embodiments relate to interface circuitry between a serial interface circuit and an array of processing elements in an integrated circuit. The interface circuitry may include a daisy chain of feeder circuits and a daisy chain of drain circuits. If desired, the interface circuitry may include multiple daisy chains of feeder circuits and/or multiple daisy chains of drain circuits. These multiple daisy chains of feeder circuits and drain circuits may be coupled in parallel, respectively. In some embodiments, the interface circuitry may include synchronization circuitry that is coupled between the daisy chains of drain circuits and the serial interface circuit. Pipeline register stages between feeder circuits and/or between drain circuits may enable the placement of the feeder circuits and/or the drain circuits spatially close to the processing elements of the array of processing elements.
    Type: Grant
    Filed: April 1, 2016
    Date of Patent: October 8, 2019
    Assignee: Altera Corporation
    Inventors: Davor Capalija, Andrei Mihai Hagiescu Miriste, John Stuart Freeman, Alan Baker
  • Patent number: 10437650
    Abstract: Provided is a processing apparatus, including: a plurality of processing units; one or more data buffers that are connected between a first processing unit and a second processing unit and are able to store data output from the first processing unit and data input to the second processing unit; a command buffer that stores a task command specifying execution of a task to be executed in one or more specific processing units, the command buffer being able to output the task command to the processing units; and a task control unit that is configured to control operational processing in the task, by controlling at least one of the data buffer and the command buffer, on the basis of the task command, task setting information representing the processing units in which the task is executed, and information representing a state of operational processing in the respective processing units.
    Type: Grant
    Filed: June 11, 2015
    Date of Patent: October 8, 2019
    Assignee: NEC Corporation
    Inventor: Tomoyoshi Kobori
  • Patent number: 10332008
    Abstract: A decision tree multi-processor system includes a plurality of decision tree processors that access a common feature vector and execute one or more decision trees with respect to the common feature vector. A related method includes providing a common feature vector to a plurality of decision tree processors implemented within an on-chip decision tree scoring system, and executing, by the plurality of decision tree processors, a plurality of decision trees, by reference to the common feature vector. A related decision tree-walking system includes feature storage that stores a common feature vector and a plurality of decision tree processors that access the common feature vector from the feature storage and execute a plurality of decision trees by comparing threshold values of the decision trees to feature values within the common feature vector.
    Type: Grant
    Filed: March 17, 2014
    Date of Patent: June 25, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, James R. Larus, Andrew Putnam, Jan Gray
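Decision-tree scoring against a common feature vector can be sketched with a tuple-encoded tree; the encoding and the example trees are assumptions, not the on-chip format.

```python
# each tuple node is (feature_index, threshold, left_subtree, right_subtree);
# a bare number is a leaf score
def walk_tree(tree, features):
    node = tree
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if features[f] <= thr else right
    return node

trees = [
    (0, 0.5, -1.0, (1, 2.0, 0.5, 1.0)),
    (1, 1.0, 0.0, 2.0),
]
features = [0.8, 3.0]                       # the common feature vector
score = sum(walk_tree(t, features) for t in trees)
```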
  • Patent number: 10303640
    Abstract: The present invention provides a method for managing the wiring and growth of a direct interconnect network implemented on a torus or higher radix interconnect structure based on an architecture that replaces the Network Interface Card (NIC) with PCIe switching cards housed in the server. Also provided is a passive patch panel for use in the implementation of the interconnect, comprising: a passive backplane that houses node to node connectivity for the interconnect; and at least one connector board plugged into the passive backplane comprising multiple connectors. The multiple connectors are capable of receiving an interconnecting plug to maintain the continuity of the torus or higher radix topology when not fully enabled. The PCIe card for use in the implementation of the interconnect comprises: at least 4 electrical or optical ports for the interconnect; a local switch; a processor with RAM and ROM memory; and a PCI interface.
    Type: Grant
    Filed: March 26, 2018
    Date of Patent: May 28, 2019
    Assignee: ROCKPORT NETWORKS INC.
    Inventor: Dan Oprea
  • Patent number: 10146738
    Abstract: An accelerator architecture for processing very-sparse and hyper-sparse matrix data is disclosed. A hardware accelerator comprises one or more tiles, each including a plurality of processing elements (PEs) and a data management unit (DMU). The PEs are to perform matrix operations involving very- or hyper-sparse matrices that are stored by a memory. The DMU is to provide the plurality of PEs access to the memory via an interface that is optimized to provide low-latency, parallel, random accesses to the memory. The PEs, via the DMU, perform the matrix operations by issuing random access read requests for values of the one or more matrices, issuing random access read requests for values of one or more vectors serving as a second operand, and issuing random access write requests for values of one or more vectors serving as a result.
    Type: Grant
    Filed: December 31, 2016
    Date of Patent: December 4, 2018
    Assignee: Intel Corporation
    Inventors: Eriko Nurvitadhi, Deborah Marr
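The access pattern described (random reads of matrix values and of the operand vector, writes of result values) matches a plain CSR sparse-matrix/vector product, sketched here in software; the accelerator's DMU and PEs are not modeled.

```python
import numpy as np

def csr_spmv(values, col_idx, row_ptr, x):
    # one "PE" per row: random-access reads into x via col_idx
    y = np.zeros(len(row_ptr) - 1)
    for r in range(len(y)):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += values[k] * x[col_idx[k]]
    return y

# sparse matrix [[0, 2, 0], [1, 0, 3]] in CSR form
y = csr_spmv(values=[2.0, 1.0, 3.0], col_idx=[1, 0, 2],
             row_ptr=[0, 1, 3], x=[1.0, 2.0, 3.0])
```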
  • Patent number: 10140118
    Abstract: The present invention discloses an application data synchronization method and an apparatus. When a first operating system and a second operating system are installed in a terminal, and a first application and a second application that have a same function are installed on the first operating system and the second operating system respectively, the method includes: when the second application runs on the second operating system, performing the function by using second application data, and updating the second application data, where the second application data is updated according to first application data, and the first application data is updated when the first application runs on the first operating system to perform the function; where the first application data and the second application data are stored in the terminal. These solutions make sharing the data of a same application between different systems more convenient and less time-consuming.
    Type: Grant
    Filed: March 19, 2014
    Date of Patent: November 27, 2018
    Assignee: Huawei Device (Dongguan) Co., Ltd.
    Inventors: Xi Huang, Jianxin Ding, Huangwei Wu
  • Patent number: 10078620
    Abstract: A processor includes a plurality of processing tiles, wherein each tile is configured at runtime to perform a configurable operation. A first subset of tiles are configured to perform in a pipeline a first plurality of configurable operations in parallel. A second subset of tiles are configured to perform a second plurality of configurable operations in parallel with the first plurality of configurable operations. The processor also includes a multi-port memory access module operably connected to the plurality of tiles via a data bus configured to control access to a memory and to provide data to two or more processing tiles simultaneously. The processor also includes a controller operably connected to the plurality of tiles and the multi-port memory access module via a runtime bus. The processor configures the tiles and the multi-port memory access module to execute a computation.
    Type: Grant
    Filed: May 24, 2012
    Date of Patent: September 18, 2018
    Assignee: NEW YORK UNIVERSITY
    Inventors: Clément Farabet, Yann LeCun
  • Patent number: 9965429
    Abstract: The present invention provides a method for managing the wiring and growth of a direct interconnect network implemented on a torus or higher radix interconnect structure based on an architecture that replaces the Network Interface Card (NIC) with PCIe switching cards housed in the server. Also provided is a passive patch panel for use in the implementation of the interconnect, comprising: a passive backplane that houses node to node connectivity for the interconnect; and at least one connector board plugged into the passive backplane comprising multiple connectors. The multiple connectors are capable of receiving an interconnecting plug to maintain the continuity of the torus or higher radix topology when not fully enabled. The PCIe card for use in the implementation of the interconnect comprises: at least 4 electrical or optical ports for the interconnect; a local switch; a processor with RAM and ROM memory; and a PCI interface.
    Type: Grant
    Filed: August 29, 2014
    Date of Patent: May 8, 2018
    Assignee: ROCKPORT NETWORKS INC.
    Inventor: Dan Oprea
  • Patent number: 9798575
    Abstract: Techniques to manage virtual classes for statistical tests are described. An apparatus may comprise a simulated data component to generate simulated data for a statistical test, statistics of the statistical test based on parameter vectors to follow a probability distribution, a statistic simulator component to simulate statistics for the parameter vectors from the simulated data with a distributed computing system comprising multiple nodes each having one or more processors capable of executing multiple threads, the simulation to occur by distribution of portions of the simulated data across the multiple nodes of the distributed computing system, and a distributed control engine to control task execution on the distributed portions of the simulated data on each node of the distributed computing system with a virtual software class arranged to coordinate task and sub-task operations across the nodes of the distributed computing system. Other embodiments are described and claimed.
    Type: Grant
    Filed: May 6, 2014
    Date of Patent: October 24, 2017
    Assignee: SAS Institute Inc.
    Inventors: Xilong Chen, Mark Roland Little
  • Patent number: 9715481
    Abstract: According to one technique, a modeling computer computes a Hessian matrix by determining whether an input matrix contains more than a threshold number of dense columns. If so, the modeling computer computes a sparsified version of the input matrix and uses the sparsified matrix to compute the Hessian. Otherwise, the modeling computer identifies which columns are dense and which columns are sparse. The modeling computer then partitions the input matrix by column density and uses sparse matrix format to store the sparse columns and dense matrix format to store the dense columns. The modeling computer then computes component parts which combine to form the Hessian, wherein component parts that rely on dense columns are computed using dense matrix multiplication and component parts that rely on sparse columns are computed using sparse matrix multiplication.
    Type: Grant
    Filed: March 9, 2015
    Date of Patent: July 25, 2017
    Assignee: Oracle International Corporation
    Inventors: Dmitry Golovashkin, Uladzislau Sharanhovich, Vaishnavi Sashikanth
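The column-partitioned Hessian computation described in the abstract above can be illustrated with a short sketch. This is not Oracle's patented implementation; the density threshold, function name, and use of SciPy sparse arrays are assumptions made for illustration. The key idea is that H = XᵀX decomposes into blocks, and each block can use the storage format (dense or sparse) best suited to the columns it involves.

```python
# Hypothetical sketch of a dense/sparse column-partitioned Hessian (H = X^T X).
# Threshold, names, and library choices are illustrative, not from the patent.
import numpy as np
from scipy import sparse

def partitioned_hessian(X, density_threshold=0.5):
    """Compute H = X.T @ X by splitting columns into dense and sparse groups."""
    density = (X != 0).mean(axis=0)            # fraction of non-zeros per column
    dense_idx = np.where(density >= density_threshold)[0]
    sparse_idx = np.where(density < density_threshold)[0]

    Xd = X[:, dense_idx]                       # dense matrix format
    Xs = sparse.csc_array(X[:, sparse_idx])    # sparse matrix format

    # Component parts that combine to form the Hessian.
    n = X.shape[1]
    H = np.empty((n, n))
    H[np.ix_(dense_idx, dense_idx)] = Xd.T @ Xd           # dense x dense block
    cross = Xs.T @ Xd                                     # sparse x dense block
    H[np.ix_(sparse_idx, dense_idx)] = cross
    H[np.ix_(dense_idx, sparse_idx)] = cross.T
    H[np.ix_(sparse_idx, sparse_idx)] = (Xs.T @ Xs).toarray()  # sparse x sparse
    return H
```

The payoff is that the sparse-by-sparse block (typically the largest when most columns are sparse) never materializes its zero entries during multiplication.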
  • Patent number: 9702305
    Abstract: Multiple engine sequencers in memory interfaces are disclosed. Individual sequencer engines of multiple engine sequencers perform at least portions of their respective operations in parallel with other individual sequencer engine operations performed in the memory interface. In at least one embodiment, sequencer engine operations are performed at least partially concurrently with other sequencer engine operations in the memory interface.
    Type: Grant
    Filed: April 17, 2013
    Date of Patent: July 11, 2017
    Assignee: Micron Technology, Inc.
    Inventors: William H. Radke, Laszlo Borbely, David Christopher Pruett
  • Patent number: 9588991
    Abstract: An image search device includes a common memory and a plurality of parallel processors for executing a same instruction. The image search device transfers, from storage, a plurality of representative feature vectors, which respectively represent a plurality of clusters including a plurality of image feature vectors, stores, in the common memory, one or more query feature vectors extracted from an image serving as a query, calculates a distance between the plurality of transferred representative feature vectors and the query feature vector using the plurality of parallel processors, and selects one or more of a plurality of images based on a distance between the plurality of image feature vectors, which belong to the cluster selected by the calculated distance, and the query feature vector.
    Type: Grant
    Filed: November 25, 2011
    Date of Patent: March 7, 2017
    Assignee: RAKUTEN, INC.
    Inventors: Ali Cevahir, Junji Torii
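The two-stage search described in the abstract above can be sketched as follows. This is a minimal illustration, not the patented SIMD implementation: the parallel processors are replaced by vectorized NumPy operations, and all names are hypothetical. The query is first compared only against cluster representatives; the full per-image distance computation is then restricted to members of the selected cluster.

```python
# Illustrative two-stage nearest-neighbor search over clustered feature vectors.
import numpy as np

def cluster_search(query, representatives, clusters, k=3):
    """representatives: (C, D) array; clusters: list of (Ni, D) member arrays."""
    # Stage 1: distance from the query to each representative vector.
    rep_dist = np.linalg.norm(representatives - query, axis=1)
    best_cluster = int(np.argmin(rep_dist))

    # Stage 2: distances to image vectors in the selected cluster only.
    members = clusters[best_cluster]
    member_dist = np.linalg.norm(members - query, axis=1)
    nearest = np.argsort(member_dist)[:k]     # indices of the k closest images
    return best_cluster, nearest
```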
  • Patent number: 9565094
    Abstract: A method, system and computer program product are disclosed for routing data packets in a computing system comprising a multidimensional torus compute node network including a multitude of compute nodes, and an I/O node network including a plurality of I/O nodes. In one embodiment, the method comprises assigning to each of the data packets a destination address identifying one of the compute nodes; providing each of the data packets with a toio value; routing the data packets through the compute node network to the destination addresses of the data packets; and when each of the data packets reaches the destination address assigned to said each data packet, routing said each data packet to one of the I/O nodes if the toio value of said each data packet is a specified value. In one embodiment, each of the data packets is also provided with an ioreturn value used to route the data packets through the compute node network.
    Type: Grant
    Filed: January 29, 2010
    Date of Patent: February 7, 2017
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Noel A. Eisley, Philip Heidelberger
  • Patent number: 9558151
    Abstract: Disclosed is a data processing device capable of efficiently performing an arithmetic process on variable-length data and an arithmetic process on fixed-length data. The data processing device includes first PEs of SIMD type, SRAMs provided respectively for the first PEs, and second PEs. The first PEs each perform an arithmetic operation on data stored in a corresponding one of the SRAMs. The second PEs each perform an arithmetic operation on data stored in corresponding ones of the SRAMs. Therefore, the SRAMs can be shared so as to efficiently perform the arithmetic process on variable-length data and the arithmetic process on fixed-length data.
    Type: Grant
    Filed: February 3, 2012
    Date of Patent: January 31, 2017
    Assignee: RENESAS ELECTRONICS CORPORATION
    Inventors: Kan Murata, Hideyuki Noda, Masaru Haraguchi
  • Patent number: 9557995
    Abstract: A data processing apparatus and method are provided for performing segmented operations. The data processing apparatus comprises a vector register store for storing vector operands, and vector processing circuitry providing N lanes of parallel processing, arranged to perform a segmented operation on up to N data elements provided by a specified vector operand, each data element being allocated to one of the N lanes. The up to N data elements form a plurality of segments, and performance of the segmented operation comprises performing a separate operation on the data elements of each segment, the separate operation involving interaction between the lanes containing the data elements of the associated segment.
    Type: Grant
    Filed: February 7, 2014
    Date of Patent: January 31, 2017
    Assignee: ARM Limited
    Inventors: Mbou Eyole-Monono, Alastair David Reid, Matthias Lothar Böttcher, Giacomo Gabrielli
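A scalar sketch may clarify what a "segmented operation" computes in the abstract above: one vector of up to N elements holds several contiguous segments, and a separate reduction (here, a sum) runs over each segment independently. The segment descriptor format and the choice of addition are illustrative assumptions, not the ARM hardware mechanism.

```python
# Illustrative segmented reduction: one vector, several contiguous segments,
# one independent sum per segment. Descriptor format is an assumption.
import numpy as np

def segmented_sum(vector, segment_starts):
    """Reduce each segment of `vector` independently; segments are contiguous."""
    bounds = list(segment_starts) + [len(vector)]
    return [float(np.sum(vector[s:e])) for s, e in zip(bounds, bounds[1:])]
```

For example, the vector [1, 2, 3, 4, 5] with segments starting at indices 0 and 2 reduces to one sum per segment rather than a single total.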
  • Patent number: 9443269
    Abstract: High volume data processing systems and methods are provided to enable ultra-low latency processing and distribution of data. The systems and methods can be implemented to service primary trading houses where microsecond delays can significantly impact performance and value. According to one aspect, the systems and methods are configured to process data from a variety of market data sources in a variety of formats, while maintaining target latencies of less than 1 microsecond. A matrix of FPGA nodes is configured to provide ultra-low latencies while enabling deterministic and distributed processing. In some embodiments, the matrix can be configured to provide consistent latencies even during microburst conditions. Further, book building operations (determination of current holdings and assets) can occur under ultra-low latency timing, providing for near-instantaneous risk management and execution processes, even under microburst conditions.
    Type: Grant
    Filed: February 15, 2013
    Date of Patent: September 13, 2016
    Assignee: NovaSparks, Inc.
    Inventor: Marc Battyani
  • Patent number: 9329621
    Abstract: A serial array processor may have an execution unit composed of a multiplicity of single-bit arithmetic logic units (ALUs), which may perform parallel operations on a subset of all the words in memory by serially accessing and processing them, one bit at a time, while the processor's instruction unit pre-fetches the next instruction, a word at a time, in a manner orthogonal to the execution unit.
    Type: Grant
    Filed: December 9, 2013
    Date of Patent: May 3, 2016
    Inventor: Laurence H. Cooke
  • Patent number: 9323537
    Abstract: A method comprises measuring the execution time T1 for a problem to be solved with a program run by a single processor, measuring the execution times TM and TS of MIMD and SIMD program fragments run by a single processor and a single accelerator, respectively, determining the specific acceleration α of the execution time for an SIMD program fragment run by a single accelerator in comparison with the execution time for the fragment run by a single processor, determining the portion of the execution time for an MIMD fragment run by a single processor and the portion of the execution time for an SIMD fragment run by a single processor, and adjusting the quantity of processors or accelerators comprised in a hybrid computing system structure according to the data obtained.
    Type: Grant
    Filed: October 13, 2011
    Date of Patent: April 26, 2016
    Assignee: Federal State Unitary Enterprise—AU—Russian Scientific Research Institute of Experimental Physics—FSUE RVNC—VNIIEF
    Inventor: Sergey Alexandrovich Stepanenko
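The sizing logic in the abstract above follows an Amdahl-style performance model, which can be sketched briefly. The formula below is a standard textbook model under stated assumptions (perfect scaling of the MIMD part across processors and of the SIMD part across accelerators), not the patent's exact claims; the function name and parameters are illustrative.

```python
# Hedged Amdahl-style model of hybrid (processor + accelerator) run time.
# Assumes perfect scaling within each fragment class; names are illustrative.
def hybrid_time(T1, simd_fraction, alpha, n_proc, n_acc):
    """Estimated run time given T1 (single-processor time), the SIMD fraction
    of the work, per-accelerator speedup alpha, and unit counts."""
    mimd = (1.0 - simd_fraction) * T1 / n_proc        # MIMD part on processors
    simd = simd_fraction * T1 / (alpha * n_acc)       # SIMD part on accelerators
    return mimd + simd
```

Sweeping `n_proc` and `n_acc` over candidate configurations and picking the minimum models the abstract's "adjusting the quantity of processors or accelerators."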
  • Patent number: 9294097
    Abstract: An array of field programmable gate array (FPGA) devices configured for execution of a source code. The array includes two or more FPGA devices, a host processor, and a host interface logic. The FPGA devices are configured to execute a parallelized portion of the source code partitioned among the FPGA devices based on data rates of computing elements of the source code, computational performance of the FPGA devices, and the input/output (I/O) bandwidth of the FPGA devices. The FPGA devices include a memory bank addressable by a global memory address space for the array and an array interconnect that enables the computing elements executed by each of the FPGA devices to be programmed with a uniform address space of a global memory of the array and utilization of the global memory by the FPGA devices. The host interface logic connects the host processor with one of the FPGA devices.
    Type: Grant
    Filed: November 14, 2014
    Date of Patent: March 22, 2016
    Assignee: Scientific Concepts International Corporation
    Inventor: Andrei V. Vassiliev
  • Patent number: 9269123
    Abstract: A method, system and product are disclosed for volume rendering of medical images on a shared memory system implemented on a multi-socket mainboard with multiple multi-core processors and multiple last level caches, cores that share a cache being united in a socket. The method includes decomposing the image space to be used for rendering in regions, each region including a plurality of tiles; assigning two sockets to each of the decomposed regions; determining a tile enumeration scheme for a region; rendering all tiles within a region according to a determined tile enumeration scheme on the assigned two sockets until the respective region is finished; if a region is finished, assigning the two sockets to another region; and if no region is left, splitting an existing region of un-rendered tiles into sub-regions according to a splitting scheme and applying the steps recursively for the sub-regions.
    Type: Grant
    Filed: April 3, 2013
    Date of Patent: February 23, 2016
    Assignee: SIEMENS AKTIENGESELLSCHAFT
    Inventor: Robert Schneider
  • Patent number: 9137848
    Abstract: A mobile terminal and controlling method thereof are disclosed, by which an operable time of the mobile terminal can be increased in a manner of raising CPU power efficiency of the mobile terminal. The present invention includes a plurality of cores, a multicore adjuster configured to obtain a frequency of an active core of the plurality of cores, determine whether the obtained frequency exceeds a first threshold value for N consecutive times, wherein N is a positive integer, and activate at least one inactive core of the plurality of cores when the obtained frequency exceeds the first threshold value for N consecutive times, and a frequency adjuster configured to determine a workload of the active core, and adjust the obtained frequency of the active core according to the determined workload.
    Type: Grant
    Filed: January 8, 2013
    Date of Patent: September 15, 2015
    Assignee: LG ELECTRONICS INC.
    Inventor: Hyunwoo Nho
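The core-activation condition in the abstract above ("exceeds the first threshold value for N consecutive times") is easy to model in a few lines. This is a behavioral sketch only; the class and method names are assumptions, and the real adjuster operates on hardware frequency governors rather than Python objects.

```python
# Behavioral model of the multicore adjuster's activation rule: activate an
# inactive core only after N consecutive over-threshold observations.
class MulticoreAdjuster:
    def __init__(self, threshold, n_consecutive):
        self.threshold = threshold
        self.n = n_consecutive
        self.streak = 0            # consecutive over-threshold observations

    def observe(self, frequency):
        """Return True when another core should be activated."""
        if frequency > self.threshold:
            self.streak += 1
        else:
            self.streak = 0        # any dip resets the count
        if self.streak >= self.n:
            self.streak = 0        # reset after triggering an activation
            return True
        return False
```

Requiring N consecutive samples, rather than a single spike, keeps a momentary burst from waking an extra core and wasting power.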
  • Patent number: 9075809
    Abstract: A method for creating an application cluster virtual node. The method may comprise identifying a plurality of nodes associated with an application cluster. The method may also comprise creating a virtual node that is associated with each node in the plurality of nodes. The method may comprise providing a data protection server with access to at least one node in the plurality of nodes. The access may be provided through the virtual node. A computer-readable medium is also disclosed.
    Type: Grant
    Filed: September 29, 2007
    Date of Patent: July 7, 2015
    Assignee: Symantec Corporation
    Inventors: Sunil Shah, Ynn-Pying A. Tsaur, Sudhir Subbarao
  • Patent number: 9075493
    Abstract: Techniques to present hierarchical information as orthographic projections are described. An apparatus may comprise an orthographic projection application arranged to manage a three dimensional orthographic projection of hierarchical information. The orthographic projection application may comprise a hierarchical information component operative to receive hierarchical information representing multiple nodes at different hierarchical levels, and parse the hierarchical information into a tree data structure, an orthographic generator component operative to generate a graphical tile for each node, arrange graphical tiles for each hierarchical level into graphical layers, and arrange the graphical layers in a vertical stack, and an orthographic presentation component operative to present a three dimensional orthographic projection of the hierarchical information with the stack of graphical layers each having multiple graphical tiles. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 7, 2011
    Date of Patent: July 7, 2015
    Assignee: SAS INSTITUTE, INC.
    Inventors: Lee Ann Sullivan, Jordan Riley Benson, Rajiv Ramarajan, Paul Hankey, Frank Lee Wimmer
  • Publication number: 20150127924
    Abstract: A method and corresponding apparatus for processing a shuffle instruction are provided. Shuffle units are configured in a hierarchical structure, and each of the shuffle units generates a shuffled data element array by performing shuffling on an input data element array. In the hierarchical structure, which includes an upper shuffle unit and a lower shuffle unit, the shuffled data element array output from the lower shuffle unit is input to the upper shuffle unit as a portion of the input data element array for the upper shuffle unit.
    Type: Application
    Filed: July 14, 2014
    Publication date: May 7, 2015
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Keshava PRASAD, Navneet BASUTKAR, Young Hwan PARK, Ho YANG, Yeon Bok LEE
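The hierarchy described in the abstract above can be sketched with plain lists: a lower shuffle unit permutes its input array, and its output becomes a portion of the input to an upper shuffle unit. The permutation patterns and function names here are arbitrary examples, not the instruction encoding from the publication.

```python
# Sketch of hierarchical shuffle units: the lower unit's output feeds the
# upper unit as part of its input. Patterns shown are arbitrary examples.
def shuffle(elements, pattern):
    """One shuffle unit: reorder `elements` according to index `pattern`."""
    return [elements[i] for i in pattern]

def hierarchical_shuffle(low_input, extra_input, low_pattern, up_pattern):
    low_out = shuffle(low_input, low_pattern)           # lower shuffle unit
    return shuffle(low_out + extra_input, up_pattern)   # upper unit consumes it
```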
  • Patent number: 9015448
    Abstract: A processor and method for broadcasting data among a plurality of processing cores is disclosed. The processor includes a plurality of processing cores connected by point-to-point connections. A first of the processing cores includes a router that includes at least an allocation unit and an output port. The allocation unit is configured to determine that respective input buffers on at least two others of the processing cores are available to receive given data. The output port is usable by the router to send the given data across one of the point-to-point connections. The router is configured to send the given data contingent on determining that the respective input buffers are available. Furthermore, the processor is configured to deliver the data to the at least two other processing cores in response to the first processing core sending the data once across the point-to-point connection.
    Type: Grant
    Filed: June 17, 2010
    Date of Patent: April 21, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Tushar Krishna, Bradford M. Beckmann, Steven K. Reinhardt
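The broadcast condition in the abstract above, sending once only after confirming that every destination's input buffer can accept the data, can be modeled simply. This is a software analogy of the on-chip allocation unit; buffer representation, capacity parameter, and names are all assumptions.

```python
# Simplified model of credit-checked broadcast: deliver to all destinations
# with one send, but only if every destination buffer has room.
def try_broadcast(data, dest_buffers, capacity):
    """Append `data` to all destination buffers iff all can accept it."""
    if all(len(buf) < capacity for buf in dest_buffers):
        for buf in dest_buffers:
            buf.append(data)       # one send reaches every destination
        return True
    return False                   # allocation fails; sender must retry
```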
  • Patent number: 9009444
    Abstract: A method, computer program product, and computing system for receiving a reservation for a LUN from Host A, wherein the LUN is defined within a data array. A lock for the LUN is defined as Host A. A write request is received for the LUN from Host B. The lock for the LUN is defined as Transitioning A to B. The write request is delayed for a defined period of time.
    Type: Grant
    Filed: September 29, 2012
    Date of Patent: April 14, 2015
    Assignee: EMC Corporation
    Inventors: Philip Derbeko, Arieh Don, Anat Eyal, Kevin F. Martin, Richard A. Trabing
  • Publication number: 20150100756
    Abstract: An array processor composed of processor cells that are programmed by a controlling unit, and that are reprogrammed when a cell has finished a current data processing operation, even while other cells continue to process data with their current programming.
    Type: Application
    Filed: May 13, 2014
    Publication date: April 9, 2015
    Applicant: PACT XPP TECHNOLOGIES AG
    Inventors: Martin Vorbach, Armin Nuckel
  • Patent number: 9003274
    Abstract: The illustrative embodiments provide for a system and recordable type medium for representing actions in a data processing system. A table is generated. The table comprises a plurality of rows and columns. Ones of the columns represent corresponding ones of computer applications that can start or stop in parallel with each other in a data processing system. Ones of the rows represent corresponding ones of sequences of actions within a corresponding column. Additionally, the table represents a definition of relationships among memory address spaces, wherein the table represents when each particular address space is started or stopped during one of a start-up process, a recovery process, and a shut-down process. The resulting table is stored.
    Type: Grant
    Filed: December 21, 2007
    Date of Patent: April 7, 2015
    Assignee: International Business Machines Corporation
    Inventor: Joseph John Katnic
  • Patent number: 8935510
    Abstract: To flexibly set up an execution environment according to the contents of the processing to be executed, while taking stability or security level into consideration, the multiple processor system includes the execution environment main control unit 10, which determines CPU assignment; the execution environment sub control unit 20, which controls starting, stopping and switching of an execution environment according to instructions from the execution environment main control unit 10 and synchronizes with it; and the execution environment management unit 30, which receives input of management information or reference refusal information of shared resources for each CPU 4 or each execution environment 100 to separate the execution environment main control unit 10 from the execution environment sub control units 20a through 20n, or the execution environment sub control units 20a through 20n from each other.
    Type: Grant
    Filed: November 1, 2007
    Date of Patent: January 13, 2015
    Assignee: NEC Corporation
    Inventors: Hiroaki Inoue, Junji Sakai, Tsuyoshi Abe, Masato Edahiro