Array Processor Operation Patents (Class 712/16)
  • Patent number: 11836489
    Abstract: A processor for sparse matrix calculation includes an on-chip memory, a cache, a gather/scatter engine, and a core. The on-chip memory stores a first matrix or vector, and the cache stores a compressed sparse second matrix data structure. The compressed sparse second matrix data structure includes a value array including non-zero element values of the sparse second matrix, where each entry includes a given number of element values; and a column index array where each entry includes the given number of offsets matching the value array. The gather/scatter engine gathers element values of the first matrix or vector using the column index array of the sparse second matrix. In a hybrid horizontal/vertical implementation, the gather/scatter engine gathers sets of element values from sets of rows and from different sub-banks within the same rows based on the column index array of the sparse matrix.
    Type: Grant
    Filed: October 25, 2022
    Date of Patent: December 5, 2023
    Assignee: Alibaba Group Holding Limited
    Inventor: Fei Sun
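The gather step this abstract describes can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the patented hardware: `values`, `col_index`, and `x` are hypothetical stand-ins for the value array, the column index array, and the first vector.

```python
import numpy as np

values = np.array([5.0, 8.0, 3.0, 6.0])   # non-zero elements of the sparse matrix
col_index = np.array([0, 1, 2, 1])        # offsets matching the value array
x = np.array([1.0, 2.0, 3.0, 4.0])        # dense first vector

gathered = x[col_index]                   # the gather/scatter engine's gather step
products = values * gathered              # element-wise products feeding the SpMV
```

In the hybrid horizontal/vertical variant, the same gather would be issued for several rows at once and additionally split across sub-banks within each row.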
  • Patent number: 11652484
    Abstract: An application specific integrated circuit (ASIC) chip includes: a systolic array of cells; and multiple controllable bus lines configured to convey data among the systolic array of cells, in which the systolic array of cells is arranged in multiple tiles, each tile of the multiple tiles including 1) a corresponding sub array of cells of the systolic array of cells, 2) a corresponding subset of controllable bus lines of the multiple controllable bus lines, and 3) memory coupled to the subarray of cells.
    Type: Grant
    Filed: August 9, 2021
    Date of Patent: May 16, 2023
    Assignee: Google LLC
    Inventors: Michial Allen Gunter, Charles Henry Leichner, IV, Tammo Spalink
  • Patent number: 11366875
    Abstract: Methods and devices, the method including receiving a matrix of a neural network model; classifying at least a portion of the matrix as a first section based on a first distribution pattern of non-zero elements of the portion of the matrix; and identifying memory addresses of the non-zero elements in the first section of the matrix for loading, according to a first order determined based on the first distribution pattern, the non-zero elements in the first section into one or more vector registers.
    Type: Grant
    Filed: March 13, 2020
    Date of Patent: June 21, 2022
    Assignee: ALIBABA GROUP HOLDING LIMITED
    Inventors: Guoyang Chen, Yu Pu, Yongzhi Zhang, Weifeng Zhang, Yuan Xie
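A toy sketch of the classify-then-order idea: inspect the non-zero distribution of a matrix section, pick a load order based on the pattern, and list the element addresses in that order. The pattern rule and the `classify_and_order` helper are invented for illustration.

```python
import numpy as np

def classify_and_order(section):
    nz_rows, nz_cols = np.nonzero(section)
    # toy rule: a purely diagonal section gets a diagonal load order,
    # anything else is loaded row-major
    pattern = "diagonal" if np.all(nz_rows == nz_cols) else "row-major"
    addresses = sorted(zip(nz_rows.tolist(), nz_cols.tolist()))
    return pattern, addresses

section = np.array([[1, 0, 0],
                    [0, 2, 0],
                    [0, 0, 3]])
pattern, addresses = classify_and_order(section)
```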
  • Patent number: 11294709
    Abstract: A processing system including a memory, command sequencers, accelerators, and memory banks. The memory stores program code including instruction threads sequentially listed in the program code. The command sequencers include a master command sequencer and multiple slave command sequencers. The master command sequencer executes the program code including distributing the instruction threads for parallel execution among the slave command sequencers. The instruction threads may be provided inline or accessed via inline thread line pointers. Each accelerator is available to each command sequencer in which multiple command sequencers may access multiple accelerators for parallel execution. The memory banks are simultaneously available to multiple accelerators. The master command sequencer may perform implicit synchronization by waiting for completion of simultaneous execution of multiple instruction threads. A command sequencer arbiter may arbitrate among the command sequencers.
    Type: Grant
    Filed: February 18, 2020
    Date of Patent: April 5, 2022
    Assignee: NXP USA, Inc.
    Inventors: Maik Brett, Sidhartha Taneja, Christian Tuschen, Tejbal Prasad, Nikhil Tiwari, Saurabh Arora
  • Patent number: 11275892
    Abstract: A method, system, and computer program product for using a natural language processor to find nodes in a span include providing a parse tree including a trigger node, a first target node connected to the trigger node by a first edge, and a second target node connected to the first target node by a second edge, wherein the trigger node includes a first attribute and a second attribute, and wherein the first target node includes a third attribute and a fourth attribute. Further included are recording the first, second, third, and fourth attributes in a first tree table; creating a first consideration table from the first tree table, the first consideration table including the first, second, third, and fourth attributes; and evaluating the first target node to determine whether it belongs in a first span that includes the trigger node.
    Type: Grant
    Filed: April 29, 2019
    Date of Patent: March 15, 2022
    Assignee: International Business Machines Corporation
    Inventors: Joshua Cason, Kandhan Sekar, Thomas Hay Rogers
  • Patent number: 11263292
    Abstract: A method for performing a matrix multiplication operation is provided. The method includes: obtaining a matrix B1, a matrix A2, and an index matrix, wherein the index matrix comprises indexes, in a matrix A1, of elements in the matrix A2; generating m matrices B2 based on the index matrix and the matrix B1, wherein the m matrices B2 are all matrices with t rows and n columns, and each row of each matrix B2 is a row indicated in the matrix B1 by a corresponding element in the index matrix; and generating a matrix C based on the matrix A2 and the m matrices B2, wherein the matrix C is a product of the matrix A1 and the matrix B1.
    Type: Grant
    Filed: May 19, 2021
    Date of Patent: March 1, 2022
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Leijun He, Bin Xu, Kaixing Wang
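The steps in the abstract (gather rows of B1 via the index matrix, then multiply by the compressed A2) can be sketched with NumPy. The `indexed_matmul` helper and the toy shapes are assumptions, with m = 2 rows and t = 2 kept elements per row.

```python
import numpy as np

def indexed_matmul(A2, idx, B1):
    # each row i uses its own gathered matrix B2 = B1[idx[i]]
    C = np.empty((A2.shape[0], B1.shape[1]))
    for i in range(A2.shape[0]):
        C[i] = A2[i] @ B1[idx[i]]
    return C

# A1 is 2x4 with two non-zeros per row; A2 holds the values, idx their columns
A1 = np.array([[1.0, 0.0, 2.0, 0.0],
               [0.0, 3.0, 0.0, 4.0]])
A2 = np.array([[1.0, 2.0], [3.0, 4.0]])
idx = np.array([[0, 2], [1, 3]])
B1 = np.arange(8.0).reshape(4, 2)
C = indexed_matmul(A2, idx, B1)   # matches A1 @ B1
```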
  • Patent number: 11250108
    Abstract: A matrix processing method includes: determining a quantity of non-zero elements in a to-be-processed matrix, where the to-be-processed matrix is a one-dimensional matrix; generating a distribution matrix of the to-be-processed matrix, where the distribution matrix is used to indicate a position of a non-zero element in the to-be-processed matrix; combining the quantity of non-zero elements, values of all non-zero elements in the to-be-processed matrix arranged sequentially, and the distribution matrix, to obtain a compressed matrix of the to-be-processed matrix.
    Type: Grant
    Filed: May 8, 2020
    Date of Patent: February 15, 2022
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Zhenjiang Dong, Chio In Ieong, Hu Liu, Hai Chen
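A minimal sketch of the compressed form, assuming the distribution matrix is a 0/1 bitmap over the one-dimensional to-be-processed matrix; the function names are invented.

```python
import numpy as np

def compress(vec):
    mask = (vec != 0).astype(np.uint8)   # distribution matrix (position bitmap)
    values = vec[vec != 0]               # non-zero values in original order
    return int(mask.sum()), values, mask # quantity, values, distribution

def decompress(count, values, mask):
    out = np.zeros(mask.shape, dtype=values.dtype)
    out[mask == 1] = values
    return out

v = np.array([0.0, 7.0, 0.0, 0.0, 5.0])
count, vals, mask = compress(v)
```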
  • Patent number: 11232347
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes various attributes of the fabric vector: length, microthreading eligibility, number of data elements to receive, transmit, and/or process in parallel, virtual channel and task identification information, whether to terminate upon receiving a control wavelet, and whether to mark an outgoing wavelet a control wavelet.
    Type: Grant
    Filed: April 17, 2018
    Date of Patent: January 25, 2022
    Assignee: Cerebras Systems Inc.
    Inventors: Sean Lie, Michael Morrison, Michael Edwin James, Srikanth Arekapudi, Gary R. Lauterbach
  • Patent number: 11074317
    Abstract: A method includes identifying, using at least one processor, input words associated with a user query. The method also includes, for each of one or more of the input words that are contained in a high-frequency word set, retrieving pre-computed element-wise products associated with the input word from a cache. The method further includes performing, using the at least one processor, a convolution operation using the pre-computed element-wise products. In addition, the method includes generating, using the at least one processor, a response to the user query based on results of the convolution operation. The method may also include, for each of one or more of the input words that are not contained in the high-frequency word set, calculating additional element-wise products associated with the input word, and the convolution operation may be performed using the pre-computed element-wise products and the additional element-wise products.
    Type: Grant
    Filed: March 13, 2019
    Date of Patent: July 27, 2021
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Duanduan Yang
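The caching scheme can be sketched as a dictionary of pre-computed element-wise products keyed by high-frequency words, with a fallback computation for rare words. The filter shapes, embeddings, and the `conv_response` helper are all hypothetical.

```python
import numpy as np

filters = np.array([[1.0, -1.0],        # 2 filters over 2-dim word embeddings
                    [0.5, 0.5]])
embeddings = {"the": np.array([2.0, 3.0]),
              "array": np.array([1.0, 4.0])}
high_freq = {"the"}                      # the high-frequency word set

# pre-computed element-wise products for high-frequency words
cache = {w: filters * embeddings[w] for w in high_freq}

def conv_response(word):
    prod = cache.get(word)
    if prod is None:                     # rare word: compute on the fly
        prod = filters * embeddings[word]
    return prod.sum(axis=1)              # per-filter convolution response
```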
  • Patent number: 11048517
    Abstract: A processor core in an instruction block-based microarchitecture is configured so that an instruction window and operand buffers are decoupled for independent operation in which instructions in the block are not tied to resources such as control bits and operands that are maintained in the operand buffers. Instead, pointers are established among instructions in the block and the resources so that control state can be established for a refreshed instruction block (i.e., an instruction block that is reused without re-fetching it from an instruction cache) by following the pointers. Such decoupling of the instruction window from the operand space can provide greater processor efficiency, particularly in multiple core arrays where refreshing is utilized (for example when executing program code that uses tight loops), because the operands and control bits are pre-validated.
    Type: Grant
    Filed: June 24, 2019
    Date of Patent: June 29, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Aaron Smith, Jan Gray
  • Patent number: 11016810
    Abstract: A system and method for a computing tile of a multi-tiled integrated circuit includes a plurality of distinct tile computing circuits, wherein each of the plurality of distinct tile computing circuits is configured to receive fixed-length instructions; a token-informed task scheduler that: tracks one or more of a plurality of distinct tokens emitted by one or more of the plurality of distinct tile computing circuits; and selects a distinct computation task of a plurality of distinct computation tasks based on the tracking; and a work queue buffer that: contains a plurality of distinct fixed-length instructions, wherein each one of the fixed-length instructions is associated with one of the plurality of distinct computation tasks; and transmits one of the plurality of distinct fixed-length instructions to one or more of the plurality of distinct tile computing circuits based on the selection of the distinct computation task by the token-informed task scheduler.
    Type: Grant
    Filed: November 24, 2020
    Date of Patent: May 25, 2021
    Assignee: Mythic, Inc.
    Inventors: Malav Parikh, Sergio Schuler, Vimal Reddy, Zainab Zaidi, Paul Toth, Adam Caughron, Bryant Sorensen, Alex Dang-Tran, Scott Johnson, Raul Garibay, Andrew Morten, David Fick
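A toy model of the token-informed selection step, assuming each queued fixed-length instruction records the set of tokens it waits on; the class and method names are invented, not the Mythic hardware interface.

```python
from collections import deque

class TokenScheduler:
    def __init__(self):
        self.tokens = set()          # tokens emitted by tile computing circuits
        self.queue = deque()         # (needed_tokens, instruction) work queue

    def emit(self, token):
        self.tokens.add(token)

    def enqueue(self, needed, instruction):
        self.queue.append((frozenset(needed), instruction))

    def select(self):
        # pick the first queued task whose required tokens have all arrived
        for entry in list(self.queue):
            if entry[0] <= self.tokens:
                self.queue.remove(entry)
                return entry[1]
        return None

sched = TokenScheduler()
sched.enqueue({"matmul_done"}, "activation_task")
```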
  • Patent number: 10983878
    Abstract: Provided is an image recognition processor. The image recognition processor includes a plurality of nano cores each configured to perform a pattern recognition operation and arranged in rows and columns, an instruction memory configured to provide instructions to the plurality of nano cores in a row unit, a feature memory configured to provide input features to the plurality of nano cores in a row unit, a kernel memory configured to provide a kernel coefficient to the plurality of nano cores in a column unit, and a difference checker configured to receive a result of the pattern recognition operation of each of the plurality of nano cores, detect whether there is an error by referring to the received result, and provide a fault tolerance function that allows an error below a predefined level.
    Type: Grant
    Filed: November 25, 2019
    Date of Patent: April 20, 2021
    Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Jin Ho Han, Young-Su Kwon, Min-Seok Choi
  • Patent number: 10834226
    Abstract: Embodiments of the present invention provide methods, systems, and computer program products for container communication. In an embodiment, it is determined whether a message is going to a container on the same machine or to a container on a machine at a geographically different location. If it is determined that the message is going to a container on a machine at a geographically different location, then it is determined whether a predetermined threshold has been reached. If it is determined that the predetermined threshold has been reached, then the container is migrated from the first machine to the machine at the geographically different location. A data tracking structure is used to visually represent the migration of containers to other machines.
    Type: Grant
    Filed: July 15, 2016
    Date of Patent: November 10, 2020
    Assignee: International Business Machines Corporation
    Inventors: Rafael C. S. Folco, Breno H. Leitão, Desnes A. Nunes do Rosário, Jose F. Santiago Filho
  • Patent number: 10824429
    Abstract: Systems and methods are disclosed for executing instructions with a block-based processor. Instructions can be executed in any order as their dependencies arrive, but the individual instructions are committed in a serial fashion. Further, exception handling can be performed by storing transient state for an instruction block and resuming by restoring the transient state. This allows programmers to see intermediate state for the instruction block before the subject block has committed. In one example of the disclosed technology, a method of operating a processor executing a block-based instruction set architecture includes executing at least one instruction encoded for an instruction block, responsive to determining that an individual instruction of the instruction block can commit, advancing a commit frontier for the instruction block to include all instructions in the instruction block that can commit, and committing one or more instructions inside the advanced commit frontier.
    Type: Grant
    Filed: December 18, 2018
    Date of Patent: November 3, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Gagan Gupta, David T. Harper
  • Patent number: 10783011
    Abstract: Systems and methods are directed to efficient management of processor resources, particularly General Purpose Registers (GPRs), for example to minimize pipeline flushes and prevent deadlocks by counting GPRs instead of allocating them to specific blocks of code. Blocks of code are allowed to execute if the Free GPR count is adequate. The method contemplates counting the number of Register Writers in executing blocks of code which will write to GPRs, and counting the GPRs which are available instead of merely allocating them to dedicated use by a block of code, or an instruction in a block of code. Because blocks do not run if there are not enough GPRs available for the block, deadlocks and pipeline flushes due to lack of resources can be minimized.
    Type: Grant
    Filed: September 21, 2017
    Date of Patent: September 22, 2020
    Assignee: Qualcomm Incorporated
    Inventors: Vignyan Reddy Kothinti Naresh, Gregory Michael Wright
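The counting discipline can be sketched as a simple counter that gates block issue; `GprCounter` and its methods are illustrative names, not the patented circuit.

```python
class GprCounter:
    def __init__(self, total):
        self.free = total                  # count of free GPRs, not a per-block map

    def try_issue(self, writer_count):
        # a block runs only if its Register Writers fit the free pool
        if writer_count > self.free:
            return False                   # block waits instead of deadlocking
        self.free -= writer_count
        return True

    def retire(self, writer_count):
        self.free += writer_count          # writers completed, GPRs return to pool

gprs = GprCounter(4)
```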
  • Patent number: 10776312
    Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) each having a plurality of arithmetic logic units (ALUs) that are configured to execute a same instruction in parallel threads and a plurality of memory ports (MPs) for the plurality of PEs to access a memory unit. Each of the plurality of MPs may comprise an address calculation unit configured to generate respective memory addresses for each thread to access a common area in the memory unit.
    Type: Grant
    Filed: March 13, 2018
    Date of Patent: September 15, 2020
    Assignee: AzurEngine Technologies Zhuhai Inc.
    Inventors: Jianbin Zhu, Yuan Li
  • Patent number: 10733139
    Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) and a plurality of memory ports (MPs) for the plurality of PEs to access a memory unit. Each PE may have a plurality of arithmetic logic units (ALUs) that are configured to execute a same instruction in parallel threads. Each of the plurality of MPs may comprise an address calculation unit configured to generate respective memory addresses for each thread to access a different memory bank in the memory unit.
    Type: Grant
    Filed: March 13, 2018
    Date of Patent: August 4, 2020
    Assignee: AzurEngine Technologies Zhuhai Inc.
    Inventors: Yuan Li, Jianbin Zhu
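One plausible address-calculation rule matching this description, assuming threads are striped across banks so that parallel threads land in different memory banks; the formula and parameter names are assumptions.

```python
def thread_addresses(base, num_threads, num_banks, bank_size):
    addrs = []
    for t in range(num_threads):
        bank = t % num_banks               # a different bank per thread
        offset = base + t // num_banks     # position inside the bank
        addrs.append(bank * bank_size + offset)
    return addrs

addrs = thread_addresses(base=0, num_threads=4, num_banks=4, bank_size=16)
```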
  • Patent number: 10613910
    Abstract: A virtual architecture generating apparatus and method, a runtime system, a multi-core system, and methods of operating the runtime system and the multi-core system may include analyzing a requirement of an application, a feature of the application, and a requirement of a system enabling an execution of the application, and include generating a virtual architecture corresponding to the application, based on a physical architecture of a reconfigurable processor, the analyzed requirements and the analyzed feature.
    Type: Grant
    Filed: July 11, 2017
    Date of Patent: April 7, 2020
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Min Young Son, Shi Hwa Lee, Seung Won Lee, Jeong Joon Yoo, Jae Don Lee, Young Sam Shin, Hee Jin Ahn
  • Patent number: 10592368
    Abstract: A method and system of imputing corrupted sequential data is provided. A plurality of input data vectors of a sequential data is received. For each input data vector of the sequential data, the input data vector is corrupted. The corrupted input data vector is mapped to a staging hidden layer to create a staging vector. The input data vector is reconstructed based on the staging vector, to provide an output data vector. An adjusted parameter of the staging hidden layer is iteratively trained until it is within a predetermined tolerance of a loss function. A next input data vector of the sequential data is predicted based on the staging vector. The predicted next input data vector is stored.
    Type: Grant
    Filed: October 26, 2017
    Date of Patent: March 17, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Shi Jing Guo, Xiang Li, Hai Feng Liu, Jing Mei, Zhi Qiao, Guo Tong Xie, Shi Wan Zhao
  • Patent number: 10529049
    Abstract: Techniques are provided herein for generating an integral image of an input image in parallel across the cores of a multi-core processor. The input image is split into a plurality of tiles, each of which is stored in a scratchpad memory associated with a distinct core. At each tile, a partial integral image of the tile is first computed over the tile, using a Single-Pass Algorithm. This is followed by aggregating partial sums belonging to subsets of tiles using a 2D Inclusive Parallel Prefix Algorithm. A summation is finally performed over the aggregated partial sums to generate the integral image over the entire input image.
    Type: Grant
    Filed: March 27, 2017
    Date of Patent: January 7, 2020
    Assignee: Oracle International Corporation
    Inventors: Venkatanathan Varadarajan, Arun Raghavan, Sam Idicula, Nipun Agarwal
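Both passes can be sketched with prefix sums. The 2-D integral below stands in for the per-tile Single-Pass step, and the 1-D `tiled_prefix` helper shows how per-tile partial sums are combined with an exclusive prefix of tile totals; both are single-node sketches of what the patent distributes across cores.

```python
import numpy as np

def integral_image(img):
    # two inclusive prefix sums give the integral image of one tile
    return np.cumsum(np.cumsum(img, axis=0), axis=1)

def tiled_prefix(x, tile):
    # per-tile pass, then add an exclusive prefix of the tile totals
    tiles = [x[i:i + tile] for i in range(0, len(x), tile)]
    partial = [np.cumsum(t) for t in tiles]
    offsets = np.cumsum([0] + [p[-1] for p in partial[:-1]])
    return np.concatenate([p + o for p, o in zip(partial, offsets)])

ii = integral_image(np.ones((3, 3)))
pref = tiled_prefix(np.arange(1.0, 7.0), tile=2)
```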
  • Patent number: 10504271
    Abstract: The invention notably relates to a computer-implemented method for simulating a 3D scene. The simulation is carried out with a set of computing resources running in parallel. The method comprises partitioning a 3D scene into a plurality of zones. Each zone is sized to satisfy a real-time computing constraint by one computing resource of the set. The method comprises assigning each zone of the plurality to a computing resource, computing an estimation of a load of each computing resource and determining whether one or more computing resources are over-loaded or under-loaded, computing, for each zone, a contribution of the zone to the load of the computing resource to which the zone is assigned, reassigning one or more zones of a computing resource that is over-loaded or under-loaded to another computing resource, the reassignment resulting from the computed contributions of the zones with a combinatorial optimization algorithm.
    Type: Grant
    Filed: September 28, 2017
    Date of Patent: December 10, 2019
    Assignee: DASSAULT SYSTEMES
    Inventors: Malika Boulkenafed, Philippe Robert Felix Belmans
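A greedy stand-in for the reassignment step, assuming per-zone load contributions are known; the patent uses a combinatorial optimization algorithm, so this single-pass heuristic is only illustrative.

```python
def rebalance(assign, zone_load, capacity):
    # assign: resource -> list of zones; zone_load: zone -> load contribution
    load = {r: sum(zone_load[z] for z in zs) for r, zs in assign.items()}
    for r in [r for r in assign if load[r] > capacity]:   # over-loaded resources
        target = min(assign, key=load.get)                # least-loaded resource
        if target == r:
            continue
        zone = min(assign[r], key=zone_load.get)          # cheapest zone to move
        assign[r].remove(zone)
        assign[target].append(zone)
        load[r] -= zone_load[zone]
        load[target] += zone_load[zone]
    return assign

zones = {"z1": 5, "z2": 1}
result = rebalance({"A": ["z1", "z2"], "B": []}, zones, capacity=4)
```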
  • Patent number: 10437743
    Abstract: The present embodiments relate to interface circuitry between a serial interface circuit and an array of processing elements in an integrated circuit. The interface circuitry may include a daisy chain of feeder circuits and a daisy chain of drain circuits. If desired, the interface circuitry may include multiple daisy chains of feeder circuits and/or multiple daisy chains of drain circuits. These multiple daisy chains of feeder circuits and drain circuits may be coupled in parallel, respectively. In some embodiments, the interface circuitry may include synchronization circuitry that is coupled between the daisy chains of drain circuits and the serial interface circuit. Pipeline register stages between feeder circuits and/or between drain circuits may enable the placement of the feeder circuits and/or the drain circuits spatially close to the processing elements of the array of processing elements.
    Type: Grant
    Filed: April 1, 2016
    Date of Patent: October 8, 2019
    Assignee: Altera Corporation
    Inventors: Davor Capalija, Andrei Mihai Hagiescu Miriste, John Stuart Freeman, Alan Baker
  • Patent number: 10437650
    Abstract: Provided is a processing apparatus, including: a plurality of processing units; one or more data buffers that are connected between a first processing unit and a second processing unit and are able to store data output from the first processing unit and data input to the second processing unit; a command buffer that stores a task command specifying execution of a task to be executed in one or more specific processing units, the command buffer being able to output the task command to the processing units; and a task control unit that is configured to control operational processing in the task, by controlling at least one of the data buffer and the command buffer, on the basis of the task command, task setting information representing the processing units in which the task is executed, and information representing a state of operational processing in the respective processing units.
    Type: Grant
    Filed: June 11, 2015
    Date of Patent: October 8, 2019
    Assignee: NEC Corporation
    Inventor: Tomoyoshi Kobori
  • Patent number: 10332008
    Abstract: A decision tree multi-processor system includes a plurality of decision tree processors that access a common feature vector and execute one or more decision trees with respect to the common feature vector. A related method includes providing a common feature vector to a plurality of decision tree processors implemented within an on-chip decision tree scoring system, and executing, by the plurality of decision tree processors, a plurality of decision trees, by reference to the common feature vector. A related decision tree-walking system includes feature storage that stores a common feature vector and a plurality of decision tree processors that access the common feature vector from the feature storage and execute a plurality of decision trees by comparing threshold values of the decision trees to feature values within the common feature vector.
    Type: Grant
    Filed: March 17, 2014
    Date of Patent: June 25, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, James R. Larus, Andrew Putnam, Jan Gray
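Decision-tree scoring against a common feature vector can be sketched with a tuple-encoded tree; the encoding and the example trees are assumptions, not the on-chip format.

```python
# each tuple node is (feature_index, threshold, left_subtree, right_subtree);
# a bare number is a leaf score
def walk_tree(tree, features):
    node = tree
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if features[f] <= thr else right
    return node

trees = [
    (0, 0.5, -1.0, (1, 2.0, 0.5, 1.0)),
    (1, 1.0, 0.0, 2.0),
]
features = [0.8, 3.0]                       # the common feature vector
score = sum(walk_tree(t, features) for t in trees)
```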
  • Patent number: 10303640
    Abstract: The present invention provides a method for managing the wiring and growth of a direct interconnect network implemented on a torus or higher radix interconnect structure based on an architecture that replaces the Network Interface Card (NIC) with PCIe switching cards housed in the server. Also provided is a passive patch panel for use in the implementation of the interconnect, comprising: a passive backplane that houses node to node connectivity for the interconnect; and at least one connector board plugged into the passive backplane comprising multiple connectors. The multiple connectors are capable of receiving an interconnecting plug to maintain the continuity of the torus or higher radix topology when not fully enabled. The PCIe card for use in the implementation of the interconnect comprises: at least 4 electrical or optical ports for the interconnect; a local switch; a processor with RAM and ROM memory; and a PCI interface.
    Type: Grant
    Filed: March 26, 2018
    Date of Patent: May 28, 2019
    Assignee: ROCKPORT NETWORKS INC.
    Inventor: Dan Oprea
  • Patent number: 10146738
    Abstract: An accelerator architecture for processing very-sparse and hyper-sparse matrix data is disclosed. A hardware accelerator comprises one or more tiles, each including a plurality of processing elements (PEs) and a data management unit (DMU). The PEs are to perform matrix operations involving very- or hyper-sparse matrices that are stored by a memory. The DMU is to provide the plurality of PEs access to the memory via an interface that is optimized to provide low-latency, parallel, random accesses to the memory. The PEs, via the DMU, perform the matrix operations by issuing random access read requests for values of the one or more matrices, issuing random access read requests for values of one or more vectors serving as a second operand, and issuing random access write requests for values of one or more vectors serving as a result.
    Type: Grant
    Filed: December 31, 2016
    Date of Patent: December 4, 2018
    Assignee: Intel Corporation
    Inventors: Eriko Nurvitadhi, Deborah Marr
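The access pattern described (random reads of matrix values and of the operand vector, writes of result values) matches a plain CSR sparse-matrix/vector product, sketched here in software; the accelerator's DMU and PEs are not modeled.

```python
import numpy as np

def csr_spmv(values, col_idx, row_ptr, x):
    # one "PE" per row: random-access reads into x via col_idx
    y = np.zeros(len(row_ptr) - 1)
    for r in range(len(y)):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += values[k] * x[col_idx[k]]
    return y

# sparse matrix [[0, 2, 0], [1, 0, 3]] in CSR form
y = csr_spmv(values=[2.0, 1.0, 3.0], col_idx=[1, 0, 2],
             row_ptr=[0, 1, 3], x=[1.0, 2.0, 3.0])
```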
  • Patent number: 10140118
    Abstract: The present invention discloses an application data synchronization method and an apparatus. When a first operating system and a second operating system are installed in a terminal, and a first application and a second application that have a same function are installed on the first operating system and the second operating system respectively, the method includes: when the second application runs on the second operating system, performing the function by using second application data, and updating the second application data, where the second application data is updated according to first application data, and the first application data is updated when the first application runs on the first operating system to perform the function; where the first application data and the second application data are stored in the terminal. These solutions make sharing the data of a same application between different systems more convenient and less time-consuming.
    Type: Grant
    Filed: March 19, 2014
    Date of Patent: November 27, 2018
    Assignee: Huawei Device (Dongguan) Co., Ltd.
    Inventors: Xi Huang, Jianxin Ding, Huangwei Wu
  • Patent number: 10078620
    Abstract: A processor includes a plurality of processing tiles, wherein each tile is configured at runtime to perform a configurable operation. A first subset of tiles are configured to perform in a pipeline a first plurality of configurable operations in parallel. A second subset of tiles are configured to perform a second plurality of configurable operations in parallel with the first plurality of configurable operations. The processor also includes a multi-port memory access module operably connected to the plurality of tiles via a data bus configured to control access to a memory and to provide data to two or more processing tiles simultaneously. The processor also includes a controller operably connected to the plurality of tiles and the multi-port memory access module via a runtime bus. The processor configures the tiles and the multi-port memory access module to execute a computation.
    Type: Grant
    Filed: May 24, 2012
    Date of Patent: September 18, 2018
    Assignee: NEW YORK UNIVERSITY
    Inventors: Clément Farabet, Yann LeCun
  • Patent number: 9965429
    Abstract: The present invention provides a method for managing the wiring and growth of a direct interconnect network implemented on a torus or higher radix interconnect structure based on an architecture that replaces the Network Interface Card (NIC) with PCIe switching cards housed in the server. Also provided is a passive patch panel for use in the implementation of the interconnect, comprising: a passive backplane that houses node to node connectivity for the interconnect; and at least one connector board plugged into the passive backplane comprising multiple connectors. The multiple connectors are capable of receiving an interconnecting plug to maintain the continuity of the torus or higher radix topology when not fully enabled. The PCIe card for use in the implementation of the interconnect comprises: at least 4 electrical or optical ports for the interconnect; a local switch; a processor with RAM and ROM memory; and a PCI interface.
    Type: Grant
    Filed: August 29, 2014
    Date of Patent: May 8, 2018
    Assignee: ROCKPORT NETWORKS INC.
    Inventor: Dan Oprea
  • Patent number: 9798575
    Abstract: Techniques to manage virtual classes for statistical tests are described. An apparatus may comprise a simulated data component to generate simulated data for a statistical test, statistics of the statistical test based on parameter vectors to follow a probability distribution, a statistic simulator component to simulate statistics for the parameter vectors from the simulated data with a distributed computing system comprising multiple nodes each having one or more processors capable of executing multiple threads, the simulation to occur by distribution of portions of the simulated data across the multiple nodes of the distributed computing system, and a distributed control engine to control task execution on the distributed portions of the simulated data on each node of the distributed computing system with a virtual software class arranged to coordinate task and sub-task operations across the nodes of the distributed computing system. Other embodiments are described and claimed.
    Type: Grant
    Filed: May 6, 2014
    Date of Patent: October 24, 2017
    Assignee: SAS Institute Inc.
    Inventors: Xilong Chen, Mark Roland Little
  • Patent number: 9715481
    Abstract: According to one technique, a modeling computer computes a Hessian matrix by determining whether an input matrix contains more than a threshold number of dense columns. If so, the modeling computer computes a sparsified version of the input matrix and uses the sparsified matrix to compute the Hessian. Otherwise, the modeling computer identifies which columns are dense and which columns are sparse. The modeling computer then partitions the input matrix by column density and uses sparse matrix format to store the sparse columns and dense matrix format to store the dense columns. The modeling computer then computes component parts which combine to form the Hessian, wherein component parts that rely on dense columns are computed using dense matrix multiplication and component parts that rely on sparse columns are computed using sparse matrix multiplication.
    Type: Grant
    Filed: March 9, 2015
    Date of Patent: July 25, 2017
    Assignee: Oracle International Corporation
    Inventors: Dmitry Golovashkin, Uladzislau Sharanhovich, Vaishnavi Sashikanth
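The column-partitioned Hessian computation described in the abstract above can be illustrated with a short sketch. This is not Oracle's patented implementation; the density threshold, function name, and use of SciPy sparse arrays are assumptions made for illustration. The key idea is that H = XᵀX decomposes into blocks, and each block can use the storage format (dense or sparse) best suited to the columns it involves.

```python
# Hypothetical sketch of a dense/sparse column-partitioned Hessian (H = X^T X).
# Threshold, names, and library choices are illustrative, not from the patent.
import numpy as np
from scipy import sparse

def partitioned_hessian(X, density_threshold=0.5):
    """Compute H = X.T @ X by splitting columns into dense and sparse groups."""
    density = (X != 0).mean(axis=0)            # fraction of non-zeros per column
    dense_idx = np.where(density >= density_threshold)[0]
    sparse_idx = np.where(density < density_threshold)[0]

    Xd = X[:, dense_idx]                       # dense matrix format
    Xs = sparse.csc_array(X[:, sparse_idx])    # sparse matrix format

    # Component parts that combine to form the Hessian.
    n = X.shape[1]
    H = np.empty((n, n))
    H[np.ix_(dense_idx, dense_idx)] = Xd.T @ Xd           # dense x dense block
    cross = Xs.T @ Xd                                     # sparse x dense block
    H[np.ix_(sparse_idx, dense_idx)] = cross
    H[np.ix_(dense_idx, sparse_idx)] = cross.T
    H[np.ix_(sparse_idx, sparse_idx)] = (Xs.T @ Xs).toarray()  # sparse x sparse
    return H
```

The payoff is that the sparse-by-sparse block (typically the largest when most columns are sparse) never materializes its zero entries during multiplication.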
  • Patent number: 9702305
    Abstract: Multiple engine sequencers in memory interfaces are disclosed. Individual sequencer engines of multiple engine sequencers perform at least portions of their respective operations in parallel with other individual sequencer engine operations performed in the memory interface. In at least one embodiment, sequencer engine operations are performed at least partially concurrently with other sequencer engine operations in the memory interface.
    Type: Grant
    Filed: April 17, 2013
    Date of Patent: July 11, 2017
    Assignee: Micron Technology, Inc.
    Inventors: William H. Radke, Laszlo Borbely, David Christopher Pruett
  • Patent number: 9588991
    Abstract: An image search device includes a common memory and a plurality of parallel processors for executing a same instruction. The image search device transfers, from storage, a plurality of representative feature vectors, which respectively represent a plurality of clusters including a plurality of image feature vectors, stores, in the common memory, one or more query feature vectors extracted from an image serving as a query, calculates a distance between the plurality of transferred representative feature vectors and the query feature vector using the plurality of parallel processors, and selects one or more of a plurality of images based on a distance between the plurality of image feature vectors, which belong to the cluster selected by the calculated distance, and the query feature vector.
    Type: Grant
    Filed: November 25, 2011
    Date of Patent: March 7, 2017
    Assignee: RAKUTEN, INC.
    Inventors: Ali Cevahir, Junji Torii
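The two-stage search described in the abstract above can be sketched as follows. This is a minimal illustration, not the patented SIMD implementation: the parallel processors are replaced by vectorized NumPy operations, and all names are hypothetical. The query is first compared only against cluster representatives; the full per-image distance computation is then restricted to members of the selected cluster.

```python
# Illustrative two-stage nearest-neighbor search over clustered feature vectors.
import numpy as np

def cluster_search(query, representatives, clusters, k=3):
    """representatives: (C, D) array; clusters: list of (Ni, D) member arrays."""
    # Stage 1: distance from the query to each representative vector.
    rep_dist = np.linalg.norm(representatives - query, axis=1)
    best_cluster = int(np.argmin(rep_dist))

    # Stage 2: distances to image vectors in the selected cluster only.
    members = clusters[best_cluster]
    member_dist = np.linalg.norm(members - query, axis=1)
    nearest = np.argsort(member_dist)[:k]     # indices of the k closest images
    return best_cluster, nearest
```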
  • Patent number: 9565094
    Abstract: A method, system and computer program product are disclosed for routing data packets in a computing system comprising a multidimensional torus compute node network including a multitude of compute nodes, and an I/O node network including a plurality of I/O nodes. In one embodiment, the method comprises assigning to each of the data packets a destination address identifying one of the compute nodes; providing each of the data packets with a toio value; routing the data packets through the compute node network to the destination addresses of the data packets; and when each of the data packets reaches the destination address assigned to said each data packet, routing said each data packet to one of the I/O nodes if the toio value of said each data packet is a specified value. In one embodiment, each of the data packets is also provided with an ioreturn value used to route the data packets through the compute node network.
    Type: Grant
    Filed: January 29, 2010
    Date of Patent: February 7, 2017
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Noel A. Eisley, Philip Heidelberger
  • Patent number: 9558151
    Abstract: Disclosed is a data processing device capable of efficiently performing an arithmetic process on variable-length data and an arithmetic process on fixed-length data. The data processing device includes first PEs of SIMD type, SRAMs provided respectively for the first PEs, and second PEs. The first PEs each perform an arithmetic operation on data stored in a corresponding one of the SRAMs. The second PEs each perform an arithmetic operation on data stored in corresponding ones of the SRAMs. Therefore, the SRAMs can be shared so as to efficiently perform the arithmetic process on variable-length data and the arithmetic process on fixed-length data.
    Type: Grant
    Filed: February 3, 2012
    Date of Patent: January 31, 2017
    Assignee: RENESAS ELECTRONICS CORPORATION
    Inventors: Kan Murata, Hideyuki Noda, Masaru Haraguchi
  • Patent number: 9557995
    Abstract: A data processing apparatus and method are provided for performing segmented operations. The data processing apparatus comprises a vector register store for storing vector operands, and vector processing circuitry providing N lanes of parallel processing, arranged to perform a segmented operation on up to N data elements provided by a specified vector operand, each data element being allocated to one of the N lanes. The up to N data elements form a plurality of segments, and performance of the segmented operation comprises performing a separate operation on the data elements of each segment, the separate operation involving interaction between the lanes containing the data elements of the associated segment.
    Type: Grant
    Filed: February 7, 2014
    Date of Patent: January 31, 2017
    Assignee: ARM Limited
    Inventors: Mbou Eyole-Monono, Alastair David Reid, Matthias Lothar Böttcher, Giacomo Gabrielli
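A scalar sketch may clarify what a "segmented operation" computes in the abstract above: one vector of up to N elements holds several contiguous segments, and a separate reduction (here, a sum) runs over each segment independently. The segment descriptor format and the choice of addition are illustrative assumptions, not the ARM hardware mechanism.

```python
# Illustrative segmented reduction: one vector, several contiguous segments,
# one independent sum per segment. Descriptor format is an assumption.
import numpy as np

def segmented_sum(vector, segment_starts):
    """Reduce each segment of `vector` independently; segments are contiguous."""
    bounds = list(segment_starts) + [len(vector)]
    return [float(np.sum(vector[s:e])) for s, e in zip(bounds, bounds[1:])]
```

For example, the vector [1, 2, 3, 4, 5] with segments starting at indices 0 and 2 reduces to one sum per segment rather than a single total.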
  • Patent number: 9443269
    Abstract: High volume data processing systems and methods are provided to enable ultra-low latency processing and distribution of data. The systems and methods can be implemented to service primary trading houses where microsecond delays can significantly impact performance and value. According to one aspect, the systems and methods are configured to process data from a variety of market data sources in a variety of formats, while maintaining target latencies of less than 1 microsecond. A matrix of FPGA nodes is configured to provide ultra-low latencies while enabling deterministic and distributed processing. In some embodiments, the matrix can be configured to provide consistent latencies even during microburst conditions. Further, book building operations (determination of current holdings and assets) can occur under ultra-low latency timing, providing for near-instantaneous risk management and execution processes, even under microburst conditions.
    Type: Grant
    Filed: February 15, 2013
    Date of Patent: September 13, 2016
    Assignee: NovaSparks, Inc.
    Inventor: Marc Battyani
  • Patent number: 9329621
    Abstract: A serial array processor may have an execution unit composed of a multiplicity of single-bit arithmetic logic units (ALUs), which may perform parallel operations on a subset of all the words in memory by serially accessing and processing them, one bit at a time, while the processor's instruction unit pre-fetches the next instruction, a word at a time, in a manner orthogonal to the execution unit.
    Type: Grant
    Filed: December 9, 2013
    Date of Patent: May 3, 2016
    Inventor: Laurence H. Cooke
  • Patent number: 9323537
    Abstract: A method comprises measuring the execution time T1 for a problem to be solved with a program run by a single processor, measuring the execution times TM and TS of MIMD and SIMD program fragments run by a single processor and a single accelerator, respectively, determining the specific acceleration α of the execution time for an SIMD program fragment run by a single accelerator in comparison with the execution time for the fragment run by a single processor, determining the portion of the execution time for an MIMD fragment run by a single processor and the portion of the execution time for an SIMD fragment run by a single processor, and adjusting the quantity of processors or accelerators comprised in a hybrid computing system structure according to the data obtained.
    Type: Grant
    Filed: October 13, 2011
    Date of Patent: April 26, 2016
    Assignee: Federal State Unitary Enterprise—AU—Russian Scientific Research Institute of Experimental Physics—FSUE RVNC—VNIIEF
    Inventor: Sergey Alexandrovich Stepanenko
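The sizing logic in the abstract above follows an Amdahl-style performance model, which can be sketched briefly. The formula below is a standard textbook model under stated assumptions (perfect scaling of the MIMD part across processors and of the SIMD part across accelerators), not the patent's exact claims; the function name and parameters are illustrative.

```python
# Hedged Amdahl-style model of hybrid (processor + accelerator) run time.
# Assumes perfect scaling within each fragment class; names are illustrative.
def hybrid_time(T1, simd_fraction, alpha, n_proc, n_acc):
    """Estimated run time given T1 (single-processor time), the SIMD fraction
    of the work, per-accelerator speedup alpha, and unit counts."""
    mimd = (1.0 - simd_fraction) * T1 / n_proc        # MIMD part on processors
    simd = simd_fraction * T1 / (alpha * n_acc)       # SIMD part on accelerators
    return mimd + simd
```

Sweeping `n_proc` and `n_acc` over candidate configurations and picking the minimum models the abstract's "adjusting the quantity of processors or accelerators."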
  • Patent number: 9294097
    Abstract: An array of field programmable gate array (FPGA) devices configured for execution of a source code. The array includes two or more FPGA devices, a host processor, and a host interface logic. The FPGA devices are configured to execute a parallelized portion of the source code partitioned among the FPGA devices based on data rates of computing elements of the source code, computational performance of the FPGA devices, and the input/output (I/O) bandwidth of the FPGA devices. The FPGA devices include a memory bank addressable by a global memory address space for the array and an array interconnect that enables the computing elements executed by each of the FPGA devices to be programmed with a uniform address space of a global memory of the array and utilization of the global memory by the FPGA devices. The host interface logic connects the host processor with one of the FPGA devices.
    Type: Grant
    Filed: November 14, 2014
    Date of Patent: March 22, 2016
    Assignee: Scientific Concepts International Corporation
    Inventor: Andrei V. Vassiliev
  • Patent number: 9269123
    Abstract: A method, system and product are disclosed for volume rendering of medical images on a shared memory system implemented on a multi-socket mainboard with multiple multi-core processors and multiple last level caches, cores that share a cache being united in a socket. The method includes decomposing the image space to be used for rendering in regions, each region including a plurality of tiles; assigning two sockets to each of the decomposed regions; determining a tile enumeration scheme for a region; rendering all tiles within a region according to a determined tile enumeration scheme on the assigned two sockets until the respective region is finished; if a region is finished, assigning the two sockets to another region; and if no region is left, splitting an existing region of un-rendered tiles into sub-regions according to a splitting scheme and applying the steps recursively for the sub-regions.
    Type: Grant
    Filed: April 3, 2013
    Date of Patent: February 23, 2016
    Assignee: SIEMENS AKTIENGESELLSCHAFT
    Inventor: Robert Schneider
  • Patent number: 9137848
    Abstract: A mobile terminal and controlling method thereof are disclosed, by which an operable time of the mobile terminal can be increased in a manner of raising CPU power efficiency of the mobile terminal. The present invention includes a plurality of cores, a multicore adjuster configured to obtain a frequency of an active core of the plurality of cores, determine whether the obtained frequency exceeds a first threshold value for N consecutive times, wherein N is a positive integer, and activate at least one inactive core of the plurality of cores when the obtained frequency exceeds the first threshold value for N consecutive times, and a frequency adjuster configured to determine a workload of the active core, and adjust the obtained frequency of the active core according to the determined workload.
    Type: Grant
    Filed: January 8, 2013
    Date of Patent: September 15, 2015
    Assignee: LG ELECTRONICS INC.
    Inventor: Hyunwoo Nho
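The core-activation condition in the abstract above ("exceeds the first threshold value for N consecutive times") is easy to model in a few lines. This is a behavioral sketch only; the class and method names are assumptions, and the real adjuster operates on hardware frequency governors rather than Python objects.

```python
# Behavioral model of the multicore adjuster's activation rule: activate an
# inactive core only after N consecutive over-threshold observations.
class MulticoreAdjuster:
    def __init__(self, threshold, n_consecutive):
        self.threshold = threshold
        self.n = n_consecutive
        self.streak = 0            # consecutive over-threshold observations

    def observe(self, frequency):
        """Return True when another core should be activated."""
        if frequency > self.threshold:
            self.streak += 1
        else:
            self.streak = 0        # any dip resets the count
        if self.streak >= self.n:
            self.streak = 0        # reset after triggering an activation
            return True
        return False
```

Requiring N consecutive samples, rather than a single spike, keeps a momentary burst from waking an extra core and wasting power.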
  • Patent number: 9075809
    Abstract: A method for creating an application cluster virtual node. The method may comprise identifying a plurality of nodes associated with an application cluster. The method may also comprise creating a virtual node that is associated with each node in the plurality of nodes. The method may comprise providing a data protection server with access to at least one node in the plurality of nodes. The access may be provided through the virtual node. A computer-readable medium is also disclosed.
    Type: Grant
    Filed: September 29, 2007
    Date of Patent: July 7, 2015
    Assignee: Symantec Corporation
    Inventors: Sunil Shah, Ynn-Pying A. Tsaur, Sudhir Subbarao
  • Patent number: 9075493
    Abstract: Techniques to present hierarchical information as orthographic projections are described. An apparatus may comprise an orthographic projection application arranged to manage a three dimensional orthographic projection of hierarchical information. The orthographic projection application may comprise a hierarchical information component operative to receive hierarchical information representing multiple nodes at different hierarchical levels, and parse the hierarchical information into a tree data structure, an orthographic generator component operative to generate a graphical tile for each node, arrange graphical tiles for each hierarchical level into graphical layers, and arrange the graphical layers in a vertical stack, and an orthographic presentation component operative to present a three dimensional orthographic projection of the hierarchical information with the stack of graphical layers each having multiple graphical tiles. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 7, 2011
    Date of Patent: July 7, 2015
    Assignee: SAS INSTITUTE, INC.
    Inventors: Lee Ann Sullivan, Jordan Riley Benson, Rajiv Ramarajan, Paul Hankey, Frank Lee Wimmer
  • Publication number: 20150127924
    Abstract: A method and corresponding apparatus for processing a shuffle instruction are provided. Shuffle units are configured in a hierarchical structure, and each of the shuffle units generates a shuffled data element array by performing shuffling on an input data element array. In the hierarchical structure, which includes an upper shuffle unit and a lower shuffle unit, the shuffled data element array output from the lower shuffle unit is input to the upper shuffle unit as a portion of the input data element array for the upper shuffle unit.
    Type: Application
    Filed: July 14, 2014
    Publication date: May 7, 2015
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Keshava PRASAD, Navneet BASUTKAR, Young Hwan PARK, Ho YANG, Yeon Bok LEE
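The hierarchy described in the abstract above can be sketched with plain lists: a lower shuffle unit permutes its input array, and its output becomes a portion of the input to an upper shuffle unit. The permutation patterns and function names here are arbitrary examples, not the instruction encoding from the publication.

```python
# Sketch of hierarchical shuffle units: the lower unit's output feeds the
# upper unit as part of its input. Patterns shown are arbitrary examples.
def shuffle(elements, pattern):
    """One shuffle unit: reorder `elements` according to index `pattern`."""
    return [elements[i] for i in pattern]

def hierarchical_shuffle(low_input, extra_input, low_pattern, up_pattern):
    low_out = shuffle(low_input, low_pattern)           # lower shuffle unit
    return shuffle(low_out + extra_input, up_pattern)   # upper unit consumes it
```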
  • Patent number: 9015448
    Abstract: A processor and method for broadcasting data among a plurality of processing cores is disclosed. The processor includes a plurality of processing cores connected by point-to-point connections. A first of the processing cores includes a router that includes at least an allocation unit and an output port. The allocation unit is configured to determine that respective input buffers on at least two others of the processing cores are available to receive given data. The output port is usable by the router to send the given data across one of the point-to-point connections. The router is configured to send the given data contingent on determining that the respective input buffers are available. Furthermore, the processor is configured to deliver the data to the at least two other processing cores in response to the first processing core sending the data once across the point-to-point connection.
    Type: Grant
    Filed: June 17, 2010
    Date of Patent: April 21, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Tushar Krishna, Bradford M. Beckmann, Steven K. Reinhardt
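The broadcast condition in the abstract above, sending once only after confirming that every destination's input buffer can accept the data, can be modeled simply. This is a software analogy of the on-chip allocation unit; buffer representation, capacity parameter, and names are all assumptions.

```python
# Simplified model of credit-checked broadcast: deliver to all destinations
# with one send, but only if every destination buffer has room.
def try_broadcast(data, dest_buffers, capacity):
    """Append `data` to all destination buffers iff all can accept it."""
    if all(len(buf) < capacity for buf in dest_buffers):
        for buf in dest_buffers:
            buf.append(data)       # one send reaches every destination
        return True
    return False                   # allocation fails; sender must retry
```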
  • Patent number: 9009444
    Abstract: A method, computer program product, and computing system for receiving a reservation for a LUN from Host A, wherein the LUN is defined within a data array. A lock for the LUN is defined as Host A. A write request is received for the LUN from Host B. The lock for the LUN is defined as Transitioning A to B. The write request is delayed for a defined period of time.
    Type: Grant
    Filed: September 29, 2012
    Date of Patent: April 14, 2015
    Assignee: EMC Corporation
    Inventors: Philip Derbeko, Arieh Don, Anat Eyal, Kevin F. Martin, Richard A. Trabing
  • Publication number: 20150100756
    Abstract: An array processor composed of processor cells that are programmed by a controlling unit, and that are reprogrammed when a cell has finished a current data processing operation, even while other cells continue to process data with their current programming.
    Type: Application
    Filed: May 13, 2014
    Publication date: April 9, 2015
    Applicant: PACT XPP TECHNOLOGIES AG
    Inventors: Martin Vorbach, Armin Nuckel
  • Patent number: 9003274
    Abstract: The illustrative embodiments provide for a system and recordable type medium for representing actions in a data processing system. A table is generated. The table comprises a plurality of rows and columns. Ones of the columns represent corresponding ones of computer applications that can start or stop in parallel with each other in a data processing system. Ones of the rows represent corresponding ones of sequences of actions within a corresponding column. Additionally, the table represents a definition of relationships among memory address spaces, wherein the table represents when each particular address space is started or stopped during one of a start-up process, a recovery process, and a shut-down process. The resulting table is stored.
    Type: Grant
    Filed: December 21, 2007
    Date of Patent: April 7, 2015
    Assignee: International Business Machines Corporation
    Inventor: Joseph John Katnic
  • Patent number: 8935510
    Abstract: To flexibly set up an execution environment according to the contents of the processing to be executed, while taking stability or security level into consideration, the multiple processor system includes the execution environment main control unit 10, which determines CPU assignment; the execution environment sub control unit 20, which controls starting, stopping and switching of an execution environment according to instructions from the execution environment main control unit 10 and synchronizes with it; and the execution environment management unit 30, which receives input of management information or reference refusal information of shared resources for each CPU 4 or each execution environment 100 to separate the execution environment main control unit 10 from the execution environment sub control units 20a through 20n, or the execution environment sub control units 20a through 20n from each other.
    Type: Grant
    Filed: November 1, 2007
    Date of Patent: January 13, 2015
    Assignee: NEC Corporation
    Inventors: Hiroaki Inoue, Junji Sakai, Tsuyoshi Abe, Masato Edahiro