Multimode (e.g., MIMD to SIMD, etc.) Patents (Class 712/20)

System and method for automatic dependency analysis for use with a multidimensional database

Patent number: 11922221

Abstract: In accordance with an embodiment, described herein is a system and method for dependency analysis for a calculation script in a multidimensional database computing environment. A multidimensional database cube aggregation can be represented as a lattice of blocks or cube, arranged according to a database outline (e.g., intra-dimensional or member hierarchy). When the multidimensional database system performs computations in parallel for a given calculation script, portions of the cube that can be computed concurrently are identified.

Type: Grant

Filed: September 23, 2021

Date of Patent: March 5, 2024

Assignee: ORACLE INTERNATIONAL CORPORATION

Inventors: Vinod Padinjat Menon, Kumar Ramaiyer
Scalable distributed computing system with deterministic communication

Patent number: 11860814

Abstract: A scalable multi-stage hypercube-based interconnection network with deterministic communication between two or more processing elements (“PEs”) or processing cores (“PCs”) arranged in a 2D-grid using vertical and horizontal buses (i.e., each bus is one or more wires) is disclosed. In one embodiment the buses are connected in a pyramid network configuration. At each PE, the interconnection network comprises one or more switches (“interconnect”), with each switch capable of concurrently sending and receiving packets from one PE to another PE through the bus connected between them. Each packet comprises a data token, routing information such as source and destination addresses of PEs, and other information. Each PE, in addition to the interconnect, comprises a processor and/or memory. In one embodiment the processor is a Central Processing Unit (“CPU”) comprising functional units that perform operations such as additions, multiplications, or logical operations, for executing computer programs.

Type: Grant

Filed: November 1, 2021

Date of Patent: January 2, 2024

Assignee: Konda Technologies Inc.

Inventor: Venkat Konda
Thread embedded cache management

Patent number: 11593167

Abstract: Methods and systems for locking a cache line of a cache. A cache line is locked based on a count of a plurality of threads that access the cache line and maintained in the cache until all of the plurality of threads have loaded the cache line.

Type: Grant

Filed: May 9, 2019

Date of Patent: February 28, 2023

Assignee: International Business Machines Corporation

Inventors: Changhoan Kim, John A. Gunnels
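
Illustrative sketch (not part of the patent record): the abstract above describes keeping a cache line locked until every thread in a counted group has loaded it. The Python model below is one hypothetical way to picture that bookkeeping in software. The class names, the `fill`/`load`/`evict` methods, and the rule that a locked line simply refuses eviction are all assumptions made for illustration, not details taken from the patent.

```python
class CountedCacheLine:
    """A cache line that stays locked until the expected threads have loaded it."""

    def __init__(self, data, thread_count):
        self.data = data
        # Number of threads that still have to load this line (assumed bookkeeping).
        self.pending_loads = thread_count

    @property
    def locked(self):
        return self.pending_loads > 0

    def load(self):
        # Each load by a participating thread decrements the pending count.
        if self.pending_loads > 0:
            self.pending_loads -= 1
        return self.data


class TinyCache:
    """Hypothetical cache keyed by address; eviction respects line locks."""

    def __init__(self):
        self.lines = {}

    def fill(self, address, data, thread_count):
        self.lines[address] = CountedCacheLine(data, thread_count)

    def load(self, address):
        return self.lines[address].load()

    def evict(self, address):
        # Returns True if the line was evicted, False if it is still locked.
        if self.lines[address].locked:
            return False
        del self.lines[address]
        return True
```

In this model, a line filled with a thread count of 2 cannot be evicted after the first load and becomes evictable after the second.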
Artificial intelligence accelerator

Patent number: 11573705

Abstract: The present disclosure includes apparatuses and methods related to memory with an artificial intelligence (AI) accelerator. An example apparatus can receive a command indicating that the apparatus operate in an artificial intelligence (AI) mode and perform AI operations using an AI accelerator based on a status of a number of registers on the controller. The AI accelerator can include hardware, software, and/or firmware that is configured to perform operations (e.g., logic operations, among other operations) associated with AI operations. The hardware can include circuitry configured as an adder and/or multiplier to perform operations, such as logic operations, associated with AI operations.

Type: Grant

Filed: August 28, 2019

Date of Patent: February 7, 2023

Assignee: Micron Technology, Inc.

Inventor: Alberto Troia
Packed data element predication processors, methods, systems, and instructions

Patent number: 11442734

Abstract: A processor includes a first mode where the processor is not to use packed data operation masking, and a second mode where the processor is to use packed data operation masking. A decode unit to decode an unmasked packed data instruction for a given packed data operation in the first mode, and to decode a masked packed data instruction for a masked version of the given packed data operation in the second mode. The instructions have a same instruction length. The masked instruction has bit(s) to specify a mask. Execution unit(s) are coupled with the decode unit. The execution unit(s), in response to the decode unit decoding the unmasked instruction in the first mode, to perform the given packed data operation. The execution unit(s), in response to the decode unit decoding the masked instruction in the second mode, to perform the masked version of the given packed data operation.

Type: Grant

Filed: March 29, 2021

Date of Patent: September 13, 2022

Assignee: Intel Corporation

Inventors: Bret L. Toll, Buford M. Guy, Ronak Singhal, Mishali Naik
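
Illustrative sketch (not part of the patent record): the abstract above contrasts an unmasked packed data operation with a masked version of the same operation. The Python functions below model that difference for a packed add. The function names, the use of an integer bit mask, and the choice that masked-off lanes keep the old destination value (merging rather than zeroing) are assumptions for illustration only.

```python
def packed_add(a, b):
    """Unmasked packed add: every lane is computed."""
    return [x + y for x, y in zip(a, b)]


def masked_packed_add(dst, a, b, mask):
    """Masked packed add: lane i is computed only if bit i of mask is set.

    Lanes whose mask bit is clear keep the previous destination value
    (an assumed merging policy).
    """
    return [
        x + y if (mask >> i) & 1 else d
        for i, (d, x, y) in enumerate(zip(dst, a, b))
    ]
```

For example, with mask 0b0101 only lanes 0 and 2 are updated, and lanes 1 and 3 retain whatever the destination held before.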
Systems and methods for collecting, tracking, and storing system performance and event data for computing devices

Patent number: 11429506

Abstract: A system is configured to track and store system and event data for various computing devices. The system is configured to associate the various computing devices with profiles based at least in part on characteristics of the computing devices. The system is further configured to compare performance data and/or performance metrics for particular computing devices having a particular profile against all other devices that share the particular profile. The system then displays this comparison to a user of the particular computing device, substantially automatically diagnoses an issue with the particular computing device based on the performance and system event data, and/or enables the user to diagnose the problem based on the performance and system event data.

Type: Grant

Filed: November 19, 2020

Date of Patent: August 30, 2022

Assignee: Assurant, Inc.

Inventors: Dustin Brewer, Stuart Saunders, Cameron Hurst
Method and apparatus for dual issue multiply instructions

Patent number: 11294673

Abstract: A method is provided that includes performing, by a processor in response to a dual issue multiply instruction, multiplication of operands of the dual issue multiply instruction using multiplication units comprised in a data path of the processor and configured to operate together to determine a product of the operands, and storing, by the processor, the product in a storage location indicated by the dual issue multiply instruction.

Type: Grant

Filed: May 20, 2020

Date of Patent: April 5, 2022

Assignee: TEXAS INSTRUMENTS INCORPORATED

Inventors: Timothy David Anderson, Mujibur Rahman
Systems and methods for executing a fused multiply-add instruction for complex numbers

Patent number: 11023231

Abstract: Disclosed embodiments relate to executing a vector-complex fused multiply-add instruction. In one example, a method includes fetching an instruction, a format of the instruction including an opcode, a first source operand identifier, a second source operand identifier, and a destination operand identifier, wherein each of the identifiers identifies a location storing a packed data comprising at least one complex number, decoding the instruction, retrieving data associated with the first and second source operand identifiers, and executing the decoded instruction to, for each packed data element position of the identified first and second source operands, cross-multiply the real and imaginary components to generate four products: a product of real components, a product of imaginary components, and two mixed products, generate a complex result by using the four products according to the instruction, and store a result to the corresponding position of the identified destination operand.

Type: Grant

Filed: October 1, 2016

Date of Patent: June 1, 2021

Assignee: Intel Corporation

Inventors: Roman S. Dubtsov, Robert Valentine, Jesus Corbal, Milind Girkar, Elmoustapha Ould-Ahmed-Vall
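
Illustrative sketch (not part of the patent record): the abstract above describes cross-multiplying real and imaginary components into four products and combining them into a complex result per packed element. The Python below shows the standard arithmetic for one such combination, a*b + c. Treating the destination as an accumulator `c`, representing complex numbers as (real, imaginary) tuples, and the function names are assumptions; the abstract only says the four products are combined "according to the instruction".

```python
def complex_fma_lane(a, b, c):
    """Compute a*b + c for one lane; a, b, c are (real, imag) pairs."""
    ar, ai = a
    br, bi = b
    cr, ci = c
    # The four cross products named in the abstract.
    real_real = ar * br
    imag_imag = ai * bi
    real_imag = ar * bi
    imag_real = ai * br
    # (ar + ai*i)(br + bi*i) = (ar*br - ai*bi) + (ar*bi + ai*br)*i
    return (real_real - imag_imag + cr, real_imag + imag_real + ci)


def packed_complex_fma(src1, src2, acc):
    """Apply the lane operation to every packed element position."""
    return [complex_fma_lane(a, b, c) for a, b, c in zip(src1, src2, acc)]
```

As a check on the arithmetic, (1 + 2i)(3 + 4i) = -5 + 10i, so a lane with those sources and a zero accumulator yields (-5, 10).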
Programmable multiply-add array hardware

Patent number: 10970043

Abstract: An integrated circuit including a data architecture including N adders and N multipliers configured to receive operands. The data architecture receives instructions for selecting a data flow between the N multipliers and the N adders of the data architecture. The selected data flow includes the options: (1) a first data flow using the N multipliers and the N adders to provide a multiply-accumulate mode and (2) a second data flow to provide a multiply-reduce mode.

Type: Grant

Filed: May 28, 2020

Date of Patent: April 6, 2021

Assignee: ALIBABA GROUP HOLDING LIMITED

Inventors: Liang Han, Xiaowei Jiang
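
Illustrative sketch (not part of the patent record): the abstract above names two selectable data flows over N multipliers and N adders, a multiply-accumulate mode and a multiply-reduce mode. The Python below shows one common reading of those terms: per-lane accumulation versus summing all products into a single value. The exact hardware semantics are not given in the abstract, so these definitions and function names are assumptions.

```python
def multiply_accumulate(a, b, acc):
    """Assumed multiply-accumulate mode: lane i computes acc[i] + a[i] * b[i]."""
    return [s + x * y for x, y, s in zip(a, b, acc)]


def multiply_reduce(a, b):
    """Assumed multiply-reduce mode: all N products are summed into one value."""
    total = 0
    for x, y in zip(a, b):
        total += x * y
    return total
```

Under this reading, the first mode keeps N independent running sums, while the second collapses the N products into a single dot-product-style result.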
Systems and methods for collecting, tracking, and storing system performance and event data for computing devices

Patent number: 10872022

Abstract: A system is configured to track and store system and event data for various computing devices. The system is configured to associate the various computing devices with profiles based at least in part on characteristics of the computing devices. The system is further configured to compare performance data and/or performance metrics for particular computing devices having a particular profile against all other devices that share the particular profile. The system then displays this comparison to a user of the particular computing device, substantially automatically diagnoses an issue with the particular computing device based on the performance and system event data, and/or enables the user to diagnose the problem based on the performance and system event data.

Type: Grant

Filed: August 21, 2018

Date of Patent: December 22, 2020

Assignee: Assurant, Inc.

Inventors: Dustin Brewer, Stuart Saunders, Cameron Hurst
Processor and control method of processor for address generating and address displacement

Patent number: 10754652

Abstract: A processor includes: an address generating unit that, when an instruction decoded by a decoding unit is an instruction to execute arithmetic processing on a plurality of operand sets each including a plurality of operands that are objects of the arithmetic processing, in parallel a plurality of times, generates an address set corresponding to each of the operand sets of the arithmetic processing for each time, based on a certain address displacement with respect to the plurality of operands included in each of the operand sets; a plurality of instruction queues that hold the generated address sets corresponding to the respective operand sets, in correspondence to respective processing units; and a plurality of processing units that perform the arithmetic processing in parallel on the operand sets obtained based on the respective address sets outputted by the plurality of instruction queues.

Type: Grant

Filed: May 26, 2017

Date of Patent: August 25, 2020

Assignee: FUJITSU LIMITED

Inventors: Shuji Yamamura, Takumi Maruyama, Masato Nakagawa, Masahiro Kuramoto
System and method for generation of event driven, tuple-space based programs

Patent number: 10564949

Abstract: In a system for automatic generation of event-driven, tuple-space based programs from a sequential specification, a hierarchical mapping solution can target different runtimes relying on event-driven tasks (EDTs). The solution uses loop types to encode short, transitive relations among EDTs that can be evaluated efficiently at runtime. Specifically, permutable loops translate immediately into conservative point-to-point synchronizations of distance one. A runtime-agnostic layer can be used to target the transformed code to different runtimes.

Type: Grant

Filed: September 22, 2014

Date of Patent: February 18, 2020

Assignee: Reservoir Labs, Inc.

Inventors: Muthu M. Baskaran, Thomas Henretty, M. H. Langston, Richard A. Lethin, Benoit J. Meister, Nicolas T. Vasilache, David E. Wohlford
Systems and methods for implementing an intelligence processing computing architecture

Patent number: 10521395

Abstract: Systems and methods include an integrated circuit that includes a plurality of computing tiles, wherein each of the plurality of computing tiles includes: a matrix multiply accelerator, a computing processing circuit; and a flow scoreboard module; a local data buffer, wherein the plurality of computing tiles together define an intelligence processing array; a network-on-chip system comprising: a plurality of network-on-chip routers establishing a communication network among the plurality of computing tiles, wherein each network-on-chip router is in operable communication connection with at least one of the plurality of computing tiles and a distinct network-on-chip router of the plurality of network-on-chip routers; and an off-tile buffer that is arranged in remote communication with the plurality of computing tiles, wherein the off-tile buffer stores raw input data and/or data received from an upstream process or an upstream device.

Type: Grant

Filed: July 1, 2019

Date of Patent: December 31, 2019

Assignee: Mythic, Inc.

Inventors: David Fick, Malav Parikh, Paul Toth, Adam Caughron, Vimal Reddy, Erik Schlanger, Sergio Schuler, Zainab Nasreen Zaidi, Alex Dang-Tran, Raul Garibay, Bryant Sorensen
Vector Galois Field Multiply Sum and Accumulate instruction

Patent number: 10338918

Abstract: A Vector Galois Field Multiply Sum and Accumulate instruction. Each element of a second operand of the instruction is multiplied in a Galois field with the corresponding element of the third operand to provide one or more products. The one or more products are exclusively ORed with each other and exclusively ORed with a corresponding element of a fourth operand of the instruction. The results are placed in a selected operand.

Type: Grant

Filed: June 5, 2017

Date of Patent: July 2, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Jonathan D. Bradbury
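
Illustrative sketch (not part of the patent record): the abstract above describes multiplying corresponding elements in a Galois field, XORing the products with each other, and XORing the result with an element of a fourth operand. The Python below models this with carry-less (GF(2) polynomial) multiplication and no modular reduction. Pairing adjacent elements so that two products feed one result element, and the function names, are assumptions made for illustration.

```python
def clmul(a, b):
    """Carry-less multiply: polynomial product over GF(2), no reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # "addition" in GF(2) is XOR
        a <<= 1
        b >>= 1
    return result


def gf_multiply_sum_accumulate(op2, op3, op4):
    """XOR the products of adjacent element pairs with the matching op4 element."""
    out = []
    for i in range(0, len(op2), 2):
        even = clmul(op2[i], op3[i])
        odd = clmul(op2[i + 1], op3[i + 1])
        out.append(even ^ odd ^ op4[i // 2])
    return out
```

For instance, clmul(3, 3) is 5 because (x + 1)(x + 1) = x^2 + 1 over GF(2), where the two middle terms cancel.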
Method and device for generating configuration information of dynamic reconfigurable processor

Patent number: 10310894

Abstract: Provided is a method for generating configuration information of a dynamic reconfigurable processor. The dynamic reconfigurable processor includes a processing unit array, and the processing unit array includes a plurality of processing units. The method includes steps of: reading information of a task to be executed and generating an array configuration information top of the processing unit array according to the information; generating a plurality of processing unit configuration information corresponding to the plurality of processing units respectively according to the information; and assembling the array configuration information top and the plurality of processing unit configuration information.

Type: Grant

Filed: June 16, 2014

Date of Patent: June 4, 2019

Assignee: TSINGHUA UNIVERSITY

Inventors: Leibo Liu, Yansheng Wang, Guiqiang Peng, Zhaoshi Li, Shouyi Yin, Shaojun Wei
Smart tuple stream alteration

Patent number: 10296620

Abstract: A stream application receives a stream of tuples to be processed by a plurality of processing elements that are operating on one or more compute nodes. Each processing element has one or more stream operators. The stream application assigns one or more processing cycles to software code embedded in a tuple of the stream of tuples. The tuple obtains a first status of one or more first tuples of a set of targeted tuples to be modified by a tuple modification of a stream operator. The tuple obtains a second status of one or more second tuples of the set of targeted tuples after the stream operator performs the tuple modification. The tuple determines a potential degradation based on the first status and the second status. The tuple alters the one or more first tuples to prevent the tuple modification in response to the determined potential degradation.

Type: Grant

Filed: September 30, 2015

Date of Patent: May 21, 2019

Assignee: International Business Machines Corporation

Inventors: Bin Cao, Jessica R. Eidem, Brian R. Muras, Jingdong Sun
Method to control the number of active vector lanes for power efficiency

Patent number: 10175981

Abstract: The vector data path is divided into smaller vector lanes. The number of active vector lanes is controllable on the fly by the programmer to match the requirements of the executing program, and inactive vector lanes are powered down by the CPU to increase power efficiency of the vector processor.

Type: Grant

Filed: July 9, 2014

Date of Patent: January 8, 2019

Assignee: TEXAS INSTRUMENTS INCORPORATED

Inventors: Timothy David Anderson, Duc Quang Bui
Vector Galois field multiply sum and accumulate instruction

Patent number: 10146534

Abstract: A Vector Galois Field Multiply Sum and Accumulate instruction. Each element of a second operand of the instruction is multiplied in a Galois field with the corresponding element of the third operand to provide one or more products. The one or more products are exclusively ORed with each other and exclusively ORed with a corresponding element of a fourth operand of the instruction. The results are placed in a selected operand.

Type: Grant

Filed: October 6, 2016

Date of Patent: December 4, 2018

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Jonathan D. Bradbury
Processor and method for dynamically allocating processing elements to front end units using a plurality of registers

Patent number: 10120833

Abstract: Embodiments include a processor capable of supporting multi-mode and corresponding methods. The processor includes front end units, a number of processing elements greater than the number of the front end units, and a controller configured to determine if thread divergence occurs due to conditional branching. If there is thread divergence, the processor may set control information to control processing elements using currently activated front end units. If there is not, the processor may set control information to control processing elements using a currently activated front end unit.

Type: Grant

Filed: January 28, 2014

Date of Patent: November 6, 2018

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Woong Seo, Yeon-Gon Cho, Soo-Jung Ryu
Method, system and device for protection against reverse engineering and/or tampering with programs

Patent number: 10095847

Abstract: Unauthorized use of computer programs is made difficult by compiling a processor rather than just compiling a program into machine code. The way in which the processor should respond to machine instructions, i.e. its translation data, is computed from an arbitrary bit string B and a program P as inputs. The translation data of a processor are computed that will execute operations defined by the program P when the processor uses the given bit string B as a source of machine instructions. A processor is configured so that it will execute machine instructions according to said translation data. Other programs P′ may then be compiled into machine instructions B′ for that processor and executed by the processor. Without knowledge of the bit string B and the original program P it is difficult to modify the machine instructions B′ so that a different processor will execute the other program P′.

Type: Grant

Filed: May 17, 2013

Date of Patent: October 9, 2018

Assignee: KONINKLIJKE PHILIPS N.V.

Inventor: Willem Charles Mallon
Compressed instruction format

Patent number: 10095515

Abstract: A technique for decoding an instruction in a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.

Type: Grant

Filed: February 13, 2017

Date of Patent: October 9, 2018

Assignee: Intel Corporation

Inventors: Robert Valentine, Doron Orenstein, Bret L. Toll
Processors, methods, systems, and instructions to generate sequences of integers in which integers in consecutive positions differ by a constant integer stride and where a smallest integer is offset from zero by an integer offset

Patent number: 9898283

Abstract: A method of an aspect includes receiving an instruction. The instruction indicates an integer stride, indicates an integer offset, and indicates a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four integers in numerical order with a smallest one of the at least four integers differing from zero by the integer offset and with all integers of the sequence in consecutive positions differing by the integer stride. Other methods, apparatus, systems, and instructions are disclosed.

Type: Grant

Filed: December 22, 2011

Date of Patent: February 20, 2018

Assignee: Intel Corporation

Inventors: Seth Abraham, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Zeev Sperber, Amit Gradstein
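
Illustrative sketch (not part of the patent record): the abstract above describes an instruction whose result is a sequence of at least four integers, starting at an integer offset from zero and with consecutive positions differing by an integer stride. The Python below computes such a sequence. The function name, the `count` parameter, and the assumption of a non-negative stride (so that the first element is the smallest) are illustrative choices.

```python
def stride_sequence(stride, offset, count=4):
    """Return [offset, offset + stride, offset + 2*stride, ...] with count elements."""
    if count < 4:
        raise ValueError("the abstract describes a sequence of at least four integers")
    return [offset + i * stride for i in range(count)]
```

Sequences of this shape are commonly used as lane indices, for example to build gather or scatter addresses for strided memory access.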
Parallel signal processing system and method

Patent number: 9832543

Abstract: A system and method for processing a plurality of channels, for example audio channels, in parallel is provided. For example, a plurality of telephony channels are processed in order to detect and respond to call progress tones. The channels may be processed according to a common transform algorithm. Advantageously, a massively parallel architecture is employed, in which operations on many channels are synchronized, to achieve a high efficiency parallel processing environment. The parallel processor may be situated on a data bus, separate from a main general purpose processor, or integrated with the processor in a common board or integrated device. All, or a portion of a speech processing algorithm may also be performed in a massively parallel manner.

Type: Grant

Filed: June 16, 2014

Date of Patent: November 28, 2017

Assignee: Calltrol Corporation

Inventor: Wai Wu
Calculation device

Patent number: 9798305

Abstract: A calculation device includes a plurality of calculation processing units configured to perform processes different from each other, a plurality of calculators configured to perform a same calculation, and a control unit configured to control a number of the calculators to be operated during each of a plurality of divided periods based on a length of a predetermined processing period and a number of calculations to be performed, such that a number of data which is equal to a number of calculations is processed within a predetermined processing period, and that the number of the calculators to be operated during each of the plurality of divided periods is averaged, the divided periods being obtained by dividing up the predetermined processing period.

Type: Grant

Filed: November 10, 2014

Date of Patent: October 24, 2017

Assignee: OLYMPUS CORPORATION

Inventors: Kazue Chida, Akira Ueno
Video encoding and decoding using parallel processors

Patent number: 9747251

Abstract: A method is disclosed for the decoding and encoding of a block-based video bit-stream such as MPEG2, H.264-AVC, VC1, or VP6 using a system containing one or more high speed sequential processors, a homogenous array of software configurable general purpose parallel processors, and a high speed memory system to transfer data between processors or processor sets. This disclosure includes a method for load balancing between the two sets of processors.

Type: Grant

Filed: December 7, 2011

Date of Patent: August 29, 2017

Assignee: AMAZON TECHNOLOGIES, INC.

Inventors: Jesse J. Rosenzweig, Brian Gregory Lewis
Data processing apparatus and method for processing a plurality of threads

Patent number: 9547530

Abstract: A data processing apparatus has processing circuitry for processing threads each having thread state data. The threads may be processed in thread groups, with each thread group comprising a number of threads processed in parallel with a common program executed for each thread. Several thread state storage regions are provided, each with a fixed number of thread state entries for storing thread state data for a corresponding thread. At least two of the storage regions have different fixed numbers of entries. The processing circuitry processes as the same thread group threads having thread state data stored in the same storage region and processes threads having thread state data stored in different storage regions as different thread groups.

Type: Grant

Filed: November 1, 2013

Date of Patent: January 17, 2017

Assignee: ARM Limited

Inventor: David Hennah Mansell
Multi-input and binary reproducible, high bandwidth floating point adder in a collective network

Patent number: 9495131

Abstract: To add floating point numbers in a parallel computing system, a collective logic device receives the floating point numbers from computing nodes. The collective logic device converts the floating point numbers to integer numbers. The collective logic device adds the integer numbers and generates a summation of the integer numbers. The collective logic device converts the summation to a floating point number. The collective logic device performs the receiving, the converting the floating point numbers, the adding, the generating and the converting the summation in one pass. One pass indicates that the computing nodes send inputs only once to the collective logic device and receive outputs only once from the collective logic device.

Type: Grant

Filed: March 9, 2015

Date of Patent: November 15, 2016

Assignee: International Business Machines Corporation

Inventors: Dong Chen, Noel A. Eisley, Philip Heidelberger, Burkhard Steinmacher-Burow
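
Illustrative sketch (not part of the patent record): the abstract above describes converting floating point inputs to integers, adding the integers, and converting the sum back to floating point. Because integer addition is associative, the result does not depend on the order in which inputs arrive. The Python below demonstrates that property in software using arbitrary-precision integers. The scale factor, the function names, and the use of Python's `fractions` module are assumptions; the patent's hardware number format is not described in the abstract.

```python
from fractions import Fraction

# Assumption: every finite IEEE 754 double is an integer multiple of 2**-1074,
# so scaling by 2**1074 maps each input to an exact integer.
SCALE = 1 << 1074


def to_fixed(x):
    """Convert a finite float to an exact integer in units of 2**-1074."""
    return (Fraction(x) * SCALE).numerator


def reproducible_sum(values):
    """Sum floats via exact integer addition, then round once at the end."""
    total = 0
    for x in values:
        total += to_fixed(x)  # integer addition: exact and order-independent
    return float(Fraction(total, SCALE))
```

For the inputs 1e16, 1.0, and -1e16, this function returns 1.0 in any input order, whereas a left-to-right floating point sum of that list returns 0.0 because 1e16 + 1.0 rounds back to 1e16.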
Vector floating point test data class immediate instruction

Patent number: 9471308

Abstract: A Vector Floating Point Test Data Class Immediate instruction is provided that determines whether one or more elements of a vector specified in the instruction are of one or more selected classes and signs. If a vector element is of a selected class and sign, an element in an operand of the instruction corresponding to the vector element is set to a first defined value, and if the vector element is not of the selected class and sign, the operand element corresponding to the vector element is set to a second defined value.

Type: Grant

Filed: January 23, 2013

Date of Patent: October 18, 2016

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jonathan D. Bradbury, Eric M. Schwarz
Device for offloading instructions and data from primary to secondary data path

Patent number: 9354893

Abstract: Provided is an information processing device including an instruction cache, a data cache, first and second arithmetic unit groups including a plurality of arithmetic units capable of parallel operation, a first arithmetic-control circuit that generates one or more operation instructions for the first arithmetic unit group, and a second arithmetic-control circuit that generates one or more operation instructions for the second arithmetic unit group based on an instruction code of a fixed instruction register. The first arithmetic unit group sets the instruction code to the fixed instruction register according to an operation instruction generated based on a first specific instruction code by the first arithmetic-control circuit, and provides data to the second arithmetic unit group according to an operation instruction generated based on a second specific instruction code by the first arithmetic-control circuit.

Type: Grant

Filed: May 29, 2012

Date of Patent: May 31, 2016

Assignee: Renesas Electronics Corporation

Inventors: Yuki Kobayashi, Shohei Nomoto
Expressing parallel execution relationships in a sequential programming language

Patent number: 9317290

Abstract: Circuits, methods, and apparatus that provide for parallel execution relationships to be included in a function call or other appropriate portion of a command or instruction in a sequential programming language. One example provides a token-based method of expressing parallel execution relationships. Each process that can be executed in parallel is given a separate token. Later processes that depend on earlier processes wait to receive the appropriate token before being executed. In another example, counters are used in place of tokens to determine when a process is completed. Each function comprises a number of individual functions or threads, where each thread performs the same operation on a different piece of data. A counter is used to track the number of threads that have been executed. When each thread in the function has been executed, a later function that relies on data generated by the earlier function may be executed.

Type: Grant

Filed: January 7, 2013

Date of Patent: April 19, 2016

Assignee: NVIDIA Corporation

Inventors: Ian A. Buck, Bastiaan Aarts
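
Illustrative sketch (not part of the patent record): the abstract above describes a counter that tracks how many threads of a function have executed, so that a dependent function runs only after all of them finish. The Python below models that idea with CPU threads. The `CompletionCounter` class, the squaring workload, and the use of `threading.Event` are assumptions chosen for illustration and do not reflect the patented mechanism.

```python
import threading


class CompletionCounter:
    """Counts finished threads and signals when the expected total is reached."""

    def __init__(self, total):
        self._total = total
        self._done = 0
        self._lock = threading.Lock()
        self._all_done = threading.Event()
        if total == 0:
            self._all_done.set()

    def thread_finished(self):
        with self._lock:
            self._done += 1
            if self._done == self._total:
                self._all_done.set()

    def wait(self):
        self._all_done.wait()


def run_two_stage(data):
    """Stage 1 squares each element in its own thread; stage 2 sums the results."""
    squared = [None] * len(data)
    counter = CompletionCounter(len(data))

    def worker(i):
        squared[i] = data[i] * data[i]  # same operation, different piece of data
        counter.thread_finished()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(data))]
    for t in threads:
        t.start()
    counter.wait()        # the later function waits on the counter
    total = sum(squared)  # dependent stage reads data produced by stage 1
    for t in threads:
        t.join()
    return total
```

Here the dependent stage never polls individual threads; it only observes that the counter has reached the number of threads launched.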
Partitioning and repartitioning for data parallel operations

Patent number: 9251207

Abstract: A query that identifies an input data source is rewritten to contain data parallel operations that include partitioning and merging. The input data source is partitioned into a plurality of initial partitions. A parallel repartitioning operation is performed on the initial partitions to generate a plurality of secondary partitions. A parallel execution of the query is performed using the secondary partitions to generate a plurality of output sets. The plurality of output sets are merged into a merged output set.

Type: Grant

Filed: November 29, 2007

Date of Patent: February 2, 2016

Assignee: Microsoft Technology Licensing, LLC

Inventors: John Duffy, Edward G. Essey, Charles D. Callahan, II
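
Illustrative sketch (not part of the patent record): the abstract above describes partitioning an input source, repartitioning it, executing the query in parallel over the secondary partitions, and merging the output sets. The Python below follows those four steps for a simple count-by-key query. The round-robin initial split, the modulo-based repartitioning on integer keys, the thread pool, and all function names are assumptions; in particular, the repartitioning here runs sequentially for brevity, whereas the abstract describes it as a parallel operation.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n):
    """Initial partitions: round-robin split of the input source."""
    return [items[i::n] for i in range(n)]


def repartition(partitions, n, key):
    """Secondary partitions: route each item by key so equal keys land together."""
    out = [[] for _ in range(n)]
    for part in partitions:
        for item in part:
            out[key(item) % n].append(item)
    return out


def count_by_key(part, key):
    """Per-partition query: count occurrences of each key."""
    counts = {}
    for item in part:
        k = key(item)
        counts[k] = counts.get(k, 0) + 1
    return counts


def parallel_count(items, key, n=4):
    initial = partition(items, n)
    secondary = repartition(initial, n, key)
    with ThreadPoolExecutor(max_workers=n) as pool:
        outputs = list(pool.map(lambda p: count_by_key(p, key), secondary))
    merged = {}
    for output in outputs:
        merged.update(output)  # keys are disjoint across secondary partitions
    return merged
```

Because repartitioning places every occurrence of a key in exactly one secondary partition, the merge step can combine the output sets without reconciling overlapping keys.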
Providing point to point communications among compute nodes in a global combining network of a parallel computer

Patent number: 9246792

Abstract: Methods, apparatus, and products are disclosed for providing point to point data communications among compute nodes in a global combining network of a parallel computer that include: determining a class route identifier available for all of the nodes along a communications path from an origin node to a target node; configuring network hardware of each node along the communications path with routing instructions in dependence upon the available class route identifier and the network's topology; transmitting, by the origin node along the communications path, a network packet to the target node, including encoding the available class route identifier in the network packet; and routing, by the network hardware of each node along the communications path, the network packet to the target node in dependence upon the routing instructions for each node and the available class route identifier.

Type: Grant

Filed: April 5, 2012

Date of Patent: January 26, 2016

Assignee: International Business Machines Corporation

Inventors: Charles J. Archer, Ahmad A. Faraj, Todd A. Inglett
Apparatus, method, and medium for controlling transmission of data

Patent number: 9225547

Abstract: A data processing apparatus can reduce an occupancy rate of a ring bus by suppressing occurrence of a stall packet, and can change a processing sequence. In the data processing apparatus, a buffer is provided in each communication unit connecting the ring bus and the associated processing unit. Transfer of data from the communication unit to the processing unit is controlled by an enable signal. Consequently, occurrence of a stall packet is suppressed. Accordingly, frequency of occurrence of a deadlock state is reduced by decreasing the occupancy rate of the ring bus.

Type: Grant

Filed: March 15, 2010

Date of Patent: December 29, 2015

Assignee: Canon Kabushiki Kaisha

Inventors: Yuji Hara, Hisashi Ishikawa, Akinobu Mori, Takeo Kimura, Hirowo Inoue
Physical manager of synchronization barrier between multiple processes

Patent number: 9218222

Abstract: A computer device with synchronization barrier including a memory and a processing unit capable of multiprocess processing on various processors and enabling the parallel execution of blocks by processes, the blocks being associated by groups in successive work steps. The device further includes a hardware circuit with a usable address space to the memory, capable of receiving a call from each process indicating the end of execution of a current block, each call comprising data. The hardware circuit is arranged to authorize the execution of blocks of a later work step when all the blocks of the current work step have been executed. The accessibility to the address space is achieved by segments drawn from the data of each call.

Type: Grant

Filed: November 27, 2009

Date of Patent: December 22, 2015

Assignee: BULL SAS

Inventors: Angelo Solinas, Jordan Chicheportiche, Saïd Derradji, Jean-Jacques Pairault, Zoltan Menyhart, Sylvain Jeaugey, Philippe Couvee
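
The barrier rule in the abstract above (blocks of a later work step are authorized only once all blocks of the current work step have been executed) can be modelled in software with a simple counter. This is a sketch under stated assumptions: the class `StepBarrier` and its methods are hypothetical names, and the model ignores the hardware circuit's address space and the segment mechanism the abstract mentions.

```python
class StepBarrier:
    """Software model of a per-step barrier (hypothetical names throughout)."""

    def __init__(self, blocks_per_step):
        self.blocks_per_step = blocks_per_step
        self.current_step = 0
        self.finished = 0  # blocks of the current step that have reported in

    def end_of_block(self):
        """Record one 'end of execution' call.

        Returns True when this call completes the current work step, which
        authorizes the blocks of the next step to run.
        """
        self.finished += 1
        if self.finished == self.blocks_per_step:
            self.current_step += 1
            self.finished = 0
            return True
        return False


barrier = StepBarrier(blocks_per_step=3)
results = [barrier.end_of_block() for _ in range(3)]
```

Only the third call returns `True`, at which point the model advances to the next work step.
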
Cluster architecture for network security processing

Patent number: 9203865

Abstract: A computing device may be joined to a cluster by discovering the device, determining whether the device is eligible to join the cluster, configuring the device, and assigning the device a cluster role. A device may be assigned to act as a cluster master, backup master, active device, standby device, or another role. The cluster master may be configured to assign tasks, such as network flow processing, to the cluster devices. The cluster master and backup master may maintain global, run-time synchronization data pertaining to each of the network flows, shared resources, cluster configuration, and the like. The devices within the cluster may monitor one another. Monitoring may include transmitting status messages comprising indicators of device health to the other devices in the cluster. In the event a device satisfies failover conditions, a failover operation to replace the device with another standby device may be performed.

Type: Grant

Filed: March 4, 2013

Date of Patent: December 1, 2015

Assignee: WATCHGUARD TECHNOLOGIES, INC.

Inventors: Thomas Linden, James Huang, Jeff Hsu, Ming-Jeng Lee
Processor topology switches

Patent number: 9094317

Abstract: A first processor has a processor port for peer-to-peer processor communications. A switch provides for switching communications from a path between said first processor and a second processor to a path between said first processor and a third processor (and vice-versa).

Type: Grant

Filed: June 18, 2009

Date of Patent: July 28, 2015

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Martin Goldstein, Kamran H. Casim, Loren M. Koehler
Methods and apparatus for independent processor node operations in a SIMD array processor

Patent number: 9063722

Abstract: A control processor is used for fetching and distributing single instruction multiple data (SIMD) instructions to a plurality of processing elements (PEs). One of the SIMD instructions is a thread start (Tstart) instruction, which causes the control processor to pause its instruction fetching. A local PE instruction memory (PE Imem) is associated with each PE and contains local PE instructions for execution on the local PE. Local PE Imem fetch, decode, and execute logic are associated with each PE. Instruction path selection logic in each PE is used to select between control processor distributed instructions and local PE instructions fetched from the local PE Imem. Each PE is also initialized to receive control processor distributed instructions. In addition, local hold generation logic is associated with each PE. A PE receiving a Tstart instruction causes the instruction path selection logic to switch to fetch local PE Imem instructions.

Type: Grant

Filed: December 21, 2011

Date of Patent: June 23, 2015

Assignee: Altera Corporation

Inventors: Gerald George Pechanek, Edwin Franklin Barry, Mihailo Stojancic
PROCESSOR CAPABLE OF SUPPORTING MULTIMODE AND MULTIMODE SUPPORTING METHOD THEREOF

Publication number: 20150143081

Abstract: Embodiments include a processor capable of supporting multi-mode and corresponding methods. The processor includes front end units, a number of processing elements more than a number of the front end units; and a controller configured to determine if thread divergence occurs due to conditional branching. If there is thread divergence, the processor may set control information to control processing elements using currently activated front end units. If there is not, the processor may set control information to control processing elements using a currently activated front end unit.

Type: Application

Filed: January 27, 2015

Publication date: May 21, 2015

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Woong SEO, Yeon-Gon CHO, Soo-Jung RYU
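
The divergence check in the abstract above can be illustrated with a few lines of code. This is a rough sketch, not the claimed method: the function name `front_end_units_needed` and the representation of branch outcomes as a list of booleans (one per thread) are assumptions made for illustration.

```python
def front_end_units_needed(branch_taken):
    """Return how many front end units a group of threads needs.

    `branch_taken` holds one boolean per thread: whether that thread took
    the conditional branch. If all threads agree there is no divergence and
    a single front end unit can drive every processing element; otherwise
    one unit per distinct path is needed (two, for a two-way branch).
    """
    return max(1, len(set(branch_taken)))


no_divergence = front_end_units_needed([True, True, True, True])
divergence = front_end_units_needed([True, False, True, False])
```
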
INCORPORATING A SPATIAL ARRAY INTO ONE OR MORE PROGRAMMABLE PROCESSOR CORES

Publication number: 20150100757

Abstract: Functional units disposed in one or more processor cores are communicatively coupled using both a shared bypass network and a switched network. The shared bypass network enables the functional units to be operated conventionally for general processing while the switched network enables specialized processing in which the functional units are configured as a spatial array. In the spatial array configuration, operands produced by one functional unit can only be sent to a subset of functional units to which dependent instructions have been mapped a priori. The functional units may be dynamically reconfigured at runtime to toggle between operating in the general configuration and operating as the spatial array. Information to control the toggling between operating configurations may be provided in instructions received by the functional units.

Type: Application

Filed: April 14, 2014

Publication date: April 9, 2015

Applicant: Microsoft Corporation

Inventors: Douglas C. Burger, Aaron Smith, Milovan Duric
Scaling and managing work requests on a massively parallel machine

Patent number: 8918624

Abstract: A method, computer program product and computer system for scaling and managing requests on a massively parallel machine, such as one running in MIMD mode on a SIMD machine. A submit mux (multiplexer) is used to federate work requests and to forward the requests to the management node. A resource arbiter receives and manages these work requests. A MIMD job controller works with the resource arbiter to manage the work requests on the SIMD partition. The SIMD partition may utilize a mux of its own to federate the work requests and the computer nodes. Instructions are also provided to control and monitor the work requests.

Type: Grant

Filed: May 15, 2008

Date of Patent: December 23, 2014

Assignee: International Business Machines Corporation

Inventors: Paul V. Allen, Thomas A. Budnik, Mark G. Megerian, Samuel J. Miller
RECONFIGURABLE PROCESSOR AND OPERATION METHOD THEREOF

Publication number: 20140331025

Abstract: A reconfigurable processor and an operation method thereof are provided. The reconfigurable processor may include: a controller configured to control operations of a first mode, in which a first portion of a program that does not utilize loop acceleration is processed, and a second mode, in which a second portion for the program that utilizes the loop acceleration is processed, based on whether an instruction to control parallel operations of the first mode and the second mode is executed; and a shared register file configured to transfer data between the first mode and the second mode.

Type: Application

Filed: May 5, 2014

Publication date: November 6, 2014

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Ki-Seok KWON, Suk-Jin KIM
Method and apparatus for a general-purpose, multiple-core system for implementing stream-based computations

Patent number: 8843928

Abstract: A method and system of efficient use and programming of a multi-processing core device. The system includes a programming construct that is based on stream-domain code. A programmable core based computing device is disclosed. The computing device includes a plurality of processing cores coupled to each other. A memory stores stream-domain code including a stream defining a stream destination module and a stream source module. The stream source module places data values in the stream and the stream conveys data values from the stream source module to the stream destination module. A runtime system detects when the data values are available to the stream destination module and schedules the stream destination module for execution on one of the plurality of processing cores.

Type: Grant

Filed: January 21, 2011

Date of Patent: September 23, 2014

Assignee: QST Holdings, LLC

Inventors: Paul Master, Frederick Furtek
Processor with arbiter sending simultaneously requested instructions from processing elements in SIMD / MIMD modes

Patent number: 8719551

Abstract: The present invention provides an information processing apparatus and an integrated circuit which realize parallel execution of different processing systems, and which do not require the provision of a dedicated memory storing instructions for common processing. The information processing apparatus comprises: a plurality of processor elements; an instruction memory storing a first program and a second program; and an arbiter interposed between the processor elements and the instruction memory, the arbiter receiving, from each of the processor elements, a request for an instruction, from among instructions included in the first program and the second program, and controlling access to the instruction memory by the processor elements, wherein the arbiter arbitrates requests made by the processor elements when the requests are (i) simultaneous requests for different instructions included in one of the first program and the second program or (ii) simultaneous requests for an instruction included in the first prog

Type: Grant

Filed: April 15, 2010

Date of Patent: May 6, 2014

Assignee: Panasonic Corporation

Inventor: Hideshi Nishida
Latency tolerant system for executing video processing operations

Patent number: 8687008

Abstract: A latency tolerant system for executing video processing operations. The system includes a host interface for implementing communication between the video processor and a host CPU, a scalar execution unit coupled to the host interface and configured to execute scalar video processing operations, and a vector execution unit coupled to the host interface and configured to execute vector video processing operations. A command FIFO is included for enabling the vector execution unit to operate on a demand driven basis by accessing the memory command FIFO. A memory interface is included for implementing communication between the video processor and a frame buffer memory. A DMA engine is built into the memory interface for implementing DMA transfers between a plurality of different memory locations and for loading the command FIFO with data and instructions for the vector execution unit.

Type: Grant

Filed: November 4, 2005

Date of Patent: April 1, 2014

Assignee: NVIDIA Corporation

Inventors: Ashish Karandikar, Shirish Gadre, Stephen D. Lew
Accounting apparatus and method for SMT processor

Patent number: 8683474

Abstract: In an accounting apparatus, a conflict determination unit determines whether or not the accounting mode is in a conflict state where a process is executing in another logical CPU and stores the determination result in an accounting information storage unit, when a process of the user starts to be executed in a logical CPU of an SMT processor. And a CPU use time acquisition unit collects the CPU use time of the process in the conflict state or the non-conflict state distinctively and stores it in an accounting information storage unit. Thereafter, a CPU use time conversion unit converts the CPU use time in the conflict state, with a predetermined weighting, based on the CPU use time in the conflict state and the non-conflict state, after the end of executing the process, and an accounting calculation unit calculates the accounting amount for the process from an effective use time.

Type: Grant

Filed: February 27, 2006

Date of Patent: March 25, 2014

Assignee: Fujitsu Limited

Inventors: Shuji Yamamura, Kouichi Kumon
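
The accounting flow in the abstract above (collect CPU use time separately for the conflict and non-conflict states, convert the conflict-state time with a predetermined weighting, then charge on the effective use time) can be sketched as follows. The function names, the weight of 0.5, and the flat rate are all assumptions chosen for illustration; the abstract does not give the actual weighting or pricing rule.

```python
def effective_use_time(non_conflict_s, conflict_s, weight):
    """Combine the two measured times into one effective CPU use time.

    Time spent while another logical CPU was busy (the conflict state) is
    scaled by `weight`, an assumed predetermined factor below 1.
    """
    return non_conflict_s + weight * conflict_s


def accounting_amount(non_conflict_s, conflict_s, weight, rate_per_s):
    """Charge for a process at an assumed flat rate per effective second."""
    return rate_per_s * effective_use_time(non_conflict_s, conflict_s, weight)


# Hypothetical process: 10 s alone on the core, 4 s sharing it.
effective = effective_use_time(10.0, 4.0, weight=0.5)
amount = accounting_amount(10.0, 4.0, weight=0.5, rate_per_s=2.0)
```

With these example figures the 4 seconds of conflict-state time count as 2 effective seconds, so the process is charged for 12 seconds rather than 14.
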
Compiler for providing intrinsic supports for VLIW PAC processors with distributed register files and method thereof

Patent number: 8656376

Abstract: A method for providing intrinsic supports for a VLIW DSP processor with distributed register files comprises the steps of: generating a program representation with cluster information on instructions of the DSP processor, wherein the cluster information is provided by a program with cluster intrinsic coding; identifying data stream operations indicating parallel instruction sequences applied on different data sets in the program representation; identifying data sharing relations indicating data shared by the data stream operations in the program representation; identifying data aggregation relations indicating results aggregated from the data stream operations in the program representation; and performing register allocation for the DSP processor according to the identified data stream operations, the data sharing relations and the data aggregation relations.

Type: Grant

Filed: September 1, 2011

Date of Patent: February 18, 2014

Assignee: National Tsing Hua University

Inventors: Jenq Kuen Lee, Chi Bang Kuan
Packet draining from a scheduling hierarchy in a traffic manager of a network processor

Patent number: 8638805

Abstract: Described embodiments provide for restructuring a scheduling hierarchy of a network processor having a plurality of processing modules and a shared memory. The scheduling hierarchy schedules packets for transmission. The network processor generates tasks corresponding to each received packet associated with a data flow. A traffic manager receives tasks provided by one of the processing modules and determines a queue of the scheduling hierarchy corresponding to the task. The queue has a parent scheduler at each of one or more next levels of the scheduling hierarchy up to a root scheduler, forming a branch of the hierarchy. The traffic manager determines if the queue and one or more of the parent schedulers of the branch should be restructured. If so, the traffic manager drops subsequently received tasks for the branch, drains all tasks of the branch, and removes the corresponding nodes of the branch from the scheduling hierarchy.

Type: Grant

Filed: September 30, 2011

Date of Patent: January 28, 2014

Assignee: LSI Corporation

Inventors: Balakrishnan Sundararaman, Shashank Nemawarkar, David Sonnier, Shailendra Aulakh, Allen Vestal
Method and system for managing hardware resources to implement system functions using an adaptive computing architecture

Patent number: 8589660

Abstract: The present invention concerns a new category of integrated circuitry and a new methodology for adaptive or reconfigurable computing. The exemplary IC embodiment includes a plurality of heterogeneous computational elements coupled to an interconnection network. The plurality of heterogeneous computational elements include corresponding computational elements having fixed and differing architectures, such as fixed architectures for different functions such as memory, addition, multiplication, complex multiplication, subtraction, configuration, reconfiguration, control, input, output, and field programmability. In response to configuration information, the interconnection network is operative in real-time to configure and reconfigure the plurality of heterogeneous computational elements for a plurality of different functional modes, including linear algorithmic operations, non-linear algorithmic operations, finite state machine operations, memory operations, and bit-level manipulations.

Type: Grant

Filed: May 24, 2010

Date of Patent: November 19, 2013

Assignee: Altera Corporation

Inventors: Robert T. Plunkett, Ghobad Heidari, Paul L. Master
Dynamic load balancing of instructions for execution by heterogeneous processing engines

Patent number: 8578387

Abstract: An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a third type of program instructions can only be executed by a second type of processing engine. A second type of program instructions can be executed by the first and the second type of processing engines. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.

Type: Grant

Filed: July 31, 2007

Date of Patent: November 5, 2013

Assignee: Nvidia Corporation

Inventors: Peter C. Mills, Stuart F. Oberman, John Erik Lindholm, Samuel Liu
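
The assignment rule in the abstract above can be sketched as a small dispatcher: instructions of the first type go to the first engine, instructions of the third type go to the second engine, and instructions of the second type go to whichever engine currently has less work. The engine labels `"A"` and `"B"`, the integer instruction types, and the use of a simple instruction count as the load measure are assumptions for this sketch; the patent's assignment unit may balance on a different metric.

```python
def assign(instruction_types):
    """Assign each instruction to engine "A" or "B".

    Type 1 runs only on A, type 3 only on B, and type 2 on either engine.
    Type 2 goes to the engine with the smaller count so far (ties go to A).
    """
    load = {"A": 0, "B": 0}
    placement = []
    for kind in instruction_types:
        if kind == 1:
            engine = "A"
        elif kind == 3:
            engine = "B"
        elif kind == 2:
            engine = "A" if load["A"] <= load["B"] else "B"
        else:
            raise ValueError(f"unknown instruction type: {kind}")
        load[engine] += 1
        placement.append(engine)
    return placement, load


placement, load = assign([1, 1, 2, 3, 2, 2])
```

In this example the three flexible instructions are steered so that both engines end up with three instructions each.
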
Selectively isolating processor elements into subsets of processor elements

Patent number: 8532288

Abstract: A cryptographic engine for modulo N multiplication, which is structured as a plurality of almost identical, serially connected Processing Elements, is controlled so as to accept input in blocks that are smaller than the maximum capability of the engine in terms of bits multiplied at one time. The serially connected hardware is thus partitioned on the fly to process a variety of cryptographic key sizes while still maintaining all of the hardware in an active processing state.

Type: Grant

Filed: December 1, 2006

Date of Patent: September 10, 2013

Assignee: International Business Machines Corporation

Inventors: Camil Fayad, John K. Li, Siegfried K. H. Sutter, Phil C. Yeh
