Multimode (e.g., MIMD to SIMD, etc.) Patents (Class 712/20)
  • Patent number: 11922221
    Abstract: In accordance with an embodiment, described herein is a system and method for dependency analysis for a calculation script in a multidimensional database computing environment. A multidimensional database cube aggregation can be represented as a lattice of blocks or cube, arranged according to a database outline (e.g., intra-dimensional or member hierarchy). When the multidimensional database system performs computations in parallel for a given calculation script, portions of the cube that can be computed concurrently are identified.
    Type: Grant
    Filed: September 23, 2021
    Date of Patent: March 5, 2024
    Assignee: ORACLE INTERNATIONAL CORPORATION
    Inventors: Vinod Padinjat Menon, Kumar Ramaiyer
  • Patent number: 11860814
    Abstract: A scalable multi-stage hypercube-based interconnection network with deterministic communication between two or more processing elements (“PEs”) or processing cores (“PCs”) arranged in a 2D-grid using vertical and horizontal buses (i.e., each bus is one or more wires) is disclosed. In one embodiment the buses are connected in a pyramid network configuration. At each PE, the interconnection network comprises one or more switches (“interconnect”) with each switch concurrently capable of sending and receiving packets from one PE to another PE through the bus connected between them. Each packet comprises a data token, routing information such as source and destination addresses of PEs and other information. Each PE, in addition to interconnect, comprises a processor and/or memory. In one embodiment the processor is a Central Processing Unit (“CPU”) comprising functional units that perform operations such as additions, multiplications, or logical operations, for executing computer programs.
    Type: Grant
    Filed: November 1, 2021
    Date of Patent: January 2, 2024
    Assignee: Konda Technologies Inc.
    Inventor: Venkat Konda
  • Patent number: 11593167
    Abstract: Methods and systems for locking a cache line of a cache. A cache line is locked based on a count of a plurality of threads that access the cache line and maintained in the cache until all of the plurality of threads have loaded the cache line.
    Type: Grant
    Filed: May 9, 2019
    Date of Patent: February 28, 2023
    Assignee: International Business Machines Corporation
    Inventors: Changhoan Kim, John A. Gunnels
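The locking rule in the abstract of patent 11593167 above can be pictured with a small Python model. This is only a sketch: the class name `LockedCacheLine`, its fields, and the choice of a decrementing counter are assumptions made for illustration, since the abstract does not say how the thread count is tracked.

```python
class LockedCacheLine:
    """Toy model of a cache line that stays locked until every expected thread has loaded it."""

    def __init__(self, tag, data, thread_count):
        self.tag = tag
        self.data = data
        # Assumption: the lock is tracked as the number of threads still to load the line.
        self.remaining = thread_count

    @property
    def locked(self):
        # The line may not be evicted while any expected thread has yet to load it.
        return self.remaining > 0

    def load(self):
        # Each load by an expected thread brings the line one step closer to unlocking.
        if self.remaining > 0:
            self.remaining -= 1
        return self.data


line = LockedCacheLine(tag=0x40, data=b"payload", thread_count=3)
for _ in range(3):
    line.load()
```

After the third load in this example, `line.locked` is false and the model would allow the line to be evicted.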
  • Patent number: 11573705
    Abstract: The present disclosure includes apparatuses and methods related to memory with an artificial intelligence (AI) accelerator. An example apparatus can receive a command indicating that the apparatus operate in an artificial intelligence (AI) mode and perform AI operations using an AI accelerator based on a status of a number of registers on the controller. The AI accelerator can include hardware, software, and/or firmware that is configured to perform operations (e.g., logic operations, among other operations) associated with AI operations. The hardware can include circuitry configured as an adder and/or multiplier to perform operations, such as logic operations, associated with AI operations.
    Type: Grant
    Filed: August 28, 2019
    Date of Patent: February 7, 2023
    Assignee: Micron Technology, Inc.
    Inventor: Alberto Troia
  • Patent number: 11442734
    Abstract: A processor includes a first mode where the processor is not to use packed data operation masking, and a second mode where the processor is to use packed data operation masking. A decode unit to decode an unmasked packed data instruction for a given packed data operation in the first mode, and to decode a masked packed data instruction for a masked version of the given packed data operation in the second mode. The instructions have a same instruction length. The masked instruction has bit(s) to specify a mask. Execution unit(s) are coupled with the decode unit. The execution unit(s), in response to the decode unit decoding the unmasked instruction in the first mode, to perform the given packed data operation. The execution unit(s), in response to the decode unit decoding the masked instruction in the second mode, to perform the masked version of the given packed data operation.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: September 13, 2022
    Assignee: Intel Corporation
    Inventors: Bret L. Toll, Buford M. Guy, Ronak Singhal, Mishali Naik
  • Patent number: 11429506
    Abstract: A system is configured to track and store system and event data for various computing devices. The system is configured to associate the various computing devices with profiles based at least in part on characteristics of the computing devices. The system is further configured to compare performance data and/or performance metrics for particular computing devices having a particular profile against all other devices that share the particular profile. The system then displays this comparison to a user of the particular computing device, substantially automatically diagnoses an issue with the particular computing device based on the performance and system event data, and/or enables the user to diagnose the problem based on the performance and system event data.
    Type: Grant
    Filed: November 19, 2020
    Date of Patent: August 30, 2022
    Assignee: Assurant, Inc.
    Inventors: Dustin Brewer, Stuart Saunders, Cameron Hurst
  • Patent number: 11294673
    Abstract: A method is provided that includes performing, by a processor in response to a dual issue multiply instruction, multiplication of operands of the dual issue multiply instruction using multiplication units comprised in a data path of the processor and configured to operate together to determine a product of the operands, and storing, by the processor, the product in a storage location indicated by the dual issue multiply instruction.
    Type: Grant
    Filed: May 20, 2020
    Date of Patent: April 5, 2022
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Timothy David Anderson, Mujibur Rahman
  • Patent number: 11023231
    Abstract: Disclosed embodiments relate to executing a vector-complex fused multiply-add Instruction. In one example, a method includes fetching an instruction, a format of the instruction including an opcode, a first source operand identifier, a second source operand identifier, and a destination operand identifier, wherein each of the identifiers identifies a location storing a packed data comprising at least one complex number, decoding the instruction, retrieving data associated with the first and second source operand identifiers, and executing the decoded instruction to, for each packed data element position of the identified first and second source operands, cross-multiply the real and imaginary components to generate four products: a product of real components, a product of imaginary components, and two mixed products, generate a complex result by using the four products according to the instruction, and store a result to the corresponding position of the identified destination operand.
    Type: Grant
    Filed: October 1, 2016
    Date of Patent: June 1, 2021
    Assignee: Intel Corporation
    Inventors: Roman S. Dubtsov, Robert Valentine, Jesus Corbal, Milind Girkar, Elmoustapha Ould-Ahmed-Vall
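The four-product step in the abstract of patent 11023231 above can be sketched in Python as follows. The function names and the use of `(real, imag)` tuples are illustrative assumptions. So is the way the products are combined: the abstract only says the result is formed "according to the instruction", and the sketch assumes an ordinary complex multiply followed by an add of an accumulator value.

```python
def complex_fma(a, b, c):
    """Return a*b + c for (real, imag) tuples, built from four real products."""
    ar, ai = a
    br, bi = b
    cr, ci = c
    rr = ar * br  # product of real components
    ii = ai * bi  # product of imaginary components
    ri = ar * bi  # first mixed product
    ir = ai * br  # second mixed product
    # Assumed combination: standard complex multiplication plus the accumulator.
    return (rr - ii + cr, ri + ir + ci)


def packed_complex_fma(src1, src2, acc):
    """Apply complex_fma to each packed element position."""
    return [complex_fma(a, b, c) for a, b, c in zip(src1, src2, acc)]
```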
  • Patent number: 10970043
    Abstract: An integrated circuit including a data architecture including N adders and N multipliers configured to receive operands. The data architecture receives instructions for selecting a data flow between the N multipliers and the N adders of the data architecture. The selected data flow includes the options: (1) a first data flow using the N multipliers and the N adders to provide a multiply-accumulate mode and (2) a second data flow to provide a multiply-reduce mode.
    Type: Grant
    Filed: May 28, 2020
    Date of Patent: April 6, 2021
    Assignee: ALIBABA GROUP HOLDING LIMITED
    Inventors: Liang Han, Xiaowei Jiang
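The two data flows named in the abstract of patent 10970043 above can be contrasted with a short Python sketch. The abstract does not define either mode in detail, so the element-wise interpretation of multiply-accumulate and the sum-of-products interpretation of multiply-reduce are assumptions, as are the function names.

```python
def multiply_accumulate(a, b, acc):
    """Assumed multiply-accumulate flow: each multiplier feeds its own adder."""
    return [x * y + c for x, y, c in zip(a, b, acc)]


def multiply_reduce(a, b):
    """Assumed multiply-reduce flow: the adders combine all N products into one value."""
    total = 0
    for x, y in zip(a, b):
        total += x * y
    return total


def run(mode, a, b, acc=None):
    """Select a data flow by mode, standing in for the selecting instruction."""
    if mode == "accumulate":
        return multiply_accumulate(a, b, acc)
    if mode == "reduce":
        return multiply_reduce(a, b)
    raise ValueError(f"unknown mode: {mode}")
```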
  • Patent number: 10872022
    Abstract: A system is configured to track and store system and event data for various computing devices. The system is configured to associate the various computing devices with profiles based at least in part on characteristics of the computing devices. The system is further configured to compare performance data and/or performance metrics for particular computing devices having a particular profile against all other devices that share the particular profile. The system then displays this comparison to a user of the particular computing device, substantially automatically diagnoses an issue with the particular computing device based on the performance and system event data, and/or enables the user to diagnose the problem based on the performance and system event data.
    Type: Grant
    Filed: August 21, 2018
    Date of Patent: December 22, 2020
    Assignee: Assurant, Inc.
    Inventors: Dustin Brewer, Stuart Saunders, Cameron Hurst
  • Patent number: 10754652
    Abstract: A processor includes: an address generating unit that, when an instruction decoded by a decoding unit is an instruction to execute arithmetic processing on a plurality of operand sets each including a plurality of operands that are objects of the arithmetic processing, in parallel a plurality of times, generates an address set corresponding to each of the operand sets of the arithmetic processing for each time, based on a certain address displacement with respect to the plurality of operands included in each of the operand sets; a plurality of instruction queues that hold the generated address sets corresponding to the respective operand sets, in correspondence to respective processing units; and a plurality of processing units that perform the arithmetic processing in parallel on the operand sets obtained based on the respective address sets outputted by the plurality of instruction queues.
    Type: Grant
    Filed: May 26, 2017
    Date of Patent: August 25, 2020
    Assignee: FUJITSU LIMITED
    Inventors: Shuji Yamamura, Takumi Maruyama, Masato Nakagawa, Masahiro Kuramoto
  • Patent number: 10564949
    Abstract: In a system for automatic generation of event-driven, tuple-space based programs from a sequential specification, a hierarchical mapping solution can target different runtimes relying on event-driven tasks (EDTs). The solution uses loop types to encode short, transitive relations among EDTs that can be evaluated efficiently at runtime. Specifically, permutable loops translate immediately into conservative point-to-point synchronizations of distance one. A runtime-agnostic which can be used to target the transformed code to different runtimes.
    Type: Grant
    Filed: September 22, 2014
    Date of Patent: February 18, 2020
    Assignee: Reservoir Labs, Inc.
    Inventors: Muthu M. Baskaran, Thomas Henretty, M. H. Langston, Richard A. Lethin, Benoit J. Meister, Nicolas T. Vasilache, David E. Wohlford
  • Patent number: 10521395
    Abstract: Systems and methods include an integrated circuit that includes a plurality of computing tiles, wherein each of the plurality of computing tiles includes: a matrix multiply accelerator, a computing processing circuit; and a flow scoreboard module; a local data buffer, wherein the plurality of computing tiles together define an intelligence processing array; a network-on-chip system comprising: a plurality of network-on-chip routers establishing a communication network among the plurality of computing tiles, wherein each network-on-chip router is in operable communication connection with at least one of the plurality of computing tiles and a distinct network-on-chip router of the plurality of network-on-chip routers; and an off-tile buffer that is arranged in remote communication with the plurality of computing tiles, wherein the off-tile buffer stores raw input data and/or data received from an upstream process or an upstream device.
    Type: Grant
    Filed: July 1, 2019
    Date of Patent: December 31, 2019
    Assignee: Mythic, Inc.
    Inventors: David Fick, Malav Parikh, Paul Toth, Adam Caughron, Vimal Reddy, Erik Schlanger, Sergio Schuler, Zainab Nasreen Zaidi, Alex Dang-Tran, Raul Garibay, Bryant Sorensen
  • Patent number: 10338918
    Abstract: A Vector Galois Field Multiply Sum and Accumulate instruction. Each element of a second operand of the instruction is multiplied in a Galois field with the corresponding element of the third operand to provide one or more products. The one or more products are exclusively ORed with each other and exclusively ORed with a corresponding element of a fourth operand of the instruction. The results are placed in a selected operand.
    Type: Grant
    Filed: June 5, 2017
    Date of Patent: July 2, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Jonathan D. Bradbury
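The operation in the abstract of patent 10338918 above can be approximated in Python. This sketch makes several assumptions that go beyond the abstract: multiplication is treated as carry-less polynomial multiplication over GF(2) with no modular reduction, adjacent pairs of products are combined into one result element, and the names `clmul` and `gf_multiply_sum_accumulate` are hypothetical.

```python
def clmul(a, b):
    """Carry-less multiplication of two non-negative integers (polynomials over GF(2))."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2) is XOR
        a <<= 1
        b >>= 1
    return result


def gf_multiply_sum_accumulate(second, third, fourth):
    """XOR pairs of products together and with the matching accumulator element."""
    out = []
    for i, acc in enumerate(fourth):
        # Assumption: elements 2i and 2i+1 contribute to result element i.
        even = clmul(second[2 * i], third[2 * i])
        odd = clmul(second[2 * i + 1], third[2 * i + 1])
        out.append(even ^ odd ^ acc)
    return out
```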
  • Patent number: 10310894
    Abstract: Provided is a method for generating configuration information of a dynamic reconfigurable processor. The dynamic reconfigurable processor includes a processing unit array, and the processing unit array includes a plurality of processing units. The method includes steps of: reading information of a task to be executed and generating an array configuration information top of the processing unit array according to the information; generating a plurality of processing unit configuration information corresponding to the plurality of processing units respectively according to the information; and assembling the array configuration information top and the plurality of processing unit configuration information.
    Type: Grant
    Filed: June 16, 2014
    Date of Patent: June 4, 2019
    Assignee: TSINGHUA UNIVERSITY
    Inventors: Leibo Liu, Yansheng Wang, Guiqiang Peng, Zhaoshi Li, Shouyi Yin, Shaojun Wei
  • Patent number: 10296620
    Abstract: A stream application receives a stream of tuples to be processed by a plurality of processing elements that are operating on one or more compute nodes. Each processing element has one or more stream operators. The stream application assigns one or more processing cycles to software code embedded in a tuple of the stream of tuples. The tuple obtains a first status of one or more first tuples of a set of targeted tuples to be modified by a tuple modification of a stream operator. The tuple obtains a second status of one or more second tuples of the set of targeted tuples after the stream operator performs the tuple modification. The tuple determines a potential degradation based on the first status and the second status. The tuple alters the one or more first tuples to prevent the tuple modification in response to the determined potential degradation.
    Type: Grant
    Filed: September 30, 2015
    Date of Patent: May 21, 2019
    Assignee: International Business Machines Corporation
    Inventors: Bin Cao, Jessica R. Eidem, Brian R. Muras, Jingdong Sun
  • Patent number: 10175981
    Abstract: The vector data path is divided into smaller vector lanes. The number of active vector lanes is controllable on the fly by the programmer to match the requirements of the executing program, and inactive vector lanes are powered down by the CPU to increase power efficiency of the vector processor.
    Type: Grant
    Filed: July 9, 2014
    Date of Patent: January 8, 2019
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Timothy David Anderson, Duc Quang Bui
  • Patent number: 10146534
    Abstract: A Vector Galois Field Multiply Sum and Accumulate instruction. Each element of a second operand of the instruction is multiplied in a Galois field with the corresponding element of the third operand to provide one or more products. The one or more products are exclusively ORed with each other and exclusively ORed with a corresponding element of a fourth operand of the instruction. The results are placed in a selected operand.
    Type: Grant
    Filed: October 6, 2016
    Date of Patent: December 4, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Jonathan D. Bradbury
  • Patent number: 10120833
    Abstract: Embodiments include a processor capable of supporting multi-mode and corresponding methods. The processor includes front end units, a number of processing elements more than a number of the front end units; and a controller configured to determine if thread divergence occurs due to conditional branching. If there is thread divergence, the processor may set control information to control processing elements using currently activated front end units. If there is not, the processor may set control information to control processing elements using a currently activated front end unit.
    Type: Grant
    Filed: January 28, 2014
    Date of Patent: November 6, 2018
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Woong Seo, Yeon-Gon Cho, Soo-Jung Ryu
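The mode decision in the abstract of patent 10120833 above can be illustrated with a small Python sketch. It assumes that divergence is detected by comparing per-element branch outcomes and that one front end unit is activated per distinct outcome. The function name and the dictionary result are hypothetical and are not taken from the patent.

```python
def assign_front_ends(branch_taken):
    """Map front end unit ids to the processing elements (PEs) they control.

    branch_taken[i] is the outcome of the conditional branch on PE i.
    """
    groups = {}
    for pe, taken in enumerate(branch_taken):
        # PEs that follow the same path can share one front end unit.
        groups.setdefault(taken, []).append(pe)
    # One group means no divergence (a single front end drives every PE);
    # more than one group means each path gets its own front end.
    return dict(enumerate(groups.values()))


no_divergence = assign_front_ends([True, True, True, True])
divergence = assign_front_ends([True, False, True, False])
```

In this sketch, `no_divergence` has a single entry covering all four PEs, while `divergence` splits them across two front end units.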
  • Patent number: 10095847
    Abstract: Unauthorized use of computer programs is made difficult by compiling a processor rather than just compiling a program into machine code. The way in which the processor should respond to machine instructions, i.e. its translation data, is computed from an arbitrary bit string B and a program P as inputs. The translation data of a processor are computed that will execute operations defined by the program P when the processor uses the given bit string B as a source of machine instructions. A processor is configured so that it will execute machine instructions according to said translation data. Other programs P′ may then be compiled into machine instructions B′ for that processor and executed by the processor. Without knowledge of the bit string B and the original program P it is difficult to modify the machine instructions B′ so that a different processor will execute the other program P′.
    Type: Grant
    Filed: May 17, 2013
    Date of Patent: October 9, 2018
    Assignee: KONINKLIJKE PHILIPS N.V.
    Inventor: Willem Charles Mallon
  • Patent number: 10095515
    Abstract: A technique for decoding an instruction in a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.
    Type: Grant
    Filed: February 13, 2017
    Date of Patent: October 9, 2018
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Doron Orenstein, Bret L. Toll
  • Patent number: 9898283
    Abstract: A method of an aspect includes receiving an instruction. The instruction indicates an integer stride, indicates an integer offset, and indicates a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four integers in numerical order with a smallest one of the at least four integers differing from zero by the integer offset and with all integers of the sequence in consecutive positions differing by the integer stride. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: February 20, 2018
    Assignee: Intel Corporation
    Inventors: Seth Abraham, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Zeev Sperber, Amit Gradstein
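The result described in the abstract of patent 9898283 above can be reproduced with a few lines of Python. The function name `stride_sequence` and the `count` parameter are assumptions for illustration, and the sketch assumes a non-negative stride so that the first element is the smallest.

```python
def stride_sequence(offset, stride, count=4):
    """Return `count` integers starting at `offset`, with consecutive values `stride` apart."""
    if count < 4:
        raise ValueError("the abstract describes a sequence of at least four integers")
    if stride < 0:
        raise ValueError("this sketch assumes a non-negative stride")
    return [offset + i * stride for i in range(count)]


lanes = stride_sequence(offset=3, stride=2, count=8)
```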
  • Patent number: 9832543
    Abstract: A system and method for processing a plurality of channels, for example audio channels, in parallel is provided. For example, a plurality of telephony channels are processed in order to detect and respond to call progress tones. The channels may be processed according to a common transform algorithm. Advantageously, a massively parallel architecture is employed, in which operations on many channels are synchronized, to achieve a high efficiency parallel processing environment. The parallel processor may be situated on a data bus, separate from a main general purpose processor, or integrated with the processor in a common board or integrated device. All, or a portion of a speech processing algorithm may also be performed in a massively parallel manner.
    Type: Grant
    Filed: June 16, 2014
    Date of Patent: November 28, 2017
    Assignee: Calltrol Corporation
    Inventor: Wai Wu
  • Patent number: 9798305
    Abstract: A calculation device includes a plurality of calculation processing units configured to perform different processes with each other, a plurality of calculators configured to perform a same calculation, and a control unit configured to control a number of the calculators to be operated during each of a plurality of divided periods based on a length of a predetermined processing period and a number of calculations to be performed, such that a number of data which is equal to a number of calculations is processed within a predetermined processing period, and that the number of the calculators to be operated during each of the plurality of divided periods is averaged, the divided periods being obtained by dividing up the predetermined processing period.
    Type: Grant
    Filed: November 10, 2014
    Date of Patent: October 24, 2017
    Assignee: OLYMPUS CORPORATION
    Inventors: Kazue Chida, Akira Ueno
  • Patent number: 9747251
    Abstract: A method is disclosed for the decoding and encoding of a block-based video bit-stream such as MPEG2, H.264-AVC, VC1, or VP6 using a system containing one or more high speed sequential processors, a homogenous array of software configurable general purpose parallel processors, and a high speed memory system to transfer data between processors or processor sets. This disclosure includes a method for load balancing between the two sets of processors.
    Type: Grant
    Filed: December 7, 2011
    Date of Patent: August 29, 2017
    Assignee: AMAZON TECHNOLOGIES, INC.
    Inventors: Jesse J. Rosenzweig, Brian Gregory Lewis
  • Patent number: 9547530
    Abstract: A data processing apparatus has processing circuitry for processing threads each having thread state data. The threads may be processed in thread groups, with each thread group comprising a number of threads processed in parallel with a common program executed for each thread. Several thread state storage regions are provided with a fixed number of thread state entries for storing thread state data for a corresponding thread. At least two of the storage regions have different fixed numbers of entries. The processing circuitry processes as the same thread group threads having thread state data stored in the same storage region and processes threads having thread state data stored in different storage regions as different thread groups.
    Type: Grant
    Filed: November 1, 2013
    Date of Patent: January 17, 2017
    Assignee: ARM Limited
    Inventor: David Hennah Mansell
  • Patent number: 9495131
    Abstract: To add floating point numbers in a parallel computing system, a collective logic device receives the floating point numbers from computing nodes. The collective logic device converts the floating point numbers to integer numbers. The collective logic device adds the integer numbers and generates a summation of the integer numbers. The collective logic device converts the summation to a floating point number. The collective logic device performs the receiving, the converting the floating point numbers, the adding, the generating and the converting the summation in one pass. One pass indicates that the computing nodes send inputs only once to the collective logic device and receive outputs only once from the collective logic device.
    Type: Grant
    Filed: March 9, 2015
    Date of Patent: November 15, 2016
    Assignee: International Business Machines Corporation
    Inventors: Dong Chen, Noel A. Eisley, Philip Heidelberger, Burkhard Steinmacher-Burow
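The convert, add, convert-back sequence in the abstract of patent 9495131 above can be sketched in Python with fixed-point integers. The abstract does not say which integer representation is used, so the 40 fractional bits and the helper names here are assumptions. One property the sketch does show is that integer addition gives the same total regardless of the order in which the inputs arrive.

```python
FRACTION_BITS = 40  # assumed fixed-point precision, not taken from the patent


def to_fixed(x):
    """Convert a float to a fixed-point integer with FRACTION_BITS fractional bits."""
    return round(x * (1 << FRACTION_BITS))


def from_fixed(n):
    """Convert a fixed-point integer back to a float."""
    return n / (1 << FRACTION_BITS)


def collective_sum(values):
    """Sum floats by converting each to an integer, adding, and converting once at the end."""
    total = 0
    for v in values:
        total += to_fixed(v)  # integer addition is exact and order-independent
    return from_fixed(total)
```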
  • Patent number: 9471308
    Abstract: A Vector Floating Point Test Data Class Immediate instruction is provided that determines whether one or more elements of a vector specified in the instruction are of one or more selected classes and signs. If a vector element is of a selected class and sign, an element in an operand of the instruction corresponding to the vector element is set to a first defined value, and if the vector element is not of the selected class and sign, the operand element corresponding to the vector element is set to a second defined value.
    Type: Grant
    Filed: January 23, 2013
    Date of Patent: October 18, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Eric M. Schwarz
  • Patent number: 9354893
    Abstract: Provided is an information processing device including an instruction cache, a data cache, first and second arithmetic unit groups including a plurality of arithmetic units capable of parallel operation, a first arithmetic-control circuit that generates one or more operation instructions for the first arithmetic unit group, and a second arithmetic-control circuit that generates one or more operation instructions for the second arithmetic unit group based on an instruction code of a fixed instruction register. The first arithmetic unit group sets the instruction code to the fixed instruction register according to an operation instruction generated based on a first specific instruction code by the first arithmetic-control circuit, and provides data to the second arithmetic unit group according to an operation instruction generated based on a second specific instruction code by the first arithmetic-control circuit.
    Type: Grant
    Filed: May 29, 2012
    Date of Patent: May 31, 2016
    Assignee: Renesas Electronics Corporation
    Inventors: Yuki Kobayashi, Shohei Nomoto
  • Patent number: 9317290
    Abstract: Circuits, methods, and apparatus that provide parallel execution relationships to be included in a function call or other appropriate portion of a command or instruction in a sequential programming language. One example provides a token-based method of expressing parallel execution relationships. Each process that can be executed in parallel is given a separate token. Later processes that depend on earlier processes wait to receive the appropriate token before being executed. In another example, counters are used in place of tokens to determine when a process is completed. Each function is a number of individual functions or threads, where each thread performs the same operation on a different piece of data. A counter is used to track the number of threads that have been executed. When each thread in the function has been executed, a later function that relies on data generated by the earlier function may be executed.
    Type: Grant
    Filed: January 7, 2013
    Date of Patent: April 19, 2016
    Assignee: NVIDIA Corporation
    Inventors: Ian A. Buck, Bastiaan Aarts
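The counter-based variant in the abstract of patent 9317290 above can be mimicked with Python threads. This is a software sketch under stated assumptions, not the circuit the patent describes: the `CompletionCounter` class, the squaring workload, and the use of `threading.Event` to release the later function are all illustrative choices.

```python
import threading


class CompletionCounter:
    """Counts finished threads and signals when all expected threads are done."""

    def __init__(self, expected):
        self.expected = expected
        self.count = 0
        self._lock = threading.Lock()
        self.done = threading.Event()
        if expected == 0:
            self.done.set()  # nothing to wait for

    def thread_finished(self):
        with self._lock:
            self.count += 1
            if self.count == self.expected:
                self.done.set()


def run_dependent(data):
    """Earlier function: square each item in its own thread. Later function: sum the results."""
    squared = [None] * len(data)
    counter = CompletionCounter(len(data))

    def worker(i):
        squared[i] = data[i] * data[i]  # same operation, different piece of data
        counter.thread_finished()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(data))]
    for t in threads:
        t.start()
    counter.done.wait()  # the later function waits on the counter
    result = sum(squared)
    for t in threads:
        t.join()
    return result
```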
  • Patent number: 9251207
    Abstract: A query that identifies an input data source is rewritten to contain data parallel operations that include partitioning and merging. The input data source is partitioned into a plurality of initial partitions. A parallel repartitioning operation is performed on the initial partitions to generate a plurality of secondary partitions. A parallel execution of the query is performed using the secondary partitions to generate a plurality of output sets. The plurality of output sets are merged into a merged output set.
    Type: Grant
    Filed: November 29, 2007
    Date of Patent: February 2, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: John Duffy, Edward G. Essey, Charles D. Callahan, II
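The partition, repartition, execute, and merge stages in the abstract of patent 9251207 above can be outlined in Python. The sketch assumes round-robin initial partitioning, key-based repartitioning, and a thread pool for the parallel steps. None of these choices comes from the patent, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_by_key(part, n, key):
    """Split one initial partition into n buckets according to a key."""
    buckets = [[] for _ in range(n)]
    for item in part:
        buckets[hash(key(item)) % n].append(item)
    return buckets


def parallel_query(data, n, key, query):
    """Run `query` over repartitioned data and merge the outputs into one list."""
    # Initial partitions (assumed round-robin split of the input source).
    initial = [data[i::n] for i in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        # Repartition each initial partition in parallel.
        split = list(pool.map(lambda p: split_by_key(p, n, key), initial))
        # Secondary partition j gathers bucket j from every initial partition.
        secondary = [[x for s in split for x in s[j]] for j in range(n)]
        # Execute the query on each secondary partition in parallel.
        outputs = list(pool.map(query, secondary))
    # Merge the output sets.
    return [x for out in outputs for x in out]
```

For example, calling `parallel_query(list(range(10)), 3, lambda x: x, even_squares)` with a hypothetical `even_squares` function that squares the even items of a partition returns the squares of 0, 2, 4, 6, and 8 in partition order rather than input order.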
  • Patent number: 9246792
    Abstract: Methods, apparatus, and products are disclosed for providing point to point data communications among compute nodes in a global combining network of a parallel computer that include: determining a class route identifier available for all of the nodes along a communications path from an origin node to a target node; configuring network hardware of each node along the communications path with routing instructions in dependence upon the available class route identifier and the network's topology; transmitting, by the origin node along the communications path, a network packet to the target node, including encoding the available class route identifier in the network packet; and routing, by the network hardware of each node along the communications path, the network packet to the target node in dependence upon the routing instructions for each node and the available class route identifier.
    Type: Grant
    Filed: April 5, 2012
    Date of Patent: January 26, 2016
    Assignee: International Business Machines Corporation
    Inventors: Charles J. Archer, Ahmad A. Faraj, Todd A. Inglett
  • Patent number: 9225547
    Abstract: A data processing apparatus can reduce an occupancy rate of a ring bus by suppressing occurrence of a stall packet, and can change a processing sequence. In the data processing apparatus, a buffer is provided in each communication unit connecting the ring bus and the associated processing unit. Transfer of data from the communication unit to the processing unit is controlled by an enable signal. Consequently, occurrence of a stall packet is suppressed. Accordingly, frequency of occurrence of a deadlock state is reduced by decreasing the occupancy rate of the ring bus.
    Type: Grant
    Filed: March 15, 2010
    Date of Patent: December 29, 2015
    Assignee: Canon Kabushiki Kaisha
    Inventors: Yuji Hara, Hisashi Ishikawa, Akinobu Mori, Takeo Kimura, Hirowo Inoue
  • Patent number: 9218222
    Abstract: A computer device with synchronization barrier including a memory and a processing unit capable of multiprocess processing on various processors and enabling the parallel execution of blocks by processes, the blocks being associated by groups in successive work steps. The device further includes a hardware circuit with a usable address space to the memory, capable of receiving a call from each process indicating the end of execution of a current block, each call comprising data. The hardware circuit is arranged to authorize the execution of blocks of a later work step when all the blocks of the current work step have been executed. The accessibility to the address space is achieved by segments drawn from the data of each call.
    Type: Grant
    Filed: November 27, 2009
    Date of Patent: December 22, 2015
    Assignee: BULL SAS
    Inventors: Angelo Solinas, Jordan Chicheportiche, Saïd Derradji, Jean-Jacques Pairault, Zoltan Menyhart, Sylvain Jeaugey, Philippe Couvee
  • Patent number: 9203865
    Abstract: A computing device may be joined to a cluster by discovering the device, determining whether the device is eligible to join the cluster, configuring the device, and assigning the device a cluster role. A device may be assigned to act as a cluster master, backup master, active device, standby device, or another role. The cluster master may be configured to assign tasks, such as network flow processing to the cluster devices. The cluster master and backup master may maintain global, run-time synchronization data pertaining to each of the network flows, shared resources, cluster configuration, and the like. The devices within the cluster may monitor one another. Monitoring may include transmitting status messages comprising indicators of device health to the other devices in the cluster. In the event a device satisfies failover conditions, a failover operation to replace the device with another standby device, may be performed.
    Type: Grant
    Filed: March 4, 2013
    Date of Patent: December 1, 2015
    Assignee: WATCHGUARD TECHNOLOGIES, INC.
    Inventors: Thomas Linden, James Huang, Jeff Hsu, Ming-Jeng Lee
  • Patent number: 9094317
    Abstract: A first processor has a processor port for peer-to-peer processor communications. A switch provides for switching communications from a path between said first processor and a second processor to a path between said first processor and a third processor (and vice-versa).
    Type: Grant
    Filed: June 18, 2009
    Date of Patent: July 28, 2015
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Martin Goldstein, Kamran H. Casim, Loren M. Koehler
  • Patent number: 9063722
    Abstract: A control processor is used for fetching and distributing single instruction multiple data (SIMD) instructions to a plurality of processing elements (PEs). One of the SIMD instructions is a thread start (Tstart) instruction, which causes the control processor to pause its instruction fetching. A local PE instruction memory (PE Imem) is associated with each PE and contains local PE instructions for execution on the local PE. Local PE Imem fetch, decode, and execute logic are associated with each PE. Instruction path selection logic in each PE is used to select between control processor distributed instructions and local PE instructions fetched from the local PE Imem. Each PE is also initialized to receive control processor distributed instructions. In addition, local hold generation logic is associated with each PE. A PE receiving a Tstart instruction causes the instruction path selection logic to switch to fetch local PE Imem instructions.
    Type: Grant
    Filed: December 21, 2011
    Date of Patent: June 23, 2015
    Assignee: Altera Corporation
    Inventors: Gerald George Pechanek, Edwin Franklin Barry, Mihailo Stojancic
  • Publication number: 20150143081
    Abstract: Embodiments include a processor capable of supporting multi-mode and corresponding methods. The processor includes front end units, a number of processing elements more than a number of the front end units; and a controller configured to determine if thread divergence occurs due to conditional branching. If there is thread divergence, the processor may set control information to control processing elements using currently activated front end units. If there is not, the processor may set control information to control processing elements using a currently activated front end unit.
    Type: Application
    Filed: January 27, 2015
    Publication date: May 21, 2015
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Woong SEO, Yeon-Gon CHO, Soo-Jung RYU
  • Publication number: 20150100757
    Abstract: Functional units disposed in one or more processor cores are communicatively coupled using both a shared bypass network and a switched network. The shared bypass network enables the functional units to be operated conventionally for general processing while the switched network enables specialized processing in which the functional units are configured as a spatial array. In the spatial array configuration, operands produced by one functional unit can only be sent to a subset of functional units to which dependent instructions have been mapped a priori. The functional units may be dynamically reconfigured at runtime to toggle between operating in the general configuration and operating as the spatial array. Information to control the toggling between operating configurations may be provided in instructions received by the functional units.
    Type: Application
    Filed: April 14, 2014
    Publication date: April 9, 2015
    Applicant: Microsoft Corporation
    Inventors: Douglas C. Burger, Aaron Smith, Milovan Duric
  • Patent number: 8918624
    Abstract: A method, computer program product and computer system for scaling and managing requests on a massively parallel machine, such as one running in MIMD mode on a SIMD machine. A submit mux (multiplexer) is used to federate work requests and to forward the requests to the management node. A resource arbiter receives and manages these work requests. A MIMD job controller works with the resource arbiter to manage the work requests on the SIMD partition. The SIMD partition may utilize a mux of its own to federate the work requests and the computer nodes. Instructions are also provided to control and monitor the work requests.
    Type: Grant
    Filed: May 15, 2008
    Date of Patent: December 23, 2014
    Assignee: International Business Machines Corporation
    Inventors: Paul V. Allen, Thomas A. Budnik, Mark G. Megerian, Samuel J. Miller
  • Publication number: 20140331025
    Abstract: A reconfigurable processor and an operation method thereof are provided. The reconfigurable processor may include: a controller configured to control operations of a first mode, in which a first portion of a program that does not utilize loop acceleration is processed, and a second mode, in which a second portion of the program that utilizes the loop acceleration is processed, based on whether an instruction to control parallel operations of the first mode and the second mode is executed; and a shared register file configured to transfer data between the first mode and the second mode.
    Type: Application
    Filed: May 5, 2014
    Publication date: November 6, 2014
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ki-Seok KWON, Suk-Jin KIM
  • Patent number: 8843928
    Abstract: A method and system of efficient use and programming of a multi-processing core device. The system includes a programming construct that is based on stream-domain code. A programmable core based computing device is disclosed. The computing device includes a plurality of processing cores coupled to each other. A memory stores stream-domain code including a stream defining a stream destination module and a stream source module. The stream source module places data values in the stream and the stream conveys data values from the stream source module to the stream destination module. A runtime system detects when the data values are available to the stream destination module and schedules the stream destination module for execution on one of the plurality of processing cores.
    Type: Grant
    Filed: January 21, 2011
    Date of Patent: September 23, 2014
    Assignee: QST Holdings, LLC
    Inventors: Paul Master, Frederick Furtek
  • Patent number: 8719551
    Abstract: The present invention provides an information processing apparatus and an integrated circuit which realize parallel execution of different processing systems, and which do not require the provision of a dedicated memory storing instructions for common processing. The information processing apparatus comprises: a plurality of processor elements; an instruction memory storing a first program and a second program; and an arbiter interposed between the processor elements and the instruction memory, the arbiter receiving, from each of the processor elements, a request for an instruction, from among instructions included in the first program and the second program, and controlling access to the instruction memory by the processor elements, wherein the arbiter arbitrates requests made by the processor elements when the requests are (i) simultaneous requests for different instructions included in one of the first program and the second program or (ii) simultaneous requests for an instruction included in the first prog
    Type: Grant
    Filed: April 15, 2010
    Date of Patent: May 6, 2014
    Assignee: Panasonic Corporation
    Inventor: Hideshi Nishida
  • Patent number: 8687008
    Abstract: A latency tolerant system for executing video processing operations. The system includes a host interface for implementing communication between the video processor and a host CPU, a scalar execution unit coupled to the host interface and configured to execute scalar video processing operations, and a vector execution unit coupled to the host interface and configured to execute vector video processing operations. A command FIFO is included for enabling the vector execution unit to operate on a demand driven basis by accessing the memory command FIFO. A memory interface is included for implementing communication between the video processor and a frame buffer memory. A DMA engine is built into the memory interface for implementing DMA transfers between a plurality of different memory locations and for loading the command FIFO with data and instructions for the vector execution unit.
    Type: Grant
    Filed: November 4, 2005
    Date of Patent: April 1, 2014
    Assignee: NVIDIA Corporation
    Inventors: Ashish Karandikar, Shirish Gadre, Stephen D. Lew
  • Patent number: 8683474
    Abstract: In an accounting apparatus, a conflict determination unit determines whether or not the accounting mode is in a conflict state where a process is executing in another logical CPU and stores the determination result in an accounting information storage unit, when a process of the user starts to be executed in a logical CPU of an SMT processor. A CPU use time acquisition unit collects the CPU use time of the process in the conflict state or the non-conflict state distinctively and stores it in an accounting information storage unit. Thereafter, a CPU use time conversion unit converts the CPU use time in the conflict state, with a predetermined weighting, based on the CPU use time in the conflict state and the non-conflict state, after the end of executing the process, and an accounting calculation unit calculates the accounting amount for the process from an effective use time.
    Type: Grant
    Filed: February 27, 2006
    Date of Patent: March 25, 2014
    Assignee: Fujitsu Limited
    Inventors: Shuji Yamamura, Kouichi Kumon
  • Patent number: 8656376
    Abstract: A method for providing intrinsic supports for a VLIW DSP processor with distributed register files comprises the steps of: generating a program representation with cluster information on instructions of the DSP processor, wherein the cluster information is provided by a program with cluster intrinsic coding; identifying data stream operations indicating parallel instruction sequences applied on different data sets in the program representation; identifying data sharing relations indicating data shared by the data stream operations in the program representation; identifying data aggregation relations indicating results aggregated from the data stream operations in the program representation; and performing register allocation for the DSP processor according to the identified data stream operations, the data sharing relations and the data aggregation relations.
    Type: Grant
    Filed: September 1, 2011
    Date of Patent: February 18, 2014
    Assignee: National Tsing Hua University
    Inventors: Jenq Kuen Lee, Chi Bang Kuan
  • Patent number: 8638805
    Abstract: Described embodiments provide for restructuring a scheduling hierarchy of a network processor having a plurality of processing modules and a shared memory. The scheduling hierarchy schedules packets for transmission. The network processor generates tasks corresponding to each received packet associated with a data flow. A traffic manager receives tasks provided by one of the processing modules and determines a queue of the scheduling hierarchy corresponding to the task. The queue has a parent scheduler at each of one or more next levels of the scheduling hierarchy up to a root scheduler, forming a branch of the hierarchy. The traffic manager determines if the queue and one or more of the parent schedulers of the branch should be restructured. If so, the traffic manager drops subsequently received tasks for the branch, drains all tasks of the branch, and removes the corresponding nodes of the branch from the scheduling hierarchy.
    Type: Grant
    Filed: September 30, 2011
    Date of Patent: January 28, 2014
    Assignee: LSI Corporation
    Inventors: Balakrishnan Sundararaman, Shashank Nemawarkar, David Sonnier, Shailendra Aulakh, Allen Vestal
  • Patent number: 8589660
    Abstract: The present invention concerns a new category of integrated circuitry and a new methodology for adaptive or reconfigurable computing. The exemplary IC embodiment includes a plurality of heterogeneous computational elements coupled to an interconnection network. The plurality of heterogeneous computational elements include corresponding computational elements having fixed and differing architectures, such as fixed architectures for different functions such as memory, addition, multiplication, complex multiplication, subtraction, configuration, reconfiguration, control, input, output, and field programmability. In response to configuration information, the interconnection network is operative in real-time to configure and reconfigure the plurality of heterogeneous computational elements for a plurality of different functional modes, including linear algorithmic operations, non-linear algorithmic operations, finite state machine operations, memory operations, and bit-level manipulations.
    Type: Grant
    Filed: May 24, 2010
    Date of Patent: November 19, 2013
    Assignee: Altera Corporation
    Inventors: Robert T. Plunkett, Ghobad Heidari, Paul L. Master
  • Patent number: 8578387
    Abstract: An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a third type of program instructions can only be executed by a second type of processing engine. A second type of program instructions can be executed by the first and the second type of processing engines. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.
    Type: Grant
    Filed: July 31, 2007
    Date of Patent: November 5, 2013
    Assignee: Nvidia Corporation
    Inventors: Peter C. Mills, Stuart F. Oberman, John Erik Lindholm, Samuel Liu
  • Patent number: 8532288
    Abstract: A cryptographic engine for modulo N multiplication, which is structured as a plurality of almost identical, serially connected Processing Elements, is controlled so as to accept input in blocks that are smaller than the maximum capability of the engine in terms of bits multiplied at one time. The serially connected hardware is thus partitioned on the fly to process a variety of cryptographic key sizes while still maintaining all of the hardware in an active processing state.
    Type: Grant
    Filed: December 1, 2006
    Date of Patent: September 10, 2013
    Assignee: International Business Machines Corporation
    Inventors: Camil Fayad, John K. Li, Siegfried K. H. Sutter, Phil C. Yeh