Multimode (e.g., MIMD to SIMD, etc.) Patents (Class 712/20)
-
Patent number: 11922221
Abstract: In accordance with an embodiment, described herein is a system and method for dependency analysis for a calculation script in a multidimensional database computing environment. A multidimensional database cube aggregation can be represented as a lattice of blocks or cube, arranged according to a database outline (e.g., intra-dimensional or member hierarchy). When the multidimensional database system performs computations in parallel for a given calculation script, portions of the cube that can be computed concurrently are identified.
Type: Grant
Filed: September 23, 2021
Date of Patent: March 5, 2024
Assignee: ORACLE INTERNATIONAL CORPORATION
Inventors: Vinod Padinjat Menon, Kumar Ramaiyer
-
Patent number: 11860814
Abstract: A scalable multi-stage hypercube-based interconnection network with deterministic communication between two or more processing elements (“PEs”) or processing cores (“PCs”) arranged in a 2D-grid using vertical and horizontal buses (i.e., each bus is one or more wires) is disclosed. In one embodiment the buses are connected in a pyramid network configuration. At each PE, the interconnection network comprises one or more switches (“interconnect”), with each switch capable of concurrently sending and receiving packets from one PE to another PE through the bus connected between them. Each packet comprises a data token, routing information such as source and destination addresses of PEs, and other information. Each PE, in addition to the interconnect, comprises a processor and/or memory. In one embodiment the processor is a Central Processing Unit (“CPU”) comprising functional units that perform operations such as additions, multiplications, or logical operations, for executing computer programs.
Type: Grant
Filed: November 1, 2021
Date of Patent: January 2, 2024
Assignee: Konda Technologies Inc.
Inventor: Venkat Konda
-
Patent number: 11593167
Abstract: Methods and systems for locking a cache line of a cache. A cache line is locked based on a count of a plurality of threads that access the cache line and maintained in the cache until all of the plurality of threads have loaded the cache line.
Type: Grant
Filed: May 9, 2019
Date of Patent: February 28, 2023
Assignee: International Business Machines Corporation
Inventors: Changhoan Kim, John A. Gunnels
-
Patent number: 11573705
Abstract: The present disclosure includes apparatuses and methods related to memory with an artificial intelligence (AI) accelerator. An example apparatus can receive a command indicating that the apparatus operate in an artificial intelligence (AI) mode and perform AI operations using an AI accelerator based on a status of a number of registers on the controller. The AI accelerator can include hardware, software, and/or firmware that is configured to perform operations (e.g., logic operations, among other operations) associated with AI operations. The hardware can include circuitry configured as an adder and/or multiplier to perform operations, such as logic operations, associated with AI operations.
Type: Grant
Filed: August 28, 2019
Date of Patent: February 7, 2023
Assignee: Micron Technology, Inc.
Inventor: Alberto Troia
-
Patent number: 11442734
Abstract: A processor includes a first mode where the processor is not to use packed data operation masking, and a second mode where the processor is to use packed data operation masking. A decode unit to decode an unmasked packed data instruction for a given packed data operation in the first mode, and to decode a masked packed data instruction for a masked version of the given packed data operation in the second mode. The instructions have a same instruction length. The masked instruction has bit(s) to specify a mask. Execution unit(s) are coupled with the decode unit. The execution unit(s), in response to the decode unit decoding the unmasked instruction in the first mode, to perform the given packed data operation. The execution unit(s), in response to the decode unit decoding the masked instruction in the second mode, to perform the masked version of the given packed data operation.
Type: Grant
Filed: March 29, 2021
Date of Patent: September 13, 2022
Assignee: Intel Corporation
Inventors: Bret L. Toll, Buford M. Guy, Ronak Singhal, Mishali Naik
-
Patent number: 11429506
Abstract: A system is configured to track and store system and event data for various computing devices. The system is configured to associate the various computing devices with profiles based at least in part on characteristics of the computing devices. The system is further configured to compare performance data and/or performance metrics for particular computing devices having a particular profile against all other devices that share the particular profile. The system then displays this comparison to a user of the particular computing device, substantially automatically diagnoses an issue with the particular computing device based on the performance and system event data, and/or enables the user to diagnose the problem based on the performance and system event data.
Type: Grant
Filed: November 19, 2020
Date of Patent: August 30, 2022
Assignee: Assurant, Inc.
Inventors: Dustin Brewer, Stuart Saunders, Cameron Hurst
-
Patent number: 11294673
Abstract: A method is provided that includes performing, by a processor in response to a dual issue multiply instruction, multiplication of operands of the dual issue multiply instruction using multiplication units comprised in a data path of the processor and configured to operate together to determine a product of the operands, and storing, by the processor, the product in a storage location indicated by the dual issue multiply instruction.
Type: Grant
Filed: May 20, 2020
Date of Patent: April 5, 2022
Assignee: TEXAS INSTRUMENTS INCORPORATED
Inventors: Timothy David Anderson, Mujibur Rahman
-
Patent number: 11023231
Abstract: Disclosed embodiments relate to executing a vector-complex fused multiply-add instruction. In one example, a method includes fetching an instruction, a format of the instruction including an opcode, a first source operand identifier, a second source operand identifier, and a destination operand identifier, wherein each of the identifiers identifies a location storing a packed data comprising at least one complex number, decoding the instruction, retrieving data associated with the first and second source operand identifiers, and executing the decoded instruction to, for each packed data element position of the identified first and second source operands, cross-multiply the real and imaginary components to generate four products: a product of real components, a product of imaginary components, and two mixed products, generate a complex result by using the four products according to the instruction, and store a result to the corresponding position of the identified destination operand.
Type: Grant
Filed: October 1, 2016
Date of Patent: June 1, 2021
Assignee: Intel Corporation
Inventors: Roman S. Dubtsov, Robert Valentine, Jesus Corbal, Milind Girkar, Elmoustapha Ould-Ahmed-Vall
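The four-product scheme in this abstract can be illustrated with a short Python sketch. It is not the patented instruction: the function names are hypothetical, complex numbers are modeled as `(real, imag)` tuples, and using a third accumulator operand for the "add" part is an assumption, since the abstract only says the result is formed "according to the instruction".

```python
def complex_fma_lane(a, b, c):
    """Return a*b + c for one lane, built from four real products.

    a, b, c are (real, imag) tuples. The accumulator c is an assumption
    made for this sketch; the abstract does not spell out its source.
    """
    ar, ai = a
    br, bi = b
    cr, ci = c
    rr = ar * br  # product of real components
    ii = ai * bi  # product of imaginary components
    ri = ar * bi  # mixed product (real x imaginary)
    ir = ai * br  # mixed product (imaginary x real)
    return (rr - ii + cr, ri + ir + ci)


def vector_complex_fma(src1, src2, acc):
    """Apply the lane operation at every packed element position."""
    return [complex_fma_lane(a, b, c) for a, b, c in zip(src1, src2, acc)]


result = vector_complex_fma([(1, 2)], [(3, 4)], [(1, 1)])
```

Here `(1 + 2i)(3 + 4i) + (1 + i)` is computed lane by lane; a conjugate or negated variant would only change the signs applied to the four products.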
-
Patent number: 10970043
Abstract: An integrated circuit including a data architecture including N adders and N multipliers configured to receive operands. The data architecture receives instructions for selecting a data flow between the N multipliers and the N adders of the data architecture. The selected data flow includes the options: (1) a first data flow using the N multipliers and the N adders to provide a multiply-accumulate mode and (2) a second data flow to provide a multiply-reduce mode.
Type: Grant
Filed: May 28, 2020
Date of Patent: April 6, 2021
Assignee: ALIBABA GROUP HOLDING LIMITED
Inventors: Liang Han, Xiaowei Jiang
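The difference between the two data flows can be sketched in software. This is only an illustration under assumptions: the function names are hypothetical, and the pairwise adder tree used for the reduce mode is one plausible arrangement that the abstract does not specify.

```python
def multiply_accumulate(a, b, acc):
    """Multiply-accumulate mode: each multiplier feeds its own adder,
    producing N independent results acc[i] + a[i] * b[i]."""
    return [c + x * y for x, y, c in zip(a, b, acc)]


def multiply_reduce(a, b):
    """Multiply-reduce mode: the N products are summed into one value.
    The pairwise tree below is an assumed arrangement of the adders."""
    level = [x * y for x, y in zip(a, b)]
    while len(level) > 1:
        if len(level) % 2:  # pad odd levels so every value has a partner
            level.append(0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0


mac = multiply_accumulate([1, 2, 3, 4], [5, 6, 7, 8], [1, 1, 1, 1])
red = multiply_reduce([1, 2, 3, 4], [5, 6, 7, 8])
```

The same multipliers serve both modes; only the routing of their outputs into the adders changes.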
-
Patent number: 10872022
Abstract: A system is configured to track and store system and event data for various computing devices. The system is configured to associate the various computing devices with profiles based at least in part on characteristics of the computing devices. The system is further configured to compare performance data and/or performance metrics for particular computing devices having a particular profile against all other devices that share the particular profile. The system then displays this comparison to a user of the particular computing device, substantially automatically diagnoses an issue with the particular computing device based on the performance and system event data, and/or enables the user to diagnose the problem based on the performance and system event data.
Type: Grant
Filed: August 21, 2018
Date of Patent: December 22, 2020
Assignee: Assurant, Inc.
Inventors: Dustin Brewer, Stuart Saunders, Cameron Hurst
-
Patent number: 10754652
Abstract: A processor includes: an address generating unit that, when an instruction decoded by a decoding unit is an instruction to execute arithmetic processing on a plurality of operand sets each including a plurality of operands that are objects of the arithmetic processing, in parallel a plurality of times, generates an address set corresponding to each of the operand sets of the arithmetic processing for each time, based on a certain address displacement with respect to the plurality of operands included in each of the operand sets; a plurality of instruction queues that hold the generated address sets corresponding to the respective operand sets, in correspondence to respective processing units; and a plurality of processing units that perform the arithmetic processing in parallel on the operand sets obtained based on the respective address sets outputted by the plurality of instruction queues.
Type: Grant
Filed: May 26, 2017
Date of Patent: August 25, 2020
Assignee: FUJITSU LIMITED
Inventors: Shuji Yamamura, Takumi Maruyama, Masato Nakagawa, Masahiro Kuramoto
-
Patent number: 10564949
Abstract: In a system for automatic generation of event-driven, tuple-space based programs from a sequential specification, a hierarchical mapping solution can target different runtimes relying on event-driven tasks (EDTs). The solution uses loop types to encode short, transitive relations among EDTs that can be evaluated efficiently at runtime. Specifically, permutable loops translate immediately into conservative point-to-point synchronizations of distance one. A runtime-agnostic layer can be used to target the transformed code to different runtimes.
Type: Grant
Filed: September 22, 2014
Date of Patent: February 18, 2020
Assignee: Reservoir Labs, Inc.
Inventors: Muthu M. Baskaran, Thomas Henretty, M. H. Langston, Richard A. Lethin, Benoit J. Meister, Nicolas T. Vasilache, David E. Wohlford
-
Patent number: 10521395
Abstract: Systems and methods include an integrated circuit that includes a plurality of computing tiles, wherein each of the plurality of computing tiles includes: a matrix multiply accelerator, a computing processing circuit; and a flow scoreboard module; a local data buffer, wherein the plurality of computing tiles together define an intelligence processing array; a network-on-chip system comprising: a plurality of network-on-chip routers establishing a communication network among the plurality of computing tiles, wherein each network-on-chip router is in operable communication connection with at least one of the plurality of computing tiles and a distinct network-on-chip router of the plurality of network-on-chip routers; and an off-tile buffer that is arranged in remote communication with the plurality of computing tiles, wherein the off-tile buffer stores raw input data and/or data received from an upstream process or an upstream device.
Type: Grant
Filed: July 1, 2019
Date of Patent: December 31, 2019
Assignee: Mythic, Inc.
Inventors: David Fick, Malav Parikh, Paul Toth, Adam Caughron, Vimal Reddy, Erik Schlanger, Sergio Schuler, Zainab Nasreen Zaidi, Alex Dang-Tran, Raul Garibay, Bryant Sorensen
-
Patent number: 10338918
Abstract: A Vector Galois Field Multiply Sum and Accumulate instruction. Each element of a second operand of the instruction is multiplied in a Galois field with the corresponding element of the third operand to provide one or more products. The one or more products are exclusively ORed with each other and exclusively ORed with a corresponding element of a fourth operand of the instruction. The results are placed in a selected operand.
Type: Grant
Filed: June 5, 2017
Date of Patent: July 2, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Jonathan D. Bradbury
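A rough software model of the multiply, sum, and accumulate steps is sketched below. The names are hypothetical, the Galois field multiply is modeled as a carry-less (polynomial over GF(2)) multiply without a reduction step, and the grouping of two products per accumulator element is an assumption; the abstract fixes none of these details.

```python
def clmul(a, b):
    """Carry-less multiply: polynomial product over GF(2), no reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # XOR replaces addition, so no carries propagate
        a <<= 1
        b >>= 1
    return result


def gf_multiply_sum_accumulate(op2, op3, op4, group=2):
    """XOR each group of element products together with one op4 element.

    `group` (products per accumulator element) is an assumed parameter.
    """
    if len(op2) != len(op3) or len(op2) != group * len(op4):
        raise ValueError("operand lengths do not match the grouping")
    out = []
    for i, acc in enumerate(op4):
        for j in range(i * group, (i + 1) * group):
            acc ^= clmul(op2[j], op3[j])
        out.append(acc)
    return out


result = gf_multiply_sum_accumulate([3, 2], [3, 3], [1])
```

In polynomial terms, `clmul(3, 3)` is `(x + 1)^2 = x^2 + 1`, which is why it yields 5 rather than the integer product 9.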
-
Patent number: 10310894
Abstract: Provided is a method for generating configuration information of a dynamic reconfigurable processor. The dynamic reconfigurable processor includes a processing unit array, and the processing unit array includes a plurality of processing units. The method includes steps of: reading information of a task to be executed and generating an array configuration information top of the processing unit array according to the information; generating a plurality of processing unit configuration information corresponding to the plurality of processing units respectively according to the information; and assembling the array configuration information top and the plurality of processing unit configuration information.
Type: Grant
Filed: June 16, 2014
Date of Patent: June 4, 2019
Assignee: TSINGHUA UNIVERSITY
Inventors: Leibo Liu, Yansheng Wang, Guiqiang Peng, Zhaoshi Li, Shouyi Yin, Shaojun Wei
-
Patent number: 10296620
Abstract: A stream application receives a stream of tuples to be processed by a plurality of processing elements that are operating on one or more compute nodes. Each processing element has one or more stream operators. The stream application assigns one or more processing cycles to software code embedded in a tuple of the stream of tuples. The tuple obtains a first status of one or more first tuples of a set of targeted tuples to be modified by a tuple modification of a stream operator. The tuple obtains a second status of one or more second tuples of the set of targeted tuples after the stream operator performs the tuple modification. The tuple determines a potential degradation based on the first status and the second status. The tuple alters the one or more first tuples to prevent the tuple modification in response to the determined potential degradation.
Type: Grant
Filed: September 30, 2015
Date of Patent: May 21, 2019
Assignee: International Business Machines Corporation
Inventors: Bin Cao, Jessica R. Eidem, Brian R. Muras, Jingdong Sun
-
Patent number: 10175981
Abstract: The vector data path is divided into smaller vector lanes. The number of active vector lanes is controllable on the fly by the programmer to match the requirements of the executing program, and inactive vector lanes are powered down by the CPU to increase power efficiency of the vector processor.
Type: Grant
Filed: July 9, 2014
Date of Patent: January 8, 2019
Assignee: TEXAS INSTRUMENTS INCORPORATED
Inventors: Timothy David Anderson, Duc Quang Bui
-
Patent number: 10146534
Abstract: A Vector Galois Field Multiply Sum and Accumulate instruction. Each element of a second operand of the instruction is multiplied in a Galois field with the corresponding element of the third operand to provide one or more products. The one or more products are exclusively ORed with each other and exclusively ORed with a corresponding element of a fourth operand of the instruction. The results are placed in a selected operand.
Type: Grant
Filed: October 6, 2016
Date of Patent: December 4, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Jonathan D. Bradbury
-
Patent number: 10120833
Abstract: Embodiments include a processor capable of supporting multi-mode and corresponding methods. The processor includes front end units, a number of processing elements more than a number of the front end units; and a controller configured to determine if thread divergence occurs due to conditional branching. If there is thread divergence, the processor may set control information to control processing elements using currently activated front end units. If there is not, the processor may set control information to control processing elements using a currently activated front end unit.
Type: Grant
Filed: January 28, 2014
Date of Patent: November 6, 2018
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Woong Seo, Yeon-Gon Cho, Soo-Jung Ryu
-
Patent number: 10095847
Abstract: Unauthorized use of computer programs is made difficult by compiling a processor rather than just compiling a program into machine code. The way in which the processor should respond to machine instructions, i.e. its translation data, is computed from an arbitrary bit string B and a program P as inputs. The translation data of a processor are computed that will execute operations defined by the program P when the processor uses the given bit string B as a source of machine instructions. A processor is configured so that it will execute machine instructions according to said translation data. Other programs P′ may then be compiled into machine instructions B′ for that processor and executed by the processor. Without knowledge of the bit string B and the original program P it is difficult to modify the machine instructions B′ so that a different processor will execute the other program P′.
Type: Grant
Filed: May 17, 2013
Date of Patent: October 9, 2018
Assignee: KONINKLIJKE PHILIPS N.V.
Inventor: Willem Charles Mallon
-
Patent number: 10095515
Abstract: A technique for decoding an instruction in a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.
Type: Grant
Filed: February 13, 2017
Date of Patent: October 9, 2018
Assignee: Intel Corporation
Inventors: Robert Valentine, Doron Orenstein, Bret L. Toll
-
Patent number: 9898283
Abstract: A method of an aspect includes receiving an instruction. The instruction indicates an integer stride, indicates an integer offset, and indicates a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four integers in numerical order with a smallest one of the at least four integers differing from zero by the integer offset and with all integers of the sequence in consecutive positions differing by the integer stride. Other methods, apparatus, systems, and instructions are disclosed.
Type: Grant
Filed: December 22, 2011
Date of Patent: February 20, 2018
Assignee: Intel Corporation
Inventors: Seth Abraham, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Zeev Sperber, Amit Gradstein
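The result described in this abstract is an arithmetic sequence, which a few lines of Python can illustrate. The function name and the `count` parameter are hypothetical, and the sketch assumes a positive stride so that the first element is also the smallest.

```python
def stride_sequence(offset, stride, count=4):
    """Return `count` integers starting at `offset`, spaced by `stride`.

    Assumes stride > 0, so the smallest element is the first one.
    """
    if count < 4:
        raise ValueError("the abstract describes at least four integers")
    if stride <= 0:
        raise ValueError("this sketch assumes a positive stride")
    return [offset + i * stride for i in range(count)]


indices = stride_sequence(offset=3, stride=2, count=4)
```

Such a sequence is commonly used as a vector of indices, for example to address every `stride`-th element of an array starting at `offset`.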
-
Patent number: 9832543
Abstract: A system and method for processing a plurality of channels, for example audio channels, in parallel is provided. For example, a plurality of telephony channels are processed in order to detect and respond to call progress tones. The channels may be processed according to a common transform algorithm. Advantageously, a massively parallel architecture is employed, in which operations on many channels are synchronized, to achieve a high efficiency parallel processing environment. The parallel processor may be situated on a data bus, separate from a main general purpose processor, or integrated with the processor in a common board or integrated device. All, or a portion of a speech processing algorithm may also be performed in a massively parallel manner.
Type: Grant
Filed: June 16, 2014
Date of Patent: November 28, 2017
Assignee: Calltrol Corporation
Inventor: Wai Wu
-
Patent number: 9798305
Abstract: A calculation device includes a plurality of calculation processing units configured to perform different processes with each other, a plurality of calculators configured to perform a same calculation, and a control unit configured to control a number of the calculators to be operated during each of a plurality of divided periods based on a length of a predetermined processing period and a number of calculations to be performed, such that a number of data which is equal to a number of calculations is processed within a predetermined processing period, and that the number of the calculators to be operated during each of the plurality of divided periods is averaged, the divided periods being obtained by dividing up the predetermined processing period.
Type: Grant
Filed: November 10, 2014
Date of Patent: October 24, 2017
Assignee: OLYMPUS CORPORATION
Inventors: Kazue Chida, Akira Ueno
-
Patent number: 9747251
Abstract: A method is disclosed for the decoding and encoding of a block-based video bit-stream such as MPEG2, H.264-AVC, VC1, or VP6 using a system containing one or more high speed sequential processors, a homogenous array of software configurable general purpose parallel processors, and a high speed memory system to transfer data between processors or processor sets. This disclosure includes a method for load balancing between the two sets of processors.
Type: Grant
Filed: December 7, 2011
Date of Patent: August 29, 2017
Assignee: AMAZON TECHNOLOGIES, INC.
Inventors: Jesse J. Rosenzweig, Brian Gregory Lewis
-
Patent number: 9547530
Abstract: A data processing apparatus has processing circuitry for processing threads each having thread state data. The threads may be processed in thread groups, with each thread group comprising a number of threads processed in parallel with a common program executed for each thread. Several thread state storage regions are provided, each with a fixed number of thread state entries for storing thread state data for a corresponding thread. At least two of the storage regions have different fixed numbers of entries. The processing circuitry processes as the same thread group threads having thread state data stored in the same storage region and processes threads having thread state data stored in different storage regions as different thread groups.
Type: Grant
Filed: November 1, 2013
Date of Patent: January 17, 2017
Assignee: ARM Limited
Inventor: David Hennah Mansell
-
Patent number: 9495131
Abstract: To add floating point numbers in a parallel computing system, a collective logic device receives the floating point numbers from computing nodes. The collective logic device converts the floating point numbers to integer numbers. The collective logic device adds the integer numbers and generates a summation of the integer numbers. The collective logic device converts the summation to a floating point number. The collective logic device performs the receiving, the converting the floating point numbers, the adding, the generating and the converting the summation in one pass. One pass indicates that the computing nodes send inputs only once to the collective logic device and receive outputs only once from the collective logic device.
Type: Grant
Filed: March 9, 2015
Date of Patent: November 15, 2016
Assignee: International Business Machines Corporation
Inventors: Dong Chen, Noel A. Eisley, Philip Heidelberger, Burkhard Steinmacher-Burow
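The convert, add, and convert-back flow can be sketched in Python. This is a software analogy under assumptions, not the collective logic device itself: the function name, the 64-bit integer width, and aligning all inputs to the largest exponent are choices made for this sketch.

```python
import math


def integer_float_sum(values, int_bits=64):
    """Sum floats by converting them to integers on a shared scale.

    Every value is scaled by the same power of two (chosen from the
    largest exponent), truncated to an integer, summed exactly, and
    converted back. `int_bits` is an assumed integer width.
    """
    if not values:
        return 0.0
    max_exp = max(math.frexp(v)[1] for v in values)
    shift = int_bits - max_exp
    # Low-order bits of much smaller inputs are truncated here.
    ints = [int(math.ldexp(v, shift)) for v in values]
    total = sum(ints)  # integer addition is exact and order-independent
    return math.ldexp(total, -shift)


exact = integer_float_sum([1e16, 1.0, -1e16])
naive = sum([1e16, 1.0, -1e16])
```

Because integer addition is associative, the result does not depend on the order in which the nodes' contributions arrive, which ordinary floating point addition cannot guarantee; in this example the naive left-to-right sum loses the `1.0` entirely.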
-
Patent number: 9471308
Abstract: A Vector Floating Point Test Data Class Immediate instruction is provided that determines whether one or more elements of a vector specified in the instruction are of one or more selected classes and signs. If a vector element is of a selected class and sign, an element in an operand of the instruction corresponding to the vector element is set to a first defined value, and if the vector element is not of the selected class and sign, the operand element corresponding to the vector element is set to a second defined value.
Type: Grant
Filed: January 23, 2013
Date of Patent: October 18, 2016
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jonathan D. Bradbury, Eric M. Schwarz
-
Patent number: 9354893
Abstract: Provided is an information processing device including an instruction cache, a data cache, first and second arithmetic unit groups including a plurality of arithmetic units capable of parallel operation, a first arithmetic-control circuit that generates one or more operation instructions for the first arithmetic unit group, and a second arithmetic-control circuit that generates one or more operation instructions for the second arithmetic unit group based on an instruction code of a fixed instruction register. The first arithmetic unit group sets the instruction code to the fixed instruction register according to an operation instruction generated based on a first specific instruction code by the first arithmetic-control circuit, and provides data to the second arithmetic unit group according to an operation instruction generated based on a second specific instruction code by the first arithmetic-control circuit.
Type: Grant
Filed: May 29, 2012
Date of Patent: May 31, 2016
Assignee: Renesas Electronics Corporation
Inventors: Yuki Kobayashi, Shohei Nomoto
-
Patent number: 9317290
Abstract: Circuits, methods, and apparatus that provide parallel execution relationships to be included in a function call or other appropriate portion of a command or instruction in a sequential programming language. One example provides a token-based method of expressing parallel execution relationships. Each process that can be executed in parallel is given a separate token. Later processes that depend on earlier processes wait to receive the appropriate token before being executed. In another example, counters are used in place of tokens to determine when a process is completed. Each function is a number of individual functions or threads, where each thread performs the same operation on a different piece of data. A counter is used to track the number of threads that have been executed. When each thread in the function has been executed, a later function that relies on data generated by the earlier function may be executed.
Type: Grant
Filed: January 7, 2013
Date of Patent: April 19, 2016
Assignee: NVIDIA Corporation
Inventors: Ian A. Buck, Bastiaan Aarts
-
Patent number: 9251207
Abstract: A query that identifies an input data source is rewritten to contain data parallel operations that include partitioning and merging. The input data source is partitioned into a plurality of initial partitions. A parallel repartitioning operation is performed on the initial partitions to generate a plurality of secondary partitions. A parallel execution of the query is performed using the secondary partitions to generate a plurality of output sets. The plurality of output sets are merged into a merged output set.
Type: Grant
Filed: November 29, 2007
Date of Patent: February 2, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: John Duffy, Edward G. Essey, Charles D. Callahan, II
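The partition, repartition, execute, and merge stages can be sketched with a group-by-sum query in Python. Everything here is illustrative: the function names are hypothetical, hash-based repartitioning is an assumed strategy, and the repartitioning step runs sequentially in this sketch although the abstract describes it as a parallel operation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(data, n):
    """Split the input source into n initial partitions (round-robin)."""
    return [data[i::n] for i in range(n)]


def repartition(parts, n):
    """Redistribute (key, value) items so equal keys share a partition."""
    secondary = [[] for _ in range(n)]
    for part in parts:
        for key, value in part:
            secondary[hash(key) % n].append((key, value))
    return secondary


def run_query(part):
    """The per-partition query: sum the values of each key."""
    totals = defaultdict(int)
    for key, value in part:
        totals[key] += value
    return dict(totals)


def parallel_group_sum(data, n=4):
    secondary = repartition(partition(data, n), n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        output_sets = list(pool.map(run_query, secondary))
    merged = {}
    for output in output_sets:
        merged.update(output)  # key sets are disjoint after repartitioning
    return merged


totals = parallel_group_sum([("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)])
```

Repartitioning by key is what makes the final merge trivial: no key can appear in two output sets, so the merge is a plain union rather than a second aggregation.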
-
Patent number: 9246792
Abstract: Methods, apparatus, and products are disclosed for providing point to point data communications among compute nodes in a global combining network of a parallel computer that include: determining a class route identifier available for all of the nodes along a communications path from an origin node to a target node; configuring network hardware of each node along the communications path with routing instructions in dependence upon the available class route identifier and the network's topology; transmitting, by the origin node along the communications path, a network packet to the target node, including encoding the available class route identifier in the network packet; and routing, by the network hardware of each node along the communications path, the network packet to the target node in dependence upon the routing instructions for each node and the available class route identifier.
Type: Grant
Filed: April 5, 2012
Date of Patent: January 26, 2016
Assignee: International Business Machines Corporation
Inventors: Charles J. Archer, Ahmad A. Faraj, Todd A. Inglett
-
Patent number: 9225547
Abstract: A data processing apparatus can reduce an occupancy rate of a ring bus by suppressing occurrence of a stall packet, and can change a processing sequence. In the data processing apparatus, a buffer is provided in each communication unit connecting the ring bus and the associated processing unit. Transfer of data from the communication unit to the processing unit is controlled by an enable signal. Consequently, occurrence of a stall packet is suppressed. Accordingly, frequency of occurrence of a deadlock state is reduced by decreasing the occupancy rate of the ring bus.
Type: Grant
Filed: March 15, 2010
Date of Patent: December 29, 2015
Assignee: Canon Kabushiki Kaisha
Inventors: Yuji Hara, Hisashi Ishikawa, Akinobu Mori, Takeo Kimura, Hirowo Inoue
-
Patent number: 9218222
Abstract: A computer device with synchronization barrier including a memory and a processing unit capable of multiprocess processing on various processors and enabling the parallel execution of blocks by processes, the blocks being associated by groups in successive work steps. The device further includes a hardware circuit with a usable address space to the memory, capable of receiving a call from each process indicating the end of execution of a current block, each call comprising data. The hardware circuit is arranged to authorize the execution of blocks of a later work step when all the blocks of the current work step have been executed. The accessibility to the address space is achieved by segments drawn from the data of each call.
Type: Grant
Filed: November 27, 2009
Date of Patent: December 22, 2015
Assignee: BULL SAS
Inventors: Angelo Solinas, Jordan Chicheportiche, Saïd Derradji, Jean-Jacques Pairault, Zoltan Menyhart, Sylvain Jeaugey, Philippe Couvee
-
Patent number: 9203865
Abstract: A computing device may be joined to a cluster by discovering the device, determining whether the device is eligible to join the cluster, configuring the device, and assigning the device a cluster role. A device may be assigned to act as a cluster master, backup master, active device, standby device, or another role. The cluster master may be configured to assign tasks, such as network flow processing, to the cluster devices. The cluster master and backup master may maintain global, run-time synchronization data pertaining to each of the network flows, shared resources, cluster configuration, and the like. The devices within the cluster may monitor one another. Monitoring may include transmitting status messages comprising indicators of device health to the other devices in the cluster. In the event a device satisfies failover conditions, a failover operation to replace the device with another standby device may be performed.
Type: Grant
Filed: March 4, 2013
Date of Patent: December 1, 2015
Assignee: WATCHGUARD TECHNOLOGIES, INC.
Inventors: Thomas Linden, James Huang, Jeff Hsu, Ming-Jeng Lee
-
Patent number: 9094317
Abstract: A first processor has a processor port for peer-to-peer processor communications. A switch provides for switching communications from a path between said first processor and a second processor to a path between said first processor and a third processor (and vice-versa).
Type: Grant
Filed: June 18, 2009
Date of Patent: July 28, 2015
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Martin Goldstein, Kamran H. Casim, Loren M. Koehler
-
Patent number: 9063722
Abstract: A control processor is used for fetching and distributing single instruction multiple data (SIMD) instructions to a plurality of processing elements (PEs). One of the SIMD instructions is a thread start (Tstart) instruction, which causes the control processor to pause its instruction fetching. A local PE instruction memory (PE Imem) is associated with each PE and contains local PE instructions for execution on the local PE. Local PE Imem fetch, decode, and execute logic are associated with each PE. Instruction path selection logic in each PE is used to select between control processor distributed instructions and local PE instructions fetched from the local PE Imem. Each PE is also initialized to receive control processor distributed instructions. In addition, local hold generation logic is associated with each PE. A PE receiving a Tstart instruction causes the instruction path selection logic to switch to fetch local PE Imem instructions.
Type: Grant
Filed: December 21, 2011
Date of Patent: June 23, 2015
Assignee: Altera Corporation
Inventors: Gerald George Pechanek, Edwin Franklin Barry, Mihailo Stojancic
-
Publication number: 20150143081Abstract: Embodiments include a processor capable of supporting multi-mode and corresponding methods. The processor includes front end units, a number of processing elements more than a number of the front end units; and a controller configured to determine if thread divergence occurs due to conditional branching. If there is thread divergence, the processor may set control information to control processing elements using currently activated front end units. If there is not, the processor may set control information to control processing elements using a currently activated front end unit.Type: ApplicationFiled: January 27, 2015Publication date: May 21, 2015Applicant: SAMSUNG ELECTRONICS CO., LTD.Inventors: Woong SEO, Yeon-Gon CHO, Soo-Jung RYU
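One way to picture the mode decision in this abstract is the sketch below. It is an interpretation, not the claimed design: the function name `assign_front_ends`, the use of per-element branch targets to detect divergence, and the round-robin mapping of distinct paths to front end units are assumptions.

```python
from typing import Dict, List


def assign_front_ends(branch_targets: List[int], num_front_ends: int) -> Dict[int, int]:
    """Map each processing element (by index) to a front end unit.

    branch_targets[i] is the branch target taken by processing element i.
    Without divergence every element shares front end 0; with divergence,
    elements on different paths are spread over the available front ends.
    """
    if num_front_ends < 1:
        raise ValueError("at least one front end unit is required")
    distinct = sorted(set(branch_targets))
    if len(distinct) <= 1:
        # No thread divergence: a single front end drives all elements.
        return {pe: 0 for pe in range(len(branch_targets))}
    # Thread divergence: assign each distinct path to a front end (assumed round-robin).
    front_end_of = {target: i % num_front_ends for i, target in enumerate(distinct)}
    return {pe: front_end_of[target] for pe, target in enumerate(branch_targets)}
```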
-
Publication number: 20150100757Abstract: Functional units disposed in one or more processor cores are communicatively coupled using both a shared bypass network and a switched network. The shared bypass network enables the functional units to be operated conventionally for general processing while the switched network enables specialized processing in which the functional units are configured as a spatial array. In the spatial array configuration, operands produced by one functional unit can only be sent to a subset of functional units to which dependent instructions have been mapped a priori. The functional units may be dynamically reconfigured at runtime to toggle between operating in the general configuration and operating as the spatial array. Information to control the toggling between operating configurations may be provided in instructions received by the functional units.Type: ApplicationFiled: April 14, 2014Publication date: April 9, 2015Applicant: Microsoft CorporationInventors: Douglas C. Burger, Aaron Smith, Milovan Duric
-
Patent number: 8918624Abstract: A method, computer program product and computer system for scaling and managing requests on a massively parallel machine, such as one running in MIMD mode on a SIMD machine. A submit mux (multiplexer) is used to federate work requests and to forward the requests to the management node. A resource arbiter receives and manages these work requests. A MIMD job controller works with the resource arbiter to manage the work requests on the SIMD partition. The SIMD partition may utilize a mux of its own to federate the work requests and the computer nodes. Instructions are also provided to control and monitor the work requests.Type: GrantFiled: May 15, 2008Date of Patent: December 23, 2014Assignee: International Business Machines CorporationInventors: Paul V. Allen, Thomas A. Budnik, Mark G. Megerian, Samuel J. Miller
-
Publication number: 20140331025Abstract: A reconfigurable processor and an operation method thereof are provided. The reconfigurable processor may include: a controller configured to control operations of a first mode, in which a first portion of a program that does not utilize loop acceleration is processed, and a second mode, in which a second portion for the program that utilizes the loop acceleration is processed, based on whether an instruction to control parallel operations of the first mode and the second mode is executed; and a shared register file configured to transfer data between the first mode and the second mode.Type: ApplicationFiled: May 5, 2014Publication date: November 6, 2014Applicant: SAMSUNG ELECTRONICS CO., LTD.Inventors: Ki-Seok KWON, Suk-Jin KIM
-
Patent number: 8843928Abstract: A method and system of efficient use and programming of a multi-processing core device. The system includes a programming construct that is based on stream-domain code. A programmable core based computing device is disclosed. The computing device includes a plurality of processing cores coupled to each other. A memory stores stream-domain code including a stream defining a stream destination module and a stream source module. The stream source module places data values in the stream and the stream conveys data values from the stream source module to the stream destination module. A runtime system detects when the data values are available to the stream destination module and schedules the stream destination module for execution on one of the plurality of processing cores.Type: GrantFiled: January 21, 2011Date of Patent: September 23, 2014Assignee: QST Holdings, LLCInventors: Paul Master, Frederick Furtek
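The data-driven scheduling described in this abstract can be sketched roughly as follows. This is an illustration under assumptions, not the patented runtime: the `Stream` and `Runtime` classes, the round-robin choice of core, and the modelling of a destination module as a plain Python callable are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class Stream:
    """Conveys values from a source module to a destination module."""

    def __init__(self, destination: Callable[[int], int]) -> None:
        self.values: Deque[int] = deque()
        self.destination = destination

    def put(self, value: int) -> None:
        # Called by the stream source module to place a data value in the stream.
        self.values.append(value)


class Runtime:
    """Schedules a destination module whenever its stream has data available."""

    def __init__(self, num_cores: int) -> None:
        self.num_cores = num_cores
        self.streams: List[Stream] = []
        self.next_core = 0

    def register(self, stream: Stream) -> None:
        self.streams.append(stream)

    def run_ready(self) -> List[Tuple[int, int]]:
        """Run each destination once per available value; return (core, result) pairs."""
        log: List[Tuple[int, int]] = []
        for stream in self.streams:
            while stream.values:
                value = stream.values.popleft()
                core = self.next_core  # assumed round-robin core selection
                self.next_core = (self.next_core + 1) % self.num_cores
                log.append((core, stream.destination(value)))
        return log
```

Here the cores are only simulated by an index; the point is that execution is triggered by data availability rather than by a fixed call order.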
-
Patent number: 8719551Abstract: The present invention provides an information processing apparatus and an integrated circuit which realize parallel execution of different processing systems, and which do not require the provision of a dedicated memory storing instructions for common processing. The information processing apparatus comprises: a plurality of processor elements; an instruction memory storing a first program and a second program; and an arbiter interposed between the processor elements and the instruction memory, the arbiter receiving, from each of the processor elements, a request for an instruction, from among instructions included in the first program and the second program, and controlling access to the instruction memory by the processor elements, wherein the arbiter arbitrates requests made by the processor elements when the requests are (i) simultaneous requests for different instructions included in one of the first program and the second program or (ii) simultaneous requests for an instruction included in the first progType: GrantFiled: April 15, 2010Date of Patent: May 6, 2014Assignee: Panasonic CorporationInventor: Hideshi Nishida
-
Patent number: 8687008Abstract: A latency tolerant system for executing video processing operations. The system includes a host interface for implementing communication between the video processor and a host CPU, a scalar execution unit coupled to the host interface and configured to execute scalar video processing operations, and a vector execution unit coupled to the host interface and configured to execute vector video processing operations. A command FIFO is included for enabling the vector execution unit to operate on a demand driven basis by accessing the memory command FIFO. A memory interface is included for implementing communication between the video processor and a frame buffer memory. A DMA engine is built into the memory interface for implementing DMA transfers between a plurality of different memory locations and for loading the command FIFO with data and instructions for the vector execution unit.Type: GrantFiled: November 4, 2005Date of Patent: April 1, 2014Assignee: NVIDIA CorporationInventors: Ashish Karandikar, Shirish Gadre, Stephen D. Lew
-
Patent number: 8683474Abstract: In an accounting apparatus, a conflict determination unit determines whether or not the accounting mode is in a conflict state where a process is executing in another logical CPU and stores the determination result in an accounting information storage unit, when a process of the user starts to be executed in a logical CPU of an SMT processor. And a CPU use time acquisition unit collects the CPU use time of the process in the conflict state or the non-conflict state distinctively and stores it in an accounting information storage unit. Thereafter, a CPU use time conversion unit converts the CPU use time in the conflict state, with a predetermined weighting, based on the CPU use time in the conflict state and the non-conflict state, after the end of executing the process, and an accounting calculation unit calculates the accounting amount for the process from an effective use time.Type: GrantFiled: February 27, 2006Date of Patent: March 25, 2014Assignee: Fujitsu LimitedInventors: Shuji Yamamura, Kouichi Kumon
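The conversion step in this abstract amounts to simple arithmetic, sketched below. The abstract does not give the formula, so the linear weighting, the function names `effective_cpu_time` and `accounting_amount`, the default weight of 0.6, and the per-second rate are all assumptions chosen for illustration.

```python
def effective_cpu_time(
    conflict_s: float, non_conflict_s: float, conflict_weight: float = 0.6
) -> float:
    """Combine CPU time measured in the conflict and non-conflict states.

    Time spent while another logical CPU of the SMT processor was busy
    (the conflict state) is scaled by a predetermined weight; the value
    0.6 is a placeholder, not a figure from the patent.
    """
    if conflict_s < 0 or non_conflict_s < 0:
        raise ValueError("CPU use times must be non-negative")
    return non_conflict_s + conflict_weight * conflict_s


def accounting_amount(effective_s: float, rate_per_second: float) -> float:
    """Charge for a process from its effective use time (hypothetical flat rate)."""
    return effective_s * rate_per_second
```

For example, with a weight of 0.5, 10 seconds in the conflict state and 5 seconds in the non-conflict state give an effective use time of 10 seconds.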
-
Patent number: 8656376Abstract: A method for providing intrinsic supports for a VLIW DSP processor with distributed register files comprises the steps of: generating a program representation with cluster information on instructions of the DSP processor, wherein the cluster information is provided by a program with cluster intrinsic coding; identifying data stream operations indicating parallel instruction sequences applied on different data sets in the program representation; identifying data sharing relations indicating data shared by the data stream operations in the program representation; identifying data aggregation relations indicating results aggregated from the data stream operations in the program representation; and performing register allocation for the DSP processor according to the identified data stream operations, the data sharing relations and the data aggregation relations.Type: GrantFiled: September 1, 2011Date of Patent: February 18, 2014Assignee: National Tsing Hua UniversityInventors: Jenq Kuen Lee, Chi Bang Kuan
-
Patent number: 8638805Abstract: Described embodiments provide for restructuring a scheduling hierarchy of a network processor having a plurality of processing modules and a shared memory. The scheduling hierarchy schedules packets for transmission. The network processor generates tasks corresponding to each received packet associated with a data flow. A traffic manager receives tasks provided by one of the processing modules and determines a queue of the scheduling hierarchy corresponding to the task. The queue has a parent scheduler at each of one or more next levels of the scheduling hierarchy up to a root scheduler, forming a branch of the hierarchy. The traffic manager determines if the queue and one or more of the parent schedulers of the branch should be restructured. If so, the traffic manager drops subsequently received tasks for the branch, drains all tasks of the branch, and removes the corresponding nodes of the branch from the scheduling hierarchy.Type: GrantFiled: September 30, 2011Date of Patent: January 28, 2014Assignee: LSI CorporationInventors: Balakrishnan Sundararaman, Shashank Nemawarkar, David Sonnier, Shailendra Aulakh, Allen Vestal
-
Patent number: 8589660Abstract: The present invention concerns a new category of integrated circuitry and a new methodology for adaptive or reconfigurable computing. The exemplary IC embodiment includes a plurality of heterogeneous computational elements coupled to an interconnection network. The plurality of heterogeneous computational elements include corresponding computational elements having fixed and differing architectures, such as fixed architectures for different functions such as memory, addition, multiplication, complex multiplication, subtraction, configuration, reconfiguration, control, input, output, and field programmability. In response to configuration information, the interconnection network is operative in real-time to configure and reconfigure the plurality of heterogeneous computational elements for a plurality of different functional modes, including linear algorithmic operations, non-linear algorithmic operations, finite state machine operations, memory operations, and bit-level manipulations.Type: GrantFiled: May 24, 2010Date of Patent: November 19, 2013Assignee: Altera CorporationInventors: Robert T. Plunkett, Ghobad Heidari, Paul L. Master
-
Patent number: 8578387Abstract: An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a third type of program instructions can only be executed by a second type of processing engine. A second type of program instructions can be executed by the first and the second type of processing engines. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.Type: GrantFiled: July 31, 2007Date of Patent: November 5, 2013Assignee: Nvidia CorporationInventors: Peter C. Mills, Stuart F. Oberman, John Erik Lindholm, Samuel Liu
-
Patent number: 8532288Abstract: A cryptographic engine for modulo N multiplication, which is structured as a plurality of almost identical, serially connected Processing Elements, is controlled so as to accept input in blocks that are smaller than the maximum capability of the engine in terms of bits multiplied at one time. The serially connected hardware is thus partitioned on the fly to process a variety of cryptographic key sizes while still maintaining all of the hardware in an active processing state.Type: GrantFiled: December 1, 2006Date of Patent: September 10, 2013Assignee: International Business Machines CorporationInventors: Camil Fayad, John K. Li, Siegfried K. H. Sutter, Phil C. Yeh