Multiple Instruction, Multiple Data (MIMD) Patents (Class 712/21)
-
Patent number: 12147379
Abstract: Examples herein describe techniques for performing parallel processing using a plurality of processing elements (PEs) and a controller for data that has data dependencies. For example, a calculation may require an entire row or column to be summed, or to determine its mean. The PEs can be assigned different chunks of a data set (e.g., a tensor set, a column, or a row) for processing. The PEs can use one or more tokens to inform the controller when they are done with partial processing of their data chunks. The controller can then gather the partial results and determine an intermediate value for the data set. The controller can then distribute this intermediate value to the PEs which then re-process their respective data chunks using the intermediate value to generate final results.
Type: Grant
Filed: December 28, 2022
Date of Patent: November 19, 2024
Assignee: XILINX, INC.
Inventors: Rajeev Patwari, Jorn Tuyls, Elliott Delaye, Xiao Teng, Ephrem Wu
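The gather-and-redistribute scheme described in the abstract for patent 12147379 can be illustrated with a small software sketch. This is not the patented hardware design: the function names (`partial_sum`, `center`, `two_pass_mean_center`) are hypothetical, a thread pool stands in for the PEs, the completion of each mapped task stands in for the tokens, and mean-centering is assumed as the example of "re-processing" a chunk with the intermediate value.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    # First pass on one PE: reduce its chunk to a (sum, count) partial result.
    return sum(chunk), len(chunk)


def center(chunk, mean):
    # Second pass on one PE: re-process the same chunk using the intermediate value.
    return [x - mean for x in chunk]


def two_pass_mean_center(row, num_pes=4):
    # Controller role: split the row, gather partials, broadcast the mean, gather results.
    if not row:
        raise ValueError("row must not be empty")
    size = -(-len(row) // num_pes)  # ceiling division so every element lands in a chunk
    chunks = [row[i:i + size] for i in range(0, len(row), size)]
    with ThreadPoolExecutor(max_workers=num_pes) as pool:
        # pool.map returns only when every worker is done (a stand-in for the tokens).
        partials = list(pool.map(partial_sum, chunks))
        total = sum(s for s, _ in partials)
        count = sum(n for _, n in partials)
        mean = total / count  # the intermediate value for the whole data set
        centered = list(pool.map(lambda c: center(c, mean), chunks))
    return mean, [x for chunk in centered for x in chunk]
```

Calling `two_pass_mean_center([2, 4, 6], num_pes=2)` splits the row into `[2, 4]` and `[6]`, so neither worker can compute the mean alone; only the controller step sees both partial sums.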
-
Patent number: 12141092
Abstract: The invention relates to a computer program comprising a sequence of instructions for execution on a processing unit having instruction storage for holding the computer program, an execution unit for executing the computer program and data storage for holding data, the computer program comprising one or more computer executable instructions which, when executed, implement: a send function which causes a data packet destined for a recipient processing unit to be transmitted on a set of connection wires connected to the processing unit, the data packet having no destination identifier but being transmitted at a predetermined transmit time; and a switch control function which causes the processing unit to control switching circuitry to connect a set of connection wires of the processing unit to a switching fabric to receive a data packet at a predetermined receive time.
Type: Grant
Filed: April 6, 2022
Date of Patent: November 12, 2024
Assignee: GRAPHCORE LIMITED
Inventors: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
-
Patent number: 11818587
Abstract: When data transmitted from devices is processed by servers in a shared manner, the data is processed in a constantly stable manner when the devices move. Based on information indicating installation positions and processing capabilities of the servers, information on installation positions of base stations, and acquired location information of the devices, the coverage area control apparatus performs, at a certain time interval, a clustering calculation for obtaining an optimum coverage area for the servers that satisfies the requirements that the communication distances between the servers and the respective devices be minimized and the servers not be overloaded. Based on the information indicating the optimum coverage area obtained by the clustering calculation, assignments of the base stations to the servers are updated.
Type: Grant
Filed: October 9, 2019
Date of Patent: November 14, 2023
Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Masahiro Yoshida, Koya Mori, Tomohiro Inoue, Hiroyuki Tanaka
-
Patent number: 11537301
Abstract: A system comprises a processor and a plurality of memory units. The processor is coupled to each of the plurality of memory units by a plurality of network connections. The processor includes a plurality of processing elements arranged in a two-dimensional array and a corresponding two-dimensional communication network communicatively connecting each of the plurality of processing elements to other processing elements on same axes of the two-dimensional array. Each processing element that is located along a diagonal of the two-dimensional array is configured as a request broadcasting master for a respective group of processing elements located along a same axis of the two-dimensional array.
Type: Grant
Filed: May 4, 2021
Date of Patent: December 27, 2022
Assignee: Meta Platforms, Inc.
Inventors: Abdulkadir Utku Diril, Olivia Wu, Krishnakumar Narayanan Nair, Aravind Kalaiah, Anup Ramesh Kadkol, Pankaj Kansal
-
Patent number: 11385692
Abstract: A remote automatic control power supply system is disclosed, comprising a power supply control device and an electronic device having a control circuit, in which the power supply control device is configured to control whether the power supply is to be outputted, and the control circuit can set the GPS coordinate and the starting distance value close to the power supply control device; afterwards, it is possible to operate the control circuit via the backend of the electronic device such that, when the distance between the real-time GPS coordinate of the electronic device and the GPS coordinate of the power supply control device is equivalent to the starting distance value, the control circuit can transmit a power control signal to the power supply control device thereby allowing the power supplying control device to output the electric power to the receiving end.
Type: Grant
Filed: November 27, 2019
Date of Patent: July 12, 2022
Inventor: Chao-Cheng Yu
-
Patent number: 11238045
Abstract: Disclosed aspects relate to data arrangement management in a distributed data cluster environment of a shared pool of configurable computing resources. In the distributed data cluster environment, a set of data is monitored for a data redistribution candidate trigger. The data redistribution candidate trigger is detected with respect to the set of data. Based on the data redistribution candidate trigger, the set of data is analyzed with respect to a candidate data redistribution action. Using the candidate data redistribution action, a new data arrangement associated with the set of data is determined. Accordingly, the new data arrangement is established.
Type: Grant
Filed: June 24, 2019
Date of Patent: February 1, 2022
Assignee: International Business Machines Corporation
Inventors: Naresh K. Chainani, James H. Cho
-
Patent number: 11210104
Abstract: A system may include a plurality of processors and a coprocessor. A plurality of coprocessor context priority registers corresponding to a plurality of contexts supported by the coprocessor may be included. The plurality of processors may use the plurality of contexts, and may program the coprocessor context priority register corresponding to a context with a value specifying a priority of the context relative to other contexts. An arbiter may arbitrate among instructions issued by the plurality of processors based on the priorities in the plurality of coprocessor context priority registers. In one embodiment, real-time threads may be assigned higher priorities than bulk processing tasks, improving bandwidth allocated to the real-time threads as compared to the bulk tasks.
Type: Grant
Filed: September 11, 2020
Date of Patent: December 28, 2021
Assignee: Apple Inc.
Inventors: Aditya Kesiraju, Andrew J. Beaumont-Smith, Brian P. Lilly, James Vash, Jason M. Kassoff, Krishna C. Potnuru, Rajdeep L. Bhuyar, Ran A. Chachick, Tyler J. Huberty, Derek R. Kumar
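As a rough illustration of the priority-based arbitration described in the abstract for patent 11210104, the sketch below picks one pending instruction per round by looking up each context's priority. The names `arbitrate`, `pending`, and `priority_regs` are hypothetical, and two choices are assumptions not stated in the abstract: a larger register value means a higher priority, and ties go to the instruction that was issued first.

```python
def arbitrate(pending, priority_regs):
    """Pick the next instruction for the coprocessor.

    pending: list of (context_id, instruction) pairs in issue order.
    priority_regs: dict mapping context_id -> programmed priority value
                   (assumed: larger value wins).
    """
    if not pending:
        return None
    # max() returns the first maximal item, so ties keep issue order.
    return max(pending, key=lambda item: priority_regs[item[0]])


# Example: context 1 is a real-time thread, contexts 0 and 2 are bulk work.
priority_regs = {0: 1, 1: 7, 2: 1}
pending = [(0, "bulk_mul"), (2, "bulk_add"), (1, "rt_fma")]
winner = arbitrate(pending, priority_regs)
```

Here `winner` is the real-time instruction even though it was issued last, which is the bandwidth bias toward real-time threads that the abstract mentions.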
-
Patent number: 11030017
Abstract: Technologies for efficiently booting sleds in a disaggregated architecture include a sled. The sled includes a network interface controller, a set of processors, and firmware that includes an operating system. Additionally, the sled includes circuitry to perform, with multiple processors in the set of processors, a boot process. The circuitry is also to initialize the operating system present in the firmware, receive, with the network interface controller and from another sled, an assignment of a workload, and execute the assigned workload with the operating system.
Type: Grant
Filed: March 23, 2018
Date of Patent: June 8, 2021
Assignee: Intel Corporation
Inventors: Mohan J. Kumar, Murugasamy K. Nachimuthu
-
Patent number: 10853541
Abstract: Some examples described herein relate to global mapping of program nodes of a netlist of an application. In an example, a design system includes a processor and a memory coupled to the processor. The memory stores instruction code. The processor is configured to execute the instruction code to obtain a netlist of an application. The netlist contains program nodes and respective edges between the program nodes. The application is to be implemented on a device comprising an array of data processing engines. The processor is also configured to execute the instruction code to generate a global mapping of the program nodes based on a representation of the array of data processing engines and using an integer linear programming (ILP) algorithm; generate a detailed mapping of the program nodes based on the global mapping; and translate the detailed mapping to a file.
Type: Grant
Filed: April 30, 2019
Date of Patent: December 1, 2020
Assignee: XILINX, INC.
Inventors: Abhishek Joshi, Grigor S. Gasparyan, Aditya Chaubal, Sridhar Kirshnamurthy, Xiao Dong
-
Patent number: 9811621
Abstract: Circuit design computing equipment may perform depopulation operations, constraint generation, and repopulation operations in a circuit design in anticipation of register retiming operations. A depopulation operation before placement and/or before routing operations may prevent the respective placement and/or routing operations from placing and/or routing registers from the circuit design. Constraint generation may create constraints for placement and/or routing operations that allow for the reinsertion of registers after routing operations. Repopulation operations may reinsert registers in the circuit design after routing operations according to the constraints. If desired, the circuit design computing equipment may perform register retiming operations to further improve the performance of the circuit design.
Type: Grant
Filed: May 1, 2015
Date of Patent: November 7, 2017
Assignee: Altera Corporation
Inventors: Kimberly Anne Bozman, David Ian Milton, Nishanth Sinnadurai
-
Patent number: 9632790
Abstract: A processing device comprises select logic to schedule a plurality of instructions for execution. The select logic calculates a reconstructed program order (RPO) value for each of a plurality of instructions that are ready to be scheduled for execution. The select logic creates an ordered list of instructions based on the delayed RPO values, the delayed RPO values comprising the calculated RPO values from a previous execution cycle, and dispatches instructions for scheduling based on the ordered list.
Type: Grant
Filed: December 26, 2012
Date of Patent: April 25, 2017
Assignee: Intel Corporation
Inventors: Jayesh Iyer, Nikolay Kosarev, Sergey Y. Shishlov, Alexey Y. Sivtsov, Yuriy V Baida, Alexander V Butuzov, Bob Babayan, Vladimir Pentkovski
-
Patent number: 9589138
Abstract: Various embodiments are generally directed to authenticating a chain of components of boot software of a computing device. An apparatus comprises a processor circuit and storage storing an initial boot software component comprising instructions operative on the processor circuit to select a first set of boot software components of multiple sets of boot software components, each set of boot software components defines a pathway that branches from the initial boot software component and that rejoins at a latter boot software component; authenticate a first boot software component of the first set of boot software components; and execute a sequence of instructions of the first boot software component to authenticate a second boot software component of the first set of boot software components to form a chain of authentication through a first pathway defined by the first set of boot software components. Other embodiments are described and claimed herein.
Type: Grant
Filed: September 21, 2015
Date of Patent: March 7, 2017
Assignee: INTEL CORPORATION
Inventors: Jiewen Yao, Vincent J. Zimmer
-
Patent number: 9424135
Abstract: A method for content management in a file system includes creating a file system storage device with a storage area symbolic name. A core table data structure is generated including a core table partition field. An ancillary table data structure is generated including an ancillary table partition field. Partitioning rules for the file system storage device are cached based on the storage area symbolic name. Property values for a document are compared with the cached partitioning rules to determine a match for the storage area symbolic name. The document is stored into a partition of the file system storage device based on the storage area symbolic name. Metadata for the document is stored into a partition of the core table based on the core table partition field. Ancillary objects for the document are stored into a partition of the ancillary table based on the ancillary partition fields.
Type: Grant
Filed: December 30, 2015
Date of Patent: August 23, 2016
Assignee: International Business Machines Corporation
Inventors: William J. Carpenter, David G. Skinner, Hailin Wang, Michael G. Winter, Jun Xu
-
Patent number: 9400747
Abstract: A data storage device includes a non-volatile memory and a controller. A method includes sending a memory command from the controller to the non-volatile memory. The memory command indicates multiple sense operations to be performed at a single plane of the non-volatile memory.
Type: Grant
Filed: April 29, 2014
Date of Patent: July 26, 2016
Assignee: SANDISK TECHNOLOGIES LLC
Inventors: Daniel Edward Tuers, Abhijeet Manohar, Mark Murin, Mark Shlick, Menahem Lasser
-
Patent number: 9009448
Abstract: Disclosed is an architecture, system and method for performing multi-thread DFA descents on a single input stream. An executer performs DFA transitions from a plurality of threads each starting at a different point in an input stream. A plurality of executers may operate in parallel to each other and a plurality of thread contexts operate concurrently within each executer to maintain the context of each thread which is state transitioning. A scheduler in each executer arbitrates instructions for the thread into an at least one pipeline where the instructions are executed. Tokens may be output from each of the plurality of executers to a token processor which sorts and filters the tokens into dispatch order.
Type: Grant
Filed: January 18, 2012
Date of Patent: April 14, 2015
Assignee: Intel Corporation
Inventors: Michael Ruehle, Umesh Ramkrishnarao Kasture, Vinay Janardan Naik, Nayan Amrutlal Suthar, Robert J. McMillen
-
Patent number: 8856494
Abstract: Data processing circuit containing an instruction execution circuit having an instruction set comprising a SIMD instruction. The instruction execution circuit comprises arithmetic circuits, arranged to perform N respective identical operations in parallel in response to the SIMD instruction. The SIMD instruction selects a first one and a second one of the registers. The SIMD instruction defines a first and second series of N respective SIMD instruction operands of the SIMD instruction from the addressed registers. Each arithmetic circuit receives a respective first operand and a respective second operand from the first and second series respectively. The instruction execution circuit selects the first and second series so they partially overlap. Positioning the operands is under program control.
Type: Grant
Filed: January 11, 2012
Date of Patent: October 7, 2014
Assignee: Intel Corporation
Inventor: Antonius A. M. Van Wel
-
Patent number: 8798386
Abstract: Methods and systems for processing image data on a per tile basis in an image sensor pipeline (ISP) are disclosed and may include communicating, to one or more processing modules via control logic circuits integrated in the ISP, corresponding configuration parameters that are associated with each of a plurality of data tiles comprising an image. The ISP may be integrated in a video processing core. The plurality of data tiles may vary in size. A processing complete signal may be communicated to the control logic circuits when the processing of each of the data tiles is complete prior to configuring a subsequent processing module. The processing may comprise one or more of: lens shading correction, statistics, distortion correction, demosaicing, denoising, defective pixel correction, color correction, and resizing. Each of the data tiles may overlap with adjacent data tiles, and at least a portion of them may be processed concurrently.
Type: Grant
Filed: July 13, 2010
Date of Patent: August 5, 2014
Assignee: Broadcom Corporation
Inventors: Adrian Lees, David Plowman
-
Publication number: 20140195777
Abstract: In a particular embodiment, a method may include creating a plurality of variable depth instruction FIFOs and a plurality of data caches from a plurality of caches corresponding to a plurality of processors, where the plurality of caches and the plurality of processors correspond to MIMD architecture. The method may also include configuring the plurality of variable depth instruction FIFOs to implement SIMD architecture. The method may also include configuring the plurality of variable depth instruction FIFOs for at least one of SIMD operation, SIMD operation with staging, or RC-SIMD operation.
Type: Application
Filed: January 10, 2013
Publication date: July 10, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Mark D. Bellows, Mark S. Fredrickson, Scott D. Frei, Steven P. Jones, Chad B. McBride
-
Patent number: 8732359
Abstract: When executing a graphical model of a dynamic system that includes two or more concurrently executing sets of operations, a processor is configured to create a first buffer and a second buffer within the executable graphical model. A first set of operations is configured to write data to the first buffer during a first execution instance of the first set of operations. The first set of operations is configured to write data to the second buffer during a second execution instance of the first thread. A second set of operations is configured to read the data from the first buffer during an instance of the second thread that executes contemporaneously with the second execution instance of the first set of operations. Determinations regarding access to the first buffer and second buffer by the first thread and second thread are self-contained within the first thread and second thread, respectively.
Type: Grant
Filed: December 7, 2012
Date of Patent: May 20, 2014
Assignee: The MathWorks, Inc.
Inventors: James E. Carrick, Biao Yu
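The two-buffer handoff in the abstract for patent 8732359 resembles classic double buffering, and a minimal sketch may make the "self-contained" access decision concrete. Everything named here (`writer_index`, `reader_index`, `simulate`) is hypothetical, and the sketch assumes each side derives its buffer purely from its own execution-instance counter, with the two sides stepped in lockstep rather than on real threads.

```python
def writer_index(instance):
    # Writer-side decision: alternate buffers on every execution instance.
    return instance % 2


def reader_index(instance):
    # Reader-side decision: read the buffer the writer filled one instance earlier.
    return (instance - 1) % 2


def simulate(values):
    """Step a writer and a reader together; return what the reader observed."""
    buffers = [None, None]
    seen = []
    for k, value in enumerate(values):
        if k > 0:
            # The reader instance that runs alongside writer instance k.
            seen.append(buffers[reader_index(k)])
        # Writer instance k fills the other buffer, so the two never collide.
        buffers[writer_index(k)] = value
    return seen
```

For instance `k`, `writer_index(k)` and `reader_index(k)` always differ, so neither side needs a lock or any knowledge of the other's state to stay out of its way.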
-
Patent number: 8638805
Abstract: Described embodiments provide for restructuring a scheduling hierarchy of a network processor having a plurality of processing modules and a shared memory. The scheduling hierarchy schedules packets for transmission. The network processor generates tasks corresponding to each received packet associated with a data flow. A traffic manager receives tasks provided by one of the processing modules and determines a queue of the scheduling hierarchy corresponding to the task. The queue has a parent scheduler at each of one or more next levels of the scheduling hierarchy up to a root scheduler, forming a branch of the hierarchy. The traffic manager determines if the queue and one or more of the parent schedulers of the branch should be restructured. If so, the traffic manager drops subsequently received tasks for the branch, drains all tasks of the branch, and removes the corresponding nodes of the branch from the scheduling hierarchy.
Type: Grant
Filed: September 30, 2011
Date of Patent: January 28, 2014
Assignee: LSI Corporation
Inventors: Balakrishnan Sundararaman, Shashank Nemawarkar, David Sonnier, Shailendra Aulakh, Allen Vestal
-
Patent number: 8624910
Abstract: One embodiment of the present invention sets forth a technique for dynamically specifying a texture header and texture sampler using an index. The index corresponds to a particular register value that may be static or computed during execution of a shader program. Any texture operation instruction may specify an index value for each of the texture header and the texture sampler.
Type: Grant
Filed: August 25, 2010
Date of Patent: January 7, 2014
Assignee: Nvidia Corporation
Inventors: John Erik Lindholm, Yan Yan Tang
-
Patent number: 8532288
Abstract: A cryptographic engine for modulo N multiplication, which is structured as a plurality of almost identical, serially connected Processing Elements, is controlled so as to accept input in blocks that are smaller than the maximum capability of the engine in terms of bits multiplied at one time. The serially connected hardware is thus partitioned on the fly to process a variety of cryptographic key sizes while still maintaining all of the hardware in an active processing state.
Type: Grant
Filed: December 1, 2006
Date of Patent: September 10, 2013
Assignee: International Business Machines Corporation
Inventors: Camil Fayad, John K. Li, Siegfried K. H. Sutter, Phil C. Yeh
-
Patent number: 8499139
Abstract: An apparatus having a processor and a circuit is disclosed. The processor generally has a pipeline. The circuit may be configured to (i) detect a first write instruction in the pipeline that writes to a resource, (ii) stall a read instruction in the pipeline where (a) a first read-after-write conflict exists between the first write instruction and the read instruction and (b) no other write instruction to the resource is scheduled between the first write instruction and the read instruction and (iii) not stall the read instruction due to the first read-after-write conflict where a second write instruction to the resource is scheduled between the first write instruction and the read instruction.
Type: Grant
Filed: August 12, 2010
Date of Patent: July 30, 2013
Assignee: LSI Corporation
Inventors: Leonid Dubrovin, Alexander Rabinovitch, Hagit Margolin, Noam Abda
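The stall rule in the abstract for patent 8499139 is essentially a predicate over the instructions sitting between a write and a later read, which a short sketch can express. The `Instr` type and `stalls_on_first_write` function are hypothetical, and the pipeline is modelled, as an assumption, as a simple list in program order rather than as hardware stages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str        # "write" or "read"
    resource: str  # name of the register or other resource touched


def stalls_on_first_write(pipeline, write_idx, read_idx):
    """Return True if the read must stall because of the write at write_idx."""
    first, read = pipeline[write_idx], pipeline[read_idx]
    # (a) A read-after-write conflict needs an earlier write to the same resource.
    if write_idx >= read_idx or first.op != "write" or read.op != "read":
        return False
    if first.resource != read.resource:
        return False
    # (b) A second write to the resource in between supersedes the first one,
    #     so the first write alone is no reason to stall.
    between = pipeline[write_idx + 1:read_idx]
    return not any(i.op == "write" and i.resource == read.resource for i in between)


# Example: the intervening write to r1 removes the dependency on the first write.
pipe = [Instr("write", "r1"), Instr("write", "r1"), Instr("read", "r1")]
```

With `pipe` as above, the read no longer stalls on index 0 but can still stall on index 1, since that write is the one whose value the read actually consumes.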
-
Patent number: 8493979
Abstract: Executing a single instruction/multiple data (SIMD) instruction of a program to process a vector of data wherein each element of the packet vector corresponds to a different received packet.
Type: Grant
Filed: December 30, 2008
Date of Patent: July 23, 2013
Assignee: Intel Corporation
Inventors: Bryan E. Veal, Travis T. Schluessler
-
Patent number: 8316190
Abstract: Computers and other computing machines and information appliances having a modified computer architecture and program structure which enables the operation of an application program concurrently or simultaneously on a plurality of computers interconnected via a communications link or network using a special distributed runtime (DRT), and that provides for a redundant array of independent computing systems that include computer code distribution using code-striping onto the plurality of the computers or computing machines. A redundant array of independent computing systems operating in concert and code-striping features.
Type: Grant
Filed: March 19, 2008
Date of Patent: November 20, 2012
Assignee: Waratek Pty. Ltd.
Inventor: John M. Holt
-
Patent number: 8185700
Abstract: In one embodiment, the present invention includes a method for receiving a bus message in a first cache corresponding to a speculative access to a portion of a second cache by a second thread, and dynamically determining in the first cache if an inter-thread dependency exists between the second thread and a first thread associated with the first cache with respect to the portion. Other embodiments are described and claimed.
Type: Grant
Filed: May 30, 2006
Date of Patent: May 22, 2012
Assignee: Intel Corporation
Inventors: Carlos Madriles Gimeno, Carlos García Quinones, Pedro Marcuello, Jesús Sánchez, Fernando Latorre, Antonio González
-
Patent number: 8180998
Abstract: A system for performing data-parallel operations and task-parallel operations. A first switch fabric node (SFN) includes first and second lane processing engines (LPEs). The first LPE includes a first set of lane processing units (LPUs) configured to perform data-parallel operations, where each LPU performs a set of operations, and each LPU within the first set of LPUs uses a different set of data for the set of operations. The second LPE includes a second set of LPUs configured to perform task-parallel operations, where each LPU performs a different set of operations. A processing control engine (PCE) is configured to distribute instructions and data to the first LPE and the second LPE. Advantageously, data parallel operations and task parallel operations are able to be performed on the same processor simultaneously.
Type: Grant
Filed: September 10, 2008
Date of Patent: May 15, 2012
Assignee: NVIDIA Corporation
Inventors: Monier Maher, Christopher Lamb, Sanjay J. Patel, Peter Hsu
-
Publication number: 20120089812
Abstract: A shared resource multi-thread processor array wherein an array of heterogeneous function blocks are interconnected via a self-routing switch fabric, in which the individual function blocks have an associated switch port address. Each switch output port comprises a FIFO style memory that implements a plurality of separate queues. Thread queue empty flags are grouped using programmable circuit means to form self-synchronised threads. Data from different threads are passed to the various addressable function blocks in a predefined sequence in order to implement the desired function. The separate port queues allows data from different threads to share the same hardware resources and the reconfiguration of switch fabric addresses further enables the formation of different data-paths allowing the array to be configured for use in various applications.
Type: Application
Filed: June 9, 2010
Publication date: April 12, 2012
Inventor: Graeme Roy Smith
-
Patent number: 8103854
Abstract: A control processor is used for fetching and distributing single instruction multiple data (SIMD) instructions to a plurality of processing elements (PEs). One of the SIMD instructions is a thread start (Tstart) instruction, which causes the control processor to pause its instruction fetching. A local PE instruction memory (PE Imem) is associated with each PE and contains local PE instructions for execution on the local PE. Local PE Imem fetch, decode, and execute logic are associated with each PE. Instruction path selection logic in each PE is used to select between control processor distributed instructions and local PE instructions fetched from the local PE Imem. Each PE is also initialized to receive control processor distributed instructions. In addition, local hold generation logic is associated with each PE. A PE receiving a Tstart instruction causes the instruction path selection logic to switch to fetch local PE Imem instructions.
Type: Grant
Filed: April 12, 2010
Date of Patent: January 24, 2012
Assignee: Altera Corporation
Inventors: Gerald George Pechanek, Edwin Franklin Barry, Mihailo M. Stojancic
-
Patent number: 8051273
Abstract: Disclosed is a mixed mode parallel processor system in which N number of processing elements PEs, capable of performing SIMD operation, are grouped into M (=N÷S) processing units PUs performing MIMD operation. In MIMD operation, P out of S memories in each PU, which S memories inherently belong to the PEs, where P<S, operate as an instruction cache. The remaining memories operate as data memories or as data cache memories. One out of S sets of general-purpose registers, inherently belonging to the PEs, directly operates as a general register group for the PU. Out of the remaining S-1 sets, T set or a required number of sets, where T<S-1, are used as storage registers that store tags of the instruction cache.
Type: Grant
Filed: November 2, 2010
Date of Patent: November 1, 2011
Assignee: NEC Corporation
Inventor: Shorin Kyo
-
Patent number: 8020168
Abstract: A NOC for dynamic virtual software pipelining including IP blocks, routers, memory communications controllers, and network interface controllers, each IP block adapted to a router through a memory communications controller and a network interface controller, the NOC also including: a computer software application segmented into stages, each stage comprising a flexibly configurable module of computer program instructions identified by a stage ID, each stage assigned to a thread of execution on an IP block; and each stage executing on a thread of execution on an IP block, including a first stage executing on an IP block, producing output data and sending by the first stage the produced output data to a second stage, the output data including control information for the next stage and payload data; and the second stage consuming the produced output data in dependence upon the control information.
Type: Grant
Filed: May 9, 2008
Date of Patent: September 13, 2011
Assignee: International Business Machines Corporation
Inventors: Russell D. Hoover, Eric O. Mejdrich, Paul E. Schardt, Robert A. Shearer
-
Patent number: 7979674
Abstract: Executing MIMD programs on a SIMD machine, the SIMD machine including a plurality of compute nodes, each compute node capable of executing only a single thread of execution, the compute nodes initially configured exclusively for SIMD operations, the SIMD machine further comprising a data communications network, the network comprising synchronous data communications links among the compute nodes, including establishing one or more SIMD partitions, booting one or more SIMD partitions in MIMD mode; establishing a MIMD partition; executing by launcher programs a plurality of MIMD programs on two or more of the compute nodes of the MIMD partition; and re-executing a launcher program by an operating system on a compute node in the MIMD partition upon termination of the MIMD program executed by the launcher program.
Type: Grant
Filed: May 16, 2007
Date of Patent: July 12, 2011
Assignee: International Business Machines Corporation
Inventors: Todd A. Inglett, Patrick J. McCarthy, Amanda Peters, Thomas A. Budnik, Michael B. Mundy, Gordon G. Stewart
-
Patent number: 7941644
Abstract: A processing unit includes multiple execution units and sequencer logic that is disposed downstream of instruction buffer logic, and that is responsive to a sequencer instruction present in an instruction stream. In response to such an instruction, the sequencer logic issues a plurality of instructions associated with a long latency operation to one execution unit, while blocking instructions from the instruction buffer logic from being issued to that execution unit. In addition, the blocking of instructions from being issued to the execution unit does not affect the issuance of instructions to any other execution unit, and as such, other instructions from the instruction buffer logic are still capable of being issued to and executed by other execution units even while the sequencer logic is issuing the plurality of instructions associated with the long latency operation.
Type: Grant
Filed: October 16, 2008
Date of Patent: May 10, 2011
Assignee: International Business Machines Corporation
Inventors: Eric Oliver Mejdrich, Adam James Muff, Matthew Ray Tubbs
-
Patent number: 7916864
Abstract: A graphics processing unit is programmed to carry out cryptographic processing so that fast, effective cryptographic processing solutions can be provided without incurring additional hardware costs. The graphics processing unit can efficiently carry out cryptographic processing because it has an architecture that is configured to handle a large number of parallel processes. The cryptographic processing carried out on the graphics processing unit can be further improved by configuring the graphics processing unit to be capable of both floating point and integer operations.
Type: Grant
Filed: February 8, 2006
Date of Patent: March 29, 2011
Assignee: NVIDIA Corporation
Inventor: Norbert Juffa
-
Patent number: 7900067
Abstract: A computing device operates over a range of voltages and frequencies and over a range of processor usage levels. The computing device includes at least a variable frequency generator, a variable voltage power supply and voltage supply level and clocking frequency management circuitry. The variable frequency generator is coupled to the processor and delivers a clock signal to the processor. The variable voltage power supply is coupled to the processor and delivers voltage to the processor. The voltage supply level and clocking frequency management circuitry adjust both the voltage provided by the variable voltage power supply and the frequency of the signal provided by the variable frequency generator. The computing device includes a temperature sensor that provides signals indicative of the temperature of the processor and the voltage supply level and clocking frequency management circuitry adjusts the voltage and/or the clocking frequency provided by the variable voltage power supply.
Type: Grant
Filed: May 13, 2008
Date of Patent: March 1, 2011
Inventor: Paul Beard
-
Patent number: 7810084Abstract: Computer-implemented methods, computer systems and computer program products are provided for parallel processing a plurality of data objects with a plurality of processors. As disclosed herein, the data objects to be assembled for further processing may be in bundles, the bundles obeying first predefined criteria, which is dynamically controlled by using a bundle specific master table. The methods and systems may generate pipelines of data objects by pre-selecting and grouping the data objects according to second predefined criteria by a first group of the plurality of processors, and create the bundles from each pipeline of the pre-selected data objects by a second group of the plurality of processors.Type: GrantFiled: June 1, 2006Date of Patent: October 5, 2010Assignee: SAP AGInventor: Karsten S. Egetoft
-
Patent number: 7774189Abstract: A system and method for implementing a unified model for integration systems is presented. A user provides inputs to an integrated language engine for placing operator components and arc components onto a dataflow diagram. Operator components include data ports for expressing data flow, and also include meta-ports for expressing control flow. Arc components connect operator components together for data and control information to flow between the operator components. The dataflow diagram is a directed acyclic graph that expresses an application without including artificial boundaries during the application design process. Once the integrated language engine generates the dataflow diagram, the integrated language engine compiles the dataflow diagram to generated application code.Type: GrantFiled: December 1, 2006Date of Patent: August 10, 2010Assignee: International Business Machines CorporationInventors: Amir Bar-Or, Michael James Beckerle
-
Patent number: 7769980Abstract: In arithmetic/logic units (ALU) provided corresponding to entries, an MIMD instruction decoder generating a group of control signals in accordance with a Multiple Instruction-Multiple Data (MIMD) instruction and an MIMD register storing data designating the MIMD instruction are provided, and an inter-ALU communication circuit is provided. The amount and direction of movement of the inter-ALU communication circuit are set by data bits stored in a movement data register. It is possible to execute data movement and arithmetic/logic operation with the amount of movement and operation instruction set individually for each ALU unit. Therefore, in a Single Instruction-Multiple Data type processing device, Multiple Instruction-Multiple Data operation can be executed at high speed in a flexible manner.Type: GrantFiled: August 16, 2007Date of Patent: August 3, 2010Assignee: Renesas Technology Corp.Inventors: Toshinori Sueyoshi, Masahiro Iida, Mitsutaka Nakano, Fumiaki Senoue, Katsuya Mizumoto
-
Patent number: 7747771Abstract: A method and mechanism for managing access to a plurality of registers in a processing device are contemplated. A processing device includes multiple nodes coupled to a ring bus, each of which includes one or more registers which may be accessed by processes executing within the device. Also coupled to the ring bus is a ring control unit which is configured to initiate transactions targeted to nodes on the ring bus. Each of the nodes is configured to receive and process a bus transaction with a fixed latency whether or not the transaction is targeted to the receiving node. The ring control unit is configured to periodically convey idle transactions on the ring bus in order to allow nodes responding to indeterminate transactions to gain access to the bus.Type: GrantFiled: June 30, 2004Date of Patent: June 29, 2010Assignee: Oracle America, Inc.Inventors: Manish Shah, Robert T. Golla, Mark A. Luttrell, Gregory F. Grohoski
-
Patent number: 7730463Abstract: A computer implemented method, system and computer program product for automatically generating SIMD code. The method begins by analyzing data to be accessed by a targeted loop including at least one statement, where each statement has at least one memory reference, to determine if memory accesses are safe. If memory accesses are safe, the targeted loop is simdized. If not safe, it is determined if a scheme can be applied in which safety need not be guaranteed. If such a scheme can be applied, the targeted loop is simdized according to the scheme. If such a scheme cannot be applied, it is determined if padding is appropriate. If padding is appropriate, the data is padded and the targeted loop is simdized. If padding is not appropriate, non-simdized code is generated based on the targeted loop for handling boundary conditions, the targeted loop is simdized and combined with the non-simdized code.Type: GrantFiled: February 21, 2006Date of Patent: June 1, 2010Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu, Peng Zhao
-
Patent number: 7730280Abstract: A control processor is used for fetching and distributing single instruction multiple data (SIMD) instructions to a plurality of processing elements (PEs). One of the SIMD instructions is a thread start (Tstart) instruction, which causes the control processor to pause its instruction fetching. A local PE instruction memory (PE Imem) is associated with each PE and contains local PE instructions for execution on the local PE. Local PE Imem fetch, decode, and execute logic are associated with each PE. Instruction path selection logic in each PE is used to select between control processor distributed instructions and local PE instructions fetched from the local PE Imem. Each PE is also initialized to receive control processor distributed instructions. In addition, local hold generation logic is associated with each PE. A PE receiving a Tstart instruction causes the instruction path selection logic to switch to fetch local PE Imem instructions.Type: GrantFiled: April 18, 2007Date of Patent: June 1, 2010Assignee: Vicore Technologies, Inc.Inventors: Gerald George Pechanek, Edwin Franklin Barry, Mihailo M. Stojancic
-
Patent number: 7661006Abstract: A computer implemented method, apparatus, and computer program product for managing symmetric multiprocessor interconnects. The process identifies functional communication connections between each processor in a plurality of processors on a multiprocessor to form identified functional communication connections. The process maps every functional communication connection between any two processors in the plurality of processors, based on the identified functional communication connections, to form an interconnect matrix. The process creates a path map using the interconnect matrix. The path map comprises a sequence of communication connections between the plurality of processors. The process initializes the plurality of processors using the path map.Type: GrantFiled: January 9, 2007Date of Patent: February 9, 2010Assignee: International Business Machines CorporationInventors: Luai A. Abou-Emara, Mark David McLaughlin, Jorge N. Yanez
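The interconnect-matrix and path-map steps in this abstract can be illustrated with a minimal sketch. The function names, the list-of-pairs input, and the use of a breadth-first traversal to order the connections are all assumptions made for illustration; the abstract does not say how the path map is derived from the matrix.

```python
from collections import deque
from typing import List, Sequence, Tuple


def build_interconnect_matrix(n: int, links: Sequence[Tuple[int, int]]) -> List[List[bool]]:
    """Return an n x n symmetric matrix; entry [a][b] is True when a functional
    connection between processors a and b was identified (hypothetical input)."""
    matrix = [[False] * n for _ in range(n)]
    for a, b in links:
        matrix[a][b] = True
        matrix[b][a] = True
    return matrix


def build_path_map(matrix: List[List[bool]], root: int = 0) -> List[Tuple[int, int]]:
    """Return a sequence of (from, to) connections that reaches every processor
    connected to `root`. Breadth-first order is an assumption of this sketch."""
    visited = {root}
    queue = deque([root])
    path_map: List[Tuple[int, int]] = []
    while queue:
        current = queue.popleft()
        for neighbor in range(len(matrix)):
            if matrix[current][neighbor] and neighbor not in visited:
                visited.add(neighbor)
                path_map.append((current, neighbor))
                queue.append(neighbor)
    return path_map


if __name__ == "__main__":
    m = build_interconnect_matrix(4, [(0, 1), (1, 2), (0, 3)])
    print(build_path_map(m))
```

A processor with no functional connection to the root simply never appears in the returned sequence, which is one way an initialization routine could detect an unreachable processor.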
-
Patent number: 7657880Abstract: The latencies associated with retrieving instruction information for a main thread are decreased through the use of a simultaneous helper thread. The helper thread is permitted to execute Store instructions. Store blocker logic operates to prevent data associated with a Store instruction in a helper thread from being committed to memory. Dependence blocker logic operates to prevent data associated with a Store instruction in a speculative helper thread from being bypassed to a Load instruction in a non-speculative thread.Type: GrantFiled: August 1, 2003Date of Patent: February 2, 2010Assignee: Intel CorporationInventors: Hong Wang, Tor Aamodt, Per Hammarlund, John Shen, Xinmin Tian, Milind Girkar, Perry Wang, Steve Shih-wei Liao
-
Patent number: 7656412Abstract: A system, a method and computer-readable media for performing texture resampling algorithms on a processing device. A texture resampling algorithm is selected. This algorithm is decomposed into multiple one-dimensional transformations. Instructions for performing each of the one-dimensional transformations are communicated to a processing device, such as a GPU. The processing device may generate an output image by separately executing the instructions associated with each of the one-dimensional transformations.Type: GrantFiled: December 21, 2005Date of Patent: February 2, 2010Assignee: Microsoft CorporationInventors: Denis Demandolx, Steven White
-
Patent number: 7610371Abstract: A mediation method and a mediation system divided into independent node components that process event records independently of the other components of the system. In addition, the system is provided with at least one node manager component that configures the node components and starts them up, when required. Further, the node manager component monitors the functioning of the node components and also stops the node components, if required. Each of the independent node components operates according to its own settings and is thus self-contained and capable of continuing operation even though some of the other components are temporarily inoperative. The system also includes a system database that manages configuration information and stores audit trail data.Type: GrantFiled: April 23, 2004Date of Patent: October 27, 2009Assignee: Comptel OyjInventor: Juhana Enqvist
-
Patent number: 7593947Abstract: A system, method and program product for processing a multiplicity of data update requests made by a customer. All of the data update requests are grouped into a plurality of blocks for execution by a data processor. The data update requests within each of the blocks and from one of the blocks to a next one of the blocks are arranged in an order that the data update requests need to be executed to yield a proper data result. Each of the blocks has approximately a same capacity for the data update requests. The capacity corresponds to a number of the data update requests which the data processor can efficiently process in order before processing the data update requests in the next one of the blocks. Then, the data processor processes the data update requests within the one block in the order. Then, the data processor processes the data update requests within the next block in the order. The order is an order in which the data update requests were made.Type: GrantFiled: March 16, 2004Date of Patent: September 22, 2009Assignees: IBM Corporation, The Bank of Tokyo-Mitsubishi UFJ, Ltd.Inventors: Izumi Nagai, Yohichi Hoshijima, Kazuoki Takahashi
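The grouping described in this abstract can be sketched roughly as follows. The names `group_into_blocks` and `process_in_order`, the fixed integer `capacity`, and the `apply` callback are hypothetical; the abstract only states that blocks have approximately equal capacity and that requests are executed in the order they were made.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def group_into_blocks(requests: Sequence[T], capacity: int) -> Iterator[List[T]]:
    """Yield consecutive blocks of at most `capacity` requests, preserving
    the order in which the requests were made."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    for start in range(0, len(requests), capacity):
        yield list(requests[start:start + capacity])


def process_in_order(requests: Sequence[T], capacity: int,
                     apply: Callable[[T], None]) -> int:
    """Apply every request block by block, in order within each block and
    from one block to the next. Returns the number of blocks processed."""
    blocks = 0
    for block in group_into_blocks(requests, capacity):
        for request in block:
            apply(request)
        blocks += 1
    return blocks


if __name__ == "__main__":
    balances = {}

    def apply_update(request):
        # Each hypothetical request is an (account, delta) pair.
        account, delta = request
        balances[account] = balances.get(account, 0) + delta

    updates = [("a", 5), ("b", 2), ("a", -3), ("b", 1), ("a", 10)]
    print(process_in_order(updates, 2, apply_update), balances)
```

Because blocks are cut from the request sequence without reordering, the final state matches what strictly sequential processing would produce; the block size only controls how much work is handed to the processor at a time.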
-
Patent number: 7512724Abstract: One embodiment of the present invention performs peripheral operations in a multi-thread processor. A peripheral bus is coupled to a peripheral unit to transfer peripheral information including a command message specifying a peripheral operation. A processing slice is coupled to the peripheral bus to execute a plurality of threads. The plurality of threads includes a first thread sending the command message to the peripheral unit.Type: GrantFiled: November 17, 2000Date of Patent: March 31, 2009Assignee: The United States of America as represented by the Secretary of the NavyInventors: Jack B. Dennis, Sam B. Sandbote
-
Publication number: 20080059763Abstract: A method and system of processing compressed multimedia data using fine-grain instruction parallelism is provided. The method of processing multimedia data includes transferring an instruction from each of a plurality of sequencers to associated processing elements within an array of processing elements. The instructions can be processed by the array of processing elements using fine-grain instruction parallelism. A selection mechanism using selection instructions can select the associated processing elements. The plurality of sequencers comprise fine-grain instructions for decoding the compressed multimedia data. A system for multimedia data processing includes a data parallel system which can include an array of processing elements. A plurality of sequencers are coupled to the array of processing elements. A direct memory access component is coupled to the array of processing elements. A diagonal mapping scheme can be used in transferring instructions and data to the processing elements.Type: ApplicationFiled: August 30, 2007Publication date: March 6, 2008Inventor: Lazar Bivolarski
-
Patent number: 7340591Abstract: A number of architectural and implementation approaches are described for using extra path (Epath) storage that operates in conjunction with a compute register file to obtain increased instruction level parallelism that more flexibly addresses the requirements of high performance algorithms. A processor that supports a single load data to a register file operation can be doubled in load capability through the use of an extra path storage, an additional independently addressable data memory path, and instruction decode information that specifies two independent load data operations. By allowing the extra path storage to be accessible by arithmetic facilities, the increased data bandwidth can be fully utilized.Type: GrantFiled: October 28, 2004Date of Patent: March 4, 2008Assignee: Altera CorporationInventors: Gerald George Pechanek, Patrick R. Marchand, Larry D. Larsen
-
Patent number: RE41703Abstract: A SIMD machine employing a plurality of parallel processors (PEs) in which communications hazards are eliminated in an efficient manner. An indirect Very Long Instruction Word instruction memory (VIM) is employed along with execute and delimiter instructions. A masking mechanism may be employed to control which PEs have their VIMs loaded. Further, a receive model of operation is preferably employed. In one aspect, each PE operates to control a switch that selects from which PE it receives. The present invention addresses a better machine organization for execution of parallel algorithms that reduces hardware cost and complexity while maintaining the best characteristics of both SIMD and MIMD machines and minimizing communication latency. This invention brings a level of MIMD computational autonomy to SIMD indirect Very Long Instruction Word (iVLIW) processing elements while maintaining the single thread of control used in the SIMD machine organization.Type: GrantFiled: June 21, 2004Date of Patent: September 14, 2010Assignee: Altera Corp.Inventors: Gerald George Pechanek, Thomas L. Drabenstott, Juan Guillermo Revilla, David Strube, Grayson Morris