Multiple Instruction, Multiple Data (MIMD) Patents (Class 712/21)

Scalable acceleration of reentrant compute operations

Patent number: 12147379

Abstract: Examples herein describe techniques for performing parallel processing using a plurality of processing elements (PEs) and a controller for data that has data dependencies. For example, a calculation may require an entire row or column to be summed, or to determine its mean. The PEs can be assigned different chunks of a data set (e.g., a tensor set, a column, or a row) for processing. The PEs can use one or more tokens to inform the controller when they are done with partial processing of their data chunks. The controller can then gather the partial results and determine an intermediate value for the data set. The controller can then distribute this intermediate value to the PEs which then re-process their respective data chunks using the intermediate value to generate final results.
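
The two-pass flow in the abstract above can be illustrated with a small sketch. It is not the patented implementation: the names `ProcessingElement`, `partial`, `finalize`, and `center_row` are hypothetical, the return value of `partial` stands in for the "done" token, and mean-centering is chosen only as one example of a calculation that depends on an entire row.

```python
from dataclasses import dataclass


@dataclass
class ProcessingElement:
    """Hypothetical PE that owns one chunk of the data set."""
    chunk: list[float]

    def partial(self) -> tuple[float, int]:
        # First pass: partial sum and element count for this chunk.
        # Returning stands in for the token that signals "done".
        return sum(self.chunk), len(self.chunk)

    def finalize(self, mean: float) -> list[float]:
        # Second pass: re-process the chunk using the intermediate value.
        return [x - mean for x in self.chunk]


def center_row(row: list[float], num_pes: int) -> list[float]:
    """Controller role: split, gather partials, broadcast the mean."""
    if not row:
        return []
    size = -(-len(row) // num_pes)  # ceiling division
    pes = [ProcessingElement(row[i:i + size]) for i in range(0, len(row), size)]
    partials = [pe.partial() for pe in pes]   # gather partial results
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    mean = total / count                      # intermediate value
    result: list[float] = []
    for pe in pes:                            # distribute and re-process
        result.extend(pe.finalize(mean))
    return result
```

Calling `center_row([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3)` splits the row into three chunks of two elements each and subtracts the shared mean of 3.5 from every element. A real device would run the PEs concurrently; the sketch runs them one after another to keep the data dependency visible.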

Type: Grant

Filed: December 28, 2022

Date of Patent: November 19, 2024

Assignee: XILINX, INC.

Inventors: Rajeev Patwari, Jorn Tuyls, Elliott Delaye, Xiao Teng, Ephrem Wu
Instruction set

Patent number: 12141092

Abstract: The invention relates to a computer program comprising a sequence of instructions for execution on a processing unit having instruction storage for holding the computer program, an execution unit for executing the computer program and data storage for holding data, the computer program comprising one or more computer executable instruction which, when executed, implements: a send function which causes a data packet destined for a recipient processing unit to be transmitted on a set of connection wires connected to the processing unit, the data packet having no destination identifier but being transmitted at a predetermined transmit time; and a switch control function which causes the processing unit to control switching circuitry to connect a set of connection wires of the processing unit to a switching fabric to receive a data packet at a predetermined receive time.

Type: Grant

Filed: April 6, 2022

Date of Patent: November 12, 2024

Assignee: GRAPHCORE LIMITED

Inventors: Simon Christian Knowles, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Alan Graham Alexander, Stephen Felix, Jonathan Mangnall, David Lacey
Communication range control device, method, and program

Patent number: 11818587

Abstract: When data transmitted from devices is processed by servers in a shared manner, the data is processed in a constantly stable manner when the devices move. Based on information indicating installation positions and processing capabilities of the servers, information on installation positions of base stations, and acquired location information of the devices, the coverage area control apparatus performs, at a certain time interval, a clustering calculation for obtaining an optimum coverage area for the servers that satisfies the requirements that the communication distances between the servers and the respective devices be minimized and the servers not be overloaded. Based on the information indicating the optimum coverage area obtained by the clustering calculation, assignments of the base stations to the servers are updated.

Type: Grant

Filed: October 9, 2019

Date of Patent: November 14, 2023

Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION

Inventors: Masahiro Yoshida, Koya Mori, Tomohiro Inoue, Hiroyuki Tanaka
High bandwidth memory system with distributed request broadcasting masters

Patent number: 11537301

Abstract: A system comprises a processor and a plurality of memory units. The processor is coupled to each of the plurality of memory units by a plurality of network connections. The processor includes a plurality of processing elements arranged in a two-dimensional array and a corresponding two-dimensional communication network communicatively connecting each of the plurality of processing elements to other processing elements on same axes of the two-dimensional array. Each processing element that is located along a diagonal of the two-dimensional array is configured as a request broadcasting master for a respective group of processing elements located along a same axis of the two-dimensional array.

Type: Grant

Filed: May 4, 2021

Date of Patent: December 27, 2022

Assignee: Meta Platforms, Inc.

Inventors: Abdulkadir Utku Diril, Olivia Wu, Krishnakumar Narayanan Nair, Aravind Kalaiah, Anup Ramesh Kadkol, Pankaj Kansal
Remote automatic control power supply system

Patent number: 11385692

Abstract: A remote automatic control power supply system is disclosed, comprising a power supply control device and an electronic device having a control circuit, in which the power supply control device is configured to control whether the power supply is to be outputted, and the control circuit can set the GPS coordinate and the starting distance value close to the power supply control device; afterwards, it is possible to operate the control circuit via the backend of the electronic device such that, when the distance between the real-time GPS coordinate of the electronic device and the GPS coordinate of the power supply control device is equivalent to the starting distance value, the control circuit can transmit a power control signal to the power supply control device thereby allowing the power supplying control device to output the electric power to the receiving end.

Type: Grant

Filed: November 27, 2019

Date of Patent: July 12, 2022

Inventor: Chao-Cheng Yu
Data arrangement management in a distributed data cluster environment of a shared pool of configurable computing resources

Patent number: 11238045

Abstract: Disclosed aspects relate to data arrangement management in a distributed data cluster environment of a shared pool of configurable computing resources. In the distributed data cluster environment, a set of data is monitored for a data redistribution candidate trigger. The data redistribution candidate trigger is detected with respect to the set of data. Based on the data redistribution candidate trigger, the set of data is analyzed with respect to a candidate data redistribution action. Using the candidate data redistribution action, a new data arrangement associated with the set of data is determined. Accordingly, the new data arrangement is established.

Type: Grant

Filed: June 24, 2019

Date of Patent: February 1, 2022

Assignee: International Business Machines Corporation

Inventors: Naresh K. Chainani, James H. Cho
Coprocessor context priority

Patent number: 11210104

Abstract: A system may include a plurality of processors and a coprocessor. A plurality of coprocessor context priority registers corresponding to a plurality of contexts supported by the coprocessor may be included. The plurality of processors may use the plurality of contexts, and may program the coprocessor context priority register corresponding to a context with a value specifying a priority of the context relative to other contexts. An arbiter may arbitrate among instructions issued by the plurality of processors based on the priorities in the plurality of coprocessor context priority registers. In one embodiment, real-time threads may be assigned higher priorities than bulk processing tasks, improving bandwidth allocated to the real-time threads as compared to the bulk tasks.
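
As a rough illustration of the arbitration described in the abstract above, the sketch below models the context priority registers as a dictionary and picks the pending instruction from the highest-priority context. The name `arbitrate`, the data layout, and the tie-break rule (lowest context number wins) are assumptions made for the example, not details taken from the patent.

```python
def arbitrate(priority_regs: dict[int, int], pending: dict[int, list[str]]):
    """Pick the next instruction to issue to the coprocessor.

    priority_regs maps a context number to its programmed priority.
    pending maps a context number to its queue of issued instructions.
    Returns (context, instruction), or None when nothing is pending.
    """
    ready = [ctx for ctx, queue in pending.items() if queue]
    if not ready:
        return None
    # Highest priority wins; ties go to the lowest context number
    # (an assumption, the abstract does not specify a tie-break).
    winner = max(ready, key=lambda ctx: (priority_regs[ctx], -ctx))
    return winner, pending[winner].pop(0)


# Real-time contexts 1 and 2 outrank the bulk context 0.
regs = {0: 1, 1: 7, 2: 7}
queues = {0: ["bulk_op"], 1: ["rt_op_a"], 2: ["rt_op_b"]}
issue_order = []
while (grant := arbitrate(regs, queues)) is not None:
    issue_order.append(grant)
```

In this run the two real-time instructions are issued before the bulk instruction, which matches the embodiment in which real-time threads receive higher priorities than bulk processing tasks.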

Type: Grant

Filed: September 11, 2020

Date of Patent: December 28, 2021

Assignee: Apple Inc.

Inventors: Aditya Kesiraju, Andrew J. Beaumont-Smith, Brian P. Lilly, James Vash, Jason M. Kassoff, Krishna C. Potnuru, Rajdeep L. Bhuyar, Ran A. Chachick, Tyler J. Huberty, Derek R. Kumar
Technologies for efficiently booting sleds in a disaggregated architecture

Patent number: 11030017

Abstract: Technologies for efficiently booting sleds in a disaggregated architecture include a sled. The sled includes a network interface controller, a set of processors, and firmware that includes an operating system. Additionally, the sled includes circuitry to perform, with multiple processors in the set of processors, a boot process. The circuitry is also to initialize the operating system present in the firmware, receive, with the network interface controller and from another sled, an assignment of a workload, and execute the assigned workload with the operating system.

Type: Grant

Filed: March 23, 2018

Date of Patent: June 8, 2021

Assignee: Intel Corporation

Inventors: Mohan J. Kumar, Murugasamy K. Nachimuthu
Data processing engine (DPE) array global mapping

Patent number: 10853541

Abstract: Some examples described herein relate to global mapping of program nodes of a netlist of an application. In an example, a design system includes a processor and a memory coupled to the processor. The memory stores instruction code. The processor is configured to execute the instruction code to obtain a netlist of an application. The netlist contains program nodes and respective edges between the program nodes. The application is to be implemented on a device comprising an array of data processing engines. The processor is also configured to execute the instruction code to generate a global mapping of the program nodes based on a representation of the array of data processing engines and using an integer linear programming (ILP) algorithm; generate a detailed mapping of the program nodes based on the global mapping; and translate the detailed mapping to a file.

Type: Grant

Filed: April 30, 2019

Date of Patent: December 1, 2020

Assignee: XILINX, INC.

Inventors: Abhishek Joshi, Grigor S. Gasparyan, Aditya Chaubal, Sridhar Kirshnamurthy, Xiao Dong
Implementing integrated circuit designs using depopulation and repopulation operations

Patent number: 9811621

Abstract: Circuit design computing equipment may perform depopulation operations, constraint generation, and repopulation operations in a circuit design in anticipation of register retiming operations. A depopulation operation before placement and/or before routing operations may prevent the respective placement and/or routing operations from placing and/or routing registers from the circuit design. Constraint generation may create constraints for placement and/or routing operations that allow for the reinsertion of registers after routing operations. Repopulation operations may reinsert registers in the circuit design after routing operations according to the constraints. If desired, the circuit design computing equipment may perform register retiming operations to further improve the performance of the circuit design.

Type: Grant

Filed: May 1, 2015

Date of Patent: November 7, 2017

Assignee: Altera Corporation

Inventors: Kimberly Anne Bozman, David Ian Milton, Nishanth Sinnadurai
Select logic for the instruction scheduler of a multi strand out-of-order processor based on delayed reconstructed program order

Patent number: 9632790

Abstract: A processing device comprises select logic to schedule a plurality of instructions for execution. The select logic calculates a reconstructed program order (RPO) value for each of a plurality of instructions that are ready to be scheduled for execution. The select logic creates an ordered list of instructions based on the delayed RPO values, the delayed RPO values comprising the calculated RPO values from a previous execution cycle, and dispatches instructions for scheduling based on the ordered list.

Type: Grant

Filed: December 26, 2012

Date of Patent: April 25, 2017

Assignee: Intel Corporation

Inventors: Jayesh Iyer, Nikolay Kosarev, Sergey Y. Shishlov, Alexey Y. Sivtsov, Yuriy V Baida, Alexander V Butuzov, Bob Babayan, Vladimir Pentkovski
Computing device boot software authentication

Patent number: 9589138

Abstract: Various embodiments are generally directed to authenticating a chain of components of boot software of a computing device. An apparatus comprises a processor circuit and storage storing an initial boot software component comprising instructions operative on the processor circuit to select a first set of boot software components of multiple sets of boot software components, each set of boot software components defines a pathway that branches from the initial boot software component and that rejoins at a latter boot software component; authenticate a first boot software component of the first set of boot software components; and execute a sequence of instructions of the first boot software component to authenticate a second boot software component of the first set of boot software components to form a chain of authentication through a first pathway defined by the first set of boot software components. Other embodiments are described and claimed herein.

Type: Grant

Filed: September 21, 2015

Date of Patent: March 7, 2017

Assignee: INTEL CORPORATION

Inventors: Jiewen Yao, Vincent J. Zimmer
Migration of large data from on-line content management to archival content management

Patent number: 9424135

Abstract: A method for content management in a file system includes creating a file system storage device with a storage area symbolic name. A core table data structure is generated including a core table partition field. An ancillary table data structure is generated including an ancillary table partition field. Partitioning rules for the file system storage device are cached based on the storage area symbolic name. Property values for a document are compared with the cached partitioning rules to determine a match for the storage area symbolic name. The document is stored into a partition of the file system storage device based on the storage area symbolic name. Metadata for the document is stored into a partition of the core table based on the core table partition field. Ancillary objects for the document are stored into a partition of the ancillary table based on the ancillary partition fields.

Type: Grant

Filed: December 30, 2015

Date of Patent: August 23, 2016

Assignee: International Business Machines Corporation

Inventors: William J. Carpenter, David G. Skinner, Hailin Wang, Michael G. Winter, Jun Xu
Batch command techniques for a data storage device

Patent number: 9400747

Abstract: A data storage device includes a non-volatile memory and a controller. A method includes sending a memory command from the controller to the non-volatile memory. The memory command indicates multiple sense operations to be performed at a single plane of the non-volatile memory.

Type: Grant

Filed: April 29, 2014

Date of Patent: July 26, 2016

Assignee: SANDISK TECHNOLOGIES LLC

Inventors: Daniel Edward Tuers, Abhijeet Manohar, Mark Murin, Mark Shlick, Menahem Lasser
Multithreaded DFA architecture for finding rules match by concurrently performing at varying input stream positions and sorting result tokens

Patent number: 9009448

Abstract: Disclosed is an architecture, system and method for performing multi-thread DFA descents on a single input stream. An executer performs DFA transitions from a plurality of threads each starting at a different point in an input stream. A plurality of executers may operate in parallel to each other and a plurality of thread contexts operate concurrently within each executer to maintain the context of each thread which is state transitioning. A scheduler in each executer arbitrates instructions for the thread into an at least one pipeline where the instructions are executed. Tokens may be output from each of the plurality of executers to a token processor which sorts and filters the tokens into dispatch order.
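
A minimal software sketch of the idea in the abstract above follows. Each "thread" is a DFA descent that starts at a different input position, and a final step filters and sorts the resulting tokens. The function names, the dictionary-based transition table, the longest-match policy, and the sort key are all assumptions for illustration; the hardware executers, thread contexts, and scheduler are not modeled.

```python
def run_thread(dfa, accepting, text, start):
    """One DFA descent beginning at `start`; returns (start, end) or None."""
    state = 0          # state 0 is assumed to be the DFA's initial state
    last_accept = None
    for pos in range(start, len(text)):
        state = dfa.get((state, text[pos]))
        if state is None:          # no transition: this thread dies
            break
        if state in accepting:     # remember the longest match so far
            last_accept = pos + 1
    return None if last_accept is None else (start, last_accept)


def find_tokens(dfa, accepting, text):
    # Threads may finish in any order; reversed() mimics that here.
    raw = [run_thread(dfa, accepting, text, s) for s in reversed(range(len(text)))]
    # Token processor role: drop non-matches, then sort into stream order.
    return sorted(tok for tok in raw if tok is not None)


# Hypothetical rule: the letter "a" followed by one or more "b".
AB_PLUS = {(0, "a"): 1, (1, "b"): 2, (2, "b"): 2}
tokens = find_tokens(AB_PLUS, {2}, "xabbab")
```

For the input `"xabbab"` the threads starting at positions 1 and 4 produce matches, and the sorted token list is `[(1, 4), (4, 6)]`, where each pair is a half-open span in the input stream.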

Type: Grant

Filed: January 18, 2012

Date of Patent: April 14, 2015

Assignee: Intel Corporation

Inventors: Michael Ruehle, Umesh Ramkrishnarao Kasture, Vinay Janardan Naik, Nayan Amrutlal Suthar, Robert J. McMillen
SIMD processor for performing data filtering and/or interpolation

Patent number: 8856494

Abstract: Data processing circuit containing an instruction execution circuit having an instruction set comprising a SIMD instruction. The instruction execution circuit comprises arithmetic circuits, arranged to perform N respective identical operations in parallel in response to the SIMD instruction. The SIMD instruction selects a first one and a second one of the registers. The SIMD instruction defines a first and second series of N respective SIMD instruction operands of the SIMD instruction from the addressed registers. Each arithmetic circuit receives a respective first operand and a respective second operand from the first and second series respectively. The instruction execution circuit selects the first and second series so they partially overlap. Positioning the operands is under program control.
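
The partially overlapping operand series described in the abstract above can be pictured with a short simulation. This is only a sketch under assumptions: the two selected registers are modeled as Python lists joined end to end, `offset_first` and `offset_second` are hypothetical program-controlled positions, and addition stands in for whatever identical operation the N arithmetic circuits perform.

```python
def simd_overlapping_add(reg_a, reg_b, offset_first, offset_second, n):
    """Simulate N identical additions on two overlapping operand series.

    Both series are drawn from the concatenation of the two selected
    registers; their start positions are under program control.
    """
    combined = reg_a + reg_b
    if max(offset_first, offset_second) + n > len(combined):
        raise ValueError("operand series runs past the selected registers")
    first = combined[offset_first:offset_first + n]
    second = combined[offset_second:offset_second + n]
    # Lane i receives first[i] and second[i], as one arithmetic circuit would.
    return [x + y for x, y in zip(first, second)]


# Neighbouring-sample sums, as used by a two-tap filter or interpolator.
sums = simd_overlapping_add([1, 2, 3, 4], [5, 6, 7, 8], 0, 1, 4)
```

With offsets 0 and 1 the series are `[1, 2, 3, 4]` and `[2, 3, 4, 5]`, so three of the four operands are shared between them. Sliding the second offset further changes which filter tap is being applied without moving any data between registers.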

Type: Grant

Filed: January 11, 2012

Date of Patent: October 7, 2014

Assignee: Intel Corporation

Inventor: Antonius A. M. Van Wel
Method and system for processing image data on a per tile basis in an image sensor pipeline

Patent number: 8798386

Abstract: Methods and systems for processing image data on a per tile basis in an image sensor pipeline (ISP) are disclosed and may include communicating, to one or more processing modules via control logic circuits integrated in the ISP, corresponding configuration parameters that are associated with each of a plurality of data tiles comprising an image. The ISP may be integrated in a video processing core. The plurality of data tiles may vary in size. A processing complete signal may be communicated to the control logic circuits when the processing of each of the data tiles is complete prior to configuring a subsequent processing module. The processing may comprise one or more of: lens shading correction, statistics, distortion correction, demosaicing, denoising, defective pixel correction, color correction, and resizing. Each of the data tiles may overlap with adjacent data tiles, and at least a portion of them may be processed concurrently.

Type: Grant

Filed: July 13, 2010

Date of Patent: August 5, 2014

Assignee: Broadcom Corporation

Inventors: Adrian Lees, David Plowman
VARIABLE DEPTH INSTRUCTION FIFOS TO IMPLEMENT SIMD ARCHITECTURE

Publication number: 20140195777

Abstract: In a particular embodiment, a method may include creating a plurality of variable depth instruction FIFOs and a plurality of data caches from a plurality of caches corresponding to a plurality of processors, where the plurality of caches and the plurality of processors correspond to MIMD architecture. The method may also include configuring the plurality of variable depth instruction FIFOs to implement SIMD architecture. The method may also include configuring the plurality of variable depth instruction FIFOs for at least one of SIMD operation, SIMD operation with staging, or RC-SIMD operation.

Type: Application

Filed: January 10, 2013

Publication date: July 10, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Mark D. Bellows, Mark S. Fredrickson, Scott D. Frei, Steven P. Jones, Chad B. McBride
Data sharing in high-fidelity simulation and real-time multi-core execution

Patent number: 8732359

Abstract: When executing a graphical model of a dynamic system that includes two or more concurrently executing sets of operations, a processor is configured to create a first buffer and a second buffer within the executable graphical model. A first set of operations is configured to write data to the first buffer during a first execution instance of the first set of operations. The first set of operations is configured to write data to the second buffer during a second execution instance of the first thread. A second set of operations is configured to read the data from the first buffer during an instance of the second thread that executes contemporaneously with the second execution instance of the first set of operations. Determinations regarding access to the first buffer and second buffer by the first thread and second thread are self-contained within the first thread and second thread, respectively.
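
The two-buffer scheme in the abstract above resembles classic double buffering, which the following sketch illustrates. The functions `writer_slot`, `reader_slot`, and `simulate` are hypothetical, and the sketch runs both sets of operations in one loop rather than on separate cores. The point it tries to show is that each side computes its own buffer index from its own execution instance, without consulting the other side.

```python
def writer_slot(instance: int) -> int:
    # The writer alternates buffers on every execution instance.
    return instance % 2


def reader_slot(instance: int) -> int:
    # The reader uses the buffer the writer filled one instance earlier.
    return (instance - 1) % 2


def simulate(values):
    """Run writer and reader side by side; return what the reader saw."""
    buffers = [None, None]
    seen = []
    for instance, value in enumerate(values):
        if instance > 0:
            # Reader and writer touch different slots in the same instance,
            # so their relative order within the instance does not matter.
            seen.append(buffers[reader_slot(instance)])
        buffers[writer_slot(instance)] = value
    return seen


observed = simulate([10, 20, 30])
```

Here the reader observes `[10, 20]`: during the second writer instance it reads the first buffer, and during the third it reads the second buffer, so it never reads a slot that is being written in the same instance.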

Type: Grant

Filed: December 7, 2012

Date of Patent: May 20, 2014

Assignee: The MathWorks, Inc.

Inventors: James E. Carrick, Biao Yu
Packet draining from a scheduling hierarchy in a traffic manager of a network processor

Patent number: 8638805

Abstract: Described embodiments provide for restructuring a scheduling hierarchy of a network processor having a plurality of processing modules and a shared memory. The scheduling hierarchy schedules packets for transmission. The network processor generates tasks corresponding to each received packet associated with a data flow. A traffic manager receives tasks provided by one of the processing modules and determines a queue of the scheduling hierarchy corresponding to the task. The queue has a parent scheduler at each of one or more next levels of the scheduling hierarchy up to a root scheduler, forming a branch of the hierarchy. The traffic manager determines if the queue and one or more of the parent schedulers of the branch should be restructured. If so, the traffic manager drops subsequently received tasks for the branch, drains all tasks of the branch, and removes the corresponding nodes of the branch from the scheduling hierarchy.

Type: Grant

Filed: September 30, 2011

Date of Patent: January 28, 2014

Assignee: LSI Corporation

Inventors: Balakrishnan Sundararaman, Shashank Nemawarkar, David Sonnier, Shailendra Aulakh, Allen Vestal
Register indexed sampler for texture opcodes

Patent number: 8624910

Abstract: One embodiment of the present invention sets forth a technique for dynamically specifying a texture header and texture sampler using an index. The index corresponds to a particular register value that may be static or computed during execution of a shader program. Any texture operation instruction may specify an index value for each of the texture header and the texture sampler.

Type: Grant

Filed: August 25, 2010

Date of Patent: January 7, 2014

Assignee: Nvidia Corporation

Inventors: John Erik Lindholm, Yan Yan Tang
Selectively isolating processor elements into subsets of processor elements

Patent number: 8532288

Abstract: A cryptographic engine for modulo N multiplication, which is structured as a plurality of almost identical, serially connected Processing Elements, is controlled so as to accept input in blocks that are smaller than the maximum capability of the engine in terms of bits multiplied at one time. The serially connected hardware is thus partitioned on the fly to process a variety of cryptographic key sizes while still maintaining all of the hardware in an active processing state.

Type: Grant

Filed: December 1, 2006

Date of Patent: September 10, 2013

Assignee: International Business Machines Corporation

Inventors: Camil Fayad, John K. Li, Siegfried K. H. Sutter, Phil C. Yeh
Avoiding stall in processor pipeline upon read after write resource conflict when intervening write present

Patent number: 8499139

Abstract: An apparatus having a processor and a circuit is disclosed. The processor generally has a pipeline. The circuit may be configured to (i) detect a first write instruction in the pipeline that writes to a resource, (ii) stall a read instruction in the pipeline where (a) a first read-after-write conflict exists between the first write instruction and the read instruction and (b) no other write instruction to the resource is scheduled between the first write instruction and the read instruction and (iii) not stall the read instruction due to the first read-after-write conflict where a second write instruction to the resource is scheduled between the first write instruction and the read instruction.
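
The stall rule in the abstract above can be expressed as a small predicate. This is a sketch, not the claimed circuit: the `Instr` record, the list-based pipeline window, and the function name are assumptions. The function answers only whether the read must stall because of the first write; the read may still depend on the intervening write.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str        # "write" or "read"
    resource: str  # hypothetical resource name, e.g. a register


def stalls_on_first_write(window, first_write_idx, read_idx):
    """True if the read must stall because of the write at first_write_idx."""
    first, read = window[first_write_idx], window[read_idx]
    conflict = (
        first.op == "write"
        and read.op == "read"
        and first.resource == read.resource
        and first_write_idx < read_idx
    )
    if not conflict:
        return False
    # A later write to the same resource, scheduled before the read,
    # overwrites the first write's result, so that conflict is moot.
    intervening = any(
        i.op == "write" and i.resource == read.resource
        for i in window[first_write_idx + 1:read_idx]
    )
    return not intervening


simple = [Instr("write", "r1"), Instr("read", "r1")]
shadowed = [Instr("write", "r1"), Instr("write", "r1"), Instr("read", "r1")]
```

For `simple` the read stalls on the write at index 0. For `shadowed` it does not stall on index 0, although calling the function with index 1 shows that the read still conflicts with the second write.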

Type: Grant

Filed: August 12, 2010

Date of Patent: July 30, 2013

Assignee: LSI Corporation

Inventors: Leonid Dubrovin, Alexander Rabinovitch, Hagit Margolin, Noam Abda
Single instruction processing of network packets

Patent number: 8493979

Abstract: Executing a single instruction/multiple data (SIMD) instruction of a program to process a vector of data wherein each element of the packet vector corresponds to a different received packet.

Type: Grant

Filed: December 30, 2008

Date of Patent: July 23, 2013

Assignee: Intel Corporation

Inventors: Bryan E. Veal, Travis T. Schluessler
Computer architecture and method of operation for multi-computer distributed processing having redundant array of independent systems with replicated memory and code striping

Patent number: 8316190

Abstract: Computers and other computing machines and information appliances having a modified computer architecture and program structure which enables the operation of an application program concurrently or simultaneously on a plurality of computers interconnected via a communications link or network using a special distributed runtime (DRT), and that provides for a redundant array of independent computing systems that include computer code distribution using code-striping onto the plurality of the computers or computing machines. A redundant array of independent computing systems operating in concert and code-striping features.

Type: Grant

Filed: March 19, 2008

Date of Patent: November 20, 2012

Assignee: Waratek Pty. Ltd.

Inventor: John M. Holt
Enabling speculative state information in a cache coherency protocol

Patent number: 8185700

Abstract: In one embodiment, the present invention includes a method for receiving a bus message in a first cache corresponding to a speculative access to a portion of a second cache by a second thread, and dynamically determining in the first cache if an inter-thread dependency exists between the second thread and a first thread associated with the first cache with respect to the portion. Other embodiments are described and claimed.

Type: Grant

Filed: May 30, 2006

Date of Patent: May 22, 2012

Assignee: Intel Corporation

Inventors: Carlos Madriles Gimeno, Carlos García Quinones, Pedro Marcuello, Jesús Sánchez, Fernando Latorre, Antonio González
System of lanes of processing units receiving instructions via shared memory units for data-parallel or task-parallel operations

Patent number: 8180998

Abstract: A system for performing data-parallel operations and task-parallel operations. A first switch fabric node (SFN) includes first and second lane processing engines (LPEs). The first LPE includes a first set of lane processing units (LPUs) configured to perform data-parallel operations, where each LPU performs a set of operations, and each LPU uses a different set of data for the set of operations, and each LPU within the first set of LPUs uses a different set of data for the set of operations. The second LPE includes a second set of LPUs configured to perform task-parallel operations, where each LPU performs a different set of operations. A processing control engine (PCE) is configured to distribute instructions and data to the first LPE and the second LPE. Advantageously, data parallel operations and task parallel operations are able to be performed on the same processor simultaneously.

Type: Grant

Filed: September 10, 2008

Date of Patent: May 15, 2012

Assignee: NVIDIA Corporation

Inventors: Monier Maher, Christopher Lamb, Sanjay J. Patel, Peter Hsu
SHARED RESOURCE MULTI-THREAD PROCESSOR ARRAY

Publication number: 20120089812

Abstract: A shared resource multi-thread processor array wherein an array of heterogeneous function blocks are interconnected via a self-routing switch fabric, in which the individual function blocks have an associated switch port address. Each switch output port comprises a FIFO style memory that implements a plurality of separate queues. Thread queue empty flags are grouped using programmable circuit means to form self-synchronised threads. Data from different threads are passed to the various addressable function blocks in a predefined sequence in order to implement the desired function. The separate port queues allows data from different threads to share the same hardware resources and the reconfiguration of switch fabric addresses further enables the formation of different data-paths allowing the array to be configured for use in various applications.

Type: Application

Filed: June 9, 2010

Publication date: April 12, 2012

Inventor: Graeme Roy Smith
Methods and apparatus for independent processor node operations in a SIMD array processor

Patent number: 8103854

Abstract: A control processor is used for fetching and distributing single instruction multiple data (SIMD) instructions to a plurality of processing elements (PEs). One of the SIMD instructions is a thread start (Tstart) instruction, which causes the control processor to pause its instruction fetching. A local PE instruction memory (PE Imem) is associated with each PE and contains local PE instructions for execution on the local PE. Local PE Imem fetch, decode, and execute logic are associated with each PE. Instruction path selection logic in each PE is used to select between control processor distributed instructions and local PE instructions fetched from the local PE Imem. Each PE is also initialized to receive control processor distributed instructions. In addition, local hold generation logic is associated with each PE. A PE receiving a Tstart instruction causes the instruction path selection logic to switch to fetch local PE Imem instructions.

Type: Grant

Filed: April 12, 2010

Date of Patent: January 24, 2012

Assignee: Altera Corporation

Inventors: Gerald George Pechanek, Edwin Franklin Barry, Mihailo M. Stojancic
Supplying instruction stored in local memory configured as cache to peer processing elements in MIMD processing units

Patent number: 8051273

Abstract: Disclosed is a mixed mode parallel processor system in which N number of processing elements PEs, capable of performing SIMD operation, are grouped into M (=N÷S) processing units PUs performing MIMD operation. In MIMD operation, P out of S memories in each PU, which S memories inherently belong to the PEs, where P<S, operate as an instruction cache. The remaining memories operate as data memories or as data cache memories. One out of S sets of general-purpose registers, inherently belonging to the PEs, directly operates as a general register group for the PU. Out of the remaining S−1 sets, T set or a required number of sets, where T<S−1, are used as storage registers that store tags of the instruction cache.

Type: Grant

Filed: November 2, 2010

Date of Patent: November 1, 2011

Assignee: NEC Corporation

Inventor: Shorin Kyo
Dynamic virtual software pipelining on a network on chip

Patent number: 8020168

Abstract: A NOC for dynamic virtual software pipelining including IP blocks, routers, memory communications controllers, and network interface controllers, each IP block adapted to a router through a memory communications controller and a network interface controller, the NOC also including: a computer software application segmented into stages, each stage comprising a flexibly configurable module of computer program instructions identified by a stage ID, each stage assigned to a thread of execution on an IP block; and each stage executing on a thread of execution on an IP block, including a first stage executing on an IP block, producing output data and sending by the first stage the produced output data to a second stage, the output data including control information for the next stage and payload data; and the second stage consuming the produced output data in dependence upon the control information.

Type: Grant

Filed: May 9, 2008

Date of Patent: September 13, 2011

Assignee: International Business Machines Corporation

Inventors: Russell D. Hoover, Eric O. Mejdrich, Paul E. Schardt, Robert A. Shearer
Re-executing launcher program upon termination of launched programs in MIMD mode booted SIMD partitions

Patent number: 7979674

Abstract: Executing MIMD programs on a SIMD machine, the SIMD machine including a plurality of compute nodes, each compute node capable of executing only a single thread of execution, the compute nodes initially configured exclusively for SIMD operations, the SIMD machine further comprising a data communications network, the network comprising synchronous data communications links among the compute nodes, including establishing one or more SIMD partitions, booting one or more SIMD partitions in MIMD mode; establishing a MIMD partition; executing by launcher programs a plurality of MIMD programs on two or more of the compute nodes of the MIMD partition; and re-executing a launcher program by an operating system on a compute node in the MIMD partition upon termination of the MIMD program executed by the launcher program.

Type: Grant

Filed: May 16, 2007

Date of Patent: July 12, 2011

Assignee: International Business Machines Corporation

Inventors: Todd A. Inglett, Patrick J. McCarthy, Amanda Peters, Thomas A. Budnik, Michael B. Mundy, Gordon G. Stewart
Simultaneous multi-thread instructions issue to execution units while substitute injecting sequence of instructions for long latency sequencer instruction via multiplexer

Patent number: 7941644

Abstract: A processing unit includes multiple execution units and sequencer logic that is disposed downstream of instruction buffer logic, and that is responsive to a sequencer instruction present in an instruction stream. In response to such an instruction, the sequencer logic issues a plurality of instructions associated with a long latency operation to one execution unit, while blocking instructions from the instruction buffer logic from being issued to that execution unit. In addition, the blocking of instructions from being issued to the execution unit does not affect the issuance of instructions to any other execution unit, and as such, other instructions from the instruction buffer logic are still capable of being issued to and executed by other execution units even while the sequencer logic is issuing the plurality of instructions associated with the long latency operation.

Type: Grant

Filed: October 16, 2008

Date of Patent: May 10, 2011

Assignee: International Business Machines Corporation

Inventors: Eric Oliver Mejdrich, Adam James Muff, Matthew Ray Tubbs
Graphics processing unit used for cryptographic processing

Patent number: 7916864

Abstract: A graphics processing unit is programmed to carry out cryptographic processing so that fast, effective cryptographic processing solutions can be provided without incurring additional hardware costs. The graphics processing unit can efficiently carry out cryptographic processing because it has an architecture that is configured to handle a large number of parallel processes. The cryptographic processing carried out on the graphics processing unit can be further improved by configuring the graphics processing unit to be capable of both floating point and integer operations.

Type: Grant

Filed: February 8, 2006

Date of Patent: March 29, 2011

Assignee: NVIDIA Corporation

Inventor: Norbert Juffa
Battery powered device with dynamic and performance management

Patent number: 7900067

Abstract: A computing device operates over a range of voltages and frequencies and over a range of processor usage levels. The computing device includes at least a variable frequency generator, a variable voltage power supply and voltage supply level and clocking frequency management circuitry. The variable frequency generator is coupled to the processor and delivers a clock signal to the processor. The variable voltage power supply is coupled to the processor and delivers voltage to the processor. The voltage supply level and clocking frequency management circuitry adjust both the voltage provided by the variable voltage power supply and the frequency of the signal provided by the variable frequency generator. The computing device includes a temperature sensor that provides signals indicative of the temperature of the processor and the voltage supply level and clocking frequency management circuitry adjusts the voltage and/or the clocking frequency provided by the variable voltage power supply.

Type: Grant

Filed: May 13, 2008

Date of Patent: March 1, 2011

Inventor: Paul Beard
Parallel generating of bundles of data objects

Patent number: 7810084

Abstract: Computer-implemented methods, computer systems and computer program products are provided for parallel processing a plurality of data objects with a plurality of processors. As disclosed herein, the data objects to be assembled for further processing may be in bundles, the bundles obeying first predefined criteria, which is dynamically controlled by using a bundle specific master table. The methods and systems may generate pipelines of data objects by pre-selecting and grouping the data objects according to second predefined criteria by a first group of the plurality of processors, and create the bundles from each pipeline of the pre-selected data objects by a second group of the plurality of processors.

Type: Grant

Filed: June 1, 2006

Date of Patent: October 5, 2010

Assignee: SAP AG

Inventor: Karsten S. Egetoft
System and method for simulating data flow using dataflow computing system

Patent number: 7774189

Abstract: A system and method for implementing a unified model for integration systems is presented. A user provides inputs to an integrated language engine for placing operator components and arc components onto a dataflow diagram. Operator components include data ports for expressing data flow, and also include meta-ports for expressing control flow. Arc components connect operator components together for data and control information to flow between the operator components. The dataflow diagram is a directed acyclic graph that expresses an application without including artificial boundaries during the application design process. Once the integrated language engine generates the dataflow diagram, the integrated language engine compiles the dataflow diagram to generate application code.

Type: Grant

Filed: December 1, 2006

Date of Patent: August 10, 2010

Assignee: International Business Machines Corporation

Inventors: Amir Bar-Or, Michael James Beckerle
Parallel operation device allowing efficient parallel operational processing

Patent number: 7769980

Abstract: In arithmetic/logic units (ALU) provided corresponding to entries, an MIMD instruction decoder generating a group of control signals in accordance with a Multiple Instruction-Multiple Data (MIMD) instruction and an MIMD register storing data designating the MIMD instruction are provided, and an inter-ALU communication circuit is provided. The amount and direction of movement of the inter-ALU communication circuit are set by data bits stored in a movement data register. It is possible to execute data movement and arithmetic/logic operation with the amount of movement and operation instruction set individually for each ALU unit. Therefore, in a Single Instruction-Multiple Data type processing device, Multiple Instruction-Multiple Data operation can be executed at high speed in a flexible manner.

Type: Grant

Filed: August 16, 2007

Date of Patent: August 3, 2010

Assignee: Renesas Technology Corp.

Inventors: Toshinori Sueyoshi, Masahiro Iida, Mitsutaka Nakano, Fumiaki Senoue, Katsuya Mizumoto
Register access protocol in a multithreaded multi-core processor

Patent number: 7747771

Abstract: A method and mechanism for managing access to a plurality of registers in a processing device are contemplated. A processing device includes multiple nodes coupled to a ring bus, each of which includes one or more registers which may be accessed by processes executing within the device. Also coupled to the ring bus is a ring control unit which is configured to initiate transactions targeted to nodes on the ring bus. Each of the nodes is configured to receive and process bus transactions with a fixed latency whether or not the first transaction is targeted to the receiving node. The ring control unit is configured to periodically convey idle transactions on the ring bus in order to allow nodes responding to indeterminate transactions to gain access to the bus.

Type: Grant

Filed: June 30, 2004

Date of Patent: June 29, 2010

Assignee: Oracle America, Inc.

Inventors: Manish Shah, Robert T. Golla, Mark A. Luttrell, Gregory F. Grohoski
Efficient generation of SIMD code in presence of multi-threading and other false sharing conditions and in machines having memory protection support

Patent number: 7730463

Abstract: A computer implemented method, system and computer program product for automatically generating SIMD code. The method begins by analyzing data to be accessed by a targeted loop including at least one statement, where each statement has at least one memory reference, to determine if memory accesses are safe. If memory accesses are safe, the targeted loop is simdized. If not safe, it is determined if a scheme can be applied in which safety need not be guaranteed. If such a scheme can be applied, the targeted loop is simdized according to the scheme. If such a scheme cannot be applied, it is determined if padding is appropriate. If padding is appropriate, the data is padded and the targeted loop is simdized. If padding is not appropriate, non-simdized code is generated based on the targeted loop for handling boundary conditions, the targeted loop is simdized and combined with the non-simdized code.
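
The decision sequence in this abstract can be sketched as a small function. This is only an illustration of the order of checks as the abstract states them, not the patented compiler logic; the `LoopInfo` fields and the returned strategy names are hypothetical labels introduced here.

```python
from dataclasses import dataclass


@dataclass
class LoopInfo:
    # Hypothetical summary of the analysis results for one targeted loop.
    accesses_safe: bool               # memory accesses proven safe
    unguarded_scheme_available: bool  # a scheme exists where safety need not be guaranteed
    padding_appropriate: bool         # the data can be padded


def choose_simdization(loop: LoopInfo) -> str:
    """Return a strategy label, checking conditions in the abstract's order."""
    if loop.accesses_safe:
        return "simdize"
    if loop.unguarded_scheme_available:
        return "simdize_with_scheme"
    if loop.padding_appropriate:
        return "pad_then_simdize"
    # Fallback: scalar code handles boundary conditions, the rest is simdized.
    return "simdize_with_scalar_boundary_code"


strategy = choose_simdization(
    LoopInfo(accesses_safe=False, unguarded_scheme_available=False, padding_appropriate=True)
)
```

Each check is reached only when every earlier one has failed, which mirrors the "if not ... it is determined if ..." chain in the abstract.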

Type: Grant

Filed: February 21, 2006

Date of Patent: June 1, 2010

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu, Peng Zhao
Methods and apparatus for independent processor node operations in a SIMD array processor

Patent number: 7730280

Abstract: A control processor is used for fetching and distributing single instruction multiple data (SIMD) instructions to a plurality of processing elements (PEs). One of the SIMD instructions is a thread start (Tstart) instruction, which causes the control processor to pause its instruction fetching. A local PE instruction memory (PE Imem) is associated with each PE and contains local PE instructions for execution on the local PE. Local PE Imem fetch, decode, and execute logic are associated with each PE. Instruction path selection logic in each PE is used to select between control processor distributed instructions and local PE instructions fetched from the local PE Imem. Each PE is also initialized to receive control processor distributed instructions. In addition, local hold generation logic is associated with each PE. A PE receiving a Tstart instruction causes the instruction path selection logic to switch to fetch local PE Imem instructions.

Type: Grant

Filed: April 18, 2007

Date of Patent: June 1, 2010

Assignee: Vicore Technologies, Inc.

Inventors: Gerald George Pechanek, Edwin Franklin Barry, Mihailo M. Stojancic
Method and apparatus for self-healing symmetric multi-processor system interconnects

Patent number: 7661006

Abstract: A computer implemented method, apparatus, and computer program product for managing symmetric multiprocessor interconnects. The process identifies functional communication connections between each processor in a plurality of processors on a multiprocessor to form identified functional communication connections. The process maps every functional communication connection between any two processors in the plurality of processors, based on the identified functional communication connections, to form an interconnect matrix. The process creates a path map using the interconnect matrix. The path map comprises a sequence of communication connections between the plurality of processors. The process initializes the plurality of processors using the path map.
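
A minimal sketch of the interconnect matrix and path map, assuming processors are numbered from 0 and that a breadth-first traversal is an acceptable way to order the connections. The abstract does not say how the path map is derived, so the traversal and the function names below are assumptions made for illustration.

```python
from collections import deque
from typing import List, Tuple


def build_interconnect_matrix(n: int, links: List[Tuple[int, int]]) -> List[List[bool]]:
    """Mark every functional (working) connection between two processors."""
    matrix = [[False] * n for _ in range(n)]
    for a, b in links:
        matrix[a][b] = True
        matrix[b][a] = True
    return matrix


def build_path_map(matrix: List[List[bool]], root: int = 0) -> List[Tuple[int, int]]:
    """Breadth-first walk (an assumption) yielding a sequence of connections
    that reaches every processor reachable from `root`."""
    visited = {root}
    path_map: List[Tuple[int, int]] = []
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for neighbor in range(len(matrix)):
            if matrix[current][neighbor] and neighbor not in visited:
                visited.add(neighbor)
                path_map.append((current, neighbor))
                queue.append(neighbor)
    return path_map


# Four processors; the 2-3 connection is assumed to have failed and is omitted.
matrix = build_interconnect_matrix(4, [(0, 1), (1, 2), (0, 3)])
path_map = build_path_map(matrix)
```

Because only functional connections enter the matrix, a failed link never appears in the path map, and initialization can proceed along the remaining routes.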

Type: Grant

Filed: January 9, 2007

Date of Patent: February 9, 2010

Assignee: International Business Machines Corporation

Inventors: Luai A. Abou-Emara, Mark David McLaughlin, Jorge N. Yanez
Safe store for speculative helper threads

Patent number: 7657880

Abstract: The latencies associated with retrieving instruction information for a main thread are decreased through the use of a simultaneous helper thread. The helper thread is permitted to execute Store instructions. Store blocker logic operates to prevent data associated with a Store instruction in a helper thread from being committed to memory. Dependence blocker logic operates to prevent data associated with a Store instruction in a speculative helper thread from being bypassed to a Load instruction in a non-speculative thread.

Type: Grant

Filed: August 1, 2003

Date of Patent: February 2, 2010

Assignee: Intel Corporation

Inventors: Hong Wang, Tor Aamodt, Per Hammarlund, John Shen, Xinmin Tian, Milind Girkar, Perry Wang, Steve Shih-wei Liao
Texture resampling with a processor

Patent number: 7656412

Abstract: A system, a method and computer-readable media for performing texture resampling algorithms on a processing device. A texture resampling algorithm is selected. This algorithm is decomposed into multiple one-dimensional transformations. Instructions for performing each of the one-dimensional transformations are communicated to a processing device, such as a GPU. The processing device may generate an output image by separately executing the instructions associated with each of the one-dimensional transformations.
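
The decomposition into one-dimensional transformations can be illustrated with a separable resize: a horizontal pass over rows followed by a vertical pass over columns. This sketch runs on the CPU in plain Python and uses linear interpolation as the example algorithm; both choices, and all function names, are assumptions for illustration rather than the method claimed in the patent.

```python
from typing import List

Image = List[List[float]]


def resample_1d(row: List[float], new_len: int) -> List[float]:
    """Linearly resample a 1-D sequence to `new_len` samples, endpoints aligned."""
    n = len(row)
    if n == 0 or new_len <= 0:
        raise ValueError("row must be non-empty and new_len positive")
    if n == 1 or new_len == 1:
        return [row[0]] * new_len
    scale = (n - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        x = i * scale
        lo = min(int(x), n - 2)  # left neighbor, clamped so lo + 1 is valid
        t = x - lo
        out.append(row[lo] * (1 - t) + row[lo + 1] * t)
    return out


def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def resample_2d(img: Image, new_h: int, new_w: int) -> Image:
    """Two 1-D passes: first along rows, then along columns."""
    horizontal = [resample_1d(r, new_w) for r in img]
    vertical = [resample_1d(c, new_h) for c in transpose(horizontal)]
    return transpose(vertical)


result = resample_2d([[0, 2], [4, 6]], 3, 3)
```

Each pass touches only one dimension, so the same 1-D routine serves both directions.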

Type: Grant

Filed: December 21, 2005

Date of Patent: February 2, 2010

Assignee: Microsoft Corporation

Inventors: Denis Demandolx, Steven White
Mediation system and method with real time processing capability

Patent number: 7610371

Abstract: A mediation method and a mediation system divided into independent node components that process event records independently of the other components of the system. In addition, the system is provided with at least one node manager component that configures the node components and starts them up, when required. Further, the node manager component monitors the functioning of the node components and also stops the node components, if required. Each of the independent node components operates according to its own settings and is thus self-contained and capable of continuing operation even though some of the other components are temporarily inoperative. The system also includes a system database that manages configuration information and stores audit trail data.

Type: Grant

Filed: April 23, 2004

Date of Patent: October 27, 2009

Assignee: Comptel Oyj

Inventor: Juhana Enqvist
System, method and program for grouping data update requests for efficient processing

Patent number: 7593947

Abstract: A system, method and program product for processing a multiplicity of data update requests made by a customer. All of the data update requests are grouped into a plurality of blocks for execution by a data processor. The data update requests within each of the blocks and from one of the blocks to a next one of the blocks are arranged in an order that the data update requests need to be executed to yield a proper data result. Each of the blocks has approximately the same capacity for the data update requests. The capacity corresponds to a number of the data update requests which the data processor can efficiently process in order before processing the data update requests in the next one of the blocks. Then, the data processor processes the data update requests within the one block in the order. Then, the data processor processes the data update requests within the next block in the order. The order is an order in which the data update requests were made.
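
The grouping step can be sketched as order-preserving chunking, with blocks processed one after another. The function names, the `capacity` value, and the account-balance example are hypothetical; the abstract does not specify how capacity is chosen.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def group_into_blocks(requests: Iterable[T], capacity: int) -> List[List[T]]:
    """Split requests into consecutive blocks of at most `capacity` items,
    preserving the order in which the requests were made."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    items = list(requests)
    return [items[i:i + capacity] for i in range(0, len(items), capacity)]


def process_blocks(blocks: List[List[T]], apply: Callable[[T], None]) -> None:
    """Process every request in a block, in order, before moving to the next block."""
    for block in blocks:
        for request in block:
            apply(request)


# Hypothetical example: ordered balance updates for one account.
balances = {"A": 0}
requests: List[Tuple[str, int]] = [("A", 10), ("A", -3), ("A", 5), ("A", -2), ("A", 1)]


def apply_update(request: Tuple[str, int]) -> None:
    account, delta = request
    balances[account] += delta


blocks = group_into_blocks(requests, capacity=2)
process_blocks(blocks, apply_update)
```

Flattening the blocks reproduces the original request order, which is the property the abstract relies on for a proper data result.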

Type: Grant

Filed: March 16, 2004

Date of Patent: September 22, 2009

Assignees: IBM Corporation, The Bank of Tokyo-Mitsubishi UFJ, Ltd.

Inventors: Izumi Nagai, Yohichi Hoshijima, Kazuoki Takahashi
Multi-thread peripheral processing using dedicated peripheral bus

Patent number: 7512724

Abstract: One embodiment of the present invention performs peripheral operations in a multi-thread processor. A peripheral bus is coupled to a peripheral unit to transfer peripheral information including a command message specifying a peripheral operation. A processing slice is coupled to the peripheral bus to execute a plurality of threads. The plurality of threads includes a first thread sending the command message to the peripheral unit.

Type: Grant

Filed: November 17, 2000

Date of Patent: March 31, 2009

Assignee: The United States of America as represented by the Secretary of the Navy

Inventors: Jack B. Dennis, Sam B. Sandbote
System and method for fine-grain instruction parallelism for increased efficiency of processing compressed multimedia data

Publication number: 20080059763

Abstract: A method and system of processing compressed multimedia data using fine-grain instruction parallelism is provided. The method of processing multimedia data includes transferring an instruction from each of a plurality of sequencers to associated processing elements within an array of processing elements. The instructions can be processed by the array of processing elements using fine-grain instruction parallelism. A selection mechanism using selection instructions can select the associated processing elements. The plurality of sequencers comprise fine-grain instructions for decoding the compressed multimedia data. A system for multimedia data processing includes a data parallel system which can include an array of processing elements. A plurality of sequencers are coupled to the array of processing elements. A direct memory access component is coupled to the array of processing elements. A diagonal mapping scheme can be used in transferring instructions and data to the processing elements.

Type: Application

Filed: August 30, 2007

Publication date: March 6, 2008

Inventor: Lazar Bivolarski
Providing parallel operand functions using register file and extra path storage

Patent number: 7340591

Abstract: A number of architectural and implementation approaches are described for using extra path (Epath) storage that operates in conjunction with a compute register file to obtain increased instruction level parallelism that more flexibly addresses the requirements of high performance algorithms. A processor that supports a single load data to a register file operation can be doubled in load capability through the use of an extra path storage, an additional independently addressable data memory path, and instruction decode information that specifies two independent load data operations. By allowing the extra path storage to be accessible by arithmetic facilities, the increased data bandwidth can be fully utilized.

Type: Grant

Filed: October 28, 2004

Date of Patent: March 4, 2008

Assignee: Altera Corporation

Inventors: Gerald George Pechanek, Patrick R. Marchand, Larry D. Larsen
Methods and apparatus for efficient synchronous MIMD operations with IVLIW PE-TO-PE communication

Patent number: RE41703

Abstract: A SIMD machine employing a plurality of parallel processing elements (PEs) in which communications hazards are eliminated in an efficient manner. An indirect Very Long Instruction Word instruction memory (VIM) is employed along with execute and delimiter instructions. A masking mechanism may be employed to control which PEs have their VIMs loaded. Further, a receive model of operation is preferably employed. In one aspect, each PE operates to control a switch that selects from which PE it receives. The present invention addresses a better machine organization for execution of parallel algorithms that reduces hardware cost and complexity while maintaining the best characteristics of both SIMD and MIMD machines and minimizing communication latency. This invention brings a level of MIMD computational autonomy to SIMD indirect Very Long Instruction Word (iVLIW) processing elements while maintaining the single thread of control used in the SIMD machine organization.

Type: Grant

Filed: June 21, 2004

Date of Patent: September 14, 2010

Assignee: Altera Corp.

Inventors: Gerald George Pechanek, Thomas L. Drabenstott, Juan Guillermo Revilla, David Strube, Grayson Morris
