Multiple Instruction, Multiple Data (mimd) Patents (Class 712/21)
  • Patent number: 10853541
    Abstract: Some examples described herein relate to global mapping of program nodes of a netlist of an application. In an example, a design system includes a processor and a memory coupled to the processor. The memory stores instruction code. The processor is configured to execute the instruction code to obtain a netlist of an application. The netlist contains program nodes and respective edges between the program nodes. The application is to be implemented on a device comprising an array of data processing engines. The processor is also configured to execute the instruction code to generate a global mapping of the program nodes based on a representation of the array of data processing engines and using an integer linear programming (ILP) algorithm; generate a detailed mapping of the program nodes based on the global mapping; and translate the detailed mapping to a file.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: December 1, 2020
    Assignee: XILINX, INC.
    Inventors: Abhishek Joshi, Grigor S. Gasparyan, Aditya Chaubal, Sridhar Kirshnamurthy, Xiao Dong
  • Patent number: 9811621
    Abstract: Circuit design computing equipment may perform depopulation operations, constraint generation, and repopulation operations in a circuit design in anticipation of register retiming operations. A depopulation operation before placement and/or before routing operations may prevent the respective placement and/or routing operations from placing and/or routing registers from the circuit design. Constraint generation may create constraints for placement and/or routing operations that allow for the reinsertion of registers after routing operations. Repopulation operations may reinsert registers in the circuit design after routing operations according to the constraints. If desired, the circuit design computing equipment may perform register retiming operations to further improve the performance of the circuit design.
    Type: Grant
    Filed: May 1, 2015
    Date of Patent: November 7, 2017
    Assignee: Altera Corporation
    Inventors: Kimberly Anne Bozman, David Ian Milton, Nishanth Sinnadurai
  • Patent number: 9632790
    Abstract: A processing device comprises select logic to schedule a plurality of instructions for execution. The select logic calculates a reconstructed program order (RPO) value for each of a plurality of instructions that are ready to be scheduled for execution. The select logic creates an ordered list of instructions based on the delayed RPO values, the delayed RPO values comprising the calculated RPO values from a previous execution cycle, and dispatches instructions for scheduling based on the ordered list.
    Type: Grant
    Filed: December 26, 2012
    Date of Patent: April 25, 2017
    Assignee: Intel Corporation
    Inventors: Jayesh Iyer, Nikolay Kosarev, Sergey Y. Shishlov, Alexey Y. Sivtsov, Yuriy V Baida, Alexander V Butuzov, Bob Babayan, Vladimir Pentkovski
  • Patent number: 9589138
    Abstract: Various embodiments are generally directed to authenticating a chain of components of boot software of a computing device. An apparatus comprises a processor circuit and storage storing an initial boot software component comprising instructions operative on the processor circuit to select a first set of boot software components of multiple sets of boot software components, each set of boot software components defines a pathway that branches from the initial boot software component and that rejoins at a latter boot software component; authenticate a first boot software component of the first set of boot software components; and execute a sequence of instructions of the first boot software component to authenticate a second boot software component of the first set of boot software components to form a chain of authentication through a first pathway defined by the first set of boot software components. Other embodiments are described and claimed herein.
    Type: Grant
    Filed: September 21, 2015
    Date of Patent: March 7, 2017
    Assignee: INTEL CORPORATION
    Inventors: Jiewen Yao, Vincent J. Zimmer
  • Patent number: 9424135
    Abstract: A method for content management in a file system includes creating a file system storage device with a storage area symbolic name. A core table data structure is generated including a core table partition field. An ancillary table data structure is generated including an ancillary table partition field. Partitioning rules for the file system storage device are cached based on the storage area symbolic name. Property values for a document are compared with the cached partitioning rules to determine a match for the storage area symbolic name. The document is stored into a partition of the file system storage device based on the storage area symbolic name. Metadata for the document is stored into a partition of the core table based on the core table partition field. Ancillary objects for the document are stored into a partition of the ancillary table based on the ancillary partition fields.
    Type: Grant
    Filed: December 30, 2015
    Date of Patent: August 23, 2016
    Assignee: International Business Machines Corporation
    Inventors: William J. Carpenter, David G. Skinner, Hailin Wang, Michael G. Winter, Jun Xu
  • Patent number: 9400747
    Abstract: A data storage device includes a non-volatile memory and a controller. A method includes sending a memory command from the controller to the non-volatile memory. The memory command indicates multiple sense operations to be performed at a single plane of the non-volatile memory.
    Type: Grant
    Filed: April 29, 2014
    Date of Patent: July 26, 2016
    Assignee: SANDISK TECHNOLOGIES LLC
    Inventors: Daniel Edward Tuers, Abhijeet Manohar, Mark Murin, Mark Shlick, Menahem Lasser
  • Patent number: 9009448
    Abstract: Disclosed is an architecture, system and method for performing multi-thread DFA descents on a single input stream. An executer performs DFA transitions from a plurality of threads each starting at a different point in an input stream. A plurality of executers may operate in parallel to each other and a plurality of thread contexts operate concurrently within each executer to maintain the context of each thread which is state transitioning. A scheduler in each executer arbitrates instructions for the thread into an at least one pipeline where the instructions are executed. Tokens may be output from each of the plurality of executers to a token processor which sorts and filters the tokens into dispatch order.
    Type: Grant
    Filed: January 18, 2012
    Date of Patent: April 14, 2015
    Assignee: Intel Corporation
    Inventors: Michael Ruehle, Umesh Ramkrishnarao Kasture, Vinay Janardan Naik, Nayan Amrutlal Suthar, Robert J. McMillen
  • Patent number: 8856494
    Abstract: Data processing circuit containing an instruction execution circuit having an instruction set comprising a SIMD instruction. The instruction execution circuit comprises arithmetic circuits, arranged to perform N respective identical operations in parallel in response to the SIMD instruction. The SIMD instruction selects a first one and a second one of the registers. The SIMD instruction defines a first and second series of N respective SIMD instruction operands of the SIMD instruction from the addressed registers. Each arithmetic circuit receives a respective first operand and a respective second operand from the first and second series respectively. The instruction execution circuit selects the first and second series so they partially overlap. Positioning the operands is under program control.
    Type: Grant
    Filed: January 11, 2012
    Date of Patent: October 7, 2014
    Assignee: Intel Corporation
    Inventor: Antonius A. M. Van Wel
  • Patent number: 8798386
    Abstract: Methods and systems for processing image data on a per tile basis in an image sensor pipeline (ISP) are disclosed and may include communicating, to one or more processing modules via control logic circuits integrated in the ISP, corresponding configuration parameters that are associated with each of a plurality of data tiles comprising an image. The ISP may be integrated in a video processing core. The plurality of data tiles may vary in size. A processing complete signal may be communicated to the control logic circuits when the processing of each of the data tiles is complete prior to configuring a subsequent processing module. The processing may comprise one or more of: lens shading correction, statistics, distortion correction, demosaicing, denoising, defective pixel correction, color correction, and resizing. Each of the data tiles may overlap with adjacent data tiles, and at least a portion of them may be processed concurrently.
    Type: Grant
    Filed: July 13, 2010
    Date of Patent: August 5, 2014
    Assignee: Broadcom Corporation
    Inventors: Adrian Lees, David Plowman
  • Publication number: 20140195777
    Abstract: In a particular embodiment, a method may include creating a plurality of variable depth instruction FIFOs and a plurality of data caches from a plurality of caches corresponding to a plurality of processors, where the plurality of caches and the plurality of processors correspond to MIMD architecture. The method may also include configuring the plurality of variable depth instruction FIFOs to implement SIMD architecture. The method may also include configuring the plurality of variable depth instruction FIFOs for at least one of SIMD operation, SIMD operation with staging, or RC-SIMD operation.
    Type: Application
    Filed: January 10, 2013
    Publication date: July 10, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Mark D. Bellows, Mark S. Fredrickson, Scott D. Frei, Steven P. Jones, Chad B. McBride
  • Patent number: 8732359
    Abstract: When executing a graphical model of a dynamic system that includes two or more concurrently executing sets of operations, a processor is configured to create a first buffer and a second buffer within the executable graphical model. A first set of operations is configured to write data to the first buffer during a first execution instance of the first set of operations. The first set of operations is configured to write data to the second buffer during a second execution instance of the first thread. A second set of operations is configured to read the data from the first buffer during an instance of the second thread that executes contemporaneously with the second execution instance of the first set of operations. Determinations regarding access to the first buffer and second buffer by the first thread and second thread are self-contained within the first thread and second thread, respectively.
    Type: Grant
    Filed: December 7, 2012
    Date of Patent: May 20, 2014
    Assignee: The MathWorks, Inc.
    Inventors: James E. Carrick, Biao Yu
  • Patent number: 8638805
    Abstract: Described embodiments provide for restructuring a scheduling hierarchy of a network processor having a plurality of processing modules and a shared memory. The scheduling hierarchy schedules packets for transmission. The network processor generates tasks corresponding to each received packet associated with a data flow. A traffic manager receives tasks provided by one of the processing modules and determines a queue of the scheduling hierarchy corresponding to the task. The queue has a parent scheduler at each of one or more next levels of the scheduling hierarchy up to a root scheduler, forming a branch of the hierarchy. The traffic manager determines if the queue and one or more of the parent schedulers of the branch should be restructured. If so, the traffic manager drops subsequently received tasks for the branch, drains all tasks of the branch, and removes the corresponding nodes of the branch from the scheduling hierarchy.
    Type: Grant
    Filed: September 30, 2011
    Date of Patent: January 28, 2014
    Assignee: LSI Corporation
    Inventors: Balakrishnan Sundararaman, Shashank Nemawarkar, David Sonnier, Shailendra Aulakh, Allen Vestal
  • Patent number: 8624910
    Abstract: One embodiment of the present invention sets forth a technique for dynamically specifying a texture header and texture sampler using an index. The index corresponds to a particular register value that may be static or computed during execution of a shader program. Any texture operation instruction may specify an index value for each of the texture header and the texture sampler.
    Type: Grant
    Filed: August 25, 2010
    Date of Patent: January 7, 2014
    Assignee: Nvidia Corporation
    Inventors: John Erik Lindholm, Yan Yan Tang
  • Patent number: 8532288
    Abstract: A cryptographic engine for modulo N multiplication, which is structured as a plurality of almost identical, serially connected Processing Elements, is controlled so as to accept input in blocks that are smaller than the maximum capability of the engine in terms of bits multiplied at one time. The serially connected hardware is thus partitioned on the fly to process a variety of cryptographic key sizes while still maintaining all of the hardware in an active processing state.
    Type: Grant
    Filed: December 1, 2006
    Date of Patent: September 10, 2013
    Assignee: International Business Machines Corporation
    Inventors: Camil Fayad, John K. Li, Siegfried K. H. Sutter, Phil C. Yeh
  • Patent number: 8499139
    Abstract: An apparatus having a processor and a circuit is disclosed. The processor generally has a pipeline. The circuit may be configured to (i) detect a first write instruction in the pipeline that writes to a resource, (ii) stall a read instruction in the pipeline where (a) a first read-after-write conflict exists between the first write instruction and the read instruction and (b) no other write instruction to the resource is scheduled between the first write instruction and the read instruction and (iii) not stall the read instruction due to the first read-after-write conflict where a second write instruction to the resource is scheduled between the first write instruction and the read instruction.
    Type: Grant
    Filed: August 12, 2010
    Date of Patent: July 30, 2013
    Assignee: LSI Corporation
    Inventors: Leonid Dubrovin, Alexander Rabinovitch, Hagit Margolin, Noam Abda
  • Patent number: 8493979
    Abstract: Executing a single instruction/multiple data (SIMD) instruction of a program to process a vector of data wherein each element of the packet vector corresponds to a different received packet.
    Type: Grant
    Filed: December 30, 2008
    Date of Patent: July 23, 2013
    Assignee: Intel Corporation
    Inventors: Bryan E. Veal, Travis T. Schluessler
  • Patent number: 8316190
    Abstract: Computers and other computing machines and information appliances having a modified computer architecture and program structure which enables the operation of an application program concurrently or simultaneously on a plurality of computers interconnected via a communications link or network using a special distributed runtime (DRT), and that provides for a redundant array of independent computing systems that include computer code distribution using code-striping onto the plurality of the computers or computing machines. A redundant array of independent computing systems operating in concert and code-striping features.
    Type: Grant
    Filed: March 19, 2008
    Date of Patent: November 20, 2012
    Assignee: Waratek Pty. Ltd.
    Inventor: John M. Holt
  • Patent number: 8185700
    Abstract: In one embodiment, the present invention includes a method for receiving a bus message in a first cache corresponding to a speculative access to a portion of a second cache by a second thread, and dynamically determining in the first cache if an inter-thread dependency exists between the second thread and a first thread associated with the first cache with respect to the portion. Other embodiments are described and claimed.
    Type: Grant
    Filed: May 30, 2006
    Date of Patent: May 22, 2012
    Assignee: Intel Corporation
    Inventors: Carlos Madriles Gimeno, Carlos García Quinones, Pedro Marcuello, Jesús Sánchez, Fernando Latorre, Antonio González
  • Patent number: 8180998
    Abstract: A system for performing data-parallel operations and task-parallel operations. A first switch fabric node (SFN) includes first and second lane processing engines (LPEs). The first LPE includes a first set of lane processing units (LPUs) configured to perform data-parallel operations, where each LPU performs a set of operations, and each LPU uses a different set of data for the set of operations, and each LPU within the first set of LPUs uses a different set of data for the set of operations. The second LPE includes a second set of LPUs configured to perform task-parallel operations, where each LPU performs a different set of operations. A processing control engine (PCE) is configured to distribute instructions and data to the first LPE and the second LPE. Advantageously, data parallel operations and task parallel operations are able to be performed on the same processor simultaneously.
    Type: Grant
    Filed: September 10, 2008
    Date of Patent: May 15, 2012
    Assignee: NVIDIA Corporation
    Inventors: Monier Maher, Christopher Lamb, Sanjay J. Patel, Peter Hsu
  • Publication number: 20120089812
    Abstract: A shared resource multi-thread processor array wherein an array of heterogeneous function blocks are interconnected via a self-routing switch fabric, in which the individual function blocks have an associated switch port address. Each switch output port comprises a FIFO style memory that implements a plurality of separate queues. Thread queue empty flags are grouped using programmable circuit means to form self-synchronised threads. Data from different threads are passed to the various addressable function blocks in a predefined sequence in order to implement the desired function. The separate port queues allows data from different threads to share the same hardware resources and the reconfiguration of switch fabric addresses further enables the formation of different data-paths allowing the array to be configured for use in various applications.
    Type: Application
    Filed: June 9, 2010
    Publication date: April 12, 2012
    Inventor: Graeme Roy Smith
  • Patent number: 8103854
    Abstract: A control processor is used for fetching and distributing single instruction multiple data (SIMD) instructions to a plurality of processing elements (PEs). One of the SIMD instructions is a thread start (Tstart) instruction, which causes the control processor to pause its instruction fetching. A local PE instruction memory (PE Imem) is associated with each PE and contains local PE instructions for execution on the local PE. Local PE Imem fetch, decode, and execute logic are associated with each PE. Instruction path selection logic in each PE is used to select between control processor distributed instructions and local PE instructions fetched from the local PE Imem. Each PE is also initialized to receive control processor distributed instructions. In addition, local hold generation logic is associated with each PE. A PE receiving a Tstart instruction causes the instruction path selection logic to switch to fetch local PE Imem instructions.
    Type: Grant
    Filed: April 12, 2010
    Date of Patent: January 24, 2012
    Assignee: Altera Corporation
    Inventors: Gerald George Pechanek, Edwin Franklin Barry, Mihailo M. Stojancic
  • Patent number: 8051273
    Abstract: Disclosed is a mixed mode parallel processor system in which N number of processing elements PEs, capable of performing SIMD operation, are grouped into M (=N÷S) processing units PUs performing MIMD operation. In MIMD operation, P out of S memories in each PU, which S memories inherently belong to the PEs, where P<S, operate as an instruction cache. The remaining memories operate as data memories or as data cache memories. One out of S sets of general-purpose registers, inherently belonging to the PEs, directly operates as a general register group for the PU. Out of the remaining S?1 sets, T set or a required number of sets, where T<S?1, are used as storage registers that store tags of the instruction cache.
    Type: Grant
    Filed: November 2, 2010
    Date of Patent: November 1, 2011
    Assignee: NEC Corporation
    Inventor: Shorin Kyo
  • Patent number: 8020168
    Abstract: A NOC for dynamic virtual software pipelining including IP blocks, routers, memory communications controllers, and network interface controllers, each IP block adapted to a router through a memory communications controller and a network interface controller, the NOC also including: a computer software application segmented into stages, each stage comprising a flexibly configurable module of computer program instructions identified by a stage ID, each stage assigned to a thread of execution on an IP block; and each stage executing on a thread of execution on an IP block, including a first stage executing on an IP block, producing output data and sending by the first stage the produced output data to a second stage, the output data including control information for the next stage and payload data; and the second stage consuming the produced output data in dependence upon the control information.
    Type: Grant
    Filed: May 9, 2008
    Date of Patent: September 13, 2011
    Assignee: International Business Machines Corporation
    Inventors: Russell D. Hoover, Eric O. Mejdrich, Paul E. Schardt, Robert A. Shearer
  • Patent number: 7979674
    Abstract: Executing MIMD programs on a SIMD machine, the SIMD machine including a plurality of compute nodes, each compute node capable of executing only a single thread of execution, the compute nodes initially configured exclusively for SIMD operations, the SIMD machine further comprising a data communications network, the network comprising synchronous data communications links among the compute nodes, including establishing one or more SIMD partitions, booting one or more SIMD partitions in MIMD mode; establishing a MIMD partition; executing by launcher programs a plurality of MIMD programs on two or more of the compute nodes of the MIMD partition; and re-executing a launcher program by an operating system on a compute node in the MIMD partition upon termination of the MIMD program executed by the launcher program.
    Type: Grant
    Filed: May 16, 2007
    Date of Patent: July 12, 2011
    Assignee: International Business Machines Corporation
    Inventors: Todd A. Inglett, Patrick J. McCarthy, Amanda Peters, Thomas A. Budnik, Michael B. Mundy, Gordon G. Stewart
  • Patent number: 7941644
    Abstract: A processing unit includes multiple execution units and sequencer logic that is disposed downstream of instruction buffer logic, and that is responsive to a sequencer instruction present in an instruction stream. In response to such an instruction, the sequencer logic issues a plurality of instructions associated with a long latency operation to one execution unit, while blocking instructions from the instruction buffer logic from being issued to that execution unit. In addition, the blocking of instructions from being issued to the execution unit does not affect the issuance of instructions to any other execution unit, and as such, other instructions from the instruction buffer logic are still capable of being issued to and executed by other execution units even while the sequencer logic is issuing the plurality of instructions associated with the long latency operation.
    Type: Grant
    Filed: October 16, 2008
    Date of Patent: May 10, 2011
    Assignee: International Business Machines Corporation
    Inventors: Eric Oliver Mejdrich, Adam James Muff, Matthew Ray Tubbs
  • Patent number: 7916864
    Abstract: A graphics processing unit is programmed to carry out cryptographic processing so that fast, effective cryptographic processing solutions can be provided without incurring additional hardware costs. The graphics processing unit can efficiently carry out cryptographic processing because it has an architecture that is configured to handle a large number of parallel processes. The cryptographic processing carried out on the graphics processing unit can be further improved by configuring the graphics processing unit to be capable of both floating point and integer operations.
    Type: Grant
    Filed: February 8, 2006
    Date of Patent: March 29, 2011
    Assignee: NVIDIA Corporation
    Inventor: Norbert Juffa
  • Patent number: 7900067
    Abstract: A computing device operates over a range of voltages and frequencies and over a range of processor usage levels. The computing device includes at least a variable frequency generator, a variable voltage power supply and voltage supply level and clocking frequency management circuitry. The variable frequency generator is coupled to the processor and delivers a clock signal to the processor. The variable voltage power supply is coupled to the processor and delivers voltage to the processor. The voltage supply level and clocking frequency management circuitry adjust both the voltage provided by the variable voltage power supply and the frequency of the signal provided by the variable frequency generator. The computing device includes a temperature sensor that provides signals indicative of the temperature of the processor and the voltage supply level and clocking frequency management circuitry adjusts the voltage and/or the clocking frequency provided by the variable voltage power supply.
    Type: Grant
    Filed: May 13, 2008
    Date of Patent: March 1, 2011
    Inventor: Paul Beard
  • Patent number: 7810084
    Abstract: Computer-implemented methods, computer systems and computer program products are provided for parallel processing a plurality of data objects with a plurality of processors. As disclosed herein, the data objects to be assembled for further processing may be in bundles, the bundles obeying first predefined criteria, which is dynamically controlled by using a bundle specific master table. The methods and systems may generate pipelines of data objects by pre-selecting and grouping the data objects according to second predefined criteria by a first group of the plurality of processors, and create the bundles from each pipeline of the pre-selected data objects by a second group of the plurality of processors.
    Type: Grant
    Filed: June 1, 2006
    Date of Patent: October 5, 2010
    Assignee: SAP AG
    Inventor: Karsten S. Egetoft
  • Patent number: 7774189
    Abstract: A system and method for implementing a unified model for integration systems is presented. A user provides inputs to an integrated language engine for placing operator components and arc components onto a dataflow diagram. Operator components include data ports for expressing data flow, and also include meta-ports for expressing control flow. Arc components connect operator components together for data and control information to flow between the operator components. The dataflow diagram is a directed acyclic graph that expresses an application without including artificial boundaries during the application design process. Once the integrated language engine generates the dataflow diagram, the integrated language engine compiles the dataflow diagram to generated application code.
    Type: Grant
    Filed: December 1, 2006
    Date of Patent: August 10, 2010
    Assignee: International Business Machines Corporation
    Inventors: Amir Bar-Or, Michael James Beckerle
  • Patent number: 7769980
    Abstract: In arithmetic/logic units (ALU) provided corresponding to entries, an MIMD instruction decoder generating a group of control signals in accordance with a Multiple Instruction-Multiple Data (MIMD) instruction and an MIMD register storing data designating the MIMD instruction are provided, and an inter-ALU communication circuit is provided. The amount and direction of movement of the inter-ALU communication circuit are set by data bits stored in a movement data register. It is possible to execute data movement and arithmetic/logic operation with the amount of movement and operation instruction set individually for each ALU unit. Therefore, in a Single Instruction-Multiple Data type processing device, Multiple Instruction-Multiple Data operation can be executed at high speed in a flexible manner.
    Type: Grant
    Filed: August 16, 2007
    Date of Patent: August 3, 2010
    Assignee: Renesas Technology Corp.
    Inventors: Toshinori Sueyoshi, Masahiro Iida, Mitsutaka Nakano, Fumiaki Senoue, Katsuya Mizumoto
  • Patent number: 7747771
    Abstract: A method and mechanism for managing access to a plurality of registers in a processing device are contemplated. A processing device includes multiple nodes coupled to a ring bus, each of which include one or more registers which may be accessed by processes executing within the device. Also coupled to the ring bus is a ring control unit which is configured to initiate transactions targeted to nodes on the ring bus. Each of the nodes are configured receive and process bus transaction with a fixed latency whether or not the first transaction is targeted to the receiving node. The ring control unit is configured to periodically convey idle transactions on the ring bus in order to allow nodes responding to indeterminate transactions to gain access to the bus.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: June 29, 2010
    Assignee: Oracle America, Inc.
    Inventors: Manish Shah, Robert T. Golla, Mark A. Luttrell, Gregory F. Grohoski
  • Patent number: 7730280
    Abstract: A control processor is used for fetching and distributing single instruction multiple data (SIMD) instructions to a plurality of processing elements (PEs). One of the SIMD instructions is a thread start (Tstart) instruction, which causes the control processor to pause its instruction fetching. A local PE instruction memory (PE Imem) is associated with each PE and contains local PE instructions for execution on the local PE. Local PE Imem fetch, decode, and execute logic are associated with each PE. Instruction path selection logic in each PE is used to select between control processor distributed instructions and local PE instructions fetched from the local PE Imem. Each PE is also initialized to receive control processor distributed instructions. In addition, local hold generation logic is associated with each PE. A PE receiving a Tstart instruction causes the instruction path selection logic to switch to fetch local PE Imem instructions.
    Type: Grant
    Filed: April 18, 2007
    Date of Patent: June 1, 2010
    Assignee: Vicore Technologies, Inc.
    Inventors: Gerald George Pechanek, Edwin Franklin Barry, Mihailo M. Stojancic
  • Patent number: 7730463
    Abstract: A computer implemented method, system and computer program product for automatically generating SIMD code. The method begins by analyzing data to be accessed by a targeted loop including at least one statement, where each statement has at least one memory reference, to determine if memory accesses are safe. If memory accesses are safe, the targeted loop is simdized. If not safe, it is determined if a scheme can be applied in which safety need not be guaranteed. If such a scheme can be applied, the targeted loop is simdized according to the scheme. If such a scheme cannot be applied, it is determined if padding is appropriate. If padding is appropriate, the data is padded and the targeted loop is simdized. If padding is not appropriate, non-simdized code is generated based on the targeted loop for handling boundary conditions, the targeted loop is simdized and combined with the non-simdized code.
    Type: Grant
    Filed: February 21, 2006
    Date of Patent: June 1, 2010
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu, Peng Zhao
  • Patent number: 7661006
    Abstract: A computer implemented method, apparatus, and computer program product for managing symmetric multiprocessor interconnects. The process identifies functional communication connections between each processor in a plurality of processors on a multiprocessor to form identified functional communication connections. The process maps every functional communication connection between any two processors in the plurality of processors, based on the identified functional communication connections, to form an interconnect matrix. The process creates a path map using the interconnect matrix. The path map comprises a sequence of communication connections between the plurality of processors. The process initializes the plurality of processors using the path map.
    Type: Grant
    Filed: January 9, 2007
    Date of Patent: February 9, 2010
    Assignee: International Business Machines Corporation
    Inventors: Luai A. Abou-Emara, Mark David McLaughlin, Jorge N. Yanez
  • Patent number: 7657880
    Abstract: The latencies associated with retrieving instruction information for a main thread are decreased through the use of a simultaneous helper thread. The helper thread is permitted to execute Store instructions. Store blocker logic operates to prevent data associated with a Store instruction in a helper thread from being committed to memory. Dependence blocker logic operates to prevent data associated with a Store instruction in a speculative helper thread from being bypassed to a Load instruction in a non-speculative thread.
    Type: Grant
    Filed: August 1, 2003
    Date of Patent: February 2, 2010
    Assignee: Intel Corporation
    Inventors: Hong Wang, Tor Aamodt, Per Hammarlund, John Shen, Xinmin Tian, Milind Girkar, Perry Wang, Steve Shih-wei Liao
  • Patent number: 7656412
    Abstract: A system, a method and computer-readable media for performing texture resampling algorithms on a processing device. A texture resampling algorithm is selected. This algorithm is decomposed into multiple one-dimensional transformations. Instructions for performing each of the one-dimensional transformations are communicated to a processing device, such as a GPU. The processing device may generate an output image by separately executing the instructions associated with each of the one-dimensional transformations.
    Type: Grant
    Filed: December 21, 2005
    Date of Patent: February 2, 2010
    Assignee: Microsoft Corporation
    Inventors: Denis Demandolx, Steven White
  • Patent number: 7610371
    Abstract: A mediation method and a mediation system divided into independent node components that process event records independently of the other components of the system. In addition, the system is provided with at least one node manager component that configures the node components and starts them up, when required. Further, the node manager component monitors the functioning of the node components and also stops the node components, if required. Each of the independent node components operates according to its own settings and is thus self-contained and capable of continuing operation even though some of the other components are temporarily inoperative. The system also includes a system database that manages configuration information and stores audit trail data.
    Type: Grant
    Filed: April 23, 2004
    Date of Patent: October 27, 2009
    Assignee: Comptel Oyj
    Inventor: Juhana Enqvist
  • Patent number: 7593947
    Abstract: A system, method and program product for processing a multiplicity of data update requests made by a customer. All of the data update requests are grouped into a plurality of blocks for execution by a data processor. The data update requests within each of the blocks and from one of the blocks to a next one of the blocks are arranged in an order that the data update requests need to be executed to yield a proper data result. Each of the blocks have approximately a same capacity for the data update requests. The capacity corresponds to a number of the data update requests which the data processor can efficiently process in order before processing the data update requests in the next one of the blocks. Then, the data processor processes the data update requests within the one block in the order. Then, the data processor processes the data update requests within the next block in the order. The order is an order in which the data update requests were made.
    Type: Grant
    Filed: March 16, 2004
    Date of Patent: September 22, 2009
    Assignees: IBM Corporation, The Bank of Tokyo-Mitsubishi UFJ, Ltd.
    Inventors: Izumi Nagai, Yohichi Hoshijima, Kazuoki Takahashi
  • Patent number: 7512724
    Abstract: One embodiment of the present invention performs peripheral operations in a multi-thread processor. A peripheral bus is coupled to a peripheral unit to transfer peripheral information including a command message specifying a peripheral operation. A processing slice is coupled to the peripheral bus to execute a plurality of threads. The plurality of threads includes a first thread sending the command message to the peripheral unit.
    Type: Grant
    Filed: November 17, 2000
    Date of Patent: March 31, 2009
    Assignee: The United States of America as represented by the Secretary of the Navy
    Inventors: Jack B. Dennis, Sam B. Sandbote
  • Publication number: 20080059763
    Abstract: A method and system of processing compressed multimedia data using fine-grain instruction parallelism is provided. The method of processing multimedia data includes transferring an instruction from each of a plurality of sequencers to associated processing elements within an array of processing elements. The instructions can be processed by the array of processing elements using fine-grain instruction parallelism. A selection mechanism using selection instructions can select the associated processing elements. The plurality of sequencers comprise fine-grain instructions for decoding the compressed multimedia data. A system for multimedia data processing includes a data parallel system which can include an array of processing elements. A plurality of sequencers are coupled to the array of processing elements. A direct memory access component is coupled to the array of processing elements. A diagonal mapping scheme can be used in transferring instructions and data to the processing elements.
    Type: Application
    Filed: August 30, 2007
    Publication date: March 6, 2008
    Inventor: Lazar Bivolarski
  • Patent number: 7340591
    Abstract: A number of architectural and implementation approaches are described for using extra path (Epath) storage that operate in conjunction with a compute register file to obtain increased instruction level parallelism that more flexibly addresses the requirements of high performance algorithms. A processor that supports a single load data to a register file operation can be doubled in load capability through the use of an extra path storage, an additional independently addressable data memory path, and instruction decode information that specifies two independently load data operations. By allowing the extra path storage to be accessible by arithmetic facilities, the increased data bandwidth can be fully utilized.
    Type: Grant
    Filed: October 28, 2004
    Date of Patent: March 4, 2008
    Assignee: Altera Corporation
    Inventors: Gerald George Pechanek, Patrick R. Marchand, Larry D. Larsen
  • Patent number: 7313646
    Abstract: An electronic system comprises an initiator module and a target module addressable by the initiator module, and an interface and control module for interfacing between respective communication protocols of the initiator module and of the target module. The interface and control module is constructed to set a composite instruction detection signal in response to the detection of a composite instruction executed by the initiator module, which composite instruction detection signal is used for the interfacing. The interface and control module is constructed to detect a composite instruction executed by the initiator module when, at a determined clock cycle of the initiator module, a change of the elementary operation executed by the initiator module is detected with respect to the previous clock cycle of the initiator module, while, at the same time, a signal for selecting the target module which was active is kept active.
    Type: Grant
    Filed: May 26, 2005
    Date of Patent: December 25, 2007
    Assignee: STMicroelectronics S.A.
    Inventors: Hervé Chalopin, Laurent Tabaries
  • Patent number: 7124318
    Abstract: A multiple parallel pipeline digital processing apparatus has the capability to substitute a second pipeline for a first in the event that a failure is detected in the first pipeline. Preferably, a redundant pipeline is shared by multiple primary pipelines. Preferably, the pipelines are located physically adjacent one another in an array. A pipeline failure causes data to be shifted one position within the array of pipelines, to by-pass the failing pipeline, so that each pipeline has only two sources of data, a primary and an alternate. Preferably, selection logic controlling the selection between a primary and alternate source of pipeline data is integrated with other pipeline operand selection logic.
    Type: Grant
    Filed: September 18, 2003
    Date of Patent: October 17, 2006
    Assignee: International Business Machines Corporation
    Inventor: David Arnold Luick
  • Patent number: 7035991
    Abstract: A surface computer includes an address generator for generating an address for adjusting surface region data concerning at least a storage region and a concurrent computer, provided at a subsequent stage of the address generator, having a plurality of unit computers.
    Type: Grant
    Filed: October 2, 2003
    Date of Patent: April 25, 2006
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Akio Ohba
  • Patent number: 6993639
    Abstract: Embodiments of the invention relate to a processing cell for use in computing systems. Generally, a processing cell generates remote instructions to be received and processed by at least one other processing cell. A processing cell may include a program counter, an instruction memory, and appropriate elements such as a branch lookup, a branch unit, etc. Alternatively, the processing cell may include a state machine that replaces the program counter and the instruction memory. Embodiments of the invention are able to support the VLIW mode, the MIMD) mode, a mixture of both modes of execution, etc.
    Type: Grant
    Filed: April 1, 2003
    Date of Patent: January 31, 2006
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Michael S. Schlansker, Boon Seong Ang
  • Patent number: 6950893
    Abstract: A hybrid switching module includes a hybrid switching module processor data channel; a hybrid switching module main data channel; an input/output link data channel; a switch coupled to the hybrid switching module processor data channel; and a bridge coupled to the hybrid switching module main data channel; wherein the switch selectively coupled to the bridge and selectively coupled to the input/output link data channel, wherein the hybrid switching module processor data channel is thereby selectively coupled to the bridge and selectively coupled to the input/output link data channel.
    Type: Grant
    Filed: March 22, 2001
    Date of Patent: September 27, 2005
    Assignee: I-Bus Corporation
    Inventor: Johni Chan
  • Patent number: 6925548
    Abstract: A data processor can assign a greater number of operations to instruction codes with shorter length, thereby implementing high performance, high code efficiency and low cost data processor. The data processor is a VLIW (Very Long Instruction Word) system that can execute a plurality of operations in parallel, and specify the execution sequence of the operations. It can assign a plurality of operations to the same operation code, and the operations that are executed in a second or subsequent sequence are limited to only predetermined operations among the plurality of operations.
    Type: Grant
    Filed: October 9, 2001
    Date of Patent: August 2, 2005
    Assignee: Renesas Technology Corp.
    Inventor: Masahito Matsuo
  • Patent number: 6915410
    Abstract: A system for designing and implementing digital integrated circuits utilizing a set of synchronized sequencers that permit quick and efficient parallel processing of system level designs. The system and method converts digital schematics and hardware description language (HDL) based designs into a set of logic equations and single bit arithmetic-logic operations executed by a set of parallel operating sequencers. The system includes software for converting netlists and HDL designs into Boolean logic equations, and a compiler for distributing these logic equations between multiple sequencers. Each sequencer is comprised of a logic processor and the associated program memory for storing the executable code of the assigned Boolean logic equations and data memory for storing the results of processing of logic equations. To synchronize execution of logic equations by multiple sequencers, all program memories are addressed by one common address register.
    Type: Grant
    Filed: January 23, 2003
    Date of Patent: July 5, 2005
    Inventor: Stanley M. Hyduke
  • Patent number: 6836837
    Abstract: There is disclosed a technique for accessing a register file which comprises defining a first register address as a plurality of bits and using said first register address to access said register file generating a second register address by using a sequence of said plurality of bits with at least one of said plurality of bits supplied via a unitary operator, the unitary operator being effective to selectively alter the logical value of said bit depending on its logical value in the first register address, and using said second register address to access said register file. A computer system for carrying out such a technique is also enclosed.
    Type: Grant
    Filed: June 19, 2003
    Date of Patent: December 28, 2004
    Assignee: Broadcom Corporation
    Inventors: Mark Taunton, Sophie Wilson, Timothy Martin Dobson
  • Patent number: RE41703
    Abstract: A SIMD machine employing a plurality of parallel processor (PEs) in which communications hazards are eliminated in an efficient manner. An indirect Very Long Instruction Word instruction memory (VIM) is employed along with execute and delimiter instructions. A masking mechanism may be employed to control which PEs have their VIMs loaded. Further, a receive model of operation is preferably employed. In one aspect, each PE operates to control a switch that selects from which PE it receives. The present invention addresses a better machine organization for execution of parallel algorithms that reduces hardware cost and complexity while maintaining the best characteristics of both SIMD and MIMD machines and minimizing communication latency. This invention brings a level of MIMD computational autonomy to SIMD indirect Very Long Instruction Word (iVLIW) processing elements while maintaining the single thread of control used in the SIMD machine organization.
    Type: Grant
    Filed: June 21, 2004
    Date of Patent: September 14, 2010
    Assignee: Altera Corp.
    Inventors: Gerald George Pechanek, Thomas L. Drabenstott, Juan Guillermo Revilla, David Strube, Grayson Morris