Single Instruction, Multiple Data (SIMD) Patents (Class 712/22)
  • Publication number: 20090125702
    Abstract: A single instruction, multiple data (SIMD) processor including a plurality of addressing register sets, used to flexibly calculate effective operand source and destination memory addresses is disclosed. Two or more address generators calculate effective addresses using the register sets. Each register set includes a pointer register, and a scale register. An address generator forms effective addresses from a selected register set's pointer register and scale register; and an offset. For example, the effective memory address may be formed by multiplying the scale value by an offset value and summing the pointer and the scale value multiplied by the offset value.
    Type: Application
    Filed: August 29, 2008
    Publication date: May 14, 2009
    Applicant: ATI Technologies Inc.
    Inventors: Richard J. Selvaggi, Larry A. Pearlstein
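    The address formation in the abstract's example (pointer plus scale times offset) can be sketched as below. This is a minimal illustration only: the function name and the sample register values are hypothetical and do not come from the application.

```python
def effective_address(pointer: int, scale: int, offset: int) -> int:
    """Form an address as pointer + scale * offset, per the abstract's example."""
    return pointer + scale * offset


# Hypothetical register set: pointer register 0x1000, scale register 4.
# Stepping the offset walks consecutive 4-byte elements.
addresses = [effective_address(0x1000, 4, offset) for offset in range(4)]
```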
  • Patent number: 7526630
    Abstract: A controller operable to control an array of processing elements comprises a retrieval unit operable to retrieve instruction items for each of a plurality of instruction streams, each instruction stream having a plurality of instruction items, a combining unit operable to combine the plurality of instruction streams into a serial instruction stream, and a distribution unit operable to distribute the serial instruction stream to an array of processing elements.
    Type: Grant
    Filed: January 4, 2007
    Date of Patent: April 28, 2009
    Assignee: Clearspeed Technology, PLC
    Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russel David, Ray McConnell, Tim Day, Trey Greer
  • Publication number: 20090106528
    Abstract: To reduce the required amount of program codes when processing the whole image in a one-dimensional SIMD parallel image processing system having a smaller number of PEs than the number of pixels in the width direction of the image to be processed. A controller for controlling a PE array includes a command repetitive-execution part, which includes an operand converting part, a memory address converting part, and an operation code converting part. When a command fetching/decoding part reads and executes program codes stored in a program memory, the repetitive-execution part determines the program codes to cause the operand converting part, memory address converting part and operation code converting part to perform conversions in accordance with the command, thereby performing a repetitive execution of the one-command program description adaptive to a plurality of related pixels assigned to the PEs, whereby the program code amount can be reduced.
    Type: Application
    Filed: December 5, 2006
    Publication date: April 23, 2009
    Inventor: Takuya Koga
  • Patent number: 7516299
    Abstract: A method for transferring data from a general purpose register (GPR) to a vector register (VR), the method including vectorially combining data in the VR from the GPR, by executing instructions of a PowerPC Instruction Set Architecture (ISA), the step of combining including splatting a low nibble from the GPR into a low nibble in each element of a first VR by executing two “load vector for shift left” (lvsl) or “load vector for shift right” (lvsr) and one “vector subtract unsigned byte modulo” (vsububm), shifting a high nibble of the GPR into a low nibble of the GPR, splatting the low nibble of the GPR into a low nibble in each element of a second VR by re-executing the two lvsl or lvsr and one vsububm instructions, shifting the low nibble of the second VR into a high nibble of the second VR and combining both first and second VRs into one VR.
    Type: Grant
    Filed: August 29, 2005
    Date of Patent: April 7, 2009
    Assignee: International Business Machines Corporation
    Inventors: Daniel Citron, Ayal Zaks
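    One plausible reading of the nibble-splat sequence can be emulated in software as below. This is a sketch under assumptions, not the patented instruction sequence: lvsl is modeled as producing the bytes sh, sh+1, ..., sh+15 from the low four bits of its address, the second lvsl is assumed to use a zero address, and the final combining step is assumed to be a bitwise OR. The Python helper names mirror the mnemonics but are otherwise hypothetical.

```python
def lvsl(address: int) -> list[int]:
    """Model of lvsl: bytes sh, sh+1, ..., sh+15, where sh is the low nibble."""
    sh = address & 0xF
    return [sh + i for i in range(16)]


def vsububm(a: list[int], b: list[int]) -> list[int]:
    """Model of vsububm: per-byte subtraction modulo 256."""
    return [(x - y) & 0xFF for x, y in zip(a, b)]


def splat_low_nibble(gpr: int) -> list[int]:
    # Two lvsl results differ by the nibble in every byte, so one
    # subtraction leaves the nibble replicated across all 16 elements.
    return vsububm(lvsl(gpr), lvsl(0))


def splat_byte(gpr: int) -> list[int]:
    low = splat_low_nibble(gpr)                        # first VR
    high = splat_low_nibble(gpr >> 4)                  # second VR, from the high nibble
    high = [(h << 4) & 0xFF for h in high]             # move it into the high nibble
    return [h | l for h, l in zip(high, low)]          # combine (assumed OR)
```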
  • Patent number: 7509602
    Abstract: A logic simulation acceleration processor optimized for multi-value logic level simulation of electronic systems described in hardware description languages.
    Type: Grant
    Filed: January 25, 2006
    Date of Patent: March 24, 2009
    Assignee: Eve S.A.
    Inventors: Subbu Ganesan, Leonid Alexander Broukhis, Ramesh Narayanaswamy, Ian Michael Nixon, Thomas Hanni Spencer
  • Patent number: 7506136
    Abstract: A controller for controlling a data processor having a plurality of processor arrays, each of which includes a plurality of processing elements, comprises a retrieval unit operable to retrieve a plurality of incoming instruction streams in parallel with one another, and a distribution unit operable to supply such incoming instruction streams to respective ones of the said plurality of processor arrays.
    Type: Grant
    Filed: January 10, 2007
    Date of Patent: March 17, 2009
    Assignee: Clearspeed Technology PLC
    Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
  • Patent number: 7506135
    Abstract: The present invention provides histogram calculation for images and video applications using a SIMD and VLIW processor with vector Look-Up Table (LUT) operations. This provides a speed up of histogram calculation by a factor of N times over a scalar processor where the SIMD processor could perform N LUT operations per instruction. Histogram operation is partitioned into a vector LUT operation, followed by vector increment, vector LUT update, and at the end by reduction of vector histogram components. The present invention could be used for intensity, RGBA, YUV, and other types of multi-component images.
    Type: Grant
    Filed: May 20, 2003
    Date of Patent: March 17, 2009
    Inventor: Tibet Mimar
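    The four phases named in the abstract (vector LUT read, vector increment, vector LUT update, final reduction) can be sketched in scalar Python as below. Giving each lane its own private table is an assumption made here so that two lanes hitting the same bin never collide; the function name and parameters are hypothetical.

```python
def simd_histogram(pixels: list[int], bins: int = 256, lanes: int = 4) -> list[int]:
    # Assumed layout: one private histogram table per SIMD lane.
    tables = [[0] * bins for _ in range(lanes)]
    for start in range(0, len(pixels), lanes):
        group = pixels[start:start + lanes]
        # Vector LUT operation: each lane reads its current count.
        counts = [tables[lane][value] for lane, value in enumerate(group)]
        # Vector increment.
        counts = [count + 1 for count in counts]
        # Vector LUT update: each lane writes its count back.
        for lane, (value, count) in enumerate(zip(group, counts)):
            tables[lane][value] = count
    # Reduction of the per-lane histogram components into one histogram.
    return [sum(table[b] for table in tables) for b in range(bins)]
```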
  • Patent number: 7496673
    Abstract: A computer architecture and programming model for high speed processing over broadband networks are provided. The architecture employs a consistent modular structure, a common computing module and uniform software cells. The common computing module includes a control processor, a plurality of processing units, a plurality of local memories from which the processing units process programs, a direct memory access controller and a shared main memory. A synchronized system and method for the coordinated reading and writing of data to and from the shared main memory by the processing units also are provided. A hardware sandbox structure is provided for security against the corruption of data among the programs being processed by the processing units. The uniform software cells contain both data and applications and are structured for processing by any of the processors of the network. Each software cell is uniquely identified on the network.
    Type: Grant
    Filed: February 24, 2005
    Date of Patent: February 24, 2009
    Assignee: International Business Machines Corporation
    Inventors: Michael Karl Gschwind, Harm Peter Hofstee, Martin E. Hopkins, James Allan Kahle
  • Patent number: 7490190
    Abstract: A single instruction, multiple data (“SIMD”) computer system includes a central control unit coupled to 256 processing elements (“PEs”) and to 32 static random access memory (“SRAM”) devices. Each group of eight PEs can access respective groups of eight columns in a respective SRAM device. Each PE includes a local column address register that can be loaded through a data bus of the respective PE. A local column address stored in the local column address register is applied to an AND gate, which selects either the local column address or a column address applied to the AND gate by the central control unit. As a result, the central control unit can globally access the SRAM device, or a specific one of the eight columns that can be accessed by each PE can be selected locally by the PE.
    Type: Grant
    Filed: October 5, 2006
    Date of Patent: February 10, 2009
    Assignee: Micron Technology, Inc.
    Inventor: Jon Skull
  • Patent number: 7483595
    Abstract: An image processing method and device for processing multiple rows of pixels of an image simultaneously with a single instruction. The processing includes selecting a pixel window having a plurality of pixels of an image spanning across multiple rows and columns, building vertical and horizontal load registers to include the plurality of pixels of the selected pixel window, and simultaneously processing selected pixels of the plurality of pixels included in the vertical and horizontal load registers using a single instruction, wherein the vertical and horizontal load registers are shifted when the selected pixels are processed. Accordingly, a method and device for efficient processing of an image is provided.
    Type: Grant
    Filed: September 16, 2004
    Date of Patent: January 27, 2009
    Assignee: Marvell International Technology Ltd.
    Inventors: Douglas Gene Keithley, Roy Gideon Moss
  • Patent number: 7484076
    Abstract: Methods, apparatuses, and systems are presented for performing instructions using multiple execution units in a graphics processing unit involving issuing an instruction for P executions of the instruction wherein each execution uses different data, P being a positive integer, the instruction being issued based on a first clock having a first clock rate, operating Q execution units to achieve the P executions of the instruction, Q being a positive integer less than P and greater than one, each of the execution units being operated based on a second clock having a second clock rate higher than the first clock rate of the first clock, and wherein the second clock rate of the second clock is equal to the first clock rate of the first clock multiplied by the ratio P/Q.
    Type: Grant
    Filed: September 18, 2006
    Date of Patent: January 27, 2009
    Assignee: Nvidia Corporation
    Inventors: Stuart F. Oberman, Ming Y. Siu, Sameer D. Halepete
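    The clock relationship stated in the abstract (second clock rate equals first clock rate times P/Q, with 1 < Q < P) works out as in the sketch below. The function name and the sample figures are hypothetical.

```python
from fractions import Fraction


def execution_clock_rate(issue_rate_hz: int, p: int, q: int) -> Fraction:
    """Second clock rate = first clock rate * P / Q, with 1 < Q < P."""
    if not (1 < q < p):
        raise ValueError("Q must be greater than one and less than P")
    return issue_rate_hz * Fraction(p, q)


# Hypothetical figures: 32 executions per issued instruction on 16 units
# means the units must run at twice the issue clock.
rate = execution_clock_rate(500_000_000, 32, 16)
```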
  • Publication number: 20090024832
    Abstract: The invention is based on the task to undertake machine descriptions, with which an automated optimal hardware design of SIMD processors can be carried out. This is solved by the fact that functional units are selected from a criterion in the machine description, which is vector processible. A first or second reduced functional unit are selectively defined from a respective vector-processing functional unit, in which the reduced functional units process only a data element of a vectoral value. All reduced functional units, which use common control signals for the processing of a respective data element belonging to the vectoral value, are condensed to a disk. Reduced functional units, which process the same data elements in a sequence at least indirectly, are condensed to a disk module. The disk is reproduced with the contained reduced functional units so often that all reduced functional units represent the functionality of their respective selected vector-processing functional unit.
    Type: Application
    Filed: November 23, 2004
    Publication date: January 22, 2009
    Inventor: Gordon Cichon
  • Patent number: 7480785
    Abstract: A row decoding circuit (171) outputs a select signal to a row set in a row range setting unit (172) to select a select signal line (103), processing results from processing circuits (102) on this row are output to a data output line (104), and a row adder (106) adds processing results output to a data output line (104) of a column set in a column range selector (105).
    Type: Grant
    Filed: February 13, 2004
    Date of Patent: January 20, 2009
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Toshishige Shimamura, Hiroki Morimura, Koji Fujii, Satoshi Shigematsu, Katsuyuki Machida
  • Publication number: 20090013152
    Abstract: A processor capable of performing filter processing at high speed is provided. A computing unit comprises a computer for performing a filter processing. Data supply to the computer is performed by an internal register configured by a flip-flop. Data read from the internal register is outputted to a shift register and the data is supplied to the computer per cycle. And, the computing unit comprises a mechanism for changing a filter computing direction according to a motion vector, thereby preventing performance lowering due to branched command by performing a horizontal filtering and a vertical filtering by a same command.
    Type: Application
    Filed: July 7, 2008
    Publication date: January 8, 2009
    Inventors: Masakazu Ehama, Koji Hosogi, Seiji Mochizuki
  • Publication number: 20090013150
    Abstract: A disclosed SIMD microprocessor includes plural processor elements each having n arithmetic circuits and n registers configured to temporarily store data pieces to be input to the arithmetic circuits, n being a natural number equal to or greater than 2; and a control circuit configured to determine an arrangement order of the processor elements and an arrangement order of the arithmetic circuits in the processor elements and determine whether to use the n arithmetic circuits as a single arithmetic circuit or as n arithmetic circuits. Each processor element further includes n shifter pairs each including a PE shifter and a bit shifter; and n shift data selection circuits configured to select arbitrary data pieces from the data pieces in the shifter pairs, perform bit extension on the data pieces, and transfer the data pieces to the arithmetic circuits.
    Type: Application
    Filed: June 24, 2008
    Publication date: January 8, 2009
    Inventor: Toshiki Yamanaka
  • Publication number: 20090013151
    Abstract: An SIMD type microprocessor is disclosed. The SIMD type microprocessor includes plural PEs (processor elements) each of which provides an ALU (arithmetic and logic unit) for lower-order bits, an ALU for upper-order bits, a control circuit for lower-order bits, a control circuit for upper-order bits, a range determining circuit for lower-order bits, and a range determining circuit for upper-order bits. The SIMD type microprocessor further includes a global processor, a range designation bus for lower-order bits which connects the global processor to the range determining circuit for lower-order bits, and a range designation bus for upper-order bits which connects the global processor to the range determining circuit for upper-order bits.
    Type: Application
    Filed: July 3, 2008
    Publication date: January 8, 2009
    Inventor: Kazuhiko Hara
  • Publication number: 20080320273
    Abstract: A single instruction multiple data (SIMD) processor (1) comprises a processing element array (10) including a plurality of processing elements (PE0 . . . PEN), and a memory array (14) operably divided into memory portions (141 . . . 14N), each memory portion being assigned to a particular processing element. A first processing element (PEN) is operable to access a portion of the memory array (14) assigned to that first processing element and also to access a portion of the memory array assigned to a second processing element. Such access is made using an index value indicative of the processing element assigned to the memory position to be accessed.
    Type: Application
    Filed: September 8, 2005
    Publication date: December 25, 2008
    Applicant: KONINKLIJKE PHILIPS ELECTRONICS N.V.
    Inventors: Anteneh A. Abbo, Leo Sevat, Richard P. Kleihorst
  • Publication number: 20080313423
    Abstract: An information processing system includes a plurality of PMM and data transmission paths for connection between the PMM and transmitting a value of a PMM to another PMM. A memory of each PMM holds a list of values of first items arranged in the ascending order or descending order without overlap and/or a list of values of the second item to be shared. A memory module of each PMM transmits a value contained in the value list to another PMM, receives a value contained in the value list from the another PMM, references the value list of the first item and the value list of the second item of the another PMM, and generates a list of common values considering the values contained in the value lists of the first item and the second item of all the other PMM.
    Type: Application
    Filed: January 25, 2005
    Publication date: December 18, 2008
    Inventor: Shinji Furusho
  • Patent number: 7467286
    Abstract: A method and apparatus are provided for executing packed data instructions. According to one aspect of the invention, a processor includes registers, a register renaming unit coupled to the registers, a decoder coupled to the register renaming unit, and a partial-width execution unit coupled to the decoder. The register renaming unit provides an architectural register file to store packed data operands that include data elements. The decoder is to decode a first and second set of instructions that each specify one or more registers in the architectural register file. Each of the instructions in the first set specify operations to be performed on all of the data elements. In contrast, each of the instructions in the second set specify operations to be performed on only a subset of the data elements. The partial-width execution unit is to execute operations specified by either the first or second set of instructions.
    Type: Grant
    Filed: May 9, 2005
    Date of Patent: December 16, 2008
    Assignee: Intel Corporation
    Inventors: Mohammad Abdallah, James Coke, Vladimir Pentkovski, Patrice Roussel, Shreekant S. Thakkar
  • Patent number: 7467288
    Abstract: A system and method for processing operations that use data vectors each comprising a plurality of data elements, in accordance with the present invention, includes a vector data file comprising a plurality of storage elements for storing data elements of the data vectors. A pointer array is coupled by a bus to the vector data file. The pointer array includes a plurality of entries wherein each entry identifies at least one storage element in the vector data file. The at least one storage element stores at least one data element of the data vectors, wherein for at least one particular entry in the pointer array, the at least one storage element identified by the particular entry has an arbitrary starting address in the vector data file.
    Type: Grant
    Filed: November 15, 2003
    Date of Patent: December 16, 2008
    Assignee: International Business Machines Corporation
    Inventors: Clair John Glossner, III, Erdem Hokenek, David Meltzer, Mayan Moudgill
  • Publication number: 20080301403
    Abstract: An apparatus including a first circuit and a second circuit. The first circuit may be configured to generate one or more command signals, a read data path control signal and one or more write data path control signals in response to an integrity protection control signal and one or more arbitration signals. The second circuit may be configured to write data to a memory and read data from the memory in response to the one or more command signals, the read data path control signal and the one or more write data path control signals. In a first mode, the data may be written and read without integrity protection. In a second mode the data may be written and read with integrity protection, and the integrity protection is written and read separately from the data.
    Type: Application
    Filed: May 29, 2007
    Publication date: December 4, 2008
    Inventors: Eskild T. Arntzen, Jackson L. Ellis
  • Patent number: 7460989
    Abstract: A method is provided, wherein a virtual internal master clock is used in connection with a RISC CPU. The RISC CPU comprises a number of concurrently operating function units, wherein each unit runs according to its own clocks, including multiple-stage totally unsynchronized clocks, in order to process a stream of instructions. The method includes the steps of generating a virtual model master clock having a clock cycle, and initializing each of the function units at the beginning of respectively corresponding processing cycles. The method further includes operating each function unit during a respectively corresponding processing cycle to carry out a task with respect to one of the instructions, in order to produce a result. Respective results are all evaluated in synchronization, by means of the master clock. This enables the instruction processing operation to be modeled using a sequential computer language, such as C or C++.
    Type: Grant
    Filed: October 14, 2004
    Date of Patent: December 2, 2008
    Assignee: International Business Machines Corporation
    Inventor: Oliver Keren Ban
  • Publication number: 20080294871
    Abstract: A processor architecture includes a number of processing elements for treating input signals. The architecture is organized according to a matrix including rows and columns, the columns of which each include at least one microprocessor block having a computational part and a set of associated processing elements that are able to receive the same input signals. The number of associated processing elements is selectively variable in the direction of the column so as to exploit the parallelism of said signals. Additionally, the processor architecture of the present invention enables dynamic switching between instruction parallelism and data parallel processing typical of vectorial functionality. The architecture can be scaled in various dimensions in an optimal configuration for the algorithm to be executed.
    Type: Application
    Filed: May 30, 2008
    Publication date: November 27, 2008
    Applicant: STMICROELECTRONICS S.R.L.
    Inventors: Francesco Pappalardo, Giuseppe Notarangelo, Elio Guidetti
  • Patent number: 7457941
    Abstract: A vector processing system for executing vector instructions, each instruction defining multiple pairs of values, an operation to be executed on each of said value pairs and a scalar modifier, the vector processing system comprising a plurality of parallel processing units, each arranged to receive one of said pairs of values and to implement the defined operation on said value pair to generate a respective result; and a scalar result unit for receiving the results of the parallel processing units and for using said results in a manner defined by the scalar modifier to generate a single output value for said instruction.
    Type: Grant
    Filed: January 3, 2006
    Date of Patent: November 25, 2008
    Assignee: Broadcom Corporation
    Inventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
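    The two-stage structure in the abstract, a per-pair operation in parallel units followed by a scalar result unit steered by a scalar modifier, can be sketched as below. Modeling the operation and the modifier as Python callables is an assumption for illustration; the function name is hypothetical.

```python
import operator
from typing import Callable, Sequence


def vector_op_with_scalar_modifier(
    pairs: Sequence[tuple[int, int]],
    op: Callable[[int, int], int],
    modifier: Callable[[Sequence[int]], int],
) -> int:
    # Each parallel processing unit applies the defined operation to one pair.
    results = [op(a, b) for a, b in pairs]
    # The scalar result unit folds the results into a single output value.
    return modifier(results)


pairs = [(1, 2), (3, 4), (5, 6)]
dot_product = vector_op_with_scalar_modifier(pairs, operator.mul, sum)
largest_product = vector_op_with_scalar_modifier(pairs, operator.mul, max)
```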
  • Patent number: 7454594
    Abstract: A processor and its arithmetic instruction processing method and arithmetic operation control method are disclosed that add a new operand designation option to SIMD arithmetic instructions and permit software pipelining between arithmetic operations performed in parallel by a SIMD arithmetic unit. A selector for adding an operation for interchanging multiple outputs of a SIMD arithmetic unit is added to a data path. A register file is divided in accordance with the output bit fields of the SIMD arithmetic unit. A means of specifying multiple registers as a SIMD instruction's output operand is added. Therefore, part of the output results of arithmetic operations performed in parallel by the SIMD arithmetic unit can be stored in a register providing the input for another arithmetic operation. Software pipelining is rendered achievable in this manner.
    Type: Grant
    Filed: December 17, 2002
    Date of Patent: November 18, 2008
    Assignee: Renesas Technology Corp.
    Inventor: Yuki Kondoh
  • Patent number: 7453882
    Abstract: One embodiment of the present invention provides a system that asynchronously controls the sending of data items from a sender to a receiver. The system includes a data path between the sender and the receiver, a first control path between the sender and the receiver, and a second control path between the sender and the receiver. The first control path and the second control path alternately control the asynchronous transmission of consecutive data items on the data path between the sender and the receiver.
    Type: Grant
    Filed: August 25, 2004
    Date of Patent: November 18, 2008
    Assignee: Sun Microsystems, Inc.
    Inventors: Ronald Ho, Jonathan K. Gainsley, Robert J. Drost
  • Patent number: 7454749
    Abstract: A virtual parallel computer is created within a programming environment comprising both shared memory and distributed memory architectures. At run time, the virtual architecture is mapped to a physical hardware architecture. In this manner, a massively parallel computing program may be developed and tested on a first architecture and run on a second architecture without reprogramming.
    Type: Grant
    Filed: November 12, 2002
    Date of Patent: November 18, 2008
    Assignee: Engineered Intelligence Corporation
    Inventor: Matthias Oberdorfer
  • Patent number: 7451294
    Abstract: A method and apparatus for a two micro-operation flow using source override. In one embodiment, the method includes the identification of a macro-instruction having one or more streaming single instruction multiple data extension type operands. Once received, the macro-instruction is decoded into a first micro-operation (uOP) and a second uOP. Once decoded, a signal is asserted to disable source operand override logic if the first micro-operation updates a logical destination register that matches a logical source register of the micro-operation. Otherwise, the mutual source override is active and executed by a register alias table (RAT) when uOPs with matching logical source and destination registers are detected in a same clock cycle. In doing so, macro-instructions having 128-bit operands may be processed using, for example, two uOPs (one for the lower half and one for the upper half) in a 64-bit implementation, while preserving the atomicity of the original instruction.
    Type: Grant
    Filed: July 30, 2003
    Date of Patent: November 11, 2008
    Assignee: Intel Corporation
    Inventors: Zeev Sperber, Yuval Bustan, Robert Valentine
  • Patent number: 7447873
    Abstract: In a multithreaded processing core, groups of threads are executed using single instruction, multiple data (SIMD) parallelism by a set of parallel processing engines. Input data defining objects to be processed is received as a stream of input data blocks, and the input data blocks are loaded into a local register file in the core such that all of the data for one of the input objects is accessible to one of the processing engines. The input data can be loaded directly into the local register file, or the data can be accumulated in a buffer and loaded after accumulation, for instance during a launch operation for a SIMD group. Shared input data can also be loaded into a shared memory in the processing core.
    Type: Grant
    Filed: November 29, 2005
    Date of Patent: November 4, 2008
    Assignee: NVIDIA Corporation
    Inventor: Bryon S. Nordquist
  • Patent number: 7444496
    Abstract: An apparatus, system, and method are disclosed for determining the consistency of a database including indirect reference to data elements. There is provided an apparatus for determining consistency of a database. This database includes, in association with each data element, an indirect list element including a storage address of the associated or corresponding data element so that other data elements can reference that data element. This apparatus reads, from each data element, identification information of an indirect list element corresponding to that data element and generates a hash value. This apparatus further reads, from each data element, identification information of an indirect list element corresponding to a data element referenced by that data element and generates a hash value. On condition that these hash values are equivalent to each other, the apparatus determines that the database is consistent.
    Type: Grant
    Filed: September 27, 2006
    Date of Patent: October 28, 2008
    Assignee: International Business Machines Corporation
    Inventors: Tatsuyuki Shiomi, Shigeko Mori, Takashi Yonezawa
  • Patent number: 7441098
    Abstract: A method of executing instructions in a computer system on operands containing a plurality of packed objects in respective lanes of the operand is described. Each instruction defines an operation and contains a condition setting indicator settable independently of the operation. The status of the condition setting indicator determines whether or not multibit condition codes are set. When they are to be set, they are set depending on the results for carrying out the operation for each lane.
    Type: Grant
    Filed: May 6, 2005
    Date of Patent: October 21, 2008
    Assignee: Broadcom Corporation
    Inventor: Sophie Wilson
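    The idea of a per-instruction condition setting indicator with per-lane multibit condition codes can be sketched as below. The choice of a packed add, the 8-bit lane width, and the two flags ("zero" and "negative") are assumptions for illustration and are not taken from the patent.

```python
def packed_add(a, b, old_codes, set_codes, lane_bits=8):
    """Lane-wise add; condition codes change only when set_codes is true."""
    mask = (1 << lane_bits) - 1
    result = [(x + y) & mask for x, y in zip(a, b)]
    if not set_codes:
        # Indicator clear: the operation runs but existing codes are preserved.
        return result, old_codes
    sign_bit = 1 << (lane_bits - 1)
    # Indicator set: one multibit code per lane, derived from that lane's result.
    codes = [{"zero": r == 0, "negative": bool(r & sign_bit)} for r in result]
    return result, codes


initial = [{"zero": False, "negative": False}] * 2
result, codes = packed_add([1, 200], [255, 10], initial, set_codes=True)
```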
  • Patent number: 7441099
    Abstract: Methods and apparatuses for processing a Configurable Single-Instruction-Multiple-Data (CSIMD) instruction are disclosed. In the method, a lookup table (LUT) storing information is provided to support random access of memory locations associated with a plurality of processing elements (PEs) and to perform instruction variances by the PEs. A CSIMD instruction is received, comprising a command and an index to the lookup table (LUT), to be executed by the PEs. The command of the received CSIMD instruction is executed in parallel differently by the PEs using the LUT index to randomly access the memory locations.
    Type: Grant
    Filed: October 3, 2006
    Date of Patent: October 21, 2008
    Assignee: Hong Kong Applied Science and Technology Research Institute Company Limited
    Inventors: Wing Yee Lo, Simon Moy
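    A toy model of a CSIMD instruction, in which a LUT index selects both a per-PE memory address and a per-PE variant of the command, is sketched below. The table layout, the "add"/"sub" variants, and all names are hypothetical; the abstract does not specify how LUT entries are encoded.

```python
# Hypothetical LUT: each entry gives, per processing element (PE), the local
# memory address to access and the variant of the command to apply.
LUT = [
    {"addresses": [0, 1, 2, 3], "variants": ["add", "add", "add", "add"]},
    {"addresses": [3, 0, 1, 2], "variants": ["add", "sub", "add", "sub"]},
]


def execute_csimd(memories, operand, lut_index):
    """Run one instruction on all PEs; each PE behaves as its LUT fields say."""
    entry = LUT[lut_index]
    outputs = []
    for memory, address, variant in zip(memories, entry["addresses"], entry["variants"]):
        value = memory[address]  # random access chosen by the LUT entry
        outputs.append(value + operand if variant == "add" else value - operand)
    return outputs


# Four PEs, each with its own copy of a small local memory.
memories = [[10, 20, 30, 40] for _ in range(4)]
outputs = execute_csimd(memories, operand=1, lut_index=1)
```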
  • Publication number: 20080215851
    Abstract: A method is provided for the functional control of program and/or data flows in digital signal processors and processors, which have respective closed and separated modules for program and data flow control, working in parallel with computers. The method enables a power-efficient adaptation of the signal processing with the applied SIMD command-type in the individual paths and minimizes the emergence of the appearance of NOP-commands with which the VLIW-architecture of the processor must be supplied. The adaptation of the signal processing is achieved by individually controlling the parallel signal processing of the processor in the data paths (DP) which respectively belong to a first and second slice. For this purpose, a single slice halt outputted from an SSM register bank switches the register clockline according to state-dependent signal processing.
    Type: Application
    Filed: May 5, 2008
    Publication date: September 4, 2008
    Inventors: Uwe Porst, Wolfram Drescher
  • Publication number: 20080209164
    Abstract: A microprocessor architecture comprises a plurality of processing elements arranged in a single instruction multiple data SIMD array, wherein each processing element includes a plurality of execution units, each of which is operable to process an instruction of a particular instruction type, a serial processor which includes a plurality of execution units, each of which is operable to process an instruction of a particular instruction type, and an instruction controller operable to receive a plurality of instructions, and to distribute received instructions to the execution units in dependence upon the instruction types of the received instruction. The execution units of the serial processor are operable to process respective instructions in parallel.
    Type: Application
    Filed: February 7, 2006
    Publication date: August 28, 2008
    Inventor: Leon David Wildman
  • Publication number: 20080209165
    Abstract: A SIMD microprocessor, which can be included in an image processing apparatus using an image processing method used therein, includes a global processor and multiple processor elements controlled by the global processor. Each single processor element of the multiple processor elements includes multiple operation units. The global processor is configured to control the multiple processing elements to uniformly change a configuration of the multiple operation units in the single processor element to determine a number of data units of operation according to the multiple operation units either operated individually or in cooperation with each other in the single processor element and a width of data processed per data unit of operation performed in the single processor element. A processor element number is assigned per data unit of operation to the single processor element to use for executing an operation.
    Type: Application
    Filed: February 27, 2008
    Publication date: August 28, 2008
    Inventor: Tomoaki Ozaki
  • Patent number: 7412587
    Abstract: A processor having a plurality of processing elements and a decoder operable to decode an instruction. Each of the plurality of processing elements includes: a transfer pattern storage unit operable to store a transfer pattern value that indicates a processing element from which data is transferred; a transfer unit operable to perform a data transfer from the processing element indicated by the transfer pattern value; and an update unit operable to update the transfer pattern value stored in the transfer pattern storage unit, in accordance with a result of decoding a latest instruction by the decoder.
    Type: Grant
    Filed: February 9, 2005
    Date of Patent: August 12, 2008
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Takeshi Tanaka, Hideshi Nishida, Masashi Hoshino, Takeshi Furuta
  • Patent number: 7401204
    Abstract: A parallel processor performs efficient parallel processing of one or more basic instructions contained in each of a plurality of instruction words delimited by instruction delimiting information. The processor includes: a plurality of instruction execution units performing processes in accordance with corresponding, supplied basic instructions in parallel; an instruction fetch unit fetching the instruction words one by one in accordance with the instruction delimiting information; and an instruction issue unit recognizing and, in accordance therewith, selecting each of the basic instructions contained in each of the instruction words fetched by the instruction fetch unit to a corresponding instruction execution unit to execute the basic instruction.
    Type: Grant
    Filed: September 1, 2000
    Date of Patent: July 15, 2008
    Assignee: Fujitsu Limited
    Inventors: Hideo Miyake, Atsuhiro Suga, Yasuki Nakamura, Yoshimasa Takebe
  • Publication number: 20080162873
    Abstract: In some embodiments, the invention involves a system and method to provide maximal boot-time parallelism for future multi-core, multi-node, and many-core systems. In an embodiment, the security (SEC), pre-EFI initialization (PEI), and then driver execution environment (DXE) phases are executed in parallel on multiple compute nodes (sockets) of a platform. Once the SEC/PEI/DXE phases are executed on all compute nodes having a processor, the boot device select (BDS) phase completes the boot by merging or partitioning the compute nodes based on a platform policy. Partitioned compute nodes each run their own instance of EFI. A common memory map may be generated prior to operating system (OS) launch when compute nodes are to be merged. Other embodiments are described and claimed.
    Type: Application
    Filed: December 28, 2006
    Publication date: July 3, 2008
    Inventors: Vincent J. Zimmer, Yufu Li, Michael A. Rothman, Burges M. Karkaria
  • Publication number: 20080162874
    Abstract: A data transfer controller for controlling transfer of data items in a data processing system comprising a single instruction multiple data (SIMD) array of processing elements is disclosed. The controller comprises a transfer controller operable to control transfer of data to and/or from an internal memory unit of a processing element in said array, each processing element including a processing unit and an internal memory unit, the transfer controller being operable such that data transfer to and/or from the internal memory unit is performed independently of the operation of the processing unit of the processing element concerned. Operation by said processing unit on a predetermined type of instruction may be blocked until after said data transfer is complete or, if said data transfer started after said operation commenced, said data transfer may be blocked until after said operation is complete.
    Type: Application
    Filed: June 19, 2007
    Publication date: July 3, 2008
    Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
  • Publication number: 20080162875
    Abstract: A method of controlling access to memory by a processing element in a plurality of processing elements arranged in a single instruction multiple data (SIMD) processing array is disclosed. Each processing element includes an internal memory unit, and a register file. The method comprises retrieving an address value from the register file of the processing element, the address value relating to an address in the internal memory of the processing element, and accessing the internal memory on the basis of the address value.
    Type: Application
    Filed: July 6, 2007
    Publication date: July 3, 2008
    Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
  • Patent number: 7395531
    Abstract: A system and method is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In this framework, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirements of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residual iteration counts, and multiple statements with arbitrary alignment combinations. Loop peeling is used to reduce the computational overhead associated with misaligned data. A loop prologue and epilogue are peeled from individual iterations in the simdized loop, and vector-splicing instructions are applied to the peeled iterations, while the steady-state loop body incurs no additional computational overhead.
    Type: Grant
    Filed: August 16, 2004
    Date of Patent: July 1, 2008
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
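    Illustration (not part of the patent record): the abstract above describes satisfying an aligned-only memory unit by inserting data reorganization operations. A minimal sketch of that general idea, assuming a 4-element vector and plain Python lists standing in for memory; the names `VECTOR_LEN`, `aligned_load`, `splice`, and `misaligned_load` are hypothetical and do not come from the patent.

```python
VECTOR_LEN = 4  # assumed vector width in elements (hypothetical)


def aligned_load(mem, addr):
    """Load one vector; the address must be a multiple of VECTOR_LEN."""
    if addr % VECTOR_LEN != 0:
        raise ValueError("aligned_load requires an aligned address")
    return mem[addr:addr + VECTOR_LEN]


def splice(lo, hi, offset):
    """Concatenate two vectors and extract VECTOR_LEN elements at offset."""
    return (lo + hi)[offset:offset + VECTOR_LEN]


def misaligned_load(mem, addr):
    """Emulate a misaligned vector load with aligned loads plus a splice."""
    offset = addr % VECTOR_LEN
    base = addr - offset
    lo = aligned_load(mem, base)
    if offset == 0:
        return lo  # already aligned, no reorganization needed
    hi = aligned_load(mem, base + VECTOR_LEN)
    return splice(lo, hi, offset)


mem = list(range(16))
print(misaligned_load(mem, 5))  # elements 5..8 of mem
```

    The sketch assumes the memory list extends at least one full vector past the aligned base address; a real compiler would also have to handle loop bounds and runtime alignment, which are not modeled here.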
  • Patent number: 7392329
    Abstract: In accordance with one embodiment of the present invention, a method of applying an action initiated for a portion of a plurality of devices to all of the plurality of devices is provided. The method comprises establishing a status block for a plurality of devices that are implemented on a system, and initiating an action for a portion of the plurality of devices. The method further comprises writing information to the status block identifying that the action was initiated, and based at least in part on the information written to the status block, applying the action to all of the plurality of devices.
    Type: Grant
    Filed: March 28, 2003
    Date of Patent: June 24, 2008
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Scott Lynn Michaelis, Marvin J. Spinhirne
  • Patent number: 7392368
    Abstract: Methods and apparatus for calculating Single-Instruction-Multiple-Data (SIMD) complex arithmetic. A coprocessor instruction has a format identifying a multiply and subtract instruction to generate real components for complex multiplication of first operand complex data and corresponding second operand complex data, a cross multiply and add instruction to generate imaginary components for complex multiplication of the first operand complex data and the corresponding second operand complex data, an add-subtract instruction to add real components of the first operand to imaginary components of the second operand and to subtract real components of the second operand from imaginary components of the first operand, and a subtract-add instruction to subtract the real components of the second operand from the imaginary components of the first operand and to add the real components of the first operand to the imaginary components of the second operand.
    Type: Grant
    Filed: June 30, 2005
    Date of Patent: June 24, 2008
    Assignee: Marvell International Ltd.
    Inventors: Moinul H. Khan, Nigel C. Paver, Bradley C. Aldrich
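    Illustration (not part of the patent record): the abstract above pairs a multiply-and-subtract step for the real components with a cross-multiply-and-add step for the imaginary components. A minimal sketch of that arithmetic, assuming packed data laid out as interleaved [re0, im0, re1, im1, ...]; the layout and the function name `packed_complex_multiply` are assumptions, not the coprocessor's actual instruction format.

```python
def packed_complex_multiply(a, b):
    """Multiply two packed complex vectors stored as [re0, im0, re1, im1, ...]."""
    if len(a) != len(b) or len(a) % 2 != 0:
        raise ValueError("operands must have equal, even length")
    out = []
    for i in range(0, len(a), 2):
        a_re, a_im = a[i], a[i + 1]
        b_re, b_im = b[i], b[i + 1]
        real = a_re * b_re - a_im * b_im  # multiply and subtract
        imag = a_re * b_im + a_im * b_re  # cross multiply and add
        out.extend((real, imag))
    return out


# (1 + 2i)(5 + 6i) and (3 + 4i)(7 + 8i)
print(packed_complex_multiply([1, 2, 3, 4], [5, 6, 7, 8]))
```

    In hardware each lane pair would be computed at once; the loop here only makes the per-element arithmetic explicit.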
  • Publication number: 20080148011
    Abstract: The present disclosure provides a system and method for performing carry/borrow handling. A method according to one embodiment may include generating a first result having a first carry or borrow from a first mathematical operation and storing the first carry or borrow and a first pointer address in a temporary register. The method may further include generating a second result having a second carry or borrow from a second mathematical operation and calling a subroutine configured to perform carry and borrow handling. The method may also include copying the first pointer address from the temporary register into a global variable. Of course, many alternatives, variations and modifications are possible without departing from this embodiment.
    Type: Application
    Filed: December 14, 2006
    Publication date: June 19, 2008
    Applicant: INTEL CORPORATION
    Inventors: Vinodh Gopal, Gilbert M. Wolrich, Gunnar Gaubatz, Daniel Cutter, Wajdi Feghali, Kaan Yuksel, Erdinc Ozturk
  • Publication number: 20080140750
    Abstract: An apparatus and method are provided for performing rearrangement operations and arithmetic operations on data. The data processing apparatus has processing circuitry for performing SIMD processing operations and scalar processing operations, a register bank for storing data and control circuitry responsive to program instructions to control the processing circuitry to perform data processing operations. The control circuitry is arranged to be responsive to a combined rearrangement arithmetic instruction to control the processing circuitry to perform a rearrangement operation and at least one SIMD arithmetic operation on a plurality of data elements stored in the register bank. The rearrangement operation is configurable by a size parameter derived at least in part from the register bank. The size parameter provides an indication of a number of data elements forming a rearrangement element for the purposes of the rearrangement operation.
    Type: Application
    Filed: November 29, 2007
    Publication date: June 12, 2008
    Inventors: Daniel Kershaw, Mladen Wilder, Dominic Hugo Symes
  • Patent number: 7383421
    Abstract: A data processing system includes an associative memory device containing n-cells, each of the n-cells including a processing circuit. A controller is utilized for issuing one of a plurality of instructions to the associative memory device, while a clock device is utilized for outputting a synchronizing clock signal comprised of a predetermined number of clock cycles per second. The clock device outputs the synchronizing clock signal to the associative memory device and the controller which globally communicates one of the plurality of instructions to all of the n-cells simultaneously, within one of the clock cycles.
    Type: Grant
    Filed: December 4, 2003
    Date of Patent: June 3, 2008
    Assignee: Brightscale, Inc.
    Inventors: Gheorghe Stefan, Dan Tomescu
  • Patent number: 7376812
    Abstract: A processor can achieve high code density while allowing higher performance than existing architectures, particularly for Digital Signal Processing (DSP) applications. In accordance with one aspect, the processor supports three possible instruction sizes while maintaining the simplicity of programming and allowing efficient physical implementation. Most of the application code can be encoded using two sets of narrow size instructions to achieve high code density. Adding a third (and larger, i.e. VLIW) instruction size allows the architecture to encode multiple operations per instruction for the performance critical section of the code. Further, each operation of the VLIW format instruction can optionally be a SIMD operation that operates upon vector data. A scheme for the optimal utilization (highest achievable performance for the given amount of hardware) of multiply-accumulate (MAC) hardware is also provided.
    Type: Grant
    Filed: May 13, 2002
    Date of Patent: May 20, 2008
    Assignee: Tensilica, Inc.
    Inventors: Himanshu A. Sanghavi, Earl A. Killian, James Robert Kennedy, Darin S. Petkov, Peng Tu, William A. Huffman
  • Patent number: 7373488
    Abstract: A method and apparatus for calculation and storage of Single-Instruction-Multiple-Data (SIMD) saturation history information. A first coprocessor instruction has a first format identifying a saturating operation, a first source having packed data elements and a second source having packed data elements. The saturating operation is executed on the packed data elements of the first and second sources. Saturation flags are stored in the Wireless Coprocessor Saturation Status Flag (wCSSF) register to indicate if a result of the saturating operation saturated. A second coprocessor instruction has a second format identifying a saturation history processing operation and a saturation data size. An operand for the processing operation is determined based on the saturation data size, and the processing operation is executed on the saturation flags and the operand for the saturation data size. Condition code flags are stored in a status register to indicate the result of processing operation.
    Type: Grant
    Filed: April 30, 2007
    Date of Patent: May 13, 2008
    Assignee: Marvell International Ltd.
    Inventors: Nigel C. Paver, Bradley C. Aldrich
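    Illustration (not part of the patent record): the abstract above describes a saturating operation on packed data elements that records per-element saturation flags, followed by a separate operation that processes those flags. A minimal sketch of the idea, assuming signed 16-bit elements and a plain list of booleans in place of the wCSSF register; the function names and flag layout are hypothetical.

```python
def saturating_add_packed(a, b, bits=16):
    """Add packed signed elements, clamping each result to the element range.

    Returns (results, flags), where flags[i] is True if element i saturated.
    """
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    results, flags = [], []
    for x, y in zip(a, b):
        total = x + y
        clamped = max(lo, min(hi, total))
        results.append(clamped)
        flags.append(clamped != total)  # record saturation history
    return results, flags


def any_saturated(flags):
    """Reduce the saturation flags to one condition, like an OR across lanes."""
    return any(flags)


results, flags = saturating_add_packed([30000, -30000, 100], [10000, -10000, 23])
print(results, flags, any_saturated(flags))
```

    Keeping the flags separate from the results lets later code test for overflow once, after a run of packed operations, instead of checking every element.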
  • Patent number: 7367026
    Abstract: A method, computer program product, and information handling system for generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop contains multiple non-stride-one memory accesses that operate over a contiguous stream of memory is disclosed. A preferred embodiment identifies groups of isomorphic statements within a loop body where the isomorphic statements operate over a contiguous stream of memory over the iteration of the loop. Those identified statements are then converted into virtual-length vector operations. Next, the hardware's available vector length is used to determine a number of virtual-length vectors to aggregate into a single vector operation for each iteration of the loop. Finally, the aggregated, vectorized loop code is converted into SIMD operations.
    Type: Grant
    Filed: August 16, 2004
    Date of Patent: April 29, 2008
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
  • Patent number: 7363472
    Abstract: A data processing apparatus includes a SIMD (Single Instruction Multiple Data) array (10) of processing elements. The processing elements are operably divided into a plurality of processing blocks, the processing blocks being operable to process respective groups of data items.
    Type: Grant
    Filed: October 9, 2001
    Date of Patent: April 22, 2008
    Assignee: Clearspeed Technology Limited
    Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer