Single Instruction, Multiple Data (SIMD) Patents (Class 712/22)

SIMD processor and addressing method

Publication number: 20090125702

Abstract: A single instruction, multiple data (SIMD) processor including a plurality of addressing register sets, used to flexibly calculate effective operand source and destination memory addresses is disclosed. Two or more address generators calculate effective addresses using the register sets. Each register set includes a pointer register, and a scale register. An address generator forms effective addresses from a selected register set's pointer register and scale register; and an offset. For example, the effective memory address may be formed by multiplying the scale value by an offset value and summing the pointer and the scale value multiplied by the offset value.

Type: Application

Filed: August 29, 2008

Publication date: May 14, 2009

Applicant: ATI Technologies Inc.

Inventors: Richard J. Selvaggi, Larry A. Pearlstein
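
As a rough illustration of the addressing arithmetic described in the abstract above (effective address = pointer + scale × offset), a minimal Python sketch follows. The names `RegisterSet` and `effective_address` are hypothetical and do not come from the application; the sketch models only the arithmetic, not the hardware address generators.

```python
from dataclasses import dataclass


@dataclass
class RegisterSet:
    """Hypothetical model of one addressing register set."""
    pointer: int  # base address held in the pointer register
    scale: int    # stride held in the scale register


def effective_address(reg: RegisterSet, offset: int) -> int:
    """Form pointer + scale * offset, as in the abstract's example."""
    return reg.pointer + reg.scale * offset


# Two register sets, e.g. one for a source and one for a destination operand.
src = RegisterSet(pointer=0x1000, scale=4)
dst = RegisterSet(pointer=0x2000, scale=16)

# Stepping the offset walks memory in units of the scale value.
addresses = [effective_address(src, i) for i in range(4)]
```

Keeping the scale in its own register means the same offset sequence can walk arrays with different element strides just by selecting a different register set.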
Parallel data processing apparatus

Patent number: 7526630

Abstract: A controller operable to control an array of processing elements comprises a retrieval unit operable to retrieve instruction items for each of a plurality of instructions streams, each instruction stream having a plurality of instructions items, a combining unit operable to combine the plurality of instruction streams into a serial instruction stream, and a distribution unit operable to distribute the serial instruction stream to an array of processing elements.

Type: Grant

Filed: January 4, 2007

Date of Patent: April 28, 2009

Assignee: Clearspeed Technology, PLC

Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russel David, Ray McConnell, Tim Day, Trey Greer
Parallel Image Processing System Control Method And Apparatus

Publication number: 20090106528

Abstract: To reduce the required amount of program codes when processing the whole image in a one-dimensional SIMD parallel image processing system having a smaller number of PEs than the number of pixels in the width direction of the image to be processed. A controller for controlling a PE array includes a command repetitive-execution part, which includes an operand converting part, a memory address converting part, and an operation code converting part. When a command fetching/decoding part reads and executes program codes stored in a program memory, the repetitive-execution part determines the program codes to cause the operand converting part, memory address converting part and operation code converting part to perform conversions in accordance with the command, thereby performing a repetitive execution of the one-command program description adaptive to a plurality of related pixels assigned to the PEs, whereby the program code amount can be reduced.

Type: Application

Filed: December 5, 2006

Publication date: April 23, 2009

Inventor: Takuya Koga
Splat copying GPR data to vector register elements by executing lvsr or lvsl and vector subtract instructions

Patent number: 7516299

Abstract: A method for transferring data from a general purpose register (GPR) to a vector register (VR), the method including vectorially combining data in the VR from the GPR, by executing instructions of a PowerPC Instruction Set Architecture (ISA), the step of combining including splatting a low nibble from the GPR into a low nibble in each element of a first VR by executing two “load vector for shift left” (lvsl) or “load vector for shift right” (lvsr) and one “vector subtract unsigned byte modulo” (vsububm), shifting a high nibble of the GPR into a low nibble of the GPR, splatting the low nibble of the GPR into a low nibble in each element of a second VR by re-executing the two lvsl or lvsr and one vsububm instructions, shifting the low nibble of the second VR into a high nibble of the second VR and combining both first and second VRs into one VR.

Type: Grant

Filed: August 29, 2005

Date of Patent: April 7, 2009

Assignee: International Business Machines Corporation

Inventors: Daniel Citron, Ayal Zaks
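
The nibble-splat sequence in the abstract above can be sketched in Python. This is a simplified software model, not the patented implementation: `lvsl` here only reproduces the byte pattern that instruction generates (sh, sh+1, ..., sh+15, where sh is the low four bits of the address), `vsububm` is a byte-wise modulo-256 subtract, and using a bitwise OR for the final "combining" step is an assumption. The helper names `splat_low_nibble` and `splat_byte` are hypothetical.

```python
def lvsl(addr):
    """Model of lvsl: 16 bytes sh, sh+1, ..., sh+15 with sh = addr & 0xF."""
    sh = addr & 0xF
    return [sh + i for i in range(16)]


def vsububm(a, b):
    """Model of vsububm: element-wise byte subtract, modulo 256."""
    return [(x - y) & 0xFF for x, y in zip(a, b)]


def splat_low_nibble(gpr):
    """Two lvsl results and one vsububm leave gpr's low nibble in every byte."""
    return vsububm(lvsl(gpr), lvsl(0))


def splat_byte(gpr):
    """Splat the low byte of gpr into all 16 byte elements."""
    low = splat_low_nibble(gpr)            # first VR: low nibble in each byte
    high = splat_low_nibble(gpr >> 4)      # high nibble shifted down, then splatted
    high = [(x << 4) & 0xFF for x in high] # move it back to the high nibble
    return [h | l for h, l in zip(high, low)]  # combine (OR is an assumption)


splatted = splat_byte(0xA7)
```

The subtraction works because the two lvsl patterns differ by exactly the nibble value in every byte position.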
Compact processor element for a scalable digital logic verification and emulation system

Patent number: 7509602

Abstract: A logic simulation acceleration processor optimized for multi-value logic level simulation of electronic systems described in hardware description languages.

Type: Grant

Filed: January 25, 2006

Date of Patent: March 24, 2009

Assignee: Eve S.A.

Inventors: Subbu Ganesan, Leonid Alexander Broukhis, Ramesh Narayanaswamy, Ian Michael Nixon, Thomas Hanni Spencer
Parallel data processing apparatus

Patent number: 7506136

Abstract: A controller for controlling a data processor having a plurality of processor arrays, each of which includes a plurality of processing elements, comprises a retrieval unit operable to retrieve a plurality of incoming instructions streams in parallel with one another, and a distribution unit operable to supply such incoming instruction streams to respective ones of the said plurality of processor arrays.

Type: Grant

Filed: January 10, 2007

Date of Patent: March 17, 2009

Assignee: Clearspeed Technology PLC

Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
Histogram generation with vector operations in SIMD and VLIW processor by consolidating LUTs storing parallel update incremented count values for vector data elements

Patent number: 7506135

Abstract: The present invention provides histogram calculation for images and video applications using a SIMD and VLIW processor with vector Look-Up Table (LUT) operations. This provides a speed up of histogram calculation by a factor of N times over a scalar processor where the SIMD processor could perform N LUT operations per instruction. Histogram operation is partitioned into a vector LUT operation, followed by vector increment, vector LUT update, and at the end by reduction of vector histogram components. The present invention could be used for intensity, RGBA, YUV, and other type of multi-component images.

Type: Grant

Filed: May 20, 2003

Date of Patent: March 17, 2009

Inventor: Tibet Mimar
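
The four phases named in the abstract above (vector LUT read, vector increment, vector LUT update, final reduction) can be sketched as follows. This is an illustrative Python model under stated assumptions: each lane is given its own private count table so that lanes never collide on the same bin, and the function name `simd_histogram` and its parameters are hypothetical rather than taken from the patent.

```python
def simd_histogram(pixels, num_bins, lanes):
    """Histogram of `pixels` using `lanes` parallel per-lane count tables."""
    # One private LUT of counts per lane (an assumption of this sketch).
    luts = [[0] * num_bins for _ in range(lanes)]

    for start in range(0, len(pixels), lanes):
        chunk = pixels[start:start + lanes]  # up to `lanes` values per step
        # Vector LUT read: each lane looks up the current count for its value.
        counts = [luts[lane][v] for lane, v in enumerate(chunk)]
        # Vector increment.
        counts = [c + 1 for c in counts]
        # Vector LUT update: write the incremented counts back.
        for lane, (v, c) in enumerate(zip(chunk, counts)):
            luts[lane][v] = c

    # Reduction: sum the per-lane partial histograms bin by bin.
    return [sum(lut[b] for lut in luts) for b in range(num_bins)]


histogram = simd_histogram([0, 1, 1, 3, 3, 3, 2, 0], num_bins=4, lanes=4)
```

Private per-lane tables trade memory for freedom from update conflicts when two lanes see the same pixel value in one step.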
SIMD-RISC microprocessor architecture

Patent number: 7496673

Abstract: A computer architecture and programming model for high speed processing over broadband networks are provided. The architecture employs a consistent modular structure, a common computing module and uniform software cells. The common computing module includes a control processor, a plurality of processing units, a plurality of local memories from which the processing units process programs, a direct memory access controller and a shared main memory. A synchronized system and method for the coordinated reading and writing of data to and from the shared main memory by the processing units also are provided. A hardware sandbox structure is provided for security against the corruption of data among the programs being processed by the processing units. The uniform software cells contain both data and applications and are structured for processing by any of the processors of the network. Each software cell is uniquely identified on the network.

Type: Grant

Filed: February 24, 2005

Date of Patent: February 24, 2009

Assignee: International Business Machines Corporation

Inventors: Michael Karl Gschwind, Harm Peter Hofstee, Martin E. Hopkins, James Allan Kahle
Method and system for local memory addressing in single instruction, multiple data computer system

Patent number: 7490190

Abstract: A single instruction, multiple data (“SIMD”) computer system includes a central control unit coupled to 256 processing elements (“PEs”) and to 32 static random access memory (“SRAM”) devices. Each group of eight PEs can access respective groups of eight columns in a respective SRAM device. Each PE includes a local column address register that can be loaded through a data bus of the respective PE. A local column address stored in the local column address register is applied to an AND gate, which selects either the local column address or a column address applied to the AND gate by the central control unit. As a result, the central control unit can globally access the SRAM device, or a specific one of the eight columns that can be accessed by each PE can be selected locally by the PE.

Type: Grant

Filed: October 5, 2006

Date of Patent: February 10, 2009

Assignee: Micron Technology, Inc.

Inventor: Jon Skull
Image processing method and device

Patent number: 7483595

Abstract: An image processing method and device for processing multiple rows of pixels of an image simultaneously with a single instruction. The processing includes selecting a pixel window having a plurality of pixels of an image spanning across multiple rows and columns, building vertical and horizontal load registers to include the plurality of pixels of the selected pixel window, and simultaneously processing selected pixels of the plurality of pixels included in the vertical and horizontal load registers using a single instruction, wherein the vertical and horizontal load registers are shifted when the selected pixels are processed. Accordingly, a method and device for efficient processing of an image is provided.

Type: Grant

Filed: September 16, 2004

Date of Patent: January 27, 2009

Assignee: Marvell International Technology Ltd.

Inventors: Douglas Gene Keithley, Roy Gideon Moss
Executing an SIMD instruction requiring P operations on an execution unit that performs Q operations at a time (Q

Patent number: 7484076

Abstract: Methods, apparatuses, and systems are presented for performing instructions using multiple execution units in a graphics processing unit involving issuing an instruction for P executions of the instruction wherein each execution uses different data, P being a positive integer, the instruction being issued based on a first clock having a first clock rate, operating Q execution units to achieve the P executions of the instruction, Q being a positive integer less than P and greater than one, each of the execution units being operated based on a second clock having a second clock rate higher than the first clock rate of the first clock, and wherein the second clock rate of the second clock is equal to the first clock rate of the first clock multiplied by the ratio P/Q.

Type: Grant

Filed: September 18, 2006

Date of Patent: January 27, 2009

Assignee: Nvidia Corporation

Inventors: Stuart F. Oberman, Ming Y. Siu, Sameer D. Halepete
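
The clock relationship stated in the abstract above (second clock rate = first clock rate × P/Q, with 1 < Q < P) is simple arithmetic, sketched here in Python. The function name `execution_clock_hz` and the example frequencies are hypothetical.

```python
from fractions import Fraction


def execution_clock_hz(issue_clock_hz, p, q):
    """Clock rate the Q execution units need to complete P executions per issue."""
    if not (1 < q < p):
        raise ValueError("the abstract requires 1 < Q < P")
    # Exact rational arithmetic avoids floating-point rounding of P/Q.
    return issue_clock_hz * Fraction(p, q)


# Example: 16 executions per instruction on 8 units doubles the clock rate.
fast_clock = execution_clock_hz(500_000_000, p=16, q=8)
```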

PROCESS FOR THE AUTOMATIC PRODUCTION OF A PROCESSOR FROM A MACHINE DESCRIPTION

Publication number: 20090024832

Abstract: The invention is based on the task to undertake machine descriptions, with which an automated optimal hardware design of SIMD processors can be carried out. This is solved by the fact that functional units are selected from a criterion in the machine description, which is vector processible. A first or second reduced functional unit are selectively defined from a respective vector-processing functional unit, in which the reduced functional units process only a data element of a vectoral value. All reduced functional units, which use common control signals for the processing of a respective data element belonging to the vectoral value, are condensed to a disk. Reduced functional units, which process the same data elements in a sequence at least indirectly, are condensed to a disk module. The disk is reproduced with the contained reduced functional units so often that all reduced functional units represent the functionality of their respective selected vector-processing functional unit.

Type: Application

Filed: November 23, 2004

Publication date: January 22, 2009

Inventor: Gordon Cichon
Parallel processing device and parallel processing method

Patent number: 7480785

Abstract: A row decoding circuit (171) outputs a select signal to a row set in a row range setting unit (172) to select a select signal line (103), processing results from processing circuits (102) on this row are output to a data output line (104), and a row adder (106) adds processing results output to a data output line (104) of a column set in a column range selector (105).

Type: Grant

Filed: February 13, 2004

Date of Patent: January 20, 2009

Assignee: Nippon Telegraph and Telephone Corporation

Inventors: Toshishige Shimamura, Hiroki Morimura, Koji Fujii, Satoshi Shigematsu, Katsuyuki Machida
COMPUTING UNIT AND IMAGE FILTERING DEVICE

Publication number: 20090013152

Abstract: A processor capable of performing a filter processing in a high speed is provided. A computing unit comprises a computer for performing a filter processing. Data supply to the computer is performed by an internal register configured by a flip-flop. Data read from the internal register is outputted to a shift register and the data is supplied to the computer per cycle. And, the computing unit comprises a mechanism for changing a filter computing direction according to a motion vector, thereby preventing performance lowering due to branched command by performing a horizontal filtering and a vertical filtering by a same command.

Type: Application

Filed: July 7, 2008

Publication date: January 8, 2009

Inventors: Masakazu EHAMA, Koji Hosogi, Seiji Mochizuki
SIMD MICROPROCESSOR AND DATA TRANSFER METHOD FOR USE IN SIMD MICROPROCESSOR

Publication number: 20090013150

Abstract: A disclosed SIMD microprocessor includes plural processor elements each having n arithmetic circuits and n registers configured to temporarily store data pieces to be input to the arithmetic circuits, n being a natural number equal to or greater than 2, and; a control circuit configured to determine an arrangement order of the processor elements and an arrangement order of the arithmetic circuits in the processor elements and determine whether to use the n arithmetic circuits as a single arithmetic circuit or as n arithmetic circuits. Each processor element further includes n shifter pairs each including a PE shifter and a bit shifter; and n shift data selection circuits configured to select arbitrary data pieces from the data pieces in the shifter pairs, perform bit extension on the data pieces, and transfer the data pieces to the arithmetic circuits.

Type: Application

Filed: June 24, 2008

Publication date: January 8, 2009

Inventor: TOSHIKI YAMANAKA
SIMD TYPE MICROPROCESSOR

Publication number: 20090013151

Abstract: An SIMD type microprocessor is disclosed. The SIMD type microprocessor includes plural PEs (processor elements) each of which provides an ALU (arithmetic and logic unit) for lower-order bits, an ALU for upper-order bits, a control circuit for lower-order bits, a control circuit for upper-order bits, a range determining circuit for lower-order bits, and a range determining circuit for upper-order bits. The SIMD type microprocessor further includes a global processor, a range designation bus for lower-order bits which connects the global processor to the range determining circuit for lower-order bits, and a range designation bus for upper-order bits which connects the global processor to the range determining circuit for upper-order bits.

Type: Application

Filed: July 3, 2008

Publication date: January 8, 2009

Inventor: KAZUHIKO HARA
Interconnections in Simd Processor Architectures

Publication number: 20080320273

Abstract: A single instruction multiple data (SIMD) processor (1) comprises a processing element array (10) including a plurality of processing elements (PE0 . . . PEN), and a memory array (14) operably divided into memory portions (141 . . . 14N), each memory portion being assigned to a particular processing element. A first processing element (PEN) is operable to access a portion of the memory array (14) assigned to that first processing element and also to access a portion of the memory array assigned to a second processing element. Such access is made using an index value indicative of the processing element assigned to the memory position to be accessed.

Type: Application

Filed: September 8, 2005

Publication date: December 25, 2008

Applicant: KONINKLIJKE PHILIPS ELECTRONICS N.V.

Inventors: Anteneh A. Abbo, Leo Sevat, Richard P. Kleihorst
Distributed Memory Type Information Processing System

Publication number: 20080313423

Abstract: An information processing system includes a plurality of PMM and data transmission paths for connection between the PMM and transmitting a value of a PMM to another PMM. A memory of each PMM holds a list of values of first items arranged in the ascending order or descending order without overlap and/or a list of values of the second item to be shared. A memory module of each PMM transmits a value contained in the value list to another PMM, receives a value contained in the value list from the another PMM, references the value list of the first item and the value list of the second item of the another PMM, and generates a list of common values considering the values contained in the value lists of the first item and the second item of all the other PMM.

Type: Application

Filed: January 25, 2005

Publication date: December 18, 2008

Inventor: Shinji Furusho
Executing partial-width packed data instructions

Patent number: 7467286

Abstract: A method and apparatus are provided for executing packed data instructions. According to one aspect of the invention, a processor includes registers, a register renaming unit coupled to the registers, a decoder coupled to the register renaming unit, and a partial-width execution unit coupled to the decoder. The register renaming unit provides an architectural register file to store packed data operands that include data elements. The decoder is to decode a first and second set of instructions that each specify one or more registers in the architectural register file. Each of the instructions in the first set specify operations to be performed on all of the data elements. In contrast, each of the instructions in the second set specify operations to be performed on only a subset of the data elements. The partial-width execution unit is to execute operations specified by either the first or second set of instructions.

Type: Grant

Filed: May 9, 2005

Date of Patent: December 16, 2008

Assignee: Intel Corporation

Inventors: Mohammad Abdallah, James Coke, Vladimir Pentkovski, Patrice Roussel, Shreekant S. Thakkar
Vector register file with arbitrary vector addressing

Patent number: 7467288

Abstract: A system and method for processing operations that use data vectors each comprising a plurality of data elements, in accordance with the present invention, includes a vector data file comprising a plurality of storage elements for storing data elements of the data vectors. A pointer array is coupled by a bus to the vector data file. The pointer array includes a plurality of entries wherein each entry identifies at least one storage element in the vector data file. The at least one storage element stores at least one data element of the data vectors, wherein for at least one particular entry in the pointer array, the at least one storage element identified by the particular entry has an arbitrary starting address in the vector data file.

Type: Grant

Filed: November 15, 2003

Date of Patent: December 16, 2008

Assignee: International Business Machines Corporation

Inventors: Clair John Glossner, III, Erdem Hokenek, David Meltzer, Mayan Moudgill
SYSTEM FOR INTEGRITY PROTECTION FOR STANDARD 2N-BIT MULTIPLE SIZED MEMORY DEVICES

Publication number: 20080301403

Abstract: An apparatus including a first circuit and a second circuit. The first circuit may be configured to generate one or more command signals, a read data path control signal and one or more write data path control signals in response to an integrity protection control signal and one or more arbitration signals. The second circuit may be configured to write data to a memory and read data from the memory in response to the one or more command signals, the read data path control signal and the one or more write data path control signals. In a first mode, the data may be written and read without integrity protection. In a second mode the data may be written and read with integrity protection, and the integrity protection is written and read separately from the data.

Type: Application

Filed: May 29, 2007

Publication date: December 4, 2008

Inventors: Eskild T Arntzen, Jackson L. Ellis
Method and apparatus for modeling multiple concurrently dispatched instruction streams in super scalar CPU with a sequential language

Patent number: 7460989

Abstract: A method is provided, wherein a virtual internal master clock is used in connection with a RISC CPU. The RISC CPU comprises a number of concurrently operating function units, wherein each unit runs according to its own clocks, including multiple-stage totally unsynchronized clocks, in order to process a stream of instructions. The method includes the steps of generating a virtual model master clock having a clock cycle, and initializing each of the function units at the beginning of respectively corresponding processing cycles. The method further includes operating each function unit during a respectively corresponding processing cycle to carry out a task with respect to one of the instructions, in order to produce a result. Respective results are all evaluated in synchronization, by means of the master clock. This enables the instruction processing operation to be modeled using a sequential computer language, such as C or C++.

Type: Grant

Filed: October 14, 2004

Date of Patent: December 2, 2008

Assignee: International Business Machines Corporation

Inventor: Oliver Keren Ban
MULTIDIMENSIONAL PROCESSOR ARCHITECTURE

Publication number: 20080294871

Abstract: A processor architecture includes a number of processing elements for treating input signals. The architecture is organized according to a matrix including rows and columns, the columns of which each include at least one microprocessor block having a computational part and a set of associated processing elements that are able to receive the same input signals. The number of associated processing elements is selectively variable in the direction of the column so as to exploit the parallelism of said signals. Additionally, the processor architecture of the present invention enables dynamic switching between instruction parallelism and data parallel processing typical of vectorial functionality. The architecture can be scaled in various dimensions in an optimal configuration for the algorithm to be executed.

Type: Application

Filed: May 30, 2008

Publication date: November 27, 2008

Applicant: STMICROELECTRONICS S.R.L.

Inventors: Francesco Pappalardo, Giuseppe Notarangelo, Elio Guidetti
Vector processing system

Patent number: 7457941

Abstract: A vector processing system for executing vector instructions, each instruction defining multiple pairs of values, an operation to be executed on each of said value pairs and a scalar modifier, the vector processing system comprising a plurality of parallel processing units, each arranged to receive one of said pairs of values and to implement the defined operation on said value pair to generate a respective result; and a scalar result unit for receiving the results of the parallel processing units and for using said results in a manner defined by the scalar modifier to generate a single output value for said instruction.

Type: Grant

Filed: January 3, 2006

Date of Patent: November 25, 2008

Assignee: Broadcom Corporation

Inventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
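
The two-stage flow in the abstract above (parallel units each apply the operation to one value pair, then a scalar result unit folds the results into one output) can be sketched in Python. The function name `vector_instruction` and the choice of `sum` and `max` as example scalar modifiers are assumptions made for illustration; the abstract does not list specific modifiers.

```python
import operator


def vector_instruction(pairs, operation, scalar_modifier):
    """Apply `operation` to each value pair, then reduce with `scalar_modifier`."""
    # Parallel processing units: one result per value pair.
    results = [operation(a, b) for a, b in pairs]
    # Scalar result unit: combine the per-unit results into a single value.
    return results, scalar_modifier(results)


pairs = [(1, 4), (2, 5), (3, 6)]

# Multiply each pair and sum the products: a dot product in one instruction.
products, dot = vector_instruction(pairs, operator.mul, sum)

# Subtract each pair and keep the largest difference.
diffs, largest = vector_instruction(pairs, operator.sub, max)
```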
Processor for realizing software pipelining with a SIMD arithmetic unit simultaneously processing each SIMD instruction on a plurality of discrete elements

Patent number: 7454594

Abstract: A processor and its arithmetic instruction processing method and arithmetic operation control method are disclosed that add a new operand designation option to SIMD arithmetic instructions and permit software pipelining between arithmetic operations performed in parallel by a SIMD arithmetic unit. A selector for adding an operation for interchanging multiple outputs of a SIMD arithmetic unit is added to a data path. A register file is divided in accordance with the output bit fields of the SIMD arithmetic unit. A means of specifying multiple registers as a SIMD instruction's output operand is added. Therefore, part of the output results of arithmetic operations performed in parallel by the SIMD arithmetic unit can be stored in a register providing the input for another arithmetic operation. Software pipelining is rendered achievable in this manner.

Type: Grant

Filed: December 17, 2002

Date of Patent: November 18, 2008

Assignee: Renesas Technology Corp.

Inventor: Yuki Kondoh
Apparatus and method for asynchronously controlling data transfers across long wires

Patent number: 7453882

Abstract: One embodiment of the present invention provides a system that asynchronously controls the sending of data items from a sender to a receiver. The system includes a data path between the sender and the receiver, a first control path between the sender and the receiver, and a second control path between the sender and the receiver. The first control path and the second control path alternately control the asynchronous transmission of consecutive data items on the data path between the sender and the receiver.

Type: Grant

Filed: August 25, 2004

Date of Patent: November 18, 2008

Assignee: Sun Microsystems, Inc.

Inventors: Ronald Ho, Jonathan K. Gainsley, Robert J. Drost
Scalable parallel processing on shared memory computers

Patent number: 7454749

Abstract: A virtual parallel computer is created within a programming environment comprising both shared memory and distributed memory architectures. At run time, the virtual architecture is mapped to a physical hardware architecture. In this manner, a massively parallel computing program may be developed and tested on a first architecture and run on a second architecture without reprogramming.

Type: Grant

Filed: November 12, 2002

Date of Patent: November 18, 2008

Assignee: Engineered Intelligence Corporation

Inventor: Matthias Oberdorfer
Apparatus and method for two micro-operation flow using source override

Patent number: 7451294

Abstract: A method and apparatus for a two micro-operation flow using source override. In one embodiment, the method includes the identification of a macro-instruction having one or more streaming single instruction multiple data extension type operands. Once received, the macro-instruction is decoded into a first micro-operation (uOP) and a second uOP. Once decoded, a signal is asserted to disable source operand override logic if the first micro-operation updates a logical destination register that matches a logical source register of the micro-operation. Otherwise, the mutual source override is active and executed by a register alias table (RAT) when uOP with matching logic source and destination register are detected in a same clock cycle. In doing so, macro-instructions having 128-bit operands may be processed using, for example, two uOPs (one for the lower half and one for the upper half) in a 64-bit implementation, while preserving the atomicity of the original instruction.

Type: Grant

Filed: July 30, 2003

Date of Patent: November 11, 2008

Assignee: Intel Corporation

Inventors: Zeev Sperber, Yuval Bustan, Robert Valentine
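
The abstract above notes that a 128-bit operand can be handled as two micro-operations, one per 64-bit half. The Python sketch below illustrates only that splitting, using a packed 32-bit add as the example operation; it does not model the source override logic or the register alias table. All function names here are hypothetical.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def pack_lanes(lanes):
    """Pack four 32-bit lanes into one 128-bit integer, lane 0 lowest."""
    return sum((lane & MASK32) << (32 * i) for i, lane in enumerate(lanes))


def unpack_lanes(value):
    """Inverse of pack_lanes."""
    return [(value >> (32 * i)) & MASK32 for i in range(4)]


def uop_add_packed32(a64, b64):
    """One micro-op on a 64-bit datapath: two independent 32-bit adds."""
    lo = ((a64 & MASK32) + (b64 & MASK32)) & MASK32
    hi = ((a64 >> 32) + (b64 >> 32)) & MASK32
    return (hi << 32) | lo


def add_packed32_128(a, b):
    """128-bit packed add issued as two 64-bit micro-ops (low half, high half)."""
    low = uop_add_packed32(a & MASK64, b & MASK64)
    high = uop_add_packed32((a >> 64) & MASK64, (b >> 64) & MASK64)
    return (high << 64) | low


a = pack_lanes([1, 2, 3, 0xFFFFFFFF])
b = pack_lanes([10, 20, 30, 1])
total = unpack_lanes(add_packed32_128(a, b))
```

Because the lanes never carry into each other, the two halves can be computed independently and in either order.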
Multithreaded SIMD parallel processor with loading of groups of threads

Patent number: 7447873

Abstract: In a multithreaded processing core, groups of threads are executed using single instruction, multiple data (SIMD) parallelism by a set of parallel processing engines. Input data defining objects to be processed are received as a stream of input data blocks, and the input data blocks are loaded into a local register file in the core such that all of the data for one of the input objects is accessible to one of the processing engines. The input data can be loaded directly into the local register file, or the data can be accumulated in a buffer and loaded after accumulation, for instance during a launch operation for a SIMD group. Shared input data can also be loaded into a shared memory in the processing core.

Type: Grant

Filed: November 29, 2005

Date of Patent: November 4, 2008

Assignee: NVIDIA Corporation

Inventor: Bryon S. Nordquist
Apparatus, system, and method for determining the consistency of a database

Patent number: 7444496

Abstract: An apparatus, system, and method are disclosed for determining the consistency of a database including indirect reference to data elements. There is provided an apparatus for determining consistency of a database. This database includes, in association with each data element, an indirect list element including a storage address of the associated or corresponding data element so that other data elements can reference that data element. This apparatus reads, from each data element, identification information of an indirect list element corresponding to that data element and generates a hash value. This apparatus further reads, from each data element, identification information of an indirect list element corresponding to a data element referenced by that data element and generates a hash value. On condition that these hash values are equivalent to each other, the apparatus determines that the database is consistent.

Type: Grant

Filed: September 27, 2006

Date of Patent: October 28, 2008

Assignee: International Business Machines Corporation

Inventors: Tatsuyuki Shiomi, Shigeko Mori, Takashi Yonezawa
Conditional execution of instructions in a computer

Patent number: 7441098

Abstract: A method of executing instructions in a computer system on operands containing a plurality of packed objects in respective lanes of the operand is described. Each instruction defines an operation and contains a condition setting indicator settable independently of the operation. The status of the condition setting indicator determines whether or not multibit condition codes are set. When they are to be set, they are set depending on the results for carrying out the operation for each lane.

Type: Grant

Filed: May 6, 2005

Date of Patent: October 21, 2008

Assignee: Broadcom Corporation

Inventor: Sophie Wilson
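
A minimal Python sketch of the behavior in the abstract above: a packed operation runs on every lane, and per-lane condition codes are updated only when the instruction's condition setting indicator is set. The choice of a subtract operation, 16-bit lanes, and the two flags N (negative) and Z (zero) are assumptions of this sketch, as is the function name `packed_sub`.

```python
def packed_sub(a, b, set_cc, old_cc, lane_bits=16):
    """Lane-wise subtract; returns (results, condition codes per lane)."""
    mask = (1 << lane_bits) - 1
    sign = 1 << (lane_bits - 1)
    result = [(x - y) & mask for x, y in zip(a, b)]
    if not set_cc:
        # Indicator clear: the operation runs but the codes are left untouched.
        return result, old_cc
    # Indicator set: each lane gets its own multibit code from its own result.
    cc = [{"N": bool(r & sign), "Z": r == 0} for r in result]
    return result, cc


initial_cc = [{"N": False, "Z": False}] * 4
res, cc = packed_sub([5, 3, 0, 7], [5, 4, 0, 2], set_cc=True, old_cc=initial_cc)
res2, cc2 = packed_sub([1, 1, 1, 1], [1, 1, 1, 1], set_cc=False, old_cc=cc)
```

Here the second call produces all-zero results but leaves the codes from the first call in place, since its indicator is clear.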
Configurable SIMD processor instruction specifying index to LUT storing information for different operation and memory location for each processing unit

Patent number: 7441099

Abstract: Methods and apparatuses for processing a Configurable Single-Instruction-Multiple-Data (CSIMD) instruction are disclosed. In the method, a lookup table (LUT) storing information is provided to support random access of memory locations associated with a plurality of processing elements (PEs) and to perform instruction variances by the PEs. A CSIMD instruction is received, comprising a command and an index to the lookup table (LUT), to be executed by the PEs. The command of the received CSIMD instruction is executed in parallel differently by the PEs using the LUT index to randomly access the memory locations.

Type: Grant

Filed: October 3, 2006

Date of Patent: October 21, 2008

Assignee: Hong Kong Applied Science and Technology Research Institute Company Limited

Inventors: Wing Yee Lo, Simon Moy
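
One way to picture the CSIMD instruction in the abstract above is a lookup table whose rows give each processing element its own operation variant and its own local memory address, with the instruction carrying only a row index. The Python sketch below is a loose model under that assumption; the table layout, the contents of `LUT`, and the function name `execute_csimd` are all hypothetical.

```python
import operator

# Hypothetical LUT: each row holds one (operation, local address) pair per PE.
LUT = [
    [(operator.add, 0), (operator.add, 1), (operator.sub, 2), (operator.sub, 3)],
    [(operator.mul, 3), (operator.mul, 2), (operator.mul, 1), (operator.mul, 0)],
]


def execute_csimd(lut_index, operand, pe_memories, lut=LUT):
    """Run one instruction on all PEs; the LUT row varies op and address per PE."""
    row = lut[lut_index]
    return [op(mem[addr], operand) for (op, addr), mem in zip(row, pe_memories)]


# Four PEs, each with a small private memory.
pe_memories = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
    [40, 41, 42, 43],
]

results = execute_csimd(0, 5, pe_memories)
```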
Method and arrangement for the power-efficient control of processors

Publication number: 20080215851

Abstract: A method is provided for the functional control of program and/or data flows in digital signal processors and processors, which have respective closed and separated modules for program and data flow control, working in parallel with computers. The method enables a power-efficient adaptation of the signal processing with the applied SIMD command-type in the individual paths and minimizes the emergence of the appearance of NOP-commands with which the VLIW-architecture of the processor must be supplied. The adaptation of the signal processing is achieved by individually controlling the parallel signal processing of the processor in the data paths (DP) which respectively belong to a first and second slice. For this purpose, a single slice halt outputted from an SSM register bank switches the register clockline according to state-dependent signal processing.

Type: Application

Filed: May 5, 2008

Publication date: September 4, 2008

Inventors: Uwe Porst, Wolfram Drescher
Microprocessor Architectures

Publication number: 20080209164

Abstract: A microprocessor architecture comprises a plurality of processing elements arranged in a single instruction multiple data SIMD array, wherein each processing element includes a plurality of execution units, each of which is operable to process an instruction of a particular instruction type, a serial processor which includes a plurality of execution units, each of which is operable to process an instruction of a particular instruction type, and an instruction controller operable to receive a plurality of instructions, and to distribute received instructions to the execution units in dependence upon the instruction types of the received instruction. The execution units of the serial processor are operable to process respective instructions in parallel.

Type: Application

Filed: February 7, 2006

Publication date: August 28, 2008

Inventor: Leon David Wildman
SIMD MICROPROCESSOR, IMAGE PROCESSING APPARATUS INCLUDING SAME, AND IMAGE PROCESSING METHOD USED THEREIN

Publication number: 20080209165

Abstract: A SIMD microprocessor, which can be included in an image processing apparatus using an image processing method used therein, includes a global processor and multiple processor elements controlled by the global processor. Each single processor element of the multiple processor elements includes multiple operation units. The global processor is configured to control the multiple processing elements to uniformly change a configuration of the multiple operation units in the single processor element to determine a number of data units of operation according to the multiple operation units either operated individually or in cooperation with each other in the single processor element and a width of data processed per data unit of operation performed in the single processor element. A processor element number is assigned per data unit of operation to the single processor element to use for executing an operation.

Type: Application

Filed: February 27, 2008

Publication date: August 28, 2008

Inventor: Tomoaki Ozaki
Parallel operation processor utilizing SIMD data transfers

Patent number: 7412587

Abstract: A processor having a plurality of processing elements and a decoder operable to decode an instruction. Each of the plurality of processing elements includes: a transfer pattern storage unit operable to store a transfer pattern value that indicates a processing element from which data is transferred; a transfer unit operable to perform a data transfer from the processing element indicated by the transfer pattern value; and an update unit operable to update the transfer pattern value stored in the transfer pattern storage unit, in accordance with a result of decoding a latest instruction by the decoder.

Type: Grant

Filed: February 9, 2005

Date of Patent: August 12, 2008

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Takeshi Tanaka, Hideshi Nishida, Masashi Hoshino, Takeshi Furuta
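
The abstract above (patent 7412587) has each processing element store a transfer pattern value naming the element it receives data from, with an update unit changing that value as instructions are decoded. The sketch below simulates this under assumptions of my own: the update rule (adding a fixed step modulo the PE count) and the names `transfer` and `update_patterns` are hypothetical, since the abstract does not specify them.

```python
def transfer(data, patterns):
    """Each PE i receives the value held by PE patterns[i]."""
    return [data[src] for src in patterns]

def update_patterns(patterns, step):
    """Assumed update rule: advance every source index by `step`, wrapping."""
    n = len(patterns)
    return [(p + step) % n for p in patterns]

data = [10, 20, 30, 40]
patterns = [1, 2, 3, 0]                   # each PE reads from its right neighbor
first = transfer(data, patterns)          # one-position rotation
patterns = update_patterns(patterns, 1)   # stand-in for the decode-driven update
second = transfer(data, patterns)         # two-position rotation
```

Because the pattern lives in each PE and is updated in place, successive instructions can change the routing without re-sending a full permutation.
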
Parallel Processor efficiently executing variable instruction word

Patent number: 7401204

Abstract: A parallel processor performs efficient parallel processing of one or more basic instructions contained in each of a plurality of instruction words delimited by instruction delimiting information. The processor includes: a plurality of instruction execution units performing processes in accordance with corresponding, supplied basic instructions in parallel; an instruction fetch unit fetching the instruction words one by one in accordance with the instruction delimiting information; and an instruction issue unit recognizing and, in accordance therewith, selecting each of the basic instructions contained in each of the instruction words fetched by the instruction fetch unit to a corresponding instruction execution unit to execute the basic instruction.

Type: Grant

Filed: September 1, 2000

Date of Patent: July 15, 2008

Assignee: Fujitsu Limited

Inventors: Hideo Miyake, Atsuhiro Suga, Yasuki Nakamura, Yoshimasa Takebe
Heterogeneous multiprocessing

Publication number: 20080162873

Abstract: In some embodiments, the invention involves a system and method to provide maximal boot-time parallelism for future multi-core, multi-node, and many-core systems. In an embodiment, the security (SEC), pre-EFI initialization (PEI), and then driver execution environment (DXE) phases are executed in parallel on multiple compute nodes (sockets) of a platform. Once the SEC/PEI/DXE phases are executed on all compute nodes having a processor, the boot device select (BDS) phase completes the boot by merging or partitioning the compute nodes based on a platform policy. Partitioned compute nodes each run their own instance of EFI. A common memory map may be generated prior to operating system (OS) launch when compute nodes are to be merged. Other embodiments are described and claimed.

Type: Application

Filed: December 28, 2006

Publication date: July 3, 2008

Inventors: Vincent J. Zimmer, Yufu Li, Michael A. Rothman, Burges M. Karkaria
PARALLEL DATA PROCESSING APPARATUS

Publication number: 20080162874

Abstract: A data transfer controller for controlling transfer of data items in a data processing system comprising a single instruction multiple data (SIMD) array of processing elements is disclosed. The controller comprises a transfer controller operable to control transfer of data to and/or from an internal memory unit of a processing element in said array, each processing element including a processing unit and an internal memory unit, the transfer controller being operable such that data transfer to and/or from the internal memory unit is performed independently of the operation of the processing unit of the processing element concerned. Operation by said processing unit on a predetermined type of instruction may be blocked until after said data transfer is complete or, if said data transfer started after said operation commenced, said data transfer may be blocked until after said operation is complete.

Type: Application

Filed: June 19, 2007

Publication date: July 3, 2008

Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
Parallel Data Processing Apparatus

Publication number: 20080162875

Abstract: A method of controlling access to memory by a processing element in a plurality of processing elements arranged in a single instruction multiple data (SIMD) processing array is disclosed. Each processing element includes an internal memory unit, and a register file. The method comprises retrieving an address value from the register file of the processing element, the address value relating to an address in the internal memory of the processing element, and accessing the internal memory on the basis of the address value.

Type: Application

Filed: July 6, 2007

Publication date: July 3, 2008

Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
Framework for efficient code generation using loop peeling for SIMD loop code with multiple misaligned statements

Patent number: 7395531

Abstract: A system and method is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In this framework, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirements of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residual iteration counts, and multiple statements with arbitrary alignment combinations. Loop peeling is used to reduce the computational overhead associated with misaligned data. A loop prologue and epilogue are peeled from individual iterations in the simdized loop, and vector-splicing instructions are applied to the peeled iterations, while the steady-state loop body incurs no additional computational overhead.

Type: Grant

Filed: August 16, 2004

Date of Patent: July 1, 2008

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
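
The loop-peeling abstract above (patent 7395531) splits a simdized loop into a prologue, an aligned steady-state body, and an epilogue. The sketch below shows only that general loop structure and substitutes a simpler technique: the peeled iterations here are plain scalar iterations, whereas the patent applies vector-splicing instructions to them. The vector width `V` and the function name `add_one_peeled` are assumptions.

```python
V = 4  # assumed vector width, in elements

def add_one_peeled(a, start, end):
    """Add 1 to a[start:end], using V-aligned chunks in the steady state."""
    out = list(a)
    # Prologue: scalar iterations up to the first V-aligned index.
    first_aligned = min(end, -(-start // V) * V)
    for i in range(start, first_aligned):
        out[i] += 1
    # Steady state: whole aligned chunks, standing in for aligned vector ops.
    last_aligned = max(first_aligned, (end // V) * V)
    for base in range(first_aligned, last_aligned, V):
        out[base:base + V] = [x + 1 for x in out[base:base + V]]
    # Epilogue: residual scalar iterations after the last full chunk.
    for i in range(last_aligned, end):
        out[i] += 1
    return out

a = list(range(16))
result = add_one_peeled(a, 3, 14)
```

The steady-state loop touches only aligned chunks, which is the property that lets hardware with aligned-only loads and stores run it without extra reorganization work.
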
System and method for applying an action initiated for a portion of a plurality of devices to all of the plurality of devices

Patent number: 7392329

Abstract: In accordance with one embodiment of the present invention, a method of applying an action initiated for a portion of a plurality of devices to all of the plurality of devices is provided. The method comprises establishing a status block for a plurality of devices that are implemented on a system, and initiating an action for a portion of the plurality of devices. The method further comprises writing information to the status block identifying that the action was initiated, and based at least in part on the information written to the status block, applying the action to all of the plurality of devices.

Type: Grant

Filed: March 28, 2003

Date of Patent: June 24, 2008

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Scott Lynn Michaelis, Marvin J. Spinhirne
Cross multiply and add instruction and multiply and subtract instruction SIMD execution on real and imaginary components of a plurality of complex data elements

Patent number: 7392368

Abstract: Methods and apparatus for calculating Single-Instruction-Multiple-Data (SIMD) complex arithmetic. A coprocessor instruction has a format identifying a multiply and subtract instruction to generate real components for complex multiplication of first operand complex data and corresponding second operand complex data, a cross multiply and add instruction to generate imaginary components for complex multiplication of the first operand complex data and the corresponding second operand complex data, an add-subtract instruction to add real components of the first operand to imaginary components of the second operand and to subtract real components of the second operand from imaginary components of the first operand, and a subtract-add instruction to subtract the real components of the second operand from the imaginary components of the first operand and to add the real components of the first operand to the imaginary components of the second operand.

Type: Grant

Filed: June 30, 2005

Date of Patent: June 24, 2008

Assignee: Marvell International Ltd.

Inventors: Moinul H. Khan, Nigel C. Paver, Bradley C. Aldrich
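
The abstract above (patent 7392368) builds complex multiplication from two instructions: a multiply and subtract for the real components and a cross multiply and add for the imaginary components. The sketch below works through that arithmetic on packed data. The interleaved `[re0, im0, re1, im1, ...]` layout and the function names are assumptions for illustration, not the coprocessor's actual register format.

```python
def multiply_subtract(a, b):
    """Real parts of a*b: a.re*b.re - a.im*b.im for each packed complex pair."""
    return [a[i] * b[i] - a[i + 1] * b[i + 1] for i in range(0, len(a), 2)]

def cross_multiply_add(a, b):
    """Imaginary parts of a*b: a.re*b.im + a.im*b.re for each packed pair."""
    return [a[i] * b[i + 1] + a[i + 1] * b[i] for i in range(0, len(a), 2)]

# Two complex elements per operand: (1+2j, 3+4j) and (5+6j, 7+8j).
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
real = multiply_subtract(a, b)
imag = cross_multiply_add(a, b)
```

Here `(1+2j)*(5+6j)` is `-7+16j` and `(3+4j)*(7+8j)` is `-11+52j`, so the two results hold the real and imaginary halves of both products.
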
Carry/Borrow Handling

Publication number: 20080148011

Abstract: The present disclosure provides a system and method for performing carry/borrow handling. A method according to one embodiment may include generating a first result having a first carry or borrow from a first mathematical operation and storing the first carry or borrow and a first pointer address in a temporary register. The method may further include generating a second result having a second carry or borrow from a second mathematical operation and calling a subroutine configured to perform carry and borrow handling. The method may also include copying the first pointer address from the temporary register into a global variable. Of course, many alternatives, variations and modifications are possible without departing from this embodiment.

Type: Application

Filed: December 14, 2006

Publication date: June 19, 2008

Applicant: INTEL CORPORATION

Inventors: Vinodh Gopal, Gilbert M. Wolrich, Gunnar Gaubatz, Daniel Cutter, Wajdi Feghali, Kaan Yuksel, Erdinc Ozturk
Apparatus and method for performing rearrangement and arithmetic operations on data

Publication number: 20080140750

Abstract: An apparatus and method are provided for performing rearrangement operations and arithmetic operations on data. The data processing apparatus has processing circuitry for performing SIMD processing operations and scalar processing operations, a register bank for storing data and control circuitry responsive to program instructions to control the processing circuitry to perform data processing operations. The control circuitry is arranged to be responsive to a combined rearrangement arithmetic instruction to control the processing circuitry to perform a rearrangement operation and at least one SIMD arithmetic operation on a plurality of data elements stored in the register bank. The rearrangement operation is configurable by a size parameter derived at least in part from the register bank. The size parameter provides an indication of a number of data elements forming a rearrangement element for the purposes of the rearrangement operation.

Type: Application

Filed: November 29, 2007

Publication date: June 12, 2008

Inventors: Daniel Kershaw, Mladen Wilder, Dominic Hugo Symes
Cellular engine for a data processing system

Patent number: 7383421

Abstract: A data processing system includes an associative memory device containing n-cells, each of the n-cells includes a processing circuit. A controller is utilized for issuing one of a plurality of instructions to the associative memory device, while a clock device is utilized for outputting a synchronizing clock signal comprised of a predetermined number of clock cycles per second. The clock device outputs the synchronizing clock signal to the associative memory device and the controller which globally communicates one of the plurality of instructions to all of the n-cells simultaneously, within one of the clock cycles.

Type: Grant

Filed: December 4, 2003

Date of Patent: June 3, 2008

Assignee: Brightscale, Inc.

Inventors: Gheorghe Stefan, Dan Tomescu
Vector co-processor for configurable and extensible processor architecture

Patent number: 7376812

Abstract: A processor can achieve high code density while allowing higher performance than existing architectures, particularly for Digital Signal Processing (DSP) applications. In accordance with one aspect, the processor supports three possible instruction sizes while maintaining the simplicity of programming and allowing efficient physical implementation. Most of the application code can be encoded using two sets of narrow size instructions to achieve high code density. Adding a third (and larger, i.e. VLIW) instruction size allows the architecture to encode multiple operations per instruction for the performance critical section of the code. Further, each operation of the VLIW format instruction can optionally be a SIMD operation that operates upon vector data. A scheme for the optimal utilization (highest achievable performance for the given amount of hardware) of multiply-accumulate (MAC) hardware is also provided.

Type: Grant

Filed: May 13, 2002

Date of Patent: May 20, 2008

Assignee: Tensilica, Inc.

Inventors: Himanshu A. Sanghavi, Earl A. Killian, James Robert Kennedy, Darin S. Petkov, Peng Tu, William A. Huffman
Processing for associated data size saturation flag history stored in SIMD coprocessor register using mask and test values

Patent number: 7373488

Abstract: A method and apparatus for calculation and storage of Single-Instruction-Multiple-Data (SIMD) saturation history information. A first coprocessor instruction has a first format identifying a saturating operation, a first source having packed data elements and a second source having packed data elements. The saturating operation is executed on the packed data elements of the first and second sources. Saturation flags are stored in the Wireless Coprocessor Saturation Status Flag (wCSSF) register to indicate if a result of the saturating operation saturated. A second coprocessor instruction has a second format identifying a saturation history processing operation and a saturation data size. An operand for the processing operation is determined based on the saturation data size, and the processing operation is executed on the saturation flags and the operand for the saturation data size. Condition code flags are stored in a status register to indicate the result of processing operation.

Type: Grant

Filed: April 30, 2007

Date of Patent: May 13, 2008

Assignee: Marvell International Ltd.

Inventors: Nigel C. Paver, Bradley C. Aldrich
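
The abstract above (patent 7373488) records per-element saturation flags from a saturating operation and later processes that history using mask and test values. The sketch below is a rough illustration under several assumptions: 8-bit signed lanes, one flag bit per lane, and a simple masked comparison as the processing operation. The names `saturating_add` and `flags_match` are hypothetical and do not reflect the wCSSF register's real layout.

```python
INT8_MIN, INT8_MAX = -128, 127  # assumed 8-bit signed lanes

def saturating_add(xs, ys):
    """Lane-wise saturating add; returns results and a per-lane flag bitmask."""
    results, flags = [], 0
    for lane, (x, y) in enumerate(zip(xs, ys)):
        total = x + y
        clamped = max(INT8_MIN, min(INT8_MAX, total))
        if clamped != total:
            flags |= 1 << lane  # remember that this lane saturated
        results.append(clamped)
    return results, flags

def flags_match(flags, mask, test_value):
    """Assumed history test: compare the masked flags against a test value."""
    return (flags & mask) == test_value

results, flags = saturating_add([100, -100, 5, 127], [100, -100, 5, 1])
low_pair_both_saturated = flags_match(flags, 0b0011, 0b0011)
lane2_saturated = flags_match(flags, 0b0100, 0b0100)
```

Keeping the flags separate from the results lets later code ask whether any chosen subset of lanes overflowed without re-running the arithmetic.
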
Framework for integrated intra- and inter-loop aggregation of contiguous memory accesses for SIMD vectorization

Patent number: 7367026

Abstract: A method, computer program product, and information handling system for generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop contains multiple non-stride-one memory accesses that operate over a contiguous stream of memory is disclosed. A preferred embodiment identifies groups of isomorphic statements within a loop body where the isomorphic statements operate over a contiguous stream of memory over the iteration of the loop. Those identified statements are then converted into virtual-length vector operations. Next, the hardware's available vector length is used to determine a number of virtual-length vectors to aggregate into a single vector operation for each iteration of the loop. Finally, the aggregated, vectorized loop code is converted into SIMD operations.

Type: Grant

Filed: August 16, 2004

Date of Patent: April 29, 2008

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
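
The abstract above (patent 7367026) groups isomorphic statements whose non-stride-one accesses together cover a contiguous stream of memory, then aggregates them into vector operations sized to the hardware. The sketch below shows the before and after shape of such a loop in a toy setting. The two-statement loop body, the vector length `V`, and both function names are assumptions, and list slices stand in for vector loads and stores.

```python
def scalar_loop(b, c, n):
    """Two isomorphic stride-2 statements that jointly cover a contiguous range."""
    a = [0] * (2 * n)
    for i in range(n):
        a[2 * i] = b[2 * i] + c[2 * i]
        a[2 * i + 1] = b[2 * i + 1] + c[2 * i + 1]
    return a

def aggregated_loop(b, c, n, V=4):
    """Same computation as V-wide contiguous chunks (stand-in for SIMD ops)."""
    a = [0] * (2 * n)
    for base in range(0, 2 * n, V):
        a[base:base + V] = [
            x + y for x, y in zip(b[base:base + V], c[base:base + V])
        ]
    return a

b = list(range(10))
c = [10 * x for x in range(10)]
same = scalar_loop(b, c, 5) == aggregated_loop(b, c, 5)
```

With `V = 4`, each aggregated iteration covers two iterations of the original loop, and the final short slice plays the role of the residual handling.
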
Memory access consolidation for SIMD processing elements having access indicators

Patent number: 7363472

Abstract: A data processing apparatus includes a SIMD (Single Instruction Multiple Data) array (10) of processing elements. The processing elements are operably divided into a plurality of processing blocks, the processing blocks being operable to process respective groups of data items.

Type: Grant

Filed: October 9, 2001

Date of Patent: April 22, 2008

Assignee: Clearspeed Technology Limited

Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
