Single Instruction, Multiple Data (simd) Patents (Class 712/22)
-
Publication number: 20090125702Abstract: A single instruction, multiple data (SIMD) processor including a plurality of addressing register sets, used to flexibly calculate effective operand source and destination memory addresses is disclosed. Two or more address generators calculate effective addresses using the register sets. Each register set includes a pointer register, and a scale register. An address generator forms effective addresses from a selected register set's pointer register and scale register; and an offset. For example, the effective memory address may be formed by multiplying the scale value by an offset value and summing the pointer and the scale value multiplied by the offset value.Type: ApplicationFiled: August 29, 2008Publication date: May 14, 2009Applicant: ATI Technologies Inc.Inventors: Richard J. Selvaggi, Larry A. Pearlstein
-
Patent number: 7526630Abstract: A controller operable to control an array of processing elements comprises a retrieval unit operable to retrieve instruction items for each of a plurality of instructions streams, each instruction stream having a plurality of instructions items, a combining unit operable to combine the plurality of instruction streams into a serial instruction stream, and a distribution unit operable to distribute the serial instruction stream to an array of processing elements.Type: GrantFiled: January 4, 2007Date of Patent: April 28, 2009Assignee: Clearspeed Technology, PLCInventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russel David, Ray McConnell, Tim Day, Trey Greer
-
Publication number: 20090106528Abstract: To reduce the required amount of program codes when processing the whole image in a one-dimensional SIMD parallel image processing system having a smaller number of PEs than the number of pixels in the width direction of the image to be processed. A controller for controlling a PE array includes a command repetitive-execution part, which includes an operand converting part, a memory address converting part, and an operation code converting part. When a command fetching/decoding part reads and executes program codes stored in a program memory, the repetitive-execution part determines the program codes to cause the operand converting part, memory address converting part and operation code converting part to perform conversions in accordance with the command, thereby performing a repetitive execution of the one-command program description adaptive to a plurality of related pixels assigned to the PEs, whereby the program code amount can be reduced.Type: ApplicationFiled: December 5, 2006Publication date: April 23, 2009Inventor: Takuya Koga
-
Patent number: 7516299Abstract: A method for transferring data from a general purpose register (GPR) to a vector register (VR), the method including vectorially combining data in the VR from the GPR, by executing instructions of a PowerPC Instruction Set Architecture (ISA), the step of combining including splatting a low nibble from the GPR into a low nibble in each element of a first VR by executing two “load vector for shift left” (lvsl) or “load vector for shift right” (lvsr) and one “vector subtract unsigned byte modulo” (vsububm), shifting a high nibble of the GPR into a low nibble the GPR, splatting the low nibble of the GPR into a low nibble in each element of a second VR by re-executing the two lvsl or lvsr and one vsububm instructions, shifting the low nibble of the second VR into a high nibble of the second VR and combining both first and second VRs into one VR.Type: GrantFiled: August 29, 2005Date of Patent: April 7, 2009Assignee: International Business Machines CorporationInventors: Daniel Citron, Ayal Zaks
-
Patent number: 7509602Abstract: A logic simulation acceleration processor optimized for multi-value logic level simulation of electronic systems described in hardware description languages.Type: GrantFiled: January 25, 2006Date of Patent: March 24, 2009Assignee: Eve S.A.Inventors: Subbu Ganesan, Leonid Alexander Broukhis, Ramesh Narayanaswamy, Ian Michael Nixon, Thomas Hanni Spencer
-
Patent number: 7506136Abstract: A controller for controlling a data processor having a plurality of processor arrays, each of which includes a plurality of processing elements, comprises a retrieval unit operable to retrieve a plurality of incoming instructions streams in parallel with one another, and a distribution unit operable to supply such incoming instruction streams to respective ones of the said plurality of processor arrays.Type: GrantFiled: January 10, 2007Date of Patent: March 17, 2009Assignee: Clearspeed Technology PLCInventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
-
Patent number: 7506135Abstract: The present invention provides histogram calculation for images and video applications using a SIMD and VLIW processor with vector Look-Up Table (LUT) operations. This provides a speed up of histogram calculation by a factor of N times over a scalar processor where the SIMD processor could perform N LUT operations per instruction. Histogram operation is partitioned into a vector LUT operation, followed by vector increment, vector LUT update, and at the end by reduction of vector histogram components. The present invention could be used for intensity, RGBA, YUV, and other type of multi-component images.Type: GrantFiled: May 20, 2003Date of Patent: March 17, 2009Inventor: Tibet Mimar
-
Patent number: 7496673Abstract: A computer architecture and programming model for high speed processing over broadband networks are provided. The architecture employs a consistent modular structure, a common computing module and uniform software cells. The common computing module includes a control processor, a plurality of processing units, a plurality of local memories from which the processing units process programs, a direct memory access controller and a shared main memory. A synchronized system and method for the coordinated reading and writing of data to and from the shared main memory by the processing units also are provided. A hardware sandbox structure is provided for security against the corruption of data among the programs being processed by the processing units. The uniform software cells contain both data and applications and are structured for processing by any of the processors of the network. Each software cell is uniquely identified on the network.Type: GrantFiled: February 24, 2005Date of Patent: February 24, 2009Assignee: International Business Machines CorporationInventors: Michael Karl Gschwind, Harm Peter Hofstee, Martin E. Hopkins, James Allan Kahle
-
Patent number: 7490190Abstract: A single instruction, multiple data (“SIMD”) computer system includes a central control unit coupled to 256 processing elements (“PEs”) and to 32 static random access memory (“SRAM”) devices. Each group of eight PEs can access respective groups of eight columns in a respective SRAM device. Each PE includes a local column address register that can be loaded through a data bus of the respective PE. A local column address stored in the local column address register is applied to an AND gate, which selects either the local column address or a column address applied to the AND gate by the central control unit. As a result, the central control unit can globally access the SRAM device, or a specific one of the eight columns that can be accessed by each PE can be selected locally by the PE.Type: GrantFiled: October 5, 2006Date of Patent: February 10, 2009Assignee: Micron Technology, Inc.Inventor: Jon Skull
-
Patent number: 7483595Abstract: An image processing method and device for processing multiple rows of pixels of an image simultaneously with a single instruction. The processing includes selecting a pixel window having a plurality of pixels of an image spanning across multiple rows and columns, building vertical and horizontal load registers to include the plurality of pixels of the selected pixel window, and simultaneously processing selected pixels of the plurality of pixels included in the vertical and horizontal load registers using a single instruction, wherein the vertical and horizontal load registers are shifted when the selected pixels are processed. Accordingly, a method and device for efficient processing of an image is provided.Type: GrantFiled: September 16, 2004Date of Patent: January 27, 2009Assignee: Marvell International Technology Ltd.Inventors: Douglas Gene Keithley, Roy Gideon Moss
-
Patent number: 7484076Abstract: Methods, apparatuses, and systems are presented for performing instructions using multiple execution units in a graphics processing unit involving issuing an instruction for P executions of the instruction wherein each execution uses different data, P being a positive integer, the instruction being issued based on a first clock having a first clock rate, operating Q execution units to achieve the P executions of the instruction, Q being a positive integer less than P and greater than one, each of the execution units being operated based on a second clock having a second clock rate higher than the first clock rate of the first clock, and wherein the second clock rate of the second clock is equal to the first clock rate of the first clock multiplied by the ratio P/Q.Type: GrantFiled: September 18, 2006Date of Patent: January 27, 2009Assignee: Nvidia CorporationInventors: Stuart F. Oberman, Ming Y. Siu, Sameer D. Halepete
-
Publication number: 20090024832Abstract: The invention is based on the task to undertake machine descriptions, with which an automated optimal hardware design of SIMD processors can be carried out. This is solved by the fact that functional units are selected from a criterion in the machine description, which is vector processible. A first or second reduced functional unit are selectively defined from a respective vector-processing functional unit, in which the reduced functional units process only a data element of a vectoral value. All reduced functional units, which use common control signals for the processing of a respective data element belonging to the vectoral value, are condensed to a disk. Reduced functional units, which process the same data elements in a sequence at least indirectly, are condensed to a disk module. The disk is reproduced with the contained reduced functional units so often that all reduced functional units represent the functionality of their respective selected vector-processing functional unit.Type: ApplicationFiled: November 23, 2004Publication date: January 22, 2009Inventor: Gordon Cichon
-
Patent number: 7480785Abstract: A row decoding circuit (171) outputs a select signal to a row set in a row range setting unit (172) to select a select signal line (103), processing results from processing circuits (102) on this row are output to a data output line (104), and a row adder (106) adds processing results output to a data output line (104) of a column set in a column range selector (105).Type: GrantFiled: February 13, 2004Date of Patent: January 20, 2009Assignee: Nippon Telegraph and Telephone CorporationInventors: Toshishige Shimamura, Hiroki Morimura, Koji Fujii, Satoshi Shigematsu, Katsuyuki Machida
-
Publication number: 20090013152Abstract: A processor capable of performing a filter processing in a high speed is provided. A computing unit comprises a computer for performing a filter processing. Data supply to the computer is performed by an internal register configured by a flip-flop. Data read from the internal register is outputted to a shift register and the data is supplied to the computer per cycle. And, the computing unit comprises a mechanism for changing a filter computing direction according to a motion vector, thereby preventing performance lowering due to branched command by performing a horizontal filtering and a vertical filtering by a same command.Type: ApplicationFiled: July 7, 2008Publication date: January 8, 2009Inventors: Masakazu EHAMA, Koji Hosogi, Seiji Mochizuki
-
Publication number: 20090013150Abstract: A disclosed SIMD microprocessor includes plural processor elements each having n arithmetic circuits and n registers configured to temporarily store data pieces to be input to the arithmetic circuits, n being a natural number equal to or greater than 2, and; a control circuit configured to determine an arrangement order of the processor elements and an arrangement order of the arithmetic circuits in the processor elements and determine whether to use the n arithmetic circuits as a single arithmetic circuit or as n arithmetic circuits. Each processor element further includes n shifter pairs each including a PE shifter and a bit shifter; and n shift data selection circuits configured to select arbitrary data pieces from the data pieces in the shifter pairs, perform bit extension on the data pieces, and transfer the data pieces to the arithmetic circuits.Type: ApplicationFiled: June 24, 2008Publication date: January 8, 2009Inventor: TOSHIKI YAMANAKA
-
Publication number: 20090013151Abstract: An SIMD type microprocessor is disclosed. The SIMD type microprocessor includes plural PEs (processor elements) each of which provides an ALU (arithmetic and logic unit) for lower-order bits, an ALU for upper-order bits, a control circuit for lower-order bits, a control circuit for upper-order bits, a range determining circuit for lower-order bits, and a range determining circuit for upper-order bits. The SIMD type microprocessor further includes a global processor, a range designation bus for lower-order bits which connects the global processor to the range determining circuit for lower-order bits, and a range designation bus for upper-order bits which connects the global processor to the range determining circuit for upper-order bits.Type: ApplicationFiled: July 3, 2008Publication date: January 8, 2009Inventor: KAZUHIKO HARA
-
Publication number: 20080320273Abstract: A single instruction multiple data (SIMD) processor (1) comprises a processing element array (10) including a plurality of processing elements (PEO . . . PEN), and a memory array (14) operably divided into memory portions (141 . . . 14N), each memory portion being assigned to a particular processing element. A first processing element (PEN) is operable to access a portion of the memory array (14) assigned to that first processing element and also to access a portion of the memory array assigned to a second processing element. Such access is made using an index value indicative of the processing element assigned to the memory position to be accessed.Type: ApplicationFiled: September 8, 2005Publication date: December 25, 2008Applicant: KONINKLIJKE PHILIPS ELECTRONICS N.V.Inventors: Anteneh A. Abbo, Leo Sevat, Richard P. Kleihorst
-
Publication number: 20080313423Abstract: An information processing system includes a plurality of PMM and data transmission paths for connection between the PMM and transmitting a value of a PMM to another PMM. A memory of each PMM holds a list of values of first items arranged in the ascending order or descending order without overlap and/or a list of values of the second item to be shared. A memory module of each PMM transmits a value contained in the value list to another PMM, receives a value contained in the value list from the another PMM, references the value list of the first item and the value list of the second item of the another PMM, and generates a list of common values considering the values contained in the value lists of the first item and the second item of all the other PMM.Type: ApplicationFiled: January 25, 2005Publication date: December 18, 2008Inventor: Shinji Furusho
-
Patent number: 7467286Abstract: A method and apparatus are provided for executing packed data instructions. According to one aspect of the invention, a processor includes registers, a register renaming unit coupled to the registers, a decoder coupled to the register renaming unit, and a partial-width execution unit coupled to the decoder. The register renaming unit provides an architectural register file to store packed data operands that include data elements. The decoder is to decode a first and second set of instructions that each specify one or more registers in the architectural register file. Each of the instructions in the first set specify operations to be performed on all of the data elements. In contrast, each of the instructions in the second set specify operations to be performed on only a subset of the data elements. The partial-width execution unit is to execute operations specified by either the first or second set of instructions.Type: GrantFiled: May 9, 2005Date of Patent: December 16, 2008Assignee: Intel CorporationInventors: Mohammad Abdallah, James Coke, Vladimir Pentkovski, Patrice Roussel, Shreekant S. Thakkar
-
Patent number: 7467288Abstract: A system and method for processing operations that use data vectors each comprising a plurality of data elements, in accordance with the present invention, includes a vector data file comprising a plurality of storage elements for storing data elements of the data vectors. A pointer array is coupled by a bus to the vector data file. The pointer array includes a plurality of entries wherein each entry identifies at least one storage element in the vector data file. The at least one storage element stores at least one data element of the data vectors, wherein for at least one particular entry in the pointer array, the at least one storage element identified by the particular entry has an arbitrary starting address in the vector data file.Type: GrantFiled: November 15, 2003Date of Patent: December 16, 2008Assignee: International Business Machines CorporationInventors: Clair John Glossner, III, Erdem Hokenek, David Meltzer, Mayan Moudgill
-
Publication number: 20080301403Abstract: An apparatus including a first circuit and a second circuit. The first circuit may be configured to generate one or more command signals, a read data path control signal and one or more write data path control signals in response to an integrity protection control signal and one or more arbitration signals. The second circuit may be configured to write data to a memory and read data from the memory in response to the one or more command signals, the read data path control signal and the one or more write data path control signals. In a first mode, the data may be written and read without integrity protection. In a second mode the data may be written and read with integrity protection, and the integrity protection is written and read separately from the data.Type: ApplicationFiled: May 29, 2007Publication date: December 4, 2008Inventors: Eskild T Arntzen, Jackson L. Ellis
-
Patent number: 7460989Abstract: A method is provided, wherein a virtual internal master clock is used in connection with a RISC CPU. The RISC CPU comprises a number of concurrently operating function units, wherein each unit runs according to its own clocks, including multiple-stage totally unsynchronized clocks, in order to process a stream of instructions. The method includes the steps of generating a virtual model master clock having a clock cycle, and initializing each of the function units at the beginning of respectively corresponding processing cycles. The method further includes operating each function unit during a respectively corresponding processing cycle to carry out a task with respect to one of the instructions, in order to produce a result. Respective results are all evaluated in synchronization, by means of the master clock. This enables the instruction processing operation to be modeled using a sequential computer language, such as C or C++.Type: GrantFiled: October 14, 2004Date of Patent: December 2, 2008Assignee: International Business Machines CorporationInventor: Oliver Keren Ban
-
Publication number: 20080294871Abstract: A processor architecture includes a number of processing elements for treating input signals. The architecture is organized according to a matrix including rows and columns, the columns of which each include at least one microprocessor block having a computational part and a set of associated processing elements that are able to receive the same input signals. The number of associated processing elements is selectively variable in the direction of the column so as to exploit the parallelism of said signals. Additionally the processor architecture of the present invention enable dynamic switching between instruction parallelism and data parallel processing typical of vectorial functionality. The architecture can be scaled in various dimensions in an optimal configuration for the algorithm to be executed.Type: ApplicationFiled: May 30, 2008Publication date: November 27, 2008Applicant: STMICROELECTRONICS S.R.L.Inventors: Francesco Pappalardo, Giuseppe Notarangelo, Elio Guidetti
-
Patent number: 7457941Abstract: A vector processing system for executing vector instructions, each instruction defining multiple pairs of values, an operation to be executed on each of said value pairs and a scalar modifier, the vector processing system comprising a plurality of parallel processing units, each arranged to receive one of said pairs of values and to implement the defined operation on said value pair to generate a respective result; and a scalar result unit for receiving the results of the parallel processing units and for using said results in a manner defined by the scalar modifier to generate a single output value for said instruction.Type: GrantFiled: January 3, 2006Date of Patent: November 25, 2008Assignee: Broadcom CorporationInventors: Stephen Barlow, Neil Bailey, Timothy Ramsdale, David Plowman, Robert Swann
-
Patent number: 7454594Abstract: A processor and its arithmetic instruction processing method and arithmetic operation control method are disclosed that add a new operand designation option to SIMD arithmetic instructions and permit software pipelining between arithmetic operations performed in parallel by a SIMD arithmetic unit. A selector for adding an operation for interchanging multiple outputs of a SIMD arithmetic unit is added to a data path. A register file is divided in accordance with the output bit fields of the SIMD arithmetic unit. A means of specifying multiple registers as a SIMD instruction's output operand is added. Therefore, part of the output results of arithmetic operations performed in parallel by the SIMD arithmetic unit can be stored in a register providing the input for another arithmetic operation. Software pipelining is rendered achievable in this manner.Type: GrantFiled: December 17, 2002Date of Patent: November 18, 2008Assignee: Renesas Technology Corp.Inventor: Yuki Kondoh
-
Patent number: 7453882Abstract: One embodiment of the present invention provides a system that asynchronously controls the sending of data items from a sender to a receiver. The system includes a data path between the sender and the receiver, a first control path between the sender and the receiver, and a second control path between the sender and the receiver. The first control path and the second control path alternately control the asynchronous transmission of consecutive data items on the data path between the sender and the receiver.Type: GrantFiled: August 25, 2004Date of Patent: November 18, 2008Assignee: Sun Microsystems, Inc.Inventors: Ronald Ho, Jonathan K. Gainsley, Robert J. Drost
-
Patent number: 7454749Abstract: A virtual parallel computer is created within a programming environment comprising both shared memory and distributed memory architectures. At run time, the virtual architecture is mapped to a physical hardware architecture. In this manner, a massively parallel computing program may be developed and tested on a first architecture and run on a second architecture without reprogramming.Type: GrantFiled: November 12, 2002Date of Patent: November 18, 2008Assignee: Engineered Intelligence CorporationInventor: Matthias Oberdorfer
-
Patent number: 7451294Abstract: A method and apparatus for a two micro-operation flow using source override. In one embodiment, the method includes the identification of a macro-instruction having one or more streaming single instruction multiple data extension type operands. Once received, the macro-instruction is decoded into a first micro-operation (uOP) and a second uOP. Once decoded, a signal is asserted to disable source operand override logic if the first micro-operation updates a logical destination register that matches a logical source register of the micro-operation. Otherwise, the mutual source override is active and executed by a register alias table (RAT) when uOP with matching logic source and destination register are detected in a same clock cycle. In doing so, macro-instructions having 128-bit operands may be processed using, for example, two uOPs (one for the lower half and one for the upper half) in a 64-bit implementation, while preserving the atomicity of the original instruction.Type: GrantFiled: July 30, 2003Date of Patent: November 11, 2008Assignee: Intel CorporationInventors: Zeev Sperber, Yuval Bustan, Robert Valentine
-
Patent number: 7447873Abstract: In a multithreaded processing core, groups of threads are executed using single instruction, multiple data (SIMD) parallelism by a set of parallel processing engines. Input data defining objects to be processed received as a stream of input data blocks, and the input data blocks are loaded into a local register file in the core such that all of the data for one of the input objects is accessible to one of the processing engines. The input data can be loaded directly into the local register file, or the data can be accumulated in a buffer and loaded after accumulation, for instance during a launch operation for a SIMD group. Shared input data can also be loaded into a shared memory in the processing core.Type: GrantFiled: November 29, 2005Date of Patent: November 4, 2008Assignee: NVIDIA CorporationInventor: Bryon S. Nordquist
-
Patent number: 7444496Abstract: An apparatus, system, and method are disclosed for determining the consistency of a database including indirect reference to data elements. There is provided an apparatus for determining consistency of a database. This database includes, in association with each data element, an indirect list element including a storage address of the associated or corresponding data element so that other data elements can reference that data element. This apparatus reads, from each data element, identification information of an indirect list element corresponding to that data element and generates a hash value. This apparatus further reads, from each data element, identification information of an indirect list element corresponding to a data element referenced by that data element and generates a hash value. On condition that these hash values are equivalent to each other, the apparatus determines that the database is consistent.Type: GrantFiled: September 27, 2006Date of Patent: October 28, 2008Assignee: International Business Machines CorporationInventors: Tatsuyuki Shiomi, Shigeko Mori, Takashi Yonezawa
-
Patent number: 7441098Abstract: A method of executing instructions in a computer system on operands containing a plurality of packed objects in respective lanes of the operand is described. Each instruction defines an operation and contains a condition setting indicator settable independently of the operation. The status of the condition setting indicator determines whether or not multibit condition codes are set. When they are to be set, they are set depending on the results for carrying out the operation for each lane.Type: GrantFiled: May 6, 2005Date of Patent: October 21, 2008Assignee: Broadcom CorporationInventor: Sophie Wilson
-
Patent number: 7441099Abstract: Methods and apparatuses for processing a Configurable Single-Instruction-Multiple-Data (CSIMD) instruction are disclosed. In the method, a lookup table (LUT) storing information is provided to support random access of memory locations associated with a plurality of processing elements (PEs) and to perform instruction variances by the PEs. A CSIMD instruction is received, comprising a command and an index to the lookup table (LUT), to be executed by the PEs. The command of the received CSIMD instruction is executed in parallel differently by the PEs using the LUT index to randomly access the memory locations.Type: GrantFiled: October 3, 2006Date of Patent: October 21, 2008Assignee: Hong Kong Applied Science and Technology Research Institute Company LimitedInventors: Wing Yee Lo, Simon Moy
-
Publication number: 20080215851Abstract: A method is provided for the functional control of program and/or data flows in digital signal processors and processors, which have respective closed and separated modules for program and data flow control, working in parallel with computers. The method enables a power-efficient adaptation of the signal processing with the applied SIMD command-type in the individual paths and minimizes the emergence of the appearance of NOP-commands with which the VLIW-architecture of the processor must be supplied. The adaptation of the signal processing is achieved by individually controlling the parallel signal processing of the processor in the data paths (DP) which respectively belong to a first and second slice. For this purpose, a single slice halt outputted from an SSM register bank switches the register clockline according to state-dependent signal processing.Type: ApplicationFiled: May 5, 2008Publication date: September 4, 2008Inventors: Uwe Porst, Wolfram Drescher
-
Publication number: 20080209164Abstract: A microprocessor architecture comprises a plurality of processing elements arranged in a single instruction multiple data SIMD array, wherein each processing element includes a plurality of execution units, each of which is operable to process an instruction of a particular instruction type, a serial processor which includes a plurality of execution units, each of which is operable to process an instruction of a particular instruction type, and an instruction controller operable to receive a plurality of instructions, and to distribute received instructions to the execution units in dependence upon the instruction types of the received instruction. The execution units of the serial processor are operable to process respective instructions in parallel.Type: ApplicationFiled: February 7, 2006Publication date: August 28, 2008Inventor: Leon David Wildman
-
Publication number: 20080209165Abstract: A SIMD microprocessor, which can be included in an image processing apparatus using an image processing method used therein, includes a global processor and multiple processor elements controlled by the global processor. Each single processor element of the multiple processor elements includes multiple operation units. The global processor is configured to control the multiple processing elements to uniformly change a configuration of the multiple operation units in the single processor element to determine a number of data units of operation according to the multiple operation units either operated individually or in cooperation with each other in the single processor element and a width of data processed per data unit of operation performed in the single processor element. A processor element number is assigned per data unit of operation to the single processor element to use for executing an operation.Type: ApplicationFiled: February 27, 2008Publication date: August 28, 2008Inventor: Tomoaki Ozaki
-
Patent number: 7412587Abstract: A processor having a plurality of processing elements and a decoder operable to decode an instruction. Each of the plurality of processing elements includes: a transfer pattern storage unit operable to store a transfer pattern value that indicates a processing element from which data is transferred; a transfer unit operable to perform a data transfer from the processing element indicated by the transfer pattern value; and an update unit operable to update the transfer pattern value stored in the transfer pattern storage unit, in accordance with a result of decoding a latest instruction by the decoder.Type: GrantFiled: February 9, 2005Date of Patent: August 12, 2008Assignee: Matsushita Electric Industrial Co., Ltd.Inventors: Takeshi Tanaka, Hideshi Nishida, Masashi Hoshino, Takeshi Furuta
-
Patent number: 7401204Abstract: A parallel processor performs efficient parallel processing of one or more basic instructions contained in each of a plurality of instruction words delimited by instruction delimiting information. The processor includes: a plurality of instruction execution units performing processes in accordance with corresponding, supplied basic instructions in parallel; an instruction fetch unit fetching the instruction words one by one in accordance with the instruction delimiting information; and an instruction issue unit recognizing and, in accordance therewith, selecting each of the basic instructions contained in each of the instruction words fetched by the instruction fetch unit to a corresponding instruction execution unit to execute the basic instruction.Type: GrantFiled: September 1, 2000Date of Patent: July 15, 2008Assignee: Fujitsu LimitedInventors: Hideo Miyake, Atsuhiro Suga, Yasuki Nakamura, Yoshimasa Takebe
-
Publication number: 20080162873Abstract: In some embodiments, the invention involves a system and method to provide maximal boot-time parallelism for future multi-core, multi-node, and many-core systems. In an embodiment, the security (SEC), pre-EFI initialization (PEI), and then driver execution environment (DXE) phases are executed in parallel on multiple compute nodes (sockets) of a platform. Once the SEC/PEI/DXE phases are executed on all compute nodes having a processor, the boot device select (BDS) phase completes the boot by merging or partitioning the compute nodes based on a platform policy. Partitioned compute nodes each run their own instance of EFI. A common memory map may be generated prior to operating system (OS) launch when compute nodes are to be merged. Other embodiments are described and claimed.Type: ApplicationFiled: December 28, 2006Publication date: July 3, 2008Inventors: Vincent J. Zimmer, Yufu Li, Michael A. Rothman, Burges M. Karkaria
-
Publication number: 20080162874Abstract: A data transfer controller for controlling transfer of data items in a data processing system comprising a single instruction multiple data (SIMD) array of processing elements is disclosed. The controller comprises a transfer controller operable to control transfer of data to and/or from an internal memory unit of a processing element in said array, each processing element including a processing unit and an internal memory unit, the transfer controller being operable such that data transfer to and/or from the internal memory unit is performed independently of the operation of the processing unit of the processing element concerned. Operation by said processing unit on a predetermined type of instruction may be blocked until after said data transfer is complete or, if said data transfer started after said operation commenced, said data transfer may be blocked until after said operation is complete.Type: ApplicationFiled: June 19, 2007Publication date: July 3, 2008Inventors: Dave STUTTARD, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhodes, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
-
Publication number: 20080162875Abstract: A method of controlling access to memory by a processing element in a plurality of processing elements arranged in a single instruction multiple data (SIMD) processing array is disclosed. Each processing element includes an internal memory unit, and a register file. The method comprises retrieving an address value from the register file of the processing element, the address value relating to an address in the internal memory of the processing element, and accessing the internal memory on the basis of the address value.Type: ApplicationFiled: July 6, 2007Publication date: July 3, 2008Inventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer
-
Patent number: 7395531Abstract: A system and method is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In this framework, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirements of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residual iteration counts, and multiple statements with arbitrary alignment combinations. Loop peeling is used to reduce the computational overhead associated with misaligned data. A loop prologue and epilogue are peeled from individual iterations in the simdized loop, and vector-splicing instructions are applied to the peeled iterations, while the steady-state loop body incurs no additional computational overhead.Type: GrantFiled: August 16, 2004Date of Patent: July 1, 2008Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
-
Patent number: 7392329Abstract: In accordance with one embodiment of the present invention, a method of applying an action initiated for a portion of a plurality of devices to all of the plurality of devices is provided. The method comprises establishing a status block for a plurality of devices that are implemented on a system, and initiating an action for a portion of the plurality of devices. The method further comprises writing information to the status block identifying that the action was initiated, and based at least in part on the information written to the status block, applying the action to all of the plurality of devices.Type: GrantFiled: March 28, 2003Date of Patent: June 24, 2008Assignee: Hewlett-Packard Devopment, L.P.Inventors: Scott Lynn Michaelis, Marvin J. Spinhirne
-
Patent number: 7392368Abstract: Methods and apparatus for calculating Single-Instruction-Multiple-Data (SIMD) complex arithmetic. A coprocessor instruction has a format identifying a multiply and subtract instruction to generate real components for complex multiplication of first operand complex data and corresponding second operand complex data, a cross multiply and add instruction to generate imaginary components for complex multiplication of the first operand complex data and the corresponding second operand complex data, an add-subtract instruction to add real components of the first operand to imaginary components of the second operand and to subtract real components of the second operand from imaginary components of the first operand, and a subtract-add instruction to subtract the real components of the second operand from the imaginary components of the first operand and to add the real components of the first operand to the imaginary components of the second operand.Type: GrantFiled: June 30, 2005Date of Patent: June 24, 2008Assignee: Marvell International Ltd.Inventors: Moinul H. Khan, Nigel C. Paver, Bradley C. Aldrich
-
Publication number: 20080148011Abstract: The present disclosure provides a system and method for performing carry/borrow handling. A method according to one embodiment may include generating a first result having a first carry or borrow from a first mathematical operation and storing the first carry or borrow and a first pointer address in a temporary register. The method may further include generating a second result having a second carry or borrow from a second mathematical operation and calling a subroutine configured to perform carry and borrow handling. The method may also include copying the first pointer address from the temporary register into a global variable. Of course, many alternatives, variations and modifications are possible without departing from this embodiment.Type: ApplicationFiled: December 14, 2006Publication date: June 19, 2008Applicant: INTEL CORPORATIONInventors: Vinodh Gopal, Gilbert M. Wolrich, Gunnar Gaubatz, Daniel Cutter, Wajdi Feghali, Kaan Yuksel, Erdinc Ozturk
-
Publication number: 20080140750Abstract: An apparatus and method are provided for performing rearrangement operations and arithmetic operations on data. The data processing apparatus has processing circuitry for performing SIMD processing operations and scalar processing operations, a register bank for storing data and control circuitry responsive to program instructions to control the processing circuitry to perform data processing operations. The control circuitry is arranged to responsive to a combined rearrangement arithmetic instruction to control the processing circuitry to perform a rearrangement operation and at least one SIMD arithmetic operation on a plurality of data elements stored in the register bank. The rearrangement operation is configurable by a size parameter derived at least in part from the register bank. The size parameter provides an indication of a number of data elements forming a rearrangement element for the purposes of the rearrangement operation.Type: ApplicationFiled: November 29, 2007Publication date: June 12, 2008Inventors: Daniel Kershaw, Mladen Wilder, Dominic Hugo Symes
-
Patent number: 7383421Abstract: A data processing system includes an associative memory device containing n-cells, each of the n-cells includes a processing circuit. A controller is utilized for issuing one of a plurality of instructions to the associative memory device, while a clock device is utilized for outputting a synchronizing clock signal comprised of a predetermined number of clock cycles per second. The clock device outputs the synchronizing clock signal to the associative memory device and the controller which globally communicates one of the plurality of instructions to all of the n-cells simultaneously, within one of the clock cycles.Type: GrantFiled: December 4, 2003Date of Patent: June 3, 2008Assignee: Brightscale, Inc.Inventors: Gheorghe Stefan, Dan Tomescu
-
Patent number: 7376812Abstract: A processor can achieve high code density while allowing higher performance than existing architectures, particularly for Digital Signal Processing (DSP) applications. In accordance with one aspect, the processor supports three possible instruction sizes while maintaining the simplicity of programming and allowing efficient physical implementation. Most of the application code can be encoded using two sets of narrow size instructions to achieve high code density. Adding a third (and larger, i.e. VLIW) instruction size allows the architecture to encode multiple operations per instruction for the performance critical section of the code. Further, each operation of the VLIW format instruction can optionally be a SIMD operation that operates upon vector data. A scheme for the optimal utilization (highest achievable performance for the given amount of hardware) of multiply-accumulate (MAC) hardware is also provided.Type: GrantFiled: May 13, 2002Date of Patent: May 20, 2008Assignee: Tensilica, Inc.Inventors: Himanshu A. Sanghavi, Earl A. Killian, James Robert Kennedy, Darin S. Petkov, Peng Tu, William A. Huffman
-
Patent number: 7373488Abstract: A method and apparatus for calculation and storage of Single-Instruction-Multiple-Data (SIMD) saturation history information. A first coprocessor instruction has a first format identifying a saturating operation, a first source having packed data elements and a second source having packed data elements. The saturating operation is executed on the packed data elements of the first and second sources. Saturation flags are stored in the Wireless Coprocessor Saturation Status Flag (wCSSF) register to indicate if a result of the saturating operation saturated. A second coprocessor instruction has a second format identifying a saturation history processing operation and a saturation data size. An operand for the processing operation is determined based on the saturation data size, and the processing operation is executed on the saturation flags and the operand for the saturation data size. Condition code flags are stored in a status register to indicate the result of processing operation.Type: GrantFiled: April 30, 2007Date of Patent: May 13, 2008Assignee: Marvell International Ltd.Inventors: Nigel C. Paver, Bradley C. Aldrich
-
Patent number: 7367026Abstract: A method, computer program product, and information handling system for generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop contains multiple non-stride-one memory accesses that operate over a contiguous stream of memory is disclosed. A preferred embodiment identifies groups of isomorphic statements within a loop body where the isomorphic statements operate over a contiguous stream of memory over the iteration of the loop. Those identified statements are then converted into virtual-length vector operations. Next, the hardware's available vector length is used to determine a number of virtual-length vectors to aggregate into a single vector operation for each iteration of the loop. Finally, the aggregated, vectorized loop code is converted into SIMD operations.Type: GrantFiled: August 16, 2004Date of Patent: April 29, 2008Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
-
Patent number: 7363472Abstract: A data processing apparatus includes a SIMD (Single Instruction Multiple Data) array (10) of processing elements. The processing elements are operably divided into a plurality of processing blocks, the processing blocks being operable to process respective groups of data items.Type: GrantFiled: October 9, 2001Date of Patent: April 22, 2008Assignee: Clearspeed Technology LimitedInventors: Dave Stuttard, Dave Williams, Eamon O'Dea, Gordon Faulds, John Rhoades, Ken Cameron, Phil Atkin, Paul Winser, Russell David, Ray McConnell, Tim Day, Trey Greer