Arithmetic Operation Instruction Processing Patents (Class 712/221)
  • Patent number: 7730287
    Abstract: Methods and software are presented for processing data in a programmable processor, involving (a) decoding instructions for execution using an execution unit operable to execute instructions by partitioning data stored in registers in a register file into multiple data elements, the instructions selected from an instruction set that includes group arithmetic instructions and group data handling instructions, (b) in response to decoding different group data handling instructions, executing group data handling operations that re-arrange data elements in different ways, and (c) in response to decoding different group arithmetic instructions, executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.
    Type: Grant
    Filed: July 27, 2007
    Date of Patent: June 1, 2010
    Assignee: Microunity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris, Alexia Massalin
  • Publication number: 20100122064
    Abstract: A device may include a data processing logic cell field and one or more sequential CPUs. The logic cell field and the CPUs may be configured to be coupled to each other for data exchange. The data exchange may be in block form using lines leading to a cache memory. In a method for operating a reconfigurable unit having runtime-limited configurations, the configurations may be able to increase their maximum allowed runtime, e.g., by triggering a parallel counter. An increase in configuration runtime by the configurations may be suppressed in response to an interrupt.
    Type: Application
    Filed: September 30, 2009
    Publication date: May 13, 2010
    Inventor: MARTIN VORBACH
  • Publication number: 20100115245
    Abstract: A system for detecting and correcting invalid calculation results due to a timing violation. A processor compares results of an instruction simultaneously executed by a first arithmetic pipeline and a second arithmetic pipeline of the processor. In the second arithmetic pipeline, the critical stage of the first arithmetic pipeline is divided into multiple stages. A first result calculated by the first arithmetic pipeline is speculatively executed within the processor. The second arithmetic pipeline calculates a second result. The processor compares the second result to the first result. When the results are identical, the first result is assigned as the final result with a complete status. When the results do not match, the processor replaces the first result with the second result. The processor may then cancel the speculatively executed instruction and issue the second result as a final result. The processor may then restart subsequent instructions using the second result.
    Type: Application
    Filed: October 30, 2008
    Publication date: May 6, 2010
    Inventor: Atsuya Okazaki
  • Publication number: 20100106947
    Abstract: The present application relates to the field of processors and in particular to the carrying out of arithmetic operations. Many of the computations performed by processors consist of a large number of simple operations. As a result, a multiplication operation may take a significant number of clock cycles to complete. The present application provides a processor having a trivial operand register, which is used in the carrying out of arithmetic or storage operations for data values stored in a data store.
    Type: Application
    Filed: March 16, 2008
    Publication date: April 29, 2010
    Inventor: David Moloney
  • Publication number: 20100095306
    Abstract: An arithmetic device simultaneously processes a plurality of threads and may continue processing, while minimizing the degradation of overall performance, even when a hardware error occurs. An arithmetic device 100 includes: an instruction execution circuit 101 capable of selectively executing a mode in which the instruction sequences of a plurality of threads are executed and a mode in which the instruction sequence of a single thread is executed; and a switch indication circuit 102 instructing the instruction execution circuit 101 to switch a thread mode.
    Type: Application
    Filed: December 15, 2009
    Publication date: April 15, 2010
    Applicant: FUJITSU LIMITED
    Inventors: Norihito GOMYO, Toshio Yoshida, Ryuichi Sunayama
  • Publication number: 20100088493
    Abstract: Restrictions are placed on the calculation function for image processing implemented by the hard-wired system and on the memory access control of a buffer memory, and the range of each restriction is made variable by program control and the like. Data is input to the buffer memory from the outside under the restriction of “in units of memory lines”, and the number and positions of the memory lines to which data is input are programmable by the control circuit. The arithmetic circuit is restricted to performing calculations in units of data of one or more memory lines supplied from the buffer memory, and the calculation performed on each such unit of data can be programmably assigned by the control circuit.
    Type: Application
    Filed: September 24, 2009
    Publication date: April 8, 2010
    Inventors: Yoshitaka TAKAHASHI, Shoji MURAMATSU, Tetsuaki NAKAMIKAWA, Hiroyuki HAMASAKI, So OTSUKA
  • Patent number: 7694112
    Abstract: A method for executing multiple computational primitives is provided in accordance with exemplary embodiments. A first computational unit and at least a second computational unit cooperate to execute multiple computational primitives. The first computational unit independently computes other computational primitives. By virtue of arbitration for shared source operand buses or shared result buses, availability of the first and second computational units needed to execute cooperatively the multiple computational primitives is assured by a process of reservation as used for a computational primitive executed on a dedicated computational unit.
    Type: Grant
    Filed: January 31, 2008
    Date of Patent: April 6, 2010
    Assignee: International Business Machines Corporation
    Inventors: Harry S. Barowski, J. Adam Butts, Stephen V. Kosonocky, Silvia M. Mueller, Jochen Preiss
  • Publication number: 20100082949
    Abstract: A processor and associated methodology employ a SIMD architecture and instruction set to efficiently perform video analytics operations on images. The processor contains a group of SIMD instructions used by the method to implement video analytic filters that avoid bit expansion of the pixels to be filtered. The filters hold the number of bits representing a pixel constant throughout the entire operation, conserving processor capacity and throughput when performing video analytics.
    Type: Application
    Filed: November 21, 2008
    Publication date: April 1, 2010
    Applicant: AXIS AB
    Inventor: Johan Almbladh
  • Publication number: 20100077176
    Abstract: Apparatus and methods for storing data in a block to provide improved accessibility of the stored data in two or more dimensions. The data is loaded into memory macros constituting a row of the block such that sequential values in the data are loaded into sequential memory macros. The data loaded in the row is circularly shifted a predetermined number of columns relative to the preceding row. The circularly shifted row of data is stored, and the process is repeated until a predetermined number of rows of data are stored. A two dimensional (2D) data block is thereby formed. Each memory macro is a predetermined number of bits wide and each column is one memory macro wide.
    Type: Application
    Filed: September 8, 2009
    Publication date: March 25, 2010
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Larry Pearlstein, Richard K. Sita
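The row-shifted storage described in the abstract above can be sketched as follows. This is a minimal illustration, not the patented apparatus: the function names, the 4x4 block size, and the shift of one column per row are assumptions chosen for the example. Each list index stands in for one memory macro, and the point shown is that a logical column ends up spread across different macros.

```python
def store_block(rows, shift):
    """Store each row circularly shifted `shift` more columns than the preceding row."""
    num_macros = len(rows[0])  # one memory macro per column (assumption)
    block = []
    for r, row in enumerate(rows):
        offset = (r * shift) % num_macros
        stored = [None] * num_macros
        for c, value in enumerate(row):
            stored[(c + offset) % num_macros] = value
        block.append(stored)
    return block


def read_column(block, col, shift):
    """Read logical column `col`; return (macro index, value) for every row."""
    num_macros = len(block[0])
    result = []
    for r, stored in enumerate(block):
        macro = (col + r * shift) % num_macros
        result.append((macro, stored[macro]))
    return result


# Hypothetical 4x4 data block: element (r, c) holds 10*r + c.
rows = [[10 * r + c for c in range(4)] for r in range(4)]
block = store_block(rows, shift=1)
column = read_column(block, col=2, shift=1)
```

With a shift of one, the four elements of logical column 2 sit in four different macros, so in this model a column could be fetched in one parallel access just like a row.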
  • Publication number: 20100064118
    Abstract: A method and apparatus for reducing latency in computer processors. The method incorporates a special instruction set that provides an indication of whether a particular instruction is capable of being executed nearly simultaneously with a preceding instruction in the same group. In such a situation, multiple instructions may be executed at a rate faster than expected. A simple apparatus for accomplishing this method is illustrated.
    Type: Application
    Filed: September 10, 2008
    Publication date: March 11, 2010
    Applicant: VNS PORTFOLIO LLC
    Inventor: Charles H. Moore
  • Patent number: 7672409
    Abstract: A method of multi-user detection in a given uplink and downlink time slot in a software-defined receiver which includes filtering and sampling a received signal; forming a block-banded matrix A of the sampled signals; and solving $\hat{d} = T^{-1} y$, where $T = A^{H} A$ and $y = A^{H} x$. The methods of solving for the matrix T include a) computing Cholesky factors of the matrix T by approximating using the block-banded property of the matrices T and A; b) Schur decomposition for Cholesky factors of the matrix T and approximating the lower triangular Cholesky factor matrix R using the block Toeplitz property of matrix T; or c) Fourier transformation.
    Type: Grant
    Filed: July 15, 2005
    Date of Patent: March 2, 2010
    Assignee: Sandbridge Technologies, Inc.
    Inventor: Sanyogita Shamsunder
  • Publication number: 20100042814
    Abstract: An instruction set architecture includes a definition set of extended real values (e.g., computations or values that typically produce an IEEE NaN result) and a rules set of extended real value rules specifying values for one or more functions of one or more extended real values. Operations are performed on extended real values based at least partially on the extended real value rules. The instruction set architecture can be used, for example, to facilitate continued operations in a computer in case of errors relating to computations on or resulting in undefined values.
    Type: Application
    Filed: August 14, 2008
    Publication date: February 18, 2010
    Inventors: Saeid Tehrani, Babak Makkinejad
  • Patent number: 7659911
    Abstract: A method and apparatus for perfectly lossless and minimal-loss interconversion of digital color data between spectral color spaces (RGB) and perceptually based luma-chroma color spaces (Y′CBCR) is disclosed. In particular, the present invention provides a process for converting digital pixels from R′G′B′ space to Y′CBCR space and back, or from Y′CBCR space to R′G′B′ space and back, with zero error, or, in constant-precision implementations, with guaranteed minimal error. This invention permits digital video editing and image editing systems to repeatedly interconvert between color spaces without accumulating errors. In image codecs, this invention can improve the quality of lossy image compressors independently of their core algorithms, and enables lossless image compressors to operate in a different color space than the source data without thereby becoming lossy.
    Type: Grant
    Filed: April 21, 2005
    Date of Patent: February 9, 2010
    Inventor: Andreas Wittenstein
  • Publication number: 20100030837
    Abstract: A method of modifying a group of full adder circuits to compute a Boolean function of a set number of input bits, each full adder circuit having first and second data inputs, a data output, a carry input and a carry output, the full adder circuits being interconnected so as to form a carry chain. The method comprises the steps of setting the first input of each full adder circuit to a same fixed value, connecting each respective input bit of the set number of input bits to the second input of a respective one of the full adder circuits and using the output of the carry chain of the array of full adder circuits as the result of the Boolean function.
    Type: Application
    Filed: June 26, 2009
    Publication date: February 4, 2010
    Inventor: Anthony STANSFIELD
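The carry-chain idea in the abstract above can be modeled in a few lines. This is a sketch under stated assumptions: the abstract does not say which fixed value or initial carry-in is used, so the two configurations below (fixed input 1 with carry-in 0, and fixed input 0 with carry-in 1) are illustrative choices, and the function names are hypothetical. With a standard full adder, the first configuration makes the final carry the OR of the input bits and the second makes it the AND.

```python
def full_adder(a, b, carry_in):
    """Standard one-bit full adder: returns (sum, carry_out)."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def carry_chain(bits, fixed, carry_in):
    """Tie the first input of every adder to `fixed`, feed one input bit
    to each second input, and return the carry out of the last adder."""
    carry = carry_in
    for bit in bits:
        _, carry = full_adder(fixed, bit, carry)
    return carry


def chain_or(bits):
    # fixed = 1: carry_out = b | carry_in, so the chain computes a wide OR.
    return carry_chain(bits, fixed=1, carry_in=0)


def chain_and(bits):
    # fixed = 0: carry_out = b & carry_in, so the chain computes a wide AND.
    return carry_chain(bits, fixed=0, carry_in=1)
```

The sum outputs are ignored; only the end of the carry chain is used as the function result, as the abstract describes.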
  • Patent number: 7656412
    Abstract: A system, a method and computer-readable media for performing texture resampling algorithms on a processing device. A texture resampling algorithm is selected. This algorithm is decomposed into multiple one-dimensional transformations. Instructions for performing each of the one-dimensional transformations are communicated to a processing device, such as a GPU. The processing device may generate an output image by separately executing the instructions associated with each of the one-dimensional transformations.
    Type: Grant
    Filed: December 21, 2005
    Date of Patent: February 2, 2010
    Assignee: Microsoft Corporation
    Inventors: Denis Demandolx, Steven White
  • Publication number: 20100023733
    Abstract: A method and apparatus to gain additional functionality of a microprocessor by adding an extended instruction set mode. In this mode, the result of executing an instruction may be changed without changing the instruction itself. In the extended instruction set mode, there is an increase to the number of bits of precision when executing the plus instruction. An additional bit position is added to the program counter register. When this bit is set, the microprocessor is in extended instruction set mode. In addition, a new one bit latch is provided. The latch may be changed only when the microprocessor is in extended instruction set mode. The latch is defined as holding a true carry bit. A significant bit of a register holding a sum is saved in the carry latch at the end of the plus instruction.
    Type: Application
    Filed: December 18, 2008
    Publication date: January 28, 2010
    Applicant: VNS PORTFOLIO LLC
    Inventors: Charles H. Moore, Gregory V. Bailey
  • Publication number: 20100017453
    Abstract: A programmable signal processing circuit has an instruction processing circuit (23, 24, 26), which has an instruction set that comprises a demapping instruction. The instruction processing circuit (23, 24, 26) has an operand input (30a) for receiving a complex number operand of the demapping instruction from a register file (22) and a result output (34) for writing a demapping result of the demapping instruction to the register file (22). The instruction processing circuit (23, 24, 26) determines at least four bit metrics in response to the demapping instruction, each indicating a relative position of the complex number relative to a respective border line in a complex plane. The instruction processing circuit (23, 24, 26) writes a combination of the at least four bit metrics together to the result output (34) in the demapping result.
    Type: Application
    Filed: December 13, 2005
    Publication date: January 21, 2010
    Applicant: KONINKLIJKE PHILIPS ELECTRONICS, N.V.
    Inventors: Ingolf Held, Marcus M.G. Quax, Paulus W.F. Gruijters
  • Publication number: 20090327664
    Abstract: An arithmetic processing apparatus includes an operation circuit group that performs encryption and a redundant operation circuit group configured the same as the operation circuit group. The arithmetic processing apparatus, while performing encryption, performs normal encryption in the operation circuit group, and performs an encryption mask processing program by using data and the like randomly generated by a random data generating unit and the like in the redundant operation circuit group. The arithmetic processing apparatus, when not performing encryption, performs normal arithmetic processing in the redundant operation circuit group.
    Type: Application
    Filed: March 25, 2009
    Publication date: December 31, 2009
    Inventor: Koichi Yoshimi
  • Publication number: 20090313458
    Abstract: A processor that can execute instructions in either scalar mode or vector mode. In scalar mode, instructions are executed once per fetch. In vector mode, instructions are executed multiple times per fetch. In vector mode, the processor recognizes scalar variables and vector variables. Scalar variables may be assigned a fixed memory location. Vector variables use different physical locations at different iterations of the same instruction. The processor includes circuitry to automatically index addresses of vector variables for each iteration of the same instruction. This circuitry partitions a register into a vector region and a scalar region. Accesses to the vector region are automatically indexed based on the number of iterations of the instruction that have been performed.
    Type: Application
    Filed: August 20, 2009
    Publication date: December 17, 2009
    Applicant: STMicroelectronics Inc.
    Inventors: Osvaldo Colavin, Davide Rizzo, Vineet Soni
  • Patent number: 7631166
    Abstract: A repeat instruction (RPT) operates on one or more operands, but the RPT instruction includes only an opcode and does not specify locations of the operand or operands. The type of operation to be performed when the RPT instruction is executed depends upon an initial instruction. If, for example, the initial instruction is an ADD, then the RPT instruction causes an ADC operation to be performed, thereby facilitating efficient coding of an extended precision addition operation. The locations of the operands for the RPT instruction are assumed to be in predetermined memory locations. When coding a repeated operation, rather than following the initial instruction with one or more instructions of the same form, the initial instruction is followed by one or more of the shorter RPT instructions, thereby conserving memory space and facilitating backward compatibility with an instruction set that does not have the RPT instruction.
    Type: Grant
    Filed: August 11, 2008
    Date of Patent: December 8, 2009
    Assignee: ZiLOG, Inc.
    Inventor: Thomas Henry Hildebrandt
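The ADD-then-RPT pattern described in the abstract above can be illustrated with a toy interpreter. Everything here is an assumption made for the example: the byte-wide operands, the little-endian layout, and the convention that the "predetermined" operand locations are simply the next byte positions. Only the behavior stated in the abstract is modeled, namely that an RPT following an ADD performs an add-with-carry.

```python
def run_extended_add(program, a_bytes, b_bytes):
    """Run an ADD followed by RPT opcodes over little-endian byte lists.

    The initial ADD ignores the carry; each RPT after it acts as ADC on
    the next byte position (a hypothetical operand convention).
    """
    if not program or program[0] != "ADD":
        raise ValueError("this sketch only models a sequence starting with ADD")
    result = []
    carry = 0
    for index, opcode in enumerate(program):
        if opcode == "ADD":
            total = a_bytes[index] + b_bytes[index]
        elif opcode == "RPT":
            total = a_bytes[index] + b_bytes[index] + carry
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
        result.append(total & 0xFF)  # keep the low byte
        carry = total >> 8           # carry into the next position
    return result, carry


# 32-bit addition built from one ADD and three operand-free RPTs.
a = list((0x01FFFFFF).to_bytes(4, "little"))
b = list((0x00000001).to_bytes(4, "little"))
result, carry = run_extended_add(["ADD", "RPT", "RPT", "RPT"], a, b)
value = int.from_bytes(bytes(result), "little")
```

Because the RPT opcodes carry no operand fields, the program for a four-byte addition is one full instruction plus three short ones, which is the code-size saving the abstract points to.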
  • Publication number: 20090300325
    Abstract: A data processing system, apparatus and method for performing fractional multiply operations is disclosed. The system includes a memory that stores instructions for SIMD operations and a processing core. The processing core includes registers that store operands for the fractional multiply operations. A coprocessor included in the processing core performs the fractional multiply operations on the operands and stores the result in a destination register that is also included in the processing core.
    Type: Application
    Filed: August 12, 2009
    Publication date: December 3, 2009
    Applicant: Marvell International Ltd.
    Inventors: Nigel C. Paver, Bradley C. Aldrich
  • Publication number: 20090300335
    Abstract: A circuit arrangement and method couple a hardware-based pseudorandom number generator (PRNG) to an execution unit in such a manner that pseudorandom numbers generated by the PRNG may be selectively output to the execution unit for use as an operand during the execution of instructions by the execution unit. A PRNG may be coupled to an input of an operand multiplexer that outputs to an operand input of an execution unit so that operands provided by instructions supplied to the execution unit are selectively overridden with pseudorandom numbers generated by the PRNG. Furthermore, overridden operands provided by instructions supplied to the execution unit may be used as seed values for the PRNG.
    Type: Application
    Filed: June 3, 2008
    Publication date: December 3, 2009
    Inventors: Adam James Muff, Matthew Ray Tubbs
  • Publication number: 20090300336
    Abstract: The invention resides in a flexible data pipeline structure for accommodating software computational instructions for varying application programs and having a programmable embedded processor with internal pipeline stages the order and length of which varies as fast as every clock cycle based on the instruction sequence in an application program preloaded into the processor, and wherein the processor includes a data switch matrix selectively and flexibly interconnecting pluralities of mathematical execution units and memory units in response to said instructions, and wherein the execution units are configurable to perform operations at different precisions of multi-bit arithmetic and logic operations and in a multi-level hierarchical architecture structure.
    Type: Application
    Filed: May 29, 2008
    Publication date: December 3, 2009
    Inventors: Xiaolin Wang, Qian Wu, Benjamin Marshall, Fugui Wang, Ke Ning, Gregory Pitarys
  • Patent number: 7620764
    Abstract: A system, apparatus and a method for routing data over fewer switches and interconnections among reconfigurable logic elements, and for adapting routing resources to dynamically perform complex bit-level permutations, such as shifting and bit reversal operations. In one embodiment, an exemplary silo routing circuit is formed upon a semiconductor substrate and routes data among a number of reconfigurable computational elements. The silo routing circuit comprises a plurality of input terminals and a plurality of output terminals. Further, the silo routing circuit includes a multi-stage interconnection network (“MIN”) of switches configurable to form data paths from any input terminal to any output terminal.
    Type: Grant
    Filed: June 25, 2007
    Date of Patent: November 17, 2009
    Assignee: Stretch, Inc.
    Inventor: Charlé R. Rupp
  • Publication number: 20090282224
    Abstract: An aspect of the present invention clips a sequence of data values within a known range (defined by a set of integer values) by a ceiling value and a floor value. In an embodiment, such a feature is obtained by first storing in each of a sequence of memory locations a respective value corresponding to each integer value, with a stored value in a memory location equaling the floor value if the memory location corresponds to an integer having a value less than the floor value, equaling the ceiling value if the memory location corresponds to an integer having a value greater than the ceiling value, and equaling the value of the corresponding integer otherwise. When a sequence of data values are thereafter received for clipping, the clipped value for each data value is obtained by merely retrieving a corresponding stored value from the corresponding location.
    Type: Application
    Filed: April 15, 2009
    Publication date: November 12, 2009
    Applicant: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Parag Chaurasia
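The table-based clipping described in the abstract above can be sketched as follows. The function names and the example range of -8 to 7 are assumptions for illustration; the technique shown is the one the abstract states, in which the table is filled once and every later clip is a single lookup.

```python
def build_clip_table(range_min, range_max, floor, ceiling):
    """Precompute the clipped value for every integer in the known range."""
    return [min(max(value, floor), ceiling)
            for value in range(range_min, range_max + 1)]


def clip_all(values, table, range_min):
    """Clip each value with one table lookup, with no comparisons at clip time."""
    return [table[value - range_min] for value in values]


# Hypothetical known range -8..7, clipped to the interval [-2, 3].
table = build_clip_table(-8, 7, floor=-2, ceiling=3)
clipped = clip_all([-8, -2, 0, 3, 7], table, range_min=-8)
```

The trade-off is memory for branches: the table costs one entry per integer in the range, so the approach suits narrow, known ranges such as pixel values.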
  • Publication number: 20090282223
    Abstract: Provided is a data processing circuit. A control unit outputs an operation control signal and a memory control signal. A plurality of program memories each outputs a command in response to the memory control signal. A plurality of arithmetic sections each selectively performs any one of the commands from the plurality of program memories in response to the operation control signal. Operation modes of the data processing circuit can be flexibly changed according to operation environments.
    Type: Application
    Filed: September 5, 2008
    Publication date: November 12, 2009
    Inventors: Chun-Gi LYUH, Jung-Hee SUK, Ik-Jae CHUN, Se-Wan HEO, Tae-Moon ROH, Jong-Dae KIM
  • Publication number: 20090265529
    Abstract: A processor (and method) of processing multiple data by a single instruction includes first and second register sets each of which includes a plurality of registers, and an arithmetic unit to rearrange data being registered in the first and second register sets according to a relative size of an absolute value of the data between the first and second register sets so that the relative size is defined before executing an instruction considering the relative size.
    Type: Application
    Filed: April 7, 2009
    Publication date: October 22, 2009
    Applicant: NEC CORPORATION
    Inventor: Yusuke Kobayashi
  • Publication number: 20090249039
    Abstract: The present invention provides extended precision in SIMD arithmetic operations in a processor having a register file and an accumulator. A first set of data elements and a second set of data elements are loaded into first and second vector registers, respectively. Each data element comprises N bits. Next, an arithmetic instruction is fetched from memory. The arithmetic instruction is decoded. Then, the first vector register and the second vector register are read from the register file. The present invention executes the arithmetic instruction on corresponding data elements in the first and second vector registers. The resulting element of the execution is then written into the accumulator. Then, the resulting element is transformed into an N-bit width element and written into a third register for further operation or storage in memory. The transformation of the resulting element can include, for example, rounding, clamping, and/or shifting the element.
    Type: Application
    Filed: June 8, 2009
    Publication date: October 1, 2009
    Applicant: MIPS Technologies, Inc.
    Inventors: Timothy Van Hook, Peter Hsu, William A. Huffman, Henry P. Moreton, Earl A. Killian
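The accumulate-then-narrow flow in the abstract above can be sketched as follows. This is an illustrative model only: the element width N = 8, the unsigned elements, the multiply-accumulate operation, and the round-half-up shift are assumptions, and Python's unbounded integers stand in for the wide accumulator.

```python
N = 8  # element width in bits (assumption for the example)


def multiply_accumulate(acc, va, vb):
    """Add element-wise products into the wide accumulator."""
    return [a + x * y for a, x, y in zip(acc, va, vb)]


def narrow(acc, shift, bits=N):
    """Transform accumulator elements to `bits`-wide values by
    shifting with rounding, then clamping to the unsigned maximum."""
    max_value = (1 << bits) - 1
    out = []
    for element in acc:
        if shift > 0:
            element = (element + (1 << (shift - 1))) >> shift
        out.append(min(element, max_value))
    return out


va = [200, 16, 3, 255]
vb = [200, 16, 5, 255]
acc = [0, 0, 0, 0]
acc = multiply_accumulate(acc, va, vb)
acc = multiply_accumulate(acc, va, vb)  # intermediate sums exceed 8 bits
result = narrow(acc, shift=4)
```

Keeping the intermediate sums at full width and narrowing only once at the end is what preserves precision; clamping each product to 8 bits before accumulating would lose information in every step.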
  • Publication number: 20090240926
    Abstract: A technique realizes execution of various combinations of arithmetic operations in, for example, SIMD floating-point multiply-add arithmetic operation, with fewer instruction kind codes. An arithmetic operating apparatus includes a setting unit that sets in one or more unused bits of a single instruction extended instruction information to instruct at least one of a register and arithmetic operators to perform an extended process different from an ordinary process.
    Type: Application
    Filed: March 12, 2009
    Publication date: September 24, 2009
    Applicant: Fujitsu Limited
    Inventor: Shigeki ITOU
  • Publication number: 20090240925
    Abstract: A first arithmetic unit performs a network process for transmission and reception of a message. A second arithmetic unit performs a network process and a specific process that is predetermined to be performed on the message in relation with the network process. An alternate process management table stores therein process information in which identification information is associated with an instruction sequence, the identification information being information for identifying a type of the message, the instruction sequence being a sequence for sequentially performing a network process and a specific process. The first arithmetic unit includes an identification information detector that detects the identification information from the message, and a controller that retrieves, from the alternate process management table, an instruction sequence corresponding to the identification information detected, so as to control the second arithmetic unit to perform the instruction sequence retrieved.
    Type: Application
    Filed: February 17, 2009
    Publication date: September 24, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Takeshi Ishihara, Yasuhiro Fukuju, Keisuke Mera
  • Publication number: 20090235058
    Abstract: A data processing device has an instruction decoder (1), a control logic unit (3), and ALU (4). The instruction decoder (1) decodes instruction codes of an arithmetic instruction. The control logic unit (3) detects the effective data width of operation data to be processed according to the decode result from the instruction decoder (1) and determines the number of cycles for the instruction execution corresponding to the effective data width. The ALU (4) executes the instruction with the number of cycles of the instruction execution determined by the control logic unit (3).
    Type: Application
    Filed: May 26, 2009
    Publication date: September 17, 2009
    Applicant: Renesas Technology Corporation
    Inventors: Sugako Ohtani, Hiroyuki Kondo
  • Publication number: 20090235049
    Abstract: The present invention provides a method and apparatus for QR-factorizing a matrix on a multiprocessor system, wherein the multiprocessor system comprises at least one core processor and a plurality of accelerators. The method comprises iteratively factorizing each panel in the matrix until the whole matrix is factorized. Each iteration comprises: partitioning an unprocessed matrix part in the matrix into a plurality of blocks according to a predetermined block size; partitioning a current processed panel in the unprocessed matrix part into at least two sub panels, wherein the current processed panel is composed of a plurality of blocks; and performing QR factorization one by one on the at least two sub panels with the plurality of accelerators, and updating the data of the sub panel(s) on which no QR factorization has been performed among the at least two sub panels by using the factorization result.
    Type: Application
    Filed: March 12, 2009
    Publication date: September 17, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Hui Li, Bai Ling Wang
  • Patent number: 7590828
    Abstract: The invention relates to a processing of a data word in a plurality of processing cycles. In order to improve the efficiency of the processing, the data word is divided for each cycle into a plurality of successive data blocks. The blocks are shifted by one block from one cycle to the next. In each of the cycles, each of the successive blocks is processed in sequence. In the first cycle, the processing results for successive blocks are moreover stored in a memory at memory addresses which change uniformly from one processing result to the next. In each subsequent processing cycle, the processing results for the successive blocks of the subsequent cycle are combined with processing results stored in the memory during a preceding cycle at memory addresses which change uniformly from one processing result in the subsequent cycle to the next.
    Type: Grant
    Filed: September 8, 2004
    Date of Patent: September 15, 2009
    Assignee: Nokia Corporation
    Inventor: Marc Hoffmann
  • Publication number: 20090225084
    Abstract: A data processing unit for performing a high speed drawing operation of arbitrary patterns even in individual pixels is provided. It is assumed that the color mode is set to 3 bits per pixel. A pixel “N” is written to a drawing position designated by a byte address [29:3] and a bit address [2:0]. The subsequent pixel “N+1” is written immediately after the pixel “N”. The next subsequent pixel “N+2” is written immediately after the pixel “N+1” to span the adjacent bytes. Thereby, the pixel data is continuously stored within each byte and across the boundary between bytes with no space. In this case, the operation of writing pixel data, i.e., the drawing is performed not in words or bytes but in pixels. In addition to this, high speed drawing is possible by the use of a caching system.
    Type: Application
    Filed: September 12, 2008
    Publication date: September 10, 2009
    Inventors: Shuhei KATO, Koichi Usami
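The 3-bits-per-pixel packing in the abstract above can be sketched as follows. This is a minimal model, not the patented unit: the LSB-first bit order within each byte and the function names are assumptions. Each pixel is addressed by a byte address and a bit address in the range 0 to 7, and pixels are stored back to back, so some of them straddle a byte boundary.

```python
BITS_PER_PIXEL = 3  # color mode from the abstract


def write_pixel(memory, pixel_index, value):
    """Write one pixel bit by bit at its (byte address, bit address)."""
    start = pixel_index * BITS_PER_PIXEL
    for i in range(BITS_PER_PIXEL):
        byte_addr, bit_addr = divmod(start + i, 8)
        bit = (value >> i) & 1
        cleared = memory[byte_addr] & ~(1 << bit_addr) & 0xFF
        memory[byte_addr] = cleared | (bit << bit_addr)


def read_pixel(memory, pixel_index):
    """Read one pixel back from its packed position."""
    start = pixel_index * BITS_PER_PIXEL
    value = 0
    for i in range(BITS_PER_PIXEL):
        byte_addr, bit_addr = divmod(start + i, 8)
        value |= ((memory[byte_addr] >> bit_addr) & 1) << i
    return value


# Eight 3-bit pixels fill exactly three bytes with no padding.
pixels = [5, 2, 7, 1, 0, 6, 3, 4]
memory = bytearray(3)
for index, pixel in enumerate(pixels):
    write_pixel(memory, index, pixel)
```

In this layout pixel 2 occupies bits 6 and 7 of byte 0 and bit 0 of byte 1, which is the "span the adjacent bytes" case the abstract mentions.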
  • Publication number: 20090228691
    Abstract: An arithmetic processing apparatus capable of performing an arithmetic operation for generating a condition flag commonly referred to by using a condition flag generated on an arithmetic operation unit basis in as few steps as possible is provided. The arithmetic processing apparatus, which processes multiple data in parallel based on a single instruction, includes: processing elements capable of performing a common arithmetic operation based on the evaluation result of the instruction stored in the instruction register; and a condition flag arithmetic operation unit capable of performing one of the logical operation and the comparison operation on the condition flag retained in each processing element, transferring the operation result to each processing element, and updating the condition flag based on the operation result.
    Type: Application
    Filed: August 24, 2005
    Publication date: September 10, 2009
    Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
    Inventors: Takeshi Furuta, Hideshi Nishida, Takeshi Tanaka
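As a rough illustration of the shared condition flag described above, the sketch below combines the per-element flags with a logical operation and hands the result back to every processing element. The function name `combine_flags` and the choice of OR/AND as the supported operations are assumptions made for this example.

```python
def combine_flags(flags, op: str = "or"):
    """Reduce per-element condition flags and broadcast the result.

    flags: one boolean per processing element.
    Returns a new list in which every element holds the combined flag.
    """
    if op == "or":
        result = any(flags)
    elif op == "and":
        result = all(flags)
    else:
        raise ValueError("unsupported operation: " + op)
    # Every processing element's flag is updated with the common result.
    return [result] * len(flags)


# Each element sets its own flag, e.g. "my value is negative".
values = [3, -1, 4, 0]
local_flags = [v < 0 for v in values]
shared_any = combine_flags(local_flags, "or")
shared_all = combine_flags(local_flags, "and")
```

A software loop over the elements would take several steps to compute the same result; the point of the dedicated unit in the abstract is to produce it in as few steps as possible.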
  • Patent number: 7587582
    Abstract: A method and apparatus for efficiently performing graphic operations are provided. This is accomplished by providing a processor that supports any combination of the following instructions: parallel multiply-add, conditional pick, parallel averaging, parallel power, parallel reciprocal square root and parallel shifts.
    Type: Grant
    Filed: August 16, 2000
    Date of Patent: September 8, 2009
    Assignee: Sun Microsystems, Inc.
    Inventors: Subramania Sudharsanan, Jeffrey Meng Wah Chan, Michael F. Deering, Marc Tremblay, Scott R. Nelson
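Three of the instructions named in the abstract above can be modeled lane by lane as below. This is a sketch only: the function names are hypothetical, lanes are plain Python integers, and the averaging rounds down, which is an assumption rather than something the abstract states.

```python
def parallel_multiply_add(a, b, c):
    """Per lane: a[i] * b[i] + c[i]."""
    return [x * y + z for x, y, z in zip(a, b, c)]


def conditional_pick(mask, a, b):
    """Per lane: take a[i] where mask[i] is true, otherwise b[i]."""
    return [x if m else y for m, x, y in zip(mask, a, b)]


def parallel_average(a, b):
    """Per lane: integer average of a[i] and b[i], rounding down (assumed)."""
    return [(x + y) // 2 for x, y in zip(a, b)]
```

For example, `parallel_multiply_add([1, 2], [3, 4], [5, 6])` yields `[8, 14]`.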
  • Publication number: 20090217008
    Abstract: Provided is a program conversion apparatus for generating a secret holding program, which disables a malicious analyzer from analyzing the original program easily.
    Type: Application
    Filed: April 21, 2006
    Publication date: August 27, 2009
    Inventors: Taichi Sato, Motoji Ohmori, Rieko Asai, Yuichi Futa, Tomoyuki Haga, Masahiro Mambo
  • Publication number: 20090198973
    Abstract: A processing circuit according to the present invention includes a plurality of logic circuits (designated as L11, . . . , and L44) formed by arranging in arrays and is configured to input an output from a logic circuit to the logic circuit located on the following row. Each of the plurality of logic circuits includes an operation circuit (ALU) configured to perform an operation on inputted data; and a selecting unit (MUX) configured to select and output any one of an operation output from the operation circuit or an operation output from the logic circuit located on the preceding row.
    Type: Application
    Filed: January 28, 2009
    Publication date: August 6, 2009
    Applicant: SANYO ELECTRIC CO., LTD.
    Inventors: Kazuhisa IIZUKA, Makoto OZONE
  • Publication number: 20090198974
    Abstract: A method for executing multiple computational primitives is provided in accordance with exemplary embodiments. A first computational unit and at least a second computational unit cooperate to execute multiple computational primitives. The first computational unit independently computes other computational primitives. By virtue of arbitration for shared source operand buses or shared result buses, availability of the first and second computational units needed to execute cooperatively the multiple computational primitives is assured by a process of reservation as used for a computational primitive executed on a dedicated computational unit.
    Type: Application
    Filed: January 31, 2008
    Publication date: August 6, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Harry S. Barowski, J. Adam Butts, Stephen V. Kosonocky, Silvia M. Mueller, Jochen Preiss
  • Publication number: 20090187749
    Abstract: A bypass circuit is provided in a pipeline processor. A pipeline register is provided between an instruction execution stage and a write-back stage. The pipeline register stores a data validity flag and a WRITE control flag to control writing data into a general purpose register unit. The data retained in the pipeline register is allowed to be written back into the general purpose register unit when the WRITE control flag indicates “valid”. The pipeline register continues to retain the retained data even after the writing of the retained data into the general purpose register unit. The first pipeline register supplies the retained data to the second stage through the bypass circuit at the time of executing a subsequent instruction having data dependency on a preceding instruction.
    Type: Application
    Filed: January 12, 2009
    Publication date: July 23, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Jun TANABE
  • Publication number: 20090187746
    Abstract: An apparatus for processing data is provided comprising processing circuitry having permutation circuitry for performing permutation operations, a register bank having a plurality of registers for storing data and control circuitry responsive to program instructions to control the processing circuitry to perform data processing operations. The control circuitry is arranged to be responsive to a control-generating instruction to generate in dependence upon a bit-mask control signals to configure permutation circuitry for performing permutation operation on an input operand. The bit-mask identifies within the input operand the first group of data elements having a first ordering and a second group of data elements having a second ordering and the permutation operation is such that it preserves one of the first ordering and the second ordering but changes the other of the first ordering and the second ordering.
    Type: Application
    Filed: December 16, 2008
    Publication date: July 23, 2009
    Applicant: ARM LIMITED
    Inventors: Dominic Hugo Symes, Mladen Wilder
  • Publication number: 20090182990
    Abstract: Embodiments of the invention provide methods and apparatus for executing a multiple operand minimum or maximum instruction. Executing the multiple operand minimum or maximum instruction comprises transferring more than two operands to one or more processing lanes of a vector unit. A first compare operation may be performed in at least one processing lane of the vector unit to determine a greater or smaller of a first operand and a second operand. The greater (or smaller) operand may be transferred to a dot product unit, wherein, in a second compare operation, the transferred operand is compared to at least a third operand to determine one of the greater and smaller of the more than two operands.
    Type: Application
    Filed: January 15, 2008
    Publication date: July 16, 2009
    Inventors: Adam J. Muff, Matthew R. Tubbs
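The two-stage comparison described above might look like the following for three operands. It is a minimal sketch: `max3_two_stage` and `min3_two_stage` are hypothetical names, and the two stages are ordinary Python comparisons standing in for the vector lane and the dot product unit.

```python
def max3_two_stage(a, b, c):
    """Maximum of three operands using two successive compares."""
    # Stage 1: compare the first two operands (vector lane in the abstract).
    first = a if a >= b else b
    # Stage 2: compare the stage-1 winner with the third operand
    # (dot product unit in the abstract).
    return first if first >= c else c


def min3_two_stage(a, b, c):
    """Minimum of three operands using the same two-stage structure."""
    first = a if a <= b else b
    return first if first <= c else c
```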
  • Publication number: 20090182991
    Abstract: A processor core includes an instruction decode unit that may dispatch a same integer instruction stream to a plurality of integer execution units operating in lock-step. The processor core also includes signature generation logic that may generate, concurrently with execution of the integer instructions, a respective signature from result signals conveyed on respective result buses in one or more pipeline stages within each of the integer execution units in response to the result signals becoming available. The processor core also includes compare logic that may detect a mismatch between signatures from each of the integer execution units. Further, in response to the compare logic detecting any mismatch, the compare logic may cause instructions causing the mismatch to be re-executed.
    Type: Application
    Filed: January 10, 2008
    Publication date: July 16, 2009
    Inventor: Nhon Quach
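To illustrate the lock-step check described above, the sketch below folds each unit's results into a running signature and compares the signatures. The rotate-and-XOR signature and the names `signature` and `lockstep_mismatch` are assumptions chosen for simplicity; the abstract does not say how the signature is computed.

```python
def signature(results, width: int = 32) -> int:
    """Fold a sequence of result values into one width-bit signature."""
    mask = (1 << width) - 1
    sig = 0
    for r in results:
        # Rotate the running signature left by one bit, then mix in the result.
        sig = ((sig << 1) | (sig >> (width - 1))) & mask
        sig ^= r & mask
    return sig


def lockstep_mismatch(results_a, results_b) -> bool:
    """True if two execution units produced different signatures."""
    return signature(results_a) != signature(results_b)
```

In the scheme the abstract describes, a detected mismatch would cause the offending instructions to be re-executed; this sketch covers only the detection step.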
  • Patent number: 7555635
    Abstract: A data processing device has an instruction decoder (1), a control logic unit (3), and ALU (4). The instruction decoder (1) decodes instruction codes of an arithmetic instruction. The control logic unit (3) detects the effective data width of operation data to be processed according to the decode result from the instruction decoder (1) and determines the number of cycles for the instruction execution corresponding to the effective data width. The ALU (4) executes the instruction with the number of cycles of the instruction execution determined by the control logic unit (3).
    Type: Grant
    Filed: July 26, 2007
    Date of Patent: June 30, 2009
    Assignee: Renesas Technology Corp.
    Inventors: Sugako Ohtani, Hiroyuki Kondo
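The idea of choosing a cycle count from the effective data width can be sketched as follows. The 8-bits-per-cycle datapath, the restriction to non-negative operands, and the name `cycles_for_operand` are all assumptions made for this example.

```python
def cycles_for_operand(value: int, bits_per_cycle: int = 8) -> int:
    """Cycles needed when the ALU handles bits_per_cycle bits each cycle.

    Assumes a non-negative operand; the effective width is the position
    of its highest set bit (at least 1, so zero still takes one cycle).
    """
    if value < 0:
        raise ValueError("this sketch handles non-negative operands only")
    width = max(1, value.bit_length())
    # Ceiling division: narrow operands finish in fewer cycles.
    return -(-width // bits_per_cycle)
```

Under these assumptions a 7-bit operand takes one cycle while a full 32-bit operand takes four.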
  • Patent number: 7554464
    Abstract: A method and system allow for fast compression and decompression of data using existing repetitive interleaved patterns within scientific data (floating point, integer, and image). An advantage of the method and system is that it is so fast that it can be used to save time due to a lower amount of data transferred/stored in scenarios like network transfer, disk or memory storage, cache storage or any other real-time applications where time plays a crucial role.
    Type: Grant
    Filed: September 30, 2004
    Date of Patent: June 30, 2009
    Assignee: Gear Six, Inc.
    Inventor: Matthias Oberdorfer
  • Publication number: 20090150654
    Abstract: A functional unit is added to a graphics processor to provide direct support for double-precision arithmetic, in addition to the single-precision functional units used for rendering. The double-precision functional unit can execute a number of different operations, including fused multiply-add, on double-precision inputs using data paths and/or logic circuits that are at least double-precision width. The double-precision and single-precision functional units can be controlled by a shared instruction issue circuit, and the number of copies of the double-precision functional unit included in a core can be less than the number of copies of the single-precision functional units, thereby reducing the effect of adding support for double-precision on chip area.
    Type: Application
    Filed: December 7, 2007
    Publication date: June 11, 2009
    Applicant: NVIDIA CORPORATION
    Inventors: Stuart Oberman, Ming Y. Siu, David C. Tannenbaum
  • Publication number: 20090150651
    Abstract: Disclosed herein is a semiconductor chip including: a plurality of processing devices that can communicate with each other; wherein each of the processing devices includes an arithmetic unit, an individual memory connected to the arithmetic unit on a one-to-one basis, and a control unit configured to independently control turning on and off of operation of the arithmetic unit and the individual memory.
    Type: Application
    Filed: November 17, 2008
    Publication date: June 11, 2009
    Applicant: Sony Corporation
    Inventor: Mutsuhiro Ohmori
  • Patent number: 7546443
    Abstract: The present invention provides extended precision in SIMD arithmetic operations in a processor having a register file and an accumulator. A first set of data elements and a second set of data elements are loaded into first and second vector registers, respectively. Each data element comprises N bits. Next, an arithmetic instruction is fetched from memory. The arithmetic instruction is decoded. Then, the first vector register and the second vector register are read from the register file. The present invention executes the arithmetic instruction on corresponding data elements in the first and second vector registers. The resulting element of the execution is then written into the accumulator. Then, the resulting element is transformed into an N-bit width element and written into a third register for further operation or storage in memory. The transformation of the resulting element can include, for example, rounding, clamping, and/or shifting the element.
    Type: Grant
    Filed: January 24, 2006
    Date of Patent: June 9, 2009
    Assignee: MIPS Technologies, Inc.
    Inventors: Timothy J. Van Hook, Peter Hsu, William A. Huffman, Henry P. Moreton, Earl A. Killian
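The accumulate-then-narrow flow in the abstract above can be illustrated with a small model. It is a sketch under stated assumptions: the operation is a multiply-accumulate, elements are signed 16-bit values (N = 16), rounding is round-half-up applied before a right shift, and the names `clamp` and `mac_then_narrow` are hypothetical.

```python
def clamp(value: int, bits: int = 16) -> int:
    """Saturate value to the signed range of a bits-wide element."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def mac_then_narrow(a, b, acc, shift: int = 0, bits: int = 16):
    """Multiply lanes of a and b into a wide accumulator, then narrow.

    Returns (new_accumulator, narrowed_elements). Python integers are
    unbounded, so the accumulator never loses precision here.
    """
    new_acc = [s + x * y for s, x, y in zip(acc, a, b)]
    narrowed = []
    for v in new_acc:
        if shift > 0:
            v = (v + (1 << (shift - 1))) >> shift  # round, then shift
        narrowed.append(clamp(v, bits))            # clamp to N bits
    return new_acc, narrowed
```

For instance, multiplying 30000 by 2 gives 60000 in the accumulator, which no longer fits a signed 16-bit element; narrowing without a shift clamps it to 32767, while a shift of 1 brings it back into range as 30000.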
  • Publication number: 20090144527
    Abstract: The present invention provides a stream processing apparatus capable of improving the processing performance in the case of continuously processing a plurality of data streams. A control stream, different from a data stream, is prepared, and a program and a parameter are updated in advance in accordance with the control stream. Double buffer areas are prepared in a memory of the stream processing apparatus into which the program and the parameter are stored. The location of the data stream to be input is written in the control stream, and buffers for reading the data stream are multiplexed so as to read in advance the top portion of the data stream to be processed next.
    Type: Application
    Filed: November 28, 2008
    Publication date: June 4, 2009
    Inventors: Hiroaki NAKATA, Takafumi YUASA, Fumitaka IZUHARA, Kazushi AKIE, Motoki KIMURA
  • Patent number: 7533249
    Abstract: In order to reuse configuration information in a dynamic reconfiguration arithmetic circuit, data lines, address lines, a mask register and the like are required as hardware resources for rewriting only configuration information of dynamic reconfiguration arithmetic cells needed to be changed. However, this results in an increase in area of the arithmetic circuit. According to the present invention, a shift register is the only hardware resource in the dynamic reconfiguration arithmetic block for changing the configuration information. The shift register is structured by connecting in series storage units corresponding one-to-one with each arithmetic cell. An output from the end terminal of the shift register and an output of the configuration information storage unit are input to the configuration information selector, and an output of the configuration information selector is connected to the front of the shift register. The cell address counter counts up from 0 and increments by one at a time.
    Type: Grant
    Filed: October 23, 2007
    Date of Patent: May 12, 2009
    Assignee: Panasonic Corporation
    Inventor: Masaki Maeda