Arithmetic Operation Instruction Processing Patents (Class 712/221)
-
Patent number: 7730287Abstract: Methods and software are presented for processing data in a programmable processor, involving (a) decoding instructions for execution using an execution unit operable to execute instructions by partitioning data stored in registers in a register file into multiple data elements, the instructions selected from an instruction set that includes group arithmetic instructions and group data handling instructions, (b) in response to decoding different group data handling instructions, executing group data handling operations that re-arrange data elements in different ways, and (c) in response to decoding different group arithmetic instructions, executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.Type: GrantFiled: July 27, 2007Date of Patent: June 1, 2010Assignee: Microunity Systems Engineering, Inc.Inventors: Craig Hansen, John Moussouris, Alexia Massalin
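To make the idea of a "group arithmetic" operation concrete, here is a minimal sketch of a partitioned add that treats one register as several independent data elements and returns the catenated result. The function name `group_add`, the 64-bit register width, and the 16-bit element width are illustrative assumptions, not details taken from the patent.

```python
def group_add(a: int, b: int, elem_bits: int = 16, reg_bits: int = 64) -> int:
    """Add two registers element by element (hypothetical helper).

    Each register is partitioned into reg_bits // elem_bits elements.
    Carries never cross an element boundary; each element wraps on overflow.
    """
    mask = (1 << elem_bits) - 1
    result = 0
    for shift in range(0, reg_bits, elem_bits):
        lane_a = (a >> shift) & mask
        lane_b = (b >> shift) & mask
        # Keep only elem_bits of the sum, then place it back in its lane.
        result |= ((lane_a + lane_b) & mask) << shift
    return result


# Four 16-bit elements per register; the 0xFFFF element wraps to 0.
total = group_add(0x0001_FFFF_0003_0004, 0x0001_0001_0001_0001)
print(hex(total))
```

The individual results sit side by side in one register-sized value, which is what the abstract calls a catenated result.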
-
Publication number: 20100122064Abstract: A device may include a data processing logic cell field and one or more sequential CPUs. The logic cell field and the CPUs may be configured to be coupled to each other for data exchange. The data exchange may be in block form using lines leading to a cache memory. In a method for operating a reconfigurable unit having runtime-limited configurations, the configurations may be able to increase their maximum allowed runtime, e.g., by triggering a parallel counter. An increase in configuration runtime by the configurations may be suppressed in response to an interrupt.Type: ApplicationFiled: September 30, 2009Publication date: May 13, 2010Inventor: MARTIN VORBACH
-
Publication number: 20100115245Abstract: A system for detecting and correcting invalid calculation results due to a timing violation. A processor compares results of an instruction simultaneously executed by a first arithmetic pipeline and a second arithmetic pipeline of the processor. In the second arithmetic pipeline, the critical stage of the first arithmetic pipeline is divided to multiple stages. A first result calculated by the first arithmetic pipeline is speculatively executed within the processor. The second arithmetic pipeline calculates a second result. The processor compares the second result to the first result. When the results are identical, the first result is assigned as the final result with a complete status. When the results do not match, the processor replaces the first result with the second result. The processor may then cancel the speculatively executed instruction and issue the second result as a final result. The processor may then restart subsequent instructions using the second result.Type: ApplicationFiled: October 30, 2008Publication date: May 6, 2010Inventor: Atsuya Okazaki
-
Publication number: 20100106947Abstract: The present application relates to the field of processors and in particular to the carrying out of arithmetic operations. Many of the computations performed by processors consist of a large number of simple operations. As a result, a multiplication operation may take a significant number of clock cycles to complete. The present application provides a processor having a trivial operand register, which is used in the carrying out of arithmetic or storage operations for data values stored in a data store.Type: ApplicationFiled: March 16, 2008Publication date: April 29, 2010Inventor: David Moloney
-
Publication number: 20100095306Abstract: An arithmetic device simultaneously processes a plurality of threads and may continue the process by minimizing the degradation of the entire performance although a hardware error occurs. An arithmetic device 100 includes: an instruction execution circuit 101 capable of selectively executing a mode in which the instruction sequences of a plurality of threads are executed and a mode in which the instruction sequence of a single thread is executed; and a switch indication circuit 102 instructing the instruction execution circuit 101 to switch a thread mode.Type: ApplicationFiled: December 15, 2009Publication date: April 15, 2010Applicant: FUJITSU LIMITEDInventors: Norihito GOMYO, Toshio Yoshida, Ryuichi Sunayama
-
Publication number: 20100088493Abstract: A restriction is given to the calculation function for image processing achieved by the hard-wired system and the memory access control of a buffer memory, and a range of the restriction is made variable by a program control and others. Data is inputted to the buffer memory from the outside with a restriction of “in units of memory line”, and the number of memory lines and positions of the same to which data is inputted can be programmable by the control circuit. The arithmetic circuit is subjected to the restriction of performing the calculation in units of data of one or plural memory lines supplied from the buffer memory, and a calculation processing content in units of calculation processing for the units of data can be programmably assigned by the control circuit.Type: ApplicationFiled: September 24, 2009Publication date: April 8, 2010Inventors: Yoshitaka TAKAHASHI, Shoji MURAMATSU, Tetsuaki NAKAMIKAWA, Hiroyuki HAMASAKI, So OTSUKA
-
Patent number: 7694112Abstract: A method for executing multiple computational primitives is provided in accordance with exemplary embodiments. A first computational unit and at least a second computational unit cooperate to execute multiple computational primitives. The first computational unit independently computes other computational primitives. By virtue of arbitration for shared source operand buses or shared result buses, availability of the first and second computational units needed to execute cooperatively the multiple computational primitives is assured by a process of reservation as used for a computational primitive executed on a dedicated computational unit.Type: GrantFiled: January 31, 2008Date of Patent: April 6, 2010Assignee: International Business Machines CorporationInventors: Harry S. Barowski, J. Adam Butts, Stephen V. Kosonocky, Silvia M. Mueller, Jochen Preiss
-
Publication number: 20100082949Abstract: A processor and associated methodology employ a SIMD architecture and instruction set to efficiently perform video analytics operation on images. The processor contains a group of SIMD instructions used by the method to implement video analytic filters that avoid bit expansion of the pixels to be filtered. The filters hold the number of bits representing a pixel constant throughout the entire operation, conserving processor capacity and throughput when performing video analytics.Type: ApplicationFiled: November 21, 2008Publication date: April 1, 2010Applicant: AXIS ABInventor: Johan Almbladh
-
Publication number: 20100077176Abstract: Apparatus and methods for storing data in a block to provide improved accessibility of the stored data in two or more dimensions. The data is loaded into memory macros constituting a row of the block such that sequential values in the data are loaded into sequential memory macros. The data loaded in the row is circularly shifted a predetermined number of columns relative to the preceding row. The circularly shifted row of data is stored, and the process is repeated until a predetermined number of rows of data are stored. A two dimensional (2D) data block is thereby formed. Each memory macro is a predetermined number of bits wide and each column is one memory macro wide.Type: ApplicationFiled: September 8, 2009Publication date: March 25, 2010Applicant: ADVANCED MICRO DEVICES, INC.Inventors: Larry Pearlstein, Richard K. Sita
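The row-by-row circular shift described above can be sketched as follows. This is only an illustration under stated assumptions: each list position stands for one memory macro, the shift is one column per row, and the names `store_skewed`, `read_column`, and `macros_for_column` are hypothetical.

```python
def store_skewed(block, shift=1):
    """Store each row rotated right by (row index * shift) columns."""
    n = len(block[0])
    stored = []
    for r, row in enumerate(block):
        k = (r * shift) % n
        # After rotation, the element from column c sits in macro (c + k) % n.
        stored.append(list(row[-k:]) + list(row[:-k]) if k else list(row))
    return stored


def macros_for_column(num_rows, n, c, shift=1):
    """Which macro holds column c in each row."""
    return [(c + r * shift) % n for r in range(num_rows)]


def read_column(stored, c, shift=1):
    """Read logical column c back out of the skewed storage."""
    n = len(stored[0])
    return [row[(c + r * shift) % n] for r, row in enumerate(stored)]


block = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
stored = store_skewed(block)
print(read_column(stored, 0))          # logical column 0
print(macros_for_column(4, 4, 0))      # a different macro for every row
```

Because a column's elements land in different macros, a column can be fetched in one parallel access just like a row, which is the two-dimensional accessibility the abstract refers to.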
-
Publication number: 20100064118Abstract: A method and apparatus for reducing latency in computer processors. The method incorporates a special instruction set that provides an indication of whether a particular instruction is capable of being executed nearly simultaneously with a preceding instruction in the same group. In such a situation, multiple instructions may be executed at a rate faster than expected. A simple apparatus for accomplishing this method is illustrated.Type: ApplicationFiled: September 10, 2008Publication date: March 11, 2010Applicant: VNS PORTFOLIO LLCInventor: Charles H. Moore
-
Patent number: 7672409Abstract: A method of multi-user detection in a given uplink and downlink time slot in a software-defined receiver which includes filtering and sampling a received signal; forming a block-banded matrix A of the sampled signals; and solving $\hat{d} = T^{-1} y$, where $T = A^H A$ and $y = A^H x$. The methods of solving for the matrix T include a) computing Cholesky factors of the matrix T by approximating using the block-banded property of the matrix T and A; b) Schur decomposition for Cholesky factors of the matrix T and approximating the lower triangular Cholesky factor matrix R using block Toeplitz property of matrix T; or c) Fourier Transformation.Type: GrantFiled: July 15, 2005Date of Patent: March 2, 2010Assignee: Sandbridge Technologies, Inc.Inventor: Sanyogita Shamsunder
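As a reading aid for option (a), the standard Cholesky route to the estimate can be written out as below. This is the textbook form, assuming T is Hermitian positive definite and R is its lower triangular factor; the intermediate vector z is introduced here for illustration, and the patent's block-banded approximations are not shown.

```latex
\begin{aligned}
T &= A^{H} A = R\,R^{H} \\
R\,z &= y, \qquad y = A^{H} x \\
R^{H}\,\hat{d} &= z \;\Longrightarrow\; \hat{d} = T^{-1} y
\end{aligned}
```

The two triangular solves (forward substitution for z, back substitution for the estimate) replace an explicit inversion of T.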
-
Publication number: 20100042814Abstract: An instruction set architecture includes a definition set of extended real values (e.g., computations or values that typically produce an IEEE NaN result) and a rules set of extended real value rules specifying values for one or more functions of one or more extended real values. Operations are performed on extended real values based at least partially on the extended real value rules. The instruction set architecture can be used, for example, to facilitate continued operations in a computer in case of errors relating to computations on or resulting in undefined values.Type: ApplicationFiled: August 14, 2008Publication date: February 18, 2010Inventors: Saeid Tehrani, Babak Makkinejad
-
Patent number: 7659911Abstract: A method and apparatus for perfectly lossless and minimal-loss interconversion of digital color data between spectral color spaces (RGB) and perceptually based luma-chroma color spaces (Y′CBCR) is disclosed. In particular, the present invention provides a process for converting digital pixels from R′G′B′ space to Y′CBCR space and back, or from Y′CBCR space to R′G′B′ space and back, with zero error, or, in constant-precision implementations, with guaranteed minimal error. This invention permits digital video editing and image editing systems to repeatedly interconvert between color spaces without accumulating errors. In image codecs, this invention can improve the quality of lossy image compressors independently of their core algorithms, and enables lossless image compressors to operate in a different color space than the source data without thereby becoming lossy.Type: GrantFiled: April 21, 2005Date of Patent: February 9, 2010Inventor: Andreas Wittenstein
-
Publication number: 20100030837Abstract: A method of modifying a group of full adder circuits to compute a Boolean function of a set number of input bits, each full adder circuit having first and second data inputs, a data output, a carry input and a carry output, the full adder circuits being interconnected so as to form a carry chain. The method comprises the steps of setting the first input of each full adder circuit to a same fixed value, connecting each respective input bit of the set number of input bits to the second input of a respective one of the full adder circuits and using the output of the carry chain of the array of full adder circuits as the result of the Boolean function.Type: ApplicationFiled: June 26, 2009Publication date: February 4, 2010Inventor: Anthony STANSFIELD
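A small simulation may help show why a carry chain with one input tied to a constant can evaluate a Boolean function. The sketch below assumes a plain ripple chain of textbook full adders; the names `full_adder` and `carry_chain`, and the choice of AND and OR as the example functions, are illustrative assumptions rather than details from the application.

```python
def full_adder(a: int, b: int, cin: int):
    """Textbook full adder: returns (sum bit, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def carry_chain(bits, fixed: int, carry_in: int) -> int:
    """Tie every first input to `fixed`, feed one input bit per adder,
    and return the carry that leaves the end of the chain."""
    carry = carry_in
    for bit in bits:
        _, carry = full_adder(fixed, bit, carry)
    return carry


# fixed = 0: carry out is (bit AND carry in), so the chain computes a wide AND.
print(carry_chain([1, 1, 1, 1], fixed=0, carry_in=1))
# fixed = 1: carry out is (bit OR carry in), so the chain computes a wide OR.
print(carry_chain([0, 0, 1, 0], fixed=1, carry_in=0))
```

The sum outputs are ignored; only the final carry is used as the function result, matching the last step of the method.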
-
Patent number: 7656412Abstract: A system, a method and computer-readable media for performing texture resampling algorithms on a processing device. A texture resampling algorithm is selected. This algorithm is decomposed into multiple one-dimensional transformations. Instructions for performing each of the one-dimensional transformations are communicated to a processing device, such as a GPU. The processing device may generate an output image by separately executing the instructions associated with each of the one-dimensional transformations.Type: GrantFiled: December 21, 2005Date of Patent: February 2, 2010Assignee: Microsoft CorporationInventors: Denis Demandolx, Steven White
-
Publication number: 20100023733Abstract: A method and apparatus to gain additional functionality of a microprocessor by adding an extended instruction set mode. In this mode, the result of executing an instruction may be changed without changing the instruction itself. In the extended instruction set mode, there is an increase to the number of bits of precision when executing the plus instruction. An additional bit position is added to the program counter register. When this bit is set, the microprocessor is in extended instruction set mode. In addition, a new one bit latch is provided. The latch may be changed only when the microprocessor is in extended instruction set mode. The latch is defined as holding a true carry bit. A significant bit of a register holding a sum is saved in the carry latch at the end of the plus instruction.Type: ApplicationFiled: December 18, 2008Publication date: January 28, 2010Applicant: VNS PORTFOLIO LLCInventors: Charles H. Moore, Gregory V. Bailey
-
Publication number: 20100017453Abstract: A programmable signal processing circuit has an instruction processing circuit (23, 24, 26), which has an instruction set that comprises a demapping instruction. The instruction processing circuit (23, 24, 26) has an operand input (30a) for receiving a complex number operand of the demapping instruction from a register file (22) and a result output (34) for writing a demapping result of the demapping instruction to the register file (22). The instruction processing circuit (23, 24, 26) determines at least four bit metrics in response to the demapping instruction, each indicating a relative position of the complex number relative to a respective border line in a complex plane. The instruction processing circuit (23, 24, 26) writes a combination of the at least four bit metrics together to the result output (34) in the demapping result.Type: ApplicationFiled: December 13, 2005Publication date: January 21, 2010Applicant: KONINKLIJKE PHILIPS ELECTRONICS, N.V.Inventors: Ingolf Held, Marcus M.G. Quax, Paulus W.F. Gruijters
-
Publication number: 20090327664Abstract: An arithmetic processing apparatus includes an operation circuit group that performs encryption and a redundant operation circuit group configured the same as the operation circuit group. The arithmetic processing apparatus, while performing encryption, performs normal encryption in the operation circuit group, and performs an encryption mask processing program by using data and the like randomly generated by a random data generating unit and the like in the redundant operation circuit group. The arithmetic processing apparatus, when not performing encryption, performs normal arithmetic processing in the redundant operation circuit group.Type: ApplicationFiled: March 25, 2009Publication date: December 31, 2009Inventor: Koichi Yoshimi
-
Publication number: 20090313458Abstract: A processor that can execute instructions in either scalar mode or vector mode. In scalar mode, instructions are executed once per fetch. In vector mode, instructions are executed multiple times per fetch. In vector mode, the processor recognizes scalar variables and vector variables. Scalar variables may be assigned a fixed memory location. Vector variables use different physical locations at different iterations of the same instruction. The processor includes circuitry to automatically index addresses of vector variables for each iteration of the same instruction. This circuitry partitions a register into a vector region and a scalar region. Accesses to the vector region are automatically indexed based on the number of iterations of the instruction that have been performed.Type: ApplicationFiled: August 20, 2009Publication date: December 17, 2009Applicant: STMicroelectronics Inc.Inventors: Osvaldo Colavin, Davide Rizzo, Vineet Soni
-
Patent number: 7631166Abstract: A repeat instruction (RPT) operates on one or more operands, but the RPT instruction includes only an opcode and does not specify locations of the operand or operands. The type of operation to be performed when the RPT instruction is executed depends upon an initial instruction. If, for example, the initial instruction is an ADD, then the RPT instruction causes an ADC operation to be performed, thereby facilitating efficient coding of an extended precision addition operation. The locations of the operands for the RPT instruction are assumed to be in predetermined memory locations. When coding a repeated operation, rather than following the initial instruction with one or more instructions of the same form, the initial instruction is followed by one or more of the shorter RPT instructions, thereby conserving memory space and facilitating backward compatibility with an instruction set that does not have the RPT instruction.Type: GrantFiled: August 11, 2008Date of Patent: December 8, 2009Assignee: ZiLOG, Inc.Inventor: Thomas Henry Hildebrandt
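The ADD-then-ADC pattern that the RPT instruction abbreviates can be modeled in a few lines. This sketch assumes 8-bit limbs stored least significant first; the function name `extended_add` and the limb layout are hypothetical and chosen only for illustration.

```python
def extended_add(a_limbs, b_limbs, limb_bits: int = 8):
    """Multi-limb addition, least significant limb first.

    The first iteration plays the role of the initial ADD (carry in is 0);
    every later iteration plays the role of a repeated ADC, which also
    adds the carry left behind by the previous limb.
    """
    mask = (1 << limb_bits) - 1
    carry = 0
    out = []
    for a, b in zip(a_limbs, b_limbs):
        total = a + b + carry
        out.append(total & mask)
        carry = total >> limb_bits
    return out, carry


# 0x01FF + 0x0001 = 0x0200, held as two 8-bit limbs.
limbs, carry = extended_add([0xFF, 0x01], [0x01, 0x00])
print([hex(x) for x in limbs], carry)
```

In the scheme described above, each loop iteration after the first would be a single operand-free RPT opcode instead of a full ADC instruction.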
-
Publication number: 20090300325Abstract: A data processing system, apparatus and method for performing fractional multiply operations is disclosed. The system includes a memory that stores instructions for SIMD operations and a processing core. The processing core includes registers that store operands for the fractional multiply operations. A coprocessor included in the processing core performs the fractional multiply operations on the operands and stores the result in a destination register that is also included in the processing core.Type: ApplicationFiled: August 12, 2009Publication date: December 3, 2009Applicant: Marvell International Ltd.Inventors: Nigel C. Paver, Bradley C. Aldrich
-
Publication number: 20090300335Abstract: A circuit arrangement and method couple a hardware-based pseudorandom number generator (PRNG) to an execution unit in such a manner that pseudorandom numbers generated by the PRNG may be selectively output to the execution unit for use as an operand during the execution of instructions by the execution unit. A PRNG may be coupled to an input of an operand multiplexer that outputs to an operand input of an execution unit so that operands provided by instructions supplied to the execution unit are selectively overridden with pseudorandom numbers generated by the PRNG. Furthermore, overridden operands provided by instructions supplied to the execution unit may be used as seed values for the PRNG.Type: ApplicationFiled: June 3, 2008Publication date: December 3, 2009Inventors: Adam James Muff, Matthew Ray Tubbs
-
Publication number: 20090300336Abstract: The invention resides in a flexible data pipeline structure for accommodating software computational instructions for varying application programs and having a programmable embedded processor with internal pipeline stages the order and length of which varies as fast as every clock cycle based on the instruction sequence in an application program preloaded into the processor, and wherein the processor includes a data switch matrix selectively and flexibly interconnecting pluralities of mathematical execution units and memory units in response to said instructions, and wherein the execution units are configurable to perform operations at different precisions of multi-bit arithmetic and logic operations and in a multi-level hierarchical architecture structure.Type: ApplicationFiled: May 29, 2008Publication date: December 3, 2009Inventors: Xiaolin Wang, Qian Wu, Benjamin Marshall, Fugui Wang, Ke Ning, Gregory Pitarys
-
Patent number: 7620764Abstract: A system, apparatus and a method for routing data over fewer switches and interconnections among reconfigurable logic elements, and for adapting routing resources to dynamically perform complex bit-level permutations, such as shifting and bit reversal operations. In one embodiment, an exemplary silo routing circuit is formed upon a semiconductor substrate and routes data among a number of reconfigurable computational elements. The silo routing circuit comprises a plurality of input terminals and a plurality of output terminals. Further, the silo routing circuit includes a multi-stage interconnection network (“MIN”) of switches configurable to form data paths from any input terminal to any output terminal.Type: GrantFiled: June 25, 2007Date of Patent: November 17, 2009Assignee: Stretch, Inc.Inventor: Charlé R. Rupp
-
Publication number: 20090282224Abstract: An aspect of the present invention clips a sequence of data values within a known range (defined by a set of integer values) by a ceiling value and a floor value. In an embodiment, such a feature is obtained by first storing in each of a sequence of memory locations a respective value corresponding to each integer value, with a stored value in a memory location equaling the floor value if the memory location corresponds to an integer having a value less than the floor value, equaling the ceiling value if the memory location corresponds to an integer having a value greater than the ceiling value, and equaling the value of the corresponding integer otherwise. When a sequence of data values are thereafter received for clipping, the clipped value for each data value is obtained by merely retrieving a corresponding stored value from the corresponding location.Type: ApplicationFiled: April 15, 2009Publication date: November 12, 2009Applicant: TEXAS INSTRUMENTS INCORPORATEDInventor: Parag Chaurasia
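The table-based clipping described above can be sketched as follows. The names `build_clip_table` and `clip_sequence`, and the example range of -4 to 4, are assumptions made for illustration.

```python
def build_clip_table(lo: int, hi: int, floor: int, ceiling: int):
    """Precompute one entry per integer in the known range [lo, hi].

    Entries below `floor` hold floor, entries above `ceiling` hold ceiling,
    and every other entry holds the integer itself.
    """
    return [min(max(v, floor), ceiling) for v in range(lo, hi + 1)]


def clip_sequence(values, table, lo: int):
    """Clip by lookup only: no comparisons are made per data value."""
    return [table[v - lo] for v in values]


table = build_clip_table(lo=-4, hi=4, floor=-2, ceiling=2)
print(clip_sequence([-4, -1, 0, 3, 4], table, lo=-4))
```

The comparisons are paid once when the table is built, so each later data value costs a single indexed load.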
-
Publication number: 20090282223Abstract: Provided is a data processing circuit. A control unit outputs an operation control signal and a memory control signal. A plurality of program memories each outputs a command in response to the memory control signal. A plurality of arithmetic sections each selectively performs any one of the commands from the plurality of program memories in response to the operation control signal. Operation modes of the data processing circuit can be flexibly changed according to operation environments.Type: ApplicationFiled: September 5, 2008Publication date: November 12, 2009Inventors: Chun-Gi LYUH, Jung-Hee SUK, Ik-Jae CHUN, Se-Wan HEO, Tae-Moon ROH, Jong-Dae KIM
-
Publication number: 20090265529Abstract: A processor (and method) of processing multiple data by a single instruction includes first and second register sets each of which includes a plurality of registers, and an arithmetic unit to rearrange data being registered in the first and second register sets according to a relative size of an absolute value of the data between the first and second register sets so that the relative size is defined before executing an instruction considering the relative size.Type: ApplicationFiled: April 7, 2009Publication date: October 22, 2009Applicant: NEC CORPORATIONInventor: Yusuke Kobayashi
-
Publication number: 20090249039Abstract: The present invention provides extended precision in SIMD arithmetic operations in a processor having a register file and an accumulator. A first set of data elements and a second set of data elements are loaded into first and second vector registers, respectively. Each data element comprises N bits. Next, an arithmetic instruction is fetched from memory. The arithmetic instruction is decoded. Then, the first vector register and the second vector register are read from the register file. The present invention executes the arithmetic instruction on corresponding data elements in the first and second vector registers. The resulting element of the execution is then written into the accumulator. Then, the resulting element is transformed into an N-bit width element and written into a third register for further operation or storage in memory. The transformation of the resulting element can include, for example, rounding, clamping, and/or shifting the element.Type: ApplicationFiled: June 8, 2009Publication date: October 1, 2009Applicant: MIPS Technologies, Inc.Inventors: Timothy Van Hook, Peter Hsu, William A. Huffman, Henry P. Moreton, Earl A. Killian
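The flow of "compute wide, then transform back to N bits" can be illustrated with a short model. This is a sketch under assumptions: signed elements, a multiply as the arithmetic instruction, and a shift followed by clamping as the transformation. The name `simd_mul_clamp` and its parameters are hypothetical.

```python
def simd_mul_clamp(xs, ys, n_bits: int = 8, shift: int = 0):
    """Multiply corresponding elements, then narrow each result to n_bits.

    Python integers are unbounded, so `acc` stands in for the wide
    accumulator that holds full-precision intermediate results.
    """
    lo = -(1 << (n_bits - 1))
    hi = (1 << (n_bits - 1)) - 1
    acc = [x * y for x, y in zip(xs, ys)]
    # Shift right (arithmetic), then clamp into the signed n_bits range.
    return [max(lo, min(hi, a >> shift)) for a in acc]


print(simd_mul_clamp([100, -100, 3], [2, 2, 4]))
```

Without the wide accumulator, the products 200 and -200 would overflow an 8-bit element before any clamping could be applied.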
-
Publication number: 20090240926Abstract: A technique realizes execution of various combinations of arithmetic operations in, for example SIMD floating-point multiply-add arithmetic operation, with less instruction kind codes. An arithmetic operating apparatus includes a setting unit that sets in one or more unused bits of a single instruction extended instruction information to instruct at least one of a register and arithmetic operators to perform an extended process different from an ordinary process.Type: ApplicationFiled: March 12, 2009Publication date: September 24, 2009Applicant: Fujitsu LimitedInventor: Shigeki ITOU
-
Publication number: 20090240925Abstract: A first arithmetic unit performs a network process for transmission and reception of a message. A second arithmetic unit performs a network process and a specific process that is predetermined to be performed on the message in relation with the network process. An alternate process management table stores therein process information in which identification information is associated with an instruction sequence, the identification information being information for identifying a type of the message, the instruction sequence being a sequence for sequentially performing a network process and a specific process. The first arithmetic unit includes an identification information detector that detects the identification information from the message, and a controller that retrieves, from the alternate process management table, an instruction sequence corresponding to the identification information detected, so as to control the second arithmetic unit to perform the instruction sequence retrieved.Type: ApplicationFiled: February 17, 2009Publication date: September 24, 2009Applicant: KABUSHIKI KAISHA TOSHIBAInventors: Takeshi Ishihara, Yasuhiro Fukuju, Keisuke Mera
-
Publication number: 20090235058Abstract: A data processing device has an instruction decoder (1), a control logic unit (3), and ALU (4). The instruction decoder (1) decodes instruction codes of an arithmetic instruction. The control logic unit (3) detects the effective data width of operation data to be processed according to the decode result from the instruction decoder (1) and determines the number of cycles for the instruction execution corresponding to the effective data width. The ALU (4) executes the instruction with the number of cycles of the instruction execution determined by the control logic unit (3).Type: ApplicationFiled: May 26, 2009Publication date: September 17, 2009Applicant: Renesas Technology CorporationInventors: Sugako Ohtani, Hiroyuki Kondo
-
Publication number: 20090235049Abstract: The present invention provides a method and apparatus for QR-factorizing matrix on a multiprocessor system, wherein the multiprocessor system comprises at least one core processor and a plurality of accelerators, the method comprises the steps of: iteratively factorizing each panel in the matrix until the whole matrix is factorized; wherein in each iteration, the method comprises: partitioning an unprocessed matrix part in the matrix into a plurality of blocks according to a predetermined block size; partitioning a current processed panel in the unprocessed matrix part into at least two sub panels, wherein the current processed panel is composed of a plurality of blocks; and performing QR factorization one by one on the at least two sub panels with the plurality of accelerators, and updating the data of the sub panel(s) on which no QR factorization has been performed among the at least two sub panels by using the factorization result.Type: ApplicationFiled: March 12, 2009Publication date: September 17, 2009Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Hui Li, Bai Ling Wang
-
Patent number: 7590828Abstract: The invention relates to a processing of a data word in a plurality of processing cycles. In order to improve the efficiency of the processing, the data word is divided for each cycle into a plurality of successive data blocks. The blocks are shifted by one block from one cycle to the next. In each of the cycles, each of the successive blocks is processed in sequence. In the first cycle, the processing results for successive blocks are moreover stored in a memory at memory addresses which change uniformly from one processing result to the next. In each subsequent processing cycle, the processing results for the successive blocks of the subsequent cycle are combined with processing results stored in the memory during a preceding cycle at memory addresses which change uniformly from one processing result in the subsequent cycle to the next.Type: GrantFiled: September 8, 2004Date of Patent: September 15, 2009Assignee: Nokia CorporationInventor: Marc Hoffmann
-
Publication number: 20090225084Abstract: A data processing unit for performing a high speed drawing operation of arbitrary patterns even in individual pixels is provided. It is assumed that the color mode is set to 3 bits per pixel. A pixel “N” is written to a drawing position designated by a byte address [29:3] and a bit address [2:0]. The subsequent pixel “N+1” is written immediately after the pixel “N”. The next subsequent pixel “N+2” is written immediately after the pixel “N+1” to span the adjacent bytes. Thereby, the pixel data is continuously stored within each byte and across the boundary between bytes with no space. In this case, the operation of writing pixel data, i.e., the drawing is performed not in words or bytes but in pixels. In addition to this, high speed drawing is possible by the use of a caching system.Type: ApplicationFiled: September 12, 2008Publication date: September 10, 2009Inventors: Shuhei KATO, Koichi Usami
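The 3-bits-per-pixel packing described above, where a pixel may straddle two bytes, can be sketched as follows. The least-significant-bit-first ordering within a byte and the names `write_pixel` and `read_pixel` are assumptions for illustration; the application does not state the bit order here.

```python
def write_pixel(buf: bytearray, bit_pos: int, value: int, bpp: int = 3) -> int:
    """Write one pixel at an absolute bit position; return the next position."""
    for i in range(bpp):
        # Split the absolute bit position into a byte address and bit address.
        byte_index, bit_index = divmod(bit_pos + i, 8)
        bit = (value >> i) & 1
        cleared = buf[byte_index] & ~(1 << bit_index) & 0xFF
        buf[byte_index] = cleared | (bit << bit_index)
    return bit_pos + bpp


def read_pixel(buf: bytearray, bit_pos: int, bpp: int = 3) -> int:
    """Read one pixel back from an absolute bit position."""
    value = 0
    for i in range(bpp):
        byte_index, bit_index = divmod(bit_pos + i, 8)
        value |= ((buf[byte_index] >> bit_index) & 1) << i
    return value


buf = bytearray(2)
pos = 0
for pixel in (5, 3, 7):        # pixels N, N+1, N+2
    pos = write_pixel(buf, pos, pixel)
print([read_pixel(buf, p) for p in (0, 3, 6)])
```

The third pixel occupies bits 6 and 7 of the first byte and bit 0 of the second, so no bits are left unused at the byte boundary.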
-
Publication number: 20090228691Abstract: An arithmetic processing apparatus capable of performing an arithmetic operation for generating a condition flag commonly referred to by using a condition flag generated on an arithmetic operation unit basis in as few steps as possible is provided. The arithmetic processing apparatus, which processes multiple data in parallel based on single instruction, includes: processing elements capable of performing a common arithmetic operation based on the evaluation result of the instruction stored in the instruction register; and a condition flag arithmetic operation unit capable of performing one of the logical operation and the comparison operation on the condition flag retained in each processing element, transferring the operation result to each processing element, and updating the condition flag based on the operation result.Type: ApplicationFiled: August 24, 2005Publication date: September 10, 2009Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.Inventors: Takeshi Furuta, Hideshi Nishida, Takeshi Tanaka
-
Patent number: 7587582Abstract: A method and apparatus for efficiently performing graphic operations are provided. This is accomplished by providing a processor that supports any combination of the following instructions: parallel multiply-add, conditional pick, parallel averaging, parallel power, parallel reciprocal square root and parallel shifts.Type: GrantFiled: August 16, 2000Date of Patent: September 8, 2009Assignee: Sun Microsystems, Inc.Inventors: Subramania Sudharsanan, Jeffrey Meng Wah Chan, Michael F. Deering, Marc Tremblay, Scott R. Nelson
-
Publication number: 20090217008Abstract: Provided is a program conversion apparatus for generating a secret holding program, which disables a malicious analyzer from analyzing the original program easily.Type: ApplicationFiled: April 21, 2006Publication date: August 27, 2009Inventors: Taichi Sato, Motoji Ohmori, Rieko Asai, Yuichi Futa, Tomoyuki Haga, Masahiro Mambo
-
Publication number: 20090198973Abstract: A processing circuit according to the present invention includes a plurality of logic circuits (designated as L11, . . . , and L44) formed by arranging in arrays and is configured to input an output from a logic circuit to the logic circuit located on the following row. Each of the plurality of logic circuits includes an operation circuit (ALU) configured to perform an operation on inputted data; and a selecting unit (MUX) configured to select and output any one of an operation output from the operation circuit or an operation output from the logic circuit located on the preceding row.Type: ApplicationFiled: January 28, 2009Publication date: August 6, 2009Applicant: SANYO ELECTRIC CO., LTD.Inventors: Kazuhisa IIZUKA, Makoto OZONE
-
Publication number: 20090198974Abstract: A method for executing multiple computational primitives is provided in accordance with exemplary embodiments. A first computational unit and at least a second computational unit cooperate to execute multiple computational primitives. The first computational unit independently computes other computational primitives. By virtue of arbitration for shared source operand buses or shared result buses, availability of the first and second computational units needed to execute cooperatively the multiple computational primitives is assured by a process of reservation as used for a computational primitive executed on a dedicated computational unit.Type: ApplicationFiled: January 31, 2008Publication date: August 6, 2009Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Harry S. Barowski, J. Adam Butts, Stephen V. Kosonocky, Silvia M. Mueller, Jochen Preiss
-
Publication number: 20090187749Abstract: A bypass circuit is provided in a pipeline processor. A pipeline register is provided between an instruction execution stage and a write-back stage. The pipeline register stores a data validity flag and a WRITE control flag to control writing data into a general purpose register unit. The data retained in the pipeline register is allowed to be written back into the general purpose register unit when the WRITE control flag indicates “valid”. The pipeline register continues to retain the retained data even after the writing of the retained data into the general purpose register unit. The first pipeline register supplies the retained data to the second stage through the bypass circuit at the time of executing a subsequent instruction having data dependency on a preceding instruction.Type: ApplicationFiled: January 12, 2009Publication date: July 23, 2009Applicant: KABUSHIKI KAISHA TOSHIBAInventor: Jun TANABE
-
Publication number: 20090187746Abstract: An apparatus for processing data is provided comprising processing circuitry having permutation circuitry for performing permutation operations, a register bank having a plurality of registers for storing data and control circuitry responsive to program instructions to control the processing circuitry to perform data processing operations. The control circuitry is arranged to be responsive to a control-generating instruction to generate in dependence upon a bit-mask control signals to configure permutation circuitry for performing permutation operation on an input operand. The bit-mask identifies within the input operand the first group of data elements having a first ordering and a second group of data elements having a second ordering and the permutation operation is such that it preserves one of the first ordering and the second ordering but changes the other of the first ordering and the second ordering.Type: ApplicationFiled: December 16, 2008Publication date: July 23, 2009Applicant: ARM LIMITEDInventors: Dominic Hugo Symes, Mladen Wilder
-
Publication number: 20090182990Abstract: Embodiments of the invention provide methods and apparatus for executing multiple operand minimum or maximum instructions. Executing the multiple operand minimum or maximum instruction comprises transferring more than two operands to one or more processing lanes of a vector unit. A first compare operation may be performed in at least one processing lane of the vector unit to determine a greater or smaller of a first operand and a second operand. The greater (or smaller) operand may be transferred to a dot product unit, wherein, in a second compare operation, the transferred operand is compared to at least a third operand to determine one of the greater and smaller of the more than two operands.Type: ApplicationFiled: January 15, 2008Publication date: July 16, 2009Inventors: Adam J. Muff, Matthew R. Tubbs
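The two-stage comparison in this abstract can be sketched as follows. The function names `lane_compare` and `multi_operand_minmax` are hypothetical, and the hardware units (vector lane, dot product unit) are modeled as ordinary function calls; this is an illustration of the data flow, not the claimed implementation.

```python
def lane_compare(a, b, want_max):
    """One compare operation: return the greater or the smaller of two operands."""
    return max(a, b) if want_max else min(a, b)


def multi_operand_minmax(operands, want_max=True):
    """Reduce more than two operands to a single minimum or maximum in two stages."""
    if len(operands) < 3:
        raise ValueError("the instruction takes more than two operands")
    first, second, *rest = operands
    # Stage 1: compare the first two operands (a vector processing lane in the abstract).
    result = lane_compare(first, second, want_max)
    # Stage 2: compare the forwarded operand with the remaining operands
    # (the dot product unit in the abstract).
    for operand in rest:
        result = lane_compare(result, operand, want_max)
    return result


largest = multi_operand_minmax([3.0, 7.5, -1.0], want_max=True)
smallest = multi_operand_minmax([3.0, 7.5, -1.0], want_max=False)
```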
-
Publication number: 20090182991Abstract: A processor core includes an instruction decode unit that may dispatch a same integer instruction stream to a plurality of integer execution units operating in lock-step. The processor core also includes signature generation logic that may generate, concurrently with execution of the integer instructions, a respective signature from result signals conveyed on respective result buses in one or more pipeline stages within each of the integer execution units in response to the result signals becoming available. The processor core also includes compare logic that may detect a mismatch between signatures from each of the integer execution units. Further, in response to the compare logic detecting any mismatch, the compare logic may cause instructions causing the mismatch to be re-executed.Type: ApplicationFiled: January 10, 2008Publication date: July 16, 2009Inventor: Nhon Quach
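A minimal sketch of lock-step checking by signature is given below. The instruction format, the fault-injection parameter, and the use of CRC-32 as the signature function are all assumptions made for illustration; the abstract does not specify how signatures are computed.

```python
import zlib


def execute(stream, fault_at=None):
    """Hypothetical integer unit: runs (op, a, b) tuples and returns the results.

    fault_at optionally flips the low bit of one result to imitate a transient error.
    """
    results = []
    for i, (op, a, b) in enumerate(stream):
        r = a + b if op == "add" else a - b
        if i == fault_at:
            r ^= 1
        results.append(r)
    return results


def signature(results):
    """Fold the result values into a running CRC-32 (an assumed signature function)."""
    sig = 0
    for r in results:
        sig = zlib.crc32(r.to_bytes(8, "little", signed=True), sig)
    return sig


def lockstep(stream, fault_at=None):
    """Run the same stream on two units; re-execute if the signatures differ."""
    results_a = execute(stream)
    results_b = execute(stream, fault_at)
    mismatch = signature(results_a) != signature(results_b)
    if mismatch:
        results_a = execute(stream)  # re-execution after a detected mismatch
    return results_a, mismatch


stream = [("add", 1, 2), ("sub", 5, 9)]
clean = lockstep(stream)
faulty = lockstep(stream, fault_at=1)
```

Comparing compact signatures instead of every result keeps the comparison cheap while still flagging a disagreement between the units.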
-
Patent number: 7555635Abstract: A data processing device has an instruction decoder (1), a control logic unit (3), and ALU (4). The instruction decoder (1) decodes instruction codes of an arithmetic instruction. The control logic unit (3) detects the effective data width of operation data to be processed according to the decode result from the instruction decoder (1) and determines the number of cycles for the instruction execution corresponding to the effective data width. The ALU (4) executes the instruction with the number of cycles of the instruction execution determined by the control logic unit (3).Type: GrantFiled: July 26, 2007Date of Patent: June 30, 2009Assignee: Renesas Technology Corp.Inventors: Sugako Ohtani, Hiroyuki Kondo
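The idea of deriving the cycle count from the effective data width can be sketched as below. The sketch assumes unsigned operands, a 32-bit word, and an ALU that processes 8 bits per cycle; these figures and the function names are assumptions for illustration and do not come from the patent.

```python
def effective_width(value, word_bits=32):
    """Number of significant bits in an unsigned operand (at least 1)."""
    value &= (1 << word_bits) - 1
    return max(1, value.bit_length())


def cycles_for(a, b, bits_per_cycle=8, word_bits=32):
    """Cycles needed when the ALU handles bits_per_cycle bits per cycle (assumed)."""
    width = max(effective_width(a, word_bits), effective_width(b, word_bits))
    return -(-width // bits_per_cycle)  # ceiling division


narrow = cycles_for(3, 200)          # both operands fit in 8 bits
medium = cycles_for(0x1234, 1)       # 13 significant bits
wide = cycles_for(0xFFFFFFFF, 1)     # full 32-bit operand
```

Under these assumptions, narrow operands finish in fewer cycles than full-width ones.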
-
Patent number: 7554464Abstract: A method and system allow for fast compression and decompression of data using existing repetitive interleaved patterns within scientific data (floating point, integer, and image). An advantage of the method and system is that it is so fast that it can be used to save time due to a lower amount of data transferred/stored in scenarios like network transfer, disk or memory storage, cache storage or any other real-time applications where time plays a crucial role.Type: GrantFiled: September 30, 2004Date of Patent: June 30, 2009Assignee: Gear Six, Inc.Inventor: Matthias Oberdorfer
-
Publication number: 20090150654Abstract: A functional unit is added to a graphics processor to provide direct support for double-precision arithmetic, in addition to the single-precision functional units used for rendering. The double-precision functional unit can execute a number of different operations, including fused multiply-add, on double-precision inputs using data paths and/or logic circuits that are at least double-precision width. The double-precision and single-precision functional units can be controlled by a shared instruction issue circuit, and the number of copies of the double-precision functional unit included in a core can be less than the number of copies of the single-precision functional units, thereby reducing the effect of adding support for double-precision on chip area.Type: ApplicationFiled: December 7, 2007Publication date: June 11, 2009Applicant: NVIDIA CORPORATIONInventors: Stuart Oberman, Ming Y. Siu, David C. Tannenbaum
-
Publication number: 20090150651Abstract: Disclosed herein is a semiconductor chip including: a plurality of processing devices that can communicate with each other; wherein each of the processing devices includes an arithmetic unit, an individual memory connected to the arithmetic unit on a one-to-one basis, and a control unit configured to independently control turning on and off of operation of the arithmetic unit and the individual memory.Type: ApplicationFiled: November 17, 2008Publication date: June 11, 2009Applicant: Sony CorporationInventor: Mutsuhiro Ohmori
-
Patent number: 7546443Abstract: The present invention provides extended precision in SIMD arithmetic operations in a processor having a register file and an accumulator. A first set of data elements and a second set of data elements are loaded into first and second vector registers, respectively. Each data element comprises N bits. Next, an arithmetic instruction is fetched from memory. The arithmetic instruction is decoded. Then, the first vector register and the second vector register are read from the register file. The present invention executes the arithmetic instruction on corresponding data elements in the first and second vector registers. The resulting element of the execution is then written into the accumulator. Then, the resulting element is transformed into an N-bit width element and written into a third register for further operation or storage in memory. The transformation of the resulting element can include, for example, rounding, clamping, and/or shifting the element.Type: GrantFiled: January 24, 2006Date of Patent: June 9, 2009Assignee: MIPS Technologies, Inc.Inventors: Timothy J. Van Hook, Peter Hsu, William A. Huffman, Henry P. Moreton, Earl A. Killian
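The accumulate-then-narrow flow in this abstract can be sketched as follows. The sketch assumes signed 16-bit elements (N = 16), element-wise multiplication as the arithmetic instruction, and round-half-up before an arithmetic right shift; Python's unbounded integers stand in for the wide accumulator. All names and these specific choices are assumptions for illustration.

```python
N = 16  # assumed element width in bits


def clamp_signed(x, bits=N):
    """Saturate x to the signed range representable in the given number of bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, x))


def vector_multiply(va, vb):
    """Element-wise multiply; full-precision products go into the accumulator."""
    return [a * b for a, b in zip(va, vb)]


def read_accumulator(acc, shift=0, rounding=True, bits=N):
    """Transform accumulator elements to N-bit elements by rounding, shifting, clamping."""
    out = []
    for x in acc:
        if rounding and shift > 0:
            x += 1 << (shift - 1)  # add half of the discarded range before shifting
        x >>= shift                # arithmetic shift (floors toward negative infinity)
        out.append(clamp_signed(x, bits))
    return out


acc = vector_multiply([30000, -30000, 100], [2, 2, 3])
clamped = read_accumulator(acc)            # saturates the out-of-range products
scaled = read_accumulator(acc, shift=1)    # rounds and halves before clamping
```

Keeping the full-width products in the accumulator means precision is lost only once, at the final transformation to N bits.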
-
Publication number: 20090144527Abstract: The present invention provides a stream processing apparatus capable of improving the processing performance in the case of continuously processing a plurality of data streams. A control stream, different from a data stream, is prepared, and a program and a parameter are updated in advance in accordance with the control stream. Double buffer areas are prepared in a memory of the stream processing apparatus into which the program and the parameter are stored. The location of the data stream to be input is written in the control stream, and buffers for reading the data stream are multiplexed so as to read in advance the top portion of the data stream to be processed next.Type: ApplicationFiled: November 28, 2008Publication date: June 4, 2009Inventors: Hiroaki NAKATA, Takafumi YUASA, Fumitaka IZUHARA, Kazushi AKIE, Motoki KIMURA
-
Patent number: 7533249Abstract: In order to reuse configuration information in a dynamic reconfiguration arithmetic circuit, data lines, address lines, a mask register and the like are required as hardware resources for rewriting only configuration information of dynamic reconfiguration arithmetic cells needed to be changed. However, this results in an increase in area of the arithmetic circuit. According to the present invention, a shift register is the only hardware resource in the dynamic reconfiguration arithmetic block for changing the configuration information. The shift register is structured by connecting in series storage units corresponding one-to-one with each arithmetic cell. An output from the end terminal of the shift register and an output of the configuration information storage unit are input to the configuration information selector, and an output of the configuration information selector is connected to the front of the shift register. The cell address counter counts up from 0 and increments one at a time.Type: GrantFiled: October 23, 2007Date of Patent: May 12, 2009Assignee: Panasonic CorporationInventor: Masaki Maeda