Arithmetic Operation Instruction Processing Patents (Class 712/221)
-
Publication number: 20130166890Abstract: An arrangement of at least two arithmetic logic units carries out an operation defined by a decoded instruction including at least one operand and more than one operation code. The operation codes and at least one operand are received and corresponding executions are performed by the arithmetic logic units on a single clock cycle. The result of the execution from one arithmetic logic unit is used as an operand by a further arithmetic logic unit. The decoding of the instruction is performed in an immediately preceding single clock cycle.Type: ApplicationFiled: February 11, 2013Publication date: June 27, 2013Applicant: STMicroelectronics (Research & Development) LimitedInventor: STMicroelectronics (Research & Development) Limited
-
Patent number: 8473721Abstract: Disclosed herein is a processing unit configured to process video data, and applications thereof. In an embodiment, the processing unit includes a buffer and an execution unit. The buffer is configured to store a data word, wherein the data word comprises a plurality of bytes of video data. The execution unit is configured to execute a single instruction to (i) shift bytes of video data contained in the data word to align a desired byte of video data and (ii) process the desired byte of the video data to provide processed video data.Type: GrantFiled: April 16, 2010Date of Patent: June 25, 2013Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Michael J. Mantor, Jeffrey T. Brady, Christopher L. Spencer, Daniel W. Wong, Andrew E. Gruber
-
Patent number: 8473719Abstract: New instruction definitions for a packet add (PADD) operation and for a single instruction multiple add (SMAD) operation are disclosed. In addition, a new dedicated PADD logic device that performs the PADD operation in about one to two processor clock cycles is disclosed. Also, a new dedicated SMAD logic device that performs a single instruction multiple data add (SMAD) operation in about one to two clock cycles is disclosed.Type: GrantFiled: October 31, 2006Date of Patent: June 25, 2013Assignee: Intel CorporationInventors: Corey Gee, Bapiraju Vinnakota, Saleem Mohammadali, Carl A. Alberola
-
Patent number: 8468326Abstract: A hardware module configured to perform single instructions faster than is possible in software running on the microprocessor. In one implementation, the hardware module is configured to perform a single count instruction, including - counting a number of “ones” contained in a first register; and storing, in a second register, the count of the number of “ones” contained in the first register.Type: GrantFiled: August 3, 2009Date of Patent: June 18, 2013Assignee: Marvell International Ltd.Inventors: Jack Kang, Jianwei Bei, Shanker Rao Donthineni, Manish Kumar, Victor Lin, Justin Lau
-
Publication number: 20130151820Abstract: A method and apparatus are described for processing data during an execution pipeline cycle of a processor. Valid bits of the data are generated according to a designated data size. Each of the valid bits is inserted into at least one of a plurality of bit positions. The valid bits are rotated in a predetermined direction (i.e., left or right rotation) by a designated number of bit positions. Valid bits are removed from a portion of the plurality of bit positions after being rotated. Zeros or most significant bits (MSBs) of the data may be inserted in the bit positions from which the valid bits were removed. The number of bit positions to rotate the valid bits by may be designated by a first bit subset and a second bit subset. The first bit subset may indicate a number of bytes, and the second bit subset may indicate a number of bits.Type: ApplicationFiled: December 9, 2011Publication date: June 13, 2013Applicant: Advanced Micro Devices, Inc.Inventors: Srikanth Arekapudi, Saurabh Gupta
-
Patent number: 8447988Abstract: In certain embodiments, a digital signal processor (DSP) has multiple arithmetic logic units and a register module. The DSP is adapted to generate a message digest H from a message M in accordance with the SHA-1 standard, where M includes N blocks M(i), i=1, . . . , N, and the processing of each block M(i) includes t iterations of processing words of message schedule {Wt}. In each iteration possible, the DSP uses free operations to precalculate Wt and working variable values for use in the next iteration. In addition, in each iteration possible, the DSP rotates the registers associated with particular working variables to reduce operations that merely copy unchanged values from one register to another.Type: GrantFiled: September 16, 2009Date of Patent: May 21, 2013Assignee: LSI CorporationInventors: Dmitriy Vladimirovich Alekseev, Alexei Vladimirovich Galatenko, Ilya Viktorovich Lyalin, Alexander Markovic, Denis Vassilevich Parfenov
-
Patent number: 8429384Abstract: An architecture for a digital signal processor alleviates the difficulties and complexities normally associated with writing and optimizing programs to avoid stalls during which one instruction awaits the result of a prior instruction. The architecture coordinates the processing of data for multiple instructions through a multiple stage data pipeline. As a result, the architecture not only supports simultaneous execution of multiple programs, but also permits each program to execute without delays caused by inter-relationships between instructions within the program.Type: GrantFiled: November 15, 2006Date of Patent: April 23, 2013Assignee: Harman International Industries, IncorporatedInventors: James D. Pennock, Ronald Baker, Brian R. Parker, Christopher Belcher
-
Patent number: 8423748Abstract: A register control circuit that controls a register specified by an inputted address includes a signal output that outputs a first control signal and a second control signal based on the inputted address, a selector that selects data of a register specified by the first control signal outputted from the signal output, a logical operator that performs a logical operation of write data outputted from a processor and the data selected by the selector to output an operation result, and a storage that stores data in the register specified by the first control signal by selecting one of the write data and the operation results as the data based on the second control signal outputted from the signal output.Type: GrantFiled: July 27, 2009Date of Patent: April 16, 2013Assignee: Fujitsu LimitedInventor: Yuusuke Ashizuka
-
Publication number: 20130086366Abstract: An apparatus includes a register file including a logical circuit. The register file is configured to perform one or more logical operations in conjunction with the logical circuit. The logical operation is performed in response to the register file receiving a register file control instruction. The register file control instruction is independent from an arithmetic logic unit (ALU) control instruction and a multiply-and-accumulate unit (MACU) control instruction.Type: ApplicationFiled: September 30, 2011Publication date: April 4, 2013Applicant: QUALCOMM INCORPORATEDInventor: Aaron D. Lamb
-
Publication number: 20130080740Abstract: In one embodiment, a microprocessor includes fetch logic for retrieving an instruction, decode logic configured to identify an arithmetic operation specified in the instruction, and execution logic configured to receive operands specified by the instruction. The execution logic includes a primary logic path configured to perform the arithmetic operation on such operands and a secondary parallel logic path configured to output metadata associated with the result of the arithmetic operation.Type: ApplicationFiled: September 28, 2011Publication date: March 28, 2013Applicant: NVIDIA CORPORATIONInventors: Peter Gentle, Scott Pitkethly
-
Patent number: 8407273Abstract: A processor or other device, such as a programmable and/or massively parallel processor or other device, includes processing elements designed to perform arithmetic operations (possibly but not necessarily including, for example, one or more of addition, multiplication, subtraction, and division) on numerical values of low precision but high dynamic range (“LPHDR arithmetic”). Such a processor or other device may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor or other device in certain embodiments of the present invention significantly exceeds (e.g., by at least 20 more than three times) the number of arithmetic elements, if any, in the processor or other device which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).Type: GrantFiled: February 17, 2012Date of Patent: March 26, 2013Assignee: Singular Computing LLCInventor: Joseph Bates
-
Patent number: 8402254Abstract: The RISC data processor is based on the idea that in case that there are many flag-generating instructions, the number of flags generated by each instruction is increased so that a decrease of flag-generating instructions exceeds an increase of flag-using instructions in quantity, thereby achieving the decrease in instructions. With the data processor, an instruction for generating flags according to operands' data sizes is defined. To an instruction set handled by the RISC data processor, an instruction capable of executing an operation on operand in more than one data size, which performs a process identical to an operation process conducted on the small-size operand on low-order bits of the large-size operand, and generates flags capable of coping with the respective data sizes regardless of the data size of each operand subjected to the operation is added. Thus, the reduction in instruction code space of the RISC data processor tight in instruction code space can be achieved.Type: GrantFiled: February 11, 2009Date of Patent: March 19, 2013Assignee: Renesas Electronics CorporationInventor: Fumio Arakawa
-
Patent number: 8375197Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: performing, for each node, a local reduction operation using allreduce contribution data for the cores of that node, yielding, for each node, a local reduction result for one or more representative cores for that node; establishing one or more logical rings among the nodes, each logical ring including only one of the representative cores from each node; performing, for each logical ring, a global allreduce operation using the local reduction result for the representative cores included in that logical ring, yielding a global allreduce result for each representative core included in that logical ring; and performing, for each node, a local broadcast operation using the global allreduce results for each representative core on that node.Type: GrantFiled: May 21, 2008Date of Patent: February 12, 2013Assignee: International Business Machines CorporationInventor: Ahmad Faraj
-
Publication number: 20130024668Abstract: An architecture includes a controller. The controller is configured to receive a microprogram. The microprogram is configured for performing at least one of hierarchical or a sequence of polynomial computations. The architecture also includes an arithmetic logic unit (ALU) communicably coupled to the controller. The ALU is controlled by the controller. Additionally, the microprogram is compiled prior to execution by the controller, the microprogram is compiled into a plurality of binary tables, and the microprogram is programmed in a command language in which each command includes a first portion for indicating at least one of a command or data transferred to the ALU, and a second portion for including a control command to the controller. The architecture and implementation of the programmable controller may be for cryptographic applications, including those related to public key cryptography.Type: ApplicationFiled: September 27, 2012Publication date: January 24, 2013Applicant: LSI CORPORATIONInventor: LSI Corporation
-
Publication number: 20130024667Abstract: An attribute group storage unit acquires and holds attribute groups set to respective data blocks. A scenario determination unit determines respective transfer systems of the respective blocks between a memory of the lowest hierarchy and a memory of another hierarchy based on those attribute groups and a configuration of an arithmetic unit which is the parallel processor, and controls the transfer of the respective data blocks according to the determined transfer systems, and the parallel arithmetic operation corresponding to the transfer. Each of the attribute groups is necessary to determine the transfer systems, and includes one or more attributes not depending on the configuration of the parallel processor. The attribute groups of the write blocks are set assuming that each of the write blocks has already been located in the memory of another hierarchy, and is transferred to the memory of the lowest hierarchy.Type: ApplicationFiled: June 21, 2012Publication date: January 24, 2013Applicant: Renesas Electronics CorporationInventor: Shorin KYO
-
Publication number: 20130013895Abstract: An exemplary byte-oriented microcontroller includes a program memory, a program memory bus, and a core circuit. The program memory bus has a bus width wider than one instruction byte, and the core circuit is coupled to the program memory through the program memory bus for executing at least one instruction by processing a plurality of instruction bytes fetched from the program memory. The core circuit includes a fetch unit, for fetching the instruction bytes through the program memory bus and re-ordering the fetched instruction bytes to form a complete instruction.Type: ApplicationFiled: July 6, 2011Publication date: January 10, 2013Applicant: FS-SEMI CO., LTD.Inventor: Hsiao-Ming Huang
-
Methods and Apparatus for Efficient Complex Long Multiplication and Covariance Matrix Implementation
Publication number: 20130007421Abstract: Efficient computation of complex long multiplication results and an efficient calculation of a covariance matrix are described. A parallel array VLIW digital signal processor is employed along with specialized complex long multiplication instructions and communication operations between the processing elements which are overlapped with computation to provide very high performance operation. Successive iterations of a loop of tightly packed VLIWs may be used allowing the complex multiplication pipeline hardware to be efficiently used.Type: ApplicationFiled: September 13, 2012Publication date: January 3, 2013Applicant: ALTERA CORPORATIONInventors: Gerald G. Pechanek, Ricardo Rodriguez, Matthew Plonski, David Strube, Kevin Coopman -
Publication number: 20120311305Abstract: Provided is an information processing device including an instruction cache, a data cache, first and second arithmetic unit groups including a plurality of arithmetic units capable of parallel operation, a first arithmetic-control circuit that generates one or more operation instructions for the first arithmetic unit group, and a second arithmetic-control circuit that generates one or more operation instructions for the second arithmetic unit group based on an instruction code of a fixed instruction register. The first arithmetic unit group sets the instruction code to the fixed instruction register according to an operation instruction generated based on a first specific instruction code by the first arithmetic-control circuit, and provides data to the second arithmetic unit group according to an operation instruction generated based on a second specific instruction code by the first arithmetic-control circuit.Type: ApplicationFiled: May 29, 2012Publication date: December 6, 2012Inventors: Yuki KOBAYASHI, Shohei NOMOTO
-
Patent number: 8311683Abstract: Illustrative embodiments provide a computer implemented method, a data processing system, and a computer program product for adjusting cooling settings. The computer implemented method comprises analyzing a set of instructions of an application to determine a number of degrees by which a set of instructions will raise a temperature of at least one processor core. The computer implemented method further calculates a cooling setting for at least one cooling system for the at least one processor core. The computer implemented method adjusts the at least one cooling system based on the cooling setting. The step of analyzing the set of instructions is performed before the set of instructions is executed on the at least one processor core. The step of adjusting the at least one cooling system is performed before the set of instructions is executed on the at least one processor core.Type: GrantFiled: April 29, 2009Date of Patent: November 13, 2012Assignee: International Business Machines CorporationInventors: Robert Lee Angell, David Wayne Cosby, Robert R. Friedlander, James R. Kraemer
-
Patent number: 8312442Abstract: A computing system has an amount of shared cache, and performs runtime automatic parallelization wherein when a parallelized loop is encountered, a main thread shares the workload with at least one other non-main thread. A method for providing interprocedural prefetching includes compiling source code to produce compiled code having a main thread including a parallelized loop. Prior to the parallelized loop in the main thread, the main thread includes prefetching instructions for the at least one other non-main thread that shares the workload of the parallelized loop. As a result, the main thread prefetches data into the shared cache for use by the at least one other non-main thread.Type: GrantFiled: December 10, 2008Date of Patent: November 13, 2012Assignee: Oracle America, Inc.Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
-
Patent number: 8307196Abstract: A method for operating a data processing system is provided. The method includes providing a first operand stored in a first register, providing a second operand stored in the register, providing a third operand stored in the register. The method further includes executing a first instruction, where executing the first instruction comprises: (1) retrieving the first operand, the second operand, and the third operand from the first register; (2) performing an operation using the first operand, the second operand, and the third operand to generate a bit exact result.Type: GrantFiled: April 5, 2006Date of Patent: November 6, 2012Assignee: Freescale Semiconductor, Inc.Inventors: William C. Moyer, Imran Ahmed, Dan Tamir
-
Patent number: 8301869Abstract: A programmable controller which executes a plurality of independent sequence programs in parallel is provided with an ASIC, including a plurality of arithmetic-logic units and a plurality of arbitration circuits, and MPUs as many as the arbitration circuits. The entire execution time of the programmable controller is shortened by changing combinations (groups of arithmetic-logic units) of the MPUs (and the arbitration circuits as many as the MPUs) and the arithmetic-logic units, based on the ratios of MPU execution instructions and ASIC execution instructions included in those instructions which constitute the programs to be executed in parallel.Type: GrantFiled: February 18, 2011Date of Patent: October 30, 2012Assignee: Fanuc CorporationInventors: Masuo Kokura, Yasushi Nomoto, Motoyoshi Miyachi
-
Patent number: 8296350Abstract: The present invention provides a method and apparatus for QR-factorizing matrix on a multiprocessor system, wherein the multiprocessor system comprises at least one core processor and a plurality of accelerators, the method comprises the steps of: iteratively factorizing each panel in the matrix until the whole matrix is factorized; wherein in each iteration, the method comprises: partitioning an unprocessed matrix part in the matrix into a plurality of blocks according to a predetermined block size; partitioning a current processed panel in the unprocessed matrix part into at least two sub panels, wherein the current processed panel is composed of a plurality of blocks; and performing QR factorization one by one on the at least two sub panels with the plurality of accelerators, and updating the data of the sub panel(s) on which no QR factorization has been performed among the at least two sub panels by using the factorization result.Type: GrantFiled: March 12, 2009Date of Patent: October 23, 2012Assignee: International Business Machines CorporationInventors: Hui Li, Bai Ling Wang
-
Publication number: 20120260073Abstract: A microprocessor includes processor modes comprising a user mode and a plurality of exception modes. An execution unit performs arithmetic operations on operands specified by program instructions. A first set of storage elements holds a first subset of the operands and provides them to the execution unit coupled thereto. A second set of storage elements associated with each of the modes hold a second subset of the operands and are incapable of directly providing the second operand subset to the execution unit. To enter a new mode from a current mode, logic saves the first operand subset held in the first set of storage elements to the second set of storage elements associated with the current mode and restores to the first set of storage elements the second operand subset held in the second set of storage elements associated with the new mode.Type: ApplicationFiled: March 6, 2012Publication date: October 11, 2012Applicant: VIA TECHNOLOGIES, INC.Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker
-
Publication number: 20120254585Abstract: Methods and apparatus for double precision division/inversion vector computations on Single Instruction Multiple Data (SIMD) computing platforms are described. In one embodiment, an input argument is represented by an exponent portion and a fraction portion. These portions are scaled, inverted, and multiplied to generate an inverse version of the input argument. In an embodiment, the inversion of the exponent portion may be done by changing the sign of the exponent. Other embodiments are also described.Type: ApplicationFiled: December 25, 2009Publication date: October 4, 2012Inventors: Andrey Kolesov, Valery Kuriakin, Maria Guseva
-
Patent number: 8276116Abstract: An algebra operation method includes the steps of converting algebra operations for a plurality of objects which appear in a program into an algebra operation sequence object described using object access data used to access the plurality of objects and object state data used to store states associated with the plurality of objects without immediately evaluating the algebra operations, determining a function to be applied to the algebra operation sequence object, and evaluating the algebra operations by executing the function by designating an argument group required for the function in response to a call of a substitute operator.Type: GrantFiled: June 7, 2007Date of Patent: September 25, 2012Assignee: Canon Kabushiki KaishaInventor: Yasuhiro Nakahara
-
Patent number: 8271767Abstract: An HW arithmetic unit executes a predetermined arithmetic operation. An arithmetic-mode determining unit determines, based on an attribute or a content of data relating to processing that has requested the arithmetic operation, either a synchronous mode that executes the processing after waiting for completion of the arithmetic operation by an arithmetic circuit or an asynchronous mode that executes the processing without waiting for completion of the arithmetic operation by the arithmetic circuit, as an execution mode of the arithmetic operation. An arithmetic-process control unit controls the arithmetic operation by the arithmetic circuit according to the determined execution mode.Type: GrantFiled: June 27, 2008Date of Patent: September 18, 2012Assignee: Kabushiki Kaisha ToshibaInventors: Keisuke Mera, Takeshi Ishihara, Yasuhiro Fukuju
-
Patent number: 8250337Abstract: General purpose array processing techniques including processing methods and apparatus. Processors may include parallel processing paths designed with reusable computational components such as multipliers, multiplexers, and ALUs. Flow of data through the paths and operations performed may be controlled based on opcodes. Processors may be shared, scalable, and configured to perform matrix operations. In particular, such operation may be useful for physical sections of MIMO-OFDM communication systems.Type: GrantFiled: April 27, 2007Date of Patent: August 21, 2012Assignee: Qualcomm IncorporatedInventor: Garret Webster Shih
-
Publication number: 20120210101Abstract: A competition testing apparatus for testing an access competition of an arithmetic unit includes a memory that stores a program, a first processor that executes the program by accessing the memory, a second processor that executes the program by accessing the memory, and an arbitration unit that arbitrates accessing the first processor and the second processor and reports a result of the arbitration upon the first processor and the second processor accessing the same address space in the memory, wherein the memory stores a odd number of programs, further comprises a controller that controls the first processor to process the plurality of test programs stored in the storage in predetermined order, and controls the second processor to process the plurality of test programs stored in the storage in order reverse to the predetermined order, and a recording unit that records the result of arbitration performed using the arbitrator.Type: ApplicationFiled: August 12, 2011Publication date: August 16, 2012Applicant: FUJITSU LIMITEDInventor: Yasushi ASANO
-
Publication number: 20120204012Abstract: A method includes providing a data processor having an instruction pipeline, where the instruction pipeline has a plurality of instruction pipeline stages, and where the plurality of instruction pipeline stages includes a first instruction pipeline stage and a second instruction pipeline stage. The method further includes providing a data processor instruction that causes the data processor to perform a first set of computational operations during execution of the data processor instruction, performing the first set of computational operations in the first instruction pipeline stage if the data processor instruction is being executed and a first mode has been selected, and performing the first set of computational operations in the second instruction pipeline stage if the data processor instruction is being executed and a second mode has been selected.Type: ApplicationFiled: April 13, 2012Publication date: August 9, 2012Applicant: Rambus Inc.Inventors: William C. Moyer, Jeffrey W. Scott
-
Publication number: 20120198211Abstract: There is a need for providing a battery-less integrated circuit (IC) card capable of operating in accordance with a contact usage or a non-contact usage, preventing coprocessor throughput from degrading despite a decreased clock frequency for reduced power consumption under non-contact usage, and ensuring high-speed processing under non-contact usage. A dual interface card is a battery-less IC card capable of operating in accordance with a contact usage or a non-contact usage. The dual interface card operates at a high clock under contact usage and at a low clock under non-contact usage. A targeted operation comprises a plurality of different basic operations. The dual interface card comprises a basic arithmetic circuit group. Under the contact usage, the basic arithmetic circuit group performs one basic operation of the targeted operation at one cycle. Under the non-contact usage, the basic arithmetic circuit group sequentially performs at least two basic operations of the targeted operation at one cycle.Type: ApplicationFiled: April 10, 2012Publication date: August 2, 2012Applicant: Renesas Electronics CorporationInventors: Daisuke SUZUKI, Minoru Saeki, Yuichiro Nariyoshi
-
Publication number: 20120198212Abstract: A microprocessor, a method for enhanced precision sum-of-products calculation and a video decoding device are provided, in which at least one general-purpose-register is arranged to provide a number of destination bits to a multiply unit, and a control unit is adapted to provide at least a multiply-high instruction and a multiply-high-and-accumulate instruction to the multiply unit. The multiply unit is arranged to receive at least first and second source operands having an associated number of source bits, a sum of source bits exceeding the number of destination bits, connected to a register-extension cache comprising at least one cache entry arranged to store a number of precision-enhancement bits, and adapted to store a destination portion of a result operand in the general-purpose-register and a precision enhancement portion in the cache entry. The result operand is generated by a multiply-high operation or by a multiply-high-and-accumulate operation, depending on the received instructions.Type: ApplicationFiled: November 30, 2009Publication date: August 2, 2012Inventor: Martin Raubuch
-
Patent number: 8219783Abstract: An SIMD type microprocessor is disclosed. The SIMD type microprocessor includes plural PEs (processor elements) each of which provides an ALU (arithmetic and logic unit) for lower-order bits, an ALU for upper-order bits, a control circuit for lower-order bits, a control circuit for upper-order bits, a range determining circuit for lower-order bits, and a range determining circuit for upper-order bits. The SIMD type microprocessor further includes a global processor, a range designation bus for lower-order bits which connects the global processor to the range determining circuit for lower-order bits, and a range designation bus for upper-order bits which connects the global processor to the range determining circuit for upper-order bits.Type: GrantFiled: July 3, 2008Date of Patent: July 10, 2012Assignee: Ricoh Company, Ltd.Inventor: Kazuhiko Hara
-
Patent number: 8214626Abstract: Method, apparatus, and program means for shuffling data. The method of one embodiment comprises receiving a first operand having a set of L data elements and a second operand having a set of L control elements. For each control element, data from a first operand data element designated by the individual control element is shuffled to an associated resultant data element position if its flush to zero field is not set and a zero is placed into the associated resultant data element position if its flush to zero field is not set.Type: GrantFiled: March 31, 2009Date of Patent: July 3, 2012Assignee: Intel CorporationInventors: William W. Macy, Jr., Eric L. Debes, Patrice L. Roussel, Huy V. Nguyen
-
Publication number: 20120166773Abstract: In certain embodiments, a digital signal processor (DSP) has multiple arithmetic logic units and a register module. The DSP is adapted to generate a message digest H from a message M in accordance with the SHA-1 standard, where M includes N blocks M(i), i=1, . . . , N, and the processing of each block M(i) includes t iterations of processing words of message schedule {Wt}. In each iteration possible, the DSP uses free operations to precalculate Wt and working variable values for use in the next iteration. In addition, in each iteration possible, the DSP rotates the registers associated with particular working variables to reduce operations that merely copy unchanged values from one register to another.Type: ApplicationFiled: September 16, 2009Publication date: June 28, 2012Applicant: LSI CORPORATIONInventors: Dmitriy Vladimirovich Alekseev, Alexei Vladimirovich Galatenko, Ilya Viktorovich Lyalin, Alexander Markovic, Denis Vassilevich Parfenov
-
Patent number: 8200948Abstract: An apparatus and method are provided for performing re-arrangement operations on data. The data processing apparatus has a register data store with a plurality of registers for storing data, and processing logic for performing a sequence of operations on data including at least one re-arrangement operation. The processing logic has scalar processing logic for performing scalar operations and SIMD processing logic for performing SIMD operations. The SIMD processing logic is responsive to a re-arrangement instruction specifying a family of re-arrangement operations to perform a selected re-arrangement operation from that family on a plurality of data elements constituted by data in one or more registers identified by the re-arrangement instruction. The selected re-arrangement operation is dependent on at least one parameter provided by the scalar processing logic, that parameter identifying a data element width for the data elements on which the selected re-arrangement operation is performed.Type: GrantFiled: December 4, 2007Date of Patent: June 12, 2012Assignee: ARM LimitedInventors: Daniel Kershaw, Dominic Hugo Symes, Alastair Reid
-
Patent number: 8190669Abstract: Multipurpose arithmetic functional units can perform planar attribute interpolation and unary function approximation operations. In one embodiment, planar interpolation operations for coordinates (x, y) are executed by computing A*x+B*y+C, and unary function approximation operations for operand x are executed by computing F2(xb)*xh2+F1(xb)*xh+F0(xb), where xh=x?xb. Shared multiplier and adder circuits are advantageously used to implement the product and sum operations for both classes of operations.Type: GrantFiled: October 20, 2004Date of Patent: May 29, 2012Assignee: NVIDIA CorporationInventors: Stuart F. Oberman, Ming Y. Siu
-
Patent number: 8185723Abstract: A method is presented including decomposing a first value into many parts. Decomposing includes shifting (310) a rounded integer portion of the first value to generate a second value. Generating (320) a third value. Extracting (330) a plurality of significand bits from the second value to generate a fourth value. Extracting (340) a portion of bits from the fourth value to generate an integer component. Generating (350) a fifth value. Also the third value, the fifth value, and the integer component are either stored (360, 380) in a memory or transmitted to an arithmetic logical unit (ALU).Type: GrantFiled: July 13, 2001Date of Patent: May 22, 2012Assignee: Intel CorporationInventors: Robert S. Norin, Olga Dryzhakova, Alexander Isaev, Andrey Naraikin
-
Patent number: 8185721Abstract: A system including a dual function adder is described. In one embodiment, the system includes an adder. The adder is configured for a first instruction to determine an address for a hardware prefetch if the first instruction is a hardware prefetch instruction. The adder is further configured for the first instruction to determine a value from an arithmetic operation if the first instruction is an arithmetic operation instruction.Type: GrantFiled: March 4, 2008Date of Patent: May 22, 2012Assignee: QUALCOMM IncorporatedInventors: Ajay Anant Ingle, Erich James Plondke, Lucian Codrescu
-
Publication number: 20120117441Abstract: A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.Type: ApplicationFiled: January 19, 2012Publication date: May 10, 2012Applicant: MicroUnity Systems Engineering, Inc.Inventors: Craig Hansen, John Moussouris, Alexia Massalin
-
Patent number: 8161268Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.Type: GrantFiled: May 21, 2008Date of Patent: April 17, 2012Assignee: International Business Machines CorporationInventor: Ahmad Faraj
-
Publication number: 20120083912Abstract: An arithmetic logic and shifting device is disclosed and includes an arithmetic logic unit that has a first input to receive a first operand from a first register port, a second input to receive a second operand from a second register port, and an output to selectively provide a memory address to a memory unit in a first mode of operation and to selectively provide an arithmetic output in a second mode of operation. Further, the arithmetic logic and shifting device includes a programmable shifter device that has a first input to receive data from the memory unit, a second input to receive the arithmetic output, a third input to receive an operation code of a computer execution instruction, and a shifted output to provide shifted data.Type: ApplicationFiled: December 8, 2011Publication date: April 5, 2012Applicant: QUALCOMM INCORPORATEDInventors: Muhammad Ahmed, Ajay Anant Ingle, Sujat Jamil
-
Patent number: 8150902Abstract: A processor or other device, such as a programmable and/or massively parallel processor or other device, includes processing elements designed to perform arithmetic operations (possibly but not necessarily including, for example, one or more of addition, multiplication, subtraction, and division) on numerical values of low precision but high dynamic range (“LPHDR arithmetic”). Such a processor or other device may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor or other device in certain embodiments of the present invention significantly exceeds (e.g., by at least 20 more than three times) the number of arithmetic elements, if any, in the processor or other device which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).Type: GrantFiled: June 15, 2010Date of Patent: April 3, 2012Assignee: Singular Computing LLCInventor: Joseph Bates
-
Publication number: 20120079250Abstract: A semiconductor chip is described having a functional unit that can execute a first instruction and execute a second instruction. The first instruction is an instruction that multiplies two operands. The second instruction is an instruction that approximates a function according to C0+C1X2+C2X22. The functional unit has a multiplier circuit. The multiplier circuit has: i) a first input to receive bits of a first operand of the first instruction and receive bits of a C1 term of the second instruction; ii) a second input to receive bits of a second operand of the first instruction and receive bits of a X2 term of the second instruction.Type: ApplicationFiled: September 24, 2010Publication date: March 29, 2012Inventors: Alex Pineiro, Thomas E. Fletcher, Brian J. Hickmann
-
Publication number: 20120079243Abstract: A graphics processing unit core 26 includes a plurality of processing pipelines 38, 40, 42, 44. A program instruction of a thread of program instructions being executed by a processing pipeline includes a next-instruction-type field 36 indicating an instruction type of a next program instruction following the current program instruction within the processing thread concerned. This next-instruction-type field is used to control selection of to which processing pipeline the next instruction is issued before that next instruction has been fetched and decoded. The next-instruction-type field may be passed along the processing pipeline as the least significant four bits within a program counter value associated with a current program instruction 32. The next-instruction-type field may also be used to control the forwarding of thread state variables between processing pipelines when a thread migrates between processing pipelines prior to the next program instruction being fetched or decoded.Type: ApplicationFiled: September 1, 2011Publication date: March 29, 2012Applicant: ARM LimitedInventor: Jorn Nystad
-
Publication number: 20120079251Abstract: A method is described that involves executing a first instruction with a functional unit. The first instruction is a multiply-add instruction. The method further includes executing a second instruction with the functional unit. The second instruction is a round instruction.Type: ApplicationFiled: September 24, 2010Publication date: March 29, 2012Inventors: Amit Gradstein, Cristina S. Anderson, Zeev Sperber, Simon Rubanovich, Benny Eitan
-
Publication number: 20120072703Abstract: In one embodiment, a processor includes a multiply-accumulate (MAC) unit having a first path to handle execution of an instruction if a difference between at least a portion of first and second operands and a third operand is less than a threshold value, and a second path to handle the instruction execution if the difference is greater than the threshold value. Based on the difference, at least part of the third operand is to be provided to a multiplier of the MAC unit or to a compressor of the second path. Other embodiments are described and claimed.Type: ApplicationFiled: September 20, 2010Publication date: March 22, 2012Inventors: SURESH SRINIVASAN, RAJARAM RAMANARAYANAN, SANU K. MATHEW, RAM K. KRISHNAMURTHY, VASANTHA K. ERRAGUNTLA
-
Publication number: 20120060019Abstract: A reduction operation device detects a non-correspondence of an operation type or a data type in a reduction arithmetic operation of a parallel processing. The reduction operation device is inputted a plurality of the synchronization signals and data, sets each transmission destinations of the plurality of inputted synchronization signals and the plurality of data corresponding to a next stage of a reduction operation and executes the reduction operation. The synchronization unit in the reduction operation device detects the non-correspondence between the operation type or the data type included in an instruction of the reduction operation after the synchronization is established and controls the arithmetic operation of the arithmetic unit.Type: ApplicationFiled: June 22, 2011Publication date: March 8, 2012Applicant: Fujitsu LimitedInventors: Shinya Hiramoto, Yuichiro Ajima, Tomohiro Inoue
-
Patent number: 8131504Abstract: A system includes a serial connection mode for obtaining a first approximation to a zero error result by means of a negative rough precision for manufacturing a plurality of first semi-finished products, and a measurement apparatus for measuring a precision value of each first semi-finished product, and a Full-9 Principle for sifting the first semi-finished products. A parallel connection mode is used for obtaining a second approximation to the zero error result by means of a positive rough precision by division to manufacture a plurality of second semi-finished products, and the measurement apparatus is used to measure a precision value of each second semi-finished product, and an error sift formula is utilized to sift the second semi-finished products.Type: GrantFiled: September 14, 2007Date of Patent: March 6, 2012Inventor: Martin Jo
-
Patent number: 8127276Abstract: Apparatus, method, and computer readable medium for generating and utilizing a feature code to monitor a program are provided. The program is run in a secure environment at the beginning. The program calls a function through an application program interface. A return address of the application program interface is used to generate the feature code. When the application runs again at another time, the feature code is utilized to monitor the program. According to the aforementioned arrangement and steps, the application program interface can be monitored dynamically. Consequently, any program can be monitored by this approach, which results in a more secure environment. Further, fewer application program interfaces are required to be monitored, so the required computer resource is less.Type: GrantFiled: April 3, 2007Date of Patent: February 28, 2012Assignee: Institute for Information IndustryInventors: Cheng-Kai Chen, Hung Min Sun, Kang-Chiao Lin, Shih-Ying Chang, Shuai-Min Chen