Arithmetic Operation Instruction Processing Patents (Class 712/221)
  • Publication number: 20130166890
    Abstract: An arrangement of at least two arithmetic logic units carries out an operation defined by a decoded instruction including at least one operand and more than one operation code. The operation codes and at least one operand are received and corresponding executions are performed by the arithmetic logic units on a single clock cycle. The result of the execution from one arithmetic logic unit is used as an operand by a further arithmetic logic unit. The decoding of the instruction is performed in an immediately preceding single clock cycle.
    Type: Application
    Filed: February 11, 2013
    Publication date: June 27, 2013
    Applicant: STMicroelectronics (Research & Development) Limited
    Inventor: STMicroelectronics (Research & Development) Limited
  • Patent number: 8473721
    Abstract: Disclosed herein is a processing unit configured to process video data, and applications thereof. In an embodiment, the processing unit includes a buffer and an execution unit. The buffer is configured to store a data word, wherein the data word comprises a plurality of bytes of video data. The execution unit is configured to execute a single instruction to (i) shift bytes of video data contained in the data word to align a desired byte of video data and (ii) process the desired byte of the video data to provide processed video data.
    Type: Grant
    Filed: April 16, 2010
    Date of Patent: June 25, 2013
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Michael J. Mantor, Jeffrey T. Brady, Christopher L. Spencer, Daniel W. Wong, Andrew E. Gruber
  • Patent number: 8473719
    Abstract: New instruction definitions for a packet add (PADD) operation and for a single instruction multiple add (SMAD) operation are disclosed. In addition, a new dedicated PADD logic device that performs the PADD operation in about one to two processor clock cycles is disclosed. Also, a new dedicated SMAD logic device that performs a single instruction multiple data add (SMAD) operation in about one to two clock cycles is disclosed.
    Type: Grant
    Filed: October 31, 2006
    Date of Patent: June 25, 2013
    Assignee: Intel Corporation
    Inventors: Corey Gee, Bapiraju Vinnakota, Saleem Mohammadali, Carl A. Alberola
  • Patent number: 8468326
    Abstract: A hardware module configured to perform single instructions faster than is possible in software running on the microprocessor. In one implementation, the hardware module is configured to perform a single count instruction, including - counting a number of “ones” contained in a first register; and storing, in a second register, the count of the number of “ones” contained in the first register.
    Type: Grant
    Filed: August 3, 2009
    Date of Patent: June 18, 2013
    Assignee: Marvell International Ltd.
    Inventors: Jack Kang, Jianwei Bei, Shanker Rao Donthineni, Manish Kumar, Victor Lin, Justin Lau
  • Publication number: 20130151820
    Abstract: A method and apparatus are described for processing data during an execution pipeline cycle of a processor. Valid bits of the data are generated according to a designated data size. Each of the valid bits is inserted into at least one of a plurality of bit positions. The valid bits are rotated in a predetermined direction (i.e., left or right rotation) by a designated number of bit positions. Valid bits are removed from a portion of the plurality of bit positions after being rotated. Zeros or most significant bits (MSBs) of the data may be inserted in the bit positions from which the valid bits were removed. The number of bit positions to rotate the valid bits by may be designated by a first bit subset and a second bit subset. The first bit subset may indicate a number of bytes, and the second bit subset may indicate a number of bits.
    Type: Application
    Filed: December 9, 2011
    Publication date: June 13, 2013
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Srikanth Arekapudi, Saurabh Gupta
  • Patent number: 8447988
    Abstract: In certain embodiments, a digital signal processor (DSP) has multiple arithmetic logic units and a register module. The DSP is adapted to generate a message digest H from a message M in accordance with the SHA-1 standard, where M includes N blocks M(i), i=1, . . . , N, and the processing of each block M(i) includes t iterations of processing words of message schedule {Wt}. In each iteration possible, the DSP uses free operations to precalculate Wt and working variable values for use in the next iteration. In addition, in each iteration possible, the DSP rotates the registers associated with particular working variables to reduce operations that merely copy unchanged values from one register to another.
    Type: Grant
    Filed: September 16, 2009
    Date of Patent: May 21, 2013
    Assignee: LSI Corporation
    Inventors: Dmitriy Vladimirovich Alekseev, Alexei Vladimirovich Galatenko, Ilya Viktorovich Lyalin, Alexander Markovic, Denis Vassilevich Parfenov
  • Patent number: 8429384
    Abstract: An architecture for a digital signal processor alleviates the difficulties and complexities normally associated with writing and optimizing programs to avoid stalls during which one instruction awaits the result of a prior instruction. The architecture coordinates the processing of data for multiple instructions through a multiple stage data pipeline. As a result, the architecture not only supports simultaneous execution of multiple programs, but also permits each program to execute without delays caused by inter-relationships between instructions within the program.
    Type: Grant
    Filed: November 15, 2006
    Date of Patent: April 23, 2013
    Assignee: Harman International Industries, Incorporated
    Inventors: James D. Pennock, Ronald Baker, Brian R. Parker, Christopher Belcher
  • Patent number: 8423748
    Abstract: A register control circuit that controls a register specified by an inputted address includes a signal output that outputs a first control signal and a second control signal based on the inputted address, a selector that selects data of a register specified by the first control signal outputted from the signal output, a logical operator that performs a logical operation of write data outputted from a processor and the data selected by the selector to output an operation result, and a storage that stores data in the register specified by the first control signal by selecting one of the write data and the operation results as the data based on the second control signal outputted from the signal output.
    Type: Grant
    Filed: July 27, 2009
    Date of Patent: April 16, 2013
    Assignee: Fujitsu Limited
    Inventor: Yuusuke Ashizuka
  • Publication number: 20130086366
    Abstract: An apparatus includes a register file including a logical circuit. The register file is configured to perform one or more logical operations in conjunction with the logical circuit. The logical operation is performed in response to the register file receiving a register file control instruction. The register file control instruction is independent from an arithmetic logic unit (ALU) control instruction and a multiply-and-accumulate unit (MACU) control instruction.
    Type: Application
    Filed: September 30, 2011
    Publication date: April 4, 2013
    Applicant: QUALCOMM INCORPORATED
    Inventor: Aaron D. Lamb
  • Publication number: 20130080740
    Abstract: In one embodiment, a microprocessor includes fetch logic for retrieving an instruction, decode logic configured to identify an arithmetic operation specified in the instruction, and execution logic configured to receive operands specified by the instruction. The execution logic includes a primary logic path configured to perform the arithmetic operation on such operands and a secondary parallel logic path configured to output metadata associated with the result of the arithmetic operation.
    Type: Application
    Filed: September 28, 2011
    Publication date: March 28, 2013
    Applicant: NVIDIA CORPORATION
    Inventors: Peter Gentle, Scott Pitkethly
  • Patent number: 8407273
    Abstract: A processor or other device, such as a programmable and/or massively parallel processor or other device, includes processing elements designed to perform arithmetic operations (possibly but not necessarily including, for example, one or more of addition, multiplication, subtraction, and division) on numerical values of low precision but high dynamic range (“LPHDR arithmetic”). Such a processor or other device may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor or other device in certain embodiments of the present invention significantly exceeds (e.g., by at least 20 more than three times) the number of arithmetic elements, if any, in the processor or other device which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).
    Type: Grant
    Filed: February 17, 2012
    Date of Patent: March 26, 2013
    Assignee: Singular Computing LLC
    Inventor: Joseph Bates
  • Patent number: 8402254
    Abstract: The RISC data processor is based on the idea that in case that there are many flag-generating instructions, the number of flags generated by each instruction is increased so that a decrease of flag-generating instructions exceeds an increase of flag-using instructions in quantity, thereby achieving the decrease in instructions. With the data processor, an instruction for generating flags according to operands' data sizes is defined. To an instruction set handled by the RISC data processor, an instruction capable of executing an operation on operand in more than one data size, which performs a process identical to an operation process conducted on the small-size operand on low-order bits of the large-size operand, and generates flags capable of coping with the respective data sizes regardless of the data size of each operand subjected to the operation is added. Thus, the reduction in instruction code space of the RISC data processor tight in instruction code space can be achieved.
    Type: Grant
    Filed: February 11, 2009
    Date of Patent: March 19, 2013
    Assignee: Renesas Electronics Corporation
    Inventor: Fumio Arakawa
  • Patent number: 8375197
    Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: performing, for each node, a local reduction operation using allreduce contribution data for the cores of that node, yielding, for each node, a local reduction result for one or more representative cores for that node; establishing one or more logical rings among the nodes, each logical ring including only one of the representative cores from each node; performing, for each logical ring, a global allreduce operation using the local reduction result for the representative cores included in that logical ring, yielding a global allreduce result for each representative core included in that logical ring; and performing, for each node, a local broadcast operation using the global allreduce results for each representative core on that node.
    Type: Grant
    Filed: May 21, 2008
    Date of Patent: February 12, 2013
    Assignee: International Business Machines Corporation
    Inventor: Ahmad Faraj
  • Publication number: 20130024668
    Abstract: An architecture includes a controller. The controller is configured to receive a microprogram. The microprogram is configured for performing at least one of hierarchical or a sequence of polynomial computations. The architecture also includes an arithmetic logic unit (ALU) communicably coupled to the controller. The ALU is controlled by the controller. Additionally, the microprogram is compiled prior to execution by the controller, the microprogram is compiled into a plurality of binary tables, and the microprogram is programmed in a command language in which each command includes a first portion for indicating at least one of a command or data transferred to the ALU, and a second portion for including a control command to the controller. The architecture and implementation of the programmable controller may be for cryptographic applications, including those related to public key cryptography.
    Type: Application
    Filed: September 27, 2012
    Publication date: January 24, 2013
    Applicant: LSI CORPORATION
    Inventor: LSI Corporation
  • Publication number: 20130024667
    Abstract: An attribute group storage unit acquires and holds attribute groups set to respective data blocks. A scenario determination unit determines respective transfer systems of the respective blocks between a memory of the lowest hierarchy and a memory of another hierarchy based on those attribute groups and a configuration of an arithmetic unit which is the parallel processor, and controls the transfer of the respective data blocks according to the determined transfer systems, and the parallel arithmetic operation corresponding to the transfer. Each of the attribute groups is necessary to determine the transfer systems, and includes one or more attributes not depending on the configuration of the parallel processor. The attribute groups of the write blocks are set assuming that each of the write blocks has already been located in the memory of another hierarchy, and is transferred to the memory of the lowest hierarchy.
    Type: Application
    Filed: June 21, 2012
    Publication date: January 24, 2013
    Applicant: Renesas Electronics Corporation
    Inventor: Shorin KYO
  • Publication number: 20130013895
    Abstract: An exemplary byte-oriented microcontroller includes a program memory, a program memory bus, and a core circuit. The program memory bus has a bus width wider than one instruction byte, and the core circuit is coupled to the program memory through the program memory bus for executing at least one instruction by processing a plurality of instruction bytes fetched from the program memory. The core circuit includes a fetch unit, for fetching the instruction bytes through the program memory bus and re-ordering the fetched instruction bytes to form a complete instruction.
    Type: Application
    Filed: July 6, 2011
    Publication date: January 10, 2013
    Applicant: FS-SEMI CO., LTD.
    Inventor: Hsiao-Ming Huang
  • Publication number: 20130007421
    Abstract: Efficient computation of complex long multiplication results and an efficient calculation of a covariance matrix are described. A parallel array VLIW digital signal processor is employed along with specialized complex long multiplication instructions and communication operations between the processing elements which are overlapped with computation to provide very high performance operation. Successive iterations of a loop of tightly packed VLIWs may be used allowing the complex multiplication pipeline hardware to be efficiently used.
    Type: Application
    Filed: September 13, 2012
    Publication date: January 3, 2013
    Applicant: ALTERA CORPORATION
    Inventors: Gerald G. Pechanek, Ricardo Rodriguez, Matthew Plonski, David Strube, Kevin Coopman
  • Publication number: 20120311305
    Abstract: Provided is an information processing device including an instruction cache, a data cache, first and second arithmetic unit groups including a plurality of arithmetic units capable of parallel operation, a first arithmetic-control circuit that generates one or more operation instructions for the first arithmetic unit group, and a second arithmetic-control circuit that generates one or more operation instructions for the second arithmetic unit group based on an instruction code of a fixed instruction register. The first arithmetic unit group sets the instruction code to the fixed instruction register according to an operation instruction generated based on a first specific instruction code by the first arithmetic-control circuit, and provides data to the second arithmetic unit group according to an operation instruction generated based on a second specific instruction code by the first arithmetic-control circuit.
    Type: Application
    Filed: May 29, 2012
    Publication date: December 6, 2012
    Inventors: Yuki KOBAYASHI, Shohei NOMOTO
  • Patent number: 8311683
    Abstract: Illustrative embodiments provide a computer implemented method, a data processing system, and a computer program product for adjusting cooling settings. The computer implemented method comprises analyzing a set of instructions of an application to determine a number of degrees by which a set of instructions will raise a temperature of at least one processor core. The computer implemented method further calculates a cooling setting for at least one cooling system for the at least one processor core. The computer implemented method adjusts the at least one cooling system based on the cooling setting. The step of analyzing the set of instructions is performed before the set of instructions is executed on the at least one processor core. The step of adjusting the at least one cooling system is performed before the set of instructions is executed on the at least one processor core.
    Type: Grant
    Filed: April 29, 2009
    Date of Patent: November 13, 2012
    Assignee: International Business Machines Corporation
    Inventors: Robert Lee Angell, David Wayne Cosby, Robert R. Friedlander, James R. Kraemer
  • Patent number: 8312442
    Abstract: A computing system has an amount of shared cache, and performs runtime automatic parallelization wherein when a parallelized loop is encountered, a main thread shares the workload with at least one other non-main thread. A method for providing interprocedural prefetching includes compiling source code to produce compiled code having a main thread including a parallelized loop. Prior to the parallelized loop in the main thread, the main thread includes prefetching instructions for the at least one other non-main thread that shares the workload of the parallelized loop. As a result, the main thread prefetches data into the shared cache for use by the at least one other non-main thread.
    Type: Grant
    Filed: December 10, 2008
    Date of Patent: November 13, 2012
    Assignee: Oracle America, Inc.
    Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
  • Patent number: 8307196
    Abstract: A method for operating a data processing system is provided. The method includes providing a first operand stored in a first register, providing a second operand stored in the register, providing a third operand stored in the register. The method further includes executing a first instruction, where executing the first instruction comprises: (1) retrieving the first operand, the second operand, and the third operand from the first register; (2) performing an operation using the first operand, the second operand, and the third operand to generate a bit exact result.
    Type: Grant
    Filed: April 5, 2006
    Date of Patent: November 6, 2012
    Assignee: Freescale Semiconductor, Inc.
    Inventors: William C. Moyer, Imran Ahmed, Dan Tamir
  • Patent number: 8301869
    Abstract: A programmable controller which executes a plurality of independent sequence programs in parallel is provided with an ASIC, including a plurality of arithmetic-logic units and a plurality of arbitration circuits, and MPUs as many as the arbitration circuits. The entire execution time of the programmable controller is shortened by changing combinations (groups of arithmetic-logic units) of the MPUs (and the arbitration circuits as many as the MPUs) and the arithmetic-logic units, based on the ratios of MPU execution instructions and ASIC execution instructions included in those instructions which constitute the programs to be executed in parallel.
    Type: Grant
    Filed: February 18, 2011
    Date of Patent: October 30, 2012
    Assignee: Fanuc Corporation
    Inventors: Masuo Kokura, Yasushi Nomoto, Motoyoshi Miyachi
  • Patent number: 8296350
    Abstract: The present invention provides a method and apparatus for QR-factorizing matrix on a multiprocessor system, wherein the multiprocessor system comprises at least one core processor and a plurality of accelerators, the method comprises the steps of: iteratively factorizing each panel in the matrix until the whole matrix is factorized; wherein in each iteration, the method comprises: partitioning an unprocessed matrix part in the matrix into a plurality of blocks according to a predetermined block size; partitioning a current processed panel in the unprocessed matrix part into at least two sub panels, wherein the current processed panel is composed of a plurality of blocks; and performing QR factorization one by one on the at least two sub panels with the plurality of accelerators, and updating the data of the sub panel(s) on which no QR factorization has been performed among the at least two sub panels by using the factorization result.
    Type: Grant
    Filed: March 12, 2009
    Date of Patent: October 23, 2012
    Assignee: International Business Machines Corporation
    Inventors: Hui Li, Bai Ling Wang
  • Publication number: 20120260073
    Abstract: A microprocessor includes processor modes comprising a user mode and a plurality of exception modes. An execution unit performs arithmetic operations on operands specified by program instructions. A first set of storage elements holds a first subset of the operands and provides them to the execution unit coupled thereto. A second set of storage elements associated with each of the modes hold a second subset of the operands and are incapable of directly providing the second operand subset to the execution unit. To enter a new mode from a current mode, logic saves the first operand subset held in the first set of storage elements to the second set of storage elements associated with the current mode and restores to the first set of storage elements the second operand subset held in the second set of storage elements associated with the new mode.
    Type: Application
    Filed: March 6, 2012
    Publication date: October 11, 2012
    Applicant: VIA TECHNOLOGIES, INC.
    Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker
  • Publication number: 20120254585
    Abstract: Methods and apparatus for double precision division/inversion vector computations on Single Instruction Multiple Data (SIMD) computing platforms are described. In one embodiment, an input argument is represented by an exponent portion and a fraction portion. These portions are scaled, inverted, and multiplied to generate an inverse version of the input argument. In an embodiment, the inversion of the exponent portion may be done by changing the sign of the exponent. Other embodiments are also described.
    Type: Application
    Filed: December 25, 2009
    Publication date: October 4, 2012
    Inventors: Andrey Kolesov, Valery Kuriakin, Maria Guseva
  • Patent number: 8276116
    Abstract: An algebra operation method includes the steps of converting algebra operations for a plurality of objects which appear in a program into an algebra operation sequence object described using object access data used to access the plurality of objects and object state data used to store states associated with the plurality of objects without immediately evaluating the algebra operations, determining a function to be applied to the algebra operation sequence object, and evaluating the algebra operations by executing the function by designating an argument group required for the function in response to a call of a substitute operator.
    Type: Grant
    Filed: June 7, 2007
    Date of Patent: September 25, 2012
    Assignee: Canon Kabushiki Kaisha
    Inventor: Yasuhiro Nakahara
  • Patent number: 8271767
    Abstract: An HW arithmetic unit executes a predetermined arithmetic operation. An arithmetic-mode determining unit determines, based on an attribute or a content of data relating to processing that has requested the arithmetic operation, either a synchronous mode that executes the processing after waiting for completion of the arithmetic operation by an arithmetic circuit or an asynchronous mode that executes the processing without waiting for completion of the arithmetic operation by the arithmetic circuit, as an execution mode of the arithmetic operation. An arithmetic-process control unit controls the arithmetic operation by the arithmetic circuit according to the determined execution mode.
    Type: Grant
    Filed: June 27, 2008
    Date of Patent: September 18, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Keisuke Mera, Takeshi Ishihara, Yasuhiro Fukuju
  • Patent number: 8250337
    Abstract: General purpose array processing techniques including processing methods and apparatus. Processors may include parallel processing paths designed with reusable computational components such as multipliers, multiplexers, and ALUs. Flow of data through the paths and operations performed may be controlled based on opcodes. Processors may be shared, scalable, and configured to perform matrix operations. In particular, such operation may be useful for physical sections of MIMO-OFDM communication systems.
    Type: Grant
    Filed: April 27, 2007
    Date of Patent: August 21, 2012
    Assignee: Qualcomm Incorporated
    Inventor: Garret Webster Shih
  • Publication number: 20120210101
    Abstract: A competition testing apparatus for testing an access competition of an arithmetic unit includes a memory that stores a program, a first processor that executes the program by accessing the memory, a second processor that executes the program by accessing the memory, and an arbitration unit that arbitrates accessing the first processor and the second processor and reports a result of the arbitration upon the first processor and the second processor accessing the same address space in the memory, wherein the memory stores a odd number of programs, further comprises a controller that controls the first processor to process the plurality of test programs stored in the storage in predetermined order, and controls the second processor to process the plurality of test programs stored in the storage in order reverse to the predetermined order, and a recording unit that records the result of arbitration performed using the arbitrator.
    Type: Application
    Filed: August 12, 2011
    Publication date: August 16, 2012
    Applicant: FUJITSU LIMITED
    Inventor: Yasushi ASANO
  • Publication number: 20120204012
    Abstract: A method includes providing a data processor having an instruction pipeline, where the instruction pipeline has a plurality of instruction pipeline stages, and where the plurality of instruction pipeline stages includes a first instruction pipeline stage and a second instruction pipeline stage. The method further includes providing a data processor instruction that causes the data processor to perform a first set of computational operations during execution of the data processor instruction, performing the first set of computational operations in the first instruction pipeline stage if the data processor instruction is being executed and a first mode has been selected, and performing the first set of computational operations in the second instruction pipeline stage if the data processor instruction is being executed and a second mode has been selected.
    Type: Application
    Filed: April 13, 2012
    Publication date: August 9, 2012
    Applicant: Rambus Inc.
    Inventors: William C. Moyer, Jeffrey W. Scott
  • Publication number: 20120198211
    Abstract: There is a need for providing a battery-less integrated circuit (IC) card capable of operating in accordance with a contact usage or a non-contact usage, preventing coprocessor throughput from degrading despite a decreased clock frequency for reduced power consumption under non-contact usage, and ensuring high-speed processing under non-contact usage. A dual interface card is a battery-less IC card capable of operating in accordance with a contact usage or a non-contact usage. The dual interface card operates at a high clock under contact usage and at a low clock under non-contact usage. A targeted operation comprises a plurality of different basic operations. The dual interface card comprises a basic arithmetic circuit group. Under the contact usage, the basic arithmetic circuit group performs one basic operation of the targeted operation at one cycle. Under the non-contact usage, the basic arithmetic circuit group sequentially performs at least two basic operations of the targeted operation at one cycle.
    Type: Application
    Filed: April 10, 2012
    Publication date: August 2, 2012
    Applicant: Renesas Electronics Corporation
    Inventors: Daisuke SUZUKI, Minoru Saeki, Yuichiro Nariyoshi
  • Publication number: 20120198212
    Abstract: A microprocessor, a method for enhanced precision sum-of-products calculation and a video decoding device are provided, in which at least one general-purpose-register is arranged to provide a number of destination bits to a multiply unit, and a control unit is adapted to provide at least a multiply-high instruction and a multiply-high-and-accumulate instruction to the multiply unit. The multiply unit is arranged to receive at least first and second source operands having an associated number of source bits, a sum of source bits exceeding the number of destination bits, connected to a register-extension cache comprising at least one cache entry arranged to store a number of precision-enhancement bits, and adapted to store a destination portion of a result operand in the general-purpose-register and a precision enhancement portion in the cache entry. The result operand is generated by a multiply-high operation or by a multiply-high-and-accumulate operation, depending on the received instructions.
    Type: Application
    Filed: November 30, 2009
    Publication date: August 2, 2012
    Inventor: Martin Raubuch
  • Patent number: 8219783
    Abstract: An SIMD type microprocessor is disclosed. The SIMD type microprocessor includes plural PEs (processor elements) each of which provides an ALU (arithmetic and logic unit) for lower-order bits, an ALU for upper-order bits, a control circuit for lower-order bits, a control circuit for upper-order bits, a range determining circuit for lower-order bits, and a range determining circuit for upper-order bits. The SIMD type microprocessor further includes a global processor, a range designation bus for lower-order bits which connects the global processor to the range determining circuit for lower-order bits, and a range designation bus for upper-order bits which connects the global processor to the range determining circuit for upper-order bits.
    Type: Grant
    Filed: July 3, 2008
    Date of Patent: July 10, 2012
    Assignee: Ricoh Company, Ltd.
    Inventor: Kazuhiko Hara
  • Patent number: 8214626
    Abstract: Method, apparatus, and program means for shuffling data. The method of one embodiment comprises receiving a first operand having a set of L data elements and a second operand having a set of L control elements. For each control element, data from a first operand data element designated by the individual control element is shuffled to an associated resultant data element position if its flush to zero field is not set and a zero is placed into the associated resultant data element position if its flush to zero field is not set.
    Type: Grant
    Filed: March 31, 2009
    Date of Patent: July 3, 2012
    Assignee: Intel Corporation
    Inventors: William W. Macy, Jr., Eric L. Debes, Patrice L. Roussel, Huy V. Nguyen
  • Publication number: 20120166773
    Abstract: In certain embodiments, a digital signal processor (DSP) has multiple arithmetic logic units and a register module. The DSP is adapted to generate a message digest H from a message M in accordance with the SHA-1 standard, where M includes N blocks M(i), i=1, . . . , N, and the processing of each block M(i) includes t iterations of processing words of message schedule {Wt}. In each iteration possible, the DSP uses free operations to precalculate Wt and working variable values for use in the next iteration. In addition, in each iteration possible, the DSP rotates the registers associated with particular working variables to reduce operations that merely copy unchanged values from one register to another.
    Type: Application
    Filed: September 16, 2009
    Publication date: June 28, 2012
    Applicant: LSI CORPORATION
    Inventors: Dmitriy Vladimirovich Alekseev, Alexei Vladimirovich Galatenko, Ilya Viktorovich Lyalin, Alexander Markovic, Denis Vassilevich Parfenov
  • Patent number: 8200948
    Abstract: An apparatus and method are provided for performing re-arrangement operations on data. The data processing apparatus has a register data store with a plurality of registers for storing data, and processing logic for performing a sequence of operations on data including at least one re-arrangement operation. The processing logic has scalar processing logic for performing scalar operations and SIMD processing logic for performing SIMD operations. The SIMD processing logic is responsive to a re-arrangement instruction specifying a family of re-arrangement operations to perform a selected re-arrangement operation from that family on a plurality of data elements constituted by data in one or more registers identified by the re-arrangement instruction. The selected re-arrangement operation is dependent on at least one parameter provided by the scalar processing logic, that parameter identifying a data element width for the data elements on which the selected re-arrangement operation is performed.
    Type: Grant
    Filed: December 4, 2007
    Date of Patent: June 12, 2012
    Assignee: ARM Limited
    Inventors: Daniel Kershaw, Dominic Hugo Symes, Alastair Reid
  • Patent number: 8190669
    Abstract: Multipurpose arithmetic functional units can perform planar attribute interpolation and unary function approximation operations. In one embodiment, planar interpolation operations for coordinates (x, y) are executed by computing A*x+B*y+C, and unary function approximation operations for operand x are executed by computing F2(xb)*xh2+F1(xb)*xh+F0(xb), where xh=x?xb. Shared multiplier and adder circuits are advantageously used to implement the product and sum operations for both classes of operations.
    Type: Grant
    Filed: October 20, 2004
    Date of Patent: May 29, 2012
    Assignee: NVIDIA Corporation
    Inventors: Stuart F. Oberman, Ming Y. Siu
  • Patent number: 8185723
    Abstract: A method is presented including decomposing a first value into many parts. Decomposing includes shifting (310) a rounded integer portion of the first value to generate a second value. Generating (320) a third value. Extracting (330) a plurality of significand bits from the second value to generate a fourth value. Extracting (340) a portion of bits from the fourth value to generate an integer component. Generating (350) a fifth value. Also the third value, the fifth value, and the integer component are either stored (360, 380) in a memory or transmitted to an arithmetic logical unit (ALU).
    Type: Grant
    Filed: July 13, 2001
    Date of Patent: May 22, 2012
    Assignee: Intel Corporation
    Inventors: Robert S. Norin, Olga Dryzhakova, Alexander Isaev, Andrey Naraikin
  • Patent number: 8185721
    Abstract: A system including a dual function adder is described. In one embodiment, the system includes an adder. The adder is configured for a first instruction to determine an address for a hardware prefetch if the first instruction is a hardware prefetch instruction. The adder is further configured for the first instruction to determine a value from an arithmetic operation if the first instruction is an arithmetic operation instruction.
    Type: Grant
    Filed: March 4, 2008
    Date of Patent: May 22, 2012
    Assignee: QUALCOMM Incorporated
    Inventors: Ajay Anant Ingle, Erich James Plondke, Lucian Codrescu
  • Publication number: 20120117441
    Abstract: A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
    Type: Application
    Filed: January 19, 2012
    Publication date: May 10, 2012
    Applicant: MicroUnity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris, Alexia Massalin
  • Patent number: 8161268
    Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.
    Type: Grant
    Filed: May 21, 2008
    Date of Patent: April 17, 2012
    Assignee: International Business Machines Corporation
    Inventor: Ahmad Faraj
  • Publication number: 20120083912
    Abstract: An arithmetic logic and shifting device is disclosed and includes an arithmetic logic unit that has a first input to receive a first operand from a first register port, a second input to receive a second operand from a second register port, and an output to selectively provide a memory address to a memory unit in a first mode of operation and to selectively provide an arithmetic output in a second mode of operation. Further, the arithmetic logic and shifting device includes a programmable shifter device that has a first input to receive data from the memory unit, a second input to receive the arithmetic output, a third input to receive an operation code of a computer execution instruction, and a shifted output to provide shifted data.
    Type: Application
    Filed: December 8, 2011
    Publication date: April 5, 2012
    Applicant: QUALCOMM INCORPORATED
    Inventors: Muhammad Ahmed, Ajay Anant Ingle, Sujat Jamil
  • Patent number: 8150902
    Abstract: A processor or other device, such as a programmable and/or massively parallel processor or other device, includes processing elements designed to perform arithmetic operations (possibly but not necessarily including, for example, one or more of addition, multiplication, subtraction, and division) on numerical values of low precision but high dynamic range (“LPHDR arithmetic”). Such a processor or other device may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor or other device in certain embodiments of the present invention significantly exceeds (e.g., by at least 20 more than three times) the number of arithmetic elements, if any, in the processor or other device which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).
    Type: Grant
    Filed: June 15, 2010
    Date of Patent: April 3, 2012
    Assignee: Singular Computing LLC
    Inventor: Joseph Bates
  • Publication number: 20120079250
    Abstract: A semiconductor chip is described having a functional unit that can execute a first instruction and execute a second instruction. The first instruction is an instruction that multiplies two operands. The second instruction is an instruction that approximates a function according to C0+C1X2+C2X22. The functional unit has a multiplier circuit. The multiplier circuit has: i) a first input to receive bits of a first operand of the first instruction and receive bits of a C1 term of the second instruction; ii) a second input to receive bits of a second operand of the first instruction and receive bits of a X2 term of the second instruction.
    Type: Application
    Filed: September 24, 2010
    Publication date: March 29, 2012
    Inventors: Alex Pineiro, Thomas E. Fletcher, Brian J. Hickmann
  • Publication number: 20120079243
    Abstract: A graphics processing unit core 26 includes a plurality of processing pipelines 38, 40, 42, 44. A program instruction of a thread of program instructions being executed by a processing pipeline includes a next-instruction-type field 36 indicating an instruction type of a next program instruction following the current program instruction within the processing thread concerned. This next-instruction-type field is used to control selection of to which processing pipeline the next instruction is issued before that next instruction has been fetched and decoded. The next-instruction-type field may be passed along the processing pipeline as the least significant four bits within a program counter value associated with a current program instruction 32. The next-instruction-type field may also be used to control the forwarding of thread state variables between processing pipelines when a thread migrates between processing pipelines prior to the next program instruction being fetched or decoded.
    Type: Application
    Filed: September 1, 2011
    Publication date: March 29, 2012
    Applicant: ARM Limited
    Inventor: Jorn Nystad
  • Publication number: 20120079251
    Abstract: A method is described that involves executing a first instruction with a functional unit. The first instruction is a multiply-add instruction. The method further includes executing a second instruction with the functional unit. The second instruction is a round instruction.
    Type: Application
    Filed: September 24, 2010
    Publication date: March 29, 2012
    Inventors: Amit Gradstein, Cristina S. Anderson, Zeev Sperber, Simon Rubanovich, Benny Eitan
  • Publication number: 20120072703
    Abstract: In one embodiment, a processor includes a multiply-accumulate (MAC) unit having a first path to handle execution of an instruction if a difference between at least a portion of first and second operands and a third operand is less than a threshold value, and a second path to handle the instruction execution if the difference is greater than the threshold value. Based on the difference, at least part of the third operand is to be provided to a multiplier of the MAC unit or to a compressor of the second path. Other embodiments are described and claimed.
    Type: Application
    Filed: September 20, 2010
    Publication date: March 22, 2012
    Inventors: SURESH SRINIVASAN, RAJARAM RAMANARAYANAN, SANU K. MATHEW, RAM K. KRISHNAMURTHY, VASANTHA K. ERRAGUNTLA
  • Publication number: 20120060019
    Abstract: A reduction operation device detects a non-correspondence of an operation type or a data type in a reduction arithmetic operation of a parallel processing. The reduction operation device is inputted a plurality of the synchronization signals and data, sets each transmission destinations of the plurality of inputted synchronization signals and the plurality of data corresponding to a next stage of a reduction operation and executes the reduction operation. The synchronization unit in the reduction operation device detects the non-correspondence between the operation type or the data type included in an instruction of the reduction operation after the synchronization is established and controls the arithmetic operation of the arithmetic unit.
    Type: Application
    Filed: June 22, 2011
    Publication date: March 8, 2012
    Applicant: Fujitsu Limited
    Inventors: Shinya Hiramoto, Yuichiro Ajima, Tomohiro Inoue
  • Patent number: 8131504
    Abstract: A system includes a serial connection mode for obtaining a first approximation to a zero error result by means of a negative rough precision for manufacturing a plurality of first semi-finished products, and a measurement apparatus for measuring a precision value of each first semi-finished product, and a Full-9 Principle for sifting the first semi-finished products. A parallel connection mode is used for obtaining a second approximation to the zero error result by means of a positive rough precision by division to manufacture a plurality of second semi-finished products, and the measurement apparatus is used to measure a precision value of each second semi-finished product, and an error sift formula is utilized to sift the second semi-finished products.
    Type: Grant
    Filed: September 14, 2007
    Date of Patent: March 6, 2012
    Inventor: Martin Jo
  • Patent number: 8127276
    Abstract: Apparatus, method, and computer readable medium for generating and utilizing a feature code to monitor a program are provided. The program is run in a secure environment at the beginning. The program calls a function through an application program interface. A return address of the application program interface is used to generate the feature code. When the application runs again at another time, the feature code is utilized to monitor the program. According to the aforementioned arrangement and steps, the application program interface can be monitored dynamically. Consequently, any program can be monitored by this approach, which results in a more secure environment. Further, fewer application program interfaces are required to be monitored, so the required computer resource is less.
    Type: Grant
    Filed: April 3, 2007
    Date of Patent: February 28, 2012
    Assignee: Institute for Information Industry
    Inventors: Cheng-Kai Chen, Hung Min Sun, Kang-Chiao Lin, Shih-Ying Chang, Shuai-Min Chen