Arithmetic Operation Instruction Processing Patents (Class 712/221)

APPARATUS COMPRISING A PLURALITY OF ARITHMETIC LOGIC UNITS

Publication number: 20130166890

Abstract: An arrangement of at least two arithmetic logic units carries out an operation defined by a decoded instruction including at least one operand and more than one operation code. The operation codes and at least one operand are received and corresponding executions are performed by the arithmetic logic units on a single clock cycle. The result of the execution from one arithmetic logic unit is used as an operand by a further arithmetic logic unit. The decoding of the instruction is performed in an immediately preceding single clock cycle.

Type: Application

Filed: February 11, 2013

Publication date: June 27, 2013

Applicant: STMicroelectronics (Research & Development) Limited

Inventor: STMicroelectronics (Research & Development) Limited
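
Illustrative note on publication 20130166890 above: the abstract describes one instruction that carries several operation codes, with the result of one arithmetic logic unit feeding the next. The Python sketch below models only that data flow. The opcode names, the 32-bit width, and the function `execute_chained` are assumptions made for illustration and do not come from the application.

```python
import operator

MASK32 = 0xFFFFFFFF  # assumed 32-bit datapath

# Hypothetical opcode table; the application does not list specific operations.
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
    "shl": operator.lshift,
}


def execute_chained(opcodes, operands):
    """Apply each opcode in turn, feeding every result into the next ALU."""
    if len(operands) != len(opcodes) + 1:
        raise ValueError("need one more operand than opcodes")
    result = operands[0] & MASK32
    for opcode, operand in zip(opcodes, operands[1:]):
        # The output of the previous ALU becomes the first input of this one.
        result = OPS[opcode](result, operand) & MASK32
    return result


# (3 + 4) << 2, evaluated as one chained instruction.
chained_result = execute_chained(["add", "shl"], [3, 4, 2])
```

In hardware the chain would be combinational logic that settles within one clock cycle; the loop here only mirrors the order in which results are forwarded.
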
Video instruction processing of desired bytes in multi-byte buffers by shifting to matching byte location

Patent number: 8473721

Abstract: Disclosed herein is a processing unit configured to process video data, and applications thereof. In an embodiment, the processing unit includes a buffer and an execution unit. The buffer is configured to store a data word, wherein the data word comprises a plurality of bytes of video data. The execution unit is configured to execute a single instruction to (i) shift bytes of video data contained in the data word to align a desired byte of video data and (ii) process the desired byte of the video data to provide processed video data.

Type: Grant

Filed: April 16, 2010

Date of Patent: June 25, 2013

Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Michael J. Mantor, Jeffrey T. Brady, Christopher L. Spencer, Daniel W. Wong, Andrew E. Gruber
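
Illustrative note on patent 8473721 above: the abstract describes a single instruction that shifts a multi-byte data word to align one desired byte and then processes it. A minimal Python sketch of that idea follows. The byte numbering (byte 0 is the least significant), the 32-bit word, and the saturating brightness adjustment used as the "process" step are assumptions, not details from the patent.

```python
def extract_and_process(data_word: int, byte_index: int, process) -> int:
    """Shift the desired byte down to bit 0, isolate it, then process it."""
    if not 0 <= byte_index < 4:
        raise ValueError("byte_index must be 0..3 for a 32-bit word")
    aligned = (data_word >> (8 * byte_index)) & 0xFF  # align the desired byte
    return process(aligned)


def brighten(pixel: int) -> int:
    """Hypothetical video operation: add 16 with saturation at 255."""
    return min(pixel + 16, 255)


processed = extract_and_process(0x11F83344, 2, brighten)  # byte 2 is 0xF8
```
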
Data packet arithmetic logic devices and methods

Patent number: 8473719

Abstract: New instruction definitions for a packet add (PADD) operation and for a single instruction multiple add (SMAD) operation are disclosed. In addition, a new dedicated PADD logic device that performs the PADD operation in about one to two processor clock cycles is disclosed. Also, a new dedicated SMAD logic device that performs a single instruction multiple data add (SMAD) operation in about one to two clock cycles is disclosed.

Type: Grant

Filed: October 31, 2006

Date of Patent: June 25, 2013

Assignee: Intel Corporation

Inventors: Corey Gee, Bapiraju Vinnakota, Saleem Mohammadali, Carl A. Alberola
Method and apparatus for accelerating execution of logical “and” instructions in data processing applications

Patent number: 8468326

Abstract: A hardware module configured to perform single instructions faster than is possible in software running on the microprocessor. In one implementation, the hardware module is configured to perform a single count instruction, including: counting a number of “ones” contained in a first register; and storing, in a second register, the count of the number of “ones” contained in the first register.

Type: Grant

Filed: August 3, 2009

Date of Patent: June 18, 2013

Assignee: Marvell International Ltd.

Inventors: Jack Kang, Jianwei Bei, Shanker Rao Donthineni, Manish Kumar, Victor Lin, Justin Lau
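
Illustrative note on patent 8468326 above: the count instruction in the abstract is a population count. The sketch below shows the software loop that such a hardware instruction would replace. The 32-bit register width and the function name `count_ones` are assumptions.

```python
def count_ones(first_register: int, width: int = 32) -> int:
    """Count the 1 bits in a `width`-bit register value."""
    value = first_register & ((1 << width) - 1)  # keep only the register's bits
    count = 0
    while value:
        value &= value - 1  # clears the lowest set bit
        count += 1
    return count


# The count is what the abstract stores in the second register.
second_register = count_ones(0b1011_0000)
```

The loop runs once per set bit, which is the kind of multi-cycle software cost a dedicated instruction avoids.
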
METHOD AND APPARATUS FOR ROTATING AND SHIFTING DATA DURING AN EXECUTION PIPELINE CYCLE OF A PROCESSOR

Publication number: 20130151820

Abstract: A method and apparatus are described for processing data during an execution pipeline cycle of a processor. Valid bits of the data are generated according to a designated data size. Each of the valid bits is inserted into at least one of a plurality of bit positions. The valid bits are rotated in a predetermined direction (i.e., left or right rotation) by a designated number of bit positions. Valid bits are removed from a portion of the plurality of bit positions after being rotated. Zeros or most significant bits (MSBs) of the data may be inserted in the bit positions from which the valid bits were removed. The number of bit positions to rotate the valid bits by may be designated by a first bit subset and a second bit subset. The first bit subset may indicate a number of bytes, and the second bit subset may indicate a number of bits.

Type: Application

Filed: December 9, 2011

Publication date: June 13, 2013

Applicant: Advanced Micro Devices, Inc.

Inventors: Srikanth Arekapudi, Saurabh Gupta
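
Illustrative note on publication 20130151820 above: the abstract describes rotating valid bits by an amount given as a byte count plus a bit count, then replacing the wrapped-around positions with zeros or copies of the most significant bit. The Python sketch below builds a right shift from a rotate in that way. The function names and the restriction to right shifts smaller than the data size are assumptions for illustration.

```python
def rotate_right(value: int, amount: int, size: int) -> int:
    """Rotate a `size`-bit value right by `amount` bit positions."""
    mask = (1 << size) - 1
    value &= mask
    amount %= size
    return ((value >> amount) | (value << (size - amount))) & mask


def shift_right_via_rotate(value, byte_count, bit_count, size, arithmetic=False):
    """Right shift built from a rotate plus a fill of the vacated positions."""
    amount = byte_count * 8 + bit_count  # byte subset and bit subset combined
    if not 0 <= amount < size:
        raise ValueError("shift amount must be smaller than the data size")
    mask = (1 << size) - 1
    value &= mask
    rotated = rotate_right(value, amount, size)
    keep_mask = mask >> amount          # positions that hold genuine shifted bits
    result = rotated & keep_mask        # remove the bits that wrapped around
    sign_bit = (value >> (size - 1)) & 1
    if arithmetic and sign_bit:
        result |= mask & ~keep_mask     # fill with the MSB instead of zeros
    return result
```

For example, `shift_right_via_rotate(0x12345678, 1, 4, 32)` shifts by one byte plus four bits, that is, by 12 bit positions.
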
Hash processing using a processor

Patent number: 8447988

Abstract: In certain embodiments, a digital signal processor (DSP) has multiple arithmetic logic units and a register module. The DSP is adapted to generate a message digest H from a message M in accordance with the SHA-1 standard, where M includes N blocks M(i), i=1, . . . , N, and the processing of each block M(i) includes t iterations of processing words of message schedule {Wt}. In each iteration possible, the DSP uses free operations to precalculate Wt and working variable values for use in the next iteration. In addition, in each iteration possible, the DSP rotates the registers associated with particular working variables to reduce operations that merely copy unchanged values from one register to another.

Type: Grant

Filed: September 16, 2009

Date of Patent: May 21, 2013

Assignee: LSI Corporation

Inventors: Dmitriy Vladimirovich Alekseev, Alexei Vladimirovich Galatenko, Ilya Viktorovich Lyalin, Alexander Markovic, Denis Vassilevich Parfenov
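
Illustrative note on patent 8447988 above: the abstract mentions rotating the registers associated with the working variables so that unchanged values are not copied from one register to another. The Python sketch below applies that idea to a plain SHA-1 implementation by rotating which list slot plays the role of each working variable. It is an illustration of the general technique only, not the patented DSP code, and it does not model the precalculation of Wt with free operations. The function names are assumptions.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & MASK32


def sha1_rotating(message: bytes) -> str:
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    # Standard SHA-1 padding: 0x80, zeros, then the 64-bit message length.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", len(message) * 8)
    for start in range(0, len(padded), 64):
        w = list(struct.unpack(">16I", padded[start:start + 64]))
        for t in range(16, 80):
            w.append(rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
        v = h[:]  # five working registers; roles rotate, values stay put
        for t in range(80):
            # Slot indices that currently play the roles a, b, c, d, e.
            a, b, c, d, e = [(i - t) % 5 for i in range(5)]
            if t < 20:
                f, k = (v[b] & v[c]) | (~v[b] & v[d]), 0x5A827999
            elif t < 40:
                f, k = v[b] ^ v[c] ^ v[d], 0x6ED9EBA1
            elif t < 60:
                f, k = (v[b] & v[c]) | (v[b] & v[d]) | (v[c] & v[d]), 0x8F1BBCDC
            else:
                f, k = v[b] ^ v[c] ^ v[d], 0xCA62C1D6
            temp = (rotl32(v[a], 5) + f + v[e] + k + w[t]) & MASK32
            v[b] = rotl32(v[b], 30)  # becomes c in the next round
            v[e] = temp              # becomes a in the next round
        # 80 is a multiple of 5, so the slots are back in a..e order here.
        h = [(x + y) & MASK32 for x, y in zip(h, v)]
    return "".join(f"{x:08x}" for x in h)


# Cross-check against the standard library implementation.
print(sha1_rotating(b"abc") == hashlib.sha1(b"abc").hexdigest())
```

Only two slots are written per round; the other three values keep their storage and simply change role, which is the copy-avoidance effect the abstract refers to.
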
Interleaved hardware multithreading processor architecture

Patent number: 8429384

Abstract: An architecture for a digital signal processor alleviates the difficulties and complexities normally associated with writing and optimizing programs to avoid stalls during which one instruction awaits the result of a prior instruction. The architecture coordinates the processing of data for multiple instructions through a multiple stage data pipeline. As a result, the architecture not only supports simultaneous execution of multiple programs, but also permits each program to execute without delays caused by inter-relationships between instructions within the program.

Type: Grant

Filed: November 15, 2006

Date of Patent: April 23, 2013

Assignee: Harman International Industries, Incorporated

Inventors: James D. Pennock, Ronald Baker, Brian R. Parker, Christopher Belcher
Register control circuit and register control method

Patent number: 8423748

Abstract: A register control circuit that controls a register specified by an inputted address includes a signal output that outputs a first control signal and a second control signal based on the inputted address, a selector that selects data of a register specified by the first control signal outputted from the signal output, a logical operator that performs a logical operation of write data outputted from a processor and the data selected by the selector to output an operation result, and a storage that stores data in the register specified by the first control signal by selecting one of the write data and the operation results as the data based on the second control signal outputted from the signal output.

Type: Grant

Filed: July 27, 2009

Date of Patent: April 16, 2013

Assignee: Fujitsu Limited

Inventor: Yuusuke Ashizuka
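
Illustrative note on patent 8423748 above: the abstract describes an address that yields two control signals, one selecting the register and one choosing whether the raw write data or the result of a logical operation is stored. A minimal Python sketch follows. The address encoding (low two bits select the register, bit 8 selects the operation result) and the choice of OR as the logical operation are assumptions, not details from the patent.

```python
class RegisterBank:
    """Toy model of a register block with an address-selected OR-write alias."""

    def __init__(self):
        self.registers = [0, 0, 0, 0]

    @staticmethod
    def decode(address: int):
        register_index = address & 0x3             # "first control signal" (assumed)
        use_operation_result = bool(address & 0x100)  # "second control signal" (assumed)
        return register_index, use_operation_result

    def write(self, address: int, write_data: int) -> None:
        index, use_operation_result = self.decode(address)
        selected = self.registers[index]            # selector output
        operation_result = selected | write_data    # logical operator output
        self.registers[index] = (
            operation_result if use_operation_result else write_data
        )


bank = RegisterBank()
bank.write(0x001, 0x0F)  # plain write to register 1
bank.write(0x101, 0xF0)  # OR-write alias of register 1: sets bits, keeps the rest
```

With this arrangement a processor can set individual bits in one write, without a separate read-modify-write sequence.
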
Register File with Embedded Shift and Parallel Write Capability

Publication number: 20130086366

Abstract: An apparatus includes a register file including a logical circuit. The register file is configured to perform one or more logical operations in conjunction with the logical circuit. The logical operation is performed in response to the register file receiving a register file control instruction. The register file control instruction is independent from an arithmetic logic unit (ALU) control instruction and a multiply-and-accumulate unit (MACU) control instruction.

Type: Application

Filed: September 30, 2011

Publication date: April 4, 2013

Applicant: QUALCOMM INCORPORATED

Inventor: Aaron D. Lamb
FAST CONDITION CODE GENERATION FOR ARITHMETIC LOGIC UNIT

Publication number: 20130080740

Abstract: In one embodiment, a microprocessor includes fetch logic for retrieving an instruction, decode logic configured to identify an arithmetic operation specified in the instruction, and execution logic configured to receive operands specified by the instruction. The execution logic includes a primary logic path configured to perform the arithmetic operation on such operands and a secondary parallel logic path configured to output metadata associated with the result of the arithmetic operation.

Type: Application

Filed: September 28, 2011

Publication date: March 28, 2013

Applicant: NVIDIA CORPORATION

Inventors: Peter Gentle, Scott Pitkethly
Processing with compact arithmetic processing element

Patent number: 8407273

Abstract: A processor or other device, such as a programmable and/or massively parallel processor or other device, includes processing elements designed to perform arithmetic operations (possibly but not necessarily including, for example, one or more of addition, multiplication, subtraction, and division) on numerical values of low precision but high dynamic range (“LPHDR arithmetic”). Such a processor or other device may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor or other device in certain embodiments of the present invention significantly exceeds (e.g., by at least 20 more than three times) the number of arithmetic elements, if any, in the processor or other device which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).

Type: Grant

Filed: February 17, 2012

Date of Patent: March 26, 2013

Assignee: Singular Computing LLC

Inventor: Joseph Bates
Flag generation and use in processor with same processing for operation on small size operand as low order bits portion of operation on large size operand

Patent number: 8402254

Abstract: The RISC data processor is based on the idea that in case that there are many flag-generating instructions, the number of flags generated by each instruction is increased so that a decrease of flag-generating instructions exceeds an increase of flag-using instructions in quantity, thereby achieving the decrease in instructions. With the data processor, an instruction for generating flags according to operands' data sizes is defined. To an instruction set handled by the RISC data processor, an instruction capable of executing an operation on operand in more than one data size, which performs a process identical to an operation process conducted on the small-size operand on low-order bits of the large-size operand, and generates flags capable of coping with the respective data sizes regardless of the data size of each operand subjected to the operation is added. Thus, the reduction in instruction code space of the RISC data processor tight in instruction code space can be achieved.

Type: Grant

Filed: February 11, 2009

Date of Patent: March 19, 2013

Assignee: Renesas Electronics Corporation

Inventor: Fumio Arakawa
Performing an allreduce operation on a plurality of compute nodes of a parallel computer

Patent number: 8375197

Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: performing, for each node, a local reduction operation using allreduce contribution data for the cores of that node, yielding, for each node, a local reduction result for one or more representative cores for that node; establishing one or more logical rings among the nodes, each logical ring including only one of the representative cores from each node; performing, for each logical ring, a global allreduce operation using the local reduction result for the representative cores included in that logical ring, yielding a global allreduce result for each representative core included in that logical ring; and performing, for each node, a local broadcast operation using the global allreduce results for each representative core on that node.

Type: Grant

Filed: May 21, 2008

Date of Patent: February 12, 2013

Assignee: International Business Machines Corporation

Inventor: Ahmad Faraj
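
Illustrative note on patent 8375197 above: the abstract lists three phases: a local reduction on each node, a global allreduce over a logical ring of representative cores, and a local broadcast. The Python sketch below simulates those phases in a single process, with summation as the reduction and one representative per node. The function name and the data layout (a list of per-node lists of core contributions) are assumptions.

```python
def hierarchical_allreduce(nodes):
    """Simulate local reduce, ring allreduce, and local broadcast with sums."""
    # Phase 1: each node reduces the contributions of its own cores.
    local = [sum(core_values) for core_values in nodes]

    # Phase 2: ring allreduce among the representative cores. In each step a
    # representative forwards the value it last received to its right neighbor.
    n = len(local)
    totals = local[:]
    in_flight = local[:]
    for _ in range(n - 1):
        in_flight = [in_flight[(i - 1) % n] for i in range(n)]
        totals = [totals[i] + in_flight[i] for i in range(n)]

    # Phase 3: each representative broadcasts the global result to its cores.
    return [[totals[i]] * len(nodes[i]) for i in range(n)]


result = hierarchical_allreduce([[1, 2], [3, 4], [5, 6]])
```

After `n - 1` ring steps every representative has seen every other node's local result exactly once, so all of them hold the same global sum.
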
ARCHITECTURE AND IMPLEMENTATION METHOD OF PROGRAMMABLE ARITHMETIC CONTROLLER FOR CRYPTOGRAPHIC APPLICATIONS

Publication number: 20130024668

Abstract: An architecture includes a controller. The controller is configured to receive a microprogram. The microprogram is configured for performing at least one of hierarchical or a sequence of polynomial computations. The architecture also includes an arithmetic logic unit (ALU) communicably coupled to the controller. The ALU is controlled by the controller. Additionally, the microprogram is compiled prior to execution by the controller, the microprogram is compiled into a plurality of binary tables, and the microprogram is programmed in a command language in which each command includes a first portion for indicating at least one of a command or data transferred to the ALU, and a second portion for including a control command to the controller. The architecture and implementation of the programmable controller may be for cryptographic applications, including those related to public key cryptography.

Type: Application

Filed: September 27, 2012

Publication date: January 24, 2013

Applicant: LSI CORPORATION

Inventor: LSI Corporation
ARITHMETIC AND CONTROL UNIT, ARITHMETIC AND CONTROL METHOD, PROGRAM AND PARALLEL PROCESSOR

Publication number: 20130024667

Abstract: An attribute group storage unit acquires and holds attribute groups set to respective data blocks. A scenario determination unit determines respective transfer systems of the respective blocks between a memory of the lowest hierarchy and a memory of another hierarchy based on those attribute groups and a configuration of an arithmetic unit which is the parallel processor, and controls the transfer of the respective data blocks according to the determined transfer systems, and the parallel arithmetic operation corresponding to the transfer. Each of the attribute groups is necessary to determine the transfer systems, and includes one or more attributes not depending on the configuration of the parallel processor. The attribute groups of the write blocks are set assuming that each of the write blocks has already been located in the memory of another hierarchy, and is transferred to the memory of the lowest hierarchy.

Type: Application

Filed: June 21, 2012

Publication date: January 24, 2013

Applicant: Renesas Electronics Corporation

Inventor: Shorin KYO
BYTE-ORIENTED MICROCONTROLLER HAVING WIDER PROGRAM MEMORY BUS SUPPORTING MACRO INSTRUCTION EXECUTION, ACCESSING RETURN ADDRESS IN ONE CLOCK CYCLE, STORAGE ACCESSING OPERATION VIA POINTER COMBINATION, AND INCREASED POINTER ADJUSTMENT AMOUNT

Publication number: 20130013895

Abstract: An exemplary byte-oriented microcontroller includes a program memory, a program memory bus, and a core circuit. The program memory bus has a bus width wider than one instruction byte, and the core circuit is coupled to the program memory through the program memory bus for executing at least one instruction by processing a plurality of instruction bytes fetched from the program memory. The core circuit includes a fetch unit, for fetching the instruction bytes through the program memory bus and re-ordering the fetched instruction bytes to form a complete instruction.

Type: Application

Filed: July 6, 2011

Publication date: January 10, 2013

Applicant: FS-SEMI CO., LTD.

Inventor: Hsiao-Ming Huang
Methods and Apparatus for Efficient Complex Long Multiplication and Covariance Matrix Implementation

Publication number: 20130007421

Abstract: Efficient computation of complex long multiplication results and an efficient calculation of a covariance matrix are described. A parallel array VLIW digital signal processor is employed along with specialized complex long multiplication instructions and communication operations between the processing elements which are overlapped with computation to provide very high performance operation. Successive iterations of a loop of tightly packed VLIWs may be used allowing the complex multiplication pipeline hardware to be efficiently used.

Type: Application

Filed: September 13, 2012

Publication date: January 3, 2013

Applicant: ALTERA CORPORATION

Inventors: Gerald G. Pechanek, Ricardo Rodriguez, Matthew Plonski, David Strube, Kevin Coopman
INFORMATION PROCESSING DEVICE

Publication number: 20120311305

Abstract: Provided is an information processing device including an instruction cache, a data cache, first and second arithmetic unit groups including a plurality of arithmetic units capable of parallel operation, a first arithmetic-control circuit that generates one or more operation instructions for the first arithmetic unit group, and a second arithmetic-control circuit that generates one or more operation instructions for the second arithmetic unit group based on an instruction code of a fixed instruction register. The first arithmetic unit group sets the instruction code to the fixed instruction register according to an operation instruction generated based on a first specific instruction code by the first arithmetic-control circuit, and provides data to the second arithmetic unit group according to an operation instruction generated based on a second specific instruction code by the first arithmetic-control circuit.

Type: Application

Filed: May 29, 2012

Publication date: December 6, 2012

Inventors: Yuki KOBAYASHI, Shohei NOMOTO
Processor cooling management

Patent number: 8311683

Abstract: Illustrative embodiments provide a computer implemented method, a data processing system, and a computer program product for adjusting cooling settings. The computer implemented method comprises analyzing a set of instructions of an application to determine a number of degrees by which a set of instructions will raise a temperature of at least one processor core. The computer implemented method further calculates a cooling setting for at least one cooling system for the at least one processor core. The computer implemented method adjusts the at least one cooling system based on the cooling setting. The step of analyzing the set of instructions is performed before the set of instructions is executed on the at least one processor core. The step of adjusting the at least one cooling system is performed before the set of instructions is executed on the at least one processor core.

Type: Grant

Filed: April 29, 2009

Date of Patent: November 13, 2012

Assignee: International Business Machines Corporation

Inventors: Robert Lee Angell, David Wayne Cosby, Robert R. Friedlander, James R. Kraemer
Method and system for interprocedural prefetching

Patent number: 8312442

Abstract: A computing system has an amount of shared cache, and performs runtime automatic parallelization wherein when a parallelized loop is encountered, a main thread shares the workload with at least one other non-main thread. A method for providing interprocedural prefetching includes compiling source code to produce compiled code having a main thread including a parallelized loop. Prior to the parallelized loop in the main thread, the main thread includes prefetching instructions for the at least one other non-main thread that shares the workload of the parallelized loop. As a result, the main thread prefetches data into the shared cache for use by the at least one other non-main thread.

Type: Grant

Filed: December 10, 2008

Date of Patent: November 13, 2012

Assignee: Oracle America, Inc.

Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
Data processing system having bit exact instructions and methods therefor

Patent number: 8307196

Abstract: A method for operating a data processing system is provided. The method includes providing a first operand stored in a first register, providing a second operand stored in the register, providing a third operand stored in the register. The method further includes executing a first instruction, where executing the first instruction comprises: (1) retrieving the first operand, the second operand, and the third operand from the first register; (2) performing an operation using the first operand, the second operand, and the third operand to generate a bit exact result.

Type: Grant

Filed: April 5, 2006

Date of Patent: November 6, 2012

Assignee: Freescale Semiconductor, Inc.

Inventors: William C. Moyer, Imran Ahmed, Dan Tamir
Programmable controller for executing a plurality of independent sequence programs in parallel

Patent number: 8301869

Abstract: A programmable controller which executes a plurality of independent sequence programs in parallel is provided with an ASIC, including a plurality of arithmetic-logic units and a plurality of arbitration circuits, and MPUs as many as the arbitration circuits. The entire execution time of the programmable controller is shortened by changing combinations (groups of arithmetic-logic units) of the MPUs (and the arbitration circuits as many as the MPUs) and the arithmetic-logic units, based on the ratios of MPU execution instructions and ASIC execution instructions included in those instructions which constitute the programs to be executed in parallel.

Type: Grant

Filed: February 18, 2011

Date of Patent: October 30, 2012

Assignee: Fanuc Corporation

Inventors: Masuo Kokura, Yasushi Nomoto, Motoyoshi Miyachi
Method and apparatus for QR-factorizing matrix on multiprocessor system

Patent number: 8296350

Abstract: The present invention provides a method and apparatus for QR-factorizing matrix on a multiprocessor system, wherein the multiprocessor system comprises at least one core processor and a plurality of accelerators, the method comprises the steps of: iteratively factorizing each panel in the matrix until the whole matrix is factorized; wherein in each iteration, the method comprises: partitioning an unprocessed matrix part in the matrix into a plurality of blocks according to a predetermined block size; partitioning a current processed panel in the unprocessed matrix part into at least two sub panels, wherein the current processed panel is composed of a plurality of blocks; and performing QR factorization one by one on the at least two sub panels with the plurality of accelerators, and updating the data of the sub panel(s) on which no QR factorization has been performed among the at least two sub panels by using the factorization result.

Type: Grant

Filed: March 12, 2009

Date of Patent: October 23, 2012

Assignee: International Business Machines Corporation

Inventors: Hui Li, Bai Ling Wang
EMULATION OF EXECUTION MODE BANKED REGISTERS

Publication number: 20120260073

Abstract: A microprocessor includes processor modes comprising a user mode and a plurality of exception modes. An execution unit performs arithmetic operations on operands specified by program instructions. A first set of storage elements holds a first subset of the operands and provides them to the execution unit coupled thereto. A second set of storage elements associated with each of the modes hold a second subset of the operands and are incapable of directly providing the second operand subset to the execution unit. To enter a new mode from a current mode, logic saves the first operand subset held in the first set of storage elements to the second set of storage elements associated with the current mode and restores to the first set of storage elements the second operand subset held in the second set of storage elements associated with the new mode.

Type: Application

Filed: March 6, 2012

Publication date: October 11, 2012

Applicant: VIA TECHNOLOGIES, INC.

Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker
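
Illustrative note on publication 20120260073 above: the abstract describes saving the directly accessible registers to storage associated with the current mode and restoring them from storage associated with the new mode. A small Python sketch of that save-and-restore step follows. The class name, the mode names, and the register count are assumptions.

```python
class BankedRegisterEmulator:
    """One directly usable register set plus per-mode backing storage."""

    def __init__(self, modes, register_count):
        self.active = [0] * register_count            # feeds the execution unit
        self.banks = {mode: [0] * register_count for mode in modes}
        self.mode = modes[0]

    def switch_mode(self, new_mode):
        self.banks[self.mode] = self.active[:]        # save for the current mode
        self.active = self.banks[new_mode][:]         # restore for the new mode
        self.mode = new_mode


cpu = BankedRegisterEmulator(["user", "irq"], register_count=2)
cpu.active[0] = 7          # value written while in user mode
cpu.switch_mode("irq")     # user values are saved; irq values become active
```
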
METHOD AND APPARATUS FOR FAST BRANCH-FREE VECTOR DIVISION COMPUTATION

Publication number: 20120254585

Abstract: Methods and apparatus for double precision division/inversion vector computations on Single Instruction Multiple Data (SIMD) computing platforms are described. In one embodiment, an input argument is represented by an exponent portion and a fraction portion. These portions are scaled, inverted, and multiplied to generate an inverse version of the input argument. In an embodiment, the inversion of the exponent portion may be done by changing the sign of the exponent. Other embodiments are also described.

Type: Application

Filed: December 25, 2009

Publication date: October 4, 2012

Inventors: Andrey Kolesov, Valery Kuriakin, Maria Guseva
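
Illustrative note on publication 20120254585 above: the abstract describes splitting the input into an exponent portion and a fraction portion, inverting each, and noting that the exponent can be inverted by changing its sign. The scalar Python sketch below follows that outline, using Newton-Raphson refinement for the fraction as a stand-in, since the application's actual fraction inversion is not given in the abstract. It assumes finite, nonzero, normal inputs and does not handle special values.

```python
import math


def reciprocal(x: float, iterations: int = 4) -> float:
    """Approximate 1/x without data-dependent branches (nonzero finite x only)."""
    # abs(x) == fraction * 2**exponent with 0.5 <= fraction < 1.
    fraction, exponent = math.frexp(abs(x))
    # Linear initial estimate of 1/fraction on [0.5, 1), then refine.
    estimate = 48.0 / 17.0 - (32.0 / 17.0) * fraction
    for _ in range(iterations):  # fixed trip count, independent of the data
        estimate = estimate * (2.0 - fraction * estimate)
    # Inverting the exponent portion is just a sign change.
    return math.copysign(math.ldexp(estimate, -exponent), x)


inverse = reciprocal(4.0)
```

Because every input follows the same instruction sequence, the same steps map naturally onto SIMD lanes, which is the setting the abstract describes.
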
Algebra operation method, apparatus, and storage medium thereof

Patent number: 8276116

Abstract: An algebra operation method includes the steps of converting algebra operations for a plurality of objects which appear in a program into an algebra operation sequence object described using object access data used to access the plurality of objects and object state data used to store states associated with the plurality of objects without immediately evaluating the algebra operations, determining a function to be applied to the algebra operation sequence object, and evaluating the algebra operations by executing the function by designating an argument group required for the function in response to a call of a substitute operator.

Type: Grant

Filed: June 7, 2007

Date of Patent: September 25, 2012

Assignee: Canon Kabushiki Kaisha

Inventor: Yasuhiro Nakahara
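
Illustrative note on patent 8276116 above: the abstract describes recording algebra operations as an operation sequence object instead of evaluating them immediately, and evaluating only when a substitute (assignment) operator is called. The Python sketch below shows that deferred-evaluation pattern. Python has no overloadable assignment, so a hypothetical `assign` method stands in for the substitute operator; all class and method names are assumptions.

```python
import operator


class Expr:
    """A recorded operation; nothing is computed until evaluate() is called."""

    def __init__(self, function, operands):
        self.function = function
        self.operands = operands

    def __add__(self, other):
        return Expr(operator.add, (self, other))

    def __mul__(self, other):
        return Expr(operator.mul, (self, other))

    def evaluate(self):
        left, right = (operand.evaluate() for operand in self.operands)
        return [self.function(x, y) for x, y in zip(left, right)]


class Vector(Expr):
    """A concrete object that expressions refer to."""

    def __init__(self, values):
        self.values = list(values)

    def evaluate(self):
        return self.values

    def assign(self, expression):
        # Stand-in for the substitute operator: evaluation happens here.
        self.values = expression.evaluate()


a, b, c = Vector([1, 2]), Vector([3, 4]), Vector([0, 0])
pending = a + b * a   # builds an operation sequence, computes nothing yet
c.assign(pending)     # element-wise a + b * a is evaluated now
```
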
Controlling arithmetic processing according to asynchronous and synchronous modes based upon data size threshold

Patent number: 8271767

Abstract: An HW arithmetic unit executes a predetermined arithmetic operation. An arithmetic-mode determining unit determines, based on an attribute or a content of data relating to processing that has requested the arithmetic operation, either a synchronous mode that executes the processing after waiting for completion of the arithmetic operation by an arithmetic circuit or an asynchronous mode that executes the processing without waiting for completion of the arithmetic operation by the arithmetic circuit, as an execution mode of the arithmetic operation. An arithmetic-process control unit controls the arithmetic operation by the arithmetic circuit according to the determined execution mode.

Type: Grant

Filed: June 27, 2008

Date of Patent: September 18, 2012

Assignee: Kabushiki Kaisha Toshiba

Inventors: Keisuke Mera, Takeshi Ishihara, Yasuhiro Fukuju
Array processor with two parallel processing paths of multipliers and ALUs with idle operation capability controlled by portions of opcode including indication of valid output

Patent number: 8250337

Abstract: General purpose array processing techniques including processing methods and apparatus. Processors may include parallel processing paths designed with reusable computational components such as multipliers, multiplexers, and ALUs. Flow of data through the paths and operations performed may be controlled based on opcodes. Processors may be shared, scalable, and configured to perform matrix operations. In particular, such operation may be useful for physical sections of MIMO-OFDM communication systems.

Type: Grant

Filed: April 27, 2007

Date of Patent: August 21, 2012

Assignee: Qualcomm Incorporated

Inventor: Garret Webster Shih
COMPETITION TESTING DEVICE

Publication number: 20120210101

Abstract: A competition testing apparatus for testing an access competition of an arithmetic unit includes a memory that stores a program, a first processor that executes the program by accessing the memory, a second processor that executes the program by accessing the memory, and an arbitration unit that arbitrates accessing the first processor and the second processor and reports a result of the arbitration upon the first processor and the second processor accessing the same address space in the memory, wherein the memory stores an odd number of programs, further comprises a controller that controls the first processor to process the plurality of test programs stored in the storage in predetermined order, and controls the second processor to process the plurality of test programs stored in the storage in order reverse to the predetermined order, and a recording unit that records the result of arbitration performed using the arbitrator.

Type: Application

Filed: August 12, 2011

Publication date: August 16, 2012

Applicant: FUJITSU LIMITED

Inventor: Yasushi ASANO
CONFIGURABLE PIPELINE BASED ON ERROR DETECTION MODE IN A DATA PROCESSING SYSTEM

Publication number: 20120204012

Abstract: A method includes providing a data processor having an instruction pipeline, where the instruction pipeline has a plurality of instruction pipeline stages, and where the plurality of instruction pipeline stages includes a first instruction pipeline stage and a second instruction pipeline stage. The method further includes providing a data processor instruction that causes the data processor to perform a first set of computational operations during execution of the data processor instruction, performing the first set of computational operations in the first instruction pipeline stage if the data processor instruction is being executed and a first mode has been selected, and performing the first set of computational operations in the second instruction pipeline stage if the data processor instruction is being executed and a second mode has been selected.

Type: Application

Filed: April 13, 2012

Publication date: August 9, 2012

Applicant: Rambus Inc.

Inventors: William C. Moyer, Jeffrey W. Scott
ARITHMETIC UNIT AND ARITHMETIC PROCESSING METHOD FOR OPERATING WITH HIGHER AND LOWER CLOCK FREQUENCIES

Publication number: 20120198211

Abstract: There is a need for providing a battery-less integrated circuit (IC) card capable of operating in accordance with a contact usage or a non-contact usage, preventing coprocessor throughput from degrading despite a decreased clock frequency for reduced power consumption under non-contact usage, and ensuring high-speed processing under non-contact usage. A dual interface card is a battery-less IC card capable of operating in accordance with a contact usage or a non-contact usage. The dual interface card operates at a high clock under contact usage and at a low clock under non-contact usage. A targeted operation comprises a plurality of different basic operations. The dual interface card comprises a basic arithmetic circuit group. Under the contact usage, the basic arithmetic circuit group performs one basic operation of the targeted operation at one cycle. Under the non-contact usage, the basic arithmetic circuit group sequentially performs at least two basic operations of the targeted operation at one cycle.

Type: Application

Filed: April 10, 2012

Publication date: August 2, 2012

Applicant: Renesas Electronics Corporation

Inventors: Daisuke SUZUKI, Minoru Saeki, Yuichiro Nariyoshi
Microprocessor and Method for Enhanced Precision Sum-of-Products Calculation on a Microprocessor

Publication number: 20120198212

Abstract: A microprocessor, a method for enhanced precision sum-of-products calculation and a video decoding device are provided, in which at least one general-purpose-register is arranged to provide a number of destination bits to a multiply unit, and a control unit is adapted to provide at least a multiply-high instruction and a multiply-high-and-accumulate instruction to the multiply unit. The multiply unit is arranged to receive at least first and second source operands having an associated number of source bits, a sum of source bits exceeding the number of destination bits, connected to a register-extension cache comprising at least one cache entry arranged to store a number of precision-enhancement bits, and adapted to store a destination portion of a result operand in the general-purpose-register and a precision enhancement portion in the cache entry. The result operand is generated by a multiply-high operation or by a multiply-high-and-accumulate operation, depending on the received instructions.

Type: Application

Filed: November 30, 2009

Publication date: August 2, 2012

Inventor: Martin Raubuch
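
The split between a destination portion and a precision-enhancement portion can be illustrated with a minimal Python sketch. It assumes unsigned 32-bit source operands and a 32-bit destination register, and it treats the low 32 product bits as the precision-enhancement bits; the names `multiply_high` and `multiply_high_accumulate` and these widths are illustrative assumptions, not details taken from the application.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def multiply_high(a: int, b: int) -> tuple[int, int]:
    """Return (destination portion, enhancement portion) of a * b.

    Assumption: the destination portion is the upper 32 bits of the
    64-bit product and the enhancement portion is the lower 32 bits.
    """
    product = (a & MASK32) * (b & MASK32)
    return product >> 32, product & MASK32


def multiply_high_accumulate(a: int, b: int, acc_high: int, acc_low: int) -> tuple[int, int]:
    """Add a * b to the accumulator held as (acc_high, acc_low).

    Keeping acc_low (the hypothetical cache entry) lets carries out of
    the low bits reach the destination portion instead of being lost.
    """
    total = ((acc_high << 32) | acc_low) + (a & MASK32) * (b & MASK32)
    total &= MASK64  # wrap like a 64-bit accumulator
    return total >> 32, total & MASK32
```

For example, accumulating `1 * 1` onto `(0, 0xFFFFFFFF)` carries into the destination portion, which a multiply-high that discarded the low bits could not reproduce.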
SIMD type microprocessor having processing elements that have plural determining units

Patent number: 8219783

Abstract: An SIMD type microprocessor is disclosed. The SIMD type microprocessor includes plural PEs (processor elements) each of which provides an ALU (arithmetic and logic unit) for lower-order bits, an ALU for upper-order bits, a control circuit for lower-order bits, a control circuit for upper-order bits, a range determining circuit for lower-order bits, and a range determining circuit for upper-order bits. The SIMD type microprocessor further includes a global processor, a range designation bus for lower-order bits which connects the global processor to the range determining circuit for lower-order bits, and a range designation bus for upper-order bits which connects the global processor to the range determining circuit for upper-order bits.

Type: Grant

Filed: July 3, 2008

Date of Patent: July 10, 2012

Assignee: Ricoh Company, Ltd.

Inventor: Kazuhiko Hara
Method and apparatus for shuffling data

Patent number: 8214626

Abstract: Method, apparatus, and program means for shuffling data. The method of one embodiment comprises receiving a first operand having a set of L data elements and a second operand having a set of L control elements. For each control element, data from a first operand data element designated by the individual control element is shuffled to an associated resultant data element position if its flush to zero field is not set and a zero is placed into the associated resultant data element position if its flush to zero field is set.

Type: Grant

Filed: March 31, 2009

Date of Patent: July 3, 2012

Assignee: Intel Corporation

Inventors: William W. Macy, Jr., Eric L. Debes, Patrice L. Roussel, Huy V. Nguyen
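
A shuffle of this kind can be sketched in a few lines of Python. The sketch assumes that a set flush-to-zero field produces a zero in the result, and it assumes a control-element layout with the flag in bit 7 and the source index in the low bits; that layout and the name `shuffle` are illustrative assumptions rather than details stated in the abstract.

```python
FLUSH_TO_ZERO = 0x80  # assumed position of the flush-to-zero field
INDEX_MASK = 0x7F     # assumed position of the source-element index


def shuffle(data: list[int], controls: list[int]) -> list[int]:
    """Build a result with one element per control element.

    Each control element either selects a source element from `data`
    or, when its flush-to-zero flag is set, forces the result to zero.
    """
    if len(data) != len(controls):
        raise ValueError("operands must have the same number of elements")
    result = []
    for ctrl in controls:
        if ctrl & FLUSH_TO_ZERO:
            result.append(0)
        else:
            # Wrap the index so it always designates a valid element.
            result.append(data[(ctrl & INDEX_MASK) % len(data)])
    return result
```

With `data = [10, 20, 30, 40]` and `controls = [3, 0x80, 0, 1]`, the second result position is zeroed and the others are gathered from positions 3, 0 and 1.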
HASH PROCESSING USING A PROCESSOR

Publication number: 20120166773

Abstract: In certain embodiments, a digital signal processor (DSP) has multiple arithmetic logic units and a register module. The DSP is adapted to generate a message digest H from a message M in accordance with the SHA-1 standard, where M includes N blocks M(i), i=1, . . . , N, and the processing of each block M(i) includes t iterations of processing words of message schedule {Wt}. In each iteration possible, the DSP uses free operations to precalculate Wt and working variable values for use in the next iteration. In addition, in each iteration possible, the DSP rotates the registers associated with particular working variables to reduce operations that merely copy unchanged values from one register to another.

Type: Application

Filed: September 16, 2009

Publication date: June 28, 2012

Applicant: LSI CORPORATION

Inventors: Dmitriy Vladimirovich Alekseev, Alexei Vladimirovich Galatenko, Ilya Viktorovich Lyalin, Alexander Markovic, Denis Vassilevich Parfenov
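
The register-rotation idea can be illustrated with a plain SHA-1 sketch in Python. It covers only the rotation of working-variable roles (so that unchanged values are never copied between registers); it does not model the DSP's use of free operations to precalculate Wt. The slot-indexing scheme and the name `sha1_hex` are illustrative assumptions, not the application's implementation.

```python
import struct

MASK32 = 0xFFFFFFFF


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def sha1_hex(message: bytes) -> str:
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    # Standard SHA-1 padding: 0x80, zeros, then the 64-bit bit length.
    bit_len = len(message) * 8
    message += b"\x80"
    message += b"\x00" * ((56 - len(message) % 64) % 64)
    message += struct.pack(">Q", bit_len)

    for start in range(0, len(message), 64):
        # Message schedule {Wt}, precomputed here for simplicity.
        w = list(struct.unpack(">16I", message[start:start + 64]))
        for t in range(16, 80):
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

        v = h[:]  # five slots; the roles a..e move, the values stay put
        for t in range(80):
            # Slot that plays each role in iteration t.
            a, b, c, d, e = ((role - t) % 5 for role in range(5))
            if t < 20:
                f, k = (v[b] & v[c]) | (~v[b] & v[d]), 0x5A827999
            elif t < 40:
                f, k = v[b] ^ v[c] ^ v[d], 0x6ED9EBA1
            elif t < 60:
                f, k = (v[b] & v[c]) | (v[b] & v[d]) | (v[c] & v[d]), 0x8F1BBCDC
            else:
                f, k = v[b] ^ v[c] ^ v[d], 0xCA62C1D6
            # Only two slots are written; the old e slot becomes the new a.
            v[e] = (rotl(v[a], 5) + f + v[e] + k + w[t]) & MASK32
            v[b] = rotl(v[b], 30)
        # 80 is a multiple of 5, so slots 0..4 are a..e again here.
        h = [(x + y) & MASK32 for x, y in zip(h, v)]

    return "".join(f"{x:08x}" for x in h)
```

Each iteration writes two slots instead of shifting all five working variables, which is the copy-avoidance effect the abstract describes.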
Apparatus and method for performing re-arrangement operations on data

Patent number: 8200948

Abstract: An apparatus and method are provided for performing re-arrangement operations on data. The data processing apparatus has a register data store with a plurality of registers for storing data, and processing logic for performing a sequence of operations on data including at least one re-arrangement operation. The processing logic has scalar processing logic for performing scalar operations and SIMD processing logic for performing SIMD operations. The SIMD processing logic is responsive to a re-arrangement instruction specifying a family of re-arrangement operations to perform a selected re-arrangement operation from that family on a plurality of data elements constituted by data in one or more registers identified by the re-arrangement instruction. The selected re-arrangement operation is dependent on at least one parameter provided by the scalar processing logic, that parameter identifying a data element width for the data elements on which the selected re-arrangement operation is performed.

Type: Grant

Filed: December 4, 2007

Date of Patent: June 12, 2012

Assignee: ARM Limited

Inventors: Daniel Kershaw, Dominic Hugo Symes, Alastair Reid
Multipurpose arithmetic functional unit

Patent number: 8190669

Abstract: Multipurpose arithmetic functional units can perform planar attribute interpolation and unary function approximation operations. In one embodiment, planar interpolation operations for coordinates (x, y) are executed by computing A*x+B*y+C, and unary function approximation operations for operand x are executed by computing F2(xb)*xh^2+F1(xb)*xh+F0(xb), where xh=x-xb. Shared multiplier and adder circuits are advantageously used to implement the product and sum operations for both classes of operations.

Type: Grant

Filed: October 20, 2004

Date of Patent: May 29, 2012

Assignee: NVIDIA Corporation

Inventors: Stuart F. Oberman, Ming Y. Siu
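
Both operation classes reduce to two products and a three-input sum, which a short Python sketch can make explicit. It evaluates A*x+B*y+C and F2(xb)*xh^2+F1(xb)*xh+F0(xb), with xh = x - xb, through one shared helper. The helper name `shared_datapath`, the uniform table segments, and the way xb is derived from x are illustrative assumptions; the abstract does not specify them.

```python
def shared_datapath(m1: float, x1: float, m2: float, x2: float, c: float) -> float:
    """Two multiplies and one three-input add, shared by both operations."""
    return m1 * x1 + m2 * x2 + c


def planar_interpolate(A: float, B: float, C: float, x: float, y: float) -> float:
    """Planar attribute interpolation: A*x + B*y + C."""
    return shared_datapath(A, x, B, y, C)


def unary_approx(table: list[tuple[float, float, float]], x: float, step: float) -> float:
    """Quadratic approximation F2(xb)*xh^2 + F1(xb)*xh + F0(xb).

    Assumption: `table[i]` holds (F2, F1, F0) for the segment that
    starts at xb = i * step, and xh = x - xb is the offset within it.
    """
    index = int(x // step)
    xb = index * step
    xh = x - xb
    f2, f1, f0 = table[index]
    return shared_datapath(f2, xh * xh, f1, xh, f0)


# Hypothetical table for f(x) = x*x on [0, 1): Taylor coefficients at each xb.
STEP = 0.25
SQUARE_TABLE = [(1.0, 2 * i * STEP, (i * STEP) ** 2) for i in range(4)]
```

With this table, `unary_approx(SQUARE_TABLE, 0.625, STEP)` reproduces 0.625 squared, because a quadratic is its own second-order expansion.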
Method and apparatus to extract integer and fractional components from floating-point data

Patent number: 8185723

Abstract: A method is presented including decomposing a first value into many parts. Decomposing includes shifting (310) a rounded integer portion of the first value to generate a second value. Generating (320) a third value. Extracting (330) a plurality of significand bits from the second value to generate a fourth value. Extracting (340) a portion of bits from the fourth value to generate an integer component. Generating (350) a fifth value. Also the third value, the fifth value, and the integer component are either stored (360, 380) in a memory or transmitted to an arithmetic logical unit (ALU).

Type: Grant

Filed: July 13, 2001

Date of Patent: May 22, 2012

Assignee: Intel Corporation

Inventors: Robert S. Norin, Olga Dryzhakova, Alexander Isaev, Andrey Naraikin
Dual function adder for computing a hardware prefetch address and an arithmetic operation value

Patent number: 8185721

Abstract: A system including a dual function adder is described. In one embodiment, the system includes an adder. The adder is configured for a first instruction to determine an address for a hardware prefetch if the first instruction is a hardware prefetch instruction. The adder is further configured for the first instruction to determine a value from an arithmetic operation if the first instruction is an arithmetic operation instruction.

Type: Grant

Filed: March 4, 2008

Date of Patent: May 22, 2012

Assignee: QUALCOMM Incorporated

Inventors: Ajay Anant Ingle, Erich James Plondke, Lucian Codrescu
Processor Architecture for Executing Wide Transform Slice Instructions

Publication number: 20120117441

Abstract: A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.

Type: Application

Filed: January 19, 2012

Publication date: May 10, 2012

Applicant: MicroUnity Systems Engineering, Inc.

Inventors: Craig Hansen, John Moussouris, Alexia Massalin
Performing an allreduce operation on a plurality of compute nodes of a parallel computer

Patent number: 8161268

Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.

Type: Grant

Filed: May 21, 2008

Date of Patent: April 17, 2012

Assignee: International Business Machines Corporation

Inventor: Ahmad Faraj
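
The two-phase structure (a global allreduce per logical ring, then a local allreduce per node) can be simulated in Python. The sketch assumes summation as the reduction operator, the same number of cores on every node, and rings formed by taking core k from each node; it computes each ring's result directly rather than modelling message passing around the ring. All names are illustrative assumptions.

```python
def allreduce(contributions: list[list[int]]) -> list[list[int]]:
    """Simulate a ring-then-local allreduce.

    contributions[node][core] is the contribution of one processing core.
    Returns results[node][core], the value each core holds at the end.
    """
    num_nodes = len(contributions)
    cores_per_node = len(contributions[0])
    if any(len(node) != cores_per_node for node in contributions):
        raise ValueError("this sketch assumes equal core counts per node")

    # Phase 1: logical ring k contains core k of every node; the global
    # allreduce leaves the ring's sum on every core in that ring.
    ring_sums = [
        sum(contributions[n][k] for n in range(num_nodes))
        for k in range(cores_per_node)
    ]
    global_results = [list(ring_sums) for _ in range(num_nodes)]

    # Phase 2: each node combines the global results of its own cores.
    return [[sum(node)] * cores_per_node for node in global_results]
```

For three nodes with two cores each and contributions `[[1, 2], [3, 4], [5, 6]]`, the ring sums are 9 and 12, and every core ends with 21.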
ARITHMETIC LOGIC AND SHIFTING DEVICE FOR USE IN A PROCESSOR

Publication number: 20120083912

Abstract: An arithmetic logic and shifting device is disclosed and includes an arithmetic logic unit that has a first input to receive a first operand from a first register port, a second input to receive a second operand from a second register port, and an output to selectively provide a memory address to a memory unit in a first mode of operation and to selectively provide an arithmetic output in a second mode of operation. Further, the arithmetic logic and shifting device includes a programmable shifter device that has a first input to receive data from the memory unit, a second input to receive the arithmetic output, a third input to receive an operation code of a computer execution instruction, and a shifted output to provide shifted data.

Type: Application

Filed: December 8, 2011

Publication date: April 5, 2012

Applicant: QUALCOMM INCORPORATED

Inventors: Muhammad Ahmed, Ajay Anant Ingle, Sujat Jamil
Processing with compact arithmetic processing element

Patent number: 8150902

Abstract: A processor or other device, such as a programmable and/or massively parallel processor or other device, includes processing elements designed to perform arithmetic operations (possibly but not necessarily including, for example, one or more of addition, multiplication, subtraction, and division) on numerical values of low precision but high dynamic range (“LPHDR arithmetic”). Such a processor or other device may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor or other device in certain embodiments of the present invention significantly exceeds (e.g., by at least 20 more than three times) the number of arithmetic elements, if any, in the processor or other device which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).

Type: Grant

Filed: June 15, 2010

Date of Patent: April 3, 2012

Assignee: Singular Computing LLC

Inventor: Joseph Bates
FUNCTIONAL UNIT CAPABLE OF EXECUTING APPROXIMATIONS OF FUNCTIONS

Publication number: 20120079250

Abstract: A semiconductor chip is described having a functional unit that can execute a first instruction and execute a second instruction. The first instruction is an instruction that multiplies two operands. The second instruction is an instruction that approximates a function according to C0+C1*X2+C2*X2^2. The functional unit has a multiplier circuit. The multiplier circuit has: i) a first input to receive bits of a first operand of the first instruction and receive bits of a C1 term of the second instruction; ii) a second input to receive bits of a second operand of the first instruction and receive bits of an X2 term of the second instruction.

Type: Application

Filed: September 24, 2010

Publication date: March 29, 2012

Inventors: Alex Pineiro, Thomas E. Fletcher, Brian J. Hickmann
Next-instruction-type-field

Publication number: 20120079243

Abstract: A graphics processing unit core 26 includes a plurality of processing pipelines 38, 40, 42, 44. A program instruction of a thread of program instructions being executed by a processing pipeline includes a next-instruction-type field 36 indicating an instruction type of a next program instruction following the current program instruction within the processing thread concerned. This next-instruction-type field is used to control selection of to which processing pipeline the next instruction is issued before that next instruction has been fetched and decoded. The next-instruction-type field may be passed along the processing pipeline as the least significant four bits within a program counter value associated with a current program instruction 32. The next-instruction-type field may also be used to control the forwarding of thread state variables between processing pipelines when a thread migrates between processing pipelines prior to the next program instruction being fetched or decoded.

Type: Application

Filed: September 1, 2011

Publication date: March 29, 2012

Applicant: ARM Limited

Inventor: Jorn Nystad
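
Carrying the next-instruction-type field in the least significant four bits of a program counter value can be sketched in Python. The sketch assumes program counter values are aligned so that their low four bits are otherwise unused, and the type-to-pipeline table is entirely hypothetical; neither detail is stated in the abstract.

```python
TYPE_BITS = 4
TYPE_MASK = (1 << TYPE_BITS) - 1

# Hypothetical mapping from instruction type to processing pipeline.
PIPELINES = {0: "arithmetic", 1: "load_store", 2: "texture", 3: "varying"}


def pack(pc: int, next_type: int) -> int:
    """Store the next-instruction-type field in the low four bits of pc."""
    if pc & TYPE_MASK:
        raise ValueError("this sketch assumes the low four pc bits are free")
    if not 0 <= next_type <= TYPE_MASK:
        raise ValueError("type must fit in four bits")
    return pc | next_type


def unpack(tagged_pc: int) -> tuple[int, int]:
    """Split a tagged value back into (pc, next_type)."""
    return tagged_pc & ~TYPE_MASK, tagged_pc & TYPE_MASK


def select_pipeline(tagged_pc: int) -> str:
    """Choose the pipeline for the next instruction without decoding it."""
    _, next_type = unpack(tagged_pc)
    return PIPELINES[next_type]
```

The point of the sketch is that `select_pipeline` needs only the tagged program counter value, so the issue decision can precede fetch and decode of the next instruction.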
MULTIPLY ADD FUNCTIONAL UNIT CAPABLE OF EXECUTING SCALE, ROUND, GETEXP, ROUND, GETMANT, REDUCE, RANGE AND CLASS INSTRUCTIONS

Publication number: 20120079251

Abstract: A method is described that involves executing a first instruction with a functional unit. The first instruction is a multiply-add instruction. The method further includes executing a second instruction with the functional unit. The second instruction is a round instruction.

Type: Application

Filed: September 24, 2010

Publication date: March 29, 2012

Inventors: Amit Gradstein, Cristina S. Anderson, Zeev Sperber, Simon Rubanovich, Benny Eitan
SPLIT PATH MULTIPLY ACCUMULATE UNIT

Publication number: 20120072703

Abstract: In one embodiment, a processor includes a multiply-accumulate (MAC) unit having a first path to handle execution of an instruction if a difference between at least a portion of first and second operands and a third operand is less than a threshold value, and a second path to handle the instruction execution if the difference is greater than the threshold value. Based on the difference, at least part of the third operand is to be provided to a multiplier of the MAC unit or to a compressor of the second path. Other embodiments are described and claimed.

Type: Application

Filed: September 20, 2010

Publication date: March 22, 2012

Inventors: SURESH SRINIVASAN, RAJARAM RAMANARAYANAN, SANU K. MATHEW, RAM K. KRISHNAMURTHY, VASANTHA K. ERRAGUNTLA
REDUCTION OPERATION DEVICE, A PROCESSOR, AND A COMPUTER SYSTEM

Publication number: 20120060019

Abstract: A reduction operation device detects a non-correspondence of an operation type or a data type in a reduction arithmetic operation of parallel processing. The reduction operation device receives a plurality of synchronization signals and a plurality of data, sets the transmission destination of each inputted synchronization signal and each item of data according to the next stage of the reduction operation, and executes the reduction operation. A synchronization unit in the reduction operation device detects the non-correspondence of the operation type or the data type included in an instruction of the reduction operation after synchronization is established, and controls the arithmetic operation of the arithmetic unit.

Type: Application

Filed: June 22, 2011

Publication date: March 8, 2012

Applicant: Fujitsu Limited

Inventors: Shinya Hiramoto, Yuichiro Ajima, Tomohiro Inoue
Method for achieving arbitrary precision

Patent number: 8131504

Abstract: A system includes a serial connection mode for obtaining a first approximation to a zero error result by means of a negative rough precision for manufacturing a plurality of first semi-finished products, and a measurement apparatus for measuring a precision value of each first semi-finished product, and a Full-9 Principle for sifting the first semi-finished products. A parallel connection mode is used for obtaining a second approximation to the zero error result by means of a positive rough precision by division to manufacture a plurality of second semi-finished products, and the measurement apparatus is used to measure a precision value of each second semi-finished product, and an error sift formula is utilized to sift the second semi-finished products.

Type: Grant

Filed: September 14, 2007

Date of Patent: March 6, 2012

Inventor: Martin Jo
Apparatus, method, and computer readable medium thereof for generating and utilizing a feature code to monitor a program

Patent number: 8127276

Abstract: Apparatus, method, and computer readable medium for generating and utilizing a feature code to monitor a program are provided. The program is run in a secure environment at the beginning. The program calls a function through an application program interface. A return address of the application program interface is used to generate the feature code. When the application runs again at another time, the feature code is utilized to monitor the program. According to the aforementioned arrangement and steps, the application program interface can be monitored dynamically. Consequently, any program can be monitored by this approach, which results in a more secure environment. Further, fewer application program interfaces are required to be monitored, so the required computer resource is less.

Type: Grant

Filed: April 3, 2007

Date of Patent: February 28, 2012

Assignee: Institute for Information Industry

Inventors: Cheng-Kai Chen, Hung Min Sun, Kang-Chiao Lin, Shih-Ying Chang, Shuai-Min Chen
