Arithmetic Operation Instruction Processing Patents (Class 712/221)

Floating point or vector (Class 712/222)

DIGITAL PROCESSOR HAVING INSTRUCTION SET WITH COMPLEX EXPONENTIAL NON-LINEAR FUNCTION

Publication number: 20140075162

Abstract: A digital processor is provided having an instruction set with a complex exponential function. The digital processor evaluates a complex exponential function for an input value, x, by obtaining a complex exponential software instruction having the input value, x, as an input; and in response to the complex exponential software instruction: invoking at least one complex exponential functional unit that implements complex exponential software instructions to apply the complex exponential function to the input value, x; and generating an output corresponding to the complex exponential of the input value, x. A complex exponential function for an input value, x, can be evaluated by wrapping the input value to maintain a given range; computing a coarse approximation angle using a look-up table; scaling the coarse approximation angle to obtain an angle from 0 to ?; and computing a fine corrective value using a polynomial approximation.
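
The wrap, look-up and polynomial steps named in the abstract can be illustrated in software. The following is a minimal sketch and not the patented hardware: the 64-entry table, the cubic Taylor polynomial, and the names `complex_exp`, `COARSE_TABLE` and `STEP` are all assumptions chosen for illustration, and the scaling step is reduced to taking the residual angle left over after the table look-up.

```python
import cmath
import math

TABLE_BITS = 6                       # assumed table size: 64 coarse angles
TABLE_SIZE = 1 << TABLE_BITS
STEP = 2 * math.pi / TABLE_SIZE      # angular spacing between table entries

# Look-up table holding exp(j * k * STEP) for each coarse angle index k.
COARSE_TABLE = [cmath.exp(1j * k * STEP) for k in range(TABLE_SIZE)]


def complex_exp(x: float) -> complex:
    """Approximate exp(j*x) with a table look-up plus a polynomial correction."""
    # Step 1: wrap the input so the angle stays in a fixed range.
    wrapped = math.fmod(x, 2 * math.pi)
    if wrapped < 0:
        wrapped += 2 * math.pi
    # Step 2: coarse approximation from the look-up table.
    index = min(int(wrapped / STEP), TABLE_SIZE - 1)
    # Step 3: small residual angle left after removing the coarse angle.
    delta = wrapped - index * STEP
    # Step 4: fine corrective value from a cubic Taylor polynomial of exp(j*delta).
    fine = 1 + 1j * delta - delta ** 2 / 2 - 1j * delta ** 3 / 6
    return COARSE_TABLE[index] * fine
```

With these assumed sizes the residual angle stays below about 0.1 radian, so the truncated polynomial contributes an error on the order of 1e-6.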

Type: Application

Filed: October 26, 2012

Publication date: March 13, 2014

Applicant: LSI Corporation

Inventors: Kameran Azadet, Albert Molina, Joseph H. Othmer, Parakalan Venkataraghavan, Meng-Lin Yu, Joseph Williams
CENTRAL PROCESSING UNIT AND ARITHMETIC UNIT

Publication number: 20140068231

Abstract: There is a need to provide a central processing unit capable of improving the resistance to power analysis attack without changing programs, lowering clock frequencies, and greatly redesigning a central processing unit of the related art. In a central processing unit, an arithmetic unit is capable of performing arithmetic operation using data irrelevant to data stored in a register group. A control unit allows the arithmetic unit to perform arithmetic processing corresponding to an incorporated instruction. At this time, the control unit allows the arithmetic unit to perform arithmetic processing using the irrelevant data during a first one-clock cycle.

Type: Application

Filed: August 29, 2013

Publication date: March 6, 2014

Applicant: Renesas Electronics Corporation

Inventor: Minoru SAEKI
Variable clocked heterogeneous serial array processor

Patent number: 8656143

Abstract: A serial array processor may have an execution unit, which is comprised of a multiplicity of single bit arithmetic logic units (ALUs), and which may perform parallel operations on a subset of all the words in memory by serially accessing and processing them, one bit at a time, while an instruction unit of the processor is pre-fetching the next instruction, a word at a time, in a manner orthogonal to the execution unit.

Type: Grant

Filed: February 3, 2010

Date of Patent: February 18, 2014

Inventor: Laurence H. Cooke
Residual Addition for Video Software Techniques

Publication number: 20140047220

Abstract: According to some embodiments, a technique provides for the execution of an instruction that includes receiving residual data of a first image and decoded pixels of a second image, zero-extending a plurality of unsigned data operands of the decoded pixels producing a plurality of unpacked data operands, adding a plurality of signed data operands of the residual data to the plurality of unpacked data operands producing a plurality of signed results; and saturating the plurality of signed results producing a plurality of unsigned results.
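
The zero-extend, add and saturate sequence described in the abstract can be sketched per element as follows. This is an illustrative model only: the 8-bit pixel range and the function name `residual_add` are assumptions, and the abstract does not state operand widths.

```python
def residual_add(decoded_pixels, residuals):
    """Add signed residuals to unsigned pixels and saturate to 0..255.

    Assumes 8-bit unsigned pixels; the widths are not given in the abstract.
    """
    results = []
    for pixel, residual in zip(decoded_pixels, residuals):
        if not 0 <= pixel <= 255:
            raise ValueError("pixel must be an unsigned 8-bit value")
        widened = pixel                 # zero-extension: value is unchanged, only wider
        signed_sum = widened + residual  # signed intermediate result
        results.append(max(0, min(255, signed_sum)))  # unsigned saturation
    return results
```

For example, `residual_add([250, 10, 128], [10, -20, 5])` clamps the first two sums and returns `[255, 0, 133]`.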

Type: Application

Filed: October 11, 2013

Publication date: February 13, 2014

Inventors: BRADLEY ALDRICH, NIGEL PAVER, MURLI GANESHAN
System and method for implementing elliptic curve scalar multiplication in cryptography

Patent number: 8649508

Abstract: A system and method for implementing the Elliptic Curve scalar multiplication method in cryptography, where the Double Base Number System is expressed in decreasing order of exponents and further on using it to determine Elliptic curve scalar multiplication over a finite elliptic curve.

Type: Grant

Filed: September 29, 2008

Date of Patent: February 11, 2014

Assignee: Tata Consultancy Services Ltd.

Inventor: Natarajan Vijayarangan
ARITHMETIC PROCESSING APPARATUS AND METHOD FOR HIGH SPEED PROCESSING OF APPLICATION

Publication number: 20140025934

Abstract: An arithmetic processing apparatus and method for high speed processing of an application are provided. The arithmetic processing apparatus may include a program control unit to store operation processing information necessary for application operation in a communication channel by executing an application code, and an operation processing unit to process the application operation using the operation processing information stored in the communication channel.

Type: Application

Filed: July 18, 2013

Publication date: January 23, 2014

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Joon Ho SONG, Shi Hwa Lee, Do Hyung Kim
Mathematical operation processing apparatus for performing high speed mathematical operations

Patent number: 8635434

Abstract: A mathematical operation processing apparatus is disclosed by which an operand can be supplied at high speed based on condition codes produced by a plurality of mathematical operations. The mathematical operation processing apparatus includes a plurality of computing elements configured to perform mathematical operations different from one another and produce mathematical operation results of the mathematical operations and condition codes. A condition code set register retains the condition codes produced simultaneously by the computing elements as a condition code set. A condition code conversion section performs a predetermined conversion for the condition code set and outputs a result of the conversion as a conversion condition code set. An operand supplying section supplies an operand for the mathematical operations in the computing elements based on the conversion condition code set.

Type: Grant

Filed: December 4, 2007

Date of Patent: January 21, 2014

Assignee: Sony Corporation

Inventors: Yasuhiro Iizuka, Takahiro Sato, Takayasu Kon, Kenichi Sanpei, Eiichiro Morinaga
Managing and implementing metadata in central processing unit using register extensions

Patent number: 8635415

Abstract: A set of default registers of a processor is expanded into metadata registers on the processor of a computer system. The default registers have data stored thereon, while metadata related to the data is stored separately on the metadata registers.

Type: Grant

Filed: September 30, 2009

Date of Patent: January 21, 2014

Assignee: Intel Corporation

Inventors: Baiju V. Patel, Rajeev Gopalakrishna, Andrew F. Glew, Robert J. Kushlis, Don Alan Van Dyke, Joseph Frank Cihula, Asit K. Mallick, James B. Crossland, Gilbert Neiger, Scott Dion Rodgers, Martin Guy Dixon, Mark Jay Charney, Jacob (Koby) Gottlieb
MODIFIED BALANCED THROUGHPUT DATA-PATH ARCHITECTURE FOR SPECIAL CORRELATION APPLICATIONS

Publication number: 20140019727

Abstract: Apparatus and method for a modified, balanced throughput data-path architecture is given for efficiently implementing the digital signal processing algorithms of filtering, convolution and correlation in computer hardware, in which both data and coefficient buffers can be implemented as sliding windows. This architecture uses a multiplexer and a data path branch from the Address Generator unit to the multiply-accumulate execution unit. By selecting between the data path of Address Generator to execution unit and the data path of register to execution unit, the unbalanced throughput and multiply-accumulate bubble cycles caused by misaligned addressing on coefficients can be overcome. The modified balanced throughput data-path architecture can achieve a high multiply-accumulate operation rate per cycle in implementing digital signal processing algorithms.

Type: Application

Filed: July 8, 2013

Publication date: January 16, 2014

Inventors: PengFei ZHU, HongXia SUN, YongQiang WU, Elio GUIDETTI
METHOD FOR FAST LARGE-INTEGER ARITHMETIC ON IA PROCESSORS

Publication number: 20140019725

Abstract: Methods, systems, and apparatuses are disclosed for implementing fast large-integer arithmetic within an integrated circuit, such as on IA (Intel Architecture) processors, in which such means include receiving a 512-bit value for squaring, the 512-bit value having eight sub-elements each of 64-bits and performing a 512-bit squaring algorithm by: (i) multiplying every one of the eight sub-elements by itself to yield a square of each of the eight sub-elements, the eight squared sub-elements collectively identified as T1, (ii) multiplying every one of the eight sub-elements by the other remaining seven of the eight sub-elements to yield an asymmetric intermediate result having seven diagonals therein, wherein each of the seven diagonals are of a different length, (iii) reorganizing the asymmetric intermediate result having the seven diagonals therein into a symmetric intermediate result having four diagonals each of 7×1 sub-elements of the 64-bits in length arranged across a plurality of columns, (iv) adding all
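
Steps (i) and (ii) rest on the identity that a square equals the sum of the squared sub-elements plus twice the sum of the distinct cross products. The sketch below checks that identity for eight 64-bit limbs. It is an assumption-laden illustration: it does not reproduce the diagonal reorganization of step (iii) or the truncated step (iv), and the name `square_512` is hypothetical.

```python
LIMB_BITS = 64
LIMBS = 8
LIMB_MASK = (1 << LIMB_BITS) - 1


def square_512(value: int) -> int:
    """Square a 512-bit value from its eight 64-bit sub-elements."""
    if not 0 <= value < 1 << (LIMB_BITS * LIMBS):
        raise ValueError("value must fit in 512 bits")
    limbs = [(value >> (LIMB_BITS * i)) & LIMB_MASK for i in range(LIMBS)]
    # T1: each sub-element multiplied by itself, placed at weight 2^(128*i).
    t1 = sum((limbs[i] * limbs[i]) << (2 * LIMB_BITS * i) for i in range(LIMBS))
    # Cross products of distinct sub-elements, each pair computed once.
    cross = sum(
        (limbs[i] * limbs[j]) << (LIMB_BITS * (i + j))
        for i in range(LIMBS)
        for j in range(i + 1, LIMBS)
    )
    # Each cross product appears twice in the full square.
    return t1 + 2 * cross
```

Computing each cross product once and doubling it needs 28 limb multiplications for the off-diagonal terms instead of 56.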

Type: Application

Filed: December 6, 2012

Publication date: January 16, 2014

Inventors: ERDINC OZTURK, VINODH GOPAL, JAMES GUILFORD
PARALLEL ARITHMETIC DEVICE, DATA PROCESSING SYSTEM WITH PARALLEL ARITHMETIC DEVICE, AND DATA PROCESSING PROGRAM

Publication number: 20140019726

Abstract: A parallel arithmetic device includes a status management section, a plurality of processor elements, and a plurality of switch elements for determining the relation of coupling of each of the processor elements. Each of the processor elements includes an instruction memory for memorizing a plurality of operation instructions corresponding respectively to a plurality of contexts so that an operation instruction corresponding to the context selected by the status management section is read out, and a plurality of arithmetic units for performing arithmetic processes in parallel on a plurality of sets of input data in a manner compliant with the operation instruction read out from the instruction memory.

Type: Application

Filed: July 5, 2013

Publication date: January 16, 2014

Inventors: Takao TOI, Taro FUJII, Yoshinosuke KATO, Toshiro KITAOKA
SIMD dot product operations with overlapped operands

Patent number: 8631224

Abstract: A data processing system includes a plurality of general purpose registers, and processor circuitry for executing one or more instructions, including a vector dot product instruction for simultaneously performing at least two dot products. The vector dot product instruction identifies a first and second source register, each for storing a plurality of vector elements, where a first dot product is to be performed between a first subset of vector elements of the first source register and a first subset of vector elements of the second source register, and a second dot product is to be performed between a second subset of vector elements of the first source register and a second subset of vector elements of the second source register. The first and second subsets of the second source register are different and at least two vector elements of the first and second subsets of the second source register overlap.
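
A small model may help make the overlap concrete. In the sketch below the register layout is an assumption: the first source supplies two disjoint four-element subsets, while the second source supplies elements 0..3 and 1..4, so the two subsets differ but share three elements. The function name `overlapped_dot_products` is hypothetical.

```python
def overlapped_dot_products(src1, src2):
    """Return two dot products that reuse overlapping elements of src2.

    Assumed layout: src1 has at least 8 elements, src2 at least 5.
    """
    if len(src1) < 8 or len(src2) < 5:
        raise ValueError("src1 needs 8 elements and src2 needs 5")
    # First dot product: src1[0..3] with src2[0..3].
    first = sum(a * b for a, b in zip(src1[0:4], src2[0:4]))
    # Second dot product: src1[4..7] with src2[1..4], overlapping src2[1..3].
    second = sum(a * b for a, b in zip(src1[4:8], src2[1:5]))
    return first, second
```

This sliding-window access pattern is typical of filter kernels, where adjacent outputs reuse most of the same coefficients or samples.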

Type: Grant

Filed: September 13, 2007

Date of Patent: January 14, 2014

Assignee: Freescale Semiconductor, Inc.

Inventor: William C. Moyer
ADDITION INSTRUCTIONS WITH INDEPENDENT CARRY CHAINS

Publication number: 20140013086

Abstract: A number of addition instructions are provided that have no data dependency between each other. A first addition instruction stores its carry output in a first flag of a flags register without modifying a second flag in the flags register. A second addition instruction stores its carry output in the second flag of the flags register without modifying the first flag in the flags register.
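
The effect of keeping two carry chains in separate flags can be modelled with a small flags dictionary. This is only a software sketch under assumed names (`add_with_flag`, and the flag keys `"first"` and `"second"`) and an assumed 64-bit operand width; it is not the instruction encoding.

```python
MASK64 = (1 << 64) - 1


def add_with_flag(a: int, b: int, flags: dict, flag_name: str) -> int:
    """64-bit add that reads and writes only the named carry flag."""
    total = a + b + flags[flag_name]
    flags[flag_name] = total >> 64   # carry out goes to this flag only
    return total & MASK64


# Two interleaved chains: one uses "first", the other uses "second".
flags = {"first": 0, "second": 0}
low_a = add_with_flag(MASK64, 1, flags, "first")   # sets "first", leaves "second"
low_b = add_with_flag(1, 2, flags, "second")       # leaves "first" untouched
high_a = add_with_flag(0, 0, flags, "first")       # consumes the carry from low_a
```

Because neither chain disturbs the other's flag, the two sequences of additions can be interleaved without saving and restoring a single shared carry.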

Type: Application

Filed: December 22, 2011

Publication date: January 9, 2014

Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Matthew C. Merten, Tong Li, Bret T. Toll, I
Data processing device

Patent number: 8627046

Abstract: A data processing device has an instruction decoder, a control logic unit, and an ALU. The instruction decoder decodes instruction codes of an arithmetic instruction. The control logic unit detects the effective data width of operation data to be processed according to the decode result from the instruction decoder and determines the number of cycles for the instruction execution corresponding to the effective data width. The ALU executes the instruction with the number of cycles of the instruction execution determined by the control logic unit.

Type: Grant

Filed: May 23, 2011

Date of Patent: January 7, 2014

Assignee: Renesas Electronics Corporation

Inventors: Sugako Ohtani, Hiroyuki Kondo
SYSTEM AND METHOD FOR PERFORMING PREDICATED SELECTION OF AN OUTPUT REGISTER

Publication number: 20140006754

Abstract: A system includes a processor having an instruction register for storing an instruction having a predefined opcode, a predicate register for storing a predicate condition to select an output register for a result of the instruction, a first output register, and a second output register. The processor further includes processor circuitry operable to execute the instruction to produce a result, and processor circuitry operable to store the result of the instruction in the first output register if the predicate condition to select the output is true, and to store the result in the second output register if the predicate condition to select the output is false. A single instruction is used to produce the result, and to store the result of the instruction.

Type: Application

Filed: September 5, 2013

Publication date: January 2, 2014

Applicant: NVIDIA Corporation

Inventors: Timo Oskari Aila, Samuli Matias Laine
MATRIX MULTIPLY ACCUMULATE INSTRUCTION

Publication number: 20140006753

Abstract: A method is described. The method includes iteratively performing for each position in a result matrix stored in a third register, multiplying a value at a matrix position stored in a first register with a value at a matrix position stored in a second register to obtain a first multiplicative value, where the positions in the first register and the second register are determined by the position in the result matrix and performing an exclusive or (XOR) operation with the first multiplicative value and a value stored at a result matrix position stored in the third register to obtain a result value.
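
Accumulating products with XOR instead of addition is how matrix multiplication works over GF(2). The sketch below assumes single-bit matrix entries, so that the multiply reduces to a bitwise AND; the abstract does not state the element type, and the name `matrix_multiply_xor_accumulate` is hypothetical.

```python
def matrix_multiply_xor_accumulate(first, second, result):
    """Update result[i][j] ^= first[i][k] * second[k][j] for every k.

    Assumes square matrices of 0/1 entries given as lists of lists.
    """
    size = len(result)
    for i in range(size):
        for j in range(size):
            for k in range(size):
                product = first[i][k] & second[k][j]   # multiplicative value
                result[i][j] ^= product                # XOR into the result position
    return result
```

For instance, with `first = [[1, 1], [0, 1]]`, `second = [[1, 0], [1, 1]]` and a zeroed `result`, the call returns `[[0, 1], [1, 1]]`.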

Type: Application

Filed: December 22, 2011

Publication date: January 2, 2014

Inventors: Vinodh Gopal, Gilbert M. Wolrich, Kirk S. Yap, James D. Guilford, Erdinc Ozturk, Sean M. Gulley, Wajdi K. Feghali, Martin G. Dixon
ARITHMETIC PROCESSING APPARATUS, AND CACHE MEMORY CONTROL DEVICE AND CACHE MEMORY CONTROL METHOD

Publication number: 20130346730

Abstract: An arithmetic processing apparatus includes a plurality of processors, each of the processors having an arithmetic unit and a cache memory. The processor includes an instruction port that holds a plurality of instructions accessing data of the cache memory, a first determination unit that validates a first flag when, on receiving an invalidation request for data in the cache memory, a cache index of a target address and a way ID of the received request match a cache index of a designated address and a way ID of the load instruction, a second determination unit that validates a second flag when target data is transmitted due to a cache miss, and an instruction re-execution determination unit that instructs re-execution of an instruction subsequent to the load instruction when both the first flag and the second flag are validated at the time of completion of an instruction in the instruction port.

Type: Application

Filed: April 30, 2013

Publication date: December 26, 2013

Applicant: FUJITSU LIMITED

Inventor: Naohiro KIYOTA
MULTIPLY-AND-ACCUMULATE OPERATION IN AN IMPLANTABLE MICROCONTROLLER

Publication number: 20130339677

Abstract: The invention provides microprocessor extensions for cooperating with a sequential arithmetic-logic unit (ALU) to execute a multiply-and-accumulate operation (MAc). The ALU performs a continuous sequence of accumulation instructions synchronously with a clock signal (CLK1). Buffers (BUF1, BUF2) store input data which are fed to a combinatorial multiplier (MULT) by first buses (L1, L2). A second bus (N1) forwards the product to the ALU, where it is accumulated with previous data. Since at least the first buses operate independently of the clock signal, they do not limit the speed of the MAc operation. In particular embodiments, a finite state machine (FSM) controls the buses on the basis of triggers, e.g., signals from the multiplier and/or ALU indicating the completion of their respective instructions. The FSM may be operable in a low-power mode. The invention also relates to methods, computer programs and the use of a sequential ALU for executing MAc operations.

Type: Application

Filed: April 14, 2011

Publication date: December 19, 2013

Applicant: ST. JUDE MEDICAL AB

Inventor: Mattias Tullberg
Image forming apparatus using logical arithmetic processing and image forming program using logical arithmetic processing

Patent number: 8610947

Abstract: An image processing apparatus includes an interpreting unit that interprets an order of the logical arithmetic processing and a kind of a logical arithmetic processing; and a drawing unit that, in a case of drawing the image information as raster data, draws from an element of an upper-order side in order of the logical arithmetic processing interpreted by the interpreting unit with respect to an area that is interpreted to be processed by a simple overwrite processing for giving priority to an uppermost-order side element as the kind of the logical arithmetic processing, and draws using a calculation sequentially from an element of a lower-order side in order of the logical arithmetic processing interpreted by the interpreting unit with respect to an area that is interpreted to be processed by a logical arithmetic processing for using the calculation as to the overlapped elements as the kind of the logical arithmetic processing.

Type: Grant

Filed: August 20, 2009

Date of Patent: December 17, 2013

Assignee: Fuji Xerox Co., Ltd

Inventor: Shusuke Tanimoto
CO-PROCESSOR FOR COMPLEX ARITHMETIC PROCESSING, AND PROCESSOR SYSTEM

Publication number: 20130318329

Abstract: In order to quickly and efficiently execute, in one system, various modulation/demodulation/synchronous processes in a plurality of radio communication methods, a co-processor (22) for complex arithmetic processing, which forms a processor system (100), includes a complex arithmetic circuit (22) that executes for complex data a complex arithmetic operation required for radio communication in accordance with an instruction from a primary processor (10), and a memory controller (20, 21) that operates in parallel with the complex arithmetic circuit and accesses a memory. A trace circuit provided in the complex arithmetic circuit (22) monitors arithmetic result data for first complex data series sequentially read from the memory, and detects a normalization coefficient for normalizing the arithmetic result data.

Type: Application

Filed: September 15, 2011

Publication date: November 28, 2013

Applicant: NEC CORPORATION

Inventors: Toshiki Takeuchi, Hiroyuki Igura
DSP performing instruction analyzed m-bit processing of data stored in memory with truncation / extension via data exchange unit

Patent number: 8595470

Abstract: A digital signal processor includes an instruction analysis unit, a digital signal processor (DSP) core and a memory unit. The instruction analysis unit receives an instruction and determines the required bit width M for the data process corresponding to the instruction. The DSP core performs the M-bit data process based on the bit width M determined by the instruction analysis unit, and the memory unit stores multiple data and performs the M-bit access based on the bit width M determined by the instruction analysis unit thereby allowing the DSP core to access, and at least one available space in the memory unit will be adjusted such that only the access space having the bit width M for the operation corresponding to the instruction will be open in each access, thereby effectively achieving the effect of power-saving.

Type: Grant

Filed: November 9, 2010

Date of Patent: November 26, 2013

Assignee: Sentelic Corporation

Inventors: Zhiyang Guo, Mao-Sung Wu, Chun Hsien, Tsai-Lin Lee
SEMICONDUCTOR INTEGRATED CIRCUIT AND OPERATION METHOD THEREOF

Publication number: 20130310983

Abstract: It is intended to reduce the amount of computation to be performed by CPU or the required amount of storage space in a built-in memory for timing adjustment of a pulse output signal. A digital multiplying circuit in the phase arithmetic circuit of the pulse generating circuit generates a multiplication output signal by multiplying a phase angle change value in the phase adjustment data register and a count maximum value Nmax in the cycle data register. A digital dividing circuit generates a division output signal by dividing the multiplication output signal by 360 degrees of phase angle for one cycle. A digital adding circuit adds the division output signal and rise setting/fall setting count values and a subtracting circuit subtracts the division output signal from these values. The addition and subtraction generate new rise setting/fall setting count values required to delay/advance the phase by the phase angle change value.
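
The arithmetic performed by the multiplying, dividing, adding and subtracting circuits can be written out directly. The sketch below is an illustration under assumptions: integer division is used for the divide step, no wrap-around at the count maximum is modelled, and the name `shift_pulse_edges` and its parameters are hypothetical.

```python
def shift_pulse_edges(rise_count, fall_count, phase_change_deg, count_max, delay=True):
    """Return new rise/fall count values shifted by a phase angle.

    shift = phase_change_deg * count_max / 360 (integer division assumed).
    """
    multiplied = phase_change_deg * count_max   # digital multiplying circuit
    shift = multiplied // 360                   # digital dividing circuit
    if delay:
        # Adding circuit: delay the pulse by the phase angle change value.
        return rise_count + shift, fall_count + shift
    # Subtracting circuit: advance the pulse by the phase angle change value.
    return rise_count - shift, fall_count - shift
```

With a count maximum of 1000, a 90 degree change corresponds to 250 counts, so edges at 300 and 700 move to 550 and 950 when delayed, or to 50 and 450 when advanced.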

Type: Application

Filed: May 6, 2013

Publication date: November 21, 2013

Applicant: Renesas Electronics Corporation

Inventors: Takehiro SHIMIZU, Toshio ASAI
Bandwidth efficient instruction-driven multiplication engine

Patent number: 8589469

Abstract: Multiplication engines and multiplication methods are provided for a digital processor.

Type: Grant

Filed: January 10, 2008

Date of Patent: November 19, 2013

Assignee: Analog Devices Technology

Inventors: Andreas D. Olofsson, Baruch Yanovitch
Instruction support for performing montgomery multiplication

Patent number: 8583902

Abstract: Techniques are disclosed relating to a processor including instruction support for performing a Montgomery multiplication. The processor may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include an instruction execution unit configured to receive instructions including a first instance of a Montgomery-multiply instruction defined within the ISA. The Montgomery-multiply instruction is executable by the processor to operate on at least operands A, B, and N residing in respective portions of a general-purpose register file of the processor, where at least one of operands A, B, N spans at least two registers of the general-purpose register file. The instruction execution unit is configured to calculate P mod N in response to receiving the first instance of the Montgomery-multiply instruction, where P is the product of at least operand A, operand B, and R^-1.
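
For reference, the quantity the instruction computes, A times B times the inverse of R modulo N, can be obtained in software with the standard Montgomery reduction (REDC). The sketch below is a textbook formulation and not the processor's implementation; it assumes R is a power of two larger than an odd modulus N, that both operands are already reduced below N, and Python 3.8 or later for `pow` with a negative exponent. The name `montgomery_multiply` is hypothetical.

```python
def montgomery_multiply(a: int, b: int, n: int, r_bits: int) -> int:
    """Return a * b * R^-1 mod n, with R = 2**r_bits (standard REDC)."""
    r = 1 << r_bits
    if n % 2 == 0 or not 0 < n < r:
        raise ValueError("n must be odd and smaller than R")
    if not (0 <= a < n and 0 <= b < n):
        raise ValueError("operands must be reduced modulo n")
    n_prime = (-pow(n, -1, r)) % r          # n * n_prime == -1 (mod R)
    t = a * b
    m = ((t & (r - 1)) * n_prime) & (r - 1)  # makes t + m*n divisible by R
    u = (t + m * n) >> r_bits               # exact division by R
    return u - n if u >= n else u           # single conditional subtraction
```

The reduction replaces a division by N with shifts and masks by R, which is why the technique suits hardware and long-operand arithmetic.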

Type: Grant

Filed: May 7, 2010

Date of Patent: November 12, 2013

Assignee: Oracle International Corporation

Inventors: Christopher H. Olson, Gregory F. Grohoski, Lawrence Spracklen, Nils Gura
Helical band geometry for dynamical topology changing

Patent number: 8583903

Abstract: Disclosed herein are efficient geometries for dynamical topology changing (DTC), together with protocols to incorporate DTC into quantum computation. Given an Ising system, twisted depletion to implement a logical gate T, anyonic state teleportation into and out of the topology altering structure, and certain geometries of the (1,?2)-bands, a classical computer can be enabled to implement a quantum algorithm.

Type: Grant

Filed: December 28, 2010

Date of Patent: November 12, 2013

Assignee: Microsoft Corporation

Inventors: Michael Freedman, Parsa Bonderson, Chetan Nayak, Sankar Das Sarma
SEMICONDUCTOR DEVICE

Publication number: 20130297916

Abstract: A related art semiconductor device suffers from a problem that a processing capacity is decayed by switching an occupied state for each partition. A semiconductor device according to the present invention includes an execution unit that executes an arithmetic instruction, and a scheduler including multiple first setting registers each defining a correspondence relationship between hardware threads and partitions, and generates a thread select signal on the basis of a partition schedule and a thread schedule. The scheduler outputs a thread select signal designating a specific hardware thread without depending on the thread schedule as the partition indicated by a first occupation control signal according to a first occupation control signal output when the execution unit executes a first occupation start instruction.

Type: Application

Filed: April 9, 2013

Publication date: November 7, 2013

Applicant: Renesas Electronics Corporation

Inventors: Hitoshi Suzuki, Koji Adachi
DATA PACKET ARITHMETIC LOGIC DEVICES AND METHODS

Publication number: 20130290684

Abstract: New instruction definitions for a packet add (PADD) operation and for a single instruction multiple data add (SMAD) operation are disclosed. In addition, a new dedicated PADD logic device that performs the PADD operation in about one to two processor clock cycles is disclosed. Also, a new dedicated SMAD logic device that performs the SMAD operation in about one to two clock cycles is disclosed.

Type: Application

Filed: June 25, 2013

Publication date: October 31, 2013

Inventors: Corey Gee, Bapiraju Vinnakota, Saleem Mohammadali, Carl A. Alberola
SIGNAL PROCESSING CIRCUIT

Publication number: 20130283016

Abstract: Provided is a signal processing circuit occupying a small circuit area. A common arithmetic operation element is shared between a plurality of arithmetic operation sequence control units. An arbitration circuit selects, when the plurality of arithmetic operation sequence control units simultaneously generate requests for arithmetic operations to use the common arithmetic operation element, the predetermined sequence control unit based on priority information about the plurality of arithmetic operation sequence control units, causes the common arithmetic operation element to execute the arithmetic operation requested from the selected arithmetic operation sequence control unit, and returns the result of the arithmetic operation to the selected arithmetic operation sequence control unit.

Type: Application

Filed: April 17, 2013

Publication date: October 24, 2013

Applicant: RENESAS ELECTRONICS CORPORATION

Inventors: Hiroyuki YAMASAKI, Hideyuki NODA, Kan MURATA
Vector processing of different instructions selected by each unit from multiple instruction group based on instruction predicate and previous result comparison

Patent number: 8566566

Abstract: There is provided a vector processing apparatus and method allowing for the parallel processing of a plurality of different instructions while maintaining vector processing architecture. The vector processing apparatus includes an instruction memory storing a multiple instruction group including one or more instructions; an instruction fetch unit reading the multiple instruction group from the instruction memory; and a plurality of instruction processing units each receiving the multiple instruction group through the instruction fetch unit, selecting a single instruction from the multiple instruction group according to a previous arithmetic result, and performing an arithmetic operation.

Type: Grant

Filed: August 2, 2010

Date of Patent: October 22, 2013

Assignee: Electronics and Telecommunications Research Institute

Inventors: Moo Kyoung Chung, Young Su Kwon, Kyung Su Kim
Packed Data Rearrangement Control Indexes Precursors Generation Processors, Methods, Systems, and Instructions

Publication number: 20130275729

Abstract: A method of an aspect includes receiving an instruction indicating a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four non-negative integers. In an aspect, values of the at least four non-negative integers are not calculated using a result of a preceding instruction. Other methods, apparatus, systems, and instructions are disclosed.

Type: Application

Filed: December 22, 2011

Publication date: October 17, 2013

Inventors: Seth Abraham, Robert Valentine, Elmoustapha Ould-Ahmed-Vall, Zeev Sperber, Amit Gradstein
Processors, Methods, Systems, and Instructions to Generate Sequences of Integers in which Integers in Consecutive Positions Differ by a Constant Integer Stride and Where a Smallest Integer is Offset from Zero by an Integer Offset

Publication number: 20130275727

Abstract: A method of an aspect includes receiving an instruction. The instruction indicates an integer stride, indicates an integer offset, and indicates a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four integers in numerical order with a smallest one of the at least four integers differing from zero by the integer offset and with all integers of the sequence in consecutive positions differing by the integer stride. Other methods, apparatus, systems, and instructions are disclosed.
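
The result described in the abstract is an arithmetic progression. A minimal sketch, assuming a non-negative stride and using the hypothetical name `generate_index_sequence`:

```python
def generate_index_sequence(count: int, stride: int, offset: int) -> list:
    """Return `count` integers starting at `offset` and spaced by `stride`."""
    if count < 4:
        raise ValueError("the abstract describes at least four integers")
    if stride < 0:
        raise ValueError("this sketch assumes a non-negative stride")
    # Element i differs from zero by offset + i * stride.
    return [offset + i * stride for i in range(count)]
```

For example, `generate_index_sequence(4, stride=3, offset=2)` returns `[2, 5, 8, 11]`, which could serve as control indexes for a strided gather or permute.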

Type: Application

Filed: December 22, 2011

Publication date: October 17, 2013

Inventors: Seth Abraham, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Zeev Sperber, Amit Gradstein
PACKED DATA OPERATION MASK REGISTER ARITHMETIC COMBINATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS

Publication number: 20130275728

Abstract: A method of an aspect includes receiving a packed data operation mask register arithmetic combination instruction. The packed data operation mask register arithmetic combination instruction indicates a first packed data operation mask register, indicates a second packed data operation mask register, and indicates a destination storage location. An arithmetic combination of at least a portion of bits of the first packed data operation mask register and at least a corresponding portion of bits of the second packed data operation mask register is stored in the destination storage location in response to the packed data operation mask register arithmetic combination instruction. Other methods, apparatus, systems, and instructions are disclosed.

Type: Application

Filed: December 22, 2011

Publication date: October 17, 2013

Applicant: Intel Corporation

Inventors: Bret L. Toll, Robert Valentine, Jesus Corbal San Adrian, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney
ARITHMETIC PROCESSING APPARATUS AND BRANCH PREDICTION METHOD

Publication number: 20130275726

Abstract: A branch target address table is provided for each branch instruction having a plurality of branch targets. Each branch target address table stores a history of a plurality of branch target addresses determined in the past by executing a corresponding branch instruction. A branch target prediction unit predicts a predicted branch target address with respect to a branch instruction with reference to the history of branch target addresses stored in the branch target address table corresponding to the branch instruction. The predicted branch target address obtained as a result of the prediction is stored, for example, in a predicted branch target address storage unit in association with the branch instruction, and is referenced by an instruction fetch control unit at the time of prefetching a branch target instruction.

Type: Application

Filed: June 10, 2013

Publication date: October 17, 2013

Inventor: Megumi Ukai
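
The per-branch target history described in publication 20130275726 can be sketched as below. The abstract does not say how the prediction is derived from the history, so the policy used here (most frequent target, with ties going to the most recent one) is an assumption, as are the class name `BranchTargetTable` and the history depth of 4.

```python
from collections import Counter, deque
from typing import Optional


class BranchTargetTable:
    """History of past targets for one multi-target branch instruction."""

    def __init__(self, depth: int = 4):
        # A bounded history: the oldest target is dropped when the table is full.
        self.history = deque(maxlen=depth)

    def record(self, target: int) -> None:
        """Store a target address determined by actually executing the branch."""
        self.history.append(target)

    def predict(self) -> Optional[int]:
        """Assumed policy: most frequent target, ties go to the newest entry."""
        if not self.history:
            return None
        counts = Counter(self.history)
        best = max(counts.values())
        for target in reversed(self.history):  # newest first
            if counts[target] == best:
                return target
        return None


if __name__ == "__main__":
    table = BranchTargetTable()
    for addr in (0x100, 0x200, 0x100):
        table.record(addr)
    print(hex(table.predict()))
```

One instance would exist per branch instruction, for example in a dictionary keyed by the branch address.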
Lane crossing instruction selecting operand data bits conveyed from register via direct path and lane crossing path for execution

Patent number: 8560811

Abstract: The present invention provides a method and apparatus for handling lane-crossing instructions in an execution pipeline. One embodiment of the method includes conveying bits of an instruction from a register to an execution stage in a pipeline along a first data path that includes a lane crossing stage configured to change a first mapping of the register to the execution stage to a second mapping. The method also includes concurrently conveying the bits along a second data path from the register to the execution stage that bypasses the lane crossing stage. The method further includes selecting the first or second data path to provide the bits to the execution stage.

Type: Grant

Filed: August 5, 2010

Date of Patent: October 15, 2013

Assignee: Advanced Micro Devices, Inc.

Inventor: John M. King
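
The two concurrent data paths of patent 8560811 can be modelled in a few lines of Python. This is a behavioral sketch only: the lane representation as a list, the `mapping` argument, and both function names are hypothetical.

```python
def lane_crossing_stage(lanes, mapping):
    """First path: remap lanes, so output lane i takes input lane mapping[i]."""
    return [lanes[src] for src in mapping]


def operands_for_execution(register_lanes, mapping, is_lane_crossing):
    """Produce both paths, then select one for the execution stage."""
    crossed = lane_crossing_stage(register_lanes, mapping)  # through the stage
    direct = list(register_lanes)                           # bypasses the stage
    return crossed if is_lane_crossing else direct


if __name__ == "__main__":
    lanes = ["a", "b", "c", "d"]
    swap_halves = [2, 3, 0, 1]
    print(operands_for_execution(lanes, swap_halves, True))
    print(operands_for_execution(lanes, swap_halves, False))
```

Both candidate results are computed before the selection, which mirrors the abstract's description of conveying the bits along both paths concurrently.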
REDUCING POWER CONSUMPTION IN A FUSED MULTIPLY-ADD (FMA) UNIT OF A PROCESSOR

Publication number: 20130268794

Abstract: In one embodiment, the present invention includes a processor having a fused multiply-add (FMA) unit to perform FMA instructions and add-like instructions. This unit can include an adder with multiple segments each independently controlled by a logic. The logic can clock gate at least one segment during execution of an add-like instruction in another segment of the adder when the add-like instruction has a width less than a width of the FMA unit. Other embodiments are described and claimed.

Type: Application

Filed: November 21, 2011

Publication date: October 10, 2013

Inventor: Chad D. Hancock
SINGLE CYCLE COMPARE AND SELECT OPERATIONS

Publication number: 20130262819

Abstract: An apparatus includes a processor to determine an extremum among a series of values that are successively provided to a first register and a second register. The processor is configured to execute a single cycle search instruction, including compare a value in the first register with a value in a first accumulator, and store an extremum of the two values in the first accumulator; and compare a value in the second register with a value in a second accumulator, and store an extremum of the two values in the second accumulator. The processor is configured to execute a single cycle select instruction, including compare the value in the first accumulator with the value in the second accumulator, and store an extremum of the two values in the first accumulator, the extremum stored in the first accumulator representing the extremum of the series of numbers.

Type: Application

Filed: April 2, 2012

Publication date: October 3, 2013

Inventors: Srinivasan Iyer, Carsten Aagaard Pedersen
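
The search and select instructions of publication 20130262819 can be sketched as two small functions. The names `search_step`, `select_step`, and `extremum` are hypothetical, and seeding both accumulators with the first value is an assumption made so the sketch works for any input.

```python
def search_step(reg1, reg2, acc1, acc2, pick=max):
    """Model one search instruction: update both accumulators together."""
    return pick(reg1, acc1), pick(reg2, acc2)


def select_step(acc1, acc2, pick=max):
    """Model the select instruction: reduce the two accumulators to one."""
    return pick(acc1, acc2)


def extremum(values, pick=max):
    """Find the maximum (or minimum, with pick=min) of a series of values."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    # Assumption: both accumulators start from the first value of the series.
    acc1 = acc2 = values[0]
    # Feed two values per step, as if loaded into the first and second register.
    for i in range(1, len(values), 2):
        reg1 = values[i]
        reg2 = values[i + 1] if i + 1 < len(values) else values[i]
        acc1, acc2 = search_step(reg1, reg2, acc1, acc2, pick)
    return select_step(acc1, acc2, pick)


if __name__ == "__main__":
    print(extremum([3, 9, 2, 7, 5]))
    print(extremum([3, 9, 2, 7, 5], pick=min))
```

Splitting the series across two accumulators halves the number of search steps; the single select step at the end merges the two partial results.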
PROCESSOR FOR PERFORMING MULTIPLY-ADD OPERATIONS ON PACKED DATA

Publication number: 20130262836

Abstract: A method and apparatus for including in a processor instructions for performing multiply-subtract operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-subtract operations on data elements in the first and second packed data.

Type: Application

Filed: May 30, 2013

Publication date: October 3, 2013

Inventors: Alexander Peleg, Milland Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf C. Witt
Add instructions to add three source operands

Patent number: 8549264

Abstract: A method in one aspect may include receiving an add instruction. The add instruction may indicate a first source operand, a second source operand, and a third source operand. A sum of the first, second, and third source operands may be stored as a result of the add instruction. The sum may be stored partly in a destination operand indicated by the add instruction and partly in a plurality of flags. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium.

Type: Grant

Filed: December 22, 2009

Date of Patent: October 1, 2013

Assignee: Intel Corporation

Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Bret L. Toll, Maxim Loktyukhin, Mark C. Davis, Alexandre J. Farcy
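
A short Python sketch shows why the sum in patent 8549264 needs flags as well as a destination: three w-bit operands can sum to a value that needs w + 2 bits. The function name `add3`, the 64-bit default width, and the generic flag names are assumptions; the abstract does not say which flags are used.

```python
def add3(a: int, b: int, c: int, width: int = 64):
    """Add three unsigned `width`-bit operands.

    Returns (dest, flag_lo, flag_hi): the low `width` bits of the sum go to
    the destination, and the two bits above them go to two flags.
    """
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask) + (c & mask)  # always fits in width + 2 bits
    dest = total & mask
    flag_lo = (total >> width) & 1        # bit `width` of the full sum
    flag_hi = (total >> (width + 1)) & 1  # bit `width + 1` of the full sum
    return dest, flag_lo, flag_hi


if __name__ == "__main__":
    all_ones = (1 << 64) - 1
    dest, lo, hi = add3(all_ones, all_ones, all_ones)
    print(hex(dest), lo, hi)
```

The full sum can be recovered as `dest + (flag_lo << width) + (flag_hi << (width + 1))`, which is what makes the scheme useful for multi-word arithmetic.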
ARITHMETIC PROCESSING UNIT

Publication number: 20130254516

Abstract: An arithmetic processing unit that performs processing of a stream-type includes an arithmetic unit configured to operate an input operand to obtain a result of operation; and a data input and output unit configured to read the input operand out of a memory when an instruction which is issued in a case where a stream length of the input operand is shorter than a stream length of an output operand corresponding to the input operand and includes data indicating a recursive rule used when the input operand is read out, to supply the read input operand, and to store the result of the operation obtained by the arithmetic unit in the memory as the output operand, wherein the arithmetic unit operates the input operand read out by the data input and output unit and outputs the result of operation to the data input and output unit.

Type: Application

Filed: November 7, 2012

Publication date: September 26, 2013

Inventors: Yi GE, Kazuo HORIO
Method and apparatus for QR-factorizing matrix on a multiprocessor system

Patent number: 8543626

Abstract: A method and apparatus for QR-factorizing matrix on a multiprocessor system, wherein the multiprocessor system comprises at least one core processor and a plurality of accelerators, comprises the steps of: iteratively factorizing each panel in the matrix until the whole matrix is factorized; wherein in each iteration, the method comprises: partitioning an unprocessed matrix part in the matrix into a plurality of blocks according to a predetermined block size; partitioning a current processed panel in the unprocessed matrix part into at least two sub panels, wherein the current processed panel is composed of a plurality of blocks; and performing QR factorization one by one on the at least two sub panels with the plurality of accelerators, and updating the data of the sub panel(s) on which no QR factorization has been performed among the at least two sub panels by using the factorization result.

Type: Grant

Filed: July 27, 2012

Date of Patent: September 24, 2013

Assignee: International Business Machines Corporation

Inventors: Hui Li, Bai Ling Wang
Add Instructions to Add Three Source Operands

Publication number: 20130227252

Abstract: A method in one aspect may include receiving an add instruction. The add instruction may indicate a first source operand, a second source operand, and a third source operand. A sum of the first, second, and third source operands may be stored as a result of the add instruction. The sum may be stored partly in a destination operand indicated by the add instruction and partly in a plurality of flags. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium.

Type: Application

Filed: March 13, 2013

Publication date: August 29, 2013

Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Bret L. Toll, Maxim Loktyukhin, Mark C. Davis, Alexandre J. Farcy
Instruction folding mechanism, method for performing the same and pixel processing system employing the same

Patent number: 8520016

Abstract: An instruction folding mechanism, a method for performing the instruction folding mechanism and a pixel processing system employing the instruction folding mechanism are described. The pixel processing system comprises an instruction folding mechanism and a pixel shader. The instruction folding mechanism folds a plurality of first instructions in a first program to generate a second program having at least one second instruction which is a combination of the first instructions. The pixel shader connected to the instruction folding mechanism fetches the second program to decode at least the second instruction having the combination of the first instructions to execute the second program. The instruction folding mechanism comprises an instruction scheduler, a folding rule checker, and an instruction combiner. The instruction scheduler connected to the folding rule checker is used to scan the first instructions according to static positions in order to schedule the first instructions in the first program.

Type: Grant

Filed: March 9, 2009

Date of Patent: August 27, 2013

Assignee: Taichi Holdings, LLC

Inventor: R-Ming Hsu
IMAGE PROCESSING DEVICE AND DATA PROCESSOR

Publication number: 20130212362

Abstract: A restriction is given to the calculation function for image processing achieved by the hard-wired system and the memory access control of a buffer memory, and a range of the restriction is made variable by a program control and others. Data is inputted to the buffer memory from the outside with a restriction of “in units of memory line”, and the number of memory lines and positions of the same to which data is inputted can be programmable by the control circuit. The arithmetic circuit is subjected to the restriction of performing the calculation in units of data of one or plural memory lines supplied from the buffer memory, and a calculation processing content in units of calculation processing for the units of data can be programmably assigned by the control circuit.

Type: Application

Filed: March 15, 2013

Publication date: August 15, 2013

Applicant: RENESAS ELECTRONICS CORPORATION

Inventor: RENESAS ELECTRONICS CORPORATION
Floating Point Constant Generation Instruction

Publication number: 20130212357

Abstract: Systems and methods for generating a floating point constant value from an instruction are disclosed. A first field of the instruction is decoded as a sign bit of the floating point constant value. A second field of the instruction is decoded to correspond to an exponent value of the floating point constant value. A third field of the instruction is decoded to correspond to the significand of the floating point constant value. The first field, the second field, and the third field are combined to form the floating point constant value. The exponent value may include a bias, and a bias constant may be added to the exponent value to compensate for the bias. The third field may comprise the most significant bits of the significand. Optionally, the second field and the third field may be shifted by first and second shift values respectively before they are combined to form the floating point constant value.

Type: Application

Filed: February 9, 2012

Publication date: August 15, 2013

Applicant: QUALCOMM INCORPORATED

Inventors: Erich James Plondke, Lucian Codrescu, Charles Joseph Tabony, Swaminathan Balasubramanian
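
The field decoding described in publication 20130212357 can be sketched in Python by assembling an IEEE 754 single-precision bit pattern. The function name `make_float32`, the 4-bit significand field, the single-precision target, and the bias constant of 127 are all assumptions for illustration; the optional shift values mentioned in the abstract are not modelled.

```python
import struct


def make_float32(sign: int, exp_field: int, sig_field: int,
                 sig_bits: int = 4, bias_constant: int = 127) -> float:
    """Combine sign, exponent, and significand fields into a float32 value."""
    if sign not in (0, 1):
        raise ValueError("sign must be 0 or 1")
    if not 0 < sig_bits <= 23 or not 0 <= sig_field < (1 << sig_bits):
        raise ValueError("significand field out of range")
    biased_exp = exp_field + bias_constant  # add the bias constant
    if not 0 < biased_exp < 255:
        raise ValueError("only normal numbers are handled in this sketch")
    # The field supplies the most significant bits of the 23-bit fraction.
    fraction = sig_field << (23 - sig_bits)
    bits = (sign << 31) | (biased_exp << 23) | fraction
    # Reinterpret the 32-bit integer pattern as a float.
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    print(make_float32(0, 0, 0b1000))   # fraction 0.5 with exponent 0
    print(make_float32(1, 1, 0b0000))   # negative, exponent 1
```

Because only the top bits of the significand are encoded, a short instruction field can represent common constants such as 1.5 or -2.0 without a memory load.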
Data Processing Device and Method

Publication number: 20130205123

Abstract: The present invention relates to a processor having a trace cache and a plurality of ALUs arranged in a matrix, comprising an analyser unit located between the trace cache and the ALUs, wherein the analyser unit analyses the code in the trace cache, detects loops, transforms the code, and issues to the ALUs sections of the code combined to blocks for joint execution for a plurality of clock cycles.

Type: Application

Filed: July 8, 2011

Publication date: August 8, 2013

Inventor: Martin Vorbach
Rotate instructions that complete execution without reading carry flag

Patent number: 8504807

Abstract: A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.

Type: Grant

Filed: December 26, 2009

Date of Patent: August 6, 2013

Assignee: Intel Corporation

Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Bret L. Toll, Maxim Loktyukhin, Mark C. Davis, Alexandre J. Farcy
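
The behavior described in patent 8504807 can be sketched as a pure function of the source operand and the rotate amount. The function name `rotate_right`, the rotate direction, and the 32-bit default width are assumptions; the abstract specifies none of them.

```python
def rotate_right(value: int, amount: int, width: int = 32) -> int:
    """Rotate `value` right by `amount` bits within `width` bits.

    The result depends only on the two inputs: no carry flag is read.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    mask = (1 << width) - 1
    value &= mask
    amount %= width  # a full-width rotation leaves the value unchanged
    return ((value >> amount) | (value << (width - amount))) & mask


if __name__ == "__main__":
    print(hex(rotate_right(0x12345678, 8)))
```

A rotate-through-carry operation, by contrast, would need the current carry flag as a third input, which creates a dependency on whichever earlier instruction last wrote the flag.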
Method and device for detecting an erroneous jump during program execution

Patent number: 8495734

Abstract: The present disclosure relates to a method for executing, by a processor, a program read in a program memory, comprising steps of: detecting a program memory read address jump; providing prior to a jump address instruction for jumping a program memory read address, an instruction for storing the presence of the jump address instruction; and activating an error signal if an address jump has been detected and if the presence of a jump address instruction has not been stored. The present disclosure also relates to securing integrated circuits.

Type: Grant

Filed: June 16, 2009

Date of Patent: July 23, 2013

Assignee: STMicroelectronics SA

Inventors: Frederic Bancel, Nicolas Berard, David Hely
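
The detection scheme of patent 8495734 can be modelled with a small state machine. This is a sketch under stated assumptions: the class name `JumpMonitor`, the method names, and the rule that a jump is any non-sequential read address are hypothetical simplifications.

```python
class JumpMonitor:
    """Raise an error flag when an address jump was not announced beforehand."""

    def __init__(self):
        self.jump_announced = False  # set by the marker instruction
        self.error = False           # models the error signal
        self.last_address = None

    def announce_jump(self) -> None:
        """Effect of the instruction placed before a legitimate jump."""
        self.jump_announced = True

    def fetch(self, address: int, step: int = 1) -> None:
        """Observe one program-memory read address."""
        if self.last_address is not None and address != self.last_address + step:
            # Non-sequential address: a jump was detected.
            if not self.jump_announced:
                self.error = True
            self.jump_announced = False  # the announcement is used up
        self.last_address = address


if __name__ == "__main__":
    mon = JumpMonitor()
    for addr in (0, 1):
        mon.fetch(addr)
    mon.announce_jump()
    mon.fetch(2)    # the jump instruction itself, still sequential
    mon.fetch(10)   # announced jump: no error
    mon.fetch(40)   # unannounced jump: error
    print(mon.error)
```

An attacker who forces the program counter to a new address, for example by fault injection, skips the announcing instruction, so the jump is flagged.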
Performing an allreduce operation on a plurality of compute nodes of a parallel computer

Patent number: 8484440

Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: establishing, for each node, a plurality of logical rings, each ring including a different set of at least one core on that node, each ring including the cores on at least two of the nodes; iteratively for each node: assigning each core of that node to one of the rings established for that node to which the core has not previously been assigned, and performing, for each ring for that node, a global allreduce operation using contribution data for the cores assigned to that ring or any global allreduce results from previous global allreduce operations, yielding current global allreduce results for each core; and performing, for each node, a local allreduce operation using the global allreduce results.

Type: Grant

Filed: May 21, 2008

Date of Patent: July 9, 2013

Assignee: International Business Machines Corporation

Inventor: Ahmad Faraj
METHOD OF, AND APPARATUS FOR, STREAM SCHEDULING IN PARALLEL PIPELINED HARDWARE

Publication number: 20130173890

Abstract: A method of generating a hardware design for a stream processor. The method includes defining a graph representing a processing operation designating processes to be implemented in hardware as part of the stream processor. The graph represents the processing operation in the time domain as a function of clock cycles and includes at least one data path. At least one stream offset object is provided located at a particular point in the data path.

Type: Application

Filed: February 27, 2013

Publication date: July 4, 2013

Applicant: MAXELER TECHNOLOGIES LTD.

Inventor: MAXELER TECHNOLOGIES LTD.
METHOD AND APPARATUS FOR GENERATING FLAGS FOR A PROCESSOR

Publication number: 20130166889

Abstract: A method and apparatus are described for generating flags in response to processing data during an execution pipeline cycle of a processor. The processor may include a multiplexer configured to generate valid bits for received data according to a designated data size, and a logic unit configured to control the generation of flags based on a shift or rotate operation command, the designated data size and information indicating how many bytes and bits to rotate or shift the data by. A carry flag may be used to extend the amount of bits supported by shift and rotate operations. A sign flag may be used to indicate whether a result is a positive or negative number. An overflow flag may be used to indicate that a data overflow exists, whereby there are not a sufficient number of bits to store the data.

Type: Application

Filed: December 22, 2011

Publication date: June 27, 2013

Applicant: ADVANCED MICRO DEVICES, INC.

Inventors: Srikanth Arekapudi, Saurabh Gupta
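
The three flags named in publication 20130166889 can be illustrated for a left shift with a designated data size. The function name `shift_left_with_flags` is hypothetical, and the overflow rule used here (any set bit shifted out) is a simplifying assumption rather than the rule of the publication or of any particular instruction set.

```python
def shift_left_with_flags(value: int, count: int, size_bits: int = 8):
    """Shift left within `size_bits`; return (result, carry, sign, overflow)."""
    if not 1 <= count <= size_bits:
        raise ValueError("count must be between 1 and size_bits")
    mask = (1 << size_bits) - 1
    wide = (value & mask) << count            # keeps the shifted-out bits
    result = wide & mask                      # what fits in the data size
    carry = (wide >> size_bits) & 1           # last bit shifted out of the top
    sign = (result >> (size_bits - 1)) & 1    # 1 if negative in two's complement
    overflow = int((wide >> size_bits) != 0)  # assumed rule: a set bit was lost
    return result, carry, sign, overflow


if __name__ == "__main__":
    print(shift_left_with_flags(0xC1, 1, size_bits=8))
    print(shift_left_with_flags(0x4000, 1, size_bits=16))
```

Passing `size_bits` explicitly mirrors the abstract's point that flag generation depends on the designated data size as well as on the shift amount.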
