Arithmetic Operation Instruction Processing Patents (Class 712/221)
  • Publication number: 20140075162
    Abstract: A digital processor is provided having an instruction set with a complex exponential function. The digital processor evaluates a complex exponential function for an input value, x, by obtaining a complex exponential software instruction having the input value, x, as an input; and in response to the complex exponential software instruction: invoking at least one complex exponential functional unit that implements complex exponential software instructions to apply the complex exponential function to the input value, x; and generating an output corresponding to the complex exponential of the input value, x. A complex exponential function for an input value, x, can be evaluated by wrapping the input value to maintain a given range; computing a coarse approximation angle using a look-up table; scaling the coarse approximation angle to obtain an angle from 0 to ?; and computing a fine corrective value using a polynomial approximation.
    Type: Application
    Filed: October 26, 2012
    Publication date: March 13, 2014
    Applicant: LSI Corporation
    Inventors: Kameran Azadet, Albert Molina, Joseph H. Othmer, Parakalan Venkataraghavan, Meng-Lin Yu, Joseph Williams
  • Publication number: 20140068231
    Abstract: There is a need to provide a central processing unit capable of improving the resistance to power analysis attack without changing programs, lowering clock frequencies, and greatly redesigning a central processing unit of the related art. In a central processing unit, an arithmetic unit is capable of performing arithmetic operation using data irrelevant to data stored in a register group. A control unit allows the arithmetic unit to perform arithmetic processing corresponding to an incorporated instruction. At this time, the control unit allows the arithmetic unit to perform arithmetic processing using the irrelevant data during a first one-clock cycle.
    Type: Application
    Filed: August 29, 2013
    Publication date: March 6, 2014
    Applicant: Renesas Electronics Corporation
    Inventor: Minoru SAEKI
  • Patent number: 8656143
    Abstract: A serial array processor may have an execution unit, which is comprised of a multiplicity of single bit arithmetic logic units (ALUs), and which may perform parallel operations on a subset of all the words in memory by serially accessing and processing them, one bit at a time, while an instruction unit of the processor is pre-fetching the next instruction, a word at a time, in a manner orthogonal to the execution unit.
    Type: Grant
    Filed: February 3, 2010
    Date of Patent: February 18, 2014
    Inventor: Laurence H. Cooke
  • Publication number: 20140047220
    Abstract: According to some embodiments, a technique provides for the execution of an instruction that includes receiving residual data of a first image and decoded pixels of a second image, zero-extending a plurality of unsigned data operands of the decoded pixels producing a plurality of unpacked data operands, adding a plurality of signed data operands of the residual data to the plurality of unpacked data operands producing a plurality of signed results; and saturating the plurality of signed results producing a plurality of unsigned results.
    Type: Application
    Filed: October 11, 2013
    Publication date: February 13, 2014
    Inventors: BRADLEY ALDRICH, NIGEL PAVER, MURLI GANESHAN
  • Patent number: 8649508
    Abstract: A system and method for implementing the Elliptic Curve scalar multiplication method in cryptography, where the Double Base Number System is expressed in decreasing order of exponents and further on using it to determine Elliptic curve scalar multiplication over a finite elliptic curve.
    Type: Grant
    Filed: September 29, 2008
    Date of Patent: February 11, 2014
    Assignee: Tata Consultancy Services Ltd.
    Inventor: Natarajan Vijayarangan
  • Publication number: 20140025934
    Abstract: An arithmetic processing apparatus and method for high speed processing of an application are provided. The arithmetic processing apparatus may include a program control unit to store operation processing information necessary for application operation in a communication channel by executing an application code, and an operation processing unit to process the application operation using the operation processing information stored in the communication channel.
    Type: Application
    Filed: July 18, 2013
    Publication date: January 23, 2014
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Joon Ho SONG, Shi Hwa Lee, Do Hyung Kim
  • Patent number: 8635434
    Abstract: A mathematical operation processing apparatus is disclosed by which the supply of an operand which is performed based on condition codes by a plurality of mathematical operations can be performed at a high speed. The mathematical operation processing apparatus includes a plurality of computing elements configured to perform different mathematical operations different from one another and produce mathematical operation results of the mathematical operations and condition codes. A condition code set register retains the condition codes produced simultaneously by the computing elements as a condition code set. A condition code conversion section performs a predetermined conversion for the condition code set and outputs a result of the conversion as a conversion condition code set. An operand supplying section supplies an operand for the mathematical operations in the computing elements based on the conversion condition code set.
    Type: Grant
    Filed: December 4, 2007
    Date of Patent: January 21, 2014
    Assignee: Sony Corporation
    Inventors: Yasuhiro Iizuka, Takahiro Sato, Takayasu Kon, Kenichi Sanpei, Eiichiro Morinaga
  • Patent number: 8635415
    Abstract: A set of default registers of a processor are expanded into metadata registers on the processor of a computer system. The default registers having stored thereon data, while metadata which is related to the data is stored separately on the metadata registers.
    Type: Grant
    Filed: September 30, 2009
    Date of Patent: January 21, 2014
    Assignee: Intel Corporation
    Inventors: Baiju V. Patel, Rajeev Gopalakrishna, Andrew F. Glew, Robert J. Kushlis, Don Alan Van Dyke, Joseph Frank Cihula, Asit K. Mallick, James B. Crossland, Gilbert Neiger, Scott Dion Rodgers, Martin Guy Dixon, Mark Jay Charney, Jacob (Koby) Gottlieb
  • Publication number: 20140019727
    Abstract: Apparatus and method for a modified, balanced throughput data-path architecture is given for efficiently implementing the digital signal processing algorithms of filtering, convolution and correlation in computer hardware, in which both data and coefficient buffers can be implemented as sliding windows. This architecture uses a multiplexer and a data path branch from the Address Generator unit to the multiply-accumulate execution unit. By selecting between the data path of Address Generator to execution unit and the data path of register to execution unit, the unbalanced throughput and multiply-accumulate bubble cycles caused by misaligned addressing on coefficients can be overcome. The modified balanced throughput data-path architecture can achieve a high multiply-accumulate operation rate per cycle in implementing digital signal processing algorithms.
    Type: Application
    Filed: July 8, 2013
    Publication date: January 16, 2014
    Inventors: PengFei ZHU, HongXia SUN, YongQiang WU, Elio GUIDETTI
  • Publication number: 20140019725
    Abstract: Methods, systems, and apparatuses are disclosed for implementing fast large-integer arithmetic within an integrated circuit, such as on IA (Intel Architecture) processors, in which such means include receiving a 512-bit value for squaring, the 512-bit value having eight sub-elements each of 64-bits and performing a 512-bit squaring algorithm by: (i) multiplying every one of the eight sub-elements by itself to yield a square of each of the eight sub-elements, the eight squared sub-elements collectively identified as T1, (ii) multiplying every one of the eight sub-elements by the other remaining seven of the eight sub-elements to yield an asymmetric intermediate result having seven diagonals therein, wherein each of the seven diagonals are of a different length, (iii) reorganizing the asymmetric intermediate result having the seven diagonals therein into a symmetric intermediate result having four diagonals each of 7×1 sub-elements of the 64-bits in length arranged across a plurality of columns, (iv) adding all
    Type: Application
    Filed: December 6, 2012
    Publication date: January 16, 2014
    Inventors: ERDINC OZTURK, VINODH GOPAL, JAMES GUILFORD
  • Publication number: 20140019726
    Abstract: A parallel arithmetic device includes a status management section, a plurality of processor elements, and a plurality of switch elements for determining the relation of coupling of each of the processor elements. Each of the processor elements includes an instruction memory for memorizing a plurality of operation instructions corresponding respectively to a plurality of contexts so that an operation instruction corresponding to the context selected by the status management section is read out, and a plurality of arithmetic units for performing arithmetic processes in parallel on a plurality of sets of input data in a manner compliant with the operation instruction read out from the instruction memory.
    Type: Application
    Filed: July 5, 2013
    Publication date: January 16, 2014
    Inventors: Takao TOI, Taro FUJII, Yoshinosuke KATO, Toshiro KITAOKA
  • Patent number: 8631224
    Abstract: A data processing system includes a plurality of general purpose registers, and processor circuitry for executing one or more instructions, including a vector dot product instruction for simultaneously performing at least two dot products. The vector dot product instruction identifies a first and second source register, each for storing a plurality of vector elements, where a first dot product is to be performed between a first subset of vector elements of the first source register and a first subset of vector elements of the second source register, and a second dot product is to be performed between a second subset of vector elements of the first source register and a second subset of vector elements of the second source register. The first and second subsets of the second source register are different and at least two vector elements of the first and second subsets of the second source register overlap.
    Type: Grant
    Filed: September 13, 2007
    Date of Patent: January 14, 2014
    Assignee: Freescale Semiconductor, Inc.
    Inventor: William C. Moyer
  • Publication number: 20140013086
    Abstract: A number of addition instructions are provided that have no data dependency between each other. A first addition instruction stores its carry output in a first flag of a flags register without modifying a second flag in the flags register. A second addition instruction stores its carry output in the second flag of the flags register without modifying the first flag in the flags register.
    Type: Application
    Filed: December 22, 2011
    Publication date: January 9, 2014
    Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Matthew C. Merten, Tong Li, Bret T. Toll, I
  • Patent number: 8627046
    Abstract: A data processing device has an instruction decoder, a control logic unit, and ALU. The instruction decoder decodes instruction codes of an arithmetic instruction. The control logic unit detects the effective data width of operation data to be processed according to the decode result from the instruction decoder and determines the number of cycles for the instruction execution corresponding to the effective, data width. The ALU executes the instruction with the number of cycles of the instruction execution determined by the control logic unit.
    Type: Grant
    Filed: May 23, 2011
    Date of Patent: January 7, 2014
    Assignee: Renesas Electronics Corporation
    Inventors: Sugako Ohtani, Hiroyuki Kondo
  • Publication number: 20140006754
    Abstract: A system includes a processor having an instruction register for storing an instruction having a predefined opcode, a predicate register for storing a predicate condition to select an output register for a result of the instruction, a first output register, and a second output register. The processor further includes processor circuitry operable to execute the instruction to produce a result, and processor circuitry operable to store the result of the instruction in the first output register if the predicate condition to select the output is true, and to store the second output register if the predicate condition to select the output is false. A single instruction is used to produce the result, and to store the result of the instruction.
    Type: Application
    Filed: September 5, 2013
    Publication date: January 2, 2014
    Applicant: NVIDIA Corporation
    Inventors: Timo Oskari Aila, Samuli Matias Laine
  • Publication number: 20140006753
    Abstract: A method is described. The method includes iteratively performing for each position in a result matrix stored in a third register, multiplying a value at a matrix position stored in a first register with a value at a matrix position stored in a second register to obtain a first multiplicative value, where the positions in the first register and the second register are determined by the position in the result matrix and performing an exclusive or (XOR) operation with the first multiplicative value and a value stored at a result matrix position stored in the third register to obtain a result value.
    Type: Application
    Filed: December 22, 2011
    Publication date: January 2, 2014
    Inventors: Vinodh Gopal, Gilbert M. Wolrich, Kirk S. Yap, James D. Guilford, Erdinc Ozturk, Sean M. Gulley, Wajdi K. Feghali, Martin G. Dixon
  • Publication number: 20130346730
    Abstract: An arithmetic processing apparatus includes a plurality of processors, each of the processors having an arithmetic unit and a cache memory. The processor includes an instruction port that holds a plurality of instructions accessing data of the cache memory, a first determination unit that validates a first flag when receiving an invalidation request for data in the cache memory, a cache index of a target address and a way ID of the received request match with a cache index of a designated address and a way ID of the load instruction, a second determination unit that validates a second flag when target data is transmitted due to a cache miss, and an instruction re-execution determination unit that instructs re-execution of an instruction subsequent to the load instruction when both the first flag and the second flag are validated at the time of completion of an instruction in the instruction port.
    Type: Application
    Filed: April 30, 2013
    Publication date: December 26, 2013
    Applicant: FUJITSU LIMITED
    Inventor: Naohiro KIYOTA
  • Publication number: 20130339677
    Abstract: The invention provides microprocessor extensions for cooperating with a sequential arithmetic-logic unit (ALU) to execute a multiply-and-accumulate operation (MAc). The ALU performs a continuous sequence of accumulation instructions synchronously with a clock signal (CLK1). Buffers (BUF1, BUF2) store input data which are fed to a combinatorial multiplier (MULT) by first buses (L1, L2). A second bus (N1) forwards the product to the ALU, where it is accumulated with previous data. Since at least the first buses operate independently of the clock signal, they do not limit the speed of the MAc operation. In particular embodiments, a finite state machine (FSM) controls the buses on the basis of triggers, e.g., signals from the multiplier and/or ALU indicating the completion of their respective instructions. The FSM may be operable in a low-power mode. The invention also relates to methods, computer programs and the use of a sequential ALU for executing MAc operations.
    Type: Application
    Filed: April 14, 2011
    Publication date: December 19, 2013
    Applicant: ST. JUDE MEDICAL AB
    Inventor: Mattias Tullberg
  • Patent number: 8610947
    Abstract: An image processing apparatus includes an interpreting unit that interprets an order of the logical arithmetic processing and a kind of a logical arithmetic processing; and a drawing unit that, in a case of drawing the image information as raster data, draws from an element of an upper-order side in order of the logical arithmetic processing interpreted by the interpreting unit with respect to an area that is interpreted to be processed by a simple overwrite processing for giving priority to an uppermost-order side element as the kind of the logical arithmetic processing, and draws using a calculation sequentially from an element of a lower-order side in order of the logical arithmetic processing interpreted by the interpreting unit with respect to an area that is interpreted to be processed by a logical arithmetic processing for using the calculation as to the overlapped elements as the kind of the logical arithmetic processing.
    Type: Grant
    Filed: August 20, 2009
    Date of Patent: December 17, 2013
    Assignee: Fuji Xerox Co., Ltd
    Inventor: Shusuke Tanimoto
  • Publication number: 20130318329
    Abstract: In order to enable to quickly and efficiently execute, by one system, various modulation/demodulation/synchronous processes in a plurality of radio communication methods, a co-processor (22) for complex arithmetic processing, which forms a processor system (100), includes a complex arithmetic circuit (22) that executes for complex data a complex arithmetic operation required for radio communication in accordance with an instruction from a primary processor (10), and a memory controller (20, 21) that operates in parallel with the complex arithmetic circuit and accesses a memory. A trace circuit provided in the complex arithmetic circuit (22) monitors arithmetic result data for first complex data series sequentially read from the memory, and detects a normalization coefficient for normalizing the arithmetic result data.
    Type: Application
    Filed: September 15, 2011
    Publication date: November 28, 2013
    Applicant: NEC CORPORATION
    Inventors: Toshiki Takeuchi, Hiroyuki Igura
  • Patent number: 8595470
    Abstract: A digital signal processor includes an instruction analysis unit, a digital signal processor (DSP) core and a memory unit. The instruction analysis unit receives an instruction and determines the required bit width M for the data process corresponding to the instruction. The DSP core performs the M-bit data process based on the bit width M determined by the instruction analysis unit, and the memory unit stores multiple data and performs the M-bit access based on the bit width M determined by the instruction analysis unit thereby allowing the DSP core to access, and at least one available space in the memory unit will be adjusted such that only the access space having the bit width M for the operation corresponding to the instruction will be open in each access, thereby effectively achieving the effect of power-saving.
    Type: Grant
    Filed: November 9, 2010
    Date of Patent: November 26, 2013
    Assignee: Sentelic Corporation
    Inventors: Zhiyang Guo, Mao-Sung Wu, Chun Hsien, Tsai-Lin Lee
  • Publication number: 20130310983
    Abstract: It is intended to reduce the amount of computation to be performed by CPU or the required amount of storage space in a built-in memory for timing adjustment of a pulse output signal. A digital multiplying circuit in the phase arithmetic circuit of the pulse generating circuit generates a multiplication output signal by multiplying a phase angle change value in the phase adjustment data register and a count maximum value Nmax in the cycle data register. A digital dividing circuit generates a division output signal by dividing the multiplication output signal by 360 degrees of phase angle for one cycle. A digital adding circuit adds the division output signal and rise setting/fall setting count values and a subtracting circuit subtracts the division output signal from these values. The addition and subtraction generate new rise setting/fall setting count values required to delay/advance the phase by the phase angle change value.
    Type: Application
    Filed: May 6, 2013
    Publication date: November 21, 2013
    Applicant: Renesas Electronics Corporation
    Inventors: Takehiro SHIMIZU, Toshio ASAI
  • Patent number: 8589469
    Abstract: Multiplication engines and multiplication methods are provided for a digital processor.
    Type: Grant
    Filed: January 10, 2008
    Date of Patent: November 19, 2013
    Assignee: Analog Devices Technology
    Inventors: Andreas D. Olofsson, Baruch Yanovitch
  • Patent number: 8583902
    Abstract: Techniques are disclosed relating to a processor including instruction support for performing a Montgomery multiplication. The processor may issue, for execution, programmer-selectable instruction from a defined instruction set architecture (ISA). The processor may include an instruction execution unit configured to receive instructions including a first instance of a Montgomery-multiply instruction defined within the ISA. The Montgomery-multiply instruction is executable by the processor to operate on at least operands A, B, and N residing in respective portions of a general-purpose register file of the processor, where at least one of operands A, B, N spans at least two registers of general-purpose register file. The instruction execution unit is configured to calculate P mod N in response to receiving the first instance of the Montgomery-multiply instruction, where P is the product of at least operand A, operand B, and R^?1.
    Type: Grant
    Filed: May 7, 2010
    Date of Patent: November 12, 2013
    Assignee: Oracle International Corporation
    Inventors: Christopher H. Olson, Gregory F. Grohoski, Lawrence Spracklen, Nils Gura
  • Patent number: 8583903
    Abstract: Disclosed herein are efficient geometries for dynamical topology changing (DTC), together with protocols to incorporate DTC into quantum computation. Given an Ising system, twisted depletion to implement a logical gate T, anyonic state teleportation into and out of the topology altering structure, and certain geometries of the (1,?2)-bands, a classical computer can be enabled to implement a quantum algorithm.
    Type: Grant
    Filed: December 28, 2010
    Date of Patent: November 12, 2013
    Assignee: Microsoft Corporation
    Inventors: Michael Freedman, Parsa Bonderson, Chetan Nayak, Sankar Das Sarma
  • Publication number: 20130297916
    Abstract: A related art semiconductor device suffers from a problem that a processing capacity is decayed by switching an occupied state for each partition. A semiconductor device according to the present invention includes an execution unit that executes an arithmetic instruction, and a scheduler including multiple first setting registers each defining a correspondence relationship between hardware threads and partitions, and generates a thread select signal on the basis of a partition schedule and a thread schedule. The scheduler outputs a thread select signal designating a specific hardware thread without depending on the thread schedule as the partition indicated by a first occupation control signal according to a first occupation control signal output when the execution unit executes a first occupation start instruction.
    Type: Application
    Filed: April 9, 2013
    Publication date: November 7, 2013
    Applicant: Renesas Electronics Corporation
    Inventors: Hitoshi Suzuki, Koji Adachi
  • Publication number: 20130290684
    Abstract: New instruction definitions for a packet add (PADD) operation and for a single instruction multiple add (SMAD) operation are disclosed. In addition, a new dedicated PADD logic device that performs the PADD operation in about one to two processor clock cycles is disclosed. Also, a new dedicated SMAD logic device that performs a single instruction multiple data add (SMAD) operation in about one to two clock cycles is disclosed.
    Type: Application
    Filed: June 25, 2013
    Publication date: October 31, 2013
    Inventors: Corey Gee, Bapiraju Vinnakota, Saleem Mohammadali, Carl A. Alberola
  • Publication number: 20130283016
    Abstract: Provided is a signal processing circuit occupying a small circuit area. A common arithmetic operation element is shared between a plurality of arithmetic operation sequence control units. An arbitration circuit selects, when the plurality of arithmetic operation sequence control units simultaneously generate requests for arithmetic operations to use the common arithmetic operation element, the predetermined sequence control unit based on priority information about the plurality of arithmetic operation sequence control units, causes the common arithmetic operation element to execute the arithmetic operation requested from the selected arithmetic operation sequence control unit, and returns the result of the arithmetic operation to the selected arithmetic operation sequence control unit.
    Type: Application
    Filed: April 17, 2013
    Publication date: October 24, 2013
    Applicant: RENESAS ELECTRONICS CORPORATION
    Inventors: Hiroyuki YAMASAKI, Hideyuki NODA, Kan MURATA
  • Patent number: 8566566
    Abstract: There is provided a vector processing apparatus and method allowing for the parallel processing of a plurality of different instructions while maintaining vector processing architecture. The vector processing apparatus includes an instruction memory storing a multiple instruction group including one or more instructions; an instruction fetch unit reading the multiple instruction group from the instruction memory; and a plurality of instruction processing units each receiving the multiple instruction group through the instruction fetch unit, selecting a single instruction from the multiple instruction group according to a previous arithmetic result, and performing a arithmetic operation.
    Type: Grant
    Filed: August 2, 2010
    Date of Patent: October 22, 2013
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Moo Kyoung Chung, Young Su Kwon, Kyung Su Kim
  • Publication number: 20130275729
    Abstract: A method of an aspect includes receiving an instruction indicating a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes the result including a sequence of at least four non-negative integers. In an aspect, values of the at least four non-negative integers are not calculated using a result of a preceding instruction. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Application
    Filed: December 22, 2011
    Publication date: October 17, 2013
    Inventors: Seth Abraham, Robert Valentine, Elmoustapha Ould-Ahmed-Vall, Zeev Sperber, Amit Gradstein
  • Publication number: 20130275727
    Abstract: A method of an aspect includes receiving an instruction. The instruction indicates an integer stride, indicates an integer offset, and indicates a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four integers in numerical order with a smallest one of the at least four integers differing from zero by the integer offset and with all integers of the sequence in consecutive positions differing by the integer stride. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Application
    Filed: December 22, 2011
    Publication date: October 17, 2013
    Inventors: Seth Abraham, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Zeev Sperber, Amit Gradstein
  • Publication number: 20130275728
    Abstract: A method of an aspect includes receiving a packed data operation mask register arithmetic combination instruction. The packed data operation mask register arithmetic combination instruction indicates a first packed data operation mask register, indicates a second packed data operation mask register, and indicates a destination storage location. An arithmetic combination of at least a portion of bits of the first packed data operation mask register and at least a corresponding portion of bits of the second packed data operation mask register is stored in the destination storage location in response to the packed data operation mask register arithmetic combination instruction. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Application
    Filed: December 22, 2011
    Publication date: October 17, 2013
    Applicant: Intel Corporation
    Inventors: Bret L. Toll, Robert Valentine, Jesus Corbal San Adrian, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney
  • Publication number: 20130275726
    Abstract: A branch target address table is provided for each branch instruction having a plurality of branch targets. Each branch target address table stores a history of a plurality of branch target addresses determined in the past by executing a corresponding branch instruction. A branch target prediction unit predicts a predicted branch target address with respect to a branch instruction with reference to the history of branch target addresses stored in the branch target address table corresponding to the branch instruction. The predicted branch target address obtained as a result of the prediction is stored, for example, in a predicted branch target address storage unit in association with the branch instruction, and is referenced by an instruction fetch control unit at the time of prefetching a branch target instruction.
    Type: Application
    Filed: June 10, 2013
    Publication date: October 17, 2013
    Inventor: Megumi Ukai
  • Patent number: 8560811
    Abstract: The present invention provides a method and apparatus for handling lane-crossing instructions in an execution pipeline. One embodiment of the method includes conveying bits of an instruction from a register to an execution stage in a pipeline along a first data path that includes a lane crossing stage configured to change a first mapping of the register to the execution stage to a second mapping. The method also includes concurrently conveying the bits along a second data path from the register to the execution stage that bypasses the lane crossing stage. The method further includes selecting the first or second data path to provide the bits to the execution stage.
    Type: Grant
    Filed: August 5, 2010
    Date of Patent: October 15, 2013
    Assignee: Advanced Micro Devices, Inc.
    Inventor: John M. King
  • Publication number: 20130268794
    Abstract: In one embodiment, the present invention includes a processor having a fused multiply-add (FMA) unit to perform FMA instructions and add-like instructions. This unit can include an adder with multiple segments each independently controlled by a logic. The logic can clock gate at least one segment during execution of an add-like instruction in another segment of the adder when the add-like instruction has a width less than a width of the FMA unit. Other embodiments are described and claimed.
    Type: Application
    Filed: November 21, 2011
    Publication date: October 10, 2013
    Inventor: Chad D. Hancock
  • Publication number: 20130262819
    Abstract: An apparatus includes a processor to determine an extremum among a series of values that are successively provided to a first register and a second register. The processor is configured to execute a single cycle search instruction, including compare a value in the first register with a value in a first accumulator, and store an extremum of the two values in the first accumulator; and compare a value in the second register with a value in a second accumulator, and store an extremum of the two values in the second accumulator. The processor is configured to execute a single cycle select instruction, including compare the value in the first accumulator with the value in the second accumulator, and store an extremum of the two values in the first accumulator, the extremum stored in the first accumulator representing the extremum of the series of numbers.
    Type: Application
    Filed: April 2, 2012
    Publication date: October 3, 2013
    Inventors: Srinivasan Iyer, Carsten Aagaard Pedersen
  • Publication number: 20130262836
    Abstract: A method and apparatus for including in a processor instructions for performing multiply-subtract operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-subtract operations on data elements in the first and second packed data.
    Type: Application
    Filed: May 30, 2013
    Publication date: October 3, 2013
    Inventors: Alexander Peleg, Milland Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf C. Witt
  • Patent number: 8549264
    Abstract: A method in one aspect may include receiving an add instruction. The add instruction may indicate a first source operand, a second source operand, and a third source operand. A sum of the first, second, and third source operands may be stored as a result of the add instruction. The sum may be stored partly in a destination operand indicated by the add instruction and partly a plurality of flags. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium.
    Type: Grant
    Filed: December 22, 2009
    Date of Patent: October 1, 2013
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Bret L. Toll, Maxim Loktyukhin, Mark C. Davis, Alexandre J. Farcy
  • Publication number: 20130254516
    Abstract: An arithmetic processing unit that performs processing of a stream-type includes an arithmetic unit configured to operate an input operand to obtain a result of operation; and a data input and output unit configured to read the input operand out of a memory when an instruction which is issued in a case where a stream length of the input operand is shorter than a stream length of an output operand corresponding to the input operand and includes data indicating a recursive rule used when the input operand is read out, to supply the read input operand, and to store the result of the operation obtained by the arithmetic unit in the memory as the output operand, wherein the arithmetic unit 20 operates the input operand read out by the data input and output unit and outputs the result of operation to the data input and output unit.
    Type: Application
    Filed: November 7, 2012
    Publication date: September 26, 2013
    Inventors: Yi GE, Kazuo HORIO
  • Patent number: 8543626
    Abstract: A method and apparatus for QR-factorizing matrix on a multiprocessor system, wherein the multiprocessor system comprises at least one core processor and a plurality of accelerators, comprises the steps of: iteratively factorizing each panel in the matrix until the whole matrix is factorized; wherein in each iteration, the method comprises: partitioning an unprocessed matrix part in the matrix into a plurality of blocks according to a predetermined block size; partitioning a current processed panel in the unprocessed matrix part into at least two sub panels, wherein the current processed panel is composed of a plurality of blocks; and performing QR factorization one by one on the at least two sub panels with the plurality of accelerators, and updating the data of the sub panel(s) on which no QR factorization has been performed among the at least two sub panels by using the factorization result.
    Type: Grant
    Filed: July 27, 2012
    Date of Patent: September 24, 2013
    Assignee: International Business Machines Corporation
    Inventors: Hui Li, Bai Ling Wang
  • Publication number: 20130227252
    Abstract: A method in one aspect may include receiving an add instruction. The add instruction may indicate a first source operand, a second source operand, and a third source operand. A sum of the first, second, and third source operands may be stored as a result of the add instruction. The sum may be stored partly in a destination operand indicated by the add instruction and partly a plurality of flags. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium.
    Type: Application
    Filed: March 13, 2013
    Publication date: August 29, 2013
    Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Bret L. Toll, Maxim Loktyukhin, Mark C. Davis, Alexandre J. Farcy
  • Patent number: 8520016
    Abstract: An instruction folding mechanism, a method for performing the instruction folding mechanism and a pixel processing system employing the instruction folding mechanism are described. The pixel processing system comprises an instruction folding mechanism and a pixel shader. The instruction folding mechanism folds a plurality of first instructions in a first program to generate a second program having at least one second instruction which is a combination of the first instructions. The pixel shader connected to the instruction folding mechanism fetches the second program to decode at least the second instruction having the combination of the first instructions to execute the second program. The instruction folding mechanism comprises an instruction scheduler, a folding rule checker, and an instruction combiner. The instruction scheduler connected to the folding rule checker is used to scan the first instructions according to static positions in order to schedule the first instructions in the first program.
    Type: Grant
    Filed: March 9, 2009
    Date of Patent: August 27, 2013
    Assignee: Taichi Holdings, LLC
    Inventor: R-Ming Hsu
  • Publication number: 20130212362
    Abstract: A restriction is given to the calculation function for image processing achieved by the hard-wired system and the memory access control of a buffer memory, and a range of the restriction is made variable by a program control and others. Data is inputted to the buffer memory from the outside with a restriction of “in units of memory line”, and the number of memory lines and positions of the same to which data is inputted can be programmable by the control circuit. The arithmetic circuit is subjected to the restriction of performing the calculation in units of data of one or plural memory lines supplied from the buffer memory, and a calculation processing content in units of calculation processing for the units of data can be programmably assigned by the control circuit.
    Type: Application
    Filed: March 15, 2013
    Publication date: August 15, 2013
    Applicant: RENESAS ELECTRONICS CORPORATION
    Inventor: RENESAS ELECTRONICS CORPORATION
  • Publication number: 20130212357
    Abstract: Systems and methods for generating a floating point constant value from an instruction are disclosed. A first field of the instruction is decoded as a sign bit of the floating point constant value. A second field of the instruction is decoded to correspond to an exponent value of the floating point constant value. A third field of the instruction is decoded to correspond to the significand of the floating point constant value. The first field, the second field, and the third field are combined to form the floating point constant value. The exponent value may include a bias, and a bias constant may be added to the exponent value to compensate for the bias. The third field may comprise the most significant bits of the significand. Optionally, the second field and the third field may be shifted by first and second shift values respectively before they are combined to form the floating point constant value.
    Type: Application
    Filed: February 9, 2012
    Publication date: August 15, 2013
    Applicant: QUALCOMM INCORPORATED
    Inventors: Erich James Plondke, Lucian Codrescu, Charles Joseph Tabony, Swaminathan Balasubramanian
  • Publication number: 20130205123
    Abstract: The present invention relates to a processor having a trace cache and a plurality of ALUs arranged in a matrix, comprising an analyser unit located between the trace cache and the ALUs, wherein the analyser unit analyses the code in the trace cache, detects loops, transforms the code, and issues to the ALUs sections of the code combined to blocks for joint execution for a plurality of clock cycles.
    Type: Application
    Filed: July 8, 2011
    Publication date: August 8, 2013
    Inventor: Martin Vorbach
  • Patent number: 8504807
    Abstract: A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.
    Type: Grant
    Filed: December 26, 2009
    Date of Patent: August 6, 2013
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Bret L. Toll, Maxim Loktyukhin, Mark C. Davis, Alexandre J. Farcy
  • Patent number: 8495734
    Abstract: The present disclosure relates to a method for executing, by a processor, a program read in a program memory, comprising steps of: detecting a program memory read address jump; providing prior to a jump address instruction for jumping a program memory read address, an instruction for storing the presence of the jump address instruction; and activating an error signal if an address jump has been detected and if the presence of a jump address instruction has not been stored. The present disclosure also relates to securing integrated circuits.
    Type: Grant
    Filed: June 16, 2009
    Date of Patent: July 23, 2013
    Assignee: STMicroelectronics SA
    Inventors: Frederic Bancel, Nicolas Berard, David Hely
  • Patent number: 8484440
    Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: establishing, for each node, a plurality of logical rings, each ring including a different set of at least one core on that node, each ring including the cores on at least two of the nodes; iteratively for each node: assigning each core of that node to one of the rings established for that node to which the core has not previously been assigned, and performing, for each ring for that node, a global allreduce operation using contribution data for the cores assigned to that ring or any global allreduce results from previous global allreduce operations, yielding current global allreduce results for each core; and performing, for each node, a local allreduce operation using the global allreduce results.
    Type: Grant
    Filed: May 21, 2008
    Date of Patent: July 9, 2013
    Assignee: International Business Machines Corporation
    Inventor: Ahmad Faraj
  • Publication number: 20130173890
    Abstract: A method of generating a hardware design for a stream processor. The method includes defining a graph representing a processing operation designating processes to be implemented in hardware as part of the stream processor. The graph represents the processing operation in the time domain as a function of clock cycles and includes at least one data path. At least one stream offset object is provided located at a particular point in the data path.
    Type: Application
    Filed: February 27, 2013
    Publication date: July 4, 2013
    Applicant: MAXELER TECHNOLOGIES LTD.
    Inventor: MAXELER TECHNOLOGIES LTD.
  • Publication number: 20130166889
    Abstract: A method and apparatus are described for generating flags in response to processing data during an execution pipeline cycle of a processor. The processor may include a multiplexer configured generate valid bits for received data according to a designated data size, and a logic unit configured to control the generation of flags based on a shift or rotate operation command, the designated data size and information indicating how many bytes and bits to rotate or shift the data by. A carry flag may be used to extend the amount of bits supported by shift and rotate operations. A sign flag may be used to indicate whether a result is a positive or negative number. An overflow flag may be used to indicate that a data overflow exists, whereby there are not a sufficient number of bits to store the data.
    Type: Application
    Filed: December 22, 2011
    Publication date: June 27, 2013
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Srikanth Arekapudi, Saurabh Gupta