Arithmetic Operation Instruction Processing Patents (Class 712/221)
-
Publication number: 20140075162
Abstract: A digital processor is provided having an instruction set with a complex exponential function. The digital processor evaluates a complex exponential function for an input value, x, by obtaining a complex exponential software instruction having the input value, x, as an input; and in response to the complex exponential software instruction: invoking at least one complex exponential functional unit that implements complex exponential software instructions to apply the complex exponential function to the input value, x; and generating an output corresponding to the complex exponential of the input value, x. A complex exponential function for an input value, x, can be evaluated by wrapping the input value to maintain a given range; computing a coarse approximation angle using a look-up table; scaling the coarse approximation angle to obtain an angle from 0 to ?; and computing a fine corrective value using a polynomial approximation.
Type: Application
Filed: October 26, 2012
Publication date: March 13, 2014
Applicant: LSI Corporation
Inventors: Kameran Azadet, Albert Molina, Joseph H. Othmer, Parakalan Venkataraghavan, Meng-Lin Yu, Joseph Williams
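The evaluation steps named in this abstract (range wrapping, a coarse table lookup, and a fine polynomial correction) can be illustrated with a minimal Python sketch. The table size of 64, the cubic Taylor polynomial, and the name complex_exp are assumptions chosen for illustration; the abstract does not specify them, and this is not the claimed hardware design.

```python
import cmath
import math

TABLE_SIZE = 64                          # assumed number of coarse table entries
STEP = 2 * math.pi / TABLE_SIZE          # angle covered by one table entry
COARSE = [cmath.exp(1j * k * STEP) for k in range(TABLE_SIZE)]


def complex_exp(x):
    """Approximate exp(j*x) with a coarse table lookup plus a fine polynomial."""
    wrapped = x % (2 * math.pi)                       # wrap the input into one period
    k = min(int(wrapped / STEP), TABLE_SIZE - 1)      # coarse table index
    r = wrapped - k * STEP                            # small residual angle
    fine = complex(1 - r * r / 2, r - r ** 3 / 6)     # cubic Taylor polynomial for exp(j*r)
    return COARSE[k] * fine
```

With these assumed parameters the residual angle stays below about 0.1 radian, so the truncated polynomial keeps the error in the range of a few parts per million.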
-
Publication number: 20140068231
Abstract: There is a need to provide a central processing unit capable of improving the resistance to power analysis attack without changing programs, lowering clock frequencies, and greatly redesigning a central processing unit of the related art. In a central processing unit, an arithmetic unit is capable of performing arithmetic operation using data irrelevant to data stored in a register group. A control unit allows the arithmetic unit to perform arithmetic processing corresponding to an incorporated instruction. At this time, the control unit allows the arithmetic unit to perform arithmetic processing using the irrelevant data during a first one-clock cycle.
Type: Application
Filed: August 29, 2013
Publication date: March 6, 2014
Applicant: Renesas Electronics Corporation
Inventor: Minoru SAEKI
-
Patent number: 8656143
Abstract: A serial array processor may have an execution unit, which is comprised of a multiplicity of single bit arithmetic logic units (ALUs), and which may perform parallel operations on a subset of all the words in memory by serially accessing and processing them, one bit at a time, while an instruction unit of the processor is pre-fetching the next instruction, a word at a time, in a manner orthogonal to the execution unit.
Type: Grant
Filed: February 3, 2010
Date of Patent: February 18, 2014
Inventor: Laurence H. Cooke
-
Publication number: 20140047220
Abstract: According to some embodiments, a technique provides for the execution of an instruction that includes receiving residual data of a first image and decoded pixels of a second image, zero-extending a plurality of unsigned data operands of the decoded pixels producing a plurality of unpacked data operands, adding a plurality of signed data operands of the residual data to the plurality of unpacked data operands producing a plurality of signed results; and saturating the plurality of signed results producing a plurality of unsigned results.
Type: Application
Filed: October 11, 2013
Publication date: February 13, 2014
Inventors: BRADLEY ALDRICH, NIGEL PAVER, MURLI GANESHAN
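The zero-extend, add, and saturate sequence in this abstract can be sketched in Python as follows. The 8-bit pixel width, the function name add_residual_saturate, and the use of plain lists in place of packed registers are assumptions for illustration only.

```python
def add_residual_saturate(decoded_pixels, residuals):
    """Add signed residuals to unsigned 8-bit pixels and clamp to 0..255."""
    results = []
    for pixel, residual in zip(decoded_pixels, residuals):
        unpacked = pixel & 0xFF                  # zero-extend the unsigned pixel
        signed_sum = unpacked + residual         # signed intermediate result
        results.append(max(0, min(255, signed_sum)))  # saturate to unsigned 8-bit
    return results
```

For instance, a pixel of 250 with a residual of 20 clamps to 255, and a pixel of 10 with a residual of -30 clamps to 0.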
-
Patent number: 8649508
Abstract: A system and method for implementing the Elliptic Curve scalar multiplication method in cryptography, where the Double Base Number System is expressed in decreasing order of exponents and further on using it to determine Elliptic curve scalar multiplication over a finite elliptic curve.
Type: Grant
Filed: September 29, 2008
Date of Patent: February 11, 2014
Assignee: Tata Consultancy Services Ltd.
Inventor: Natarajan Vijayarangan
-
Publication number: 20140025934
Abstract: An arithmetic processing apparatus and method for high speed processing of an application are provided. The arithmetic processing apparatus may include a program control unit to store operation processing information necessary for application operation in a communication channel by executing an application code, and an operation processing unit to process the application operation using the operation processing information stored in the communication channel.
Type: Application
Filed: July 18, 2013
Publication date: January 23, 2014
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Joon Ho SONG, Shi Hwa Lee, Do Hyung Kim
-
Patent number: 8635434
Abstract: A mathematical operation processing apparatus is disclosed by which the supply of an operand which is performed based on condition codes by a plurality of mathematical operations can be performed at a high speed. The mathematical operation processing apparatus includes a plurality of computing elements configured to perform mathematical operations different from one another and produce mathematical operation results of the mathematical operations and condition codes. A condition code set register retains the condition codes produced simultaneously by the computing elements as a condition code set. A condition code conversion section performs a predetermined conversion for the condition code set and outputs a result of the conversion as a conversion condition code set. An operand supplying section supplies an operand for the mathematical operations in the computing elements based on the conversion condition code set.
Type: Grant
Filed: December 4, 2007
Date of Patent: January 21, 2014
Assignee: Sony Corporation
Inventors: Yasuhiro Iizuka, Takahiro Sato, Takayasu Kon, Kenichi Sanpei, Eiichiro Morinaga
-
Patent number: 8635415
Abstract: A set of default registers of a processor are expanded into metadata registers on the processor of a computer system. The default registers have data stored thereon, while metadata which is related to the data is stored separately on the metadata registers.
Type: Grant
Filed: September 30, 2009
Date of Patent: January 21, 2014
Assignee: Intel Corporation
Inventors: Baiju V. Patel, Rajeev Gopalakrishna, Andrew F. Glew, Robert J. Kushlis, Don Alan Van Dyke, Joseph Frank Cihula, Asit K. Mallick, James B. Crossland, Gilbert Neiger, Scott Dion Rodgers, Martin Guy Dixon, Mark Jay Charney, Jacob (Koby) Gottlieb
-
Publication number: 20140019727
Abstract: Apparatus and method for a modified, balanced throughput data-path architecture is given for efficiently implementing the digital signal processing algorithms of filtering, convolution and correlation in computer hardware, in which both data and coefficient buffers can be implemented as sliding windows. This architecture uses a multiplexer and a data path branch from the Address Generator unit to the multiply-accumulate execution unit. By selecting between the data path of Address Generator to execution unit and the data path of register to execution unit, the unbalanced throughput and multiply-accumulate bubble cycles caused by misaligned addressing on coefficients can be overcome. The modified balanced throughput data-path architecture can achieve a high multiply-accumulate operation rate per cycle in implementing digital signal processing algorithms.
Type: Application
Filed: July 8, 2013
Publication date: January 16, 2014
Inventors: PengFei ZHU, HongXia SUN, YongQiang WU, Elio GUIDETTI
-
Publication number: 20140019725
Abstract: Methods, systems, and apparatuses are disclosed for implementing fast large-integer arithmetic within an integrated circuit, such as on IA (Intel Architecture) processors, in which such means include receiving a 512-bit value for squaring, the 512-bit value having eight sub-elements each of 64-bits and performing a 512-bit squaring algorithm by: (i) multiplying every one of the eight sub-elements by itself to yield a square of each of the eight sub-elements, the eight squared sub-elements collectively identified as T1, (ii) multiplying every one of the eight sub-elements by the other remaining seven of the eight sub-elements to yield an asymmetric intermediate result having seven diagonals therein, wherein each of the seven diagonals are of a different length, (iii) reorganizing the asymmetric intermediate result having the seven diagonals therein into a symmetric intermediate result having four diagonals each of 7×1 sub-elements of the 64-bits in length arranged across a plurality of columns, (iv) adding all
Type: Application
Filed: December 6, 2012
Publication date: January 16, 2014
Inventors: ERDINC OZTURK, VINODH GOPAL, JAMES GUILFORD
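The arithmetic identity behind steps (i) and (ii) of this abstract can be sketched in Python: the square of a multi-limb integer is the sum of the limb squares (T1) plus twice the sum of the distinct cross products. The sketch below deliberately skips the diagonal reorganization of step (iii), which concerns scheduling rather than the numeric result, and the name square_512 is an assumption for illustration.

```python
LIMB_BITS = 64
LIMB_COUNT = 8
LIMB_MASK = (1 << LIMB_BITS) - 1


def square_512(value):
    """Square a 512-bit integer from its eight 64-bit sub-elements."""
    limbs = [(value >> (LIMB_BITS * i)) & LIMB_MASK for i in range(LIMB_COUNT)]
    # Step (i): each sub-element multiplied by itself (the T1 terms).
    t1 = sum((limbs[i] * limbs[i]) << (2 * LIMB_BITS * i) for i in range(LIMB_COUNT))
    # Step (ii): each distinct pair of sub-elements multiplied once.
    cross = sum(
        (limbs[i] * limbs[j]) << (LIMB_BITS * (i + j))
        for i in range(LIMB_COUNT)
        for j in range(i + 1, LIMB_COUNT)
    )
    # Every cross product appears twice in the full square.
    return t1 + 2 * cross
```

Computing each cross product once and doubling the sum is what saves multiplications compared with a general 512-by-512-bit multiply.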
-
Publication number: 20140019726
Abstract: A parallel arithmetic device includes a status management section, a plurality of processor elements, and a plurality of switch elements for determining the relation of coupling of each of the processor elements. Each of the processor elements includes an instruction memory for memorizing a plurality of operation instructions corresponding respectively to a plurality of contexts so that an operation instruction corresponding to the context selected by the status management section is read out, and a plurality of arithmetic units for performing arithmetic processes in parallel on a plurality of sets of input data in a manner compliant with the operation instruction read out from the instruction memory.
Type: Application
Filed: July 5, 2013
Publication date: January 16, 2014
Inventors: Takao TOI, Taro FUJII, Yoshinosuke KATO, Toshiro KITAOKA
-
Patent number: 8631224
Abstract: A data processing system includes a plurality of general purpose registers, and processor circuitry for executing one or more instructions, including a vector dot product instruction for simultaneously performing at least two dot products. The vector dot product instruction identifies a first and second source register, each for storing a plurality of vector elements, where a first dot product is to be performed between a first subset of vector elements of the first source register and a first subset of vector elements of the second source register, and a second dot product is to be performed between a second subset of vector elements of the first source register and a second subset of vector elements of the second source register. The first and second subsets of the second source register are different and at least two vector elements of the first and second subsets of the second source register overlap.
Type: Grant
Filed: September 13, 2007
Date of Patent: January 14, 2014
Assignee: Freescale Semiconductor, Inc.
Inventor: William C. Moyer
-
Publication number: 20140013086
Abstract: A number of addition instructions are provided that have no data dependency between each other. A first addition instruction stores its carry output in a first flag of a flags register without modifying a second flag in the flags register. A second addition instruction stores its carry output in the second flag of the flags register without modifying the first flag in the flags register.
Type: Application
Filed: December 22, 2011
Publication date: January 9, 2014
Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Matthew C. Merten, Tong Li, Bret T. Toll, I
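The independence of the two carry chains described in this abstract can be modeled with a short Python sketch. The flag names "first" and "second", the 64-bit operand width, and the helper add_with_flag are hypothetical and stand in for the flags register and the two addition instructions.

```python
MASK64 = (1 << 64) - 1


def add_with_flag(a, b, flags, flag_name):
    """Add a + b + carry-in from one flag; write carry-out to that flag only."""
    total = a + b + flags[flag_name]
    flags[flag_name] = total >> 64       # the other flag is left untouched
    return total & MASK64


flags = {"first": 0, "second": 0}
low = add_with_flag(MASK64, 1, flags, "first")   # overflows, sets flags["first"]
other = add_with_flag(2, 3, flags, "second")     # does not disturb flags["first"]
high = add_with_flag(0, 0, flags, "first")       # consumes the first chain's carry
```

Because each addition reads and writes only its own flag, two multi-word additions can be interleaved without one chain overwriting the carry of the other.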
-
Patent number: 8627046
Abstract: A data processing device has an instruction decoder, a control logic unit, and ALU. The instruction decoder decodes instruction codes of an arithmetic instruction. The control logic unit detects the effective data width of operation data to be processed according to the decode result from the instruction decoder and determines the number of cycles for the instruction execution corresponding to the effective data width. The ALU executes the instruction with the number of cycles of the instruction execution determined by the control logic unit.
Type: Grant
Filed: May 23, 2011
Date of Patent: January 7, 2014
Assignee: Renesas Electronics Corporation
Inventors: Sugako Ohtani, Hiroyuki Kondo
-
Publication number: 20140006754
Abstract: A system includes a processor having an instruction register for storing an instruction having a predefined opcode, a predicate register for storing a predicate condition to select an output register for a result of the instruction, a first output register, and a second output register. The processor further includes processor circuitry operable to execute the instruction to produce a result, and processor circuitry operable to store the result of the instruction in the first output register if the predicate condition to select the output is true, and to store the result of the instruction in the second output register if the predicate condition to select the output is false. A single instruction is used to produce the result, and to store the result of the instruction.
Type: Application
Filed: September 5, 2013
Publication date: January 2, 2014
Applicant: NVIDIA Corporation
Inventors: Timo Oskari Aila, Samuli Matias Laine
-
Publication number: 20140006753
Abstract: A method is described. The method includes iteratively performing for each position in a result matrix stored in a third register, multiplying a value at a matrix position stored in a first register with a value at a matrix position stored in a second register to obtain a first multiplicative value, where the positions in the first register and the second register are determined by the position in the result matrix and performing an exclusive or (XOR) operation with the first multiplicative value and a value stored at a result matrix position stored in the third register to obtain a result value.
Type: Application
Filed: December 22, 2011
Publication date: January 2, 2014
Inventors: Vinodh Gopal, Gilbert M. Wolrich, Kirk S. Yap, James D. Guilford, Erdinc Ozturk, Sean M. Gulley, Wajdi K. Feghali, Martin G. Dixon
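The multiply-then-XOR accumulation in this abstract resembles a matrix product in which XOR replaces addition. The Python sketch below assumes single-bit matrix entries (so multiplication reduces to a bitwise AND), square n-by-n matrices, and nested lists in place of registers; none of these details are stated in the abstract.

```python
def xor_matrix_multiply(a, b, n):
    """Multiply two n-by-n bit matrices, accumulating products with XOR."""
    result = [[0] * n for _ in range(n)]
    for i in range(n):                  # row of the result position
        for j in range(n):              # column of the result position
            for k in range(n):
                product = a[i][k] & b[k][j]   # multiplicative value for one term
                result[i][j] ^= product       # XOR into the stored result value
    return result
```

For example, multiplying [[1, 1], [0, 1]] by [[1, 0], [1, 1]] under these assumptions gives [[0, 1], [1, 1]].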
-
Publication number: 20130346730
Abstract: An arithmetic processing apparatus includes a plurality of processors, each of the processors having an arithmetic unit and a cache memory. The processor includes an instruction port that holds a plurality of instructions accessing data of the cache memory, a first determination unit that validates a first flag when receiving an invalidation request for data in the cache memory, a cache index of a target address and a way ID of the received request match with a cache index of a designated address and a way ID of the load instruction, a second determination unit that validates a second flag when target data is transmitted due to a cache miss, and an instruction re-execution determination unit that instructs re-execution of an instruction subsequent to the load instruction when both the first flag and the second flag are validated at the time of completion of an instruction in the instruction port.
Type: Application
Filed: April 30, 2013
Publication date: December 26, 2013
Applicant: FUJITSU LIMITED
Inventor: Naohiro KIYOTA
-
Publication number: 20130339677
Abstract: The invention provides microprocessor extensions for cooperating with a sequential arithmetic-logic unit (ALU) to execute a multiply-and-accumulate operation (MAc). The ALU performs a continuous sequence of accumulation instructions synchronously with a clock signal (CLK1). Buffers (BUF1, BUF2) store input data which are fed to a combinatorial multiplier (MULT) by first buses (L1, L2). A second bus (N1) forwards the product to the ALU, where it is accumulated with previous data. Since at least the first buses operate independently of the clock signal, they do not limit the speed of the MAc operation. In particular embodiments, a finite state machine (FSM) controls the buses on the basis of triggers, e.g., signals from the multiplier and/or ALU indicating the completion of their respective instructions. The FSM may be operable in a low-power mode. The invention also relates to methods, computer programs and the use of a sequential ALU for executing MAc operations.
Type: Application
Filed: April 14, 2011
Publication date: December 19, 2013
Applicant: ST. JUDE MEDICAL AB
Inventor: Mattias Tullberg
-
Patent number: 8610947
Abstract: An image processing apparatus includes an interpreting unit that interprets an order of the logical arithmetic processing and a kind of a logical arithmetic processing; and a drawing unit that, in a case of drawing the image information as raster data, draws from an element of an upper-order side in order of the logical arithmetic processing interpreted by the interpreting unit with respect to an area that is interpreted to be processed by a simple overwrite processing for giving priority to an uppermost-order side element as the kind of the logical arithmetic processing, and draws using a calculation sequentially from an element of a lower-order side in order of the logical arithmetic processing interpreted by the interpreting unit with respect to an area that is interpreted to be processed by a logical arithmetic processing for using the calculation as to the overlapped elements as the kind of the logical arithmetic processing.
Type: Grant
Filed: August 20, 2009
Date of Patent: December 17, 2013
Assignee: Fuji Xerox Co., Ltd
Inventor: Shusuke Tanimoto
-
Publication number: 20130318329
Abstract: In order to enable to quickly and efficiently execute, by one system, various modulation/demodulation/synchronous processes in a plurality of radio communication methods, a co-processor (22) for complex arithmetic processing, which forms a processor system (100), includes a complex arithmetic circuit (22) that executes for complex data a complex arithmetic operation required for radio communication in accordance with an instruction from a primary processor (10), and a memory controller (20, 21) that operates in parallel with the complex arithmetic circuit and accesses a memory. A trace circuit provided in the complex arithmetic circuit (22) monitors arithmetic result data for first complex data series sequentially read from the memory, and detects a normalization coefficient for normalizing the arithmetic result data.
Type: Application
Filed: September 15, 2011
Publication date: November 28, 2013
Applicant: NEC CORPORATION
Inventors: Toshiki Takeuchi, Hiroyuki Igura
-
Patent number: 8595470
Abstract: A digital signal processor includes an instruction analysis unit, a digital signal processor (DSP) core and a memory unit. The instruction analysis unit receives an instruction and determines the required bit width M for the data process corresponding to the instruction. The DSP core performs the M-bit data process based on the bit width M determined by the instruction analysis unit, and the memory unit stores multiple data and performs the M-bit access based on the bit width M determined by the instruction analysis unit thereby allowing the DSP core to access, and at least one available space in the memory unit will be adjusted such that only the access space having the bit width M for the operation corresponding to the instruction will be open in each access, thereby effectively achieving the effect of power-saving.
Type: Grant
Filed: November 9, 2010
Date of Patent: November 26, 2013
Assignee: Sentelic Corporation
Inventors: Zhiyang Guo, Mao-Sung Wu, Chun Hsien, Tsai-Lin Lee
-
Publication number: 20130310983
Abstract: It is intended to reduce the amount of computation to be performed by CPU or the required amount of storage space in a built-in memory for timing adjustment of a pulse output signal. A digital multiplying circuit in the phase arithmetic circuit of the pulse generating circuit generates a multiplication output signal by multiplying a phase angle change value in the phase adjustment data register and a count maximum value Nmax in the cycle data register. A digital dividing circuit generates a division output signal by dividing the multiplication output signal by 360 degrees of phase angle for one cycle. A digital adding circuit adds the division output signal and rise setting/fall setting count values and a subtracting circuit subtracts the division output signal from these values. The addition and subtraction generate new rise setting/fall setting count values required to delay/advance the phase by the phase angle change value.
Type: Application
Filed: May 6, 2013
Publication date: November 21, 2013
Applicant: Renesas Electronics Corporation
Inventors: Takehiro SHIMIZU, Toshio ASAI
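The multiply, divide, and add/subtract chain in this abstract amounts to converting a phase angle into a counter offset. A minimal Python sketch follows; the function name shift_edges, the integer division, and the absence of wrap-around handling are assumptions, since the abstract does not describe rounding or overflow behavior.

```python
def shift_edges(rise_count, fall_count, phase_change_deg, n_max, delay=True):
    """Return new rise/fall count values shifted by a phase angle in degrees."""
    product = phase_change_deg * n_max   # multiplying circuit output
    offset = product // 360              # dividing circuit output (assumed integer)
    if delay:
        # Adding circuit: delay the phase by the angle change value.
        return rise_count + offset, fall_count + offset
    # Subtracting circuit: advance the phase by the angle change value.
    return rise_count - offset, fall_count - offset
```

With a count maximum of 1000, a 90-degree change corresponds to an offset of 250 counts, so edges at 100 and 600 move to 350 and 850 when delayed.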
-
Patent number: 8589469
Abstract: Multiplication engines and multiplication methods are provided for a digital processor.
Type: Grant
Filed: January 10, 2008
Date of Patent: November 19, 2013
Assignee: Analog Devices Technology
Inventors: Andreas D. Olofsson, Baruch Yanovitch
-
Patent number: 8583902
Abstract: Techniques are disclosed relating to a processor including instruction support for performing a Montgomery multiplication. The processor may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include an instruction execution unit configured to receive instructions including a first instance of a Montgomery-multiply instruction defined within the ISA. The Montgomery-multiply instruction is executable by the processor to operate on at least operands A, B, and N residing in respective portions of a general-purpose register file of the processor, where at least one of operands A, B, N spans at least two registers of the general-purpose register file. The instruction execution unit is configured to calculate P mod N in response to receiving the first instance of the Montgomery-multiply instruction, where P is the product of at least operand A, operand B, and R^-1.
Type: Grant
Filed: May 7, 2010
Date of Patent: November 12, 2013
Assignee: Oracle International Corporation
Inventors: Christopher H. Olson, Gregory F. Grohoski, Lawrence Spracklen, Nils Gura
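The quantity this abstract describes, A times B times R^-1 modulo N, is the result of a standard Montgomery reduction. The Python sketch below uses the textbook reduction with R chosen as a power of two; the function name, the r_bits parameter, and the requirements that N be odd and that A and B be smaller than N are assumptions of this sketch, not details taken from the patent. It needs Python 3.8 or later for the modular inverse via pow.

```python
def montgomery_multiply(a, b, n, r_bits):
    """Return a * b * R^-1 mod n, with R = 2**r_bits, n odd, and a, b < n < R."""
    r = 1 << r_bits
    n_prime = (-pow(n, -1, r)) % r       # n * n_prime is congruent to -1 mod R
    t = a * b
    m = (t * n_prime) & (r - 1)          # makes t + m * n divisible by R
    u = (t + m * n) >> r_bits            # exact division by R
    return u - n if u >= n else u        # single conditional subtraction
```

The division by R becomes a shift, which is why the method suits hardware and multi-register operands.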
-
Patent number: 8583903
Abstract: Disclosed herein are efficient geometries for dynamical topology changing (DTC), together with protocols to incorporate DTC into quantum computation. Given an Ising system, twisted depletion to implement a logical gate T, anyonic state teleportation into and out of the topology altering structure, and certain geometries of the (1,-2)-bands, a classical computer can be enabled to implement a quantum algorithm.
Type: Grant
Filed: December 28, 2010
Date of Patent: November 12, 2013
Assignee: Microsoft Corporation
Inventors: Michael Freedman, Parsa Bonderson, Chetan Nayak, Sankar Das Sarma
-
Publication number: 20130297916
Abstract: A related art semiconductor device suffers from a problem that a processing capacity is decayed by switching an occupied state for each partition. A semiconductor device according to the present invention includes an execution unit that executes an arithmetic instruction, and a scheduler including multiple first setting registers each defining a correspondence relationship between hardware threads and partitions, and generates a thread select signal on the basis of a partition schedule and a thread schedule. The scheduler outputs a thread select signal designating a specific hardware thread without depending on the thread schedule as the partition indicated by a first occupation control signal according to a first occupation control signal output when the execution unit executes a first occupation start instruction.
Type: Application
Filed: April 9, 2013
Publication date: November 7, 2013
Applicant: Renesas Electronics Corporation
Inventors: Hitoshi Suzuki, Koji Adachi
-
Publication number: 20130290684
Abstract: New instruction definitions for a packet add (PADD) operation and for a single instruction multiple add (SMAD) operation are disclosed. In addition, a new dedicated PADD logic device that performs the PADD operation in about one to two processor clock cycles is disclosed. Also, a new dedicated SMAD logic device that performs a single instruction multiple data add (SMAD) operation in about one to two clock cycles is disclosed.
Type: Application
Filed: June 25, 2013
Publication date: October 31, 2013
Inventors: Corey Gee, Bapiraju Vinnakota, Saleem Mohammadali, Carl A. Alberola
-
Publication number: 20130283016
Abstract: Provided is a signal processing circuit occupying a small circuit area. A common arithmetic operation element is shared between a plurality of arithmetic operation sequence control units. An arbitration circuit selects, when the plurality of arithmetic operation sequence control units simultaneously generate requests for arithmetic operations to use the common arithmetic operation element, the predetermined sequence control unit based on priority information about the plurality of arithmetic operation sequence control units, causes the common arithmetic operation element to execute the arithmetic operation requested from the selected arithmetic operation sequence control unit, and returns the result of the arithmetic operation to the selected arithmetic operation sequence control unit.
Type: Application
Filed: April 17, 2013
Publication date: October 24, 2013
Applicant: RENESAS ELECTRONICS CORPORATION
Inventors: Hiroyuki YAMASAKI, Hideyuki NODA, Kan MURATA
-
Patent number: 8566566
Abstract: There is provided a vector processing apparatus and method allowing for the parallel processing of a plurality of different instructions while maintaining vector processing architecture. The vector processing apparatus includes an instruction memory storing a multiple instruction group including one or more instructions; an instruction fetch unit reading the multiple instruction group from the instruction memory; and a plurality of instruction processing units each receiving the multiple instruction group through the instruction fetch unit, selecting a single instruction from the multiple instruction group according to a previous arithmetic result, and performing an arithmetic operation.
Type: Grant
Filed: August 2, 2010
Date of Patent: October 22, 2013
Assignee: Electronics and Telecommunications Research Institute
Inventors: Moo Kyoung Chung, Young Su Kwon, Kyung Su Kim
-
Publication number: 20130275729
Abstract: A method of an aspect includes receiving an instruction indicating a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four non-negative integers. In an aspect, values of the at least four non-negative integers are not calculated using a result of a preceding instruction. Other methods, apparatus, systems, and instructions are disclosed.
Type: Application
Filed: December 22, 2011
Publication date: October 17, 2013
Inventors: Seth Abraham, Robert Valentine, Elmoustapha Ould-Ahmed-Vall, Zeev Sperber, Amit Gradstein
-
Publication number: 20130275727
Abstract: A method of an aspect includes receiving an instruction. The instruction indicates an integer stride, indicates an integer offset, and indicates a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four integers in numerical order with a smallest one of the at least four integers differing from zero by the integer offset and with all integers of the sequence in consecutive positions differing by the integer stride. Other methods, apparatus, systems, and instructions are disclosed.
Type: Application
Filed: December 22, 2011
Publication date: October 17, 2013
Inventors: Seth Abraham, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Zeev Sperber, Amit Gradstein
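The result described in this abstract is an arithmetic progression that starts at the offset and advances by the stride. A short Python sketch, assuming a positive stride, a default of four elements, and a list in place of the destination storage location (the name strided_sequence is hypothetical):

```python
def strided_sequence(offset, stride, count=4):
    """Build the sequence offset, offset + stride, offset + 2 * stride, ..."""
    return [offset + i * stride for i in range(count)]
```

For example, an offset of 3 and a stride of 5 give 3, 8, 13, 18 under these assumptions.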
-
Publication number: 20130275728
Abstract: A method of an aspect includes receiving a packed data operation mask register arithmetic combination instruction. The packed data operation mask register arithmetic combination instruction indicates a first packed data operation mask register, indicates a second packed data operation mask register, and indicates a destination storage location. An arithmetic combination of at least a portion of bits of the first packed data operation mask register and at least a corresponding portion of bits of the second packed data operation mask register is stored in the destination storage location in response to the packed data operation mask register arithmetic combination instruction. Other methods, apparatus, systems, and instructions are disclosed.
Type: Application
Filed: December 22, 2011
Publication date: October 17, 2013
Applicant: Intel Corporation
Inventors: Bret L. Toll, Robert Valentine, Jesus Corbal San Adrian, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney
-
Publication number: 20130275726
Abstract: A branch target address table is provided for each branch instruction having a plurality of branch targets. Each branch target address table stores a history of a plurality of branch target addresses determined in the past by executing a corresponding branch instruction. A branch target prediction unit predicts a predicted branch target address with respect to a branch instruction with reference to the history of branch target addresses stored in the branch target address table corresponding to the branch instruction. The predicted branch target address obtained as a result of the prediction is stored, for example, in a predicted branch target address storage unit in association with the branch instruction, and is referenced by an instruction fetch control unit at the time of prefetching a branch target instruction.
Type: Application
Filed: June 10, 2013
Publication date: October 17, 2013
Inventor: Megumi Ukai
-
Patent number: 8560811
Abstract: The present invention provides a method and apparatus for handling lane-crossing instructions in an execution pipeline. One embodiment of the method includes conveying bits of an instruction from a register to an execution stage in a pipeline along a first data path that includes a lane crossing stage configured to change a first mapping of the register to the execution stage to a second mapping. The method also includes concurrently conveying the bits along a second data path from the register to the execution stage that bypasses the lane crossing stage. The method further includes selecting the first or second data path to provide the bits to the execution stage.
Type: Grant
Filed: August 5, 2010
Date of Patent: October 15, 2013
Assignee: Advanced Micro Devices, Inc.
Inventor: John M. King
-
Publication number: 20130268794
Abstract: In one embodiment, the present invention includes a processor having a fused multiply-add (FMA) unit to perform FMA instructions and add-like instructions. This unit can include an adder with multiple segments each independently controlled by a logic. The logic can clock gate at least one segment during execution of an add-like instruction in another segment of the adder when the add-like instruction has a width less than a width of the FMA unit. Other embodiments are described and claimed.
Type: Application
Filed: November 21, 2011
Publication date: October 10, 2013
Inventor: Chad D. Hancock
-
Publication number: 20130262819
Abstract: An apparatus includes a processor to determine an extremum among a series of values that are successively provided to a first register and a second register. The processor is configured to execute a single cycle search instruction, including compare a value in the first register with a value in a first accumulator, and store an extremum of the two values in the first accumulator; and compare a value in the second register with a value in a second accumulator, and store an extremum of the two values in the second accumulator. The processor is configured to execute a single cycle select instruction, including compare the value in the first accumulator with the value in the second accumulator, and store an extremum of the two values in the first accumulator, the extremum stored in the first accumulator representing the extremum of the series of numbers.
Type: Application
Filed: April 2, 2012
Publication date: October 3, 2013
Inventors: Srinivasan Iyer, Carsten Aagaard Pedersen
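The search and select steps of this abstract can be sketched in Python for the maximum case. The two accumulators are modeled as local variables, values are fed to the two registers in pairs, and negative infinity is used both as the initial accumulator value and as padding for an odd-length series; these choices, and the name find_maximum, are assumptions for illustration.

```python
def find_maximum(values):
    """Find the maximum with two accumulators updated side by side."""
    acc1 = acc2 = float("-inf")
    for i in range(0, len(values), 2):
        reg1 = values[i]
        reg2 = values[i + 1] if i + 1 < len(values) else float("-inf")
        # Search step: each register is compared with its own accumulator.
        acc1 = max(acc1, reg1)
        acc2 = max(acc2, reg2)
    # Select step: combine the two accumulators into the first one.
    acc1 = max(acc1, acc2)
    return acc1
```

Splitting the series across two accumulators halves the number of search steps, at the cost of one final select step.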
-
Publication number: 20130262836
Abstract: A method and apparatus for including in a processor instructions for performing multiply-subtract operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data store the result of performing multiply-subtract operations on data elements in the first and second packed data.
Type: Application
Filed: May 30, 2013
Publication date: October 3, 2013
Inventors: Alexander Peleg, Milland Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf C. Witt
-
Patent number: 8549264Abstract: A method in one aspect may include receiving an add instruction. The add instruction may indicate a first source operand, a second source operand, and a third source operand. A sum of the first, second, and third source operands may be stored as a result of the add instruction. The sum may be stored partly in a destination operand indicated by the add instruction and partly in a plurality of flags. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium.Type: GrantFiled: December 22, 2009Date of Patent: October 1, 2013Assignee: Intel CorporationInventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Bret L. Toll, Maxim Loktyukhin, Mark C. Davis, Alexandre J. Farcy
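To see why a three-source add needs more than one flag, note that the sum of three 64-bit values can occupy up to 66 bits. The sketch below models that split. The 64-bit operand width, the function name `add3`, and the generic flag names `flag_lo` and `flag_hi` are assumptions for illustration; the abstract does not name the flags.

```python
MASK64 = (1 << 64) - 1


def add3(a, b, c):
    """Add three 64-bit operands; return (destination, flag_lo, flag_hi)."""
    total = (a & MASK64) + (b & MASK64) + (c & MASK64)
    dest = total & MASK64          # low 64 bits go to the destination operand
    flag_lo = (total >> 64) & 1    # bit 64 of the sum (hypothetical flag)
    flag_hi = (total >> 65) & 1    # bit 65 of the sum (hypothetical flag)
    return dest, flag_lo, flag_hi
```

The full sum can be recovered as `dest + (flag_lo << 64) + (flag_hi << 65)`, which is what allows a following instruction to propagate the overflow into the next word of a multi-word addition.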
-
Publication number: 20130254516Abstract: An arithmetic processing unit that performs stream-type processing includes an arithmetic unit configured to operate on an input operand to obtain a result of operation; and a data input and output unit configured to read the input operand out of a memory in response to an instruction that is issued when a stream length of the input operand is shorter than a stream length of an output operand corresponding to the input operand and that includes data indicating a recursive rule used when the input operand is read out, to supply the read input operand, and to store the result of the operation obtained by the arithmetic unit in the memory as the output operand, wherein the arithmetic unit operates on the input operand read out by the data input and output unit and outputs the result of operation to the data input and output unit.Type: ApplicationFiled: November 7, 2012Publication date: September 26, 2013Inventors: Yi GE, Kazuo HORIO
-
Patent number: 8543626Abstract: A method and apparatus for QR-factorizing a matrix on a multiprocessor system, wherein the multiprocessor system comprises at least one core processor and a plurality of accelerators, comprises the steps of: iteratively factorizing each panel in the matrix until the whole matrix is factorized; wherein in each iteration, the method comprises: partitioning an unprocessed matrix part in the matrix into a plurality of blocks according to a predetermined block size; partitioning a current processed panel in the unprocessed matrix part into at least two sub panels, wherein the current processed panel is composed of a plurality of blocks; and performing QR factorization one by one on the at least two sub panels with the plurality of accelerators, and updating the data of the sub panel(s) on which no QR factorization has been performed among the at least two sub panels by using the factorization result.Type: GrantFiled: July 27, 2012Date of Patent: September 24, 2013Assignee: International Business Machines CorporationInventors: Hui Li, Bai Ling Wang
-
Publication number: 20130227252Abstract: A method in one aspect may include receiving an add instruction. The add instruction may indicate a first source operand, a second source operand, and a third source operand. A sum of the first, second, and third source operands may be stored as a result of the add instruction. The sum may be stored partly in a destination operand indicated by the add instruction and partly in a plurality of flags. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium.Type: ApplicationFiled: March 13, 2013Publication date: August 29, 2013Inventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Bret L. Toll, Maxim Loktyukhin, Mark C. Davis, Alexandre J. Farcy
-
Patent number: 8520016Abstract: An instruction folding mechanism, a method for performing the instruction folding mechanism and a pixel processing system employing the instruction folding mechanism are described. The pixel processing system comprises an instruction folding mechanism and a pixel shader. The instruction folding mechanism folds a plurality of first instructions in a first program to generate a second program having at least one second instruction which is a combination of the first instructions. The pixel shader connected to the instruction folding mechanism fetches the second program to decode at least the second instruction having the combination of the first instructions to execute the second program. The instruction folding mechanism comprises an instruction scheduler, a folding rule checker, and an instruction combiner. The instruction scheduler connected to the folding rule checker is used to scan the first instructions according to static positions in order to schedule the first instructions in the first program.Type: GrantFiled: March 9, 2009Date of Patent: August 27, 2013Assignee: Taichi Holdings, LLCInventor: R-Ming Hsu
-
Publication number: 20130212362Abstract: A restriction is given to the calculation function for image processing achieved by the hard-wired system and the memory access control of a buffer memory, and a range of the restriction is made variable by a program control and others. Data is inputted to the buffer memory from the outside with a restriction of “in units of memory line”, and the number of memory lines and positions of the same to which data is inputted can be programmable by the control circuit. The arithmetic circuit is subjected to the restriction of performing the calculation in units of data of one or plural memory lines supplied from the buffer memory, and a calculation processing content in units of calculation processing for the units of data can be programmably assigned by the control circuit.Type: ApplicationFiled: March 15, 2013Publication date: August 15, 2013Applicant: RENESAS ELECTRONICS CORPORATIONInventor: RENESAS ELECTRONICS CORPORATION
-
Publication number: 20130212357Abstract: Systems and methods for generating a floating point constant value from an instruction are disclosed. A first field of the instruction is decoded as a sign bit of the floating point constant value. A second field of the instruction is decoded to correspond to an exponent value of the floating point constant value. A third field of the instruction is decoded to correspond to the significand of the floating point constant value. The first field, the second field, and the third field are combined to form the floating point constant value. The exponent value may include a bias, and a bias constant may be added to the exponent value to compensate for the bias. The third field may comprise the most significant bits of the significand. Optionally, the second field and the third field may be shifted by first and second shift values respectively before they are combined to form the floating point constant value.Type: ApplicationFiled: February 9, 2012Publication date: August 15, 2013Applicant: QUALCOMM INCORPORATEDInventors: Erich James Plondke, Lucian Codrescu, Charles Joseph Tabony, Swaminathan Balasubramanian
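The field-combining scheme in this abstract can be illustrated with a small decoder that builds an IEEE 754 single-precision value. The field widths, the default `bias_constant` of 124, and the function name `decode_float_constant` are hypothetical choices made for this sketch; the application does not specify them here.

```python
import struct


def decode_float_constant(sign, exp_field, frac_field,
                          frac_bits=4, bias_constant=124):
    """Combine sign, exponent, and significand fields into a float32 value.

    Assumptions: frac_field holds the most significant frac_bits of the
    23-bit fraction, and bias_constant is added to exp_field to form the
    biased 8-bit exponent.
    """
    if sign not in (0, 1):
        raise ValueError("sign must be 0 or 1")
    if not 0 <= frac_field < (1 << frac_bits):
        raise ValueError("frac_field does not fit in frac_bits")

    biased_exp = exp_field + bias_constant
    if not 0 < biased_exp < 255:
        raise ValueError("exponent outside the normal float32 range")

    # Shift the short significand field up to the top of the 23-bit fraction.
    fraction = frac_field << (23 - frac_bits)
    bits = (sign << 31) | (biased_exp << 23) | fraction

    # Reinterpret the 32-bit pattern as a single-precision float.
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

For example, with these assumed defaults an exponent field of 3 yields a biased exponent of 127 (a scale of 2^0), and a significand field of `0b1000` contributes a fraction of 0.5.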
-
Publication number: 20130205123Abstract: The present invention relates to a processor having a trace cache and a plurality of ALUs arranged in a matrix, comprising an analyser unit located between the trace cache and the ALUs, wherein the analyser unit analyses the code in the trace cache, detects loops, transforms the code, and issues to the ALUs sections of the code combined to blocks for joint execution for a plurality of clock cycles.Type: ApplicationFiled: July 8, 2011Publication date: August 8, 2013Inventor: Martin Vorbach
-
Patent number: 8504807Abstract: A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.Type: GrantFiled: December 26, 2009Date of Patent: August 6, 2013Assignee: Intel CorporationInventors: Vinodh Gopal, James D. Guilford, Gilbert M. Wolrich, Wajdi K. Feghali, Erdinc Ozturk, Martin G. Dixon, Sean P. Mirkes, Bret L. Toll, Maxim Loktyukhin, Mark C. Davis, Alexandre J. Farcy
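A rotate whose result depends only on the source operand and the rotate amount can be modeled as a pure function, as sketched below. The right-rotate direction, the 64-bit default width, and the name `rotate_right` are assumptions for illustration.

```python
def rotate_right(value, amount, width=64):
    """Rotate value right by amount bits within a width-bit word.

    No carry flag is consulted: the result is a function of the two
    inputs only.
    """
    mask = (1 << width) - 1
    value &= mask
    amount %= width
    # Bits shifted out on the right re-enter on the left.
    return ((value >> amount) | (value << (width - amount))) & mask
```

Because the function reads no flag state, consecutive rotates have no dependency on the flags produced by earlier arithmetic instructions.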
-
Patent number: 8495734Abstract: The present disclosure relates to a method for executing, by a processor, a program read in a program memory, comprising steps of: detecting a program memory read address jump; providing prior to a jump address instruction for jumping a program memory read address, an instruction for storing the presence of the jump address instruction; and activating an error signal if an address jump has been detected and if the presence of a jump address instruction has not been stored. The present disclosure also relates to securing integrated circuits.Type: GrantFiled: June 16, 2009Date of Patent: July 23, 2013Assignee: STMicroelectronics SAInventors: Frederic Bancel, Nicolas Berard, David Hely
-
Patent number: 8484440Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: establishing, for each node, a plurality of logical rings, each ring including a different set of at least one core on that node, each ring including the cores on at least two of the nodes; iteratively for each node: assigning each core of that node to one of the rings established for that node to which the core has not previously been assigned, and performing, for each ring for that node, a global allreduce operation using contribution data for the cores assigned to that ring or any global allreduce results from previous global allreduce operations, yielding current global allreduce results for each core; and performing, for each node, a local allreduce operation using the global allreduce results.Type: GrantFiled: May 21, 2008Date of Patent: July 9, 2013Assignee: International Business Machines CorporationInventor: Ahmad Faraj
-
Publication number: 20130173890Abstract: A method of generating a hardware design for a stream processor. The method includes defining a graph representing a processing operation designating processes to be implemented in hardware as part of the stream processor. The graph represents the processing operation in the time domain as a function of clock cycles and includes at least one data path. At least one stream offset object is provided located at a particular point in the data path.Type: ApplicationFiled: February 27, 2013Publication date: July 4, 2013Applicant: MAXELER TECHNOLOGIES LTD.Inventor: MAXELER TECHNOLOGIES LTD.
-
Publication number: 20130166889Abstract: A method and apparatus are described for generating flags in response to processing data during an execution pipeline cycle of a processor. The processor may include a multiplexer configured to generate valid bits for received data according to a designated data size, and a logic unit configured to control the generation of flags based on a shift or rotate operation command, the designated data size and information indicating how many bytes and bits to rotate or shift the data by. A carry flag may be used to extend the amount of bits supported by shift and rotate operations. A sign flag may be used to indicate whether a result is a positive or negative number. An overflow flag may be used to indicate that a data overflow exists, whereby there are not a sufficient number of bits to store the data.Type: ApplicationFiled: December 22, 2011Publication date: June 27, 2013Applicant: ADVANCED MICRO DEVICES, INC.Inventors: Srikanth Arekapudi, Saurabh Gupta
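The flag roles named in this abstract can be illustrated for a left shift with the sketch below. It is a simplified model under stated assumptions: the function name `shift_left_flags` is hypothetical, and the overflow flag is set whenever any set bit is shifted out of the designated data size, which follows the abstract's wording rather than the flag definition of any particular instruction set.

```python
def shift_left_flags(value, count, size_bits=8):
    """Shift value left by count within size_bits; return (result, flags)."""
    if not 0 <= count <= size_bits:
        raise ValueError("count must be between 0 and size_bits")

    mask = (1 << size_bits) - 1
    value &= mask                  # keep only the valid bits for this size
    full = value << count          # unbounded result before truncation
    result = full & mask

    flags = {
        # Carry: the last bit shifted out of the top of the word.
        "carry": (full >> size_bits) & 1,
        # Sign: the most significant bit of the truncated result.
        "sign": (result >> (size_bits - 1)) & 1,
        # Overflow (assumed definition): some set bit no longer fits.
        "overflow": int(full > mask),
    }
    return result, flags
```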