Arithmetic Operation Instruction Processing Patents (Class 712/221)
  • Publication number: 20110078413
    Abstract: An arithmetic processing apparatus includes an arithmetic circuit; a first memory configured to store data to be processed in the arithmetic circuit; a second memory configured to be accessed through a first path by the arithmetic circuit; a preloader configured to preload the data from the second memory into the first memory through a second path; a memory controller configured to arbitrate between a first access by the arithmetic circuit using the first path and a second access by the preloader using the second path; and a scheduler configured to control the memory controller.
    Type: Application
    Filed: September 28, 2010
    Publication date: March 31, 2011
    Applicant: FUJITSU LIMITED
    Inventors: Hiromasa YAMAUCHI, Koichiro Yamashita
  • Publication number: 20110078389
    Abstract: A set of default registers of a processor is expanded into metadata registers on the processor of a computer system. The default registers have data stored thereon, while metadata related to the data is stored separately on the metadata registers.
    Type: Application
    Filed: September 30, 2009
    Publication date: March 31, 2011
    Inventors: Baiju V. Patel, Rajeev Gopalakrishna, Andrew F. Glew, Robert J. Kushlis, Don Alan Van Dyke, Joseph Frank Cihula, Asit K. Mallick, James B. Crossland, Gilbert Neiger, Scott Dion Rodgers, Martin Guy Dixon, Mark Jay Charney, Jocob Gottlieb
  • Publication number: 20110078420
    Abstract: A computer architecture (100) and a method for adapting and executing (200) a computer program therefor are provided. A value is computed by processing the instructions comprised in a basic block of the program in accordance with a first mathematical function (208). An instruction comprising an original address is modified, using a second mathematical function (214) taking the value as input, to comprise a modified address. In this manner, a fault attack during execution of the computer program will cause a disturbance of the control flow, thereby making such an attack unlikely to succeed.
    Type: Application
    Filed: May 12, 2009
    Publication date: March 31, 2011
    Applicant: NXP B.V.
    Inventors: Joachim Artur Trescher, Paulus Mathias Hubertus Mechtildis Gorissen, Wilhelmus Petrus Adrianus Johannus Michiels
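The address-masking idea in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the two mathematical functions, so a 16-bit additive checksum stands in for the first function and XOR for the second, and all names (`block_value`, `encode_address`, `decode_address`) are hypothetical.

```python
def block_value(instructions):
    """Assumed 'first function': a 16-bit additive checksum over the block."""
    value = 0
    for word in instructions:
        value = (value + word) & 0xFFFF
    return value


def encode_address(original, value):
    """Assumed 'second function': XOR the address with the block value."""
    return original ^ value


def decode_address(modified, value):
    """Recover the address at run time from the recomputed block value."""
    return modified ^ value


block = [0x1A2B, 0x3C4D, 0x0042]          # instruction words of one basic block
target = 0x0800                            # original jump target
stored = encode_address(target, block_value(block))

# A fault that flips one bit of an instruction changes the recomputed value,
# so the decoded address no longer equals the intended target.
faulty_block = [0x1A2B, 0x3C4D ^ 0x0004, 0x0042]
clean_target = decode_address(stored, block_value(block))
faulty_target = decode_address(stored, block_value(faulty_block))
```

With an undisturbed block the decoded address equals the original target; with the faulted block it does not, which is the control-flow disturbance the abstract describes.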
  • Patent number: 7917737
    Abstract: A method of performing data and pointer compression includes, in a buffer which is formed between a processor and a level one cache and stores plural tags and full-word values associated with the tags, when the buffer is presented with an address, breaking the address into a line number which indexes a set of the full-word values, and a tag which is used as a key to determine whether a value in the set of full-word values includes a value associated with the presented address, if a tag in the presented address matches a tag in the buffer, returning a full-word value in the buffer which is associated with the tag, and storing the returned full-word value in a destination register of an instruction which originated the presented address, and if a tag in the presented address does not match a tag in the buffer, generating a fault and branching control to a pre-defined handler.
    Type: Grant
    Filed: October 22, 2008
    Date of Patent: March 29, 2011
    Assignee: International Business Machines Corporation
    Inventors: David Francis Bacon, Perry Cheng, David Paul Grove
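The lookup described in the abstract above might be modelled roughly as below. The sketch assumes that the low bits of the address form the line number and the remaining bits form the tag; the abstract does not specify the split, and `TagBuffer`, `LINE_BITS`, and `CompressionFault` are hypothetical names.

```python
LINE_BITS = 4  # assumed width of the line-number field


class CompressionFault(Exception):
    """Stands in for the fault that branches to a pre-defined handler."""


def split(address):
    """Break an address into (line number, tag)."""
    return address & ((1 << LINE_BITS) - 1), address >> LINE_BITS


class TagBuffer:
    def __init__(self):
        # One set of tag -> full-word value mappings per line number.
        self.sets = [dict() for _ in range(1 << LINE_BITS)]

    def install(self, address, value):
        line, tag = split(address)
        self.sets[line][tag] = value

    def lookup(self, address):
        line, tag = split(address)
        if tag in self.sets[line]:
            return self.sets[line][tag]   # hit: value goes to the destination register
        raise CompressionFault(hex(address))  # miss: fault to the handler
```

A caller would catch `CompressionFault` where the real hardware would transfer control to the handler.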
  • Publication number: 20110057938
    Abstract: A system and method are presented by which data on a graphics processing unit (GPU) can be output to one or more buffers with independent output frequencies. In one embodiment, a GPU includes a shader processor configured to respectively emit a plurality of data sets into a plurality of streams in parallel. Each data set is emitted into at least a portion of its respective stream. Also included is a first number of counters configured to respectively track the emitted data sets.
    Type: Application
    Filed: September 9, 2010
    Publication date: March 10, 2011
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Todd Martin, Vineet Goel
  • Patent number: 7904698
    Abstract: The electronic circuit contains a plurality of processing elements (10), which are supplied with instructions under control of a common program flow, typically for SIMD operation wherein the same instructions are applied to all processing elements and different operand data of the instructions to respective ones of the processing elements (10). Under control of the instructions each processing element (10) determines, whether an operand data dependent condition has occurred. The processing element outputs a condition signal dependent on said determination. The condition signals are summed to form a sum signal. Program flow is controlled by a conditional jump dependent on a value represented by the sum signal.
    Type: Grant
    Filed: February 9, 2006
    Date of Patent: March 8, 2011
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Richard P. Kleihorst, Anteneh A. Abbo, Sebastien F. Mouy
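The summed-condition branch in the abstract above can be illustrated with a small software model. The predicate, the threshold comparison, and the names `condition_sum` and `run` are assumptions made for the example; the abstract only says that the jump depends on the value of the sum signal.

```python
def condition_sum(operands, predicate):
    """Each processing element raises a condition signal; the signals are summed."""
    return sum(1 for x in operands if predicate(x))


def run(operands, threshold):
    """Take the conditional jump when enough elements report the condition."""
    total = condition_sum(operands, lambda x: x > 128)
    if total >= threshold:
        return "branch_taken"
    return "fall_through"
```

For example, `run([10, 200, 130, 5], threshold=2)` takes the branch because two elements exceed 128.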
  • Publication number: 20110055445
    Abstract: A signal processing system may include a multiply-accumulate (MAC) unit to generate output data by performing multiply-accumulate operations on first and second input data in response to a stream of MAC instruction words, where the MAC unit is pipelined to enable it to perform a multiply-accumulate operation in response to each MAC instruction word. The system may also include an instruction generator to generate the stream of MAC instruction words by performing loop expansion on a stream of intermediate instruction words, where one intermediate instruction word may comprise a group of fields to set up the MAC unit to execute in response to the one intermediate instruction word.
    Type: Application
    Filed: March 15, 2010
    Publication date: March 3, 2011
    Applicant: AZURAY TECHNOLOGIES, INC.
    Inventors: Edward Gee, Keith Slavin, Robert Batten, Vincenzo DiTommaso, Ravindranath Naiknaware, Triet Tu Le, Adam Heiberg, Dennis Morel
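A minimal sketch of loop expansion as described in the abstract above. The field layout of the intermediate word (`a_base`, `b_base`, `count`) and of the MAC word (`a_addr`, `b_addr`, `clear`) is hypothetical; the abstract says only that an intermediate word carries a group of fields that set up the MAC unit.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Intermediate(NamedTuple):
    a_base: int   # assumed field: start address of first operand
    b_base: int   # assumed field: start address of second operand
    count: int    # assumed field: number of multiply-accumulate steps


class MacWord(NamedTuple):
    a_addr: int
    b_addr: int
    clear: bool   # start a new accumulation on this word


def expand(words: Iterable[Intermediate]) -> Iterator[MacWord]:
    """Expand each intermediate word into one MAC word per loop iteration."""
    for w in words:
        for i in range(w.count):
            yield MacWord(w.a_base + i, w.b_base + i, clear=(i == 0))


def run_mac(stream: Iterable[MacWord], a_mem: List[int], b_mem: List[int]) -> List[int]:
    """Consume one MAC word per step, as a pipelined MAC unit would."""
    results: List[int] = []
    for word in stream:
        product = a_mem[word.a_addr] * b_mem[word.b_addr]
        if word.clear:
            results.append(product)
        else:
            results[-1] += product
    return results
```

Two intermediate words of length two over `[1, 2, 3, 4]` and `[5, 6, 7, 8]` yield two dot products, one per intermediate word.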
  • Patent number: 7895420
    Abstract: A method for reducing operations in a processing environment is provided that includes generating one or more binary representations, one or more of the binary representations being included in one or more linear equations that include one or more operations. The method also includes converting one or more of the linear equations to one or more polynomials and then performing kernel extraction and optimization on one or more of the polynomials. One or more common subexpressions associated with the polynomials are identified in order to reduce one or more of the operations.
    Type: Grant
    Filed: February 25, 2005
    Date of Patent: February 22, 2011
    Assignee: Fujitsu Limited
    Inventors: Farzan Fallah, Anup Hosangadi, Ryan C. Kastner
  • Publication number: 20110035569
    Abstract: A superscalar pipelined microprocessor includes a register set defined by its instruction set architecture, a cache memory, execution units, and a load unit, coupled to the cache memory and distinct from the other execution units. The load unit comprises an ALU. The load unit receives an instruction that specifies a memory address of a source operand, an operation to be performed on the source operand to generate a result, and a destination register of the register set to which the result is to be stored. The load unit reads the source operand from the cache memory. The ALU performs the operation on the source operand to generate the result, rather than forwarding the source operand to any of the other execution units of the microprocessor to perform the operation on the source operand to generate the result. The load unit outputs the result for subsequent retirement to the destination register.
    Type: Application
    Filed: October 30, 2009
    Publication date: February 10, 2011
    Inventors: Gerard M. Col, Colin Eddy, Rodney E. Hooker
  • Publication number: 20110035570
    Abstract: A superscalar pipelined microprocessor includes a register set defined by an instruction set architecture of the microprocessor, execution units, and a store unit, coupled to the cache memory and distinct from the other execution units of the microprocessor. The store unit comprises an ALU. The store unit receives an instruction that specifies a source register of the register set and an operation to be performed on a source operand to generate a result. The store unit reads the source operand from the source register. The ALU performs the operation on the source operand to generate the result, rather than forwarding the source operand to any of the other execution units of the microprocessor to perform the operation on the source operand to generate the result. The store unit operatively writes the result to the cache memory.
    Type: Application
    Filed: October 30, 2009
    Publication date: February 10, 2011
    Inventors: Gerard M. Col, Colin Eddy, Rodney E. Hooker
  • Publication number: 20110022823
    Abstract: An information processing system includes an execution unit and a decoder. The execution unit includes a plurality of arithmetic units each having a first operation circuit that performs a first operation on a first input value and a second input value, a second operation circuit that performs a second operation on the first input value and the second input value, and a selector that selects and outputs either a first output value output from the first operation circuit or a second output value output from the second operation circuit based on a selection signal. The decoder decodes an operation instruction and determines each value of the selection signal of each arithmetic unit. The decoder determines the value of the selection signal corresponding to the operation instruction with respect to each program.
    Type: Application
    Filed: June 7, 2010
    Publication date: January 27, 2011
    Inventor: Hideyuki MIWA
  • Patent number: 7877581
    Abstract: A networking application processor is provided. The processor includes an input socket configured to receive data packets. The processor includes a memory for holding instructions and circuitry configured to access data structures associated with the processing stages. The circuitry configured to access data structures enables a single cycle access to an operand from a memory location. An arithmetic logic unit (ALU) is provided. Circuitry for aligning operands to be processed by the ALU is included. The circuitry for aligning the operands causes the operand to be aligned by a lowest significant bit, wherein the circuitry for aligning the operand supplies an extension to the operand to allow the ALU to process different size operands.
    Type: Grant
    Filed: December 2, 2003
    Date of Patent: January 25, 2011
    Assignee: PMC-Sierra US, Inc.
    Inventors: Shridhar Mukund, Mahesh Gopalan, Neeraj Kashalkar
  • Patent number: 7873815
    Abstract: DSP architectures having improved performance are described. In an exemplary architecture, a DSP includes two MAC units and two ALUs, where one of the ALUs replaces an adder for one of the two MAC units. This DSP may be configured to operate in a dual-MAC/single-ALU configuration, a single-MAC/dual-ALU configuration, or a dual-MAC/dual-ALU configuration. This flexibility allows the DSP to handle various types of signal processing operations and improves utilization of the available hardware. The DSP architectures further include pipeline registers that break up critical paths and allow operations at a higher clock speed for greater throughput.
    Type: Grant
    Filed: March 4, 2004
    Date of Patent: January 18, 2011
    Assignee: QUALCOMM Incorporated
    Inventors: Gilbert C. Sih, De D. Hsu, Way-Shing Lee, Xufeng Chen
  • Publication number: 20110010523
    Abstract: A cascadable arithmetic and logic unit (ALU) which is configurable in function and interconnection. No decoding of commands is needed during execution of the algorithm. The ALU can be reconfigured at run time without any effect on surrounding ALUs, processing units or data streams. The volume of configuration data is very small, which has positive effects on the space required and the configuration speed. Broadcasting is supported through the internal bus systems in order to distribute large volumes of data rapidly and efficiently. The ALU is equipped with a power-saving mode to shut down power consumption completely. There is also a clock rate divider which makes it possible to operate the ALU at a slower clock rate. Special mechanisms are available for feedback on the internal states to the external controllers.
    Type: Application
    Filed: July 27, 2010
    Publication date: January 13, 2011
    Inventors: Martin VORBACH, Robert Münch
  • Publication number: 20100332795
    Abstract: A computer system includes a central processing unit, a random-access-memory interface, a random-access memory in which addresses are allocated in an address space of the random-access-memory interface and a reconfigurable arithmetic device whose arithmetic function is capable of being dynamically changed in accordance with configuration data. The reconfigurable arithmetic device includes input terminals, output terminals, a plurality of processor elements that perform individual arithmetic processes in synchronization with a clock, an inter-processor-element network which connects the input terminals and the output terminals to input ports and output ports of the plurality of processor elements, a random-access memory built into the reconfigurable arithmetic device and a control unit that sets the plurality of processor elements and the inter-processor-element network.
    Type: Application
    Filed: June 7, 2010
    Publication date: December 30, 2010
    Applicant: FUJITSU SEMICONDUCTOR LIMITED
    Inventors: Hiroshi FURUKAWA, Ichiro Kasama
  • Publication number: 20100332758
    Abstract: A cache memory device includes: a data memory storing data written by an arithmetic processing unit; a connecting unit connecting an input path from the arithmetic processing unit to the data memory and an output path from the data memory to a main storage unit; a selecting unit provided on the output path to select data from the data memory or data from the arithmetic processing unit via the connecting unit, and to transfer the selected data to the output path; and a control unit controlling the selecting unit such that the data from the data memory is transferred to the output path when the data is written from the data memory to the main storage unit, and such that the data is transferred to the output path via the connecting unit when the data is written from the arithmetic processing unit to the main storage unit.
    Type: Application
    Filed: June 29, 2010
    Publication date: December 30, 2010
    Applicant: FUJITSU LIMITED
    Inventors: Akihiro Waku, Naoya Ishimura, Hiroyuki Kojima
  • Publication number: 20100325396
    Abstract: The present invention relates to a multithread processor, and this multithread processor comprises a plurality of register windows each provided for each of threads and capable of storing data to be used for instruction processing in an arithmetic unit, a work register capable of mutually transferring data with respect to the plurality of register windows and the arithmetic unit and a multithread control unit for controlling data transfer among the plurality of register windows, the work register and the arithmetic unit on the basis of an execution thread identifier for identifying the thread to be executed in the arithmetic unit. This enables conducting the multithread processing at a high speed.
    Type: Application
    Filed: August 10, 2010
    Publication date: December 23, 2010
    Applicant: FUJITSU LIMITED
    Inventor: Toshio Yoshida
  • Publication number: 20100325387
    Abstract: An arithmetic processing apparatus includes: a plurality of processing units connected in series to each other, wherein each of the processing units includes a limitation information setting section in which limitation information, which indicates the amount of arithmetic processing that each of the processing units is to process for data of each arithmetic processing unit, is set; an arithmetic section which executes arithmetic processing on the data of each arithmetic processing unit, according to the limitation information set in the limitation information setting section, by the same program between the plurality of processing units; and a memory in which processing data subjected to the arithmetic processing by the arithmetic section is stored.
    Type: Application
    Filed: May 25, 2010
    Publication date: December 23, 2010
    Applicant: Sony Corporation
    Inventors: Kenji YAMANE, Tsuyoshi Kano, Masahiro Takahashi
  • Publication number: 20100318771
    Abstract: A processor includes a decode unit and a byte permute unit. The byte permute unit receives an instruction from the decode unit. The byte permute unit determines whether the instruction corresponds to a shuffle instruction or a shift instruction. For a shuffle instruction, the byte permute unit uses a byte shuffler to perform a shuffle operation indicated by the instruction. For a shift instruction that indicates a shift magnitude, the byte permute unit uses the byte shuffler to byte-level shift a source operand corresponding to the instruction by an integer number of bytes. The byte permute unit also generates a sequence of output bits by bit-shifting the byte-level shifted source operand by a number of bits such that the sum of the number of bits and the integer number of bytes is equal to the shift magnitude.
    Type: Application
    Filed: June 11, 2009
    Publication date: December 16, 2010
    Inventors: Ranganathan Sudhakar, Jonathan Choy, Debjit Das Sarma
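The two-stage shift in the abstract above, a byte-level shift through the shuffler followed by a residual bit shift, can be sketched as below. The 64-bit operand width, the left-shift direction, and the function names are assumptions for illustration only.

```python
WIDTH_BYTES = 8  # assumed operand width (64 bits)
MASK = (1 << (8 * WIDTH_BYTES)) - 1


def byte_shuffle(src, indices):
    """Pick src[i] for each index; None selects a zero byte."""
    return [0 if i is None else src[i] for i in indices]


def shift_left(value, magnitude):
    """Left shift built from a byte shuffle plus a sub-byte bit shift."""
    n_bytes, n_bits = divmod(magnitude, 8)   # 8 * n_bytes + n_bits == magnitude
    src = list(value.to_bytes(WIDTH_BYTES, "little"))
    # Result byte k comes from source byte k - n_bytes, or zero if out of range.
    indices = [k - n_bytes if k >= n_bytes else None for k in range(WIDTH_BYTES)]
    byte_shifted = int.from_bytes(bytes(byte_shuffle(src, indices)), "little")
    return (byte_shifted << n_bits) & MASK
```

Reusing the shuffler for the whole-byte part leaves only a 0 to 7 bit shift for the second stage, which is the decomposition the abstract describes.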
  • Publication number: 20100312997
    Abstract: Systems, internal processors, and methods of parallel data processing in an internal processor are provided. In one embodiment, an external controller sends instructions to a memory device, and the internal processor on the memory device executes the instructions on the data. The internal processor may include one or more arithmetic logic units (ALUs), and each ALU may perform an operation on an entire operand, such that one or more operands may be processed in parallel by one or more ALUs in the internal processor. The operations may be completed on each operand in one or more cycles through the circuitry of the ALU, and the path of the operands through the ALU may be based on the width of the ALU, the size of the operands, or the type of operation to be performed.
    Type: Application
    Filed: June 4, 2009
    Publication date: December 9, 2010
    Applicant: MICRON TECHNOLOGY, INC.
    Inventor: Robert Walker
  • Publication number: 20100312999
    Abstract: One or more of the present techniques provide a compute engine buffer configured to maneuver data and increase the efficiency of a compute engine. One such compute engine buffer is connected to a compute engine which performs operations on operands retrieved from the buffer, and stores results of the operations to the buffer. Such a compute engine buffer includes a compute buffer having storage units which may be electrically connected or isolated, based on the size of the operands to be stored and the configuration of the compute engine. The compute engine buffer further includes a data buffer, which may be a simple buffer. Operands may be copied to the data buffer before being copied to the compute buffer, which may save additional clock cycles for the compute engine, further increasing the compute engine efficiency.
    Type: Application
    Filed: June 4, 2009
    Publication date: December 9, 2010
    Applicant: MICRON TECHNOLOGY, INC.
    Inventor: Robert Walker
  • Publication number: 20100312998
    Abstract: Devices, systems, and methods of communicating information directly to a sequencer or a buffer in a memory device are provided. In some embodiments, instructions are sent directly from an external processor to a sequencer in the memory device, and the sequencer configures the instructions for an internal processor, such as one or more arithmetic logic units (ALUs) embedded on the memory device. Further, data to be operated on by the internal processor can be sent directly from the external processor to a buffer, and the sequencer can copy the data from the buffer to the internal processor. As power can be consumed each time a memory array is written to or read from, the direct communication of instructions and/or data can reduce the power consumed in writing to or reading from the memory array.
    Type: Application
    Filed: June 4, 2009
    Publication date: December 9, 2010
    Applicant: MICRON TECHNOLOGY, INC.
    Inventor: Robert Walker
  • Publication number: 20100313000
    Abstract: The present techniques provide an internal processor of a memory device configured to selectively execute instructions in parallel, for example. One such internal processor includes a plurality of arithmetic logic units (ALUs), each connected to conditional masking logic, and each configured to process conditional instructions. A condition instruction may be received by a sequencer of the memory device. Once the condition instruction is received, the sequencer may enable the conditional masking logic of the ALUs. The sequencer may toggle a signal to the conditional masking logic such that the masking logic masks certain instructions if a condition of the condition instruction has been met, and masks other instructions if the condition has not been met. In one embodiment, each ALU in the internal processor may selectively perform instructions in parallel.
    Type: Application
    Filed: June 4, 2009
    Publication date: December 9, 2010
    Applicant: MICRON TECHNOLOGY, INC.
    Inventor: Robert Walker
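The conditional masking in the abstract above can be modelled per ALU lane as follows. This is a sketch, not the device's logic: `run_conditional` and its parameters are hypothetical, and the loop stands in for ALUs that would operate in parallel.

```python
def run_conditional(operands, condition, then_op, else_op):
    """Model one condition instruction followed by masked instructions."""
    results = []
    for x in operands:            # each iteration models one ALU lane
        met = condition(x)
        # else_op is masked in lanes where the condition is met;
        # then_op is masked in lanes where it is not.
        results.append(then_op(x) if met else else_op(x))
    return results
```

For instance, negating the negative operands and doubling the others needs no per-lane branch in the instruction stream: every lane receives both instructions and masks one of them.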
  • Patent number: 7849294
    Abstract: Illustrative embodiments determine the data type of the operand being accessed as well as analyze the data value subrange of the input operand data type. If the operand's data type does not match the required format of the instruction being processed, a determination is made as to whether a subrange of data values of the data type of the input operand is supported natively. If the subrange of data values of the input operand is not supported natively, then a format conversion is performed on the data and the instruction may then operate on the data. Otherwise, the data may be operated on directly by the instruction without a format conversion operation and thus, the conversion is not performed.
    Type: Grant
    Filed: January 31, 2008
    Date of Patent: December 7, 2010
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Gschwind, Brett Olsson
  • Publication number: 20100306504
    Abstract: At least one instruction of a sequence of program instructions has a plurality of alternative outcomes including at least a first outcome that is independent of at least one operand and a second outcome that is dependent on the at least one operand. The at least one operand is a value generated by a preceding instruction in the sequence. The instruction is issued for execution independently of when the at least one operand is generated by the preceding instruction. Recovery circuitry is provided to perform a recovery operation in the event that the second outcome is executed for the at least one instruction and the at least one operand has not been generated by the preceding instruction when the at least one instruction is to be executed by said instruction execution circuitry.
    Type: Application
    Filed: May 27, 2009
    Publication date: December 2, 2010
    Applicant: ARM Limited
    Inventors: Robert Gregory McDonald, Paul Gilbert Meyer
  • Publication number: 20100299505
    Abstract: An instruction fusion calculation device of the present invention includes an instruction fusion detection circuit, an instruction fusion circuit, and a calculator. The instruction fusion detection circuit determines whether or not a fusion of a preceding instruction and a subsequent instruction that have a flow dependence relationship between them can be made. The instruction fusion circuit fuses into one instruction the preceding instruction and the subsequent instruction that the instruction fusion detection circuit has determined can be fused. The calculator executes the fused instruction to output the calculation result, and outputs at least one of the calculation results obtained by executing the preceding instruction and the subsequent instruction as an intermediate result.
    Type: Application
    Filed: May 18, 2010
    Publication date: November 25, 2010
    Inventor: TAKAHIKO UESUGI
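The detect, fuse, and execute steps in the abstract above can be sketched in software. The three-operand instruction format, the `OPS` table, and the function names are hypothetical; the only property taken from the abstract is that the fused execution yields both the final result and the preceding instruction's result as an intermediate.

```python
from typing import Dict, NamedTuple, Tuple

OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


class Instr(NamedTuple):
    op: str
    dest: str
    src1: str
    src2: str


def can_fuse(first: Instr, second: Instr) -> bool:
    """Flow dependence: the subsequent instruction reads the preceding result."""
    return first.dest in (second.src1, second.src2)


def execute_fused(first: Instr, second: Instr, regs: Dict[str, int]) -> Tuple[int, int]:
    """Return (intermediate result, final result) of the fused pair."""
    intermediate = OPS[first.op](regs[first.src1], regs[first.src2])
    view = {**regs, first.dest: intermediate}   # forward the intermediate value
    final = OPS[second.op](view[second.src1], view[second.src2])
    return intermediate, final
```

Returning the intermediate value matters because a later instruction may still read the preceding instruction's destination register.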
  • Publication number: 20100274996
    Abstract: A micro-processor includes a clock generator configured to generate a fetch clock, a decoding clock, an execution clock, and a write-back clock that are sequentially enabled; a volatile memory device configured to output pre-stored program data in response to the fetch clock; a command decoder configured to decode the program data in response to the decoding clock and generate a decoding command; an arithmetic device configured to perform an arithmetic operation according to the command of the decoding command in response to the execution clock; and a peripheral circuit device configured to be operated according to the command of the decoding command in response to the write-back clock.
    Type: Application
    Filed: June 11, 2009
    Publication date: October 28, 2010
    Applicant: Gwangju Institute of Science and Technology
    Inventors: Ie-Ryung PARK, Dong-Soo Har, Yousaf Zafar
  • Publication number: 20100274990
    Abstract: An apparatus and method for performing SIMD multiply-accumulate operations includes SIMD data processing circuitry responsive to control signals to perform data processing operations in parallel on multiple data elements. Instruction decoder circuitry is coupled to the SIMD data processing circuitry and is responsive to program instructions to generate the required control signals. The instruction decoder circuitry is responsive to a single instruction (referred to herein as a repeating multiply-accumulate instruction) having as input operands a first vector of input data elements, a second vector of coefficient data elements, and a scalar value indicative of a plurality of iterations required, to generate control signals to control the SIMD processing circuitry.
    Type: Application
    Filed: September 17, 2009
    Publication date: October 28, 2010
    Inventors: Mladen Wilder, Dominic Hugo Symes, Richard Edward Bruce
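One plausible reading of the repeating multiply-accumulate instruction in the abstract above is a FIR-style loop. The abstract does not say how elements are selected on each iteration, so the sliding-window indexing below is an assumption, as are the `lanes` parameter and the name `repeating_mac`.

```python
def repeating_mac(inputs, coeffs, iterations, lanes):
    """Assumed semantics: acc[lane] += inputs[lane + i] * coeffs[i] per iteration."""
    acc = [0] * lanes
    for i in range(iterations):          # scalar operand: number of iterations
        for lane in range(lanes):        # lanes would run in parallel in SIMD hardware
            acc[lane] += inputs[lane + i] * coeffs[i]
    return acc
```

Under this reading a single instruction replaces `iterations` separate SIMD multiply-accumulate instructions.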
  • Publication number: 20100268918
    Abstract: A system for execution of a decoding method is disclosed. The system is capable of executing at least two data decoding methods which are different in underlying coding principle, wherein at least one of the data decoding methods requires data shuffling operations on the data. In one aspect, the system includes at least one application specific processor having an instruction set having arithmetic operators excluding multiplication, division and power. The processor is selected for execution of approximations of each of the at least two data decoding methods. The system also includes at least a first memory unit, e.g. background memory, for storing data. The system also includes a transfer unit for transferring data from the first memory unit towards the at least one programmable processor. The transfer unit includes a data shuffler. The system may also include a controller for controlling the data shuffler independent from the processor.
    Type: Application
    Filed: April 1, 2010
    Publication date: October 21, 2010
    Applicants: IMEC, Samsung Electronics Co., Ltd.
    Inventors: Robert Priewasser, Bruno Bougard, Frederik Naessens
  • Publication number: 20100266121
    Abstract: An IC chip includes: a first memory which stores a control program for executing cryptographic processing; a second memory which stores an application; an arithmetic processor which receives first data including at least part of a cryptographic private key stored in a predetermined area of the application, and executes the cryptographic processing in accordance with the control program; and an auxiliary arithmetic processor which executes predetermined arithmetic processing under control of the arithmetic processor. If the first data does not match a data format defined by a software interface of the auxiliary arithmetic processor, the arithmetic processor controls to generate second data by processing the first data so as to match the data format, and to store the generated second data in a data table provided in the second memory.
    Type: Application
    Filed: January 27, 2010
    Publication date: October 21, 2010
    Inventors: Hiroki YAMAZAKI, Makoto Aikawa, Kazunori Hashimoto
  • Patent number: 7814296
    Abstract: Provided is a data processing circuit. A control unit outputs an operation control signal and a memory control signal. A plurality of program memories each outputs a command in response to the memory control signal. A plurality of arithmetic sections each selectively performs any one of the commands from the plurality of program memories in response to the operation control signal. Operation modes of the data processing circuit can be flexibly changed according to operation environments.
    Type: Grant
    Filed: September 5, 2008
    Date of Patent: October 12, 2010
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Chun-Gi Lyuh, Jung-Hee Suk, Ik-Jae Chun, Se-Wan Heo, Tae-Moon Roh, Jong-Dae Kim
  • Patent number: 7809931
    Abstract: A vector permutation system (100) for a single-instruction multiple-data microprocessor has a set of vector registers (110) which feed vectors to permutation logic (120) and then to a negate block (130) where they are permuted and selectively negated according to control parameters received from a selected one of a set of control registers (140). A control arrangement (145, 150) selects which control register is to provide the control parameters. In this way no separate permutation instructions are necessary or need to be executed, and no permutation parameters need to be stored in the vector registers (110). This leads to higher performance, a smaller vector register file and hence a smaller size of the microprocessor and better program code density.
    Type: Grant
    Filed: October 6, 2003
    Date of Patent: October 5, 2010
    Assignee: Freescale Semiconductor, Inc.
    Inventor: Martin Raubuch
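The permute-then-negate data path in the abstract above can be illustrated as below. The encoding of a control register as a list of (source index, negate flag) pairs is an assumption, and `permute_negate` and `CONTROL_REGISTERS` are hypothetical names.

```python
# Assumed control-register contents: one (source index, negate flag) pair per lane.
CONTROL_REGISTERS = [
    [(0, False), (1, False), (2, False), (3, False)],   # identity
    [(3, False), (2, True), (1, False), (0, True)],     # reverse, negate odd lanes
]


def permute_negate(vector, control):
    """Apply the permutation logic, then the negate block, for each lane."""
    return [-vector[src] if negate else vector[src] for src, negate in control]


def read_operand(vector, selected):
    """The control arrangement selects which control register applies."""
    return permute_negate(vector, CONTROL_REGISTERS[selected])
```

Because the control parameters live in dedicated control registers, the vector registers hold only data and no separate permutation instruction is issued.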
  • Publication number: 20100238464
    Abstract: An image processing apparatus includes an interpreting unit that interprets an order of the logical arithmetic processing and a kind of a logical arithmetic processing; and a drawing unit that, in a case of drawing the image information as raster data, draws from an element of an upper-order side in order of the logical arithmetic processing interpreted by the interpreting unit with respect to an area that is interpreted to be processed by a simple overwrite processing for giving priority to an uppermost-order side element as the kind of the logical arithmetic processing, and draws using a calculation sequentially from an element of a lower-order side in order of the logical arithmetic processing interpreted by the interpreting unit with respect to an area that is interpreted to be processed by a logical arithmetic processing for using the calculation as to the overlapped elements as the kind of the logical arithmetic processing.
    Type: Application
    Filed: August 20, 2009
    Publication date: September 23, 2010
    Applicant: FUJI XEROX CO., LTD.
    Inventor: Shusuke Tanimoto
  • Publication number: 20100241683
    Abstract: An arithmetic operation apparatus includes: a branch node set detection unit to detect a set of branch nodes for each parallel level; a subtree memory storage area allocation unit to allocate an arithmetic result of a column vector to a memory storage area selected on a basis of a predetermined selection rule from a plurality of memory storage areas; and a node memory storage area allocation unit to allocate an arithmetic result of a column vector to a memory storage area selected on a basis of a predetermined selecting rule from a plurality of memory storage areas.
    Type: Application
    Filed: February 1, 2010
    Publication date: September 23, 2010
    Applicant: FUJITSU LIMITED
    Inventor: Makoto NAKANISHI
  • Patent number: 7797516
    Abstract: A set of low-cost microcontroller extensions facilitates Digital Signal Processing (DSP) applications by incorporating a Multiply-Accumulate (MAC) unit in a Central Processing Unit (CPU) of the microcontroller which is responsive to the extensions.
    Type: Grant
    Filed: March 16, 2007
    Date of Patent: September 14, 2010
    Assignee: ATMEL Corporation
    Inventors: Benjamin Francis Froemming, Emil Lambrache
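As a rough illustration of what a Multiply-Accumulate unit computes, the sketch below models a single MAC step and uses it for a dot product, a typical DSP kernel. The 32-bit two's-complement accumulator width and the function names are assumptions made for the example; the abstract does not specify them.

```python
def mac(acc: int, a: int, b: int, bits: int = 32) -> int:
    """Return acc + a * b, wrapped to a signed `bits`-bit accumulator (assumed width)."""
    mask = (1 << bits) - 1
    result = (acc + a * b) & mask
    # Reinterpret the wrapped value as a signed two's-complement number.
    if result >= 1 << (bits - 1):
        result -= 1 << bits
    return result


def dot(xs, ys):
    """Dot product built from repeated MAC steps."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc
```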
  • Publication number: 20100223444
    Abstract: A method and a device having a plurality of bit operations capability, the device includes: first and second registers, an instruction fetch circuit, and an arithmetic logic unit adapted to: calculate, during a first clock cycle, a position value representative of a position, within a first information vector, of a first bit of information that has a first value; and to multiply the position value by a multiplication factor to provide a first result and to alter the value of the first bit to a second value to provide an updated information vector, during the first clock cycle.
    Type: Application
    Filed: August 18, 2006
    Publication date: September 2, 2010
    Applicant: Freescale Semiconductor, Inc.
    Inventors: Eran Glickman, Evgeni Ginzburg, Noam Sheffer
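The combined operation in this abstract (locate a bit, scale its position, and flip the bit) can be sketched in software as follows. The example assumes the "first value" is 1, the "second value" is 0, and that the first bit is the least-significant set bit; those choices and the name `find_scale_clear` are illustrative only. The hardware performs all of this in one clock cycle, which a software model cannot reflect.

```python
def find_scale_clear(vector: int, factor: int):
    """Return (position * factor, updated vector) for the lowest set bit.

    Returns (None, vector) when no bit is set.
    """
    if vector == 0:
        return None, vector
    lowest = vector & -vector            # isolate the least-significant set bit
    position = lowest.bit_length() - 1   # zero-based bit position
    return position * factor, vector & ~lowest
```

For example, with a factor equal to an entry size, the first result could serve directly as a byte offset into a table while the updated vector marks the entry as handled.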
  • Patent number: 7783864
    Abstract: The partitioning of large arrays in the hardware structure, for multiplication and addition, into smaller structures results in a multiplier design which includes a series of nearly identical processing elements linked together in a chained fashion. As a result of simultaneous operation in two subphases per processing element and the chaining together of processing elements, the overall structure is operable in a pipelined fashion to improve throughput and speed. The chained processing elements are constructed so as to provide a partitionable chain with separate parts for processing factors of the modulus.
    Type: Grant
    Filed: February 12, 2007
    Date of Patent: August 24, 2010
    Assignee: International Business Machines Corporation
    Inventors: Camil Fayad, John K. Li, Siegfried Sutter, Tamas Visegrady
  • Patent number: 7779237
    Abstract: A method, system and processor for adaptively and selectively controlling the instruction execution frequency of a data processor. Processing logic or a software compiler determines when a number of first-type instructions, requiring longer execution latency, are scheduled to be executed. The logic/compiler then triggers the CPM unit to automatically switch the execution frequency of the instruction processor from a first frequency that is optimal for processing regular-type instructions to a second, pre-established lower frequency that is optimal for processing the first-type instructions, to enable more efficient execution and higher execution throughput of the number of first-type operations within the processor. When the first-type instructions have completed execution, the processor's instruction execution frequency is returned to the first optimal frequency.
    Type: Grant
    Filed: July 11, 2007
    Date of Patent: August 17, 2010
    Assignee: International Business Machines Corporation
    Inventors: Anthony Correale, Jr., Kenichi Tsuchiya
  • Publication number: 20100191938
    Abstract: An information processing device including: a first arithmetic processing unit performing first arithmetic processing; a second arithmetic processing unit performing second arithmetic processing; input registers adapted to include a first input register allocated to the first arithmetic processing unit, and a second input register allocated to the second arithmetic processing unit; and output registers storing processing results of the first arithmetic processing unit and processing results of the second arithmetic processing unit, wherein, in each of given execution cycles, the first arithmetic processing unit performs the first arithmetic processing using stored data of the first input register and stores a processing result of the first arithmetic processing in the output registers and the second arithmetic processing unit performs the second arithmetic processing using stored data of the second input register and stores a processing result of the second arithmetic processing in the output registers.
    Type: Application
    Filed: January 29, 2010
    Publication date: July 29, 2010
    Applicant: SEIKO EPSON CORPORATION
    Inventors: Hiroshi HASEGAWA, Fumio KOYAMA
  • Patent number: 7765386
    Abstract: An embodiment of the present invention is a technique to perform floating-point operations for vector processing. An input queue captures a plurality of vector inputs. A scheduler dispatches the vector inputs. A plurality of floating-point (FP) pipelines generates FP results from operating on scalar components of the vector inputs dispatched from the scheduler. An arbiter and assembly unit arbitrates use of output section and assembles the FP results to write to the output section.
    Type: Grant
    Filed: September 28, 2005
    Date of Patent: July 27, 2010
    Assignee: Intel Corporation
    Inventors: David D. Donofrio, Michael Dwyer
  • Publication number: 20100186006
    Abstract: A programmable device suitable for a software defined radio terminal is disclosed. In one aspect, the device includes a scalar cluster providing a scalar data path and a scalar register file and arranged for executing scalar instructions. The device may further include at least two interconnected vector clusters connected with the scalar cluster. Each of the at least two vector clusters provides a vector data path and a vector register file and is arranged for executing at least one vector instruction different from vector instructions performed by any other vector cluster of the at least two vector clusters.
    Type: Application
    Filed: December 17, 2009
    Publication date: July 22, 2010
    Applicants: IMEC, Samsung Electronics
    Inventors: Bruno Bougard, Thomas Schuster
  • Publication number: 20100182314
    Abstract: A method is presented for improving performance of generation of digitally represented graphics. Said method comprises the steps of: selecting (440) a tile comprising fragments to process; executing (452) a culling program for the tile, the culling program being replaceable; and executing a set of instructions, selected from a plurality of sets of instructions based on an output value of the culling program, for each of a plurality of subsets of the fragments. A corresponding display adapter and computer program product are also presented.
    Type: Application
    Filed: January 23, 2008
    Publication date: July 22, 2010
    Inventors: Tomas Akenine-Moller, Jon Hasselgren
  • Publication number: 20100185836
    Abstract: An arithmetic-program conversion apparatus includes: a program storage section storing an arithmetic program describing a circuit by a logical expression including a plurality of input and output variables, and operators; if the expression has three input variables or more, an intermediate-variable generation section generating an intermediate variable for converting the expression into a plurality of binomials including input and output variables; if the intermediate variable is generated, an expression conversion section converting the logical expression into a plurality of binomials including a binomial for obtaining the intermediate variable and a binomial obtaining the output variable from the intermediate variable; if a plurality of binomials are generated, an expression update section updating the stored original expression; a bit-width determination section determining bit widths of the output, input, and intermediate variables of the expression; and a bit-width storage section storing the bit widths
    Type: Application
    Filed: January 19, 2010
    Publication date: July 22, 2010
    Applicant: Sony Corporation
    Inventor: Shota Hasegawa
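The core conversion step in this abstract, splitting an expression with three or more input variables into two-operand expressions through intermediate variables, might look like the sketch below. It assumes the expression is given as a flat operand list and operator list evaluated strictly left to right; that representation, the `t0`, `t1`, ... naming of intermediates, and the function name are hypothetical. Bit-width determination is not modeled.

```python
def to_binomials(output, operands, operators):
    """Rewrite `output = o0 op0 o1 op1 o2 ...` as a chain of two-operand steps.

    Each step is a tuple (target, left, operator, right).
    """
    if len(operands) < 2 or len(operands) != len(operators) + 1:
        raise ValueError("need n operands and n - 1 operators, n >= 2")
    steps = []
    left = operands[0]
    for i, (op, right) in enumerate(zip(operators, operands[1:])):
        is_last = i == len(operators) - 1
        # Only the final step writes the real output variable.
        target = output if is_last else f"t{i}"
        steps.append((target, left, op, right))
        left = target
    return steps
```

With three inputs, `y = a & b | c` becomes `t0 = a & b` followed by `y = t0 | c`; an expression that already has two inputs passes through unchanged.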
  • Patent number: 7761694
    Abstract: In one embodiment, the present invention includes a method for receiving first and second data operands in a common execution unit and manipulating the operands responsive to an instruction to generate an output according to local control signals of a local controller of the execution unit. Various instruction types such as shuffle and shift operations may be performed in the common execution unit in a single cycle. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 30, 2006
    Date of Patent: July 20, 2010
    Assignee: Intel Corporation
    Inventors: Mohammad Abdallah, Hon Shing Lau, Shou-Wen Fu, Aviel Timor, Tal Gat
  • Patent number: 7761693
    Abstract: A data processing apparatus includes a register data store that stores data elements, an instruction decoder that decodes an “arithmetic returning high half” instruction, and a data processor that performs data processing operations controlled by the instruction decoder. In response to the decoded arithmetic returning high half instruction, the data processor specifies within the register data store one or more source registers to store a plurality of source data elements of a first size, and one or more destination registers to store a corresponding plurality of resultant data elements of a second size. The second size is half the size of the first size.
    Type: Grant
    Filed: July 13, 2004
    Date of Patent: July 20, 2010
    Assignee: ARM Limited
    Inventors: Dominic Hugo Symes, Simon Andrew Ford
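A small model of an "arithmetic returning high half" operation is sketched below, assuming the arithmetic is an element-wise unsigned add on 32-bit source elements with wraparound, so each result is the upper 16 bits of the sum. The choice of addition, the default width, and the function name are assumptions; the abstract only fixes the rule that the result elements are half the source size.

```python
def add_returning_high_half(a, b, src_bits: int = 32):
    """Element-wise add, keeping only the high half of each `src_bits`-bit sum."""
    half = src_bits // 2
    mask = (1 << src_bits) - 1
    out = []
    for x, y in zip(a, b):
        total = (x + y) & mask      # wrap to the source element size
        out.append(total >> half)   # keep the upper half only
    return out
```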
  • Publication number: 20100180129
    Abstract: An arrangement of arithmetic logic units carries out an operation on at least one operand, wherein the operation is determined by operation codes received by the arithmetic logic units. The operation codes and at least one operand are received on a first clock cycle. The result of the operation is output from at least one arithmetic logic unit to at least one further arithmetic logic unit. A result of the plurality of arithmetic logic units is then output on a next clock cycle.
    Type: Application
    Filed: December 18, 2009
    Publication date: July 15, 2010
    Applicant: STMicroelectronics R&D Ltd.
    Inventor: David Smith
  • Publication number: 20100174891
    Abstract: In a reconfigurable SIMD processor, a unit of operation for executing an instruction corresponds to one group, and the one group that includes a plurality of PEs implements at least a part of an operation unit that executes at least one of an integer divide instruction; a floating decimal point add/subtract instruction; a floating decimal point multiply instruction; and a floating decimal point divide instruction, using operation units and general purpose registers provided in a plurality of the PEs. The number of the PEs that compose the one group is varied in accordance with the instruction.
    Type: Application
    Filed: March 27, 2008
    Publication date: July 8, 2010
    Inventor: Shohei Nomoto
  • Publication number: 20100174884
    Abstract: A processor (101) in which a plurality of arithmetic elements executing instructions are embedded includes: fixed function arithmetic elements (121 to 123) each having a circuit configuration that is not dynamically reconfigurable; a reconfigurable arithmetic element (125) having a circuit configuration that is dynamically reconfigurable; and an arithmetic operation control unit (113) which allocates instructions to the fixed function arithmetic elements (121 to 123) and the reconfigurable arithmetic element (125) and issues the allocated instructions to the respective arithmetic elements.
    Type: Application
    Filed: November 9, 2006
    Publication date: July 8, 2010
    Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
    Inventors: Hiroyuki Morishita, Takao Yamamoto, Masaitsu Nakajima
  • Publication number: 20100169614
    Abstract: A 32-bit instruction 50 is composed of a 4-bit format field 51, a 4-bit operation field 52, and two 12-bit operation fields 59 and 60. The 4-bit operation field 52 can only include (1) an operation code “cc” that indicates a branch operation which uses a stored value of the implicitly indicated constant register 36 as the branch address, or (2) a constant “const”. The content of the 4-bit operation field 52 is specified by a format code provided in the format field 51.
    Type: Application
    Filed: February 12, 2010
    Publication date: July 1, 2010
    Applicant: Panasonic Corporation
    Inventors: Shuichi TAKAYAMA, Nobuo Higaki
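The field layout in this abstract (4 + 4 + 12 + 12 = 32 bits) can be unpacked with shifts and masks, as in the sketch below. The abstract gives the field widths but not their bit positions, so placing the format field in the most significant bits, followed by the 4-bit operation field and then the two 12-bit fields, is an assumption, as are the names `Fields` and `decode`.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    fmt: int      # 4-bit format field (51)
    op4: int      # 4-bit operation field (52)
    op12_a: int   # first 12-bit operation field (59)
    op12_b: int   # second 12-bit operation field (60)


def decode(word: int) -> Fields:
    """Split a 32-bit instruction word into its four fields (assumed bit order)."""
    if not 0 <= word < 1 << 32:
        raise ValueError("instruction word must fit in 32 bits")
    return Fields(
        fmt=(word >> 28) & 0xF,
        op4=(word >> 24) & 0xF,
        op12_a=(word >> 12) & 0xFFF,
        op12_b=word & 0xFFF,
    )
```

A decoder following the abstract would then consult `fmt` to decide whether `op4` is to be read as the branch operation code or as a constant.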
  • Publication number: 20100146315
    Abstract: Selective power control of one or more processing elements matches a degree of parallelism to requirements of a task performed in a highly parallel programmable data processor. For example, when program operations require less than the full width of the data path, a software instruction of the program sets a mode of operation requiring a subset of the parallel processing capacity. At least one parallel processing element, that is not needed, can be shut down to conserve power. At a later time, when the added capacity is needed, execution of another software instruction sets the mode of operation to that of the wider data path, typically the full width, and the mode change reactivates the previously shut-down processing element.
    Type: Application
    Filed: February 17, 2010
    Publication date: June 10, 2010
    Applicant: QUALCOMM INCORPORATED
    Inventor: Kenneth Alan Dockser