Processing Control For Data Transfer Patents (Class 712/225)
  • Patent number: 7908464
    Abstract: A functional-level instruction-set computing (FLIC) architecture executes higher-level functional instructions such as lookups and bit-compares of variable-length operands. Each FLIC processing-engine slice has specialized processing units including a lookup unit that searches for a matching entry in a lookup cache. Variable-length operands are stored in execution buffers. The operand length and location in the execution buffer are stored in fixed-length general-purpose registers (GPRs) that also store fixed-length operands. A copy/move unit moves data between input and output buffers and one or more FLIC processing-engine slices. Multiple contexts can each have a set of GPRs and execution buffers. An expansion buffer in a FLIC slice can be allocated to a context to expand that context's execution buffer for storing longer operands.
    Type: Grant
    Filed: July 31, 2007
    Date of Patent: March 15, 2011
    Assignee: Alacritech, Inc.
    Inventors: Millind Mittal, Mehul Kharidia, Tarun Kumar Tripathy, J. Sukarno Mertoguno
  • Patent number: 7908409
    Abstract: A variety of advantageous mechanisms for improved data transfer control within a data processing system are described. A DMA controller is described which is implemented as a multiprocessing transfer engine supporting multiple transfer controllers which may work independently or in cooperation to carry out data transfers, with each transfer controller acting as an autonomous processor, fetching and dispatching DMA instructions to multiple execution units. In particular, mechanisms for initiating and controlling the sequence of data transfers are provided, as are processes for autonomously fetching DMA instructions which are decoded sequentially but executed in parallel.
    Type: Grant
    Filed: August 6, 2009
    Date of Patent: March 15, 2011
    Assignee: Altera Corporation
    Inventors: Edwin Franklin Barry, Edward A. Wolff
  • Publication number: 20110060893
    Abstract: A circuit having at least one processor and a microprogrammed machine for processing the data which enters or leaves the processor in order to input or output the data into/from the circuit in compliance with a communication protocol.
    Type: Application
    Filed: November 5, 2008
    Publication date: March 10, 2011
    Applicant: COMMISSARIAT A L'ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES
    Inventor: Michel Harrand
  • Publication number: 20110055526
    Abstract: There is provided a method and apparatus for accessing a memory according to a processor instruction. The apparatus includes: a stack offset extractor extracting an offset value from a stack pointer offset indicating a local variable in the processor instruction; a local stack storage including a plurality of items, each of which is formed of an activation bit indicating whether each item is activated, an offset storing an offset value of a stack pointer, and an element storing a local variable value of the stack pointer; an offset comparator comparing the extracted offset value with an offset value of each item and determining whether an item corresponding to the extracted offset value is present in the local stack storage; and a stack access controller controlling a processor to access the local stack storage or a cache memory according to a determining result of the offset comparator.
    Type: Application
    Filed: July 8, 2010
    Publication date: March 3, 2011
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Young Su Kwon, Nak Woong Eum, Seong Mo Park
  • Patent number: 7900024
    Abstract: Mechanisms for handling data cache misses out-of-order for asynchronous pipelines are provided. The mechanisms associate load tag (LTAG) identifiers with the load instructions and uses them to track the load instruction across multiple pipelines as an index into a load table data structure of a load target buffer. The load table is used to manage cache “hits” and “misses” and to aid in the recycling of data from the L2 cache. With cache misses, the LTAG indexed load table permits load data to recycle from the L2 cache in any order. When the load instruction issues and sees its corresponding entry in the load table marked as a “miss,” the effects of issuance of the load instruction are canceled and the load instruction is stored in the load table for future reissuing to the instruction pipeline when the required data is recycled.
    Type: Grant
    Filed: October 17, 2008
    Date of Patent: March 1, 2011
    Assignee: International Business Machines Corporation
    Inventors: Christopher M. Abernathy, Jeffrey P. Bradford, Ronald P. Hall, Timothy H. Heil, David Shippy
  • Publication number: 20110047360
    Abstract: The present application provides a method of randomly accessing a compressed structure in memory without the need for retrieving and decompressing the entire compressed structure.
    Type: Application
    Filed: February 11, 2009
    Publication date: February 24, 2011
    Applicant: LINEAR ALGEBRA TECHNOLOGIES LIMITED
    Inventor: David Maloney
  • Publication number: 20110047361
    Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicate that first portion of bits in a subsequent portion of the destination register.
    Type: Application
    Filed: November 5, 2010
    Publication date: February 24, 2011
    Inventor: Patrice Roussel
  • Publication number: 20110047355
    Abstract: A circuit arrangement and method support offset based register address indexing, wherein register addresses to be used by an instruction are calculated using offsets to the full target register address, and the offsets are contained in the instruction and occupy less instruction space than the full address widths. An instruction may include at least one offset value that identifies a register address. During decoding of the instruction, an offset and a full target address are retrieved from the instruction, and then a register address is calculated by addition of the offset to the full target address.
    Type: Application
    Filed: August 24, 2009
    Publication date: February 24, 2011
    Applicant: International Business Machines Corporation
    Inventors: Eric O. Mejdrich, Adam J. Muff, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 7895418
    Abstract: There is disclosed an operand queue for use in a floating point unit. The floating point unit comprises floating point processing units for executing floating point instructions that write operands to an external memory and for executing floating point instructions that read operands from the external memory. The floating point also comprises an operand queue for storing a plurality of operands associated with one or more operations being processed in the floating point unit. The operand queue stores a first operand being written to an external memory by a floating point write instruction executed by a first one of the plurality of floating point processing units and supplies the first operand to a floating point read instruction executed by a second one of the plurality of floating point processing units subsequent to the execution of the floating point write instruction.
    Type: Grant
    Filed: November 28, 2005
    Date of Patent: February 22, 2011
    Assignee: National Semiconductor Corporation
    Inventor: Daniel W. Green
  • Publication number: 20110040955
    Abstract: A microprocessor includes a queue comprising a plurality of entries each configured to hold store information for a store instruction. The store information specifies sources of operands used to calculate a store address. The store instruction specifies store data to be stored to a memory location identified by the store address. The microprocessor also includes control logic, coupled to the queue, configured to encounter a load instruction. The load instruction includes load information that specifies sources of operands used to calculate a load address. The control logic detects that the load information matches the store information held in a valid one of the plurality of queue entries and responsively predicts that the microprocessor should forward to the load instruction the store data specified by the store instruction whose store information matches the load information.
    Type: Application
    Filed: May 17, 2010
    Publication date: February 17, 2011
    Inventors: Rodney E. Hooker, Colin Eddy
  • Patent number: 7889204
    Abstract: A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: February 15, 2011
    Assignee: MicroUnity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris, Alexia Massalin
  • Patent number: 7890722
    Abstract: A sequentially performed implementation of a compound compare-and-swap (nCAS) operation has been developed. In one implementation, a double compare-and-swap (DCAS) operation does not result in a fault, interrupt, or trap in the situation where memory address A2 is invalid and the contents of memory address A1 are unequal to C1. In some realizations, memory locations addressed by a sequentially performed nCAS or DCAS instruction are reserved (e.g., locked) in a predefined order in accordance with a fixed total order of memory locations. In this way, deadlock between concurrently executed instances of sequentially performed nCAS instructions can be avoided. Other realizations defer responsibility for deadlock avoidance to the programmer.
    Type: Grant
    Filed: April 6, 2005
    Date of Patent: February 15, 2011
    Assignee: Oracle America, Inc.
    Inventors: Guy L. Steele, Jr., Ole Agesen, Nir N. Shavit
  • Patent number: 7890733
    Abstract: A data processor comprises a plurality of processing elements (PEs), with memory local to at least one of the processing elements, and a data packet-switched network interconnecting the processing elements and the memory to enable any of the PEs to access the memory. The network consists of nodes arranged linearly or in a grid, e.g., in a SIMD array, so as to connect the PEs and their local memories to a common controller. Transaction-enabled PEs and nodes set flags, which are maintained until the transaction is completed and signal status to the controller e.g., over a series of OR-gates. The processor performs memory accesses on data stored in the memory in response to control signals sent by the controller to the memory. The local memories share the same memory map or space. External memory may also be connected to the “end” nodes interfacing with the network, eg to provide cache.
    Type: Grant
    Filed: August 11, 2005
    Date of Patent: February 15, 2011
    Assignee: Rambus Inc.
    Inventor: Ray McConnell
  • Publication number: 20110035555
    Abstract: Apparatus, system and methods are provided for performing speculative data prefetching in a chip multiprocessor (CMP). Data is prefetched by a helper thread that runs on one core of the CMP while a main program runs concurrently on another core of the CMP. Data prefetched by the helper thread is provided to the helper core. For one embodiment, the data prefetched by the helper thread is pushed to the main core. It may or may not be provided to the helper core as well. A push of prefetched data to the main core may occur during a broadcast of the data to all cores of an affinity group. For at least one other embodiment, the data prefetched by a helper thread is provided, upon request from the main core, to the main core from the helper core's local cache.
    Type: Application
    Filed: October 21, 2010
    Publication date: February 10, 2011
    Inventors: Hong Wang, Perry H. Wang, Jeffery A. Brown, Per Hammarlund, George Z. Chrysos, Doron Orenstein, Steve Shih-wei Liao, John P. Shen
  • Publication number: 20110035569
    Abstract: A superscalar pipelined microprocessor includes a register set defined by its instruction set architecture, a cache memory, execution units, and a load unit, coupled to the cache memory and distinct from the other execution units. The load unit comprises an ALU. The load unit receives an instruction that specifies a memory address of a source operand, an operation to be performed on the source operand to generate a result, and a destination register of the register set to which the result is to be stored. The load unit reads the source operand from the cache memory. The ALU performs the operation on the source operand to generate the result, rather than forwarding the source operand to any of the other execution units of the microprocessor to perform the operation on the source operand to generate the result. The load unit outputs the result for subsequent retirement to the destination register.
    Type: Application
    Filed: October 30, 2009
    Publication date: February 10, 2011
    Inventors: Gerard M. Col, Colin Eddy, Rodney E. Hooker
  • Publication number: 20110032029
    Abstract: A configurable processor architecture uses a common simulation database for multiple processor configurations to reduce the cost of producing customized processor configurations. An unchanging core portion is used in each processor configuration. To support different memory modules, identification signals are provided from the memory modules or an identification module to configure the core portion.
    Type: Application
    Filed: October 26, 2010
    Publication date: February 10, 2011
    Applicant: INFINEON TECHNOLOGIES AG
    Inventors: Klaus J. OBERLAENDER, Ralph Haines, Eric Chesters, Dirk Behrens
  • Patent number: 7886129
    Abstract: A configurable coprocessor interface between a central processing unit (CPU) and a coprocessor is provided. The coprocessor interface has an instruction transfer signal group for transferring different instruction types from the CPU to the coprocessor, sequentially or in parallel, a busy signal group, for allowing the coprocessor to signal the CPU that it cannot receive a transfer of one or more of the different instruction types, and an instruction order signal group for indicating to the coprocessor a relative execution order for multiple instructions that are transferred in parallel. In addition, the coprocessor interface includes separate data transfer signal groups for data being transferred from the CPU to the coprocessor, and for data being transferred from the coprocessor to the CPU, along with a data order signal group for indicating a relative order of data (if transferred out-of-order).
    Type: Grant
    Filed: August 20, 2004
    Date of Patent: February 8, 2011
    Assignee: MIPS Technologies, Inc.
    Inventors: Lawrence Henry Hudepohl, Darren Miller Jones, Radhika Thekkath, Franz Treue
  • Patent number: 7882335
    Abstract: The present invention provides system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one load instruction is in the issue group, if so scheduling the least one load instruction in a first pipeline based upon a priority list; and (3) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: February 1, 2011
    Assignee: International Business Machines Corporation
    Inventor: David A Luick
  • Patent number: 7882325
    Abstract: A single micro-instruction to perform either an N-bit or a 2N-bit load is provided. A microprocessor having an N-bit load port performs either an N-bit load or a 2N-bit load in a single cycle with the same micro-instruction being used for both the N-bit and the 2N-bit load.
    Type: Grant
    Filed: December 21, 2007
    Date of Patent: February 1, 2011
    Assignee: Intel Corporation
    Inventors: Zeev Sperber, Robert Valentine, Ehud Cohen, Doron Orenstien, Benny Eitan
  • Patent number: 7877533
    Abstract: A bus system includes one or more bus masters, one or more bus slaves, and a response unit. When an access request to a resource of a bus slave is sent from a bus master, the response unit outputs a wait response that is either a blocking wait response to cause the bus master to perform a blocking wait operation or a non-blocking wait response to cause it to perform a non-blocking wait operation to the bus master if the bus slave is in the wait state.
    Type: Grant
    Filed: June 4, 2007
    Date of Patent: January 25, 2011
    Assignee: Renesas Electronics Corporation
    Inventor: Hideki Matsuyama
  • Patent number: 7877581
    Abstract: A networking application processor is provided. The processor includes an input socket configured to receive data packets. The processor includes a memory for holding instructions and circuitry configured to access data structures associated with the processing stages. The circuitry configured to access data structures enables a single cycle access to an operand from a memory location. An arithmetic logic unit (ALU) is provided. Circuitry for aligning operands to be processed by the ALU is included. The circuitry for aligning the operands causes the operand to be aligned by a lowest significant bit, wherein the circuitry for aligning the operand supplies an extension to the operand to allow the ALU to process different size operands.
    Type: Grant
    Filed: December 2, 2003
    Date of Patent: January 25, 2011
    Assignee: PMC-Sierra US, Inc.
    Inventors: Shridhar Mukund, Mahesh Gopalan, Neeraj Kashalkar
  • Patent number: 7870365
    Abstract: In some embodiments, control and data messages are transmitted non-contentiously over corresponding control and data channels of inter-processor links in a matrix of mesh-interconnected matrix processors. A data stream instruction executed by a user thread of an instruction processing pipeline of a matrix processor may initiate a data stream transfer by a hardware data switch of the matrix processor over multiple consecutive cycles over a data channel. While the data stream is being transferred, the corresponding control channel may transfer control messages non-contentiously with respect to the data stream. The control messages may be messages received from other matrix processors and/or control messages initiated by a kernel thread of the current matrix processor.
    Type: Grant
    Filed: July 7, 2008
    Date of Patent: January 11, 2011
    Assignee: Ovics
    Inventors: Sorin C Cismas, Ilie Garbacea
  • Patent number: 7869459
    Abstract: A mechanism for communicating instructions and data between a processor and external devices are provided. The mechanism makes use of a channel interface as the primary mechanism for communicating between the processor and a memory flow controller. The channel interface provides channels for communicating with processor facilities, memory flow control facilities, machine state registers, and external processor interrupt facilities, for example. These channels may be designated as blocking or non-blocking. With blocking channels, when no data is available to be read from the corresponding registers, or there is no space available to write to the corresponding registers, the processor is placed in a low power “stall” state. The processor is automatically awakened, via communication across the blocking channel, when data becomes available or space is freed. Thus, the channels of the present invention permit the processor to stay in a low power state.
    Type: Grant
    Filed: May 29, 2008
    Date of Patent: January 11, 2011
    Assignee: International Business Machines Corporation
    Inventors: Michael N. Day, Charles R. Johns, John S Liberty, Todd E. Swanson, Thuong Q. Truong
  • Patent number: 7865700
    Abstract: The present invention provides a system and method for prioritizing store instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one store instruction is in the issue group, if so scheduling the least one store instruction in a one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by scheduling the at least one store instruction in a different execution pipeline; (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: January 4, 2011
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
  • Publication number: 20100332808
    Abstract: Minimizing code duplication in an unbounded transactional memory system. A computing apparatus including one or more processors in which it is possible to use a set of common mode-agnostic TM barrier sequences that runs on legacy ISA and extended ISA processors, and that employs hardware filter indicators (when available) to filter redundant applications of TM barriers, and that enables a compiled binary representation of the subject code to run correctly in any of the currently implemented set of transactional memory execution modes, including running the code outside of a transaction, and that enables the same compiled binary to continue to work with future TM implementations which may introduce as yet unknown future TM execution modes.
    Type: Application
    Filed: June 26, 2009
    Publication date: December 30, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Ali-Reza Adl-Tabatabai, Bratin Saha, Gad Sheaffer, Vadim Bassin, Robert Y. Geva, Martin Taillefer, Darek Mihocka, Burton Jordan Smith, Jan Gray
  • Patent number: 7861069
    Abstract: The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit.
    Type: Grant
    Filed: December 19, 2006
    Date of Patent: December 28, 2010
    Assignee: Seiko-Epson Corporation
    Inventors: Cheryl D. Senter, Johannes Wang
  • Publication number: 20100325400
    Abstract: A microprocessor comprises a register set, a micro operations pool (Uops pool), a hazard detection unit, an execution unit, a dispatch unit, and a mask unit. The Uops pool receives a first micro operation and a second micro operation from a decoder, and reads at least one first operand of the first micro operation and at least one second operand of the second micro operation from the register set. The hazard detection unit detects that the first micro operation is in a write after write hazard state due to the second micro operation. The execution unit executes the first micro operation dispatched from the Uops pool to obtain a first operation result and executes the second micro operation dispatched from the Uops pool to obtain a second operation result. The mask unit protects the first operation result from writing back to the register set according to the write after write hazard state.
    Type: Application
    Filed: December 14, 2009
    Publication date: December 23, 2010
    Applicant: RDC SEMICONDUCTOR CO., LTD.
    Inventor: Shou-Hua SHE
  • Patent number: 7856548
    Abstract: Prediction of data values to be read from memory by a microprocessor for load operations. In one aspect, a method for predicting a data value that will result from a load operation to be executed by the microprocessor includes accessing an entry in a load value prediction table that stores a predicted data value corresponding to the load operation. The predicted data value is provided as a result of the load operation without waiting for execution of the load operation to complete based on a confidence parameter stored in the entry compared to a dynamic confidence threshold.
    Type: Grant
    Filed: December 26, 2006
    Date of Patent: December 21, 2010
    Assignee: Oracle America, Inc.
    Inventors: Chris Nelson, Matthew Ashcraft, John Gregory Favor
  • Patent number: 7853860
    Abstract: A programmable signal processing circuit has an instruction processing circuit (23, 24, 26), with an instruction set that comprises a depuncture instruction. The instruction processing circuit (23, 24, 26) forms the depuncture result by copying bit metrics from a bit metrics operand and inserting one or more predetermined bit metric values between the bit metrics from the bit metric operand in the depuncture result. The instruction processing circuit (23, 24, 26) changes the relative locations of the copied bit metrics with respect to each other in the depuncture result as compared to the relative locations of the copied bit metrics with respect to each other in the bit metric operand, to an extent needed for accommodating the inserted predetermined bit metric value or values.
    Type: Grant
    Filed: December 13, 2005
    Date of Patent: December 14, 2010
    Assignee: Silicon Hive B.V.
    Inventors: Paulus W. F. Gruijters, Marcus M. G. Quax
  • Patent number: 7853713
    Abstract: The communication I/F unit according to the present invention includes a chain executing unit that executes all the chain SWRs. The SWR-chain storage unit stores therein a chain of SWRs. The chain executing unit sequentially reads the SWRs and executes the corresponding operations of an atomic operation so that the corresponding packets are sent outside.
    Type: Grant
    Filed: April 26, 2007
    Date of Patent: December 14, 2010
    Assignee: Fujitsu Limited
    Inventor: Nobutaka Imamura
  • Patent number: 7853778
    Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicate that first portion of bits in a subsequent portion of the destination register.
    Type: Grant
    Filed: December 20, 2001
    Date of Patent: December 14, 2010
    Assignee: Intel Corporation
    Inventor: Patrice Roussel
  • Publication number: 20100313001
    Abstract: An apparatus includes a plurality of processing modules which are connected to each other by corresponding communication unit and the modules transfer packets in a predetermined direction to execute a plurality of operations of pipeline processing. The module includes a storage unit for storing a first identification and a second identification for each of the plurality of operations, a reception unit for extracting data from a packet which has the first identification, a processing unit for processing the data extracted by the reception unit, and a transmission unit for storing the second identification corresponding to the first identification of the packet a packet and transmitting the packet to the module arranged in the predetermined direction.
    Type: Application
    Filed: June 4, 2010
    Publication date: December 9, 2010
    Applicant: CANON KABUSHIKI KAISHA
    Inventor: Hisashi Ishikawa
  • Publication number: 20100303157
    Abstract: The present invention is directed to lessen burden at the time of solving a conflict of overlapping processes in processes for a plurality of interruption factors. On completion of data transfer to an external memory, a data transfer completion interruption of high priority is generated. In the case where data transfer of predetermined number of packets is not completed in reception interruption, a timer interruption of low priority is generated. Before processing data in an external memory responding to the interruption, the number of transfer packets is obtained from a counter. After restart of reception, the counter stores the number of transfer restart packets. After obtaining the number of transfer packets from a counter responding to the occurrence of the timer interruption, a data transfer completion interruption is generated.
    Type: Application
    Filed: May 19, 2010
    Publication date: December 2, 2010
    Inventors: Hiroshige Abe, Isamu Mochizuki, Mika Mizutani
  • Publication number: 20100306511
    Abstract: There is a need for providing a communication data processor easily adaptable to network configurations required for industrial Ethernet. The apparatus successively analyzes received packets. The apparatus uses a register to determine whether or not to transmit the received packet as transmission data to another port. Rewritable memory saves a program code that provides control for analyzing a reception packet and generating a transmission packet. The apparatus is capable of complying with various communication protocols by changing the program code.
    Type: Application
    Filed: May 26, 2010
    Publication date: December 2, 2010
    Applicant: RENESAS ELECTRONICS CORPORATION
    Inventors: Yoshinori Mochizuki, Takatoshi Kato, Nobuaki Kohinata, Shigeki Taira
  • Publication number: 20100306502
    Abstract: A digital signal processor uses a number of independent sub-processors that may be controlled by a master programmable controller. For example, a specialized input processor may process input signals while a specialized output processor may process output signals. Each of these processors may also accomplish math functions when input and output processing is not necessary. The various processors may communicate with one another through general purpose registers which receive data and provide data to any of the processors in the system. Math processors may be added as needed to accomplish desired mathematical functions. In addition, a RAM processor may be utilized to hold the results of intermediate calculations in one embodiment of the present invention. In this way, an adaptable and scaleable design may be implemented that accommodates a variety of different operations without requiring redesign of all the components.
    Type: Application
    Filed: July 30, 2010
    Publication date: December 2, 2010
    Inventors: David K. Vavro, James A. Mitchell
  • Patent number: 7844801
    Abstract: Apparatus, system and methods are provided for performing speculative data prefetching in a chip multiprocessor (CMP). Data is prefetched by a helper thread that runs on one core of the CMP while a main program runs concurrently on another core of the CMP. Data prefetched by the helper thread is provided to the helper core. For one embodiment, the data prefetched by the helper thread is pushed to the main core. It may or may not be provided to the helper core as well. A push of prefetched data to the main core may occur during a broadcast of the data to all cores of an affinity group. For at least one other embodiment, the data prefetched by a helper thread is provided, upon request from the main core, to the main core from the helper core's local cache.
    Type: Grant
    Filed: July 31, 2003
    Date of Patent: November 30, 2010
    Assignee: Intel Corporation
    Inventors: Hong Wang, Perry H. Wang, Jeffery A. Brown, Per Hammarlund, George Z. Chrysos, Doron Orenstein, Steve Shih-wei Liao, John P. Shen
  • Patent number: 7844802
    Abstract: Ordering instructions for specifying the execution order of other instructions improve throughput in a pipelined multiprocessor. Memory write operations local to a CPU are allowed to occur in an arbitrary order, and constraints are placed on shared memory operations. Multiple sets of instructions are provided in which order of execution of the instructions is maintained through the use of CPU registers, write buffers in conjunction with assignment of sequence numbers to the instruction, or a hierarchical ordering system. The system ensures that an earlier designated instruction has reach a specified state of execution prior to a latter instruction reaching a specified state of execution. The ordering of operations allows memory operations local to a CPU to occur in conjunction with other memory operations that are not affected by such execution.
    Type: Grant
    Filed: June 24, 2008
    Date of Patent: November 30, 2010
    Assignee: International Business Machines Corporation
    Inventor: Paul E. McKenney
  • Publication number: 20100299130
    Abstract: Apparatus and method for processing information may determine whether a migration condition exists by a source information processing unit executing a program. When a migration condition is determined to exist by the source information processing unit, a destination information processing unit may determine whether an instruction to be executed of the program is a predetermined instruction. The instruction to be executed is converted by an instruction emulator, when a result of a determination by the destination information processing unit is the predetermined instruction.
    Type: Application
    Filed: May 10, 2010
    Publication date: November 25, 2010
    Applicant: Sony Corporation
    Inventors: Atsushi Mitsuzawa, Yuji Matsuyama, Toshihiko Kawai
  • Publication number: 20100293359
    Abstract: A clone set of General Purpose Registers (GPRs) is created to be used by a set of helper thread binaries, which is created from a set of main thread binaries. When the set of main thread binaries enters a wait state, the set of helper thread binaries uses the clone set of GPRs to continue using unused execution units within a processor core. The set of helper threads are thus able to warm up local cache memory with data that will be needed when execution of the set of main thread binaries resumes.
    Type: Application
    Filed: February 1, 2008
    Publication date: November 18, 2010
    Inventors: Ravi K. Arimilli, Juan C. Rubio, Balaram Sinharoy
  • Patent number: 7835806
    Abstract: A controller can process an instruction directed to the controller itself to access data in the memory of the controller dynamically at runtime, where the data can be indirectly accessed by referencing a tag name, associated with the data and a memory space in memory, which can be included in a string tag associated with the instruction. Multiple tags, each tag associated with a respective item of data, can be located or referenced dynamically at runtime to access the respective items of data where one tag can be associated with a first structure, array, and/or scope and a disparate tag can be associated with a disparate structure, array, and/or scope, via an instruction.
    Type: Grant
    Filed: January 29, 2007
    Date of Patent: November 16, 2010
    Assignee: Rockwell Automation Technologies, Inc.
    Inventors: Ronald E. Bliss, David A. Johnston
  • Publication number: 20100287357
    Abstract: In one embodiment, a serial processor is configured to execute software instructions in a software program in serial. A serial memory is configured to store data for use by the serial processor in executing the software instructions in serial. A plurality of parallel processors are configured to execute software instructions in the software program in parallel. A plurality of partitioned memory modules are provided and configured to store data for use by the plurality of parallel processors in executing software instructions in parallel. Accordingly, a processor/memory structure is provided that allows serial programs to use quick local serial memories and parallel programs to use partitioned parallel memories. The system may switch between a serial mode and a parallel mode. The system may incorporate pre-fetching commands of several varieties.
    Type: Application
    Filed: March 10, 2010
    Publication date: November 11, 2010
    Applicant: XMTT INC.
    Inventor: Uzi Y. Vishkin
  • Publication number: 20100287360
    Abstract: The speed of task scheduling by a multitask OS is increased. A task processor includes a CPU, a save circuit, and a task control circuit. The CPU is provided with a processing register and an execution control circuit operative to load data from a memory into a processing register and execute a task in accordance with the data in the processing register. The save circuit is provided with a plurality of save registers respectively associated with a plurality of tasks. In executing a predetermined system call, the execution control circuit notifies the task control circuit as such. The task control circuit switches between tasks for execution upon receipt of the system call signal, by saving, in the save register associated with a task being executed, the data in the processing register, selecting a task to be executed next, and loading data in the save register associated with the selected task into the processing register.
    Type: Application
    Filed: August 24, 2006
    Publication date: November 11, 2010
    Inventor: Naotaka Maruyama
  • Patent number: 7831812
    Abstract: A processor includes a processor core with a core interface unit that includes an age queue and a request queue. The core interface unit receives load requests from the processor core. The request queue stores the requests in respective slots of the request queue. The age queue stores ID tags in respective age queue slots. Each ID tag in the age queue corresponds to a respective address of a load instruction in the request queue. In one embodiment, ID tags propagate through the age queue at a fixed rate of two at a time from a tail of the age queue to a head of the age queue. Arbitration control circuitry generates an enable bit vector that identifies the oldest ID tag in the age queue corresponding to the oldest load request in the request queue. The arbitration circuitry selects the identified oldest instruction in the request queue as the next to dispatch. In one embodiment, the core interface unit exhibits an input frequency that is a multiple of an internal operating frequency of the core interface unit.
    Type: Grant
    Filed: August 31, 2007
    Date of Patent: November 9, 2010
    Assignee: International Business Machines Corporation
    Inventors: Alvan Wing Ng, Takuya Kano
  • Patent number: 7831811
    Abstract: A virtual machine in a processing system manages type information for operands. In one embodiment, the virtual machine accomplishes the following results through execution of a single instruction: adding an operand tag to a tag stack, and updating a stack pointer for the tag stack to recognize the addition of the operand tag to the tag stack. The single instruction may be a shift instruction, for example. The tag stack may reside in a tag stack register, and each operand tag may indicate whether a corresponding operand on an operand stack is to be treated as a reference operand or a non-reference operand. Other embodiments are described and claimed.
    Type: Grant
    Filed: October 31, 2005
    Date of Patent: November 9, 2010
    Assignee: Intel Corporation
    Inventors: Jinzhan Peng, Gansha Wu, Peng Guo, Xin Zhou, Zhiwei Ying
  • Publication number: 20100281236
    Abstract: An apparatus for processing data may include an array of processing elements (such as an n×m or n×n array of processing elements) configured to simultaneously perform operations on a plurality of data elements using a single instruction. Each processing element in the array may be configured to transfer data directly to at least one neighboring processing element within the array. In selected embodiments, the apparatus may include exchange registers to temporarily store data transferred between neighboring processing elements.
    Type: Application
    Filed: April 30, 2009
    Publication date: November 4, 2010
    Applicant: Novafora, Inc.
    Inventors: Shlomo Selim Rakib, Muhammad Ahmed, Marc Schaub
  • Patent number: 7827389
    Abstract: A method, system, and computer program product are provided for enhancing the execution of independent loads in a processing unit. The processing unit dispatches a first set of instructions in order from a first buffer for execution. The processing unit receives updated results from the execution of the first set of instructions. The processing unit updates, in a first register, at least one register entry associated with each instruction in the first set of instructions, with the updated results. The processing unit determines if the first set of instructions from the first buffer have completed execution. Responsive to the completed execution of the first set of instructions from the first buffer, the processing unit copies the set of entries from the first register to a second register.
    Type: Grant
    Filed: June 15, 2007
    Date of Patent: November 2, 2010
    Assignee: International Business Machines Corporation
    Inventors: Hung Q. Le, Dung Q. Nguyen
  • Patent number: 7827320
    Abstract: A Serial Advanced Technology Attachment (SATA) device for communicating with a host is disclosed. The SATA device comprises control circuitry which enters a XRDY state in preparation for sending data to the host, receives a first XRDY from the host while in the XRDY state, and sets a RXRDY flag. After receiving the first XRDY, the control circuitry receives a RRDY from the host while in the XRDY state, transmits a data block to the host in response to the RRDY, and enters an idle state after transmitting the data block to the host. If the RXRDY flag is set while in the idle state, the control circuitry waits for the host to transmit a second XRDY.
    Type: Grant
    Filed: March 28, 2008
    Date of Patent: November 2, 2010
    Assignee: Western Digital Technologies, Inc.
    Inventor: Curtis E. Stevens
  • Patent number: 7827390
    Abstract: A microprocessor includes a private RAM (PRAM), for use by microcode, which is non-user-accessible and within its own distinct address space from the system memory address space. The PRAM is denser and slower than user-accessible registers of the microprocessor macroarchitecture, thereby enabling it to provide significantly more storage for microcode. The microinstruction set includes a microinstruction for loading data from the PRAM into the user-accessible registers, and a microinstruction for storing data from user-accessible registers to the PRAM. The microcode may also use the two microinstructions to load/store between the PRAM and non-user-accessible registers of the microarchitecture.
    Type: Grant
    Filed: February 20, 2008
    Date of Patent: November 2, 2010
    Assignee: VIA Technologies, Inc.
    Inventors: G. Glenn Henry, Colin Eddy, Rodney E. Hooker, Terry Parks
  • Publication number: 20100274997
    Abstract: Methods, apparatus, and computer program products are disclosed for executing a gather operation on a parallel computer according to embodiments of the present invention. Embodiments include configuring, by the logical root, a result buffer or the logical root, the result buffer having positions, each position corresponding to a ranked node in the operational group and for storing contribution data gathered from that ranked node.
    Type: Application
    Filed: May 29, 2007
    Publication date: October 28, 2010
    Inventors: Charles J. Archer, Joseph D. Ratterman
  • Patent number: 7822946
    Abstract: A computing and communication chip architecture is provided wherein the interfaces of processor access to the memory chips are implemented as a high-speed packet switched serial interface as part of each chip. In one embodiment, the interface is accomplished through a gigabit Ethernet interface provided by protocol processor integrated as part of the chip. The protocol processor encapsulates the memory address and control information like Read, Write, number of successive bytes etc, as an Ethernet packet for communication among the processor and memory chips that are located on the same motherboard, or even on different circuit cards. In one embodiment, the communication over head of the Ethernet protocol is further reduced by using an enhanced Ethernet protocol with shortened data frames within a constrained neighborhood, and/or by utilizing a bit stream switch where direct connection paths can be established between elements that comprise the computing or communication architecture.
    Type: Grant
    Filed: February 4, 2008
    Date of Patent: October 26, 2010
    Assignee: PSIMAST, Inc
    Inventor: Viswa Sharma