Processing Control For Data Transfer Patents (Class 712/225)
-
Patent number: 7908464
Abstract: A functional-level instruction-set computing (FLIC) architecture executes higher-level functional instructions such as lookups and bit-compares of variable-length operands. Each FLIC processing-engine slice has specialized processing units including a lookup unit that searches for a matching entry in a lookup cache. Variable-length operands are stored in execution buffers. The operand length and location in the execution buffer are stored in fixed-length general-purpose registers (GPRs) that also store fixed-length operands. A copy/move unit moves data between input and output buffers and one or more FLIC processing-engine slices. Multiple contexts can each have a set of GPRs and execution buffers. An expansion buffer in a FLIC slice can be allocated to a context to expand that context's execution buffer for storing longer operands.
Type: Grant
Filed: July 31, 2007
Date of Patent: March 15, 2011
Assignee: Alacritech, Inc.
Inventors: Millind Mittal, Mehul Kharidia, Tarun Kumar Tripathy, J. Sukarno Mertoguno
-
Patent number: 7908409
Abstract: A variety of advantageous mechanisms for improved data transfer control within a data processing system are described. A DMA controller is described which is implemented as a multiprocessing transfer engine supporting multiple transfer controllers which may work independently or in cooperation to carry out data transfers, with each transfer controller acting as an autonomous processor, fetching and dispatching DMA instructions to multiple execution units. In particular, mechanisms for initiating and controlling the sequence of data transfers are provided, as are processes for autonomously fetching DMA instructions which are decoded sequentially but executed in parallel.
Type: Grant
Filed: August 6, 2009
Date of Patent: March 15, 2011
Assignee: Altera Corporation
Inventors: Edwin Franklin Barry, Edward A. Wolff
-
Publication number: 20110060893
Abstract: A circuit having at least one processor and a microprogrammed machine for processing the data which enters or leaves the processor in order to input or output the data into/from the circuit in compliance with a communication protocol.
Type: Application
Filed: November 5, 2008
Publication date: March 10, 2011
Applicant: COMMISSARIAT A L'ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES
Inventor: Michel Harrand
-
Publication number: 20110055526
Abstract: There is provided a method and apparatus for accessing a memory according to a processor instruction. The apparatus includes: a stack offset extractor extracting an offset value from a stack pointer offset indicating a local variable in the processor instruction; a local stack storage including a plurality of items, each of which is formed of an activation bit indicating whether each item is activated, an offset storing an offset value of a stack pointer, and an element storing a local variable value of the stack pointer; an offset comparator comparing the extracted offset value with an offset value of each item and determining whether an item corresponding to the extracted offset value is present in the local stack storage; and a stack access controller controlling a processor to access the local stack storage or a cache memory according to a determining result of the offset comparator.
Type: Application
Filed: July 8, 2010
Publication date: March 3, 2011
Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
Inventors: Young Su Kwon, Nak Woong Eum, Seong Mo Park
-
Patent number: 7900024
Abstract: Mechanisms for handling data cache misses out-of-order for asynchronous pipelines are provided. The mechanisms associate load tag (LTAG) identifiers with the load instructions and use them to track the load instruction across multiple pipelines as an index into a load table data structure of a load target buffer. The load table is used to manage cache “hits” and “misses” and to aid in the recycling of data from the L2 cache. With cache misses, the LTAG indexed load table permits load data to recycle from the L2 cache in any order. When the load instruction issues and sees its corresponding entry in the load table marked as a “miss,” the effects of issuance of the load instruction are canceled and the load instruction is stored in the load table for future reissuing to the instruction pipeline when the required data is recycled.
Type: Grant
Filed: October 17, 2008
Date of Patent: March 1, 2011
Assignee: International Business Machines Corporation
Inventors: Christopher M. Abernathy, Jeffrey P. Bradford, Ronald P. Hall, Timothy H. Heil, David Shippy
-
Publication number: 20110047360
Abstract: The present application provides a method of randomly accessing a compressed structure in memory without the need for retrieving and decompressing the entire compressed structure.
Type: Application
Filed: February 11, 2009
Publication date: February 24, 2011
Applicant: LINEAR ALGEBRA TECHNOLOGIES LIMITED
Inventor: David Maloney
-
Publication number: 20110047361
Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicating that first portion of bits in a subsequent portion of the destination register.
Type: Application
Filed: November 5, 2010
Publication date: February 24, 2011
Inventor: Patrice Roussel
-
Publication number: 20110047355
Abstract: A circuit arrangement and method support offset based register address indexing, wherein register addresses to be used by an instruction are calculated using offsets to the full target register address, and the offsets are contained in the instruction and occupy less instruction space than the full address widths. An instruction may include at least one offset value that identifies a register address. During decoding of the instruction, an offset and a full target address are retrieved from the instruction, and then a register address is calculated by addition of the offset to the full target address.
Type: Application
Filed: August 24, 2009
Publication date: February 24, 2011
Applicant: International Business Machines Corporation
Inventors: Eric O. Mejdrich, Adam J. Muff, Robert A. Shearer, Matthew R. Tubbs
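The address calculation in this abstract can be illustrated with a minimal Python sketch. It is not taken from the application: the function name `decode_register_addresses`, the list-of-offsets encoding, and the 128-entry register file are all assumptions made for illustration.

```python
def decode_register_addresses(base_address, offsets, num_registers=128):
    """Compute full register addresses from one full address plus short offsets.

    base_address:  the full target register address carried in the instruction.
    offsets:       small unsigned offset fields, each narrower than a full address.
    num_registers: register file size (an illustrative assumption).
    """
    addresses = []
    for offset in offsets:
        # Register address = full target address + offset, as the abstract states.
        address = base_address + offset
        if not 0 <= address < num_registers:
            raise ValueError(f"register address {address} out of range")
        addresses.append(address)
    return addresses


# One full address (40) and three short offsets name three registers.
print(decode_register_addresses(40, [0, 1, 3]))
```

The point of the encoding is space: each offset field needs fewer instruction bits than a full register address would.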
-
Patent number: 7895418
Abstract: There is disclosed an operand queue for use in a floating point unit. The floating point unit comprises floating point processing units for executing floating point instructions that write operands to an external memory and for executing floating point instructions that read operands from the external memory. The floating point unit also comprises an operand queue for storing a plurality of operands associated with one or more operations being processed in the floating point unit. The operand queue stores a first operand being written to an external memory by a floating point write instruction executed by a first one of the plurality of floating point processing units and supplies the first operand to a floating point read instruction executed by a second one of the plurality of floating point processing units subsequent to the execution of the floating point write instruction.
Type: Grant
Filed: November 28, 2005
Date of Patent: February 22, 2011
Assignee: National Semiconductor Corporation
Inventor: Daniel W. Green
-
Publication number: 20110040955
Abstract: A microprocessor includes a queue comprising a plurality of entries each configured to hold store information for a store instruction. The store information specifies sources of operands used to calculate a store address. The store instruction specifies store data to be stored to a memory location identified by the store address. The microprocessor also includes control logic, coupled to the queue, configured to encounter a load instruction. The load instruction includes load information that specifies sources of operands used to calculate a load address. The control logic detects that the load information matches the store information held in a valid one of the plurality of queue entries and responsively predicts that the microprocessor should forward to the load instruction the store data specified by the store instruction whose store information matches the load information.
Type: Application
Filed: May 17, 2010
Publication date: February 17, 2011
Inventors: Rodney E. Hooker, Colin Eddy
-
Patent number: 7889204
Abstract: A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
Type: Grant
Filed: October 31, 2007
Date of Patent: February 15, 2011
Assignee: MicroUnity Systems Engineering, Inc.
Inventors: Craig Hansen, John Moussouris, Alexia Massalin
-
Patent number: 7890722
Abstract: A sequentially performed implementation of a compound compare-and-swap (nCAS) operation has been developed. In one implementation, a double compare-and-swap (DCAS) operation does not result in a fault, interrupt, or trap in the situation where memory address A2 is invalid and the contents of memory address A1 are unequal to C1. In some realizations, memory locations addressed by a sequentially performed nCAS or DCAS instruction are reserved (e.g., locked) in a predefined order in accordance with a fixed total order of memory locations. In this way, deadlock between concurrently executed instances of sequentially performed nCAS instructions can be avoided. Other realizations defer responsibility for deadlock avoidance to the programmer.
Type: Grant
Filed: April 6, 2005
Date of Patent: February 15, 2011
Assignee: Oracle America, Inc.
Inventors: Guy L. Steele, Jr., Ole Agesen, Nir N. Shavit
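The lock-ordering idea in this abstract can be sketched in software. The Python below is a rough illustration, not the patented hardware mechanism: the `Memory` class, the per-location locks, and the `dcas` signature are assumptions. It reserves both locations in ascending address order, which is one possible fixed total order, so two concurrent calls cannot each hold one lock while waiting for the other. It does not model the no-fault behavior for an invalid A2.

```python
import threading


class Memory:
    """Toy word-addressable memory with one lock per location (illustrative)."""

    def __init__(self, size):
        self.words = [0] * size
        self.locks = [threading.Lock() for _ in range(size)]


def dcas(mem, a1, c1, n1, a2, c2, n2):
    """Double compare-and-swap.

    If mem[a1] == c1 and mem[a2] == c2, store n1 and n2 and return True;
    otherwise leave memory unchanged and return False.
    """
    # Reserve locations in a fixed total order (ascending address) to avoid deadlock.
    order = sorted({a1, a2})
    for addr in order:
        mem.locks[addr].acquire()
    try:
        if mem.words[a1] == c1 and mem.words[a2] == c2:
            mem.words[a1] = n1
            mem.words[a2] = n2
            return True
        return False
    finally:
        for addr in reversed(order):
            mem.locks[addr].release()
```

A call such as `dcas(mem, 1, 5, 50, 2, 7, 70)` and a concurrent `dcas(mem, 2, 7, 0, 1, 5, 0)` both lock location 1 before location 2, regardless of argument order.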
-
Patent number: 7890733
Abstract: A data processor comprises a plurality of processing elements (PEs), with memory local to at least one of the processing elements, and a data packet-switched network interconnecting the processing elements and the memory to enable any of the PEs to access the memory. The network consists of nodes arranged linearly or in a grid, e.g., in a SIMD array, so as to connect the PEs and their local memories to a common controller. Transaction-enabled PEs and nodes set flags, which are maintained until the transaction is completed and signal status to the controller, e.g., over a series of OR-gates. The processor performs memory accesses on data stored in the memory in response to control signals sent by the controller to the memory. The local memories share the same memory map or space. External memory may also be connected to the “end” nodes interfacing with the network, e.g., to provide cache.
Type: Grant
Filed: August 11, 2005
Date of Patent: February 15, 2011
Assignee: Rambus Inc.
Inventor: Ray McConnell
-
Publication number: 20110035555
Abstract: Apparatus, system and methods are provided for performing speculative data prefetching in a chip multiprocessor (CMP). Data is prefetched by a helper thread that runs on one core of the CMP while a main program runs concurrently on another core of the CMP. Data prefetched by the helper thread is provided to the helper core. For one embodiment, the data prefetched by the helper thread is pushed to the main core. It may or may not be provided to the helper core as well. A push of prefetched data to the main core may occur during a broadcast of the data to all cores of an affinity group. For at least one other embodiment, the data prefetched by a helper thread is provided, upon request from the main core, to the main core from the helper core's local cache.
Type: Application
Filed: October 21, 2010
Publication date: February 10, 2011
Inventors: Hong Wang, Perry H. Wang, Jeffery A. Brown, Per Hammarlund, George Z. Chrysos, Doron Orenstein, Steve Shih-wei Liao, John P. Shen
-
Publication number: 20110035569
Abstract: A superscalar pipelined microprocessor includes a register set defined by its instruction set architecture, a cache memory, execution units, and a load unit, coupled to the cache memory and distinct from the other execution units. The load unit comprises an ALU. The load unit receives an instruction that specifies a memory address of a source operand, an operation to be performed on the source operand to generate a result, and a destination register of the register set to which the result is to be stored. The load unit reads the source operand from the cache memory. The ALU performs the operation on the source operand to generate the result, rather than forwarding the source operand to any of the other execution units of the microprocessor to perform the operation on the source operand to generate the result. The load unit outputs the result for subsequent retirement to the destination register.
Type: Application
Filed: October 30, 2009
Publication date: February 10, 2011
Inventors: Gerard M. Col, Colin Eddy, Rodney E. Hooker
-
Publication number: 20110032029
Abstract: A configurable processor architecture uses a common simulation database for multiple processor configurations to reduce the cost of producing customized processor configurations. An unchanging core portion is used in each processor configuration. To support different memory modules, identification signals are provided from the memory modules or an identification module to configure the core portion.
Type: Application
Filed: October 26, 2010
Publication date: February 10, 2011
Applicant: INFINEON TECHNOLOGIES AG
Inventors: Klaus J. OBERLAENDER, Ralph Haines, Eric Chesters, Dirk Behrens
-
Patent number: 7886129
Abstract: A configurable coprocessor interface between a central processing unit (CPU) and a coprocessor is provided. The coprocessor interface has an instruction transfer signal group for transferring different instruction types from the CPU to the coprocessor, sequentially or in parallel, a busy signal group, for allowing the coprocessor to signal the CPU that it cannot receive a transfer of one or more of the different instruction types, and an instruction order signal group for indicating to the coprocessor a relative execution order for multiple instructions that are transferred in parallel. In addition, the coprocessor interface includes separate data transfer signal groups for data being transferred from the CPU to the coprocessor, and for data being transferred from the coprocessor to the CPU, along with a data order signal group for indicating a relative order of data (if transferred out-of-order).
Type: Grant
Filed: August 20, 2004
Date of Patent: February 8, 2011
Assignee: MIPS Technologies, Inc.
Inventors: Lawrence Henry Hudepohl, Darren Miller Jones, Radhika Thekkath, Franz Treue
-
Patent number: 7882335
Abstract: The present invention provides a system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one load instruction is in the issue group, if so scheduling the at least one load instruction in a first pipeline based upon a priority list; and (3) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.
Type: Grant
Filed: February 19, 2008
Date of Patent: February 1, 2011
Assignee: International Business Machines Corporation
Inventor: David A Luick
-
Patent number: 7882325
Abstract: A single micro-instruction to perform either an N-bit or a 2N-bit load is provided. A microprocessor having an N-bit load port performs either an N-bit load or a 2N-bit load in a single cycle with the same micro-instruction being used for both the N-bit and the 2N-bit load.
Type: Grant
Filed: December 21, 2007
Date of Patent: February 1, 2011
Assignee: Intel Corporation
Inventors: Zeev Sperber, Robert Valentine, Ehud Cohen, Doron Orenstien, Benny Eitan
-
Patent number: 7877533
Abstract: A bus system includes one or more bus masters, one or more bus slaves, and a response unit. When an access request to a resource of a bus slave is sent from a bus master, the response unit outputs a wait response that is either a blocking wait response to cause the bus master to perform a blocking wait operation or a non-blocking wait response to cause it to perform a non-blocking wait operation to the bus master if the bus slave is in the wait state.
Type: Grant
Filed: June 4, 2007
Date of Patent: January 25, 2011
Assignee: Renesas Electronics Corporation
Inventor: Hideki Matsuyama
-
Patent number: 7877581
Abstract: A networking application processor is provided. The processor includes an input socket configured to receive data packets. The processor includes a memory for holding instructions and circuitry configured to access data structures associated with the processing stages. The circuitry configured to access data structures enables a single cycle access to an operand from a memory location. An arithmetic logic unit (ALU) is provided. Circuitry for aligning operands to be processed by the ALU is included. The circuitry for aligning the operands causes the operand to be aligned by a lowest significant bit, wherein the circuitry for aligning the operand supplies an extension to the operand to allow the ALU to process different size operands.
Type: Grant
Filed: December 2, 2003
Date of Patent: January 25, 2011
Assignee: PMC-Sierra US, Inc.
Inventors: Shridhar Mukund, Mahesh Gopalan, Neeraj Kashalkar
-
Patent number: 7870365
Abstract: In some embodiments, control and data messages are transmitted non-contentiously over corresponding control and data channels of inter-processor links in a matrix of mesh-interconnected matrix processors. A data stream instruction executed by a user thread of an instruction processing pipeline of a matrix processor may initiate a data stream transfer by a hardware data switch of the matrix processor over multiple consecutive cycles over a data channel. While the data stream is being transferred, the corresponding control channel may transfer control messages non-contentiously with respect to the data stream. The control messages may be messages received from other matrix processors and/or control messages initiated by a kernel thread of the current matrix processor.
Type: Grant
Filed: July 7, 2008
Date of Patent: January 11, 2011
Assignee: Ovics
Inventors: Sorin C Cismas, Ilie Garbacea
-
Patent number: 7869459
Abstract: A mechanism for communicating instructions and data between a processor and external devices is provided. The mechanism makes use of a channel interface as the primary mechanism for communicating between the processor and a memory flow controller. The channel interface provides channels for communicating with processor facilities, memory flow control facilities, machine state registers, and external processor interrupt facilities, for example. These channels may be designated as blocking or non-blocking. With blocking channels, when no data is available to be read from the corresponding registers, or there is no space available to write to the corresponding registers, the processor is placed in a low power “stall” state. The processor is automatically awakened, via communication across the blocking channel, when data becomes available or space is freed. Thus, the channels of the present invention permit the processor to stay in a low power state.
Type: Grant
Filed: May 29, 2008
Date of Patent: January 11, 2011
Assignee: International Business Machines Corporation
Inventors: Michael N. Day, Charles R. Johns, John S Liberty, Todd E. Swanson, Thuong Q. Truong
-
Patent number: 7865700
Abstract: The present invention provides a system and method for prioritizing store instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one store instruction is in the issue group, if so scheduling the at least one store instruction in one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by scheduling the at least one store instruction in a different execution pipeline; and (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.
Type: Grant
Filed: February 19, 2008
Date of Patent: January 4, 2011
Assignee: International Business Machines Corporation
Inventor: David A. Luick
-
Publication number: 20100332808
Abstract: Minimizing code duplication in an unbounded transactional memory system. A computing apparatus including one or more processors in which it is possible to use a set of common mode-agnostic TM barrier sequences that runs on legacy ISA and extended ISA processors, and that employs hardware filter indicators (when available) to filter redundant applications of TM barriers, and that enables a compiled binary representation of the subject code to run correctly in any of the currently implemented set of transactional memory execution modes, including running the code outside of a transaction, and that enables the same compiled binary to continue to work with future TM implementations which may introduce as yet unknown future TM execution modes.
Type: Application
Filed: June 26, 2009
Publication date: December 30, 2010
Applicant: MICROSOFT CORPORATION
Inventors: Ali-Reza Adl-Tabatabai, Bratin Saha, Gad Sheaffer, Vadim Bassin, Robert Y. Geva, Martin Taillefer, Darek Mihocka, Burton Jordan Smith, Jan Gray
-
Patent number: 7861069
Abstract: The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit.
Type: Grant
Filed: December 19, 2006
Date of Patent: December 28, 2010
Assignee: Seiko-Epson Corporation
Inventors: Cheryl D. Senter, Johannes Wang
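The two conditions named in this abstract (no address collision, no write pending) can be expressed as a short check. The Python sketch below is only an illustration under simplifying assumptions: `PendingStore` and `can_issue_load_early` are hypothetical names, and addresses are compared for exact equality rather than for overlapping byte ranges.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PendingStore:
    """An older store still in flight; address is None until it is calculated."""
    address: Optional[int]


def can_issue_load_early(load_address: int, older_stores: List[PendingStore]) -> bool:
    """Return True if the load may bypass every older store."""
    for store in older_stores:
        if store.address is None:
            return False  # write pending: the store address is not yet known
        if store.address == load_address:
            return False  # address collision: an older store writes this location
    return True


print(can_issue_load_early(0x100, [PendingStore(0x200)]))  # no conflict
print(can_issue_load_early(0x100, [PendingStore(None)]))   # write pending
```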
-
Publication number: 20100325400
Abstract: A microprocessor comprises a register set, a micro operations pool (Uops pool), a hazard detection unit, an execution unit, a dispatch unit, and a mask unit. The Uops pool receives a first micro operation and a second micro operation from a decoder, and reads at least one first operand of the first micro operation and at least one second operand of the second micro operation from the register set. The hazard detection unit detects that the first micro operation is in a write after write hazard state due to the second micro operation. The execution unit executes the first micro operation dispatched from the Uops pool to obtain a first operation result and executes the second micro operation dispatched from the Uops pool to obtain a second operation result. The mask unit protects the first operation result from writing back to the register set according to the write after write hazard state.
Type: Application
Filed: December 14, 2009
Publication date: December 23, 2010
Applicant: RDC SEMICONDUCTOR CO., LTD.
Inventor: Shou-Hua SHE
-
Patent number: 7856548
Abstract: Prediction of data values to be read from memory by a microprocessor for load operations. In one aspect, a method for predicting a data value that will result from a load operation to be executed by the microprocessor includes accessing an entry in a load value prediction table that stores a predicted data value corresponding to the load operation. The predicted data value is provided as a result of the load operation without waiting for execution of the load operation to complete based on a confidence parameter stored in the entry compared to a dynamic confidence threshold.
Type: Grant
Filed: December 26, 2006
Date of Patent: December 21, 2010
Assignee: Oracle America, Inc.
Inventors: Chris Nelson, Matthew Ashcraft, John Gregory Favor
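A table lookup gated by a confidence threshold, as this abstract describes, might look roughly like the Python sketch below. Only the lookup and the comparison against an adjustable threshold come from the abstract. The class and method names, indexing the table by load instruction address, and the update policy (increment on a correct value, reset on a wrong one) are assumptions added to make the example runnable.

```python
from dataclasses import dataclass


@dataclass
class PredictionEntry:
    value: int       # predicted data value for the load
    confidence: int  # confidence parameter stored in the entry


class LoadValuePredictor:
    def __init__(self, threshold):
        self.table = {}             # load value prediction table, keyed by load PC (assumed)
        self.threshold = threshold  # dynamic threshold: may be changed at run time

    def predict(self, load_pc):
        """Return the predicted value, or None if confidence is too low."""
        entry = self.table.get(load_pc)
        if entry is not None and entry.confidence >= self.threshold:
            return entry.value
        return None

    def update(self, load_pc, actual_value):
        """Train on the value the load actually returned (assumed policy)."""
        entry = self.table.get(load_pc)
        if entry is None:
            self.table[load_pc] = PredictionEntry(actual_value, 0)
        elif entry.value == actual_value:
            entry.confidence += 1
        else:
            entry.value = actual_value
            entry.confidence = 0
```

Raising `threshold` makes the predictor more conservative without touching the table, which is one way a dynamic threshold could be used.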
-
Patent number: 7853860
Abstract: A programmable signal processing circuit has an instruction processing circuit (23, 24, 26), with an instruction set that comprises a depuncture instruction. The instruction processing circuit (23, 24, 26) forms the depuncture result by copying bit metrics from a bit metrics operand and inserting one or more predetermined bit metric values between the bit metrics from the bit metric operand in the depuncture result. The instruction processing circuit (23, 24, 26) changes the relative locations of the copied bit metrics with respect to each other in the depuncture result as compared to the relative locations of the copied bit metrics with respect to each other in the bit metric operand, to an extent needed for accommodating the inserted predetermined bit metric value or values.
Type: Grant
Filed: December 13, 2005
Date of Patent: December 14, 2010
Assignee: Silicon Hive B.V.
Inventors: Paulus W. F. Gruijters, Marcus M. G. Quax
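The depuncture operation in this abstract (copy bit metrics, insert predetermined values between them) can be shown with a small Python sketch. The function name `depuncture`, the 1/0 pattern encoding, and the neutral fill value 0 are illustrative assumptions; the patent describes a processor instruction, not this software routine.

```python
def depuncture(bit_metrics, pattern, fill=0):
    """Rebuild a full-rate metric sequence from punctured bit metrics.

    bit_metrics: received metrics, in order.
    pattern:     1 = copy the next received metric, 0 = insert `fill`.
    fill:        predetermined metric for a punctured position (assumed 0).
    """
    if sum(1 for keep in pattern if keep) != len(bit_metrics):
        raise ValueError("pattern must consume exactly len(bit_metrics) metrics")
    source = iter(bit_metrics)
    # Copied metrics shift apart just enough to make room for inserted values.
    return [next(source) if keep else fill for keep in pattern]


print(depuncture([5, -3, 7], [1, 1, 0, 1]))
```

With the pattern `[1, 1, 0, 1]`, the third input metric moves from index 2 to index 3 to make room for the inserted value.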
-
Patent number: 7853713
Abstract: The communication I/F unit according to the present invention includes a chain executing unit that executes all the chain SWRs. The SWR-chain storage unit stores therein a chain of SWRs. The chain executing unit sequentially reads the SWRs and executes the corresponding operations of an atomic operation so that the corresponding packets are sent outside.
Type: Grant
Filed: April 26, 2007
Date of Patent: December 14, 2010
Assignee: Fujitsu Limited
Inventor: Nobutaka Imamura
-
Patent number: 7853778
Abstract: A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicating that first portion of bits in a subsequent portion of the destination register.
Type: Grant
Filed: December 20, 2001
Date of Patent: December 14, 2010
Assignee: Intel Corporation
Inventor: Patrice Roussel
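The effect of such a load-and-duplicate operation can be modeled on plain integers. This Python sketch is an illustration only: the function name and the choice of a 64-bit portion copied into a 128-bit destination are assumptions, not details from the abstract.

```python
def load_and_duplicate(source, portion_bits=64):
    """Copy the low `portion_bits` of `source` into both halves of the result.

    Models a destination register twice as wide as the loaded portion.
    """
    mask = (1 << portion_bits) - 1
    low = source & mask                   # first portion of bits of the source
    return (low << portion_bits) | low    # same bits in the subsequent portion


print(hex(load_and_duplicate(0x1122334455667788AABBCCDDEEFF0011)))
```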
-
Publication number: 20100313001
Abstract: An apparatus includes a plurality of processing modules which are connected to each other by corresponding communication units, and the modules transfer packets in a predetermined direction to execute a plurality of operations of pipeline processing. The module includes a storage unit for storing a first identification and a second identification for each of the plurality of operations, a reception unit for extracting data from a packet which has the first identification, a processing unit for processing the data extracted by the reception unit, and a transmission unit for storing the second identification corresponding to the first identification of the packet in a packet and transmitting the packet to the module arranged in the predetermined direction.
Type: Application
Filed: June 4, 2010
Publication date: December 9, 2010
Applicant: CANON KABUSHIKI KAISHA
Inventor: Hisashi Ishikawa
-
Publication number: 20100303157
Abstract: The present invention is directed to lessening the burden of resolving conflicts between overlapping processes when handling a plurality of interruption factors. On completion of data transfer to an external memory, a data transfer completion interruption of high priority is generated. In the case where data transfer of a predetermined number of packets is not completed in reception interruption, a timer interruption of low priority is generated. Before processing data in an external memory responding to the interruption, the number of transfer packets is obtained from a counter. After restart of reception, the counter stores the number of transfer restart packets. After obtaining the number of transfer packets from a counter responding to the occurrence of the timer interruption, a data transfer completion interruption is generated.
Type: Application
Filed: May 19, 2010
Publication date: December 2, 2010
Inventors: Hiroshige Abe, Isamu Mochizuki, Mika Mizutani
-
Publication number: 20100306511
Abstract: There is a need for providing a communication data processor easily adaptable to network configurations required for industrial Ethernet. The apparatus successively analyzes received packets. The apparatus uses a register to determine whether or not to transmit the received packet as transmission data to another port. Rewritable memory saves a program code that provides control for analyzing a reception packet and generating a transmission packet. The apparatus is capable of complying with various communication protocols by changing the program code.
Type: Application
Filed: May 26, 2010
Publication date: December 2, 2010
Applicant: RENESAS ELECTRONICS CORPORATION
Inventors: Yoshinori Mochizuki, Takatoshi Kato, Nobuaki Kohinata, Shigeki Taira
-
Publication number: 20100306502
Abstract: A digital signal processor uses a number of independent sub-processors that may be controlled by a master programmable controller. For example, a specialized input processor may process input signals while a specialized output processor may process output signals. Each of these processors may also accomplish math functions when input and output processing is not necessary. The various processors may communicate with one another through general purpose registers which receive data and provide data to any of the processors in the system. Math processors may be added as needed to accomplish desired mathematical functions. In addition, a RAM processor may be utilized to hold the results of intermediate calculations in one embodiment of the present invention. In this way, an adaptable and scaleable design may be implemented that accommodates a variety of different operations without requiring redesign of all the components.
Type: Application
Filed: July 30, 2010
Publication date: December 2, 2010
Inventors: David K. Vavro, James A. Mitchell
-
Patent number: 7844801
Abstract: Apparatus, system and methods are provided for performing speculative data prefetching in a chip multiprocessor (CMP). Data is prefetched by a helper thread that runs on one core of the CMP while a main program runs concurrently on another core of the CMP. Data prefetched by the helper thread is provided to the helper core. For one embodiment, the data prefetched by the helper thread is pushed to the main core. It may or may not be provided to the helper core as well. A push of prefetched data to the main core may occur during a broadcast of the data to all cores of an affinity group. For at least one other embodiment, the data prefetched by a helper thread is provided, upon request from the main core, to the main core from the helper core's local cache.
Type: Grant
Filed: July 31, 2003
Date of Patent: November 30, 2010
Assignee: Intel Corporation
Inventors: Hong Wang, Perry H. Wang, Jeffery A. Brown, Per Hammarlund, George Z. Chrysos, Doron Orenstein, Steve Shih-wei Liao, John P. Shen
-
Patent number: 7844802Abstract: Ordering instructions for specifying the execution order of other instructions improve throughput in a pipelined multiprocessor. Memory write operations local to a CPU are allowed to occur in an arbitrary order, and constraints are placed on shared memory operations. Multiple sets of instructions are provided in which the order of execution of the instructions is maintained through the use of CPU registers, write buffers in conjunction with assignment of sequence numbers to the instructions, or a hierarchical ordering system. The system ensures that an earlier designated instruction has reached a specified state of execution prior to a later instruction reaching a specified state of execution. The ordering of operations allows memory operations local to a CPU to occur in conjunction with other memory operations that are not affected by such execution.Type: GrantFiled: June 24, 2008Date of Patent: November 30, 2010Assignee: International Business Machines CorporationInventor: Paul E. McKenney
-
Publication number: 20100299130Abstract: Apparatus and method for processing information may determine whether a migration condition exists by a source information processing unit executing a program. When a migration condition is determined to exist by the source information processing unit, a destination information processing unit may determine whether an instruction to be executed of the program is a predetermined instruction. The instruction to be executed is converted by an instruction emulator, when a result of a determination by the destination information processing unit is the predetermined instruction.Type: ApplicationFiled: May 10, 2010Publication date: November 25, 2010Applicant: Sony CorporationInventors: Atsushi Mitsuzawa, Yuji Matsuyama, Toshihiko Kawai
-
Publication number: 20100293359Abstract: A clone set of General Purpose Registers (GPRs) is created to be used by a set of helper thread binaries, which is created from a set of main thread binaries. When the set of main thread binaries enters a wait state, the set of helper thread binaries uses the clone set of GPRs to continue using unused execution units within a processor core. The set of helper threads are thus able to warm up local cache memory with data that will be needed when execution of the set of main thread binaries resumes.Type: ApplicationFiled: February 1, 2008Publication date: November 18, 2010Inventors: Ravi K. Arimilli, Juan C. Rubio, Balaram Sinharoy
-
Patent number: 7835806Abstract: A controller can process an instruction directed to the controller itself to access data in the memory of the controller dynamically at runtime, where the data can be indirectly accessed by referencing a tag name, associated with the data and a memory space in memory, which can be included in a string tag associated with the instruction. Multiple tags, each tag associated with a respective item of data, can be located or referenced dynamically at runtime to access the respective items of data where one tag can be associated with a first structure, array, and/or scope and a disparate tag can be associated with a disparate structure, array, and/or scope, via an instruction.Type: GrantFiled: January 29, 2007Date of Patent: November 16, 2010Assignee: Rockwell Automation Technologies, Inc.Inventors: Ronald E. Bliss, David A. Johnston
-
Publication number: 20100287357Abstract: In one embodiment, a serial processor is configured to execute software instructions in a software program in serial. A serial memory is configured to store data for use by the serial processor in executing the software instructions in serial. A plurality of parallel processors are configured to execute software instructions in the software program in parallel. A plurality of partitioned memory modules are provided and configured to store data for use by the plurality of parallel processors in executing software instructions in parallel. Accordingly, a processor/memory structure is provided that allows serial programs to use quick local serial memories and parallel programs to use partitioned parallel memories. The system may switch between a serial mode and a parallel mode. The system may incorporate pre-fetching commands of several varieties.Type: ApplicationFiled: March 10, 2010Publication date: November 11, 2010Applicant: XMTT INC.Inventor: Uzi Y. Vishkin
-
Publication number: 20100287360Abstract: The speed of task scheduling by a multitask OS is increased. A task processor includes a CPU, a save circuit, and a task control circuit. The CPU is provided with a processing register and an execution control circuit operative to load data from a memory into a processing register and execute a task in accordance with the data in the processing register. The save circuit is provided with a plurality of save registers respectively associated with a plurality of tasks. In executing a predetermined system call, the execution control circuit notifies the task control circuit as such. The task control circuit switches between tasks for execution upon receipt of the system call signal, by saving, in the save register associated with a task being executed, the data in the processing register, selecting a task to be executed next, and loading data in the save register associated with the selected task into the processing register.Type: ApplicationFiled: August 24, 2006Publication date: November 11, 2010Inventor: Naotaka Maruyama
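The save-and-restore task switch in this abstract can be pictured with a small software model. This is only a sketch under stated assumptions: the class name `TaskProcessorModel`, the register count, and the round-robin choice of the next task are hypothetical, since the abstract does not say how the task control circuit selects the next task.

```python
class TaskProcessorModel:
    """Toy model: one processing register set plus one save register set per task."""

    def __init__(self, num_tasks, num_regs=4):
        self.processing = [0] * num_regs                       # CPU processing register
        self.save = [[0] * num_regs for _ in range(num_tasks)]  # save registers, one set per task
        self.current = 0                                        # task currently executing

    def system_call(self):
        """Model of the task control circuit reacting to the system call signal."""
        # 1. Save the processing register into the running task's save register.
        self.save[self.current] = list(self.processing)
        # 2. Select the next task (round-robin is an assumption of this sketch).
        self.current = (self.current + 1) % len(self.save)
        # 3. Load the selected task's save register into the processing register.
        self.processing = list(self.save[self.current])
        return self.current


tp = TaskProcessorModel(num_tasks=2)
tp.processing[0] = 7      # task 0 computes something
tp.system_call()          # switch to task 1
tp.processing[0] = 9      # task 1 computes something
tp.system_call()          # switch back to task 0, its state is restored
```

In the model, all three steps happen inside one call, which mirrors the point of the abstract: the switch is handled by dedicated circuitry and not by OS software running on the CPU.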
-
Patent number: 7831812Abstract: A processor includes a processor core with a core interface unit that includes an age queue and a request queue. The core interface unit receives load requests from the processor core. The request queue stores the requests in respective slots of the request queue. The age queue stores ID tags in respective age queue slots. Each ID tag in the age queue corresponds to a respective address of a load instruction in the request queue. In one embodiment, ID tags propagate through the age queue at a fixed rate of two at a time from a tail of the age queue to a head of the age queue. Arbitration control circuitry generates an enable bit vector that identifies the oldest ID tag in the age queue corresponding to the oldest load request in the request queue. The arbitration circuitry selects the identified oldest instruction in the request queue as the next to dispatch. In one embodiment, the core interface unit exhibits an input frequency that is a multiple of an internal operating frequency of the core interface unit.Type: GrantFiled: August 31, 2007Date of Patent: November 9, 2010Assignee: International Business Machines CorporationInventors: Alvan Wing Ng, Takuya Kano
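A rough software analogue of the age-queue arbitration in this abstract is sketched below. It assumes, hypothetically, that each ID tag is simply the index of a request queue slot and that the head of the age queue is list index 0; the function name `select_oldest` is also illustrative. The fixed-rate propagation of tags and the frequency relationship are not modelled.

```python
def select_oldest(age_queue, request_queue):
    """Return (enable_bits, request) for the oldest pending load request.

    age_queue:     list of slot IDs, oldest first (head at index 0).
    request_queue: list indexed by slot ID; None marks an empty slot.
    """
    for tag in age_queue:
        if request_queue[tag] is not None:
            # One-hot enable vector that marks the slot chosen for dispatch.
            enable = [slot == tag for slot in range(len(request_queue))]
            return enable, request_queue[tag]
    # Nothing pending: no slot is enabled.
    return [False] * len(request_queue), None


requests = ["load A", None, "load C", "load D"]   # slots 0..3
ages = [2, 0, 3]                                  # slot 2 holds the oldest request
enable, chosen = select_oldest(ages, requests)
```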
-
Patent number: 7831811Abstract: A virtual machine in a processing system manages type information for operands. In one embodiment, the virtual machine accomplishes the following results through execution of a single instruction: adding an operand tag to a tag stack, and updating a stack pointer for the tag stack to recognize the addition of the operand tag to the tag stack. The single instruction may be a shift instruction, for example. The tag stack may reside in a tag stack register, and each operand tag may indicate whether a corresponding operand on an operand stack is to be treated as a reference operand or a non-reference operand. Other embodiments are described and claimed.Type: GrantFiled: October 31, 2005Date of Patent: November 9, 2010Assignee: Intel CorporationInventors: Jinzhan Peng, Gansha Wu, Peng Guo, Xin Zhou, Zhiwei Ying
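One way to see why a single shift can both add a tag and move the stack pointer is to keep the tag stack in one integer, with the top of the stack in bit 0. The sketch below assumes a 32-bit tag stack register and a tag value of 1 for reference operands; both choices, and the function names, are illustrative and not taken from the patent.

```python
TAG_REGISTER_MASK = 0xFFFFFFFF  # assumed 32-bit tag stack register


def push_tag(tag_reg, is_reference):
    """Shift the register left by one and place the new tag in bit 0.

    The shift makes room for the new tag and implicitly advances the
    top of the stack, so no separate pointer update is needed.
    """
    return ((tag_reg << 1) | int(is_reference)) & TAG_REGISTER_MASK


def pop_tag(tag_reg):
    """Return (new_register, is_reference) for the tag on top of the stack."""
    return tag_reg >> 1, bool(tag_reg & 1)


reg = 0
reg = push_tag(reg, True)    # operand 1 is a reference
reg = push_tag(reg, False)   # operand 2 is a non-reference
reg, top_is_ref = pop_tag(reg)
```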
-
Publication number: 20100281236Abstract: An apparatus for processing data may include an array of processing elements (such as an n×m or n×n array of processing elements) configured to simultaneously perform operations on a plurality of data elements using a single instruction. Each processing element in the array may be configured to transfer data directly to at least one neighboring processing element within the array. In selected embodiments, the apparatus may include exchange registers to temporarily store data transferred between neighboring processing elements.Type: ApplicationFiled: April 30, 2009Publication date: November 4, 2010Applicant: Novafora, Inc.Inventors: Shlomo Selim Rakib, Muhammad Ahmed, Marc Schaub
-
Patent number: 7827389Abstract: A method, system, and computer program product are provided for enhancing the execution of independent loads in a processing unit. The processing unit dispatches a first set of instructions in order from a first buffer for execution. The processing unit receives updated results from the execution of the first set of instructions. The processing unit updates, in a first register, at least one register entry associated with each instruction in the first set of instructions, with the updated results. The processing unit determines if the first set of instructions from the first buffer have completed execution. Responsive to the completed execution of the first set of instructions from the first buffer, the processing unit copies the set of entries from the first register to a second register.Type: GrantFiled: June 15, 2007Date of Patent: November 2, 2010Assignee: International Business Machines CorporationInventors: Hung Q. Le, Dung Q. Nguyen
-
Patent number: 7827320Abstract: A Serial Advanced Technology Attachment (SATA) device for communicating with a host is disclosed. The SATA device comprises control circuitry which enters a XRDY state in preparation for sending data to the host, receives a first XRDY from the host while in the XRDY state, and sets a RXRDY flag. After receiving the first XRDY, the control circuitry receives a RRDY from the host while in the XRDY state, transmits a data block to the host in response to the RRDY, and enters an idle state after transmitting the data block to the host. If the RXRDY flag is set while in the idle state, the control circuitry waits for the host to transmit a second XRDY.Type: GrantFiled: March 28, 2008Date of Patent: November 2, 2010Assignee: Western Digital Technologies, Inc.Inventor: Curtis E. Stevens
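The handshake sequence in this abstract reads naturally as a small state machine. The following is a minimal sketch: the class name, the string state labels, and the `WAIT_HOST_XRDY` state are assumptions made for illustration, and the behaviour after the host's second XRDY arrives is left out because the abstract does not describe it.

```python
class SataDeviceModel:
    """Toy model of the device-side control circuitry."""

    def __init__(self, data_block):
        self.state = "XRDY"          # device is ready to send data to the host
        self.rxrdy_flag = False      # set when the host also sends XRDY
        self.data_block = data_block
        self.sent = []               # blocks transmitted to the host

    def receive(self, primitive):
        """Handle a primitive received from the host."""
        if self.state == "XRDY" and primitive == "XRDY":
            self.rxrdy_flag = True                 # host wanted to transmit as well
        elif self.state == "XRDY" and primitive == "RRDY":
            self.sent.append(self.data_block)      # host is ready: transmit the block
            self.state = "IDLE"
        return self.state

    def idle_step(self):
        """In idle, wait for the host's second XRDY if the flag was set."""
        if self.state == "IDLE" and self.rxrdy_flag:
            self.state = "WAIT_HOST_XRDY"
        return self.state


dev = SataDeviceModel(b"payload")
dev.receive("XRDY")   # first XRDY from the host sets the flag
dev.receive("RRDY")   # device transmits, then goes idle
dev.idle_step()       # flag is set, so the device waits for the host
```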
-
Patent number: 7827390Abstract: A microprocessor includes a private RAM (PRAM), for use by microcode, which is non-user-accessible and within its own distinct address space from the system memory address space. The PRAM is denser and slower than user-accessible registers of the microprocessor macroarchitecture, thereby enabling it to provide significantly more storage for microcode. The microinstruction set includes a microinstruction for loading data from the PRAM into the user-accessible registers, and a microinstruction for storing data from user-accessible registers to the PRAM. The microcode may also use the two microinstructions to load/store between the PRAM and non-user-accessible registers of the microarchitecture.Type: GrantFiled: February 20, 2008Date of Patent: November 2, 2010Assignee: VIA Technologies, Inc.Inventors: G. Glenn Henry, Colin Eddy, Rodney E. Hooker, Terry Parks
-
Publication number: 20100274997Abstract: Methods, apparatus, and computer program products are disclosed for executing a gather operation on a parallel computer according to embodiments of the present invention. Embodiments include configuring, by the logical root, a result buffer on the logical root, the result buffer having positions, each position corresponding to a ranked node in the operational group and for storing contribution data gathered from that ranked node.Type: ApplicationFiled: May 29, 2007Publication date: October 28, 2010Inventors: Charles J. Archer, Joseph D. Ratterman
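The result-buffer layout in this abstract, one position per ranked node, can be illustrated in a few lines. The sketch assumes ranks are the integers 0 to n-1 and passes contributions in as a dictionary; the function name `gather_at_root` is hypothetical, and the network transfer itself is not modelled.

```python
def gather_at_root(contributions):
    """Place each node's contribution at the buffer position matching its rank.

    contributions: dict mapping rank (0..n-1) to that node's contribution data.
    """
    result_buffer = [None] * len(contributions)   # one position per ranked node
    for rank, data in contributions.items():
        result_buffer[rank] = data                # position index equals the rank
    return result_buffer


# Contributions may arrive in any order; the buffer ends up ordered by rank.
buffer = gather_at_root({2: "c", 0: "a", 1: "b"})
```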
-
Patent number: 7822946Abstract: A computing and communication chip architecture is provided wherein the interfaces of processor access to the memory chips are implemented as a high-speed packet switched serial interface as part of each chip. In one embodiment, the interface is accomplished through a gigabit Ethernet interface provided by a protocol processor integrated as part of the chip. The protocol processor encapsulates the memory address and control information, such as Read, Write, and the number of successive bytes, as an Ethernet packet for communication among the processor and memory chips that are located on the same motherboard, or even on different circuit cards. In one embodiment, the communication overhead of the Ethernet protocol is further reduced by using an enhanced Ethernet protocol with shortened data frames within a constrained neighborhood, and/or by utilizing a bit stream switch where direct connection paths can be established between elements that comprise the computing or communication architecture.Type: GrantFiled: February 4, 2008Date of Patent: October 26, 2010Assignee: PSIMAST, IncInventor: Viswa Sharma