Processing Control For Data Transfer Patents (Class 712/225)
-
Publication number: 20100115221
Abstract: A system and method are described for a memory management processor which, using a table of reference addresses embedded in the object code, can open the appropriate memory pages to expedite the retrieval of information from memory referenced by instructions in the execution pipeline. A suitable compiler parses the source code and collects references to branch addresses, calls to other routines, or data references, and creates reference tables listing the addresses for these references at the beginning of each routine. These tables are received by the memory management processor as the instructions of the routine are beginning to be loaded into the execution pipeline, so that the memory management processor can begin opening memory pages where the referenced information is stored. Opening the memory pages where the referenced information is located before the instructions reach the instruction processor helps lessen memory latency delays which can greatly impede processing performance.
Type: Application
Filed: January 11, 2010
Publication date: May 6, 2010
Inventor: Dean A. Klein
-
MANAGING AN OUT-OF-ORDER ASYNCHRONOUS HETEROGENEOUS REMOTE DIRECT MEMORY ACCESS (RDMA) MESSAGE QUEUE
Publication number: 20100106948
Abstract: A system and method operable to manage a message queue is provided. This management may involve out-of-order asynchronous heterogeneous remote direct memory access (RDMA) to the message queue. This system includes a pair of processing devices, a primary processing device and an additional processing device, a memory in storage location and a data bus coupled to the processing devices. The processing devices cooperate to process queue data within a shared message queue wherein when an individual processing device successfully accesses queue data the queue data is locked for the exclusive use of the processing device. When the processing device acquires the queue data, the queue data is locked and the queue data acquired by the acquiring processing device includes the queue data for both the primary processing device and additional processing device such that the processing device has all queue data necessary to process the data and return processed queue data.
Type: Application
Filed: October 24, 2008
Publication date: April 29, 2010
Inventors: Gregory Howard Bellows, Jason N. Dale
-
Patent number: 7707151
Abstract: One aspect is directed to a method for performing data migration from a first volume to a second volume while allowing a write operation to be performed on the first volume during the act of migrating. Another aspect is a method and apparatus that stores, in a persistent manner, state information indicating a portion of the first volume successfully copied to the second volume. Another aspect is a method and apparatus for migrating data from a first volume to a second volume, and resuming, after an interruption of the migration, copying data from the first volume to the second volume without starting from the beginning of the data. Another aspect is a method and apparatus for migrating data from a first volume to a second volume, receiving an access request directed to the first volume from an application that stores data on the first volume, and redirecting the access request to the second volume without having to reconfigure the application that accesses data on the first volume.
Type: Grant
Filed: January 29, 2003
Date of Patent: April 27, 2010
Assignee: EMC Corporation
Inventors: Steven M. Blumenau, Stephen J. Todd
-
Patent number: 7707392
Abstract: An information processing system includes a first processor that accesses a first memory, a second processor that accesses a second memory, and a data transfer unit for executing data transfer between the first memory and the second memory. The first processor executes functions of translating an instruction out of instructions included in the program except a memory access instruction into an instruction for the second processor and translating the memory access instruction into an instruction sequence containing a call instruction of the program to transfer the access data on the first memory to the second memory via a data transfer unit.
Type: Grant
Filed: March 13, 2008
Date of Patent: April 27, 2010
Assignee: Kabushiki Kaisha Toshiba
Inventors: Seiji Maeda, Hidenori Matsuzaki, Yusuke Shirota, Kazuya Kitsunai
-
Patent number: 7707388
Abstract: In one embodiment, a serial processor is configured to execute software instructions in a software program in serial. A serial memory is configured to store data for use by the serial processor in executing the software instructions in serial. A plurality of parallel processors are configured to execute software instructions in the software program in parallel. A plurality of partitioned memory modules are provided and configured to store data for use by the plurality of parallel processors in executing software instructions in parallel. Accordingly, a processor/memory structure is provided that allows serial programs to use quick local serial memories and parallel programs to use partitioned parallel memories. The system may switch between a serial mode and a parallel mode. The system may incorporate pre-fetching commands of several varieties.
Type: Grant
Filed: November 29, 2006
Date of Patent: April 27, 2010
Assignee: XMTT Inc.
Inventor: Uzi Vishkin
-
Patent number: 7698538
Abstract: A method and apparatus are provided for downloading a program by using hand-shaking in a digital signal processor (DSP), in which the program stored at an external memory is downloaded to an internal memory by using the hand-shaking in an asynchronous system having a dual CPU, wherein current operation of the digital signal processor is temporarily held to shorten a downloading time.
Type: Grant
Filed: January 9, 2003
Date of Patent: April 13, 2010
Assignee: Samsung Electronics Co., Ltd.
Inventor: Seong-Ho Yoon
-
Patent number: 7694084
Abstract: A microcomputer architecture comprises a microprocessor unit and a first memory unit, the microprocessor unit comprising a functional unit and at least one data register, the functional unit and the at least one data register being linked to a data bus internal to the microprocessor unit. The data register is a wide register comprising a plurality of second memory units, each capable of containing one word. The wide register is adapted so that the second memory units are simultaneously accessible by the first memory unit, and so that at least part of the second memory units are separately accessible by the functional unit.
Type: Grant
Filed: March 10, 2006
Date of Patent: April 6, 2010
Assignee: IMEC
Inventors: Praveen Raghavan, Francky Catthoor
-
Patent number: 7689779
Abstract: Access to a memory area by a first processor that executes a first processor program and a second processor that executes a second processor program is granted to one of the first processor and the second processor at a time. Access to the memory area by the first processor and the second processor are cyclically uniquely allocated (e.g., t?[(ad mod m)=o]) between the first and the second processor by the first and second processor programs.
Type: Grant
Filed: August 14, 2006
Date of Patent: March 30, 2010
Assignee: Micronas GmbH
Inventors: Matthias Vierthaler, Carsten Noeske
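The modulo condition in the abstract above can be illustrated with a small sketch. This is only one possible reading, not the patented method: it assumes `ad` is a memory address, `m` is the length of the allocation cycle, and `o` is an offset assigned to each processor. The function names are hypothetical.

```python
def owner_offset(ad: int, m: int) -> int:
    """Return the offset o of the processor that owns address ad.

    Assumption: ownership is decided purely by (ad mod m) == o.
    """
    if m <= 0:
        raise ValueError("m must be positive")
    return ad % m


def may_access(ad: int, m: int, o: int) -> bool:
    """True if the processor with offset o is the unique owner of ad."""
    return owner_offset(ad, m) == o


# Two processors (m = 2): offset 0 owns even addresses, offset 1 owns odd ones.
schedule = {ad: owner_offset(ad, 2) for ad in range(6)}
```

Because every address maps to exactly one offset, no two processors are ever granted the same location, which is the "uniquely allocated" property the abstract describes.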
-
Patent number: 7689806
Abstract: A method and system to indicate which page within a software-managed page table triggers an exception within a microprocessor, such as, for example, a digital signal processor, wherein a software-managed translation lookaside buffer (TLB) module receives a virtual address produced by an instruction within a Very Long Instruction Word (VLIW) packet, such as, for example, a fetch instruction, and further compares the virtual address to each stored TLB entry. If a match exists, then the TLB module outputs a corresponding mapped physical address for the instruction. Otherwise, if the VLIW packet spans two pages, where a first page is present as a TLB entry within the TLB module and the second page is missing from the stored TLB entries, an indication bit within a data field of a control register is set to identify the TLB miss exception to a software management unit.
Type: Grant
Filed: July 14, 2006
Date of Patent: March 30, 2010
Assignee: Q
Inventors: Lucian Codrescu, Erich Plondke, Muhammad Ahmed, Vijaya Kumar Janjanam
-
Publication number: 20100070742
Abstract: An embedded-DRAM processor architecture includes a DRAM array, a set of register files, a set of functional units, and a data assembly unit. The data assembly unit includes a set of row-address registers and is responsive to commands to activate and deactivate DRAM rows and to control the movement of data throughout the system. A pipelined data assembly approach allows the functional units to perform register-to-register operations, and allows the data assembly unit to perform all load/store operations using wide data busses. Data masking and switching hardware allows individual data words or groups of words to be transferred between the registers and memory. Other aspects of the disclosure include a memory and logic structure and an associated method to extract data blocks from memory to accelerate, for example, operations related to image compression and decompression.
Type: Application
Filed: November 20, 2009
Publication date: March 18, 2010
Applicant: Micron Technology, Inc.
Inventor: Eric M. Dowling
-
Patent number: 7680962
Abstract: An array type processor comprises a data path unit to execute processing, and a state management unit to control the state of the data path unit in accordance with a command that specifies processing on the data. An input DMA circuit reads from a memory information and data to be processed including a command corresponding to the data. The input DMA circuit first transfers the command to the state management unit, and then transfers the data to be processed to the data path unit.
Type: Grant
Filed: December 21, 2005
Date of Patent: March 16, 2010
Assignee: NEC Electronics Corporation
Inventors: Kenichiro Anjo, Katsumi Togawa, Ryoko Sasaki, Taro Fujii, Masato Motomura
-
Patent number: 7680964
Abstract: A method for improving timing behavior of a processing unit in a multithreading environment is disclosed, wherein the processing unit generates data frames for an output unit by combining data from a plurality of input units, and the processed data are buffered in an output buffer between the processing unit and the output unit. The method comprises sending from the output unit to the processing unit a value corresponding to the filling of the output buffer, calculating a timer value, setting a timer with the timer value, wherein the timer calls the processing unit thread after the specified time. The timer value depends on the value corresponding to the averaged filling of the output buffer. As a result, the average filling of the output buffer is lower compared to conventional thread management, and thus the system is more flexible and reacts quicker.
Type: Grant
Filed: May 26, 2005
Date of Patent: March 16, 2010
Assignee: Thomson Licensing
Inventor: Jürgen Schmidt
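The dependence of the timer value on the averaged buffer filling, described in the abstract above, might look roughly like the following sketch. The abstract does not say how the average or the timer value is computed, so the exponential moving average, the linear scaling rule, and all names here (`FillTimer`, `target_fill`, `alpha`) are assumptions made for illustration.

```python
class FillTimer:
    """Derive the next wake-up delay from the averaged output-buffer fill."""

    def __init__(self, target_fill: float, frame_period_ms: float, alpha: float = 0.25):
        self.target_fill = target_fill          # desired average number of buffered frames
        self.frame_period_ms = frame_period_ms  # nominal time between frames
        self.alpha = alpha                      # smoothing factor (assumed)
        self.avg_fill = float(target_fill)      # start at the target level

    def next_timer_ms(self, reported_fill: float) -> float:
        # Update the running average with the fill level reported by the output unit.
        self.avg_fill += self.alpha * (reported_fill - self.avg_fill)
        # A fuller buffer lets the processing thread sleep longer;
        # an emptier buffer wakes it sooner.
        scale = self.avg_fill / self.target_fill
        return max(0.0, self.frame_period_ms * scale)


timer = FillTimer(target_fill=4, frame_period_ms=10.0, alpha=0.5)
first = timer.next_timer_ms(8)   # buffer fuller than target
second = timer.next_timer_ms(0)  # buffer has drained
```

Averaging smooths out single noisy readings, so the wake-up interval changes gradually instead of jumping with every report.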
-
Patent number: 7681018
Abstract: A parallel hardware-based multithreaded processor is described. The processor includes a general purpose processor that coordinates system functions and a plurality of microengines that support multiple hardware threads or contexts. The processor also includes a memory control system that has a first memory controller that sorts memory references based on whether the memory references are directed to an even bank or an odd bank of memory and a second memory controller that optimizes memory references based upon whether the memory references are read references or write references. Instructions for switching and branching based on executing contexts are also disclosed.
Type: Grant
Filed: January 12, 2001
Date of Patent: March 16, 2010
Assignee: Intel Corporation
Inventors: Gilbert Wolrich, Matthew J. Adiletta, William Wheeler
-
Patent number: 7676661
Abstract: A fast linked multiprocessor network including a plurality of processing modules implemented on a field programmable gate array and a plurality of configurable uni-directional links coupled among at least two of the plurality of processing modules provide a streaming communication channel between at least two of the plurality of processing modules. Such configuration provides a function accelerator that can feed at least one processor with data values using one custom instruction to put data values on at least one uni-directional serial link and that can extract data values from at least one processor using one custom instruction to get data values from the at least one uni-directional serial link.
Type: Grant
Filed: October 5, 2004
Date of Patent: March 9, 2010
Assignee: Xilinx, Inc.
Inventors: Sundararajarao Mohan, Satish R. Ganesan, Goran Bilski
-
Patent number: 7676646
Abstract: A Wide Register Set (WRS) is used in a packet processor to increase performance for certain packet processing operations. The registers in the WRS have wider bit lengths than the main registers used for primary packet processing operations. A wide logic unit is configured to conduct logic operations on the wide register set and in one implementation includes hardware primitives specifically configured for packet scheduling operations. A special interlocking mechanism is additionally used to coordinate accesses among multiple processors or threads to the same wide register address locations. The WRS produces a scheduling engine that is much cheaper than previous hardware solutions with higher performance than previous software solutions. The WRS provides a small, compact, flexible, and scalable scheduling sub-system and can tolerate long memory latencies by using cheaper memory while sharing memory with other uses.
Type: Grant
Filed: March 2, 2005
Date of Patent: March 9, 2010
Assignee: Cisco Technology, Inc.
Inventor: Earl T. Cohen
-
Patent number: 7673121
Abstract: A method for the transmission of digital messages by the output terminals of a monitoring circuit which is integrated into a microprocessor, the digital messages being representative of first specific events which are dependent on the execution of a series of instructions by the microprocessor.
Type: Grant
Filed: November 14, 2002
Date of Patent: March 2, 2010
Assignee: STMicroelectronics S.A.
Inventors: Catherine Robert, Xavier Robert, Jehan-Philippe Barbiero
-
Patent number: 7672305
Abstract: Port input sections generate, when the head flits of a packet are stored in the first and second registers, first and second mediation request signals destined for a desired request destination, and further generate a first notification signal used to notify the presence or absence of the first mediation request signal destined for any request destination. Upon reception of a mediation result signal, the port input sections output the flit from the first register and sequentially forward flits to be stored in the first register and the second register, and the port output sections sequentially output the flit outputted from the first register of any one of the port input sections to the node.
Type: Grant
Filed: October 2, 2006
Date of Patent: March 2, 2010
Assignee: NEC Corporation
Inventor: Yoshihisa Yamada
-
Publication number: 20100049953
Abstract: A microprocessor includes an N-way cache and a logic block that selectively enables and disables the N-way cache for at least one clock cycle if a first register load instruction and a second register load instruction, following the first register load instruction, are detected as pointing to the same index line in which the requested data is stored. The logic block further provides a disabling signal to the N-way cache for at least one clock cycle if the first and second instructions are detected as pointing to the same cache way.
Type: Application
Filed: August 20, 2008
Publication date: February 25, 2010
Applicant: MIPS Technologies, Inc.
Inventors: Ajit Karthik Mylavarapu, Sanjai Balakrishnan Athi
-
Patent number: 7669041
Abstract: A processor having a zero-overhead operand copy capability. The processor includes multiple execution units to execute instructions in parallel and multiple register files each associated with one or more of the execution units. The processor further includes circuitry to select either an instruction execution result from a first one of the execution units or content of a register within a first one of the register files associated with the first one of the execution units to be stored within a register within a second one of the register files.
Type: Grant
Filed: April 30, 2007
Date of Patent: February 23, 2010
Assignee: Stream Processors, Inc.
Inventors: Brucek Khailany, Ujval J. Kapasi
-
Patent number: 7669040
Abstract: A system that executes a long transaction in a system with limited transactional hardware resources. During operation, the system executes the long transaction in a non-transactional mode, which does not use transactional hardware resources. The system defers stores generated during the long transaction so that the stores are not committed to the architectural state of a processor until the transaction is successfully completed. If the long transaction successfully completes, the system commits the long transaction, which involves performing multiple hardware transactions to commit the deferred stores to the architectural state of the processor.
Type: Grant
Filed: December 15, 2006
Date of Patent: February 23, 2010
Assignee: Sun Microsystems, Inc.
Inventor: David Dice
-
Patent number: 7660970
Abstract: Disclosed is a data processing system and method. The data processing method determines the number of static registers and the number of rotating registers for assigning a register to a variable contained in a certain program, assigns the register to the variable based on the number of the static registers and the number of the rotating registers, and compiles the program. Further, the method stores in the special register a value corresponding to the number of the rotating registers in the compiling operation, and obtains a physical address from a logical address of the register based on the value. Accordingly, the present invention provides an aspect of efficiently using register files by dynamically controlling the number of rotating registers and the number of static registers for a software pipelined loop, and has the effect of minimizing the generation of unnecessary spill/fill code during program execution.
Type: Grant
Filed: August 21, 2006
Date of Patent: February 9, 2010
Assignee: Samsung Electronics Co., Ltd.
Inventors: Suk-jin Kim, Jeong-wook Kim, Hong-seok Kim, Soo-jung Ryu
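The logical-to-physical register mapping mentioned in the abstract above can be sketched as follows. The abstract only states that the physical address is obtained from the logical address and a stored value corresponding to the number of rotating registers; the register-file layout (static registers first, rotating registers after them) and the `rotation_base` parameter are assumptions borrowed from common rotating-register designs, and every name is hypothetical.

```python
def physical_register(logical: int, num_static: int, num_rotating: int,
                      rotation_base: int = 0) -> int:
    """Map a logical register number to a physical one.

    Assumed layout: registers [0, num_static) are static and map to
    themselves; the next num_rotating registers rotate by rotation_base.
    """
    total = num_static + num_rotating
    if not 0 <= logical < total:
        raise ValueError("logical register out of range")
    if logical < num_static:
        return logical  # static registers are not renamed
    # Rotating registers wrap around within their own partition.
    offset = (logical - num_static + rotation_base) % num_rotating
    return num_static + offset


# With 4 static and 4 rotating registers, logical r5 moves as the base advances.
mapping = [physical_register(5, 4, 4, base) for base in range(4)]
```

Changing `num_static` and `num_rotating` per loop, as the abstract proposes, only moves the boundary between the two partitions; the mapping rule itself stays the same.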
-
Patent number: 7660967
Abstract: A computer processor is responsive to successive processing instructions in an issue order to process regular vectors to generate a result vector without use of a cache. At least two architectural registers having input-vector capability are selectively coupled to memory to receive corresponding vector-elements of two vectors and transfer the vector-elements to a selected functional unit. At least one architectural register having output capability is selectively coupled to an output, which in turn is coupled to transfer result vector-elements to the memory. The functional unit performs a function on the vector-elements to generate a respective result-element. The result-elements are transferred to a selected architectural register for processing as operands in performance of further functions by a functional unit, or are transferred to the output for transfer to memory. In either case, the order of the result vector-elements is restored to the issue order of the successive processing instructions.
Type: Grant
Filed: January 30, 2008
Date of Patent: February 9, 2010
Assignee: Efficient Memory Technology
Inventor: Maurice L. Hutson
-
Patent number: 7656409
Abstract: In a many core system, receiving a call to a graphics driver; translating the call into a command executable on a core of the many core system; and executing the translated call on the core.
Type: Grant
Filed: December 23, 2005
Date of Patent: February 2, 2010
Assignee: Intel Corporation
Inventors: Lyle Cool, Yasser Rasheed
-
Publication number: 20100017453
Abstract: A programmable signal processing circuit has an instruction processing circuit (23, 24, 26), which has an instruction set that comprises a demapping instruction. The instruction processing circuit (23, 24, 26) has an operand input (30a) for receiving a complex number operand of the demapping instruction from a register file (22) and a result output (34) for writing a demapping result of the demapping instruction to the register file (22). The instruction processing circuit (23, 24, 26) determines at least four bit metrics in response to the demapping instruction, each indicating a relative position of the complex number relative to a respective border line in a complex plane. The instruction processing circuit (23, 24, 26) writes a combination of the at least four bit metrics together to the result output (34) in the demapping result.
Type: Application
Filed: December 13, 2005
Publication date: January 21, 2010
Applicant: KONINKLIJKE PHILIPS ELECTRONICS, N.V.
Inventors: Ingolf Held, Marcus M.G. Quax, Paulus W.F. Gruijters
-
Patent number: 7650488
Abstract: In an embodiment, a method is provided that may include providing a first address space exclusively and coherently accessible by a first processor core partition in a platform. A second address space may be provided in this embodiment that is exclusively and coherently accessible by a second processor core partition in the platform. Also in this embodiment, a third address space in the platform may be provided that is accessible, at least in part, by both the first and second processor core partitions and may be to permit communication between the first and second processor core partitions of at least one packet and at least one descriptor associated with the at least one packet. The at least one descriptor may indicate, at least in part, one or more locations in the third address space to store, at least in part, the at least one packet. Of course, many alternatives, modifications, and variations are possible without departing from this embodiment.
Type: Grant
Filed: June 18, 2008
Date of Patent: January 19, 2010
Assignee: Intel Corporation
Inventors: Annie Foong, Bryan E. Veal, Arun Raghunath
-
Patent number: 7650605
Abstract: A multi-streaming processor has a plurality of streams for streaming one or more instruction threads, a set of functional resources for processing instructions from streams, and a lock mechanism for locking selected memory locations shared by streams of the processor, the hardware-lock mechanism operating to set a lock when an atomic memory sequence is started and to clear a lock when an atomic memory sequence is completed. In preferred embodiments the lock mechanism comprises one or more storage locations associated with each stream of the processor, each storage location enabled to store a memory address, a lock bit, and a stall bit. Methods for practicing the invention using the apparatus are also taught.
Type: Grant
Filed: February 20, 2007
Date of Patent: January 19, 2010
Assignee: MIPS Technologies, Inc.
Inventors: Stephen Melvin, Mario D. Nemirovsky
-
Publication number: 20100005257
Abstract: A storage device includes a first storage unit that stores data read from a recording medium based on an instruction received from a processing device, and transmitting the data stored in the first storage unit to the processing device. The storage device also includes a second storage unit that stores the instruction received from the processing device; a counter that counts the number of pieces of data stored in the first storage unit; and a control unit that transmits the data stored in the first storage unit to the processing device based on a count value of the counter and, when the data read upon the instruction is stored in the first storage unit, writes identification information indicating that storing data has been completed in the second storage unit and, based on the identification information, transmits the data stored in the first storage unit to the processing device.
Type: Application
Filed: June 8, 2009
Publication date: January 7, 2010
Applicant: FUJITSU LIMITED
Inventors: Masaaki Tamura, Gen Ohshima
-
Publication number: 20100005316
Abstract: Method, system, and computer program product embodiments for performing a branch trace operation on a computer system of an end user are provided. An encrypted mapping macro is provided to the end user to be made operational on the computer system. A trace program is provided to the end user. The end user executes the trace program on the computer system as a diagnostic tool. The trace program is adapted for decrypting the encrypted mapping macro, determining a storage offset location of a branch instruction; checking the storage offset location for an identifying constant, cross referencing the identifying constant with an entry in the decrypted mapping macro to identify a branch triggering bit and diagnostic information associated with the branch instruction, and returning the branch triggering bit and diagnostic information, the branch triggering bit and diagnostic information provided to a coder.
Type: Application
Filed: July 7, 2008
Publication date: January 7, 2010
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: David Bruce LeGENDRE, David Charles REED, Max Douglas SMITH
-
Publication number: 20100005279
Abstract: The data processor executes an instruction having a direction for write to a reference register of other instruction flow and an instruction having a direction for reference register invalidation. The data processor is arranged as a data processor having typical functions as an integrated whole of processors (CPU1 and CPU2) which execute simple instruction flows. When executing the instruction having a direction for write to a reference register of other instruction flow, the processor confirms whether a write register is invalid. The processor waits for the register to be made invalid, if the register is not invalid, and performs write if the register is invalid. After having executed the instruction having a direction for reference register invalidation, the processor invalidates the register to which a reference has been made. When the reference register is invalid, execution of the referring instruction is suspended until it is made valid.
Type: Application
Filed: September 14, 2009
Publication date: January 7, 2010
Inventor: Fumio Arakawa
-
Publication number: 20090327667
Abstract: Systems and methods to perform fast rotation operations are disclosed. In a particular embodiment, a method includes executing a single instruction. The method includes receiving first data indicating a first coordinate and a second coordinate, receiving a first control value that indicates a first rotation value selected from a set of ninety degree multiples, and writing output data corresponding to the first data rotated by the first rotation value.
Type: Application
Filed: June 26, 2008
Publication date: December 31, 2009
Applicant: QUALCOMM INCORPORATED
Inventors: Shankar Krithivasan, Erich James Plondke, Lucian Codrescu, Mao Zeng, Remi Jonathan Gurski
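Rotation by a multiple of ninety degrees needs no multiplication, only swaps and sign changes, which is presumably why it suits a single instruction. The following software sketch shows the arithmetic; the counter-clockwise direction, the encoding of the control value as an integer `k`, and the name `rotate90` are assumptions, not details taken from the application.

```python
def rotate90(x: int, y: int, k: int) -> tuple[int, int]:
    """Rotate the point (x, y) counter-clockwise by k * 90 degrees."""
    k %= 4  # four quarter turns return to the start
    if k == 0:
        return (x, y)
    if k == 1:
        return (-y, x)   # 90 degrees
    if k == 2:
        return (-x, -y)  # 180 degrees
    return (y, -x)       # 270 degrees


quarter_turn = rotate90(3, 4, 1)
```

Treating (x, y) as the complex number x + iy, the four cases correspond to multiplying by 1, i, -1, and -i.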
-
Publication number: 20090327693
Abstract: A network task offload apparatus includes an offload circuit and a buffer scheduler. The offload circuit performs corresponding network task processing on a plurality of packets in parallel according to an offload command. The buffer scheduler includes a buffer control unit and a plurality of buffer units. The plurality of buffer units are controlled by the buffer control unit and are scheduled to store the processed packets.
Type: Application
Filed: June 24, 2009
Publication date: December 31, 2009
Inventors: Li-Han Liang, Tao-Chun Wang, Kuo-Nan Yang, Shieh-Hsing Kuo
-
Publication number: 20090327668
Abstract: Tools and techniques are described for multi-threaded processing for opening and saving documents. These tools may provide load processes for reading documents from storage devices, and for loading the documents into applications. These tools may spawn a load process thread for executing a given load process on a first processing unit, and an application thread may execute a given application on a second processing unit. A first pipeline may be created for executing the load process thread, with the first pipeline performing tasks associated with loading the document into the application. A second pipeline may be created for executing the application process thread, with the second pipeline performing tasks associated with operating on the documents. The tasks in the first pipeline are configured to pass tokens as input to the tasks in the second pipeline.
Type: Application
Filed: June 27, 2008
Publication date: December 31, 2009
Applicant: MICROSOFT CORPORATION
Inventors: Uladzislau Sudzilouski, Igor Zaika
-
Publication number: 20090327666
Abstract: A method for managing data, including obtaining a first instruction for moving a first data item from a first source to a first destination, determining a data type of the first data item, determining a data type supported by the first destination, comparing the data type of the first data item with the data type supported by the first destination to test a validity of the first instruction, and moving the first data item from the first source to the first destination based on the validity of the first instruction.
Type: Application
Filed: June 25, 2008
Publication date: December 31, 2009
Applicant: SUN MICROSYSTEMS, INC.
Inventors: Mario I. Wolczko, Gregory M. Wright, Matthew L. Seidl
-
Patent number: 7640414
Abstract: Methods, systems, and computer program products for forwarding store data to loads in a pipelined processor are provided. In one implementation, a processor is provided that includes a decoder operable to decode an instruction, and a plurality of execution units operable to respectively execute a decoded instruction from the decoder. The plurality of execution units include a load/store execution unit operable to execute decoded load instructions and decoded store instructions and generate corresponding load memory operations and store memory operations. The store queue is operable to buffer one or more store memory operations prior to the one or more memory operations being completed, and the store queue is operable to forward store data of the one or more store memory operations buffered in the store queue to a load memory operation on a byte-by-byte basis.
Type: Grant
Filed: November 16, 2006
Date of Patent: December 29, 2009
Assignee: International Business Machines Corporation
Inventors: Jason Alan Cox, Kevin Chih Kang Lin, Eric Francis Robinson
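Byte-by-byte store-to-load forwarding, as described in the abstract above, can be modeled in software along these lines. This is a behavioral sketch only: the store queue is assumed to be a list of `(address, data)` pairs ordered oldest to youngest, memory is a dictionary of byte values, and the name `forward_load` is hypothetical.

```python
def forward_load(addr: int, size: int,
                 store_queue: list[tuple[int, bytes]],
                 memory: dict[int, int]) -> bytes:
    """Assemble a load result, taking each byte from the youngest
    buffered store that covers it, else from memory."""
    result = bytearray()
    for byte_addr in range(addr, addr + size):
        value = memory.get(byte_addr, 0)  # fallback: committed memory
        # Walk oldest to youngest so the youngest matching store wins.
        for st_addr, st_data in store_queue:
            if st_addr <= byte_addr < st_addr + len(st_data):
                value = st_data[byte_addr - st_addr]
        result.append(value)
    return bytes(result)


memory = {0x10: 0xAA, 0x11: 0xBB, 0x12: 0xCC, 0x13: 0xDD}
store_queue = [(0x11, b"\x01\x02"), (0x12, b"\x09")]  # oldest first
loaded = forward_load(0x10, 4, store_queue, memory)
```

In this example one load receives bytes from three places (memory and two different stores), which is the case that whole-word forwarding cannot handle without stalling.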
-
Publication number: 20090319760
Abstract: An architecture for implementing an instruction pipeline within a CPU comprises an arithmetic logic unit (ALU), an address arithmetic unit (AAU), a program counter (PC), a read-only memory (ROM) coupled to the program counter, to an instruction register, and to an instruction decoder coupled to the arithmetic logic unit. A random access memory (RAM) is coupled to the instruction decoder, to the arithmetic logic unit, and to a RAM address register.
Type: Application
Filed: August 27, 2009
Publication date: December 24, 2009
Inventors: Benjamin F. Froemming, Emil Lambrache
-
Publication number: 20090313459
Abstract: A system, method and article of manufacture are disclosed for processing Low Density Parity Check (LDPC) codes. The system comprises a multitude of processing units for processing the codes; and a processor chip including an on-chip, multi-port data cache for temporarily storing the LDPC codes. This data cache includes a plurality of input ports for receiving the LDPC codes from some of the processing units, and a plurality of output ports for sending the LDPC codes to others of the processing units. An off-chip, external memory stores the LDPC codes and transmits the LDPC codes to and receives the LDPC codes from at least some of the processing units. A sequence processor controls the transmission of the LDPC codes between the processor units and the on-chip data cache so that the LDPC codes are processed by the processing units according to a given sequence.
Type: Application
Filed: June 13, 2008
Publication date: December 17, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Thomas A. Horvath
-
Patent number: 7634635
Abstract: Systems and methods for reordering processor instructions. In accordance with a first embodiment of the present invention, a microprocessor comprises circuitry to process an instruction extension, wherein the instruction extension is transparent to the programming model of the microprocessor. The instruction extension may comprise a field for indicating an offset from a memory structure pointer. The microprocessor includes circuitry for adding the offset to the memory structure pointer to indicate a specific element of the memory structure. The specific element of the memory structure comprises address information corresponding to speculative data.
Type: Grant
Filed: April 7, 2006
Date of Patent: December 15, 2009
Inventors: Brian Holscher, Guillermo Rozas, James Van Zoeren, David Dunn
-
Patent number: 7634639
Abstract: One embodiment of the present invention provides a system which avoids a live-lock state in a processor that supports speculative-execution. The system starts by issuing instructions for execution in program order during execution of a program in a normal-execution mode. Upon encountering a launch condition during the execution of an instruction (a “launch instruction”) which causes the processor to enter a speculative-execution mode, the system checks status indicators associated with a forward progress buffer. If the status indicators indicate that the forward progress buffer contains data for the launch instruction, the system resumes normal-execution mode. Upon resumption of normal-execution mode, the system retrieves the data from a data field contained in the forward progress buffer and executes the launch instruction using the retrieved data as input data for the launch instruction. The system next deasserts the status indicators.
Type: Grant
Filed: August 23, 2005
Date of Patent: December 15, 2009
Assignee: Sun Microsystems, Inc.
Inventors: Shailender Chaudhry, Paul Caprioli, Sherman H. Yip, Guarav Garg, Ketaki Rao
-
Publication number: 20090307467
Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.
Type: Application
Filed: May 21, 2008
Publication date: December 10, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Ahmad Faraj
-
Patent number: 7631170
Abstract: An efficient embedded-DRAM processor architecture and associated methods. In one exemplary embodiment, the architecture includes a DRAM array, a set of register files, a set of functional units, and a data assembly unit. The data assembly unit includes a set of row-address registers and is responsive to commands to activate and deactivate DRAM rows and to control the movement of data throughout the system. A pipelined data assembly approach allows the functional units to perform register-to-register operations, and allows the data assembly unit to perform all load/store operations using wide data busses. Data masking and switching hardware allows individual data words or groups of words to be transferred between the registers and memory. Other aspects of the invention include a memory and logic structure and an associated method to extract data blocks from memory to accelerate, for example, operations related to image compression and decompression.
Type: Grant
Filed: February 13, 2002
Date of Patent: December 8, 2009
Assignee: Micron Technology, Inc.
Inventor: Eric M. Dowling
-
Publication number: 20090300337
Abstract: Improved instruction set and core design, control and communication for programmable microprocessors is disclosed, involving the strategy for replacing centralized program sequencing in present-day and prior art processors with a novel distributed program sequencing wherein each functional unit has its own instruction fetch and decode block, and each functional unit has its own local memory for program storage; and wherein computational hardware execution units and memory units are flexibly pipelined as programmable embedded processors with reconfigurable pipeline stages of different order in response to varying application instruction sequences that establish different configurations and switching interconnections of the hardware units.
Type: Application
Filed: May 29, 2008
Publication date: December 3, 2009
Inventors: Xiaolin Wang, Qian Wu, Benjamin Marshall, Fugui Wang, Gregory Pitarys, Ke Ning
-
Patent number: 7627743
Abstract: A multi-word transfer instruction, a memory transfer method using the multi-word transfer instruction and a circuit implementation for transferring multiple words between a memory subsystem and a processor register file are provided. The multi-word transfer instruction specifies an access type (load or store), a consecutive register group, a selection mask and a base register for the starting address of the corresponding memory locations. Therefore, the total number of words accessed by this instruction is equal to the number of registers specified in the consecutive register group along with the number of the registers specified by the selection mask. Besides, additional information, such as an address update mode, an order mode and a modification mode, may be further specified in the multi-word transfer instruction.
Type: Grant
Filed: January 12, 2007
Date of Patent: December 1, 2009
Assignee: Andes Technology Corporation
Inventors: Hong-Men Su, Chuan-Hua Chang, Jen-Chih Tseng
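The word count stated in the abstract for patent 7627743 (registers in the consecutive group plus registers picked by the selection mask) can be sketched as a load-only software model. Everything concrete here is an assumption for illustration: the 4-byte word size, the mapping of mask bits to registers 28 to 31 (`MASK_REGS`), the ascending address order, and the function names are hypothetical and are not taken from the patent.

```python
WORD = 4  # assumed word size in bytes
MASK_REGS = (28, 29, 30, 31)  # hypothetical registers selectable by the mask


def selected_registers(first, last, mask):
    """Registers named by the consecutive group first..last plus the mask bits."""
    group = list(range(first, last + 1))
    extra = [reg for bit, reg in enumerate(MASK_REGS) if mask & (1 << bit)]
    return group + extra


def load_multiple(regs, memory, base_reg, first, last, mask):
    """Load one word per selected register from consecutive addresses.

    The starting address is read from `regs[base_reg]`. Returns the number
    of words transferred.
    """
    addr = regs[base_reg]
    targets = selected_registers(first, last, mask)
    for reg in targets:
        regs[reg] = int.from_bytes(memory[addr:addr + WORD], "little")
        addr += WORD
    return len(targets)


regs = [0] * 32
regs[1] = 8  # base register holds the starting address
memory = bytes(range(64))
count = load_multiple(regs, memory, base_reg=1, first=2, last=4, mask=0b0101)
```

Here the group r2 to r4 contributes three words and the mask selects two more registers, so five words are transferred in total. The address update, order, and modification modes mentioned in the abstract are left out of this sketch.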
-
Patent number: 7627698
Abstract: A variety of advantageous mechanisms for improved data transfer control within a data processing system are described. A DMA controller is described which is implemented as a multiprocessing transfer engine supporting multiple transfer controllers which may work independently or in cooperation to carry out data transfers, with each transfer controller acting as an autonomous processor, fetching and dispatching DMA instructions to multiple execution units. In particular, mechanisms for initiating and controlling the sequence of data transfers are provided, as are processes for autonomously fetching DMA instructions which are decoded sequentially but executed in parallel.
Type: Grant
Filed: July 30, 2007
Date of Patent: December 1, 2009
Assignee: Altera Corporation
Inventors: Edwin Franklin Barry, Edward A. Wolff
-
Patent number: 7627744
Abstract: An integrated circuit comprises an external memory, a plurality of parallel connected Vector Processing Engines (VPEs), and an External Memory Unit (EMU) providing a data transfer path between the VPEs and the external memory. Each VPE contains a plurality of data processing units and a message queuing system adapted to transfer messages between the data processing units and other components of the integrated circuit.
Type: Grant
Filed: May 10, 2007
Date of Patent: December 1, 2009
Assignee: NVIDIA Corporation
Inventors: Monier Maher, Jean Pierre Bordes, Christopher Lamb, Sanjay J. Patel
-
Publication number: 20090292905
Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: performing, for each node, a local reduction operation using allreduce contribution data for the cores of that node, yielding, for each node, a local reduction result for one or more representative cores for that node; establishing one or more logical rings among the nodes, each logical ring including only one of the representative cores from each node; performing, for each logical ring, a global allreduce operation using the local reduction result for the representative cores included in that logical ring, yielding a global allreduce result for each representative core included in that logical ring; and performing, for each node, a local broadcast operation using the global allreduce results for each representative core on that node.
Type: Application
Filed: May 21, 2008
Publication date: November 26, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Ahmad Faraj
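The three phases named in the abstract for publication 20090292905 (local reduction, global allreduce over a ring of representative cores, local broadcast) can be sketched as a single-process simulation. The assumptions are loud ones: the reduction operator is taken to be a sum, each node has exactly one representative core and there is a single logical ring, and the pass-around ring loop is a simple textbook scheme chosen for this example, not necessarily the one claimed. The function name `hierarchical_allreduce` is hypothetical.

```python
def hierarchical_allreduce(contributions):
    """Simulate the three phases for `contributions[node][core]` (sum assumed).

    Returns a structure of the same shape in which every core holds the
    global sum.
    """
    # Phase 1: local reduction onto one representative core per node.
    local = [sum(cores) for cores in contributions]

    # Phase 2: global allreduce around one logical ring of representatives.
    # At every step each node forwards the value it received last to its
    # ring neighbour and adds the value it receives to its running total.
    n = len(local)
    totals = list(local)
    in_flight = list(local)
    for _ in range(n - 1):
        in_flight = [in_flight[(i - 1) % n] for i in range(n)]
        totals = [t + v for t, v in zip(totals, in_flight)]

    # Phase 3: local broadcast of the representative's result to every core.
    return [[totals[i]] * len(cores) for i, cores in enumerate(contributions)]


result = hierarchical_allreduce([[1, 2], [3, 4], [5, 6]])
```

After `n - 1` ring steps every representative has seen each node's local result exactly once, so all cores end up with the same global value (21 in this example).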
-
Patent number: 7624251
Abstract: One embodiment of the present invention provides a processor that is configured to execute load-swapped-partial instructions. An instruction fetch unit within the processor is configured to fetch the load-swapped-partial instruction to be executed. Note that the load-swapped-partial instruction specifies a source address in a memory, which is possibly an unaligned address. Furthermore, an execution unit within the processor is configured to execute the load-swapped-partial instruction. This involves loading a partial-vector-sized datum from a naturally-aligned memory region encompassing the source address.
Type: Grant
Filed: January 18, 2007
Date of Patent: November 24, 2009
Assignee: Apple Inc.
Inventors: Jeffry E. Gonion, Keith E. Diefendorff
-
Publication number: 20090287911
Abstract: A programmable signal processing circuit has an instruction processing circuit (23, 24, 26), with an instruction set that comprises a depuncture instruction. The instruction processing circuit (23, 24, 26) forms the depuncture result by copying bit metrics from a bit metrics operand and inserting one or more predetermined bit metric values between the bit metrics from the bit metric operand in the depuncture result. The instruction processing circuit (23, 24, 26) changes the relative locations of the copied bit metrics with respect to each other in the depuncture result as compared to the relative locations of the copied bit metrics with respect to each other in the bit metric operand, to an extent needed for accommodating the inserted predetermined bit metric value or values.
Type: Application
Filed: December 13, 2005
Publication date: November 19, 2009
Applicant: KONINKLIJKE PHILIPS ELECTRONICS, N.V.
Inventors: Paulus W.F. Gruijters, Marcus M.G. Quax
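The depuncture operation in the abstract for publication 20090287911 (copy bit metrics, insert predetermined values between them) can be shown in a few lines of software. This is a sketch under assumptions: the puncture pattern is represented as a list of 1s and 0s, the inserted value defaults to 0, and the function name and signature are hypothetical rather than the instruction's actual operands.

```python
def depuncture(metrics, pattern, fill=0):
    """Expand `metrics` to the length of `pattern`.

    `pattern` is an assumed representation: 1 means "copy the next bit
    metric", 0 means "insert the predetermined value `fill`".
    """
    if sum(1 for keep in pattern if keep) != len(metrics):
        raise ValueError("pattern must keep exactly len(metrics) positions")
    remaining = iter(metrics)
    return [next(remaining) if keep else fill for keep in pattern]


expanded = depuncture([7, -3, 5], [1, 1, 0, 1])
```

In the result the third copied metric has moved one position to the right to make room for the inserted value, which is the shift in relative locations that the abstract describes.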
-
Patent number: 7620797
Abstract: One embodiment of the present invention provides a processor which is configured to execute load-swapped instructions, which are possibly directed to unaligned source address. The processor is configured to execute the load-swapped instruction by loading a vector from a naturally-aligned memory region encompassing the source address, and in doing so rotating the bytes of the vector to cause the byte at the specified source address to reside at the least-significant byte position within the vector for a little-endian memory transaction, or causing said byte to be positioned at the most-significant byte position within the vector for a big-endian memory transaction.
Type: Grant
Filed: November 1, 2006
Date of Patent: November 17, 2009
Assignee: Apple Inc.
Inventors: Jeffry E. Gonion, Keith E. Diefendorff
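The rotation described in the abstract for patent 7620797 can be modelled in software as follows. This is a sketch under assumptions, not the instruction's definition: the 16-byte vector width, the power-of-two alignment, the direction of rotation, and the name `load_swapped` are choices made for this example, and the vector is returned as a Python integer so that byte significance is unambiguous.

```python
def load_swapped(memory, addr, vector_bytes=16, byteorder="little"):
    """Load the aligned vector containing `addr`, rotated to start at `addr`.

    Assumes `vector_bytes` is a power of two. The bytes of the naturally
    aligned block are rotated so the byte at `addr` comes first in memory
    order; interpreting that order as little-endian puts it in the
    least-significant position, as big-endian in the most-significant one.
    """
    base = addr & ~(vector_bytes - 1)  # start of the naturally aligned region
    block = memory[base:base + vector_bytes]
    offset = addr - base
    rotated = block[offset:] + block[:offset]
    return int.from_bytes(rotated, byteorder)


memory = bytes(range(32))
little = load_swapped(memory, addr=21, byteorder="little")
big = load_swapped(memory, addr=21, byteorder="big")
```

With `addr=21` the aligned region is bytes 16 to 31. The byte stored at address 21 ends up in the lowest byte of `little` and in the highest byte of `big`.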
-
Publication number: 20090282226
Abstract: A network on chip (‘NOC’) that includes IP blocks, routers, memory communications controllers, and network interface controllers, each IP block adapted to the network by an application messaging interconnect including an inbox and an outbox, one or more of the IP blocks including computer processors supporting a plurality of threads, the NOC also including an inbox and outbox controller configured to set pointers to the inbox and outbox, respectively, that identify valid message data for a current thread; and software running in the current thread that, upon a context switch to a new thread, is configured to: save the pointer values for the current thread, and reset the pointer values to identify valid message data for the new thread, where the inbox and outbox controller are further configured to retain the valid message data for the current thread in the boxes until context switches again to the current thread.
Type: Application
Filed: May 9, 2008
Publication date: November 12, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Russell D. Hoover, Eric O. Mejdrich, Paul E. Schardt, Robert A. Shearer
-
Publication number: 20090282225
Abstract: Embodiments of the present invention provide a system which executes a load instruction or a store instruction. During operation the system receives a load instruction. The system then determines if an unrestricted entry or a restricted entry in a store queue contains data that satisfies the load instruction. If not, the system retrieves data for the load instruction from a cache.
Type: Application
Filed: May 6, 2008
Publication date: November 12, 2009
Applicant: SUN MICROSYSTEMS, INC.
Inventors: Paul Caprioli, Martin Karlsson, Shailender Chaudhry, Gideon N. Levinsky