Processing Control Patents (Class 712/220)
  • Publication number: 20110119470
    Abstract: In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.
    Type: Application
    Filed: June 8, 2010
    Publication date: May 19, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Martin Ohmacht
  • Patent number: 7945915
    Abstract: Methods and systems for efficiently interpreting operating system service requests on the same register or vector of a processor or CPU where the operating system service requests are initiated from native and non-native applications are provided. More particularly, a switching layer can enable processing of the operating system service requests by routing control of a particular request to an appropriate kernel subsystem or module based on the type of operating system service being requested and the type of application initiating the request. Additionally, the performance impact of the switching layer for native applications is overcome by dynamically reprogramming the processor or CPU on every change of active process so that only foreign applications are subject to the processing requirements of the switching layer.
    Type: Grant
    Filed: December 12, 2006
    Date of Patent: May 17, 2011
    Assignee: Oracle America, Inc.
    Inventor: Nils A. Nieuwejaar
  • Patent number: 7941647
    Abstract: A computer. A processor pipeline alternately executes instructions coded for first and second different computer architectures or coded to implement first and second different processing conventions. A memory stores instructions for execution by the processor pipeline, the memory being divided into pages for management by a virtual memory manager, a single address space of the memory having first and second pages. A memory unit fetches instructions from the memory for execution by the pipeline, and fetches stored indicator elements associated with respective memory pages of the single address space from which the instructions are to be fetched. Each indicator element is designed to store an indication of which of two different computer architectures and/or execution conventions under which instruction data of the associated page are to be executed by the processor pipeline.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: May 10, 2011
    Assignee: ATI Technologies ULC
    Inventors: John S. Yates, Jr., David L. Reese, Korbin S. Van Dyke, T. R. Ramesh, Paul H. Hohensee
  • Patent number: 7941649
    Abstract: A SIMD processor responds to a single min/max instruction to find the minimum or maximum valued data unit in an array of data units. The determined minimum/maximum value and an associated index value thereto may be output. Alternatively, the value of a data unit in another array may be output at a corresponding location. A further single instruction executable by the SIMD processor, may be applied to results obtained using such a single min/max instruction, to allow such instructions to operate on two dimensional arrays.
    Type: Grant
    Filed: September 5, 2008
    Date of Patent: May 10, 2011
    Assignee: Broadcom Corporation
    Inventors: Richard J. Selvaggi, Larry A. Pearlstein
  • Patent number: 7941645
    Abstract: An isochronous processor includes a state register, a functional unit, a control module, and an activation unit. The state register includes an arm buffer and an active buffer. The functional unit performs a transformation operation on the data stream in response to an active value of the control parameter obtained from the active buffer. The control module updates the arm value of the control parameter in the arm buffer in response to control instructions. The activation unit detects a load event propagating with the data stream and transfers the parameter value from the arm buffer to the active buffer in response to the load event. During this transfer, the control module is inhibited from updating the arm buffer.
    Type: Grant
    Filed: July 28, 2004
    Date of Patent: May 10, 2011
    Assignee: NVIDIA Corporation
    Inventors: Duncan A. Riach, Leslie E. Neft, Michael A. Ogrinc, Wayne Douglas Young
  • Patent number: 7941648
    Abstract: A scalable reconfigurable register file (SRRF) containing multiple register files, read and write multiplexer complexes, and a control unit operating in response to instructions is described. Multiple address configurations of the register files are supported by each instruction and different configurations are operable simultaneously during a single instruction execution. For example, with separate files of the size 32×32 supported configurations of 128×32 bit s, 64×64 bit s and 32×128 bit s can be in operation each cycle. Single width, double width, quad width operands are optimally supported without increasing the register file size and without increasing the number of register file read or write ports.
    Type: Grant
    Filed: June 3, 2008
    Date of Patent: May 10, 2011
    Assignee: Altera Corporation
    Inventors: Gerald George Pechanek, Edward A. Wolff
  • Patent number: 7941646
    Abstract: A thread switch mechanism and technique for a microprocessor is disclosed wherein a processing of a first thread is completed, and a continuation of a second thread is initiated during completion of the first thread. In one form, the technique includes processing a first thread at a pipeline of a processing device, and initiating processing of a second thread at a front end of the pipeline in response to an occurrence of a context switch event. The technique can also include initiating a instruction progress metric in response the context switch event. The technique can further include enabling completion of processing of instructions of the first thread that are at a back end of the pipeline at the occurrence of the context switch event until an expiry of the instruction progress metric.
    Type: Grant
    Filed: December 31, 2007
    Date of Patent: May 10, 2011
    Assignee: Freescale Semicondoctor, Inc.
    Inventors: David C. Holloway, Michael D. Snyder, Suresh Venkumahanti
  • Publication number: 20110107067
    Abstract: A single-chip multiprocessor system and operation method of this system based on a static macro-scheduling of parallel streams for multiprocessor parallel execution. The single-chip multiprocessor system has buses for direct exchange between the processor register files and access to their store addresses and data. Each explicit parallelism architecture processor of this system has an interprocessor interface providing the synchronization signals exchange, data exchange at the register file level and access to store addresses and data of other processors. The single-chip multiprocessor system uses ILP to increase the performance. Synchronization of the streams parallel execution is ensured using special operations setting a sequence of streams and stream fragments execution prescribed by the program algorithm.
    Type: Application
    Filed: January 10, 2011
    Publication date: May 5, 2011
    Applicant: Elbrus International
    Inventors: Boris A. Babaian, Yuli Kh. Sakhin, Vladimir Yu. Volkonskiy, Sergey A. Rozhkov, Vladimir V. Tikhorsky, Feodor A. Gruzdov, Leonid N. Nazarov, Mikhail L. Chudakov
  • Publication number: 20110107061
    Abstract: A hardware pipeline has a number of rows including a first row, a last row, and an intermediate row between the first row and the last row. Each row stores a number of bytes of data as the data moves through the pipeline on a row-by-row basis from the first row towards the last row. A mechanism performs a first macro on the data beginning at the first row. The mechanism performs a second macro different than the first macro on the data beginning at the intermediate row where the first macro has been completely performed when the data has reached the intermediate row. The first and second macros each include a number of modifications of the data as the data moves through the pipeline to effect a complete transformation of the data. The complete transformation of the first macro is different than the complete transformation of the second data.
    Type: Application
    Filed: October 30, 2009
    Publication date: May 5, 2011
    Inventor: David A. Warren
  • Publication number: 20110106870
    Abstract: A system and method for providing non-deterministic data for processes executed by non-synchronized processor elements of a fault resilient system is discussed. The steps of the method comprise receiving a request for getting non-deterministic data from a requesting processor element; assigning non-deterministic data generated by an entropy source to the request; and supplying the non-deterministic data assigned to the request, to the requesting processor element.
    Type: Application
    Filed: October 28, 2010
    Publication date: May 5, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Silvio Dragone, Tamas Visegrady, Vincenzo Condorelli
  • Publication number: 20110107066
    Abstract: Accelerator functions are cascaded, such that a result of one accelerator function is directly forwarded to another accelerator function, bypassing the processor requesting the functions to be performed. The cascading may be provided during compilation of a program specifying the functions to be performed, but can be dynamically reversed during runtime of the program.
    Type: Application
    Filed: October 30, 2009
    Publication date: May 5, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Rajaram B. Krishnamurthy, Thomas A. Gregg
  • Publication number: 20110107031
    Abstract: A method, programmed medium and system are provided for enabling a core's cache capacity to be increased by using the caches of the disabled or non-enabled cores on the same chip. Caches of disabled or non-enabled cores on a chip are made accessible to store cachelines for those chip cores that have been enabled, thereby extending cache capacity of enabled cores.
    Type: Application
    Filed: November 5, 2009
    Publication date: May 5, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Vaijayanthimala K. Anand, Diane Garza Flemming, William A. Maron, Mysore Sathyanarayana Srinivas
  • Publication number: 20110107065
    Abstract: A data processing device includes an interconnect controller operable to manage the communication of information between modules of the data processing device via an interconnect. In response to a transaction request the interconnect controller selects a tag value from a set of available tag values, assigns the tag to the transaction and reserves the tag value so that it is unavailable for assignment to other transactions. If an expected response to the transaction request is not received within a designated amount of time, the transaction enters a timed-out state and the interconnect controller locks the tag value, so that it remains unavailable for assignment to other transactions until an unlock event, such as a request from software.
    Type: Application
    Filed: October 29, 2009
    Publication date: May 5, 2011
    Applicant: FREESCALE SEMICONDUCTOR, INC.
    Inventors: Gus P. Ikonomopoulos, Thang Q. Nguyen, Jose M. Nunez, Kun Xu
  • Patent number: 7937567
    Abstract: Parallelism in a parallel processing subsystem is exploited in a scalable manner. A problem to be solved can be hierarchically decomposed into at least two levels of sub-problems. Individual threads of program execution are defined to solve the lowest-level sub-problems. The threads are grouped into one or more thread arrays, each of which solves a higher-level sub-problem. The thread arrays are executable by processing cores, each of which can execute at least one thread array at a time. Thread arrays can be grouped into grids of independent thread arrays, which solve still higher-level sub-problems or an entire problem. Thread arrays within a grid, or entire grids, can be distributed across all of the available processing cores as available in a particular system implementation.
    Type: Grant
    Filed: November 1, 2006
    Date of Patent: May 3, 2011
    Assignee: Nvidia Corporation
    Inventors: John R. Nickolls, Stephen D. Lew
  • Patent number: 7934075
    Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously and operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. The instructions executed by the computers (12) can include a micro-loop (100) which is capable of performing a series of operations repeatedly. In one application, the sleeping computer (12) is awakened by an input such that it commences an action that would otherwise required an interrupt of an otherwise active computer. For example, one computer (12f) can be used to monitor an input/output port of the computer array (10).
    Type: Grant
    Filed: May 26, 2006
    Date of Patent: April 26, 2011
    Assignee: VNS Portfolio LLC
    Inventors: Charles H. Moore, Jeffrey Arthur Fox, John W. Rible
  • Patent number: 7932911
    Abstract: A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: April 26, 2011
    Assignee: MicroUnity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris, Alexia Massalin
  • Patent number: 7934031
    Abstract: An asynchronous logic family of circuits which communicate on delay-insensitive flow-controlled channels with 4-phase handshakes and 1 of N encoding, compute output data directly from input data using domino logic, and use the state-holding ability of the domino logic to implement pipelining without additional latches.
    Type: Grant
    Filed: May 11, 2006
    Date of Patent: April 26, 2011
    Assignee: California Institute of Technology
    Inventors: Andrew M. Lines, Alain J. Martin, Uri Cummings
  • Patent number: 7934079
    Abstract: An instruction issue method for use in a pipelined processor, comprising the steps of: decoding an instruction to be processed to get a type of the instruction; computing the number of cycles to be occupied at execution stage for the instruction, according to the type of the instruction; marking a target operand of the instruction as acquirable in a predefined cycle before the instruction enters write-back stage, according to the number of cycles, so that subsequent instructions taking the target operand as their source operands perform subsequent operations according to the case that the target operand is acquirable.
    Type: Grant
    Filed: January 10, 2006
    Date of Patent: April 26, 2011
    Assignee: NXP B.V.
    Inventor: Xia Zhu
  • Patent number: 7934080
    Abstract: Embodiments of the present invention provide a processor that merges stores in an N-entry first-in-first-out (FIFO) store queue. In these embodiments, the processor starts by executing instructions before a checkpoint is generated. When executing instructions before the checkpoint is generated, the processor is configured to perform limited or no merging of stores into existing entries in the store queue. Then, upon detecting a predetermined condition, the processor is configured to generate a checkpoint. After generating the checkpoint, the processor is configured to continue to execute instructions. When executing instructions after the checkpoint is generated, the processor is configured to freely merge subsequent stores into post-checkpoint entries in the store queue.
    Type: Grant
    Filed: May 28, 2008
    Date of Patent: April 26, 2011
    Assignee: Oracle America, Inc.
    Inventors: Paul Caprioli, Martin Karlsson, Gideon N. Levinsky, Khondakar A. Mujtaba, Shailender Chaudhry, Murali K. Inaganti
  • Patent number: 7932910
    Abstract: A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
    Type: Grant
    Filed: August 20, 2007
    Date of Patent: April 26, 2011
    Assignee: MicroUnity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris, Alexia Massalin
  • Publication number: 20110093683
    Abstract: A data processing apparatus includes a data engine 6 having an instruction decoder 18 for generating one or more control signals 24 for controlling processing circuitry 20 to perform data processing operations specified by the program instructions decoded. The instruction decoder 18 responsive to a marker instruction to read a programmable flow control value from a flow control register 38. The programmable flow control value specifies the action to be taken upon completion of execution of a current sequence of program instructions. The action taken may be jumping to a target program instruction at the start of a target sequence of program instructions or entry into an idle state awaiting a new processing task to be initiated.
    Type: Application
    Filed: September 7, 2010
    Publication date: April 21, 2011
    Applicant: ARM LIMITED
    Inventors: Merlijn Aurich, Jef Verdonck
  • Publication number: 20110093685
    Abstract: A data processing circuit is disclosed in the present invention. The data processing circuit includes a decoder and a number of N-stage circuits. The circuits receive input data from at least a memory and separate the input data into N stages. The circuit process and store the N input data simultaneously to decrease the time of data processing in the data processing circuit.
    Type: Application
    Filed: September 8, 2010
    Publication date: April 21, 2011
    Inventors: Chien-Chou CHEN, Ming-Sung Huang, Wen Min Lu
  • Patent number: 7930521
    Abstract: Methods and apparatus are provided for reducing the amount of resources allocated for handling multiplexing in a processor. Characteristics associated with processing blocks are analyzed. Operand restrictions and register groups can be configured to allow the use of more resource efficient multiplexing circuitry in a processor.
    Type: Grant
    Filed: June 11, 2008
    Date of Patent: April 19, 2011
    Assignee: Altera Corporation
    Inventor: Paul Metzgen
  • Patent number: 7930522
    Abstract: A method for speculative execution of instructions, the method includes: decoding a compare instruction; speculatively executing, in a continuous manner, conditional instructions that are conditioned by a condition that is related to a resolution of the compare instruction and are decoded during a speculation window that starts at the decoding of the compare instruction and ends when the compare instruction is resolved; and stalling an execution of a non-conditional instruction that is dependent upon an outcome of at least one of the conditional instructions, until the speculation window ends.
    Type: Grant
    Filed: August 19, 2008
    Date of Patent: April 19, 2011
    Assignee: Freescale Semiconductor, Inc.
    Inventors: Guy Shumeli, Itzhak Barak, Uri Dayan, Amir Paran, Idan Rozenberg, Doron Schupper
  • Publication number: 20110087864
    Abstract: One embodiment of the present invention sets forth a technique for providing state information to one or more shader engines within a processing pipeline. State information received from an application accessing the processing pipeline is stored in constant buffer memory accessible to each of the shader engines. The shader engines can then retrieve the state information during execution.
    Type: Application
    Filed: October 6, 2010
    Publication date: April 14, 2011
    Inventors: Jerome F. DULUK, JR., Jesse David Hall
  • Publication number: 20110087863
    Abstract: In an apparatus which includes a plurality of processing modules connected via a ring-shape bus, if a plurality pieces of pipeline processing to be processed in a different order is allocated to a plurality of processing modules, the transfer efficiency may decrease when an amount of data transferred from one of the processing modules to a post-stage module exceeds a processing capacity of the post-stage module. Accordingly, a module positioned on the preceding side in the pipeline processing controls a transmission interval of processed data so that the post-stage module can receive the data processed by the preceding module.
    Type: Application
    Filed: October 4, 2010
    Publication date: April 14, 2011
    Applicant: CANON KABUSHIKI KAISHA
    Inventors: Hiroyasu Watanabe, Hirowo Inoue, Hisashi Ishikawa
  • Patent number: 7925471
    Abstract: Brings the response time of a Web server and the like closer to a targeted value. A controller controlling the average response time elapsed between reception by information processing apparatus of a processing request and response of information processing apparatus to the processing request. The controller including: a section for obtaining a response time goal which is a target value of the average response time; a section for calculating a predicted response time which is a predicted value of the average response time at the time point when a predetermined reference period has elapsed from setting an operation mode in the information processing apparatus, the operation mode being any of a plurality of operation modes which provide different throughputs; and a section for setting the operation mode in the information processing apparatus if predicted response time calculated by the predicted response time calculating section is less than goal.
    Type: Grant
    Filed: August 12, 2008
    Date of Patent: April 12, 2011
    Assignee: International Business Machines Corporation
    Inventors: Takuya Nakaike, Hideaki Komatsu
  • Patent number: 7925869
    Abstract: A system and method for enabling multithreading in a embedded processor, invoking zero-time context switching in a multithreading environment, scheduling multiple threads to permit numerous hard-real time and non-real time priority levels, fetching data and instructions from multiple memory blocks in a multithreading environment, and enabling a particular thread to modify the multiple states of the multiple threads in the processor core.
    Type: Grant
    Filed: December 21, 2000
    Date of Patent: April 12, 2011
    Assignee: Ubicom, Inc.
    Inventors: Nicholas J Kelsey, Christopher J Waters, Tibet Mimaroglu, David A Fotland
  • Patent number: 7921279
    Abstract: Result and operand forwarding is provided between differently sized operands in a superscalar processor by grouping a first set of instructions for operand forwarding, and grouping a second set of instructions for result forwarding, the first set of instructions comprising a first source instruction having a first operand and a first dependent instruction having a second operand, the first dependent instruction depending from the first source instruction; the second set of instructions comprising a second source instruction having a third operand and a second dependent instruction having a fourth operand, the second dependent instruction depending from the second source instruction, performing operand forwarding by forwarding the first operand, either whole or in part, as it is being read to the first dependent instruction prior to execution; performing result forwarding by forwarding a result of the second source instruction, either whole or in part, to the second dependent instruction, after execution; wher
    Type: Grant
    Filed: March 19, 2008
    Date of Patent: April 5, 2011
    Assignee: International Business Machines Corporation
    Inventors: David S. Hutton, Fadi Y. Busaba, Bruce C. Giamei, Christopher A. Krygowski, Edward T. Malley, Jeffrey S. Plate, John G. Rell, Jr., Chung-Lung Kevin Shum, Timothy J. Slegel
  • Publication number: 20110078486
    Abstract: Methods and apparatus relating to dynamic selection of execution stage are described. In some embodiments, logic may determine whether to execute an instruction at one of a plurality of stages in a processor. In some embodiments, the plurality of stages are to at least correspond to an address generation stage or an execution stage of the instruction. Other embodiments are also described and claimed.
    Type: Application
    Filed: September 30, 2009
    Publication date: March 31, 2011
    Inventors: Deepak Limaye, Kulin N. Kothari, James D. Allen, James E. Phillips
  • Publication number: 20110078418
    Abstract: One embodiment of the present invention sets forth a method for executing a non-local return instruction in a parallel thread processor. The method comprises the steps of receiving, within the thread group, a first long jump instruction and, in response, popping a first token from the execution stack. The method also comprises determining whether the first token is a first long jump token that was pushed onto the execution stack when a first push instruction associated with the first long jump instruction was executed, and when the first token is the first long jump token, jumping to the second instruction based on the address specified by the first long jump token, or, when the first token is not the first long jump token, disabling the active thread until the first long jump token is popped from the execution stack.
    Type: Application
    Filed: September 13, 2010
    Publication date: March 31, 2011
    Inventors: Guillermo Juan Rozas, Brett W. Coon
  • Publication number: 20110078419
    Abstract: A measurement sampling facility takes snapshots of the central processing unit (CPU) on which it is executing at specified sampling intervals to collect data relating to tasks executing on the CPU. The collected data is stored in a buffer, and at selected times, an interrupt is provided to remove data from the buffer to enable reuse thereof. The interrupt is not taken after each sample, but in sufficient time to remove the data and minimize data loss.
    Type: Application
    Filed: December 7, 2010
    Publication date: March 31, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jane H. Bartik, Lisa Cranton Heller, Damian L. Osisek, Donald W. Schmidt, Patrick M. West, JR., Phil C. Yeh
  • Patent number: 7917910
    Abstract: Briefly, techniques to manage interrupts and swaps of threads operating in critical region. In an embodiment, a thread is to be interrupted during a first critical region with an interrupt routine. The thread may be set to restart at a beginning of the first critical region in response to an indication that the thread is working in a critical region. Other embodiments are also claimed and disclosed.
    Type: Grant
    Filed: March 26, 2004
    Date of Patent: March 29, 2011
    Assignee: Intel Corporation
    Inventor: Joseph S. Cavallo
  • Patent number: 7917659
    Abstract: The invention relates to a method for computer signal processing data and command transfer over an interface and more particularly to a communication between peripheral firmware and a host processor or Basic Input/Output System (BIOS) on a Peripheral Component Interconnect (PCI) bus. In one embodiment, a device and method for reducing the load on the PCI Bus is described. In yet another embodiment, a device and method is described for constructing a variable length command block comprising message frames and aligning all message frames for a particular command block that are contiguous in memory.
    Type: Grant
    Filed: March 1, 2005
    Date of Patent: March 29, 2011
    Assignee: LSI Corporation
    Inventors: Parag Maharana, Basavaraj Hallyal, Senthil Murugan Thangaraj, Gurpreet Singh Anand
  • Publication number: 20110072247
    Abstract: Methods, systems, and computer program products for implementing fast application programmable timers are provided. A computer program product includes a tangible storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method. The method includes receiving a request to set a user accessible timer, the request received from an application thread. The user accessible timer is set in response to receiving the request, the setting including initializing a counter. The counter is decremented until an interrupt threshold has been reached. An interrupt signal is transmitted to the application thread in response to detecting that the interrupt threshold has been reached.
    Type: Application
    Filed: September 21, 2009
    Publication date: March 24, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Hubertus Franke, James Xenidis, Terry L. Nelms, II, Hollis R. Blanchard
  • Publication number: 20110072246
    Abstract: A node control device is interposed between processor nodes and IO nodes in an information processing system, wherein each IO node subordinates at least one IO device. The node control device includes a register storing a base address of a mapping destination of an IO space, a table describing a plurality of entries retaining a plurality of IO space numbers and address ranges, and an IO space access detection circuit. The table stores an identification flag as to whether or not IO spaces are each mapped onto a memory space. The IO space access detection circuit decodes a command code and an address of an FRTT signal output from a processor node, thus detecting a target IO space and detecting whether the processor node is accessing an IO space mapped onto the memory space or another IO space.
    Type: Application
    Filed: September 20, 2010
    Publication date: March 24, 2011
    Inventor: YOSHIHISA YAMADA
  • Publication number: 20110072245
    Abstract: A method for providing state inheritance across command lists in a multi-threaded processing environment. The method includes receiving an application program that includes a plurality of parallel threads; generating a command list for each thread of the plurality of parallel threads; causing a first command list associated with a first thread of the plurality of parallel threads to be executed by a processing unit; and causing a second command list associated with a second thread of the plurality of parallel threads to be executed by the processing unit, where the second command list inherits from the first command list state associated with the processing unit.
    Type: Application
    Filed: August 25, 2010
    Publication date: March 24, 2011
    Inventors: Jerome F. DULUK, JR., Jesse David Hall, Henry Packard Moreton, Patrick R. Brown
  • Publication number: 20110072211
    Abstract: A method for providing state inheritance across command lists in a multi-threaded processing environment. The method includes receiving an application program that includes a plurality of parallel threads; generating a command list for each thread of the plurality of parallel threads; causing a first command list associated with a first thread of the plurality of parallel threads to be executed by a processing unit; and causing a second command list associated with a second thread of the plurality of parallel threads to be executed by the processing unit, where the second command list inherits from the first command list state associated with the processing unit.
    Type: Application
    Filed: August 9, 2010
    Publication date: March 24, 2011
    Inventors: Jerome F. DULUK, JR., Jesse David Hall, Henry Packard Moreton, Patrick R. Brown
  • Patent number: 7913067
    Abstract: A system and method for overlapping execution (OE) of instructions through non-uniform execution pipelines in an in-order processor are provided. The system includes a first execution unit to perform instruction execution in a first execution pipeline. The system also includes a second execution unit to perform instruction execution in a second execution pipeline, where the second execution pipeline includes a greater number of stages than the first execution pipeline. The system further includes an instruction dispatch unit (IDU), the IDU including OE registers and logic for dispatching an OE-capable instruction to the first execution unit such that the instruction completes execution prior to completing execution of a previously dispatched instruction to the second execution unit. The system additionally includes a latch to hold a result of the execution of the OE-capable instruction until after the second execution unit completes the execution of the previously dispatched instruction.
    Type: Grant
    Filed: February 20, 2008
    Date of Patent: March 22, 2011
    Assignee: International Business Machines Corporation
    Inventors: David S. Hutton, Khary J. Alexander, Fadi Y. Busaba, Bruce C. Giamei, John G. Rell, Jr., Eric M. Schwarz, Chung-Lung Kevin Shum
  • Publication number: 20110066828
    Abstract: Techniques are generally described for mapping a thread onto heterogeneous processor cores. Example techniques may include associating the thread with one or more predefined execution characteristic(s), assigning the thread to one or more heterogeneous processor core(s) based on the one or more predefined execution characteristic(s), and/or executing the thread by the respective assigned heterogeneous processor core(s).
    Type: Application
    Filed: September 11, 2009
    Publication date: March 17, 2011
    Inventors: Andrew Wolfe, Thomas M. Conte
  • Patent number: 7908464
    Abstract: A functional-level instruction-set computing (FLIC) architecture executes higher-level functional instructions such as lookups and bit-compares of variable-length operands. Each FLIC processing-engine slice has specialized processing units including a lookup unit that searches for a matching entry in a lookup cache. Variable-length operands are stored in execution buffers. The operand length and location in the execution buffer are stored in fixed-length general-purpose registers (GPRs) that also store fixed-length operands. A copy/move unit moves data between input and output buffers and one or more FLIC processing-engine slices. Multiple contexts can each have a set of GPRs and execution buffers. An expansion buffer in a FLIC slice can be allocated to a context to expand that context's execution buffer for storing longer operands.
    Type: Grant
    Filed: July 31, 2007
    Date of Patent: March 15, 2011
    Assignee: Alacritech, Inc.
    Inventors: Millind Mittal, Mehul Kharidia, Tarun Kumar Tripathy, J. Sukarno Mertoguno
  • Publication number: 20110060891
    Abstract: A parallel processing data processing system builds at least one data structure indicating a communication schedule for a plurality of processes each having a respective one of a plurality of equal length vectors formed of multiple equal size chunks. The data processing system, based upon the at least one data structure, communicates chunks of the plurality of vectors among the plurality of processes and performs partial reduction operations on chunks in accordance with the communication schedule. The data processing system then stores a result vector representing reduction of the plurality of vectors.
    Type: Application
    Filed: September 4, 2009
    Publication date: March 10, 2011
    Applicant: International Business Machines Corporation
    Inventor: BIN JIA
  • Publication number: 20110055487
    Abstract: In one embodiment, the present invention includes a method to obtain topology information regarding a system including at least one multicore processor, provide the topology information to a plurality of parallel processes, generate a topological map based on the topology information, access the topological map to determine a topological relationship between a sender process and a receiver process, and select a given memory copy routine to pass a message from the sender process to the receiver process based at least in part on the topological relationship. Other embodiments are described and claimed.
    Type: Application
    Filed: March 31, 2008
    Publication date: March 3, 2011
    Inventors: Sergey I. Sapronov, Alexey V. Bayduraev, Alexander V. Supalov, Vladimir D. Truschin, Igor Ermolaev, Dmitry Mishura
  • Patent number: 7900024
    Abstract: Mechanisms for handling data cache misses out-of-order for asynchronous pipelines are provided. The mechanisms associate load tag (LTAG) identifiers with the load instructions and uses them to track the load instruction across multiple pipelines as an index into a load table data structure of a load target buffer. The load table is used to manage cache “hits” and “misses” and to aid in the recycling of data from the L2 cache. With cache misses, the LTAG indexed load table permits load data to recycle from the L2 cache in any order. When the load instruction issues and sees its corresponding entry in the load table marked as a “miss,” the effects of issuance of the load instruction are canceled and the load instruction is stored in the load table for future reissuing to the instruction pipeline when the required data is recycled.
    Type: Grant
    Filed: October 17, 2008
    Date of Patent: March 1, 2011
    Assignee: International Business Machines Corporation
    Inventors: Christopher M. Abernathy, Jeffrey P. Bradford, Ronald P. Hall, Timothy H. Heil, David Shippy
  • Publication number: 20110047357
    Abstract: Efficient techniques are described for not executing an issued conditional non-branch instruction. A conditional non-branch instruction is identified as being eligible for a prediction, the prediction indicating that the eligible conditional non-branch (ECNB) instruction would not execute. The ECNB instruction executes as a no operation (NOP) instruction in response to the prediction that the ECNB instruction would not execute. A source operand required for the ECNB instruction to execute is not fetched in response to the prediction to not execute.
    Type: Application
    Filed: August 19, 2009
    Publication date: February 24, 2011
    Applicant: QUALCOMM INCORPORATED
    Inventors: Brian M. Stempel, James N. Dieffenderfer, Thomas A. Sartorius, David J. Mandzak, Rodney W. Smith
  • Patent number: 7895416
    Abstract: A reconfigurable integrated circuit is provided wherein the available hardware resources can be optimised for a particular application. Dynamically reconfiguring (in both real-time and non real-time) the available resources and sharing a plurality of processing elements with a plurality of controller elements achieve this. In a preferred embodiment the integrated circuit includes a plurality of processing blocks, which interface to a reconfigurable interconnection means. A processing block has two forms, namely a shared resource block and a dedicated resource block. Each processing block consists of one or a plurality of controller elements and a plurality of processing elements. The controller element and processing element generally comprise diverse rigid coarse and fine grained circuits and are interconnected through dedicated and reconfigurable interconnect. The processing blocks can be configured as a hierarchy of blocks and or fractal architecture.
    Type: Grant
    Filed: June 24, 2009
    Date of Patent: February 22, 2011
    Assignee: Akya (Holdings) Limited
    Inventors: Graeme Roy Smith, Dyson Wilkes
  • Patent number: 7895417
    Abstract: A data processing system 2 is provided including an instruction decoder 34 responsive to program instructions within an instruction register 32 to generate control signals for controlling data processing circuitry 36. The instructions supported include an address calculation instruction which splits an input address value at a position dependent upon a size value into a first portion and second portion, adds a non-zero offset value to the first portion, sets the second portion to a value and then concatenates the result of the processing on the first portion and the second portion to form the output address value. Another type of instruction supported is a select-and-insert instruction. This instruction takes a first input value and shifts it by N bit positions to form a shifted value, selects N bits from within a second input value in dependence upon the first input value and then concatenates the shifted value with the N bits to form an output value.
    Type: Grant
    Filed: April 30, 2010
    Date of Patent: February 22, 2011
    Assignee: ARM Limited
    Inventors: Dominic Hugo Symes, Daniel Kershaw, Mladen Wilder
  • Patent number: 7895423
    Abstract: Systems and methods that allow for extracting a field from data stored in a pair of registers using two instructions. A first instruction extracts any part of the field from a first register designated as a first source register, and executes a second instruction extracting any part of the field from a second general register designated as a second source register. The second instruction inserts any extracted field parts in a result register.
    Type: Grant
    Filed: August 19, 2009
    Date of Patent: February 22, 2011
    Assignee: MIPS Technologies, Inc.
    Inventors: Sol Katzman, Robert Gelinas, W. Patrick Hays
  • Patent number: 7895587
    Abstract: A single-chip multiprocessor system and operation method of this system based on a static macro-scheduling of parallel streams for multiprocessor parallel execution. The single-chip multiprocessor system has buses for direct exchange between the processor register files and access to their store addresses and data. Each explicit parallelism architecture processor of this system has an interprocessor interface providing the synchronization signals exchange, data exchange at the register file level and access to store addresses and data of other processors. The single-chip multiprocessor system uses ILP to increase the performance. Synchronization of the streams parallel execution is ensured using special operations setting a sequence of streams and stream fragments execution prescribed by the program algorithm.
    Type: Grant
    Filed: September 8, 2006
    Date of Patent: February 22, 2011
    Assignee: Elbrus International
    Inventors: Boris A. Babaian, Yuli Kh. Sakhin, Vladimir Yu. Volkonskiy, Sergey A. Rozhkov, Vladimir V. Tikhorsky, Feodor A. Gruzdov, Leonid N. Nazarov, Mikhail L. Chudakov
  • Patent number: 7895415
    Abstract: Apparatus and computing systems associated with cache sharing based thread control are described. One embodiment includes a memory to store a thread control instruction and a processor to execute the thread control instruction. The processor is coupled to the memory. The processor includes a first unit to dynamically determine a cache sharing behavior between threads in a multi-threaded computing system and a second unit to dynamically control the composition of a set of threads in the multi-threaded computing system. The composition of the set of threads is based, at least in part, on thread affinity as exhibited by cache-sharing behavior. The thread control instruction controls the operation of the first unit and the second unit.
    Type: Grant
    Filed: February 14, 2007
    Date of Patent: February 22, 2011
    Assignee: Intel Corporation
    Inventors: Antonio Gonzalez, Josep M. Codina, Pedro Lopez, Fernando Latorre, Jose-Alejandro Pineiro, Enric Gibert, Jaume Abella, Jaideep Moses, Donald Newell, Ravishankar Iyer, Ramesh G. Illikkal, Srihari Makineni