Operation Patents (Class 712/30)
  • Publication number: 20130086356
    Abstract: A method for generating a distributed data scalable adaptive map-reduce framework for at least one multi-core cluster. The method includes partitioning a cluster into at least one computational group, determining at least one key-group leader within each computational group, performing a local combine operation at each computational group, performing a global combine operation at each of the at least one key-group leader within each computational group based on a result from the local combine operation, and performing a global map-reduce operation across the at least one key-group leader within each computational group.
    Type: Application
    Filed: August 1, 2012
    Publication date: April 4, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ankur Narang, Jyothish Soman
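The two-level combine described in this abstract can be illustrated with a small key-counting sketch. It is only a sketch under stated assumptions: the function names and the use of key counting as the map step are hypothetical, and the key-group leader election is collapsed into one combined result per group.

```python
from collections import Counter

def local_combine(chunks):
    """Combine the per-core key lists of one computational group."""
    total = Counter()
    for chunk in chunks:
        total.update(chunk)  # counts each key in this core's chunk
    return total

def hierarchical_count(groups):
    """groups: list of computational groups, each a list of per-core key lists."""
    # Assumption: each group's leader ends up holding that group's combined result.
    leader_results = [local_combine(group) for group in groups]
    # Global step: reduce across the group leaders only.
    result = Counter()
    for partial in leader_results:
        result.update(partial)  # adds counts when given another Counter
    return dict(result)
```

Only one partial result per group crosses the group boundary, which is the point of combining locally first.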
  • Publication number: 20130073832
    Abstract: A parallel computer that includes compute nodes having computer processors and a CAU (Collectives Acceleration Unit) that couples processors to one another for data communications. In embodiments of the present invention, deterministic reduction operations include: organizing processors of the parallel computer and a CAU into a branched tree topology, where the CAU is a root of the branched tree topology and the processors are children of the root CAU; establishing a receive buffer that includes receive elements associated with processors and configured to store the associated processor's contribution data; receiving, in any order from the processors, each processor's contribution data; tracking receipt of each processor's contribution data; and reducing the contribution data in a predefined order, only after receipt of contribution data from all processors in the branched tree topology.
    Type: Application
    Filed: November 1, 2012
    Publication date: March 21, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: International Business Machines Corporation
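A minimal software sketch of the idea in this abstract: contributions may arrive in any order, but the reduction runs in a fixed order once all have arrived. The function name, the rank-indexed buffer, and the choice of rank order as the "predefined order" are assumptions made for illustration, not details from the patent.

```python
def deterministic_reduce(arrivals, num_procs, op):
    """arrivals: iterable of (rank, value) pairs in arbitrary arrival order."""
    buffer = [None] * num_procs      # one receive element per processor
    received = [False] * num_procs   # tracks which contributions have arrived
    for rank, value in arrivals:
        buffer[rank] = value
        received[rank] = True
    if not all(received):
        raise ValueError("missing contribution; the reduction must wait")
    # Reduce in fixed rank order, independent of arrival order.
    result = buffer[0]
    for value in buffer[1:]:
        result = op(result, value)
    return result
```

A fixed order matters for operations such as floating-point addition, where the result can depend on the order in which operands are combined.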
  • Publication number: 20130067198
    Abstract: A parallel computer is provided that includes a collection of compute nodes organized as a tree, including: initiating a collective gather operation by a logical root of the collection of compute nodes, including adding result data of the logical root to a gather buffer; for each compute node in the collection of compute nodes, determining whether result data of the compute node is already written in the gather buffer; and if the result data of the compute node is already written in the gather buffer, incrementing a counter assigned to that result data already written in the gather buffer; and if the result data of the compute node is not already written in the gather buffer, writing the result data of the compute node as new result data in the gather buffer, incrementing a counter assigned to that new result data, and writing in the gather buffer a node ID.
    Type: Application
    Filed: November 1, 2012
    Publication date: March 14, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: International Business Machines Corporation
  • Publication number: 20130060555
    Abstract: Methods and apparatus for controlling at least two processing cores in a multi-processor device or system include accessing an operating system run queue to generate virtual pulse trains for each core and correlating the virtual pulse trains to identify patterns of interdependence. The correlated information may be used to determine dynamic frequency/voltage control settings for the first and second processing cores to provide a performance level that accommodates interdependent processes, threads and processing cores.
    Type: Application
    Filed: February 27, 2012
    Publication date: March 7, 2013
    Applicant: QUALCOMM INCORPORATED
    Inventors: Steven S. Thomson, Edoardo Regini, Mriganka Mondal, Nishant Hariharan
  • Publication number: 20130061078
    Abstract: A computing apparatus and corresponding method for operating are disclosed. The computing apparatus may comprise a set of interconnected central processing units (CPUs). Each CPU may embed an operating system including a kernel comprising a protocol stack. At least one of the CPUs may further embed executable instructions for allocating multiple strands among the rest of the CPUs. The protocol stack may comprise a Transmission Control Protocol/Internet Protocol (TCP/IP), a User Datagram Protocol/Internet Protocol (UDP/IP) stack, an Internet Control Message Protocol (ICMP) stack or any other suitable Internet protocol. The method for operating the computing apparatus may comprise receiving input/output (I/O) requests, generating multiple strands according to the I/O requests, and allocating the multiple strands to one or more CPUs.
    Type: Application
    Filed: December 21, 2011
    Publication date: March 7, 2013
    Inventor: Ian Henry Stuart Cullimore
  • Patent number: 8381216
    Abstract: Dynamically managing a thread pool associated with a plurality of sub-applications. A request for at least one of the sub-applications is received. A quantity of threads currently assigned to the at least one of the sub-applications is determined. The determined quantity of threads is compared to a predefined maximum thread threshold. A thread in the thread pool is assigned to handle the received request if the determined quantity of threads is not greater than the predefined maximum thread threshold. Embodiments enable control of the quantity of threads within the thread pool assigned to each of the sub-applications. Further embodiments manage the threads for the sub-applications based on latency of the sub-applications.
    Type: Grant
    Filed: March 5, 2010
    Date of Patent: February 19, 2013
    Assignee: Microsoft Corporation
    Inventor: Rohith Thammana Gowda
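The per-sub-application limit described in this abstract can be sketched as a small bookkeeping class. The class and method names are hypothetical. Note one deliberate difference: the abstract assigns a thread when the count is "not greater than" the threshold, while this sketch uses a strict cap so that a sub-application never holds more threads than the threshold.

```python
import threading

class SubAppThreadLimiter:
    """Tracks how many pool threads each sub-application currently holds."""

    def __init__(self, max_threads):
        self.max_threads = max_threads   # hypothetical per-sub-application cap
        self.assigned = {}               # sub-application name -> thread count
        self._lock = threading.Lock()

    def try_assign(self, app):
        """Assign one thread to `app` unless it already holds the maximum."""
        with self._lock:
            current = self.assigned.get(app, 0)
            if current >= self.max_threads:
                return False             # request must wait or be rejected
            self.assigned[app] = current + 1
            return True

    def release(self, app):
        """Return one of `app`'s threads to the pool."""
        with self._lock:
            if self.assigned.get(app, 0) > 0:
                self.assigned[app] -= 1
```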
  • Publication number: 20130042088
    Abstract: Collective operation protocol selection in a parallel computer that includes compute nodes may be carried out by calling a collective operation with operating parameters; selecting a protocol for executing the operation and executing the operation with the selected protocol. Selecting a protocol includes: iteratively, until a prospective protocol meets predetermined performance criteria: providing, to a protocol performance function for the prospective protocol, the operating parameters; determining whether the prospective protocol meets predefined performance criteria by evaluating a predefined performance fit equation, calculating a measure of performance of the protocol for the operating parameters; determining that the prospective protocol meets predetermined performance criteria and selecting the protocol for executing the operation only if the calculated measure of performance is greater than a predefined minimum performance threshold.
    Type: Application
    Filed: August 9, 2011
    Publication date: February 14, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
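The selection loop in this abstract can be sketched as follows. The function name, the representation of each protocol as a (name, fit function) pair, and the parameter dictionary are assumptions for illustration; the patent does not specify them.

```python
def select_protocol(protocols, params, min_performance):
    """Return the first protocol whose predicted performance exceeds the threshold.

    protocols: list of (name, fit) pairs, where fit(params) evaluates a
    performance fit equation and returns a numeric measure of performance.
    """
    for name, fit in protocols:
        if fit(params) > min_performance:
            return name
    return None  # no prospective protocol met the criteria
```

For example, with a hypothetical fit `lambda p: 100 / p["size"]` for one protocol and a constant fit of 50 for another, a threshold of 20 selects the first protocol for small messages and the second for large ones.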
  • Patent number: 8375197
    Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: performing, for each node, a local reduction operation using allreduce contribution data for the cores of that node, yielding, for each node, a local reduction result for one or more representative cores for that node; establishing one or more logical rings among the nodes, each logical ring including only one of the representative cores from each node; performing, for each logical ring, a global allreduce operation using the local reduction result for the representative cores included in that logical ring, yielding a global allreduce result for each representative core included in that logical ring; and performing, for each node, a local broadcast operation using the global allreduce results for each representative core on that node.
    Type: Grant
    Filed: May 21, 2008
    Date of Patent: February 12, 2013
    Assignee: International Business Machines Corporation
    Inventor: Ahmad Faraj
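The three phases in this abstract (local reduction, global allreduce over a logical ring, local broadcast) can be simulated in a few lines. This is a sketch under assumptions: it uses summation as the reduction, one representative core per node, a single ring, and plain lists in place of real message passing.

```python
def ring_allreduce(values):
    """Simulate a sum allreduce among ring members; returns each member's total."""
    n = len(values)
    totals = list(values)
    in_flight = list(values)  # value each member forwards on the next step
    for _ in range(n - 1):
        # Member i receives what its left neighbour (i - 1) forwarded.
        in_flight = [in_flight[(i - 1) % n] for i in range(n)]
        totals = [t + v for t, v in zip(totals, in_flight)]
    return totals

def hierarchical_allreduce(nodes):
    """nodes: list of nodes, each a list of per-core contribution values."""
    local = [sum(cores) for cores in nodes]        # local reduction per node
    global_results = ring_allreduce(local)         # ring of representative cores
    # Local broadcast: every core on a node receives that node's global result.
    return [[g] * len(cores) for g, cores in zip(global_results, nodes)]
```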
  • Patent number: 8370605
    Abstract: A system includes first and second processors, first and second graphics processing units (GPUs), one or more peripheral devices, a switch matrix, and processor-readable memory. The switch matrix comprises programmable data paths between the processors, the GPUs, and the peripheral devices. Software encoded in the processor-readable memory includes a first operating system (OS) executed by the first processor, a second OS executed by the second processor, a matrix scheduling engine, and a media interface switch (MIS) engine. The first OS boots faster than the second OS. The matrix scheduling engine runs on both OSs and configures the data paths in the switch matrix to couple the processors and the GPUs, and to couple the processors and the peripheral devices. The MIS engine runs on the operating systems, detects presence of the peripheral devices, and configures the data paths in the switch matrix to couple the processors and the peripheral devices.
    Type: Grant
    Filed: November 11, 2009
    Date of Patent: February 5, 2013
    Assignee: Sunman Engineering, Inc.
    Inventors: Allen Nejah, Gholam Reza Golshan, George W. Harvey
  • Publication number: 20130031335
    Abstract: Techniques are described for transmitting predicted output data on a processing element in a stream computing application instead of processing currently received input data. The stream computing application monitors the output of a processing element and determines whether its output is predictable, for example, if the previously transmitted output values are within a predefined range or if one or more input values correlate with the same one or more output values. The application may then generate a predicted output value to transmit from the processing element instead of transmitting a processed output value based on current input values. The predicted output value may be, for example, an average of the previously transmitted output values or a previously transmitted output value that was transmitted in response to a previously received input value that is similar to a currently received input value.
    Type: Application
    Filed: July 26, 2011
    Publication date: January 31, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: John M. Santosuosso, Brandon W. Schulz
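One of the prediction rules mentioned in this abstract (outputs within a predefined range, predicted value equal to the average of previous outputs) can be sketched as below. The function name, the minimum history length of three, and the `tolerance` parameter are assumptions chosen for illustration.

```python
def next_output(history, process, value, tolerance):
    """Return (output, predicted_flag) for one processing element.

    history: previously transmitted output values.
    process: the real, possibly expensive, processing function.
    """
    # Assumption: output counts as predictable when at least three recent
    # outputs all fall within `tolerance` of each other.
    if len(history) >= 3 and max(history) - min(history) <= tolerance:
        return sum(history) / len(history), True
    return process(value), False
```

When the flag is true, the element transmits the average without running `process` on the current input.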
  • Publication number: 20130031334
    Abstract: A mechanism is provided for automatically routing network interconnects in a data processing system. A processor in a node of a plurality of nodes receives network topology from neighboring nodes in the plurality of nodes within the data processing system. The processor constructs a system node map that identifies a physical connectivity between the node and the neighboring nodes. The processor programs a switch in the node with a connectivity map that indicates a set of point-to-point connections with the neighboring nodes. The set of point-to-point connections comprise locally-connected connections and pass-through connections.
    Type: Application
    Filed: July 25, 2011
    Publication date: January 31, 2013
    Applicant: International Business Machines Corporation
    Inventors: Wael R. El-Essawy, David A. Papa, Jarrod A. Roy
  • Publication number: 20130031336
    Abstract: An external intrinsic interface. A processor may include a core including a plurality of functional units, an intrinsic module located outside the core, and an interface module to perform relaying between the intrinsic module and a functional unit, among the plurality of functional units.
    Type: Application
    Filed: February 16, 2012
    Publication date: January 31, 2013
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Kwon Taek KWON, Seok Yoon Jung
  • Publication number: 20130024659
    Abstract: In a logically partitioned host computer system comprising host processors (host CPUs) partitioned into a plurality of guest processors (guest CPUs) of a guest configuration, a perform topology function instruction is executed by a guest processor specifying a topology change of the guest configuration. The topology change preferably changes the polarization of guest CPUs, the polarization being related to the amount of a host CPU resource that is provided to a guest CPU.
    Type: Application
    Filed: September 27, 2012
    Publication date: January 24, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: International Business Machines Corporation
  • Publication number: 20130024658
    Abstract: Technology to suppress the drop in SIMD processor efficiency that occurs when exchanging two-dimensional data in a plurality of rectangular regions, between an external section and a plurality of processor elements in an SIMD processor, so that one rectangular region corresponds to one processor element. In the SIMD processor, an address storage unit in a memory controller is capable of setting N number of addresses Ai (i=1 through N) in an external memory by utilizing a control processor. A parameter storage unit is capable of setting a first parameter OSV, a second parameter W, and a third parameter L by utilizing a control processor. A data transfer unit executes the transfer of data between an external memory, and the buffers in N number of processor elements contained in the applicable SIMD processor, based on the contents of the address storage unit and the parameter storage unit.
    Type: Application
    Filed: July 3, 2012
    Publication date: January 24, 2013
    Inventor: Shorin KYO
  • Publication number: 20130013839
    Abstract: A portable handheld device including a CPU for processing a script; a multi-core processor for processing an image; an input buffer for receiving data for processing by the multi-core processor, the input buffer being provided under the control of the multi-core processor to send data thereto; and an output buffer for receiving data processed by the multi-core processor, the output buffer being provided under the control of the multi-core processor to receive data therefrom. The multi-core processor comprises a plurality of micro-coded processing units. The CPU is configured with authority to clear and query the input and output buffers.
    Type: Application
    Filed: September 15, 2012
    Publication date: January 10, 2013
    Inventor: Kia Silverbrook
  • Publication number: 20130013891
    Abstract: A hierarchical barrier synchronization of cores and nodes on a multiprocessor system, in one aspect, may include providing by each of a plurality of threads on a chip, input bit signal to a respective bit in a register, in response to reaching a barrier; determining whether all of the plurality of threads reached the barrier by electrically tying bits of the register together and “AND”ing the input bit signals; determining whether only on-chip synchronization is needed or whether inter-node synchronization is needed; in response to determining that all of the plurality of threads on the chip reached the barrier, notifying the plurality of threads on the chip, if it is determined that only on-chip synchronization is needed; and after all of the plurality of threads on the chip reached the barrier, communicating the synchronization signal to outside of the chip, if it is determined that inter-node synchronization is needed.
    Type: Application
    Filed: September 13, 2012
    Publication date: January 10, 2013
    Applicant: International Business Machines Corporation
    Inventors: Valentina Salapura, Robert W. Wisniewski
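The decision logic in this abstract can be modelled in software as follows. This is a sketch only: the patent describes hardware that electrically ties register bits together, while the function name and the string return values here are hypothetical.

```python
def barrier_action(register_bits, inter_node_needed):
    """register_bits: one 0/1 entry per on-chip thread, set on reaching the barrier."""
    all_arrived = 1
    for bit in register_bits:
        all_arrived &= bit  # software stand-in for AND-ing the tied register bits
    if not all_arrived:
        return "wait"
    # All on-chip threads have arrived; choose the synchronization scope.
    return "signal_off_chip" if inter_node_needed else "notify_threads"
```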
  • Publication number: 20130007412
    Abstract: A method, system, and computer program product for maintaining reliability in a computer system. In an example embodiment, the method includes managing workloads on a first processor with a first processor architecture by an agent process executing on a second processor with a second processor architecture. The method proceeds by activating redundant computation on the second processor by the agent process. The method continues by performing a same computation from a workload of the workloads at least twice. Finally, the method includes comparing results of the same computation. In this embodiment the first processor is coupled to the second processor by a network, and the first processor architecture and second processor architecture are different architectures.
    Type: Application
    Filed: June 28, 2011
    Publication date: January 3, 2013
    Applicant: International Business Machines Corporation
    Inventors: Rajaram B. Krishnamurthy, Carl J. Parris, Donald W. Schmidt, Benjamin P. Segal
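The "perform the same computation at least twice and compare" step of this abstract can be sketched generically. The function name and the choice to raise an exception on mismatch are assumptions; the sketch also runs both computations in one process, whereas the abstract involves two processors of different architectures.

```python
def run_redundantly(compute, args, runs=2):
    """Run `compute(*args)` several times and return the result if all runs agree."""
    results = [compute(*args) for _ in range(runs)]
    if any(r != results[0] for r in results[1:]):
        raise RuntimeError("redundant computations disagree")
    return results[0]
```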
  • Publication number: 20130007413
    Abstract: Methods and apparatus for accomplishing dynamic frequency/voltage control between at least two processor cores in a multi-processor device or system include receiving busy, idle, wait, time and/or frequency information from a first processor core and receiving busy, idle, wait, time and/or frequency information from a second processor core. The received busy, idle, wait, time and/or frequency information may be correlated to identify patterns of interdependence. The correlated information may be used to determine dynamic frequency/voltage control settings for the first and second processor cores to provide a performance level that accommodates interdependent processes, threads and processor cores. The correlation of received busy, idle, wait, time and/or frequency information may involve generating a consolidated busy/idle pulse train that can then be used to set the frequency or voltage of each processor core independently.
    Type: Application
    Filed: January 5, 2012
    Publication date: January 3, 2013
    Applicant: QUALCOMM INCORPORATED
    Inventors: Steven S. Thomson, Mriganka Mondal, Nishant Hariharan
  • Publication number: 20120331270
    Abstract: Compressing result data for a compute node in a parallel computer, the parallel computer including a collection of compute nodes organized as a tree, including: initiating a collective gather operation by a logical root of the collection of compute nodes, including adding result data of the logical root to a gather buffer; for each compute node in the collection of compute nodes, determining whether result data of the compute node is already written in the gather buffer; and if the result data of the compute node is already written in the gather buffer, incrementing a counter assigned to that result data already written in the gather buffer; and if the result data of the compute node is not already written in the gather buffer, writing the result data of the compute node as new result data in the gather buffer, incrementing a counter assigned to that new result data, and writing in the gather buffer a node ID.
    Type: Application
    Filed: June 22, 2011
    Publication date: December 27, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Charles J. Archer, James E. Carey, Matthew W. Markland, Philip J. Sanders
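The gather-buffer compression in this abstract amounts to storing each distinct result once with a counter. A minimal sketch, assuming results are hashable and that the node ID recorded is that of the first node to write the result; the function name and buffer layout are hypothetical.

```python
def compress_gather(node_results):
    """node_results: list of (node_id, result) pairs, logical root first."""
    buffer = {}  # result -> {"count": occurrences, "node_id": first writer}
    for node_id, result in node_results:
        entry = buffer.get(result)
        if entry is None:
            # New result: write it, start its counter, and record the node ID.
            buffer[result] = {"count": 1, "node_id": node_id}
        else:
            # Result already in the buffer: only increment its counter.
            entry["count"] += 1
    return buffer
```

When most nodes report the same result, the buffer holds a handful of entries instead of one per node.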
  • Publication number: 20120324166
    Abstract: A computer-implemented method for managing processing resources of a computerized system having at least a first processor and a second processor, each of the processors operatively interconnected to a memory storing a set of data to be processed by a processor, the method comprising: monitoring data accessed by the first processor while executing; and if the second processor is at a shorter distance than the first processor from the monitored data, instructing to interrupt execution at the first processor and resume the execution at the second processor.
    Type: Application
    Filed: August 30, 2012
    Publication date: December 20, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Hillery C Hunter, Ronald P. Luijten, Phillip Stanley-Marbell
  • Publication number: 20120317398
    Abstract: A method to reduce buffer capacity in a processor includes giving the data packets admittance to the processor through at least one interface, storing the data packets in at least one input buffer, and using a packet rate shaper outside of a processing pipeline to control flow of the data packets to the pipeline before the data packets enter the pipeline. First and second data packets are given admittance to the pipeline in dependence on cost information per packet that is dependent upon an expected time period of residence of the first data packet in the pipeline. Cost information dependent upon an expected time period of residence of the second data packet in the pipeline differs from said cost information dependent upon the expected time period of residence of the first data packet in the pipeline.
    Type: Application
    Filed: August 15, 2012
    Publication date: December 13, 2012
    Inventors: Thomas Bodén, Jakob Carlström
  • Publication number: 20120317399
    Abstract: A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
    Type: Application
    Filed: August 15, 2012
    Publication date: December 13, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael A. Blocksome, Daniel A. Faraj
  • Patent number: 8332460
    Abstract: A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
    Type: Grant
    Filed: April 14, 2010
    Date of Patent: December 11, 2012
    Assignee: International Business Machines Corporation
    Inventors: Michael A. Blocksome, Daniel A. Faraj
  • Publication number: 20120311301
    Abstract: In a method of synchronizing data processing of a processor arrangement, responsive to reaching, during execution of a program, a barrier included in a program sequence, the processor arrangement halts the program execution until it is determined that all instructions preceding the barrier in the program sequence have been successfully scheduled for execution.
    Type: Application
    Filed: June 8, 2012
    Publication date: December 6, 2012
    Inventors: Martin VORBACH, Volker Baumgarte, Gerd Ehlers, Frank May, Armin Nückel
  • Publication number: 20120311300
    Abstract: Disclosed is a method of synchronizing a plurality of processors' accesses to at least one shared resource. One of a plurality of processors requests an exclusive region lock for a shared resource using a logical block address (LBA) of a dummy target. The LBA is defined in a region map that associates LBAs to shared resources. The exclusive region lock request is inserted as a node in a region lock tree of the dummy target. Access to the shared resource is granted based on a determination of whether there is an existing region lock in the region lock tree that overlaps with the new exclusive region lock request.
    Type: Application
    Filed: June 1, 2011
    Publication date: December 6, 2012
    Inventors: Kapil Sundrani, Lakshmi Kanth Reddy Kakanuru
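The overlap test at the heart of this abstract can be sketched with inclusive LBA ranges. This is a simplification under assumptions: the held locks are kept in a plain list rather than the region lock tree the abstract describes, and the class and method names are hypothetical.

```python
class RegionLocks:
    """Exclusive locks over LBA ranges of a dummy target."""

    def __init__(self):
        self.held = []  # list of (start_lba, end_lba), both inclusive

    def try_lock(self, start, length):
        """Grant the lock only if no held range overlaps [start, start + length - 1]."""
        end = start + length - 1
        for s, e in self.held:
            if start <= e and s <= end:  # the two inclusive ranges intersect
                return False
        self.held.append((start, end))
        return True

    def unlock(self, start, length):
        """Release a previously granted lock with the same start and length."""
        self.held.remove((start, start + length - 1))
```

A region map in this setting would simply assign each shared resource its own LBA range, so that locking the range serializes access to the resource.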
  • Publication number: 20120303933
    Abstract: The present invention relates to a processor which comprises processing elements that execute instructions in parallel and are connected together with point-to-point communication links called data communication links (DCL). The instructions use DCLs to communicate data between them. In order to realize those communications, they specify the DCLs from which they take their operands, and the DCLs to which they write their results. The DCLs allow the instructions to synchronize their executions and to explicitly manage the data they manipulate. Communications are explicit and are used to realize the storage of temporary variables, which is decoupled from the storage of long-living variables.
    Type: Application
    Filed: January 31, 2011
    Publication date: November 29, 2012
    Inventors: Philippe Manet, Bertrand Rousseau
  • Publication number: 20120303932
    Abstract: A processor includes a plurality of processing tiles, wherein each tile is configured at runtime to perform a configurable operation. A first subset of tiles are configured to perform in a pipeline a first plurality of configurable operations in parallel. A second subset of tiles are configured to perform a second plurality of configurable operations in parallel with the first plurality of configurable operations. The processor also includes a multi-port memory access module operably connected to the plurality of tiles via a data bus configured to control access to a memory and to provide data to two or more processing tiles simultaneously. The processor also includes a controller operably connected to the plurality of tiles and the multi-port memory access module via a runtime bus. The processor configures the tiles and the multi-port memory access module to execute a computation.
    Type: Application
    Filed: May 24, 2012
    Publication date: November 29, 2012
    Inventors: Clément Farabet, Yann LeCun
  • Publication number: 20120297164
    Abstract: This invention describes an apparatus, computer architecture, method, operating system, compiler, and application program products for MPEs as well as virtualization in a symmetric MCP. The disclosure is applied to a generic microprocessor architecture with a set (e.g., one or more) of controlling elements (e.g., MPEs) and a set of groups of sub-processing elements (e.g., SPEs). Under this arrangement, MPEs and SPEs are organized in a way that a smaller number of MPEs control the behavior of a group of SPEs. The apparatus enables virtualized control threads within MPEs to be assigned to different groups of SPEs for controlling the same. The apparatus further includes a MCP coupled to a power supply coupled with cores to provide a supply voltage to each core (or core group) and controlling-digital elements and multiple instances of sub-processing elements.
    Type: Application
    Filed: July 31, 2012
    Publication date: November 22, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Karl J. Duvalsaint, Harm P. Hofstee, Daeik Kim, Moon J. Kim
  • Publication number: 20120290815
    Abstract: A data processing apparatus causes multiple processors to carry out a first data process in parallel, and when storing the data processed in parallel in a storage unit, converts the addresses of the data into addresses in the storage unit based on the data cache size of the multiple processors and stores the data. The data stored in the storage unit is then read out, and a second data process is carried out on the read-out data.
    Type: Application
    Filed: April 10, 2012
    Publication date: November 15, 2012
    Applicant: CANON KABUSHIKI KAISHA
    Inventor: Hirokazu Takahashi
  • Patent number: 8312455
    Abstract: A method for optimizing execution of a single threaded program on a multi-core processor. The method includes dividing the single threaded program into a plurality of discretely executable components while compiling the single threaded program; identifying at least some of the plurality of discretely executable components for execution by an idle core within the multi-core processor; and enabling execution of the at least one of the plurality of discretely executable components on the idle core.
    Type: Grant
    Filed: December 19, 2007
    Date of Patent: November 13, 2012
    Assignee: International Business Machines Corporation
    Inventors: Robert H. Bell, Jr., Louis Bennie Capps, Jr., Michael A. Paolini, Michael Jay Shapiro
  • Publication number: 20120278589
    Abstract: The present invention provides a storage system in which each microprocessor is able to execute synchronous processing and asynchronous processing in accordance with the operating status of the storage system. Any one attribute, from among multiple attributes (operating modes) prepared beforehand, is set in each microprocessor in accordance with the operating status of the storage system. The attribute that is set in each microprocessor is regularly reviewed and changed.
    Type: Application
    Filed: June 17, 2010
    Publication date: November 1, 2012
    Applicant: HITACHI, LTD.
    Inventors: Tomohiro Yoshihara, Shintaro Kudo, Norio Shimozono
  • Publication number: 20120278590
    Abstract: A reconfigurable processor is provided. The reconfigurable processor includes a plurality of functional blocks configured to perform corresponding operations. The reconfigurable processor also includes one or more data inputs coupled to the plurality of functional blocks to provide one or more operands to the plurality of functional blocks, and one or more data outputs to provide at least one result outputted from the plurality of functional blocks.
    Type: Application
    Filed: January 7, 2011
    Publication date: November 1, 2012
    Applicant: SHANGHAI XIN HAO MICRO ELECTRONICS CO. LTD.
    Inventors: Kenneth Chenghao Lin, Zhongmin Zhang, Haoqi Ren
  • Publication number: 20120272042
    Abstract: A wireless communication base station comprising a plurality of application specific instruction set processors (ASISPs) configured to support one or more processes hosted by the base station, and to track process state information associated with each of the processes; and a memory configured to store the tracked process state information, and when an ASISP of the plurality of ASISPs is reallocated from a first process to a second process, the respective ASISP is configured to retrieve from the memory process state information for the second process.
    Type: Application
    Filed: June 22, 2012
    Publication date: October 25, 2012
    Inventors: Song CHEN, Paul L. Chou, Christopher C. Woodthorpe, Venugopal Balasubramonian, Keith Rieken
  • Patent number: 8296350
    Abstract: The present invention provides a method and apparatus for QR-factorizing a matrix on a multiprocessor system, wherein the multiprocessor system comprises at least one core processor and a plurality of accelerators, the method comprises the steps of: iteratively factorizing each panel in the matrix until the whole matrix is factorized; wherein in each iteration, the method comprises: partitioning an unprocessed matrix part in the matrix into a plurality of blocks according to a predetermined block size; partitioning a current processed panel in the unprocessed matrix part into at least two sub panels, wherein the current processed panel is composed of a plurality of blocks; and performing QR factorization one by one on the at least two sub panels with the plurality of accelerators, and updating the data of the sub panel(s) on which no QR factorization has been performed among the at least two sub panels by using the factorization result.
    Type: Grant
    Filed: March 12, 2009
    Date of Patent: October 23, 2012
    Assignee: International Business Machines Corporation
    Inventors: Hui Li, Bai Ling Wang
  • Patent number: 8291256
    Abstract: A digital VLSI circuit is provided with functions in which the number of switching operations to supply electric power to each arithmetic operation unit is reduced in a restricted period of time while electric power supply is controlled for each arithmetic operation unit, so that low power consumption can be achieved in real pipe-line arithmetic operation.
    Type: Grant
    Filed: February 5, 2007
    Date of Patent: October 16, 2012
    Assignee: National University Corporation Kobe University
    Inventors: Masahiko Yoshimoto, Kentaro Kawakami, Jun Takemura
  • Publication number: 20120239905
    Abstract: Embodiments of an apparatus including a first processor core having a local agent running thereon, the agent comprising a local process and a proxy agent, and a second processor core having a remote agent running thereon, the remote agent being an instance of the local agent. A shared memory is coupled to the first processor core and the second processor core, wherein the local agent and the remote agent communicate via the shared memory. Other embodiments are disclosed and claimed.
    Type: Application
    Filed: March 16, 2011
    Publication date: September 20, 2012
    Applicant: MICROSCAN SYSTEMS, INC.
    Inventors: Danny S. Barnes, Serge H. Limondin
  • Patent number: 8261117
    Abstract: This invention describes an apparatus, computer architecture, method, operating system, compiler, and application program products for MPEs as well as virtualization in a symmetric MCP. The disclosure is applied to a generic microprocessor architecture with a set (e.g., one or more) of controlling elements (e.g., MPEs) and a set of groups of sub-processing elements (e.g., SPEs). Under this arrangement, MPEs and SPEs are organized in a way that a smaller number of MPEs control the behavior of a group of SPEs. The apparatus enables virtualized control threads within MPEs to be assigned to different groups of SPEs for controlling the same. The apparatus further includes a MCP coupled to a power supply coupled with cores to provide a supply voltage to each core (or core group) and controlling-digital elements and multiple instances of sub-processing elements.
    Type: Grant
    Filed: September 11, 2008
    Date of Patent: September 4, 2012
    Assignee: International Business Machines Corporation
    Inventors: Karl J. Duvalsaint, Harm P. Hofstee, Daeik Kim, Moon J. Kim
  • Publication number: 20120216016
    Abstract: A processor instruction scheduler comprising an optimization engine which uses an optimization model for a processor architecture with: means to generate an optimization model for the optimization engine from a design of a processor and data representing optimization goals and constraints and a code stream, wherein the processor has at least two execution pipes and at least two registers, and wherein the design comprises data for processor instruction latency and execution pipes, and wherein the code stream comprises processor instructions with corresponding register selections; and reordering means to generate an optimized code stream from the code stream with the optimal solution provided by the optimization engine for the optimization model by reordering the code stream, such that optimum values for the optimization goals under the given constraints are achieved without affecting the operation results of the code stream.
    Type: Application
    Filed: April 28, 2012
    Publication date: August 23, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Juergen Koehl, Jens Leenstra, Philipp Panitz, Hans Schlenker
  • Publication number: 20120216017
    Abstract: Computational unit area selecting units, each of which is provided in individual multiple cores, sequentially select uncomputed computational unit areas in a computational area. Computing units, each of which is provided in the individual multiple cores, perform computation for the selected computational unit areas. In addition, the computing units write computational results in a memory device which is accessible from each of the multiple cores. Computational result transmitting unit of the core performs computational result acquisition and transmission processing in a different time period with respect to each of multiple computational result transmission areas. The computational result acquisition processing is for acquiring, from the memory device, computational results related to the computational result transmission areas.
    Type: Application
    Filed: April 30, 2012
    Publication date: August 23, 2012
    Applicant: FUJITSU LIMITED
    Inventor: Yoshie INADA
  • Publication number: 20120210097
    Abstract: A management unit causes a plurality of processing units to execute a calculation process. A determining unit determines whether a communication time for a communication process of exchanging a calculation result obtained from the calculation process is longer than a calculation time for the calculation process, the communication process being executed between a first computational node including the processor and a second computational node being a different computational node from the first computational node. A control unit limits the number of processing units when the determining unit has determined that the communication time is longer than the calculation time.
    Type: Application
    Filed: January 5, 2012
    Publication date: August 16, 2012
    Applicant: Fujitsu Limited
    Inventor: Yusuke OISHI
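    Note: The decision rule in this abstract can be sketched as below. This is a minimal illustration only, not the patented implementation: the function name `choose_unit_count`, the `min_units` parameter, and the policy of reducing the count by one unit per step are all assumptions, since the abstract does not say how the limit is chosen.

```python
def choose_unit_count(calc_time, comm_time, current_units, min_units=1):
    """Return how many processing units to use for the next step.

    Assumed policy (not stated in the abstract): when the communication
    time exceeds the calculation time, use one unit fewer, but never go
    below min_units. Otherwise keep the current count.
    """
    if comm_time > calc_time:
        return max(min_units, current_units - 1)
    return current_units


if __name__ == "__main__":
    # Communication dominates, so the unit count is limited.
    print(choose_unit_count(calc_time=1.0, comm_time=2.0, current_units=8))
    # Calculation dominates, so the unit count is unchanged.
    print(choose_unit_count(calc_time=2.0, comm_time=1.0, current_units=8))
```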
  • Publication number: 20120204059
    Abstract: A distributed vehicle control system comprising a secure real-time executive running as a distributed abstraction of both the application and the operating system, where the SRE comprises a message manager, security manager, critical data manager, configuration manager, and multi-processor task control manager and is configured to control how the processors communicate with each other, how the processors are initiated, how the processors start tasks, and how priorities are set for messages.
    Type: Application
    Filed: April 16, 2012
    Publication date: August 9, 2012
    Inventor: Dan A. Preston
  • Publication number: 20120204002
    Abstract: A mechanism is provided for sharing a communication used by a parser (parser path) in a network adapter of a network processor for sending requests for a process to be executed by an external coprocessor. The parser path is shared by processors of the network processor (software path) to send requests to the external processor. The mechanism uses for the software path a request mailbox comprising a control address and a data field accessed by MMIO for sending two types of messages, one message type to read or write resources and one message type to trigger an external process in the coprocessor and a response mailbox for receiving response from the external coprocessor comprising a data field and a flag field. The other processors of the network poll the flag until set and get the coprocessor result in the data field.
    Type: Application
    Filed: February 3, 2012
    Publication date: August 9, 2012
    Applicant: International Business Machines Corporation
    Inventors: Claude Basso, Jean L. Calvignac, Chih-jen Chang, Philippe Damon, Natarajan Vaidhyanathan, Fabrice J. Verplanken, Colin B. Verrilli
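    Note: The response-mailbox polling described in this abstract (poll a flag until set, then read the result from the data field) can be sketched in software as below. This is a hedged illustration: the class `ResponseMailbox`, the functions `coprocessor` and `send_and_poll`, the doubling "computation", and the timeout are all hypothetical stand-ins, and a Python thread takes the place of the external coprocessor and MMIO access.

```python
import threading
import time


class ResponseMailbox:
    """Hypothetical response mailbox with a data field and a flag field."""

    def __init__(self):
        self.flag = False
        self.data = None


def coprocessor(mailbox, request):
    """Stand-in for the external coprocessor (assumed behavior)."""
    mailbox.data = request * 2  # placeholder for the external process
    mailbox.flag = True         # set the flag only after the data is written


def send_and_poll(request, poll_interval=0.001, timeout=1.0):
    """Trigger the stand-in coprocessor, then poll the flag until it is set."""
    mailbox = ResponseMailbox()
    worker = threading.Thread(target=coprocessor, args=(mailbox, request))
    worker.start()
    deadline = time.monotonic() + timeout
    while not mailbox.flag:
        if time.monotonic() > deadline:
            raise TimeoutError("no response from coprocessor")
        time.sleep(poll_interval)
    worker.join()
    return mailbox.data


if __name__ == "__main__":
    print(send_and_poll(21))
```

    Writing the data field before setting the flag is the design point here: a poller that sees the flag set can then read the data field without a further handshake.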
  • Publication number: 20120185673
    Abstract: Provided is a reconfigurable processor that may process a first type of operation in first mode using a first group of functional units, and process a second type of operation in second mode using a second group of functional units. The reconfigurable processor may selectively supply power to either the first group or the second group, in response to a mode-switch signal or a mode-switch instruction.
    Type: Application
    Filed: August 19, 2011
    Publication date: July 19, 2012
    Inventors: Sung-Joo Yoo, Yeon-Gon Cho, Bernhard Egger, Won-Sub Kim, Hee-Jin Ahn
  • Publication number: 20120185672
    Abstract: Performing a series of successive synchronizing operations by a core on data shared by a plurality of cores may include a first core indicating an upcoming synchronizing operation on shared data. A second memory layer stores the shared data and tracks the first core's ownership of the shared data. The second memory layer is shared via coherency operations among the first core and one or more second cores. The first core may perform one or more synchronization operations on the shared data without requiring interaction from the second memory layer.
    Type: Application
    Filed: January 18, 2011
    Publication date: July 19, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alan Gara, Martin Ohmacht, Burkhard Steinmacher-Burow, Robert W. Wisniewski
  • Patent number: 8209690
    Abstract: An Explicit Multi-Threading (XMT) system and method is provided for processing multiple spawned threads associated with SPAWN-type commands of an XMT program. The method includes executing a plurality of child threads by a plurality of TCUs including a first TCU executing a child thread which is allocated to it; completing execution of the child thread by the first TCU; announcing that the first TCU is available to execute another child thread; executing by a second TCU a parent child thread that includes a nested spawn-type command for spawning additional child threads of the plurality of child threads, wherein the parent child thread is related in a parent-child relationship to the child threads that are spawned in conjunction with the nested spawn-type command; assigning a thread ID (TID) to each child thread, wherein the TID is unique with respect to the other TIDs; and allocating a new child thread to the first TCU.
    Type: Grant
    Filed: January 19, 2007
    Date of Patent: June 26, 2012
    Assignee: University of Maryland
    Inventors: Xingzhi Wen, Uzi Yehoshua Vishkin
  • Publication number: 20120159121
    Abstract: A synchronization apparatus includes a receiver that receives data from a synchronization apparatus of another node that performs synchronization with its own node from among the plurality of synchronization apparatuses and extracts synchronization information from the received data, a transmitter that transmits the data to the synchronization apparatus of the other node, a receiving state register that stores the extracted synchronization information, a delay unit that delays the received data by a specified period of time, and a controller that stores the extracted synchronization information and synchronization information from its own controller in the reception state register and causes the transmitter to transmit the data to the other node and returns the data to its own node back to its own controller via the delay unit when the extracted synchronization information and the synchronization information from its own controller are stored in the reception state register.
    Type: Application
    Filed: December 13, 2011
    Publication date: June 21, 2012
    Applicant: Fujitsu Limited
    Inventors: Tomohiro INOUE, Yuichiro Ajima, Shinya Hiramoto
  • Publication number: 20120159513
    Abstract: Technologies pertaining to cluster-on-chip computing environments are described herein. More particularly, mechanisms for supporting message passing in such environments are described herein, where cluster-on-chip computing environments do not support hardware cache coherency.
    Type: Application
    Filed: December 15, 2010
    Publication date: June 21, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Alexey Pakhunov, Ajith Jayamohan, Suyash Sinha
  • Patent number: 8205210
    Abstract: A method, apparatus and system for adaptably distributing video server processes among processing elements within a video server such that video server operation may be adapted in a manner facilitating rigorous timing constraints.
    Type: Grant
    Filed: August 12, 2008
    Date of Patent: June 19, 2012
    Assignee: Comcast IP Holdings I, LLC
    Inventors: Geoffrey Alan Cleary, Joseph I. Brown
  • Patent number: 8205201
    Abstract: A process for maintaining synchronization of processors that are executing a same plurality of applications in parallel includes interrupting a current task between processing two successive instructions of an application being processed when an interrupt request occurs to process another application. An intermediate state reached by the current task is saved when the interrupt request occurs, and a counter for each of the processors indicating a number of instructions processed by each of the processors is maintained. A processor is caused to issue a synchronization confirmation in response to a comparison result that the numbers of instructions processed are identical. The processor is caused to enter a wait state when its number of processed-instructions is the largest among the processors or to execute a procedure for processing the instructions until its processed-instruction counter reaches the largest number.
    Type: Grant
    Filed: February 13, 2008
    Date of Patent: June 19, 2012
    Assignee: Thales
    Inventor: Christophe Ple
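    Note: The counter comparison in this abstract can be sketched as below. This is an assumed simplification, not the patented process: the function name `sync_step` and the action labels "confirm", "wait", and "run" are hypothetical, and the interrupt handling and state saving are omitted.

```python
def sync_step(counters):
    """Decide one action per processor from its processed-instruction count.

    Returns a list of (action, remaining) tuples:
      ("confirm", 0) for every processor when all counts are identical;
      ("wait", 0)    for a processor whose count is the largest;
      ("run", n)     for a processor that must execute n more instructions
                     to reach the largest count.
    """
    target = max(counters)
    if all(count == target for count in counters):
        return [("confirm", 0)] * len(counters)
    return [
        ("wait", 0) if count == target else ("run", target - count)
        for count in counters
    ]


if __name__ == "__main__":
    print(sync_step([5, 3, 5]))
    print(sync_step([7, 7, 7]))
```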
  • Patent number: 8204629
    Abstract: The invention relates to a control device for lubrication systems, having a control processor which is arranged in a housing, having connections, which are formed on the housing, for sensor inputs and control outputs, which are connected to the control processor, and having an operator interface which is secured to the outside of the housing and is intended to input control parameters. Provision is made for the control processor to be set up with different control programs for different lubrication systems and for program switches for selecting the different control programs to be arranged inside the housing.
    Type: Grant
    Filed: April 23, 2008
    Date of Patent: June 19, 2012
    Assignee: Lincoln GmbH
    Inventor: Armin Guenther