Operation Patents (Class 712/30)

Master/slave (Class 712/31)

Allocation of Mainframe Computing Resources Using Distributed Computing

Publication number: 20120144157

Abstract: There is disclosed a system and method for allocation of mainframe computing resources using distributed computing. In particular, the present application is directed to a system whereby a mainframe process intended for execution on a metered processor may be identified as executable on a non-metered processor. Thereafter, the mainframe computer may initiate execution of the remote process on the remote non-metered processor. If necessary, high-speed access to data available to the metered processor is provided to the non-metered processor. The process operates directly on data available to the metered processor. Once completed, the process signals the mainframe computer that the process is complete. Both metered and non-metered processor configuration and management may be accomplished using the administrative interface.

Type: Application

Filed: December 6, 2010

Publication date: June 7, 2012

Inventors: James Reginald Crew, Pradeep Kumar Reddy Gundavarapu, Balaji Swaminathan, William Donald Pagdin, Lary Edward Klein
ADAPTIVE INTEGRATED CIRCUITRY WITH HETEROGENEOUS AND RECONFIGURABLE MATRICES OF DIVERSE AND ADAPTIVE COMPUTATIONAL UNITS HAVING FIXED, APPLICATION SPECIFIC COMPUTATIONAL ELEMENTS

Publication number: 20120124333

Abstract: The present invention concerns a new category of integrated circuitry and a new methodology for adaptive or reconfigurable computing. The preferred IC embodiment includes a plurality of heterogeneous computational elements coupled to an interconnection network. The plurality of heterogeneous computational elements include corresponding computational elements having fixed and differing architectures, such as fixed architectures for different functions such as memory, addition, multiplication, complex multiplication, subtraction, configuration, reconfiguration, control, input, output, and field programmability. In response to configuration information, the interconnection network is operative in real-time to configure and reconfigure the plurality of heterogeneous computational elements for a plurality of different functional modes, including linear algorithmic operations, non-linear algorithmic operations, finite state machine operations, memory operations, and bit-level manipulations.

Type: Application

Filed: January 19, 2012

Publication date: May 17, 2012

Applicant: QST Holdings LLC

Inventors: Paul L. Master, Eugene Hogenauer, Walter James Scheuermann
Multi-cluster dynamic reconfigurable circuit for context valid processing of data by clearing received data with added context change indicative signal

Patent number: 8171259

Abstract: A dynamic reconfigurable circuit includes multiple clusters each including a group of reconfigurable processing elements. The dynamic reconfigurable circuit is capable of dynamically changing a configuration of the clusters according to a context including a description of processing of the processing elements and of connection between the processing elements. A first cluster among the clusters includes a signal generating circuit that when an instruction to change the context is received, generates a report signal indicative of the instruction to change the context; a signal adding circuit that adds the report signal generated by the signal generating circuit to output data that is to be transmitted from the first cluster to a second cluster; and a data clearing circuit that, when output data to which a report signal generated by the second cluster is added is received, performs a clearing process of clearing the output data received.

Type: Grant

Filed: February 27, 2009

Date of Patent: May 1, 2012

Assignee: Fujitsu Semiconductor Limited

Inventors: Takashi Hanai, Shinichi Sutou
STALL PROPAGATION IN A PROCESSING SYSTEM WITH INTERSPERSED PROCESSORS AND COMMUNICATION ELEMENTS

Publication number: 20120102299

Abstract: A processing system includes processors and dynamically configurable communication elements (DCCs) coupled together in an interspersed arrangement. A source device may transfer a data item through an intermediate subset of the DCCs to a destination device. The source and destination devices may each correspond to different processors, DCCs, or input/output devices, or mixed combinations of these. In response to detecting a stall after the source device begins transfer of the data item to the destination device and prior to receipt of all of the data item at the destination device, a stalling device is operable to propagate stalling information through one or more of the intermediate subset towards the source device. In response to receiving the stalling information, at least one of the intermediate subset is operable to buffer all or part of the data item.

Type: Application

Filed: December 30, 2011

Publication date: April 26, 2012

Inventors: Michael B. Doerr, William H. Hallidy, David A. Gibson, Craig M. Chase
Performing an allreduce operation on a plurality of compute nodes of a parallel computer

Patent number: 8161268

Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.

Type: Grant

Filed: May 21, 2008

Date of Patent: April 17, 2012

Assignee: International Business Machines Corporation

Inventor: Ahmad Faraj
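
The two-level scheme in the abstract of patent 8161268 above (logical rings across nodes, then a local step within each node) can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions: the reduction operator is taken to be a sum, ring c is assumed to link core c of every node, and no real message passing takes place. The function name and data layout are hypothetical and do not come from the patent.

```python
from typing import List


def ring_allreduce_sum(contrib: List[List[int]]) -> List[List[int]]:
    """Simulate a two-level allreduce; contrib[n][c] belongs to core c of node n.

    Assumptions (not from the patent text): the operator is addition, and
    logical ring c contains core c of every compute node.
    """
    num_nodes = len(contrib)
    cores_per_node = len(contrib[0])

    # Global step: each logical ring reduces the contributions of its members,
    # so every core in ring c ends up holding ring_result[c].
    ring_result = [
        sum(contrib[n][c] for n in range(num_nodes))
        for c in range(cores_per_node)
    ]

    # Local step: each node combines the ring results held by its own cores.
    node_total = sum(ring_result)

    # Every core on every node now holds the full reduction.
    return [[node_total] * cores_per_node for _ in range(num_nodes)]


contributions = [[1, 2], [3, 4], [5, 6]]  # 3 nodes, 2 cores each
result = ring_allreduce_sum(contributions)
```

With one ring per core index, the global step runs as many independent reductions as there are cores per node, which is the property the abstract relies on.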
Reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application

Patent number: 8161307

Abstract: Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.

Type: Grant

Filed: October 20, 2011

Date of Patent: April 17, 2012

Assignee: International Business Machines Corporation

Inventors: Charles J. Archer, Michael A. Blocksome, Amanda E. Peters, Joseph D. Ratterman, Brian E. Smith
COMPUTING APPARATUS BASED ON RECONFIGURABLE ARCHITECTURE AND MEMORY DEPENDENCE CORRECTION METHOD THEREOF

Publication number: 20120089813

Abstract: Provided are a computing apparatus based on a reconfigurable architecture and a memory dependence correction method thereof. In one general aspect, a computing apparatus has a reconfigurable architecture. The computing apparatus may include: a reconfiguration unit having processing elements configured to reconfigure data paths between one or more of the processing elements; a compiler configured to analyze instructions to generate reconfiguration information for reconfiguring one or more of the reconfigurable data paths; a configuration memory configured to store the reconfiguration information; and a processor configured to execute the instructions through the reconfiguration unit, and to correct at least one memory dependency among the processing elements.

Type: Application

Filed: July 7, 2011

Publication date: April 12, 2012

Inventors: Tai-Song Jin, Dong-Hoon Yoo, Bernhard Egger
Inter-Processor Protocol in a Multi-Processor System

Publication number: 20120089814

Abstract: In a multiprocessor system, a primary processor may store an executable image for a secondary processor. A communication protocol assists the transfer of an image header and data segment(s) of the executable image from the primary processor to the secondary processor. Messages between the primary processor and secondary processor indicate successful receipt of transferred data, termination of a transfer process, and acknowledgement of same.

Type: Application

Filed: December 5, 2011

Publication date: April 12, 2012

Applicant: QUALCOMM INCORPORATED

Inventors: Nitin Gupta, Daniel H. Kim, Igor Malamant, Steve Haehnichen
Application-based specialization for computing nodes within a distributed processing system

Patent number: 8151245

Abstract: A distributed processing system is described that employs “application-based” specialization. In particular, the distributed processing system is constructed as a collection of computing nodes in which each computing node performs a particular processing role within the operation of the overall distributed processing system. Each of the computing nodes includes an operating system, such as the Linux operating system, and includes a plug-in software module to provide a distributed memory operating system that employs the role-based computing techniques. An administration node maintains a database that defines a plurality of application roles. Each role is associated with a software application, and specifies a set of software components necessary for execution of the software application. The administration node deploys the software components to the application nodes in accordance with the application roles associated with each of the application nodes.

Type: Grant

Filed: December 16, 2005

Date of Patent: April 3, 2012

Assignee: Computer Associates Think, Inc.

Inventors: Steven M. Oberlin, David W. McAllister
SCALABLE AND PROGRAMMABLE PROCESSOR COMPRISING MULTIPLE COOPERATING PROCESSOR UNITS

Publication number: 20120079236

Abstract: A processor comprises a plurality of processor units arranged to operate concurrently and in cooperation with one another, and control logic configured to direct the operation of the processor units. At least a given one of the processor units comprises a memory, an arithmetic engine and a switch fabric. The switch fabric provides controllable connectivity between the memory, the arithmetic engine and input and output ports of the given processor unit, and has control inputs driven by corresponding outputs of the control logic. In an illustrative embodiment, the processor units may be configured to perform computations associated with a key equation solver in a Reed-Solomon (RS) decoder or other type of forward error correction (FEC) decoder.

Type: Application

Filed: September 29, 2011

Publication date: March 29, 2012

Inventors: Dusan Suvakovic, Adriaan J. de Lind van Wijngaarden, Man Fai Lau
APPLICATION SCHEDULING IN HETEROGENEOUS MULTIPROCESSOR COMPUTING PLATFORMS

Publication number: 20120079235

Abstract: Methods and apparatus to schedule applications in heterogeneous multiprocessor computing platforms are described. In one embodiment, information regarding performance (e.g., execution performance and/or power consumption performance) of a plurality of processor cores of a processor is stored (and tracked) in counters and/or tables. Logic in the processor determines which processor core should execute an application based on the stored information. Other embodiments are also claimed and disclosed.

Type: Application

Filed: September 25, 2010

Publication date: March 29, 2012

Inventors: Ravishankar Iyer, Sadagopan Srinivasan, Li Zhao, Rameshkumar G. Illikkal
Handling transaction buffer overflow in multiprocessor by re-executing after waiting for peer processors to complete pending transactions and bypassing the buffer

Patent number: 8140828

Abstract: There is disclosed a method and apparatus for handling transaction buffer overflow in a multi-processor system as well as a transaction memory system in a multi-processor system. The method comprises the steps of: when overflow occurs in a transaction buffer of one processor, disabling peer processors from entering transactions, and waiting for any processor having a current transaction to complete its current transaction; re-executing the transaction resulting in the transaction buffer overflow without using the transaction buffer; and when the transaction execution is completed, enabling the peer processors for entering transactions.

Type: Grant

Filed: December 1, 2008

Date of Patent: March 20, 2012

Assignee: International Business Machines Corporation

Inventors: Xiaowei Shen, Hua Yong Wang, Kun Wang
TRAFFIC CONTROL METHOD AND APPARATUS OF MULTIPROCESSOR SYSTEM

Publication number: 20120060007

Abstract: A method and apparatus for controlling traffic of a multiprocessor system or multi-core system is provided. The traffic control apparatus of a multiprocessor system according to the present invention includes a request handler for processing a traffic request of a first processor, and a Quality of Service (QoS) manager for receiving a QoS guarantee start instruction for a second processor from the multiprocessor system, and for transmitting, when traffic of the second processor is detected, a traffic adjustment signal to the request handler. The request handler adjusts the traffic of the first processor according to the received traffic adjustment signal. The traffic control method and apparatus of the present invention are capable of adjusting the required bandwidths of individual technologies and guaranteeing real-time operation in the multiprocessor system or multi-core system.

Type: Application

Filed: September 2, 2011

Publication date: March 8, 2012

Applicant: SAMSUNG ELECTRONICS CO. LTD.

Inventors: Min Seung BAIK, Joong Baik KIM, Seung Wook LEE, Soon Wan KWON
CONTROLLING SIMD PARALLEL PROCESSORS

Publication number: 20120047350

Abstract: A processing apparatus for processing source code comprising a plurality of single line instructions to implement a desired processing function is described.

Type: Application

Filed: May 4, 2010

Publication date: February 23, 2012

Inventors: John Lancaster, Martin Whitaker
MULTIPROCESSOR SYSTEM-ON-A-CHIP FOR MACHINE VISION ALGORITHMS

Publication number: 20120042150

Abstract: A multiprocessor system includes a main memory and multiple processing cores that are configured to execute software that uses data stored in the main memory. In some embodiments, the multiprocessor system includes a data streaming unit, which is connected between the processing cores and the main memory and is configured to pre-fetch the data from the main memory for use by the multiple processing cores. In some embodiments, the multiprocessor system includes a scratch-pad processing unit, which is connected to the processing cores and is configured to execute, on behalf of the multiple processing cores, a selected part of the software that causes two or more of the processing cores to access concurrently a given item of data.

Type: Application

Filed: March 29, 2011

Publication date: February 16, 2012

Applicant: PRIMESENSE LTD.

Inventor: Idan Saar
ACHIEVING ULTRA-HIGH AVAILABILITY USING A SINGLE CPU

Publication number: 20120023309

Abstract: Techniques are described for achieving high availability using a single processor (CPU). In a system comprising a multi-core processor, at least two partitions may be configured with each partition being allocated one or more cores of the multiple cores. The partitions may be configured such that one partition operates in active mode while another partition operates in standby mode. In this manner, a single processor is able to provide active-standby functionality, thereby enhancing the availability of the system comprising the processor.

Type: Application

Filed: July 23, 2010

Publication date: January 26, 2012

Applicant: Brocade Communications Systems, Inc.

Inventors: Vineet M. Abraham, Bill Ying Chin, William R. Mahoney, Aditya Saxena, Xupei Liang, Bill Jianqiang Zhou
Performance monitoring for new phase dynamic optimization of instruction dispatch cluster configuration

Patent number: 8103856

Abstract: In a processor having multiple clusters which operate in parallel, the number of clusters in use can be varied dynamically. At the start of each program phase, the configuration option for an interval is run to determine the optimal configuration, which is used until the next phase change is detected. The optimum instruction interval is determined by starting with a minimum interval and doubling it until a low stability factor is reached.

Type: Grant

Filed: January 12, 2009

Date of Patent: January 24, 2012

Assignee: University of Rochester

Inventors: Rajeev Balasubramonian, Sandhya Dwarkadas, David Albonesi
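
The interval search in the abstract of patent 8103856 above (start at a minimum interval and double it) can be sketched as a short loop. The abstract does not define how stability is measured, so the `instability` callback, the threshold, and the upper bound below are all hypothetical placeholders chosen for illustration.

```python
from typing import Callable


def choose_interval(
    instability: Callable[[int], float],
    min_interval: int = 1_000,
    max_interval: int = 1_000_000,
    threshold: float = 0.05,
) -> int:
    """Double the interval until the measured instability is low enough.

    `instability(interval)` is an assumed measurement hook: it should return
    how unstable program behaviour looks when sampled at that interval
    length (in instructions). The patent text does not specify this metric.
    """
    interval = min_interval
    # Stop when behaviour is stable enough or when doubling again would
    # exceed the (hypothetical) upper bound.
    while interval * 2 <= max_interval and instability(interval) > threshold:
        interval *= 2
    return interval


# Toy measurement: instability falls off as the interval grows.
chosen = choose_interval(lambda n: 1_000 / n)
```

A short interval reacts quickly to phase changes but gives noisy measurements; doubling is a cheap way to search for the shortest interval that is still stable.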
Data Processing Using On-Chip Memory In Multiple Processing Units

Publication number: 20120017062

Abstract: Methods are disclosed for improving data processing performance in a processor using on-chip local memory in multiple processing units. According to an embodiment, a method of processing data elements in a processor using a plurality of processing units, includes: launching, in each of the processing units, a first wavefront having a first type of thread followed by a second wavefront having a second type of thread, where the first wavefront reads as input a portion of the data elements from an off-chip shared memory and generates a first output; writing the first output to an on-chip local memory of the respective processing unit; and writing to the on-chip local memory a second output generated by the second wavefront, where input to the second wavefront comprises a first plurality of data elements from the first output. Corresponding system and computer program product embodiments are also disclosed.

Type: Application

Filed: July 19, 2011

Publication date: January 19, 2012

Applicant: Advanced Micro Devices, Inc.

Inventors: Vineet GOEL, Todd Martin, Mangesh Nijasure
Reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application

Patent number: 8095811

Abstract: Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.

Type: Grant

Filed: May 29, 2008

Date of Patent: January 10, 2012

Assignee: International Business Machines Corporation

Inventors: Charles J. Archer, Michael A. Blocksome, Amanda A. Peters, Joseph D. Ratterman, Brian E. Smith
Error management firewall in a multiprocessor computer

Patent number: 8095759

Abstract: A multiprocessor computer system comprises a plurality of processors and a plurality of nodes, each node comprising one or more processors. A local memory in each of the plurality of nodes is coupled to the processors in each node, and a hardware firewall comprising a part of one or more of the nodes is operable to prevent a write from an unauthorized processor from writing to the local memory.

Type: Grant

Filed: May 29, 2009

Date of Patent: January 10, 2012

Assignee: Cray Inc.

Inventors: Dennis C. Abts, Steven L. Scott, Aaron F. Godfrey
PARALLEL COMPUTING DEVICE, INFORMATION PROCESSING SYSTEM, PARALLEL COMPUTING METHOD, AND INFORMATION PROCESSING DEVICE

Publication number: 20110320769

Abstract: A computing section is provided with a plurality of computing units and correlatively stores entries of configuration information that describes configurations of the plurality of computing units with physical configuration numbers that represent the entries of configuration information and executes a computation in a configuration corresponding to a designated physical configuration number. A status management section designates a physical configuration number corresponding to a status to which the computing section needs to advance the next time for the computing section and outputs the status to which the computing section needs to advance the next time as a logical status number that uniquely identifies the status to which the computing section needs to advance the next time in an object code.

Type: Application

Filed: December 25, 2009

Publication date: December 29, 2011

Inventors: Takeshi Inuo, Kengo Nishino, Nobuki Kajihara
Parallelization of Online Learning Algorithms

Publication number: 20110320767

Abstract: Methods, systems, and media are provided for a dynamic batch strategy utilized in parallelization of online learning algorithms. The dynamic batch strategy provides a merge function on the basis of a threshold level difference between the original model state and an updated model state, rather than according to a constant or pre-determined batch size. The merging includes reading a batch of incoming streaming data, retrieving any missing model beliefs from partner processors, and training on the batch of incoming streaming data. The steps of reading, retrieving, and training are repeated until the measured difference in states exceeds a set threshold level. The measured differences which exceed the threshold level are merged for each of the plurality of processors according to attributes. The merged differences which exceed the threshold level are combined with the original partial model states to obtain an updated global model state.

Type: Application

Filed: June 24, 2010

Publication date: December 29, 2011

Applicant: MICROSOFT CORPORATION

Inventors: Taha Bekir Eren, Oleg Isakov, Weizhu Chen, Jeffrey Scott Dunn, Thomas Ivan Borchert, Joaquin Quinonero Candela, Thore Kurt Hartwig Graepel, Ralf Herbrich
METHOD OF, AND APPARATUS FOR, MITIGATING MEMORY BANDWIDTH LIMITATIONS WHEN PERFORMING NUMERICAL CALCULATIONS

Publication number: 20110320768

Abstract: There is provided a method of, and apparatus for, processing a computation on a computing device comprising at least one processor and a memory, the method comprising: storing, in said memory, plural copies of a set of data, each copy of said set of data having a different compression ratio and/or compression scheme; selecting a copy of said set of data; and performing, on a processor, a computation using said selected copy of said set of data. By providing such a method, different compression ratios and/or compression schemes can be selected as appropriate. For example, if high precision is required in a computation, a copy of the set of data can be chosen which has a low compression ratio at the expense of processing time and memory transfer time. In the alternative, if low precision is acceptable, then the speed benefits of a high compression ratio and/or lossy compression scheme may be utilised.

Type: Application

Filed: June 25, 2010

Publication date: December 29, 2011

Applicant: MAXELER TECHNOLOGIES, LTD.

Inventors: Oliver Pell, Stephen Girdlestone
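
The idea in the abstract of publication 20110320768 above (keep several copies of the same data at different compression levels and pick one per computation) can be sketched with the standard library alone. Here, packing to IEEE half precision stands in for a lossy, high-ratio compression scheme, and double precision stands in for the low-ratio copy; this substitution, and all function names, are assumptions for illustration and are not the scheme claimed in the application.

```python
import struct
from typing import Dict, List

# Per-copy struct format code and element width in bytes.
_FORMATS = {"high_precision": ("d", 8), "low_precision": ("e", 2)}


def make_copies(values: List[float]) -> Dict[str, bytes]:
    """Store the same data twice: 8 bytes/value and 2 bytes/value (lossy)."""
    n = len(values)
    return {
        name: struct.pack(f"<{n}{code}", *values)
        for name, (code, _width) in _FORMATS.items()
    }


def select_copy(copies: Dict[str, bytes], need_high_precision: bool) -> List[float]:
    """Decode whichever stored copy suits the precision requirement."""
    name = "high_precision" if need_high_precision else "low_precision"
    code, width = _FORMATS[name]
    raw = copies[name]
    return list(struct.unpack(f"<{len(raw) // width}{code}", raw))


values = [0.1, 0.25, 1.5]
copies = make_copies(values)
exact = select_copy(copies, need_high_precision=True)
approx = select_copy(copies, need_high_precision=False)
```

The low-precision copy moves a quarter of the bytes across the memory interface, at the cost of rounding values such as 0.1 that half precision cannot represent exactly.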
Multiprocessor computing systems with heterogeneous processors

Patent number: 8086828

Abstract: Heterogeneous processors can cooperate for distributed processing tasks in a multiprocessor computing system. Each processor is operable in a “compatible” mode, in which all processors within a family accept the same baseline command set and produce identical results upon executing any command in the baseline command set. The processors also have a “native” mode of operation in which the command set and/or results may differ in at least some respects from the baseline command set and results. Heterogeneous processors with a compatible mode defined by reference to the same baseline can be used cooperatively for distributed processing by configuring each processor to operate in the compatible mode.

Type: Grant

Filed: March 25, 2009

Date of Patent: December 27, 2011

Assignee: NVIDIA Corporation

Inventors: Henry Packard Moreton, Abraham B. de Waal
DISTRIBUTED MICRO INSTRUCTIONS SET PROCESSOR ARCHITECTURE FOR HIGH-EFFICIENCY SIGNAL PROCESSING

Publication number: 20110314257

Abstract: A wireless communication system hosts a plurality of processes in accordance with a communication protocol. The system includes application specific instruction set processors (ASISPs) that provide computation support for the processes. Each ASISP is capable of executing a subset of the functions of a communication protocol. A scheduler is used to schedule the ASISPs in a time-sliced algorithm so that each ASISP supports several processes. In this architecture, the ASISP actively performs computations for one of the supported processes (active process) at any given time. The state information of each process supported by a particular ASISP is stored in a memory bank that is uniquely associated with the ASISP. When a scheduler instructs an ASISP to change which process is the active process, the state information for the inactivated process is stored in the memory bank and the state information for the newly activated process is retrieved from the memory bank.

Type: Application

Filed: July 29, 2011

Publication date: December 22, 2011

Inventors: Song CHEN, Paul L. CHOU, Christopher C. WOODTHORPE, Venugopal BALASUBRAMONIAN, Keith RIEKEN
Data Parallel Programming Model

Publication number: 20110314256

Abstract: Described herein are techniques for enabling a programmer to express a call for a data parallel call-site function in a way that is accessible and usable to the typical programmer. With some of the described techniques, an executable program is generated based upon expressions of those data parallel tasks. During execution of the executable program, data is exchanged between non-data parallel (non-DP) capable hardware and DP capable hardware for the invocation of data parallel functions.

Type: Application

Filed: June 18, 2010

Publication date: December 22, 2011

Applicant: Microsoft Corporation

Inventors: Charles David Callahan, II, Paul F. Ringseth, Yosseff Levanoni, Weirong Zhu, Lingli Zhang
MESSAGE BROADCAST WITH ROUTER BYPASSING

Publication number: 20110314255

Abstract: A processor and method for broadcasting data among a plurality of processing cores is disclosed. The processor includes a plurality of processing cores connected by point-to-point connections. A first of the processing cores includes a router that includes at least an allocation unit and an output port. The allocation unit is configured to determine that respective input buffers on at least two others of the processing cores are available to receive given data. The output port is usable by the router to send the given data across one of the point-to-point connections. The router is configured to send the given data contingent on determining that the respective input buffers are available. Furthermore, the processor is configured to deliver the data to the at least two other processing cores in response to the first processing core sending the data once across the point-to-point connection.

Type: Application

Filed: June 17, 2010

Publication date: December 22, 2011

Inventors: Tushar Krishna, Bradford M. Beckmann, Steven K. Reinhardt
Prefix sum pass to linearize A-buffer storage

Patent number: 8081181

Abstract: The architecture implements A-buffer in hardware by extending hardware to efficiently store a variable amount of data for each pixel. In operation, a prepass is performed to generate the counts of the fragments per pixel in a count buffer, followed by a prefix sum pass on the generated count buffer to calculate locations in a fragment buffer in which to store all the fragments linearly. An index is generated for a given pixel in the prefix sum pass and stored in a location buffer. Access to the pixel fragments is then accomplished using the index. Linear storage of the data allows for a fast rendering pass that stores all the fragments to a memory buffer without needing to look at the contents of the fragments. This is then followed by a resolve pass on the fragment buffer to generate the final image.

Type: Grant

Filed: June 20, 2007

Date of Patent: December 20, 2011

Assignee: Microsoft Corporation

Inventor: Craig Peeper
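
The count pass, prefix sum pass, and linear store described in the abstract of patent 8081181 above can be sketched on the CPU as follows. This is an illustrative sketch, not the hardware implementation: fragments are modelled as (pixel, value) pairs, and the function and variable names are hypothetical.

```python
from itertools import accumulate
from typing import Any, List, Tuple


def build_a_buffer(num_pixels: int, fragments: List[Tuple[int, Any]]):
    """Pack per-pixel fragment lists into one linear buffer."""
    # Prepass: count the fragments that land on each pixel.
    counts = [0] * num_pixels
    for pixel, _value in fragments:
        counts[pixel] += 1

    # Prefix sum pass: an exclusive scan of the counts gives each pixel's
    # start index in the fragment buffer (the "location buffer").
    location = list(accumulate(counts, initial=0))[:-1]

    # Rendering pass: store fragments linearly without inspecting contents.
    cursor = location.copy()
    buffer: List[Any] = [None] * len(fragments)
    for pixel, value in fragments:
        buffer[cursor[pixel]] = value
        cursor[pixel] += 1
    return counts, location, buffer


def pixel_fragments(pixel: int, counts, location, buffer) -> List[Any]:
    """Look up one pixel's fragments through its index in the location buffer."""
    start = location[pixel]
    return buffer[start:start + counts[pixel]]


frags = [(2, "a"), (0, "b"), (2, "c"), (3, "d"), (0, "e")]
counts, location, buffer = build_a_buffer(4, frags)
```

Because the scan fixes every pixel's slice in advance, the store pass never has to grow or relocate a per-pixel list, which is what makes a variable fragment count per pixel workable in a flat buffer.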
Processing system having multiple engines connected in a daisy chain configuration

Patent number: 8074054

Abstract: A processing system includes a group of processing units (“PUs”) arranged in a daisy chain configuration or a sequence capable of parallel processing. The processing system, in one embodiment, includes PUs, a demultiplexer (“demux”), and a multiplexer (“mux”). The PUs are connected or linked in a sequence or a daisy chain configuration wherein a first PU is located at the beginning of the sequence and a last digital PU is located at the end of the sequence. Each PU is configured to read an input data packet from a packet stream during a designated reading time frame. If the time frame is outside of the designated reading time frame, a PU allows a packet stream to pass through. The demux forwards a packet stream to the first digital processing unit. The mux receives a packet stream from the last digital processing unit.

Type: Grant

Filed: December 12, 2007

Date of Patent: December 6, 2011

Assignee: Tellabs San Jose, Inc.

Inventors: Venkata Rangavajjhala, Naveen K. Jain
Performing A Deterministic Reduction Operation In A Parallel Computer

Publication number: 20110296139

Abstract: Performing a deterministic reduction operation in a parallel computer that includes compute nodes, each of which includes computer processors and a CAU (Collectives Acceleration Unit) that couples computer processors to one another for data communications, including organizing processors and a CAU into a branched tree topology in which the CAU is a root and the processors are children; receiving, from each of the processors in any order, dummy contribution data, where each processor is restricted from sending any other data to the root CAU prior to receiving an acknowledgement of receipt from the root CAU; sending, by the root CAU to the processors in the branched tree topology, in a predefined order, acknowledgements of receipt of the dummy contribution data; receiving, by the root CAU from the processors in the predefined order, the processors' contribution data to the reduction operation; and reducing, by the root CAU, the processors' contribution data.

Type: Application

Filed: May 28, 2010

Publication date: December 1, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
Performing A Deterministic Reduction Operation In A Parallel Computer

Publication number: 20110296137

Abstract: A parallel computer that includes compute nodes having computer processors and a CAU (Collectives Acceleration Unit) that couples processors to one another for data communications. In embodiments of the present invention, deterministic reduction operations include: organizing processors of the parallel computer and a CAU into a branched tree topology, where the CAU is a root of the branched tree topology and the processors are children of the root CAU; establishing a receive buffer that includes receive elements associated with processors and configured to store the associated processor's contribution data; receiving, in any order from the processors, each processor's contribution data; tracking receipt of each processor's contribution data; and reducing the contribution data in a predefined order, only after receipt of contribution data from all processors in the branched tree topology.

Type: Application

Filed: May 28, 2010

Publication date: December 1, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
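
The receive-then-reduce scheme in the abstract above can be sketched in Python. This is only an illustration under stated assumptions: contributions arrive as (rank, value) pairs, the predefined order is ascending rank, and the name `deterministic_reduce` and the list-based receive buffer are hypothetical, not taken from the application.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def deterministic_reduce(
    arrivals: Iterable[Tuple[int, float]],
    num_procs: int,
    op: Callable[[float, float], float],
) -> float:
    """Buffer contributions as they arrive, then reduce them in rank order."""
    # One receive element per processor, indexed by rank (assumed layout).
    receive_buffer: List[Optional[float]] = [None] * num_procs
    received = [False] * num_procs  # tracks receipt per processor

    for rank, value in arrivals:  # arrival order is arbitrary
        receive_buffer[rank] = value
        received[rank] = True

    # Reduce only after every processor has contributed.
    if not all(received):
        raise ValueError("cannot reduce before every processor has contributed")

    # Assumed predefined order: ascending rank, independent of arrival order.
    result = receive_buffer[0]
    for rank in range(1, num_procs):
        result = op(result, receive_buffer[rank])
    return result
```

Because floating-point addition is not associative, summing the same contributions in arrival order could give different results from run to run. Fixing the reduction order removes that variation.
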
Iteratively processing data segments by concurrently transmitting to, processing by, and receiving from partnered process

Patent number: 8065503

Abstract: Methods, systems and computer programs for distributing a computing operation among a plurality of processes and for gathering results of the computing operation from the plurality of processes are described.

Type: Grant

Filed: December 15, 2006

Date of Patent: November 22, 2011

Assignee: International Business Machines Corporation

Inventor: Bin Jia
MODULARIZED MICRO PROCESSOR DESIGN

Publication number: 20110283089

Abstract: A method and system of modularized design for a microprocessor are disclosed. Embodiments disclose modularization techniques, whereby the overall design of the execution unit of the processor is split into different functional modules. The modules are configured to function independently of each other. The microprocessor comprises different components such as a cache logic (201), a clock generation unit (202), a dispatcher (203), a special asynchronous interface (204), an interrupt unit (205), a register file (206) and a multiplexer unit (207). Temporary storage of data in the register files is eliminated, and thus data fetch latency is eliminated. The asynchronous transfer triggered execution architecture increases speed of execution.

Type: Application

Filed: August 10, 2009

Publication date: November 17, 2011

Inventor: Harshal Ingale
STREAMING PHYSICS COLLISION DETECTION IN MULTITHREADED RENDERING SOFTWARE PIPELINE

Publication number: 20110283086

Abstract: A circuit arrangement, program product and method stream level of detail components between hardware threads in a multithreaded circuit arrangement to perform physics collision detection. Typically, a master hardware thread, e.g., a component loader hardware thread, is used to retrieve level of detail data for an object from a memory and stream the data to one or more slave hardware threads, e.g., collision detection hardware threads, to perform the actual collision detection. Because the slave hardware threads receive the level of detail data from the master thread, typically the slave hardware threads are not required to load the data from the memory, thereby reducing memory bandwidth requirements and accelerating performance.

Type: Application

Filed: May 12, 2010

Publication date: November 17, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Eric Oliver Mejdrich, Paul Emery Schardt, Robert Allen Shearer
IMAGE FORMING APPARATUS, IMAGE FORMING METHOD, AND COMPUTER READABLE MEDIUM STORING CONTROL PROGRAM THEREFOR

Publication number: 20110283087

Abstract: A first processing unit is implemented by executing a first application program by using an internal computer in an environment where a first operating system is operating. The first processing unit performs a first process or an external service call in accordance with instruction information describing a process to be executed. A second processing unit is implemented by executing a second application program by using the internal computer or an additional computer connected to the internal computer in an environment where a second operating system is operating. The second processing unit performs a second process when instructed by an external service call to execute the second process. When the instruction information includes information specifying the second process as the process to be executed, a transfer unit updates the information included in the instruction information, and transfers the updated instruction information to the first processing unit.

Type: Application

Filed: October 29, 2010

Publication date: November 17, 2011

Applicant: FUJI XEROX CO., LTD.

Inventors: Tsuyoshi WATANABE, Yoshiaki TEZUKA, Kunihiko KOBAYASHI, Tomomichi ADEGAWA
TECHNIQUES FOR ACCELERATING COMPUTATIONS USING FIELD PROGRAMMABLE GATE ARRAY PROCESSORS

Publication number: 20110283059

Abstract: Various embodiments are disclosed for accelerating computations using field programmable gate arrays (FPGA). Various tree traversal techniques, architectures, and hardware implementations are disclosed. Various disclosed embodiments comprise hybrid architectures comprising a central processing unit (CPU), a graphics processor unit (GPU), a field programmable gate array (FPGA), and variations or combinations thereof, to implement raytracing techniques. Additional disclosed embodiments comprise depth-breadth search tree tracing techniques, blocking tree branch traversal techniques to avoid data explosion, compact data structure representations for ray and node representations, and multiplexed processing of multiple rays in a programming element (PE) to leverage pipeline bubble.

Type: Application

Filed: May 10, 2011

Publication date: November 17, 2011

Applicant: Progeniq Pte Ltd

Inventors: Sundar Govindarajan, Vinod Ranganathan Iyer, Darran Nathan
DATA PROCESSING APPARATUS AND DATA PROCESSING METHOD

Publication number: 20110283088

Abstract: A data processing apparatus includes a connecting unit that distributes the plurality of processing modules over the stages, and connects the plurality of processing modules such that a plurality of partial data are processed in parallel. The data processing apparatus detects, with respect to at least a part of the stages, a ratio of an amount of data for which processing in the subsequent stage has been executed, as a passage rate, acquires a processing time for a data amount to be processed in each stage, for which the passage rate was detected, based on the passage rate, and determines the number of processing modules distributed to each stage based on the data amount.

Type: Application

Filed: May 6, 2011

Publication date: November 17, 2011

Applicant: CANON KABUSHIKI KAISHA

Inventors: Ryoko Natori, Shinji Shiraga
SYSTEMS AND METHODS FOR PROCESSING DATA

Publication number: 20110264889

Abstract: Systems, methods, and an article of manufacture for the reduction in process load experienced by a primary processor when executing an application by dynamically reassigning portions of the application to one or more secondary processors are shown and described. A second processing unit is queried for one or more characteristics. One or more performance characteristics of the second processor are measured. A portion of the application can be reassigned to the second processing unit based on the queried characteristics and performance measurements.

Type: Application

Filed: April 21, 2010

Publication date: October 27, 2011

Applicant: MIRICS SEMICONDUCTOR LIMITED

Inventor: Christopher Stolarik
Performing A Local Reduction Operation On A Parallel Computer

Publication number: 20110258245

Abstract: A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.

Type: Application

Filed: April 14, 2010

Publication date: October 20, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Michael A. Blocksome, Daniel A. Faraj
INFORMATION PROCESSING APPARATUS

Publication number: 20110252219

Abstract: According to an aspect of the present invention, there is provided an information processing apparatus including: a first processor; a second processor that has an information processing capability and a power consumption higher than those of the first processor; a temperature monitoring module configured to acquire an operating temperature of the second processor; a throttle number determination module configured to determine whether the throttling control is performed a given number of times or more within a given time interval; and a processor switching control module configured to perform, when the operating temperature of the second processor is equal to or higher than a given temperature: stopping an operation of the second processor; causing the first processor to perform an information process; and prohibiting the operation of the second processor.

Type: Application

Filed: June 20, 2011

Publication date: October 13, 2011

Inventor: Hajime Sonobe
CACHE-AWARE THREAD SCHEDULING IN MULTI-THREADED SYSTEMS

Publication number: 20110246995

Abstract: The disclosed embodiments provide a system that facilitates scheduling threads in a multi-threaded processor with multiple processor cores. During operation, the system executes a first thread in a processor core that is associated with a shared cache. During this execution, the system measures one or more metrics to characterize the first thread. Then, the system uses the characterization of the first thread and a characterization for a second thread to predict a performance impact that would occur if the second thread were to simultaneously execute in a second processor core that is also associated with the cache. If the predicted performance impact indicates that executing the second thread on the second processor core will improve performance for the multi-threaded processor, the system executes the second thread on the second processor core.

Type: Application

Filed: April 5, 2010

Publication date: October 6, 2011

Applicant: ORACLE INTERNATIONAL CORPORATION

Inventors: Alexandra Fedorova, David Vengerov, Kishore Kumar Pusukuri
Managing Sensor and Actuator Data for a Processor and Service Processor Located on a Common Socket

Publication number: 20110246748

Abstract: Illustrated is a system and method that includes a processor and service processor co-located on a common socket, the service processor to aggregate data from a distributed network of additional service processors and processors both of which are co-located on an additional common socket. The system and method also includes a first sensor to record the data from the processor. The system and method also includes a second sensor to record the data from a software stack. The system and method further includes a registry to store the data.

Type: Application

Filed: April 6, 2010

Publication date: October 6, 2011

Inventors: Vanish Talwar, Jeffrey R. Hilland, Vidhya Kannan, Sandeep KS, Prashanth V
IMAGE FORMING APPARATUS, IMAGE FORMING SYSTEM, AND INFORMATION GENERATING METHOD

Publication number: 20110238951

Abstract: An image forming apparatus includes: plural processing units which execute plural processing functions that are different from each other; an execution-in-progress information acquiring unit which acquires execution-in-progress function information that is information about a first processing unit which is executing processing, of the plural processing units; a discrimination unit which discriminates a second processing unit that cannot execute processing when the first processing unit indicated by the execution-in-progress function information acquired by the execution-in-progress information acquiring unit is executing processing, from among the plural processing units; and an executability information generating unit which generates inexecutable function information that is information about the second processing unit, based on a result of determination by the discrimination unit.

Type: Application

Filed: March 23, 2011

Publication date: September 29, 2011

Applicants: KABUSHIKI KAISHA TOSHIBA, TOSHIBA TEC KABUSHIKI KAISHA

Inventor: Kanako Asari
Distributed Administration Of A Lock For An Operational Group Of Compute Nodes In A Hierarchical Tree Structured Network

Publication number: 20110238949

Abstract: Distributed administration of a lock for an operational group of compute nodes in a hierarchical tree structured network including assigning the root node of the operational group to send acknowledgments for lock requests, the root lock administration module comprising a module of automated computing machinery; receiving a lock request assigned to a particular node from a child node; determining whether another request from another child is directly ahead in an acknowledgement queue; if a request from another child is directly ahead in the acknowledgement queue, putting the lock request for the particular node in the acknowledgement queue until the lock request directly ahead in the acknowledgement queue is satisfied and when the lock request ahead in the queue is satisfied, sending the particular node for whom the lock request is assigned a message acknowledging the particular node has the lock; and if a request from another child is not directly ahead in a queue, sending to the particular node for whom the

Type: Application

Filed: March 29, 2010

Publication date: September 29, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
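
The queueing rule visible in the abstract above (acknowledge at once if no request is ahead, otherwise wait in the acknowledgement queue) can be sketched in Python for the root node alone. The abstract is truncated in this listing, so the sketch covers only that visible rule. The class name `RootLockAdministrator`, its methods, and the explicit `release` step are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List, Optional


class RootLockAdministrator:
    """Hypothetical root-side lock bookkeeping with a FIFO acknowledgement queue."""

    def __init__(self) -> None:
        self.holder: Optional[int] = None       # node currently holding the lock
        self.ack_queue: Deque[int] = deque()    # requests waiting for an acknowledgement
        self.acks_sent: List[int] = []          # log of acknowledgements, in order

    def request(self, node: int) -> None:
        """Handle a lock request assigned to `node`."""
        if self.holder is None and not self.ack_queue:
            # Nothing is ahead of this request: acknowledge immediately.
            self._acknowledge(node)
        else:
            # Another request is ahead: wait until it has been satisfied.
            self.ack_queue.append(node)

    def release(self, node: int) -> None:
        """Assumed release step: hand the lock to the next queued request."""
        if node != self.holder:
            raise ValueError("only the current holder can release the lock")
        self.holder = None
        if self.ack_queue:
            self._acknowledge(self.ack_queue.popleft())

    def _acknowledge(self, node: int) -> None:
        # Stands in for sending the acknowledgement message to the node.
        self.holder = node
        self.acks_sent.append(node)
```

For example, requests from nodes 3, 5, and 4 in that order leave node 3 holding the lock with 5 and 4 queued; releasing 3 acknowledges 5 next.
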
Performing A Scatterv Operation On A Hierarchical Tree Network Optimized For Collective Operations

Publication number: 20110238950

Abstract: Performing a scatterv operation on a hierarchical tree network optimized for collective operations including receiving, by the scatterv module installed on the node, from a nearest neighbor parent above the node a chunk of data having at least a portion of data for the node; maintaining, by the scatterv module installed on the node, the portion of the data for the node; determining, by the scatterv module installed on the node, whether any portions of the data are for a particular nearest neighbor child below the node or one or more other nodes below the particular nearest neighbor child; and sending, by the scatterv module installed on the node, those portions of data to the nearest neighbor child if any portions of the data are for a particular nearest neighbor child below the node or one or more other nodes below the particular nearest neighbor child.

Type: Application

Filed: March 29, 2010

Publication date: September 29, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
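
The per-node steps in the abstract above (keep the local portion, forward to each child whatever belongs to that child's subtree) can be sketched in Python. The sketch rests on assumptions: the tree is a dictionary from node to child list, a chunk is a dictionary from destination node to bytes, and recursion stands in for network sends. The names `subtree_nodes` and `scatterv` are illustrative only.

```python
from typing import Dict, List

Tree = Dict[int, List[int]]  # node -> nearest-neighbor children (assumed encoding)


def subtree_nodes(node: int, children: Tree) -> List[int]:
    """Return `node` and every node below it."""
    nodes = [node]
    for child in children.get(node, []):
        nodes.extend(subtree_nodes(child, children))
    return nodes


def scatterv(
    node: int,
    chunk: Dict[int, bytes],
    children: Tree,
    delivered: Dict[int, bytes],
) -> None:
    """Keep this node's portion and pass the rest down the tree."""
    # Maintain the portion of the chunk addressed to this node.
    if node in chunk:
        delivered[node] = chunk[node]

    for child in children.get(node, []):
        # Portions for the child itself or for any node below it.
        portion = {
            dest: chunk[dest]
            for dest in subtree_nodes(child, children)
            if dest in chunk
        }
        # Send only if something is actually destined for that subtree.
        if portion:
            scatterv(child, portion, children, delivered)
```

Portions may have different lengths per destination, which is what distinguishes a scatterv from a plain scatter.
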
Unit status reporting protocol

Patent number: 8019978

Abstract: A unit status reporting protocol may also be used for context switching, debugging, and removing deadlock conditions in a processing unit. A processing unit is in one of five states: empty, active, stalled, quiescent, and halted. The state that a processing unit is in is reported to a front end monitoring unit to enable the front end monitoring unit to determine when a context switch may be performed or when a deadlock condition exists. The front end monitoring unit can issue a halt command to perform a context switch or take action to remove a deadlock condition and allow processing to resume.

Type: Grant

Filed: August 13, 2007

Date of Patent: September 13, 2011

Assignee: NVIDIA Corporation

Inventors: Michael C. Shebanow, Robert C. Keller, Richard A. Silkebakken
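
The five reported states in the abstract above map naturally onto an enumeration that a front end monitor could inspect. The Python sketch below is a guess at how such a monitor might decide; the abstract does not give the decision rules, so both policies (`can_context_switch` and `deadlock_suspected`) are assumptions labeled as such in the code.

```python
from enum import Enum
from typing import Iterable


class UnitState(Enum):
    """The five states named in the abstract."""
    EMPTY = "empty"
    ACTIVE = "active"
    STALLED = "stalled"
    QUIESCENT = "quiescent"
    HALTED = "halted"


def can_context_switch(states: Iterable[UnitState]) -> bool:
    # ASSUMPTION: a switch is safe once no unit is active or stalled.
    safe = {UnitState.EMPTY, UnitState.QUIESCENT, UnitState.HALTED}
    return all(state in safe for state in states)


def deadlock_suspected(states: Iterable[UnitState]) -> bool:
    # ASSUMPTION: if some unit is stalled and none is active,
    # nothing can make progress and the stall will not clear.
    reported = list(states)
    any_stalled = any(state is UnitState.STALLED for state in reported)
    any_active = any(state is UnitState.ACTIVE for state in reported)
    return any_stalled and not any_active
```

A real monitor would likely also consider how long a state has persisted before acting; that timing is outside what the abstract describes.
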
CPU CORE UNLOCKING DEVICE APPLIED TO COMPUTER SYSTEM

Publication number: 20110219211

Abstract: A CPU core unlocking device applied to a computer system is provided. The core unlocking device includes a CPU having a plurality of signal terminals and a core unlocking executing unit having a plurality of GPIO ports connected with the corresponding signal terminals of the CPU. The GPIO ports of the core unlocking executing unit generate and transmit a combination of core unlocking signals to the signal terminals of the CPU to unlock the CPU core.

Type: Application

Filed: March 3, 2011

Publication date: September 8, 2011

Applicant: ASUSTeK COMPUTER INC.

Inventors: Pei-Hua Sun, Pai-Ching Huang, Yi-Min Huang, Meng-Hsiung Lee, Nan-Kun Lo
Data processing apparatus and method for switching a workload between first and second processing circuitry

Publication number: 20110213934

Abstract: A data processing apparatus and method are provided for switching performance of a workload between two processing circuits. The data processing apparatus has first processing circuitry which is architecturally compatible with second processing circuitry, but with the first processing circuitry being micro-architecturally different from the second processing circuitry. At any point in time, a workload consisting of at least one application and at least one operating system for running that application is performed by one of the first processing circuitry and the second processing circuitry. A switch controller is responsive to a transfer stimulus to perform a handover operation to transfer performance of the workload from source processing circuitry to destination processing circuitry, with the source processing circuitry being one of the first and second processing circuitry and the destination processing circuitry being the other of the first and second processing circuitry.

Type: Application

Filed: March 1, 2010

Publication date: September 1, 2011

Applicant: ARM Limited

Inventors: Peter Richard Greenhalgh, Richard Roy Grisenthwaite
System and Method for Power Optimization

Publication number: 20110213950

Abstract: A technique for reducing the power consumption required to execute processing operations. A processing complex, such as a CPU or a GPU, includes a first set of cores comprising one or more fast cores and a second set of cores comprising one or more slow cores. A processing mode of the processing complex can switch between a first mode of operation and a second mode of operation based on one or more of the workload characteristics, performance characteristics of the first and second sets of cores, power characteristics of the first and second sets of cores, and operating conditions of the processing complex. A controller causes the processing operations to be executed by either the first set of cores or the second set of cores to achieve the lowest total power consumption.

Type: Application

Filed: May 25, 2010

Publication date: September 1, 2011

Inventors: John George Mathieson, Phil Carmack, Brian Smith
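
The controller decision in the abstract above can be illustrated with a toy model in Python. Everything numeric here is an assumption: each core set is reduced to a throughput and an active power figure, "lowest total power consumption" is approximated as lowest energy (power times runtime), and a deadline stands in for the workload characteristics. `CoreSet` and `pick_core_set` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CoreSet:
    name: str
    ops_per_second: float  # assumed performance characteristic
    active_watts: float    # assumed power characteristic


def pick_core_set(
    core_sets: Sequence[CoreSet],
    workload_ops: float,
    deadline_s: float,
) -> Optional[CoreSet]:
    """Pick the core set that meets the deadline with the least energy."""
    best: Optional[CoreSet] = None
    best_energy = float("inf")
    for core_set in core_sets:
        runtime_s = workload_ops / core_set.ops_per_second
        if runtime_s > deadline_s:
            continue  # too slow for this workload
        energy_j = core_set.active_watts * runtime_s
        if energy_j < best_energy:
            best, best_energy = core_set, energy_j
    return best  # None if no core set can meet the deadline
```

With a fast set at 4e9 ops/s and 8 W and a slow set at 1e9 ops/s and 1 W, a 1e9-op workload with a 2 s deadline goes to the slow set (1 J versus 2 J), while a 0.5 s deadline forces the fast set.
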
Data processing apparatus and method for switching a workload between first and second processing circuitry

Publication number: 20110213935

Abstract: A data processing apparatus and method are provided for switching performance of a workload between two processing circuits. The data processing apparatus has first processing circuitry which is architecturally compatible with second processing circuitry, but with the first processing circuitry being micro-architecturally different from the second processing circuitry. At any point in time, a workload consisting of at least one application and at least one operating system for running that application is performed by one of the first processing circuitry and the second processing circuitry. A switch controller is responsive to a transfer stimulus to perform a handover operation to transfer performance of the workload from source processing circuitry to destination processing circuitry, with the source processing circuitry being one of the first and second processing circuitry and the destination processing circuitry being the other of the first and second processing circuitry.

Type: Application

Filed: March 1, 2010

Publication date: September 1, 2011

Applicant: ARM Limited

Inventors: Peter Richard Greenhalgh, Richard Roy Grisenthwaite
