Operation Patents (Class 712/30)
-
Publication number: 20120144157
Abstract: There is disclosed a system and method for allocation of mainframe computing resources using distributed computing. In particular, the present application is directed to a system whereby a mainframe process intended for execution on a metered processor may be identified as executable on a non-metered processor. Thereafter, the mainframe computer may initiate execution of the remote process on the remote non-metered processor. If necessary, high-speed access to data available to the metered processor is provided to the non-metered processor. The process operates directly on data available to the metered processor. Once completed, the process signals the mainframe computer that the process is complete. Both metered and non-metered processor configuration and management may be accomplished using the administrative interface.
Type: Application
Filed: December 6, 2010
Publication date: June 7, 2012
Inventors: James Reginald Crew, Pradeep Kumar Reddy Gundavarapu, Balaji Swaminathan, William Donald Pagdin, Lary Edward Klein
-
Publication number: 20120124333
Abstract: The present invention concerns a new category of integrated circuitry and a new methodology for adaptive or reconfigurable computing. The preferred IC embodiment includes a plurality of heterogeneous computational elements coupled to an interconnection network. The plurality of heterogeneous computational elements include corresponding computational elements having fixed and differing architectures, such as fixed architectures for different functions such as memory, addition, multiplication, complex multiplication, subtraction, configuration, reconfiguration, control, input, output, and field programmability. In response to configuration information, the interconnection network is operative in real-time to configure and reconfigure the plurality of heterogeneous computational elements for a plurality of different functional modes, including linear algorithmic operations, non-linear algorithmic operations, finite state machine operations, memory operations, and bit-level manipulations.
Type: Application
Filed: January 19, 2012
Publication date: May 17, 2012
Applicant: QST Holdings LLC
Inventors: Paul L. Master, Eugene Hogenauer, Walter James Scheuermann
-
Patent number: 8171259
Abstract: A dynamic reconfigurable circuit includes multiple clusters each including a group of reconfigurable processing elements. The dynamic reconfigurable circuit is capable of dynamically changing a configuration of the clusters according to a context including a description of processing of the processing elements and of connection between the processing elements. A first cluster among the clusters includes a signal generating circuit that when an instruction to change the context is received, generates a report signal indicative of the instruction to change the context; a signal adding circuit that adds the report signal generated by the signal generating circuit to output data that is to be transmitted from the first cluster to a second cluster; and a data clearing circuit that, when output data to which a report signal generated by the second cluster is added is received, performs a clearing process of clearing the output data received.
Type: Grant
Filed: February 27, 2009
Date of Patent: May 1, 2012
Assignee: Fujitsu Semiconductor Limited
Inventors: Takashi Hanai, Shinichi Sutou
-
Publication number: 20120102299
Abstract: A processing system includes processors and dynamically configurable communication elements (DCCs) coupled together in an interspersed arrangement. A source device may transfer a data item through an intermediate subset of the DCCs to a destination device. The source and destination devices may each correspond to different processors, DCCs, or input/output devices, or mixed combinations of these. In response to detecting a stall after the source device begins transfer of the data item to the destination device and prior to receipt of all of the data item at the destination device, a stalling device is operable to propagate stalling information through one or more of the intermediate subset towards the source device. In response to receiving the stalling information, at least one of the intermediate subset is operable to buffer all or part of the data item.
Type: Application
Filed: December 30, 2011
Publication date: April 26, 2012
Inventors: Michael B. Doerr, William H. Hallidy, David A. Gibson, Craig M. Chase
-
Patent number: 8161268
Abstract: Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.
Type: Grant
Filed: May 21, 2008
Date of Patent: April 17, 2012
Assignee: International Business Machines Corporation
Inventor: Ahmad Faraj
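The two-phase scheme this abstract describes can be sketched in a few lines of Python. This is an illustrative simulation only, not the patented implementation: contributions are modeled as plain numbers, the ring communication is collapsed into a direct reduction, and the function name and layout (`nodes[n][c]` = core c's contribution on node n) are assumptions for the example.

```python
def allreduce_two_phase(nodes, op=sum):
    """Simulate the two-phase allreduce: logical ring r holds core r of
    every node; phase 1 reduces across each ring, phase 2 reduces each
    node's per-ring results locally. Returns one result per node."""
    cores_per_node = len(nodes[0])
    # Phase 1: global allreduce within each logical ring (one core per node).
    ring_results = [op(nodes[n][r] for n in range(len(nodes)))
                    for r in range(cores_per_node)]
    # Phase 2: local allreduce on each node over its cores' ring results.
    return [op(ring_results) for _ in nodes]
```

With `nodes = [[1, 2], [3, 4]]` the ring results are `[4, 6]` and every node ends up with the global total `10`, matching what a single flat allreduce over all four cores would produce.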
-
Patent number: 8161307
Abstract: Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.
Type: Grant
Filed: October 20, 2011
Date of Patent: April 17, 2012
Assignee: International Business Machines Corporation
Inventors: Charles J. Archer, Michael A. Blocksome, Amanda E. Peters, Joseph D. Ratterman, Brian E. Smith
-
Publication number: 20120089813
Abstract: Provided are a computing apparatus based on a reconfigurable architecture and a memory dependence correction method thereof. In one general aspect, a computing apparatus has a reconfigurable architecture. The computing apparatus may include: a reconfiguration unit having processing elements configured to reconfigure data paths between one or more of the processing elements; a compiler configured to analyze instructions to generate reconfiguration information for reconfiguring one or more of the reconfigurable data paths; a configuration memory configured to store the reconfiguration information; and a processor configured to execute the instructions through the reconfiguration unit, and to correct at least one memory dependency among the processing elements.
Type: Application
Filed: July 7, 2011
Publication date: April 12, 2012
Inventors: Tai-Song Jin, Dong-Hoon Yoo, Bernhard Egger
-
Publication number: 20120089814
Abstract: In a multiprocessor system, a primary processor may store an executable image for a secondary processor. A communication protocol assists the transfer of an image header and data segment(s) of the executable image from the primary processor to the secondary processor. Messages between the primary processor and secondary processor indicate successful receipt of transferred data, termination of a transfer process, and acknowledgement of same.
Type: Application
Filed: December 5, 2011
Publication date: April 12, 2012
Applicant: QUALCOMM INCORPORATED
Inventors: Nitin Gupta, Daniel H. Kim, Igor Malamant, Steve Haehnichen
-
Patent number: 8151245
Abstract: A distributed processing system is described that employs "application-based" specialization. In particular, the distributed processing system is constructed as a collection of computing nodes in which each computing node performs a particular processing role within the operation of the overall distributed processing system. Each of the computing nodes includes an operating system, such as the Linux operating system, and a plug-in software module to provide a distributed memory operating system that employs the role-based computing techniques. An administration node maintains a database that defines a plurality of application roles. Each role is associated with a software application, and specifies a set of software components necessary for execution of the software application. The administration node deploys the software components to the application nodes in accordance with the application roles associated with each of the application nodes.
Type: Grant
Filed: December 16, 2005
Date of Patent: April 3, 2012
Assignee: Computer Associates Think, Inc.
Inventors: Steven M. Oberlin, David W. McAllister
-
Publication number: 20120079236
Abstract: A processor comprises a plurality of processor units arranged to operate concurrently and in cooperation with one another, and control logic configured to direct the operation of the processor units. At least a given one of the processor units comprises a memory, an arithmetic engine and a switch fabric. The switch fabric provides controllable connectivity between the memory, the arithmetic engine and input and output ports of the given processor unit, and has control inputs driven by corresponding outputs of the control logic. In an illustrative embodiment, the processor units may be configured to perform computations associated with a key equation solver in a Reed-Solomon (RS) decoder or other type of forward error correction (FEC) decoder.
Type: Application
Filed: September 29, 2011
Publication date: March 29, 2012
Inventors: Dusan Suvakovic, Adriaan J. de Lind van Wijngaarden, Man Fai Lau
-
Publication number: 20120079235
Abstract: Methods and apparatus to schedule applications in heterogeneous multiprocessor computing platforms are described. In one embodiment, information regarding performance (e.g., execution performance and/or power consumption performance) of a plurality of processor cores of a processor is stored (and tracked) in counters and/or tables. Logic in the processor determines which processor core should execute an application based on the stored information. Other embodiments are also claimed and disclosed.
Type: Application
Filed: September 25, 2010
Publication date: March 29, 2012
Inventors: Ravishankar Iyer, Sadagopan Srinivasan, Li Zhao, Rameshkumar G. Illikkal
-
Patent number: 8140828
Abstract: There is disclosed a method and apparatus for handling transaction buffer overflow in a multi-processor system as well as a transaction memory system in a multi-processor system. The method comprises the steps of: when overflow occurs in a transaction buffer of one processor, disabling peer processors from entering transactions, and waiting for any processor having a current transaction to complete its current transaction; re-executing the transaction resulting in the transaction buffer overflow without using the transaction buffer; and when the transaction execution is completed, enabling the peer processors for entering transactions.
Type: Grant
Filed: December 1, 2008
Date of Patent: March 20, 2012
Assignee: International Business Machines Corporation
Inventors: Xiaowei Shen, Hua Yong Wang, Kun Wang
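The overflow-handling protocol in this abstract can be sketched as a small simulation. This is a hedged illustration, not the patented hardware mechanism: the class, method names, and the `apply_write` callback are invented for the example, and the "wait for peers to drain" step is reduced to a comment since there are no real concurrent peers here.

```python
class TransactionSystem:
    """Illustrative sketch: buffer a transaction's writes; on buffer
    overflow, bar peers from starting new transactions, (conceptually)
    drain in-flight ones, re-execute the transaction unbuffered, then
    re-enable peers."""
    def __init__(self, buffer_capacity):
        self.buffer_capacity = buffer_capacity
        self.transactions_enabled = True

    def run_transaction(self, writes, apply_write):
        buffer = []
        for w in writes:
            if len(buffer) >= self.buffer_capacity:
                return self._handle_overflow(writes, apply_write)
            buffer.append(w)
        for w in buffer:                    # commit: flush buffered writes
            apply_write(w)
        return "buffered-commit"

    def _handle_overflow(self, writes, apply_write):
        self.transactions_enabled = False   # disable peer processors
        # (real system: wait for peers' current transactions to complete)
        for w in writes:                    # re-execute without the buffer
            apply_write(w)
        self.transactions_enabled = True    # re-enable peer processors
        return "unbuffered-commit"
```

A transaction that fits the buffer commits normally; one that exceeds the capacity takes the disable/re-execute/re-enable path and still applies all its writes exactly once.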
-
Publication number: 20120060007
Abstract: A method and apparatus for controlling traffic of a multiprocessor system or multi-core system is provided. The traffic control apparatus of a multiprocessor system according to the present invention includes a request handler for processing a traffic request of a first processor, and a Quality of Service (QoS) manager for receiving a QoS guaranty start instruction for a second processor from the multiprocessor system, and for transmitting, when traffic of the second processor is detected, a traffic adjustment signal to the request handler. The request handler adjusts the traffic of the first processor according to the received traffic adjustment signal. The traffic control method and apparatus of the present invention are capable of adjusting the required bandwidths of individual technologies and guaranteeing real-time behavior in the multiprocessor system or multi-core system.
Type: Application
Filed: September 2, 2011
Publication date: March 8, 2012
Applicant: SAMSUNG ELECTRONICS CO. LTD.
Inventors: Min Seung BAIK, Joong Baik KIM, Seung Wook LEE, Soon Wan KWON
-
Publication number: 20120047350
Abstract: A processing apparatus for processing source code comprising a plurality of single line instructions to implement a desired processing function is described.
Type: Application
Filed: May 4, 2010
Publication date: February 23, 2012
Inventors: John Lancaster, Martin Whitaker
-
Publication number: 20120042150
Abstract: A multiprocessor system includes a main memory and multiple processing cores that are configured to execute software that uses data stored in the main memory. In some embodiments, the multiprocessor system includes a data streaming unit, which is connected between the processing cores and the main memory and is configured to pre-fetch the data from the main memory for use by the multiple processing cores. In some embodiments, the multiprocessor system includes a scratch-pad processing unit, which is connected to the processing cores and is configured to execute, on behalf of the multiple processing cores, a selected part of the software that causes two or more of the processing cores to access concurrently a given item of data.
Type: Application
Filed: March 29, 2011
Publication date: February 16, 2012
Applicant: PRIMESENSE LTD.
Inventor: Idan Saar
-
Publication number: 20120023309
Abstract: Techniques for achieving high-availability using a single processor (CPU). In a system comprising a multi-core processor, at least two partitions may be configured with each partition being allocated one or more cores of the multiple cores. The partitions may be configured such that one partition operates in active mode while another partition operates in standby mode. In this manner, a single processor is able to provide active-standby functionality, thereby enhancing the availability of the system comprising the processor.
Type: Application
Filed: July 23, 2010
Publication date: January 26, 2012
Applicant: Brocade Communications Systems, Inc.
Inventors: Vineet M. Abraham, Bill Ying Chin, William R. Mahoney, Aditya Saxena, Xupei Liang, Bill Jianqiang Zhou
-
Patent number: 8103856
Abstract: In a processor having multiple clusters which operate in parallel, the number of clusters in use can be varied dynamically. At the start of each program phase, the configuration option for an interval is run to determine the optimal configuration, which is used until the next phase change is detected. The optimum instruction interval is determined by starting with a minimum interval and doubling it until a low stability factor is reached.
Type: Grant
Filed: January 12, 2009
Date of Patent: January 24, 2012
Assignee: University of Rochester
Inventors: Rajeev Balasubramonian, Sandhya Dwarkadas, David Albonesi
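The interval-doubling search in the last sentence of the abstract can be sketched directly. All names here are illustrative assumptions: `stability_factor` stands in for whatever hardware measurement the mechanism uses, and the cap parameter is added only so the sketch always terminates.

```python
def choose_interval(stability_factor, min_interval, max_interval, threshold):
    """Start at the minimum instruction interval and keep doubling it
    until the measured stability factor drops below `threshold` (or the
    cap `max_interval` is reached)."""
    interval = min_interval
    while stability_factor(interval) >= threshold and interval < max_interval:
        interval *= 2
    return interval
```

For example, with a stability measurement that falls off as `1/interval` and a threshold of 0.1, the search doubles 1 → 2 → 4 → 8 → 16 and stops at 16, the first interval whose factor (0.0625) is below the threshold.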
-
Publication number: 20120017062
Abstract: Methods are disclosed for improving data processing performance in a processor using on-chip local memory in multiple processing units. According to an embodiment, a method of processing data elements in a processor using a plurality of processing units, includes: launching, in each of the processing units, a first wavefront having a first type of thread followed by a second wavefront having a second type of thread, where the first wavefront reads as input a portion of the data elements from an off-chip shared memory and generates a first output; writing the first output to an on-chip local memory of the respective processing unit; and writing to the on-chip local memory a second output generated by the second wavefront, where input to the second wavefront comprises a first plurality of data elements from the first output. Corresponding system and computer program product embodiments are also disclosed.
Type: Application
Filed: July 19, 2011
Publication date: January 19, 2012
Applicant: Advanced Micro Devices, Inc.
Inventors: Vineet GOEL, Todd Martin, Mangesh Nijasure
-
Patent number: 8095811
Abstract: Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.
Type: Grant
Filed: May 29, 2008
Date of Patent: January 10, 2012
Assignee: International Business Machines Corporation
Inventors: Charles J. Archer, Michael A. Blocksome, Amanda A. Peters, Joseph D. Ratterman, Brian E. Smith
-
Patent number: 8095759
Abstract: A multiprocessor computer system comprises a plurality of processors and a plurality of nodes, each node comprising one or more processors. A local memory in each of the plurality of nodes is coupled to the processors in each node, and a hardware firewall comprising a part of one or more of the nodes is operable to prevent a write from an unauthorized processor from writing to the local memory.
Type: Grant
Filed: May 29, 2009
Date of Patent: January 10, 2012
Assignee: Cray Inc.
Inventors: Dennis C. Abts, Steven L. Scott, Aaron F. Godfrey
-
Publication number: 20110320769
Abstract: A computing section is provided with a plurality of computing units and correlatively stores entries of configuration information that describes configurations of the plurality of computing units with physical configuration numbers that represent the entries of configuration information and executes a computation in a configuration corresponding to a designated physical configuration number. A status management section designates a physical configuration number corresponding to a status to which the computing section needs to advance the next time for the computing section and outputs the status to which the computing section needs to advance the next time as a logical status number that uniquely identifies the status to which the computing section needs to advance the next time in an object code.
Type: Application
Filed: December 25, 2009
Publication date: December 29, 2011
Inventors: Takeshi Inuo, Kengo Nishino, Nobuki Kajihara
-
Publication number: 20110320767
Abstract: Methods, systems, and media are provided for a dynamic batch strategy utilized in parallelization of online learning algorithms. The dynamic batch strategy provides a merge function on the basis of a threshold level difference between the original model state and an updated model state, rather than according to a constant or pre-determined batch size. The merging includes reading a batch of incoming streaming data, retrieving any missing model beliefs from partner processors, and training on the batch of incoming streaming data. The steps of reading, retrieving, and training are repeated until the measured difference in states exceeds a set threshold level. The measured differences which exceed the threshold level are merged for each of the plurality of processors according to attributes. The merged differences which exceed the threshold level are combined with the original partial model states to obtain an updated global model state.
Type: Application
Filed: June 24, 2010
Publication date: December 29, 2011
Applicant: MICROSOFT CORPORATION
Inventors: Taha Bekir Eren, Oleg Isakov, Weizhu Chen, Jeffrey Scott Dunn, Thomas Ivan Borchert, Joaquin Quinonero Candela, Thore Kurt Hartwig Graepel, Ralf Herbrich
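The dynamic batch loop described here — train until the model has drifted past a threshold, then merge — can be sketched as follows. This is a schematic single-processor illustration; every parameter (`train_step`, `diff`, `merge`) is an invented stand-in for the corresponding step in the abstract, and the belief-retrieval step is omitted.

```python
def train_with_dynamic_batches(batches, model, train_step, diff, merge,
                               threshold):
    """Sketch of the dynamic batch strategy: keep training on incoming
    batches and trigger a merge only when the local model has drifted
    from its last-merged state by more than `threshold`, rather than
    after every fixed-size batch. Returns the number of merges."""
    original = dict(model)          # snapshot of the original model state
    merges = 0
    for batch in batches:
        train_step(model, batch)    # read a batch and train on it
        if diff(original, model) > threshold:
            merge(original, model)  # fold local updates into global state
            original = dict(model)  # restart drift measurement
            merges += 1
    return merges
```

The key design point the abstract makes is visible in the loop: the merge frequency adapts to how fast the model is actually changing, so slowly drifting models communicate less often than a fixed batch size would force.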
-
Publication number: 20110320768
Abstract: There is provided a method of, and apparatus for, processing a computation on a computing device comprising at least one processor and a memory, the method comprising: storing, in said memory, plural copies of a set of data, each copy of said set of data having a different compression ratio and/or compression scheme; selecting a copy of said set of data; and performing, on a processor, a computation using said selected copy of said set of data. By providing such a method, different compression ratios and/or compression schemes can be selected as appropriate. For example, if high precision is required in a computation, a copy of the set of data can be chosen which has a low compression ratio at the expense of processing time and memory transfer time. In the alternative, if low precision is acceptable, then the speed benefits of a high compression ratio and/or lossy compression scheme may be utilised.
Type: Application
Filed: June 25, 2010
Publication date: December 29, 2011
Applicant: MAXELER TECHNOLOGIES, LTD.
Inventors: Oliver Pell, Stephen Girdlestone
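The selection policy in this abstract amounts to a simple rule: among the stored copies, take the most compressed one that still meets the computation's precision requirement. A minimal sketch, with an assumed `(precision, compression_ratio, data)` tuple layout chosen purely for illustration:

```python
def pick_copy(copies, min_precision):
    """Choose, among plural stored copies of the same data set, the most
    compressed copy that still meets the required precision.
    Each copy is (precision, compression_ratio, data); a higher
    compression ratio means a smaller, faster-to-transfer copy."""
    eligible = [c for c in copies if c[0] >= min_precision]
    return max(eligible, key=lambda c: c[1])  # smallest transfer wins
```

A low-precision computation thus gets the speed benefit of the lossy, highly compressed copy, while a high-precision computation falls back to the lossless copy at the cost of transfer time.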
-
Patent number: 8086828
Abstract: Heterogeneous processors can cooperate for distributed processing tasks in a multiprocessor computing system. Each processor is operable in a "compatible" mode, in which all processors within a family accept the same baseline command set and produce identical results upon executing any command in the baseline command set. The processors also have a "native" mode of operation in which the command set and/or results may differ in at least some respects from the baseline command set and results. Heterogeneous processors with a compatible mode defined by reference to the same baseline can be used cooperatively for distributed processing by configuring each processor to operate in the compatible mode.
Type: Grant
Filed: March 25, 2009
Date of Patent: December 27, 2011
Assignee: NVIDIA Corporation
Inventors: Henry Packard Moreton, Abraham B. de Waal
-
Publication number: 20110314257
Abstract: A wireless communication system hosts a plurality of processes in accordance with a communication protocol. The system includes application specific instruction set processors (ASISPs) that provide computation support for the processes. Each ASISP is capable of executing a subset of the functions of a communication protocol. A scheduler is used to schedule the ASISPs in a time-sliced algorithm so that each ASISP supports several processes. In this architecture, the ASISP actively performs computations for one of the supported processes (active process) at any given time. The state information of each process supported by a particular ASISP is stored in a memory bank that is uniquely associated with the ASISP. When a scheduler instructs an ASISP to change which process is the active process, the state information for the inactivated process is stored in the memory bank and the state information for the newly activated process is retrieved from the memory bank.
Type: Application
Filed: July 29, 2011
Publication date: December 22, 2011
Inventors: Song CHEN, Paul L. CHOU, Christopher C. WOODTHORPE, Venugopal BALASUBRAMONIAN, Keith RIEKEN
-
Publication number: 20110314256
Abstract: Described herein are techniques for enabling a programmer to express a call for a data parallel call-site function in a way that is accessible and usable to the typical programmer. With some of the described techniques, an executable program is generated based upon expressions of those data parallel tasks. During execution of the executable program, data is exchanged between non-data parallel (non-DP) capable hardware and DP capable hardware for the invocation of data parallel functions.
Type: Application
Filed: June 18, 2010
Publication date: December 22, 2011
Applicant: Microsoft Corporation
Inventors: Charles David Callahan, II, Paul F. Ringseth, Yosseff Levanoni, Weirong Zhu, Lingli Zhang
-
Publication number: 20110314255
Abstract: A processor and method for broadcasting data among a plurality of processing cores is disclosed. The processor includes a plurality of processing cores connected by point-to-point connections. A first of the processing cores includes a router that includes at least an allocation unit and an output port. The allocation unit is configured to determine that respective input buffers on at least two others of the processing cores are available to receive given data. The output port is usable by the router to send the given data across one of the point-to-point connections. The router is configured to send the given data contingent on determining that the respective input buffers are available. Furthermore, the processor is configured to deliver the data to the at least two other processing cores in response to the first processing core sending the data once across the point-to-point connection.
Type: Application
Filed: June 17, 2010
Publication date: December 22, 2011
Inventors: Tushar Krishna, Bradford M. Beckmann, Steven K. Reinhardt
-
Patent number: 8081181
Abstract: The architecture implements A-buffer in hardware by extending hardware to efficiently store a variable amount of data for each pixel. In operation, a prepass is performed to generate the counts of the fragments per pixel in a count buffer, followed by a prefix sum pass on the generated count buffer to calculate locations in a fragment buffer in which to store all the fragments linearly. An index is generated for a given pixel in the prefix sum pass and stored in a location buffer. Access to the pixel fragments is then accomplished using the index. Linear storage of the data allows for a fast rendering pass that stores all the fragments to a memory buffer without needing to look at the contents of the fragments. This is then followed by a resolve pass on the fragment buffer to generate the final image.
Type: Grant
Filed: June 20, 2007
Date of Patent: December 20, 2011
Assignee: Microsoft Corporation
Inventor: Craig Peeper
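The count-then-prefix-sum addressing step at the heart of this abstract is easy to illustrate. A minimal sketch (a serial simulation; the real design runs the prefix sum as a GPU pass, and the function name is an assumption):

```python
def build_abuffer_index(fragments_per_pixel):
    """Given per-pixel fragment counts from the count prepass, compute an
    exclusive prefix sum: each pixel's entry in the location buffer is the
    start index of its fragment run in one flat, linear fragment buffer.
    Returns (location_buffer, total_fragment_count)."""
    location = []
    total = 0
    for count in fragments_per_pixel:
        location.append(total)   # exclusive prefix sum of the counts
        total += count
    return location, total       # total = required fragment-buffer length
```

For counts `[3, 0, 2, 1]` the location buffer is `[0, 3, 3, 5]` and the fragment buffer needs 6 slots, so pixel 2's fragments occupy indices 3 and 4. Because every pixel's run is contiguous, the rendering pass can scatter fragments to their slots without inspecting their contents, exactly the property the abstract highlights.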
-
Patent number: 8074054
Abstract: A processing system includes a group of processing units ("PUs") arranged in a daisy chain configuration or a sequence capable of parallel processing. The processing system, in one embodiment, includes PUs, a demultiplexer ("demux"), and a multiplexer ("mux"). The PUs are connected or linked in a sequence or a daisy chain configuration wherein a first PU is located at the beginning of the sequence and a last digital PU is located at the end of the sequence. Each PU is configured to read an input data packet from a packet stream during a designated reading time frame. If the time frame is outside of the designated reading time frame, a PU allows a packet stream to pass through. The demux forwards a packet stream to the first digital processing unit. The mux receives a packet stream from the last digital processing unit.
Type: Grant
Filed: December 12, 2007
Date of Patent: December 6, 2011
Assignee: Tellabs San Jose, Inc.
Inventors: Venkata Rangavajjhala, Naveen K. Jain
-
Publication number: 20110296139
Abstract: Performing a deterministic reduction operation in a parallel computer that includes compute nodes, each of which includes computer processors and a CAU (Collectives Acceleration Unit) that couples computer processors to one another for data communications, including organizing processors and a CAU into a branched tree topology in which the CAU is a root and the processors are children; receiving, from each of the processors in any order, dummy contribution data, where each processor is restricted from sending any other data to the root CAU prior to receiving an acknowledgement of receipt from the root CAU; sending, by the root CAU to the processors in the branched tree topology, in a predefined order, acknowledgements of receipt of the dummy contribution data; receiving, by the root CAU from the processors in the predefined order, the processors' contribution data to the reduction operation; and reducing, by the root CAU, the processors' contribution data.
Type: Application
Filed: May 28, 2010
Publication date: December 1, 2011
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
-
Publication number: 20110296137
Abstract: A parallel computer that includes compute nodes having computer processors and a CAU (Collectives Acceleration Unit) that couples processors to one another for data communications. In embodiments of the present invention, deterministic reduction operations include: organizing processors of the parallel computer and a CAU into a branched tree topology, where the CAU is a root of the branched tree topology and the processors are children of the root CAU; establishing a receive buffer that includes receive elements associated with processors and configured to store the associated processor's contribution data; receiving, in any order from the processors, each processor's contribution data; tracking receipt of each processor's contribution data; and reducing the contribution data in a predefined order, only after receipt of contribution data from all processors in the branched tree topology.
Type: Application
Filed: May 28, 2010
Publication date: December 1, 2011
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
-
Patent number: 8065503
Abstract: Methods, systems and computer programs for distributing a computing operation among a plurality of processes and for gathering results of the computing operation from the plurality of processes are described.
Type: Grant
Filed: December 15, 2006
Date of Patent: November 22, 2011
Assignee: International Business Machines Corporation
Inventor: Bin Jia
-
Publication number: 20110283089
Abstract: A method and system of modularized design for a microprocessor are disclosed. Embodiments disclose modularization techniques, whereby the overall design of the execution unit of the processor is split into different functional modules. The modules are configured to function independent of each other. The microprocessor comprises different components such as a cache logic (201), a clock generation unit (202), a dispatcher (203), a special asynchronous interface (204), an interrupt unit (205), a register file (206) and a multiplexer unit (207). Temporary storage of data in the register files is eliminated, and thus data fetch latency is eliminated. The asynchronous transfer triggered execution architecture increases speed of execution.
Type: Application
Filed: August 10, 2009
Publication date: November 17, 2011
Inventor: Harshal Ingale
-
Publication number: 20110283086
Abstract: A circuit arrangement, program product and method stream level of detail components between hardware threads in a multithreaded circuit arrangement to perform physics collision detection. Typically, a master hardware thread, e.g., a component loader hardware thread, is used to retrieve level of detail data for an object from a memory and stream the data to one or more slave hardware threads, e.g., collision detection hardware threads, to perform the actual collision detection. Because the slave hardware threads receive the level of detail data from the master thread, typically the slave hardware threads are not required to load the data from the memory, thereby reducing memory bandwidth requirements and accelerating performance.
Type: Application
Filed: May 12, 2010
Publication date: November 17, 2011
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Eric Oliver Mejdrich, Paul Emery Schardt, Robert Allen Shearer
-
Publication number: 20110283087Abstract: A first processing unit is implemented by executing a first application program by using an internal computer in an environment where a first operating system is operating. The first processing unit performs a first process or an external service call in accordance with instruction information describing a process to be executed. A second processing unit is implemented by executing a second application program by using the internal computer or an additional computer connected to the internal computer in an environment where a second operating system is operating. The second processing unit performs a second process when instructed by an external service call to execute the second process. When the instruction information includes information specifying the second process as the process to be executed, a transfer unit updates the information included in the instruction information, and transfers the updated instruction information to the first processing unit.Type: ApplicationFiled: October 29, 2010Publication date: November 17, 2011Applicant: FUJI XEROX CO., LTD.Inventors: Tsuyoshi WATANABE, Yoshiaki TEZUKA, Kunihiko KOBAYASHI, Tomomichi ADEGAWA
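The routing behavior of the transfer unit can be sketched as follows: when the instruction information names the second process, the transfer unit rewrites it so the first processing unit reaches the second process via an external service call. All function names and the instruction-dictionary format are illustrative assumptions, not from the patent.

```python
def transfer_unit(instruction):
    """Sketch: update instruction info so that a request for the second
    process is routed through the first processing unit as an external
    service call (field names are hypothetical)."""
    if instruction.get("process") == "second":
        instruction = {**instruction, "process": "external_call:second"}
    return first_processing_unit(instruction)

def first_processing_unit(instruction):
    # Performs the first process, or delegates via an external service call.
    if instruction["process"].startswith("external_call:"):
        return second_processing_unit(instruction)
    return "first:" + instruction["process"]

def second_processing_unit(instruction):
    # Performs the second process when instructed by an external call.
    return "second:" + instruction["process"].split(":", 1)[1]
```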
-
Publication number: 20110283059Abstract: Various embodiments are disclosed for accelerating computations using field programmable gate arrays (FPGA). Various tree traversal techniques, architectures, and hardware implementations are disclosed. Various disclosed embodiments comprise hybrid architectures comprising a central processing unit (CPU), a graphics processor unit (GPU), a field programmable gate array (FPGA), and variations or combinations thereof, to implement raytracing techniques. Additional disclosed embodiments comprise depth-breadth search tree tracing techniques, blocking tree branch traversal techniques to avoid data explosion, compact data structure representations for ray and node representations, and multiplexed processing of multiple rays in a programming element (PE) to leverage pipeline bubbles.Type: ApplicationFiled: May 10, 2011Publication date: November 17, 2011Applicant: Progeniq Pte LtdInventors: Sundar Govindarajan, Vinod Ranganathan Iyer, Darran Nathan
-
Publication number: 20110283088Abstract: A data processing apparatus includes a connecting unit that distributes the plurality of processing modules over the stages, and connects the plurality of processing modules such that a plurality of partial data are processed in parallel. The data processing apparatus detects, with respect to at least a part of the stages, a ratio of an amount of data for which processing in the subsequent stage has been executed, as a passage rate, acquires a processing time for a data amount to be processed in each stage, for which the passage rate was detected, based on the passage rate, and determines the number of processing modules distributed to each stage based on the data amount.Type: ApplicationFiled: May 6, 2011Publication date: November 17, 2011Applicant: CANON KABUSHIKI KAISHAInventors: Ryoko Natori, Shinji Shiraga
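The allocation logic described above — deriving each stage's data amount from the measured passage rates, estimating its processing time, and sizing the module count proportionally — can be sketched as follows. This is a minimal illustrative model; the function and parameter names are assumptions, not the patent's terms.

```python
def allocate_modules(total_modules, input_amount, passage_rates, time_per_item):
    """Distribute processing modules over pipeline stages in proportion
    to each stage's estimated processing time.

    passage_rates[i]: fraction of stage i's output passed to stage i+1.
    time_per_item[i]: processing time per data item at stage i.
    """
    # Data amount reaching each stage: stage 0 sees the full input, each
    # later stage sees the previous amount scaled by the passage rate.
    amounts = [input_amount]
    for rate in passage_rates:
        amounts.append(amounts[-1] * rate)
    # Estimated work per stage, and a proportional allocation with at
    # least one module per stage.
    work = [a * t for a, t in zip(amounts, time_per_item)]
    total = sum(work)
    return [max(1, round(total_modules * w / total)) for w in work]
```

For example, with 8 modules, 100 input items, passage rates of 0.5 and 0.2, and equal per-item times, the later stages see far less data and so receive far fewer modules.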
-
Publication number: 20110264889Abstract: Systems, methods, and an article of manufacture for the reduction in process load experienced by a primary processor when executing an application by dynamically reassigning portions of the application to one or more secondary processors are shown and described. A second processing unit is queried for one or more characteristics. One or more performance characteristics of the second processor are measured. A portion of the application can be reassigned to the second processing unit based on the queried characteristics and performance measurements.Type: ApplicationFiled: April 21, 2010Publication date: October 27, 2011Applicant: MIRICS SEMICONDUCTOR LIMITEDInventor: Christopher Stolarik
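The reassignment decision — query the second processing unit's characteristics, measure its performance, and move a portion of the application only when both are adequate — can be sketched like this. Every field name here is a hypothetical stand-in; the patent does not specify this data model.

```python
def assign_portion(portion, second_unit):
    """Sketch: reassign a portion of the application to the second
    processing unit only if its queried capabilities cover the portion's
    kind and its measured throughput meets the portion's requirement."""
    if (portion["kind"] in second_unit["capabilities"]
            and second_unit["measured_throughput"] >= portion["min_throughput"]):
        return "second"
    return "primary"
```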
-
Publication number: 20110258245Abstract: A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.Type: ApplicationFiled: April 14, 2010Publication date: October 20, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Michael A. Blocksome, Daniel A. Faraj
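The core idea — splitting a local reduction across cores by interleaved chunks, so each reduction core handles every other chunk — can be illustrated with a small sequential Python sketch (sum stands in for the reduction operation; the chunk assignment is the only part modeled):

```python
def chunks_for_core(data, chunk, core_id, n_cores):
    """Yield the interleaved chunks assigned to one reduction core:
    chunk 0 to core 0, chunk 1 to core 1, chunk 2 to core 0, and so on."""
    for i, start in enumerate(range(0, len(data), chunk)):
        if i % n_cores == core_id:
            yield data[start:start + chunk]

def parallel_sum(data, chunk=2, n_cores=2):
    """Each 'core' locally reduces its own interleaved chunks; the
    partial results are then combined into the final reduction."""
    partials = [sum(x for blk in chunks_for_core(data, chunk, cid, n_cores)
                    for x in blk)
                for cid in range(n_cores)]
    return sum(partials)
```

Interleaving the chunks (rather than giving each core one contiguous half) is what lets the cores make progress on every input buffer in parallel as data is copied in.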
-
Publication number: 20110252219Abstract: According to an aspect of the present invention, there is provided an information processing apparatus including: a first processor; a second processor that has an information processing capability and a power consumption higher than those of the first processor; a temperature monitoring module configured to acquire an operating temperature of the second processor; a throttle number determination module configured to determine whether the throttling control is performed a given number of times or more within a given time interval; and a processor switching control module configured to perform, when the operating temperature of the second processor is equal to or higher than a given temperature: stopping an operation of the second processor; causing the first processor to perform an information process; and prohibiting the operation of the second processor.Type: ApplicationFiled: June 20, 2011Publication date: October 13, 2011Inventor: Hajime Sonobe
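The switching policy — count throttling events within a time window and, when the count and the operating temperature both exceed their limits, stop the fast second processor and fall back to the low-power first processor — can be sketched as follows. The class and thresholds are illustrative, not from the patent.

```python
class ThrottleSwitch:
    """Sketch of the throttle-count policy: if throttling fires a given
    number of times within a given interval while the second (fast)
    processor is at or above the temperature limit, switch to the
    first (low-power) processor."""
    def __init__(self, max_throttles, window, temp_limit):
        self.max_throttles = max_throttles
        self.window = window
        self.temp_limit = temp_limit
        self.events = []  # timestamps of throttling events

    def record_throttle(self, t):
        self.events.append(t)

    def active_processor(self, now, temperature):
        recent = [e for e in self.events if now - e <= self.window]
        if temperature >= self.temp_limit and len(recent) >= self.max_throttles:
            return "first"   # fast processor stopped and prohibited
        return "second"      # fast processor keeps running
```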
-
Publication number: 20110246995Abstract: The disclosed embodiments provide a system that facilitates scheduling threads in a multi-threaded processor with multiple processor cores. During operation, the system executes a first thread in a processor core that is associated with a shared cache. During this execution, the system measures one or more metrics to characterize the first thread. Then, the system uses the characterization of the first thread and a characterization for a second thread to predict a performance impact that would occur if the second thread were to simultaneously execute in a second processor core that is also associated with the cache. If the predicted performance impact indicates that executing the second thread on the second processor core will improve performance for the multi-threaded processor, the system executes the second thread on the second processor core.Type: ApplicationFiled: April 5, 2010Publication date: October 6, 2011Applicant: ORACLE INTERNATIONAL CORPORATIONInventors: Alexandra Fedorova, David Vengerov, Kishore Kumar Pusukuri
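The predict-then-schedule step can be illustrated with a deliberately simple contention model: characterize each thread by its cache miss rate and predict that slowdown grows with the product of the two rates. This model and its coefficient are pure assumptions for illustration; the patent does not specify the metrics or the prediction function.

```python
def predict_slowdown(miss_rate_1, miss_rate_2, pressure_coeff=0.5):
    """Hypothetical contention model: predicted slowdown from sharing a
    cache grows with the product of the two threads' miss rates."""
    return pressure_coeff * miss_rate_1 * miss_rate_2

def should_coschedule(thread_1, thread_2, threshold=0.1):
    """Run the second thread on the core sharing the cache only if the
    predicted performance impact stays below the threshold."""
    return predict_slowdown(thread_1["miss_rate"],
                            thread_2["miss_rate"]) < threshold
```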
-
Publication number: 20110246748Abstract: Illustrated is a system and method that includes a processor and service processor co-located on a common socket, the service processor to aggregate data from a distributed network of additional service processors and processors both of which are co-located on an additional common socket. The system and method also includes a first sensor to record the data from the processor. The system and method also includes a second sensor to record the data from a software stack. The system and method further includes a registry to store the data.Type: ApplicationFiled: April 6, 2010Publication date: October 6, 2011Inventors: Vanish Talwar, Jeffrey R. Hilland, Vidhya Kannan, Sandeep KS, Prashanth V
-
Publication number: 20110238951Abstract: An image forming apparatus includes: plural processing units which execute plural processing functions that are different from each other; an execution-in-progress information acquiring unit which acquires execution-in-progress function information that is information about a first processing unit which is executing processing, of the plural processing units; a discrimination unit which discriminates a second processing unit that cannot execute processing when the first processing unit indicated by the execution-in-progress function information acquired by the execution-in-progress information acquiring unit is executing processing, from among the plural processing units; and an executability information generating unit which generates inexecutable function information that is information about the second processing unit, based on a result of the determination by the discrimination unit.Type: ApplicationFiled: March 23, 2011Publication date: September 29, 2011Applicants: KABUSHIKI KAISHA TOSHIBA, TOSHIBA TEC KABUSHIKI KAISHAInventor: Kanako Asari
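The discrimination step — from the units currently executing, derive the set of units that cannot execute — can be sketched with a hypothetical conflict map (which units cannot run alongside each one). The map and names are illustrative assumptions, not from the patent.

```python
def inexecutable_functions(conflict_map, executing):
    """Sketch: given the units currently executing and a conflict map
    (unit -> units that cannot run alongside it), generate the
    inexecutable function information as a set of blocked units."""
    blocked = set()
    for unit in executing:
        blocked |= conflict_map.get(unit, set())
    return blocked
```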
-
Publication number: 20110238949Abstract: Distributed administration of a lock for an operational group of compute nodes in a hierarchical tree structured network including assigning the root node of the operational group to send acknowledgments for lock requests, the root lock administration module comprising a module of automated computing machinery; receiving a lock request assigned to a particular node from a child node; determining whether another request from another child is directly ahead in an acknowledgement queue; if a request from another child is directly ahead in the acknowledgement queue, putting the lock request for the particular node in the acknowledgement queue until the lock request directly ahead in the acknowledgement queue is satisfied and when the lock request ahead in the queue is satisfied, sending the particular node for whom the lock request is assigned a message acknowledging the particular node has the lock; and if a request from another child is not directly ahead in a queue, sending to the particular node for whom the lock request is assigned a message acknowledging that the particular node has the lock.Type: ApplicationFiled: March 29, 2010Publication date: September 29, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
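The root's acknowledgement-queue behavior can be sketched as follows: a request with nothing ahead of it is acknowledged immediately; otherwise it waits in the queue until the request ahead is satisfied. This is a minimal single-process sketch (a list standing in for messages sent down the tree), not the distributed implementation.

```python
from collections import deque

class RootLockAdmin:
    """Sketch of root lock administration with an acknowledgement queue:
    the head of the queue holds the lock; satisfying it acknowledges
    the next waiting request."""
    def __init__(self):
        self.queue = deque()
        self.acks = []  # acknowledgement messages "sent" to nodes

    def request(self, node):
        self.queue.append(node)
        # No request directly ahead in the queue: acknowledge at once.
        if len(self.queue) == 1:
            self.acks.append(node)

    def release(self, node):
        # The head request is satisfied; acknowledge the next in queue.
        assert self.queue and self.queue[0] == node
        self.queue.popleft()
        if self.queue:
            self.acks.append(self.queue[0])
```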
-
Publication number: 20110238950Abstract: Performing a scatterv operation on a hierarchical tree network optimized for collective operations including receiving, by the scatterv module installed on the node, from a nearest neighbor parent above the node a chunk of data having at least a portion of data for the node; maintaining, by the scatterv module installed on the node, the portion of the data for the node; determining, by the scatterv module installed on the node, whether any portions of the data are for a particular nearest neighbor child below the node or one or more other nodes below the particular nearest neighbor child; and sending, by the scatterv module installed on the node, those portions of data to the nearest neighbor child if any portions of the data are for a particular nearest neighbor child below the node or one or more other nodes below the particular nearest neighbor child.Type: ApplicationFiled: March 29, 2010Publication date: September 29, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
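The receive/maintain/determine/send steps can be sketched recursively: each node keeps its own portion and forwards to each child exactly the portions destined for that child's subtree. A rough Python sketch (dictionaries standing in for network chunks; names are illustrative):

```python
def subtree(tree, node):
    """All nodes in the subtree rooted at `node` — the nodes whose
    portions a child must receive from its parent."""
    nodes = [node]
    for child in tree.get(node, []):
        nodes += subtree(tree, child)
    return nodes

def scatterv(tree, portions, node, received=None):
    """Each node maintains its own portion of the received chunk and
    sends each nearest-neighbor child the portions for that child's
    entire subtree. Returns {node: kept_portion} for the subtree."""
    if received is None:  # the root starts with all the data
        received = {n: portions[n] for n in subtree(tree, node)}
    kept = {node: received[node]}          # maintain own portion
    for child in tree.get(node, []):
        # determine and send the portions for this child's subtree
        chunk = {n: received[n] for n in subtree(tree, child)}
        kept.update(scatterv(tree, portions, child, chunk))
    return kept
```

This mirrors how an MPI-style scatterv over a tree avoids ever sending a node data that is not destined for it or its descendants.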
-
Patent number: 8019978Abstract: A unit status reporting protocol may also be used for context switching, debugging, and removing deadlock conditions in a processing unit. A processing unit is in one of five states: empty, active, stalled, quiescent, and halted. The state that a processing unit is in is reported to a front end monitoring unit to enable the front end monitoring unit to determine when a context switch may be performed or when a deadlock condition exists. The front end monitoring unit can issue a halt command to perform a context switch or take action to remove a deadlock condition and allow processing to resume.Type: GrantFiled: August 13, 2007Date of Patent: September 13, 2011Assignee: NVIDIA CorporationInventors: Michael C. Shebanow, Robert C. Keller, Richard A. Silkebakken
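The five-state protocol and the front-end monitor's reaction to it can be sketched as follows. The decision rules here (when a context switch may proceed, when a deadlock is suspected) are illustrative assumptions, not the patent's actual conditions.

```python
# The five reported unit states named in the abstract.
STATES = {"empty", "active", "stalled", "quiescent", "halted"}

def front_end_action(reported):
    """Front-end monitoring sketch: decide from the reported unit states
    when a context switch may be performed or a deadlock may exist."""
    assert set(reported) <= STATES
    if all(s in ("empty", "quiescent", "halted") for s in reported):
        return "context_switch_ok"       # no unit is mid-work
    if all(s == "stalled" for s in reported):
        return "possible_deadlock"       # issue halt, then recover
    return "keep_running"
```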
-
Publication number: 20110219211Abstract: A CPU core unlocking device applied to a computer system is provided. The core unlocking device includes a CPU having a plurality of signal terminals and a core unlocking executing unit having a plurality of GPIO ports connected with the corresponding signal terminals of the CPU. The GPIO ports of the core unlocking executing unit generate and transmit a combination of core unlocking signals to the signal terminals of the CPU to unlock the CPU core.Type: ApplicationFiled: March 3, 2011Publication date: September 8, 2011Applicant: ASUSTeK COMPUTER INC.Inventors: Pei-Hua Sun, Pai-Ching Huang, Yi-Min Huang, Meng-Hsiung Lee, Nan-Kun Lo
-
Publication number: 20110213934Abstract: A data processing apparatus and method are provided for switching performance of a workload between two processing circuits. The data processing apparatus has first processing circuitry which is architecturally compatible with second processing circuitry, but with the first processing circuitry being micro-architecturally different from the second processing circuitry. At any point in time, a workload consisting of at least one application and at least one operating system for running that application is performed by one of the first processing circuitry and the second processing circuitry. A switch controller is responsive to a transfer stimulus to perform a handover operation to transfer performance of the workload from source processing circuitry to destination processing circuitry, with the source processing circuitry being one of the first and second processing circuitry and the destination processing circuitry being the other of the first and second processing circuitry.Type: ApplicationFiled: March 1, 2010Publication date: September 1, 2011Applicant: ARM LimitedInventors: Peter Richard Greenhalgh, Richard Roy Grisenthwaite
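The handover mechanism (shared by this publication and 20110213935 below) can be sketched minimally: the workload runs on exactly one of two architecturally compatible circuits, and a transfer stimulus moves its state from the source to the destination. This is an illustrative software analogy only; the patent concerns hardware circuitry.

```python
class SwitchController:
    """Sketch: a workload's architectural state lives on one of two
    architecturally compatible processing circuits; a transfer stimulus
    hands it over to the other."""
    def __init__(self):
        self.circuits = {"first": {}, "second": {}}
        self.active = "first"

    def transfer_stimulus(self):
        source = self.active
        dest = "second" if source == "first" else "first"
        # Hand over the workload's state so execution resumes seamlessly
        # on the micro-architecturally different destination circuit.
        self.circuits[dest] = dict(self.circuits[source])
        self.active = dest
        return self.active
```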
-
Publication number: 20110213950Abstract: A technique for reducing the power consumption required to execute processing operations. A processing complex, such as a CPU or a GPU, includes a first set of cores comprising one or more fast cores and a second set of cores comprising one or more slow cores. A processing mode of the processing complex can switch between a first mode of operation and a second mode of operation based on one or more of the workload characteristics, performance characteristics of the first and second sets of cores, power characteristics of the first and second sets of cores, and operating conditions of the processing complex. A controller causes the processing operations to be executed by either the first set of cores or the second set of cores to achieve the lowest total power consumption.Type: ApplicationFiled: May 25, 2010Publication date: September 1, 2011Inventors: John George Mathieson, Phil Carmack, Brian Smith
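The controller's choice can be illustrated with a toy energy model: run on the slow, low-power cores whenever they still meet the workload's deadline, since energy is power multiplied by time. The model and its parameters are hypothetical stand-ins for the workload, performance, and power characteristics the abstract lists.

```python
def pick_core_set(work, fast_power, slow_power, fast_perf, slow_perf, deadline):
    """Hypothetical controller sketch: prefer the slow cores if they can
    finish `work` within the deadline; return (core set, energy used),
    with energy modeled as power * time."""
    slow_time = work / slow_perf
    fast_time = work / fast_perf
    if slow_time <= deadline:
        return "slow", slow_power * slow_time
    return "fast", fast_power * fast_time
```

In this toy model the slow cores win whenever latency allows, which is the intuition behind pairing fast and slow core sets in one processing complex.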
-
Publication number: 20110213935Abstract: A data processing apparatus and method are provided for switching performance of a workload between two processing circuits. The data processing apparatus has first processing circuitry which is architecturally compatible with second processing circuitry, but with the first processing circuitry being micro-architecturally different from the second processing circuitry. At any point in time, a workload consisting of at least one application and at least one operating system for running that application is performed by one of the first processing circuitry and the second processing circuitry. A switch controller is responsive to a transfer stimulus to perform a handover operation to transfer performance of the workload from source processing circuitry to destination processing circuitry, with the source processing circuitry being one of the first and second processing circuitry and the destination processing circuitry being the other of the first and second processing circuitry.Type: ApplicationFiled: March 1, 2010Publication date: September 1, 2011Applicant: ARM LimitedInventors: Peter Richard Greenhalgh, Richard Roy Grisenthwaite