Abstract: An automated method of optimizing execution of a program in a parallel processing environment is disclosed. The program has a plurality of threads and can execute on both parallel and serial hardware. The method includes receiving the program at an optimizer and compiling the program to execute on parallel hardware. The optimizer observes execution of the program to identify a subset of memory operations that execute more efficiently on serial hardware than on parallel hardware, and a subset of memory operations that execute more efficiently on parallel hardware than on serial hardware. The program is then recompiled so that threads that include memory operations that execute more efficiently on serial hardware are compiled for serial hardware, and threads that include memory operations that execute more efficiently on parallel hardware are compiled for parallel hardware. Subsequent execution of the program uses the recompiled program.
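A minimal sketch of the recompilation decision, assuming the optimizer's observations are summarized as per-thread lists of memory operations with measured serial and parallel costs. The `MemOpProfile` record, the `choose_targets` function, and the rule that any serial-favoring operation sends a thread to serial hardware are all hypothetical; the abstract does not say how threads containing both kinds of operation are resolved.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MemOpProfile:
    """Observed cost of one memory operation (hypothetical profiling record)."""
    name: str
    serial_cycles: float    # measured cost when run on serial hardware
    parallel_cycles: float  # measured cost when run on parallel hardware

    @property
    def prefers_serial(self) -> bool:
        return self.serial_cycles < self.parallel_cycles


def choose_targets(profile: Dict[str, List[MemOpProfile]]) -> Dict[str, str]:
    """Pick a recompilation target ('serial' or 'parallel') for each thread.

    Assumed rule: a thread containing any serial-favoring memory operation is
    recompiled for serial hardware; every other thread stays on parallel hardware.
    """
    return {
        thread: "serial" if any(op.prefers_serial for op in ops) else "parallel"
        for thread, ops in profile.items()
    }


profile = {
    "t0": [MemOpProfile("pointer_chase", serial_cycles=40, parallel_cycles=90)],
    "t1": [MemOpProfile("stream_load", serial_cycles=120, parallel_cycles=30)],
}
print(choose_targets(profile))  # {'t0': 'serial', 't1': 'parallel'}
```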
Abstract: An automated method of optimizing execution of a program in a parallel processing environment is described. The program is adapted to execute from data memory and instruction memory. An optimizer receives the program to be optimized and causes it to be compiled and executed. The optimizer observes execution of the program and identifies the subset of instructions that execute most often, together with the groups of instructions associated with, and including, those instructions. The optimizer then recompiles the program, storing the identified groups of instructions in instruction memory and the remaining portions of the program in data memory. The instruction memory has a higher access rate and a smaller capacity than the data memory. Once the program is recompiled, subsequent execution uses the recompiled program.
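The placement step can be sketched as follows, assuming the observed execution is available as a trace of instruction addresses and that the program is already partitioned into groups such as basic blocks or loop bodies. The function name, the `hot_fraction` cutoff, and the greedy fill of instruction memory are illustrative assumptions, not the disclosed method.

```python
from collections import Counter
from typing import Dict, List, Tuple


def place_groups(
    trace: List[int],
    groups: Dict[str, List[int]],
    imem_capacity: int,
    hot_fraction: float = 0.1,
) -> Tuple[List[str], List[str]]:
    """Split instruction groups between instruction memory and data memory.

    trace: instruction addresses in the order they executed (observed run).
    groups: hypothetical grouping, e.g. basic blocks or loop bodies -> addresses.
    imem_capacity: instruction-memory size, counted in instructions (assumed).
    """
    counts = Counter(trace)
    # 1. The most frequently executed instructions (at least one).
    n_hot = max(1, int(len(counts) * hot_fraction))
    hot = {addr for addr, _ in counts.most_common(n_hot)}
    # 2. Groups containing a hot instruction, hottest group first.
    heat = {name: sum(counts[a] for a in addrs) for name, addrs in groups.items()}
    candidates = sorted(
        (name for name, addrs in groups.items() if hot.intersection(addrs)),
        key=lambda name: heat[name],
        reverse=True,
    )
    # 3. Fill the small, fast instruction memory; everything else goes to data memory.
    imem, used = [], 0
    for name in candidates:
        if used + len(groups[name]) <= imem_capacity:
            imem.append(name)
            used += len(groups[name])
    dmem = [name for name in groups if name not in imem]
    return imem, dmem


trace = [0, 1, 2, 0, 1, 2, 0, 1, 2, 5, 9]
groups = {"loop": [0, 1, 2], "init": [5, 6], "exit": [9, 10, 11]}
print(place_groups(trace, groups, imem_capacity=4))  # (['loop'], ['init', 'exit'])
```

A greedy fill is the simplest choice here; a knapsack-style selection could pack the instruction memory more tightly when group sizes vary.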
Abstract: A method of providing network security for executing applications is disclosed. One or more servers including a plurality of microprocessors and a plurality of network processors are provided. A first grouping of microprocessors executes a first application. The first application is executed using the microprocessors in the first grouping. The microprocessors in the first grouping of microprocessors are permitted to communicate with each other via one or more of the network processors. A second grouping of microprocessors executes a second application. At least one server has one or more microprocessors for executing the first application and one or more different microprocessors for executing the second application. The second application is executed using the microprocessors in the second grouping of microprocessors.
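One way to picture the per-grouping communication rule is a filter on the network processors that forwards traffic only between microprocessors assigned to the same application. The `NetworkProcessor` class, the processor identifiers, and the use of `PermissionError` are hypothetical; the example places microprocessors for both applications on `serverA`, matching the case where one server hosts both groupings.

```python
from typing import Dict


class NetworkProcessor:
    """Hypothetical filter run on a network processor.

    grouping maps a microprocessor id to the application grouping it belongs to.
    Traffic is forwarded only between microprocessors of the same grouping.
    """

    def __init__(self, grouping: Dict[str, str]):
        self.grouping = dict(grouping)

    def permits(self, src: str, dst: str) -> bool:
        group = self.grouping.get(src)
        return group is not None and group == self.grouping.get(dst)

    def forward(self, src: str, dst: str, payload: bytes) -> bytes:
        if not self.permits(src, dst):
            raise PermissionError(f"{src} -> {dst} crosses application groupings")
        return payload  # a real device would transmit the payload here


# serverA hosts microprocessors for both applications.
np_filter = NetworkProcessor({
    "serverA/cpu0": "app1", "serverA/cpu1": "app2",
    "serverB/cpu0": "app1", "serverB/cpu1": "app2",
})
print(np_filter.permits("serverA/cpu0", "serverB/cpu0"))  # True
print(np_filter.permits("serverA/cpu0", "serverA/cpu1"))  # False
```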
Abstract: An automated method of performing exponentiation is disclosed. A plurality of tables holding factors for obtaining the results of exponentiations are provided and loaded into computer memory. Each factor is the result of a second exponentiation of a constant and an exponent, where the exponent is related to the memory address corresponding to the factor. To perform a first exponentiation, a plurality of memory addresses are identified by breaking up the first exponentiation into equations whose results are factors of the first exponentiation. The exponents of these equations are related to the memory addresses corresponding to the factors held in the tables. A plurality of lookups into the computer memory are performed to retrieve the factors held in the tables at the respective memory addresses. The retrieved factors are multiplied together to obtain the result of the first exponentiation.
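A minimal sketch of the table-lookup scheme, assuming modular arithmetic and tables indexed by radix-16 digits of the exponent. Writing e = d_0 + d_1*16 + d_2*16^2 + ... splits c^e into factors c^(d_i*16^i); table i holds those factors at address d_i, so each factor costs one lookup and the result is their product. `build_tables`, `table_pow`, the radix, and the modulus are illustrative choices.

```python
from typing import List


def build_tables(c: int, radix_bits: int, num_tables: int, modulus: int) -> List[List[int]]:
    """Precompute table[i][d] = c ** (d * 2**(radix_bits * i)) mod modulus.

    The slot index d (the factor's address inside table i) fixes its exponent.
    """
    radix = 1 << radix_bits
    tables = []
    base = c % modulus  # c ** (radix ** i) for the table being built
    for _ in range(num_tables):
        row = [1 % modulus]
        for _ in range(radix - 1):
            row.append(row[-1] * base % modulus)
        tables.append(row)
        base = pow(base, radix, modulus)
    return tables


def table_pow(e: int, tables: List[List[int]], radix_bits: int, modulus: int) -> int:
    """Compute c ** e mod modulus using only table lookups and multiplications."""
    if e < 0 or e >> (radix_bits * len(tables)):
        raise ValueError("exponent out of range for these tables")
    mask = (1 << radix_bits) - 1
    result = 1 % modulus
    for i, table in enumerate(tables):
        digit = (e >> (radix_bits * i)) & mask  # address of the factor in table i
        result = result * table[digit] % modulus
    return result


M = 1_000_003
T = build_tables(7, radix_bits=4, num_tables=8, modulus=M)  # covers e < 2**32
print(table_pow(123456789, T, 4, M) == pow(7, 123456789, M))  # True
```

Wider digits mean fewer lookups and multiplications per exponentiation, at the cost of exponentially larger tables (2^b entries per table for b-bit digits).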
Abstract: An integrated circuit (IC) is disclosed. The integrated circuit includes a non-reconfigurable multi-threaded processor core that implements a pipeline having n ordered stages, wherein n is an integer greater than 1. The multi-threaded processor core implements a default instruction set. The integrated circuit also includes reconfigurable hardware that implements n discrete pipeline stages of a reconfigurable execution unit. The n discrete pipeline stages of the reconfigurable execution unit are pipeline stages of the pipeline that is implemented by the multi-threaded processor core.
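The key structural point, that custom instructions flow through the same n stages as default instructions, can be illustrated with a cycle-level software model. This is a toy simulation in Python, not a hardware description; the `Pipeline` and `Instr` names, the dictionary-based stage functions, and the round-robin thread issue policy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

StageFn = Callable[[dict], None]


@dataclass
class Instr:
    thread: int  # issuing hardware thread (tag only in this model)
    opcode: str
    state: dict = field(default_factory=dict)


class Pipeline:
    """Cycle-level toy model of an n-stage multi-threaded pipeline.

    default_stages: the core's fixed instruction set, opcode -> n stage functions.
    reconfig_stages: n stage functions loaded into the reconfigurable unit; any
    opcode outside the default set runs through these, stage for stage.
    """

    def __init__(self, n: int, default_stages: Dict[str, List[StageFn]],
                 reconfig_stages: List[StageFn]):
        if n <= 1 or len(reconfig_stages) != n or any(len(s) != n for s in default_stages.values()):
            raise ValueError("need n > 1 and exactly n stage functions per path")
        self.n = n
        self.default = default_stages
        self.reconfig = reconfig_stages
        self.slots: List[Optional[Instr]] = [None] * n
        self.retired: List[Instr] = []

    def cycle(self, issue: Optional[Instr]) -> None:
        if self.slots[-1] is not None:            # instruction leaves the last stage
            self.retired.append(self.slots[-1])
        self.slots = [issue] + self.slots[:-1]    # every instruction advances one stage
        for i, ins in enumerate(self.slots):
            if ins is not None:
                stages = self.default.get(ins.opcode, self.reconfig)
                stages[i](ins.state)              # same stage index on either path

    def run(self, threads: List[List[Instr]]) -> List[Instr]:
        queues = [list(t) for t in threads]
        while any(queues):                        # round-robin issue across threads
            for q in queues:
                if q:
                    self.cycle(q.pop(0))
        for _ in range(self.n):                   # drain the pipeline
            self.cycle(None)
        return self.retired


add = [lambda s: s.update(acc=s["a"]),
       lambda s: s.update(acc=s["acc"] + s["b"]),
       lambda s: None]
mac = [lambda s: s.update(acc=s["a"] * s["b"]),   # custom op for the reconfigurable unit
       lambda s: s.update(acc=s["acc"] + s["c"]),
       lambda s: None]
p = Pipeline(3, {"add": add}, mac)
out = p.run([[Instr(0, "add", {"a": 1, "b": 2})],
             [Instr(1, "mac", {"a": 3, "b": 4, "c": 5})]])
print([i.state["acc"] for i in out])  # [3, 17]
```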
Abstract: A computing system is provided that includes a web page search node including a web page collection, a web server, and a search page returner.
Abstract: This invention provides a system and method that can employ a low-instruction-per-second (lower-power), highly parallel processor architecture to perform low-precision computations. These are aggregated at high precision by an aggregator. Either a high-precision processor arrangement or a low-precision processor arrangement employing software-based high-precision program instructions performs the less-frequent, generally slower high-precision computations of the aggregated, more-frequent low-precision computations. One final aggregator totals all low-precision computations, and another high-precision aggregator totals all high-precision computations. An equal number of low-precision computations is used to generate the error value that is subtracted from the low-precision average. A plurality of lower-power processors can be arrayed to provide the low-precision computation function.
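A minimal sketch of the error-correction idea, under one reading of the abstract: every input gets a cheap low-precision computation, a sample of inputs also gets a high-precision computation, and the mean difference between an equal number of low- and high-precision results on the sampled inputs is subtracted from the low-precision average. `corrected_average`, the every-k-th sampling scheme, and the use of float32 rounding to emulate low precision are assumptions; `math.fsum` stands in for the high-precision aggregator.

```python
import math
import struct
from typing import Callable, Sequence


def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def corrected_average(
    xs: Sequence[float],
    f_low: Callable[[float], float],
    f_high: Callable[[float], float],
    sample_every: int,
) -> float:
    # Frequent, cheap low-precision computations over every input,
    # totalled by a high-precision aggregator (math.fsum is exactly rounded).
    low_avg = math.fsum(f_low(x) for x in xs) / len(xs)
    # Infrequent high-precision computations on a sample, paired with an
    # equal number of low-precision computations on the same inputs.
    sample = xs[::sample_every]
    low_sample = math.fsum(f_low(x) for x in sample)
    high_sample = math.fsum(f_high(x) for x in sample)
    error = (low_sample - high_sample) / len(sample)
    # Subtract the estimated error from the low-precision average.
    return low_avg - error


xs = [i / 7 for i in range(1000)]
print(corrected_average(xs, lambda x: to_float32(x * x), lambda x: x * x, sample_every=50))
```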
Abstract: This invention provides a computer system architecture, and a method for providing the same, which can include a web page search node including a web page collection. The system and method can also include a web server configured to receive, from a given user via a web browser, a search query including keywords. The node then searches its own collection for the pages that best match the search query. A search page returner may be provided which is configured to return high-ranked pages to the user. The node may include a power-efficiency-enhanced processing subsystem, which includes M processors. The M processors are configured to emulate N virtual processors, and they are configured to limit a virtual processor memory access rate at which each of the N virtual processors accesses memory. The memory accessed by each of the N virtual processors may be RAM. In select embodiments, the memory accessed by each of the N virtual processors includes DRAM, which offers high capacity yet lower power consumption than SRAM.
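The M-to-N virtual processor emulation can be sketched as a fixed interleaving schedule. Assuming N is a multiple of M, each physical processor rotates through N/M virtual processors, so each virtual processor gets one issue slot every N/M cycles; if each slot issues at most one memory access, that caps each virtual processor's memory access rate at clock * M / N. The mapping and the function names are hypothetical, not the disclosed design.

```python
from typing import List


def schedule(m: int, n: int, cycles: int) -> List[List[int]]:
    """For each cycle, list which virtual processor each of the m physical processors runs.

    Hypothetical mapping: virtual processor v lives on physical processor v % m,
    and each physical processor round-robins over its n // m virtual processors.
    """
    if n % m:
        raise ValueError("this sketch assumes n is a multiple of m")
    per_physical = n // m
    return [[p + m * (t % per_physical) for p in range(m)] for t in range(cycles)]


def max_vp_access_rate(clock_hz: float, m: int, n: int) -> float:
    """Upper bound on memory accesses per second for one virtual processor,
    assuming at most one memory access per issue slot."""
    return clock_hz * m / n


print(schedule(m=2, n=8, cycles=5))        # [[0, 1], [2, 3], [4, 5], [6, 7], [0, 1]]
print(max_vp_access_rate(1e9, m=2, n=8))   # 250000000.0
```

Because the per-virtual-processor rate is bounded by design, the memory behind each virtual processor can be slower, denser DRAM rather than SRAM.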