For A Parallel Or Multiprocessor System Patents (Class 717/149)
-
Publication number: 20140380288
Abstract: Apparatus, systems, and methods for a compiler are described. One such compiler generates machine code corresponding to a set of elements including a general purpose element and a special purpose element. The compiler identifies a portion in an arrangement of relationally connected operators that corresponds to a special purpose element. The compiler also determines whether the portion meets a condition to be mapped to the special purpose element. The compiler also converts the arrangement into an automaton comprising a plurality of states, wherein the portion is converted using a special purpose state that corresponds to the special purpose element if the portion meets the condition. The compiler also converts the automaton into machine code. Additional apparatus, systems, and methods are disclosed.
Type: Application
Filed: September 5, 2014
Publication date: December 25, 2014
Inventors: Junjuan Xu, Paul Glendenning
-
Patent number: 8918770
Abstract: A system and method for compiling includes, for a parallelizable code portion of an application stored on a computer readable storage medium, determining one or more variables that are to be transferred to and/or from a coprocessor if the parallelizable code portion were to be offloaded. A start location and an end location are determined for at least one of the one or more variables as a size in memory. The parallelizable code portion is transformed by inserting an offload construct around the parallelizable code portion and passing the one or more variables and the size as arguments of the offload construct such that the parallelizable code portion is offloaded to a coprocessor at runtime.
Type: Grant
Filed: August 24, 2012
Date of Patent: December 23, 2014
Assignee: NEC Laboratories America, Inc.
Inventors: Nishkam Ravi, Tao Bao, Ozcan Ozturk, Srimat Chakradhar
-
Patent number: 8914781
Abstract: Described is predicting cache locality in a multicore/multithreaded processing environment including when threads share cache data in a non-uniform interleaving manner. Thread execution traces are analyzed to compute a set of per-thread parameters that can then be used to predict cache miss rates for other cache sizes. In one aspect, a model is based upon a probability that the cache reuse distance will increase because of accesses by other threads, and another probability that the reuse distance will decrease because of intercept accesses by other threads to shared data blocks. Estimates of the number of shared data blocks, possibly shared data blocks and private data blocks are used in the computations.
Type: Grant
Filed: October 24, 2008
Date of Patent: December 16, 2014
Assignee: Microsoft Corporation
Inventors: Trishul A. Chilimbi, Chen Ding
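
As an illustration of the reuse-distance notion used in the abstract above, the following is a minimal sketch, not the patented model. It assumes a fully associative LRU cache and hypothetical block names; `reuse_distances` and `miss_rate` are names introduced here for illustration only. It shows how accesses interleaved from another thread lengthen the reuse distance seen by the first thread.

```python
from collections import OrderedDict

def reuse_distances(trace):
    """For each access, count the distinct blocks touched since the previous
    access to the same block (None for a first access)."""
    stack = OrderedDict()  # least recently used first, most recently used last
    distances = []
    for block in trace:
        if block in stack:
            keys = list(stack)
            distances.append(len(keys) - 1 - keys.index(block))
            stack.move_to_end(block)
        else:
            distances.append(None)
            stack[block] = True
    return distances

def miss_rate(distances, cache_blocks):
    """Miss fraction for an assumed fully associative LRU cache of the given size."""
    misses = sum(1 for d in distances if d is None or d >= cache_blocks)
    return misses / len(distances)

# Hypothetical traces: thread A alone, then A interleaved with another thread.
thread_a = ["x", "y", "x", "y"]
interleaved = ["x", "s", "y", "t", "x", "u", "y", "v"]

print(reuse_distances(thread_a))      # reuse of x and y at distance 1
print(reuse_distances(interleaved))   # the same reuses now at distance 3
print(miss_rate(reuse_distances(thread_a), 2))
```

Because the distance histogram is independent of any one cache size, the same distances can be re-evaluated against several sizes, which is the property the abstract relies on when predicting miss rates for other cache sizes.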
-
Publication number: 20140359589
Abstract: System and method for configuring a system of heterogeneous hardware components, including at least one: programmable hardware element (PHE), digital signal processor (DSP) core, and programmable communication element (PCE). A program, e.g., a graphical program (GP), which includes floating point math functionality and which is targeted for distributed deployment on the system is created. Respective portions of the program for deployment to respective ones of the hardware components are automatically determined. Program code implementing communication functionality between the at least one PHE and the at least one DSP core and targeted for deployment to the at least one PCE is automatically generated. At least one hardware configuration program (HCP) is generated from the program and the code, including compiling the respective portions of the program and the program code for deployment to respective hardware components. The HCP is deployable to the system for concurrent execution of the program.
Type: Application
Filed: October 25, 2013
Publication date: December 4, 2014
Applicant: NATIONAL INSTRUMENTS CORPORATION
Inventors: Jeffrey L. Kodosky, Hugo A. Andrade, Brian Keith Odom, Cary Paul Butler, Brian C. MacCleery, James C. Nagle, J. Marcus Monroe, Alexandre M. Barp
-
Publication number: 20140359590
Abstract: System and method for configuring a system of heterogeneous hardware components, including at least one: programmable hardware element (PHE), digital signal processor (DSP) core, and programmable communication element (PCE). A program, e.g., a graphical program (GP), which includes floating point math functionality and which is targeted for distributed deployment on the system is created. Respective portions of the program for deployment to respective ones of the hardware components are automatically determined. Program code implementing communication functionality between the at least one PHE and the at least one DSP core and targeted for deployment to the at least one PCE is automatically generated. At least one hardware configuration program (HCP) is generated from the program and the code, including compiling the respective portions of the program and the program code for deployment to respective hardware components. The HCP is deployable to the system for concurrent execution of the program.
Type: Application
Filed: October 25, 2013
Publication date: December 4, 2014
Applicant: NATIONAL INSTRUMENTS CORPORATION
Inventors: Jeffrey L. Kodosky, Hugo A. Andrade, Brian Keith Odom, Cary Paul Butler, Brian C. MacCleery, James C. Nagle, J. Marcus Monroe, Alexandre M. Barp
-
Patent number: 8904369
Abstract: A method for automated process distribution includes selecting a process definition; identifying a first process portion and at least one second process portion in the process definition; generating a first further process definition for the first process portion; generating a second further process definition for each the second process portion; generating a corresponding service definition for each the second further process definition. In the method, generating the first further process definition includes generating a process definition element configured to invoke at least one service of the service definitions, and generating the second further process definition includes generating a process definition element configured to offer a service of the service definition corresponding to that second further process definition.
Type: Grant
Filed: June 6, 2007
Date of Patent: December 2, 2014
Assignee: International Business Machines Corporation
Inventor: Dieter Roller
-
Patent number: 8898648
Abstract: A profiling tool identifies a code region with a false sharing potential. A static analysis tool classifies variables and arrays in the identified code region. A mapping detection library correlates memory access instructions in the identified code region with variables and arrays in the identified code region while a processor is running the identified code region. The mapping detection library identifies one or more instructions at risk, in the identified code region, which are subject to an analysis by a false sharing detection library. A false sharing detection library performs a run-time analysis of the one or more instructions at risk while the processor is re-running the identified code region. The false sharing detection library determines, based on the performed run-time analysis, whether two different portions of the cache memory line are accessed by the generated binary code.
Type: Grant
Filed: November 30, 2012
Date of Patent: November 25, 2014
Assignee: International Business Machines Corporation
Inventors: I-Hsin Chung, Guojing Cong, Hiroki Murata, Yasushi Negishi, Hui-Fang Wen
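
The final check described in the abstract above, whether different portions of one cache line are accessed, can be sketched as follows. This is a simplified illustration and not the patented libraries: the 64-byte line size, the `(thread_id, address, is_write)` trace format, and the function name `false_sharing_lines` are all assumptions made here.

```python
from collections import defaultdict

LINE_SIZE = 64  # assumed cache line size in bytes

def false_sharing_lines(accesses, line_size=LINE_SIZE):
    """accesses: iterable of (thread_id, address, is_write) tuples.
    Returns cache lines written at least once and touched by two or more
    threads at pairwise disjoint byte offsets."""
    per_line = defaultdict(lambda: defaultdict(set))  # line -> thread -> offsets
    written = set()
    for tid, addr, is_write in accesses:
        line = addr // line_size
        per_line[line][tid].add(addr % line_size)
        if is_write:
            written.add(line)
    flagged = []
    for line, by_thread in per_line.items():
        if line not in written or len(by_thread) < 2:
            continue
        offsets = list(by_thread.values())
        disjoint = all(a.isdisjoint(b)
                       for i, a in enumerate(offsets)
                       for b in offsets[i + 1:])
        if disjoint:
            flagged.append(line)
    return sorted(flagged)

# Hypothetical trace: threads 0 and 1 write different bytes of one line
# (false sharing) and touch the same byte of another line (true sharing).
trace = [(0, 0x1000, True), (1, 0x1008, True),
         (0, 0x2000, True), (1, 0x2000, False)]
print(false_sharing_lines(trace))
```

Only the first line is reported: the second line is genuinely shared data, which padding or realignment would not fix.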
-
Patent number: 8898652
Abstract: Various technologies and techniques are disclosed for providing a hardware accelerated software transactional memory application. The software transactional memory application has access to metadata in a cache of a central processing unit that can be used to improve the operation of the STM system. For example, open read barrier filtering is provided that uses an opened-for-read bit that is contained in the metadata to avoid redundant open read processing. Similarly, redundant read log validation can be avoided using the metadata. For example, upon entering commit processing for a particular transaction, a get-evictions instruction in an instruction set architecture of the central processing unit is invoked. A retry operation can be optimized using the metadata. The particular transaction is aborted at a current point and put to sleep. The corresponding cache line metadata in the metadata are marked appropriately to efficiently detect a write by another CPU.
Type: Grant
Filed: June 8, 2007
Date of Patent: November 25, 2014
Assignee: Microsoft Corporation
Inventors: Jan Gray, Timothy L. Harris, James Larus, Burton Smith
-
Patent number: 8892738
Abstract: A technique for generating component usage statistics involves associating components with blocks of a stream-enabled application. When the streaming application is executed, block requests may be logged by Block ID in a log. The frequency of component use may be estimated by analyzing the block request log with the block associations.
Type: Grant
Filed: April 4, 2008
Date of Patent: November 18, 2014
Assignee: Numecent Holdings, Inc.
Inventors: Jeffrey de Vries, Arthur Shingen Hitomi
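
The estimation step in the abstract above amounts to joining a block request log against a block-to-component table. A minimal sketch follows, assuming a hypothetical mapping and log; the component names, block IDs, and the function `component_usage` are illustrative and not taken from the patent.

```python
from collections import Counter

# Hypothetical association of block IDs with application components.
block_to_component = {0: "core", 1: "core", 2: "spellcheck", 3: "help"}

# Hypothetical log of block requests, recorded by block ID during streaming.
request_log = [0, 1, 2, 0, 2, 3, 2]

def component_usage(log, mapping):
    """Estimate how often each component was used by counting logged
    requests for the blocks associated with it."""
    return Counter(mapping[block] for block in log if block in mapping)

usage = component_usage(request_log, block_to_component)
for component, count in usage.most_common():
    print(component, count)
```

Requests for blocks with no known association are skipped here; a real tool would have to decide how to attribute blocks shared between components.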
-
Patent number: 8881124
Abstract: According to the conventional loop parallelization method, when a loop in which a value of a loop-carried dependency variable can be calculated in all of the iterations without sequentially executing the loop from the start, it is determined that DOALL parallelization is not applicable due to the loop-carried dependency variable. Accordingly, the loop is sequentially executed or parallelized by using DOACROSS parallelization that executes a loop including a loop-carried dependency variable. That is, there is a problem that an expression including a loop-carried dependency cannot be parallelized and efficiently processed with use of a multi-processor. By generating initial value calculating codes, the loop-carried dependency in a source code prior to parallelization can be solved, and by dividing a loop included in the source code into subloops that can be executed in parallel, the multi-processor can efficiently process the source code.
Type: Grant
Filed: December 13, 2011
Date of Patent: November 4, 2014
Assignee: Panasonic Corporation
Inventor: Daisuke Baba
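
The idea of initial value calculating codes in the abstract above can be illustrated with a toy loop whose carried variable has a closed form. This is a sketch under stated assumptions, not the patented compiler: the loop body, the chunking scheme, and the names `sequential`, `subloop`, and `parallel` are hypothetical. Python threads are used only to show the decomposition, since the interpreter lock prevents a real speed-up for this kind of work.

```python
from concurrent.futures import ThreadPoolExecutor

def sequential(a, x0, step):
    """Original loop: x is a loop-carried dependency variable."""
    out, x = [], x0
    for v in a:
        x = x + step
        out.append(x * v)
    return out

def subloop(a, start, end, x0, step):
    """One subloop. The first line is the initial value calculating code:
    it computes x as it would be on entry to iteration `start`."""
    x = x0 + step * start
    out = []
    for i in range(start, end):
        x = x + step
        out.append(x * a[i])
    return out

def parallel(a, x0, step, chunks=4):
    """Split the iteration space into subloops and run them independently."""
    n = len(a)
    bounds = [(n * k // chunks, n * (k + 1) // chunks) for k in range(chunks)]
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        parts = list(pool.map(lambda b: subloop(a, b[0], b[1], x0, step), bounds))
    return [y for part in parts for y in part]

data = list(range(10))
print(parallel(data, 5, 3) == sequential(data, 5, 3))
```

The transformation is only valid because `x` on entry to any iteration can be computed directly from the iteration index; a dependency without such a closed form would still need DOACROSS-style synchronization.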
-
Publication number: 20140325494
Abstract: A device including a data analysis element including a plurality of memory cells. The memory cells analyze at least a portion of a data stream and output a result of the analysis. The device also includes a detection cell. The detection cell includes an AND gate. The AND gate receives result of the analysis as a first input. The detection cell also includes a D flip-flop including an output coupled to a second input of the AND gate.
Type: Application
Filed: July 11, 2014
Publication date: October 30, 2014
Inventors: David R. Brown, Harold B Noyes
-
Patent number: 8869126
Abstract: A method and apparatus is disclosed for compilation of an original Cobol program with support for improved performance by increased parallelism during execution using multiple threads of processing. The approach includes a two stage compilation process, the first compilation/translation step by a first specialized compiler/translator that takes as input a Cobol source program that includes parallelization directives, and produces as output an intermediate computer program in a second computer programming language, the intermediate program including parallelization directives in the second computer programming language. The intermediate program is then compiled utilizing a selected second compiler that provides support for parallelism described in the second programming language. The approach optionally allows for use of pragmas serving as parallelization directives to the compiler in the original Cobol program or in the intermediate program.
Type: Grant
Filed: December 28, 2012
Date of Patent: October 21, 2014
Assignee: Bull HN Information Systems Inc.
Inventors: Cynthia S. Guenthner, Russell W. Guenthner, John Edward Heath, Albert Henry John Wigchert, F. Michel Brown, Nicholas John Colasacco, Clinton B. Eckard
-
Patent number: 8869125
Abstract: The invention relates to a system and method for demarcating information related to one or more blocks in an application source code. This invention provides a means to annotate block information in the source code. It parses the application source code to generate an abstract syntax tree and instruments the source code to capture information related to the one or more blocks generated at the time of dynamic analysis of the application. The information related to the one or more blocks are stored in Hash Map and based on this information the abstract syntax tree is modified to add the information related to the one or more blocks and inserting this information in the application source code.
Type: Grant
Filed: December 13, 2012
Date of Patent: October 21, 2014
Assignee: Infosys Limited
Inventors: Murali Krishna Emani, Sudeep Mallick, Balkrishna Prasad
-
Patent number: 8869121
Abstract: Data processing using multidimensional fields is described along with methods for advantageously using high-level language codes.
Type: Grant
Filed: July 7, 2011
Date of Patent: October 21, 2014
Assignee: Pact XPP Technologies AG
Inventors: Martin Vorbach, Frank May, Armin Nückel
-
Patent number: 8869104
Abstract: A system and method for managing several versions of a device with embedded object code by using an editor to scan the object code, find a signature, change one or more parameters within the object code, and replace the object code. The device may be shipped to a customer in a standard configuration and the object code may be changed by the customer using the editor.
Type: Grant
Filed: June 30, 2004
Date of Patent: October 21, 2014
Assignee: LSI Corporation
Inventor: Roy Wade
-
Patent number: 8863105
Abstract: An automatic control system capable of executing a control program in parallel is described. The system includes more than one unit controller, each executing in parallel at least a part of the program to be executed by the automatic control system; a compiler, connected to one of the unit controllers, for converting the program to be executed by the automatic control system into tasks executed in parallel by the unit controllers; an interconnection network, for connecting the unit controllers, such that information on one of the unit controllers is transferred to another one via the interconnection network.
Type: Grant
Filed: November 28, 2008
Date of Patent: October 14, 2014
Assignee: Siemens Aktiengesellschaft
Inventors: Ming Jie, Fei Long, Li Pan, Detlef Pauly
-
Patent number: 8863104
Abstract: Systems and methods for parallelizing applications that operate on irregular data structures. In an embodiment, the methods and systems enable programmers to use set iterators to express algorithms containing amorphous data parallelism. Parallelization can be achieved by speculatively executing multiple iterations of the iterator in parallel. Conflicts between speculatively executing iterations can be detected and handled using information in class libraries.
Type: Grant
Filed: June 10, 2009
Date of Patent: October 14, 2014
Assignee: Board of Regents, The University of Texas System
Inventors: Keshav Kumar Pingali, Milind Vidyadhar Kulkarni
-
Patent number: 8850409
Abstract: A method is provided for translating sets of constraint declarations to imperative code sequences based on defining an instantiatable object per set, inserting calls to a notification callback mechanism on state modification and defining calls in the constraint context as imperative code sequences that, in response to these callbacks, take actions to maintain these constraints.
Type: Grant
Filed: May 21, 2008
Date of Patent: September 30, 2014
Assignee: OptumSoft, Inc.
Inventor: David R. Cheriton
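
To make the abstract above concrete, here is a hand-written sketch of what imperative code generated for one declared constraint, `total == a + b`, might look like. It is an assumption-laden illustration rather than the patented translator's output: the classes `Observable` and `SumConstraint` are hypothetical names introduced here.

```python
class Observable:
    """A piece of state whose setter calls a notification callback mechanism."""
    def __init__(self, value):
        self._value = value
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value
        for callback in self._listeners:  # notify on every state modification
            callback()

class SumConstraint:
    """Instantiable object standing for the declared constraint total == a + b."""
    def __init__(self, a, b, total):
        self.a, self.b, self.total = a, b, total
        a.subscribe(self._maintain)
        b.subscribe(self._maintain)
        self._maintain()  # establish the constraint at construction time

    def _maintain(self):
        # Imperative action taken in response to a callback.
        self.total.value = self.a.value + self.b.value

a, b, total = Observable(1), Observable(2), Observable(0)
constraint = SumConstraint(a, b, total)
a.value = 10
print(total.value)
```

The constraint is one-directional in this sketch: writing to `total` directly would not update `a` or `b`.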
-
Patent number: 8850410
Abstract: A system and method for improving software maintainability, performance, and/or security by associating a unique marker to each software code-block; the system comprising of a plurality of processors, a plurality of code-blocks, and a marker associated with each code-block. The system may also include a special hardware register (code-block marker hardware register) in each processor for identifying the markers of the code-blocks executed by the processor, without changing any of the plurality of code-blocks.
Type: Grant
Filed: January 29, 2010
Date of Patent: September 30, 2014
Assignee: International Business Machines Corporation
Inventors: Ramanjaneya S. Burugula, Joefon Jann, Pratap C. Pattnaik
-
Patent number: 8843910
Abstract: A facility for identifying functionally distinct memory access reorderings for a multithreaded program is described. The facility monitors execution of the program to detect, for each of one or more memory locations, an order in which the memory location was accessed by the threads of the program, each access being at least one of a read access and a write access. Among a number of possible memory access reorderings of a read access by a reading thread to a location and a write access by a writing thread to the same location where the write access preceded the read access, the facility identifies as functionally distinct memory access reorderings those possible memory access reorderings where the reading thread could have become newly aware of changed state of the writing thread as a result of the indicated read access.
Type: Grant
Filed: March 14, 2011
Date of Patent: September 23, 2014
Assignee: F5 Networks, Inc.
Inventors: Andrew M. Schwerin, Peter J. Godman, Kaya Bekiroglu
-
Patent number: 8843909
Abstract: A method for transforming a procedural program having procedural language code into an object-oriented distributed software program is provided. A procedural program is transformed into intermediate client-server code. The intermediate client-server code is partitioned into an N-tier application program.
Type: Grant
Filed: July 8, 2004
Date of Patent: September 23, 2014
Assignee: CA, Inc.
Inventors: David L. Tondreau, Jr., John P. Mahony
-
Patent number: 8838626
Abstract: Embodiments of techniques and systems for parallel XML parsing are described. An event-level XML parser may include a lightweight events partitioning stage, parallel events parsing stages, and a post-processing stage. The events partition may pick out event boundaries using single-instruction, multiple-data instructions to find occurrences of the “<” character, marking event boundaries. Subsequent checking may be performed to help identify other event boundaries, as well as non-boundary instances of the “<” character. During events parsing, unresolved items, such as namespace resolution or matching of start and end elements, may be recorded in structure metadata. This structure metadata may be used during the subsequent post-processing to perform a check of the XML data. If the XML data is well-formed, individual sub-event streams formed by the events parsing processes may be assembled into a flat result event stream structure. Other embodiments may be described and claimed.
Type: Grant
Filed: December 17, 2009
Date of Patent: September 16, 2014
Assignee: Intel Corporation
Inventors: Zhiqiang Yu, Yuejian Fang, Lei Zhai, Yun Wang, Zhonghai Wu, Mo Dai
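
The partitioning stage in the abstract above first collects every “<” and then discards the ones that are not event boundaries. The sketch below illustrates that two-step idea in plain Python; it does not use SIMD instructions, it only treats comments and CDATA sections as sources of non-boundary “<” characters, and the function names are hypothetical.

```python
def candidate_boundaries(xml):
    """First pass: every '<' is a candidate event boundary.
    (A plain scan stands in for the SIMD search the abstract mentions.)"""
    return [i for i, ch in enumerate(xml) if ch == "<"]

def event_boundaries(xml):
    """Second pass: keep only '<' characters that start an event, skipping
    over the bodies of comments and CDATA sections."""
    boundaries, i = [], 0
    while i < len(xml):
        if xml.startswith("<!--", i):
            boundaries.append(i)
            end = xml.find("-->", i + 4)
            i = len(xml) if end == -1 else end + 3
        elif xml.startswith("<![CDATA[", i):
            boundaries.append(i)
            end = xml.find("]]>", i + 9)
            i = len(xml) if end == -1 else end + 3
        elif xml[i] == "<":
            boundaries.append(i)
            i += 1
        else:
            i += 1
    return boundaries

doc = "<a><![CDATA[1<2]]></a>"
print(candidate_boundaries(doc))
print(event_boundaries(doc))
```

Once the boundaries are known, the text between consecutive boundaries can be handed to independent parsing workers, with cross-event checks such as start and end tag matching deferred to a later pass.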
-
Patent number: 8839213
Abstract: A compiler is provided that determines when the use of software transactional memory (STM) primitives may be optimized with respect to a set of collectively dominating STM primitives. The compiler analysis coordinates the use of variables containing possible shadow copy pointers to allow the analysis to be performed for both direct write and buffered write STM systems. The coordination of the variables containing the possible shadow copy pointers ensures that the results of STM primitives are properly reused. The compiler analysis identifies memory accesses where STM primitives may be eliminated, combined, or substituted for lower overhead STM primitives.
Type: Grant
Filed: June 27, 2008
Date of Patent: September 16, 2014
Assignee: Microsoft Corporation
Inventors: David L. Detlefs, Michael M. Magruder, Yosseff Levanoni, Vinod K. Grover
-
Patent number: 8839212
Abstract: A code generator and multi-core framework are executable in a computer system to implement methods as disclosed herein, including a method for the code generator to automatically generate multi-threaded source code from functional specifications, and for the multi-core framework, which is a run time component, to generate multi-threaded task object code from the multi-threaded source code and to execute the multi-threaded task object code on respective processor cores. The methods provide transparency to the programmer, and during execution, provide automatic identification of processing parallelisms. The methods implement Consume-Simplify-Produce and Normalize-Transpose-Distribute operations to reduce complex expression sets in a functional specification to simplified expression sets operable in parallel processing environments through the generated multi-threaded task object code.
Type: Grant
Filed: September 30, 2013
Date of Patent: September 16, 2014
Assignee: Texas Tech University System
Inventors: Daniel E. Cooke, J. Nelson Rushton, Brad Nemanich
-
Patent number: 8839214
Abstract: A high level programming language provides an extensible set of transformations for use on indexable types in a data parallel processing environment. A compiler for the language implements each transformation as a map from indexable types to allow each transformation to be applied to other transformations. At compile time, the compiler identifies sequences of the transformations on each indexable type in data parallel source code and generates data parallel executable code to implement the sequences as a combined operation at runtime using the transformation maps. The compiler also incorporates optimizations that are based on the sequences of transformations into the data parallel executable code.
Type: Grant
Filed: June 30, 2010
Date of Patent: September 16, 2014
Assignee: Microsoft Corporation
Inventors: Paul F. Ringseth, Weirong Zhu, Rick Molloy, Charles D. Callahan, II, Yosseff Levanoni, Lingli Zhang
-
Publication number: 20140258995
Abstract: A compiler and language using the comma as a parallelism operator may ensure that variables on the left hand side of a line of code are only used once, and that the variables on the left hand side of the line of code are not being used as function arguments. Commas may be replaced with semi-colons.
Type: Application
Filed: March 5, 2014
Publication date: September 11, 2014
Inventor: Steven Mark CASSELMAN
-
Patent number: 8826234
Abstract: A relational model may be used to encode primitives for each of a plurality of threads in a multi-core processor. The primitives may include tasks and parameters, such as buffers. The relationships may be linked to particular tasks. The tasks with the coding, which indicates the relationships, may then be used upon user selection to display a visualization of the functional relationships between tasks.
Type: Grant
Filed: December 23, 2009
Date of Patent: September 2, 2014
Assignee: Intel Corporation
Inventors: Christopher J. Cormack, Nathaniel Duca, Jason Plumb
-
Patent number: 8826258
Abstract: A method of generating a computer program, the method comprising: independently compiling a plurality of source code modules to generate a plurality of respective object modules comprising a plurality of respective threads explicitly designated by a user to be executed in parallel; in each of the object modules, inserting at least one symbol indicative of a property of the object module's thread potentially conflicting with a corresponding property of a thread of another of said object module as a result of parallel execution of those threads; executing a linker to perform a linking process on said object modules, the linking process comprising: assessing the symbols in conjunction with one another to determine whether a conflict exists between the threads of two or more of the respective object modules; and linking the object modules to generate a computer program in which said threads are executable in parallel, wherein the linking is performed in dependence on said assessment.
Type: Grant
Filed: May 11, 2009
Date of Patent: September 2, 2014
Assignee: Xmos Limited
Inventors: Martin Young, Richard Osborne, Douglas Watt
-
Patent number: 8826053
Abstract: An engine processor program, stored in a non-volatile storage region 37 of a storage section 35 connected to a host processor 31 of a host section 30, for execution in an engine processor 41 of an engine section 40, is transmitted from the host section 30 to the engine section 40. The engine processor program received by the engine section 40 is stored in a volatile storage section 42 connected to an engine processor 51. Then, the host section 30 notifies an execution instruction for a specified program, among the engine processor programs stored in the storage section 42, to the engine section 40 and causes execution on the engine processor 41. As a result, even in a structure provided, the engine section 40 does not need a large capacity non-volatile storage region, thereby configuring a compact mobile communication terminal.
Type: Grant
Filed: December 4, 2006
Date of Patent: September 2, 2014
Assignee: Vodafone Group PLC
Inventors: Masahiko Kuwabara, Kazuo Aoki, Toshiro Matsumura
-
Patent number: 8813044
Abstract: A method, system, and article of manufacture are disclosed for transforming a definition of a process for delivering a service. This service process definition is comprised of computer readable code. The method comprises the steps of expressing a given set of assumptions in a computer readable code; and transforming said process definition by using a processing unit to apply said assumptions to said process definition to change the configuration of the process definition. The process definition may be transformed by using factors relating to the specific context in or for which the process definition is executed. Also, the process definition may be transformed by identifying, in a flow diagram for the service process definition, flows to which the assumptions apply, and applying program rewriting techniques to those identified flows.
Type: Grant
Filed: September 6, 2012
Date of Patent: August 19, 2014
Assignee: International Business Machines Corporation
Inventors: David F. Bantz, Steven J. Mastrianni, James R. Moulic, Dennis G. Shea
-
Patent number: 8813052
Abstract: Various technologies and techniques are disclosed for providing a bounded transactional memory application that accesses cache metadata in a cache of a central processing unit. When performing a transactional read from the bounded transactional memory application, a cache line metadata transaction-read bit is set. When performing a transactional write from the bounded transactional memory application, a cache line metadata transaction-write bit is set and a conditional store is performed. At commit time, if any lines marked with the transaction-read bit or the transaction-write bit were evicted or invalidated, all speculatively written lines are discarded. The application can also interrogate a cache line metadata eviction summary to determine whether a transaction is doomed and then take an appropriate action.
Type: Grant
Filed: June 8, 2007
Date of Patent: August 19, 2014
Assignee: Microsoft Corporation
Inventors: Jan Gray, Timothy L. Harris, James Larus, Burton Smith
-
Patent number: 8813053
Abstract: Systems and methods for parallel incomplete LU (ILU) factorization in distributed sparse linear systems, which order nodes underlying the equations in the system(s) by dividing nodes into interior nodes and boundary nodes and assigning no more than three codes to distinguish the boundary nodes. Each code determines an ordering of the nodes, which in turn determines the order in which the equations will be factored and the solution performed.
Type: Grant
Filed: September 25, 2012
Date of Patent: August 19, 2014
Assignee: Landmark Graphics Corporation
Inventors: Qinghua Wang, James William Watts, III
-
Publication number: 20140229926
Abstract: Apparatus, systems, and methods for a compiler are disclosed. One such compiler parses a human readable expression into a syntax tree and converts the syntax tree into an automaton having in-transitions and out-transitions. Converting can include unrolling the quantification as a function of in-degree limitations wherein in-degree limitations includes a limit on the number of transitions into a state of the automaton. The compiler can also convert the automaton into an image for programming a parallel machine, and publishes the image. Additional apparatus, systems, and methods are disclosed.
Type: Application
Filed: April 14, 2014
Publication date: August 14, 2014
Applicant: Micron Technology, Inc.
Inventors: Junjuan Xu, Paul Glendenning
-
Patent number: 8806466
Abstract: A program generation apparatus references a source program including a loop for executing a block N times (N≥2) and having such dependence that a variable defined in a statement in the block pertaining to ith execution (1≤i<N) is referenced by a statement in the block pertaining to jth execution (i<j≤N), calculates equivalent representations of variables in the block pertaining to the ith execution and the block pertaining to any other execution than the ith execution, specifies, with respect to each representation of a target variable causing the dependence, a representation of a variable not causing the dependence that is equivalent to the representation of the target variable, and generates a program being for executing the block M times (M≤N) and including a statement including the specified representation in place of each representation of the target variable.
Type: Grant
Filed: July 4, 2011
Date of Patent: August 12, 2014
Assignee: Panasonic Corporation
Inventors: Akira Tanaka, Hiroyuki Morishita, Akihiko Inoue
-
Patent number: 8806458
Abstract: Intermediate representation (IR) code is received as compiled from a shader in the form of shader language source code. The input IR code is first analyzed during an analysis pass, during which operations, scopes, parts of scopes, and if-statement scopes are annotated for predication, mask usage, and branch protection and predication. This analysis outputs vectorization information that is then used by various sets of vectorization transformation rules to vectorize the input IR code, thus producing vectorized output IR code.
Type: Grant
Filed: February 16, 2012
Date of Patent: August 12, 2014
Assignee: Microsoft Corporation
Inventors: Andy Glaister, Blaise Pascal Tine, Blake Pelton, Derek Sessions, Mikhail Lyapunov, Yuri Dotsenko
-
Publication number: 20140223419
Abstract: According to one embodiment, a compiler applicable to a parallel computer including processors, wherein a source program is input to the compiler and a local code for each of the processors are generated, the compiler includes a generation module and an object code generation module. The generation module is configured to analyze the input source program, extract a data transfer point from a procedure described in the source program, and generate a call processing for data copy. The object code generation module is configured to generate an object code including the call processing.
Type: Application
Filed: August 30, 2013
Publication date: August 7, 2014
Applicant: Kabushiki Kaisha Toshiba
Inventor: Ryuji SAKAI
-
Patent number: 8799880
Abstract: A method of identifying and extracting functional parallelism from a PLC program has been developed that results in the ability of the extracted program fragments to be executed in parallel across a plurality of separate resources, and a compiler configured to perform the functional parallelism (i.e., identification and extraction processes) and perform the scheduling of the separate fragments within a given set of resources. The inventive functional parallelism creates a larger number of separable elements than was possible with prior dataflow analysis methodologies.
Type: Grant
Filed: March 15, 2012
Date of Patent: August 5, 2014
Assignee: Siemens Aktiengesellschaft
Inventors: Arquimedes Martinez Canedo, Mohammad Abdullah Al Faruque, Mitchell Packer, Richard Freitag
-
Patent number: 8799881
Abstract: According to one embodiment, a parallelizing unit divides a loop into first and second processes based on a program to be converted and division information. The first and second processes respectively have termination control information, loop control information, and change information. The parallelizing unit inserts into the first process a determination process determining whether the second process is terminated at execution of an (n-1)th iteration of the second process when the second process is subsequent to the first process or determining whether the second process is terminated at execution of an nth iteration of the second process when the second process precedes the first process. The parallelizing unit inserts into the second process a control process controlling execution of the second process based on the result of determination notified by the determination process.
Type: Grant
Filed: July 12, 2011
Date of Patent: August 5, 2014
Assignee: Kabushiki Kaisha Toshiba
Inventors: Nobuaki Tojo, Hidenori Matsuzaki
-
Patent number: 8789031
Abstract: In one embodiment, the present invention includes a software-controlled method of forming instruction strands. The software may include instructions to obtain code of a superblock including a plurality of basic blocks, build a dependency directed acyclic graph (DAG) for the code, sort nodes coupled by edges of the dependency DAG into a topological order, form strands from the nodes based on hardware constraints, rule constraints, and scheduling constraints, and generate executable code for the strands and store the executable code in a storage. Other embodiments are described and claimed.
Type: Grant
Filed: September 18, 2007
Date of Patent: July 22, 2014
Assignee: Intel Corporation
Inventors: Wei Liu, Lixin Su, Youfeng Wu, Herbert Hum
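Illustrative sketch (not taken from the patent): the abstract above describes sorting a dependency DAG into topological order and then grouping nodes into strands under constraints. The Python below shows one possible reading of that idea. The function name form_strands, the greedy chaining rule, and the single max_len limit standing in for the "hardware constraints" are all assumptions made for illustration; the patent's actual rule and scheduling constraints are not modeled.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def form_strands(deps, max_len):
    """Group instructions into dependency chains ("strands").

    deps maps each instruction name to the set of instructions it depends on.
    max_len is a hypothetical stand-in for a hardware limit on strand length.
    """
    # Producers always appear before their consumers in this order.
    order = list(TopologicalSorter(deps).static_order())
    strands = []
    tail_to_strand = {}  # last instruction of a strand -> that strand

    for node in order:
        target = None
        # Greedy rule (an assumption): extend a strand that currently ends
        # in one of this node's producers, if it still has room.
        for pred in sorted(deps.get(node, ())):
            strand = tail_to_strand.get(pred)
            if strand is not None and len(strand) < max_len:
                target = strand
                break
        if target is None:
            target = []
            strands.append(target)
        else:
            del tail_to_strand[target[-1]]
        target.append(node)
        tail_to_strand[node] = target
    return strands


if __name__ == "__main__":
    # c depends on b, which depends on a.
    print(form_strands({"b": {"a"}, "c": {"b"}}, max_len=2))
```

With a limit of 2, the three-instruction chain is split after the second instruction; with a limit of 3 or more it stays in one strand.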
-
Patent number: 8789063
Abstract: Systems and methods establish communication and control between various heterogeneous processors in a computing system so that an operating system can run an application across multiple heterogeneous processors. With a single set of development tools, software developers can create applications that will flexibly run on one CPU or on combinations of central, auxiliary, and peripheral processors. In a computing system, application-only processors can be assigned a lean subordinate kernel to manage local resources. An application binary interface (ABI) shim is loaded with application binary images to direct kernel ABI calls to a local subordinate kernel or to the main OS kernel depending on which kernel manifestation is controlling requested resources.
Type: Grant
Filed: March 30, 2007
Date of Patent: July 22, 2014
Assignee: Microsoft Corporation
Inventors: Orion Hodson, Haryadi Gunawi, Galen C. Hunt
-
Patent number: 8782624
Abstract: A device including a data analysis element including a plurality of memory cells. The memory cells analyze at least a portion of a data stream and output a result of the analysis. The device also includes a detection cell. The detection cell includes an AND gate. The AND gate receives the result of the analysis as a first input. The detection cell also includes a D flip-flop including an output coupled to a second input of the AND gate.
Type: Grant
Filed: December 15, 2011
Date of Patent: July 15, 2014
Assignee: Micron Technology, Inc.
Inventors: David R. Brown, Harold B Noyes
-
Patent number: 8782625
Abstract: Concepts and technologies are described herein for determining memory safety of floating-point computations. The concepts and technologies described herein analyze code to determine if any floating-point computations exist in the code, and if so, if the floating-point computations are memory safe. The analysis can include identifying floating-point instructions and conditional statements in the code. The code can be symbolically executed, and behavior of the floating-point instructions and the conditional statements can be monitored to determine if a floating point calculation is ever involved in computation of any memory address during the execution of the code.
Type: Grant
Filed: June 17, 2010
Date of Patent: July 15, 2014
Assignee: Microsoft Corporation
Inventors: Patrice Godefroid, Johannes Kinder
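Illustrative sketch (not taken from the patent): the core question in the abstract above is whether a floating-point result ever feeds a memory address. The Python below approximates that question with a plain taint propagation over a straight-line instruction list. This is a deliberate simplification and not the symbolic execution the abstract describes: there are no branches, no path conditions, and memory contents are not modeled. The tuple-based instruction format, the opcode names, and the function name float_tainted_addresses are all hypothetical.

```python
# Opcodes that produce a floating-point value (hypothetical toy instruction set).
FLOAT_OPS = {"fconst", "fadd", "fmul"}


def float_tainted_addresses(program):
    """Return indices of loads/stores whose address derives from a float.

    program is a list of (op, dest, srcs) tuples. For "load" and "store",
    srcs[0] is the register holding the address.
    """
    tainted = set()   # registers whose value depends on a float computation
    findings = []
    for index, (op, dest, srcs) in enumerate(program):
        if op in ("load", "store") and srcs[0] in tainted:
            findings.append(index)
        if dest is not None:
            if op in FLOAT_OPS or any(s in tainted for s in srcs):
                tainted.add(dest)
            else:
                # Register overwritten with a float-free value.
                tainted.discard(dest)
    return findings


if __name__ == "__main__":
    program = [
        ("iconst", "r0", []),
        ("fconst", "f0", []),
        ("ftoi", "r1", ["f0"]),        # float converted to integer
        ("iadd", "r2", ["r0", "r1"]),  # address now depends on a float
        ("load", "r3", ["r2"]),        # flagged
        ("load", "r4", ["r0"]),        # not flagged
    ]
    print(float_tainted_addresses(program))
```

A program for which the returned list is empty would count as memory safe under this toy model only; a real analysis would also have to follow conditional statements, as the abstract notes.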
-
Publication number: 20140196017
Abstract: A method of enabling compiler-assisted parallelization of one or more stream processing operators in a stream processing application, which consists of a data flow graph with operators as vertices connected by streams. The method includes specifying a parallelized version of one or more of the operators, with a parameterized degree of parallelism, in the stream application; evaluating whether or not to use the parallelized operator; and, if there is a need for a parallelized operator, deciding its degree of parallelism.
Type: Application
Filed: June 18, 2012
Publication date: July 10, 2014
Applicant: International Business Machines Corporation
Inventors: Nagui Halim, Vibhore Kumar, Kung-Lung Wu, Sai Wu
-
Patent number: 8775510
Abstract: The invention provides, in one aspect, an improved system for data access comprising a file server that is coupled to a client device or application executing thereon via one or more networks. The server comprises static storage that is organized in one or more directories, each containing zero, one, or more files. The server also comprises a file system operable, in cooperation with a file system on the client device, to provide authorized applications executing on the client device access to those directories and/or files. Fast file server (FFS) software or other functionality executing on or in connection with the server responds to requests received from the client by transferring requested data to the client device over multiple network pathways. That data can comprise, for example, directory trees, files (or portions thereof), and so forth.
Type: Grant
Filed: January 31, 2013
Date of Patent: July 8, 2014
Assignee: PME IP Australia Pty Ltd
Inventors: Malte Westerhoff, Detlev Stalling
-
Patent number: 8776030
Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on a multi-core graphics processing unit (GPU) so that they can be executed by a general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
Type: Grant
Filed: March 31, 2009
Date of Patent: July 8, 2014
Assignee: NVIDIA Corporation
Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy
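Illustrative sketch (not taken from the patent): the abstract above mentions partitioning a GPU program into synchronization-free regions, replicating per-thread values that are shared between regions, and inserting thread loops. The Python below hand-translates a small hypothetical kernel in that spirit. The kernel itself, the function name block_on_cpu, and the variable names are assumptions for illustration; the classification of instructions as convergent or divergent is not modeled.

```python
def block_on_cpu(inp):
    """Run one thread block of a hypothetical GPU kernel serially on a CPU.

    Per-thread GPU pseudocode being translated:
        doubled = inp[tid] * 2
        shared[tid] = doubled
        barrier()
        out[tid] = doubled + shared[(tid + 1) % n]
    """
    n = len(inp)       # number of logical GPU threads in the block
    shared = [0] * n   # models the block's shared memory
    # "doubled" is a per-thread local that is live across the barrier,
    # so it is replicated into one slot per logical thread.
    doubled = [0] * n
    out = [0] * n

    # Thread loop for region 1: everything before the barrier.
    for tid in range(n):
        doubled[tid] = inp[tid] * 2
        shared[tid] = doubled[tid]

    # The barrier becomes the boundary between the two loops: every logical
    # thread has finished region 1 before any thread starts region 2.

    # Thread loop for region 2: everything after the barrier.
    for tid in range(n):
        out[tid] = doubled[tid] + shared[(tid + 1) % n]
    return out


if __name__ == "__main__":
    print(block_on_cpu([1, 2, 3]))  # prints [6, 10, 8]
```

Splitting the work into two loops is what preserves the barrier semantics: a single fused loop would let thread 0 read shared[1] before thread 1 had written it.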
-
Patent number: 8769485
Abstract: A stream processing platform that provides fast execution of stream processing applications within a safe runtime environment. The platform includes a stream compiler that converts a representation of a stream processing application into executable program modules for a safe environment. The platform allows users to specify aspects of the program that contribute to generation of modules that execute as intended. A user may specify aspects to control a type of implementation for loops, order of execution for parallel paths, whether multiple instances of an operation can be performed in parallel or whether certain operations should be executed in separate threads. In addition, the stream compiler may generate executable modules in a way that causes a safe runtime environment to allocate memory or otherwise operate efficiently.
Type: Grant
Filed: December 22, 2006
Date of Patent: July 1, 2014
Assignee: TIBCO Software, Inc.
Inventors: Jonathan Salz, Richard S. Tibbetts
-
Patent number: 8769510
Abstract: A device receives program code, and receives size/type information associated with inputs to the program code. The device determines, prior to execution of the program code and based on the input size/type information, a portion of the program code that is executable by a graphical processing unit (GPU), and determines, prior to execution of the program code and based on the input size/type information, a portion of the program code that is executable by a central processing unit (CPU). The device compiles the GPU-executable portion of the program code to create a compiled GPU-executable portion of the program code, and compiles the CPU-executable portion of the program code to create a compiled CPU-executable portion of the program code. The device provides, to the GPU for execution, the compiled GPU-executable portion of the program code, and provides, to the CPU for execution, the compiled CPU-executable portion of the program code.
Type: Grant
Filed: September 30, 2010
Date of Patent: July 1, 2014
Assignee: The MathWorks, Inc.
Inventors: Jocelyn Luke Martin, Joseph F. Hicklin
-
Patent number: 8768678
Abstract: One or more embodiments provide a load balancing solution for improving the runtime performance of parallel HDL simulators. During compilation each process is analyzed to determine a simulation cost based on complexity of the HDL processes. During simulation, processes to be executed in the same simulation cycle are scheduled using the simulation costs computed at compile-time in order to reduce the delay incurred during simulation.
Type: Grant
Filed: September 26, 2011
Date of Patent: July 1, 2014
Assignee: Xilinx, Inc.
Inventors: Valeria Mihalache, Christopher H. Kingsley, Jimmy Z. Wang, Kumar Deepak
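Illustrative sketch (not taken from the patent): the abstract above says processes in the same simulation cycle are scheduled using costs computed at compile time, but does not name a scheduling algorithm. The Python below swaps in a common heuristic, longest-processing-time-first greedy assignment, purely as an example of cost-driven load balancing. The function name balance, the process names, and the cost values are hypothetical.

```python
import heapq


def balance(costs, workers):
    """Assign processes to workers using precomputed costs.

    costs maps a process name to its estimated simulation cost.
    Returns one list of process names per worker.
    """
    # Min-heap of (current load, worker index); all workers start idle.
    heap = [(0, w) for w in range(workers)]
    assignment = [[] for _ in range(workers)]

    # Most expensive processes first; ties broken by name for determinism.
    for name, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, w = heapq.heappop(heap)   # least-loaded worker so far
        assignment[w].append(name)
        heapq.heappush(heap, (load + cost, w))
    return assignment


if __name__ == "__main__":
    cycle_costs = {"p1": 5, "p2": 3, "p3": 3, "p4": 1}
    print(balance(cycle_costs, workers=2))  # prints [['p1', 'p4'], ['p2', 'p3']]
```

In this example both workers end up with a total cost of 6, so neither waits long for the other at the end of the cycle.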
-
Patent number: 8769507
Abstract: A method, system, and article of manufacture are disclosed for transforming a definition of a process for delivering a service on a specified computing device. This service process definition is comprised of computer readable code. The method comprises the steps of expressing a given set of assumptions in a computer readable code; and transforming the definition by using a processing unit to apply the assumptions to the definition of the process to change the way in which the process operates. The definition of the process may be transformed by using factors relating to the specific context in or for which the definition is executed. Also, the definition may be transformed by identifying, in a flow diagram for the process, flows to which the assumptions apply, and applying program rewriting techniques to those identified flows.
Type: Grant
Filed: May 14, 2009
Date of Patent: July 1, 2014
Assignee: International Business Machines Corporation
Inventors: David F. Bantz, Steven J. Mastrianni, James R. Moulic, Dennis G. Shea
-
Patent number: 8756590
Abstract: A compile environment is provided in a computer system that allows programmers to program both CPUs and data parallel devices (e.g., GPUs) using a high level general purpose programming language that has data parallel (DP) extensions. A compilation process translates modular DP code written in the general purpose language into DP device source code in a high level DP device programming language using a set of binding descriptors for the DP device source code. A binder generates a single, self-contained DP device source code unit from the set of binding descriptors. A DP device compiler generates a DP device executable for execution on one or more data parallel devices from the DP device source code unit.
Type: Grant
Filed: June 22, 2010
Date of Patent: June 17, 2014
Assignee: Microsoft Corporation
Inventors: Weirong Zhu, Lingli Zhang, Sukhdeep S. Sodhi, Yosseff Levanoni