For A Parallel Or Multiprocessor System Patents (Class 717/149)
  • Patent number: 10853079
    Abstract: A method and computer program product for performing a plurality of processing operations. A plurality of processor nodes each include one or more operational instances. Each processor node includes criteria for generating its operational instances. The processor nodes are linked together in a directed acyclic processing graph in which dependent nodes use data from the operational instances of upstream nodes to perform a node-specific set of processing operations. Dependency relationships between the processor nodes are defined on an operational instance basis, where operational instances in dependent processor nodes identify data associated with, or generated by, specific upstream operational instances that is used to perform the node-specific set of operations for that dependent operational instance. The processing graph may also include connector nodes defining instance-level dependency relationships between processor nodes.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: December 1, 2020
    Assignee: Side Effects Software Inc.
    Inventors: Ken Xu, Taylor James Petrick
  • Patent number: 10853131
    Abstract: System and methods for implementing dataflow life cycles are described and include forming, by a first server computing system, a dataflow life cycle by associating a dataflow with a customized code; associating, by the first server computing system, the customized code of the dataflow life cycle with context information, the customized code including one or more of pre-processing customized code and post-processing customized code; scheduling, by the first server computing system, the dataflow of the dataflow life cycle to be executed by a second server computing system when the customized code includes the pre-processing customized code and when the pre-processing customized code is successfully executed by the first server computing system; and executing, by the first server computing system, the post-processing customized code when the customized code includes the post-processing customized code and when the dataflow of the dataflow life cycle is successfully executed by the second server computing system.
    Type: Grant
    Filed: November 20, 2017
    Date of Patent: December 1, 2020
    Assignee: salesforce.com, inc.
    Inventors: Ruisheng Shi, Farid Nabavi, Alex Gitelman
  • Patent number: 10831691
    Abstract: The present disclosure relates to a method for implementing processing elements in a chip card such that the processing elements can communicate data between each other in order to perform a computation task, wherein the data communication requires each processing element to have a respective number of connections to other processing elements. The method comprises: providing a complete graph with an even number of nodes that is higher than the maximum of the numbers of connections by one or two. If the number of processing elements is higher than the number of nodes of the graph, the graph may be duplicated and the duplicated graphs may be combined into a combined graph. A methodology for placing and connecting the processing elements may be determined in accordance with the structure of nodes of a resulting graph, the resulting graph being the complete graph or the combined graph.
    Type: Grant
    Filed: May 24, 2019
    Date of Patent: November 10, 2020
    Assignee: International Business Machines Corporation
    Inventors: Martino Dazzi, Pier Andrea Francese, Abu Sebastian, Riduan Khaddam-Aljameh, Evangelos Stavros Eleftheriou
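    Illustrative sketch (not part of the patent record): the node-count rule in the abstract above can be read as "take the smallest even number strictly greater than the largest connection count". The function names below are hypothetical, and the duplication and combination of graphs is not modeled.

```python
from itertools import combinations


def complete_graph_size(connection_counts):
    """Return the smallest even node count above the largest connection count.

    The result exceeds max(connection_counts) by one when that maximum is odd
    and by two when it is even.
    """
    candidate = max(connection_counts) + 1
    return candidate if candidate % 2 == 0 else candidate + 1


def complete_graph_edges(node_count):
    """List every undirected edge of the complete graph on node_count nodes."""
    return list(combinations(range(node_count), 2))


# Processing elements needing 3, 2 and 3 connections fit a 4-node complete graph.
size = complete_graph_size([3, 2, 3])
edges = complete_graph_edges(size)
```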
  • Patent number: 10785089
    Abstract: A set of service-level reliability metrics and a method to allocate these metrics to the layers of the service delivery platform. These initial targets can be tuned during service design and delivery, and feed the vendor requirements process, forming the basis for measuring, tracking, and responding based on the service-level reliability metrics.
    Type: Grant
    Filed: May 7, 2018
    Date of Patent: September 22, 2020
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Paul Reeser
  • Patent number: 10741226
    Abstract: A multi-processor computer architecture incorporating distributed multi-ported common memory modules wherein each of the memory modules comprises a control block functioning as a cross-bar router in conjunction with one or more associated memory banks or other data storage devices. Each memory module has multiple I/O ports and the ability to relay requests to other memory modules if the desired memory location is not found on the first module. A computer system in accordance with the invention may comprise memory module cards along with processor cards interconnected using a baseboard or backplane having a toroidal interconnect architecture between the cards.
    Type: Grant
    Filed: May 28, 2013
    Date of Patent: August 11, 2020
    Assignee: FG SRC LLC
    Inventors: Jon M. Huppenthal, Timothy J. Tewalt, Lee A. Burton, David E. Caliga
  • Patent number: 10691432
    Abstract: A method for generating a program to run on multiple tiles. The method comprises: receiving an input graph comprising data nodes, compute vertices and edges; receiving an initial tile-mapping specifying which data nodes and vertices are allocated to which tile; and determining a subgraph of the input graph that meets one or more heuristic rules. The rules comprise: the subgraph comprises at least one data node, the subgraph spans no more than a threshold number of tiles in the initial tile-mapping, and the subgraph comprises at least a minimum number of edges outputting to one or more vertices on one or more other tiles. The method further comprises adapting the initial mapping to migrate the data nodes and any vertices of the determined subgraph to said one or more other tiles, and compiling the executable program from the graph with the vertices and data nodes allocated by the adapted mapping.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: June 23, 2020
    Assignee: Graphcore Limited
    Inventors: Mark Lloyd Pupilli, David Lacey
  • Patent number: 10678616
    Abstract: Implementations of the present disclosure are directed to a method, a system, and an article for binding computer languages. An example computer-implemented method includes: operating an application on at least one computer in a first computer language; operating a platform for the application on the at least one computer in a second computer language; binding the first computer language with the second computer language; and communicating between the application and the platform using the binding of the first computer language and the second computer language.
    Type: Grant
    Filed: November 9, 2018
    Date of Patent: June 9, 2020
    Assignee: MZ IP Holdings, LLC
    Inventors: John O'Connor, Nathan Spencer, Garth Gillespie, Yan Zhang
  • Patent number: 10642586
    Abstract: An optimizing compiler includes a vector optimization mechanism that optimizes vector operations that are reformatting-resistant, such as source instructions that do not have a corresponding reformatting operation, sink instructions that do not have a corresponding reformatting operation, a source instruction that is a scalar value, a sink instruction that may produce a scalar value, and an internal operation that depends on lanes being in a specified order. The ability to optimize vector instructions that are reformatting-resistant reduces the number of operations to improve the run-time performance of the code.
    Type: Grant
    Filed: December 8, 2018
    Date of Patent: May 5, 2020
    Assignee: International Business Machines Corporation
    Inventors: Michael Karl Gschwind, William J. Schmidt
  • Patent number: 10599482
    Abstract: A programming model generates a graph for a program, the graph including a plurality of nodes and edges, wherein each node of the graph represents an operation and edges between the nodes represent streams of data input to and output from the operations represented by the nodes. The model determines where in a distributed architecture to execute the operations represented by the nodes. Such determining may include determining which nodes have location restrictions, assigning locations to each node having a location restriction based on the restriction, and partitioning the graph into a plurality of subgraphs, the partitioning including assigning locations to nodes without location restrictions in accordance with a first set of constraints, wherein each node within a particular subgraph is assigned to the same location. Each of the subgraphs is executed at its assigned location in a respective single thread.
    Type: Grant
    Filed: August 24, 2017
    Date of Patent: March 24, 2020
    Assignee: Google LLC
    Inventors: Gautham Thambidorai, Matthew Rosencrantz, Sanjay Ghemawat, Srdjan Petrovic, Ivan Posva
  • Patent number: 10592233
    Abstract: Techniques for specifying and implementing a software application targeted for execution on a multiprocessor array (MPA). The MPA may include a plurality of processing elements, supporting memory, and a high bandwidth interconnection network (IN), communicatively coupling the plurality of processing elements and supporting memory. In some embodiments, software code may include first program instructions executable to perform a function. In some embodiments, the software code may also include one or more language constructs that are configurable to specify one or more parameter inputs. In some embodiments, the one or more parameter inputs are configurable to specify a set of hardware resources usable to execute the software code. In some embodiments, the hardware resources include multiple processors and may include multiple supporting memories.
    Type: Grant
    Filed: January 16, 2018
    Date of Patent: March 17, 2020
    Assignee: COHERENT LOGIX, INCORPORATED
    Inventors: Stephen E. Lim, Viet N. Ngo, Jeffrey M. Nicholson, John Mark Beardslee, Teng-I Wang, Zhong Qing Shang, Michael Lyle Purnell
  • Patent number: 10572515
    Abstract: The invention relates to electronic indexing, and more particularly, to the parallelization of indexing. Systems and methods of the invention index data archives by breaking a job into work items and sending the work items to multiple processors that can each determine whether to index data associated with the work item or to create a new work item and have a different processor index the data. This gives the system an internal load-balancing that results in indexing jobs during which no processor stands idle while another processor indexes data of unexpected complexity.
    Type: Grant
    Filed: October 9, 2017
    Date of Patent: February 25, 2020
    Assignee: NUIX PTY LTD
    Inventors: David Sitsky, Eddie Sheehy
  • Patent number: 10372677
    Abstract: A cache management system for managing a plurality of intermediate data includes a processor and a memory having stored thereon instructions that cause the processor to perform identifying a new intermediate data to be accessed, loading the intermediate data from the memory in response to identifying the new intermediate data as one of the plurality of intermediate data, in response to not identifying the new intermediate data as one of the plurality of intermediate data, selecting a set of victim intermediate data to evict from the memory based on a plurality of scores associated with respective ones of the plurality of intermediate data, the scores being based on a score table, evicting the set of victim intermediate data from the memory, updating the score table based on the set of victim intermediate data, and adding the new intermediate data to the plurality of intermediate data stored in the memory.
    Type: Grant
    Filed: January 11, 2017
    Date of Patent: August 6, 2019
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Zhengyu Yang, Jiayin Wang, Thomas David Evans
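    Illustrative sketch (not part of the patent record): one way to picture the score-based eviction described in the abstract above. How scores are computed is not stated there, so the caller supplies them here; `admit`, `score_table` and `capacity` are hypothetical names.

```python
def admit(cache, score_table, key, value, score, capacity):
    """Insert key into cache, evicting the lowest-scored entries if it is full.

    Returns the list of evicted keys (the "victims"). On a hit nothing changes.
    """
    if key in cache:
        return []  # already cached: load it from memory, no eviction needed

    victims = []
    overflow = len(cache) + 1 - capacity
    if overflow > 0:
        # Pick the entries with the smallest scores as victims.
        victims = sorted(cache, key=lambda k: score_table[k])[:overflow]
        for victim in victims:
            del cache[victim]
            del score_table[victim]  # update the score table for the victims

    cache[key] = value
    score_table[key] = score
    return victims


cache = {"a": 1, "b": 2}
scores = {"a": 5, "b": 1}
evicted = admit(cache, scores, "c", 3, score=3, capacity=2)
```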
  • Patent number: 10360073
    Abstract: A system for scheduling mobile augmented reality tasks performed on at least one mobile device and a workspace includes: a mobile device, comprising a central processing unit (CPU) and a graphics processing unit (GPU); a Network Profiling Component, configured to gather network related context data; a Device Profiling Component, configured to gather hardware related context data; an Application Profiling Component, configured to gather application related context data; and a Scheduling Component, configured to receive the network related context data, the hardware related context data, and the application related context data, and to schedule tasks between the CPU, the GPU and the workspace.
    Type: Grant
    Filed: December 16, 2014
    Date of Patent: July 23, 2019
    Assignee: DEUTSCHE TELEKOM AG
    Inventors: Pan Hui, Junqiu Wei, Christoph Peylo
  • Patent number: 10331495
    Abstract: An apparatus including a processor to: parse comments of multiple task routines to identify I/O parameters; generate a visualization of a DAG to include a visual representation of each task routine, wherein each representation includes a task graph object of the task routine, at least one input data graph object that represents an input to the task routine and that includes a visual indication of at least one characteristic of the input; and at least one output data graph object that represents an output of the task routine and that includes a visual indication of at least one characteristic of the output; in the I/O parameters, identify each dependency between an output of one task routine and an input of another; for each identified dependency, augment the visualization with a dependency marker that visually links the visual representations of each associated pair of task routines; and visually output the visualization.
    Type: Grant
    Filed: February 15, 2018
    Date of Patent: June 25, 2019
    Assignee: SAS INSTITUTE INC.
    Inventors: Henry Gabriel Victor Bequet, Chaowang “Ricky” Zhang
  • Patent number: 10331615
    Abstract: The present invention relates to a method for compiling code for a multi-core processor, comprising: detecting and optimizing a loop, partitioning the loop into partitions executable and mappable on physical hardware with optimal instruction level parallelism, optimizing the loop iterations and/or loop counter for ideal mapping on hardware, chaining the loop partitions generating a list representing the execution sequence of the partitions.
    Type: Grant
    Filed: May 22, 2017
    Date of Patent: June 25, 2019
    Assignee: Hyperion Core, Inc.
    Inventor: Martin Vorbach
  • Patent number: 10311025
    Abstract: A cache management system for managing a plurality of intermediate data includes a processor, and a memory having stored thereon the plurality of intermediate data and instructions that when executed by the processor, cause the processor to perform identifying a new intermediate data to be accessed, loading the intermediate data from the memory in response to identifying the new intermediate data as one of the plurality of intermediate data, and in response to not identifying the new intermediate data as one of the plurality of intermediate data identifying a reusable intermediate data having a longest duplicate generating logic chain that is at least in part the same as a generating logic chain of the new intermediate data, and generating the new intermediate data from the reusable intermediate data and a portion of the generating logic chain of the new intermediate data not in common with the reusable intermediate data.
    Type: Grant
    Filed: January 11, 2017
    Date of Patent: June 4, 2019
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Zhengyu Yang, Jiayin Wang, Thomas David Evans
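    Illustrative sketch (not part of the patent record): reuse of cached intermediate data by matching generating logic chains. The abstract above does not say how chains are compared; this sketch assumes a chain is a tuple of operation names and that "in part the same" means a shared prefix. All names are hypothetical.

```python
def longest_reusable_chain(cached_chains, new_chain):
    """Return the longest cached chain that is a prefix of new_chain, or ()."""
    best = ()
    for chain in cached_chains:
        if len(chain) > len(best) and tuple(new_chain[:len(chain)]) == tuple(chain):
            best = tuple(chain)
    return best


def generate(cache, new_chain, operations, source):
    """Build the data for new_chain, starting from the best reusable entry."""
    base = longest_reusable_chain(cache.keys(), new_chain)
    value = cache[base] if base else source
    remaining = tuple(new_chain[len(base):])
    for name in remaining:  # apply only the steps not shared with the cached data
        value = operations[name](value)
    cache[tuple(new_chain)] = value
    return value, remaining


operations = {
    "load": lambda xs: list(xs),
    "double": lambda xs: [2 * x for x in xs],
    "sum": sum,
}
cache = {("load", "double"): [2, 4, 6]}
result, applied = generate(cache, ("load", "double", "sum"), operations, [1, 2, 3])
```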
  • Patent number: 10303522
    Abstract: A system and method for distributed graphics processing unit (GPU) computation are disclosed. A particular embodiment includes: receiving a user task service request from a user node; querying resource availability from a plurality of slave nodes having a plurality of graphics processing units (GPUs) thereon; assigning the user task service request to a plurality of available GPUs based on the resource availability and resource requirements of the user task service request, the assigning including starting a service on a GPU using a distributed processing container and creating a corresponding uniform resource locator (URL); and retaining a list of URLs corresponding to the resources assigned to the user task service request.
    Type: Grant
    Filed: July 1, 2017
    Date of Patent: May 28, 2019
    Assignee: TUSIMPLE
    Inventors: Kai Zhou, Siyuan Liu
  • Patent number: 10284457
    Abstract: A method, an information handling system (IHS), and a virtual link trunking (VLT) system for determining VLT ports to block and unblock in an IHS. The method includes calculating a forwarding table index for a local switch of currently active and inactive switch peers for a VLT port. A pre-determined forwarding table is retrieved from a memory containing a plurality of port blocking and unblocking actions for the switch peers. Current port blocking and unblocking actions are identified in the pre-determined forwarding table corresponding to the forwarding table index. Changes are determined between the previous port blocking and unblocking actions and the current port blocking and unblocking actions. The input/output (I/O) ports are configured for the local switch based on the determined changes in the port blocking and unblocking actions.
    Type: Grant
    Filed: July 12, 2016
    Date of Patent: May 7, 2019
    Assignee: Dell Products, L.P.
    Inventors: Senthil Nathan Muthukaruppan, Shankara Ramamurthy, Pugalendran Rajendran
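    Illustrative sketch (not part of the patent record): a table lookup driven by peer state, in the spirit of the abstract above. The bitmask encoding of the index, the port names and the table contents are all assumptions made for the example.

```python
def forwarding_table_index(peer_active):
    """Encode the active/inactive state of each switch peer as a bitmask."""
    index = 0
    for bit, active in enumerate(peer_active):
        if active:
            index |= 1 << bit
    return index


def port_changes(previous_actions, current_actions):
    """Return only the ports whose blocking action differs from last time."""
    return {
        port: action
        for port, action in current_actions.items()
        if previous_actions.get(port) != action
    }


# Hypothetical pre-determined table for two peers and two VLT ports.
FORWARDING_TABLE = {
    0b00: {"vlt1": "unblock", "vlt2": "unblock"},
    0b01: {"vlt1": "block", "vlt2": "unblock"},
    0b10: {"vlt1": "unblock", "vlt2": "block"},
    0b11: {"vlt1": "block", "vlt2": "block"},
}

previous = FORWARDING_TABLE[forwarding_table_index([True, False])]
current = FORWARDING_TABLE[forwarding_table_index([True, True])]
changes = port_changes(previous, current)
```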
  • Patent number: 10255115
    Abstract: An apparatus including a processor to: parse comments of multiple task routines to identify I/O parameters; generate a visualization of a DAG to include a visual representation of each task routine, wherein each representation includes a task graph object of the task routine, at least one input data graph object that represents an input to the task routine and that includes a visual indication of at least one characteristic of the input; and at least one output data graph object that represents an output of the task routine and that includes a visual indication of at least one characteristic of the output; in the I/O parameters, identify each dependency between an output of one task routine and an input of another; for each identified dependency, augment the visualization with a dependency marker that visually links the visual representations of each associated pair of task routines; and visually output the visualization.
    Type: Grant
    Filed: February 15, 2018
    Date of Patent: April 9, 2019
    Assignee: SAS INSTITUTE INC.
    Inventors: Henry Gabriel Victor Bequet, Chaowang “Ricky” Zhang
  • Patent number: 10228923
    Abstract: A parallelization compiling method for generating a segmented program from a sequential program, in which multiple macro tasks are included and at least two of the macro tasks have a data dependency relationship with one another, includes determining an existence of invalidation information for invalidating at least a part of the data dependency relationship between the at least two of the plurality of macro tasks before compiling the sequential program into the segmented program, and generating the segmented program by compiling the sequential program into the segmented program with reference to a determination result of the existence of the invalidation information. When the invalidation information is determined to exist, the at least a part of the data dependency relationship is invalidated before the compiling of the sequential program into the segmented program.
    Type: Grant
    Filed: March 29, 2016
    Date of Patent: March 12, 2019
    Assignees: DENSO CORPORATION, WASEDA UNIVERSITY
    Inventors: Yoshihiro Yatou, Noriyuki Suzuki, Kenichi Mineda, Hironori Kasahara, Keiji Kimura, Hiroki Mikami, Dan Umeda
  • Patent number: 10210025
    Abstract: An apparatus including a processor to: parse comments of multiple task routines to identify I/O parameters; generate a visualization of a DAG to include a visual representation of each task routine, wherein each representation includes a task graph object of the task routine, at least one input data graph object that represents an input to the task routine and that includes a visual indication of at least one characteristic of the input; and at least one output data graph object that represents an output of the task routine and that includes a visual indication of at least one characteristic of the output; in the I/O parameters, identify each dependency between an output of one task routine and an input of another; for each identified dependency, augment the visualization with a dependency marker that visually links the visual representations of each associated pair of task routines; and visually output the visualization.
    Type: Grant
    Filed: February 15, 2018
    Date of Patent: February 19, 2019
    Assignee: SAS INSTITUTE INC.
    Inventors: Henry Gabriel Victor Bequet, Chaowang “Ricky” Zhang
  • Patent number: 10169012
    Abstract: An optimizing compiler includes a vector optimization mechanism that optimizes vector operations that are reformatting-resistant, such as source instructions that do not have a corresponding reformatting operation, sink instructions that do not have a corresponding reformatting operation, a source instruction that is a scalar value, a sink instruction that may produce a scalar value, and an internal operation that depends on lanes being in a specified order. The ability to optimize vector instructions that are reformatting-resistant reduces the number of operations to improve the run-time performance of the code.
    Type: Grant
    Filed: November 1, 2017
    Date of Patent: January 1, 2019
    Assignee: International Business Machines Corporation
    Inventors: Michael Karl Gschwind, William J. Schmidt
  • Patent number: 10157466
    Abstract: A method to segment images that contain multiple objects in a nested structure including acquiring an image; defining the multiple objects by layers, each layer corresponding to one region, where a region contains an innermost object and all the objects nested within the innermost object; stacking the layers in an order of the nested structure of the multiple objects, the stack of layers having at least a top layer and a bottom layer; extending each layer with padded nodes; connecting the top layer to a sink and the bottom layer to a source, wherein each intermediate layer between the top layer and the bottom layer is connected only to the adjacent layer by undirected links; and measuring a boundary length for each layer.
    Type: Grant
    Filed: January 23, 2017
    Date of Patent: December 18, 2018
    Assignees: Riverside Research Institute, New York University
    Inventors: Jen-wei Kuo, Jonathan Mamou, Xuan Zhao, Jeffrey A. Ketterling, Orlando Aristizabal, Daniel H. Turnbull, Yao Wang
  • Patent number: 10157055
    Abstract: Methods, systems, apparatuses, and computer program products are provided for transforming asynchronous code into more efficient, logically equivalent asynchronous code. Program code is converted into a first syntax tree. A dependency graph is generated from the first syntax tree with each node of the dependency graph corresponding to a code statement and having an assigned weight. Weighted topological sorting of the dependency graph is performed to generate a sorted dependency graph. A second syntax tree is generated from the sorted dependency graph. In another implementation, the program code is transformed into await-relaxed and/or loop-relaxed program code prior to being transformed into the first syntax tree.
    Type: Grant
    Filed: September 29, 2016
    Date of Patent: December 18, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Gal Tamir, Elad Iwanir, Eli Koreh
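    Illustrative sketch (not part of the patent record): a weighted topological sort over a statement dependency graph. The abstract above does not say how weights order the statements; this sketch assumes that, among statements whose dependencies are met, the heaviest is emitted first. It uses Kahn's algorithm with a heap, which is a swapped-in standard technique rather than the patented method.

```python
import heapq


def weighted_topological_sort(depends_on, weight):
    """Order nodes so dependencies come first, preferring heavier ready nodes.

    depends_on maps each node to the set of nodes it depends on; every node
    must appear as a key.
    """
    remaining = {node: len(deps) for node, deps in depends_on.items()}
    dependents = {node: [] for node in depends_on}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(node)

    # Negate weights because heapq pops the smallest item first.
    ready = [(-weight[n], n) for n, count in remaining.items() if count == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, node = heapq.heappop(ready)
        order.append(node)
        for child in dependents[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                heapq.heappush(ready, (-weight[child], child))

    if len(order) != len(depends_on):
        raise ValueError("dependency graph contains a cycle")
    return order


deps = {"a": set(), "b": set(), "c": {"a"}, "d": {"b", "c"}}
weights = {"a": 1, "b": 5, "c": 2, "d": 1}
order = weighted_topological_sort(deps, weights)
```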
  • Patent number: 10157086
    Abstract: An apparatus including a processor to: parse comments of multiple task routines to identify I/O parameters; for each task routine, generate a macro including its I/O parameters; transmit the macros to a requesting device to enable it to generate a visualization of a DAG to include visual representations of the task routines; wherein each representation includes a task graph object, an input data graph object representing an input and indicating a characteristic of the input, and an output data graph object representing an output and indicating a characteristic of the output; and wherein the requesting device is to: identify, in the I/O parameters, each dependency between an output and an input of a pair of task routines; augment the visualization, for each identified dependency, with a dependency marker that visually links the visual representations of the pair of task routines; and visually output the visualization.
    Type: Grant
    Filed: February 16, 2018
    Date of Patent: December 18, 2018
    Assignee: SAS INSTITUTE INC.
    Inventors: Henry Gabriel Victor Bequet, Chaowang “Ricky” Zhang
  • Patent number: 10146849
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing search results. In one aspect, a method includes receiving a query. A plurality of search results responsive to the query are identified. The search results are analyzed to determine that at least a first search result is associated with a first answer box topic. The search results are provided along with an answer box precursor for the first answer box topic.
    Type: Grant
    Filed: October 25, 2017
    Date of Patent: December 4, 2018
    Assignee: Google LLC
    Inventors: Tal Cohen, Ziv Bar-Yossef, Igor Tsvetkov, Adi Mano, Oren Naim, Nitsan Oz, Nir Andelman, Pravir Kumar Gupta
  • Patent number: 10067910
    Abstract: A method and system for performing a general matrix-matrix multiplication (GEMM) operation using a kernel compiled with optimal maximum register count (MRC). During operation, the system may generate the kernel compiled with optimal MRC. This may involve determining a fastest compiled kernel among a set of compiled kernels by comparing the speeds of the compiled kernels. Each kernel may be compiled with a different MRC value between zero and a predetermined maximum number of registers per thread. The fastest compiled kernel is determined to be the kernel with optimal MRC. The system may receive data representing at least two matrices. The system may select the kernel compiled with optimal MRC, and perform the GEMM operation on the two matrices using the selected kernel. Some embodiments may also perform general matrix-vector multiplication (GEMV), sparse matrix-vector multiplication (SpMV), or k-means clustering operations using kernels compiled with optimal MRC.
    Type: Grant
    Filed: July 1, 2016
    Date of Patent: September 4, 2018
    Assignee: PALO ALTO RESEARCH CENTER INCORPORATED
    Inventor: Rong Zhou
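    Illustrative sketch (not part of the patent record): selecting the fastest kernel among variants built with different MRC values, as the abstract above describes. `build_kernel` is a hypothetical callback standing in for a real compiler invocation; it must return a zero-argument callable that runs the kernel. The timing helper is likewise an assumption.

```python
import time


def measure_seconds(kernel, repeats=3):
    """Return the best wall-clock time of several runs of a callable."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel()
        best = min(best, time.perf_counter() - start)
    return best


def select_optimal_mrc(build_kernel, mrc_values, measure=measure_seconds):
    """Build one kernel per MRC value and return the fastest value and timings."""
    timings = {mrc: measure(build_kernel(mrc)) for mrc in mrc_values}
    best_mrc = min(timings, key=timings.get)
    return best_mrc, timings


# Stand-in build step and cost model, used only to exercise the selection logic.
best_mrc, timings = select_optimal_mrc(
    build_kernel=lambda mrc: mrc,
    mrc_values=range(16, 65, 16),
    measure=lambda kernel: abs(kernel - 32),
)
```

    In practice the default `measure_seconds` would time real kernel launches; the injected cost model above only keeps the example deterministic.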
  • Patent number: 10002161
    Abstract: The subject matter disclosed herein provides methods and apparatus, including computer program products for rules-based processing. In one aspect there is provided a method. The method may include, for example, evaluating rules to determine whether to enable or disable one or more actions in a ready set of actions. Moreover, the method may include scheduling the ready set of actions, each of which is scheduled for execution and executed, the execution of each of the ready set of actions using a separate, concurrent thread, the concurrency of the actions controlled using a control mechanism. Related systems, apparatus, methods, and/or articles are also described.
    Type: Grant
    Filed: December 3, 2008
    Date of Patent: June 19, 2018
    Assignee: SAP SE
    Inventors: Sören Balko, Matthias Miltz
  • Patent number: 9898288
    Abstract: A computer-implemented method of executing an instruction sequence with a recursive function call of a plurality of threads within a thread group in a Single-Instruction-Multiple-Threads (SIMT) system is provided. Each thread is provided with a function call counter (FCC), an active mask, an execution mask and a per-thread program counter (PTPC). The instruction sequence with the recursive function call is executed by the threads in the thread group according to a program counter (PC) indicating a target. Upon executing the recursive function call, for each thread, the active mask is set according to the PTPC and the target indicated by the PC, the FCC is determined when entering or returning from the recursive function call, the execution mask is determined according to the FCC and the active mask. It is determined whether an execution result of the recursive function call takes effect according to the execution mask.
    Type: Grant
    Filed: December 29, 2015
    Date of Patent: February 20, 2018
    Assignee: MEDIATEK INC.
    Inventors: Yan-Hong Lu, Jia-Yang Chang, Pao-Hung Kuo, Chia-Chi Chang, Pei-Kuei Tsung
  • Patent number: 9891924
    Abstract: A method for implementing a reduced size register view data structure in a microprocessor. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of multiplexers to access ports of a scheduling array to store the instruction blocks as a series of chunks.
    Type: Grant
    Filed: March 14, 2014
    Date of Patent: February 13, 2018
    Assignee: Intel Corporation
    Inventor: Mohammad A. Abdallah
  • Patent number: 9886252
    Abstract: An optimizing compiler includes a vector optimization mechanism that optimizes vector operations that are reformatting-resistant, such as source instructions that do not have a corresponding reformatting operation, sink instructions that do not have a corresponding reformatting operation, a source instruction that is a scalar value, a sink instruction that may produce a scalar value, and an internal operation that depends on lanes being in a specified order. The ability to optimize vector instructions that are reformatting-resistant reduces the number of operations to improve the run-time performance of the code.
    Type: Grant
    Filed: August 31, 2015
    Date of Patent: February 6, 2018
    Assignee: International Business Machines Corporation
    Inventors: Michael Karl Gschwind, William J. Schmidt
  • Patent number: 9880821
    Abstract: An optimizing compiler includes a vector optimization mechanism that optimizes vector operations that are reformatting-resistant, such as source instructions that do not have a corresponding reformatting operation, sink instructions that do not have a corresponding reformatting operation, a source instruction that is a scalar value, a sink instruction that may produce a scalar value, and an internal operation that depends on lanes being in a specified order. The ability to optimize vector instructions that are reformatting-resistant reduces the number of operations to improve the run-time performance of the code.
    Type: Grant
    Filed: August 17, 2015
    Date of Patent: January 30, 2018
    Assignee: International Business Machines Corporation
    Inventors: Michael Karl Gschwind, William J. Schmidt
  • Patent number: 9823927
    Abstract: According to some embodiments, the workgroup divisibility requirement may be dispensed with on a selective or permanent basis, i.e. in all cases, particular cases or at particular times and/or under particular conditions. An application programming interface implementation may be allowed to launch workgroups with non-uniform local sizes. Two different local sizes may be used in a case of a one-dimensional workload.
    Type: Grant
    Filed: November 30, 2012
    Date of Patent: November 21, 2017
    Assignee: Intel Corporation
    Inventors: Aaron R. Kunze, Dillon Sharlet, Andrew E. Brownsword
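    Sketch: the abstract does not give an algorithm, so the following is only an illustration of how a one-dimensional workload might be covered with two different local sizes when the global size is not divisible by the requested local size. The function name `split_workgroups` and the choice to place the smaller workgroup last are assumptions, not details from the patent.

```python
def split_workgroups(global_size: int, local_size: int) -> list[tuple[int, int]]:
    """Return (offset, size) pairs covering a 1-D range of work-items.

    Full workgroups use local_size. If global_size is not divisible by
    local_size, one trailing workgroup gets the smaller remainder size,
    so at most two distinct local sizes appear.
    """
    if global_size < 0 or local_size <= 0:
        raise ValueError("global_size must be >= 0 and local_size must be > 0")
    groups = []
    offset = 0
    while offset < global_size:
        # The last group is truncated to whatever work remains.
        size = min(local_size, global_size - offset)
        groups.append((offset, size))
        offset += size
    return groups


if __name__ == "__main__":
    # 10 work-items with a requested local size of 4.
    print(split_workgroups(10, 4))  # → [(0, 4), (4, 4), (8, 2)]
```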
  • Patent number: 9805110
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing search results. In one aspect, a method includes receiving a query. A plurality of search results responsive to the query are identified. The search results are analyzed to determine that at least a first search result is associated with a first answer box topic. The search results are provided along with an answer box precursor for the first answer box topic.
    Type: Grant
    Filed: May 2, 2016
    Date of Patent: October 31, 2017
    Assignee: Google Inc.
    Inventors: Tal Cohen, Ziv Bar-Yossef, Igor Tsvetkov, Adi Mano, Oren Naim, Nitsan Oz, Nir Andelman, Pravir Kumar Gupta
  • Patent number: 9798587
    Abstract: An embodiment of the invention includes applying a first partition to a plurality of LPs, wherein a particular LP is assigned to a first set of LPs. A second partition is applied to the LPs, wherein the particular LP is assigned to an LP set different from the first set. For both the first and second partitions, lookahead values and transit times are determined for each of the LPs and related links. For the first partition, a first system progression rate is computed using a specified function with the lookahead values and transit times determined for the first partition. For the second partition, a second system progression rate is computed using the specified function with the lookahead values and transit times determined for the second partition. The first and second system progression rates are compared to determine which is the lowest.
    Type: Grant
    Filed: May 13, 2011
    Date of Patent: October 24, 2017
    Assignee: International Business Machines Corporation
    Inventors: Cheng-Hong Li, Alfred J. Park, Eugen Schenfeld
  • Patent number: 9799087
    Abstract: Systems, methods, and computer readable media to improve the development of image processing intensive programs are described. In general, techniques are disclosed to non-intrusively monitor the run-time performance of shader programs on a graphics processing unit (GPU), that is, to profile shader program execution. More particularly, GPU-based hardware threads may be configured to run in parallel to, and not interfere with, the execution environment of a GPU during shader program execution. When so configured, GPU hardware threads may be used to accurately characterize the run-time behavior of an application program's shader operations.
    Type: Grant
    Filed: September 9, 2013
    Date of Patent: October 24, 2017
    Assignee: Apple Inc.
    Inventors: Sun Tjen Fam, Andrew Sowerby
  • Patent number: 9785700
    Abstract: The invention relates to electronic indexing, and more particularly, to the parallelization of indexing. Systems and methods of the invention index data archives by breaking a job into work items and sending the work items to multiple processors that can each determine whether to index data associated with the work item or to create a new work item and have a different processor index the data. This gives the system an internal load-balancing that results in indexing jobs during which no processor stands idle while another processor indexes data of unexpected complexity.
    Type: Grant
    Filed: August 7, 2013
    Date of Patent: October 10, 2017
    Assignee: Nuix Pty Ltd
    Inventors: David Sitsky, Eddie Sheehy
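    Sketch: a single-threaded simulation of the "index it or create a new work item" decision described in the abstract. Treating a nested list as "data of unexpected complexity", and the name `index_job`, are assumptions made for illustration; a real system would distribute the queue across multiple processors.

```python
from collections import deque


def index_job(root):
    """Process a job by repeatedly taking work items from a shared queue.

    Each item is either indexed directly (a leaf value) or, if it turns
    out to be a container, split into new work items that go back on the
    queue so that any idle worker could pick them up.
    """
    queue = deque([root])
    indexed = []
    while queue:
        item = queue.popleft()
        if isinstance(item, list):
            # Complex item: create new work items instead of indexing inline.
            queue.extend(item)
        else:
            # Simple item: index it now.
            indexed.append(item)
    return indexed


if __name__ == "__main__":
    archive = ["a", ["b", ["c"]], "d"]
    print(index_job(archive))  # → ['a', 'd', 'b', 'c']
```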
  • Patent number: 9720743
    Abstract: General-purpose distributed data-parallel computing using a high-level language is disclosed. Data-parallel portions of a sequential program that is written by a developer in a high-level language are automatically translated into a distributed execution plan. The distributed execution plan is then executed on large compute clusters. Thus, the developer is allowed to write the program using familiar programming constructs in the high-level language. Moreover, developers without experience with distributed compute systems are able to take advantage of such systems.
    Type: Grant
    Filed: July 23, 2015
    Date of Patent: August 1, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yuan Yu, Dennis Fetterly, Michael Isard, Ulfar Erlingsson, Mihai Budiu
  • Patent number: 9652286
    Abstract: Embodiments include systems and methods for handling task dependencies in a runtime environment using dependence graphs. For example, a computer-implemented runtime engine includes runtime libraries configured to handle tasks and task dependencies. The task dependencies can be converted into data dependencies. At runtime, as the runtime engine encounters tasks and associated data dependencies, it can add those identified tasks as nodes of a dependence graph, and can add edges between the nodes that correspond to the data dependencies without deadlock. The runtime engine can schedule the tasks for execution according to a topological traversal of the dependence graph in a manner that preserves task dependencies substantially as defined by the source code.
    Type: Grant
    Filed: March 21, 2014
    Date of Patent: May 16, 2017
    Assignee: Oracle International Corporation
    Inventor: Bin Fan
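    Sketch: one possible way to turn task dependencies into data dependencies and schedule by topological traversal, as the abstract outlines. The task representation (name, data read, data written), the helper name `build_dependence_graph`, and the use of Python's standard `graphlib` module are assumptions for illustration. Because edges only ever point from an earlier task to a later one, the resulting graph cannot contain a cycle.

```python
from graphlib import TopologicalSorter


def build_dependence_graph(tasks):
    """Build {task: set of predecessor tasks} from data accesses.

    tasks is a list of (name, reads, writes) tuples in program order.
    """
    graph = {}
    last_writer = {}  # datum -> task that most recently wrote it
    readers = {}      # datum -> tasks that read it since that write
    for name, reads, writes in tasks:
        preds = set()
        for datum in reads:
            if datum in last_writer:
                preds.add(last_writer[datum])       # read-after-write
        for datum in writes:
            if datum in last_writer:
                preds.add(last_writer[datum])       # write-after-write
            preds.update(readers.get(datum, ()))    # write-after-read
        preds.discard(name)
        graph[name] = preds
        for datum in reads:
            readers.setdefault(datum, set()).add(name)
        for datum in writes:
            last_writer[datum] = name
            readers[datum] = set()
    return graph


if __name__ == "__main__":
    tasks = [
        ("load", [], ["a"]),
        ("scale", ["a"], ["b"]),
        ("offset", ["a"], ["c"]),
        ("sum", ["b", "c"], ["a"]),
    ]
    graph = build_dependence_graph(tasks)
    # Any order produced here respects every recorded dependency.
    print(list(TopologicalSorter(graph).static_order()))
```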
  • Patent number: 9577869
    Abstract: A method, system and program product for balanced workload distribution in a plurality of networked computing nodes. The networked computing nodes may be arranged as a connected graph defining at least one direct neighbor to each networked computing node. The method comprises determining a first workload indicator of the i-th computing node at a first stage, before a new task may be started by the i-th computing node; determining an estimated workload indicator of the i-th computing node, assuming that the new task is performed at a second stage on the i-th computing node; determining estimated workload indicators of each direct neighbor, assuming that the new task is performed at the second stage; deciding whether to move the new task to another computing node; and moving the new task to one of the direct neighboring computing nodes of the i-th computing node such that workloads are balanced.
    Type: Grant
    Filed: August 28, 2013
    Date of Patent: February 21, 2017
    Assignee: International Business Machines Corporation
    Inventors: Gianluca Della Corte, Alessandro Donatelli, Antonio M. Sgro
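    Sketch: a simplified reading of the placement decision in the abstract. It assumes, purely for illustration, that a workload indicator is a single number, that a task adds a fixed cost to whichever node runs it, and that the task moves to the direct neighbor with the lowest estimated workload if that is lower than the estimate for the current node. The name `place_task` is hypothetical.

```python
def place_task(node, task_cost, load, neighbors):
    """Return the node that should run the new task.

    load maps each node to its current workload indicator;
    neighbors maps each node to its direct neighbors in the graph.
    """
    # Estimated workload if the task stays on the current node.
    best_node = node
    best_estimate = load[node] + task_cost
    # Estimated workload of each direct neighbor if it took the task.
    for neighbor in neighbors[node]:
        estimate = load[neighbor] + task_cost
        if estimate < best_estimate:
            best_node, best_estimate = neighbor, estimate
    return best_node


if __name__ == "__main__":
    load = {"n1": 5, "n2": 2, "n3": 7}
    neighbors = {"n1": ["n2", "n3"], "n2": ["n1"], "n3": ["n1"]}
    print(place_task("n1", 1, load, neighbors))  # → n2
```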
  • Patent number: 9569272
    Abstract: A method and device for digital data processing based on a data flow processing model is suitable for the execution, in a distributed manner on multiple calculation nodes, of multiple data processing operations modelled by directed graphs, where two different processing operations include at least one common calculation node. The device includes an identification processor configured to, from a valued directed multi-graph made up of the union of several distinct processing graphs and divided into several valued directed sub-multi-graphs, called chunks, and whose input and output nodes are buffer memory nodes of the multi-graph, identify a coordination module for each chunk. Furthermore each identified coordination module is configured to synchronize portions of processing operations that are to be executed in the chunk with which the respective coordination module is associated, independently of portions of processing operations that are to be executed in other chunks.
    Type: Grant
    Filed: July 9, 2010
    Date of Patent: February 14, 2017
    Assignee: Commissariat a l'energie atomique et aux alternatives
    Inventor: Yvain Thonnart
  • Patent number: 9552197
    Abstract: A non-transitory computer-readable recording medium stores therein a program for causing an information processing apparatus to execute a process including analyzing a source program with respect to the information processing apparatus that starts hardware prefetching upon detecting an access to a consecutive area on a main storage device and stops the hardware prefetching upon detecting an end of the access to the consecutive area, specifying an array structure in a loop process as a hardware prefetching target, and generating, from the source program, a machine language program in which the array structure is changed so that a second access occurring next to a first access to the array structure refers to an area being consecutive from the area being referred to by the first access.
    Type: Grant
    Filed: August 31, 2015
    Date of Patent: January 24, 2017
    Assignee: FUJITSU LIMITED
    Inventor: Shigeru Kimura
  • Patent number: 9467532
    Abstract: A computing method is provided which includes calling a general purpose graphics processing subroutine for execution of a target program by a client; sending a program code and resource data for execution of the target program to a server by the client; and executing the program code using a general purpose graphics processing unit by the server.
    Type: Grant
    Filed: January 22, 2013
    Date of Patent: October 11, 2016
    Assignee: INDUSTRY-ACADEMIC COOPERATION FOUNDATION, YONSEI UNIVERSITY
    Inventors: Won Woo Ro, Keunsoo Kim, Seung Hun Kim
  • Patent number: 9448778
    Abstract: A computer-implemented method, system, and computer program product for performing object collocation on a computer system are provided. The method includes analyzing a sequence of computer instructions for object allocations and uses of the allocated objects. The method further includes creating an allocation interference graph of object allocation nodes with edges indicating pairs of allocations to be omitted from collocation. The method also includes coloring the allocation interference graph such that adjacent nodes are assigned different colors, and creating an object allocation at a program point prior to allocations of a selected color from the allocation interference graph. The method additionally includes storing an address associated with the created object allocation in a collocation pointer, and replacing a use of each allocation of the selected color with a use of the collocation pointer to collocate multiple objects.
    Type: Grant
    Filed: July 30, 2014
    Date of Patent: September 20, 2016
    Assignee: International Business Machines Corporation
    Inventors: Patrick Doyle, Pramod Ramarao, Vijay Sundaresan
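    Sketch: the coloring step in the abstract, shown with a simple greedy algorithm. The patent does not say which coloring algorithm is used, so greedy coloring is a stand-in, and the name `color_interference_graph` is hypothetical. Allocations that receive the same color have no edge between them and are therefore candidates for collocation.

```python
def color_interference_graph(nodes, edges):
    """Assign each allocation node a color so adjacent nodes differ.

    edges lists pairs of allocations that must not be collocated.
    """
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    colors = {}
    for node in nodes:
        # Pick the smallest color not used by an already-colored neighbor.
        used = {colors[n] for n in adjacency[node] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


if __name__ == "__main__":
    allocations = ["A", "B", "C", "D"]
    interference = [("A", "B"), ("B", "C")]
    print(color_interference_graph(allocations, interference))
    # → {'A': 0, 'B': 1, 'C': 0, 'D': 0}
```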
  • Patent number: 9436589
    Abstract: An analysis system may perform network analysis on data gathered from an executing application. The analysis system may identify relationships between code elements and use tracer data to quantify and classify various code elements. In some cases, the analysis system may operate with only data gathered while tracing an application, while other cases may combine static analysis data with tracing data. The network analysis may identify groups of related code elements through cluster analysis, as well as identify bottlenecks from one-to-many and many-to-one relationships. The analysis system may generate visualizations showing the interconnections or relationships within the executing code, along with highlighted elements that may be limiting performance.
    Type: Grant
    Filed: March 29, 2013
    Date of Patent: September 6, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Ying Li, Alexander G. Gounares, Charles D. Garrett, Russell S. Krajec
  • Patent number: 9424103
    Abstract: A method for operating a lock in a computing system having plural processing units and running under multiple runtime environments is provided. When a requester thread attempts to acquire the lock while the lock is held by a holder thread, the method determines whether the holder thread is suspendable or non-suspendable. If the holder thread is non-suspendable, the requester thread is put in a spin state regardless of whether the requester thread is suspendable or non-suspendable; otherwise, the method determines whether the requester thread is suspendable or non-suspendable, unless the requester thread quits acquiring the lock. If the requester thread is non-suspendable, the requester thread is arranged to attempt acquiring the lock again; otherwise, the requester thread is added to a wait queue as an additional suspended thread. Suspended threads stored in the wait queue may be resumed later for lock acquisition. The method is applicable to a computing system with a multicore processor.
    Type: Grant
    Filed: September 30, 2014
    Date of Patent: August 23, 2016
    Assignee: Hong Kong Applied Science and Technology Research Institute Company Limited
    Inventors: Yi Al, Lin Xu, Jianchao Lu, Shaohua Zhang
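    Sketch: the contention policy from the abstract written as a pure decision function. The `Action` names and the function `on_contention` are hypothetical, and the case where the requester gives up on acquiring the lock is left out for brevity.

```python
from enum import Enum


class Action(Enum):
    SPIN = "spin"        # busy-wait for the lock
    RETRY = "retry"      # attempt to acquire the lock again
    ENQUEUE = "enqueue"  # suspend and join the wait queue


def on_contention(holder_suspendable: bool, requester_suspendable: bool) -> Action:
    """Decide what a requester does when the lock is already held."""
    if not holder_suspendable:
        # Holder cannot be suspended: spin regardless of the requester type.
        return Action.SPIN
    if not requester_suspendable:
        # Requester cannot be suspended: try to acquire again.
        return Action.RETRY
    # Both are suspendable: park the requester in the wait queue.
    return Action.ENQUEUE


if __name__ == "__main__":
    for holder in (False, True):
        for requester in (False, True):
            print(holder, requester, on_contention(holder, requester).value)
```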
  • Patent number: 9395957
    Abstract: A high level programming language provides an agile communication operator that generates a segmented computational space based on a resource map for distributing the computational space across compute nodes. The agile communication operator decomposes the computational space into segments, causes the segments to be assigned to compute nodes, and allows the user to centrally manage and automate movement of the segments between the compute nodes. The segment movement may be managed using either a full global-view representation or a local-global-view representation of the segments.
    Type: Grant
    Filed: December 22, 2010
    Date of Patent: July 19, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Paul F. Ringseth
  • Patent number: 9367291
    Abstract: An apparatus and method for generating vector code are provided. The apparatus and method generate vector code from scalar-type kernel code without requiring the user to change the code type or modify the data layout, thereby improving ease of use and retaining the portability of OpenCL.
    Type: Grant
    Filed: March 28, 2014
    Date of Patent: June 14, 2016
    Assignees: Samsung Electronics Co., Ltd., Seoul National University R&DB Foundation
    Inventors: Jin-Seok Lee, Seong-Gun Kim, Dong-Hoon Yoo, Seok-Joong Hwang, Jeongho Nah, Jaejin Lee, Jun Lee
  • Patent number: 9354629
    Abstract: Example methods and apparatus to configure a process control system using an electronic description language (EDL) script are disclosed. A disclosed example method comprises loading a first script representative of a process plant, the first script comprising an interpretive system-level script structured in accordance with an electronic description language, and compiling the first script to form a second script, the second script structured in accordance with a vendor-specific configuration language associated with a particular process control system for the process plant.
    Type: Grant
    Filed: February 19, 2009
    Date of Patent: May 31, 2016
    Assignee: Fisher-Rosemount Systems, Inc.
    Inventors: James Randall Balentine, Gary Keith Law, Mark Nixon
  • Patent number: 9311065
    Abstract: Embodiments relate to data splitting for multi-instantiated objects. An aspect includes receiving a portion of source code for compilation having a dynamic object to split using object size array data splitting. Another aspect includes replacing all memory allocations for the dynamic object with a total size of an object size array and object field arrays including a predetermined padding. Another aspect includes inserting statements in the source code after the memory allocations to populate the object size array with a value of a number of elements of the object size array. Another aspect includes updating a stride for load and store operations using dynamic pointers. Yet another aspect includes modifying field references by adding a distance between the object size array and the object field array to respective address operations.
    Type: Grant
    Filed: June 19, 2014
    Date of Patent: April 12, 2016
    Assignee: International Business Machines Corporation
    Inventors: Shimin Cui, Yan Zhang