Patents by Inventor Chi-Keung Luk

Chi-Keung Luk has filed for patents to protect the inventions listed below. The listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230229978
    Abstract: A method includes training, using a first computing system having a first configuration, a first machine learning model having a machine learning model architecture, and training, using a second computing system having a different second configuration, a second machine learning model having the machine learning model architecture. The method also includes determining, for a shared training operation performed by both the first computing system and the second computing system, a similarity measure that represents a similarity between: a first training output generated by the first computing system during performance of the shared training operation during training of the first machine learning model; and a second training output generated by the second computing system during performance of the shared training operation during training of the second machine learning model.
    Type: Application
    Filed: January 6, 2023
    Publication date: July 20, 2023
    Applicant: Google LLC
    Inventors: Chi Keung Luk, Jose Americo Baiocchi Paredes, Russell Power, Mehmet Deveci
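
The comparison technique in the abstract above lends itself to a short illustration. Below is a minimal sketch, assuming each computing system logs one output tensor per shared training operation; the operation names, the cosine-similarity metric, and the 0.999 threshold are illustrative choices, not values specified by the application.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two flattened training outputs."""
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def compare_training_runs(outputs_a: dict, outputs_b: dict, threshold: float = 0.999):
    """Flag shared training operations whose outputs diverge between systems.

    outputs_a / outputs_b map an operation name to the tensor the
    corresponding computing system produced for that operation.
    """
    suspects = []
    for op in outputs_a.keys() & outputs_b.keys():   # shared operations only
        sim = cosine_similarity(outputs_a[op], outputs_b[op])
        if sim < threshold:
            suspects.append((op, sim))
    # The operation with the lowest similarity is the likeliest bug site.
    return sorted(suspects, key=lambda pair: pair[1])

# Example: identical ops agree; a diverging op is flagged.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
run_a = {"matmul_1": w @ w.T, "softmax_1": np.exp(w) / np.exp(w).sum()}
run_b = {"matmul_1": w @ w.T, "softmax_1": np.exp(w) / np.exp(w).sum() + 0.05}
print(compare_training_runs(run_a, run_b))
```
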
  • Patent number: 11556861
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for debugging correctness issues in training machine learning models. In one aspect, a method comprises training a first machine learning model using a first computing system having a first configuration; training a second machine learning model using a second computing system having a second configuration, wherein the second configuration of the second computing system is different than the first configuration of the first computing system; and determining, for each of a plurality of shared training operations that are performed by both the first computing system and the second computing system, a respective similarity measure that measures a similarity between: a first training output generated by the first computing system by performing the shared training operation, and a second training output generated by the second computing system by performing the shared training operation.
    Type: Grant
    Filed: May 6, 2019
    Date of Patent: January 17, 2023
    Assignee: Google LLC
    Inventors: Chi Keung Luk, Jose Americo Baiocchi Paredes, Russell Power, Mehmet Deveci
  • Publication number: 20200356905
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for debugging correctness issues in training machine learning models. In one aspect, a method comprises training a first machine learning model using a first computing system having a first configuration; training a second machine learning model using a second computing system having a second configuration, wherein the second configuration of the second computing system is different than the first configuration of the first computing system; and determining, for each of a plurality of shared training operations that are performed by both the first computing system and the second computing system, a respective similarity measure that measures a similarity between: a first training output generated by the first computing system by performing the shared training operation, and a second training output generated by the second computing system by performing the shared training operation.
    Type: Application
    Filed: May 6, 2019
    Publication date: November 12, 2020
    Inventors: Chi Keung Luk, Jose Americo Baiocchi Paredes, Russell Power, Mehmet Deveci
  • Publication number: 20100156888
    Abstract: Embodiments of a system, program product and method are presented to perform automatic partitioning of work between a host processor (e.g., a CPU) and at least one additional heterogeneous processing element (e.g., a GPU) through run-time adaptive mapping. The adaptive mapping may be performed by a dynamic compiler, based on projected execution times predicted by curve fitting based on actual execution times generated during a profile run of the program. Other embodiments are described and claimed.
    Type: Application
    Filed: December 23, 2008
    Publication date: June 24, 2010
    Inventors: Chi-Keung Luk, Paul Geoffrey Lowney
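
The adaptive-mapping idea above can be sketched in a few lines: fit curves to execution times measured during a profile run, then choose the CPU/GPU split whose projected completion time is lowest. The sample timings, the quadratic model, and the exhaustive search over split fractions below are illustrative assumptions, not details from the application.

```python
import numpy as np

# Profile-run samples: (work size, measured execution time) per device.
cpu_samples = [(1000, 0.9), (2000, 1.9), (4000, 4.1), (8000, 8.5)]
gpu_samples = [(1000, 0.4), (2000, 0.5), (4000, 0.8), (8000, 1.3)]

def fit_time_model(samples, degree=2):
    """Fit a polynomial curve to measured times; return a predictor."""
    sizes, times = zip(*samples)
    coeffs = np.polyfit(sizes, times, degree)
    return lambda n: float(np.polyval(coeffs, n))

cpu_time = fit_time_model(cpu_samples)
gpu_time = fit_time_model(gpu_samples)

def best_split(total_work, steps=100):
    """Pick the CPU fraction minimizing the slower device's projected time."""
    best = min(
        (max(cpu_time(f / steps * total_work),
             gpu_time((1 - f / steps) * total_work)), f / steps)
        for f in range(steps + 1)
    )
    return best[1]  # fraction of the work mapped to the CPU

frac = best_split(16000)
print(f"map {frac:.0%} of the work to the CPU, {1 - frac:.0%} to the GPU")
```
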
  • Patent number: 7343602
    Abstract: A processor capable of running multiple threads runs a program in one thread (called the “main” thread) and at least a portion of the same program in another thread (called the “pre-execution” thread). The program in the main thread includes instructions that cause the processor to start and stop pre-execution threads and direct the processor as to which part of the program is to be run through the pre-execution threads. Preferably, such instructions cause the pre-execution thread to run ahead of the main thread in program order. In that way, any cache miss conditions that are encountered by the pre-execution thread are resolved before the main thread requires that same data. Therefore, the main thread should encounter few or no cache miss conditions.
    Type: Grant
    Filed: December 18, 2001
    Date of Patent: March 11, 2008
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Chi-Keung Luk, Joel S. Emer
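
Hardware cache behavior cannot be reproduced from high-level code, so the sketch below only simulates the idea in the abstract above: a helper thread runs the load-generating part of the program ahead of the main thread so that "misses" are resolved early. The software cache and the latency constants are invented for illustration.

```python
import threading
import time

cache = {}                       # stands in for the hardware data cache

def load(addr):
    """Return the data at addr, paying a 'miss' penalty on first access."""
    if addr not in cache:
        time.sleep(0.01)         # simulated cache-miss latency
        cache[addr] = addr * 2
    return cache[addr]

def pre_execution_thread(addrs):
    # Runs only the loads of the program, so it pulls ahead of the main
    # thread and resolves misses before the main thread needs the data.
    for addr in addrs:
        load(addr)

def main_thread(addrs):
    total = 0
    for addr in addrs:
        value = load(addr)       # mostly hits, thanks to the helper
        time.sleep(0.002)        # the rest of the program's work
        total += value
    return total

addrs = list(range(100))
helper = threading.Thread(target=pre_execution_thread, args=(addrs,))
start = time.perf_counter()
helper.start()                   # the "start pre-execution thread" instruction
result = main_thread(addrs)
helper.join()                    # the "stop pre-execution thread" instruction
print(result, f"{time.perf_counter() - start:.2f}s")
```
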
  • Publication number: 20070234307
    Abstract: Methods and apparatus to inline conditional software instrumentation are disclosed. An example method comprises splitting a software instrumentation conditional analysis procedure for an application segment into an unconditional portion and a conditional portion, and inlining the unconditional portion.
    Type: Application
    Filed: March 6, 2006
    Publication date: October 4, 2007
    Inventors: Chi-Keung Luk, Robert Cohn
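
The transformation described above can be shown by hand on a toy example: a range check guarding expensive analysis is the unconditional portion and gets inlined at the instrumentation site, while the rarely taken analysis body stays out of line. The function names and the watched address range below are hypothetical.

```python
# Out-of-line analysis routine: called for EVERY memory access,
# even though the interesting work is guarded by a condition.
def analyze_access_slow(addr, watch_lo, watch_hi, hits):
    if watch_lo <= addr < watch_hi:      # conditional portion
        hits.append(addr)                # expensive analysis work

def run_unoptimized(trace, watch_lo, watch_hi):
    hits = []
    for addr in trace:
        analyze_access_slow(addr, watch_lo, watch_hi, hits)  # call overhead each time
    return hits

# After the transformation: the unconditional portion (the range test) is
# inlined at the instrumentation site; the conditional portion stays out
# of line and is reached only when the test succeeds.
def analyze_hit(addr, hits):             # conditional portion, rarely called
    hits.append(addr)

def run_inlined(trace, watch_lo, watch_hi):
    hits = []
    for addr in trace:
        if watch_lo <= addr < watch_hi:  # inlined unconditional test
            analyze_hit(addr, hits)
    return hits

trace = list(range(1_000_000))
assert run_unoptimized(trace, 10, 20) == run_inlined(trace, 10, 20)
```
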
  • Patent number: 7181723
    Abstract: Methods and an apparatus for stride profiling a software application are disclosed. An example system uses a hardware performance counter to report instruction addresses and data addresses associated with memory access instructions triggered by some event, such as a data cache miss. When the same instruction address is associated with more than one data address, the difference between the two data addresses is recorded. When two or more of these data address differences are recorded for the same instruction, the system determines a stride associated with the instruction to be the greatest common divisor of the two or more differences. This stride may be used by a compiler to optimize data cache prefetching. In addition, any overhead associated with monitoring addresses of data cache misses may be reduced by cycling between an inspection phase and a skipping phase. More data cache misses are monitored during the inspection phase than during the skipping phase.
    Type: Grant
    Filed: May 27, 2003
    Date of Patent: February 20, 2007
    Assignee: Intel Corporation
    Inventors: Chi-Keung Luk, Geoff Lowney
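
The greatest-common-divisor stride computation in the abstract above is straightforward to sketch. The sample (instruction address, data address) pairs below are invented, but the per-instruction GCD logic follows the abstract directly.

```python
from collections import defaultdict
from functools import reduce
from math import gcd

# Samples as reported by a hardware performance counter on some event,
# e.g., a data cache miss: (instruction address, data address) pairs.
samples = [
    (0x400a10, 0x7f0000), (0x400a10, 0x7f0040),
    (0x400a10, 0x7f00c0), (0x400a10, 0x7f0180),
    (0x400b20, 0x801000), (0x400b20, 0x801008),
]

def profile_strides(samples):
    """Estimate a stride per instruction as the GCD of the recorded
    data-address differences, as the abstract describes."""
    last_addr = {}
    diffs = defaultdict(list)
    for pc, addr in samples:
        if pc in last_addr:
            diffs[pc].append(abs(addr - last_addr[pc]))
        last_addr[pc] = addr
    return {
        pc: reduce(gcd, ds)
        for pc, ds in diffs.items()
        if len(ds) >= 2              # need two or more differences
    }

for pc, stride in profile_strides(samples).items():
    print(f"pc {pc:#x}: stride {stride} bytes")
```
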
  • Publication number: 20070006167
    Abstract: In one embodiment, the present invention includes a method for receiving a command to insert instrumentation code into a code segment, analyzing the code segment to determine an optimal location for the instrumentation code within the code segment, and inserting the instrumentation code at the optimal location to generate an instrumented code segment. The instrumented code segment may then be executed and may provide for improved performance over unoptimized instrumented code. Other embodiments are described and claimed.
    Type: Application
    Filed: May 31, 2005
    Publication date: January 4, 2007
    Inventors: Chi-Keung Luk, Ady Tal, Robert Cohn, Jonathan Beimel
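
The abstract above does not say how the "optimal location" is chosen, so the sketch below illustrates one plausible heuristic: insert the instrumentation where the fewest registers are live, minimizing spill code around it. The toy instruction format and the liveness-based cost are assumptions for illustration only.

```python
# Toy "code segment": each instruction lists the registers it reads/writes.
segment = [
    ("mov",   {"reads": set(),        "writes": {"r1"}}),
    ("add",   {"reads": {"r1", "r2"}, "writes": {"r3"}}),
    ("mul",   {"reads": {"r3"},       "writes": {"r4"}}),
    ("store", {"reads": {"r4"},       "writes": set()}),
]

def live_registers_at_each_point(segment):
    """Backward liveness scan; returns the live set before each instruction."""
    live = set()
    points = []
    for _, regs in reversed(segment):
        live = (live - regs["writes"]) | regs["reads"]
        points.append(set(live))
    return list(reversed(points))

def best_insertion_point(segment):
    # Fewest live registers means fewest spills around the instrumentation.
    liveness = live_registers_at_each_point(segment)
    return min(range(len(liveness)), key=lambda i: len(liveness[i]))

idx = best_insertion_point(segment)
print(f"insert instrumentation before instruction {idx} ({segment[idx][0]})")
```
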
  • Publication number: 20050050534
    Abstract: Methods and apparatus to pre-execute instructions on a single thread are disclosed. In an example method, at least one instruction associated with a latency condition is identified. A slice of instructions is identified. The slice of instructions is configured to generate a data address associated with the at least one instruction. At least one instruction slot in the single thread is identified. Code configured to execute the slice of instructions is generated within the at least one instruction slot.
    Type: Application
    Filed: September 2, 2003
    Publication date: March 3, 2005
    Inventors: Chi-Keung Luk, Paul Lowney
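
For a pointer-chasing loop, the slice that generates the next load's address is just the next-pointer dereference, and the sketch below shows it being pre-executed one step early. The prefetch stub stands in for a real prefetch instruction issued from a spare instruction slot; everything here is a conceptual simulation, not the patented mechanism itself.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def prefetch(obj):
    # Stands in for a hardware prefetch issued from an otherwise empty
    # instruction slot; a real implementation would emit a prefetch.
    pass

def sum_list(head):
    total = 0
    node = head
    while node is not None:
        nxt = node.next          # the slice: generates the next load's address
        if nxt is not None:
            prefetch(nxt)        # pre-execute the load from a spare slot
        total += node.value      # the latency-bound load
        node = nxt
    return total

# Build a small list and run the loop.
head = None
for v in reversed(range(10)):
    head = Node(v, head)
print(sum_list(head))
```
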
  • Publication number: 20040243981
    Abstract: Methods and an apparatus for stride profiling a software application are disclosed. An example system uses a hardware performance counter to report instruction addresses and data addresses associated with memory access instructions triggered by some event, such as a data cache miss. When the same instruction address is associated with more than one data address, the difference between the two data addresses is recorded. When two or more of these data address differences are recorded for the same instruction, the system determines a stride associated with the instruction to be the greatest common divisor of the two or more differences. This stride may be used by a compiler to optimize data cache prefetching. In addition, any overhead associated with monitoring addresses of data cache misses may be reduced by cycling between an inspection phase and a skipping phase. More data cache misses are monitored during the inspection phase than during the skipping phase.
    Type: Application
    Filed: May 27, 2003
    Publication date: December 2, 2004
    Inventors: Chi-Keung Luk, Geoff Lowney
  • Publication number: 20030084433
    Abstract: Executable code is modified to include prefetch instructions for certain loads. The targeted loads preferably include those loads for which a compiler cannot compute a stride (which represents the difference in memory addresses used in consecutive executions of a given load). Whether prefetch instructions should be included for such loads is determined preferably by running the code with a training data set which determines the frequency of strides for each subsequent execution of a load. If a stride occurs more than once for a load, then that load is prefetched by inserting a prefetch instruction into the executable code for that load. Further, a stride value is associated with the inserted prefetch. Preferably, the stride value is the most frequently occurring stride, which can be determined based on the results of the training data set. Alternatively, the stride can be computed during run-time by the code itself.
    Type: Application
    Filed: October 31, 2001
    Publication date: May 1, 2003
    Inventors: Chi-Keung Luk, Harish Patil, Robert Muth, Paul Geoffrey Lowney, Robert Cohn, Richard Weiss
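
Choosing the most frequent stride from a training run, as the abstract above describes, reduces to a small histogram computation. The load names and addresses below are invented; the "occurred more than once" test mirrors the abstract.

```python
from collections import Counter

# Addresses observed for each load during the training-data run.
training_addresses = {
    "load@0x4005c0": [0x1000, 0x1040, 0x1080, 0x10c0, 0x2000, 0x2040],
    "load@0x4007a8": [0x9000, 0x9abc, 0x8123],   # no repeating stride
}

def choose_prefetch_strides(training_addresses):
    """Pick, per load, the most frequent stride seen in the training run,
    provided it occurred more than once."""
    chosen = {}
    for load, addrs in training_addresses.items():
        if len(addrs) < 2:
            continue
        strides = Counter(b - a for a, b in zip(addrs, addrs[1:]))
        stride, count = strides.most_common(1)[0]
        if count > 1:                     # stride occurred more than once
            chosen[load] = stride         # insert a prefetch with this stride
    return chosen

for load, stride in choose_prefetch_strides(training_addresses).items():
    print(f"{load}: prefetch addr + {stride:#x}")
```
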
  • Publication number: 20020055964
    Abstract: A processor capable of running multiple threads runs a program in one thread (called the “main” thread) and at least a portion of the same program in another thread (called the “pre-execution” thread). The program in the main thread includes instructions that cause the processor to start and stop pre-execution threads and direct the processor as to which part of the program is to be run through the pre-execution threads. Preferably, such instructions cause the pre-execution thread to run ahead of the main thread in program order. In that way, any cache miss conditions that are encountered by the pre-execution thread are resolved before the main thread requires that same data. Therefore, the main thread should encounter few or no cache miss conditions.
    Type: Application
    Filed: December 18, 2001
    Publication date: May 9, 2002
    Inventors: Chi-Keung Luk, Joel S. Emer