Patents by Inventor Dz Ching Ju

Dz Ching Ju has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240070450
    Abstract: Apparatuses, systems, and techniques to establish a correspondence between at least a plurality of tensors. In at least one embodiment, information is caused to be stored from one of two or more different tensors having one or more variable dimensions, where the one of the two or more different tensors is to represent the two or more different tensors.
    Type: Application
    Filed: August 30, 2022
    Publication date: February 29, 2024
    Inventors: Dz-ching Ju, Rajnish Aggarwal
  • Publication number: 20230229588
    Abstract: Apparatuses, systems, and techniques to transform data in memory for deep learning operations. In at least one embodiment, a compiler inserts one or more data transforms into a software program to transform one or more data elements arbitrarily arranged in memory and improve performance of one or more deep learning operations.
    Type: Application
    Filed: January 14, 2022
    Publication date: July 20, 2023
    Inventors: Rajnish Aggarwal, Dz-ching Ju
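    Illustrative sketch (not from the patent; names are hypothetical): one plausible example of such an inserted data transform is a gather that copies elements scattered in memory into a contiguous buffer, so that a later deep-learning kernel can assume dense, unit-stride input.
    ```cpp
    #include <vector>
    #include <cstddef>

    // Hypothetical example of a compiler-inserted data transform: gather elements
    // that are strided in memory into a contiguous buffer so a subsequent
    // deep-learning operation can assume dense, unit-stride input.
    std::vector<float> gather_contiguous(const float* src, std::size_t count,
                                         std::size_t stride) {
        std::vector<float> dst(count);
        for (std::size_t i = 0; i < count; ++i)
            dst[i] = src[i * stride];   // strided -> contiguous
        return dst;
    }
    ```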
  • Publication number: 20220350683
    Abstract: Apparatuses, systems, and techniques to combine operations. In at least one embodiment, a processor causes two or more dependent reduction operations to be combined into a software kernel.
    Type: Application
    Filed: April 26, 2021
    Publication date: November 3, 2022
    Inventors: Rishi Surendran, Dz-ching Ju
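    Illustrative sketch (not from the patent): one way to read "combining two dependent reduction operations into a software kernel" is computing both reductions in a single pass, for example a sum and a sum of squares that together yield mean and variance, instead of launching two separate kernels.
    ```cpp
    #include <cstddef>

    // Hypothetical illustration: two dependent reductions fused into one traversal.
    // The second result (variance) depends on the first (mean), but by carrying
    // sum and sum-of-squares together, a single pass over the data suffices.
    struct MeanVar { double mean; double variance; };

    MeanVar fused_mean_variance(const double* data, std::size_t n) {
        double sum = 0.0, sumsq = 0.0;
        for (std::size_t i = 0; i < n; ++i) {   // single pass
            sum   += data[i];
            sumsq += data[i] * data[i];
        }
        double mean = sum / static_cast<double>(n);
        return { mean, sumsq / static_cast<double>(n) - mean * mean };
    }
    ```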
  • Publication number: 20220343137
    Abstract: Apparatuses, systems, and techniques to automatically generate a reduced number of compute kernels for performing operations of one or more neural networks. In at least one embodiment, one or more operations of one or more neural network graph nodes of the one or more neural networks are automatically adjusted to generate one or more optimized operations that are compiled to generate the reduced number of compute kernels.
    Type: Application
    Filed: April 22, 2021
    Publication date: October 27, 2022
    Inventors: Rishi Surendran, Dz-ching Ju, Yuan Lin
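    Illustrative sketch (not from the patent; the operation names are hypothetical): adjusting graph-node operations so that fewer kernels are compiled can be pictured as collapsing two element-wise nodes, say an add followed by a ReLU, into one operation that compiles to a single kernel.
    ```cpp
    #include <cstddef>
    #include <algorithm>

    // Hypothetical illustration: two graph nodes (element-wise add, then ReLU)
    // adjusted into one operation so only a single compute kernel is generated.
    void add_relu(const float* x, const float* y, float* out, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            out[i] = std::max(x[i] + y[i], 0.0f);   // fused add + ReLU in one pass
    }
    ```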
  • Publication number: 20220342673
    Abstract: Apparatuses, systems, and techniques to identify instructions for advanced execution. In at least one embodiment, a processor performs one or more instructions that have been identified by a compiler to be speculatively performed in parallel.
    Type: Application
    Filed: April 23, 2021
    Publication date: October 27, 2022
    Inventors: Justin Wang, Dz-ching Ju
  • Publication number: 20220342666
    Abstract: Apparatuses, systems, and techniques to reduce a sequence of operations to an equivalent sequence having a smaller number of operations. In at least one embodiment, a sequence of matrix operations is accelerated by combining operations that reorder a matrix with a matrix multiplication operation.
    Type: Application
    Filed: April 26, 2021
    Publication date: October 27, 2022
    Inventor: Dz-ching Ju
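    Illustrative sketch (not the patented method): a common instance of combining a reordering operation with a matrix multiplication is computing C = transpose(A) * B without materializing the transposed matrix, by folding the transpose into the index arithmetic of the multiply loop.
    ```cpp
    #include <vector>
    #include <cstddef>

    // Hypothetical illustration: fold a transpose into the multiply by swapping
    // the indices used to read A, instead of first materializing transpose(A).
    // A is m x k (row-major), B is m x n, and C = transpose(A) * B is k x n.
    std::vector<double> matmul_at_b(const std::vector<double>& A,
                                    const std::vector<double>& B,
                                    int m, int k, int n) {
        std::vector<double> C(static_cast<std::size_t>(k) * n, 0.0);
        for (int i = 0; i < k; ++i)
            for (int j = 0; j < n; ++j) {
                double acc = 0.0;
                for (int p = 0; p < m; ++p)
                    acc += A[p * k + i] * B[p * n + j];   // A read as if transposed
                C[i * n + j] = acc;
            }
        return C;
    }
    ```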
  • Publication number: 20210192314
    Abstract: Apparatuses, systems, and techniques to implement a recurrent neural network. In at least one embodiment, an application programming interface receives one or more API calls comprising a graph definition and a recurrence attribute, and executes a recurrent neural network based on the graph definition.
    Type: Application
    Filed: December 18, 2019
    Publication date: June 24, 2021
    Inventors: Bastiaan Joannes Matheus Aarts, Xiangyun Kong, Dz-ching Ju, Yuan Lin
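    Illustrative sketch (purely hypothetical; the abstract does not publish an API surface, and none of these type or function names come from the patent): a call that supplies a graph definition together with a recurrence attribute might look roughly like this.
    ```cpp
    // Hypothetical API sketch: a serialized graph plus a recurrence attribute
    // (how many time steps to iterate the graph) handed to a runtime.
    struct GraphDefinition    { const char* serialized_graph; };
    struct RecurrenceAttribute { int time_steps; };

    int executeRecurrentGraph(const GraphDefinition& graph,
                              const RecurrenceAttribute& recurrence) {
        (void)graph;   // placeholder: a real runtime would walk the graph here
        for (int t = 0; t < recurrence.time_steps; ++t) {
            // ... execute one iteration of the graph, feeding state forward ...
        }
        return 0;      // status code
    }
    ```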
  • Patent number: 10019373
    Abstract: A memory management method includes: checking shared virtual memory (SVM) support ability of at least one device participating in data access of a buffer; referring to a checking result to adaptively select an SVM mode; and allocating the buffer in a physical memory region of a memory device, and configuring the buffer to operate in the selected SVM mode.
    Type: Grant
    Filed: August 23, 2015
    Date of Patent: July 10, 2018
    Assignee: MEDIATEK INC.
    Inventors: Dz-Ching Ju, Meng-Bing Yu, Yun-Ching Li
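    Illustrative sketch (not the patented method; all names are invented): the flow in the abstract can be pictured as querying each participating device's SVM capability, picking the strongest mode they all support, and then allocating the buffer in that mode.
    ```cpp
    #include <vector>

    // Hypothetical sketch: choose an SVM mode from the capabilities of every
    // device that will access the buffer, falling back when support is missing.
    enum class SvmMode { None, CoarseGrain, FineGrain };

    struct Device { bool supports_fine_grain; bool supports_coarse_grain; };

    SvmMode select_svm_mode(const std::vector<Device>& devices) {
        SvmMode mode = SvmMode::FineGrain;
        for (const Device& d : devices) {
            if (!d.supports_fine_grain && mode == SvmMode::FineGrain)
                mode = SvmMode::CoarseGrain;
            if (!d.supports_coarse_grain)
                mode = SvmMode::None;   // no shared virtual memory at all
        }
        return mode;
    }
    ```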
  • Patent number: 9921838
    Abstract: A method is presented for processing one or more instructions to be executed on multiple threads in a Single-Instruction-Multiple-Data (SIMD) computing system. The method includes the steps of analyzing the instructions to collect divergent threads among a plurality of thread groups of the multiple threads; obtaining a redirection array for thread-operand association adjustment among the divergent threads according to the analysis, where the redirection array is used for exchanging a first operand associated with a first divergent thread in a first thread group with a second operand associated with a second divergent thread in a second thread group; and generating compiled code corresponding to the instructions according to the redirection array.
    Type: Grant
    Filed: October 2, 2015
    Date of Patent: March 20, 2018
    Assignee: MEDIATEK INC.
    Inventors: Chen-Kang Lo, Shih-Wei Liao, Cheng-Ting Han, Dz-Ching Ju
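    Illustrative sketch (not the patented compiler transformation): the "redirection array" can be pictured as a list of lane pairs whose operands are exchanged so that lanes taking the same branch direction end up grouped, reducing SIMD divergence; the sketch below merely simulates that permutation on the host.
    ```cpp
    #include <vector>
    #include <utility>

    // Hypothetical host-side simulation of a redirection array: each pair names
    // two divergent lanes whose associated operands are exchanged.
    void apply_redirection(std::vector<int>& lane_operands,
                           const std::vector<std::pair<int, int>>& redirection) {
        for (const auto& [a, b] : redirection)
            std::swap(lane_operands[a], lane_operands[b]);
    }
    ```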
  • Publication number: 20170097825
    Abstract: A method is presented for processing one or more instructions to be executed on multiple threads in a Single-Instruction-Multiple-Data (SIMD) computing system. The method includes the steps of analyzing the instructions to collect divergent threads among a plurality of thread groups of the multiple threads; obtaining a redirection array for thread-operand association adjustment among the divergent threads according to the analysis, where the redirection array is used for exchanging a first operand associated with a first divergent thread in a first thread group with a second operand associated with a second divergent thread in a second thread group; and generating compiled code corresponding to the instructions according to the redirection array.
    Type: Application
    Filed: October 2, 2015
    Publication date: April 6, 2017
    Inventors: Chen-Kang LO, Shih-Wei LIAO, Cheng-Ting HAN, Dz-Ching JU
  • Publication number: 20160179686
    Abstract: A memory management method includes: checking shared virtual memory (SVM) support ability of at least one device participating in data access of a buffer; referring to a checking result to adaptively select an SVM mode; and allocating the buffer in a physical memory region of a memory device, and configuring the buffer to operate in the selected SVM mode.
    Type: Application
    Filed: August 23, 2015
    Publication date: June 23, 2016
    Inventors: Dz-Ching Ju, Meng-Bing Yu, Yun-Ching Li
  • Patent number: 9015690
    Abstract: A system and method for optimization of code with non-adjacent loops. A compiler builds a node tree, which is not a control flow graph, that represents parent-child relationships of nodes of a computer program. Each node represents a control flow statement or a straight-line block of statements of the computer program. If a non-adjacent pair of loop nodes satisfies predetermined conditions, the compiler may perform legal code transformations on the computer program and corresponding node transformations on the node tree. These transformations may make this pair of loop nodes adjacent. The compiler may be configured to perform legal code transformations, such as head and tail duplication, code motion, and if-merging, in order to make these two loop nodes adjacent. Then loop fusion may be performed on this loop pair in order to increase instruction level parallelism (ILP) within an optimized version of the original source code.
    Type: Grant
    Filed: August 22, 2009
    Date of Patent: April 21, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mei Ye, Dinesh Suresh, Dz-ching Ju, Michael Lai
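    Illustrative sketch (a hypothetical before/after, not taken from the patent): the transformation can be pictured as moving the code that separates two loops (when legal), making the loops adjacent, and then fusing them.
    ```cpp
    // Before: two loops over the same range separated by unrelated code.
    void before(int* a, int* b, int n, int& unrelated) {
        for (int i = 0; i < n; ++i) a[i] = i;
        unrelated += 1;                        // statement between the loops
        for (int i = 0; i < n; ++i) b[i] = a[i] * 2;
    }

    // After: code motion hoists the intervening statement (legal here), the loops
    // become adjacent, and loop fusion merges them into one body.
    void after(int* a, int* b, int n, int& unrelated) {
        unrelated += 1;
        for (int i = 0; i < n; ++i) {
            a[i] = i;
            b[i] = a[i] * 2;
        }
    }
    ```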
  • Patent number: 8769509
    Abstract: Methods and apparatus for preserving precise exceptions in code reordering by using control speculation are disclosed. A disclosed system uses a control speculation module to reorder instructions within an application program and preserve precise exceptions. Instructions, excepting and non-excepting, can be reordered by the control speculation module if the instructions meet certain conditions. When an excepting instruction is reordered, a check instruction is inserted into the program execution path and a recovery block is generated. The check instruction determines if the reordered excepting instruction actually needs to generate an exception. The recovery block contains instructions to revert the effects of code reordering. If the check instruction detects the need for an exception, the recovery block is executed to restore the architectural state of the processor and the exception is handled.
    Type: Grant
    Filed: November 8, 2007
    Date of Patent: July 1, 2014
    Assignee: Intel Corporation
    Inventor: Dz-ching Ju
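    Illustrative sketch (the patent targets hardware support for speculation, such as speculative loads with check instructions; this C++ analogy only conveys the pattern): a potentially excepting operation is hoisted, a later check tests whether it actually needs to raise, and a recovery path re-executes it non-speculatively.
    ```cpp
    #include <optional>
    #include <stdexcept>

    // Hypothetical analogy: the speculative load records, rather than raises, the
    // fact that it would have faulted; the check either accepts the value or runs
    // the recovery path, which re-executes non-speculatively and raises for real.
    std::optional<int> speculative_load(const int* p) {
        if (p == nullptr) return std::nullopt;   // defer the fault
        return *p;
    }

    int load_with_check(const int* p) {
        std::optional<int> v = speculative_load(p);   // hoisted, possibly excepting
        // ... independent work could be scheduled here, ahead of the check ...
        if (!v) {                                     // check-instruction analog
            if (p == nullptr) throw std::runtime_error("faulting load");  // recovery
            v = *p;
        }
        return *v;
    }
    ```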
  • Patent number: 8683468
    Abstract: A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts that execution of a function call in a program migrates at a given location to a different processor core. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules code after the given location to the second processor core.
    Type: Grant
    Filed: May 16, 2011
    Date of Patent: March 25, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff, Dz-Ching Ju
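    Illustrative sketch (hypothetical; the patent does not publish this structure): the "data structure to support moving live values" can be pictured as a record, filled in at the compiler-chosen migration point, that the OS scheduler hands to the second core so it can resume from that point.
    ```cpp
    #include <cstdint>
    #include <vector>

    // Hypothetical record of live values at a migration point: everything the
    // remainder of the function still needs, packed so a core with a different
    // micro-architecture can resume execution there.
    struct LiveValues {
        std::uint64_t resume_point;         // label/offset where execution resumes
        std::vector<std::uint64_t> scalars; // live scalar values at that point
        void* spill_area;                   // any spilled memory state
    };
    ```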
  • Patent number: 8656409
    Abstract: Systems and methods provide a single reader single writer (SRSW) queue structure having entries that can be concurrently accessed in an atomic manner with a single memory access. The SRSW queues may be combined to create more complicated queues, including multiple reader single writer (MRSW), single reader multiple writer (SRMW), and multiple reader multiple writer (MRMW) queues.
    Type: Grant
    Filed: December 29, 2005
    Date of Patent: February 18, 2014
    Assignee: Intel Corporation
    Inventors: Xiao-Feng Li, Dz-ching Ju
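    Illustrative sketch (a generic single-producer/single-consumer ring buffer, not the patented design; the patent's specific trick of accessing an entry atomically with a single memory access is only approximated here):
    ```cpp
    #include <atomic>
    #include <cstddef>

    // Hypothetical SRSW (single reader, single writer) ring buffer. The head index
    // is advanced only by the reader and the tail only by the writer, so no locks
    // are needed; acquire/release ordering publishes the slots.
    template <typename T, std::size_t N>
    class SrswQueue {
        std::atomic<std::size_t> head_{0};   // advanced by the reader
        std::atomic<std::size_t> tail_{0};   // advanced by the writer
        T slots_[N];
    public:
        bool push(const T& v) {              // single writer
            std::size_t t = tail_.load(std::memory_order_relaxed);
            std::size_t next = (t + 1) % N;
            if (next == head_.load(std::memory_order_acquire)) return false;  // full
            slots_[t] = v;
            tail_.store(next, std::memory_order_release);
            return true;
        }
        bool pop(T& out) {                   // single reader
            std::size_t h = head_.load(std::memory_order_relaxed);
            if (h == tail_.load(std::memory_order_acquire)) return false;     // empty
            out = slots_[h];
            head_.store((h + 1) % N, std::memory_order_release);
            return true;
        }
    };
    ```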
  • Publication number: 20130159685
    Abstract: A function in source code is processed by a compiler for execution on a graphics processing unit, wherein the function includes an exception handling structure. An exception raising block is converted into a first control flow and an exception handler block is converted into a second control flow. The first control flow includes setting an exception raised indicator and finding an exception handler to process the raised exception. The exception raised indicator remains set until an appropriate exception handler is found. The second control flow includes clearing the exception raised indicator and processing the exception.
    Type: Application
    Filed: December 15, 2011
    Publication date: June 20, 2013
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Dz-ching Ju, Norman Rubin, Gang Chen
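    Illustrative sketch (a hypothetical lowering, not the patented pass): because GPU targets generally lack stack unwinding, a raised exception becomes a flag that stays set until a handler block clears it and performs the recovery.
    ```cpp
    // Hypothetical lowering of throw/catch into explicit control flow.
    struct ExceptionState { bool raised = false; int code = 0; };

    int divide_lowered(int a, int b, ExceptionState& ex) {
        int result = 0;
        if (b == 0) {            // "exception raising block": set the indicator
            ex.raised = true;
            ex.code   = 1;
        } else {
            result = a / b;
        }
        if (ex.raised) {         // "exception handler block": clear and handle
            ex.raised = false;
            result = 0;          // handler's recovery action
        }
        return result;
    }
    ```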
  • Publication number: 20130125100
    Abstract: A computer system is provided for compiling program code and a method for compiling program code by a processor. The method, for example, includes, but is not limited to, receiving, by the processor, the program code and compiling, by the processor, the program code, wherein the processor, when compiling the program code, parses the program code and assigns a default address space qualifier to each member function without a defined address space qualifier and, when the member function is used, infers an address space for each default address space qualifier based upon how the respective member function is being used.
    Type: Application
    Filed: November 15, 2011
    Publication date: May 16, 2013
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Bixia Zheng, Benedict R. Gaster, Dz-Ching Ju
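    Illustrative sketch (a loose analogy only; the abstract concerns OpenCL-style address space qualifiers, and this is not OpenCL code): the inference step can be pictured as giving each unqualified member function a "default" qualifier and later resolving it from the address space of the object it is invoked on.
    ```cpp
    #include <string>

    // Hypothetical model of the inference step described in the abstract.
    enum class AddrSpace { Default, Private, Local, Global, Constant };

    struct MemberFn { std::string name; AddrSpace qualifier = AddrSpace::Default; };

    void infer_from_use(MemberFn& fn, AddrSpace object_space) {
        if (fn.qualifier == AddrSpace::Default)
            fn.qualifier = object_space;   // resolve from how the function is used
    }
    ```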
  • Patent number: 8352686
    Abstract: An efficient and effective compiler data prefetching technique is disclosed in which memory accesses that may be prefetched are represented as linear induction expressions. Furthermore, indirect memory accesses indexed by other memory accesses of linear induction expressions in scalar loops may be prefetched.
    Type: Grant
    Filed: March 16, 2010
    Date of Patent: January 8, 2013
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Dz-ching Ju
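    Illustrative sketch (hypothetical; the patent describes a compiler transformation, shown here as source-level hints): the loop below contains both access patterns the abstract mentions, a linear induction access (idx[i]) and an indirect access indexed by it (b[idx[i]]), with prefetches issued a few iterations ahead using the GCC/Clang __builtin_prefetch intrinsic.
    ```cpp
    #include <cstddef>

    // Hypothetical illustration of prefetching a linear access and the indirect
    // access it indexes, a fixed distance ahead of the current iteration.
    double sum_gather(const double* b, const int* idx, std::size_t n) {
        constexpr std::size_t DIST = 16;             // prefetch distance (tunable)
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            if (i + DIST < n) {
                __builtin_prefetch(&idx[i + DIST]);      // linear induction access
                __builtin_prefetch(&b[idx[i + DIST]]);   // indirect access
            }
            sum += b[idx[i]];
        }
        return sum;
    }
    ```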
  • Publication number: 20120297163
    Abstract: A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts that execution of a function call in a program migrates at a given location to a different processor core. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules code after the given location to the second processor core.
    Type: Application
    Filed: May 16, 2011
    Publication date: November 22, 2012
    Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff, Dz-Ching Ju
  • Publication number: 20110047534
    Abstract: A system and method for optimization of code with non-adjacent loops. A compiler builds a node tree, which is not a control flow graph, that represents parent-child relationships of nodes of a computer program. Each node represents a control flow statement or a straight-line block of statements of the computer program. If a non-adjacent pair of loop nodes satisfies predetermined conditions, the compiler may perform legal code transformations on the computer program and corresponding node transformations on the node tree. These transformations may make this pair of loop nodes adjacent. The compiler may be configured to perform legal code transformations, such as head and tail duplication, code motion, and if-merging, in order to make these two loop nodes adjacent. Then loop fusion may be performed on this loop pair in order to increase instruction level parallelism (ILP) within an optimized version of the original source code.
    Type: Application
    Filed: August 22, 2009
    Publication date: February 24, 2011
    Inventors: Mei Ye, Dinesh Suresh, Dz-ching Ju, Michael Lai