Patents by Inventor Dz Ching Ju

Dz Ching Ju has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240070450
    Abstract: Apparatuses, systems, and techniques to establish a correspondence between at least a plurality of tensors. In at least one embodiment, information is caused to be stored from one of two or more different tensors having one or more variable dimensions, where the one of the two or more different tensors is to represent the two or more different tensors.
    Type: Application
    Filed: August 30, 2022
    Publication date: February 29, 2024
    Inventors: Dz-ching Ju, Rajnish Aggarwal
  • Publication number: 20230229588
    Abstract: Apparatuses, systems, and techniques to transform data in memory for deep learning operations. In at least one embodiment, a compiler inserts one or more data transforms into a software program to transform one or more data elements arbitrarily arranged in memory and improve performance of one or more deep learning operations.
    Type: Application
    Filed: January 14, 2022
    Publication date: July 20, 2023
    Inventors: Rajnish Aggarwal, Dz-ching Ju
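    Illustrative sketch (not from the patent; names are hypothetical): one plausible example of such an inserted data transform is a gather that copies elements scattered in memory into a contiguous buffer, so that a later deep-learning kernel can assume dense, unit-stride input.
    ```cpp
    #include <vector>
    #include <cstddef>

    // Hypothetical example of a compiler-inserted data transform: gather elements
    // that are strided in memory into a contiguous buffer so a subsequent
    // deep-learning operation can assume dense, unit-stride input.
    std::vector<float> gather_contiguous(const float* src, std::size_t count,
                                         std::size_t stride) {
        std::vector<float> dst(count);
        for (std::size_t i = 0; i < count; ++i)
            dst[i] = src[i * stride];   // strided -> contiguous
        return dst;
    }
    ```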
  • Publication number: 20220350683
    Abstract: Apparatuses, systems, and techniques to combine operations. In at least one embodiment, a processor causes two or more dependent reduction operations to be combined into a software kernel.
    Type: Application
    Filed: April 26, 2021
    Publication date: November 3, 2022
    Inventors: Rishi Surendran, Dz-ching Ju
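    Illustrative sketch (not from the patent): one way to read "combining two dependent reduction operations into a software kernel" is computing both reductions in a single pass, for example a sum and a sum of squares that together yield mean and variance, instead of launching two separate kernels.
    ```cpp
    #include <cstddef>

    // Hypothetical illustration: two dependent reductions fused into one traversal.
    // The second result (variance) depends on the first (mean), but by carrying
    // sum and sum-of-squares together, a single pass over the data suffices.
    struct MeanVar { double mean; double variance; };

    MeanVar fused_mean_variance(const double* data, std::size_t n) {
        double sum = 0.0, sumsq = 0.0;
        for (std::size_t i = 0; i < n; ++i) {   // single pass
            sum   += data[i];
            sumsq += data[i] * data[i];
        }
        double mean = sum / static_cast<double>(n);
        return { mean, sumsq / static_cast<double>(n) - mean * mean };
    }
    ```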
  • Publication number: 20220343137
    Abstract: Apparatuses, systems, and techniques to automatically generate a reduced number of compute kernels for performing operations of one or more neural networks. In at least one embodiment, one or more operations of one or more neural network graph nodes of the one or more neural networks are automatically adjusted to generate one or more optimized operations that are compiled to generate the reduced number of compute kernels.
    Type: Application
    Filed: April 22, 2021
    Publication date: October 27, 2022
    Inventors: Rishi Surendran, Dz-ching Ju, Yuan Lin
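    Illustrative sketch (not from the patent; the operation names are hypothetical): adjusting graph-node operations so that fewer kernels are compiled can be pictured as collapsing two element-wise nodes, say an add followed by a ReLU, into one operation that compiles to a single kernel.
    ```cpp
    #include <cstddef>
    #include <algorithm>

    // Hypothetical illustration: two graph nodes (element-wise add, then ReLU)
    // adjusted into one operation so only a single compute kernel is generated.
    void add_relu(const float* x, const float* y, float* out, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            out[i] = std::max(x[i] + y[i], 0.0f);   // fused add + ReLU in one pass
    }
    ```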
  • Publication number: 20220342673
    Abstract: Apparatuses, systems, and techniques to identify instructions for advanced execution. In at least one embodiment, a processor performs one or more instructions that have been identified by a compiler to be speculatively performed in parallel.
    Type: Application
    Filed: April 23, 2021
    Publication date: October 27, 2022
    Inventors: Justin Wang, Dz-ching Ju
  • Publication number: 20220342666
    Abstract: Apparatuses, systems, and techniques to reduce a sequence of operations to an equivalent sequence having a smaller number of operations. In at least one embodiment, a sequence of matrix operations is accelerated by combining operations that reorder a matrix with a matrix multiplication operation.
    Type: Application
    Filed: April 26, 2021
    Publication date: October 27, 2022
    Inventor: Dz-ching Ju
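    Illustrative sketch (not the patented method): a common instance of combining a reordering operation with a matrix multiplication is computing C = transpose(A) * B without materializing the transposed matrix, by folding the transpose into the index arithmetic of the multiply loop.
    ```cpp
    #include <vector>
    #include <cstddef>

    // Hypothetical illustration: fold a transpose into the multiply by swapping
    // the indices used to read A, instead of first materializing transpose(A).
    // A is m x k (row-major), B is m x n, and C = transpose(A) * B is k x n.
    std::vector<double> matmul_at_b(const std::vector<double>& A,
                                    const std::vector<double>& B,
                                    int m, int k, int n) {
        std::vector<double> C(static_cast<std::size_t>(k) * n, 0.0);
        for (int i = 0; i < k; ++i)
            for (int j = 0; j < n; ++j) {
                double acc = 0.0;
                for (int p = 0; p < m; ++p)
                    acc += A[p * k + i] * B[p * n + j];   // A read as if transposed
                C[i * n + j] = acc;
            }
        return C;
    }
    ```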
  • Publication number: 20210192314
    Abstract: Apparatuses, systems, and techniques to implement a recurrent neural network. In at least one embodiment, an application programming interface receives one or more API calls comprising a graph definition and a recurrence attribute, and executes a recurrent neural network based on the graph definition.
    Type: Application
    Filed: December 18, 2019
    Publication date: June 24, 2021
    Inventors: Bastiaan Joannes Matheus Aarts, Xiangyun Kong, Dz-ching Ju, Yuan Lin
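    Illustrative sketch (purely hypothetical; the abstract does not publish an API surface, and none of these type or function names come from the patent): a call that supplies a graph definition together with a recurrence attribute might look roughly like this.
    ```cpp
    // Hypothetical API sketch: a serialized graph plus a recurrence attribute
    // (how many time steps to iterate the graph) handed to a runtime.
    struct GraphDefinition    { const char* serialized_graph; };
    struct RecurrenceAttribute { int time_steps; };

    int executeRecurrentGraph(const GraphDefinition& graph,
                              const RecurrenceAttribute& recurrence) {
        (void)graph;   // placeholder: a real runtime would walk the graph here
        for (int t = 0; t < recurrence.time_steps; ++t) {
            // ... execute one iteration of the graph, feeding state forward ...
        }
        return 0;      // status code
    }
    ```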
  • Patent number: 10019373
    Abstract: A memory management method includes: checking shared virtual memory (SVM) support ability of at least one device participating in data access of a buffer; referring to a checking result to adaptively select an SVM mode; and allocating the buffer in a physical memory region of a memory device, and configuring the buffer to operate in the selected SVM mode.
    Type: Grant
    Filed: August 23, 2015
    Date of Patent: July 10, 2018
    Assignee: MEDIATEK INC.
    Inventors: Dz-Ching Ju, Meng-Bing Yu, Yun-Ching Li
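    Illustrative sketch (not the patented method; all names are invented): the flow in the abstract can be pictured as querying each participating device's SVM capability, picking the strongest mode they all support, and then allocating the buffer in that mode.
    ```cpp
    #include <vector>

    // Hypothetical sketch: choose an SVM mode from the capabilities of every
    // device that will access the buffer, falling back when support is missing.
    enum class SvmMode { None, CoarseGrain, FineGrain };

    struct Device { bool supports_fine_grain; bool supports_coarse_grain; };

    SvmMode select_svm_mode(const std::vector<Device>& devices) {
        SvmMode mode = SvmMode::FineGrain;
        for (const Device& d : devices) {
            if (!d.supports_fine_grain && mode == SvmMode::FineGrain)
                mode = SvmMode::CoarseGrain;
            if (!d.supports_coarse_grain)
                mode = SvmMode::None;   // no shared virtual memory at all
        }
        return mode;
    }
    ```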
  • Patent number: 9921838
    Abstract: A method is presented for processing one or more instructions to be executed on multiple threads in a Single-Instruction-Multiple-Data (SIMD) computing system. The method includes the steps of analyzing the instructions to collect divergent threads among a plurality of thread groups of the multiple threads; obtaining a redirection array for thread-operand association adjustment among the divergent threads according to the analysis, where the redirection array is used for exchanging a first operand associated with a first divergent thread in a first thread group with a second operand associated with a second divergent thread in a second thread group; and generating compiled code corresponding to the instructions according to the redirection array.
    Type: Grant
    Filed: October 2, 2015
    Date of Patent: March 20, 2018
    Assignee: MEDIATEK INC.
    Inventors: Chen-Kang Lo, Shih-Wei Liao, Cheng-Ting Han, Dz-Ching Ju
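    Illustrative sketch (not the patented compiler transformation): the "redirection array" can be pictured as a list of lane pairs whose operands are exchanged so that lanes taking the same branch direction end up grouped, reducing SIMD divergence; the sketch below merely simulates that permutation on the host.
    ```cpp
    #include <vector>
    #include <utility>

    // Hypothetical host-side simulation of a redirection array: each pair names
    // two divergent lanes whose associated operands are exchanged.
    void apply_redirection(std::vector<int>& lane_operands,
                           const std::vector<std::pair<int, int>>& redirection) {
        for (const auto& [a, b] : redirection)
            std::swap(lane_operands[a], lane_operands[b]);
    }
    ```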
  • Publication number: 20170097825
    Abstract: A method is presented for processing one or more instructions to be executed on multiple threads in a Single-Instruction-Multiple-Data (SIMD) computing system. The method includes the steps of analyzing the instructions to collect divergent threads among a plurality of thread groups of the multiple threads; obtaining a redirection array for thread-operand association adjustment among the divergent threads according to the analysis, where the redirection array is used for exchanging a first operand associated with a first divergent thread in a first thread group with a second operand associated with a second divergent thread in a second thread group; and generating compiled code corresponding to the instructions according to the redirection array.
    Type: Application
    Filed: October 2, 2015
    Publication date: April 6, 2017
    Inventors: Chen-Kang LO, Shih-Wei LIAO, Cheng-Ting HAN, Dz-Ching JU
  • Publication number: 20160179686
    Abstract: A memory management method includes: checking shared virtual memory (SVM) support ability of at least one device participating in data access of a buffer; referring to a checking result to adaptively select an SVM mode; and allocating the buffer in a physical memory region of a memory device, and configuring the buffer to operate in the selected SVM mode.
    Type: Application
    Filed: August 23, 2015
    Publication date: June 23, 2016
    Inventors: Dz-Ching Ju, Meng-Bing Yu, Yun-Ching Li
  • Patent number: 9015690
    Abstract: A system and method for optimization of code with non-adjacent loops. A compiler builds a node tree, which is not a control flow graph, that represents parent-child relationships of nodes of a computer program. Each node represents a control flow statement or a straight-line block of statements of the computer program. If a non-adjacent pair of loop nodes satisfies predetermined conditions, the compiler may perform legal code transformations on the computer program and corresponding node transformations on the node tree. These transformations may make this pair of loop nodes adjacent. The compiler may be configured to perform legal code transformations, such as head and tail duplication, code motion, and if-merging, in order to make these two loop nodes adjacent. Then loop fusion may be performed on this loop pair in order to increase instruction level parallelism (ILP) within an optimized version of the original source code.
    Type: Grant
    Filed: August 22, 2009
    Date of Patent: April 21, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mei Ye, Dinesh Suresh, Dz-ching Ju, Michael Lai
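    Illustrative sketch (a hypothetical before/after, not taken from the patent): the transformation can be pictured as moving the code that separates two loops (when legal), making the loops adjacent, and then fusing them.
    ```cpp
    // Before: two loops over the same range separated by unrelated code.
    void before(int* a, int* b, int n, int& unrelated) {
        for (int i = 0; i < n; ++i) a[i] = i;
        unrelated += 1;                        // statement between the loops
        for (int i = 0; i < n; ++i) b[i] = a[i] * 2;
    }

    // After: code motion hoists the intervening statement (legal here), the loops
    // become adjacent, and loop fusion merges them into one body.
    void after(int* a, int* b, int n, int& unrelated) {
        unrelated += 1;
        for (int i = 0; i < n; ++i) {
            a[i] = i;
            b[i] = a[i] * 2;
        }
    }
    ```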
  • Patent number: 8769509
    Abstract: Methods and apparatus for preserving precise exceptions in code reordering by using control speculation are disclosed. A disclosed system uses a control speculation module to reorder instructions within an application program and preserve precise exceptions. Instructions, excepting and non-excepting, can be reordered by the control speculation module if the instructions meet certain conditions. When an excepting instruction is reordered, a check instruction is inserted into the program execution path and a recovery block is generated. The check instruction determines if the reordered excepting instruction actually needs to generate an exception. The recovery block contains instructions to revert the effects of code reordering. If the check instruction detects the need for an exception, the recovery block is executed to restore the architectural state of the processor and the exception is handled.
    Type: Grant
    Filed: November 8, 2007
    Date of Patent: July 1, 2014
    Assignee: Intel Corporation
    Inventor: Dz-ching Ju
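    Illustrative sketch (the patent targets hardware support for speculation, such as speculative loads with check instructions; this C++ analogy only conveys the pattern): a potentially excepting operation is hoisted, a later check tests whether it actually needs to raise, and a recovery path re-executes it non-speculatively.
    ```cpp
    #include <optional>
    #include <stdexcept>

    // Hypothetical analogy: the speculative load records, rather than raises, the
    // fact that it would have faulted; the check either accepts the value or runs
    // the recovery path, which re-executes non-speculatively and raises for real.
    std::optional<int> speculative_load(const int* p) {
        if (p == nullptr) return std::nullopt;   // defer the fault
        return *p;
    }

    int load_with_check(const int* p) {
        std::optional<int> v = speculative_load(p);   // hoisted, possibly excepting
        // ... independent work could be scheduled here, ahead of the check ...
        if (!v) {                                     // check-instruction analog
            if (p == nullptr) throw std::runtime_error("faulting load");  // recovery
            v = *p;
        }
        return *v;
    }
    ```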
  • Patent number: 8683468
    Abstract: A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts that execution of a function call in a program migrates at a given location to a different processor core. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules code after the given location to the second processor core.
    Type: Grant
    Filed: May 16, 2011
    Date of Patent: March 25, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff, Dz-Ching Ju
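    Illustrative sketch (hypothetical; the patent does not publish this structure): the "data structure to support moving live values" can be pictured as a record, filled in at the compiler-chosen migration point, that the OS scheduler hands to the second core so it can resume from that point.
    ```cpp
    #include <cstdint>
    #include <vector>

    // Hypothetical record of live values at a migration point: everything the
    // remainder of the function still needs, packed so a core with a different
    // micro-architecture can resume execution there.
    struct LiveValues {
        std::uint64_t resume_point;         // label/offset where execution resumes
        std::vector<std::uint64_t> scalars; // live scalar values at that point
        void* spill_area;                   // any spilled memory state
    };
    ```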
  • Patent number: 8656409
    Abstract: Systems and methods provide a single reader single writer (SRSW) queue structure having entries that can be concurrently accessed in an atomic manner with a single memory access. The SRSW queues may be combined to create more complicated queues, including multiple reader single writer (MRSW), single reader multiple writer (SRMW), and multiple reader multiple writer (MRMW) queues.
    Type: Grant
    Filed: December 29, 2005
    Date of Patent: February 18, 2014
    Assignee: Intel Corporation
    Inventors: Xiao-Feng Li, Dz-ching Ju
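    Illustrative sketch (a generic single-producer/single-consumer ring buffer, not the patented design; the patent's specific trick of accessing an entry atomically with a single memory access is only approximated here):
    ```cpp
    #include <atomic>
    #include <cstddef>

    // Hypothetical SRSW (single reader, single writer) ring buffer. The head index
    // is advanced only by the reader and the tail only by the writer, so no locks
    // are needed; acquire/release ordering publishes the slots.
    template <typename T, std::size_t N>
    class SrswQueue {
        std::atomic<std::size_t> head_{0};   // advanced by the reader
        std::atomic<std::size_t> tail_{0};   // advanced by the writer
        T slots_[N];
    public:
        bool push(const T& v) {              // single writer
            std::size_t t = tail_.load(std::memory_order_relaxed);
            std::size_t next = (t + 1) % N;
            if (next == head_.load(std::memory_order_acquire)) return false;  // full
            slots_[t] = v;
            tail_.store(next, std::memory_order_release);
            return true;
        }
        bool pop(T& out) {                   // single reader
            std::size_t h = head_.load(std::memory_order_relaxed);
            if (h == tail_.load(std::memory_order_acquire)) return false;     // empty
            out = slots_[h];
            head_.store((h + 1) % N, std::memory_order_release);
            return true;
        }
    };
    ```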
  • Publication number: 20130159685
    Abstract: A function in source code is processed by a compiler for execution on a graphics processing unit, wherein the function includes an exception handling structure. An exception raising block is converted into a first control flow and an exception handler block is converted into a second control flow. The first control flow includes setting an exception raised indicator and finding an exception handler to process the raised exception. The exception raised indicator remains set until an appropriate exception handler is found. The second control flow includes clearing the exception raised indicator and processing the exception.
    Type: Application
    Filed: December 15, 2011
    Publication date: June 20, 2013
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Dz-ching Ju, Norman Rubin, Gang Chen
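    Illustrative sketch (a hypothetical lowering, not the patented pass): because GPU targets generally lack stack unwinding, a raised exception becomes a flag that stays set until a handler block clears it and performs the recovery.
    ```cpp
    // Hypothetical lowering of throw/catch into explicit control flow.
    struct ExceptionState { bool raised = false; int code = 0; };

    int divide_lowered(int a, int b, ExceptionState& ex) {
        int result = 0;
        if (b == 0) {            // "exception raising block": set the indicator
            ex.raised = true;
            ex.code   = 1;
        } else {
            result = a / b;
        }
        if (ex.raised) {         // "exception handler block": clear and handle
            ex.raised = false;
            result = 0;          // handler's recovery action
        }
        return result;
    }
    ```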
  • Publication number: 20130125100
    Abstract: A computer system is provided for compiling program code and a method for compiling program code by a processor. The method, for example, includes, but is not limited to, receiving, by the processor, the program code and compiling, by the processor, the program code, wherein the processor, when compiling the program code, parses the program code and assigns a default address space qualifier to each member function without a defined address space qualifier and, when the member function is used, infers an address space for each default address space qualifier based upon how the respective member function is being used.
    Type: Application
    Filed: November 15, 2011
    Publication date: May 16, 2013
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Bixia Zheng, Benedict R. Gaster, Dz-Ching Ju
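    Illustrative sketch (a loose analogy only; the abstract concerns OpenCL-style address space qualifiers, and this is not OpenCL code): the inference step can be pictured as giving each unqualified member function a "default" qualifier and later resolving it from the address space of the object it is invoked on.
    ```cpp
    #include <string>

    // Hypothetical model of the inference step described in the abstract.
    enum class AddrSpace { Default, Private, Local, Global, Constant };

    struct MemberFn { std::string name; AddrSpace qualifier = AddrSpace::Default; };

    void infer_from_use(MemberFn& fn, AddrSpace object_space) {
        if (fn.qualifier == AddrSpace::Default)
            fn.qualifier = object_space;   // resolve from how the function is used
    }
    ```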
  • Patent number: 8352686
    Abstract: An efficient and effective compiler data prefetching technique is disclosed in which memory accesses that may be prefetched are represented as linear induction expressions. Furthermore, indirect memory accesses indexed by other memory accesses of linear induction expressions in scalar loops may be prefetched.
    Type: Grant
    Filed: March 16, 2010
    Date of Patent: January 8, 2013
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Dz-ching Ju
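    Illustrative sketch (hypothetical; the patent describes a compiler transformation, shown here as source-level hints): the loop below contains both access patterns the abstract mentions, a linear induction access (idx[i]) and an indirect access indexed by it (b[idx[i]]), with prefetches issued a few iterations ahead using the GCC/Clang __builtin_prefetch intrinsic.
    ```cpp
    #include <cstddef>

    // Hypothetical illustration of prefetching a linear access and the indirect
    // access it indexes, a fixed distance ahead of the current iteration.
    double sum_gather(const double* b, const int* idx, std::size_t n) {
        constexpr std::size_t DIST = 16;             // prefetch distance (tunable)
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            if (i + DIST < n) {
                __builtin_prefetch(&idx[i + DIST]);      // linear induction access
                __builtin_prefetch(&b[idx[i + DIST]]);   // indirect access
            }
            sum += b[idx[i]];
        }
        return sum;
    }
    ```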
  • Publication number: 20120297163
    Abstract: A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts that execution of a function call in a program migrates at a given location to a different processor core. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules code after the given location to the second processor core.
    Type: Application
    Filed: May 16, 2011
    Publication date: November 22, 2012
    Inventors: Mauricio Breternitz, Patryk Kaminski, Keith Lowery, Anton Chernoff, Dz-Ching Ju
  • Publication number: 20110047534
    Abstract: A system and method for optimization of code with non-adjacent loops. A compiler builds a node tree, which is not a control flow graph, that represents parent-child relationships of nodes of a computer program. Each node represents a control flow statement or a straight-line block of statements of the computer program. If a non-adjacent pair of loop nodes satisfies predetermined conditions, the compiler may perform legal code transformations on the computer program and corresponding node transformations on the node tree. These transformations may make this pair of loop nodes adjacent. The compiler may be configured to perform legal code transformations, such as head and tail duplication, code motion, and if-merging, in order to make these two loop nodes adjacent. Then loop fusion may be performed on this loop pair in order to increase instruction level parallelism (ILP) within an optimized version of the original source code.
    Type: Application
    Filed: August 22, 2009
    Publication date: February 24, 2011
    Inventors: Mei Ye, Dinesh Suresh, Dz-ching Ju, Michael Lai