Patents by Inventor Liang-Kai Wang
Liang-Kai Wang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20210055931
Abstract: Techniques are disclosed relating to routing circuitry configured to perform permute operations for operands of threads in a single-instruction multiple-data group. In some embodiments, an apparatus includes hierarchical operand routing circuitry configured to route operands between a set of single-instruction multiple-data (SIMD) pipelines based on a permute instruction. In some embodiments, the routing circuitry includes a first level and a second level. The first level may include a set of multiple crossbar circuits each configured to receive operands from a respective subset of the pipelines and output one or more of the received operands on multiple output lines based on the permute instruction, where the crossbar circuits support full permutation within a respective subset.
Type: Application
Filed: August 22, 2019
Publication date: February 25, 2021
Inventors: Robert D. Kenney, Liang-Kai Wang, Terence M. Potter
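A minimal behavioral sketch of the first routing level described above, in Python: each crossbar permutes operands freely within its own subset of lanes. The subset size of 4 and the per-lane source-index encoding are illustrative assumptions, and the second routing level between subsets is not modeled here.

    def first_level_permute(operands, lane_sources, subset_size=4):
        """Model of the first-level crossbars: full permutation within each subset.

        operands[i]     -- the operand held by SIMD lane i
        lane_sources[i] -- which lane, counted within lane i's own subset, lane i
                           reads from (an assumed encoding of the permute instruction)
        """
        out = [None] * len(operands)
        for base in range(0, len(operands), subset_size):
            for lane in range(base, base + subset_size):
                out[lane] = operands[base + (lane_sources[lane] % subset_size)]
        return out

    # Example: rotate operands by one lane within each subset of four lanes.
    print(first_level_permute(list("abcdefgh"), [1, 2, 3, 0] * 2))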
-
Publication number: 20200269091
Abstract: A system for determining an assessment of at least one exercise performed by a user is described. The system includes an input device, and a computing device. The input device is configured to monitor at least one exercise performed by a user. The computing device includes processors and a memory. The memory is coupled to the processors and stores program instructions that when executed by the processors cause the processors to: (1) generate a user interface displaying a content; (2) provide an instruction associated with the at least one exercise; (3) determine an indication of movement associated with the at least one exercise; (4) in response to a determination of the indication of movement, determine an assessment of the at least one exercise; and (5) in response to a determination of the assessment of the at least one exercise, perform an operation.
Type: Application
Filed: February 27, 2019
Publication date: August 27, 2020
Inventors: YAN-FU LIU, PO-JUI HUANG, LIANG-KAI WANG, CHUNG-HSIEN WU, JIAN-LIN CHEN
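The numbered steps above amount to a simple monitor-assess-act loop. The sketch below restates that flow in Python; the ui, sensor, assess, and act interfaces and the demo values are placeholders assumed for illustration, not details from the filing.

    def run_exercise_session(ui, sensor, assess, act):
        """Hypothetical flow matching steps (1)-(5) of the abstract."""
        ui.display("exercise content")           # (1) show content
        ui.instruct("perform the exercise")      # (2) give an instruction
        movement = sensor.detect_movement()      # (3) look for movement
        if movement is not None:                 # (4) assess once movement is seen
            assessment = assess(movement)
            if assessment is not None:           # (5) act on the assessment
                act(assessment)

    class _Demo:
        def display(self, content): print("showing:", content)
        def instruct(self, msg): print("instruction:", msg)
        def detect_movement(self): return {"reps": 10, "form_score": 0.8}

    run_exercise_session(
        ui=_Demo(), sensor=_Demo(),
        assess=lambda m: "good form" if m["form_score"] > 0.7 else "adjust form",
        act=lambda a: print("assessment:", a),
    )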
-
Patent number: 10503473
Abstract: Techniques are disclosed relating to circuitry configured to perform reciprocal-based floating-point division. In some embodiments, floating-point circuitry includes reciprocal circuitry configured to generate a reciprocal of a divisor, multiplication circuitry configured to multiply the reciprocal result by a dividend, and circuitry configured to clear a least significant bit of an integer representation of the multiplication output to generate a modified multiplication output. The floating-point circuitry may be configured to convert the modified multiplication output to a representation using the first precision to generate a division output. In some embodiments, the refinement using the integer representation may provide correctly-rounded subnormal division results. The disclosed techniques may improve accuracy, reduce processing time, and/or reduce instructions needed for floating-point division, with little to no increase in chip area.
Type: Grant
Filed: May 30, 2018
Date of Patent: December 10, 2019
Assignee: Apple Inc.
Inventors: Anthony Y. Tai, Liang-Kai Wang, Luc R. Semeria, Xiao-Long Wu
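A rough software model of the sequence the abstract describes, assuming fp32 operands refined through a double-precision intermediate: take a reciprocal, multiply, clear the low bit of the intermediate's bit pattern, then round back down to fp32. The choice of double as the wider format is an assumption, and this sketch makes no claim about matching the patent's accuracy properties.

    import struct

    def _clear_lsb_f64(x: float) -> float:
        """Clear the least significant bit of the double's integer (bit) representation."""
        bits = struct.unpack('<Q', struct.pack('<d', x))[0]
        return struct.unpack('<d', struct.pack('<Q', bits & ~1))[0]

    def recip_divide_f32(dividend: float, divisor: float) -> float:
        recip = 1.0 / divisor               # reciprocal in the wider (double) format
        product = dividend * recip          # multiply the reciprocal by the dividend
        adjusted = _clear_lsb_f64(product)  # refine via the integer representation
        # Convert back to the narrower fp32 format to produce the division output.
        return struct.unpack('<f', struct.pack('<f', adjusted))[0]

    print(recip_divide_f32(1.0, 3.0))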
-
Patent number: 10481869
Abstract: Techniques are disclosed relating to circuitry configured to perform floating-point operations such as fused multiply-addition (FMA) with multiple paths and power control. In some embodiments, an FMA unit includes a near path and multiple far paths and is configured to select a path based on a determined exponent difference. In some embodiments, the FMA unit is configured to operate portions of non-selected paths in a low power state.
Type: Grant
Filed: November 10, 2017
Date of Patent: November 19, 2019
Assignee: Apple Inc.
Inventors: Liang-Kai Wang, Ting Yu, Yu Sun
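As a sketch of the selection criterion only: the exponent of the product a*b is roughly the sum of the operand exponents, and comparing it with the addend's exponent picks a path. The threshold value and the two-way near/far split below are assumptions; the patent's multiple far paths and the low-power handling of non-selected paths are not modeled.

    import math

    def select_fma_path(a: float, b: float, c: float, near_threshold: int = 2) -> str:
        """Pick 'near' when the product and addend exponents are close, else 'far'."""
        _, ea = math.frexp(a)       # exponent of a (frexp convention)
        _, eb = math.frexp(b)
        _, ec = math.frexp(c)
        exp_diff = (ea + eb) - ec   # approximate exponent difference of a*b versus c
        return "near" if abs(exp_diff) <= near_threshold else "far"

    print(select_fma_path(1.5, 2.0, 3.0))    # close exponents  -> 'near'
    print(select_fma_path(1.5, 2.0, 1e20))   # distant exponents -> 'far'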
-
Patent number: 10387119
Abstract: Techniques are disclosed relating to performing arithmetic operations to generate values for different related threads. In some embodiments, the threads are graphics threads and the values are operand locations. In some embodiments, an apparatus includes circuitry configured to generate results for multiple threads by performing a plurality of arithmetic operations indicated by an instruction. In some embodiments, the instruction specifies: an input value that is common to the multiple threads and, for at least one of the multiple threads, a type value that indicates whether to generate a result for the thread by performing an arithmetic operation based on a first input that is a result of an arithmetic operation from another thread of the multiple threads or to generate a result for the thread using the input value that is common to the multiple threads.
Type: Grant
Filed: September 28, 2018
Date of Patent: August 20, 2019
Assignee: Apple Inc.
Inventors: Liang-Kai Wang, Terence M. Potter, Brian K. Reynolds, Justin Friesenhahn
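A loose reading of the per-thread type value, sketched below: each thread either takes the common input directly or combines the previous thread's result with the common input. The "previous thread" chaining, the addition operator, and the string encoding of the type value are all assumptions made only to show the shape of the computation.

    def per_thread_results(common_input, type_values, op=lambda prev, x: prev + x):
        """type_values[i] == 'chain': combine the prior thread's result with the
        common input; any other value: use the common input directly (assumed)."""
        results = []
        for i, t in enumerate(type_values):
            if t == "chain" and i > 0:
                results.append(op(results[i - 1], common_input))
            else:
                results.append(common_input)
        return results

    # Example: four threads, a common stride of 8, threads 1-3 chain off thread 0.
    print(per_thread_results(8, ["direct", "chain", "chain", "chain"]))  # [8, 16, 24, 32]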
-
Patent number: 10282169
Abstract: Techniques are disclosed relating to floating-point operations with down-conversion. In some embodiments, a floating-point unit is configured to perform fused multiply-addition operations based on first and second different instruction types. In some embodiments, the first instruction type specifies a result in a first floating-point format and the second instruction type specifies fused multiply-addition of input operands in the first floating-point format to generate a result in a second, lower-precision floating-point format. For example, the first format may be a 32-bit format and the second format may be a 16-bit format. In some embodiments, the floating-point unit includes rounding circuitry, exponent circuitry, and/or increment circuitry configured to generate signals for the second instruction type in the same pipeline stage as for the first instruction type. In some embodiments, disclosed techniques may reduce the number of pipeline stages included in the floating-point circuitry.
Type: Grant
Filed: April 6, 2016
Date of Patent: May 7, 2019
Assignee: Apple Inc.
Inventors: Liang-Kai Wang, Terence M. Potter, Andrew M. Havlir, Yu Sun, Nicolas X. Pena, Xiao-Long Wu, Christopher A. Burns
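Behaviorally, the two instruction types differ only in where the single rounding lands: fp32 for the first, fp16 for the second. The sketch below models that with exact rational arithmetic; passing through a Python double before the final conversion is a sketch-level approximation that can in principle double-round, which hardware with a single rounding step would avoid.

    import struct
    from fractions import Fraction

    def _exact_fma(a: float, b: float, c: float) -> float:
        """Exact a*b + c as a rational, then one conversion to double (approximation)."""
        return float(Fraction(a) * Fraction(b) + Fraction(c))

    def fma_to_f32(a: float, b: float, c: float) -> float:
        """First instruction type: result rounded to the 32-bit format."""
        return struct.unpack('<f', struct.pack('<f', _exact_fma(a, b, c)))[0]

    def fma_to_f16(a: float, b: float, c: float) -> float:
        """Second instruction type: same operands, result rounded to the 16-bit format."""
        return struct.unpack('<e', struct.pack('<e', _exact_fma(a, b, c)))[0]

    print(fma_to_f32(1.1, 2.2, 3.3), fma_to_f16(1.1, 2.2, 3.3))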
-
Patent number: 10270434
Abstract: A method and apparatus for saving power in integrated circuits is disclosed. An IC includes functional circuit blocks which are not placed into a sleep mode when idle. A power management circuit may monitor the activity levels of the functional circuit blocks not placed into a sleep mode. When the power management circuit detects that an activity level of one of the non-sleep functional circuit blocks is less than a predefined threshold, it reduces the frequency of a clock signal provided thereto by scheduling only one pulse of that clock signal for every N pulses of the full-frequency clock signal. The remaining N−1 pulses of the clock signal may be inhibited. If a high-priority transaction inbound for the functional circuit block is detected, an inserted pulse of the clock signal may be provided to the functional unit irrespective of when a most recent regular pulse was provided.
Type: Grant
Filed: February 18, 2016
Date of Patent: April 23, 2019
Assignee: Apple Inc.
Inventors: James Wang, Benjiman L. Goodman, Liang-Kai Wang, Robert D. Kenney
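The pulse-thinning behavior can be modeled as passing one pulse out of every N and letting a high-priority arrival force an extra pulse through. N, the cycle indices, and the set-based way of marking high-priority arrivals below are purely illustrative assumptions.

    def thinned_clock(num_cycles: int, n: int, high_priority_cycles=frozenset()):
        """Yield (cycle, pulse_allowed): one pulse per N full-rate cycles, plus an
        inserted pulse whenever a high-priority transaction arrives."""
        for cycle in range(num_cycles):
            regular = (cycle % n == 0)                  # 1 of every N pulses passes
            inserted = cycle in high_priority_cycles    # forced pulse for urgent work
            yield cycle, regular or inserted

    # Example: N = 4, with a high-priority transaction arriving at cycle 6.
    print([c for c, p in thinned_clock(12, 4, {6}) if p])   # [0, 4, 6, 8]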
-
Publication number: 20190034166
Abstract: Techniques are disclosed relating to performing arithmetic operations to generate values for different related threads. In some embodiments, the threads are graphics threads and the values are operand locations. In some embodiments, an apparatus includes circuitry configured to generate results for multiple threads by performing a plurality of arithmetic operations indicated by an instruction. In some embodiments, the instruction specifies: an input value that is common to the multiple threads and, for at least one of the multiple threads, a type value that indicates whether to generate a result for the thread by performing an arithmetic operation based on a first input that is a result of an arithmetic operation from another thread of the multiple threads or to generate a result for the thread using the input value that is common to the multiple threads.
Type: Application
Filed: September 28, 2018
Publication date: January 31, 2019
Inventors: Liang-Kai Wang, Terence M. Potter, Brian K. Reynolds, Justin Friesenhahn
-
Patent number: 10114650
Abstract: Techniques are disclosed relating to handling dependencies between instructions. In one embodiment, an apparatus includes decode circuitry and dependency circuitry. In this embodiment, the decode circuitry is configured to receive an instruction that specifies a destination location and determine a first storage region that includes the destination location. In this embodiment, the storage region is one of a plurality of different storage regions accessible by instructions processed by the apparatus. In this embodiment, the dependency circuitry is configured to stall the instruction until one or more older instructions that specify source locations in the first storage region have read their source locations. The disclosed techniques may be described as "pessimistic" dependency handling, which may, in some instances, maintain performance while limiting complexity, power consumption, and area of dependency logic.
Type: Grant
Filed: February 23, 2015
Date of Patent: October 30, 2018
Assignee: Apple Inc.
Inventors: Robert D. Kenney, Liang-Kai Wang
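In software terms, the check is region-granular rather than address-granular: a new instruction stalls if any older, still-pending instruction has a source in the region containing the new destination, whether or not the exact addresses conflict. The dictionary field names below are assumptions used only for illustration.

    def must_stall(new_dest_region: int, older_instructions) -> bool:
        """Pessimistic region-level check: stall while any older instruction that
        sources the destination's region has not yet read its source locations."""
        return any(
            not instr["sources_read"] and new_dest_region in instr["source_regions"]
            for instr in older_instructions
        )

    older = [
        {"source_regions": {0, 2}, "sources_read": False},
        {"source_regions": {1}, "sources_read": True},
    ]
    print(must_stall(2, older))   # True: an older reader of region 2 is still pending
    print(must_stall(1, older))   # False: the only reader of region 1 already read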
-
Patent number: 10089077
Abstract: Techniques are disclosed relating to performing arithmetic operations to generate values for different related threads. In some embodiments, the threads are graphics threads and the values are operand locations. In some embodiments, an apparatus performs an arithmetic operation using first circuitry, on type value inputs for different threads that are encoded to represent values to be operated on by the first circuitry. In some embodiments, second arithmetic circuitry is configured to perform an arithmetic operation on an output of the first circuitry and an input (e.g., address information such as a base and an offset) that is common to the different threads and has a greater number of bits than the output of the first circuitry. In various embodiments, disclosed techniques may allow decoding of encoded values for different threads (which may reduce memory requirements relative to non-encoded values) with a shorter critical path and lower power consumption, e.g., relative to sequential decoding.
Type: Grant
Filed: January 10, 2017
Date of Patent: October 2, 2018
Assignee: Apple Inc.
Inventors: Liang-Kai Wang, Terence M. Potter, Brian K. Reynolds, Justin Friesenhahn
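The split the abstract describes, narrow per-thread arithmetic on encoded type values first and then one wide add of the shared base and offset, might look like the following; the 2-bit decode and the specific widths are assumptions for illustration only.

    def thread_operand_addresses(base: int, offset: int, encoded_types):
        """First stage: narrow arithmetic on the per-thread encoded type values.
        Second stage: add the wider value common to all threads (base + offset)."""
        per_thread = [t & 0x3 for t in encoded_types]   # assumed 2-bit decode per thread
        common = base + offset                          # wide arithmetic, done once
        return [common + p for p in per_thread]

    print([hex(a) for a in thread_operand_addresses(0x1000, 0x40, [0b00, 0b01, 0b10, 0b11])])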
-
Patent number: 9846579
Abstract: Techniques are disclosed relating to comparison circuitry. In some embodiments, compare circuitry is configured to generate comparison results for sets of inputs in both one or more integer formats and one or more floating-point formats. In some embodiments, the compare circuitry includes padding circuitry configured to add one or more bits to each of first and second input values to generate first and second padded values. In some embodiments, the compare circuitry also includes integer subtraction circuitry configured to subtract the first padded value from the second padded value to generate a subtraction result. In some embodiments, the compare circuitry includes output logic configured to generate the comparison result based on the subtraction result. In various embodiments, using at least a portion of the same circuitry (e.g., the subtractor) for both integer and floating-point comparisons may reduce processor area.
Type: Grant
Filed: June 13, 2016
Date of Patent: December 19, 2017
Assignee: Apple Inc.
Inventors: Liang-Kai Wang, Terence M. Potter, Andrew M. Havlir
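One way a single integer subtractor can serve float compares is to remap each float's bit pattern so that integer ordering matches float ordering, then subtract. The remapping below is the standard sign-flip trick and is only an illustration of the idea, not the padding scheme from the patent; NaNs and the two zeros would need separate handling.

    import struct

    def _ordered_key(x: float) -> int:
        """Map an fp32 bit pattern to an unsigned key whose ordering matches the floats."""
        bits = struct.unpack('<I', struct.pack('<f', x))[0]
        return bits ^ 0xFFFFFFFF if bits & 0x80000000 else bits | 0x80000000

    def compare_floats_with_int_subtract(a: float, b: float) -> int:
        """Return -1, 0, or +1 using only an integer subtraction on the mapped keys."""
        diff = _ordered_key(a) - _ordered_key(b)   # Python's wide ints stand in for the padded subtractor
        return (diff > 0) - (diff < 0)

    print(compare_floats_with_int_subtract(-2.5, 1.0))   # -1
    print(compare_floats_with_int_subtract(3.0, 3.0))    #  0
    print(compare_floats_with_int_subtract(0.5, 0.25))   # +1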
-
Publication number: 20170357506
Abstract: Techniques are disclosed relating to comparison circuitry. In some embodiments, compare circuitry is configured to generate comparison results for sets of inputs in both one or more integer formats and one or more floating-point formats. In some embodiments, the compare circuitry includes padding circuitry configured to add one or more bits to each of first and second input values to generate first and second padded values. In some embodiments, the compare circuitry also includes integer subtraction circuitry configured to subtract the first padded value from the second padded value to generate a subtraction result. In some embodiments, the compare circuitry includes output logic configured to generate the comparison result based on the subtraction result. In various embodiments, using at least a portion of the same circuitry (e.g., the subtractor) for both integer and floating-point comparisons may reduce processor area.
Type: Application
Filed: June 13, 2016
Publication date: December 14, 2017
Inventors: Liang-Kai Wang, Terence M. Potter, Andrew M. Havlir
-
Patent number: 9841948
Abstract: Systems and methods for implementing a floating point fused multiply and accumulate with scaling (FMASc) operation. A floating point unit receives input multiplier, multiplicand, addend, and scaling factor operands. A multiplier block is configured to multiply mantissas of the multiplier and multiplicand to generate an intermediate product. Alignment logic is configured to pre-align the addend with the intermediate product based on the scaling factor and exponents of the addend, multiplier, and multiplicand, and accumulation logic is configured to add or subtract a mantissa of the pre-aligned addend with the intermediate product to obtain a result of the floating point unit. Normalization and rounding are performed on the result, avoiding rounding during intermediate stages.
Type: Grant
Filed: August 12, 2015
Date of Patent: December 12, 2017
Assignee: QUALCOMM Incorporated
Inventor: Liang-Kai Wang
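Numerically, one plausible reading of the operation is a fused multiply-add whose product term is scaled by a power of two before the single final rounding, i.e. round(a*b*2**scale + c). That interpretation of where the scaling factor applies is an assumption, and the sketch rounds once to double precision rather than modeling the hardware's mantissa alignment.

    from fractions import Fraction

    def fma_with_scaling(a: float, b: float, c: float, scale: int) -> float:
        """Assumed semantics: round(a * b * 2**scale + c) with a single final rounding."""
        exact = Fraction(a) * Fraction(b) * Fraction(2) ** scale + Fraction(c)
        return float(exact)   # one correctly-rounded conversion of the exact value

    print(fma_with_scaling(1.5, 3.0, 2.0, scale=-4))   # 1.5*3.0/16 + 2.0 = 2.28125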
-
Publication number: 20170323420
Abstract: Techniques are disclosed relating to low-level instruction storage in a processing unit. In some embodiments, a graphics unit includes execution circuitry, decode circuitry, hazard circuitry, and caching circuitry. In some embodiments, the execution circuitry is configured to execute clauses of graphics instructions. In some embodiments, the decode circuitry is configured to receive graphics instructions and a clause identifier for each received graphics instruction and to decode the received graphics instructions. In some embodiments, the caching circuitry includes a plurality of entries each configured to store a set of decoded instructions in the same clause. A given clause may be fetched and executed multiple times, e.g., for different SIMD groups, while stored in the caching circuitry.
Type: Application
Filed: July 24, 2017
Publication date: November 9, 2017
Inventors: Andrew M. Havlir, Dzung Q. Vu, Liang Kai Wang
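A toy model of the clause-level caching idea: decoded instructions are stored per clause identifier so a clause decoded once can be re-executed for other SIMD groups without re-decoding. The entry count, the FIFO-style eviction, and keying purely by clause id are assumptions for illustration.

    from collections import OrderedDict

    class ClauseCache:
        """Entries hold the decoded instructions of one clause, keyed by clause id."""

        def __init__(self, num_entries: int = 8):
            self.num_entries = num_entries
            self.entries = OrderedDict()     # clause_id -> list of decoded instructions

        def get_clause(self, clause_id, raw_instructions, decode):
            if clause_id not in self.entries:
                if len(self.entries) >= self.num_entries:
                    self.entries.popitem(last=False)        # evict oldest (assumed policy)
                self.entries[clause_id] = [decode(i) for i in raw_instructions]
            return self.entries[clause_id]                  # reused across SIMD groups

    cache = ClauseCache(num_entries=2)
    decoded = cache.get_clause(7, ["fmul r0, r1, r2"], decode=str.upper)
    print(decoded, cache.get_clause(7, [], decode=str.upper))  # second call hits the cache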
-
Publication number: 20170293470
Abstract: Techniques are disclosed relating to floating-point operations with down-conversion. In some embodiments, a floating-point unit is configured to perform fused multiply-addition operations based on first and second different instruction types. In some embodiments, the first instruction type specifies a result in a first floating-point format and the second instruction type specifies fused multiply-addition of input operands in the first floating-point format to generate a result in a second, lower-precision floating-point format. For example, the first format may be a 32-bit format and the second format may be a 16-bit format. In some embodiments, the floating-point unit includes rounding circuitry, exponent circuitry, and/or increment circuitry configured to generate signals for the second instruction type in the same pipeline stage as for the first instruction type. In some embodiments, disclosed techniques may reduce the number of pipeline stages included in the floating-point circuitry.
Type: Application
Filed: April 6, 2016
Publication date: October 12, 2017
Inventors: Liang-Kai Wang, Terence M. Potter, Andrew M. Havlir, Yu Sun, Nicolas X. Pena, Xiao-Long Wu, Christopher A. Burns
-
Patent number: 9785567
Abstract: Techniques are disclosed relating to per-pipeline control for an operand cache. In some embodiments, an apparatus includes a register file and multiple execution pipelines. In some embodiments, the apparatus also includes an operand cache that includes multiple entries that each include multiple portions that are each configured to store an operand for a corresponding execution pipeline. In some embodiments, the operand cache is configured, during operation of the apparatus, to store data in only a subset of the portions of an entry. In some embodiments, the apparatus is configured to store, for each entry in the operand cache, a per-entry validity value that indicates whether the entry is valid and per-portion state information that indicates whether data for each portion is valid and whether data for each portion is modified relative to data in a corresponding entry in the register file.
Type: Grant
Filed: September 11, 2015
Date of Patent: October 10, 2017
Assignee: Apple Inc.
Inventors: Andrew M. Havlir, Terence M. Potter, Liang-Kai Wang
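The state the abstract enumerates maps naturally onto a small record per cache entry: one entry-level valid flag plus per-portion valid and modified (dirty) flags alongside the data. Four portions, one per assumed execution pipeline, is an illustrative choice rather than a figure from the patent.

    from dataclasses import dataclass, field

    NUM_PORTIONS = 4   # assumed number of execution pipelines / portions per entry

    @dataclass
    class OperandCacheEntry:
        tag: int = 0
        entry_valid: bool = False                                                  # per-entry validity
        portion_valid: list = field(default_factory=lambda: [False] * NUM_PORTIONS)
        portion_dirty: list = field(default_factory=lambda: [False] * NUM_PORTIONS)
        data: list = field(default_factory=lambda: [None] * NUM_PORTIONS)

        def write_portion(self, idx: int, value) -> None:
            """Fill one portion; it becomes valid and modified relative to the register file."""
            self.entry_valid = True
            self.portion_valid[idx] = True
            self.portion_dirty[idx] = True
            self.data[idx] = value

    entry = OperandCacheEntry(tag=0x12)
    entry.write_portion(1, 3.25)          # only a subset of the portions holds data
    print(entry.portion_valid, entry.portion_dirty)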
-
Publication number: 20170244391
Abstract: A method and apparatus for saving power in integrated circuits is disclosed. An IC includes functional circuit blocks which are not placed into a sleep mode when idle. A power management circuit may monitor the activity levels of the functional circuit blocks not placed into a sleep mode. When the power management circuit detects that an activity level of one of the non-sleep functional circuit blocks is less than a predefined threshold, it reduces the frequency of a clock signal provided thereto by scheduling only one pulse of that clock signal for every N pulses of the full-frequency clock signal. The remaining N−1 pulses of the clock signal may be inhibited. If a high-priority transaction inbound for the functional circuit block is detected, an inserted pulse of the clock signal may be provided to the functional unit irrespective of when a most recent regular pulse was provided.
Type: Application
Filed: February 18, 2016
Publication date: August 24, 2017
Inventors: James Wang, Benjiman L. Goodman, Liang-Kai Wang, Robert D. Kenney
-
Patent number: 9727944
Abstract: Techniques are disclosed relating to low-level instruction storage in a graphics unit. In some embodiments, a graphics unit includes execution circuitry, decode circuitry, hazard circuitry, and caching circuitry. In some embodiments, the execution circuitry is configured to execute clauses of graphics instructions. In some embodiments, the decode circuitry is configured to receive graphics instructions and a clause identifier for each received graphics instruction and to decode the received graphics instructions. In some embodiments, the hazard circuitry is configured to generate hazard information that specifies dependencies between ones of the decoded graphics instructions in the same clause. In some embodiments, the caching circuitry includes a plurality of entries each configured to store a set of decoded instructions in the same clause and hazard information generated by the decode circuitry for the clause.
Type: Grant
Filed: June 22, 2015
Date of Patent: August 8, 2017
Assignee: Apple Inc.
Inventors: Andrew M. Havlir, Dzung Q. Vu, Liang Kai Wang
-
Publication number: 20170075810
Abstract: Techniques are disclosed relating to per-pipeline control for an operand cache. In some embodiments, an apparatus includes a register file and multiple execution pipelines. In some embodiments, the apparatus also includes an operand cache that includes multiple entries that each include multiple portions that are each configured to store an operand for a corresponding execution pipeline. In some embodiments, the operand cache is configured, during operation of the apparatus, to store data in only a subset of the portions of an entry. In some embodiments, the apparatus is configured to store, for each entry in the operand cache, a per-entry validity value that indicates whether the entry is valid and per-portion state information that indicates whether data for each portion is valid and whether data for each portion is modified relative to data in a corresponding entry in the register file.
Type: Application
Filed: September 11, 2015
Publication date: March 16, 2017
Inventors: Andrew M. Havlir, Terence M. Potter, Liang-Kai Wang
-
Publication number: 20160371810
Abstract: Techniques are disclosed relating to low-level instruction storage in a graphics unit. In some embodiments, a graphics unit includes execution circuitry, decode circuitry, hazard circuitry, and caching circuitry. In some embodiments, the execution circuitry is configured to execute clauses of graphics instructions. In some embodiments, the decode circuitry is configured to receive graphics instructions and a clause identifier for each received graphics instruction and to decode the received graphics instructions. In some embodiments, the hazard circuitry is configured to generate hazard information that specifies dependencies between ones of the decoded graphics instructions in the same clause. In some embodiments, the caching circuitry includes a plurality of entries each configured to store a set of decoded instructions in the same clause and hazard information generated by the decode circuitry for the clause.
Type: Application
Filed: June 22, 2015
Publication date: December 22, 2016
Inventors: Andrew M. Havlir, Dzung Q. Vu, Liang Kai Wang