Patents by Inventor Liang-Kai Wang

Liang-Kai Wang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12008377
    Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline.
    Type: Grant
    Filed: April 12, 2023
    Date of Patent: June 11, 2024
    Assignee: Apple Inc.
    Inventors: Christopher A. Burns, Liang-Kai Wang, Robert D. Kenney, Terence M. Potter
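The operand-routing idea above can be illustrated with a small software model: each lane of a SIMD group holds two register values in thread-specific storage, and the routing step selects a lane's first input either from another lane's first register or from another lane's second register, depending on the instruction. The rotate-by-one pattern, register names, and four-lane group below are assumptions made for illustration, not the circuit claimed in the patent.

```python
# Minimal software model of per-lane operand routing in a SIMD group.
# The routing choices, register names (r1/r2), and the rotate-by-one pattern
# are illustrative assumptions, not the hardware described in the patent.

from dataclasses import dataclass

@dataclass
class Lane:
    r1: int  # value of the first architectural register for this lane's thread
    r2: int  # value of the second architectural register for this lane's thread

def route_first_operand(lanes, use_neighbor_r1):
    """Select each lane's first input operand.

    If use_neighbor_r1 is True, lane i reads r1 from lane (i + 1) % N,
    i.e., a value held in thread-specific storage of another pipeline.
    Otherwise lane i reads r2 from another lane chosen the same way.
    """
    n = len(lanes)
    if use_neighbor_r1:
        return [lanes[(i + 1) % n].r1 for i in range(n)]
    return [lanes[(i + 1) % n].r2 for i in range(n)]

if __name__ == "__main__":
    group = [Lane(r1=10 * i, r2=100 * i) for i in range(4)]
    print(route_first_operand(group, use_neighbor_r1=True))   # [10, 20, 30, 0]
    print(route_first_operand(group, use_neighbor_r1=False))  # [100, 200, 300, 0]
```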
  • Publication number: 20240061650
    Abstract: Techniques are disclosed relating to polynomial approximation of the base-2 logarithm. In some embodiments, floating-point circuitry is configured to perform an approximation of a base-2 logarithm operation and provide a fixed unit of least precision (ULP) error over a range of inputs. In some embodiments, the floating-point circuitry includes a set of parallel pipelines for polynomial approximation, where the output is chosen from a particular pipeline based on a determination of whether the input operand is in a first subset of a range of inputs. Disclosed techniques may advantageously provide fixed ULP error for an entire input operand range for the floating-point base-2 logarithmic function with minimal area and energy footprint, relative to traditional techniques.
    Type: Application
    Filed: August 18, 2022
    Publication date: February 22, 2024
    Inventors: Liang-Kai Wang, Ian R. Ollmann, Anthony Y. Tai
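The abstract describes choosing among parallel polynomial-approximation pipelines based on which subset of the input range the operand falls in. The sketch below mimics that structure in software: the mantissa range [0.5, 1) is split into two subranges, each covered by its own least-squares polynomial, and the result is selected by subrange. The degree, split point, and fitting method are assumptions for illustration; they do not reproduce the patent's coefficients or its ULP guarantee.

```python
# Rough software sketch of range-split polynomial approximation of log2:
# two "pipelines" each hold a polynomial for one subset of the mantissa range,
# and the output is chosen by which subset the input falls in.

import math
import numpy as np

def fit_log2(lo, hi, degree=4, samples=512):
    xs = np.linspace(lo, hi, samples)
    return np.polynomial.Polynomial.fit(xs, np.log2(xs), degree)

# Two polynomial "pipelines" covering the mantissa range [0.5, 1).
POLY_LO = fit_log2(0.5, 0.75)
POLY_HI = fit_log2(0.75, 1.0)

def approx_log2(x):
    if x <= 0.0:
        raise ValueError("log2 undefined for non-positive inputs")
    m, e = math.frexp(x)          # x = m * 2**e, with m in [0.5, 1)
    poly = POLY_LO if m < 0.75 else POLY_HI
    return e + poly(m)

if __name__ == "__main__":
    worst = max(abs(approx_log2(v) - math.log2(v))
                for v in np.linspace(1e-3, 1e3, 10001))
    print(f"max abs error over test points: {worst:.2e}")
```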
  • Publication number: 20240053960
    Abstract: Techniques are disclosed relating to circuitry for floating-point division. In some embodiments, the circuitry is configured to generate a subnormal result for a division operation that divides a numerator by a denominator. The circuitry may include floating-point circuitry configured to perform a reciprocal operation to determine a normalized mantissa value for the reciprocal of a floating-point representation of the denominator. The circuitry may further include fixed-point circuitry configured to multiply a fixed-point representation of the normalized mantissa value for the reciprocal by a mantissa of the numerator to generate an initial value. Control circuitry may determine error data for the initial value and generate a final subnormal mantissa result for the division operation based on the error data and the initial value. Embodiments with multiple modes with different accuracy guarantees are disclosed.
    Type: Application
    Filed: October 18, 2023
    Publication date: February 15, 2024
    Inventors: Liang-Kai Wang, Ian R. Ollmann, Anthony Y. Tai
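The division flow in the abstract has three stages: a normalized reciprocal of the denominator, a fixed-point multiply by the numerator's mantissa, and an error-based correction of the subnormal result. The toy model below follows that shape using Python floats and exact rationals as stand-ins for the hardware datapaths; the operand widths, rounding behavior, and one-ulp correction search are illustrative assumptions.

```python
# Very simplified model of the two-step flow: (1) normalized reciprocal of the
# denominator, (2) multiply by the numerator's mantissa and scale into the
# subnormal range, (3) error check that nudges the candidate result.

import math
from fractions import Fraction

SUBNORMAL_ULP = math.ldexp(1.0, -1074)   # smallest positive binary64 subnormal

def divide_subnormal(n, d):
    recip_m, recip_e = math.frexp(1.0 / d)   # normalized reciprocal mantissa/exponent
    num_m, num_e = math.frexp(n)

    # Multiply the two mantissas (here: a plain float multiply stands in for
    # the fixed-point multiplier), then scale into the subnormal range.
    initial = math.ldexp(recip_m * num_m, recip_e + num_e)

    # Error-based correction: compare candidates against the exact quotient and
    # keep the one with the smallest error, stepping by one subnormal ulp.
    best = initial
    target = Fraction(n) / Fraction(d)
    for cand in (initial - SUBNORMAL_ULP, initial, initial + SUBNORMAL_ULP):
        if cand >= 0 and abs(Fraction(cand) - target) < abs(Fraction(best) - target):
            best = cand
    return best

if __name__ == "__main__":
    n, d = 1e-310, 3.0          # quotient is subnormal in binary64
    q = divide_subnormal(n, d)
    print(q, q == n / d)
```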
  • Patent number: 11836459
    Abstract: Techniques are disclosed relating to circuitry for floating-point division. In some embodiments, the circuitry is configured to generate a subnormal result for a division operation that divides a numerator by a denominator. The circuitry may include floating-point circuitry configured to perform a reciprocal operation to determine a normalized mantissa value for the reciprocal of a floating-point representation of the denominator. The circuitry may further include fixed-point circuitry configured to multiply a fixed-point representation of the normalized mantissa value for the reciprocal by a mantissa of the numerator to generate an initial value. Control circuitry may determine error data for the initial value and generate a final subnormal mantissa result for the division operation based on the error data and the initial value. Embodiments with multiple modes with different accuracy guarantees are disclosed.
    Type: Grant
    Filed: March 30, 2021
    Date of Patent: December 5, 2023
    Assignee: Apple Inc.
    Inventors: Liang-Kai Wang, Ian R. Ollmann, Anthony Y. Tai
  • Publication number: 20230325196
    Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline.
    Type: Application
    Filed: April 12, 2023
    Publication date: October 12, 2023
    Inventors: Christopher A. Burns, Liang-Kai Wang, Robert D. Kenney, Terence M. Potter
  • Patent number: 11727530
    Abstract: Techniques are disclosed relating to low-level instruction storage in a processing unit. In some embodiments, a graphics unit includes execution circuitry, decode circuitry, hazard circuitry, and caching circuitry. In some embodiments, the execution circuitry is configured to execute clauses of graphics instructions. In some embodiments, the decode circuitry is configured to receive graphics instructions and a clause identifier for each received graphics instruction and to decode the received graphics instructions. In some embodiments, the caching circuitry includes a plurality of entries each configured to store a set of decoded instructions in the same clause. A given clause may be fetched and executed multiple times, e.g., for different SIMD groups, while stored in the caching circuitry.
    Type: Grant
    Filed: May 28, 2021
    Date of Patent: August 15, 2023
    Assignee: Apple Inc.
    Inventors: Andrew M. Havlir, Dzung Q. Vu, Liang Kai Wang
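A rough software analogue of the clause-caching idea: decoded instructions are stored per clause identifier, and a clause that is already resident can be fetched and executed repeatedly (for example, for different SIMD groups) without being decoded again. The entry count, LRU eviction, and string-based "decode" step are placeholders, not the organization of the actual graphics unit.

```python
# Toy model of a decoded-instruction clause cache: decoded instructions are
# stored per clause, and a resident clause is reused across SIMD groups
# without re-decoding.

from collections import OrderedDict

class ClauseCache:
    def __init__(self, num_entries=4):
        self.num_entries = num_entries
        self.entries = OrderedDict()          # clause_id -> list of decoded ops
        self.decodes = 0

    def decode(self, clause_id, raw_instructions):
        self.decodes += 1
        return [f"decoded({clause_id}:{insn})" for insn in raw_instructions]

    def fetch(self, clause_id, raw_instructions):
        if clause_id not in self.entries:
            if len(self.entries) >= self.num_entries:
                self.entries.popitem(last=False)      # evict least recently used
            self.entries[clause_id] = self.decode(clause_id, raw_instructions)
        self.entries.move_to_end(clause_id)           # mark as recently used
        return self.entries[clause_id]

if __name__ == "__main__":
    cache = ClauseCache()
    clause = ["fmul r0, r1, r2", "fadd r3, r0, r4"]
    for simd_group in range(3):                       # same clause, three SIMD groups
        cache.fetch(clause_id=7, raw_instructions=clause)
    print("decodes performed:", cache.decodes)        # 1: decoded once, reused twice
```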
  • Patent number: 11645084
    Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline.
    Type: Grant
    Filed: September 9, 2021
    Date of Patent: May 9, 2023
    Assignee: Apple Inc.
    Inventors: Christopher A. Burns, Liang-Kai Wang, Robert D. Kenney, Terence M. Potter
  • Publication number: 20220317971
    Abstract: Techniques are disclosed relating to circuitry for floating-point division. In some embodiments, the circuitry is configured to generate a subnormal result for a division operation that divides a numerator by a denominator. The circuitry may include floating-point circuitry configured to perform a reciprocal operation to determine a normalized mantissa value for the reciprocal of a floating-point representation of the denominator. The circuitry may further include fixed-point circuitry configured to multiply a fixed-point representation of the normalized mantissa value for the reciprocal by a mantissa of the numerator to generate an initial value. Control circuitry may determine error data for the initial value and generate a final subnormal mantissa result for the division operation based on the error data and the initial value. Embodiments with multiple modes with different accuracy guarantees are disclosed.
    Type: Application
    Filed: March 30, 2021
    Publication date: October 6, 2022
    Inventors: Liang-Kai Wang, Ian R. Ollmann, Anthony Y. Tai
  • Patent number: 11439871
    Abstract: A system for determining an assessment of at least one exercise performed by a user is described. The system includes an input device, and a computing device. The input device is configured to monitor at least one exercise performed by a user. The computing device includes processors and a memory. The memory is coupled to the processors and stores program instructions that when executed by the processors cause the processors to: (1) generate a user interface displaying a content; (2) provide an instruction associated with the at least one exercise; (3) determine an indication of movement associated with the at least one exercise; (4) in response to a determination of the indication of movement, determine an assessment of the at least one exercise; and (5) in response to a determination of the assessment of the at least one exercise, perform an operation.
    Type: Grant
    Filed: February 27, 2019
    Date of Patent: September 13, 2022
    Assignee: Conzian Ltd.
    Inventors: Yan-Fu Liu, Po-Jui Huang, Liang-Kai Wang, Chung-Hsien Wu, Jian-Lin Chen
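The claimed flow is a loop of instructing the user, detecting movement, assessing the exercise, and acting on the assessment. The sketch below walks through that loop with invented specifics (a knee-angle metric, a repetition-quality threshold, and printed feedback); none of these particulars come from the patent.

```python
# Minimal flow sketch of the assessment loop: instruct, detect movement,
# assess, then perform an operation in response to the assessment.

from dataclasses import dataclass

@dataclass
class MovementSample:
    joint_angle_deg: float    # e.g., knee angle reported by the input device

def assess_exercise(samples, target_angle_deg=90.0, tolerance_deg=10.0):
    """Score the fraction of repetitions within tolerance of the target angle."""
    good = sum(abs(s.joint_angle_deg - target_angle_deg) <= tolerance_deg
               for s in samples)
    return good / len(samples) if samples else 0.0

def run_session(samples):
    print("Instruction: perform a squat to roughly 90 degrees of knee flexion.")
    score = assess_exercise(samples)
    # Perform an operation in response to the assessment (here: user feedback).
    print("Good form" if score >= 0.8 else "Needs adjustment", f"({score:.0%})")

if __name__ == "__main__":
    run_session([MovementSample(88.0), MovementSample(95.0), MovementSample(120.0)])
```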
  • Patent number: 11372621
    Abstract: Techniques are disclosed relating to floating-point circuitry configured to perform a corner check instruction for a floating-point power operation. In some embodiments, the power operation is performed by executing multiple instructions, including one or more instructions that specify generation of an initial power result of a first input raised to the power of a second input as 2^(second input * log2(first input)). In some embodiments, the corner check instruction operates on the first and second inputs and outputs a corrected power result based on detection of a corner condition for the first and second inputs. Corner check circuitry may share circuits with other datapaths. In various embodiments, the disclosed techniques may reduce code size and power consumption for the power operation.
    Type: Grant
    Filed: June 4, 2020
    Date of Patent: June 28, 2022
    Assignee: Apple Inc.
    Inventors: Anthony Y. Tai, Liang-Kai Wang, Ian R. Ollmann, Anand Poovekurussi
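The structure described above computes an initial result as 2^(y * log2(x)) and then applies a separate corner check that overrides it for special inputs. The sketch below follows that split; the list of corner cases is drawn from the standard IEEE-754 pow special cases and is illustrative, not the patent's exact condition set.

```python
# Sketch of the "initial result then corner check" structure for pow(x, y).

import math

def initial_pow(x, y):
    return 2.0 ** (y * math.log2(x))   # only valid for x > 0, finite y

def corner_check(x, y, initial):
    if math.isnan(x) or math.isnan(y):
        return 1.0 if x == 1.0 or y == 0.0 else math.nan
    if y == 0.0:
        return 1.0
    if x == 0.0:
        return math.inf if y < 0 else 0.0
    if x < 0.0:
        if y != int(y):
            return math.nan                      # negative base, non-integer power
        mag = 2.0 ** (y * math.log2(-x))
        return -mag if int(y) % 2 else mag       # sign follows exponent parity
    return initial

def power(x, y):
    init = initial_pow(x, y) if x > 0.0 else math.nan
    return corner_check(x, y, init)

if __name__ == "__main__":
    for x, y in [(2.0, 10.0), (0.0, -1.0), (-2.0, 3.0), (math.nan, 0.0)]:
        print(x, y, "->", power(x, y))
```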
  • Patent number: 11294672
    Abstract: Techniques are disclosed relating to routing circuitry configured to perform permute operations for operands of threads in a single-instruction multiple-data group. In some embodiments, an apparatus includes hierarchical operand routing circuitry configured to route operands between a set of single-instruction multiple-data (SIMD) pipelines based on a permute instruction. In some embodiments, the routing circuitry includes a first level and a second level. The first level may include a set of multiple crossbar circuits each configured to receive operands from a respective subset of the pipelines and output one or more of the received operands on multiple output lines based on the permute instruction, where the crossbar circuits support full permutation within a respective subset.
    Type: Grant
    Filed: August 22, 2019
    Date of Patent: April 5, 2022
    Assignee: Apple Inc.
    Inventors: Robert D. Kenney, Liang-Kai Wang, Terence M. Potter
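The two-level routing can be modeled in software as follows: lanes are grouped into quads, a first-level crossbar performs a full permutation within each source quad, and a second level chooses which quad's output feeds each destination lane. The quad size and the exact stage decomposition are assumptions for illustration, not the patent's wiring.

```python
# Toy model of two-level operand permutation across SIMD lanes.

QUAD = 4

def permute(values, src_index):
    """values[i] is lane i's operand; src_index[i] says which lane feeds lane i."""
    n = len(values)
    assert n % QUAD == 0 and len(src_index) == n

    # Level 1: each quad produces, for every destination lane, the element of
    # that quad which the destination wants (full permutation within the quad).
    level1 = [[values[q * QUAD + (src_index[dst] % QUAD)] for dst in range(n)]
              for q in range(n // QUAD)]

    # Level 2: each destination lane selects the output of its source quad.
    return [level1[src_index[dst] // QUAD][dst] for dst in range(n)]

if __name__ == "__main__":
    vals = list(range(100, 108))            # 8 lanes, two quads
    idx = [7, 6, 5, 4, 3, 2, 1, 0]          # full reversal across quads
    print(permute(vals, idx))               # [107, 106, 105, 104, 103, 102, 101, 100]
```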
  • Patent number: 11256518
    Abstract: Techniques are disclosed relating to sharing operands among SIMD threads for a larger arithmetic operation. In some embodiments, a set of multiple hardware pipelines is configured to execute single-instruction multiple-data (SIMD) instructions for multiple threads in parallel, where ones of the hardware pipelines include execution circuitry configured to perform floating-point operations using one or more pipeline stages of the pipeline and first routing circuitry configured to select, from among thread-specific operands stored for the hardware pipeline and from one or more other pipelines in the set, a first input operand for an operation by the execution circuitry. In some embodiments, a device is configured to perform a mathematical operation on source input data structures stored across thread-specific storage for the set of hardware pipelines, by executing multiple SIMD floating-point operations using the execution circuitry and the first routing circuitry.
    Type: Grant
    Filed: October 9, 2019
    Date of Patent: February 22, 2022
    Assignee: Apple Inc.
    Inventors: Liang-Kai Wang, Robert D. Kenney, Terence M. Potter, Vinod Reddy Nalamalapu, Sivayya V. Ayinala
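One way to picture a larger arithmetic operation built from SIMD floating-point operations plus operand routing is a small matrix multiply in which each lane owns one column of the accumulator and, at each step, consumes an operand shared from another lane's thread-specific storage. The lane assignment and broadcast pattern below are illustrative assumptions, not the claimed datapath.

```python
# Sketch of composing a 4x4 matrix multiply from per-lane multiply-adds plus
# operand sharing: lane j owns column j, and at step k every lane reads B[k][j]
# routed from the lane that holds it.

def simd_matmul(a, b):
    """Multiply 4x4 matrices; lane j owns column j of the result."""
    n = 4
    acc = [[0.0] * n for _ in range(n)]          # acc[i][j]; column j lives in lane j
    for k in range(n):
        for j in range(n):                       # each lane j in the SIMD group
            shared = b[k][j]                     # operand routed from another lane's storage
            for i in range(n):
                acc[i][j] += a[i][k] * shared    # per-lane multiply-add
    return acc

if __name__ == "__main__":
    A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    I = [[float(i == j) for j in range(4)] for i in range(4)]
    print(simd_matmul(A, I) == [[float(x) for x in row] for row in A])   # True
```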
  • Publication number: 20210406031
    Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline.
    Type: Application
    Filed: September 9, 2021
    Publication date: December 30, 2021
    Inventors: Christopher A. Burns, Liang-Kai Wang, Robert D. Kenney, Terence M. Potter
  • Patent number: 11210761
    Abstract: Techniques are disclosed relating to selecting a number of candidates based on priority. In some embodiments, position determination circuitry receives an input vector that orders a set of potential candidates from a highest-priority position within the input vector to a lowest-priority position. In some embodiments, it determines, starting from a first end of the input vector and based on non-overlapping groups of candidates, a particular position within the input vector at which a threshold number of available candidates are found. This may include generating respective count values within the groups of candidates, identifying a transition group in which the particular position is located based on accumulation of the respective count values, and identifying the particular position within the transition group. Output circuitry may generate, based on the particular position, an output vector that indicates the threshold number of available candidates from the input vector.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: December 28, 2021
    Assignee: Apple Inc.
    Inventors: Liang-Kai Wang, Vinod Reddy Nalamalapu
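The selection scheme can be mirrored in software: count available candidates within fixed-size, non-overlapping groups, accumulate the counts to find the transition group where the running total reaches the threshold, locate the exact position inside that group, and emit an output vector marking the selected candidates. The group size below is an illustrative assumption.

```python
# Software model of the group-count search: per-group counts, accumulation to
# find the transition group, then an in-group scan for the exact position.

GROUP = 4

def select_candidates(avail, threshold):
    """avail[0] is the highest-priority candidate; returns (position, out_mask)."""
    assert threshold >= 1
    groups = [avail[i:i + GROUP] for i in range(0, len(avail), GROUP)]
    counts = [sum(g) for g in groups]                    # per-group available counts

    running = 0
    for g_idx, count in enumerate(counts):               # find the transition group
        if running + count >= threshold:
            need = threshold - running
            for offset, bit in enumerate(groups[g_idx]): # locate position within it
                if bit:
                    need -= 1
                    if need == 0:
                        pos = g_idx * GROUP + offset
                        out = [avail[i] if i <= pos else 0 for i in range(len(avail))]
                        return pos, out
        running += count
    return None, [0] * len(avail)                        # fewer than threshold available

if __name__ == "__main__":
    available = [0, 1, 1, 0, 1, 0, 1, 1]
    print(select_candidates(available, threshold=3))     # (4, [0, 1, 1, 0, 1, 0, 0, 0])
```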
  • Publication number: 20210382687
    Abstract: Techniques are disclosed relating to floating-point circuitry configured to perform a corner check instruction for a floating-point power operation. In some embodiments, the power operation is performed by executing multiple instructions, including one or more instructions that specify generation of an initial power result of a first input raised to the power of a second input as 2^(second input * log2(first input)). In some embodiments, the corner check instruction operates on the first and second inputs and outputs a corrected power result based on detection of a corner condition for the first and second inputs. Corner check circuitry may share circuits with other datapaths. In various embodiments, the disclosed techniques may reduce code size and power consumption for the power operation.
    Type: Application
    Filed: June 4, 2020
    Publication date: December 9, 2021
    Inventors: Anthony Y. Tai, Liang-Kai Wang, Ian R. Ollmann, Anand Poovekurussi
  • Publication number: 20210358078
    Abstract: Techniques are disclosed relating to low-level instruction storage in a processing unit. In some embodiments, a graphics unit includes execution circuitry, decode circuitry, hazard circuitry, and caching circuitry. In some embodiments, the execution circuitry is configured to execute clauses of graphics instructions. In some embodiments, the decode circuitry is configured to receive graphics instructions and a clause identifier for each received graphics instruction and to decode the received graphics instructions. In some embodiments, the caching circuitry includes a plurality of entries each configured to store a set of decoded instructions in the same clause. A given clause may be fetched and executed multiple times, e.g., for different SIMD groups, while stored in the caching circuitry.
    Type: Application
    Filed: May 28, 2021
    Publication date: November 18, 2021
    Inventors: Andrew M. Havlir, Dzung Q. Vu, Liang Kai Wang
  • Patent number: 11126439
    Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline.
    Type: Grant
    Filed: November 15, 2019
    Date of Patent: September 21, 2021
    Assignee: Apple Inc.
    Inventors: Christopher A. Burns, Liang-Kai Wang, Robert D. Kenney, Terence M. Potter
  • Patent number: 11023997
    Abstract: Techniques are disclosed relating to low-level instruction storage in a processing unit. In some embodiments, a graphics unit includes execution circuitry, decode circuitry, hazard circuitry, and caching circuitry. In some embodiments, the execution circuitry is configured to execute clauses of graphics instructions. In some embodiments, the decode circuitry is configured to receive graphics instructions and a clause identifier for each received graphics instruction and to decode the received graphics instructions. In some embodiments, the caching circuitry includes a plurality of entries each configured to store a set of decoded instructions in the same clause. A given clause may be fetched and executed multiple times, e.g., for different SIMD groups, while stored in the caching circuitry.
    Type: Grant
    Filed: July 24, 2017
    Date of Patent: June 1, 2021
    Assignee: Apple Inc.
    Inventors: Andrew M. Havlir, Dzung Q. Vu, Liang Kai Wang
  • Publication number: 20210149679
    Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline.
    Type: Application
    Filed: November 15, 2019
    Publication date: May 20, 2021
    Inventors: Christopher A. Burns, Liang-Kai Wang, Robert D. Kenney, Terence M. Potter
  • Publication number: 20210109761
    Abstract: Techniques are disclosed relating to sharing operands among SIMD threads for a larger arithmetic operation. In some embodiments, a set of multiple hardware pipelines is configured to execute single-instruction multiple-data (SIMD) instructions for multiple threads in parallel, where ones of the hardware pipelines include execution circuitry configured to perform floating-point operations using one or more pipeline stages of the pipeline and first routing circuitry configured to select, from among thread-specific operands stored for the hardware pipeline and from one or more other pipelines in the set, a first input operand for an operation by the execution circuitry. In some embodiments, a device is configured to perform a mathematical operation on source input data structures stored across thread-specific storage for the set of hardware pipelines, by executing multiple SIMD floating-point operations using the execution circuitry and the first routing circuitry.
    Type: Application
    Filed: October 9, 2019
    Publication date: April 15, 2021
    Inventors: Liang-Kai Wang, Robert D. Kenney, Terence M. Potter, Vinod Reddy Nalamalapu, Sivayya V. Ayinala