Patents by Inventor Liang-Kai Wang
Liang-Kai Wang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12008377
Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline.
Type: Grant
Filed: April 12, 2023
Date of Patent: June 11, 2024
Assignee: Apple Inc.
Inventors: Christopher A. Burns, Liang-Kai Wang, Robert D. Kenney, Terence M. Potter
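The routing idea in this abstract can be pictured with a small software model; this is a sketch only, and the mode names `rotate_up` and `broadcast`, the register-file layout, and the lane count are assumptions, not the patented circuit.

```python
# Minimal software model of operand routing between SIMD lanes.
# Each "pipeline" (lane) holds thread-specific register storage; an
# instruction names two architectural registers (r_a, r_b), and the
# routing step selects lane i's first input operand from another lane's
# copy of one of those registers. Mode names here are illustrative only.

def route_operands(reg_file, r_a, r_b, mode, src_lane=0):
    """reg_file[lane][reg] -> value in that lane's thread-specific storage."""
    n = len(reg_file)
    selected = []
    for lane in range(n):
        if mode == "rotate_up":           # take r_a from the neighboring lane
            selected.append(reg_file[(lane + 1) % n][r_a])
        elif mode == "broadcast":         # take r_b from one designated lane
            selected.append(reg_file[src_lane][r_b])
        else:                             # default: the lane's own r_a
            selected.append(reg_file[lane][r_a])
    return selected

# Example: 4 lanes, registers "r0" and "r1" per lane.
lanes = [{"r0": 10 * i, "r1": 100 + i} for i in range(4)]
print(route_operands(lanes, "r0", "r1", "rotate_up"))     # [10, 20, 30, 0]
print(route_operands(lanes, "r0", "r1", "broadcast", 2))  # [102, 102, 102, 102]
```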
-
Publication number: 20240061650
Abstract: Techniques are disclosed relating to polynomial approximation of the base-2 logarithm. In some embodiments, floating-point circuitry is configured to perform an approximation of a base-2 logarithm operation and provide a fixed unit of least precision (ULP) error over a range of inputs. In some embodiments, the floating-point circuitry includes a set of parallel pipelines for polynomial approximation, where the output is chosen from a particular pipeline based on a determination of whether the input operand is in a first subset of a range of inputs. Disclosed techniques may advantageously provide fixed ULP error for an entire input operand range for the floating-point base-2 logarithmic function with minimal area and energy footprint, relative to traditional techniques.
Type: Application
Filed: August 18, 2022
Publication date: February 22, 2024
Inventors: Liang-Kai Wang, Ian R. Ollmann, Anthony Y. Tai
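A minimal sketch of the general approach, base-2 log via range reduction plus a polynomial on the mantissa, is shown below. The least-squares fit and polynomial degree are placeholders, and the parallel-pipeline selection described in the abstract is not modeled.

```python
# Illustrative model of base-2 log: split x into exponent and mantissa,
# then approximate log2(mantissa) with a polynomial on [1, 2).
import math
import numpy as np

# Fit a degree-5 polynomial to log2(m) for m in [1, 2); a real design would
# use precomputed minimax coefficients rather than a runtime fit.
_m = np.linspace(1.0, 2.0, 512)
_coeffs = np.polyfit(_m, np.log2(_m), 5)

def log2_approx(x: float) -> float:
    """Approximate log2(x) for x > 0 as exponent + poly(mantissa)."""
    m, e = math.frexp(x)        # x = m * 2**e with m in [0.5, 1)
    m, e = m * 2.0, e - 1       # renormalize so m is in [1, 2)
    return e + float(np.polyval(_coeffs, m))

print(log2_approx(10.0), math.log2(10.0))  # close, but not bit-exact
```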
-
Publication number: 20240053960
Abstract: Techniques are disclosed relating to circuitry for floating-point division. In some embodiments, the circuitry is configured to generate a subnormal result for a division operation that divides a numerator by a denominator. The circuitry may include floating-point circuitry configured to perform a reciprocal operation to determine a normalized mantissa value for the reciprocal of a floating-point representation of the denominator. The circuitry may further include fixed-point circuitry configured to multiply a fixed-point representation of the normalized mantissa value for the reciprocal by a mantissa of the numerator to generate an initial value. Control circuitry may determine error data for the initial value and generate a final subnormal mantissa result for the division operation based on the error data and the initial value. Embodiments with multiple modes with different accuracy guarantees are disclosed.
Type: Application
Filed: October 18, 2023
Publication date: February 15, 2024
Inventors: Liang-Kai Wang, Ian R. Ollmann, Anthony Y. Tai
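As a rough illustration of the reciprocal-then-multiply flow, here is a simplified integer model; the field widths, the truncation-based error correction, and the final shift toward the subnormal range are assumptions made for the sketch, not the patented design.

```python
# Simplified integer model of a quotient computed from a reciprocal.
# Mantissas are unsigned integers with `bits` fractional bits and values
# in [1, 2); exponent handling, rounding modes, and the multiple accuracy
# modes mentioned in the abstract are omitted.
def div_via_reciprocal(num_mant, den_mant, out_shift, bits=24):
    one = 1 << bits
    assert one <= num_mant < 2 * one and one <= den_mant < 2 * one
    # Normalized fixed-point reciprocal of the denominator (truncated).
    recip = (1 << (2 * bits)) // den_mant
    # Initial quotient mantissa from a fixed-point multiply (truncated).
    q = (num_mant * recip) >> bits
    # Error check: nudge q up while the product still does not exceed num.
    while (q + 1) * den_mant <= (num_mant << bits):
        q += 1
    # Denormalize: shift right toward the subnormal range, round to nearest.
    return (q + (1 << (out_shift - 1))) >> out_shift if out_shift else q

bits = 24
# 1.5 / 1.25 = 1.2; with out_shift=0 the returned mantissa encodes ~1.2.
print(div_via_reciprocal(int(1.5 * 2**bits), int(1.25 * 2**bits), 0) / 2**bits)
```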
-
Patent number: 11836459
Abstract: Techniques are disclosed relating to circuitry for floating-point division. In some embodiments, the circuitry is configured to generate a subnormal result for a division operation that divides a numerator by a denominator. The circuitry may include floating-point circuitry configured to perform a reciprocal operation to determine a normalized mantissa value for the reciprocal of a floating-point representation of the denominator. The circuitry may further include fixed-point circuitry configured to multiply a fixed-point representation of the normalized mantissa value for the reciprocal by a mantissa of the numerator to generate an initial value. Control circuitry may determine error data for the initial value and generate a final subnormal mantissa result for the division operation based on the error data and the initial value. Embodiments with multiple modes with different accuracy guarantees are disclosed.
Type: Grant
Filed: March 30, 2021
Date of Patent: December 5, 2023
Assignee: Apple Inc.
Inventors: Liang-Kai Wang, Ian R. Ollmann, Anthony Y. Tai
-
Publication number: 20230325196
Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline.
Type: Application
Filed: April 12, 2023
Publication date: October 12, 2023
Inventors: Christopher A. Burns, Liang-Kai Wang, Robert D. Kenney, Terence M. Potter
-
Patent number: 11727530
Abstract: Techniques are disclosed relating to low-level instruction storage in a processing unit. In some embodiments, a graphics unit includes execution circuitry, decode circuitry, hazard circuitry, and caching circuitry. In some embodiments, the execution circuitry is configured to execute clauses of graphics instructions. In some embodiments, the decode circuitry is configured to receive graphics instructions and a clause identifier for each received graphics instruction and to decode the received graphics instructions. In some embodiments, the caching circuitry includes a plurality of entries each configured to store a set of decoded instructions in the same clause. A given clause may be fetched and executed multiple times, e.g., for different SIMD groups, while stored in the caching circuitry.
Type: Grant
Filed: May 28, 2021
Date of Patent: August 15, 2023
Assignee: Apple Inc.
Inventors: Andrew M. Havlir, Dzung Q. Vu, Liang Kai Wang
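The clause-caching behavior can be sketched in software roughly as follows; the entry count, the FIFO replacement policy, and the placeholder decode function are assumptions of this model, not details from the patent.

```python
# Toy model of caching decoded instructions per clause: decode a clause
# once, then reuse the stored decode when later SIMD groups execute the
# same clause. The decode function and replacement policy are placeholders.
class ClauseCache:
    def __init__(self, num_entries=8):
        self.num_entries = num_entries
        self.entries = {}                 # clause_id -> list of decoded ops
        self.order = []                   # simple FIFO replacement

    def get_or_decode(self, clause_id, raw_instructions, decode):
        if clause_id in self.entries:
            return self.entries[clause_id]        # hit: reuse the decoded clause
        if len(self.order) == self.num_entries:   # miss: evict the oldest entry
            self.entries.pop(self.order.pop(0))
        decoded = [decode(insn) for insn in raw_instructions]
        self.entries[clause_id] = decoded
        self.order.append(clause_id)
        return decoded

cache = ClauseCache()
decode = lambda insn: ("decoded", insn)
for simd_group in range(3):               # same clause executed by 3 SIMD groups
    ops = cache.get_or_decode(clause_id=7, raw_instructions=["mul", "add"], decode=decode)
```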
-
Patent number: 11645084
Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline.
Type: Grant
Filed: September 9, 2021
Date of Patent: May 9, 2023
Assignee: Apple Inc.
Inventors: Christopher A. Burns, Liang-Kai Wang, Robert D. Kenney, Terence M. Potter
-
Publication number: 20220317971
Abstract: Techniques are disclosed relating to circuitry for floating-point division. In some embodiments, the circuitry is configured to generate a subnormal result for a division operation that divides a numerator by a denominator. The circuitry may include floating-point circuitry configured to perform a reciprocal operation to determine a normalized mantissa value for the reciprocal of a floating-point representation of the denominator. The circuitry may further include fixed-point circuitry configured to multiply a fixed-point representation of the normalized mantissa value for the reciprocal by a mantissa of the numerator to generate an initial value. Control circuitry may determine error data for the initial value and generate a final subnormal mantissa result for the division operation based on the error data and the initial value. Embodiments with multiple modes with different accuracy guarantees are disclosed.
Type: Application
Filed: March 30, 2021
Publication date: October 6, 2022
Inventors: Liang-Kai Wang, Ian R. Ollmann, Anthony Y. Tai
-
Patent number: 11439871
Abstract: A system for determining an assessment of at least one exercise performed by a user is described. The system includes an input device and a computing device. The input device is configured to monitor at least one exercise performed by a user. The computing device includes processors and a memory. The memory is coupled to the processors and stores program instructions that, when executed by the processors, cause the processors to: (1) generate a user interface displaying a content; (2) provide an instruction associated with the at least one exercise; (3) determine an indication of movement associated with the at least one exercise; (4) in response to a determination of the indication of movement, determine an assessment of the at least one exercise; and (5) in response to a determination of the assessment of the at least one exercise, perform an operation.
Type: Grant
Filed: February 27, 2019
Date of Patent: September 13, 2022
Assignee: Conzian Ltd.
Inventors: Yan-Fu Liu, Po-Jui Huang, Liang-Kai Wang, Chung-Hsien Wu, Jian-Lin Chen
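A skeleton of the described flow (show content, give an instruction, detect movement, assess, then act on the assessment) might look like the following; the sensor interface, the thresholding rule, and the scoring are placeholders, not the claimed method.

```python
# Skeleton of steps (1)-(5) from the abstract with placeholder logic.
import random

def run_exercise_session(read_sensor, show, threshold=0.5):
    show("content: demonstration video")           # (1) display content
    show("instruction: raise both arms slowly")    # (2) provide an instruction
    samples = [read_sensor() for _ in range(100)]  # (3) monitor for movement
    moved = max(samples) - min(samples) > threshold
    if moved:                                      # (4) assess the exercise
        score = sum(samples) / len(samples)
        show(f"assessment score: {score:.2f}")     # (5) perform an operation
    else:
        show("no movement detected; please try again")

run_exercise_session(read_sensor=lambda: random.random(), show=print)
```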
-
Patent number: 11372621
Abstract: Techniques are disclosed relating to floating-point circuitry configured to perform a corner check instruction for a floating-point power operation. In some embodiments, the power operation is performed by executing multiple instructions, including one or more instructions that specify generating an initial power result of a first input raised to the power of a second input as 2^(second input * log2(first input)). In some embodiments, the corner check instruction operates on the first and second inputs and outputs a corrected power result based on detection of a corner condition for the first and second inputs. Corner check circuitry may share circuits with other datapaths. In various embodiments, the disclosed techniques may reduce code size and power consumption for the power operation.
Type: Grant
Filed: June 4, 2020
Date of Patent: June 28, 2022
Assignee: Apple Inc.
Inventors: Anthony Y. Tai, Liang-Kai Wang, Ian R. Ollmann, Anand Poovekurussi
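The two-part structure, a fast path that computes 2^(y * log2(x)) plus a separate corner check that overrides it for special inputs, can be modeled roughly as below. The corner rules shown follow common libm pow conventions and are not necessarily the exact set of conditions the circuitry detects.

```python
# Fast path plus corner check for pow(x, y), as a software illustration.
import math

def initial_pow(x, y):
    return 2.0 ** (y * math.log2(x))            # only valid for x > 0

def corner_check(x, y, initial):
    if y == 0.0:
        return 1.0                              # pow(x, 0) == 1, even for NaN x in libm
    if math.isnan(x) or math.isnan(y):
        return math.nan
    if x == 0.0:
        return math.inf if y < 0 else 0.0
    if x < 0.0:
        if y == int(y):                         # negative base, integral exponent
            return -initial_pow(-x, y) if int(y) % 2 else initial_pow(-x, y)
        return math.nan
    return initial                              # no corner hit: keep the fast-path result

x, y = -2.0, 3.0                                # fast path alone cannot handle x < 0
print(corner_check(x, y, initial=float("nan")))  # -8.0 via the corner path
```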
-
Patent number: 11294672
Abstract: Techniques are disclosed relating to routing circuitry configured to perform permute operations for operands of threads in a single-instruction multiple-data group. In some embodiments, an apparatus includes hierarchical operand routing circuitry configured to route operands between a set of single-instruction multiple-data (SIMD) pipelines based on a permute instruction. In some embodiments, the routing circuitry includes a first level and a second level. The first level may include a set of multiple crossbar circuits each configured to receive operands from a respective subset of the pipelines and output one or more of the received operands on multiple output lines based on the permute instruction, where the crossbar circuits support full permutation within a respective subset.
Type: Grant
Filed: August 22, 2019
Date of Patent: April 5, 2022
Assignee: Apple Inc.
Inventors: Robert D. Kenney, Liang-Kai Wang, Terence M. Potter
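A software approximation of the two-level routing might look like this; how the second level combines the per-group crossbar outputs is an assumption of the sketch, since the abstract describes only the first level in detail.

```python
# Two-level routing sketch for a lane permute: level one is a full crossbar
# inside each group of `group` lanes; level two picks, for every destination
# lane, which group's crossbar output to take.
def permute(values, pattern, group=4):
    """values[i] is lane i's operand; pattern[i] is the source lane for lane i."""
    n = len(values)
    assert n % group == 0 and all(0 <= s < n for s in pattern)
    # Level 1: each group's crossbar produces, for every destination lane,
    # the requested value if the source lane lives inside that group.
    level1 = []
    for g in range(n // group):
        lanes = range(g * group, (g + 1) * group)
        level1.append([values[pattern[d]] if pattern[d] in lanes else None
                       for d in range(n)])
    # Level 2: each destination lane selects the group that actually had it.
    return [next(col[d] for col in level1 if col[d] is not None) for d in range(n)]

print(permute(list("abcdefgh"), [7, 0, 0, 2, 5, 5, 1, 3]))
# ['h', 'a', 'a', 'c', 'f', 'f', 'b', 'd']
```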
-
Patent number: 11256518
Abstract: Techniques are disclosed relating to sharing operands among SIMD threads for a larger arithmetic operation. In some embodiments, a set of multiple hardware pipelines is configured to execute single-instruction multiple-data (SIMD) instructions for multiple threads in parallel, where ones of the hardware pipelines include execution circuitry configured to perform floating-point operations using one or more pipeline stages of the pipeline and first routing circuitry configured to select, from among thread-specific operands stored for the hardware pipeline and from one or more other pipelines in the set, a first input operand for an operation by the execution circuitry. In some embodiments, a device is configured to perform a mathematical operation on source input data structures stored across thread-specific storage for the set of hardware pipelines, by executing multiple SIMD floating-point operations using the execution circuitry and the first routing circuitry.
Type: Grant
Filed: October 9, 2019
Date of Patent: February 22, 2022
Assignee: Apple Inc.
Inventors: Liang-Kai Wang, Robert D. Kenney, Terence M. Potter, Vinod Reddy Nalamalapu, Sivayya V. Ayinala
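One way to picture building a larger operation out of per-lane SIMD arithmetic plus operand routing is a small matrix multiply, sketched below. Matrix multiplication is used only as a representative example of "a mathematical operation on source input data structures stored across thread-specific storage"; the broadcast of A[i][k] stands in for the routing circuitry.

```python
# Sketch: a matrix multiply built from per-lane fused multiply-adds plus
# operand broadcasts between lanes. Lane j owns column j of B and
# accumulates column j of the result.
def simd_matmul(A, B):
    n = len(A)
    lanes_B = [[B[i][j] for i in range(n)] for j in range(n)]   # one column per lane
    lanes_C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            a = A[i][k]                       # value routed/broadcast to all lanes
            for lane in range(n):             # one SIMD FMA across all lanes
                lanes_C[lane][i] += a * lanes_B[lane][k]
    # Gather the per-lane columns back into a row-major result.
    return [[lanes_C[j][i] for j in range(n)] for i in range(n)]

print(simd_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]
```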
-
Publication number: 20210406031
Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline.
Type: Application
Filed: September 9, 2021
Publication date: December 30, 2021
Inventors: Christopher A. Burns, Liang-Kai Wang, Robert D. Kenney, Terence M. Potter
-
Patent number: 11210761
Abstract: Techniques are disclosed relating to selecting a number of candidates based on priority. In some embodiments, position determination circuitry receives an input vector that orders a set of potential candidates from a highest-priority position within the input vector to a lowest-priority position. In some embodiments, the circuitry determines, starting from a first end of the input vector and based on non-overlapping groups of candidates, a particular position within the input vector at which a threshold number of available candidates are found. This may include generating respective count values within the groups of candidates, identifying a transition group in which the particular position is located based on accumulation of the respective count values, and identifying the particular position within the transition group. Output circuitry may generate, based on the particular position, an output vector that indicates the threshold number of available candidates from the input vector.
Type: Grant
Filed: December 22, 2020
Date of Patent: December 28, 2021
Assignee: Apple Inc.
Inventors: Liang-Kai Wang, Vinod Reddy Nalamalapu
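The group-count search can be modeled in a few lines; the group size, the mask-style output vector, and the handling of fewer-than-threshold candidates are assumptions of this sketch.

```python
# Software model of the group-count search: count available candidates in
# fixed-size groups, accumulate the counts to find the "transition group"
# containing the k-th available candidate, locate the exact position inside
# that group, then emit a mask of the first k available candidates.
def select_k(available, k, group=4):
    counts = [sum(available[i:i + group]) for i in range(0, len(available), group)]
    total, g = 0, None
    for idx, c in enumerate(counts):          # find the transition group
        if total + c >= k:
            g = idx
            break
        total += c
    if g is None:
        return None, [0] * len(available)     # fewer than k candidates available
    pos, seen = g * group, total
    while True:                               # find the k-th candidate within the group
        seen += available[pos]
        if seen == k and available[pos]:
            break
        pos += 1
    mask = [1 if available[i] and i <= pos else 0 for i in range(len(available))]
    return pos, mask

avail = [1, 0, 1, 1, 0, 1, 1, 0]              # index 0 is the highest-priority position
print(select_k(avail, k=3))                   # (3, [1, 0, 1, 1, 0, 0, 0, 0])
```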
-
Publication number: 20210382687
Abstract: Techniques are disclosed relating to floating-point circuitry configured to perform a corner check instruction for a floating-point power operation. In some embodiments, the power operation is performed by executing multiple instructions, including one or more instructions that specify generating an initial power result of a first input raised to the power of a second input as 2^(second input * log2(first input)). In some embodiments, the corner check instruction operates on the first and second inputs and outputs a corrected power result based on detection of a corner condition for the first and second inputs. Corner check circuitry may share circuits with other datapaths. In various embodiments, the disclosed techniques may reduce code size and power consumption for the power operation.
Type: Application
Filed: June 4, 2020
Publication date: December 9, 2021
Inventors: Anthony Y. Tai, Liang-Kai Wang, Ian R. Ollmann, Anand Poovekurussi
-
Publication number: 20210358078
Abstract: Techniques are disclosed relating to low-level instruction storage in a processing unit. In some embodiments, a graphics unit includes execution circuitry, decode circuitry, hazard circuitry, and caching circuitry. In some embodiments, the execution circuitry is configured to execute clauses of graphics instructions. In some embodiments, the decode circuitry is configured to receive graphics instructions and a clause identifier for each received graphics instruction and to decode the received graphics instructions. In some embodiments, the caching circuitry includes a plurality of entries each configured to store a set of decoded instructions in the same clause. A given clause may be fetched and executed multiple times, e.g., for different SIMD groups, while stored in the caching circuitry.
Type: Application
Filed: May 28, 2021
Publication date: November 18, 2021
Inventors: Andrew M. Havlir, Dzung Q. Vu, Liang Kai Wang
-
Patent number: 11126439
Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline.
Type: Grant
Filed: November 15, 2019
Date of Patent: September 21, 2021
Assignee: Apple Inc.
Inventors: Christopher A. Burns, Liang-Kai Wang, Robert D. Kenney, Terence M. Potter
-
Patent number: 11023997
Abstract: Techniques are disclosed relating to low-level instruction storage in a processing unit. In some embodiments, a graphics unit includes execution circuitry, decode circuitry, hazard circuitry, and caching circuitry. In some embodiments, the execution circuitry is configured to execute clauses of graphics instructions. In some embodiments, the decode circuitry is configured to receive graphics instructions and a clause identifier for each received graphics instruction and to decode the received graphics instructions. In some embodiments, the caching circuitry includes a plurality of entries each configured to store a set of decoded instructions in the same clause. A given clause may be fetched and executed multiple times, e.g., for different SIMD groups, while stored in the caching circuitry.
Type: Grant
Filed: July 24, 2017
Date of Patent: June 1, 2021
Assignee: Apple Inc.
Inventors: Andrew M. Havlir, Dzung Q. Vu, Liang Kai Wang
-
Publication number: 20210149679
Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline.
Type: Application
Filed: November 15, 2019
Publication date: May 20, 2021
Inventors: Christopher A. Burns, Liang-Kai Wang, Robert D. Kenney, Terence M. Potter
-
Publication number: 20210109761
Abstract: Techniques are disclosed relating to sharing operands among SIMD threads for a larger arithmetic operation. In some embodiments, a set of multiple hardware pipelines is configured to execute single-instruction multiple-data (SIMD) instructions for multiple threads in parallel, where ones of the hardware pipelines include execution circuitry configured to perform floating-point operations using one or more pipeline stages of the pipeline and first routing circuitry configured to select, from among thread-specific operands stored for the hardware pipeline and from one or more other pipelines in the set, a first input operand for an operation by the execution circuitry. In some embodiments, a device is configured to perform a mathematical operation on source input data structures stored across thread-specific storage for the set of hardware pipelines, by executing multiple SIMD floating-point operations using the execution circuitry and the first routing circuitry.
Type: Application
Filed: October 9, 2019
Publication date: April 15, 2021
Inventors: Liang-Kai Wang, Robert D. Kenney, Terence M. Potter, Vinod Reddy Nalamalapu, Sivayya V. Ayinala