Patents by Inventor Ping Tak Peter Tang

Ping Tak Peter Tang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Auto-configuration of hardware non-linear function acceleration

Patent number: 11868304

Abstract: In an embodiment, an example computer-implemented method for configuring a hardware accelerator to perform a non-linear function involves: determining a plurality of intervals that partition an input domain of the non-linear function; determining a plurality of subinterval configurations corresponding to different numbers of subintervals for partitioning that interval; generating an error set comprising an error for using a polynomial function to approximate the non-linear function within one or more corresponding subintervals specified by the subinterval configuration; using the error set and resource constraints, selecting one of the subinterval configurations for each of the intervals to generate a configuration set that minimizes a worst-case error across the intervals; selecting one of the subinterval configurations for each of the intervals to generate an improved configuration set that minimizes a cumulative error across the intervals without exceeding the worst-case error; and configuring the hardware

Type: Grant

Filed: September 20, 2021

Date of Patent: January 9, 2024

Assignee: Meta Platforms, Inc.

Inventors: Ping Tak Peter Tang, Nimit Singhania
Suppressing interaction between bonded particles

Patent number: 11264120

Abstract: A method for managing flow of particles into an array of pairwise-point-interaction-modules includes receiving a first set of particles into a first queue. The first set is a proper subset of a second set of particles that comprises all particles that are to be passed into the array of pairwise-point-interaction-modules during a current time period. Prior to having received all particles from the second set, particles from the first set are allowed to pass from the first queue into the array.

Type: Grant

Filed: September 10, 2019

Date of Patent: March 1, 2022

Assignee: D. E. Shaw Research, LLC

Inventors: Ping Tak Peter Tang, J. P. Grossman, Brannon Batson, Ron Dror
Suppressing interaction between bonded particles

Patent number: 11139049

Abstract: A method comprising causing a simulation machine for molecular dynamic simulation to determine that a topological distance that separates two particles is less than a threshold. The simulation machine includes nodes connected by a network. The nodes collectively represent a simulation volume, with each node corresponding to a portion of the simulation space. A topological relationship between the nodes corresponds to a spatial relationship thereof in the simulation space. The simulation volume is occupied by particles that interact with each other. The two particles are among these particles. The simulation volume includes node boxes, each of which is handled by one of the nodes. Each of the nodes is implemented as an application-specific integrated circuit that includes a combination of first and second hardware elements. The first hardware elements are especially designed to perform pairwise interactions. The second hardware elements operate to provide potentially interacting particles to the first hardware elements.

Type: Grant

Filed: November 16, 2015

Date of Patent: October 5, 2021

Assignee: D. E. Shaw Research, LLC

Inventors: Ping Tak Peter Tang, J. P. Grossman, Brannon Batson, Ron Dror
Suppressing interaction between bonded particles

Publication number: 20200005904

Abstract: A method for managing flow of particles into an array of pairwise-point-interaction-modules includes receiving a first set of particles into a first queue. The first set is a proper subset of a second set of particles that comprises all particles that are to be passed into the array of pairwise-point-interaction-modules during a current time period. Prior to having received all particles from the second set, particles from the first set are allowed to pass from the first queue into the array.

Type: Application

Filed: September 10, 2019

Publication date: January 2, 2020

Inventors: Ping Tak Peter Tang, J.P. Grossman, Brannon Batson, Ron Dror
Processors, methods, and systems for a configurable spatial accelerator with performance, correctness, and power reduction features

Patent number: 10445451

Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements. At least one of the plurality of processing elements includes a plurality of control inputs.

Type: Grant

Filed: July 1, 2017

Date of Patent: October 15, 2019

Assignee: Intel Corporation

Inventors: Kermin Fleming, Kent D. Glossop, Simon C. Steely, Jr., Ping Tak Peter Tang
Suppressing interaction between bonded particles

Publication number: 20190087546

Abstract: A method comprising causing a computer to determine that a topological distance between two particles is less than a threshold.

Type: Application

Filed: November 16, 2015

Publication date: March 21, 2019

Inventors: Ping Tak Peter Tang, J.P. Grossman, Brannon Batson, Ron Dror
Processors, methods, and systems for a configurable spatial accelerator with performance, correctness, and power reduction features

Publication number: 20190005161

Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements. At least one of the plurality of processing elements includes a plurality of control inputs.

Type: Application

Filed: July 1, 2017

Publication date: January 3, 2019

Inventors: Kermin Fleming, Kent D. Glossop, Simon C. Steely, Jr., Ping Tak Peter Tang
Fourier transform computation for distributed processing environments

Patent number: 9292476

Abstract: Fourier transform computation for distributed processing environments is disclosed. Example methods disclosed herein to compute a Fourier transform of an input data sequence include performing first processing on the input data sequence using a plurality of processors, the first processing resulting in an output data sequence having more data elements than the input data sequence. Such example methods also include performing second processing on the output data sequence using the plurality of processors, the output data sequence being permutated among the plurality of processors, each of the processors performing the second processing on a respective permutated portion of the output data sequence to determine a respective, ordered segment of the Fourier transform of the input data sequence.

Type: Grant

Filed: October 10, 2012

Date of Patent: March 22, 2016

Assignee: Intel Corporation

Inventors: Ping Tak Peter Tang, Jong Soo Park, Vladimir Petrov
Method and apparatus for performing multiplicative functions

Patent number: 8838663

Abstract: A new function for calculating the reciprocal residual of a floating-point number X is defined as recip_residual(X)=1−X*recip(X), where recip(X) represents the reciprocal of X. The function may be implemented using a fused multiply-add unit in a processor. The reciprocal value of X, recip(X), may be obtained from a lookup table. The recip_residual function may help reduce the latency of many multiplicative functions that are based on products of multiple numbers and can be expressed in simple terms of functions on each individual number (e.g., log(U*V)=log(U)+log(V)).

Type: Grant

Filed: March 30, 2007

Date of Patent: September 16, 2014

Assignee: Intel Corporation

Inventors: Ping Tak Peter Tang, Robert Cavin
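
As a rough illustration of the reciprocal-residual idea in the abstract above, the following Python sketch computes log(X) from a coarse reciprocal and its residual. It is only a sketch under stated assumptions: the helper `recip` (an 8-bit rounding of 1/X standing in for a lookup table), the exact-rational emulation of a fused multiply-add, and the name `log_via_residual` are all hypothetical and not taken from the patent.

```python
import math
from fractions import Fraction

def recip(x: float, bits: int = 8) -> float:
    """Hypothetical stand-in for a table lookup: 1/x rounded to `bits` bits."""
    m, e = math.frexp(1.0 / x)            # 1/x == m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * (1 << bits)) / (1 << bits), e)

def recip_residual(x: float) -> float:
    """1 - x*recip(x), rounded once at the end, as a fused multiply-add would."""
    exact = 1 - Fraction(x) * Fraction(recip(x))   # exact rational arithmetic
    return float(exact)                            # single final rounding

def log_via_residual(x: float) -> float:
    """log(x) using x*c == 1 - e, so log(x) == log1p(-e) - log(c)."""
    c = recip(x)
    e = recip_residual(x)
    return math.log1p(-e) - math.log(c)
```

Because the residual is small (below about 2**-8 in magnitude with this assumed table width), the `log1p` term only has to cover a narrow interval around zero, which is where a short polynomial would typically be used.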
Function approximation based on statistical properties

Publication number: 20140250161

Abstract: Embodiments of techniques and systems for approximating a function are described. In embodiments, a computing device may receive one or more statistical properties associated with application of an approximation function of a function over a target domain. The computing device may formulate one or more constraints on one or more parameters of a functional form of the approximation function, based at least in part on the one or more statistical properties. The computing device may then determine the one or more parameters subject to the constraints and output results of the determination. In embodiments, the one or more parameters may be determined through application of an optimization procedure. Other embodiments may be described and claimed.

Type: Application

Filed: March 28, 2012

Publication date: September 4, 2014

Inventor: Ping Tak Peter Tang
Fourier transform computation for distributed processing environments

Publication number: 20140101219

Abstract: Fourier transform computation for distributed processing environments is disclosed. Example methods disclosed herein to compute a Fourier transform of an input data sequence include performing first processing on the input data sequence using a plurality of processors, the first processing resulting in an output data sequence having more data elements than the input data sequence. Such example methods also include performing second processing on the output data sequence using the plurality of processors, the output data sequence being permutated among the plurality of processors, each of the processors performing the second processing on a respective permutated portion of the output data sequence to determine a respective, ordered segment of the Fourier transform of the input data sequence.

Type: Application

Filed: October 10, 2012

Publication date: April 10, 2014

Inventors: Ping Tak Peter Tang, Jong Soo Park, Vladimir Petrov
Rounding of binary integers

Patent number: 7747669

Abstract: Methods and apparatus to provide rounding of a binary integer are described. In one embodiment, a value that indicates whether a divisor divides a binary integer is extracted from a product of the binary integer and a scaled approximate reciprocal of the divisor.

Type: Grant

Filed: March 31, 2006

Date of Patent: June 29, 2010

Assignee: Intel Corporation

Inventors: Ping Tak (Peter) Tang, John R. Harrison
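
One way a divisibility indicator can be read off a product with a scaled approximate reciprocal is sketched below in Python. This is a minimal sketch under assumptions, not the patented method: the operand width `K`, the scale `S = 2*K + 1`, the choice of a rounded-up reciprocal, and the function names are all hypothetical.

```python
K = 16                 # assumed operand width: 0 <= n < 2**K, 1 <= d < 2**K
S = 2 * K + 1          # assumed scale; wide enough for the comparison below to be exact

def scaled_reciprocal(d: int) -> int:
    """ceil(2**S / d), an integer approximation of 2**S / d."""
    return -(-(1 << S) // d)

def _check(n: int, d: int) -> None:
    if not (0 <= n < (1 << K) and 1 <= d < (1 << K)):
        raise ValueError("operands outside the assumed K-bit range")

def quotient(n: int, d: int) -> int:
    """floor(n / d), taken from the high bits of the product."""
    _check(n, d)
    return (n * scaled_reciprocal(d)) >> S

def divides(n: int, d: int) -> bool:
    """True when d divides n, read from the low S bits of the same product."""
    _check(n, d)
    m = scaled_reciprocal(d)
    low = (n * m) & ((1 << S) - 1)   # fractional part of n*m / 2**S
    return low < m                   # small fractional part means zero remainder
```

With these widths, a zero remainder leaves a low part smaller than the scaled reciprocal, while any nonzero remainder contributes at least one full copy of it, so a single multiplication yields both the quotient and the divisibility flag.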
Method and apparatus for performing multiplicative functions

Publication number: 20080243985

Abstract: A new function for calculating the reciprocal residual of a floating-point number X is defined as recip_residual(X)=1−X*recip(X), where recip(X) represents the reciprocal of X. The function may be implemented using a fused multiply-add unit in a processor. The reciprocal value of X, recip(X), may be obtained from a lookup table. The recip_residual function may help reduce the latency of many multiplicative functions that are based on products of multiple numbers and can be expressed in simple terms of functions on each individual number (e.g., log(U*V)=log(U)+log(V)).

Type: Application

Filed: March 30, 2007

Publication date: October 2, 2008

Inventors: Ping Tak Peter Tang, Robert Cavin
Methods and apparatus for fast argument reduction in a computing system

Patent number: 7366748

Abstract: There is disclosed method, software and apparatus for evaluating a function f in a computing device using a reduction, core approximation and final reconstruction stage. According to one embodiment of the invention, an argument reduction stage uses an approximate reciprocal table in the computing device. According to another embodiment, an approximate reciprocal instruction I is operative on the computing device to use the approximate reciprocal table such that the argument reduction stage provides that C:=I(X) and R:=X×C−1, the core approximation stage provides that p(R) is computed so that it approximates f(1+R), and the final reconstruction stage provides that T=f(1/C) is fetched and calculated if necessary, and f(X) is reconstructed based on f(X)=f([1/C]×[X×C])=g(f(1/C), f(1+R)).

Type: Grant

Filed: June 30, 2000

Date of Patent: April 29, 2008

Assignee: Intel Corporation

Inventors: Ping Tak Peter Tang, John Harrison, Theodore Kubaska
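
The three stages named in the abstract above (reduction, core approximation, reconstruction) can be sketched in Python for one illustrative choice of f, the reciprocal square root, for which g is plain multiplication. Everything specific here is an assumption made for illustration: the helper `approx_recip` standing in for the instruction I, its 8-bit precision, the degree-4 Taylor polynomial used as p, and the use of `math.sqrt` in place of a table fetch for T.

```python
import math

def approx_recip(x: float, bits: int = 8) -> float:
    """Hypothetical stand-in for the approximate reciprocal instruction I."""
    m, e = math.frexp(1.0 / x)
    return math.ldexp(round(m * (1 << bits)) / (1 << bits), e)

def rsqrt(x: float) -> float:
    """f(X) = X**-0.5 via reduction, core approximation, and reconstruction."""
    # Reduction: C := I(X), R := X*C - 1, so that X = (1/C) * (1 + R).
    c = approx_recip(x)
    r = x * c - 1.0
    # Core approximation: p(R) ~ f(1 + R) = (1 + R)**-0.5 (Taylor series, degree 4).
    p = 1.0 + r * (-0.5 + r * (0.375 + r * (-0.3125 + r * 0.2734375)))
    # Reconstruction: T = f(1/C) = sqrt(C); here computed rather than fetched.
    t = math.sqrt(c)
    return t * p            # g(f(1/C), f(1 + R)) is multiplication for this f
```

Since |R| stays below roughly 2**-8 with the assumed reciprocal precision, the truncated series is accurate to about 1e-12 in relative terms; a production routine would use a tuned polynomial and a real table for T.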
Apparatus and method for remainder calculation using short approximate floating-point quotient

Patent number: 7013320

Abstract: An apparatus and method for creating lookup tables of approximate floating-point quotients which exactly represent the underlying value, within the range of the specified precision. The lookup tables are created without any extraneous data beyond what is needed and also without sacrificing numerical accuracy, and may be created for any radix.

Type: Grant

Filed: January 25, 2002

Date of Patent: March 14, 2006

Assignee: Intel Corporation

Inventor: Ping Tak Peter Tang
Economical on-the-fly rounding for digit-recurrence algorithms

Patent number: 6792443

Abstract: Apparatus and methods are provided for an improved on-the-fly rounding technique for digit-recurrence algorithms, such as division and square root calculations. According to one embodiment, only two forms of an intermediate result of an operation to be performed by a digit-recurrence algorithm are maintained. A first form is maintained in a first register and a second form is maintained in a second register. Responsive to receiving digits 1 to L−2 of the intermediate result from a digit recurrence unit, where L represents a number of digits that satisfies a predetermined precision for the operation, both forms of the intermediate result are updated by register swapping or concatenation under the control of load and shift control logic and on-the-fly conversion logic. Then, a rounded result is generated by determining digits d_{L−1} and d_L and appending a rounded last digit to the appropriate form of the intermediate result.

Type: Grant

Filed: June 29, 2001

Date of Patent: September 14, 2004

Assignee: Intel Corporation

Inventor: Ping Tak Peter Tang
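
For background on the two-register scheme the abstract above builds on, here is a small Python sketch of the classic on-the-fly conversion of signed digits, which keeps two forms of the partial result (Q and Q − 1) and updates each by appending one nonnegative digit. It illustrates the general technique only, not the patent's rounding logic; the radix, the integer representation of the registers, and the function name are assumptions.

```python
def on_the_fly_convert(digits, radix=4):
    """Convert signed digits (most significant first) to a conventional integer.

    Returns (q, qm) where q is the converted value and qm == q - 1.
    Each step appends exactly one digit in [0, radix - 1] to q or qm,
    so no carry or borrow ever propagates through the registers.
    """
    q, qm = 0, -1                       # invariant: qm == q - 1
    for d in digits:
        if abs(d) >= radix:
            raise ValueError("digit out of range for this radix")
        if d > 0:
            q, qm = q * radix + d, q * radix + (d - 1)
        elif d == 0:
            q, qm = q * radix, qm * radix + (radix - 1)
        else:                           # negative digit: borrow by starting from qm
            q, qm = qm * radix + (radix + d), qm * radix + (radix - 1 + d)
    return q, qm
```

In hardware, "multiply by the radix and add a digit" is a shift-and-concatenate, and the choice between continuing from Q or from Q − 1 is the register swap; the integers here only model that behavior.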
Branch-free software methodology for transcendental functions

Publication number: 20040015882

Abstract: Various embodiments of a computer-implemented branch-free methodology for approximating a function of an input argument are disclosed. The methodology includes selecting one of a number of breakpoints, such that a reduced argument for the function is less than a predetermined value. An approximate function of the reduced argument is evaluated, including accessing a look-up table based on the selected breakpoint to obtain value of a term in the approximate function. The look-up table has at least one breakpoint for which the reduced argument can be computed without roundoff error when the input argument is close to a root of the function. The branch-free methodology may be applied to compute transcendental functions such as the exponential, logarithm, and trigonometric functions.

Type: Application

Filed: June 5, 2001

Publication date: January 22, 2004

Inventor: Ping Tak Peter Tang
Apparatus and method for remainder calculation using short approximate floating-point quotient

Publication number: 20030145029

Abstract: An apparatus and method for creating lookup tables of approximate floating-point quotients which exactly represent the underlying value, within the range of the specified precision. The lookup tables are created without any extraneous data beyond what is needed and also without sacrificing numerical accuracy, and may be created for any radix.

Type: Application

Filed: January 25, 2002

Publication date: July 31, 2003

Inventor: Ping Tak Peter Tang
Fast calculation of (A/B)^K by a parallel floating-point processor

Patent number: 6598063

Abstract: A method suitable for calculating an expression having the form (A/B)^K by a processor that features separate sets of floating point units which can operate in parallel for greater speed of execution. The processor issues instructions to determine an approximate reciprocal R0 of a first variable B. Further instructions are issued to raise a second variable to the power of a third variable K by a first set of arithmetic units of the processor, where the second variable is a function of the approximate reciprocal R0. Still further instructions are issued to calculate a polynomial q at a fourth variable delta by a second set of arithmetic units of the processor. The fourth variable delta is also a function of the approximate reciprocal R0. Finally, one or more instructions are issued to multiply the calculated polynomial by the second variable, having been raised to the power of the third variable, to yield (A/B)^K.

Type: Grant

Filed: August 14, 2000

Date of Patent: July 22, 2003

Assignee: Intel Corporation

Inventors: Ping Tak Peter Tang, Theodore E. Kubaska
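
The abstract above does not spell out the second variable, delta, or the polynomial q, so the Python sketch below fills them in with one plausible reading: the second variable is A·R0, delta is 1 − B·R0, and q is a truncated binomial series for (1 − delta)^(−K). These definitions, the helper `approx_recip`, and its 8-bit precision are assumptions for illustration only.

```python
import math

def approx_recip(b: float, bits: int = 8) -> float:
    """Hypothetical stand-in for a hardware approximate reciprocal R0 of B."""
    m, e = math.frexp(1.0 / b)
    return math.ldexp(round(m * (1 << bits)) / (1 << bits), e)

def div_pow(a: float, b: float, k: int) -> float:
    """(a/b)**k as (a*r0)**k * q(delta), with the two factors independent."""
    r0 = approx_recip(b)
    y = a * r0                    # assumed "second variable"
    delta = 1.0 - b * r0          # assumed "fourth variable"; |delta| is small
    # The two computations below do not depend on each other, so separate
    # sets of arithmetic units could evaluate them in parallel.
    power = y ** k
    # q(delta) ~ (1 - delta)**(-k): binomial series through delta**4, Horner form.
    c1 = k
    c2 = k * (k + 1) / 2
    c3 = k * (k + 1) * (k + 2) / 6
    c4 = k * (k + 1) * (k + 2) * (k + 3) / 24
    q = 1.0 + delta * (c1 + delta * (c2 + delta * (c3 + delta * c4)))
    return power * q
```

The identity behind the sketch is a/b = (a·r0)/(b·r0) = (a·r0)/(1 − delta), so raising both sides to the power K separates the result into the two factors computed above.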
Economical on-the-fly rounding for digit-recurrence algorithms

Publication number: 20030009501

Abstract: Apparatus and methods are provided for an improved on-the-fly rounding technique for digit-recurrence algorithms, such as division and square root calculations. According to one embodiment, only two forms of an intermediate result of an operation to be performed by a digit-recurrence algorithm are maintained. A first form is maintained in a first register and a second form is maintained in a second register. Responsive to receiving digits 1 to L−2 of the intermediate result from a digit recurrence unit, where L represents a number of digits that satisfies a predetermined precision for the operation, both forms of the intermediate result are updated by register swapping or concatenation under the control of load and shift control logic and on-the-fly conversion logic. Then, a rounded result is generated by determining digits d_{L−1} and d_L and appending a rounded last digit to the appropriate form of the intermediate result.

Type: Application

Filed: June 29, 2001

Publication date: January 9, 2003

Inventor: Ping Tak Peter Tang
