Patents by Inventor Aaron Ng

Aaron Ng has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

System and method for passwordless logins

Patent number: 11374927

Abstract: A login system allows users to access computer systems without using a password. The passwordless system and method can use other information to securely and reliably identify true authorized system users. The identity of a user can be associated with their mobile device. The login can be based upon a minimal amount of information such as a name and a phone number which can be stored as an identification record for each of the users in a database.

Type: Grant

Filed: June 15, 2020

Date of Patent: June 28, 2022

Assignee: Affirm, Inc.

Inventors: Jeffrey Howard Kaditz, Andrew Gettings Stevens, Bradley N. Selby, Aaron Ng Ligon, Manuel De Jesus Arias
Neural network processing system having multiple processors and a neural network accelerator

Patent number: 11222256

Abstract: At least one neural network accelerator performs operations of a first subset of layers of a neural network on an input data set, generates an intermediate data set, and stores the intermediate data set in a shared memory queue in a shared memory. A first processor element of a host computer system provides input data to the neural network accelerator and signals the neural network accelerator to perform the operations of the first subset of layers of the neural network on the input data set. A second processor element of the host computer system reads the intermediate data set from the shared memory queue, performs operations of a second subset of layers of the neural network on the intermediate data set, and generates an output data set while the neural network accelerator is performing the operations of the first subset of layers of the neural network on another input data set.

Type: Grant

Filed: October 17, 2017

Date of Patent: January 11, 2022

Assignee: XILINX, INC.

Inventors: Xiao Teng, Aaron Ng, Ashish Sirasao, Elliott Delaye
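Illustrative sketch (not taken from the patent): the overlap described in the abstract above can be imitated in software with two threads and a bounded queue. Every name here (run_pipeline, first_layers, second_layers, depth) is hypothetical; the queue only stands in for the shared memory queue, and plain Python functions stand in for the accelerator and the second processor element.

```python
import queue
import threading


def run_pipeline(inputs, first_layers, second_layers, depth=2):
    """Overlap two processing stages through a bounded FIFO queue."""
    shared = queue.Queue(maxsize=depth)  # stand-in for the shared memory queue
    outputs = []
    done = object()  # sentinel marking the end of the input stream

    def accelerator_stage():
        # Stand-in for the accelerator: runs the first subset of layers.
        for x in inputs:
            shared.put(first_layers(x))
        shared.put(done)

    def host_stage():
        # Stand-in for the second processor element: runs the remaining layers
        # while the producer is already working on the next input.
        while True:
            item = shared.get()
            if item is done:
                break
            outputs.append(second_layers(item))

    threads = [
        threading.Thread(target=accelerator_stage),
        threading.Thread(target=host_stage),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outputs


result = run_pipeline([1, 2, 3], lambda x: x * 2, lambda y: y + 1)
```

With one producer and one consumer, the FIFO queue keeps outputs in input order.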
Re-targetable interface for data exchange between heterogeneous systems and accelerator abstraction into software instructions

Patent number: 11204747

Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator that operate on two heterogeneous computing systems. For example, the neural network application may execute on a central processing unit (CPU) in a computing system while the neural network accelerator executes on a FPGA. As a result, when moving a software-hardware boundary between the two heterogeneous systems, changes may be made to both the neural network application (using software code) and to the accelerator (using RTL). The embodiments herein describe a software defined approach where shared interface code is used to express both sides of the interface between the two heterogeneous systems in a single abstraction (e.g., a software class).

Type: Grant

Filed: October 17, 2017

Date of Patent: December 21, 2021

Assignee: XILINX, INC.

Inventors: Jindrich Zejda, Elliott Delaye, Yongjun Wu, Aaron Ng, Ashish Sirasao, Khang K. Dao, Christopher J. Case
Refinancing Tools for Purchasing Transactions

Publication number: 20210366040

Abstract: A web based finance system allows users to refinance past eligible purchasing transactions. The system can include a system server(s) which is in communication through a network with client computing devices associated with customers. The server can receive customer information and determine a refinancing value range based upon the user has a bank account and/or credit risk rating information. The server can identify prior eligible credit card purchases for refinancing and the customer can select one or more purchases for refinancing and the refinancing loan terms.

Type: Application

Filed: August 6, 2021

Publication date: November 25, 2021

Inventors: Max Levchin, Christopher Beckmann, Jeffrey Howard Kaditz, Roberto Novoa, Andrew Gettings Stevens, Manuel De Jesus Arias, Aaron Ng Ligon
Software-defined buffer/transposer for general matrix multiplication in a programmable IC

Patent number: 11036827

Abstract: Methods and apparatus are described for simultaneously buffering and reformatting (e.g., transposing) a matrix for high-speed data streaming in general matrix multiplication (GEMM), which may be implemented by a programmable integrated circuit (IC). Examples of the present disclosure increase the effective double data rate (DDR) memory throughput for streaming data into GEMM digital signal processing (DSP) engine multifold, as well as eliminate slow data reformatting on a host central processing unit (CPU). This may be accomplished through software-defined (e.g., C++) data structures and access patterns that result in hardware logic that simultaneously buffers and reorganizes the data to achieve linear DDR addressing.

Type: Grant

Filed: October 17, 2017

Date of Patent: June 15, 2021

Assignee: XILINX, INC.

Inventors: Jindrich Zejda, Elliott Delaye, Yongjun Wu, Aaron Ng, Ashish Sirasao, Khang K. Dao
Refinancing tools for purchasing transactions

Patent number: 11030685

Abstract: A web based finance system allows users to refinance past eligible purchasing transactions. The system can include a system server(s) which is in communication through a network with client computing devices associated with customers. The server can receive customer information and determine a refinancing value range based upon the user has a bank account and/or credit risk rating information. The server can identify prior eligible credit card purchases for refinancing and the customer can select one or more purchases for refinancing and the refinancing loan terms.

Type: Grant

Filed: October 22, 2015

Date of Patent: June 8, 2021

Assignee: AFFIRM, INC.

Inventors: Max Levchin, Christopher Beckmann, Jeffrey Howard Kaditz, Roberto Novoa, Andrew Gettings Stevens, Manuel De Jesus Arias, Aaron Ng Ligon
Inline image preprocessing for convolution operations using a matrix multiplier on an integrated circuit

Patent number: 10984500

Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a plurality of memory banks configured to store the image data; multiplexer circuitry coupled to the memory banks; a first plurality of registers coupled to the multiplexer circuitry; a second plurality of registers coupled to the first plurality of registers, outputs of the second plurality of registers configured to provide the plurality of streams of image samples; bank address and control circuitry coupled to control inputs of the plurality of memory banks, the multiplexer circuitry, and the first plurality of registers; output control circuitry coupled to control inputs of the second plurality of registers; and a control state machine coupled to the bank address and control circuitry and the output control circuitry.

Type: Grant

Filed: September 19, 2019

Date of Patent: April 20, 2021

Assignee: XILINX, INC.

Inventors: Ashish Sirasao, Elliott Delaye, Aaron Ng, Ehsan Ghasemi
Software-driven design optimization for fixed-point multiply-accumulate circuitry

Patent number: 10943039

Abstract: An example multiply accumulate (MACC) circuit includes: a multiply-accumulator having an accumulator output register; a quantizer, coupled to the multiply accumulator; and a control circuit coupled to the multiply-accumulator and the quantizer, the control circuit configured to provide control data to the quantizer, the control data indicative of a most-significant bit (MSB) to least significant bit (LSB) range for selecting bit indices from the accumulator output register.

Type: Grant

Filed: October 17, 2017

Date of Patent: March 9, 2021

Assignee: XILINX, INC.

Inventors: Ashish Sirasao, Elliott Delaye, Sean Settle, Zhao Ma, Ehsan Ghasemi, Xiao Teng, Aaron Ng, Jindrich Zejda
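Illustrative sketch (not taken from the patent): selecting an MSB-to-LSB range of bit indices from an accumulator value can be modeled with a shift and a mask. The function name select_bits is hypothetical, the accumulator is assumed to be non-negative, and rounding or saturation, which a real quantizer may apply, is left out.

```python
def select_bits(acc, msb, lsb):
    """Return bits msb..lsb (inclusive) of a non-negative accumulator value."""
    if lsb < 0 or msb < lsb:
        raise ValueError("require 0 <= lsb <= msb")
    width = msb - lsb + 1
    # Drop the bits below lsb, then keep only `width` bits.
    return (acc >> lsb) & ((1 << width) - 1)


# 0b10110110 has bits 5..2 equal to 0b1101.
quantized = select_bits(0b10110110, msb=5, lsb=2)
```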
Sparse matrix processing circuitry

Patent number: 10936311

Abstract: Disclosed approaches for multiplying a sparse matrix by a dense vector or matrix include first memory banks for storage of column indices, second memory banks for storage of row indices, and third memory banks for storage of non-zero values of a sparse matrix. A pairing circuit distributes an input stream of vector elements across first first-in-first-out (FIFO) buffers according to the buffered column indices. Multiplication circuitry multiplies vector elements output from the first FIFO buffers by corresponding ones of the non-zero values from the third memory banks, and stores products in second FIFO buffers. Row-aligner circuitry organizes the products output from the second FIFO buffers into third FIFO buffers according to row indices in the second memory banks. Accumulation circuitry accumulates respective totals from products output from the third FIFO buffers.

Type: Grant

Filed: July 9, 2019

Date of Patent: March 2, 2021

Assignee: Xilinx, Inc.

Inventors: Ling Liu, Yifei Zhou, Xiao Teng, Ashish Sirasao, Chuanhua Song, Aaron Ng
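Illustrative sketch (not taken from the patent): the data flow in the abstract above corresponds to a sparse matrix-vector product over three parallel arrays of column indices, row indices, and non-zero values. This software model is sequential and has no FIFO buffers or memory banks; the function name spmv and its parameters are hypothetical.

```python
def spmv(n_rows, col_idx, row_idx, values, vec):
    """Multiply a sparse matrix, given as coordinate arrays, by a dense vector."""
    totals = [0] * n_rows
    for c, r, v in zip(col_idx, row_idx, values):
        product = v * vec[c]  # pair each non-zero with the vector element at its column
        totals[r] += product  # accumulate products that belong to the same row
    return totals


# Matrix [[2, 0, 0], [0, 0, 3], [4, 0, 5]] times vector [1, 2, 3].
y = spmv(
    n_rows=3,
    col_idx=[0, 2, 0, 2],
    row_idx=[0, 1, 2, 2],
    values=[2, 3, 4, 5],
    vec=[1, 2, 3],
)
```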
System and method for passwordless logins

Patent number: 10686781

Abstract: A login system allows users to access computer systems without using a password. The passwordless system and method can use other information to securely and reliably identify true authorized system users. The identity of a user can be associated with their mobile device. The login can be based upon a minimal amount of information such as a name and a phone number which can be stored as an identification record for each of the users in a database.

Type: Grant

Filed: December 20, 2014

Date of Patent: June 16, 2020

Assignee: Affirm Inc.

Inventors: Jeffrey Howard Kaditz, Andrew Gettings Stevens, Bradley Neale Selby, Aaron Ng Ligon, Manuel De Jesus Arias
Software-driven design optimization for mapping between floating-point and fixed-point multiply accumulators

Patent number: 10678509

Abstract: An example multiply accumulate (MACC) circuit includes a multiply-accumulator having an accumulator output register, a scaler, coupled to the multiply accumulator, and a control circuit coupled to the multiply-accumulator and the scaler. The control circuit is configured to provide control data to the scaler, the control data indicative of: a most-significant bit (MSB) to least significant bit (LSB) range for selecting bit indices from the accumulator output register for implementing a first right shift; a multiplier; and a second right shift.

Type: Grant

Filed: August 21, 2018

Date of Patent: June 9, 2020

Assignee: XILINX, INC.

Inventors: Sean Settle, Elliott Delaye, Aaron Ng, Ehsan Ghasemi, Ashish Sirasao, Xiao Teng, Jindrich Zejda
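Illustrative sketch (not taken from the patent): a right shift, an integer multiplier, and a second right shift together approximate multiplication by a fractional scale factor using integer arithmetic only. The function name scale is hypothetical, the accumulator is assumed to be non-negative, and the first shift is modeled as a plain shift rather than an MSB-to-LSB bit selection.

```python
def scale(acc, shift1, multiplier, shift2):
    """Approximate acc * multiplier / 2**(shift1 + shift2) with integer operations."""
    reduced = acc >> shift1         # first right shift discards low-order bits
    product = reduced * multiplier  # integer multiply
    return product >> shift2        # second right shift completes the scaling


# Scale factor 3 / 2**6 = 0.046875, so 1000 maps to about 46.9, truncated to 46.
scaled = scale(1000, shift1=2, multiplier=3, shift2=4)
```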
Data format suitable for fast massively parallel general matrix multiplication in a programmable IC

Patent number: 10515135

Abstract: Methods and apparatus are described for performing data-intensive compute algorithms, such as fast massively parallel general matrix multiplication (GEMM), using a particular data format for both storing data to and reading data from memory. This data format may be utilized for arbitrarily-sized input matrices for GEMM implemented on a finite-size GEMM accelerator in the form of a rectangular compute array of digital signal processing (DSP) elements or similar compute cores. This data format solves the issue of double data rate (DDR) dynamic random access memory (DRAM) bandwidth by allowing both linear DDR addressing and single cycle loading of data into the compute array, avoiding input/output (I/O) and/or DDR bottlenecks.

Type: Grant

Filed: October 17, 2017

Date of Patent: December 24, 2019

Assignee: XILINX, INC.

Inventors: Jindrich Zejda, Elliott Delaye, Aaron Ng, Ashish Sirasao, Yongjun Wu
Inline image preprocessing for convolution operations using a matrix multiplier on an integrated circuit

Patent number: 10460416

Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a plurality of memory banks configured to store the image data; multiplexer circuitry coupled to the memory banks; a first plurality of registers coupled to the multiplexer circuitry; a second plurality of registers coupled to the first plurality of registers, outputs of the second plurality of registers configured to provide the plurality of streams of image samples; and control circuitry configured to generate addresses for the plurality of memory banks, control the multiplexer circuitry to select among outputs of the plurality of memory banks, control the first plurality of registers to store outputs of the second plurality of multiplexers, and control the second plurality of registers to store outputs of the first plurality of registers.

Type: Grant

Filed: October 17, 2017

Date of Patent: October 29, 2019

Assignee: XILINX, INC.

Inventors: Ashish Sirasao, Elliott Delaye, Aaron Ng, Ehsan Ghasemi
Timing closure of circuit designs for integrated circuits

Patent number: 10366201

Abstract: Closing timing for a circuit design can include displaying, using a display device, a first region having a plurality of controls corresponding to a plurality of data sets generated at different times during a phase of a design flow for a circuit design, wherein each control selects a data set associated with the control, and displaying, using the display device, a second region configured to display a list of critical paths for data sets selected from the first region using one of the plurality of controls. Closing timing further can include displaying, using the display device, a third region configured to display a representation of a target integrated circuit including layouts for the critical paths of the list for implementations of the circuit design specified by the selected data sets.

Type: Grant

Filed: April 24, 2017

Date of Patent: July 30, 2019

Assignee: XILINX, INC.

Inventors: Aaron Ng, Sridhar Krishnamurthy, Grigor S. Gasparyan
Software-defined memory bandwidth reduction by hierarchical stream buffering for general matrix multiplication in a programmable IC

Patent number: 10354733

Abstract: Methods and apparatus are described for partitioning and reordering block-based matrix multiplications for high-speed data streaming in general matrix multiplication (GEMM), which may be implemented by a programmable integrated circuit (IC). By preloading and hierarchically caching the blocks, examples of the present disclosure reduce the double data rate (DDR) memory intake bandwidth for software-defined GEMM accelerators.

Type: Grant

Filed: October 17, 2017

Date of Patent: July 16, 2019

Assignee: XILINX, INC.

Inventors: Jindrich Zejda, Elliott Delaye, Ashish Sirasao, Yongjun Wu, Aaron Ng
MACHINE LEARNING RUNTIME LIBRARY FOR NEURAL NETWORK ACCELERATION

Publication number: 20190114533

Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator using a library. The neural network application may execute on a host computing system while the neural network accelerator executes on a massively parallel hardware system, e.g., a FPGA. The library operates a pipeline for submitting the tasks received from the neural network application to the neural network accelerator. In one embodiment, the pipeline includes a pre-processing stage, an FPGA execution stage, and a post-processing stage which each correspond to different threads. When receiving a task from the neural network application, the library generates a packet that includes the information required for the different stages in the pipeline to perform the tasks. Because the stages correspond to different threads, the library can process multiple packets in parallel which can increase the utilization of the neural network accelerator on the hardware system.

Type: Application

Filed: October 17, 2017

Publication date: April 18, 2019

Applicant: Xilinx, Inc.

Inventors: Aaron Ng, Jindrich Zejda, Elliott Delaye, Xiao Teng, Sonal Santan, Soren T. Soe, Ashish Sirasao, Ehsan Ghasemi, Sean Settle
NEURAL NETWORK PROCESSING SYSTEM HAVING MULTIPLE PROCESSORS AND A NEURAL NETWORK ACCELERATOR

Publication number: 20190114534

Abstract: At least one neural network accelerator performs operations of a first subset of layers of a neural network on an input data set, generates an intermediate data set, and stores the intermediate data set in a shared memory queue in a shared memory. A first processor element of a host computer system provides input data to the neural network accelerator and signals the neural network accelerator to perform the operations of the first subset of layers of the neural network on the input data set. A second processor element of the host computer system reads the intermediate data set from the shared memory queue, performs operations of a second subset of layers of the neural network on the intermediate data set, and generates an output data set while the neural network accelerator is performing the operations of the first subset of layers of the neural network on another input data set.

Type: Application

Filed: October 17, 2017

Publication date: April 18, 2019

Applicant: Xilinx, Inc.

Inventors: Xiao Teng, Aaron Ng, Ashish Sirasao, Elliott Delaye
IMAGE PREPROCESSING FOR GENERALIZED IMAGE PROCESSING

Publication number: 20190114499

Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a first buffer configured to store a plurality of rows of the image data and output a row of the plurality of rows; a second buffer, coupled to the first buffer, including a plurality of storage locations to store a respective plurality of image samples of the row output by the first buffer; a plurality of shift registers; an interconnect network including a plurality of connections, each connection coupling a respective one of the plurality of shift registers to more than one of the plurality of storage locations, one or more of the plurality of storage locations being coupled to more than one of the plurality of connections; and a control circuit configured to load the plurality of shift registers with the plurality of image samples based on the plurality of connections and shift the plurality of shift registers to output the plurality of streams of image samples.

Type: Application

Filed: October 17, 2017

Publication date: April 18, 2019

Applicant: Xilinx, Inc.

Inventors: Elliott Delaye, Ashish Sirasao, Aaron Ng, Yongjun Wu, Jindrich Zejda
NEURAL NETWORK PROCESSING SYSTEM HAVING HOST CONTROLLED KERNEL ACCLERATORS

Publication number: 20190114535

Abstract: A disclosed neural network processing system includes a host computer system, RAMs coupled to the host computer system, and neural network accelerators coupled to the RAMs, respectively. The host computer system is configured with software that when executed causes the host computer system to write input data and work requests to the RAMs. Each work request specifies a subset of neural network operations to perform and memory locations in a RAM of the input data and parameters. A graph of dependencies among neural network operations is built and additional dependencies added. The operations are partitioned into coarse grain tasks and fine grain subtasks for optimal scheduling for parallel execution. The subtasks are scheduled to accelerator kernels of matching capabilities. Each neural network accelerator is configured to read a work request from the respective RAM and perform the subset of neural network operations on the input data using the parameters.

Type: Application

Filed: October 17, 2017

Publication date: April 18, 2019

Applicant: Xilinx, Inc.

Inventors: Aaron Ng, Jindrich Zejda, Elliott Delaye, Xiao Teng, Ashish Sirasao
HOST-DIRECTED MULTI-LAYER NEURAL NETWORK PROCESSING VIA PER-LAYER WORK REQUESTS

Publication number: 20190114538

Abstract: In disclosed approaches of neural network processing, a host computer system copies an input data matrix from host memory to a shared memory for performing neural network operations of a first layer of a neural network by a neural network accelerator. The host instructs the neural network accelerator to perform neural network operations of each layer of the neural network beginning with the input data matrix. The neural network accelerator performs neural network operations of each layer in response to the instruction from the host. The host waits until the neural network accelerator signals completion of performing neural network operations of layer i before instructing the neural network accelerator to commence performing neural network operations of layer i+1, for i≥1. The host instructs the neural network accelerator to use a results data matrix in the shared memory from layer i as an input data matrix for layer i+1 for i≥1.

Type: Application

Filed: October 17, 2017

Publication date: April 18, 2019

Applicant: Xilinx, Inc.

Inventors: Aaron Ng, Elliott Delaye, Jindrich Zejda, Ashish Sirasao
