Patents by Inventor Jian OUYANG

Jian OUYANG has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210318878
    Abstract: According to various embodiments, methods and systems are provided to accelerate artificial intelligence (AI) model training with advanced interconnect communication technologies and systematic zero-value compression over a distributed training system. According to an exemplary method, during each iteration of a Scatter-Reduce process performed on a cluster of processors arranged in a logical ring to train a neural network model, a processor receives a compressed data block from a prior processor in the logical ring, performs an operation on the received compressed data block and a compressed data block generated on the processor to obtain a calculated data block, and sends the calculated data block to a following processor in the logical ring. A compressed data block calculated from corresponding data blocks from the processors can be identified on each processor and distributed to each other processor and decompressed therein for use in the AI model training.
    Type: Application
    Filed: October 12, 2019
    Publication date: October 14, 2021
    Inventors: Zhibiao ZHAO, Jian OUYANG, Hefei ZHU, Qingshu CHEN, Wei QI
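The ring Scatter-Reduce over compressed blocks described in the abstract above can be sketched as follows. This is a toy illustration, not the patented implementation: the sparse (index, value) compression format, chunk assignment, and helper names are all assumptions made for clarity.

```python
# Toy sketch of a ring Scatter-Reduce over zero-value-compressed blocks.

def compress(block):
    """Drop zeros: keep (index, value) pairs plus the original length."""
    return (len(block), [(i, v) for i, v in enumerate(block) if v != 0])

def decompress(packed):
    length, pairs = packed
    out = [0] * length
    for i, v in pairs:
        out[i] = v
    return out

def add_compressed(a, b):
    """Element-wise sum of two compressed blocks, result re-compressed."""
    return compress([x + y for x, y in zip(decompress(a), decompress(b))])

def ring_scatter_reduce(chunks_per_proc):
    """chunks_per_proc[p][c]: compressed chunk c held by processor p.
    At each step, p sends one chunk to its ring successor, which reduces
    it into its own copy; after n-1 steps, processor p holds the fully
    reduced chunk (p + 1) % n."""
    n = len(chunks_per_proc)
    for step in range(n - 1):
        for p in range(n):
            c = (p - step) % n          # chunk p forwards this step
            nxt = (p + 1) % n           # successor in the logical ring
            chunks_per_proc[nxt][c] = add_compressed(
                chunks_per_proc[nxt][c], chunks_per_proc[p][c])
    return chunks_per_proc

# Three processors, each holding three chunks equal to [p, 0, p].
data = [[compress([p, 0, p]) for _ in range(3)] for p in range(3)]
reduced = ring_scatter_reduce(data)
print(decompress(reduced[0][1]))   # fully reduced chunk: [3, 0, 3]
```

Because the zero elements never travel, each transfer carries only the nonzero entries, which is the bandwidth saving the abstract's systematic zero-value compression targets.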
  • Publication number: 20210281408
    Abstract: According to one embodiment, a DP accelerator includes one or more execution units (EUs) configured to perform data processing operations in response to an instruction received from a host system coupled over a bus. The DP accelerator includes a security unit (SU) configured to establish and maintain a secure channel with the host system to exchange commands and data associated with the data processing operations, where the security unit includes a secure storage area to store a private root key associated with the DP accelerator, where the private root key is utilized for authentication. The DP accelerator also includes a time unit (TU) coupled to the security unit to provide timestamp services. The SU includes a random number generator to generate a random number, and a cryptographic engine to perform cryptographic operations on data exchanged with the host system over the bus using a session key derived based on the random number.
    Type: Application
    Filed: January 4, 2019
    Publication date: September 9, 2021
    Inventors: Yong LIU, Yueqiang CHENG, Jian OUYANG, Tao WEI
  • Publication number: 20210279344
    Abstract: According to one embodiment, a system establishes a secure connection between a host system and a data processing (DP) accelerator over a bus, the secure connection including one or more data channels. The system transmits a first instruction from the host system to the DP accelerator over a command channel, the first instruction requesting the DP accelerator to perform a data preparation operation. The system receives a first request to read a first data from a first memory location of the host system from the DP accelerator over one data channel. In response to the request, the system transmits the first data to the DP accelerator over the data channel, where the first data is utilized for a computation or a configuration operation. The system transmits a second instruction from the host system to the DP accelerator over the command channel to perform the computation or the configuration operation.
    Type: Application
    Filed: January 4, 2019
    Publication date: September 9, 2021
    Inventors: Yong LIU, Yueqiang CHENG, Jian OUYANG, Tao WEI
  • Publication number: 20210271482
    Abstract: Example embodiments of the present application provide an instruction executing method and apparatus, an electronic device, and a computer-readable storage medium that may be applied in the field of artificial intelligence. The instruction executing method may include: executing an instruction sequence that includes memory instructions and non-memory instructions, the instructions in the sequence starting to be executed in order; determining that execution of a first memory instruction needs to be completed before a second memory instruction starts to be executed, the second memory instruction being the next memory instruction following the first memory instruction in the instruction sequence; and executing non-memory instructions between the first memory instruction and the second memory instruction, without executing the second memory instruction, during a cycle of executing the first memory instruction.
    Type: Application
    Filed: March 24, 2021
    Publication date: September 2, 2021
    Inventors: Yingnan Xu, Jian Ouyang, Xueliang Du, Kang An
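The overlap rule in the abstract above can be sketched as a tiny in-order issue loop. The instruction encoding, latency value, and scheduler shape below are illustrative assumptions, not the patented apparatus: while a memory instruction is outstanding, later non-memory instructions keep issuing, but the next memory instruction waits for it to complete.

```python
# Sketch: overlap non-memory instructions with an in-flight memory op.

def schedule(instrs, mem_latency=3):
    """instrs: list of ('mem', name) or ('alu', name) tuples.
    Returns (name, issue_cycle) pairs under the overlap rule."""
    issued = []
    cycle = 0
    mem_done = 0          # cycle at which the outstanding memory op finishes
    for kind, name in instrs:
        if kind == 'mem':
            cycle = max(cycle, mem_done)   # wait for the previous memory op
            issued.append((name, cycle))
            mem_done = cycle + mem_latency
        else:
            issued.append((name, cycle))   # free to run under the memory op
        cycle += 1
    return issued

prog = [('mem', 'LD1'), ('alu', 'ADD'), ('alu', 'MUL'), ('mem', 'LD2')]
print(schedule(prog))
# LD2 issues at cycle 3, when LD1 completes; ADD and MUL filled the gap.
```

With no non-memory instructions in between, the same two loads would issue at cycles 0 and 3, leaving the intervening cycles idle; the overlap recovers them.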
  • Publication number: 20210250174
    Abstract: According to one embodiment, in response to receiving a temporary public key (PK_d) from a data processing (DP) accelerator, a system generates a first nonce (nc) at the host system, where the DP accelerator is coupled to the host system over a bus. The system transmits a request to create a session key from the host system to the DP accelerator, the request including a host public key (PK_O) and the first nonce. The system receives a second nonce (ns) from the DP accelerator, where the second nonce is encrypted using the host public key and a temporary private key (SK_d) corresponding to the temporary public key. The system generates a first session key based on the first nonce and the second nonce, which is utilized to encrypt or decrypt subsequent data exchanges between the host system and the DP accelerator.
    Type: Application
    Filed: January 4, 2019
    Publication date: August 12, 2021
    Inventors: Yueqiang CHENG, Yong LIU, Tao WEI, Jian OUYANG
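The final step of the exchange above, deriving one session key from the two nonces, can be sketched as follows. The nonce names (nc, ns) follow the abstract, but the choice of KDF (HMAC-SHA256 over the concatenated nonces) is an assumption for illustration only; the abstract does not specify the derivation function.

```python
import hashlib
import hmac
import os

def derive_session_key(nc: bytes, ns: bytes, info: bytes = b"session") -> bytes:
    """Mix the host nonce and the accelerator nonce into one 256-bit key."""
    return hmac.new(nc + ns, info, hashlib.sha256).digest()

nc = os.urandom(16)   # first nonce, generated at the host
ns = os.urandom(16)   # second nonce, from the DP accelerator (carried
                      # encrypted under PK_O and SK_d in the real protocol)

host_key = derive_session_key(nc, ns)
accel_key = derive_session_key(nc, ns)
assert host_key == accel_key       # both ends derive the same session key
```

Because each side contributes a fresh nonce, neither party can force a previously used session key, which is the replay protection the two-nonce design buys.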
  • Patent number: 11087203
    Abstract: The present application discloses a method and apparatus for processing a data sequence. A specific implementation of the method includes: receiving an inputted to-be-processed data sequence; copying a weight matrix in a recurrent neural network model to an embedded block random access memory (RAM) of a field-programmable gate array (FPGA); processing sequentially each piece of to-be-processed data in the to-be-processed data sequence by using an activation function in the recurrent neural network model and the weight matrix stored in the embedded block RAM; and outputting a processed data sequence corresponding to the to-be-processed data sequence. This implementation improves the data sequence processing efficiency of the recurrent neural network model.
    Type: Grant
    Filed: June 9, 2017
    Date of Patent: August 10, 2021
    Assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.
    Inventors: Yong Wang, Jian Ouyang, Wei Qi, Sizhong Li
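The processing pattern in the abstract above, load the weight matrix once and reuse it for every element of the sequence, can be sketched in plain Python. The matrix shape, tanh activation, and inputs below are assumptions standing in for the FPGA block-RAM design; only the load-once, stream-through structure mirrors the abstract.

```python
import math

# "Cached" weight matrix: loaded once, standing in for the one-time copy
# of the recurrent weights into FPGA embedded block RAM.
W = [[0.5, -0.25], [0.1, 0.3]]

def step(h, x):
    """One recurrent step: h_new = tanh(W @ h + x), reusing the cached W."""
    return [math.tanh(sum(W[i][j] * h[j] for j in range(2)) + x[i])
            for i in range(2)]

# Stream the to-be-processed sequence through the model element by element.
h = [0.0, 0.0]
for x in [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]:
    h = step(h, x)
print(h)   # processed state after the sequence
```

The efficiency claim rests on that structure: the weights are fetched from off-chip memory once rather than once per sequence element.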
  • Publication number: 20210241095
    Abstract: Embodiments of the present disclosure propose a deep learning processing apparatus and method, device and storage medium, relating to the field of artificial intelligence.
    Type: Application
    Filed: September 10, 2020
    Publication date: August 5, 2021
    Inventors: Xiaozhang Gong, Jian Ouyang, Jing Wang, Wei Qi
  • Publication number: 20210218896
    Abstract: Various embodiments include a dynamic flex circuit that may be used in a camera with a moveable image sensor. The dynamic flex circuit may include one or more fixed end portions, a moveable end portion, and an intermediate portion. In some embodiments, the fixed end portion may be connected to another flex circuit of the camera. The moveable end portion may be coupled with the moveable image sensor. The intermediate portion may be configured to allow the moveable end portion to move with the moveable image sensor. Some embodiments include a reinforcement arrangement that reinforces one or more portions of the dynamic flex circuit.
    Type: Application
    Filed: March 15, 2021
    Publication date: July 15, 2021
    Applicant: Apple Inc.
    Inventors: Nicholas D. Smyth, Jian Ouyang, Scott W. Miller, Samuel M. Hyatt, Martin Auclair, Phillip R. Sommer
  • Patent number: 11055100
    Abstract: Embodiments of the present disclosure relate to a method for processing information, and a processor. The processor includes an arithmetic and logic unit, a bypass unit, a queue unit, a multiplexer, and a register file. The bypass unit includes a data processing subunit; the data processing subunit is configured to acquire at least one valid processing result outputted by the arithmetic and logic unit, determine a processing result from the at least one valid processing result, output the determined processing result to the multiplexer, and output the remaining processing results from among the at least one valid processing result to the queue unit; and the multiplexer is configured to sequentially output more than one valid processing result to the register file.
    Type: Grant
    Filed: July 3, 2019
    Date of Patent: July 6, 2021
    Assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.
    Inventor: Jian Ouyang
  • Publication number: 20210174174
    Abstract: A data processing system includes a central processing unit (CPU) and accelerator cards coupled to the CPU over a bus, each of the accelerator cards having a plurality of data processing (DP) accelerators to receive DP tasks from the CPU and to perform the received DP tasks. At least two of the accelerator cards are coupled to each other via an inter-card connection, and at least two of the DP accelerators are coupled to each other via an inter-chip connection. Each of the inter-card connection and the inter-chip connection is capable of being dynamically activated or deactivated, such that in response to a request received from the CPU, any one of the accelerator cards or any one of the DP accelerators within any one of the accelerator cards can be enabled or disabled to process any one of the DP tasks received from the CPU.
    Type: Application
    Filed: November 15, 2019
    Publication date: June 10, 2021
    Inventors: Hefei ZHU, Jian OUYANG, Zhibiao ZHAO, Xiaozhang GONG, Qingshu CHEN
  • Publication number: 20210173934
    Abstract: According to one embodiment, a system performs a secure boot using a security module such as a trusted platform module (TPM) of a host system. The system establishes a trusted execution environment (TEE) associated with one or more processors of the host system. The system launches a memory manager within the TEE, where the memory manager is configured to manage memory resources of a data processing (DP) accelerator coupled to the host system over a bus, including maintaining memory usage information of global memory of the DP accelerator. In response to a request received from an application running within the TEE for accessing a memory location of the DP accelerator, the system allows or denies the request based on the memory usage information.
    Type: Application
    Filed: January 4, 2019
    Publication date: June 10, 2021
    Applicants: Baidu.com Times Technology (Beijing) Co., Ltd., Baidu USA LLC
    Inventors: Yong LIU, Yueqiang CHENG, Jian OUYANG, Tao WEI
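The allow-or-deny decision in the abstract above can be sketched with a minimal ownership table. The class, its policy (grant access only to the application that allocated the region), and all names are hypothetical; the real memory manager runs inside the TEE and tracks the DP accelerator's global memory.

```python
# Minimal sketch of a usage-tracking memory manager's access check.

class AcceleratorMemoryManager:
    def __init__(self, total: int):
        self.total = total
        self.owner = {}            # address -> owning application id

    def allocate(self, app: str, addr: int, size: int) -> None:
        """Record that `app` owns [addr, addr + size)."""
        for a in range(addr, addr + size):
            self.owner[a] = app

    def allow_access(self, app: str, addr: int) -> bool:
        """Grant the request only if `app` owns the location."""
        return self.owner.get(addr) == app

mm = AcceleratorMemoryManager(total=1024)
mm.allocate("app_a", 0, 128)
print(mm.allow_access("app_a", 64))    # owner: allowed
print(mm.allow_access("app_b", 64))    # different application: denied
```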
  • Publication number: 20210173666
    Abstract: According to one embodiment, a data processing system performs a secure boot using a security module (e.g., a trusted platform module (TPM)) of a host system. The system verifies that an operating system (OS) and one or more drivers including an accelerator driver associated with a data processing (DP) accelerator is provided by a trusted source. The system launches the accelerator driver within the OS. The system generates a trusted execution environment (TEE) associated with one or more processors of the host system. The system launches an application and a runtime library within the TEE, where the application communicates with the DP accelerator via the runtime library and the accelerator driver.
    Type: Application
    Filed: January 4, 2019
    Publication date: June 10, 2021
    Inventors: Yueqiang CHENG, Yong LIU, Tao WEI, Jian OUYANG
  • Publication number: 20210176035
    Abstract: According to one embodiment, a system receives, at a host system from a data processing (DP) accelerator, an accelerator identifier (ID) that uniquely identifies the DP accelerator, wherein the host system is coupled to the DP accelerator over a bus. The system transmits the accelerator ID to a predetermined trusted server over a network. The system receives a certificate from the predetermined trusted server over the network, the certificate certifying the DP accelerator. The system extracts a public root key (PK_RK) from the certificate for verification, the PK_RK corresponding to a private root key (SK_RK) associated with the DP accelerator. The system establishes a secure channel with the DP accelerator using the PK_RK based on the verification to exchange data securely between the host system and the DP accelerator.
    Type: Application
    Filed: January 4, 2019
    Publication date: June 10, 2021
    Inventors: Yueqiang CHENG, Yong LIU, Tao WEI, Jian OUYANG
  • Publication number: 20210173661
    Abstract: According to one embodiment, a system receives, at a host system, a public attestation key (PK_ATT) or a signed PK_ATT from a data processing (DP) accelerator over a bus. The system verifies the PK_ATT using a public root key (PK_RK) associated with the DP accelerator. In response to successfully verifying the PK_ATT, the system transmits a kernel identifier (ID) to the DP accelerator to request attestation of a kernel object stored in the DP accelerator. In response to receiving a kernel digest or a signed kernel digest corresponding to the kernel object from the DP accelerator, the system verifies the kernel digest using the PK_ATT. The system sends the verification results to the DP accelerator, which uses them to govern access to the kernel object.
    Type: Application
    Filed: January 4, 2019
    Publication date: June 10, 2021
    Inventors: Yueqiang CHENG, Yong LIU, Tao WEI, Jian OUYANG
  • Publication number: 20210173917
    Abstract: According to one embodiment, a system receives, at a runtime library executed within a trusted execution environment (TEE) of a host system, a request from an application to invoke a predetermined function to perform a predefined operation. In response to the request, the system identifies a kernel object associated with the predetermined function. The system verifies an executable image of the kernel object using a public key corresponding to the private key that was used to sign the executable image. In response to successfully verifying the executable image of the kernel object, the system transmits the verified executable image to a data processing (DP) accelerator over a bus, to be executed by the DP accelerator to perform the predefined operation.
    Type: Application
    Filed: January 4, 2019
    Publication date: June 10, 2021
    Inventors: Yueqiang CHENG, Yong LIU, Tao WEI, Jian OUYANG
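The verify-then-transmit flow in the abstract above can be sketched as follows. Note the hedge: the abstract describes public-key signature verification (e.g. RSA or ECDSA), which the stand-in below replaces with a plain digest comparison so the sketch stays self-contained; all names are hypothetical.

```python
import hashlib

def sign_image(image: bytes) -> bytes:
    # Stand-in for signing with the private key: publish a SHA-256 digest.
    return hashlib.sha256(image).digest()

def verify_image(image: bytes, signature: bytes) -> bool:
    # Stand-in for verifying with the public key: recompute and compare.
    return hashlib.sha256(image).digest() == signature

kernel_image = b"\x7fELFkernel-object-bytes"
sig = sign_image(kernel_image)

print(verify_image(kernel_image, sig))           # untampered: accepted
print(verify_image(kernel_image + b"x", sig))    # tampered: rejected
# Only a verified image would be transmitted to the DP accelerator.
```

In the real scheme, the signature binds the image to the signer's private key, so the accelerator executes only code the trusted signer produced; a bare digest, as here, only detects accidental corruption.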
  • Publication number: 20210173428
    Abstract: According to one embodiment, a DP accelerator includes one or more execution units (EUs) configured to perform data processing operations in response to an instruction received from a host system coupled over a bus. The DP accelerator includes a security unit (SU) configured to establish and maintain a secure channel with the host system to exchange commands and data associated with the data processing operations. The DP accelerator includes a time unit (TU) coupled to the security unit to provide timestamp services to the security unit, where the time unit includes a clock generator to generate clock signals locally without having to derive the clock signals from an external source. The TU includes a timestamp generator coupled to the clock generator to generate a timestamp based on the clock signals, and a power supply to provide power to the clock generator and the timestamp generator.
    Type: Application
    Filed: January 4, 2019
    Publication date: June 10, 2021
    Inventors: Yong LIU, Yueqiang CHENG, Jian OUYANG, Tao WEI
  • Publication number: 20210176063
    Abstract: According to one embodiment, a system receives, at a host channel manager (HCM) of a host system, a request from an application to establish a secure channel with a data processing (DP) accelerator, where the DP accelerator is coupled to the host system over a bus. In response to the request, the system generates a first session key for the secure channel based on a first private key of a first key pair associated with the HCM and a second public key of a second key pair associated with the DP accelerator. In response to a first data associated with the application to be sent to the DP accelerator, the system encrypts the first data using the first session key. The system then transmits the encrypted first data to the DP accelerator via the secure channel over the bus.
    Type: Application
    Filed: January 4, 2019
    Publication date: June 10, 2021
    Inventors: Yong LIU, Yueqiang CHENG, Jian OUYANG, Tao WEI
  • Patent number: 11023391
    Abstract: Disclosed are an apparatus for data processing, an artificial intelligence chip, and an electronic device. The apparatus for data processing includes: at least one input memory, at least one data conveying component, at least one multiplexed arbitration component, and at least one output memory. The input memory is connected to the data conveying component, the data conveying component is connected to the multiplexed arbitration component, and the multiplexed arbitration component is connected to the output memory.
    Type: Grant
    Filed: July 9, 2019
    Date of Patent: June 1, 2021
    Assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.
    Inventors: Peng Wu, Jian Ouyang, Canghai Gu, Wei Qi, Ningyi Xu
  • Patent number: 11023801
    Abstract: The present application discloses a data processing method and apparatus. A specific implementation of the method includes: receiving floating point data sent from an electronic device; converting the received floating point data into fixed point data according to a data length and a value range of the received floating point data; performing calculation on the obtained fixed point data according to a preset algorithm to obtain result data in a fixed point form; and converting the obtained result data in the fixed point form into result data in a floating point form and sending the result data in the floating point form to the electronic device. This implementation improves the data processing efficiency.
    Type: Grant
    Filed: June 9, 2017
    Date of Patent: June 1, 2021
    Assignee: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    Inventors: Jian Ouyang, Wei Qi, Yong Wang, Lin Liu
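The round-trip in the abstract above, quantize floats to fixed point, compute on integers, convert back, can be sketched as follows. The 8-fraction-bit format and the multiply example are assumptions for illustration; the patent selects the format from the data length and value range of the incoming floats.

```python
# Sketch of a float -> fixed point -> float processing round-trip.

def to_fixed(x: float, frac_bits: int = 8) -> int:
    """Quantize a float to a Q-format integer with `frac_bits` fraction bits."""
    return round(x * (1 << frac_bits))

def to_float(q: int, frac_bits: int = 8) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)

def fixed_mul(a: int, b: int, frac_bits: int = 8) -> int:
    """Multiply two fixed-point values; shift right to restore the scale."""
    return (a * b) >> frac_bits

a, b = 1.5, 2.25
qa, qb = to_fixed(a), to_fixed(b)          # 384 and 576 in Q8
result = to_float(fixed_mul(qa, qb))
print(result)                              # 3.375: exact here, since both
                                           # inputs are representable in Q8
```

The efficiency gain comes from the middle step: the preset algorithm runs entirely on integer arithmetic, with only one conversion on the way in and one on the way out.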
  • Publication number: 20210092297
    Abstract: Various embodiments include a dynamic flex circuit that may be used in a camera with a moveable image sensor. The dynamic flex circuit may include one or more fixed end portions, a moveable end portion, and an intermediate portion. In some embodiments, the fixed end portion may be connected to another flex circuit of the camera. The moveable end portion may be coupled with the moveable image sensor. The intermediate portion may be configured to allow the moveable end portion to move with the moveable image sensor. Some embodiments include a reinforcement arrangement that reinforces one or more portions of the dynamic flex circuit.
    Type: Application
    Filed: September 18, 2020
    Publication date: March 25, 2021
    Applicant: Apple Inc.
    Inventors: Nicholas D. Smyth, Jian Ouyang, Scott W. Miller