Patents by Inventor Yunchen Pu

Yunchen Pu has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10929681
    Abstract: A surveillance system is provided that includes an image capture device configured to capture a video sequence of a target area that includes objects and is formed from a set of image frames. The system further includes a processor configured to apply a three-dimensional Convolutional Neural Network (C3D) to the image frames to obtain therefor (i) intermediate feature representations across L convolutional layers and (ii) top-layer features. The processor is further configured to produce a first word of a caption for the sequence by applying the top-layer features to a Long Short Term Memory (LSTM). The processor is further configured to produce subsequent words of the caption by (i) dynamically performing spatiotemporal attention and layer attention using the intermediate feature representations to form a context vector, and (ii) applying the LSTM to the context vector, a previous word of the caption, and a hidden state of the LSTM. The system includes a display device for displaying the caption.
    Type: Grant
    Filed: October 26, 2017
    Date of Patent: February 23, 2021
    Inventors: Renqiang Min, Yunchen Pu
  • Patent number: 10402658
    Abstract: A video retrieval system is provided that includes a set of servers configured to retrieve a video sequence from a database and forward it to a requesting device responsive to a match between an input text and a caption for the video sequence. The servers are further configured to translate the video sequence into the caption by (A) applying a C3D to image frames of the video sequence to obtain therefor (i) intermediate feature representations across L convolutional layers and (ii) top-layer features, (B) producing a first word of the caption for the video sequence by applying the top-layer features to an LSTM, and (C) producing subsequent words of the caption by (i) dynamically performing spatiotemporal attention and layer attention using the representations to form a context vector, and (ii) applying the LSTM to the context vector, a previous word of the caption, and a hidden state of the LSTM.
    Type: Grant
    Filed: October 26, 2017
    Date of Patent: September 3, 2019
    Assignee: NEC Corporation
    Inventors: Renqiang Min, Yunchen Pu
  • Patent number: 10366292
    Abstract: A system is provided for video captioning. The system includes a processor. The processor is configured to apply a three-dimensional Convolutional Neural Network (C3D) to image frames of a video sequence to obtain, for the video sequence, (i) intermediate feature representations across L convolutional layers and (ii) top-layer features. The processor is further configured to produce a first word of an output caption for the video sequence by applying the top-layer features to a Long Short Term Memory (LSTM). The processor is further configured to produce subsequent words of the output caption by (i) dynamically performing spatiotemporal attention and layer attention using the intermediate feature representations to form a context vector, and (ii) applying the LSTM to the context vector, a previous word of the output caption, and a hidden state of the LSTM. The system further includes a display device for displaying the output caption to a user.
    Type: Grant
    Filed: October 26, 2017
    Date of Patent: July 30, 2019
    Assignee: NEC Corporation
    Inventors: Renqiang Min, Yunchen Pu
  • Publication number: 20180121734
    Abstract: A system is provided for video captioning. The system includes a processor. The processor is configured to apply a three-dimensional Convolutional Neural Network (C3D) to image frames of a video sequence to obtain, for the video sequence, (i) intermediate feature representations across L convolutional layers and (ii) top-layer features. The processor is further configured to produce a first word of an output caption for the video sequence by applying the top-layer features to a Long Short Term Memory (LSTM). The processor is further configured to produce subsequent words of the output caption by (i) dynamically performing spatiotemporal attention and layer attention using the intermediate feature representations to form a context vector, and (ii) applying the LSTM to the context vector, a previous word of the output caption, and a hidden state of the LSTM. The system further includes a display device for displaying the output caption to a user.
    Type: Application
    Filed: October 26, 2017
    Publication date: May 3, 2018
    Inventors: Renqiang Min, Yunchen Pu
  • Publication number: 20180121731
    Abstract: A surveillance system is provided that includes an image capture device configured to capture a video sequence of a target area that includes objects and is formed from a set of image frames. The system further includes a processor configured to apply a C3D to the image frames to obtain therefor (i) intermediate feature representations across L convolutional layers and (ii) top-layer features. The processor is further configured to produce a first word of a caption for the sequence by applying the top-layer features to an LSTM. The processor is further configured to produce subsequent words of the caption by (i) dynamically performing spatiotemporal attention and layer attention using the intermediate feature representations to form a context vector, and (ii) applying the LSTM to the context vector, a previous word of the caption, and a hidden state of the LSTM. The system includes a display device for displaying the caption.
    Type: Application
    Filed: October 26, 2017
    Publication date: May 3, 2018
    Inventors: Renqiang Min, Yunchen Pu
  • Publication number: 20180124331
    Abstract: A video retrieval system is provided that includes a set of servers configured to retrieve a video sequence from a database and forward it to a requesting device responsive to a match between an input text and a caption for the video sequence. The servers are further configured to translate the video sequence into the caption by (A) applying a C3D to image frames of the video sequence to obtain therefor (i) intermediate feature representations across L convolutional layers and (ii) top-layer features, (B) producing a first word of the caption for the video sequence by applying the top-layer features to an LSTM, and (C) producing subsequent words of the caption by (i) dynamically performing spatiotemporal attention and layer attention using the representations to form a context vector, and (ii) applying the LSTM to the context vector, a previous word of the caption, and a hidden state of the LSTM.
    Type: Application
    Filed: October 26, 2017
    Publication date: May 3, 2018
    Inventors: Renqiang Min, Yunchen Pu
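
The abstracts above all describe the same decoding loop: C3D features feed an LSTM, whose first word comes from the top-layer features and whose subsequent words come from a context vector formed by spatiotemporal attention within each convolutional layer followed by layer attention across layers. The sketch below illustrates that control flow only; it is not the patented implementation. All sizes (`L`, `N`, `D`, `H`, `V`), parameter names, and the random stand-in features are illustrative assumptions, and the C3D itself is replaced by random arrays.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: L conv layers, N spatiotemporal locations per
# layer, feature size D, LSTM hidden size H, vocabulary size V.
L, N, D, H, V = 3, 8, 16, 32, 10

# Stand-ins for C3D outputs (random; a real system computes these
# from the video's image frames).
layer_feats = rng.standard_normal((L, N, D))    # intermediate features
top_feats = rng.standard_normal(D)              # top-layer features

# Randomly initialised parameters for the sketch.
W_spat = rng.standard_normal(D + H) * 0.1       # spatiotemporal scorer
W_layer = rng.standard_normal(D + H) * 0.1      # layer-attention scorer
Wx = rng.standard_normal((4 * H, 2 * D)) * 0.1  # LSTM input weights
Wh = rng.standard_normal((4 * H, H)) * 0.1      # LSTM recurrent weights
b = np.zeros(4 * H)
E = rng.standard_normal((V, D)) * 0.1           # word embeddings
W_out = rng.standard_normal((V, H)) * 0.1       # hidden -> vocab logits

def context_vector(h):
    """Spatiotemporal attention within each layer, then layer
    attention across the L per-layer summaries."""
    summaries = np.empty((L, D))
    for l in range(L):
        scores = layer_feats[l] @ W_spat[:D] + h @ W_spat[D:]
        summaries[l] = softmax(scores) @ layer_feats[l]
    beta = softmax(summaries @ W_layer[:D] + h @ W_layer[D:])
    return beta @ summaries

def lstm_step(x, h, c):
    """One LSTM cell update on input x with state (h, c)."""
    z = Wx @ x + Wh @ h + b
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

h, c = np.zeros(H), np.zeros(H)

# First word: apply the top-layer features to the LSTM.
h, c = lstm_step(np.concatenate([top_feats, np.zeros(D)]), h, c)
caption = [int(np.argmax(W_out @ h))]

# Subsequent words: LSTM applied to the context vector, the previous
# word's embedding, and the LSTM's hidden state.
for _ in range(4):
    x = np.concatenate([context_vector(h), E[caption[-1]]])
    h, c = lstm_step(x, h, c)
    caption.append(int(np.argmax(W_out @ h)))

print(caption)  # five word indices; a real system maps them to words
```

With trained weights and a real C3D, the same loop would emit the captions that the surveillance, retrieval, and captioning systems above display, match against query text, or show to a user.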