Patents by Inventor Yunchen Pu

Yunchen Pu has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10929681
    Abstract: A surveillance system is provided that includes an image capture device configured to capture a video sequence of a target area that includes objects and is formed from a set of image frames. The system further includes a processor configured to apply a three-dimensional Convolutional Neural Network (C3D) to the image frames to obtain therefor (i) intermediate feature representations across L convolutional layers and (ii) top-layer features. The processor is further configured to produce a first word of a caption for the sequence by applying the top-layer features to a Long Short Term Memory (LSTM). The processor is further configured to produce subsequent words of the caption by (i) dynamically performing spatiotemporal attention and layer attention using the intermediate feature representations to form a context vector, and (ii) applying the LSTM to the context vector, a previous word of the caption, and a hidden state of the LSTM. The system includes a display device for displaying the caption.
    Type: Grant
    Filed: October 26, 2017
    Date of Patent: February 23, 2021
    Inventors: Renqiang Min, Yunchen Pu
  • Patent number: 10402658
    Abstract: A video retrieval system is provided that includes a set of servers configured to retrieve a video sequence from a database and forward it to a requesting device responsive to a match between an input text and a caption for the video sequence. The servers are further configured to translate the video sequence into the caption by (A) applying a C3D to image frames of the video sequence to obtain therefor (i) intermediate feature representations across L convolutional layers and (ii) top-layer features, (B) producing a first word of the caption for the video sequence by applying the top-layer features to an LSTM, and (C) producing subsequent words of the caption by (i) dynamically performing spatiotemporal attention and layer attention using the representations to form a context vector, and (ii) applying the LSTM to the context vector, a previous word of the caption, and a hidden state of the LSTM.
    Type: Grant
    Filed: October 26, 2017
    Date of Patent: September 3, 2019
    Assignee: NEC Corporation
    Inventors: Renqiang Min, Yunchen Pu
  • Patent number: 10366292
    Abstract: A system is provided for video captioning. The system includes a processor. The processor is configured to apply a three-dimensional Convolutional Neural Network (C3D) to image frames of a video sequence to obtain, for the video sequence, (i) intermediate feature representations across L convolutional layers and (ii) top-layer features. The processor is further configured to produce a first word of an output caption for the video sequence by applying the top-layer features to a Long Short Term Memory (LSTM). The processor is further configured to produce subsequent words of the output caption by (i) dynamically performing spatiotemporal attention and layer attention using the intermediate feature representations to form a context vector, and (ii) applying the LSTM to the context vector, a previous word of the output caption, and a hidden state of the LSTM. The system further includes a display device for displaying the output caption to a user.
    Type: Grant
    Filed: October 26, 2017
    Date of Patent: July 30, 2019
    Assignee: NEC Corporation
    Inventors: Renqiang Min, Yunchen Pu
  • Publication number: 20180121734
    Abstract: A system is provided for video captioning. The system includes a processor. The processor is configured to apply a three-dimensional Convolutional Neural Network (C3D) to image frames of a video sequence to obtain, for the video sequence, (i) intermediate feature representations across L convolutional layers and (ii) top-layer features. The processor is further configured to produce a first word of an output caption for the video sequence by applying the top-layer features to a Long Short Term Memory (LSTM). The processor is further configured to produce subsequent words of the output caption by (i) dynamically performing spatiotemporal attention and layer attention using the intermediate feature representations to form a context vector, and (ii) applying the LSTM to the context vector, a previous word of the output caption, and a hidden state of the LSTM. The system further includes a display device for displaying the output caption to a user.
    Type: Application
    Filed: October 26, 2017
    Publication date: May 3, 2018
    Inventors: Renqiang Min, Yunchen Pu
  • Publication number: 20180121731
    Abstract: A surveillance system is provided that includes an image capture device configured to capture a video sequence of a target area that includes objects and is formed from a set of image frames. The system further includes a processor configured to apply a C3D to the image frames to obtain therefor (i) intermediate feature representations across L convolutional layers and (ii) top-layer features. The processor is further configured to produce a first word of a caption for the sequence by applying the top-layer features to an LSTM. The processor is further configured to produce subsequent words of the caption by (i) dynamically performing spatiotemporal attention and layer attention using the intermediate feature representations to form a context vector, and (ii) applying the LSTM to the context vector, a previous word of the caption, and a hidden state of the LSTM. The system includes a display device for displaying the caption.
    Type: Application
    Filed: October 26, 2017
    Publication date: May 3, 2018
    Inventors: Renqiang Min, Yunchen Pu
  • Publication number: 20180124331
    Abstract: A video retrieval system is provided that includes a set of servers configured to retrieve a video sequence from a database and forward it to a requesting device responsive to a match between an input text and a caption for the video sequence. The servers are further configured to translate the video sequence into the caption by (A) applying a C3D to image frames of the video sequence to obtain therefor (i) intermediate feature representations across L convolutional layers and (ii) top-layer features, (B) producing a first word of the caption for the video sequence by applying the top-layer features to an LSTM, and (C) producing subsequent words of the caption by (i) dynamically performing spatiotemporal attention and layer attention using the representations to form a context vector, and (ii) applying the LSTM to the context vector, a previous word of the caption, and a hidden state of the LSTM.
    Type: Application
    Filed: October 26, 2017
    Publication date: May 3, 2018
    Inventors: Renqiang Min, Yunchen Pu
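
The abstracts above all describe the same decoding loop: C3D features feed an LSTM, whose first word comes from the top-layer features and whose subsequent words come from a context vector formed by spatiotemporal attention within each convolutional layer followed by layer attention across layers. The sketch below illustrates that control flow only; it is not the patented implementation. All sizes (`L`, `N`, `D`, `H`, `V`), parameter names, and the random stand-in features are illustrative assumptions, and the C3D itself is replaced by random arrays.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: L conv layers, N spatiotemporal locations per
# layer, feature size D, LSTM hidden size H, vocabulary size V.
L, N, D, H, V = 3, 8, 16, 32, 10

# Stand-ins for C3D outputs (random; a real system computes these
# from the video's image frames).
layer_feats = rng.standard_normal((L, N, D))    # intermediate features
top_feats = rng.standard_normal(D)              # top-layer features

# Randomly initialised parameters for the sketch.
W_spat = rng.standard_normal(D + H) * 0.1       # spatiotemporal scorer
W_layer = rng.standard_normal(D + H) * 0.1      # layer-attention scorer
Wx = rng.standard_normal((4 * H, 2 * D)) * 0.1  # LSTM input weights
Wh = rng.standard_normal((4 * H, H)) * 0.1      # LSTM recurrent weights
b = np.zeros(4 * H)
E = rng.standard_normal((V, D)) * 0.1           # word embeddings
W_out = rng.standard_normal((V, H)) * 0.1       # hidden -> vocab logits

def context_vector(h):
    """Spatiotemporal attention within each layer, then layer
    attention across the L per-layer summaries."""
    summaries = np.empty((L, D))
    for l in range(L):
        scores = layer_feats[l] @ W_spat[:D] + h @ W_spat[D:]
        summaries[l] = softmax(scores) @ layer_feats[l]
    beta = softmax(summaries @ W_layer[:D] + h @ W_layer[D:])
    return beta @ summaries

def lstm_step(x, h, c):
    """One LSTM cell update on input x with state (h, c)."""
    z = Wx @ x + Wh @ h + b
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

h, c = np.zeros(H), np.zeros(H)

# First word: apply the top-layer features to the LSTM.
h, c = lstm_step(np.concatenate([top_feats, np.zeros(D)]), h, c)
caption = [int(np.argmax(W_out @ h))]

# Subsequent words: LSTM applied to the context vector, the previous
# word's embedding, and the LSTM's hidden state.
for _ in range(4):
    x = np.concatenate([context_vector(h), E[caption[-1]]])
    h, c = lstm_step(x, h, c)
    caption.append(int(np.argmax(W_out @ h)))

print(caption)  # five word indices; a real system maps them to words
```

With trained weights and a real C3D, the same loop would emit the captions that the surveillance, retrieval, and captioning systems above display, match against query text, or show to a user.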