Patents by Inventor Jian David Wang

Jian David Wang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240185449
    Abstract: A video conference call system is provided with a camera to generate an input frame image of a conference room, where the video conference call system detects a human head for each meeting participant captured in the input frame image by applying a machine learning human head detector model to the input frame image, generates a head bounding box which surrounds each detected human head and identifies a corresponding meeting participant, extracts a pixel width measure and pixel height measure from each head bounding box, and applies the extracted pixel width measure and pixel height measure to one or more reverse lookup tables to extract meeting room coordinates for each meeting participant identified by a corresponding head bounding box.
    Type: Application
    Filed: October 22, 2022
    Publication date: June 6, 2024
    Inventors: Rajen Bhatt, Jian David Wang
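
    The reverse-lookup idea in this abstract can be sketched in a few lines: a detected head's pixel width and height are quantized into buckets and looked up in a precomputed table that maps them to room coordinates. This is a minimal illustration, not the patented method; the table values, bucket step, and function names are all invented, and a real system would build the table from camera calibration.

    ```python
    def bucket(value, step=10):
        """Quantize a pixel measure into a lookup-table bucket."""
        return (value // step) * step

    # Invented calibration data: (width_bucket, height_bucket) -> (x, y) in meters.
    REVERSE_LOOKUP = {
        (40, 50): (1.0, 2.0),   # large head -> participant close to the camera
        (20, 30): (3.5, 4.0),   # small head -> participant far from the camera
    }

    def head_to_room_coords(bbox_width, bbox_height, table=REVERSE_LOOKUP):
        """Return room coordinates for a head bounding box, or None when
        the pixel measures fall outside the calibrated table."""
        key = (bucket(bbox_width), bucket(bbox_height))
        return table.get(key)

    print(head_to_room_coords(42, 55))  # lands in the (40, 50) bucket
    ```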
  • Patent number: 11985417
    Abstract: Described are multiple cameras in a conference room, each pointed in a different direction. A primary camera includes a microphone array to perform sound source localization (SSL). The SSL is used in combination with a video image to identify the speaker from among multiple individuals that appear in the video image. Pose information of the speaker is developed. Pose information of each individual identified in each other camera is developed. The speaker pose information is compared to the pose information of the individuals from the other cameras. The best match for each other camera is selected as the speaker in that camera. The speaker views of each camera are compared to determine the speaker view with the most frontal view of the speaker. That camera is selected to provide the video for provision to the far end.
    Type: Grant
    Filed: June 16, 2022
    Date of Patent: May 14, 2024
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Jian David Wang, Xiangdong Wang, Varun Ajay Kulkarni
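
    The matching step this abstract describes can be sketched as follows: the speaker's pose from the primary camera is compared against the pose of every individual seen by each other camera, the closest match per camera is taken as that camera's speaker, and the camera whose matched view is most frontal is selected. This is a hypothetical sketch; the pose vectors, yaw values, and camera names are invented, and the distance metric stands in for whatever feature comparison a real system uses.

    ```python
    import math

    def pose_distance(a, b):
        """Euclidean distance between two pose feature vectors."""
        return math.dist(a, b)

    def match_speaker(speaker_pose, candidates):
        """Pick the candidate whose pose best matches the speaker's pose."""
        return min(candidates, key=lambda c: pose_distance(speaker_pose, c["pose"]))

    def select_camera(speaker_pose, cameras):
        """For each camera, match the speaker among its individuals, then
        choose the camera with the most frontal matched view (smallest yaw)."""
        matches = {cam: match_speaker(speaker_pose, people)
                   for cam, people in cameras.items()}
        return min(matches, key=lambda cam: abs(matches[cam]["yaw"]))

    cameras = {
        "cam_left":  [{"pose": (0.9, 1.1), "yaw": 40.0},
                      {"pose": (3.0, 3.0), "yaw": 5.0}],
        "cam_right": [{"pose": (1.0, 1.0), "yaw": 10.0}],
    }
    print(select_camera((1.0, 1.05), cameras))
    ```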
  • Patent number: 11606510
    Abstract: Multiple cameras in a conference room, each pointed in a different direction and including a microphone array to perform sound source localization (SSL). The SSL is used in combination with the video image to identify the speaker from among multiple individuals that appear in the video image. Neural network or machine learning processing is performed on the identified speaker to determine the quality of the front or facial view of the speaker. The best view of the speaker's face from the various cameras is selected to be provided to the far end. If no view is satisfactory, a default view is selected and that is provided to the far end. The use of the SSL allows selection of the proper individual from a group of individuals in the conference room, so that only the speaker's head is analyzed for the best facial view and then framed for transmission.
    Type: Grant
    Filed: June 9, 2021
    Date of Patent: March 14, 2023
    Assignee: PLANTRONICS, INC.
    Inventors: Jian David Wang, John Paul Spearman, Varun Ajay Kulkarni, Yong Yan, Xiangdong Wang, Peter L. Chu, David A. Bryan
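
    The selection logic in this abstract reduces to: score each camera's view of the localized speaker for frontal quality (in practice via a neural network), take the best view if it clears a threshold, and otherwise fall back to a default view for the far end. A minimal sketch, with invented scores, names, and threshold:

    ```python
    DEFAULT_VIEW = "room_wide"

    def pick_view(scores, threshold=0.5):
        """scores: {camera_name: frontal-quality score in [0, 1]}.
        Return the camera with the best facial view, or the default
        view when no camera's view is satisfactory."""
        if not scores:
            return DEFAULT_VIEW
        best = max(scores, key=scores.get)
        return best if scores[best] >= threshold else DEFAULT_VIEW

    print(pick_view({"cam_a": 0.3, "cam_b": 0.8}))
    print(pick_view({"cam_a": 0.2, "cam_b": 0.1}))
    ```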
  • Publication number: 20230060798
    Abstract: The attention level of participants is measured and then the resulting value is provided on a display of the participants. The participants are presented in a gallery view layout. The frame of each participant is colored to indicate the attention level. The entire window is tinted in colors representing the attention level. The blurriness of the participant indicates attention level. The saturation of the participant indicates attention level. The window sizes vary based on attention level. Color bars are added to provide indications of percentages of attention level over differing time periods. Neural networks are used to find the faces of the participants and then develop facial keypoint values which are used to determine gaze direction, which in turn is used to develop an attention score. The attention score is then used to determine the settings of the layout.
    Type: Application
    Filed: July 22, 2022
    Publication date: March 2, 2023
    Inventors: Jian David Wang, Rajen Bhatt, Kui Zhang, Thomas Joseph Puorro, David A. Bryan
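
    The pipeline this abstract describes (facial keypoints → gaze direction → attention score → layout setting) can be sketched end to end. The linear gaze-to-score mapping and the color thresholds below are invented for illustration only:

    ```python
    def attention_score(gaze_yaw_deg, max_yaw=60.0):
        """Map gaze deviation from screen center (degrees) to a 0..1 score:
        looking straight at the screen -> 1.0, looking away -> 0.0."""
        deviation = min(abs(gaze_yaw_deg), max_yaw)
        return 1.0 - deviation / max_yaw

    def frame_color(score):
        """Color the participant's frame by attention level."""
        if score >= 0.7:
            return "green"
        if score >= 0.4:
            return "yellow"
        return "red"

    print(frame_color(attention_score(5.0)))   # nearly straight-on gaze
    print(frame_color(attention_score(55.0)))  # looking well away
    ```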
  • Publication number: 20220408029
    Abstract: Multiple cameras in a conference room, each pointed in a different direction. At least a primary camera includes a microphone array to perform sound source localization (SSL). The SSL is used in combination with a video image to identify the speaker from among multiple individuals that appear in the video image. Neural network or machine learning processing is performed on the primary camera video of the identified speaker to determine the facial pose of the speaker. The locations of the other cameras with respect to the primary camera have been determined. Using those locations and the facial pose, the camera with the best frontal view of the speaker is determined. That camera is set as the designated camera to provide video for transmission to the far end.
    Type: Application
    Filed: June 14, 2022
    Publication date: December 22, 2022
    Inventors: Jian David Wang, John Paul Spearman
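
    The geometric step in this abstract can be sketched directly: given the direction the speaker's face points and the known bearing of each camera from the speaker, the most frontal camera is the one with the smallest angle between the face direction and the direction toward the camera. The bearings and camera names below are invented:

    ```python
    def angular_diff(a_deg, b_deg):
        """Smallest absolute difference between two bearings, in degrees."""
        d = abs(a_deg - b_deg) % 360.0
        return min(d, 360.0 - d)

    def most_frontal_camera(face_bearing_deg, camera_bearings):
        """camera_bearings: {name: bearing of camera from the speaker, degrees}.
        Return the camera most nearly in front of the speaker's face."""
        return min(camera_bearings,
                   key=lambda cam: angular_diff(face_bearing_deg,
                                                camera_bearings[cam]))

    cams = {"primary": 180.0, "side_a": 95.0, "side_b": 10.0}
    print(most_frontal_camera(100.0, cams))
    ```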
  • Publication number: 20220408015
    Abstract: Described are multiple cameras in a conference room, each pointed in a different direction. A primary camera includes a microphone array to perform sound source localization (SSL). The SSL is used in combination with a video image to identify the speaker from among multiple individuals that appear in the video image. Pose information of the speaker is developed. Pose information of each individual identified in each other camera is developed. The speaker pose information is compared to the pose information of the individuals from the other cameras. The best match for each other camera is selected as the speaker in that camera. The speaker views of each camera are compared to determine the speaker view with the most frontal view of the speaker. That camera is selected to provide the video for provision to the far end.
    Type: Application
    Filed: June 16, 2022
    Publication date: December 22, 2022
    Inventors: Jian David Wang, Xiangdong Wang, Varun Ajay Kulkarni
  • Publication number: 20220400216
    Abstract: Multiple cameras in a conference room, each pointed in a different direction and including a microphone array to perform sound source localization (SSL). The SSL is used in combination with the video image to identify the speaker from among multiple individuals that appear in the video image. Neural network or machine learning processing is performed on the identified speaker to determine the quality of the front or facial view of the speaker. The best view of the speaker's face from the various cameras is selected to be provided to the far end. If no view is satisfactory, a default view is selected and that is provided to the far end. The use of the SSL allows selection of the proper individual from a group of individuals in the conference room, so that only the speaker's head is analyzed for the best facial view and then framed for transmission.
    Type: Application
    Filed: June 9, 2021
    Publication date: December 15, 2022
    Inventors: Jian David Wang, John Paul Spearman, Varun Ajay Kulkarni, Yong Yan, Xiangdong Wang, Peter L. Chu, David A. Bryan
  • Patent number: 11516433
    Abstract: Developing a region of interest (ROI) video frame that includes only ROIs of interest and not other elements, and providing the ROI video frames in a single video stream, simplifies the development of gallery view continuous presence displays. ROI position and size information metadata can be provided, or subpicture concepts of the particular codec can be used, to separate the ROIs in the ROI video frame. Metadata can provide perspective/distortion correction values, speaker status and any other information desired about the participant or other ROI, such as name. Only a single encoder and a single decoder are needed, simplifying both transmitting and receiving endpoints. Only a single video stream is needed, reducing bandwidth requirements. As each participant can be individually isolated, the participants can be provided in similar sizes and laid out as desired in a continuous presence display that is pleasing to view.
    Type: Grant
    Filed: August 27, 2021
    Date of Patent: November 29, 2022
    Assignee: PLANTRONICS, INC.
    Inventors: Yong Yan, Stephen C. Botzko, Jian David Wang
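
    The single-stream ROI idea can be sketched as a packing step: crop each region of interest out of the room frame, tile the crops into one composite frame, and carry per-ROI metadata (position in the composite, name, speaker status) alongside it. The field names and fixed tile layout below are invented; a real system would encode the composite once and signal the metadata in-band:

    ```python
    def pack_rois(rois, tile_w=320, tile_h=240):
        """rois: list of dicts with 'name' and 'is_speaker'.
        Return (composite_size, metadata), where metadata records each
        ROI's slot in the single composite frame."""
        metadata = []
        for i, roi in enumerate(rois):
            metadata.append({
                "name": roi["name"],
                "is_speaker": roi["is_speaker"],
                "x": i * tile_w,   # ROIs laid out left to right
                "y": 0,
                "w": tile_w,
                "h": tile_h,
            })
        composite_size = (len(rois) * tile_w, tile_h)
        return composite_size, metadata

    size, meta = pack_rois([{"name": "Ann", "is_speaker": True},
                            {"name": "Bo", "is_speaker": False}])
    print(size, meta[1]["x"])
    ```

    The receiver can then cut each participant back out of the one decoded frame using the metadata, which is what lets a gallery layout resize and rearrange them freely.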
  • Patent number: 11265583
    Abstract: Utilizing two LTR frames for improved error recovery. By using two LTR frames, much better performance is achieved in terms of error recovery as the likelihood of the decoder having one of the two LTR frames is very high. When the decoder determines a frame is lost, the decoder provides a fast update request (FUR). The FUR includes a listing of the LTR frames present at the decoder. With this indication of the LTR frames present at the decoder, the encoder utilizes one of the LTR frames, preferably the most recent, as a reference to send the next frame as a P frame. The P frame is sent with an indication of the LTR frame used as reference. The use of two LTR frames and the feedback of LTR frames present at the decoder allows the minimization of the use of I frames for error recovery.
    Type: Grant
    Filed: January 6, 2020
    Date of Patent: March 1, 2022
    Assignee: PLANTRONICS, INC.
    Inventors: Jian David Wang, John Paul Spearman, Stephen C. Botzko
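
    The recovery exchange this abstract describes can be sketched as two small functions: the decoder reports the long-term reference (LTR) frames it holds in its fast update request (FUR), and the encoder answers with a P frame referencing the most recent LTR they share, falling back to an I frame only when no LTR is common. The message shapes and frame ids below are invented simplifications:

    ```python
    def make_fur(decoder_ltrs):
        """Decoder side: on a detected frame loss, report which LTR
        frame ids are still available at the decoder."""
        return {"type": "FUR", "ltrs": sorted(decoder_ltrs)}

    def handle_fur(fur, encoder_ltrs):
        """Encoder side: reference the most recent LTR the decoder holds;
        fall back to an I frame only when there is no common LTR."""
        common = set(fur["ltrs"]) & set(encoder_ltrs)
        if common:
            ref = max(common)                 # prefer the most recent LTR
            return {"type": "P", "ref_ltr": ref}
        return {"type": "I"}                  # last resort: full refresh

    fur = make_fur({100, 160})
    print(handle_fur(fur, {160, 220}))
    print(handle_fur(make_fur({40}), {160, 220}))
    ```

    Keeping two LTRs rather than one is what makes the common-LTR case the overwhelmingly likely one, which is why the abstract emphasizes minimizing I frames.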
  • Publication number: 20210211742
    Abstract: Utilizing two LTR frames for improved error recovery. By using two LTR frames, much better performance is achieved in terms of error recovery as the likelihood of the decoder having one of the two LTR frames is very high. When the decoder determines a frame is lost, the decoder provides a fast update request (FUR). The FUR includes a listing of the LTR frames present at the decoder. With this indication of the LTR frames present at the decoder, the encoder utilizes one of the LTR frames, preferably the most recent, as a reference to send the next frame as a P frame. The P frame is sent with an indication of the LTR frame used as reference. The use of two LTR frames and the feedback of LTR frames present at the decoder allows the minimization of the use of I frames for error recovery.
    Type: Application
    Filed: January 6, 2020
    Publication date: July 8, 2021
    Inventors: Jian David Wang, John Paul Spearman, Stephen C. Botzko