Indexing Electronic Notes

A digital publishing platform enables users to create and organize notes associated with electronic, published documents. Sets of notes, each associated with a document, are uploaded to the publishing platform by notepad applications executing on user devices. Each set of notes has one or more notes, and each note includes a link to a location in the associated document. The publishing platform is configured to index sets of notes based on keywords of the notes, which may be identified based on content of the publication at the location with which notes are associated.

Description
CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 61/670,994, filed Jul. 12, 2012, which is incorporated by reference in its entirety.

BACKGROUND

1. Field of the Invention

This invention relates to creation and indexing of notes associated with electronic documents.

2. Description of the Related Art

The rapid shift to mobile Internet services is bringing content offerings to an increasingly larger number of connected devices. Experiences previously limited to a single device are now accessible across multiple devices as high volume consumer electronic platforms such as Smart Phones, tablets, eReaders, game systems, and Internet TVs have become new channels to receive digital documents and services. Popular electronic book services leverage standardized publishing formats to seamlessly integrate and synchronize digital document reading experiences across consumer devices.

While providing an excellent reading experience for this new digital medium remains a focus of commercially available eReading systems and applications, it has so far been much more difficult to fully integrate other related reading activities, such as note taking. The simple action of writing a note in the margin of a document remains a challenging proposition in most digital reading systems for a variety of reasons. In addition, as new digital content services are progressively embedded within the original document, it becomes increasingly difficult to create, edit, aggregate, and organize these additional content layers into a single reading experience. As digital documents shift from a static model to a connected one in which related, personalized, and other social content is aggregated dynamically within the original document, it becomes strategic for publishing platforms and their distribution systems to be able to properly author and manage these new individual content layers among a plurality of users.

SUMMARY

A digital publishing platform enables the creation, organization, navigation, synchronization, and reduction of personalized notes within HTML5 document publishing. Embodiments of the invention leverage a publishing platform's overall understanding of HTML5 document services and eReading systems for digital content distribution and consumption.

Users create notepad documents associated with published, electronic documents that are stored and distributed by the publishing platform. The published documents are each associated with a table of contents defining the document's structure. Notes generated in association with a document are associated with specific locations within the document, and as a result may be accessed based on the table of contents.

In one embodiment, a notes indexing system identifies keywords of the notes based on the locations in the document with which the notes are associated. When a user query specifying a keyword is received, the notes indexing system selects the notes having the specified keyword. The selected notes are returned to the user in a listing that enables the user to quickly view the notes having the keyword. The query results listing may be viewed as a standalone document or in the same browser tab as an associated published document. Furthermore, the listing may be filtered based on any of a number of attributes of each note for ease of reference by a user.
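
By way of illustration only, the keyword indexing and query flow described above might be sketched as follows. The names Note, build_keyword_index, query_notes, and the location_keywords mapping are hypothetical and are not part of the described platform; the sketch assumes keywords have already been extracted from the document content at each location.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    note_id: str
    location: str  # hypothetical: a table-of-contents location such as "Chapter 1"
    text: str


def build_keyword_index(notes, location_keywords):
    """Map each keyword to the ids of notes linked to a location carrying it.

    location_keywords is an assumed input: document location -> keywords
    identified from the published content at that location.
    """
    index = defaultdict(set)
    for note in notes:
        for keyword in location_keywords.get(note.location, ()):
            index[keyword.lower()].add(note.note_id)
    return index


def query_notes(index, notes, keyword, location=None):
    """Return notes having the keyword, optionally filtered by location."""
    ids = index.get(keyword.lower(), set())
    hits = [n for n in notes if n.note_id in ids]
    if location is not None:
        hits = [n for n in hits if n.location == location]
    return hits


notes = [
    Note("n1", "Chapter 1", "Cells divide by mitosis."),
    Note("n2", "Chapter 2", "DNA carries genetic information."),
    Note("n3", "Chapter 1", "Review the membrane diagram."),
]
location_keywords = {
    "Chapter 1": ["cell", "membrane"],
    "Chapter 2": ["DNA", "cell"],
}
index = build_keyword_index(notes, location_keywords)
```

In this sketch, the optional location argument stands in for the attribute-based filtering of the results listing; other note attributes could be filtered in the same manner.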

The features and advantages described in this summary and the following detailed description are not all-inclusive. Many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an example publishing platform, according to one embodiment.

FIG. 2 illustrates a document distribution environment, according to one embodiment.

FIG. 3 is a high-level block diagram of a computer for use as a client device, according to one embodiment.

FIG. 4 is a block diagram illustrating modules within a notepad application, according to one embodiment.

FIG. 5A illustrates an example user interface for reading and annotating a document, according to one embodiment.

FIG. 5B illustrates an example notepad user interface, according to one embodiment.

FIG. 6 is a flowchart illustrating a method for creating notes associated with a document, according to one embodiment.

FIG. 7 is a block diagram illustrating modules within a notes indexing system, according to one embodiment.

FIG. 8 is a flowchart illustrating a method for summarizing a set of notes, according to one embodiment.

The figures depict various embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.

DETAILED DESCRIPTION

Overview

Embodiments of the invention provide a method for creating notes associated with an HTML document and summarizing the created notes. The method is organized around an educational digital publication and reading platform configured to aggregate, manage, and distribute multilayered content. FIG. 1 is a high-level block diagram illustrating the platform environment, organized around four function blocks: content 101, management 102, delivery 103, and experience 104.

Content block 101 automatically gathers and aggregates content from a large number of sources, categories, and partners. Whether the content is curated, perishable, on-line, or personal, these systems define the interfaces and processes to automatically collect various content sources into a formalized staging environment.

Management block 102 comprises five blocks with respective submodules: ingestion 120, publishing 130, distribution 140, back office system 150, and eCommerce system 160. The ingestion module 120, including staging, validation, and normalization subsystems, ingests published documents that may be in a variety of different formats, such as PDF, ePUB2, ePUB3, SVG, XML, or HTML. The ingested document may be a book, such as a textbook, a set of self-published notes, or any other published document, and may be subdivided in any manner. For example, the document may have a plurality of pages organized into chapters, which could be further divided into one or more sub-chapters. Each page may have text, images, tables, graphs, or other items distributed across the page.

After ingestion, the documents are passed to the publishing system 130, which in one embodiment includes transformation, correlation, and metadata subsystems. If the document ingested by the ingestion module 120 is not in a markup language format, the publishing system 130 automatically identifies, extracts, and indexes all the key elements and composition of the document to reconstruct it into a modern, flexible, and interactive HTML5 format. The ingested documents are converted into markup language documents well-suited for distribution across various computing devices. In one embodiment, the publishing system 130 reconstructs published documents so as to accommodate dynamic add-ons, such as user-generated and related content, while maintaining page fidelity to the original document. The transformed content preserves the original page structure including pagination, number of columns and arrangement of paragraphs, placement and appearance of graphics, titles and captions, and fonts used, regardless of the original format of the source content and complexity of the layout of the original document.

The page structure information is assembled into a document-specific table of contents describing locations of chapter headings and sub-chapter headings within the document, as well as locations of content within each heading. During reconstruction, metadata describing a product description, pricing, and terms (e.g., whether the content is for sale, rent, or subscription, or whether it is accessible for a certain time period or geographic region, etc.) are also added to the transformed document.

The document's table of contents indexes the content of the document into a description of the overall structure of the document, including chapter headings and sub-chapter headings. Within each heading, the table of contents identifies the structure of each page. As content is added dynamically to the reconstructed document, the content is indexed and added to the table of contents to maintain a current representation of the document's structure.
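
As a non-limiting sketch, a table of contents of this kind could be modeled as a tree whose entries are kept in page order as content is added. The TocEntry class, its layer field, and the page-based ordering are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class TocEntry:
    title: str
    page: int
    layer: str = "core"  # hypothetical tag, e.g. "core", "related", "user"
    children: list = field(default_factory=list)

    def add(self, entry):
        """Index a new entry under this heading, keeping children in page order."""
        self.children.append(entry)
        self.children.sort(key=lambda e: e.page)  # stable: ties keep insertion order
        return entry

    def outline(self, depth=0):
        """Return the structure as indented lines, one per entry."""
        lines = ["  " * depth + f"{self.title} (p. {self.page})"]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines


toc = TocEntry("Biology101", 1)
chapter2 = toc.add(TocEntry("Chapter 2", 30))
chapter1 = toc.add(TocEntry("Chapter 1", 1))
chapter1.add(TocEntry("1.1 Cells", 2))
# Content added dynamically after reconstruction is indexed the same way.
chapter1.add(TocEntry("Study guide", 2, layer="related"))
```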

After a document is reconstructed, the distribution system 140 packages content of the publishing platform 200 for delivery, uploads the content to content distribution networks, and makes the content available to end-users based on the content's digital rights management policies. In one embodiment, the distribution system 140 includes digital content management, content delivery, and data collection analysis subsystems.

The distribution system 140 may also aggregate additional content layers from numerous sources. These layers, including related content, advertising content, social content, and user-generated content, may be added to the document to create a dynamic, multilayered document. For example, related content may comprise material supplementing the core document, such as study guides, self-testing material, solutions manuals, glossaries, or journal articles. Advertising content may be uploaded by advertisers or advertising agencies to the publishing platform, such that advertising content may be displayed with the document. Social content may be uploaded to the publishing platform by the user or by other nodes (e.g., classmates, teachers, authors, etc.) in the user's social graph. Examples of social content include interactions between users related to the document and content shared by members of the user's social graph. User-generated content includes annotations made by a user during an eReading session, such as highlighting or taking notes. In one embodiment, user-generated content may be self-published by a user and made available to other users as a related content layer.

As layers are added to the document, page information and metadata are referenced by all layers to merge the multilayered document into a single reading experience. The publishing system 130 may also add information describing the supplemental layers to the document's table of contents.

The back-office system 150 of management block 102 enables business processes such as human resources tasks, sales and marketing, customer and client interactions, and technical support. The eCommerce system 160 interfaces with back office system 150, publishing 130, and distribution 140 to integrate marketing, selling, servicing, and receiving payment for digital products and services.

Delivery block 103 of an educational digital publication and reading platform distributes content for user consumption by, for example, pushing content to edge servers on a content delivery network. Experience block 104 manages user interaction with the publishing platform by updating content, reporting users' reading activities, and assessing network performance.

In the example illustrated in FIG. 1, the content distribution and protection system is interfaced directly between the distribution sub-system 140 and the eReading application 170, essentially integrating the digital content management (DCM), content delivery network (CDN), delivery modules and eReading data collection interface for capturing and serving all users' content requests. By having content served dynamically and mostly on-demand, the content distribution and protection system effectively authorizes the download of one page of content at a time through time sensitive dedicated URLs which only stay valid for a limited time, for example a few minutes in one embodiment, all under control of the platform service provider.
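
One possible way to realize time-sensitive page URLs, offered only as a sketch, is to sign each URL with an expiry timestamp. The description does not specify a signing scheme; the HMAC approach, the SECRET key, the cdn.example.com host, and the function names below are all assumptions.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical key held by the platform service provider


def _signature(doc_id, page, expires):
    message = f"{doc_id}:{page}:{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_page_url(doc_id, page, now, ttl_seconds=300):
    """Build a URL for one page that stays valid for ttl_seconds after now."""
    expires = int(now) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(doc_id, page, expires)})
    return f"https://cdn.example.com/{doc_id}/page/{page}?{query}"


def is_url_valid(url, now):
    """Accept the URL only if it is untampered and has not expired."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 3 or parts[1] != "page":
        return False
    doc_id, _, page = parts
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["sig"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(doc_id, page, expires)
    return hmac.compare_digest(signature, expected) and now <= expires


url = sign_page_url("Biology101", 12, now=1000)
```

Because the page number is part of the signed message, a URL issued for one page cannot be reused to download a different page.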

HTML5 eReading Environment

FIG. 2 illustrates an eReading environment including a publishing platform 200 and a user device 210. One user device 210 is illustrated in FIG. 2, but any number of user devices 210 may communicate with platform 200 to access the content distributed by platform 200. Each device 210 executes a web browser 215 and at least one eReader application 170. In one embodiment, each user is associated with an account on the publishing platform 200, and content purchased by the user is made available through the user account. The user device 210 may also be registered to the account to authorize the device for accessing content. Furthermore, a user may register multiple devices to his account in order to access and interact with layered content synchronously on a plurality of screens. For example, a user may register one or more devices to his account, such as a desktop computer, a laptop, a smart phone, a tablet, an eReader, an Internet television, or any other device including computing functionality and data communication capabilities, and use one or more of these devices simultaneously to interact with a multilayered document.

The content distribution system 140 delivers multilayered content to the eReading browser application 170 executing on the user device 210 through the network 205. The eReading application 170 fetches content from the distribution system 140 in small increments, such as one page at a time. Alternatively, the user device 210 may cache one or more pages of the document to enable faster retrieval of the pages.
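
The page-at-a-time retrieval with optional caching can be illustrated with the following sketch. The PageCache class, its capacity, and the least-recently-used eviction rule are assumptions; the description states only that one or more pages may be cached.

```python
from collections import OrderedDict


class PageCache:
    """Serve pages from a small local cache, fetching one page on a miss."""

    def __init__(self, fetch_page, capacity=3):
        self._fetch_page = fetch_page  # hypothetical callable: page number -> content
        self._capacity = capacity
        self._pages = OrderedDict()

    def get(self, page_number):
        if page_number in self._pages:
            self._pages.move_to_end(page_number)  # mark as most recently used
            return self._pages[page_number]
        content = self._fetch_page(page_number)  # one page per request
        self._pages[page_number] = content
        if len(self._pages) > self._capacity:
            self._pages.popitem(last=False)  # evict the least recently used page
        return content


fetched = []


def fake_fetch(page_number):
    """Stand-in for a request to the distribution system."""
    fetched.append(page_number)
    return f"<section>page {page_number}</section>"


cache = PageCache(fake_fetch, capacity=2)
for number in (1, 2, 1, 3, 2):
    cache.get(number)
```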

Contrary to other existing digital publishing services, the educational digital publication and reading platform of the present invention allows the user to access content without downloading a specific reading application from the publisher. Rather, the eReader application 170, comprising client software compatible with the web browser 215, constructs document pages using structureless HTML5 elements. The eReader application 170 integrates a number of a user's reading activities, including reading the content, navigating between pages, creating highlights, interacting with advertisements, generating social content, and taking notes. This user-generated content is stored and archived into the on-line end user account so that it may be synchronized across all registered devices for a given end user. Thus, the end user's content can be accessed from any of the user's registered devices. It should be noted that eReader applications 170 comprise eReading applications as well as supplemental content applications that function in the browser environment to support the user's eReading activities and overall engagement with the multilayered documents distributed by the platform, such as notepad applications, social applications, and advertising applications.

Communication between the publishing platform 200 and user device 210 is enabled by network 205. In one embodiment, the network 205 uses standard communications technologies and/or protocols. Thus, the network 205 can include links using technologies such as Ethernet, 802.11, worldwide interoperability for microwave access (WiMAX), 3G, Long Term Evolution (LTE), digital subscriber line (DSL), asynchronous transfer mode (ATM), InfiniBand, PCI Express Advanced Switching, etc. Similarly, the networking protocols used on the network 205 can include multiprotocol label switching (MPLS), the transmission control protocol/Internet protocol (TCP/IP), the User Datagram Protocol (UDP), the hypertext transport protocol (HTTP), the simple mail transfer protocol (SMTP), the file transfer protocol (FTP), etc. The data exchanged over the network 205 can be represented using technologies and/or formats including the hypertext markup language (HTML), the extensible markup language (XML), etc. In addition, all or some of the links can be encrypted using conventional encryption technologies such as secure sockets layer (SSL), transport layer security (TLS), virtual private networks (VPNs), Internet Protocol security (IPsec), etc. In another embodiment, the entities can use custom and/or dedicated data communications technologies instead of, or in addition to, the ones described above. Depending upon the embodiment, the network 205 can also include links to other networks such as the Internet.

A high-level block diagram of a computer 300, as an example of a user device 210, is illustrated in FIG. 3. Illustrated are at least one processor 302 coupled to a chipset 304. The chipset 304 includes a memory controller hub 320 and an input/output (I/O) controller hub 322. A memory 306 and a graphics adapter 312 are coupled to the memory controller hub 320, and a display device 318 is coupled to the graphics adapter 312. A storage device 308, keyboard 310, pointing device 314, and network adapter 316 are coupled to the I/O controller hub 322. Other embodiments of the computer 300 have different architectures. For example, the memory 306 is directly coupled to the processor 302 in some embodiments.

The storage device 308 is a non-transitory computer-readable storage medium such as a hard drive, compact disk read-only memory (CD-ROM), DVD, or a solid-state memory device. The memory 306 holds instructions and data used by the processor 302. The pointing device 314 is a mouse, track ball, or other type of pointing device, and is used in combination with the keyboard 310 to input data into the computer 300. The graphics adapter 312 displays images and other information on the display device 318. The network adapter 316 couples the computer 300 to a network. Some embodiments of the computer 300 have different and/or other components than those shown in FIG. 3. The types of computer 300 can vary depending upon the embodiment and the desired processing power. Other computing devices may alternatively be used as the user device 210, such as a tablet, a smart phone, an Internet television, or a gaming console.

Notes Editing Platform

As a user reads and interacts with the multilayered document, the eReader application 170 enables the user to create personal notes associated with the document. In one embodiment, each user account and layered content document is associated with an HTML5 notepad for creating and organizing user-generated notes. The notepad, comprising a data object including a plurality of note regions for storing user-generated notes, may be stored and distributed by the publishing platform 200. The user may create, view, and organize notes within the notepad by interacting with a notepad application 405 executing on the user's device 210.

FIG. 4 illustrates the notepad application 405 including a notepad interface generator 410, a reporting module 415, and a note generation module 420. Other embodiments of the notepad application 405 include fewer or more modules. In one embodiment, the notepad application 405 enables a user to read and annotate a document distributed by the publishing platform 200. The notepad application 405 may be configured as a plug-in compatible with the web browser 215, or it may be integrated with the eReader application 170 in a single application.

In another embodiment, an eReading browser application gateway can be used to abstract client-side application components and provide a mechanism for these components to communicate with each other and with their environment without breaking the abstraction principle. The abstraction layer is defined as a sandbox environment, where each of the application components is isolated into individual wrappers. As such, the abstraction layer becomes the only way for components to gain access to the application itself or to other components that are accessible through the abstraction layer. The application gateway is particularly useful when considering the extensibility of the eReading applications. Specifically, by publishing wrapper APIs through the application gateway, developers can design and add components that seamlessly integrate with eReading browser applications. This assists in developing an environment in which client web applications provide the same level of service that a stand-alone desktop application provides.

For example, the notepad application 405 can be framed as a secondary application within the eReading browser application 170. As such, the addition and integration of notes into the eReading browser applications 170 are architected around application components specific to notes and their associated wrappers which are registered to the eReading application internal and external gateways. This implementation is particularly useful when considering the maintenance and upgradability of deployed eReading browser applications as each component can be updated separately, and disabled if necessary, while still being able to communicate within a controlled environment. This same capability can be used to activate or deactivate an application's features and reorganize them into different feature sets, which allows, for instance, for the notepad application 405 to be defined as a stand-alone HTML5 editing platform.
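
A minimal sketch of such a gateway, under the assumption that each wrapped component is reachable only through messages routed by the gateway, is given below. The ApplicationGateway and NotesComponent classes and the message format are hypothetical and do not represent an actual interface of the eReading application.

```python
class ApplicationGateway:
    """Route messages between wrapped components; the only access path to them."""

    def __init__(self):
        self._handlers = {}
        self._enabled = {}

    def register(self, name, handler):
        """Register a component wrapper under a name; it starts enabled."""
        self._handlers[name] = handler
        self._enabled[name] = True

    def set_enabled(self, name, enabled):
        """Activate or deactivate a component without removing it."""
        if name not in self._handlers:
            raise KeyError(name)
        self._enabled[name] = enabled

    def send(self, target, message):
        """Deliver a message to a component, or return None if it is unavailable."""
        if not self._enabled.get(target, False):
            return None
        return self._handlers[target](message)


class NotesComponent:
    """Hypothetical notes component exposing a single message handler."""

    def __init__(self):
        self.notes = []

    def handle(self, message):
        if message.get("action") == "add":
            self.notes.append(message["text"])
        return len(self.notes)


gateway = ApplicationGateway()
notes_component = NotesComponent()
gateway.register("notepad", notes_component.handle)
```

Disabling the component with set_enabled("notepad", False) leaves the other registered components untouched, which mirrors the separate update and deactivation of components described above.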

The notepad application 405 is fundamentally linked to the particular layered content document that is being accessed by a user at the time of launch. For instance, all the user's notes to be created following the launch of a notepad user interface are referenced from within the structure of the opened layered content document, including specific information about the table of contents from that document. By referencing the table of contents of the open document, the notes can be mapped to page locations within the original document and thus become supplemental material to the existing publication.

An example user interface generated by the eReader application 170 and displayed by the browser 215 is illustrated in FIG. 5A. As a user reads the document, each page is fetched by the eReader application 170, rendered by the browser 215, and displayed within the document page window 510. The navigation pane 515 lists the table of contents of the document or a high-level summary of the table of contents, enabling a user to view an outline of the document structure. In one embodiment, a user may navigate through the document by clicking on links within the navigation pane 515. For example, when the user clicks on a link titled “Chapter 3,” the eReader application 170 will fetch the page having the start of chapter 3 from the publishing platform 200. Alternatively, one or more pages may be cached by the device 210, in which case the eReader application 170 may retrieve a page from the device's cache rather than directly fetching it from the platform 200. Browser 215 may then display the fetched (or retrieved) page within the document page window 510.

Notepad panel 520 is a user interface generated by the notepad interface generator 410 within the browser 215 for enabling a user to create and view notes. The notepad panel 520 may be created by the interface generator 410 during an eReading session by, for example, a user selecting a notepad icon displayed in the browser 215. The selection of the icon launches the notepad application 405, which opens the notepad panel 520 in the same HTML5 browser tab as the already-opened eReading browser application 170. In one embodiment, the notepad panel 520 is rendered alongside the document page 510 in the same browser tab, as illustrated in FIG. 5A. However, displaying both the document page 510 and the notepad panel 520 may limit the amount of display space available for the notepad panel 520. This may result, in part, from a requirement that the HTML5 document page be rendered with page fidelity to the original ingested content page of the document. Thus, to compensate for the differences in visual presentation between these forms of content, the notepad application 405 may have a wide screen mode that gives notepad content access to the entire screen available within a browser tab. That is, the notepad panel 520 may be configured to occupy the navigation pane 515 and page window 510 illustrated in FIG. 5A when, for example, a user clicks an “expand window” button or drags the notepad panel 520 to the left. In another embodiment, a user may effectively separate the notepad panel 520 from the document page window 510 by opening the notepad on a different paired device and closing the notepad panel 520 on the first device.

One embodiment of the notepad panel 520 is illustrated in more detail in FIG. 5B. The header 525 lists the title of the opened layered content document to which all user notes for that particular document will refer. The header information is extracted from the eReading browser application 170 that is processing the table of contents of the layered document. For example, user “Joe” accessing the “Biology101” document from the eReading browser application 170 generates, upon launching of the notepad application 405, a notepad document with the header “Biology101+Joe” that is stored by the publishing platform 200.

The sub-header 530 lists the generic location within the document that is open while a user creates or edits a note. The generic location used by the sub-header 530 is defined by table of contents level information as extracted from the document, such as chapter or subchapter. For example, user “Joe” adding the first note in “Chapter 1” of “Biology101” within the eReading application 170 will create a new sub-heading “Biology101+Chapter1” on the notepad panel 520. As another example, user “Joe” adding a second note to “Chapter1” of “Biology101” from the eReading browser application will append the second note under the existing sub-heading “Biology101+Chapter1” of the notepad document “Biology101+Joe.”
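
The header and sub-header behavior in the “Joe” and “Biology101” examples can be sketched as follows. The Notepad class and its method names are hypothetical; only the “Biology101+Joe” and “Biology101+Chapter1” naming is taken from the examples above.

```python
class Notepad:
    """Notes for one user and one document, grouped under sub-headers."""

    def __init__(self, document_title, user):
        self.document_title = document_title
        self.header = f"{document_title}+{user}"
        self.sections = {}  # sub-header -> list of notes, in creation order

    def add_note(self, toc_location, text):
        """Append a note under the sub-header for its table-of-contents location.

        The sub-header is created on the first note for that location and
        reused for every later note at the same location.
        """
        sub_header = f"{self.document_title}+{toc_location}"
        self.sections.setdefault(sub_header, []).append(text)
        return sub_header


notepad = Notepad("Biology101", "Joe")
notepad.add_note("Chapter1", "Cells are the basic unit of life.")
notepad.add_note("Chapter1", "Compare plant and animal cells.")
notepad.add_note("Chapter2", "DNA replication is semi-conservative.")
```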

The notepad panel 520 may include one or more note windows, such as new note windows 535 and user notes 540. The user notes 540 are notes already existing in the notepad beneath the sub-header 530. The new note windows 535 are empty boxes in which a user may create new notes. When selected, the note window is switched from passive (display) to active (editing) mode, which enables the creation or importation of content into it.

Each user's notepad includes at least one note region corresponding to a note window displayed in the notepad panel 520. A note window is defined as a dynamically resizable box within the HTML5 notepad panel 520, displayed within the user interface for either the editing or rendering of content as selected by the user. A note window is either in a passive (display) or active (editing) mode. Each note region associated with each note window is identified by a descriptor and set of metadata describing attributes that are unique to the particular user's activities that led to the creation of that note and as managed by the notepad application 405 overall. For instance, a note region's attributes typically include information such as the type and nature of its embedded content, source and origin of its embedded content, the imported location designation within the original document, the location referential within the notepad panel 520, the time of creation, and a log to keep track of various edits over time. The attributes may also include information about the user who created the note, providing the user with explicit rights to ownership of the note. During the course of note taking activities, notes are progressively added to the user's notepad document in an expanding list of regions stacked on top of each other within the section of the document to which these notes belong. The aggregation, organization, and management of these regions by the notepad application 405 translates into an HTML5 notes document that is unique to a particular user and layered content document.
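
For illustration, the attributes listed above could be carried by a record such as the following. The NoteRegion class, its field names, and the representation of the edit log as (timestamp, previous content) pairs are assumptions rather than a format defined by the platform.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class NoteRegion:
    descriptor: str          # identifier of the region
    owner: str               # user who created the note
    content_type: str        # e.g. "text" or "bitmap"
    content_source: str      # e.g. "typed", "document", "email"
    document_location: str   # location designation within the original document
    notepad_position: int    # location referential within the notepad panel
    content: str = ""
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    edit_log: list = field(default_factory=list)

    def edit(self, new_content, when=None):
        """Replace the content and record the previous version in the edit log."""
        when = when or datetime.now(timezone.utc)
        self.edit_log.append((when, self.content))
        self.content = new_content


region = NoteRegion(
    descriptor="note-1",
    owner="Joe",
    content_type="text",
    content_source="typed",
    document_location="Chapter1",
    notepad_position=0,
    content="Cells divide.",
)
region.edit("Cells divide by mitosis.", when=datetime(2012, 7, 12, tzinfo=timezone.utc))
```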

The note generation module 420 receives user inputs to generate notes. The user input indicates a location in the document, and the note generation module 420 associates the note generated as a result of the user input with the indicated location. The user input may be received at the note window displayed in the notepad panel 520. For example, the user may click on the note window, such as note window 535A, or hover a cursor over the note window. The user may then input content into the note by typing within the note window.

Alternatively, the note generation module 420 may receive a user input at another location but associated with a particular note window. For example, the user may select content of the document to import into a note without specifying a note window in which to copy the text. The selected content may be extracted from the rendered HTML5 page and imported as text-only in the targeted note region. The selected content may alternatively be imported as a graphic object, such as a bitmap, which keeps the original content's formatting including fonts and other layout information. The note generation module 420 may automatically add the selected content to the next available note window beneath the sub-header indicating the location from which the content was imported. When content of the multilayered document is selected and added to the note, it provides a link between the note and the section of the document in which the content is located.

In one embodiment, regardless of whether the content is imported in an HTML5 or bitmap format, the newly added note keeps specific mention of the original licensing rights of the layered content document. For example, content copyright information as defined by the platform service provider and/or owner of the layered content document can be associated with the note, either by explicitly displaying the information within the note or attaching the information to the note region as metadata. Furthermore, the licensing policies of the original document can be associated with each note. That is, if the multilayered document has a licensing policy restricting, for example, the amount of the document that can be copied and pasted at another location, the same policy is applied to the notes. As a result, the quantity of content that can be copied into a note, or the content that can be copied from the note and pasted at another location, may be limited based on the policy. Similarly, if the multilayered document has a licensing policy restricting the amount of the document that can be printed, the amount of notepad content (that was copied from the document) that can be printed may be similarly restricted.
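
As one hedged illustration of applying a copy restriction to notes, the following sketch tracks a per-document character budget. The LicensingPolicy and CopyBudget names, the character-count unit, and the choice to truncate rather than refuse an over-limit selection are all assumptions; the description does not state how the limit is measured or enforced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicensingPolicy:
    max_copy_chars: int  # hypothetical unit: characters copyable from the document


class CopyBudget:
    """Track how much document content a user has copied into notes."""

    def __init__(self, policy):
        self.policy = policy
        self.copied_chars = 0

    def remaining(self):
        return max(self.policy.max_copy_chars - self.copied_chars, 0)

    def copy_into_note(self, selection):
        """Return the part of the selection the policy still allows."""
        allowed = selection[: self.remaining()]
        self.copied_chars += len(allowed)
        return allowed


budget = CopyBudget(LicensingPolicy(max_copy_chars=10))
first = budget.copy_into_note("abcdefg")
second = budget.copy_into_note("hijkl")
third = budget.copy_into_note("mno")
```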

Another example of a user input to generate a note may be looking up information in an external knowledge database such as an online dictionary or an encyclopedia. The note generation module 420 may then import the resulting lookup definition into an empty note window, such as new note window 535A, as a bitmap that maintains the integrity of the original content and includes the original copyright information of the external database.

The user may also add into a note region a selectable link to a destination web page. The note generation module 420 may then access the destination page to analyze its properties and capture the title, summary, and, optionally, graphical information. The note generation module 420 aggregates this information, formats it into a custom bitmap that fits the dimensions of a note window, attaches the original link to the bitmap, and inserts the combined bitmap and link into the note window.
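
The capture of a destination page's title and summary might be sketched as follows, using only the Python standard library. The sketch assumes the page HTML has already been retrieved, reads the summary from a meta description tag (an assumption), and omits the rendering of the result into a bitmap; PagePreviewParser and build_link_preview are hypothetical names.

```python
from html.parser import HTMLParser


class PagePreviewParser(HTMLParser):
    """Collect the <title> text and the meta description of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.summary = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "description":
                self.summary = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def build_link_preview(url, html):
    """Aggregate the captured properties and keep the original link attached."""
    parser = PagePreviewParser()
    parser.feed(html)
    return {"url": url, "title": parser.title.strip(), "summary": parser.summary}


sample_html = (
    "<html><head><title> Mitosis </title>"
    '<meta name="description" content="How cells divide.">'
    "</head><body><p>Body text.</p></body></html>"
)
preview = build_link_preview("https://example.com/mitosis", sample_html)
```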

As another example of importing content into a note, a user may add multimedia content, such as an image or a video or audio file, into a note region. The note generation module 420 analyzes the file to determine its properties, such as file format, extension, length, and copyright information, and, if applicable, captures a thumbnail of an image associated with the file. The resulting information is aggregated into a custom bitmap that fits the dimensions of the note window and includes an audio or video embedded player and its navigation commands such as play, pause, skip forward, and skip backward, for example. This approach allows the multimedia content to be launched from within the note window.

Yet another example of a user input to generate a note may be an email sent to an email account registered to the user. The email account, operated by the publishing platform 200, enables a user to automatically import content directly into his notepad. The publishing platform 200 receives emails sent to this account, including content embedded in the email such as text, images, multimedia files, and links. The platform 200 extracts a note header and/or indexing keywords to match the email to one or more documents the user is authorized to access. The email is then processed by extracting the embedded content, analyzing the content to determine its properties, and formatting it into one or more independent unmapped notes windows, such as new note window 535A. The email may indicate a location within the document with which to associate the emailed note (e.g., Chapter 1, Part 1). If no location is specified, the note may be automatically added to a new note window following the last note in the user's notepad associated with the document specified by the emailed note.

If the user is reading a document while creating notes through email, the distribution system 140 may automatically synchronize the emailed notes with any existing notes in the user's notepad document and map the emailed notes to locations within the open layered content document. Alternatively, if the user creates notes via email while not reading the layered content document, the notes added by email may be synchronized with existing notes and mapped to locations in the associated document at the start of the user's next eReading session.

When the user has completed the note, the user clicks on the close box or outside of the note window, enabling the particular note to be indexed, synchronized, and referenced by the notepad application 405 with the specific page of the layered content document from which the note originated. The note is then added to a note region within the notepad associated with the user and layered content document, with the note regions structured according to the table of contents of the document. The note regions may contain metadata describing the attributes of the note, such as a reference to the applicable header and sub-header information, the time the note was created, type of content of the note, or keywords of the content. If applicable, the note region may also inherit metadata describing digital rights management information of the source content. In one embodiment, a user completing a note causes the notepad interface generator 410 to automatically open a new and empty note window in passive mode just before and after the completed note, such as new windows 535B and 535C displayed before and after user note 540B.

The user's notes created by accessing a particular chapter of a layered content document are grouped into the same chapter section within the notepad panel 520. As the listing of these individual notes gets longer than the actual length of the opened browser tab which hosts the rendered HTML5 document page and the notepad panel 520, a scroll bar may be added to the notepad panel 520 to be able to list all the available notes within a document or document section, without changing the rendered HTML5 page of the document being accessed. Using the scroll bar, the note editing platform quickly recovers, synchronizes, and lists all available notes within the opened document. As a result, the user can quickly access his notes across the entire layered content document, even though only a limited portion of the document is stored on the user's device 210 at any given time.

Each note window, as displayed in the notepad panel 520, is fundamentally tied to a note region in the user's notepad. Because the note regions inherit the structure of the associated document, each region is associated with a particular location in the document. As a result, a user may rearrange the order of the note windows in the notepad panel 520 without losing the coupling between each note and its associated location in the document.

In one embodiment, partially decoupling the notes from document page synchronization lets the user consult existing notes across the entire layered content document without necessarily downloading the specific document pages that these notes reference. This is particularly important when layered content documents are only partially cached in the local browser environment, which may introduce latency between requesting a page and rendering it.

The decoupling between reviewing notes and rendering pages of HTML5 documents may stay in effect until the user selects from within a note an embedded page referenced location within the document. This in turn forces the eReading browser application 170 to synchronize to that particular page by either rendering it directly, if that page is already in the cache, or otherwise fetching the page from the platform content distribution system 140. Similarly, accessing from the eReading browser application 170 a different page than is currently rendered also results in synchronizing the notepad panel 520 with the notes available, if any, for that new page within the layered content document.
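The render-from-cache-or-fetch behavior described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the platform's implementation: the `PageCache` class, its `open_page` method, and the `fetch_from_distribution_system` callback are hypothetical names standing in for the eReading browser application 170 and the distribution system 140.

```python
class PageCache:
    """Hypothetical stand-in for the browser-side cache of document pages."""

    def __init__(self, fetch_from_distribution_system):
        # Callback that retrieves a page by identifier (assumed interface).
        self._fetch = fetch_from_distribution_system
        self._pages = {}

    def open_page(self, page_id):
        """Return the page content, fetching it only when it is not cached."""
        if page_id not in self._pages:
            self._pages[page_id] = self._fetch(page_id)
        return self._pages[page_id]
```

Selecting a page reference inside a note would call `open_page` with that page's identifier; a second selection of the same reference is served from the cache without another fetch.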

In one embodiment, a user may access the notepad application 405 from multiple connected devices. The reporting module 415 manages the redistribution of notepad content to all user devices displaying the notepad panel 520. For example, the reporting module 415 uploads and reports user-generated notes from the user's connected devices to the distribution system 140 of the publishing platform 200. The notes are uploaded to the distribution system 140 for saving and distribution of the notes. As a result, a user's notepad may be effectively synchronized across all connected devices.

Creating Notes within an HTML5 Document

A process for creating notes is summarized in the flowchart of FIG. 6. The notepad application 405 accesses 602 a document distributed by the publishing platform 200 that is being rendered by an eReading browser application executing on a user device. The document has a table of contents defining the structure of the document, including a plurality of sections within the document. A notepad stored on the publishing platform 200 may be associated with the layered content document and the user viewing and interacting with the document.

The notepad application 405 then opens a notepad panel 520 in the eReading browser application 170 and displays 604 a note window in the notepad panel 520. The note window may be color coded or may contain a label such as “Click here to add content” so as to be easily identifiable by the user. The note window supports free text entry, which is associated with the adjacent layered content document page opened by the eReading browser application 170.

A user input to generate a note is then received 606 by the notepad application 405. The user input may be, for example, typing of a note, importing text from the layered content document, looking up information in an external knowledge database, referencing a link, or sending an email to a dedicated notepad email account. The note may be displayed within a note window while the user is generating the note and after the note is generated. The appearance of the note in the note window depends on the source of the note's content. For example, the note may comprise one or more of unformatted HTML text, a bitmap, a summary of a linked page, a thumbnail of a multimedia application, and an embedded multimedia player.

When the note is completed, the notepad application 405 adds 608 the note to a note region within the notepad document. When a user creates and adds notes in a notepad document associated with a layered content document, the application 405 indexes the notes into the overall notepad document, which is unique to a particular user. As a result of this indexing process, which may be undertaken constantly or periodically, the user can reorganize the listing of the existing notes by changing their respective locations within the notepad panel 520. That is, any existing notes may be reorganized in any order within the notepad panel 520, while maintaining the association with the document structure of the layered content document.

Because each note inherits metadata describing a location in the document with which the note is associated, the note may then be accessed 610 using the table of contents of the document. As a user reads and navigates through the document, the notepad application 405 accesses notes associated with the section of the document currently rendered by the eReading browser application 170 and displays them to the user.

Notes Indexing

As a user reads a layered content document and creates notes associated with the document, the notes are added to a user's notepad. In one embodiment, notes associated with multiple documents can be aggregated into a single set of notes to enable users to more easily review these notes. Similarly, multiple sets of notes associated with the same document but created by different users can be aggregated into a single set. As a result, a user's notepad can become lengthy to read and consult.

A notes indexing system 700 analyzes the notes in a user's notepad to identify keywords of each note. The keywords of each note may be words used in the note (e.g., a word used frequently in a particular note may be a keyword), or may be words near the location in a document with which the note is associated. The notes indexing system 700 indexes the notes according to the keywords of the notes. A user may input a query to search a set of notes (e.g., the user's notepad) for notes having a particular keyword. Based on the identified keywords by which the notes are indexed, the notes indexing system 700 identifies notes in the set that have the keyword specified by the query and returns a listing of the identified notes to the user.

FIG. 7 illustrates a block diagram of the notes indexing system 700, including a keyword identification module 705, a query processing module 710, a summarizing module 715, and a filtering module 720. Other embodiments of the notes indexing system 700 include fewer or more modules. In one embodiment, the notes indexing system 700 is configured as a subsystem of the publishing platform 200. In other embodiments, the notes indexing system 700 may be configured to communicate with the publishing platform 200 through a network, such as the network 205. For example, the notes indexing system 700 may execute on the user device 210.

User notes generated by the notepad application 405 executing on a user device 210 are uploaded to the publishing platform 200. The notes uploaded to the platform 200 are associated with metadata describing various attributes of the notes, such as the time the note was created, the location in a document with which the note is associated, licensing rights of the note, and the user who created the note. The publishing platform 200 stores the notes in data structures corresponding to each notepad. For example, one user's notes associated with a given multilayered document may be stored as a single notepad. The publishing platform 200 may additionally or alternatively aggregate notes from multiple users or multiple documents into a single notepad. For example, user “Joe” shares his “Biology 101” notes with user “Anna,” who also has a set of “Biology 101” notes. The publishing platform 200 aggregates the notes from the two users into a single notepad, “Biology 101+Joe+Anna.” As another example, user “Joe” may aggregate his notes from the “Biology 101” textbook with his notes from the “Biology 102” textbook.

The keyword identification module 705 of the notes indexing system 700 identifies keywords of the notes uploaded to the publishing platform 200. In one embodiment, the keyword identification module 705 parses the content of each note to determine the words in each note. By employing data mining, a semantic engine, and/or other methods, the keyword identification module 705 identifies keywords based on the words in the note. For example, the keyword identification module 705 may employ a term frequency-inverse document frequency (tf-idf) statistic to determine the important words of each note. The keyword identification module 705 determines the number of times each word appears in a note, which may be normalized to the length of the note (e.g., the total number of words appearing in each note). In one embodiment, the keyword identification module 705 compares term frequencies across a number of notes, such as all notes in a user's notepad, to identify stop words that are not to be considered keywords (e.g., “the”). The keyword identification module 705 removes the stop words from the set of possible keywords, and identifies the most frequently-used words in a note as the note's keywords.
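One way the frequency-based approach above might be realized is sketched below. It is a minimal illustration, not the module's actual implementation: the function names, the `top_k` and `stop_fraction` parameters, the rule that a word found in at least 80 percent of the notes is a stop word, and the particular idf formula (`log(N / df) + 1`) are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def identify_keywords(notes, top_k=3, stop_fraction=0.8):
    """Map each note id to its top_k keywords.

    notes: dict of note id -> note text.
    stop_fraction: words appearing in at least this share of the notes
    are treated as stop words (an assumed heuristic).
    """
    tokens = {note_id: tokenize(text) for note_id, text in notes.items()}
    note_count = len(tokens)

    # Document frequency: number of notes in which each word appears.
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))

    stop_words = set()
    if note_count > 1:
        stop_words = {
            w for w, n in doc_freq.items() if n / note_count >= stop_fraction
        }

    keywords = {}
    for note_id, words in tokens.items():
        counts = Counter(w for w in words if w not in stop_words)
        length = len(words) or 1  # normalize term counts by note length
        scores = {
            w: (c / length) * (math.log(note_count / doc_freq[w]) + 1.0)
            for w, c in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[note_id] = ranked[:top_k]
    return keywords
```

Because the stop-word test compares frequencies across the whole notepad, a word such as “the” is excluded without needing a fixed stop-word list.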

In one embodiment, the keyword identification module 705 determines keywords based on the layered content document with which each note is associated. Because each note is linked to a particular location in a document, the keyword identification module 705 may analyze the content near the associated location to identify relevant keywords of a note. The keywords may be within the associated location (e.g., if a note is associated with a paragraph of text, the keyword is in the paragraph), or may be near the associated location. By one method, the keyword identification module 705 accesses a glossary associated with the layered content document. The glossary lists various terms relevant to the content of the document and a location of each of the terms in the document. When identifying keywords of a particular note, the keyword identification module 705 may determine whether any terms in the glossary are located near the location with which the note is associated. For example, a note may be associated with a location on page 5 of “Biology 101.” The keyword identification module 705 accesses the glossary associated with “Biology 101,” determines terms in the glossary that are listed as being on page 5, and identifies the terms as keywords of the note.
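The page 5 example can be illustrated with a short sketch. The glossary layout used here (a mapping from each term to the set of page numbers on which it is listed) and the function name `glossary_keywords` are assumptions made for illustration; the description does not specify how the glossary is stored.

```python
def glossary_keywords(note_page, glossary):
    """Return glossary terms listed on the page the note is linked to.

    glossary: dict of term -> set of page numbers (assumed layout).
    """
    return sorted(term for term, pages in glossary.items() if note_page in pages)
```

For a note on page 5 and a glossary listing “mitosis” on pages 5 and 6 and “anaphase” on page 5, the function returns both terms as keywords of the note.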

The keyword identification module 705 may alternatively access a supplemental glossary (that is, a glossary not originally published with a layered content document) to identify keywords of notes. In one embodiment, the supplemental glossary is generated by the publishing platform 200 based on user reading activities. As users read and interact with a particular layered content document, the publishing platform 200 may maintain a record of users' searches within the document. For example, users reading “Biology 101” may search for the term “anaphase” to find the locations in the textbook discussing the stage of mitosis. As the publishing platform 200 returns the list of the locations to the user that match the user's search, the publishing platform 200 logs the search term in association with the multilayered document. The frequency at which particular terms are searched can be determined from the log.

The publishing platform 200 may add the most frequently searched terms in a particular layered content document to a supplemental glossary for the document. The publishing platform 200 may define a fixed number of frequently searched terms to include in the supplemental glossary (e.g., the one hundred terms searched most frequently by readers of the document), or may define the most frequently searched terms as those having a number of searches above a given threshold number of searches (e.g., one thousand searches) or percentage of users (e.g., a term has been searched for by twenty percent of the readers of a book).
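The three selection rules (fixed count, search-count threshold, and share of readers) might be sketched as follows. The log format (a list of `(user_id, term)` pairs), the function name, and its parameters are hypothetical, and treating the thresholds as inclusive is an assumption.

```python
from collections import Counter, defaultdict


def most_searched_terms(search_log, top_n=None, min_searches=None,
                        min_reader_fraction=None, reader_count=None):
    """Pick supplemental-glossary terms from a search log.

    search_log: iterable of (user_id, term) pairs (assumed format).
    Exactly one of top_n, min_searches, or min_reader_fraction is expected.
    """
    counts = Counter()
    searchers = defaultdict(set)
    for user_id, term in search_log:
        key = term.lower()
        counts[key] += 1
        searchers[key].add(user_id)

    if top_n is not None:
        # Most searched first; ties are broken alphabetically.
        return sorted(counts, key=lambda t: (-counts[t], t))[:top_n]
    if min_searches is not None:
        return sorted(t for t, c in counts.items() if c >= min_searches)
    if min_reader_fraction is not None:
        if not reader_count:
            raise ValueError("reader_count is required for a reader fraction")
        return sorted(
            t for t, users in searchers.items()
            if len(users) / reader_count >= min_reader_fraction
        )
    raise ValueError("no selection rule was given")
```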

The publishing platform 200 may alternatively add terms to a supplemental glossary for a layered content document based on the number of times the terms appear in the document. If a particular term occurs frequently in the layered content document, the publishing platform 200 may automatically add the term to the supplemental glossary.

As yet another example, the publishing platform 200 may extract terms from headers or sub-headers within a publication, and add the extracted terms to the supplemental glossary for the publication. Titles of chapters, sections within chapters, or other titles within a publication may be used for identifying keywords. For example, if a section in “Biology 101” is entitled “Mitosis,” the term “mitosis” may be added to a supplemental glossary associated with “Biology 101.”

When identifying keywords for a particular note, the keyword identification module 705 accesses the supplemental glossary generated by the publishing platform 200 and determines terms in the supplemental glossary that are located near the location with which the note is associated. The nearby terms are identified as keywords of the note. For the purpose of identifying keywords, the keyword identification module 705 may define “near” in any appropriate manner. For example, “near” may mean that a note and a word in the glossary or supplemental glossary are associated with the same page, the same paragraph, the same line in a paragraph, or the same chapter or sub-chapter of the multilayered document. The precise definition of “near” may be determined heuristically for different layered content documents, based for example on the length of the document.
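The alternative meanings of “near” can be made concrete with a small sketch. The `Location` record, the `granularity` parameter, and the convention that paragraphs are numbered within a page are assumptions introduced for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    chapter: int
    page: int
    paragraph: int  # paragraph index within the page (assumed convention)


def is_near(note_loc, term_loc, granularity="page"):
    """Decide whether a term counts as near a note at the given granularity."""
    if granularity == "chapter":
        return note_loc.chapter == term_loc.chapter
    if granularity == "page":
        return note_loc.page == term_loc.page
    if granularity == "paragraph":
        return (note_loc.page, note_loc.paragraph) == (term_loc.page, term_loc.paragraph)
    raise ValueError(f"unknown granularity: {granularity}")


def nearby_terms(note_loc, term_locations, granularity="page"):
    """Return glossary terms with at least one occurrence near the note.

    term_locations: dict of term -> list of Location (assumed layout).
    """
    return sorted(
        term
        for term, locations in term_locations.items()
        if any(is_near(note_loc, loc, granularity) for loc in locations)
    )
```

Widening the granularity from paragraph to page to chapter yields progressively more keywords for the same note, which is why the choice may be tuned per document.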

The keyword identification module 705 generates an index of the set of notes based on the identified keywords. After identifying keywords of a note, the keyword identification module 705 associates the note with metadata that identifies the keywords. The keyword metadata is persistent when notes are aggregated, and thus the keyword metadata can be searched to identify notes having a particular keyword.
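A simple way to picture this index is as an inverted mapping from keyword to note identifiers, built from the keyword metadata carried by each note. In the sketch below, the dictionary-based note representation (with `id` and `keywords` fields) and the function names are hypothetical; the point is only that the keyword metadata travels with each note, so an aggregated notepad can be indexed without re-analyzing the notes.

```python
from collections import defaultdict


def aggregate(*notepads):
    """Combine several notepads into one list; each note keeps its metadata."""
    return [note for notepad in notepads for note in notepad]


def build_index(notes):
    """Build a keyword -> set of note ids mapping from keyword metadata."""
    index = defaultdict(set)
    for note in notes:
        for keyword in note["keywords"]:
            index[keyword.lower()].add(note["id"])
    return dict(index)
```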

The query processing module 710 receives user search queries for notes having certain keywords in a particular set of notes. Search queries may be received at a search box, a user interface element displayed within an eReading browser application 170 or in a notepad panel 520. The query processing module 710 parses the received query to identify one or more keywords specified by the query. In one embodiment, the query processing module 710 supports Boolean connectors between keywords in a query. For example, if a user inputs the query “mitosis+anaphase,” the query processing module 710 parses the query into the keywords “mitosis” and “anaphase” and the Boolean connector AND.
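Parsing of the “mitosis+anaphase” example might look like the following sketch. It handles only the `+` connector from the example and interprets it as AND; the function name and the returned tuple shape are assumptions, and other connectors are deliberately left out because the description does not define their syntax.

```python
def parse_query(query):
    """Split a query such as "mitosis+anaphase" into keywords and connectors.

    Only "+" is recognized, and it is read as Boolean AND (per the example).
    """
    keywords = [part.strip().lower() for part in query.split("+") if part.strip()]
    connectors = ["AND"] * max(len(keywords) - 1, 0)
    return keywords, connectors
```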

The query received by the query processing module 710 may include an identification of the set of notes to search, either explicitly identified by a user or inferred by the query processing module 710 based on the set of notes currently accessed by a user. For example, if a user “Joe” inputs a search query into a search box in the notepad panel 520, which is currently displaying the user's notes associated with “Biology 101,” the query processing module 710 identifies “Biology 101+Joe” as the set of notes to be searched. In one embodiment, the query processing module 710 may use a grammar that also supports an identification of one or more attributes that describe information about the creation of the note. For example, a user may input a query specifying a keyword and an attribute. The query processing module 710 parses the query to identify the keyword and the attribute.

The summarizing module 715 receives the one or more keywords, Boolean connectors, and/or set of notes specified by a user's query. Based on the index generated by the keyword identification module 705, the summarizing module 715 selects the notes in the identified set that have the one or more keywords specified by the query. The summarizing module 715 generates a query results listing comprising the selected notes for presentation to the user.
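Selection against the index can be sketched as a set intersection when the keywords are joined by AND. The index shape (keyword mapped to a set of note identifiers) and the function name `select_notes` are assumptions for illustration.

```python
def select_notes(index, keywords):
    """Return ids of notes that carry every one of the given keywords.

    index: dict of keyword -> set of note ids (assumed layout).
    """
    if not keywords:
        return []
    matches = [index.get(keyword.lower(), set()) for keyword in keywords]
    return sorted(set.intersection(*matches))
```

A keyword that is absent from the index contributes an empty set, so a query containing it selects no notes.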

When sending the query results listing to the user, the summarizing module 715 may reduce a spatial size of the displayed notes, effectively displaying summaries of the notes rather than the entirety of the notes. In one embodiment, the summarizing module 715 passively reduces the size of the displayed results. For example, the generated listing may display a fixed number of words or lines of the selected notes that a user could expand or contract as desired. In another embodiment, the summarizing module 715 determines a portion of the notes corresponding to the locations in the multilayered documents with which the notes are associated. The determined portion is used as the summary of the note. For example, a note may include a sentence imported from a layered content document. The copied sentence provides a link to the document. When generating a summary of the note to include in the results listing, the summarizing module 715 uses the copied sentence as the summary of the note.
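Both summarization approaches can be combined in a short sketch: prefer the content imported from the document when the note has any, and otherwise show a fixed number of words. The note fields `text` and `imported_text`, the `max_words` default, and the trailing ellipsis marker are hypothetical choices for this example.

```python
def summarize(note, max_words=12):
    """Return a short summary of a note for a results listing.

    note: dict with "text" and, optionally, "imported_text" holding content
    copied from the document (assumed field names).
    """
    imported = note.get("imported_text")
    if imported:
        # The imported sentence links the note to the document.
        return imported
    words = note["text"].split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + " ..."
```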

The filtering module 720 filters the notes selected by the summarizing module 715 based on attributes specified by the search query. For example, the notes associated with “Biology 101” can be filtered to select only the notes that a user “Joe” created during a specified time interval, such as January through May of 2012. The time interval may correlate to the length of a course, the length of a section of a course, a single day, or any other period of time over which a user wishes to filter notes. In response, the filtering module 720 extracts the time metadata of the notes of the “Biology 101+Joe” document and selects the notes having a creation time within the specified range. As another example, the user Joe's notes associated with “Biology 101” may be filtered based on the type of content of the notes, such as an embedded video. The filtering module 720 extracts content metadata from the notes of the set “Biology 101+Joe” and selects the notes having embedded videos. If the search query specifies a note attribute in addition to a keyword, the listing generated by the summarizing module 715 is a listing of the notes filtered by the filtering module 720.
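Attribute filtering of this kind might be sketched as follows. The note fields (`author`, `created`, `content_types`) and the function signature are assumptions; the description states only that such attributes are stored as metadata with each note.

```python
from datetime import date


def filter_notes(notes, author=None, created_from=None, created_to=None,
                 content_type=None):
    """Keep the notes that satisfy every attribute that was specified."""
    selected = []
    for note in notes:
        if author is not None and note["author"] != author:
            continue
        if created_from is not None and note["created"] < created_from:
            continue
        if created_to is not None and note["created"] > created_to:
            continue
        if content_type is not None and content_type not in note["content_types"]:
            continue
        selected.append(note)
    return selected
```

The January through May 2012 example corresponds to calling `filter_notes(notes, author="Joe", created_from=date(2012, 1, 1), created_to=date(2012, 5, 31))`.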

The search query results listing created by the summarizing module 715 retains the properties described herein of a notepad associated with a multilayered document. If a results listing includes notes associated with one multilayered document, the results listing may be displayed in the notepad panel 520 adjacent to the associated multilayered document. Alternatively, the results listing may be viewed as a stand-alone document. The notes in the listing, whether viewed in the notepad panel 520 or as a stand-alone document, retain the association that the original notes have to a particular page location. For example, the eReading browser application 170 may fetch the page having the location with which a note is associated when a user clicks on a note in the listing, if the user is authorized to access the associated page.

FIG. 8 is a flowchart illustrating a method for generating a summary representation of notes. In one embodiment, the steps of the method are performed by the notes indexing system 700. In other embodiments, the steps may be performed by other entities, or may include different and/or additional steps.

The notes indexing system 700 accesses 802 a set of electronic notes associated with a publication, such as a multilayered document distributed by the publishing platform 200. Each note in the set of electronic notes is associated with a location in the publication.

The notes indexing system 700 identifies 804 keywords of notes in the set of electronic notes. In one embodiment, a keyword is a word in a note. For example, the keyword may be the most frequently-used term in a note. In another embodiment, the keywords are identified 804 based on content of the publication near the location with which the note is associated. For example, the keywords may be terms in a glossary or a supplemental glossary associated with the publication that are near to the location with which the note is associated.

The notes indexing system 700 receives 806 a user query specifying a keyword. Based on the keywords identified for the notes, the notes indexing system 700 selects 808 the notes that have the specified keyword. A listing of the selected notes is sent 810 for presentation to the user.

Additional Configuration Considerations

The foregoing description of the embodiments of the invention has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.

Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all of the steps, operations, or processes described.

Embodiments of the invention may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer readable storage medium, or any type of media suitable for storing electronic instructions, which may be coupled to a computer system bus. Furthermore, any computing systems referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.

Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments of the invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.

Claims

1. A method, comprising:

accessing a set of electronic notes including a plurality of notes, wherein each note is associated with a location in a publication;
identifying, for the notes in the set of electronic notes, one or more keywords describing content at the locations in the publication with which the notes are associated;
receiving a user query specifying a keyword;
selecting, based on the identified keywords, notes from the set of electronic notes associated with the specified keyword; and
sending for presentation to a user, a listing of the selected notes.

2. The method of claim 1, wherein the one or more keywords are within the location with which the notes are associated.

3. The method of claim 1, wherein the one or more keywords are near the location with which the notes are associated.

4. The method of claim 1, wherein the publication is associated with a glossary listing locations of a plurality of words in the publication, and wherein identifying keywords describing content of the publication at the locations with which the notes are associated comprises, for a note in the set of electronic notes:

determining a word in the glossary located within the content of the publication at the location with which the note is associated; and
identifying the determined word as a keyword of the note.

5. The method of claim 1, wherein identifying keywords describing content of the publication at the locations with which the notes are associated comprises, for a note in the set of electronic notes:

determining a word occurring frequently in the publication that is located within the content of the publication at the location in the publication with which the note is associated; and
identifying the determined word as a keyword of the note.

6. The method of claim 1, wherein identifying keywords describing content of the publication at the locations with which the notes are associated comprises, for a note in the set of electronic notes:

determining a word occurring frequently in the note; and
identifying the determined word as a keyword of the note.

7. The method of claim 1, further comprising:

for a note in the set of electronic notes, generating a summary of the note;
wherein the listing comprises the summaries generated for the selected notes.

8. The method of claim 7, wherein the note comprises a plurality of words, and wherein generating the summary of the note comprises:

selecting a subset of the plurality of words to include in the summary.

9. The method of claim 7, wherein generating a summary of the note comprises:

identifying attributes of the note describing one or more of a time of creation of the note, a page in the publication with which the note is associated, and a user who created the note;
wherein the summary includes the identified attributes.

10. The method of claim 7, wherein generating a summary of the note comprises:

identifying a portion of the note that is related to the content of the publication at the location with which the note is associated;
wherein the summary comprises the identified portion of the note.

11. The method of claim 10:

wherein the note includes content imported from the publication; and
wherein the imported content is identified as the portion of the note that is related to the content of the publication.

12. The method of claim 1, further comprising:

filtering the notes in the set of electronic notes based on an attribute of each note to generate a filtered set, wherein the attribute comprises information describing a creation of each note;
wherein selecting based on the identified keyword, notes from the set of electronic notes comprises selecting notes from the filtered set having the identified keyword.

13. The method of claim 12, wherein the attribute specifies a time that each note was created.

14. The method of claim 12, wherein the attribute specifies the location in the document with which the note is associated.

15. The method of claim 12, wherein the attribute specifies licensing rights of content of each note.

16. The method of claim 12, wherein the attribute specifies a user who created each note.

17. The method of claim 12, wherein the publication with which each note is associated corresponds to an educational course, and wherein the attribute specifies the educational course with which each note is associated.

18. A method, comprising:

accessing a set of electronic notes including a plurality of notes, wherein each note is associated with a location in a publication, and wherein each of the plurality of notes is indexed according to one or more keywords describing content of the publication at the location with which the note is associated;
receiving a notes search query specifying a keyword;
selecting notes from the set of electronic notes associated with the specified keyword based on the one or more keywords according to which the notes are indexed; and
sending, for presentation to a user, a listing of the selected notes.

19. The method of claim 18, wherein the one or more keywords are within the location with which the notes are associated.

20. The method of claim 18, wherein the one or more keywords are near the location with which the notes are associated.

21. The method of claim 18, further comprising:

for a note in the set of electronic notes, generating a summary of the note;
wherein the listing comprises the summaries generated for the selected notes.

22. The method of claim 21, wherein the note comprises a plurality of words, and wherein generating the summary of the note comprises:

selecting a subset of the plurality of words to include in the summary.

23. The method of claim 21, wherein generating a summary of the note comprises:

identifying attributes of the note describing one or more of a time of creation of the note, a page in the publication with which the note is associated, and a user who created the note;
wherein the summary includes the identified attributes.

24. The method of claim 21, wherein generating a summary of the note comprises:

identifying a portion of the note that is related to the content of the publication at the location with which the note is associated;
wherein the summary comprises the identified portion of the note.

25. The method of claim 24:

wherein the note includes content imported from the publication; and
wherein the imported content is identified as the portion of the note that is related to the content of the publication.

26. The method of claim 18, further comprising:

filtering the notes in the set of electronic notes based on an attribute of each note to generate a filtered set, wherein the attribute comprises information describing a creation of each note;
wherein selecting, based on the identified keyword, notes from the set of electronic notes comprises selecting notes from the filtered set having the identified keyword.

27. The method of claim 26, wherein the attribute specifies a time that each note was created.

28. The method of claim 26, wherein the attribute specifies the location in the document with which the note is associated.

29. The method of claim 26, wherein the attribute specifies licensing rights of content of each note.

30. The method of claim 26, wherein the attribute specifies a user who created each note.

31. The method of claim 26, wherein the publication with which each note is associated corresponds to an educational course, and wherein the attribute specifies the educational course with which each note is associated.

32. A method, comprising:

accessing a set of electronic notes including a plurality of notes, wherein each note is associated with a location in a publication;
identifying, for the notes in the set of electronic notes, one or more keywords describing content of the publication at the locations with which the notes are associated; and
generating an index for the set of electronic notes based on the identified keywords.

33. The method of claim 32, wherein the one or more keywords are within the location with which the notes are associated.

34. The method of claim 32, wherein the one or more keywords are near the location with which the notes are associated.

35. The method of claim 32, wherein the publication is associated with a glossary listing locations of a plurality of words in the publication, and wherein identifying keywords describing content of the publication at the locations with which the notes are associated comprises, for a note in the set of electronic notes:

determining a word in the glossary located within the content of the publication at the location with which the note is associated; and
identifying the determined word as a keyword of the note.

36. The method of claim 32, wherein identifying keywords describing content of the publication at the locations with which the notes are associated comprises, for a note in the set of electronic notes:

determining a word occurring frequently in the publication that is located within the content of the publication at the location in the publication with which the note is associated; and
identifying the determined word as a keyword of the note.

37. The method of claim 32, wherein identifying keywords describing content of the publication at the locations with which the notes are associated comprises, for a note in the set of electronic notes:

determining a word occurring frequently in the note; and
identifying the determined word as a keyword of the note.
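
By way of illustration only, the indexing recited in claims 32 and 35 and the filtered keyword search recited in claims 18, 26, and 30 might be sketched as follows. This is a minimal sketch and not the platform's implementation. The `Note` fields, the use of a page number as the "location," the `pages` mapping, the naive tokenizer, and the function names `identify_keywords`, `build_index`, and `search` are all assumptions introduced for this example. Only the glossary-based keyword identification of claim 35 is shown; the frequency-based variants of claims 36 and 37 are not covered.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    note_id: str
    page: int    # assumed form of the "location in a publication"
    author: str  # assumed creation attribute (claims 16 and 30)
    text: str


def tokenize(text):
    """Split text into lowercase alphabetic tokens (deliberately naive)."""
    return re.findall(r"[a-z]+", text.lower())


def identify_keywords(note, pages, glossary):
    """Claim 35 sketch: glossary words found in the publication content
    at the location with which the note is associated."""
    words_at_location = set(tokenize(pages.get(note.page, "")))
    return {word for word in glossary if word in words_at_location}


def build_index(notes, pages, glossary):
    """Claim 32 sketch: map each identified keyword to the ids of its notes."""
    index = defaultdict(set)
    for note in notes:
        for keyword in identify_keywords(note, pages, glossary):
            index[keyword].add(note.note_id)
    return dict(index)


def search(notes, index, keyword, author=None):
    """Claims 18 and 26 sketch: filter by a creation attribute (here the
    author, if given), then select the notes indexed under the keyword."""
    filtered = [n for n in notes if author is None or n.author == author]
    hits = index.get(keyword.lower(), set())
    return [n.note_id for n in filtered if n.note_id in hits]


# Hypothetical publication content, glossary, and notes.
pages = {
    1: "Mitosis divides one cell into two identical cells.",
    2: "Osmosis moves water across a membrane.",
}
glossary = {"mitosis", "osmosis", "membrane"}
notes = [
    Note("n1", 1, "ana", "Review the phases before the exam."),
    Note("n2", 2, "ben", "Ask about the diffusion comparison."),
    Note("n3", 2, "ana", "Diagram needed here."),
]

index = build_index(notes, pages, glossary)
print(search(notes, index, "osmosis"))                # ['n2', 'n3']
print(search(notes, index, "osmosis", author="ana"))  # ['n3']
```

In this sketch the keywords come from the publication text at each note's location rather than from the note text itself, so a note such as "Diagram needed here." is still retrievable under "osmosis" even though the note never uses that word.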

Patent History
Publication number: 20140019438
Type: Application
Filed: Dec 18, 2012
Publication Date: Jan 16, 2014
Inventors: Vincent Le Chevalier (San Jose, CA), Shahaf Shakuf (Rehovot), Gerard Genesse (Redwood City, CA), Roded Konforty (Rehovot), Ohad Eder-Pressman (San Francisco, CA), Charles F. Geiger (San Jose, CA)
Application Number: 13/719,140
Classifications
Current U.S. Class: Post Processing Of Search Results (707/722); Generating An Index (707/741)
International Classification: G06F 17/30 (20060101)