Andrew Sears <asears@mit.edu>
Massachusetts Institute of Technology
USA
One of the key problems with existing multimedia conferencing applications is the lack of common directory services for different applications. These directories can be considered as defining a new "space" in the Internet for conferencing and collaboration. This paper focuses on three functions of directories for conferencing: user location, which assists in finding users; group directories, which maintain information and real-time membership in groups for conferencing; and public switched telephone network (PSTN) directories, which allow users to interconnect with the telephone network. In this paper, we first present several scenarios in which new directories are needed. We then examine the design choices in providing directories to meet the needs presented, including the choice of access protocol and the schema, or layout, of the directory structure. The design we propose uses the Lightweight Directory Access Protocol (LDAP) [25], although it could be adapted and ported to the Domain Name System (DNS) or Hypertext Transfer Protocol (HTTP). Our proposal allows users to be called using e-mail addresses, LDAP DNs, phone numbers, or other personal identifiers. Our group directory design includes semipermanent groups based on the Usenet structure, and temporary user-created groups similar to Internet Relay Chat (IRC) channels. Our PSTN directories allow users to locate gateways to seamlessly place calls from the Internet to the PSTN.
As new applications are developed in the Internet, new standards are designed to define a "space" for those applications. One example is the development of the World Wide Web. The namespace for the WWW was created by the specification for uniform resource locators (URLs), and the HTML specification created an open specification for documents. It is important to note that these standards were effective because the dominant paradigm for the WWW is browsing, and URLs and documents were constructs that promoted browsing. This was not the case with NFS and Prospero, which use a file-system paradigm rather than a browsing paradigm.
Exciting possibilities now exist in the area of real-time applications on the Internet, particularly audio and video conferencing and collaboration. For many applications on the Internet, the standards providing an open framework came before the applications experienced widespread use, but this was not the case with multimedia conferencing on the Internet. Many conferencing applications experienced widespread use before the standards that allow them to interoperate existed. Although standards such as ITU's H.323 and T.120 series [4-8], and the Internet Engineering Task Force's (IETF's) MMUSIC group of standards [11] provide much of the needed framework, they do not provide standards to fulfill the directory needs of these applications.
Directories for conferencing can be considered complementary to existing work on Call Management Agents (CMA) in the VoIP Forum. The main difference is that CMA provides "smart directories" that allow attributes to be calculated dynamically as they are requested. One example of this might be a user location directory that provides different referrals depending on the time of day and the caller. This paper focuses on directory structures that do not necessarily use dynamic calculation. Extension mechanisms in LDAPv3 would allow dynamic calculations to be built in, but this paper does not discuss those details.
This paper presents three types of directory structures. The first is a user location directory, which allows one user to find, given a unique identifier, the present location of another user and whether that user is available for conferencing. The second is a user group directory, which provides discussion "rooms" where people can meet and which maintains real-time group membership, analogous to channels in IRC. The third is a PSTN directory, which allows calls to be placed transparently between the Internet and the telephone network by providing information on gateways that can complete the calls.
The user location part of our proposal can be seen as an extension to be added to the white pages structure [3] once that structure is in place. The user group directory part of our proposal seeks to correct the scalability problems in IRC. Finally, the PSTN directory part of our proposal suggests a way of adding functionality to the "tpc.int" design [16] in DNS. Examples in the following section explain these directories more fully and show why they are needed.
User location directories are needed to make it easier for users to "call" each other on the Internet using a conferencing application. Although placing a call may seem easy, it requires a series of more complex steps that take place in the background. They include the following:
Note that not all calls require all four steps, because in some cases users already possess partial information. For example, a user who already knows the IP address/port and login status of a friend can go directly to Step 4, which is what many applications already allow. The first three steps make calling more user-friendly: most people are unlikely to prearrange all their calls, they may want to receive calls at different locations, and they may want to identify other users by simple names such as e-mail addresses. Providing these features requires additional standards for a user location directory.
A user group directory can be thought of as a "virtual room" where people meet to start a conference. It might contain attributes that define a user group and separate records for group members. It differs from a conference directory, which maintains the membership of a group after a conference has already started. In a user group, all members may be talking in the same conference, none may be talking together, or anything in between. In other words, membership in the group is decoupled from participation in a discussion. User group directories and user location directories are closely related: group member records may have the same attributes as user location records, and both can serve as mechanisms for locating users.
Standards for conferencing directories are being provided both by the IETF (SDES packets in RTCP) [22] and by the ITU (T.124 Conference Roster). IRC [20], when used for text chat, is a conference directory in which all members of a group participate in the same text discussion. IRC experiences scalability problems because its rapidly changing information is globally replicated. VocalTec and others have used IRC's directory functions only to implement user group directories. For example, in VocalTec's Internet Phone, a user might create a group called "sports" with a huge potential membership. Members in that group could then call each other in separate conferences to talk about sports. Such a user group, though vast in scale, need not suffer from everyone talking at once, because members would join separate conferences with subsets of the group membership.
Two examples of user group directories are IRC channels and Usenet groups. In IRC channels, any user can create a group, and as a result, many groups are poorly structured and do not scale well. In Usenet, users can navigate through groups by topic and subtopic until they arrive at the one they want. Navigating through groups might involve the following steps:
PSTN directories provide information needed to place calls from the Internet to the telephone network. The steps involved include the following:
One possible directory access protocol is the DNS [Mo86, Mo87]. Changes to the DNS have been suggested that would allow dynamic updates to the directory [Vi96]. These changes are actually designed to give users write access to their DNS entries, rather than to optimize the directory for dynamic data, as is the case with dynamic LDAP. The goal of these changes is to allow laptop computers to change their Internet Protocol (IP) addresses as they are relocated. It might be possible to use the DNS as a general-purpose dynamic directory without changes to the specification. The primary disadvantage of DNS is that it is not based on records with attribute/value pairs. This makes it much more difficult to update a single piece of information in a record, and it also limits the extensibility of the system.
DNS's large installed base of servers is less of an advantage than it appears, because each server would require a software upgrade before it could be used for dynamic data. The main problem with DNS is that, unlike LDAP/X.500, it was never designed to serve as a general-purpose directory, and there are currently no plans to use DNS to store white pages information, as there are with LDAP. Plans to modify DNS protocols and develop new servers do not yet include optimizing the protocol and servers for dynamic data, as is being done for LDAP.
HTTP [Fi96] is another possibility for an access protocol to conferencing directories. Its main advantage is its ubiquity, which allows easy viewing of directories by anyone with a browser. The main disadvantage of HTTP is that it was designed as a document access protocol rather than a directory access protocol. It does not include features such as type-specific access to attribute information, nor does it allow simple tasks like server-based searches. Although it provides authentication for reads, authentication for writes requires the setup of individual user accounts for most server designs, which makes implementation difficult. Because HTTP is not designed as a directory access protocol and lacks much of the functionality needed, it cannot be considered an appropriate choice for a conferencing directory access protocol, although it could provide read access to the directory.
Another possibility is that no directory access protocol is suited for conferencing directories, and a new directory interface should be designed. This is the approach taken by the MUCS and ULS proposals. The drawback to this approach is the significant task involved in having to design a new directory access protocol, and there is no reason to "reinvent the wheel" if an existing protocol is appropriate.
Another possible access protocol is RAS signaling, defined in H.225.0. RAS signaling is a function provided in the H.323 model for registration and admission of clients through a gatekeeper. This gatekeeper is a centralized mechanism that provides server-side support to conferencing applications. There is still widespread debate as to whether Internet conferencing applications will need all the functions of the gatekeeper, and whether it will be widely deployed. One of the main problems of the gatekeeper is its monolithic design, which makes it very complicated to implement. The first problem with using RAS as the access protocol is therefore that it is tied to the gatekeeper, which has an uncertain future.
The second problem with using RAS signaling as an access protocol is that it essentially requires building an entirely new data model and mechanisms. Until recently, RAS defined little more than a channel on which to send messages. There are already many well-developed data models and access mechanisms that can handle the needs of conferencing applications effectively. If some functionality is needed that cannot be supplied by existing technologies, RAS may play a role with directory services and call management agents in the future. In the meantime, there are more pressing needs for directories, and another directory mechanism could provide part of the solution. It remains to be seen what role RAS will play, but this paper proposes an alternative or complement to providing directories and call management through RAS.
LDAP is a simplified Internet access protocol for the X.500 directory [X.500, We92]. X.500 is a hierarchical directory; although it has similarities to DNS, it uses a different naming convention. One disadvantage of X.500 naming is that it is less user-friendly than e-mail addresses or domain names. X.500 records are identified by their distinguished names (DNs), such as "CN=Andrew Sears, OU=EECS, O=MIT, C=US." The last three parts of the name indicate different levels of the server hierarchy, as shown in Figure 1, and each represents a server that is responsible for identifying the servers and entries below it.
Figure 1. Illustration of the X.500 hierarchy accessing a record
For example, the diagram shows a user in O=IETF contacting the IETF server to request the record with the above DN. That server then contacts (or refers the client to) the C=US server, then the O=MIT server, and then the OU=EECS server, which finally returns the record. This example shows how the system works without replication or caching; normally, the higher levels of the hierarchy would be widely replicated, and the locations of frequently accessed servers would be cached to improve performance. LDAP specifies the interface to the X.500 directory and provides a simplified protocol for communicating with servers.
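The referral walk in Figure 1 can be sketched as follows. This is a minimal, hypothetical illustration rather than LDAP's actual referral mechanism: the in-memory SERVERS table and the helpers parse_dn, naming_contexts, and resolve are invented for the example, each naming context is assumed to be served by exactly one server, and DN values are assumed to contain no escaped commas. As in the text, there is no replication or caching.

```python
from typing import Dict, List, Optional, Tuple

def parse_dn(dn: str) -> List[Tuple[str, str]]:
    """Split a DN such as "CN=Andrew Sears, OU=EECS, O=MIT, C=US" into
    (attribute, value) pairs, most specific first.
    Simplification: values are assumed to contain no escaped commas."""
    pairs = []
    for rdn in dn.split(","):
        attr, _, value = rdn.strip().partition("=")
        pairs.append((attr.strip().upper(), value.strip()))
    return pairs

def naming_contexts(dn: str) -> List[str]:
    """Server contexts from the top of the tree down to the entry's parent,
    e.g. ["C=US", "O=MIT, C=US", "OU=EECS, O=MIT, C=US"]."""
    rdns = parse_dn(dn)
    # Each proper suffix of the DN (excluding the entry's own RDN) names one server.
    return [", ".join(f"{a}={v}" for a, v in rdns[i:])
            for i in range(len(rdns) - 1, 0, -1)]

# Hypothetical toy directory: one server per naming context; only the
# OU=EECS server holds the requested entry.
SERVERS: Dict[str, Dict[str, dict]] = {
    "O=IETF": {},
    "C=US": {},
    "O=MIT, C=US": {},
    "OU=EECS, O=MIT, C=US": {
        "CN=Andrew Sears, OU=EECS, O=MIT, C=US": {"mail": "asears@mit.edu"},
    },
}

def resolve(dn: str, local_server: str) -> Tuple[Optional[dict], List[str]]:
    """Start at the caller's local server, then follow referrals down the
    hierarchy. Returns the entry (or None) and the servers contacted."""
    contacted = [local_server]
    entry = SERVERS.get(local_server, {}).get(dn)
    if entry is not None:
        return entry, contacted
    for context in naming_contexts(dn):
        contacted.append(context)
        entry = SERVERS.get(context, {}).get(dn)
        if entry is not None:
            return entry, contacted
    return None, contacted

entry, path = resolve("CN=Andrew Sears, OU=EECS, O=MIT, C=US", "O=IETF")
print(path)  # ['O=IETF', 'C=US', 'O=MIT, C=US', 'OU=EECS, O=MIT, C=US']
```

Replicating the upper levels of the tree, or caching the location of the OU=EECS server, would shorten this path, which is the optimization the text describes.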
Recently, a standard was proposed to modify LDAP to handle dynamic records [29]. LDAP offers the greatest functionality and flexibility of the available access protocols, and it is receiving widespread commercial support. Conferencing directories could be implemented in LDAP without changing the protocol and by using existing servers; however, existing servers have not been optimized to handle dynamic data and may not perform as well as possible. Microsoft has recently released its LDAP server, which has been optimized for dynamic data. Another significant advantage of LDAP is that it is the de facto standard for the Internet white pages directory, which could allow seamless integration of user location and white pages information.
After reviewing the above access protocols in detail, we concluded that LDAP is the best choice for the primary directory access protocol for conferencing directories. This is not to say that LDAP should be the only protocol used to access these directories. It is possible to provide gateways to other access protocols, and to use multiple access protocols linked to the same back-end database on a machine. For example, both Microsoft and Netscape provide an HTTP server that shares the same database as LDAP and provides records using HTTP as well as LDAP. This type of integration could also be done for parts of the DNS if desired. This would allow the same objects to be named with either DNS or X.500 naming conventions, with the back-end database handling concurrency issues. Our proposal specifies a conferencing directory using LDAP in the X.500 namespace. It could also be adapted for DNS, but with less functionality than LDAP.
After an access protocol has been selected, the next step in the design of a conferencing directory system is the schema, or organization of information, in the directory. Designing a schema involves consideration of what information will be available, who will maintain the servers for the directories, who will have update rights to the data, how different directories might interconnect, and how records can be replicated and organized for effective searching. Although our organizational design is specific to LDAP/X.500, it could also be adapted to other directories, such as DNS. Figure 2 shows the information that could be stored locally by each organization in our proposed design. Figure 3 shows the information that would be replicated in our design.
User and group information will be stored under a given organization in the X.500 tree, which is the directory structure used by LDAP. The records under the organization providing the conferencing directory can be logically divided into user records (URs), group records, and branches with replicated information. The replicated section of the directory will include information for redirecting UIDs, replicated groups, and PSTN gateways. The primary copy of each group will be stored under the organization responsible for maintaining that group, but some of its information will be replicated. Since X.500 allows records to have multiple names, we also propose that a new branch of the X.500 directory be developed under the Internet branch specified in [15], to provide a secondary naming structure for these global groups and (as explained in Sections 5.2 and 5.3) to aid navigation. This structure is explained in the following sections.
Figure 2. Information stored locally in our proposed organizational schema
Figure 3. Replicated information in our proposed organizational schema
In our design, user location records would be stored as records under each user's white pages entry. We assume that the white pages work will define some type of extensible record type under the white pages, which we call the user record (UR). The DN for the user location record will be the same as that of the user's white pages record, with "UR=user location" added. For example, the DN for John Doe's user location record at MIT would be "UR=user location, CN=John Doe, O=MIT, C=US." We realize that some organizations might restrict access to this information. In this case, user location information could be provided through a third party (such as Four11), and the DN might be "UR=user location, CN=John Doe 721, C=US." Currently, all user location is done through an application provider or third party, but many organizations may choose to provide their own user location servers along with their white pages servers. Whether user location is better provided internally or by a software provider or third party is a question that will have to be settled in the market. Our design is flexible enough to accommodate different scenarios.
One of the features of our design is that it provides aliases to translate e-mail addresses, E.164 phone numbers, and other unique identifiers into a name that provides information on the location of their conferencing directory entries. We expect that commercial services such as Four11, which already provides several conferencing mechanisms, will also provide this mechanism. Our design leaves the maintenance of this service in the hands of individual users and the commercial directory provider.
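The alias translation described above, combined with the user location DN convention from the preceding paragraph, can be sketched as follows. This is a hypothetical illustration: ALIASES, normalize, and user_location_dn are invented names, the normalization rules are simplified assumptions, the phone number is a fictional 555 number, and a real deployment would store aliases as directory records rather than in a Python dictionary.

```python
import re
from typing import Dict

UR_RDN = "UR=user location"

# Hypothetical alias table (maintained by the user or a commercial directory
# provider): normalized identifier -> DN of the user's white pages entry.
ALIASES: Dict[str, str] = {
    "jdoe@mit.edu": "CN=John Doe, O=MIT, C=US",
    "+16175550100": "CN=John Doe, O=MIT, C=US",
}

def normalize(identifier: str) -> str:
    """Canonicalize an identifier before alias lookup (simplified rules)."""
    ident = identifier.strip()
    if "=" in ident:
        # Already an LDAP DN: use it unchanged.
        return ident
    if "@" in ident:
        # E-mail address: lower-cased whole for simplicity, although the
        # local part is technically case-sensitive.
        return ident.lower()
    if re.fullmatch(r"\+?[\d\s().-]+", ident):
        # Phone number: keep digits only, in "+<country code><number>" form.
        return "+" + re.sub(r"\D", "", ident)
    return ident

def user_location_dn(identifier: str) -> str:
    """Map an e-mail address, phone number, or DN to the DN of the user's
    user location record. Raises KeyError for unknown aliases."""
    ident = normalize(identifier)
    white_pages_dn = ident if "=" in ident else ALIASES[ident]
    return f"{UR_RDN}, {white_pages_dn}"

print(user_location_dn("+1 (617) 555-0100"))
# UR=user location, CN=John Doe, O=MIT, C=US
```

Because every identifier funnels into the same white pages DN, a caller can use whichever name they know, and the user location record stays in one place.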
Figure 2 shows the two record types that constitute a group directory. Group information records store a group's description, access rights, location, and other information. Group member records, which are very dynamic, provide the real-time membership in a group using individual records for each user in the group and include information such as the user's location (IP and port), a nickname, the user's comments, and the DN of the user's white pages entry. Group information records may be widely replicated, but group member records can only be replicated among a small group of servers, if at all, to maintain scalability. In our design, we assume that only the server storing the primary copy of the group information record will store group member records, unless another copy is desired for redundancy.
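The split between widely replicated group information records and primary-only group member records can be sketched as a small in-memory model. The class and field names below are hypothetical and are not the attribute definitions published at http://itel.mit.edu/icons/attributes.html; raising PermissionError on a replica stands in for whatever redirection a real server would perform.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass(frozen=True)
class GroupInfo:
    """Group information record: changes rarely, so it can be replicated."""
    dn: str
    description: str
    primary_server: str  # name of the server holding the primary copy

@dataclass(frozen=True)
class GroupMember:
    """Group member record: one per user currently in the group."""
    white_pages_dn: str
    nickname: str
    ip: str
    port: int
    comment: str = ""

@dataclass
class GroupServer:
    name: str
    info: Dict[str, GroupInfo] = field(default_factory=dict)
    members: Dict[str, Dict[str, GroupMember]] = field(default_factory=dict)

    def store_info(self, record: GroupInfo) -> None:
        # Primary or replica: any server may hold the information record.
        self.info[record.dn] = record

    def _check_primary(self, group_dn: str) -> None:
        # Member records are kept only on the primary server.
        primary = self.info[group_dn].primary_server
        if primary != self.name:
            raise PermissionError(f"member records for {group_dn} live on {primary}")

    def join(self, group_dn: str, member: GroupMember) -> None:
        self._check_primary(group_dn)
        self.members.setdefault(group_dn, {})[member.white_pages_dn] = member

    def leave(self, group_dn: str, white_pages_dn: str) -> None:
        self._check_primary(group_dn)
        self.members.get(group_dn, {}).pop(white_pages_dn, None)

    def roster(self, group_dn: str) -> List[GroupMember]:
        """Real-time membership, available only from the primary server."""
        self._check_primary(group_dn)
        return list(self.members.get(group_dn, {}).values())
```

Replicas can answer "what is this group and where does it live?" from the information record, while all join and leave traffic goes to one server, which is what keeps the rapidly changing data from being replicated.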
In considering group information records, our design makes a distinction between persistent and dynamic group information. We define persistent groups as those that users cannot add or delete without supervisory approval, so they do not change often. Dynamic groups are those that are created and deleted by users and change with relatively high frequency, similar to IRC channels. The group information of persistent groups can be widely replicated, and, like Usenet, there are "global" groups. The group information of dynamic groups generally is not replicated, although it may be replicated within an organization or within small groups of organizations. The primary reason for this structure is to improve scalability.
Our design of persistent groups is based largely on the Usenet model and presupposes two ways of browsing through group hierarchies: (1) by organization and (2) by topic. This corresponds to the two main branches "C=, O=" and "O=Internet, OU=Groups." A group identified by organization might have the DN "C=US, O=Microsoft, Group=My user group," and a group identified by topic might have the DN "O=Internet, Group=Sports, SG=Basketball." Browsing by topic allows users to see all subtopics in a group regardless of where the group is actually stored. The basketball group may actually be stored on a Netscape server and have as a second name "C=US, O=Netscape, Group=Basketball." All groups, whether persistent or dynamic, may be browsed by organization, but only persistent groups may be browsed by topic. This is because browsing by topic requires centralized information that translates the topic to the location of the server maintaining that group. This will only scale well if groups do not change often so that their information can be widely replicated.
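The centralized topic-to-location translation that makes browsing by topic possible can be sketched as a replicated index that accepts only persistent groups. This is a hypothetical illustration: TopicIndex and its methods are invented names, and topic DNs are written root-first, as in the examples above, so a subtopic's DN extends its parent's DN.

```python
from typing import Dict, List

class TopicIndex:
    """Hypothetical widely replicated index for the topic branch.

    Maps a topic DN (written root-first, as in the examples in the text) to
    the organizational DN of the server holding the group's primary copy."""

    def __init__(self) -> None:
        self._by_topic: Dict[str, str] = {}

    def register(self, topic_dn: str, org_dn: str, persistent: bool) -> None:
        # Only persistent groups change rarely enough to replicate everywhere.
        if not persistent:
            raise ValueError("dynamic groups can only be browsed by organization")
        self._by_topic[topic_dn] = org_dn

    def lookup(self, topic_dn: str) -> str:
        """Translate a topic name into the group's organizational name."""
        return self._by_topic[topic_dn]

    def subtopics(self, parent_dn: str) -> List[str]:
        """All registered topics below parent_dn, wherever they are stored."""
        prefix = parent_dn + ", "
        return sorted(t for t in self._by_topic if t.startswith(prefix))

index = TopicIndex()
index.register("O=Internet, Group=Sports, SG=Basketball",
               "C=US, O=Netscape, Group=Basketball", persistent=True)
print(index.subtopics("O=Internet, Group=Sports"))
# ['O=Internet, Group=Sports, SG=Basketball']
```

Rejecting dynamic groups at registration time reflects the scalability argument in the text: every copy of the index must be updated whenever an entry changes, so only slowly changing groups belong in it.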
We propose that existing Usenet groups and categories form the basis of the first conferencing groups, although future groups need not be limited to the Usenet structure. For this to happen, the topic area moderators under Usenet would need to be given passwords to modify their group information under the "OU=Groups, O=Internet" branch of the X.500 directory. We propose that administrative control of this branch initially be given to the Internet Telephony Consortium at MIT until the initial structure is set up and a more permanent administrator can be found. For example, the moderator of the "alt.sports" newsgroup would be given a password with access rights to modify and create new groups under the DN "Subgroup=sports, Group=alt, OU=Groups, O=Internet."
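The moderator arrangement above amounts to granting write access to one subtree of the "OU=Groups, O=Internet" branch. The check can be sketched as follows; GRANTS, rdns, within_subtree, and may_write are hypothetical names, the grant table stands in for whatever LDAP access-control mechanism a server actually uses, and DNs are written leaf-first and compared case-insensitively as a simplification.

```python
from typing import Dict, List, Tuple

def rdns(dn: str) -> List[Tuple[str, str]]:
    """Leaf-first list of (attribute, value) pairs, lower-cased for
    case-insensitive comparison. Assumes no escaped commas in values."""
    out = []
    for part in dn.split(","):
        attr, _, value = part.strip().partition("=")
        out.append((attr.strip().lower(), value.strip().lower()))
    return out

def within_subtree(dn: str, base: str) -> bool:
    """True if dn is base itself or any entry below it."""
    d, b = rdns(dn), rdns(base)
    return len(d) >= len(b) and d[len(d) - len(b):] == b

# Hypothetical grant table: moderator identity -> subtree they may modify.
GRANTS: Dict[str, str] = {
    "alt.sports moderator": "Subgroup=sports, Group=alt, OU=Groups, O=Internet",
}

def may_write(moderator: str, target_dn: str) -> bool:
    base = GRANTS.get(moderator)
    return base is not None and within_subtree(target_dn, base)

print(may_write("alt.sports moderator",
                "Subgroup=basketball, Subgroup=sports, Group=alt, OU=Groups, O=Internet"))
# True
```

Comparing whole RDNs from the root end, rather than matching raw string suffixes, keeps a grant for "Subgroup=sports" from accidentally covering a sibling such as "Subgroup=esports".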
While this answers the question of who will moderate a group, it does not necessarily solve the problem of who will provide the server that stores the group. Providing the primary copy of a Usenet group involves relatively low overhead because postings are replicated only at intervals and can be "fanned" out using replication hierarchies. Providing the primary copy of a conferencing group involves maintaining real-time group member records for that group, which has a higher overhead than maintaining a newsgroup. As explained in the appendix on scalability, the current cost of providing user location might range from $.10 to $.40 per user, although this is likely to decrease rapidly. Even so, a cost this high is likely to deter institutions such as universities from maintaining public groups. This leaves maintenance of the servers for public groups to application providers or third parties.
It is possible that application providers will try to have their own closed groups, i.e., groups that are not available to other applications. This could be done in three ways. The first would be to place some type of LDAP access restriction on their servers that only their own software could satisfy. The second would be to provide no persistent groups, making their groups difficult to replicate. The third would be for application providers to refuse to cooperate with the moderators of Usenet groups or, if they serve as moderators themselves, to abuse that power. We expect that many application providers maintaining group servers may want to moderate the groups they maintain, which should be their right. To do this in an open manner, the providers would need to encourage browsing by topic rather than by organization. Otherwise, Usenet conferencing groups would "balkanize" into Microsoft groups, Netscape groups, and other application provider groups.
Another directory function needed by conferencing applications is locating gateways to interconnect with the PSTN. Given a phone number and the gateway service needed (fax, paging, voice, etc.), this directory would provide information about gateways that can complete the call. A basic form of this type of directory, called "tpc.int" [16], has been implemented in the DNS; given a phone number, it returns the domain name of a gateway. The problem with tpc.int is that it does not provide any information about that gateway (such as usage costs), and it does not distinguish gateways by the type of service they provide. Tpc.int is used mainly as a directory for fax gateways, so users desiring voice gateways to the PSTN have no way of distinguishing between the two. Cost information would also be helpful when calls from the Internet to the PSTN are placed transparently to the user.
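Unlike tpc.int, which returns only a gateway's domain name, the proposed directory records would let a client filter gateways by service and compare costs. A minimal sketch of such a lookup follows; the Gateway record, its attributes, find_gateways, the example hostnames, and the costs are all hypothetical, and prefix matching on E.164 digits is an assumed selection rule rather than one specified in this proposal.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class Gateway:
    """Hypothetical PSTN gateway record under "OU=PSTN Gateways, O=Internet"."""
    host: str                  # domain name of the gateway
    prefix: str                # E.164 prefix it serves, digits only
    services: FrozenSet[str]   # e.g. {"voice", "fax", "paging"}
    cost_per_minute: float     # illustrative usage-cost attribute

def find_gateways(number: str, service: str, gateways: List[Gateway]) -> List[Gateway]:
    """Gateways that can complete a call to `number` with `service`,
    cheapest first; ties go to the more specific (longer) prefix."""
    digits = "".join(ch for ch in number if ch.isdigit())
    matches = [g for g in gateways
               if service in g.services and digits.startswith(g.prefix)]
    return sorted(matches, key=lambda g: (g.cost_per_minute, -len(g.prefix)))

GATEWAYS = [
    Gateway("gw1.example.net", "1617", frozenset({"voice"}), 0.05),
    Gateway("fax.example.org", "1", frozenset({"fax"}), 0.02),
    Gateway("gw2.example.com", "1", frozenset({"voice", "fax"}), 0.10),
]

print([g.host for g in find_gateways("+1 617 555 0100", "voice", GATEWAYS)])
# ['gw1.example.net', 'gw2.example.com']
```

Filtering on a service attribute is exactly what distinguishes a voice gateway from a fax gateway here, which is the distinction the text notes tpc.int cannot make.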
We propose that the branch "OU=PSTN Gateways, O=Internet" be used to store this information and that MIT's Internet Telephony Consortium (ITC) be given initial authority to administer this branch. We propose that gateway providers make information available through their own LDAP servers, making it open for replication. ITC would maintain pointers to the replicas, which could be replicated and stored either in the LDAP servers of the application providers or by commercial directory providers such as Four11.
We have proposed alpha versions of attributes for each of the records we proposed, including user location records, group information records, group member records, UID redirection records, and PSTN gateway records. We encourage further feedback and modification from interested parties. The specifications for these records are located at http://itel.mit.edu/icons/attributes.html. We also have provided trial results for our model, showing the scalability of our design, at http://itel.mit.edu/icons/scale.html. Finally, a comparison of this proposal with other proposals can be found at http://itel.mit.edu/icons/compare.html.
The lack of a directory to locate gateways to the PSTN is only one example of the problems still to be solved in the area of locating services or objects. Services that might be used with telephony include voice mail, speech recognition, and text to speech. The open distributed processing model uses the concept of a "trader" [9], which provides mechanisms for announcing and discovering objects. One possibility for future work in this area would be to investigate using an open "trading" architecture to locate gateways and other services.
A number of specifications are needed to exploit the full benefit of conferencing directories. A partial list of specifications for record types and organization is as follows:
This paper presented a design and implementation of group and user location for conferencing in LDAP. We found that LDAP provides all the functionality needed for conferencing directories. Our design can be used with the directory systems of existing conferencing application providers so that backward compatibility is maintained. Our design achieves scalability by relying on fine-grained partitioning rather than replication and caching to distribute load. However, there is still a considerable amount of work to be done in order to provide conferencing directories compatible with existing standards.
Many thanks are due to Mark Handley, Karen Sollins, Henning Schulzrinne, Frans Kaashoek, William Lie from Microsoft, Tony Genovese from Microsoft, and Rich Pizarro from Netscape, who offered comments on early ideas. Special thanks go to David Clark, who has provided continual feedback throughout the process of this work.