Types of VoIP Protocols used in 2022

VOIP PROTOCOLS SUMMARY

VoIP protocols are required for the various components of a communications service to work together smoothly. Virtually every VoIP device uses a standard called the Real-time Transport Protocol (RTP) for transmitting audio and video packets between communicating computers. The Internet Engineering Task Force (IETF) defines RTP in RFC 3550. The payload formats for several CODECs are defined in RFC 3551, although the International Telecommunication Union (ITU) and other IETF RFCs define additional payload format specifications.

RTP addresses issues like packet order and provides mechanisms to help address delay and jitter. These mechanisms include the RTP Control Protocol (RTCP), which is also defined in RFC 3550. One of the main concerns about Internet communications is the potential for eavesdropping. To address security concerns, Secure RTP (SRTP) was created (defined in RFC 3711); it provides encryption, authentication, and integrity protection for the audio and video packets transmitted between communicating devices.
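To make the ordering and timing mechanisms concrete, here is a minimal sketch that decodes the 12-byte fixed RTP header, assuming the field layout given in RFC 3550. The sequence number is what lets a receiver restore packet order, and the timestamp is what it uses to estimate jitter. The function name and the sample packet are illustrative and not taken from any particular library.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header (layout assumed from RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("RTP packet is shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC.
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=first >> 6,
        padding=bool(first & 0x20),
        extension=bool(first & 0x10),
        csrc_count=first & 0x0F,
        marker=bool(second & 0x80),
        payload_type=second & 0x7F,
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
    )


# Hypothetical packet: version 2, payload type 0, sequence 1, 160 bytes of audio.
sample = struct.pack("!BBHII", 0x80, 0x00, 1, 160, 0x12345678) + b"\x00" * 160
header = parse_rtp_header(sample)
```

A receiver would typically sort or buffer packets by `sequence_number` and compare `timestamp` values against arrival times; SRTP would additionally encrypt the payload that follows this header.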

Before audio or video media can be transmitted between two computers, however, various protocols must be employed to find the remote device and negotiate how the media will be exchanged. The protocols essential to this process are known as “call-signaling protocols,” the most popular of which are H.323 and SIP. Many other protocols support related tasks that devices need in order to function properly. The following protocols are the ones most commonly found in devices used today.

H.323

In 1995, researchers wanted to solve the problem of how two computers could initiate communication to exchange audio and video media streams. H.323 and SIP (Session Initiation Protocol) were the two resultant solutions to that problem, but H.323 enjoyed the first commercial success. While both protocols allow users to do the same thing – to establish multimedia communication on single or multimedia platforms – they differ in design. H.323 is a binary protocol, whereas SIP is an ASCII-based text protocol.

When shopping for VoIP systems, you might notice that most new technology includes SIP rather than H.323. Ongoing debates over which system is better often leave H.323 behind, but H.323 is superior in several ways. It often offers better interoperability with the PSTN, better support for video, and reliable out-of-band transport of DTMF (the tones heard when pressing a button on a telephone). One advantage SIP has over H.323 is its lack of complexity. SIP resembles the HTTP and SMTP protocols, which makes SIP easier for many individuals to work with.

HOW IT WORKS

An H.323 terminal is an endpoint in a LAN that participates in real-time two-way communications with another H.323 terminal, gateway, or multipoint control unit (MCU). H.323 endpoints are grouped into zones, and each zone has one gatekeeper that manages all the endpoints in that zone. Each terminal must support audio communication, but it can also support audio with video, audio with data, or a combination of these capabilities.

H.323 can be referred to as an “intelligent endpoint protocol,” which means that all the intelligence required to locate the remote endpoint and to establish media streams between the local and the remote device is an integral part of the protocol. “Device control protocols” are complementary to H.323; the current ones are H.248 and MGCP.

BASIC USAGE

Understanding how H.323 is used helps to understand how the gateway works. In VoIP, the gateway usually is a device that offers an IP interface on one side and some legacy telephone interface on the other side. The legacy telephone interface may be complex, such as an interface to a legacy Public Switched Telephone Network (PSTN) switch or a simple interface that allows the user to connect to one or a few traditional telephones. The gateway converts media provided in one type of network to the format required for another.

Originally, gateways were viewed as monolithic devices with call control provided by H.323 (or SIP) and hardware required to control the PSTN interface. In 1998, the idea of splitting the gateway into two logical parts was proposed: one part, which contains the call control logic, is called the Media Gateway Controller (MGC) or call agent (CA), and the other part, which interfaces with the PSTN, is called the media gateway (MG). With this functional split, a new interface existed (going between the MGC and MG), driving the necessity to define MGCP and H.248.

The H.323 gateway can provide an interface between H.323 and a PSTN and between H.320, V.70, H.324, and other speech terminals. H.323 uses CODECs to convert between circuit-switched and packet formats and works with the gatekeeper through RAS protocols to route signals from voice and fax through the network.

MEGACO H.248

H.323, which was designed for Local Area Networks (LANs), does not scale well to larger public networks. Enter Megaco, the result of a joint effort between the Internet Engineering Task Force (IETF) and ITU-T Study Group 16. The IETF published Megaco as RFC 3015, and the ITU-T published the same protocol as Recommendation H.248, so it is known as Megaco, H.248, or Megaco/H.248. It is related to, but distinct from, the Media Gateway Control Protocol (MGCP).

This general-purpose standard protocol handles the signaling and session management required during a multimedia conference. It is also used to control the elements of a physically decomposed multimedia gateway, which separates call control from media conversion.

HOW IT WORKS

MGCP and Megaco/H.248 are complementary to H.323 and SIP. They are referred to as “device control protocols” because they remove the signaling control from the gateway and send it to a media gateway controller (MGC – sometimes called a “call agent” or Softswitch), which dictates the service logic of communications traffic. Megaco/H.248 contains terminations and contexts, two basic components. Terminations represent streams entering or leaving the MG, such as analog telephone lines, RTP, or MP3 streams. Terminations have properties such as the maximum size of a jitter buffer, which can be inspected and modified by the MGC.

Terminations can be placed into contexts, in which two or more termination streams are mixed and connected. Contexts are created and released by the MG under the command of the MGC: a context is created when the first termination is added and released when the last termination is removed. A termination may contain more than one stream, so a context may be a multistream context. Audio, video, and data streams may exist in a context among several terminations.
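The context life cycle can be sketched with a toy model. This is only an illustration of the rule that a context comes into existence with its first termination and disappears with its last; the class, the method names, the termination names, and the convention that context ID 0 means "create a new context" are all assumptions of the sketch, not the protocol's actual message syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class MediaGateway:
    """Toy model: context IDs mapped to the termination IDs they contain."""
    contexts: Dict[int, Set[str]] = field(default_factory=dict)
    next_context_id: int = 1

    def add(self, termination: str, context_id: int = 0) -> int:
        """Add a termination; context_id 0 asks for a new context (sketch convention)."""
        if context_id == 0:
            context_id = self.next_context_id
            self.next_context_id += 1
            self.contexts[context_id] = set()
        # Raises KeyError if the caller names a context that does not exist.
        self.contexts[context_id].add(termination)
        return context_id

    def subtract(self, termination: str, context_id: int) -> None:
        """Remove a termination; the context is released once it is empty."""
        members = self.contexts[context_id]
        members.discard(termination)
        if not members:
            del self.contexts[context_id]


mg = MediaGateway()
ctx = mg.add("line/1")            # first termination creates the context
mg.add("rtp/1", ctx)              # second termination joins it
mg.subtract("line/1", ctx)
still_open = ctx in mg.contexts   # one termination left, context remains
mg.subtract("rtp/1", ctx)
released = ctx not in mg.contexts  # last termination gone, context released
```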

Megaco/H.248 messages can be encoded either as text or in binary ASN.1 form, and they carry commands. Commands are generally sent from the MGC to the MG, although the “ServiceChange” command can also be sent by the MG. The MG sends the “Notify” command to the MGC to report that one of the events the MGC was interested in has occurred.

BASIC USAGE

Megaco/H.248 is similar to MGCP in its architecture and in the controller-to-gateway relationship, but Megaco/H.248 supports a broader range of networks. This protocol is central to voice-over-packet solutions such as VoIP. It can be integrated easily into products such as Central Office Switches, Gateways (Trunking, Residential, and Access), Network Access Servers, Cable Modems, PBXs, IP Phones, Soft Phones, IADs, and Middleboxes to develop a concurrent voice and data solution.

MGCP

MGCP (Media Gateway Control Protocol) is an internal protocol used within a Voice over IP (VoIP) system and specified in RFC 3435. This simple protocol was developed primarily to address carrier-based IP telephone network demands, and it has become the de facto standard for media gateway control worldwide. MGCP is a complementary protocol for H.323 and SIP, which serve as signaling protocols within an IP network. A Media Gateway Controller (MGC) uses MGCP to interface with the Media Gateway (MG) and handles all the call processing by linking with an IP network.

HOW IT WORKS

An MGCP system comprises a Call Agent (CA), at least one MG that converts media signals between circuits and packets, and at least one signaling gateway (SG) when the system is connected to the Public Switched Telephone Network (PSTN). MGCP, which utilizes SDP, is widely used between the elements of a decomposed multimedia gateway. The gateway consists of a CA containing the call control “intelligence” and a media gateway providing the media functions, for example, conversion from TDM voice to Voice over IP.

Media Gateways feature endpoints with which the CA creates and manages media sessions with other multimedia endpoints. Endpoints are sources and sinks of data, and they can be physical or virtual. Creating a physical endpoint requires a hardware installation, while virtual endpoints can be created in software. Call Agents can create new connections or modify existing ones.

Generally, a media gateway is a network element that provides conversion between the data packets carried over the Internet or other packet networks and the voice signals carried by telephone lines. The Call Agent provides instructions to the endpoints to check for any events and to create signals for existing events. The endpoints are designed in such a way as to automatically communicate changes in the service state to the Call Agent. The Call Agent can audit endpoints and the connections on endpoints.

MGCP connections can be point-to-point or multipoint. A point-to-point connection is an association between two endpoints for transmitting data between them. Once the connection is set up on both endpoints, data transfer can take place between them. In a multipoint connection, the connection is set up between an endpoint and a multipoint session. Connections can be created over various bearer networks.

BASIC USAGE

MGCP is popular in VoIP networks because the MGCP Call Agent works as a software switch for the network. The Call Agent may seem complex, but its role is simple: it does nothing more than direct the media and signaling gateways, which perform all the work.

Every command within MGCP architecture features a transaction ID, and it receives an acknowledgment and a response. These actions are often understood as a subscription architecture, as the CA informs the MG and signaling gateways about the events that are attended and unattended.

MGCP packets are usually carried over UDP, with gateways listening on port 2427. MGCP datagrams are text in which fields are separated by white space, and an MGCP packet is either a command, which begins with a four-letter verb, or a response, which begins with a three-digit response code.
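The distinction between commands and responses can be sketched with a small classifier for the first line of an MGCP message. The function name, the returned dictionary keys, and the sample endpoint name are illustrative assumptions; the field order for commands (verb, transaction ID, endpoint, protocol version) follows RFC 3435.

```python
def classify_mgcp_line(first_line: str) -> dict:
    """Classify the first line of an MGCP message as a command or a response."""
    tokens = first_line.split()
    if not tokens:
        raise ValueError("empty MGCP line")
    head = tokens[0]
    if len(head) == 3 and head.isdigit():
        # Response: three-digit code, transaction ID, optional commentary.
        return {
            "kind": "response",
            "code": int(head),
            "transaction_id": tokens[1] if len(tokens) > 1 else None,
        }
    if len(head) == 4 and head.isalpha():
        # Command: four-letter verb, transaction ID, endpoint name, version.
        return {
            "kind": "command",
            "verb": head.upper(),
            "transaction_id": tokens[1] if len(tokens) > 1 else None,
            "endpoint": tokens[2] if len(tokens) > 2 else None,
        }
    raise ValueError(f"not an MGCP command or response line: {first_line!r}")


# Hypothetical create-connection command and its matching response.
command = classify_mgcp_line("CRCX 1204 aaln/1@gw.example.net MGCP 1.0")
response = classify_mgcp_line("200 1204 OK")
```

Because both lines carry the same transaction ID, a Call Agent can pair each response with the command that caused it.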

MIME

MIME, or Multipurpose Internet Mail Extensions, is an official Internet standard that defines how messages must be formatted so that they can be exchanged among various email systems. The headers that describe MIME message bodies are defined by RFC 2045, and the extensions that permit non-US-ASCII text in Internet mail header fields are defined by RFC 2047. Finally, MIME conformance criteria and examples are given in RFC 2049.

MIME is a very flexible format that permits virtually any type of file or document in a message, including text, images, audio, video, or other data. MIME uses base64 as an encoding that lets a non-text message pass intact through mail systems designed for text. It achieves this by representing the non-text data as ASCII text.
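As a small illustration of that encoding step, the sketch below uses Python's standard `base64` and `email` modules. The four sample bytes are arbitrary and only stand in for real binary content such as an image.

```python
import base64
from email.mime.application import MIMEApplication

binary_payload = bytes([0x00, 0xFF, 0x10, 0x80])  # hypothetical non-text data

# base64 maps every 3 bytes of input onto 4 printable ASCII characters.
encoded = base64.b64encode(binary_payload).decode("ascii")

# The standard library applies the same encoding when it builds a MIME part.
part = MIMEApplication(binary_payload, _subtype="octet-stream")
transfer_encoding = part["Content-Transfer-Encoding"]

# Decoding on the receiving side restores the original bytes.
decoded = part.get_payload(decode=True)
```

The cost of this safety is size: the encoded form is roughly one third larger than the original data.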

HOW IT WORKS

A MIME type comprises a type and a subtype, and the charset parameter of a text type identifies its character encoding. Internet protocols such as HTTP use the Content-Type header and the MIME type registry. MIME enables messages to have a tree structure, and it offers many features that are considered essential for modern email usage:

  • Support for character sets other than ASCII is required for sending emails in languages other than English.
  • A content-type labeling system allows multimedia content to be handled intelligently by computer programs.
  • Support for content in email messages that are not text allows email to contain multimedia content, including images, audio, office documents, and more.
  • Support for compound documents allows a single email message to contain multiple parts (multiple images, file attachments, and so on).
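The features listed above can be seen together in a short sketch built on Python's standard `email` package. The subject text, body, and attachment bytes are made-up sample values; the point is only to show the tree of typed parts that results.

```python
from email import message_from_string, policy
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Grüße"  # non-ASCII header text, encoded on output
msg.set_content("Plain-text body")
# Adding an attachment turns the message into a multipart container.
msg.add_attachment(b"\x89PNG\r\n", maintype="image", subtype="png",
                   filename="logo.png")

# Serialize and re-parse to show that the structure survives transport.
wire_text = msg.as_string()
parsed = message_from_string(wire_text, policy=policy.default)

# walk() visits the container first, then each part it holds.
part_types = [part.get_content_type() for part in parsed.walk()]
subject = parsed["Subject"]
```

Each entry in `part_types` is a type/subtype pair, which is what lets a mail program decide how to display or save each part.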

BASIC USAGE

The MIME format is very similar to the information exchange between a Web browser and its Web server. This related format is specified as part of the Hypertext Transfer Protocol (HTTP). Virtually all human-written Internet emails and a fairly large proportion of automated emails are transmitted via SMTP in MIME format.

Internet email has come a long way since RFC 822 was published in 1982. Today, all the mainstream email programs are fully compatible with the MIME standard for email, which allows for some advanced features and interoperability. The user-visible features that depend on MIME include styled text, text in non-Roman alphabets, file attachments, and multimedia content.

RVP

Remote Voice Protocol (RVP or RVP/IP) is a proprietary specification developed by MCK Communications for transporting digital telephony sessions over packet- or circuit-based data networks. The protocol is used primarily in MCK’s Extender product family, which extends PBX services over Wide Area Networks (WANs).

HOW IT WORKS

RVP provides connection establishment and configuration facilities between a client (or remote station set) device and a server (or phone switch) device. When a remote caller attempts to connect with the PBX, the MCK Extender initiates a TCP session to the Extender PBX gateway. The session is initiated from a high-numbered TCP port to TCP port 2698.

RVP/IP uses Transmission Control Protocol (TCP) to transport signaling and control data and User Datagram Protocol (UDP) to transport voice data. TCP and UDP work with IP to ensure that packets reach their intended destinations. The signaling occurs through the TCP session, and the voice is transferred via the UDP session. RVP over IP depends on the network configuration and Quality-of-Service (QoS) level.

The devices communicate as clients and servers, with the MCK Extender products functioning as clients. To begin with, a client that initiates an RVP over IP session opens the first available TCP port numbered 1024 or higher. The client then sends a request to TCP port 2698. Voice and network parameters make up the data packets. The voice parameters consist of a voice path, voice compression algorithm, DTMF encoding, comfort noise generator, echo cancellation, and silence detection. The network parameters comprise packet size and jitter buffer.

After successfully establishing the TCP session, the remote MCK Extender starts the UDP stream. The UDP stream starts from port 12288 (0x3000) up to 12544 (0x30FF). The UDP listening port is 2698. RVP over IP reduces network traffic congestion and packet loss by employing a “packetizer” that uses a data packet for holding several voice samples. The CODEC and packet size determine the interval at which voice is transmitted.
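The relationship between CODEC, packet size, and transmission interval is simple arithmetic, sketched below. The function names and the example figures (a 64 kbit/s codec and a 160-byte payload) are illustrative assumptions and are not taken from the RVP specification.

```python
def packet_interval_ms(codec_bitrate_bps: int, payload_bytes: int) -> float:
    """Milliseconds of audio carried by one packet of the given payload size."""
    # payload bits divided by bits per second gives seconds; scale to ms.
    return payload_bytes * 8 * 1000 / codec_bitrate_bps


def packets_per_second(codec_bitrate_bps: int, payload_bytes: int) -> float:
    """How many packets per second the sender must emit to keep up."""
    return 1000 / packet_interval_ms(codec_bitrate_bps, payload_bytes)


# Illustrative values only: a 64 kbit/s codec with a 160-byte payload.
interval = packet_interval_ms(64_000, 160)
rate = packets_per_second(64_000, 160)
```

Larger packets mean fewer packets per second and less header overhead, at the cost of more audio lost whenever a single packet is dropped.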

BASIC USAGE

RVP Control Protocol (RVPCP) was originally developed for point-to-point applications, so most of its functionality is unnecessary when using TCP/IP. During the RVP/IP session, one class of RVP/IP control message is exchanged. The RVPCP ADD VOICE packet (operation code 12) takes a single parameter of type, and the server responds with a single packet containing the code RVPCP ADD VOICE ACK (operation code 13). If RVP/IP operates in “dynamic voice” mode, this exchange must be repeated whenever the voice channel needs to be re-established, i.e., when the connection is broken.

SAP

Session Announcement Protocol (SAP) is an announcement protocol that session directory clients use to assist in advertising multicast multimedia conferences and other multicast sessions. It also communicates the relevant session setup information to prospective participants.

HOW IT WORKS

An SAP announcer periodically multicasts an announcement packet to a well-known multicast address and port. The announcement is multicast with the same scope as the session it is announcing, ensuring that the recipients of the announcement can also be potential recipients of the session the announcement describes (bandwidth and other such constraints permitting). This is also important for the protocol’s scalability, as it keeps local session announcements local.

An SAP listener learns of the multicast scopes it is within (for example, using the Multicast-Scope Zone Announcement Protocol) and listens to the well-known SAP address and port for those scopes. In this manner, it will eventually learn of all the announced sessions, allowing those sessions to be joined.
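The announcement packet itself can be sketched as follows, assuming the IPv4 header layout described in RFC 2974 (a flags byte holding the version, an authentication length, a message ID hash, and the originating source address) and an SDP payload. The function names and sample values are hypothetical, and the sketch ignores IPv6, authentication data, encryption, and compression.

```python
import socket
import struct


def build_sap_announcement(source_ip: str, msg_id_hash: int, sdp: str) -> bytes:
    """Build an unauthenticated IPv4 SAP announcement carrying an SDP payload."""
    flags = 1 << 5  # version 1 in the top three bits; all other flag bits zero
    header = struct.pack("!BBH4s", flags, 0, msg_id_hash,
                         socket.inet_aton(source_ip))
    # The payload type is a NUL-terminated MIME type preceding the payload.
    return header + b"application/sdp\x00" + sdp.encode("utf-8")


def parse_sap_announcement(packet: bytes) -> dict:
    """Decode the fields written by build_sap_announcement (IPv4 only)."""
    flags, auth_len, msg_id_hash, source = struct.unpack("!BBH4s", packet[:8])
    body = packet[8 + 4 * auth_len:]  # auth_len counts 32-bit words
    payload_type, _, payload = body.partition(b"\x00")
    return {
        "version": flags >> 5,
        "is_deletion": bool(flags & 0x04),
        "msg_id_hash": msg_id_hash,
        "source": socket.inet_ntoa(source),
        "payload_type": payload_type.decode("ascii"),
        "payload": payload.decode("utf-8"),
    }


# Hypothetical announcement from a documentation-range address.
packet = build_sap_announcement("192.0.2.10", 0x1A2B, "v=0\r\ns=Demo session\r\n")
announcement = parse_sap_announcement(packet)
```

An announcer would send such a packet periodically to the multicast group for its scope, and a listener would apply the parsing step to every datagram it receives there.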

BASIC USAGE

It is expected that sessions may be announced by several different mechanisms, not only SAP. For example, a session description may be placed on a web page, sent by email, or conveyed in a session initiation protocol. Application-level security is employed to ease interoperability with these other mechanisms rather than using IPsec authentication headers.


SDP

SDP is an IETF standard that allows a multimedia device to describe the kinds of media it offers or wishes to accept. As part of this description, the device will indicate the type of media (audio, video, text, etc.), the IP ports used, the protocols used (e.g., T.120), and other information necessary for a device to receive the specified media and understand how to handle that media.

The IETF has published SDP as RFC 4566, which has since been revised as RFC 8866. There are additional RFCs that document extensions or enhancements to SDP.

HOW IT WORKS

The owner of a conference advertises it over a network by sending multicast messages that contain a description of the session, e.g., the name of the owner, the name of the session, the coding, and the timing. SDP does not carry the media itself; it describes the session so that two endpoints can agree on a media type and format.

The recipients of the SDP message then decide whether to participate in the session. SDP is generally carried in the body of Session Initiation Protocol (SIP) messages.
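A session description is plain text made of `key=value` lines, so a recipient can inspect it with very little code. The following is a simplified sketch: the function name, the returned structure, and the sample description (owner, addresses, ports) are invented for illustration, and the parser ignores details such as port ranges and per-media connection lines.

```python
from typing import Dict, List, Tuple


def parse_sdp(text: str) -> Tuple[Dict[str, List[str]], List[dict]]:
    """Split an SDP description into session-level fields and media sections."""
    session: Dict[str, List[str]] = {}
    media: List[dict] = []
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition("=")
        if key == "m":
            # m=<media> <port> <transport> <format list>
            kind, port, proto, *formats = value.split()
            media.append({"type": kind, "port": int(port), "proto": proto,
                          "formats": formats, "attributes": []})
        elif key == "a" and media:
            media[-1]["attributes"].append(value)
        else:
            session.setdefault(key, []).append(value)
    return session, media


# Hypothetical description of a single audio stream offering two codecs.
sample_sdp = """
v=0
o=alice 2890844526 2890844526 IN IP4 host.example.com
s=Weekly call
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""
session, media = parse_sdp(sample_sdp)
```

Here `session["s"]` holds the session name and `media[0]` tells the recipient which port, transport, and payload formats the sender proposes for audio.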

SDP started as a component of SAP but found other uses in conjunction with RTP, SIP, and as a standalone format, as described above.

There are five terms related to SDP:

  1. Conference: Two or more communicating users who utilize communication devices to meet rather than meet in person.
  2. Session: The flowing data stream between an open multimedia sender and receiver.
  3. Session Announcement: A session announcement is a description conveyed to users who may or may not expect the announcement.
  4. Session Advertisement: same as session announcement.
  5. Session Description: The information included in the session announcement or advertisement.

BASIC USAGE

SDP is ideal for informing business partners, clients, and other large groups of interconnected individuals and groups about upcoming events. As with any technology, however, there is a learning and usability curve. The SDP offer/answer model is where most SIP interoperability issues occur, and following the RFC may or may not resolve your issues. Session advertising problems may also be related to endpoint types and the various control protocols involved.

Within a VoIP architecture, MGCP, H.248, or SIP may control several different media endpoints. For all combinations of endpoints on a given connection, there is a need to ensure that CODEC negotiation takes place and that the differing uses of SDP are reconciled.
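At its core, CODEC negotiation means finding the formats both sides support. The sketch below shows one simple policy, keeping the offerer's preference order; the function name and the codec lists are hypothetical, and real offer/answer exchanges also have to reconcile payload type numbers, clock rates, and other parameters.

```python
from typing import List


def negotiate_codecs(offered: List[str], supported: List[str]) -> List[str]:
    """Return the offered codecs the answerer also supports, in offer order."""
    supported_set = {name.upper() for name in supported}
    return [name for name in offered if name.upper() in supported_set]


offer = ["PCMU", "G729", "PCMA"]      # hypothetical codec list from one endpoint
answerer = ["pcma", "PCMU", "opus"]   # hypothetical capabilities of the other
common = negotiate_codecs(offer, answerer)
no_overlap = negotiate_codecs(["G729"], ["opus"])
```

An empty result, as in `no_overlap`, means the two endpoints cannot exchange media directly, which is the situation a controlling protocol has to detect and resolve.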

SGCP

Christian Huitema and Mauricio Arango published the Simple Gateway Control Protocol (SGCP) in 1998 as part of developing the “Call Agent Architecture” at Telcordia. In this architecture, a central server called the “Call Agent” or “Softswitch” controls media gateways and receives telephony signaling requests through a ‘signaling gateway.’ SGCP handles the communication between the call agent and the gateways.

HOW IT WORKS

SGCP was designed to be a simple ‘remote control’ protocol that works alongside the Session Initiation Protocol (SIP), enabling the Call Agent to relay calls between a VoIP network using H.323 or SIP and a traditional telephone network. SGCP commands are encoded with a syntax comparable to SIP or HTTP headers. They carry a payload describing the Voice over IP media stream. This payload is encoded using the same Session Description Protocol (SDP) as SIP.

SGCP was merged with the IPDC proposal sponsored by Level 3 Communications. This led to the definition of the Media Gateway Control Protocol, jointly submitted to the IETF by the authors of SGCP and IPDC in November 1998.

The SGCP assumes a connection model where the basic constructs are endpoints and connections. Connections may be either point-to-point or multipoint. A point-to-point connection is an association between two endpoints to transmit data between these endpoints. Once this association is established for both endpoints, data transfer between these endpoints can take place. A multipoint connection is established by connecting the endpoint to a multipoint session.

BASIC USAGE

The SGCP is designed as an internal protocol within a distributed system that appears to the outside as a single VoIP gateway. This system comprises a call agent that may or may not be distributed over several computer platforms and gateways. SGCP instructs remote control gateways to forward the voice signals received on a circuit toward another gateway.

SGCP commands (Create Connection and Modify Connection) carry an SDP payload, where the VoIP parameters such as supported encoding, RTP options, UDP port, and IP address are defined. In some network configurations, gateways expect to carry the voice packets over an ATM or a frame relay network. SGCP can easily be extended to provide signaling for these gateways.

Through the interface, the call agent can ask the gateway to collect digits dialed by the user. This facility is intended to be used with access gateways to collect the numbers that a user dials; it may also be used with trunking gateways and access gateways to collect the access codes, credit card numbers, and other numbers requested by call control services.

An alternative procedure would ask the gateway to notify the call agent of the dialed digits as soon as they are dialed. However, such a procedure generates a large number of interactions. It is preferable to accumulate the dialed numbers in a buffer and transmit them in a single message. However, the problem with this accumulation approach is that it is hard for the gateway to predict how many numbers it needs to accumulate before transmission.

The solution to this problem is loading the gateway with a digit map corresponding to the dial plan. The call agent provides digit maps to the gateway whenever the call agent instructs the gateway to listen for digits.
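The digit map idea can be sketched as follows. This is a deliberately reduced model in which a pattern is a string of literal digits and the letter `x` standing for any digit; real digit maps also support ranges, timers, and repetition. The function names, the state labels, and the dial plan are assumptions made for the example.

```python
from typing import List


def pattern_state(pattern: str, dialed: str) -> str:
    """Compare dialed digits with one pattern; 'x' stands for any single digit."""
    if len(dialed) > len(pattern):
        return "none"
    for symbol, digit in zip(pattern, dialed):
        if symbol == "x":
            if not digit.isdigit():
                return "none"
        elif symbol != digit:
            return "none"
    return "full" if len(dialed) == len(pattern) else "partial"


def digit_map_state(digit_map: List[str], dialed: str) -> str:
    """Return 'full' once any pattern matches, 'partial' while one still could."""
    states = [pattern_state(pattern, dialed) for pattern in digit_map]
    if "full" in states:
        return "full"
    return "partial" if "partial" in states else "none"


# Hypothetical dial plan: operator, 4-digit extensions starting with 2,
# and 9 followed by a 7-digit outside number.
dial_plan = ["0", "2xxx", "9xxxxxxx"]

# The gateway buffers digits and reports them in one message on a full match.
buffered = ""
for key in "2345":
    buffered += key
    if digit_map_state(dial_plan, buffered) == "full":
        break
```

With this approach the gateway keeps accumulating while the state is "partial", sends the whole number in a single message when it becomes "full", and can report an error as soon as the state is "none".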

SIP

SIP is an application-layer control protocol that allows users to create, modify, and terminate sessions with one or more participants. It can create two-party, multiparty, or multicast sessions that include Internet telephone calls, multimedia distribution, and multimedia conferences.

An early version of the Session Initiation Protocol (SIP) was published within the IETF in 1996, but the first recognized standard, RFC 2543, was published in 1999. SIP was revised and republished in 2002 as RFC 3261, the currently recognized standard for SIP. These delays in the standards process delayed market adoption of SIP, which is why H.323 came to be considered the VoIP connectivity standard.

Today, H.323 still commands the bulk of the VoIP deployments in the service provider market for voice transit, especially for transporting voice calls internationally. H.323 is also widely used in room-based video conferencing systems and is the preferred protocol for IP-based video systems. Most recently, SIP has become more popular for instant messaging systems.

HOW IT WORKS

Like HTTP or SMTP, SIP works in the application layer of the Open Systems Interconnection (OSI) communications model, the layer that provides communication services directly to applications. SIP can establish multimedia sessions or Internet telephony calls and modify or terminate them. The protocol can also invite participants to unicast or multicast sessions that do not necessarily involve the initiator. Because SIP supports name mapping and redirection services, it makes it possible for users to initiate and receive communications and services from any location and for networks to identify the users wherever they are.

SIP is a request-response protocol: clients send requests and servers return responses. SIP URIs identify participants, and requests can be sent over several transport protocols. SIP determines the end system to be used for a session, the communication media and its parameters, and the recipient’s willingness to take the call. Once these have been determined, SIP establishes the call parameters at the caller and recipient ends and handles call transfer and termination.
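The request-response pattern can be sketched with a simplified INVITE and a status-line parser. This is not a complete or routable SIP request: mandatory headers such as Via, Max-Forwards, and Contact are omitted for brevity, and the function names, addresses, tag, and Call-ID are made up for the example. The sketch also assumes the SDP text passed in uses plain "\n" line endings.

```python
from typing import Tuple


def build_invite(caller: str, callee: str, call_id: str, sdp: str) -> str:
    """Assemble a simplified SIP INVITE with an SDP body (headers abridged)."""
    body = sdp.replace("\n", "\r\n")  # SIP uses CRLF line endings
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        f"From: <sip:{caller}>;tag=1928301774",  # hypothetical tag value
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Content-Type: application/sdp",
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    # A blank line separates the headers from the message body.
    return "\r\n".join(headers) + "\r\n\r\n" + body


def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Split a response status line such as 'SIP/2.0 180 Ringing'."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


invite = build_invite("alice@example.com", "bob@example.com",
                      "a84b4c76e66710", "v=0\ns=Call\n")
ringing = parse_status_line("SIP/2.0 180 Ringing")
answered = parse_status_line("SIP/2.0 200 OK")
```

The text format is deliberately close to HTTP: a request line, headers, a blank line, and a body, with numeric status codes coming back from the far end.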

Although SIP is as old as H.323 as an initiation protocol, SIP wasn’t designed to address many problems within legacy communication systems. Additionally, since H.323 has been the industry standard, many more people are familiar with this protocol. Although SIP has been marketed as easy to use and debug, the reality is that there is the same amount of complexity involved in this standard as any other standard within VoIP.

SIP does appear to be easier to develop and troubleshoot, but these attributes don’t make the protocol easier to use. Instead, these abilities have resulted in some non-standard SIP variations and several non-standard extensions for these developments.

BASIC USAGE

SIP can run on TCP, UDP, or SCTP, and it supports five facets of establishing and terminating multimedia communications:

  • It determines the end system that will be used for communication;
  • It determines the willingness of the called party to engage in communications;
  • It determines the media and its parameters;
  • It handles session setup: ‘ringing’ and the establishment of session parameters on both ends;
  • It handles session management: transferring and terminating sessions, modifying session parameters, and invoking services.

SIP provides security services, including denial-of-service prevention, authentication (user-to-user and proxy-to-user), integrity protection, and encryption and privacy services.

SKINNY (SCCP)

The word “skinny” often refers to a scaled-down device that serves its purpose with fewer features or functions than the “fat” version of the same device. In VoIP, the Skinny Client Control Protocol (SCCP, also known as Skinny) is a ‘lite’ proprietary protocol that Cisco uses with its ‘fat’ telephone equipment systems. Skinny reduces the processing load on the client hardware.

HOW IT WORKS

In this system, Cisco allows SKINNY clients to communicate with H.323 VoIP systems, as the H.323 processing capabilities are used in an intervening Call Manager device.

The SKINNY client and the Call Manager use a simple messaging set called Skinny Client Control Protocol (SCCP) to communicate over TCP/IP. SKINNY systems use a proxy for the H.225 and H.245 signaling and use RTP/UDP/IP for audio. The skinny client (i.e., an Ethernet Phone) uses TCP/IP to transmit and receive calls and RTP/UDP/IP to/from a Skinny Client or H.323 terminal for audio. Skinny messages are carried above TCP and use port 2000. Skinny gateways are a series of digital gateways that include the DT-24+, the DT-30+, and the WS-X6608-x1 Catalyst voice module.

The end station of a LAN or IP-based PBX must be simple to use, familiar, and relatively cheap. A full H.323 implementation is quite expensive for such a device. Instead, an H.323 proxy can be used to communicate with the Skinny client using SCCP. In such a case, the telephone is a Skinny client over IP in the context of H.323, and the proxy handles the H.225 and H.245 signaling.

When calling a non-Skinny client, the clients establish a connection through the Call Manager using TCP, and then the two endpoints communicate using UDP. When Skinny phones connect, they use RTP over UDP. In addition to Cisco, some other vendors also support SCCP, and Cisco Call Manager 4.0 supports a secure version of SCCP, which uses Transport Layer Security (TLS) to encrypt communications and protect the confidentiality of voice conversations.

BASIC USAGE

If you already maintain a Cisco system, the changeover might prove seamless. However, this system limits the use of open-source systems and locks you into proprietary software that may be subject to budget-pinching upgrades and licenses. On the other hand, the Cisco Call Manager is an H.323 proxy that communicates with Skinny clients. This may result in much less overhead than with H.323 alone, especially for a business whose phones are connected to the company’s Local Area Network (LAN) or Wide Area Network (WAN).
