Definition & Overview:

VoIP stands for Voice over IP: voice communication carried over an IP network. Carrying voice over a packet-switched (PS) IP network has several advantages over traditional circuit-switched (CS) voice: it is fast, easy to deploy, uses resources efficiently, and supports enhanced features. Entities that communicate via VoIP must support the TCP/IP stack and are called IP nodes or hosts. VoIP can also establish a session or call with users on other networks, such as the circuit-switched PSTN or GSM/UMTS.

It is a technique for sending voice and media packets over an IP network. This way of establishing a session between two parties has been adopted by next-generation mobile networks such as 4G and 5G. A VoIP solution is built on an underlying protocol stack that includes SIP, H.323, SDP, and the TCP/IP suite. It is driven mainly by an application-layer signaling protocol, namely SIP (Session Initiation Protocol) or H.323.

Protocol & procedure: 

SIP is a versatile, text-based protocol and the one most widely used in newer technologies. It defines a request and response mechanism: requests (methods) such as INVITE, REGISTER, OPTIONS, and PUBLISH, and responses in the 1xx, 2xx, 3xx, 4xx, 5xx, and 6xx classes. SIP also defines a routing mechanism that uses a SIP proxy server to discover the called user's details.
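The response classes can be illustrated with a short Python sketch. The class names follow RFC 3261; the function names `response_class` and `is_final` are made up for this example and do not belong to any SIP library.

```python
# Names of the six SIP response classes as given in RFC 3261.
SIP_RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def response_class(status_code: int) -> str:
    """Return the class name for a SIP status code (hypothetical helper)."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a SIP status code: {status_code}")
    return SIP_RESPONSE_CLASSES[status_code // 100]


def is_final(status_code: int) -> bool:
    """1xx responses are provisional; 2xx to 6xx responses are final."""
    return response_class(status_code) != "Provisional"


if __name__ == "__main__":
    for code in (100, 180, 200, 404, 503, 603):
        kind = "final" if is_final(code) else "provisional"
        print(code, response_class(code), kind)
```

The distinction between provisional and final responses matters later in the call flow: 100 Trying and 180 Ringing only report progress, while 200 OK completes the INVITE.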

A session is established between two IP-based nodes by routing the request through a proxy server, which queries a database called the location server. Once the two nodes (calling party and called party) have learned each other's IP addresses, voice RTP packets are sent end to end between them. The SIP stacks maintain and control the established VoIP session until it is torn down. The session is released by sending a BYE request to the other party. SIP is standardized by the IETF in RFC 3261 and is being adopted by many new technologies. H.323, on the other hand, is a protocol suite that was used earlier for VoIP solutions; it is still used in some older technologies and enterprise solutions.
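Because SIP is text-based, a request such as INVITE or BYE is just a start line followed by headers. The sketch below builds a minimal request and parses a start line. It is an illustration only: the helper names, the example.com addresses, and the fixed tag and branch values are assumptions, and a real stack generates a unique branch and tag per transaction and handles many more headers.

```python
def build_request(method, request_uri, from_uri, to_uri, call_id, cseq, via_host):
    """Format a minimal SIP request with no body (hypothetical helper)."""
    lines = [
        f"{method} {request_uri} SIP/2.0",
        # RFC 3261 branch values start with the magic cookie "z9hG4bK";
        # the suffix here is a fixed placeholder, not a real unique value.
        f"Via: SIP/2.0/UDP {via_host};branch=z9hG4bK-demo",
        "Max-Forwards: 70",
        f"From: <{from_uri}>;tag=demo-from-tag",
        f"To: <{to_uri}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} {method}",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; an empty line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_start_line(message):
    """Tell a request from a response by looking at the first line."""
    first = message.split("\r\n", 1)[0]
    parts = first.split(" ", 2)
    if len(parts) != 3:
        raise ValueError(f"malformed start line: {first!r}")
    if parts[0] == "SIP/2.0":
        return {"kind": "response", "status": int(parts[1]), "reason": parts[2]}
    return {"kind": "request", "method": parts[0], "uri": parts[1], "version": parts[2]}


if __name__ == "__main__":
    bye = build_request(
        "BYE",
        "sip:bob@example.com",
        "sip:alice@example.com",
        "sip:bob@example.com",
        call_id="demo-call-1",
        cseq=2,
        via_host="192.0.2.1:5060",
    )
    print(bye)
    print(parse_start_line("SIP/2.0 180 Ringing\r\n\r\n"))
```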

Technical Details: 

Architecture: This section describes the VoIP architecture and solution in detail, based on the SIP protocol. The architecture follows a three-tier model.

• The user plane, or application layer, consists mainly of IP nodes that are either calling or called users/nodes. A SIP-based IP node is termed a UA (User Agent). A UA that initiates a session request is termed a UAC (User Agent Client), and a UA that responds to the request is termed a UAS (User Agent Server).

• The middle tier is the server plane, or routing plane, which routes SIP requests to their destination and fetches the relevant data. Servers such as the SIP proxy server, the Promedia server, and the Gatekeeper (for H.323) are grouped in this tier. Promedia servers are used to establish sessions between SIP-based and H.323-based UAs.

• The database tier consists of servers that maintain data about user agents, session billing, session authentication, and so on. Examples are the location server, registrar server, authentication server, billing server, and IVR server. The protocol used between the SIP proxy server and the database-tier servers is RADIUS, which is mainly a transaction-based protocol.
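To make the role of the registrar and location servers more concrete, here is a minimal in-memory sketch in Python. It assumes a single process holding a dictionary, which stands in for the separate database-tier servers and the protocol between them; the class name `LocationService`, its methods, and the addresses are hypothetical.

```python
import time


class LocationService:
    """Toy stand-in for a registrar plus location server (hypothetical)."""

    def __init__(self):
        # Maps an address of record to (contact address, expiry timestamp).
        self._bindings = {}

    def register(self, aor, contact, expires=3600, now=None):
        """Store or refresh a binding; expires <= 0 removes it."""
        now = time.time() if now is None else now
        if expires <= 0:
            self._bindings.pop(aor, None)
            return
        self._bindings[aor] = (contact, now + expires)

    def lookup(self, aor, now=None):
        """Return the current contact for an AOR, or None if unknown or expired."""
        now = time.time() if now is None else now
        entry = self._bindings.get(aor)
        if entry is None:
            return None
        contact, expiry = entry
        if expiry <= now:
            del self._bindings[aor]
            return None
        return contact


if __name__ == "__main__":
    location = LocationService()
    location.register("sip:bob@example.com", "sip:bob@192.0.2.10:5060")
    print(location.lookup("sip:bob@example.com"))
    print(location.lookup("sip:carol@example.com"))
```

The `now` parameter exists only so the expiry logic can be exercised without waiting; a proxy in this sketch would call `lookup` with the called user's address before forwarding a request.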

The procedure of VoIP Session: 

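SIP signaling uses either UDP or TCP as its transport layer, and RFC 3261 requires SIP elements to support both. UDP avoids connection setup and is the common choice for fast session establishment, while TCP is used for large messages or when reliable delivery is required.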
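One concrete rule for choosing between the two transports comes from RFC 3261: a request larger than 1300 bytes (when the path MTU is unknown), or within 200 bytes of a known path MTU, should go over a congestion-controlled transport such as TCP. The Python sketch below encodes that rule as I read it; the function name `pick_transport` and its parameters are assumptions made for this example.

```python
LARGE_MESSAGE_BYTES = 1300   # threshold used when the path MTU is unknown
MTU_MARGIN_BYTES = 200       # safety margin below a known path MTU


def pick_transport(message: bytes, preferred: str = "UDP", path_mtu=None) -> str:
    """Pick "UDP" or "TCP" for one SIP request (hypothetical helper)."""
    preferred = preferred.upper()
    if preferred != "UDP":
        # Nothing to decide: the caller already asked for a reliable transport.
        return preferred
    size = len(message)
    if path_mtu is not None:
        too_big = size > path_mtu - MTU_MARGIN_BYTES
    else:
        too_big = size > LARGE_MESSAGE_BYTES
    return "TCP" if too_big else "UDP"


if __name__ == "__main__":
    small = b"x" * 600
    large = b"x" * 1400
    print(pick_transport(small))                  # small request stays on UDP
    print(pick_transport(large))                  # large request moves to TCP
    print(pick_transport(small, path_mtu=700))    # close to a small MTU
```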

• A SIP-based IP node, whether calling party or called party, is configured with an IP address; it can be located anywhere physically as long as it is connected through the internet.

• The called party must be registered with the VoIP network, using the preconfigured IP addresses of the registrar server and the proxy server.

• The calling party does not strictly need to be registered with the VoIP network to initiate a call to the called user, but it must be configured with a proxy server.

• The called party registers itself by sending a REGISTER request to the proxy server. The proxy server, in turn, updates the registrar server with the called party's IP address and/or domain and/or MSISDN/SIP-URI. A SIP-URI is a Uniform Resource Identifier in SIP format, like ‘’. The association created in the registrar server is uploaded to the location server, keyed by the user's AOR (Address of Record). In most cases the registrar server and the location server can be colocated in one box.

• To establish communication with the called user, the calling party sends an INVITE request, carrying the called party's MSISDN or SIP-URI, to the proxy or outbound proxy server. An outbound proxy is simply a proxy server configured separately for handling calls/sessions; whether one is used is deployment-specific. Here the calling user acts as a UAC (User Agent Client) and the proxy server acts as a UAS (User Agent Server).

• The outbound proxy server receives the INVITE request and answers the calling user (UAC) with a 100 Trying response. This tells the calling user that the INVITE has been received and is being processed, so it should not send the INVITE again. The proxy server then looks up the called user's AOR in the location server and receives the called user's IP address.

• The proxy server routes the INVITE request to the called user's IP address, obtained from the location server.

• On receiving the INVITE request, the called party starts ringing to alert the called user to the incoming call and answers with a 180 Ringing response. In the 180 Ringing response the called party adds its own IP address, which travels back to the calling party.

• When the called party picks up the call, it responds with 200 OK.

• On receiving the 200 OK response, the calling party sends another SIP request, ACK, after which voice RTP packets start to flow.

• The session is torn down by either party by sending a BYE request.
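The call flow in the steps above can be sketched, from the calling party's point of view, as a small state machine in Python. The state names and event strings are labels chosen for this illustration, not the transaction states defined in RFC 3261, and the sketch leaves out retransmissions, error responses, and the response to BYE.

```python
# (current state, event) -> next state, as seen by the calling party.
TRANSITIONS = {
    ("idle", "send INVITE"): "calling",
    ("calling", "recv 100 Trying"): "proceeding",
    ("calling", "recv 180 Ringing"): "ringing",
    ("proceeding", "recv 180 Ringing"): "ringing",
    ("calling", "recv 200 OK"): "answered",
    ("proceeding", "recv 200 OK"): "answered",
    ("ringing", "recv 200 OK"): "answered",
    ("answered", "send ACK"): "established",   # RTP flows in this state
    ("established", "send BYE"): "terminated",
    ("established", "recv BYE"): "terminated",
}


def run_call(events):
    """Replay a list of events and return the sequence of states visited."""
    state = "idle"
    trace = [state]
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not allowed in state {state!r}")
        state = TRANSITIONS[key]
        trace.append(state)
    return trace


if __name__ == "__main__":
    flow = [
        "send INVITE",
        "recv 100 Trying",
        "recv 180 Ringing",
        "recv 200 OK",
        "send ACK",
        "send BYE",
    ]
    print(" -> ".join(run_call(flow)))
```

Writing the flow as a table of allowed transitions makes out-of-order messages easy to spot: an ACK sent before any 200 OK, for example, is rejected rather than silently accepted.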

RTP and Voice Packets: 

Voice packets are transmitted directly, end to end, between the calling party and the called party in an agreed codec format. SDP (Session Description Protocol) is used together with the SIP stack on each SIP entity to describe the media streams, for example the codecs, IP addresses, and ports; SDP bodies are carried inside SIP messages. Both RTP and RTCP packets are sent over UDP to achieve real-time end-to-end streaming. RTCP carries feedback and measurement reports about the received RTP packets in the reverse direction.
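As an illustration of what travels over UDP, the following Python sketch packs and unpacks the 12-byte fixed RTP header from RFC 3550. The helper names are hypothetical, the SSRC value is arbitrary, and the sketch ignores padding and header extensions. The usage example assumes G.711 mu-law (static payload type 0) with 20 ms frames, i.e. 160 samples per packet at 8000 Hz.

```python
import struct

RTP_VERSION = 2
# Fixed header: flags byte, marker/payload-type byte, 16-bit sequence number,
# 32-bit timestamp, 32-bit SSRC, all in network byte order (12 bytes).
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp(payload_type, seq, timestamp, ssrc, payload, marker=False):
    """Build an RTP packet with no padding, extension, or CSRC entries."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = RTP_HEADER.pack(
        byte0, byte1, seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF
    )
    return header + payload


def unpack_rtp(packet):
    """Decode the fixed header fields and return them with the payload."""
    if len(packet) < RTP_HEADER.size:
        raise ValueError("packet shorter than the RTP fixed header")
    byte0, byte1, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    csrc_count = byte0 & 0x0F
    header_len = RTP_HEADER.size + 4 * csrc_count  # skip any CSRC entries
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[header_len:],
    }


if __name__ == "__main__":
    frame = b"\xff" * 160  # one 20 ms frame of mu-law silence
    packet = pack_rtp(payload_type=0, seq=1, timestamp=160, ssrc=0x1234ABCD, payload=frame)
    info = unpack_rtp(packet)
    print(len(packet), info["version"], info["payload_type"], info["sequence"])
```

The sequence number lets the receiver detect loss and reordering, and the timestamp lets it play samples out at the right time; these are the fields that RTCP reports are computed from.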

Features & Functionality: 

SIP makes it possible to add call features such as call hold, call park, conference calls, audio and video calls, multi-user support, and many others. It is easy to deploy and flexible enough to adapt to the desired Voice over IP result.

Future Prospects and Deployment:

VoIP is key to fourth-generation mobile communication, which supports multimedia services such as audio, video, chat, conferencing, and announcement services. This technology is commonly known as IMS (IP Multimedia Subsystem), a 3GPP standard. High voice quality is achieved by reserving a dedicated bearer for each SIP session. If the access type is LTE, the VoIP session is termed VoLTE (Voice over LTE); if the access type is Wi-Fi, it is termed VoWiFi.
