Definition & Overview:

VoIP stands for voice communication over an IP network. Carrying voice over a packet-switched (PS) IP network has several advantages over traditional circuit-switched (CS) voice: it is faster, easier, and uses resources more efficiently, with enhanced features. Entities that wish to communicate with each other via VoIP must support the TCP/IP stack; such entities are called IP nodes or hosts. VoIP can also establish sessions/calls with users on circuit-switched networks such as the PSTN or GSM/UMTS.

VoIP is a technique for sending voice/media packets over an IP network. This way of establishing a session between two parties has been adopted by next-generation mobile networks such as 4G and 5G. A VoIP solution is built on an underlying protocol stack that includes SIP, H.323, SDP, and the TCP/IP suite. It is driven mainly by an application-layer signaling protocol, namely SIP (Session Initiation Protocol) or H.323.

Protocol & procedure: 

SIP is the predominant choice: a versatile, text-based protocol used by most newer technologies. SIP defines a request and response mechanism, with requests such as INVITE, REGISTER, OPTIONS, and PUBLISH, and responses grouped into the 1xx, 2xx, 3xx, 4xx, 5xx, and 6xx classes. SIP also defines a routing mechanism in which a SIP proxy server discovers the called user's details.
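The difference between requests and the six response classes can be seen from the first line of a SIP message. The following is a minimal sketch, not a full parser; the function name `classify_start_line` and the sample addresses are illustrative assumptions, while the class names follow RFC 3261.

```python
from typing import Tuple

# Response classes keyed by the first digit of the status code (RFC 3261).
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def classify_start_line(start_line: str) -> Tuple[str, str]:
    """Return ("request", method) or ("response", class name).

    Hypothetical helper for illustration only; it does not validate the line.
    """
    parts = start_line.split(" ", 2)
    if parts[0] == "SIP/2.0":
        # Status line: SIP/2.0 <status code> <reason phrase>
        code = int(parts[1])
        return "response", RESPONSE_CLASSES[code // 100]
    # Request line: <METHOD> <Request-URI> SIP/2.0
    return "request", parts[0]


if __name__ == "__main__":
    print(classify_start_line("INVITE sip:bob@example.com SIP/2.0"))
    print(classify_start_line("SIP/2.0 180 Ringing"))
```

A request line starts with the method name, whereas a status line starts with the protocol version, so a single check on the first token is enough to tell them apart.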

A session is established between two IP-based nodes that wish to communicate by routing the request to a proxy server, which queries a database called the location server. Once the two nodes (calling party and called party) learn each other's IP addresses, voice RTP packets are sent end to end between them. The SIP stacks maintain and control the established VoIP session until it is torn down. The session is released by sending a BYE request to the other party. SIP is defined by the IETF in RFC 3261 and is being adopted by many emerging technologies. H.323, on the other hand, is a protocol suite that was used earlier for VoIP solutions. It is still used in some legacy technology and enterprise solutions.
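The lookup step can be pictured as a simple key-value query. The sketch below is a toy in-memory stand-in for the location server, assuming a hypothetical `LocationService` class and `route_invite` helper; the SIP addresses and the 192.0.2.x contact are made-up documentation values, and a real deployment would use a proper database and SIP stack.

```python
from typing import Dict, Optional


class LocationService:
    """Toy in-memory stand-in for the location server (illustrative only)."""

    def __init__(self) -> None:
        self._bindings: Dict[str, str] = {}

    def register(self, public_address: str, contact: str) -> None:
        # Registration binds a user's public SIP address to its current contact.
        self._bindings[public_address] = contact

    def lookup(self, public_address: str) -> Optional[str]:
        # The proxy server queries this when it has to route a request.
        return self._bindings.get(public_address)


def route_invite(location: LocationService, called_address: str) -> str:
    """Decide what a proxy would do with an INVITE for called_address."""
    contact = location.lookup(called_address)
    if contact is None:
        # Unknown user: a proxy would typically answer with a 4xx response.
        return "404 Not Found"
    return f"forward INVITE to {contact}"


if __name__ == "__main__":
    location = LocationService()
    location.register("sip:bob@example.com", "192.0.2.10:5060")
    print(route_invite(location, "sip:bob@example.com"))
    print(route_invite(location, "sip:carol@example.com"))
```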

Technical Details: 

Architecture: This section describes VoIP architecture and solutions based on the SIP protocol. VoIP architecture is defined based on a three-tier model. 

• The user plane, or application layer, consists mainly of IP nodes, either calling users/nodes or called users/nodes. A SIP-based IP node is termed a UA (User Agent). A SIP-based IP node that initiates a session request is termed a UAC (User Agent Client), and one that responds to the request is termed a UAS (User Agent Server).

• The middle tier is the server plane, or routing plane, which helps route SIP requests to the destination and fetch relevant data. Servers such as the SIP proxy server, the Promedia server, and the Gatekeeper (for H.323) are grouped in this tier. Promedia servers establish sessions between SIP-based and H.323-based UAs.

• The database tier consists of servers that maintain data about user agents, session billing, session authentication, and so on. The location server, registrar server, authentication server, billing server, and IVR server are examples of servers in this tier. The protocol used between the SIP proxy server and the database-tier servers is RADIUS, which is mainly a transaction-based protocol.

The procedure of VoIP Session: 

SIP signaling is carried over either UDP or TCP at the transport layer, and RFC 3261 requires SIP elements to support both. UDP avoids connection setup, which helps fast session establishment, while TCP is needed for large messages.
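The transport in use is visible in the message itself, because every SIP request names it in its Via header. The sketch below builds a bare OPTIONS request as bytes; it is a simplified illustration, not a complete SIP stack. The helper `build_request`, the addresses, and the 192.0.2.1 host are assumptions made for the example, and a real INVITE would also need a Contact header and an SDP body.

```python
import uuid


def build_request(method, request_uri, from_uri, to_uri, local_host,
                  transport="UDP", cseq=1):
    """Build a minimal SIP request with no body (hypothetical helper)."""
    # RFC 3261 requires the Via branch parameter to start with "z9hG4bK".
    branch = "z9hG4bK" + uuid.uuid4().hex
    lines = [
        f"{method} {request_uri} SIP/2.0",
        # The transport (UDP or TCP) is named here, next to the protocol version.
        f"Via: SIP/2.0/{transport} {local_host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{from_uri}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <{to_uri}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_host}",
        f"CSeq: {cseq} {method}",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; an empty line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    message = build_request(
        "OPTIONS",
        "sip:bob@example.com",
        "sip:alice@example.com",
        "sip:bob@example.com",
        "192.0.2.1",
        transport="TCP",
    )
    print(message.decode("ascii"))
```

The same bytes could be written to a UDP datagram or a TCP stream; only the transport token in the Via header and the socket type would change.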

• Each SIP-based IP node, whether calling party or called party, is configured with an IP address, can be physically located anywhere, and is connected through the internet.

• The called party must be registered with the VoIP network through the preconfigured IP addresses of the registrar server and the proxy server.

• The calling party does not need to be registered with the VoIP network to initiate a call to the called user, but it must be configured with a proxy server.

• The called party registers by sending a REGISTER request to the proxy server. The proxy server, in turn, updates the registrar server with the called party's IP address, domain, and MSISDN/SIP-URI. A SIP-URI is a Uniform Resource Identifier in SIP format like ‘’. The registrar stores this association between the user's AOR (Address of Record) and its current IP address, and uploads it to the location server. In most cases, the registrar and location servers can be colocated in one box.

• When the calling party wants to communicate with the called user, it sends an INVITE request carrying the called party's MSISDN or SIP-URI to the proxy or outbound proxy server. Whether a separate outbound proxy server is configured for handling calls/sessions is deployment-specific. Here the calling user acts as a UAC (User Agent Client) and the proxy server as a UAS (User Agent Server).

• The outbound proxy server receives the INVITE request and replies to the calling user/UAC with a 100 Trying response. This tells the calling user that the INVITE has been received and is being processed, so it should not send the INVITE again. The proxy server then looks up the called user's AOR in the location server and obtains the called user's IP address.

• The proxy server routes the INVITE request to the called user's IP address obtained from the location server.

• On receiving the INVITE request, the called party starts ringing to alert the called user to the incoming call and responds with 180 Ringing. In the 180 Ringing response, the called party adds its IP address, which is relayed back to the calling party.

• When the called party picks up the call, it responds with 200 OK.

• On receiving the 200 OK response, the calling party sends an ACK request, after which voice RTP packets start to flow.

• Either Party tears down the session by sending a BYE request.
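The message sequence in the steps above can be sketched as a small state machine seen from the calling party's side. This is a deliberately strict simplification for illustration: the state names, the `TRANSITIONS` table, and `run_call` are assumptions of this sketch, not the transaction state machines defined in RFC 3261, and real calls may skip provisional responses or fail with other status codes.

```python
from enum import Enum


class CallState(Enum):
    IDLE = "idle"
    CALLING = "calling"          # INVITE sent
    PROCEEDING = "proceeding"    # 100 Trying received
    RINGING = "ringing"          # 180 Ringing received
    ANSWERED = "answered"        # 200 OK received, ACK not yet sent
    ESTABLISHED = "established"  # ACK sent, RTP packets flow
    TERMINATED = "terminated"    # BYE sent or received


# Allowed (state, event) pairs for the happy-path flow described above.
TRANSITIONS = {
    (CallState.IDLE, "INVITE"): CallState.CALLING,
    (CallState.CALLING, "100"): CallState.PROCEEDING,
    (CallState.PROCEEDING, "180"): CallState.RINGING,
    (CallState.RINGING, "200"): CallState.ANSWERED,
    (CallState.ANSWERED, "ACK"): CallState.ESTABLISHED,
    (CallState.ESTABLISHED, "BYE"): CallState.TERMINATED,
}


def run_call(events):
    """Replay a list of SIP events and return the final call state."""
    state = CallState.IDLE
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {event} in state {state.name}")
        state = TRANSITIONS[key]
    return state


if __name__ == "__main__":
    flow = ["INVITE", "100", "180", "200", "ACK", "BYE"]
    print(run_call(flow).name)
```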

RTP and Voice Packets: 

Voice packets are transmitted directly end to end between the calling party and the called party in an agreed codec format. SDP (Session Description Protocol) is deployed along with the SIP stack on each SIP entity and describes the voice streams, such as the codec and the address and port to use. RTP and RTCP packets are sent over UDP to achieve real-time end-to-end streaming. RTCP packets, sent in the reverse direction, provide feedback and measurement reports about the RTP stream.
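Each RTP packet starts with a 12-byte fixed header that carries the payload (codec) type, a sequence number, a timestamp, and a source identifier. The sketch below packs and unpacks that header following the layout in RFC 3550. The function names and the sample SSRC value are assumptions for illustration, and the sketch ignores CSRC lists, header extensions, and padding.

```python
import struct

RTP_VERSION = 2


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Build the 12-byte fixed RTP header (no CSRC, extension, or padding)."""
    byte0 = RTP_VERSION << 6  # version=2, padding=0, extension=0, CSRC count=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    # Network byte order: two bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC.
    return struct.pack("!BBHII", byte0, byte1, sequence & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data):
    """Parse the fixed RTP header from the first 12 bytes of a packet."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


if __name__ == "__main__":
    # Payload type 0 is G.711 PCMU; 160 samples correspond to 20 ms at 8 kHz.
    header = pack_rtp_header(payload_type=0, sequence=1, timestamp=160,
                             ssrc=0x12345678)
    print(len(header), unpack_rtp_header(header))
```

The receiver uses the sequence number to detect loss and reordering, and the timestamp to play samples out at the right time, which is also the kind of information RTCP reports summarize.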

Features & Functionality: 

SIP makes it possible to add call features such as call hold, call park, conference calls, audio and video calls, multi-user support, and many others. It is easy to deploy and flexible enough to adapt to the desired Voice over IP result.

Future Prospects and Deployment:

VoIP is key to fourth-generation mobile communication, which supports multimedia services such as audio, video, chat, conference, and announcement services. This technology is commonly known as IMS (IP Multimedia Subsystem), a 3GPP standard. A SIP session is created with high voice quality via a dedicated bearer reserved for each session. If the access type is LTE, the VoIP session is termed VoLTE (Voice over LTE), and if the access type is Wi-Fi, it is termed VoWiFi.
