SIP (Session Initiation Protocol) is an application-layer signaling protocol used to establish, modify, and terminate multimedia sessions involving one or more participants. A multimedia session is an exchange of data between users that can include voice, video, or text. SIP does not provide services by itself; it provides a foundation that is combined with other protocols to deliver complete services to users, for example RTP (Real-time Transport Protocol) for real-time data transfer, SDP (Session Description Protocol) for describing multimedia sessions, and MEGACO (Media Gateway Control Protocol) for controlling gateways to the PSTN (Public Switched Telephone Network). However, the basic functions and operation of SIP do not depend on any of these protocols, nor on the transport-layer protocol used.
SIP supports five facets of establishing and terminating multimedia communication:
- User location: determines the location of the user who will communicate.
- User availability: determines the willingness of the called party to engage in communication.
- User capability: determines the media and parameters related to the media that will be used for communication.
- Session setup: "ringing" and establishing the session between the calling party and the called party.
- Session management: includes transfer, modification, and termination of sessions.
1. SIP Protocol Structure
SIP works alongside several other protocols, including RSVP to reserve network resources, RTP and RTCP to transport media and report on the quality of service, and SDP (Session Description Protocol) to describe the media sessions in a communication. SIP most commonly runs over UDP, but it can also use TCP as the transport protocol.
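To make the role of SDP more concrete, the following is a minimal sketch of a session description that a SIP message body might carry. The function name `build_sdp`, the user name, the IP address, and the port are assumptions chosen for illustration; the offer contains a single G.711 (PCMU) audio stream carried over RTP.

```python
def build_sdp(user: str, host_ip: str, audio_port: int, session_id: int = 1) -> str:
    """Return a minimal SDP body offering one PCMU audio stream over RTP.

    All argument values are illustrative; a real user agent would fill
    them in from its own configuration.
    """
    lines = [
        "v=0",                                                    # SDP protocol version
        f"o={user} {session_id} {session_id} IN IP4 {host_ip}",   # origin of the session
        "s=-",                                                    # session name (not used here)
        f"c=IN IP4 {host_ip}",                                    # address where media is expected
        "t=0 0",                                                  # session time: unbounded
        f"m=audio {audio_port} RTP/AVP 0",                        # audio over RTP, payload type 0
        "a=rtpmap:0 PCMU/8000",                                   # payload type 0 is G.711 mu-law
    ]
    # SDP lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_sdp("martin", "192.0.2.10", 49170))
```

The `m=` line is where the media type, port, and transport are negotiated, which is how SIP and RTP are tied together without SIP depending on either protocol.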
2. SIP Communication
Communication in SIP takes place through text-based messages modeled on HTTP. Each user has an address expressed as a SIP URI (Uniform Resource Identifier).
Example SIP URI: sip:martin@bandung.com
In addition, an address can be written as a tel URL, which is then converted into a SIP URI with the 'user' parameter set to 'phone'. Example: tel:+62-22-2534119 is equivalent to sip: + [email protected] ; user=phone
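The tel-to-SIP conversion described above can be sketched as follows. This is a simplified illustration, not a full implementation of the conversion rules; the function name `tel_to_sip_uri` and the host `gateway.example.com` are assumptions, since the host part depends on the domain or gateway that handles the call.

```python
def tel_to_sip_uri(tel_url: str, host: str) -> str:
    """Convert a tel URL into a SIP URI with the user=phone parameter.

    `host` is the domain or gateway that will route the call; the value
    used in the example below is a placeholder.
    """
    if not tel_url.lower().startswith("tel:"):
        raise ValueError("not a tel URL: " + tel_url)
    number = tel_url[4:].strip()          # everything after the "tel:" scheme
    # The telephone number becomes the user part of the SIP URI.
    return f"sip:{number}@{host};user=phone"


if __name__ == "__main__":
    print(tel_to_sip_uri("tel:+62-22-2534119", "gateway.example.com"))
```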
SIP signaling follows a client-server model. There are therefore two types of messages: requests and responses.

Table 5.1. SIP Request Message

Table 5.2. SIP Response Message
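Requests and responses can be told apart by their first line: a request starts with a method name and a Request-URI, while a response starts with the protocol version followed by a status code. The sketch below illustrates this distinction; the `StartLine` type and the `parse_start_line` function are hypothetical names introduced for the example, and the parser ignores headers and body.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartLine:
    kind: str                      # "request" or "response"
    method: Optional[str] = None   # e.g. INVITE, ACK, BYE (requests only)
    uri: Optional[str] = None      # Request-URI (requests only)
    status: Optional[int] = None   # e.g. 180, 200, 404 (responses only)
    reason: Optional[str] = None   # e.g. "Ringing", "OK" (responses only)


def parse_start_line(line: str) -> StartLine:
    """Classify the first line of a SIP message as a request or a response."""
    line = line.strip()
    if line.startswith("SIP/2.0 "):
        # Status line: SIP/2.0 <status code> <reason phrase>
        _, code, reason = line.split(" ", 2)
        return StartLine("response", status=int(code), reason=reason)
    # Request line: <method> <Request-URI> SIP/2.0
    method, uri, version = line.split(" ")
    if version != "SIP/2.0":
        raise ValueError("unsupported SIP version: " + version)
    return StartLine("request", method=method, uri=uri)


if __name__ == "__main__":
    print(parse_start_line("INVITE sip:martin@bandung.com SIP/2.0"))
    print(parse_start_line("SIP/2.0 180 Ringing"))
```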
3. SIP Components
In IP telephony, a SIP system has two kinds of components:
3.1 User agent
A user agent is the end system used to communicate. It consists of two parts:
- User Agent Client (UAC): the client-side application that initiates SIP requests.
- User Agent Server (UAS): the server-side application that notifies the user when a request arrives and returns a response to it. The response either accepts or rejects the request.
3.2 Network server
Before a user on a SIP network can place or receive calls, the user must first register so that their location is known. Registration is done by sending a REGISTER request to the SIP server. Because a user's location can change, a location server is needed to find the user's current location. A SIP network has two types of network servers:
- Proxy server: a server that receives requests, processes them, and forwards them to the next-hop server after changing some headers in the request message. The next hop can be another SIP server or some other kind of server; the proxy does not need to know which. A proxy server acts as both client and server, because it issues requests as well as responses.
- Redirect server: a server that receives a request and answers it with a response containing the address of the next-hop server, rather than forwarding the request itself.
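The difference between the two server types can be sketched as follows. This is a simplified model, not a protocol implementation: messages are plain dictionaries, the `LOCATIONS` table stands in for a location server, and the contact address and proxy host name are assumptions chosen for the example.

```python
# Hypothetical location table: public address -> current contact address.
LOCATIONS = {"sip:martin@bandung.com": "sip:martin@192.0.2.10:5060"}


def redirect_handle(request_uri, locations=LOCATIONS):
    """Redirect server: reply with the next-hop address, do not forward."""
    contact = locations.get(request_uri)
    if contact is None:
        return {"status": 404, "reason": "Not Found"}
    # 302 tells the client to retry the request at the returned contact.
    return {"status": 302, "reason": "Moved Temporarily", "contact": contact}


def proxy_handle(request, proxy_host, locations=LOCATIONS):
    """Proxy server: rewrite some headers and forward on the client's behalf.

    Returns the request as it would be sent to the next hop, or None if
    the destination is unknown.
    """
    contact = locations.get(request["uri"])
    if contact is None:
        return None
    forwarded = dict(request)
    forwarded["uri"] = contact                                  # new Request-URI
    # The proxy records itself in Via so responses travel back through it
    # (simplified: a real Via header also carries a branch parameter).
    forwarded["via"] = [f"SIP/2.0/UDP {proxy_host}"] + list(request.get("via", []))
    forwarded["max_forwards"] = request.get("max_forwards", 70) - 1  # hop limit
    return forwarded


if __name__ == "__main__":
    invite = {"method": "INVITE", "uri": "sip:martin@bandung.com",
              "via": [], "max_forwards": 70}
    print(redirect_handle(invite["uri"]))
    print(proxy_handle(invite, "proxy.example.com"))
```

With a redirect server the client has to send a second request itself, whereas a proxy stays in the signaling path and relays the request.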
4. SIP Applications
- Voice over Internet Protocol (VoIP)
- Multimedia conference
- Text messaging
- Event-notification -> voicemail notification, callback notification
- Unified Messaging -> voicemail2email.
5. Advantages of SIP
5.1 General-purpose
SIP can be integrated with other IETF standard protocols to create a SIP-based application.
5.2 Distributed and scalable architecture
- Proxy server: receives requests from user agent clients, authenticates them, processes them, and sends them to the next hop on behalf of the client.
- Redirect server: receives requests from clients, looks up the destination address, and, once it is found, returns that address to the client.
- Registrar server: receives REGISTER requests from clients.
- Location server: stores the data obtained from the registrar server. Proxy and redirect servers query the location server to obtain the destination address to be reached.
Because these functions are distributed, work on one component does not interfere with the other components, which makes the architecture scalable.
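The registrar and location-server roles above can be sketched as two small classes. This is a minimal in-memory illustration under stated assumptions: the class and method names are hypothetical, bindings are kept in a dictionary, and the `now` argument exists only so that expiry can be demonstrated without waiting. Treating an expiry of 0 as "remove the registration" follows standard SIP behaviour.

```python
import time
from typing import Dict, Optional, Tuple


class LocationService:
    """Stores where each registered user can currently be reached."""

    def __init__(self) -> None:
        # public address -> (contact address, absolute expiry time)
        self._bindings: Dict[str, Tuple[str, float]] = {}

    def store(self, address: str, contact: str, expires_at: float) -> None:
        self._bindings[address] = (contact, expires_at)

    def remove(self, address: str) -> None:
        self._bindings.pop(address, None)

    def lookup(self, address: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        binding = self._bindings.get(address)
        if binding is None or binding[1] <= now:
            return None                      # unknown user or expired binding
        return binding[0]


class Registrar:
    """Accepts REGISTER requests and writes them to the location service."""

    def __init__(self, location: LocationService) -> None:
        self.location = location

    def handle_register(self, address: str, contact: str,
                        expires: int = 3600, now: Optional[float] = None) -> int:
        now = time.time() if now is None else now
        if expires == 0:
            self.location.remove(address)    # expiry of 0 removes the binding
        else:
            self.location.store(address, contact, now + expires)
        return 200                           # 200 OK


if __name__ == "__main__":
    location = LocationService()
    registrar = Registrar(location)
    registrar.handle_register("sip:martin@bandung.com", "sip:martin@192.0.2.10")
    print(location.lookup("sip:martin@bandung.com"))
```

Because the registrar only writes to the location service and the proxy or redirect server only reads from it, each role can be deployed and changed independently. A user who moves simply registers again from the new location, and later lookups return the new contact.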
5.3 Simple
SIP messages are text-based and modeled on HTTP, rather than binary. This makes SIP easy to implement.
5.4 Mobility
- A user can receive messages/calls addressed to him/her even when moving from one location to another. The proxy server will forward the call to the user's current location.
- The device used can be a PC, either at home or in the office, a wireless phone, an IP phone, or a regular telephone.
Services can be created with Call Processing Language (CPL) and Common Gateway Interface (CGI), including:
- Call waiting, call forwarding, call blocking (basic features)
- Call-forking (making calls to multiple endpoints)
- Instant messaging
- Find me / follow me
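Call forking and find me / follow me can be illustrated with a short sketch. It shows sequential forking, where the endpoints are tried one after another; a real proxy may also fork in parallel. The function `fork_call`, the contact addresses, and the simulated answers are assumptions made for the example, and `send_invite` stands in for actually sending an INVITE and waiting for the final response.

```python
from typing import Callable, Iterable, Optional


def fork_call(contacts: Iterable[str],
              send_invite: Callable[[str], int]) -> Optional[str]:
    """Try each contact in order and return the first one that accepts.

    `send_invite` is a stand-in that returns the final status code
    received from the given contact.
    """
    for contact in contacts:
        status = send_invite(contact)
        if 200 <= status < 300:          # 2xx means the call was accepted
            return contact
    return None                          # nobody answered


if __name__ == "__main__":
    # Simulated final responses per endpoint (illustrative values only).
    answers = {
        "sip:martin@office.example.com": 486,   # Busy Here
        "sip:martin@home.example.com": 480,     # Temporarily Unavailable
        "sip:martin@mobile.example.com": 200,   # OK
    }
    reached = fork_call(answers.keys(), lambda contact: answers[contact])
    print(reached)
```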
6. SIP Based System Architecture

Figure 5.36. SIP Architecture
7. Conclusion
Growth in the processing power available in computers has led to the development of a wide range of multimedia applications. These applications use the existing network infrastructure to deliver video and audio to the recipient. Networks that until recently were used solely for data transmission now also support two-way videoconferencing, audio broadcasting, whiteboard collaboration, interactive training, and IP telephony (VoIP).
Multimedia is the combined use of several different media, namely text, audio, graphics, animation, video, and interactivity, to convey information. A distributed multimedia system needs network protocols to regulate it. A protocol is an agreement on how communication is carried out between two nodes.
RTP provides the transport features needed to synchronize multimedia data streams. In an application that has both video and audio components, RTP can be used to mark the packets that belong to each individual video and audio stream. RSVP is a unicast and multicast signaling protocol designed to install and maintain reservation state at each router along the data path. Terminals use it to obtain a particular QoS from the network for use by applications such as VoIP.
QuickTime is a multimedia framework developed by Apple Inc. that can handle various digital video formats, media clips, sound, text, animation, music, and several types of interactive panoramic images.
8. Questions
- Name the protocols used in multimedia communication.
- How can jitter be overcome when communicating over VoIP?
- What are three advantages and three disadvantages of VoIP?
- Name some applications of multimedia.
- Explain how the multimedia streaming network shown below works.
