Friday, May 15, 2009

Python Notes 14: Advanced Network Operations

We have explored the usual issues in network programming, both on client side and server side. In this post we will discuss some advanced topics in network programming.

Half-Open Sockets
Normally, sockets are bidirectional: data can be sent across them in both directions. Sometimes you may want to make a socket unidirectional, so that data can only be sent in one direction. A socket that's unidirectional is said to be a half-open socket. A socket is made half-open by calling shutdown(), and that procedure is irreversible for that socket. Half-open sockets are useful when:

  • You want to ensure that all data written has been transmitted. When shutdown() is called to close the output channel of a socket, the end-of-stream marker is queued behind any buffered data, so the peer sees end-of-file only after it has received everything you wrote.
  • You want to have a way to catch potential programming errors that may cause the program to write to a socket that shouldn't be written to, or read from a socket that shouldn't be read from.
  • Your program uses fork() or multiple threads, and you want to prevent other processes or threads from doing certain operations, or you want to force a socket to be closed immediately.

The socket.shutdown() call is used to accomplish all of these tasks.

The call to shutdown() requires a single argument that indicates how you want to shut down the socket. Its possible values are as follows:

  • 0 (socket.SHUT_RD) to prevent future reads
  • 1 (socket.SHUT_WR) to prevent future writes
  • 2 (socket.SHUT_RDWR) to prevent future reads and writes

Once shut down in a given direction, the socket can never be reopened in that direction. Calls to shutdown() are cumulative; calling shutdown(0) followed by shutdown(1) will achieve the same effect as calling shutdown(2).
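A minimal sketch of a half-open socket, assuming a platform where socket.socketpair() is available (it stands in for a real client/server pair so the example needs no network). The names left and right are illustrative only. The code is written so that it should also run on Python 3.

```python
import socket

# socketpair() returns two sockets that are already connected to each other.
left, right = socket.socketpair()

left.sendall(b"last request")
left.shutdown(socket.SHUT_WR)      # left promises not to write any more

# right reads until end-of-file, which arrives after the buffered data.
chunks = []
while True:
    chunk = right.recv(1024)
    if not chunk:
        break
    chunks.append(chunk)
received = b"".join(chunks)

# The other direction is still open: right can answer and left can read.
right.sendall(b"reply")
reply = left.recv(1024)

# Writing in the direction that was shut down should now raise an error.
try:
    left.sendall(b"too late")
    write_failed = False
except socket.error:
    write_failed = True

left.close()
right.close()
```

The reader detects the end of the request without any terminator in the data, because shutting down the write side is what produces the empty recv() result.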

Timeouts

TCP connections can be held open indefinitely, even if there's no traffic flowing across them. Timeouts are useful for discovering error conditions or communication problems in some instances.

To enable timeout detection on a Python socket, you call settimeout() on the socket, passing it the number of seconds until a timeout is reached. Later, when you make a socket call and nothing has happened for that amount of time, a socket.timeout exception is raised.
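A small sketch of the timeout behaviour, again assuming socket.socketpair() is available so that no real peer is needed. The 0.2 second value is arbitrary and chosen only to keep the example fast.

```python
import socket

left, right = socket.socketpair()
left.settimeout(0.2)          # give up after 0.2 seconds of silence

try:
    left.recv(1024)           # right never sends anything
    timed_out = False
except socket.timeout:
    timed_out = True

left.close()
right.close()
```

After the exception the socket is still usable; whether to retry or close the connection is a decision for your protocol.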

Transmitting Strings
One common problem that arises when sending data across the network is that of transmitting variable-length strings. When you read information from a TCP stream you don't know when the sender has finished giving you a piece of data unless you build some sort of indication into your protocol. There are two common approaches to solving this problem:

  • End-of-string identifier
    • Terminate the string with ‘\n’ or NULL
    • Problem: Terminator might occur in the data if we transmit binary data.
    • Solutions:
      • Escape the identifier.
      • Encode the data in base64.
      • Use a different identifier if the usual one is found in the data, and send the new identifier before the data.
  • Leading fixed-length size indicator
    • Send a constant number of bytes containing the size of the string.
    • The “size” itself could be sent as characters or as binary data. Characters are simpler, but you have to pad them to get a constant length.
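The second approach can be sketched as follows. This is one possible framing, not a standard one: it assumes a 4-byte binary length in network byte order, and the helper names send_msg, recv_exactly and recv_msg are made up for this example.

```python
import socket
import struct

HEADER = struct.Struct("!I")   # 4-byte unsigned length, network byte order

def send_msg(sock, payload):
    """Send the size first, then the bytes themselves."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exactly(sock, count):
    """Keep calling recv() until exactly count bytes have arrived."""
    buf = b""
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise EOFError("connection closed in the middle of a message")
        buf += chunk
    return buf

def recv_msg(sock):
    """Read the fixed-length size, then that many bytes of payload."""
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return recv_exactly(sock, length)

# Demonstration over a connected pair of sockets.
left, right = socket.socketpair()
send_msg(left, b"line one\nline two\x00")   # terminators inside the data are harmless
send_msg(left, b"")                          # an empty string is a valid message
first = recv_msg(right)
second = recv_msg(right)
left.close()
right.close()
```

Because the receiver always knows how many bytes to expect, the payload can contain newlines or NUL bytes without any escaping.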

Using Broadcast Data

When you broadcast a UDP packet, it's sent to all machines connected to your LAN. The underlying transport, such as Ethernet, will have a special mode that lets you do this without having to repeat the packet for each computer.

On the receiver's side, when a broadcast packet is received, the kernel looks at the destination port number. If it has a process listening to that port, the packet is sent to that process. Otherwise, it's silently discarded. Therefore, simply sending out a broadcast packet will not harm or impact machines that don't have a server listening for it.

Broadcast packets are often used for the following types of activities:

  • Automatic service discovery: For instance, a computer might send out a broadcast packet looking for all print servers of a particular type.
  • Automatic service announcements: A server providing a service for a LAN might periodically broadcast the availability of that service. Clients would listen for those broadcasts.
  • Searching for LAN computers that implement a specific protocol. For instance, a chat program might send out a broadcast packet looking for other people on the LAN with the same chat program. It might then compile a list and present it to the user.

To be able to broadcast data, you need to set the socket option on client and server as follows:

s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)

On the sender, instead of sending to a particular IP address, send to '<broadcast>'. Like any sendto() call, it takes the data first and then a (host, port) tuple:

s.sendto(data, ('<broadcast>', 123))
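A hedged sketch of the sender side. The helper names make_broadcast_socket and announce are invented for this example, and port 51423 is simply the port used elsewhere in these notes. The actual send is left commented out because it only makes sense on a LAN where something is listening.

```python
import socket

def make_broadcast_socket():
    """Create a UDP socket that is allowed to send broadcast packets."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return s

def announce(sock, payload, port):
    """Send payload to every machine on the LAN listening on the given UDP port."""
    return sock.sendto(payload, ("<broadcast>", port))

sender = make_broadcast_socket()
# Read the option back to confirm that broadcasting is enabled.
broadcast_enabled = sender.getsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST) != 0
# announce(sender, b"anyone there?", 51423)   # uncomment on a real LAN
sender.close()
```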

In this post we dealt with a few advanced issues in network programming.

Python Notes 13: Network servers

For a client, the process of establishing a TCP connection is a two-step process that includes the creation of the socket object and a call to connect() to establish a connection to the server. For a server, the process requires the following four steps:

  1. Create the socket object.
  2. Set the socket options (optional).
  3. Bind to a port (and, optionally, a specific network card).
  4. Listen for connections.

Example of these steps:

import socket

host = '' # Bind to all interfaces
port = 51423
# Step 1 (Create the socket object)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Step 2 (Set the socket options)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
# Step 3 (Bind to a port and interface)
s.bind((host, port))
# Step 4 (Listen for connections)
s.listen(5)

Setting and Getting Socket Options
There are many different options that can be set for a socket. For general-purpose servers, the socket option of greatest interest is called SO_REUSEADDR. Normally, after a server process terminates, the operating system reserves its port for a few minutes, thereby preventing any other process (even another instance of your server itself) from opening it until the timeout expires. If you set the SO_REUSEADDR flag to true, the operating system releases the server port as soon as the server socket is closed or the server process terminates. This is done through:

s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

Binding the Socket

The next step is to claim a port number for the server. This process is called binding. To bind to a port, you call:

s.bind(('', 111))

bind() takes a single (host, port) tuple. The first element of the tuple specifies the IP address to bind to. It's generally left blank, which means "bind to all interfaces and addresses."

Listening for Connections
The last step before actually accepting client connections is to call listen(). This call tells the operating system to prepare to receive connections. It takes a single parameter, which indicates how many pending connections the operating system should allow to remain in queue before the server actually gets around to processing them.

Accepting Connections

Most servers are designed to run indefinitely and service multiple connections. This is usually done with a carefully designed infinite loop. Example:

import socket
host = '' # Bind to all interfaces
port = 51423
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind((host, port))
print "Waiting for connections..."
s.listen(1)
while 1:
    clientsock, clientaddr = s.accept()
    print "Got connection from", clientsock.getpeername()
    clientsock.close()
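To try the accept loop without a second terminal, the sketch below runs the server in a background thread and connects to it from the same process. It is a test harness rather than a real server: it binds to the loopback address, asks the OS for any free port (port 0), and stops after two connections. The greeting text and the serve() helper are assumptions made for the example.

```python
import socket
import threading

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
listener.listen(5)
port = listener.getsockname()[1]

def serve(count):
    """Accept count connections, greet each client, and close it."""
    for _ in range(count):
        clientsock, clientaddr = listener.accept()
        clientsock.sendall(b"hello\n")
        clientsock.close()

worker = threading.Thread(target=serve, args=(2,))
worker.daemon = True                   # do not keep the process alive on errors
worker.start()

greetings = []
for _ in range(2):
    c = socket.create_connection(("127.0.0.1", port), timeout=5)
    greetings.append(c.recv(1024))
    c.close()

worker.join(5)
listener.close()
```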

Using User Datagram Protocol

To use UDP on the server, you create a socket, set the options, and bind() just like with TCP. However, there's no need for listen() or accept(); just use recvfrom().
This function actually returns two pieces of information: the received data, and the address and port number of the program that sent the data. Because UDP is connectionless, this is all you need to be able to send back a reply. For example, an echo server:

import socket, traceback
host = '' # Bind to all interfaces
port = 51423
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind((host, port))
while 1:
    try:
        message, address = s.recvfrom(8192)
        print "Got data from", address
        s.sendto(message, address)   # Echo it back
    except (KeyboardInterrupt, SystemExit):
        raise
    except:
        traceback.print_exc()
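The echo exchange can be checked in a single process, since UDP needs no accept step. This sketch assumes the loopback interface is available and uses an OS-chosen port; the 5 second timeouts are only there so a lost datagram cannot hang the example.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
server.settimeout(5)
server_addr = server.getsockname()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(5)
client.sendto(b"ping", server_addr)

# Server side: one iteration of the echo loop.
message, address = server.recvfrom(8192)
server.sendto(message, address)

# Client side: read the echoed datagram and see who sent it.
reply, sender = client.recvfrom(8192)

client.close()
server.close()
```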

In this post we have clarified some points in network servers.

Python Notes 12 : Network clients

Having briefly explored the basics of network programming in the previous post, we will discuss network clients in more detail in this post.

Understanding Sockets
Sockets are an extension to the operating system's I/O system that enable communication between processes and machines. They can be treated the same as standard files, with the same interface methods, so in many cases a program need not know whether it's writing data to a file, the terminal, or a TCP connection. While many files are opened with the open() call, sockets are created with the socket() call, and additional calls are needed to connect and activate them.

Creating Sockets

For a client program, creating a socket is generally a two-step process.

  1. Create the actual socket object.
  2. Connect the socket to the remote server.

When you create a socket object, you need to tell the system two things:

  • The address family (sometimes called the communication type): the underlying protocol used to transmit data. Examples of protocols include IPv4 (current Internet standard), IPv6 (future Internet standard), IPX/SPX (NetWare), and AFP (Apple file sharing). By far the most common is IPv4.
  • The socket type: defines how data is transmitted.
    For Internet communications, the address family is almost always AF_INET (corresponding to IPv4). The socket type is typically either:
    • SOCK_STREAM for TCP communications or
    • SOCK_DGRAM for UDP communications

For a TCP connection, creating a socket generally uses code like this:

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

To connect the socket, you'll generally need to provide a tuple containing the remote hostname or IP address and the remote port. Connecting a socket typically looks like this:

s.connect(("www.example.com", 80))

Finding the port number

Most operating systems ship with a list of well-known server port numbers which you can query. On Windows systems, you can find this file at C:\Windows\System32\drivers\etc\services. To query this list, you need two parameters:

  • A service (port) name, such as ftp
  • A protocol name, such as tcp

The query looks like this:

>>> print socket.getservbyname('ftp', 'tcp')

21

You didn't have to know in advance that FTP uses port 21.
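A small helper along these lines lets a program accept either a number or a service name for the port. The name resolve_port is made up for this sketch, and the lookup branch assumes the machine has a services list to consult.

```python
import socket

def resolve_port(text, protocol="tcp"):
    """Return a port number for text, which may be digits or a service name."""
    try:
        return int(text)                             # "8080" -> 8080
    except ValueError:
        # Not a number, so ask the operating system's services list.
        return socket.getservbyname(text, protocol)

numeric = resolve_port("8080")
```

Calling resolve_port("ftp") would go through socket.getservbyname() and raise socket.error if the name is unknown.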

Getting Information from a Socket
Once you've established a socket connection, you can find out some useful information from it.

s.getsockname() #Get your IP address and port number

s.getpeername() #Get the remote machine IP address and port number
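The two calls mirror each other across a connection, which the following loopback sketch tries to show. It assumes a local listener on an OS-chosen port; the variable names are illustrative.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
listener.listen(1)
server_end = listener.getsockname()

client = socket.create_connection(server_end, timeout=5)
conn, addr = listener.accept()

local_end = client.getsockname()       # our own (ip, port)
remote_end = client.getpeername()      # the server's (ip, port)
seen_by_server = conn.getpeername()    # how the server sees the client

conn.close()
client.close()
listener.close()
```

Here remote_end matches the address the listener was bound to, and seen_by_server matches the client's own local_end.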

Socket Exceptions

Different network calls can raise different exceptions when network errors occur. Python's socket module actually defines four possible exceptions:

  • socket.error for general I/O and communication problems.
  • socket.gaierror for errors looking up address information
  • socket.herror for other addressing errors.
  • socket.timeout for handling timeouts that occur after settimeout() has been called on a socket.

Complete Example

The example program takes three command-line arguments: a host to which it will connect, a port number or name on the server, and a file to request from the server. The program will connect to the server, send a simple HTTP request for the given filename, and display the result. Along the way, it exercises care to handle various types of potential errors.

import socket, sys

host = sys.argv[1]
textport = sys.argv[2]
filename = sys.argv[3]

try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
except socket.error, e:
    print "Strange error creating socket: %s" % e
    sys.exit(1)

# Try parsing it as a numeric port number.
try:
    port = int(textport)
except ValueError:
    # That didn't work, so it's probably a protocol name.
    # Look it up instead.
    try:
        port = socket.getservbyname(textport, 'tcp')
    except socket.error, e:
        print "Couldn't find your port: %s" % e
        sys.exit(1)

try:
    s.connect((host, port))
except socket.gaierror, e:
    print "Address-related error connecting to server: %s" % e
    sys.exit(1)
except socket.error, e:
    print "Connection error: %s" % e
    sys.exit(1)

try:
    s.sendall("GET %s HTTP/1.0\r\n\r\n" % filename)
except socket.error, e:
    print "Error sending data: %s" % e
    sys.exit(1)

# Read until the server closes the connection.
while 1:
    try:
        buf = s.recv(2048)
    except socket.error, e:
        print "Error receiving data: %s" % e
        sys.exit(1)
    if not len(buf):
        break
    sys.stdout.write(buf)

Using User Datagram Protocol

UDP gives you less control than TCP over how data is sent and received: delivery and ordering are not guaranteed. Working with UDP clients differs from working with TCP clients in the following ways:

  • When you create the socket, ask for SOCK_DGRAM instead of SOCK_STREAM; this indicates to the operating system that the socket will be used for UDP instead of TCP communications.
  • When you call socket.getservbyname(), pass 'udp' instead of 'tcp'.

In this post we discussed network clients in a little more depth. In the next post we will discuss network servers.

Python Notes 11 : Introduction to Network Programming

Network Overview

Python provides a wide assortment of network support.

  • Low-level programming with sockets (if you want to create a protocol).
  • Support for existing network protocols (HTTP, FTP, SMTP, etc...).
  • Web programming (CGI scripting and HTTP servers).
  • Data encoding

Network Basics: TCP/IP

Python’s networking modules primarily support TCP/IP.

  • TCP - A reliable connection-oriented protocol (streams).
  • UDP - An unreliable packet-oriented protocol (datagrams).

TCP is the most common (HTTP, FTP, SMTP, etc...). Both protocols are supported using "sockets".

A socket is a file-like object. It allows data to be sent and received across the network like a file, but it also includes functions to accept and establish connections. Before two machines can establish a connection, both must create a socket.

Network Basics: Ports

In order to receive a connection, a socket must be bound to a port (by the server). A port is a number in the range 0-65535 that’s managed by the OS. It is used to identify a particular network service (or listener). Ports 0-1023 are reserved by the system and used for common protocols:

  • FTP Port 21 (control; port 20 carries the data connection)
  • Telnet Port 23
  • SMTP (Mail) Port 25
  • HTTP (WWW) Port 80

Ports 1024 and above are available to user processes.

Socket programming in a nutshell

  • Server creates a socket, binds it to some well-known port number, and starts listening.
  • Client creates a socket and tries to connect it to the server (through the above port).
  • Server-client exchange some data.
  • Close the connection (of course the server continues to listen for more clients).

Socket Programming Example

The socket module

Provides access to low-level network programming functions. The following example is a simple server that returns the current time:

import time
from socket import *

s = socket(AF_INET, SOCK_STREAM)          # Create TCP socket
s.bind(("", 8888))                        # Bind to port 8888
s.listen(5)                               # Start listening
while 1:
    client, addr = s.accept()             # Wait for a connection
    print "Got a connection from ", addr
    client.send(time.ctime(time.time()))  # Send time back
    client.close()

Notes: The socket first opened by the server is not the same one used to exchange data. Instead, the accept() function returns a new socket for this ('client' above). listen() specifies the maximum number of pending connections.

The following example is the client program for the above time server. It connects to the time server and gets the current time.

from socket import *

s = socket(AF_INET, SOCK_STREAM)   # Create TCP socket
s.connect(("localhost", 8888))     # Connect to the host running the time server
tm = s.recv(1024)                  # Receive up to 1024 bytes
s.close()                          # Close connection
print "The time is", tm

Notes: Once connection is established, server/client communicate using send() and recv(). Aside from connection process, it’s relatively straightforward. Of course, the devil is in the details. And are there ever a LOT of details.

The Socket Module

The socket module is used for all low-level networking, creation and manipulation of sockets, and general purpose network functions (hostnames, data conversion, etc...). It’s a direct translation of the BSD socket interface.

Utility Functions

  • socket.gethostbyname(hostname) # Get IP address for a host
  • socket.gethostname() # Name of local machine
  • socket.ntohl(x) # Convert 32-bit integer to host order
  • socket.ntohs(x) # Convert 16-bit integer to host order
  • socket.htonl(x) # Convert 32-bit integer to network order
  • socket.htons(x) # Convert 16-bit integer to network order

Comments: Network order for integers is big-endian. Host order may be little-endian or big-endian (depends on the machine).
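A short sketch of what the conversion functions do, using port 8888 from the time server above as the sample value. The struct module is brought in only to show the bytes as they would appear on the wire.

```python
import socket
import struct
import sys

port = 8888                              # 0x22B8

# "!" asks struct for network (big-endian) order; "H" is a 16-bit integer.
wire = struct.pack("!H", port)           # high byte 0x22 first, then 0xB8

# Converting to network order and back always returns the original value.
round_trip_16 = socket.ntohs(socket.htons(port))
round_trip_32 = socket.ntohl(socket.htonl(0x01020304))

# On a little-endian host htons() swaps the two bytes; on a big-endian
# host it leaves the value unchanged.
swapped = socket.htons(port) != port
little_endian_host = sys.byteorder == "little"
```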

The socket(family, type, proto) function creates a new socket object. Family is usually set to AF_INET. Type is one of:

  • SOCK_STREAM          Stream socket (TCP)
  • SOCK_DGRAM           Datagram socket (UDP)
  • SOCK_RAW               Raw socket

Proto is usually only used with raw sockets:

  • IPPROTO_ICMP
  • IPPROTO_IP
  • IPPROTO_RAW
  • IPPROTO_TCP
  • IPPROTO_UDP

Socket methods

  • s.accept()                  # Accept a new connection
  • s.bind(address)          # Bind to an address and port
  • s.close()                    # Close the socket
  • s.connect(address)      # Connect to remote socket
  • s.fileno()                   # Return integer file descriptor
  • s.getpeername()         # Get name of remote machine
  • s.getsockname()    #Get socket address as (ipaddr,port)
  • s.getsockopt(...)        # Get socket options
  • s.listen(backlog)        # Start listening for connections
  • s.makefile(mode)   # Turn socket into a file like object
  • s.recv(bufsize)           # Receive data
  • s.recvfrom(bufsize)    # Receive data (UDP)
  • s.send(string)           # Send data
  • s.sendto(string, address)    # Send packet (UDP)
  • s.setblocking(flag)   #Set blocking or nonblocking mode
  • s.setsockopt(...)      #Set socket options
  • s.shutdown(how)     #Shutdown one or both halves of connection

There are a huge variety of configuration/connection options. You’ll definitely want a good reference at your side

The SocketServer Module

Provides a high-level class-based interface to sockets. Each protocol is encapsulated in a class (TCPServer, UDPServer, etc.). It also provides a series of handler classes that specify additional server behavior.

To create a network service, you subclass a handler class and pass it to a protocol (server) class. For example, here is the same time server as before:

import SocketServer
import time

# This class actually implements the server functionality
class TimeHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        self.request.send(time.ctime(time.time()))

# Create the server
server = SocketServer.TCPServer(("", 8888), TimeHandler)
server.serve_forever()

Notes: The module provides a number of specialized server and handler types. Ex: ForkingTCPServer, ThreadingTCPServer, StreamRequestHandler, etc.
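The handler-based time server can be exercised in one process with the sketch below. It is a test harness under a few assumptions: the module is imported under either its Python 3 name (socketserver) or the Python 2 name used in these notes, the server listens on an OS-chosen loopback port, and handle_request() is used so that exactly one client is served.

```python
import socket
import threading
import time

try:
    import socketserver                  # Python 3 name
except ImportError:
    import SocketServer as socketserver  # Python 2 name used in these notes

class TimeHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # self.request is the connected client socket.
        self.request.sendall(time.ctime().encode("ascii"))

server = socketserver.TCPServer(("127.0.0.1", 0), TimeHandler)
port = server.server_address[1]

# handle_request() serves a single connection and then returns.
worker = threading.Thread(target=server.handle_request)
worker.daemon = True
worker.start()

client = socket.create_connection(("127.0.0.1", port), timeout=5)
stamp = client.recv(1024)
client.close()

worker.join(5)
server.server_close()
```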

Common Network Protocols

Modules are available for a variety of network protocols:

  • ftplib                FTP protocol
  • smtplib             SMTP (mail) protocol
  • nntplib              News
  • gopherlib          Gopher
  • poplib               POP3 mail server
  • imaplib             IMAP4 mail server
  • telnetlib            Telnet protocol
  • httplib              HTTP protocol

These modules are built using sockets, but operate at a very low level. Working with them requires a good understanding of the underlying protocol, but they can be quite powerful if you know exactly what you are doing.

The httplib Module

Implements the HTTP 1.0 protocol and can be used to talk to a web server.

HTTP in two bullets:

  • Client (e.g., a browser) sends a request to the server:

GET /index.html HTTP/1.0
Connection: Keep-Alive
Host: www.python.org
User-Agent: Mozilla/4.61 [en] (X11; U; SunOS 5.6 sun4u)
[blank line]

  • Server responds with something like this:

HTTP/1.0 200 OK
Content-type: text/html
Content-length: 72883
Headers: blah
[blank line]
Data
...

Making an HTTP connection

import httplib

h = httplib.HTTP("www.python.org")
h.putrequest('GET', '/index.html')
h.putheader('User-Agent', 'Lame Tutorial Code')
h.putheader('Accept', 'text/html')
h.endheaders()
errcode, errmsg, headers = h.getreply()
f = h.getfile()        # Get file object for reading data
data = f.read()
f.close()

You should understand some HTTP to work with httplib.

The urllib Module

A high-level interface to HTTP and FTP. It provides a file-like object that can be used to read from remote servers:

import urllib

f = urllib.urlopen("http://www.python.org/index.html")

data = f.read()

f.close()

Utility functions

  • urllib.quote(str)         # Quotes a string for use in a URL
  • urllib.quote_plus(str)    # Also replaces spaces with '+'
  • urllib.unquote(str)       # Opposite of quote()
  • urllib.unquote_plus(str)  # Opposite of quote_plus()
  • urllib.urlencode(dict)    # Turns a dictionary of key=value pairs into an HTTP query string

Examples

urllib.quote("ebeid@ieee")         #Produces "ebeid%40ieee"

urllib.unquote("%23%21/bin/sh")    #Produces "#!/bin/sh"
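The quoting helpers can be tried as follows. One assumption to flag: in Python 3 these functions live in urllib.parse, so the sketch imports them from whichever location exists. The query value "python sockets" is made up for the example.

```python
try:
    # Python 3 location of the same functions
    from urllib.parse import quote, quote_plus, unquote, unquote_plus, urlencode
except ImportError:
    # Python 2 location, as used in these notes
    from urllib import quote, quote_plus, unquote, unquote_plus, urlencode

quoted = quote("ebeid@ieee")                  # '@' becomes %40
unquoted = unquote("%23%21/bin/sh")           # %23 is '#', %21 is '!'
plus_form = quote_plus("a b")                 # space becomes '+'
back_again = unquote_plus(plus_form)
query = urlencode({"q": "python sockets"})    # key=value with quote_plus applied
```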

The urlparse Module

Contains functions for manipulating URLs

  • URLs have the following general format

scheme://netloc/path;parameters?query#fragment

  • urlparse(urlstring) - Parses a URL into components

import urlparse

t = urlparse.urlparse("http://www.python.org/index.html")
# Produces ('http', 'www.python.org', '/index.html', '', '', '')

  • urlunparse(tuple) - Turns a 6-tuple of components back into a URL string

url = urlparse.urlunparse(('http', 'www.python.org', 'foo.html', '', 'bar=spam', ''))
# Produces "http://www.python.org/foo.html?bar=spam"

  • urljoin(base, url) - Combines a base and relative URL

urlparse.urljoin("http://www.python.org/index.html", "help.html")
# Produces "http://www.python.org/help.html"
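The three functions can be checked together with the sketch below. As an assumption for portability, it imports them from urllib.parse on Python 3 and from urlparse on Python 2; the URLs are the ones used above.

```python
try:
    from urllib.parse import urlparse, urlunparse, urljoin   # Python 3 location
except ImportError:
    from urlparse import urlparse, urlunparse, urljoin       # Python 2 location

# Split a URL into (scheme, netloc, path, params, query, fragment).
parts = tuple(urlparse("http://www.python.org/index.html"))

# Build a URL from the same six components; the query goes in the fifth slot.
rebuilt = urlunparse(("http", "www.python.org", "/foo.html", "", "bar=spam", ""))

# Resolve a relative link against the page it appeared on.
joined = urljoin("http://www.python.org/index.html", "help.html")
```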

In this note we took a broad look at the network programming capabilities of Python. Every module and topic mentioned here needs multiple posts to cover it. In the upcoming posts, we will dig into network programming in detail.