11.5.22 Examples
This example gets the python.org main page and displays the first 100 bytes of it:
    >>> import urllib2
    >>> f = urllib2.urlopen('http://www.python.org/')
    >>> print f.read(100)
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <?xml-stylesheet href="./css/ht2html
This example sends a data stream to the stdin of a CGI script and reads the data it returns. Note that it will only work when the Python installation supports SSL, because the request is made over HTTPS.
    >>> import urllib2
    >>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
    ...                       data='This data is passed to stdin of the CGI')
    >>> f = urllib2.urlopen(req)
    >>> print f.read()
    Got Data: "This data is passed to stdin of the CGI"
The code for the sample CGI used in the above example is:
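    #!/usr/bin/env python
    import sys
    # Read the request body that the server passes on stdin.
    data = sys.stdin.read()
    # Emit the CGI header, a blank line, then the response body.
    print 'Content-type: text/plain\n\nGot Data: "%s"' % data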
Use of Basic HTTP Authentication:
    import urllib2
    # Create an OpenerDirector with support for Basic HTTP Authentication...
    auth_handler = urllib2.HTTPBasicAuthHandler()
    auth_handler.add_password('realm', 'host', 'username', 'password')
    opener = urllib2.build_opener(auth_handler)
    # ...and install it globally so it can be used with urlopen.
    urllib2.install_opener(opener)
    urllib2.urlopen('http://www.example.com/login.html')
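The credentials registered with add_password() are held by a password manager, and it can help to check which user name and password would be offered for a given URL before making a request. The following is a minimal sketch, not part of the original example: it assumes an explicitly created HTTPPasswordMgr passed to the handler, and the try/except import is an added assumption so the code also runs on interpreters where urllib2 is available as urllib.request. The names 'realm', 'host', 'username' and 'password' are the same placeholders used above.

```python
try:
    import urllib2  # Python 2, as in the examples above
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 fallback with the same names

# Build the password manager explicitly so it can be inspected afterwards.
password_mgr = urllib2.HTTPPasswordMgr()
password_mgr.add_password('realm', 'host', 'username', 'password')
auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)

# Look up the credentials stored for this realm and a URL on that host.
match = password_mgr.find_user_password('realm', 'http://host/login.html')

# A realm with no entry yields a pair of None values.
no_match = password_mgr.find_user_password('other realm', 'http://host/login.html')

print(match)
print(no_match)
```

No network access is needed for this lookup; the handler is only consulted when a server answers with an authentication challenge.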
build_opener() provides many handlers by default, including a ProxyHandler. By default, ProxyHandler uses the environment variables named <scheme>_proxy, where <scheme> is the URL scheme involved. For example, the http_proxy environment variable is read to obtain the HTTP proxy's URL.
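To see which proxies would be picked up from the environment, the mapping can be inspected directly. This is a hedged sketch rather than part of the original text: it uses getproxies(), which lives in urllib on Python 2 and in urllib.request on Python 3 (the fallback import is an added assumption), and the proxy URL is a hypothetical value set only for illustration.

```python
import os

try:
    from urllib import getproxies  # Python 2 location
except ImportError:
    from urllib.request import getproxies  # assumption: Python 3 fallback

# Hypothetical proxy URL, set here only so the lookup has something to find.
os.environ['http_proxy'] = 'http://proxy.example.com:3128/'

# getproxies() returns a mapping from URL scheme to proxy URL,
# the same shape that ProxyHandler accepts as its argument.
proxies = getproxies()
print(proxies.get('http'))
```

Note that some environments deliberately ignore http_proxy, for instance when the process itself runs as a CGI script, so the result may differ there.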
This example replaces the default ProxyHandler with one that uses programmatically supplied proxy URLs, and adds proxy authorization support with ProxyBasicAuthHandler.
    import urllib2
    proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
    # Use the proxy-specific handler so credentials go to the proxy, not the origin server.
    proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
    proxy_auth_handler.add_password('realm', 'host', 'username', 'password')
    opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
    # This time, rather than install the OpenerDirector, we use it directly:
    opener.open('http://www.example.com/login.html')
Adding HTTP headers:
Use the headers argument to the Request constructor, or:
    import urllib2
    req = urllib2.Request('http://www.example.com/')
    req.add_header('Referer', 'http://www.python.org/')
    r = urllib2.urlopen(req)
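The first alternative mentioned above, passing the headers argument to the Request constructor, might look like the following. This is a small sketch that builds the request without sending it; the try/except import is an added assumption so that it also runs where urllib2 is provided as urllib.request.

```python
try:
    import urllib2  # Python 2, as in the examples above
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 fallback with the same names

# Supply the header as a dictionary when the Request is created.
req = urllib2.Request('http://www.example.com/',
                      headers={'Referer': 'http://www.python.org/'})

# The header is stored on the request and can be read back before sending.
referer = req.get_header('Referer')
print(referer)
```

The request can then be passed to urlopen() exactly as in the add_header() version.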
OpenerDirector automatically adds a User-Agent: header to every Request. To change this:
    import urllib2
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    opener.open('http://www.example.com/')
Also, remember that a few standard headers (Content-Length:, Content-Type: and Host:) are added when the Request is passed to urlopen() (or OpenerDirector.open()).
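The default User-Agent value that an OpenerDirector adds can be inspected through its addheaders attribute before it is replaced. The sketch below is an illustration only and makes no request; the fallback import is an added assumption for interpreters where urllib2 is provided as urllib.request, and the exact version string in the default header depends on the Python release.

```python
try:
    import urllib2  # Python 2, as in the examples above
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 fallback with the same names

opener = urllib2.build_opener()

# Keep a copy of the default header list, a list of (name, value) pairs.
default_headers = list(opener.addheaders)
print(default_headers)

# Replace the defaults; subsequent requests made through this opener use the new value.
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
```

Because addheaders belongs to the opener, the change affects every request made through that opener, whereas Request.add_header() affects a single request.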