Downloading A Web Page Using Python

by HSG on Oct 25, 2012 in Python Tutorial

In this tutorial I am going to give you a gentle introduction to network programming in Python. If you are new to programming or new to Python, that may seem like a daunting thought. But read on and you will be pleasantly surprised at how easy it is.

Like most modern programming languages, Python ships with networking support in its standard library, and thanks to that, a lot of the networking tasks you would want to accomplish with the language are a whole lot easier.

Network communication is a large topic, so in this tutorial I will stick to one small task: downloading a web page. It is a good way to see how easy Python makes jobs like this.

Take a look at the following code:

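import urllib  # Python 2 standard library module

con = urllib.urlopen("http://hartmannsoftware.com")  # open a connection to the page
page = con.read()  # read the whole response body
con.close()  # close the connection
print page  # Python 2 print statement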

Yes, your eyes are just fine. That really is only five lines of code to perform such a powerful operation. So what does it all mean?

The first line of code imports the urllib module for us to use. This module contains various networking functions we can use to perform network-based operations such as connecting to a server and receiving data.

On the second line we call the urlopen function of the urllib module and give it the address of the page we want to download. In this case I’ve used hartmannsoftware.com, but you can easily replace that with any other address. We assign the result of the urlopen function to our variable named con, which is a connection object.

Next up, on the third line of code, we create a variable called page and assign it the result of our connection object’s read function. In this case the result will be all the HTML and text found on the web page.
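One caveat if you are on Python 3: the listing above is Python 2 code. In Python 3 the urlopen function lives in urllib.request, and read gives you back bytes rather than a string, so the data usually needs decoding before you treat it as text. Here is a minimal sketch of that. The helper name read_page, the UTF-8 fallback and the errors="replace" setting are my own assumptions, not part of the original example.

```python
import urllib.request


def read_page(url, fallback_encoding="utf-8"):
    """Download url and return its body as text (hypothetical helper)."""
    con = urllib.request.urlopen(url)
    try:
        raw = con.read()  # bytes on Python 3
        # Use the charset declared in the response headers, if there is one.
        charset = con.headers.get_content_charset() or fallback_encoding
    finally:
        con.close()
    # errors="replace" is an assumption: it swaps undecodable bytes for a
    # placeholder character instead of raising an exception.
    return raw.decode(charset, errors="replace")
```

Calling read_page("http://hartmannsoftware.com") should then hand you the page as an ordinary string that you can pass to print.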

The fourth line is simple enough to understand. All it does is close the connection so we can’t send and receive any more data.
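If you would rather not have to remember the close call, you can let Python do it for you. This is a small sketch assuming Python 3, where the object returned by urllib.request.urlopen can be used in a with statement. The function name fetch is hypothetical.

```python
import urllib.request


def fetch(url):
    """Return the raw bytes of url, closing the connection automatically."""
    # The with block closes con when it ends, even if read() raises an error.
    with urllib.request.urlopen(url) as con:
        return con.read()
```

On Python 2 you can get the same effect by wrapping the call as contextlib.closing(urllib.urlopen(url)) in the with statement.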

And lastly we use the print statement to output all the data we received, which is the page’s source: its text and HTML, along with any JavaScript and CSS embedded in it.

So there you have it. In just five lines of code you were able to connect to a server and then download a web page from it. Trying to do the same thing in other languages can be a rather long-winded experience, often requiring you to have a good knowledge of how sockets work.
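To get a feel for what urllib is saving you from, here is a rough sketch of the same download done by hand with Python's own socket module. It assumes Python 3, plain HTTP on port 80, and a server that closes the connection once it has sent its HTTP/1.0 response. The names build_request, split_response and fetch_with_socket are made up for this illustration, and the sketch ignores redirects, HTTPS and error handling.

```python
import socket


def build_request(host, path="/"):
    """Build a bare-bones HTTP/1.0 GET request as bytes."""
    lines = ["GET {} HTTP/1.0".format(path), "Host: {}".format(host), "", ""]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw):
    """Split a raw HTTP response into (header bytes, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head, body


def fetch_with_socket(host, port=80, path="/"):
    """Send one GET request over a TCP socket and return (headers, body)."""
    chunks = []
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        while True:
            data = sock.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    return split_response(b"".join(chunks))
```

Even in this stripped-down form you have to build the request text, loop over recv until the data runs out, and separate the headers from the body yourself, all of which urlopen and read handle for you.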

Now, I will be the first to admit that downloading a web page isn’t exactly the most exciting thing you can do in Python, but it gives you a taste of the kind of power which is built into the language and the sheer simplicity of it is amazing.

So I hope you enjoyed this article and if you are interested in learning more then pay us a visit again for more tutorials to help you learn.
