by Alec Sharratt on 12th September 2012
Hello. Most of you are probably aware of the basic on-page SEO elements that you can look at when optimising a website, so today’s video is going to cover a little bit more detail with regard to performing a technical website audit. A technical website audit is still a very important part of an SEO audit of a site, and this video goes into a little more detail than I’ve previously gone into when looking at the on-page elements for SEO.
This is really, really important and I think should be considered a fundamental part or a foundation to your website. I see the technical aspects of websites as the first couple of layers of Maslow’s hierarchy of needs or analogous to that. I say this because by having them in place you’re not going to get any particular advantage out of it, but by not having them in place you can incur a disadvantage. So in that respect they are analogous to Maslow’s hierarchy of needs or certainly the first couple of levels of it.
Now, I’m going to start off with the hosting server. This isn’t really necessarily an on page aspect, but it’s part of a website technical audit. It’s important to know what server a website is hosted on, or if you’re in the development stages, what server you want to host it on in the future.
Now I’m only going to cover Linux and Windows servers in this particular video. They are both good choices, my personal favourite being Linux, and it’s also the most popular web hosting platform on the Internet at the moment. The reason for this is usually that it’s cheaper and it’s easier to administer. Unless you’ve got a very specific need for something that a Windows server provides, for example you want to use the .NET Framework or .aspx pages, or you would like to integrate some kind of Windows tool or application into your site, such as Microsoft Access or Excel, then really you want to use a Linux server.
Linux is a free and open source platform, so it’s usually cheaper. It provides certain features which just make administering aspects of your website so much easier. With Windows you will need either a Windows administrator or a fair degree of skill and knowledge about how to administer Windows servers, as well as admin access to that server. So if you don’t have a dedicated server, or you don’t have admin access, or you don’t have the skills to administer a Windows server, you’re either going to have to pay someone who does or potentially just be unable to make the kind of fundamental changes you may want to make over time. So I would strongly advise, unless you have a specific requirement to use a Windows server, that you always go with Linux.
So the next thing on my list here is the robots.txt file, a very basic on-page element that most people are familiar with. Essentially this file determines what parts of your site can be crawled, and so indexed, and what parts can’t, as well as setting a few specific rules for types of files or specific directories in your site. These commands are given to the crawlers or spiders, whatever you want to call them, that come from different search engines to index your site.
So it’s very, very important to have this in place, not only because it’s the first port of call for robots that are going to index your site, but it determines how they can then access the site and what to index on it. This can be important, more important for one person than another depending on the type of site you’ve got. If you’ve got a website with a lot of images on there, you might not want all of those images indexed, so you can exclude those from being indexed.
You can also specify a different set of commands for one website crawler over another website crawler. To give you an example, the opening statement in a robots.txt file should be “User-agent:”. This specifies Googlebot or Bingbot or whichever crawler it is that you want to give commands to. Everything underneath this, up until the next time that a user agent is specified, applies to that crawler. If you want the rules to apply to everyone, so you just want a carte blanche robots.txt file which applies to anything trying to crawl your site, simply add a star as the user agent, and that means it will apply to anything.
Now if you’d like to disallow a particular directory, for example, you simply write “Disallow:” and then specify the directory. So to follow the example of images, that would disallow an image directory titled “image”. If you wanted to stop a certain page from being indexed, just add the URL of the page. To follow this example here, “page.html”, that will prevent that page from being indexed. Equally, you can take the “Dis” away from the front to leave “Allow:”, and that would specifically allow that page to be indexed.
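To make the rules just described a bit more concrete, here is a minimal sketch using Python’s standard `urllib.robotparser` module. The robots.txt content, the paths, and the crawler name “SomeBot” are made up for illustration and are not taken from the video. Note that Python’s parser applies the first rule that matches within a group, and real search engines may resolve conflicting rules differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; directory and page names are illustrative.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /page.html

User-agent: Googlebot
Allow: /images/logo.png
Disallow: /images/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Crawlers without their own group fall under the "*" rules.
print(parser.can_fetch("SomeBot", "/images/photo.jpg"))
print(parser.can_fetch("SomeBot", "/about.html"))

# Googlebot has its own group, so only the rules under it apply.
print(parser.can_fetch("Googlebot", "/images/logo.png"))
print(parser.can_fetch("Googlebot", "/images/photo.jpg"))
```

Because each “User-agent:” line starts a new group, the Googlebot rules here replace the star rules for that crawler rather than adding to them.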
So as you can see, this is a powerful tool for controlling how spiders will access a site and what they do when they get there. The other very, very important feature of the robots.txt file is that it can include a link to your sitemap. Because the robots file is the first port of call, a crawler will look here for a link to your sitemap. This should just be written “Sitemap:”, then a space, with the URL of the sitemap following afterwards. That will tell Google, or any search engine that’s crawling your site, where your sitemaps are located, whether you have one or several.
As I say, this is really, really important because when you’re getting your site indexed, the sitemap.xml file is vital. The format for this file is XML, a structured format that search engines specifically look for in this type of file so that they can read it and understand all of the pages on your website. You can also add markup information to it, so you can define which pages have priority over other pages relative to one another.
You can give a page a higher priority than another page and so indicate to the search engines what your top tier pages are. The sitemap can also carry information about when the pages were last modified or uploaded, and it is just an absolutely essential part of the site. If you have a webmaster tools account for Google or for Bing or for whatever, they’ll all ask you for the sitemap.xml file, so it’s very, very important to have it and also very important to have it linked to from the robots file.
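As a rough illustration of what such a file contains, the sketch below builds a small sitemap with Python’s standard `xml.etree.ElementTree` module. The URLs, dates, and priority values are invented for the example. The `urlset`, `url`, `loc`, `lastmod`, and `priority` element names and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical pages; replace with the real URLs of the site being audited.
PAGES = [
    {"loc": "http://www.example.com/", "lastmod": "2012-09-12", "priority": "1.0"},
    {"loc": "http://www.example.com/about.html", "lastmod": "2012-08-01", "priority": "0.5"},
]


def build_sitemap(pages) -> str:
    """Return sitemap XML with one <url> entry per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        ET.SubElement(url, "lastmod").text = page["lastmod"]
        ET.SubElement(url, "priority").text = page["priority"]
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap(PAGES)
print(sitemap_xml)

# The matching line for robots.txt, assuming the file is served from the site root.
print("Sitemap: http://www.example.com/sitemap.xml")
```

A higher priority value marks the home page as more important than the about page relative to the other pages in the same sitemap.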
Now moving on to HTML, this is again a very, very fundamental part of any website. If your website’s made in HTML or uses CSS, there is a validator tool that you can use. If you look in the transcript underneath this video, you’ll find a link to the W3C validator. You just put your website address in there, click Go, and it’ll tell you all of the HTML and CSS errors that are present on your site, with ways to fix them as well.
Getting light HTML and CSS is I think very, very important to a site. It improves load times, and it can also affect the way that your website’s rendered. If you have any errors in there, you may find that a menu doesn’t render correctly in a certain type of browser or, you know, images aren’t loading correctly, that kind of thing. So getting this right is fundamental because it’s ultimately going to impact how people can interact with your site and if they can see the content that’s contained within it.
Google Analytics code: this is the tracking code provided by Google Analytics. I won’t get into actually getting that code, but the important thing to remember is where that code is placed. I see it very often on sites where it’s put at the bottom of the page or at the very top of the page. The ideal place, as advised by Google when you get the code, is immediately before the closing head tag of the page. The closing head tag looks like this, “</head>”, and you put the code immediately before that. This means that the code can be loaded as soon as possible, and you’ll retain as much data as it is possible to retain by having the code installed correctly on the page.
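One simple way to check this placement during an audit is sketched below. It is a naive string search and not a real HTML parser, and the marker “UA-XXXXX-Y” is a placeholder for a tracking ID and not a real one. The function name and the sample pages are hypothetical.

```python
def tracking_code_in_head(html: str, marker: str) -> bool:
    """Return True if the marker appears before the closing head tag."""
    lowered = html.lower()
    head_close = lowered.find("</head>")
    position = lowered.find(marker.lower())
    return position != -1 and head_close != -1 and position < head_close


# Hypothetical pages: one with the snippet in the head, one with it in the body.
GOOD_PAGE = "<html><head><title>Home</title><script>/* UA-XXXXX-Y */</script></head><body></body></html>"
BAD_PAGE = "<html><head><title>Home</title></head><body><script>/* UA-XXXXX-Y */</script></body></html>"

print(tracking_code_in_head(GOOD_PAGE, "UA-XXXXX-Y"))
print(tracking_code_in_head(BAD_PAGE, "UA-XXXXX-Y"))
```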
The .htaccess file. Now this is used on the Linux servers that I mentioned earlier on (strictly speaking it’s a feature of the Apache web server that most of them run), and it is such a powerful tool. It can control many server side actions, and it can also control things like GZip compression, which we’ll come on to in a little bit more detail later on. Specifically, one of the first things that I want to do with the .htaccess file is resolve the canonical issue that you get with websites where they can be accessed both with the www. and without the www. Also, if you’ve got an index.html page, you can redirect that to the root of the domain so that your home page doesn’t show as index.html, just the domain name.
This is important. It’s not as important as it apparently once was, because Google has got better at identifying canonical issues, but it’s still possible, and I still do see it, to get an entire website indexed with and without the www. As a result, Google thinks there’s duplicate content there, and you’ll end up with a load of error messages in Webmaster Tools telling you you’ve got duplicate meta descriptions, duplicate titles, and duplicate content.
To avoid this you can use the .htaccess file. Again, the code for this will be in the transcript of the video, just to give you the example of how to use that code.
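The .htaccess rules themselves are not reproduced here. As a stand-in, this is a small Python model of what such redirects are meant to achieve: every variant of a URL ends up at one canonical address. The host names and the `canonical_url` function are hypothetical, and choosing the www. version as the canonical one is an assumption made for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical host names; the www. version is assumed to be the canonical one.
BARE_HOST = "example.com"
CANONICAL_HOST = "www.example.com"


def canonical_url(url: str) -> str:
    """Return the address a visitor should land on after the redirects."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST  # non-www requests go to the www. version
    path = parts.path or "/"
    if path == "/index.html":
        path = "/"  # the home page shows as the bare domain
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


for url in (
    "http://example.com/index.html",
    "http://www.example.com/index.html",
    "http://example.com/about.html",
):
    print(url, "->", canonical_url(url))
```

A list like this also makes a handy checklist after changing the real .htaccess file: request each variant and confirm the server sends you to the expected address.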
Making the file is very easy if you don’t have one already. You just open up a text document using a text editor, as with the robots file, and save it as .htaccess rather than .txt. The entire file name is just .htaccess. There’s nothing in front of the dot. This is very important. If you save it under any other name, it won’t work; it won’t be recognised.
Word of warning with using the .htaccess file, it is a very powerful tool. It’s very, very easy to use, and thus it is also very easy to break your website with. Any changes that are made using the .htaccess file should be thoroughly checked afterwards. It’s completely possible to bring your website down using the .htaccess file or take down parts of the site or make certain pages inaccessible. So when using it, check everything that you do, make sure that it works, and then just follow that process because, as I said, it is possible to take your website down.
But it is very easy to use. There are lots of resources available on the Internet about it. So I wouldn’t say it’s something that you shouldn’t do if you have no experience with it. As long as you’ve got the file there, you know where it is, and you can see if something’s gone wrong, you can take the file back down again. But if you aren’t familiar with it, do your testing at your quietest time.
Now website performance is the last and probably the largest topic in this list. There are so many aspects to website performance that we can’t really cover all of them in this one video, but I will give you an overview of the key areas to look at when improving website performance.
Before I do that, it’s important to know why we’re looking at website performance. You’ve probably been to a website that’s got slow load times, where you can’t access a page, you’ve got a little spinning load icon at the top of the screen, and you’re not sure whether it’s going to work or not. People will click off the page before it loads, and ultimately that’s going to lead to data loss, in terms of not gathering Google Analytics data, to people leaving your website, and to frustration for the people using it. Even if it’s just fast enough for people to use, but not fast enough for people to enjoy using it, that can create a bad user experience and prevent people from coming back in the future. So it’s very, very important to get website performance right.
Another indicator of how important this is, is that Google Webmaster Tools has an entire section based on website performance, with graphs in there that show the speed of your website. Although it’s not 100 percent accurate, it’s still a very, very good indicator. There are loads of free tools available on the Internet that show you what your website speed is or what the load times on specific pages are. The quicker you can get this the better. There are a few techniques that are very, very easy to implement, depending on the size of the site.
Improving CSS: this is, again, quite simple. You may need some experience with CSS to do this, but removing unnecessary code, any white space, any gaps, and checking it for errors with the validator we mentioned all help to improve the speed, and combining style sheets is also good. Containing all of the CSS within a style sheet, rather than mixing it up and having some on the page and some in the style sheet, is another great way to improve performance. CSS is a very powerful tool anyway for website design and replaces the need to have lots and lots of HTML code repeated on every single page of the website. So instead of defining the paragraph size and font within the source code of a page, you define it within the CSS file, which means that that code exists once on your website rather than on every single page, and hence overall you’re reducing the amount of code that’s going to be read when loading pages. That will improve load times.
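To show what removing white space and gaps looks like in practice, here is a deliberately naive sketch in Python. The `minify_css` function and the sample style sheet are hypothetical, and the regular expressions ignore edge cases such as strings that contain braces, so treat it as an illustration and not as a replacement for a proper minifier.

```python
import re


def minify_css(css: str) -> str:
    """Strip comments and unnecessary white space from a style sheet."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # remove comments
    css = re.sub(r"\s+", " ", css)  # collapse runs of white space
    css = re.sub(r"\s*([{};:,])\s*", r"\1", css)  # trim around punctuation
    return css.strip()


# Hypothetical style sheet used only to demonstrate the function.
SAMPLE = """
/* paragraph text */
p {
    color: red;
}

a, b {
    margin: 0 auto;
}
"""

minified = minify_css(SAMPLE)
print(len(SAMPLE), "characters before,", len(minified), "after")
print(minified)
```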
GZip compression, I mentioned this earlier with regards to the .htaccess file. It is possible to do it with a Windows server, but it’s a much more long-winded process, and you’re going to need to have a Windows administrator or someone skilled and knowledgeable to do this for you. It’s not something you’re going to be able to do very easily. You’ll need access to the IIS system of your server as well. So depending on the setup you’ve got, you may or may not have access to that. From my experience, with Windows servers, it’s very hit or miss. Half the time people will and half the time people won’t be able to implement these kinds of things.
GZip compression literally compresses the data that’s being passed to the web browser from the hosting server. It doesn’t include images, but it does include code. So it’s a very, very good way to speed up load times for web browsers. But because it doesn’t do images, we do need to look at images separately.
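To get a feel for how much this can save on code, the sketch below compresses a block of repetitive HTML with Python’s standard `gzip` module. The markup is invented for the example, and the saving on a real page will depend on its content.

```python
import gzip

# Hypothetical, highly repetitive markup standing in for a real page.
html = ('<div class="item"><p>Hello</p></div>\n' * 200).encode("utf-8")

compressed = gzip.compress(html)

print("original bytes:  ", len(html))
print("compressed bytes:", len(compressed))

# The browser reverses the process, so nothing is lost along the way.
restored = gzip.decompress(compressed)
print("round trip intact:", restored == html)
```

On a live site the server does the compressing and the browser does the unpacking automatically; this only shows the effect on the size of what gets sent.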
Images are the biggest obstacle for website performance and are quite often overlooked. There are a number of ways to deal with them. If you’ve got a lot of images on your site, you’re going to want to look at this because it will impact website performance across the board.
With images, you can reduce their physical dimensions, so make the canvas size smaller. But one of the best ways to do it is either to compress them or to use a program other than, say, Photoshop, something like Paint or Snagit, that strips away the additional information that’s created when you create a JPEG from a Photoshop file or from a more complicated application that adds mark-up information to the files. This can have a huge impact, reducing file size by up to 50, 60, 70 percent in some cases. So just open the file in Paint or Snagit or a similar basic editor, save the file, and you’ll see the file size drop because all that additional information is got rid of.
This should really be the first thing you do when reducing image sizes because you retain all of the quality and the actual canvas size of the image whilst you do it. There are programs that essentially zip or compress files, image files for you, and that way you’ll see the file is smaller once it goes through to the web browser. And it’s then essentially unpacked at the other end. This is a great way to improve website speed.
Redirects as well: if you use a lot of redirects, this is very, very important. If you’ve got thousands of them, they’re going to slow down your site. So reduce the number of redirects that you use. Don’t simply remove them, though. Address the problems. If you’ve got old pages that aren’t being used anymore, make sure when the redirect is taken away that you don’t leave links to those pages on your site, and that you haven’t still got links to those pages from outside of your website. Check Webmaster Tools as well to make sure that these pages aren’t now appearing with broken links or 404 errors.
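Before taking a redirect away, it helps to know which pages still link to the old address. The sketch below assumes you already have a list of the links found on each page, for example from a crawl of your own site. The page names and the `links_to_retired_pages` function are hypothetical.

```python
def links_to_retired_pages(site_links, retired):
    """Map each page to the retired URLs it still links to."""
    retired = set(retired)
    report = {}
    for page, links in site_links.items():
        hits = sorted(set(links) & retired)
        if hits:
            report[page] = hits
    return report


# Hypothetical crawl result: page -> links found on that page.
SITE_LINKS = {
    "/index.html": ["/about.html", "/old-offer.html"],
    "/about.html": ["/contact.html"],
    "/blog.html": ["/old-offer.html", "/old-news.html"],
}

# Old pages whose redirects you would like to remove.
RETIRED = ["/old-offer.html", "/old-news.html"]

report = links_to_retired_pages(SITE_LINKS, RETIRED)
for page, hits in report.items():
    print(page, "still links to", hits)
```

This only covers links within your own site. Links from outside the site still need to be checked separately, for example in Webmaster Tools.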
So this has been a basic overview of the technical aspects of a website and what to look for when auditing it. As I say, these are really fundamental, foundational aspects of a website. Getting them right in the beginning will create far fewer problems for you later on.
Thanks for watching.