Forget people not trusting the algorithm – the algorithm doesn’t trust people – especially people who wear black hats. Gone are the days when buying a bunch of links and stuffing keywords into EVERYTHING made Google go “Duh, okay!” and rank your site at the top. Now, Google says “Erm… No. I see what you’re trying to do. And I don’t like it!” Google got a little bit smarter.
But it still doesn’t have two brain cells to rub together. Not yet. It needs your brain cells first.
Pandas and Penguins still haunt the dreams of many and now we have Hummingbird to consider too. At the very least, these refreshes to Google’s brain were announced publicly. We face a future where Google makes a series of minor, unannounced updates (except for the “ground-breaking” ones) as and when it deems necessary. Tweaks here and there they may be, but how long until all of that adds up?
Google has been cross-testing and split-testing a lot of stuff lately: rolling out Chrome updates, Maps updates, the image carousel, new Google homepage layouts and generally having a play around with things. It’s likely that these minor algorithm changes will be in the same vein – playing for the sake of discovery. For all of Google’s data, the vast swathes of information it has on us, it still cannot think like us. It can try to predict us, it can assume it knows what we want, it can study our history and tendencies, but it can never out-fox us, at least not in its current guise.
Humans are unpredictable. We don’t always want to buy things when we make a search. Sometimes, we don’t even know what we’re searching for until we find it. We are too complicated and individual to break down into numbers and trends. Algorithms can’t figure us out. So we’d like to believe.
Algorithms can go a long way and figure out a lot of stuff. They can crack encryption, look for answers, optimise images and sounds, sort your junk mail and decide what Netflix will show next. That last one is particularly important. Netflix is not focused on content retrieval, but content consumption. Netflix doesn’t need to do anything more than provide you with all the content you could think of and watch you consume it.
The content you consume defines your habits and your likes. What are you most likely to watch next based on this? What kind of new show would you watch? That’s usable data. The result for Netflix was House of Cards, a milestone in TV and digital distribution. They saw that people like to binge on TV series. They found out which shows and movies people tend to like watching over other shows and movies and they went there.
Because Netflix is a closed ecosystem, all of this information is theirs. They create content catered to the mass audience, calculating hit TV shows with fearsome accuracy. Critics argue that this is the beginning of the end for spontaneous creative masterpieces coming out of leftfield in favour of safe bets and easy wins. But data talks the same language as money.
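At its simplest, the “what will you watch next” question is a co-occurrence count over viewing histories: people who watched X also watched Y. A toy sketch of that idea in Python – the titles and histories here are invented, and Netflix’s real system is vastly more sophisticated:

```python
from collections import Counter
from itertools import combinations

# Hypothetical viewing histories: one set of titles per user.
histories = [
    {"House of Cards", "The West Wing", "Damages"},
    {"House of Cards", "The West Wing", "Breaking Bad"},
    {"The West Wing", "Damages"},
]

# Count how often each pair of titles is watched by the same user.
co_watch = Counter()
for watched in histories:
    for a, b in combinations(sorted(watched), 2):
        co_watch[(a, b)] += 1

def recommend(title, n=3):
    """Titles most often co-watched with `title`, best first."""
    scores = Counter()
    for (a, b), count in co_watch.items():
        if a == title:
            scores[b] += count
        elif b == title:
            scores[a] += count
    return [t for t, _ in scores.most_common(n)]

print(recommend("House of Cards"))
```

Scale those counters up to tens of millions of subscribers and you get the kind of signal that told Netflix a political drama binge-watch was a safe bet.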
Content tilts its crown at a jaunty angle, reclines in the throne and smirks. All hail the king.
Matt Cutts’ job exists because Google has a problem. Spam is as old as advertising itself and the two can be very hard to separate at times. The problem Google has is that unlike Netflix, it is open to all content, not just the quality, trusted content of TV and film studios. And an awful lot of that content is in existence purely to game the algorithm.
This is where the algorithm fails. It doesn’t know what’s important and it has to fight off humans constantly, while figuring out which ones it can trust and serving the rest with answers. People dictate the choices that Netflix makes. Google dictates the choices that people make. Both give an illusion of power to the user.
Netflix gives you specifically what you want, but in a closed environment that you’re sold into and can’t escape from (assuming you never give up watching films and TV). Google gives you the internet (or rather, its interpretation of it), but most users will never scroll down past search engine result number six. Personally, I get very frustrated when I get results that don’t fulfil my query. Most of the time, I’m thinking about the SEO behind those results.
Why are they shooting for this query? Maybe they’re not – maybe the algorithm assumes this is what I want. Then you click the results and go through them to find all those old-school SEO clichés we’ve seen other sites get punished for. I see sites ranking on page one that Panda and Penguin should have destroyed. Footer links and over-optimisation all over the place – neutron-star levels of keyword density and content duplication, content duplication and content duplication.
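Keyword density – the metric those over-optimised pages are chasing – is crude enough to compute in a few lines, which is exactly why it was so easy to game. A minimal sketch (the function name and sample text are mine, purely illustrative):

```python
import re

def keyword_density(text, keyword):
    """Fraction of words in `text` that are `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w == keyword.lower())
    return hits / len(words)

page = "cheap widgets cheap widgets buy cheap widgets today"
print(f"{keyword_density(page, 'cheap'):.0%}")  # stuffing territory
```

Old-school SEO lore treated low single digits as “optimal”; the double-digit densities you still see on page one are the stuffing described above.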
The algorithm is full of holes that can only be addressed by human beings. Human beings can fix it, but it needs to be done at certain levels, in specific and far reaching circles. It has to be calculated, but at the same time, fast, genuine and easy to measure. Google needs humans to do more than click on adverts – it now needs them to save it, by telling the robots what to do and who to trust.
The first Penguin sucker punch was painful for a lot of sites. Some are still reeling from the blow. Disavow was on hand to help – but who exactly is it helping? So many link removal requests go unanswered. They devour resources. The existence of old school SEO links is harmful to Google and Google knows it. So the Disavow Tool was born, the hate-child of Penguins and desperate webmasters. The tool works, as Cyrus Shepard proved with an experiment disavowing every single link his site had earned.
But I suspect that this tool goes deeper than simply cutting all ties with the link source. All those TXT files uploaded will not go unchecked – there’s just too much data in there to ignore. All those URLs and domains must be stored centrally for an audit on some level, be it manually assisted or automated. Links that have been disavowed for fear of Penguin, especially if submitted by multiple users, are more than likely to have been on the shady side, from link farms or from paid directory submissions, or from other purpose-built link networks – all the stuff Google wants gone.
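And those TXT files are very easy for a machine to mine, because the disavow format is deliberately simple: plain text, one URL or domain per line, comments prefixed with `#`. Something like this (the domains are made up for illustration):

```text
# Link removal requests to spammydirectory.example went unanswered
domain:spammydirectory.example
domain:paidlinks.example
# A single bad page rather than a whole site
http://blog.example/some-paid-post.html
```

Multiply clean, machine-readable lists like that across thousands of webmasters and a centralised audit stops being a conspiracy theory and starts being an engineering task.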
It’s a golden idea. You give webmasters a tool that they will use to tell you what bad links look like. You get them to use it with the threat of rankings punishment – hey presto, a ready-made, crowd-sourced list of bad sites. Audits will spare the good sites added by malevolent forces and crush the bad for good – well, that’s the theory anyway. I’ll just loosen my tinfoil hat…
The total number of registered users on Google+ is about to grow enormously, as YouTube users will soon be required to use a Google+ account in order to comment. Google+ is not an ill-conceived attempt at a social networking site and it is not a failure. The billion-user mark (Google’s active user count – take that as you will) was recently met, putting Google+ just a Twitter’s-length (500 million users) behind Facebook.
Whilst Google already has a human element checking pages for it, that just can’t scale to cover the entire web quickly enough. Social media is the human approval system that Google needs.
But it has to be done right – there are far too many fakes and spammers roaming the planet for it to be taken seriously 100% of the time. To implement a social scoring system would spell death for Google’s credibility once fakery and spam take their unfair slice of the pie. So Google+ was born.
I’m not suggesting for a second that the platform is spam free – it isn’t. Spend five minutes on it after following a few groups and joining some communities. Things go nuts fast. And in allowing this, Google can now have full, unbridled access to the fakes and the real users. It can sort the power users from the automatons, the authorities and the spam throwers. It is a slow game, but before long every internet/Google user will be on there in some form or another.
If you’re legit, you’ll be running authorship mark-up. By no means is this concrete proof but to fake it would be very unwise, letting your reputation hang on the visage of a charlatan. You’ve got to be real, and that’s a good thing. Publisher tags reinforce this by separating people from businesses and giving groups a collective voice. Again, this is healthy, welcome progress. Human approval gives data meaning. Data with meaning is much more important than data without.
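For those who haven’t implemented it, authorship and publisher mark-up at the time of writing are just link elements pointing back at Google+ – the profile ID and page name below are placeholders, not real accounts:

```html
<!-- On an article page: ties the content to a named, human author -->
<link rel="author" href="https://plus.google.com/{your-profile-id}"/>

<!-- On the homepage: ties the site to a business's Google+ page -->
<link rel="publisher" href="https://plus.google.com/+YourCompanyPage"/>
```

Two lines of markup, but each one stakes a public identity on the content it sits next to – which is precisely the point.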
While many theories exist to debunk the +1 ranking factor myth, you have to wonder why Google+ exists. It’s a future-proofing, less robotic alternative to algorithmic ranking, which to date has been rather dim-witted. Human beings will shape how search goes from this point forward, not algorithms.
By (almost) silently releasing Hummingbird to take long search queries and conversational searches further, Google has done what many expected to be the next logical step in search. They made it more human. This might take some getting used to, as most of us treat Google like a robot – because it is. That view might be set to change soon.
Speech recognition and conversational searching are not quite the same thing. One is fairly standard, the other is horrifically complex. Recognising words is a technical achievement, but it was cracked a long time ago. Making sense of language, contextualising a string of words, understanding colloquial terms – this is what the human brain does all the time. The brain is the most powerful computer in the known universe, so it’s not really hard for it to do.
For a machine, language, conversation and context are like climbing Everest on skates with no idea what climbing, Everest or skates are. It doesn’t have a sense of humour, so it can’t recognise that this is a ridiculous concept and think, “oh you, you silly search engine user! You can’t do that! Ha ha ha!” It will try to resolve the problem, because problem solving is what it does.
Google is not able to make sense of things other than the very basic, by tying certain key triggers together and recognising very basic levels of context. If Hummingbird and the fun voice searching options it provides get significant exposure, it can start getting to grips with language, or rather how people use language when they’re not at a keyboard. Google will have swathes of user generated spoken queries from all over the world. This will form a self-supplied database of human contextualisation, abstract terms, saucy language, colloquialisms and verbal diarrhoea.
This is brain food – the complex carbs of data. It’s data with human meaning attached. Language is the essence of communication and how human beings are able to contextualise and make sense of things. We see to believe, but we use language to understand and pass on knowledge. Google cannot see – but if it can hear then it can begin to unravel language and how we humans use it.
Then it can try to use it itself – becoming truly, fearsomely intelligent.
Social data, human spam curators, breaking the language barrier – all of this adds up to a desire to understand the human animal from the inside out. Maybe so that things can be advertised better, maybe to further technology to its outermost perceivable limits, maybe both. Being smarter is good. Google being smarter is a little scary. Google being alive is nightmarish.
But it could be on the way. Artificial intelligence is a long-term goal for Google, but before it can achieve it, Googlebot has to learn to be human.
That’s my two pence – what do you think? Is human curated search the next stage? Will Google do a Skynet? Leave your comments below!