What do Digg and short URLs have in common? Both are about URLs that are shared and used by many people. Why not mix the idea of short URLs (which has mainly arisen because of the Twitter hype) with the ability to let people share and publish URLs to the public, like Digg?
So a few months ago I was discussing all this with a few lads, and we decided to start on our own social network idea, built around URLs. One thing we noticed is that many of the URL shortening services had cool features like stats and custom URLs, and at the same time you could see the power of Digg’s public URL idea. We decided that if we could make a URL shortening service that had most of the features of the other services in one place, and if we could also add some value by bringing a “social-networking” aspect to it, it would simply rock! Instead of linking people by the type of people they are, their activities, technology interests, groups of friends, etc., we saw that there was an opportunity to link people by the type of URLs they shorten and, most importantly, the context of those URLs.
As it stands right now, you can have public or private URLs with short.ie. When you shorten a URL and make it public, we parse the content of that page and analyze its title, description, heading tags and a few other things (wink). We then store all this in a context-related manner in our database. Where does that bring us? We of course have the content of the URLs you shortened, but also the URLs of everyone else using the service, so we can do something like a Google search, but instead of having our users type the search term we do it automatically, and instead of doing it on all the pages in the world we do it on our database and return the URLs that are the most relevant to you (or so we think). For instance, when you go to your profile page, we randomly select 10 of your previously shortened URLs, crawl through the community URLs, and use your URLs to identify the contexts and other people’s URLs that may interest you.
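To give a rough idea of what that parsing step could look like, here is a minimal sketch in Python. It is not the code short.ie runs: the `ContextExtractor` class and the `extract_context` function are hypothetical names made up for this illustration, and it only covers the title, meta description and headings mentioned above.

```python
from html.parser import HTMLParser


class ContextExtractor(HTMLParser):
    """Collects the title, meta description and heading text of a page."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.headings = []
        self._current = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()
        elif tag == "title" or tag in self.HEADINGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        # Collapse runs of whitespace so "PHP   Frameworks" becomes "PHP Frameworks".
        text = " ".join("".join(self._buffer).split())
        if tag == "title":
            self.title = text
        elif text:
            self.headings.append(text)
        self._current = None


def extract_context(html):
    """Return the pieces of a page that feed the context analysis (hypothetical helper)."""
    parser = ContextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "description": parser.description,
        "headings": parser.headings,
    }
```

Calling `extract_context(page_source)` on the HTML of a shortened page returns a small dictionary that could then be stored alongside the short URL.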
This can be seen as a bit like Amazon’s “recommendation” feature. Even though I know many people don’t like that feature, I still believe, and so does Amazon, that it works for many, many others :).
Currently on my own profile on short.ie I have links to articles/URLs related to PHP, politics, conferences, etc. As the system grows, the content becomes more diversified and the links more relevant. And hey, if we don’t find anything that you may be interested in, we don’t bother showing you a box saying we didn’t find anything, we just show you your profile page without that box.
Here are a few more details about the recommendation engine:
Like many other recommendation engines, such as those of Amazon, iLike, Collarity, etc., our recommendation engine makes use of a correlation algorithm (more precisely, the Pearson product-moment correlation coefficient). To explain quickly, because this is not a whitepaper on statistical analysis algorithms, I will simply say that we first take a URL, analyze its context and calculate the average of the values, creating a median line from those values. Then we take the other URLs that have contexts, analyze them, and work out which ones have the most points near the median line. The 10 with the most points are kept and displayed to the user. Badly worded? See the graph below, it should help.

Correlation example
Of course this is only an example, but I’ll explain how we use it. The red line is the median of a URL’s points of association. Then we analyze the other URLs that have contexts and give them points depending on whether they match certain relations with the original URL’s median. The URLs that acquire the most points close to the median line are chosen and recommended to the users. Of course, the more data you have, the more analysis you can do, and this is something we are currently refining and improving daily. We are adding rules to the analysis and making sure that the URLs we are “recommending” to our users get better and better.
For those interested, here’s a rough “mathematical-representation” of the algorithm.
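$$r = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{X_i - \mu_X}{\sigma_X}\right)\left(\frac{Y_i - \mu_Y}{\sigma_Y}\right)$$
where
$$\frac{X_i - \mu_X}{\sigma_X},\quad \mu_X,\quad \sigma_X$$
are the standard score, population mean, and population standard deviation (calculated using n in the denominator).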
The result obtained is equivalent to dividing the covariance between the two variables by the product of their standard deviations (Note that this is the general equation and the broad idea behind the recommendation system).
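For readers who prefer code to formulas, here is a small Python sketch of the Pearson coefficient described above, plus a ranking step that keeps the best matches. It is only an illustration under a few assumptions of mine: each URL’s context is represented as a list of numeric scores of equal length, the names `pearson` and `recommend` are made up for this example, and a vector with no variation is treated as uncorrelated. The real engine applies extra rules on top of this.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation using population standard deviations (n in the denominator)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of the same length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sd_x == 0 or sd_y == 0:
        # Assumption for this sketch: a constant vector carries no signal.
        return 0.0
    # Average of the products of the standard scores.
    return sum(
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)
    ) / n


def recommend(target, candidates, limit=10):
    """Rank candidate URLs by correlation with the target vector, best first."""
    scored = [(pearson(target, vector), url) for url, vector in candidates.items()]
    scored.sort(reverse=True)
    return [url for _, url in scored[:limit]]
```

With `candidates` as a dictionary mapping short URLs to their score lists, `recommend(my_scores, candidates)` returns up to 10 URLs, which matches the number of recommendations mentioned earlier.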
That should cover most of the idea and theory behind the recommendation system. Other things I have to mention are upcoming features such as “public links” (a list of public links, a public timeline à la Twitter that will show everyone’s most active public links - not the links per user, but the PUBLIC TIMELINE), “top URLs” (a list of the most clicked URLs), top users, referrer URLs, etc.
What I basically do is parse the words of the original URL to associate content with, and each word is kept independently with a score, let’s say 5. Then we have a list of words, and when they are met they are automatically given a score of 5 (for now), and there’s a list of words related to those words. When a related word is found, we add a score depending on the score already associated with it.
Let’s say the original URL has the word “webapp”. We will then find the related words, and when we encounter the word “webapp” we give it 5. If we encounter the word “php” we may add 3, because it’s only that relevant, and if we encounter the word “framework” we give it 1 point. Then you also have words like “car”, which will basically get -1.
And of those scores, we’d keep the two highest ones, thus “webapp” and “php”.
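Here is that “webapp” example as a tiny Python sketch. The weights are the ones from the example above; the `RELATED` table and the `score_words` and `top_words` functions are hypothetical names for illustration, not the actual engine code.

```python
# Hypothetical related-word table: for each source word, the points given
# to the words we may encounter in another page.
RELATED = {
    "webapp": {"webapp": 5, "php": 3, "framework": 1, "car": -1},
}


def score_words(source_word, candidate_words, related=RELATED):
    """Add up the points of every candidate word related to source_word."""
    weights = related.get(source_word, {})
    scores = {}
    for word in candidate_words:
        if word in weights:
            # Repeated occurrences accumulate on top of the existing score.
            scores[word] = scores.get(word, 0) + weights[word]
    return scores


def top_words(scores, keep=2):
    """Keep the highest-scoring words (ties broken alphabetically)."""
    return sorted(scores, key=lambda word: (-scores[word], word))[:keep]
```

Scoring a page containing “webapp”, “php”, “framework” and “car” against the source word “webapp” and keeping the top two gives “webapp” and “php”, as in the example.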
Also, I do agree that some words may skew the score: think of someone who has the context “web” but a URL related to spiders (real spiders). This is exactly where the other parts of the context/page come into play. They discriminate between the results.
Of course there’s still a fair bit of work to be done on the engine side, but it’s already generating and recommending links. It seems to be working interestingly, but hey, it’s still beta, so it may very well be wrong.
Hey Dave,
Interesting read. I like short.ie and I look forward to seeing how it evolves.
A couple of points…
Why would someone want a recommendation based on what someone else has re-written?
I’ve noted some recommended links which seem to have very little relevancy. Presumably the accuracy of the recommended links becomes greater if the URLs being shortened are re-written prior to being shortened?
Best
Barry
Hey Barry,
Fair question. I’ll answer it this way: why would Amazon recommend you books based on other content? Basically, if you read a page that is about PHP, MySQL, etc., then when someone else shortens a URL that is about PHP, MySQL, etc., you “may” happen to be interested in that stuff. Thus the recommendation.
We are still in the process of refining the engine, to be honest, and as the little beta message says, some links may be wrong. However, even if some links are of little relevancy, they may still be the highest ranked by the algorithm.
I’ll take a look and try to make them a bit more relevant for you.
We like to blog about things we're passionate about. We love PHP, MySQL, CouchDB, Linux, Apache - web development standards. We also like writing about building web apps and working with web technology.
You can email us on freedom@echolibre.com
Eamon Leonard - @EamonLeonard
David Coallier - @DavidCoallier
Helgi Þormar Þorbjörnsson - @h
J.D Fitz.Gerald - @jdfitzgerald
Noah Slater - @nslater
Court Ewing - @courtewing
Not looking for trade secrets, but when you put standard deviation into effect with word-generated content, how do you score it? Are certain words assigned a certain number? And if so, would there not be a danger of some words which might be relevant to one another having vastly different weightings and thus skewing the score?
Otherwise, what a cool idea! This is a good example of something I will actually use! I am putting short.ie into my FF speed-dial! Great work lads!