Our client produces branded documents for customers who sign up via its website. New users enter various details, including email address, company address and telephone number, and can then produce a branded document for use within their business. One piece of information the user must supply is a company logo. Obtaining the logo can be time-consuming for the user, even though it is usually available on the company’s own website.
Our client wished to make the whole sign-up process less time-consuming and approached us to make it more efficient. Uploading a logo seemed to be the part of the process that caused the most issues. If this step could be streamlined, sign-up might become easier for new customers, with fewer drop-outs along the way.
Our solution was to investigate how we could improve the logo upload process. Since the users signing up were business users, their email addresses contained the domain name of their business (e.g. firstname.lastname@example.org or jane@jane_doe_inc.co.uk). Given that most businesses have websites and we had the user’s email address, it became clear that we could find the user’s website from that address (e.g. an address ending in @acme.com would have a website of www.acme.com). Almost all business websites contain the company logo. Therefore, from a user’s email address we might be able to retrieve the company logo without the user needing to upload it manually.
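The mapping from email address to website can be illustrated with a minimal Python sketch. The function name `website_from_email` and the `FREE_MAIL_DOMAINS` exclusion list are assumptions made for illustration and are not taken from the project itself. The sketch also assumes the site lives at `www.` plus the email domain, which will not hold for every business.

```python
from typing import Optional

# Hypothetical exclusion list: webmail domains say nothing about the
# user's own business, so no website can be inferred from them.
FREE_MAIL_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com"}


def website_from_email(email: str) -> Optional[str]:
    """Guess a company website from a business email address.

    Returns None when the address is malformed or uses a webmail domain.
    """
    # Split on the last "@" so the part after it is the domain.
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or "." not in domain:
        return None

    domain = domain.lower()
    if domain in FREE_MAIL_DOMAINS:
        return None

    # Assumption: the company site is served from the "www" host.
    return f"www.{domain}"


print(website_from_email("jane@acme.com"))
```

In practice the guessed host would still need to be fetched and checked, since some businesses serve their site from a different domain than the one used for email.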
This led to the interesting problem of how to determine which of the images on a website was, in fact, the company logo. To solve it, a combination of machine learning algorithms, including clustering and classification, was used to indicate which of the images on a website was most likely the logo. The algorithm was trained using a range of website logos and produced a usable logo in 70% to 85% of cases.
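Before any clustering or classification can run, the images on a page have to be gathered and described. The sketch below shows one way that preparatory step might look, using only Python's standard `html.parser` module. The names `ImageCollector`, `logo_features` and `candidate_images`, and the two features themselves, are hypothetical: the features actually used in the project are not described here.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the attributes of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs))


def logo_features(img: dict) -> dict:
    """Describe one image with simple, hypothetical features."""
    # Attribute values can be None (e.g. a bare `alt`), hence `or ""`.
    text = " ".join(img.get(k) or "" for k in ("src", "alt", "class", "id"))
    return {
        "mentions_logo": "logo" in text.lower(),
        "has_alt_text": bool(img.get("alt")),
    }


def candidate_images(html: str) -> list:
    """Return (src, features) pairs for every image that has a src."""
    parser = ImageCollector()
    parser.feed(html)
    return [
        (img["src"], logo_features(img))
        for img in parser.images
        if img.get("src")
    ]


page = '<img src="/static/logo.png" alt="Acme"><img src="/img/banner.jpg">'
for src, features in candidate_images(page):
    print(src, features)
```

Feature dictionaries like these could then be passed to a trained classifier. Richer signals such as image dimensions or position on the page would require downloading and rendering the images, which this sketch leaves out.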