I am Irvin Lim.

I make pretty and functional things on the web.

CS3216 Week 3

My role in Assignment 1 is that of a backend developer, and hence I was in charge of making sure that any data that the frontend requires is made available to them easily.

Frustrations

Because of how we are using Meteor and MySQL, and how the ORM we are using (Sequelize) works, I ran into many problems early on even simply trying to retrieve a specific row from the database. Sequelize was based in Promises, and I have to admit that I was not particular confident in knowing fully well how Promises work. Furthermore, Sequelize is not very intuitive in the sense that Promise callbacks return instances and not objects, which in my opinion makes the code so much messier. Furthermore, depending on the type of Sequelize operation, the parameter may return an array of instances. I often found myself getting undefined values or null value errors simply because I didn’t know the method signatures or didn’t spend a little more time to find out how everything works.

It was a frustrating week, where I went through multiple libraries and approaches just for the client Meteor app to communicate with the database. I think the most important thing I learnt from this week is to Read The Documentation. I realised that so many problems could have been avoided and time saved if I had just taken some time and not be lazy to find out how things work, instead of assuming a library is supposed to work the same way as what I am used to in other libraries.

Redesigns

Despite being mainly in charge of the backend, I was also concerned with the user experience when a user uses our app. With reference to ExchangeHunt, an Assignment 1 project from CS3216 last year, I sketched out multiple changes to the design. Most importantly, I felt that we should have an index page for an exchange group, which is visible to users even when they are not signed in. This would allow us to provide Share buttons for users to invite or publicise the existence of the group on their Facebook timeline, which so happens to be one of the Assignment milestones as well. I thought that users’ friends who visit the page should be able to view who is currently in the group, which will somewhat act like peer pressure to incline other people to hop onto the app as well.

Bypassing server limits

Halfway through making use of the data which we had proudly dug out from TopUniversities (see last week’s post), we found out that the university logos we had “scraped” from their data could not be hotlinked. Furthermore, most of the logos were 48x48 pixels, while the main site often had logos which were 200x200. Seeing that there were these two problems, I subsequently thought of a way to retrieve the images in their highest resolution possible, by looping through a series of URL formats to attempt to find the highest resolution one. One example of a higher-resolution URL would be to simply change the _small portion of the URL to _medium or _large, but not all logos had medium or large formats available and was not standardized, and so I couldn’t simply do a search and replace or batch download.

I didn’t expect myself to actually write an image fetcher just for this task, considering that there were more important things to settle, but it sounded like a fun problem that would give me immense satisfaction if I solved it, and so I stayed up till 5am and actually wrote a simple fetcher in 3 hours (found here). What it does is simply to loop through all university data, if a logo URL exists, fetch the highest resolution image using wget, and save it to a local folder.

One of the big problems I faced was that I had to spoof HTTP headers to pretend that the request is being made from a browser, in order to prevent the no-hotlinking error from being returned. I also constantly found that images would always run into timeout errors while downloading halfway, despite continually trying again and again. I figured that the server might be imposing limits on how often requests can be made in a specified period of time, and so I hacked together some simple logic to split the downloads into batches of 50, and download a maximum of 50 images a minute. A lot of debugging and testing later, I felt so good when 200+ (relatively) high-res university logos were automatically downloaded to my computer.

I felt so good and somewhat felt like a “hacker” for crawling for these images which I don’t think I was supposed to have obtained. I’m pretty sure there isn’t a proper repository of university logos readily available, and so what I did may easily be useful for other developers around the world as well, which was why I decided to push it to a separate repository on GitHub. However, I also wondered if this may be a breach of Terms of Use of their website, but at that point in time, I didn’t care as much and just wanted to solve this problem at hand. I think the lesson learnt here would be to step out of your comfort zone, and not to stop just because there are no APIs or data sources available, and therefore right now we have 200+ pretty university logos for our app!