The client's existing scraping application was built on the .NET architecture. They asked Chetu to take it over and rework it, closing gaps in the system and writing the code needed to make it more effective.
Scraping applications serve as data mining tools: they extract data from designated sites and save it for later use. Administrators then draw on the collected data however they see fit. In the ticketing industry, vendors use website scraping tools to gain insight into ticket prices and availability.
During the preliminary phases of the project, Chetu focused on eliminating the existing issues, pinpointing the pain points and fixing bugs in the system. This gave our engineers a clean foundation on which to build the enhancements and deliver the results the client demanded.
Chetu programmed the following technologies during the course of this project:
The client preferred Gatherer Service, which they used to scrape information from the primary vendor sites. Chetu's initial testing revealed that Gatherer was not functioning as it should. For example, the service is supposed to run from 8 a.m. to 10 p.m. EST, but it was not starting on its own during the morning hours.
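A gap like the missed morning runs can be detected by comparing logged start times against the expected window. The following is only a minimal sketch in Python, not the project's C# code: the hourly cadence, the `missed_hours` helper, and the sample log are all assumptions made for illustration, and the timestamps are assumed to be in Eastern time already.

```python
from datetime import datetime

# Run window taken from the text: 8 a.m. to 10 p.m. (hours 8..21 inclusive).
WINDOW_START_HOUR = 8
WINDOW_END_HOUR = 22


def missed_hours(run_times):
    """Return the hours inside the window in which no run started."""
    hours_with_runs = {t.hour for t in run_times}
    return [h for h in range(WINDOW_START_HOUR, WINDOW_END_HOUR)
            if h not in hours_with_runs]


# Hypothetical log for one day: the service only started running at noon.
runs = [datetime(2024, 1, 15, h, 0) for h in range(12, 22)]
print(missed_hours(runs))  # [8, 9, 10, 11]
```

A report of this kind would have surfaced the morning outage on the first day it occurred.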
This was a hurdle in its own right, but it also pointed to an underlying operational failure: the client lacked a system for tracking Gatherer's success rates. Had such a system been in place, the client would have noticed the working hours lost each morning. To fill this gap, Chetu coded a C# tool that allows administrators to compare data, view raw data, and export data in .csv format.
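The idea of tracking success rates and exporting them as .csv can be illustrated with a short sketch. It is written in Python for brevity rather than in the C# used on the project, and the `RunRecord` type, the vendor labels, and the column names are hypothetical, not taken from the client's system.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class RunRecord:
    vendor: str       # hypothetical label for a vendor site
    succeeded: bool   # whether the scrape returned usable data


def success_rates(records):
    """Return {vendor: fraction of runs that succeeded}."""
    totals, wins = {}, {}
    for r in records:
        totals[r.vendor] = totals.get(r.vendor, 0) + 1
        wins[r.vendor] = wins.get(r.vendor, 0) + (1 if r.succeeded else 0)
    return {v: wins[v] / totals[v] for v in totals}


def to_csv(rates):
    """Render the rates as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["vendor", "success_rate"])
    for vendor in sorted(rates):
        writer.writerow([vendor, f"{rates[vendor]:.2f}"])
    return buf.getvalue()


# Hypothetical sample data.
records = [
    RunRecord("vendor_a", True),
    RunRecord("vendor_a", False),
    RunRecord("vendor_b", True),
]
print(to_csv(success_rates(records)))
```

Keeping the aggregation separate from the export means the same figures could feed an on-screen comparison view as well as the .csv file.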
Chetu's main concern was Gatherer's speed. Although the database holds only a small number of events, an optimized scraping application should be able to process these events a few times a day. To speed up Gatherer, Chetu blocked excess scripts from loading on the page so that the application could reach the data faster.
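Blocking excess scripts usually comes down to a filter that decides, per request, whether a resource is worth loading. The sketch below shows one possible rule in Python; the allowlist, the host names, and the `should_load` function are assumptions for illustration and do not describe how Gatherer actually makes this decision.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: script hosts the page needs in order to render its data.
ALLOWED_SCRIPT_HOSTS = {"tickets.example.com"}


def should_load(url):
    """Allow every non-script resource; allow scripts only from allowlisted hosts."""
    parsed = urlparse(url)
    is_script = parsed.path.lower().endswith(".js")
    if not is_script:
        return True
    return parsed.hostname in ALLOWED_SCRIPT_HOSTS


print(should_load("https://tickets.example.com/app.js"))       # True
print(should_load("https://ads.example.net/tracker.js?x=1"))   # False
print(should_load("https://tickets.example.com/events"))       # True
```

An allowlist is the more conservative choice here: an unknown script is skipped by default, so new third-party scripts added to a vendor page do not slow the scraper down.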
After Chetu's intervention, Gatherer runs much more smoothly and is no longer preoccupied with extraneous data. Chetu's efforts to improve the speed were a success: once we cached the standard .js files, they were no longer loaded repeatedly, a small change that revolutionized the entire scraping system.
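The effect of caching the standard .js files can be shown with a small in-memory cache. This is a hedged Python sketch of the general technique, not the project's implementation: `ScriptCache`, `fake_fetch`, and the URL are hypothetical, and a real scraper would more likely rely on its browser engine's or HTTP client's own cache.

```python
class ScriptCache:
    """Fetch each script URL once and serve later requests from memory."""

    def __init__(self, fetch):
        self._fetch = fetch   # function that actually downloads a URL
        self._store = {}      # url -> downloaded contents

    def get(self, url):
        if url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]


calls = []


def fake_fetch(url):
    """Stand-in for a network download; records every call it receives."""
    calls.append(url)
    return f"// contents of {url}"


cache = ScriptCache(fake_fetch)
for _ in range(3):
    cache.get("https://tickets.example.com/lib.js")

print(len(calls))  # 1
```

Three page loads trigger a single download, which is the saving described above, repeated for every shared script on every scrape.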
Our client now operates much more efficiently, scraping tickets at a higher rate while having the tools to analyze the incoming data as needed.