Building a project that scrapes data from the various government-hosted websites where the data is actually published.

  • Cron job runs the service every day (learnt for this project)
  • Dockerized the service (learnt for this project)
  • Node script using Puppeteer
  • TypeScript definition files alongside the actual code in JavaScript (this gives IntelliSense without being intrusive, as you don’t have to use .ts files) (tried for the first time)
  • Already scraping and collecting data daily
  • Added unit tests
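
As a rough illustration of the "types without .ts files" idea above, here is a minimal sketch of a plain JavaScript helper annotated with JSDoc types. Everything in it is an assumption made for illustration: the `DailyRecord` shape, the `parseRow` name, and the two-column row layout are hypothetical and not taken from the actual project. The type is declared inline with `@typedef` so the snippet stands alone; in a setup like the one described, it could instead live in a separate .d.ts file.

```javascript
// @ts-check
// Hypothetical example: the names and the record shape are illustrative assumptions.

/**
 * Shape of one scraped record. With separate definition files,
 * this type would be declared in a .d.ts file instead.
 * @typedef {Object} DailyRecord
 * @property {string} region Name of the region as shown on the site.
 * @property {number} count  Numeric value reported for that region.
 * @property {string} date   ISO date (YYYY-MM-DD) of the scrape.
 */

/**
 * Convert the text cells of one table row into a typed record.
 * @param {string[]} cells Text of each cell, e.g. ["Example State", "1,234"].
 * @param {string} date ISO date of the scrape.
 * @returns {DailyRecord}
 */
function parseRow(cells, date) {
  const [region, rawCount] = cells;
  // Strip thousands separators and whitespace before parsing the number.
  const count = Number.parseInt(String(rawCount).replace(/[,\s]/g, ""), 10);
  if (!region || Number.isNaN(count)) {
    // Fail loudly so a changed page layout is noticed rather than stored as bad data.
    throw new Error(`Unparseable row: ${JSON.stringify(cells)}`);
  }
  return { region: region.trim(), count, date };
}
```

A pure function like this takes only the cell text, so it can be unit tested without launching a browser; the Puppeteer part of the script would only need to extract the cell strings from the page.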

Future Scope
  • Need to scrape the state websites (as they have the district-level data)
  • Already building an API in Go (learning it along with this project)
  • Need to build a client to show the data

  • Might be interested in extending beyond India, as the data is common. The problem might not be restricted to India, and this could be useful wherever there is no official API, or where the official API lacks historical data that is available in the public domain.