Paul Hammant's Blog: Scraping Github pull requests and their code review comments
Github stores its pull-request and code review data in MySql. I’d much prefer a git reperentation for both (JSON, commits, audit trail, etc). Kinda the way Github Wiki pages are stored. That’s an aside though, this article is about storing code-review comments long term. The problem I’m trying to solve is one of deletion of users thich causes their pull requenst commentary to also get deleted. Sure the commits make it back to the origin/master (in the pull request is processed), but many things are left assoctaed with the fork. If the user gets deleted such info is gone forever :(
I want a permanent copy, so the interim answer is to scrape the data I fear losing, while it still exists.
They can’t object for your own GithubEnterprise instance of course. Github could change the structure of their HTML, and the script might stop work so well.If that happens I’m happy to accept back pull-requests via the usual mechanism :)