I stored almost all of my research data in a single git repo.
This encompasses paper drafts, intermediate results, small experiments,
etc. The repo does not include collaborations with other researchers;
those usually happened in separate shared repos. As part of some kind of
informal retrospective on how I structured my PhD research, and for fun,
I decided to do some data-sciencing on the git repo’s commit messages. I
make no pretense at scientific rigour here: this is all just for fun
:-).
For example, below are the 24 most positive commit messages.
interesting interesting (100)
nice improvement! (100)
nice progress! (100)
nice progress (100)
pretty pretty (100)
happy happy (100)
fun fun fun (100)
progress! (100)
nice nice (100)
beautiful (100)
progress (100)
fun fun (100)
pretty! (100)
pretty (100)
cool! (100)
nice! (100)
yay! (100)
nice (100)
woo! (100)
cool (100)
wow (100)
yay (100)
ok! (100)
ok (100)
I acquired these by exporting the commit messages from git, case
folding them, and then deduplicating the result. For sentiment analysis,
I used the well-known NLTK Python library. As you can see, there are many
commit messages with a 100% score on positivity. To make the ranking more
interesting, I break ties by ranking longer commit messages higher.
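The steps above (case folding, deduplication, scoring, and breaking ties by length) can be sketched roughly as follows. This is not the code used for this post, only a minimal illustration under some assumptions: the commit messages are already available as a list of strings (for instance from something like `git log --format=%s`), and `toy_positivity` is a hypothetical stand-in scorer so that the example runs without any extra downloads. A real sentiment scorer, such as one from NLTK, would be passed in its place.

```python
import string
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-in for a real sentiment scorer: the percentage of
# words that occur in a tiny hand-picked list of "positive" words.
POSITIVE_WORDS = {"nice", "progress", "yay", "cool"}


def toy_positivity(message: str) -> int:
    """Return a 0-100 score: the share of words found in POSITIVE_WORDS."""
    words = [w.strip(string.punctuation) for w in message.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    if not words:
        return 0
    hits = sum(w in POSITIVE_WORDS for w in words)
    return round(100 * hits / len(words))


def rank_messages(
    messages: Iterable[str],
    score: Callable[[str], int],
    top: int = 10,
) -> List[Tuple[int, str]]:
    """Case-fold and deduplicate messages, then rank them by score.

    Ties are broken by ranking longer messages higher; messages of equal
    score and length are ordered alphabetically to keep the result stable.
    """
    unique = {m.strip().casefold() for m in messages if m.strip()}
    scored = [(score(m), m) for m in unique]
    scored.sort(key=lambda pair: (-pair[0], -len(pair[1]), pair[1]))
    return scored[:top]


if __name__ == "__main__":
    sample = ["Nice!", "nice!", "nice progress", "fix typo"]
    for value, message in rank_messages(sample, toy_positivity):
        print(f"{message} ({value})")
```

Keeping the scorer as a parameter means the same ranking function can serve the positive and the negative lists by swapping in a different scoring function.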
These are the top 10 negative commit messages:
tricky tricky (100)
interrupt (100)
scary (100)
funky (100)
ugh (100)
scary scary work (86)
ugh tricky lemma (82)
bad work today! :( (77)
stupid import (77)
scary workday (76)
Here I just took the top 10 because there were not many ties for
100%. I’m surprised “funky” is listed in there. The rest looks
accurate.
Here are the most emotional commit messages. This means those with
the most emotion going on in all three categories analysed by NLTK:
positivity, negativity, and neutrality.
in exceptions paper, remove all leftover old commented tex code. in the rest, add meeting questions for marieke, todos, and logbook entries (100)
moved some stuff from vercors repo into my phd folder. also made a next project file, and did some work on triggers for petra (100)
made bigrat all fancy, for some reason. finished with my smt typing experiment. moved smt names into scopes a bit. (100)
added some while-proofs that use both big-step and small-step semantics. need to move those to a separate file (100)
start drafting fields and such a bit. need to start emitting smt for the heapt type and heaps aggregated type (100)
seems to still work. now to repair the whole date feature requires some work with pandoc metadata… (100)
generation is done. next up, either integration in mill, or test manually directly in adder first (100)
start working on removing the category row from the paper, and change the evaluation cases names (100)
start a bit on the new structure of the zettel, and write one day of the uppsala vacation (100)
some final changes to the exceptions paper that i submitted yesterday. also pubs config (100)
intermediate result, hanoi is solvable, done! counting comes next. looks difficult…! (100)
made the array interface first-class. now only some refactorings of common tasks left (100)
remove slides i will probably never read. add readme to indicate i have these videos. (100)
what next: generic type inference, contracts, a parser, or symbolic execution? (100)
refactored the equivalence into a file, and proven seq_abort and seq_unfold! (100)
partially do casing of sections/subsections/paragraphs. now for chaptertocs. (100)
should look at carbon at some point (and maybe silicon) how they encode maps (100)
start splitting out the solver stuff for a bit more modularization and reuse (100)
add submitted artefact, paper draft around the time artefact was submitted. (100)
restructuring and extending finalresults, work, prepare annual evaluation (100)
finalize logbook for today and remove write paper indentation in projects (100)
move stuff around. do some writing. start on comission composition list. (100)
outline a bit. i think i can work out the one for industry section now (100)
include pdf for jan, and pdf which includes some axioms for summations (100)
As you can see, this category mostly boils down to just a length
competition. I was hoping for commit messages that had high ratings in
all three of NLTK’s sentiment categories. Unfortunately that’s not the
case: the commit messages above all have 100% in only the
neutrality category. I leave determining why that is for future
work.
Using the dates from the git commit export, I made the following bar
chart to illustrate how my commit times were distributed over the day.
In this chart, commits are bucketed by hour of the day:
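The bucketing itself is simple enough to sketch. The snippet below is an illustration rather than the code behind the chart, and it assumes the commit dates were exported as ISO 8601 strings with a UTC offset (for instance via something like `git log --format=%aI`). The function names `commits_per_hour` and `text_bar_chart` are made up for this example.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List


def commits_per_hour(iso_dates: Iterable[str]) -> List[int]:
    """Count commits per hour of the day (index 0 = midnight to 1am).

    Each date is assumed to be an ISO 8601 string such as
    "2021-03-04T16:12:09+01:00". The hour is taken as written, so it
    reflects the local time recorded in the commit, not UTC.
    """
    counts = Counter(
        datetime.fromisoformat(d.strip()).hour for d in iso_dates if d.strip()
    )
    return [counts.get(hour, 0) for hour in range(24)]


def text_bar_chart(per_hour: List[int]) -> str:
    """Render the hourly counts as a crude text bar chart."""
    return "\n".join(
        f"{hour:02d}:00 {'#' * count}" for hour, count in enumerate(per_hour)
    )


if __name__ == "__main__":
    dates = [
        "2021-03-04T16:12:09+01:00",
        "2021-03-05T16:59:00+01:00",
        "2021-03-05T21:05:00+02:00",
    ]
    print(text_bar_chart(commits_per_hour(dates)))
```

Using the hour as recorded in each commit, rather than converting everything to UTC, keeps the buckets aligned with the time of day at which the commit was actually made.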
Nothing unexpected: there are peaks around the end of my workday (4pm
to 5pm), followed by a dip during dinner time (6pm - 8pm). The second
significant peak is around lunchtime, followed by a subtle third peak in
the evening (9pm - 10pm). I don’t usually work in the evenings, though
it has happened occasionally around paper and thesis deadlines because I
suck at following my planning. Instead, I suspect most of my evening
commits are due to “hobby commits”. I have a habit of storing my hobby
projects in my PhD folder as well; usually those are tangentially
related to my research, anyway.
That’s it for now. The code for this analysis is available on sourcehut.