During my PhD, I encountered two categories of data. The first is project-related data. This data usually lives in some kind of shared folder, so that everyone on the project can explore and contribute. But not everything you do during your PhD directly relates to a project, or needs to be available to colleagues. Where do you put that data?
That’s where the second category comes in: it’s everything else.
This second category, “personal” PhD data, necessarily encompasses a wide range of data types, including, but not limited to: personal meeting notes, random collections of notes about topics only tangentially related to your PhD, small experiments, larger experiments, and paper PDFs. Anything, really. I’d even put email and calendar data in this category, if not for the fact that email and calendar programs don’t interface nicely with git.1
Early in my PhD I decided that, at least from a high-level perspective, I wanted to be disciplined regarding this second category of data. Over the course of the four+ years that followed, I refined my approach. One of the outcomes of that is the structure of my personal PhD-stuff folder. Essentially, it’s a shallow hierarchy of folders with some rules about what goes where, and how to name subfolders with some consistency.
The aim of this article is not to be a replacement for scientific data management techniques. Especially if you have large datasets, the ideas I describe in this article won’t be helpful (though they also won’t hurt). For your official final paper sources and datasets, I recommend that you stick to the data management policy prescribed by your employer. For everything else, the ideas below might be useful for you: pick whatever sounds good, and leave what you don’t like.
Here it is, in alphabetical order:
/home/bob/science/
    admin/
    archive/
    bibliography/
    collab/
    conferences/
    courses/
    experiments/
    finalResults/
    projects/
    reviews/
    students/
    teaching/
These are the “top-level folders”, because they’re supposed to be located at the top level of my research data hierarchy.
There are some top-level folders that I’m not including in the list
above. Some of them are just bad ideas (see the next section), and
others are just embarrassing or boring to talk about, e.g. failed
note-taking experiments.
The science/ folder is also a git repository. I’ll
probably write a bit more about how I use git for my personal research
data in the future, but to summarize it: I commit and push twice a day,
and put anything smaller than a few megabytes in this repo. Anything
larger goes in my stack
folder.
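To check the "smaller than a few megabytes" rule before committing, a small script can list the files that exceed a size limit. This is only a minimal sketch: the function name `find_large_files` and the 5 MB default are assumptions for illustration, not something taken from my actual setup.

```python
import os


def find_large_files(root, limit_bytes=5 * 1024 * 1024):
    """Return (path, size) pairs for files under root larger than limit_bytes.

    The 5 MB default is an assumed stand-in for "a few megabytes".
    """
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git's own object store; it is not research data.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # e.g. a broken symlink
            if size > limit_bytes:
                large.append((path, size))
    # Largest first, so the worst offenders are at the top.
    return sorted(large, key=lambda item: item[1], reverse=True)
```

Calling `find_large_files("science")` would then return the candidates for moving out of the repo, largest first.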
A small remark about the folder name science/: the
folder name used to be PhD/, and before that,
masterthesis/. Whatever you do: don’t follow the pattern of
naming it after whatever it is you are currently doing. Any particular
job title is temporary, so just pick something that 1. matches the vibe
of whatever you’ll be putting in the folder, and 2. is also
generic enough that you can use it in the future for most jobs without
renaming. Other good candidates are:
research/,
academia/,
study/,
math/, if you’re into that kind of stuff,
work/, but it’s so terribly generic! Might as well go with stuff/, data/ or code/ then.
csdata/ might also work.
I went with science/ because it felt kinda short even
though it really isn’t. It’s also the right level of generic, since most
things I put in there are somehow related to computer science.
Alright, let’s go through the top-level folders A to Z.
TL;DR: at least check out the descriptions of
finalResults/ and experiments/. Of all of the
top-level folders, I think these have been the nicest to have. Their
necessity is also not 100% evident at first. Or have a look at the last two
sections for the executive summary.
admin/
This folder is for mechanical, configuration and plumbing related
files. Kind of boring, but necessary. For example, my bibliography
system requires a few config files for settings. These files are in the
admin/ folder. Another example is the timer.py
script. It’s essentially a pomodoro timer, except it hooks into the
system notification system to tell you the time’s up. The
admin/ folder also contains the dmenu_pubs.sh
script from an earlier post about dmenu
helpers.
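For readers who want something similar to the timer, here is a minimal sketch of a pomodoro-style timer that raises a desktop notification. It is not the actual timer.py: the names `notify` and `run_timer` are hypothetical, and it assumes a Linux desktop where the `notify-send` command (from libnotify) is available, falling back to printing otherwise.

```python
import shutil
import subprocess
import time


def notify(title, message):
    """Show a desktop notification via notify-send, or fall back to printing.

    Using notify-send is an assumption; other systems need another command.
    """
    if shutil.which("notify-send"):
        subprocess.run(["notify-send", title, message], check=False)
    else:
        print(f"{title}: {message}")


def run_timer(minutes, sleep=time.sleep, alert=notify):
    """Wait for the given number of minutes, then raise an alert.

    sleep and alert are parameters so the timer can be tested without
    actually waiting or popping up a notification.
    """
    sleep(minutes * 60)
    alert("Pomodoro", f"{minutes} minutes are up")
```

`run_timer(25)` would block for 25 minutes and then notify.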
archive/
This is where top-level folders go when I don’t use them anymore, either because they failed to be useful or because they’re no longer relevant. At the time of writing, it contains only one zipped folder, so it’s debatable whether I’ll keep it in the long run. Nevertheless, I’m keeping it for now because it still feels like a useful folder.
bibliography/
This is where I keep my pubs bibliography repository.
The primary reason for me to use pubs is that it provides a
reasonably structured and scriptable interface for what is essentially a
database for two filetypes: bibtex files and PDFs. See my post on scripting with
dmenu for an example of this.
Pubs can also store a plaintext note file for each paper, and I’ve used that a bunch of times. At the time of writing, 35 times. But it never became a habit. I think there was always a bit of a barrier because: opening the terminal and manually typing the identifier of the paper is a bit annoying, editing prose in Vim is not ideal, and there is no way to link those notes to other parts of my data in a meaningful way.
What seems to work well at the moment is putting paper notes in Obsidian. There is no concrete link between Obsidian and pubs (though that’d be cool), but there is an implicit one: the filename of a paper note corresponds directly to an entry in my pubs database. Combined with the shortcut I set up to quickly open PDFs from the pubs database, I can easily browse bibliography notes, while also having quick access to the accompanying PDFs.
More integration and shortcuts would be nice, e.g. opening the corresponding notes file when a certain PDF is open in sioyek, or copying the corresponding bibtex file given the key2, but I’m not sure if I’d use those often so I haven’t put in the effort yet. The only thing that I’m missing is an easier way to add PDFs. Currently it involves a bunch of steps:
scib add -D-d ~/Downloads/xyz.pdf, or my god why don’t I fix this
I know there are Python libraries that can scan PDFs for DOIs. Unfortunately, every once in a while I acquire PDFs which definitely don’t have DOIs in them (some publications don’t even have DOIs, e.g. Usenix papers), so there’d need to be some kind of fallback mechanism to include a DOI easily. Or maybe I should just ignore those cases. I haven’t made up my mind yet.
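The "scan for a DOI, with a fallback" idea could look roughly like the sketch below. It assumes the text of the PDF has already been extracted by some other tool, and the regular expression is a common heuristic rather than a complete DOI grammar. The names `find_doi` and `doi_or_fallback` are made up for this example.

```python
import re

# DOIs start with "10.", a registrant code of 4-9 digits, a slash, and a suffix.
# This pattern is a heuristic, not a complete DOI grammar.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)


def find_doi(text):
    """Return the first DOI-like string in text, or None if there is none."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip(".,;)")


def doi_or_fallback(text, ask=input):
    """Use the DOI found in text, else ask the user; empty input means no DOI."""
    doi = find_doi(text)
    if doi is not None:
        return doi
    answer = ask("No DOI found. Enter one (or leave empty): ").strip()
    return answer or None
```

Returning `None` covers the publications that have no DOI at all, so the caller can decide whether to add the entry by hand.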
collab/
Here I keep git repositories that are relevant for my current
research. I do not use submodules for this. Maybe they’re a
great fit, but I always find working with them confusing. Instead, I
committed the collab/ folder as an empty folder, and ignore
any contents in this folder via .gitignore. This way,
whenever I clone my science/ folder, I’m reminded that
there are also a bunch of external git repos I should consider.
To manage the git repos in the collab/ folder, I use myrepos, for two reasons:
It uses a .mrconfig file, which I keep in the root folder science/. This file also doubles as a nice list for me to see which repos I should be keeping an eye on. Whenever a project finishes, I comment it out in the .mrconfig file, leaving a nice trail of activity in case I forget about a repo.
It provides mr checkout, to do the initial setup of all the repos, and mr status, to make sure there is no leftover uncommitted work in one of the repos. For anything else, I usually just open a terminal in that folder.
I think I’ve only ever had at most 3 repos active in the collab/ folder, so I could’ve done without myrepos. Still, it’s nice to have the infrastructure and “paper trail” ready to go.
conferences/
This folder contains a folder for each conference I need to store data for. Any data that relates to the conference goes in there: not just drafts and slides, but also receipts, organizational information, notes I took during presentations, you name it.
A downside is that it overlaps with the projects/ folder
a bit. E.g. if you’re writing a draft for a certain conference, it’s not
really clear if it should go in the corresponding project or conference
folder. Luckily, this is a bit of a nitpick, and doesn’t really matter
in practice: I just put the draft in whichever folder was there first, and
that seems to work fine.
Another downside about this folder is that it feels a bit weird to have a folder for a conference if your submission gets rejected. Having some record of all my attempts compensates for this.
courses/
Not to be confused with teaching/ further down. Here I
keep all courses I follow, in contrast with courses I help
teach. Anything else vaguely course-shaped, such as summer schools or
exercise-heavy workshops, also goes in this folder.
Initially this folder contained mostly courses required by the UT PhD programme, but later, during my PhD and also my postdoc, I took some courses purely out of interest and because they were genuinely useful.
I make sure to put a date in every folder name. E.g.
“Academic Publishing Bootcamp (2020-08)/” or
“Career College (2025-09)/”. Having the parentheses in
there makes it a bit harder to navigate this part of the filesystem in
the CLI, but that doesn’t happen often, and usually I can tab-complete
my way through it.
Putting dates in the folder names avoids name clashes between
duplicate courses, e.g. in case I take courses again later on. This is
rare, but it can happen. In the case of one particular course I had to
drop out after one day. Since it was mandatory I had to take it again
later. I also discovered later that I like having the option of viewing
the list in chronological order without having to depend on flaky things
like filesystem modified timestamps. This is a recurring pattern in my
science/ folder, and a practice I very much recommend.
The course folders do not contain tons of files. It’s just convenient to have a default place to put course materials, notes, slides you might want to reference later, exercises, related book PDFs, etc. Also, since my PhD programme required me to keep & upload certificates of all the courses I followed, it was very handy to have a default place to dump any PDF that might serve as a certificate later. The only downside is that presentation slides can be large files, which is problematic for the git side of things. For now I’m just accepting that I can’t clone my repo on Android devices because of large file sizes.
experiments/
This folder contains all kinds of smaller projects. Some are only
tangentially related to my research, others are completely separate and
just wound up in this folder because spinning up a separate git repo is
not worth it. The key characteristic they all share is that I only work
on them for a few days to a few weeks, or maybe a few weekends. If they
stay relevant for longer, they should either be upgraded to their own
git repo or moved to the projects/ folder.
Here’s a list of typical examples from stuff I’ve put in this folder:
I also put dates in the folder names here, to keep a chronological
trail independent of filesystem attributes. Early on in my PhD I
braincoded a script ls.py to print the folders in
chronological order in accordance with the timestamp in the folder name.
It’s good to have, but I rarely feel the need to use it.
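Sorting folders by the date in their name, independent of filesystem timestamps, can be sketched as follows. This is not the original ls.py: it assumes the date appears somewhere in the name as YYYY-MM or YYYY-MM-DD (as in the course folder examples), and the function names are hypothetical.

```python
import re

# Matches YYYY-MM or YYYY-MM-DD anywhere in a folder name.
DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})(?:-(\d{2}))?")


def date_key(name):
    """Return (year, month, day) from the first date in name, or None."""
    match = DATE_PATTERN.search(name)
    if match is None:
        return None
    year, month, day = match.groups()
    # A missing day is treated as the first of the month.
    return (int(year), int(month), int(day or 1))


def chronological(names):
    """Sort folder names by embedded date; undated names go first, by name."""
    dated = [n for n in names if date_key(n) is not None]
    undated = sorted(n for n in names if date_key(n) is None)
    return undated + sorted(dated, key=lambda n: (date_key(n), n))
```

Something like `chronological(os.listdir("experiments"))` would then give the listing in date order, with the undated early folders grouped at the top.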
Currently there are 138 experiments, which is equivalent to starting a new experiment every 13 days or so since I started my PhD. That feels about right. Have a look at this nice plot:

There are definitely some flat parts of the plot, but on the whole it looks pretty steady. There are 20 experiments I started early in my PhD when I wasn’t putting dates in the folder names yet. In the plot, I put those all in the first month, but that’s not really realistic. It might explain why the early part of the plot is not as steep as the rest of the graph. I’d like to go and put dates into those folders retroactively, but I haven’t found a reliable method to determine those dates yet. I’m sure git can tell me.3
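One way git might help with the undated folders is to ask for the oldest commit that touched a given path. The sketch below is an untested idea rather than a method I have validated: it shells out to `git log` with short dates and takes the earliest one. Note that this gives the date of the first commit, not the date the folder was created, and it will not see history from before a folder was renamed. The function names are hypothetical.

```python
import subprocess


def earliest_date(log_output):
    """Return the earliest YYYY-MM-DD line in git log output, or None."""
    dates = [line.strip() for line in log_output.splitlines() if line.strip()]
    # ISO dates sort chronologically when sorted as plain strings.
    return min(dates) if dates else None


def first_commit_date(folder, repo="."):
    """Ask git for the date of the oldest commit that touched folder."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--format=%ad", "--date=short", "--", folder],
        capture_output=True,
        text=True,
        check=True,
    )
    return earliest_date(result.stdout)
```

With a habit of committing twice a day, the first commit date should usually be within a day of when the folder actually appeared.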
finalResults/
This is my favourite folder of them all. Whenever I complete some
significant milestone or project, I put the related files in a dedicated
subfolder in finalResults/. I started doing this to keep
track of papers and presentations, but now I use it for important things
generally: important results related to a publication, conference stuff,
proposals, standalone presentations, academic achievements, and custom
course material.
The most important benefit of this folder is that whenever I want to reference or send someone a previous result, I can just go into this folder and get a PDF or sources in two clicks or so. Putting everything in here takes some maintenance, but it becomes a valuable resource in the long term.
There are two structures to this folder: the outer structure, which is essentially a naming scheme for milestone folders, and the inner structure, which governs what files are in a milestone folder and their naming.
Each milestone folder follows the following naming scheme:
date - occasion - title or description (result types...)
Here’s a concrete example of what that looks like:

Having all this information in the folder names makes the folder listing pleasant to browse if you’re looking for something. I didn’t start out with this naming scheme, I only started doing this somewhere in 2022 when I noticed I was having a hard time finding previous results.
The folder names can get a bit long, especially if you have a long title and a bunch of result types to add. In practice, this doesn’t matter too much. The longest line I have takes up half my 1080p screen, so there’s even room for an even longer folder name still 🙂.
The parts of the naming scheme are used as follows:
date: a standard YYYY-mm-dd date. This
ensures the list also sorts chronologically when sorted alphabetically.
This is not an official date or anything, just the date on which I
created the milestone folder, or, if I create multiple milestone folders,
the dates of the days after as well.
occasion: usually the title of the event or journal,
ideally including a year if this makes sense. This is usually the case
for conferences and journals, e.g. ETAPS-2022 or iFM-2024. I try to keep
this one short.
title: the title of the milestone, or if that doesn’t
apply, a short description. Sometimes the occasion is already
descriptive enough, in that case I leave the title out.
result types: this is a comma separated list of results
that are part of the milestone. I try to reuse types as much as
possible, but I’m also not too hesitant to create a new one if it feels
right. I have used the following types so far:
Some of these are one-off types. E.g. “working directory files”, which is only used for my masterthesis milestone folder. It includes a bunch of interesting notes and other files that I’d like to keep around. Others appear frequently, e.g. paper and presentation.
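The naming scheme `date - occasion - title or description (result types...)` is regular enough to build and parse mechanically. The following is a minimal sketch under a few assumptions that the scheme itself does not state: parts are separated by " - ", the occasion contains no " - ", and a trailing parenthesised group is always the list of result types. The function names are hypothetical.

```python
from datetime import date


def milestone_name(day, occasion, title="", result_types=()):
    """Build 'date - occasion - title (type, type)'; optional parts may be empty."""
    parts = [day.isoformat(), occasion]
    if title:
        parts.append(title)
    name = " - ".join(parts)
    if result_types:
        name += " (" + ", ".join(result_types) + ")"
    return name


def parse_milestone_name(name):
    """Split a milestone folder name back into its parts.

    Assumes the occasion contains no ' - ' and that a trailing
    parenthesised group, if present, holds the result types.
    """
    result_types = []
    if name.endswith(")") and " (" in name:
        name, _, inner = name[:-1].rpartition(" (")
        result_types = [t.strip() for t in inner.split(",")]
    day, occasion, *rest = name.split(" - ")
    return {
        "date": date.fromisoformat(day),
        "occasion": occasion,
        "title": " - ".join(rest),
        "result_types": result_types,
    }
```

For instance, `milestone_name(date(2022, 4, 5), "ETAPS-2022", "An example title", ["paper", "presentation"])` yields `2022-04-05 - ETAPS-2022 - An example title (paper, presentation)`; the date and title here are invented for the example.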
Whenever I put something in a milestone folder, I try to approach it
from an archivist’s point of view: what do I want to find in this folder
in 20 years, and what would be the best form to store it in? For now,
for each part of a milestone folder, I try to add the final form (e.g. a
finished PDF) and its sources (e.g. .tex files), cleaned
up to only contain what’s needed to reproduce the final PDF, and nothing
more.
If relevant, I include other files. E.g. for papers I typically include the camera-ready version, which I can share freely, and the final published version, which I can’t.
Here’s what that looks like for one of my papers:

The naming scheme here is again fairly rigid:
result type - title.extension for files,
resultSources/ for folders containing sources for result
types, or just result type/ if the result does not consist
of one particular file. Again, this is to optimize for browsing and
to make it possible to catch missing files by skimming the content list of each
folder. It might not work for everyone, but I like it so far.
projects/
This is a simple one. Each subfolder of projects/
contains all files related to that project. Whenever I have an idea that
I think will take a while to explore, or a folder in
experiments/ that I’ve been working on for a long time, I
make a folder in projects/.
If you’d graph out the number of files per project folder, I think you’d get a pretty long tail, in the sense that there are a few projects with most files, and most projects having only a few. Actually, let’s do that:

(I kicked project 32 out of the graph because it had a few git repos in there that artificially inflated the numbers by a few orders of magnitude.)
This is roughly what I expected, but not entirely. The red bars are projects that actually led to a concrete output (student report, paper, etc.). The green bars are either ongoing, or, let’s say, no longer promising. In particular, projects 19, 28, 29 and 30 became chapters in my thesis. I expected the finished projects to be more heavily biased to the right than they actually are.
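The per-project file counts behind a graph like this can be gathered with a short script. This is a sketch, not the script I used: it counts files in each direct subfolder and skips the contents of any `.git` directory, which is a different way of dealing with nested repos than dropping a whole project from the graph. The function name is hypothetical.

```python
import os


def files_per_subfolder(root):
    """Count files in each direct subfolder of root, skipping .git folders."""
    counts = {}
    for entry in sorted(os.scandir(root), key=lambda e: e.name):
        if not entry.is_dir():
            continue
        total = 0
        for _, dirnames, filenames in os.walk(entry.path):
            # Nested git repos would inflate the count with internal objects.
            dirnames[:] = [d for d in dirnames if d != ".git"]
            total += len(filenames)
        counts[entry.name] = total
    return counts
```

Sorting the resulting dictionary by value gives the long-tail ordering directly. Working-tree files of nested repos are still counted, so a project that vendors large repos may still stand out.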
Funnily enough, even though I have some nice schemes for putting
dates in filenames almost everywhere, in the projects/
folder I don’t do that at all. I’m honestly not sure why not!
The projects/ folder is also where I used to keep files
like my daily logbook, my project file with all my high-level notes and
tasks per project, and some project-related archive files. A few months
ago I moved those to a separate repo, purely for technical reasons: so I
can also have a copy on my phone and tablet via git.
reviews/
Here I keep reviews of journal and conference papers, both written by and for me. Its mere presence provides a nice trail of all the reviewing I’ve done and received so far, which is nice to have for when you need to write such things down under the “community service” bullet in your CV. It’s also nice to have a repository of reviews lying around for when you need inspiration for how to start.
The review folders follow the following naming scheme:
conference abbreviation with year - review types (date)/.
Concretely that looks like this:

So far I’ve only reviewed papers and artifacts, so the review type labels are not so helpful. I still think they look nice.
One thing I didn’t do, but which I wish I did, was keep better track of which review points I addressed in papers I co-authored, and how much I addressed them.
What I would usually do was just paste all reviews in a text file and start working through them top to bottom. Whenever I would be satisfied with the changes for one review point, I’d put “(ok)” in front of it or something similar, and move on to the next. This was a simple and effective way of tracking my progress, and made it possible to pick up where I left off the next day. When I finished addressing reviews, I’d just delete this progress file.
While simple and effective, the downside is that you don’t keep track of what possible holes (and more importantly, their sizes) are still present in your work when you publish a paper. When the day of my defense came, I knew there were still small problems in my papers, but I had a hard time tracking them down. If I just would’ve saved the tracking list, possibly with a 1-line explanation per review point about how I addressed it, that would’ve made re-reading the papers a lot quicker and more effective.
In truth, I have to admit I’m not sure I could’ve predicted the questions I was asked at my defense even if I’d had access to these tracking lists. Nevertheless, reviewer feedback is valuable information, so I suggest you keep track of it, if only with a few words per item on how you tackled it, or whether you tackled it at all.
students/
Every student I supervise gets a folder here. It’s not really a place I do actual work, except on the rare occasion that a student gets stuck and I actually need to do some debugging. In general it’s mostly a dumping ground for files related to the student: meeting notes, emails I want to save, significant outputs (code, patches, reports, etc.), pictures or receipts.
The naming scheme here is to just use the name of the student in question as the folder name. I have not yet had a name collision, nor have I supervised the same student twice! But those are just name collisions waiting to happen. I should probably start putting dates in these folder names as well.
I will probably rename this to supervision in the near
future. “Student” is strongly tied to an academic context, and there’s a
good chance the context or people I supervise will change in the long
term.
teaching/
Here there’s a folder for each course where I contribute to teaching. Sometimes there are slides in there, communication with students I want to hang on to, or other kinds of notes. Anything related to a course that might be useful later.
One particular recurring file is
Points for next year.md. Whenever I encounter something
that seems useful to improve, but now’s not the time (as is frequently
the case when you’re teaching), I put it in this file. It contains ideas
from the entire spectrum from small to large: from small notes on how to
do installation of optional tools, to ideas for how parts of the course
should be restructured.
There is a certain threshold for these ideas, though: they should be
completable in a few months leading up to the next installment of the
course. If they’re smaller than that, I try to apply them anyway,
possibly changing lectures or other material I’ve already handed out to
students. Next year’s material will be copied from this year anyway, so
that way the changes find their way into the course organically. If the
change needs more than a few months of prep, or merely needs a longer
timeline, I try to put it on my personal task list, or, even better,
somewhere in my calendar as an appointment. That way I reduce the chance
a bit that the Points for next year.md file becomes just
another dumping ground for large projects I won’t feel like doing later
on.
The subfolders within teaching/ are dated. E.g.:

(Yes, PP and PPPP are different courses.)
Beyond putting dates in the names, there’s not much of a naming scheme. Ideally, the naming is consistent over the years, so sorting alphabetically also groups courses. But even that is optional when course names and content change.
Of course, I didn’t just come up with these folders when I made the
first commit to my PhD git repo on June 2nd, 2020. I had some
expectations about what would work well
(e.g. experiments/), but over the years the structure grew
mostly organically. This includes some folders that I thought would be
nice, but which ultimately turned out to be less useful, or which turned
out to be located in the wrong place, and had to be moved.
One pattern I’ve found that fairly accurately predicts if a folder will turn out to be useful or not, is the following: topics bound by time, or not exactly related to a research project, probably shouldn’t be a top-level folder. This makes sense: if they’re not related to research, their purpose will not come up often in my day to day work. If they expire at some point, from then on they will be cluttering the root directory. In either case the folder is better off being moved somewhere else.
Of course, there are exceptions to this rule. For example, for major milestones, it’s nice having them as a top-level folder. E.g. having “defense” be a top-level folder was not only a smart move in terms of being able to easily navigate to it, it also felt motivating. In addition, obviously if you feel that something should be a top-level folder, it’s okay to put it there. That’s how I arrived at most of the structure of my research folder.
Finally, moving a folder around is usually okay if you don’t have tools or processes that depend on the exact path. E.g. the “worst” that happened to me is that I’ve had a few recent file shortcuts break because I’d been moving folders around. Usually, such problems can be solved through reconfiguration, or just re-opening the file in my case.
Here are some of the folders that failed and got, or will be, removed.
meetings/
This folder is still a top-level folder, but I don’t use it anymore. I’ve found that, ideally, every meeting should be related to some project, or in other words a short, medium or long-term goal. I’m not saying you shouldn’t have meetings that are not directly related to your day-to-day, but if you keep notes on such meetings and put them in an isolated folder, it’s unlikely you’ll even remember to look for them later.
In the rare case I do have meetings like that, now I just put them as a bullet in my daily logbook. I got this strategy from Jeff Huang’s productivity text file, the only difference being that he puts all his meeting notes in this text file, whereas I only do this with notes from uncategorizable meetings.
Putting notes in my daily log works better than a folder dedicated to
meeting notes: if something related comes up in the future, I’m more
likely to remember in which period the meeting took place, or at least
what I was working on at the time. That’ll help me find the notes in
my daily log. As a small bonus, the notes being in my daily log
increases the chance I’ll stumble upon them when randomly browsing my
log file. Both of these benefits are not there if you put notes into
inert dated subfolders of meetings/.
Another example is PhD (now postdoc) progress meeting notes. During
my PhD progress meetings, we would usually discuss between 1 and 3 of my
ongoing research projects, plus students we’d be supervising and other
things that happened to be relevant at the time. This made it difficult
for me to decide where to put these notes, so it made sense that, at the
time, I decided to create a meetings/ folder.
Instead, what I do now is to put such generic work notes into a PhD
project (now postdoc) folder in the projects/ folder. This
works well in practice. I’ve now had several occasions where I wanted to
remember something related to a research project or student. Each time
I realized it was something I had talked about with one of my supervisors, this
folder turned out to be the right place to start looking.
jobSearch/
When I was looking for jobs around the end of my contract I needed a place to store a bunch of files and notes around that process, so I made this folder. Later I realized this fits better with my other personal data in my stack folder, where I also store my tax-related files, pictures, etc., as it’s actually a somewhat personal topic and not so technical if you think about it.
sites/
At some point I had the idea of designing and deploying my personal website from my PhD folder/git repository. I was expecting some people around me to need a small website soon as well, so I figured, let’s put it into this top-level folder. That need never materialized, and my personal site felt like a serious project that should get its own git repository.
Deploying a research site from your personal research folder is still
a good idea, but I think if the need ever comes up again I’d just make
it a subfolder of projects/.
I think all of these folders represent important and distinct aspects of daily PhD work. To put it another way: if you’d ask me what kind of activities are generally involved with doing a PhD, I’d give a few answers.
First and foremost, your job is to formulate, then answer, research
questions. That is, working out the details and implications of a
particular line of research. You will need to do
experiments/, most of which will fail, but some of which
will grow into longer running projects/. The foundation of
these projects are the papers you will collect in your
bibliography/, as well as the collab/orations
you’ll undertake as you’ll inevitably run into the limitations of not
just your field, but yourself as a person.
You’ll probably have to do some teaching/ and grade a
bunch of exams when you’d rather be working on a deadline. If you’re
lucky, the amount of teaching you’ll have to do will be bounded by 20%
or so. If you’re luckier, most of the teaching you’ll be doing in the
form of supervising students/. You won’t always be the
teacher, though, as most universities have graduate schools where you’ll
follow some courses/. Some mandatory, and some because they
look genuinely interesting.
As you progress through your PhD, most likely you’ll produce some
finalResults/ that you can be proud of: papers,
presentations, and other creative expressions of the knowledge you’re
accumulating about your niche. You’ll also handle the day to day
admin/ work of a PhD: replying to emails, cleaning up your
inbox, and clicking the occasional button in the university’s HR webapp.
Inevitably, as research directions fail to pan out and the years slip
by, some of these endeavours you’ll have to archive/. You
realize each archived folder brings you closer to the truth we so
frantically pursue as scholars (and also, publications).
Finally, you will experience becoming and being part of a community.
You will go to conference/s to discover the work of others,
and even better, to spread the word of the cool things you’ve
discovered. If you’re lucky, you’ll have some nice colleagues with you
to show you around and introduce you. However, inevitably, as you go
deeper into your niche, you will go to a conference entirely on your
own. Imagine that! Going on your own to an event specifically organized
for people with your particular interest. How will you ever manage to
strike up a conversation out of the blue??? If you’re in computer
science, you will most likely find this simultaneously exhilarating and
frightening.
As part of the community effort, your supervisor will give you the
opportunity to review/ papers every once in a while. This
is a good chance to see how other people work, and more importantly, to
see the spectrum of quality of work that people from other groups
produce. I guarantee you you will be surprised, likely multiple
times.
Whew! There you have it, all the folders that structure all files in my day-to-day. It might be a bit much to take in. If there’s anything I think you should take away from this post, it’s not that I think you should exactly copy my system, but this:
Flexibility and personalization are key.
Clearly my needs were not as I initially understood them to be, which required some folders to be renamed, moved, or even deleted. On top of that, my needs changed over time, causing more changes as folders became irrelevant. What’s important is that you learn to recognize the need for these changes, and act upon them when it’s the most convenient and effective to do so. This more or less boils down to starting early with some kind of structure, and not being afraid to make changes when it’s not working. Even when you’re on a deadline. Just keep making small changes, and you’ll end up with a nicer structure in the long run.
One last thing I also want to mention is that I don’t think you need an explicit structure like this. I’ve seen plenty of researchers more productive than me who just wing it, so clearly the structure is optional. Some might even say a chaotic approach to managing research data stimulates insight. I’m not sure if I’d go that far.
I hope there are some ideas in this post which you can use to improve your own PhD personal data structure!
License: CC-BY-SA.