Open Sourcing my Journal Analysis
In case we haven’t talked about it, I’ve been keeping a daily journal for almost 10 years now. Each entry is written at the end of the day, and I’ve been pretty consistent. Sometimes entries are small, sometimes they’re huge. But it’s been a useful tool for writing down my thoughts at the end of the day.
I mostly keep my journal as two things:
- A source of what happened that day.
- A medium to write about my day or thoughts.
Code is available here
Tools to help
The other thing I haven’t talked about as much, though, is that I’ve also analyzed the hell out of my entries. At first it was mostly just silly things, like tracking names across time periods to see who gets mentioned a lot, though I just used a whitelist of the names I wanted to track.
Then it was trying to find common themes and some other more specific patterns. I wanted to open source some of my tracking and analysis, as well as take a stab at learning some natural language tools. Most of this isn’t going to get you real insights, but it’s fun to see some interesting long-term behavior in your writing. There are also some additional fun features, like Facebook’s day in history feature.
Example Entry
Here’s an example entry along with a sentiment score. The score goes from -1 to 1, where -1 is a super negative day and 1 is a super positive one. I also built a little button that shows how the sentences were analyzed, so you can find out which ones were super negative and positive. Out of the box it’s using the NLTK sentiment analyzer, which is built from a corpus of IMDb movie reviews. Not exactly the best source of sentiment for a journal, but I’m going to continue building out the tools to quickly build your own classifier.
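To make the scoring concrete, here’s a toy sketch of lexicon-based sentence scoring. This is not the repo’s actual code — the word lists and function names are made up for illustration, and the real app uses NLTK — but it shows the shape of the idea: score each sentence, then average them for the day.

```python
# Toy lexicon-based sentiment, for illustration only (the app uses NLTK).
POSITIVE = {"great", "happy", "fun", "love"}        # hypothetical word lists
NEGATIVE = {"worried", "tired", "bad", "stressful"}

def sentence_score(sentence: str) -> float:
    """Score one sentence in [-1, 1]: (pos - neg) / total sentiment words."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

def entry_score(entry: str) -> float:
    """Average the per-sentence scores, the way the entry view summarizes a day."""
    sentences = [s for s in entry.split(".") if s.strip()]
    return sum(sentence_score(s) for s in sentences) / len(sentences)

print(entry_score("Today was great. Work was stressful."))  # → 0.0
```

The per-sentence breakdown is also what drives the little button: each sentence carries its own score, and the entry just averages them.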
Day in History
A day in history view is kind of fun on Facebook, so I built a quick tool to give you that same experience in the app.
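The lookup behind a feature like this is simple. Here’s a sketch, assuming entries are keyed by ISO date strings — the sample data and function name are hypothetical, not the app’s real schema:

```python
# Sketch of a "day in history" lookup over entries keyed by ISO date (hypothetical data).
from datetime import date

entries = {
    "2018-04-02": "Started a new project.",
    "2019-04-02": "Long walk, wrote a lot.",
    "2019-07-15": "Quiet day.",
}

def on_this_day(journal: dict[str, str], today: date) -> list[tuple[str, str]]:
    """Return every past entry written on the same month and day, oldest first."""
    matches = []
    for iso, text in sorted(journal.items()):
        d = date.fromisoformat(iso)
        if (d.month, d.day) == (today.month, today.day) and d.year != today.year:
            matches.append((iso, text))
    return matches

print(on_this_day(entries, date(2024, 4, 2)))
# [('2018-04-02', 'Started a new project.'), ('2019-04-02', 'Long walk, wrote a lot.')]
```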
N-Grams
An n-gram is a linguistics term for a common word combination; the n refers to the number of words. So common ones for me at n = 3 are “I don’t know” and “we’ll see how”. I think the interesting insight is that you sometimes see expressions popping up in smaller time periods that show up in these n-gram graphs. During the peak of COVID-19 there was a lot of “I’m worried about”. When you take larger values of n you’ll see longer common expressions come out. I’ll occasionally learn about one of my common patterns and actively try to avoid using it. It’s a useful exercise to change your writing sometimes. It’s good to shake things up.
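Counting n-grams takes only a few lines. A minimal sketch — the helper name and sample text are made up, and the real repo may tokenize differently:

```python
# Minimal n-gram counter over journal text (a sketch, not the repo's implementation).
from collections import Counter

def top_ngrams(text: str, n: int = 3, k: int = 2) -> list[tuple[str, int]]:
    """Return the k most common n-word sequences, lowercased, punctuation stripped."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return Counter(grams).most_common(k)

text = "I don't know what happened. I don't know why. We'll see how it goes."
print(top_ngrams(text, n=3, k=1))  # [("i don't know", 2)]
```

Run this per month or per year instead of over the whole corpus and you get the time-period graphs described above.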
Sentiment Analysis
I’ve built some graphs that show average sentiment over time, which have been useful for seeing patterns. My average sentiment is fairly negative, which I think is pretty accurate for me. The next step is to build out the tools for training your own classifier. I’ve also talked about this in the README.md of the open source repo.
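The time-series graphs boil down to bucketing per-entry scores by month and averaging each bucket. A small sketch, assuming hypothetical (ISO date, score) pairs rather than the repo’s real data format:

```python
# Sketch: average per-entry sentiment by month (hypothetical sample data).
from collections import defaultdict

scores = [
    ("2020-03-01", -0.4), ("2020-03-15", -0.2), ("2020-04-02", 0.3),
]

def monthly_average(pairs: list[tuple[str, float]]) -> dict[str, float]:
    """Group scores by their YYYY-MM prefix and average each bucket."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for iso, score in pairs:
        buckets[iso[:7]].append(score)
    return {month: sum(v) / len(v) for month, v in sorted(buckets.items())}

print(monthly_average(scores))  # one averaged value per month, in date order
```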
Next Steps
The next steps I want to take are just more silly things, but I’m always looking for ideas. What would you be interested in if you could see your daily records going back months?
My next features:
- Names over time. I did this a long time ago and thought it was fun to see who comes in and out of my life and how often they appear. It could also show you a graph of how frequently people appear in your entries based on names, though this doesn’t work all that well. I don’t really use last names in my journal entries, so Michael could be my friend Michael or another friend Michael. The other hard part is that NLTK parses any proper noun as a “name”, so if you mention something like “Star Trek” it will count as two entries, “Star” and “Trek”. It might be useful to improve this by creating allow/blocklists.
- Making it easier to use your own classifier instead of the stock NLTK one. I might also explore adding some classifiers that are more specific than general sentiment analysis on a sentence. Right now it’s just “give me a sentence and tell me positive, negative, or neutral”. I could build feature extractors and other tools to find super common important words.
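To illustrate the allow/blocklist idea from the first feature, here’s a toy counter over capitalized tokens. The lists and function name are hypothetical, and real code would lean on NLTK’s proper-noun tagging rather than a bare `istitle()` check:

```python
# Toy name counter with allow/blocklists (hypothetical lists; not the repo's code).
ALLOW = {"Michael", "Sarah"}   # names I actually want tracked
BLOCK = {"Star", "Trek"}       # capitalized tokens that aren't people

def count_names(text: str) -> dict[str, int]:
    """Count capitalized words, keeping allowlisted names and dropping blocked ones."""
    counts: dict[str, int] = {}
    for raw in text.split():
        w = raw.strip(".,!?")
        if w.istitle() and w not in BLOCK and (not ALLOW or w in ALLOW):
            counts[w] = counts.get(w, 0) + 1
    return counts

print(count_names("Watched Star Trek with Michael. Michael loved it."))  # {'Michael': 2}
```

With an empty allowlist this falls back to counting every capitalized token except the blocked ones, which is roughly the failure mode described above.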