Working with Unix timestamps in Stata

I recently wrote up a quick little bot to download Bitcoin (BTC) trade data from mtgox.com using their API. The data was very easy to parse using Python and I wrote the data to a tab-delimited text file for analysis using Stata as I’m interested in the bitcoin market and considering purchasing some BTC. I’ve uploaded the dataset to Buzzdata if you are interested in Bitcoin as well or in following along with this post.

A quick -twoway- plot shows an interesting trend but the x-axis label is non-sensical to (most) humans. Below I’ll cleanup the x-axis label by converting the unix timestamp to something Stata can use and to format the x-axis.

I read a little of an excellent post on the Stata blog about dates and times from other software, but this didn’t address unix timestamps explicitly. The approach is similar to that used with the SAS conversion as mentioned in that post but with the difference that unix timestamps are seconds since January 01, 1970. So to convert a unix timestamp to a Stata clock (%tc) formatted variable one could use something like:

* convert unix timestamp to %tc EST
clonevar datetime = unixtime
replace datetime = datetime*1000 + msofhours(24)*3653 - msofhours(5)
format %tc datetime

The important bit is the middle line. Multiplying datetime by 1000, accounts for Stata’s measurement of time in milliseconds, the -msofhours()- part accounts for the number of days between January 1, 1960 and January 1, 2970, and the last part subtracts five hours from the time to express it in Eastern Standard Time (EST) since it is GMT – 5:00.

This still isn’t quite what I want because now the x-axis of my graph will show the times and get really cluttered, so I create one more date variable which is formatted daily (%td) and use that in my plot below.

clonevar day = datetime
replace day = dofc(day)
format day %td
twoway (line price day, sort xtitle("Date") ytitle("Price (USD)") title("Bitcoin (BTC) prices")), xlabel(#4, labsize(small) format(%tdMon_DD,_CCYY))

One last thing to mention, note that there is a big (or not so big depending on how specific you need to be) difference between Stata’s %tc and %tC formats. I use the first, while the second will account for leap seconds. For more details, you can read up more on the Stata blog or Stata help files on dates and times.

Good luck!

This entry was posted in Uncategorized and tagged , , , , . Bookmark the permalink.

One Response to Working with Unix timestamps in Stata

  1. Nick Cox says:

    A minor side-issue:

    With such a graph, no need for any axis title at all. “Date” is redundant, and you can just blank out the default axis title instead.

    I routinely see graphs with something like “1980 1990 2000 2010″ as x-axis labels, but people feel compelled to add “Year” as a title. Why? Whichever teacher first insisted to them that every graph axis must have a title was imparting the right idea for almost all problems, but not this one.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>