I recently wrote a quick little bot to download Bitcoin (BTC) trade data from mtgox.com using their API. The data was very easy to parse using Python, and I wrote it to a tab-delimited text file for analysis in Stata, as I’m interested in the Bitcoin market and am considering purchasing some BTC. I’ve uploaded the dataset to Buzzdata if you are interested in Bitcoin as well or want to follow along with this post.
A quick -twoway- plot shows an interesting trend, but the x-axis labels are nonsensical to (most) humans because they are raw unix timestamps. Below I’ll clean up the x-axis by converting the unix timestamp to a date variable Stata can use and then formatting the axis labels.
I read a little of an excellent post on the Stata blog about dates and times from other software, but it didn’t address unix timestamps explicitly. The approach is similar to the SAS conversion described in that post, with the difference that unix timestamps count seconds since January 1, 1970 (UTC), whereas a Stata clock (%tc) variable counts milliseconds since January 1, 1960. So to convert a unix timestamp to a %tc-formatted variable, one could use something like:
* convert unix timestamp to %tc EST
clonevar datetime = unixtime
replace datetime = datetime*1000 + msofhours(24)*3653 - msofhours(5)
format %tc datetime
The important bit is the -replace- line. Multiplying datetime by 1000 accounts for Stata’s measurement of time in milliseconds; the msofhours(24)*3653 term adds the 3,653 days between January 1, 1960 and January 1, 1970; and the last term subtracts five hours to express the time in Eastern Standard Time (EST), which is GMT-5:00.
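Since the bot was already written in Python, the arithmetic is easy to cross-check there. The snippet below is only a sketch: the function name `unix_to_stata_tc` and its `utc_offset_hours` parameter are my own, made up for illustration, and it assumes the timestamps are whole seconds in UTC.

```python
from datetime import datetime, timezone

# Epochs: Stata's %tc counts from 01jan1960, unix time from 01jan1970.
STATA_EPOCH = datetime(1960, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MS_PER_HOUR = 60 * 60 * 1000


def unix_to_stata_tc(unix_seconds, utc_offset_hours=-5):
    """Mirror the Stata -replace- line: seconds -> ms, shift epoch, shift zone.

    utc_offset_hours is a hypothetical parameter; -5 reproduces the
    fixed EST offset used in the post.
    """
    days_between_epochs = (UNIX_EPOCH - STATA_EPOCH).days
    return (unix_seconds * 1000
            + days_between_epochs * 24 * MS_PER_HOUR
            + utc_offset_hours * MS_PER_HOUR)


# Number of days between the two epochs (includes three leap days).
print((UNIX_EPOCH - STATA_EPOCH).days)  # 3653
```

Note that a fixed offset of five hours ignores daylight saving time, so summer timestamps come out one hour off from local Eastern time.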
This still isn’t quite what I want, because the x-axis of my graph would now show times as well as dates and get really cluttered. So I create one more variable, converted from a clock value to a daily date with -dofc()- and formatted as %td, and use that in my plot below.
clonevar day = datetime
replace day = dofc(day)
format day %td
twoway (line price day, sort xtitle("Date") ytitle("Price (USD)") title("Bitcoin (BTC) prices")), xlabel(#4, labsize(small) format(%tdMon_DD,_CCYY))
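To see what the clock-to-day step does numerically, here is a small Python sketch. It assumes -dofc()- amounts to floor-dividing the millisecond value by the number of milliseconds in a day; the helper names `tc_to_td` and `td_to_date` are hypothetical, not Stata functions.

```python
from datetime import date, timedelta

MS_PER_DAY = 24 * 60 * 60 * 1000


def tc_to_td(tc_ms):
    """Days since 01jan1960 for a %tc-style millisecond value (floor division)."""
    return tc_ms // MS_PER_DAY


def td_to_date(td_days):
    """Calendar date for a %td-style day count since 01jan1960."""
    return date(1960, 1, 1) + timedelta(days=td_days)


# Unix time 0 expressed in EST, as produced by the conversion in the post:
# 3653 days minus 5 hours, in milliseconds.
tc_example = 3653 * MS_PER_DAY - 5 * 60 * 60 * 1000
print(td_to_date(tc_to_td(tc_example)))  # 1969-12-31
```

The example lands on December 31, 1969 rather than January 1, 1970 because subtracting five hours moves midnight UTC back to 7 p.m. the previous evening in EST.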
One last thing to mention: there is a big (or not so big, depending on how precise you need to be) difference between Stata’s %tc and %tC formats. I use the first, which ignores leap seconds, while the second accounts for them. For more details, see the Stata blog or the Stata help files on dates and times.