Studio711.com – Ben Martens


Free Kusto Cluster

I’ve written before about the Kusto big data tool (aka Azure Data Explorer or ADX). If your Azure budget is tightly restricted, you might not have a good way to play around with it and build enough confidence to push for it with your management.

Now you can visit http://aka.ms/kustofree and create a cluster for free very quickly! It’s obviously not a full-blown cluster, but you can do plenty of exploration (more info). I’ve been using one for my own random home projects and it’s great!
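
If you want a quick taste of the language before digging into the docs, here’s a minimal sketch of the kind of query you might run on a free cluster. The Requests table and its columns are just placeholders for whatever data you end up loading:

// Hypothetical table of web requests (stand in your own data here)
Requests
| where Timestamp > ago(7d)                              // only the last week
| summarize RequestCount=count() by bin(Timestamp, 1h)   // hourly counts
| render timechart with (title = 'Requests per Hour')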

The Kusto docs have a good guide for getting started with the language, and there’s even a short introductory course available. If you have access to Pluralsight courses, here are some good ones to check out:

Keep calm and Kusto on!

Disclosure: I work for Microsoft.

Azure Data Explorer Overview Talk

I’ve written about Azure Data Explorer (aka Kusto) before, and I realize that many of you aren’t geeks, but for those who have geek-ish tendencies, I want to share a great talk from our Israel team. (These are people I met on my trip last February.) It’s a couple hours long, but if you watch the first 45 minutes, it will give you a great overview of the product and maybe help you understand why I’ve been so excited to work with it for the past few years.

My job revolves around petabyte datasets and analyzing new data within minutes of it getting created at the source. This product was a game changer in what we were able to achieve and promise to our users. I realize that 45 minutes is a non-trivial chunk of time, but if you’re in the data space, I think you’ll find it valuable or at least intriguing.

Analyzing Water Data in Azure Data Explorer

One of my favorite systems at work officially launched a couple weeks ago as Azure Data Explorer (internally called Kusto). I’ve been doing some blogging for their team on their Tech Community site. You can see all my posts on my profile page. This post will use Azure Data Explorer too but I thought it fit better on this blog.

A year or two ago, our local water company replaced all of the meters with digital, cellular meters. I immediately asked if that meant we’d get access to more data and they said it was coming in the future. The future is now! If you happen to live in Woodinville, you can get connected with these instructions.

The site is nice and lets you see charts, but by now you probably know that I love collecting data about random things, so I immediately tried to figure out how to download the raw data. The only download directly supported from their site is the bi-monthly usage from the bills, but from the charts, I could see that hourly data was available somewhere. A little spelunking in the Chrome dev tools revealed the right REST endpoint to call to get a big JSON array full of the water usage for every hour in the last ~11 months.
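
If you’re curious how that JSON might get loaded into a cluster, here’s a rough sketch of the control commands involved. The table matches the queries below, but the $.timestamp and $.gallons paths are just guesses at what the endpoint’s payload looks like:

// Create the target table (run each control command separately)
.create table WaterUsage (Timestamp: datetime, Gallons: long)

// A JSON ingestion mapping so each property lands in the right column
// (the $.timestamp / $.gallons paths are assumptions about the payload)
.create table WaterUsage ingestion json mapping 'WaterUsageMapping'
'[{"column":"Timestamp","path":"$.timestamp"},{"column":"Gallons","path":"$.gallons"}]'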

I pulled that into Azure Data Explorer and started querying to see what I could learn. This first chart shows the median water usage by three-hour chunks of the day. Tyla and I usually both shower in the morning, so it makes sense that 6-9am has the heaviest usage.

WaterUsage
// Total gallons for each 3-hour chunk of each day
| summarize 
    sum(Gallons)
    by Hour=bin(hourofday(Timestamp), 3), bin(Timestamp, 1d)
// Median of those daily totals for each 3-hour chunk
| summarize percentile(sum_Gallons, 50) by Hour
| render columnchart with (title = 'Median Water Usage by 3 Hour Bin', legend = hidden)

I feel like there’s probably a better way to write the next query, but this works. It’s the cumulative usage throughout each month. The four lines at the top of the chart are the summer months when I’m using the irrigation in the yard. The lines that drop off at the end of the month are an artifact of running the x axis all the way from 1 to 31 for every month, so months that don’t have data for those final days come up short, but it still conveys the general idea. It’s interesting how similar all the non-watering months are.

union
(
    WaterUsage
    | summarize Gallons=sum(Gallons) by bin(Timestamp, 1d)
    | extend Month=monthofyear(Timestamp), Day = dayofmonth(Timestamp)
),
(
    // Original data had some missing rows
    datatable(Timestamp:datetime, Gallons:long, Month:long, Day:long)
    [
        datetime(2018-11-26T00:00:00.0000000Z), 0, 11, 26, 
        datetime(2018-11-27T00:00:00.0000000Z), 0, 11, 27, 
    ]
)
| order by Timestamp asc
// Running total that restarts whenever the month changes
| serialize MonthlyWater=row_cumsum(Gallons, Month != prev(Month))
| project Month, Day, MonthlyWater
// One series per month, plotted across days 1-31
| make-series sum(MonthlyWater) on Day from 1 to 32 step 1 by Month
| render linechart with (ycolumns = sum_MonthlyWater, series = Day, Month, legend=hidden, title='Cumulative Gallons By Month')
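
On the “probably a better way” note, here’s one untested sketch that skips the union and make-series steps. It wouldn’t patch over the missing rows the way the datatable above does, but it shows the same restart-per-month running total with less machinery:

WaterUsage
| summarize Gallons=sum(Gallons) by Day=bin(Timestamp, 1d)
| order by Day asc
| extend Month=monthofyear(Day), DayOfMonth=dayofmonth(Day)
// Running total that restarts when the month changes
| extend MonthlyWater=row_cumsum(Gallons, Month != prev(Month))
| render linechart with (xcolumn = DayOfMonth, ycolumns = MonthlyWater, series = Month, legend = hidden, title = 'Cumulative Gallons By Month')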

The data is in 10-gallon increments so it’s not super precise, but it’s a LOT better than the two-month resolution I had previously. I’m excited to play around with this data and see if we can start decreasing our usage.
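
As a starting point for tracking that, a query along these lines could roll the hourly readings up into a weekly trend (just a sketch against the same WaterUsage table):

WaterUsage
// One total per week makes the overall trend easier to see
| summarize WeeklyGallons=sum(Gallons) by Week=startofweek(Timestamp)
| render timechart with (title = 'Weekly Water Usage')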

Along these same lines, I heard that the local power company is starting to install power meters with Zigbee connectivity, so there’s a chance that I’ll be able to start getting more insight into my power consumption in a similar fashion…