Imhotep - Data Analytics Platform by Indeed

Sample Queries

This page includes sample queries to run on our demo cluster preloaded with four sample datasets: apachejira, nasa, wikipedia and worldcup2014.

Click here for information about the sample data.

apachejira

The following query shows the users who reported the most bugs in Apache Software Foundation projects:

from apachejira 2016-01-01 2017-05-01
   where action="create" issuetype="Bug"
   group by actor

The following query shows the 10 projects with the most reported bugs:

from apachejira 2016-01-01 2017-05-01
   where action="create" issuetype="Bug"
   group by project[10]

The following query returns the number of unique contributors per project:

from apachejira 2016-01-01 2017-05-01
   where status="Patch Available" fieldschangedtok="status"
   group by project
   select distinct(actor)

The following query shows the number of contributions per person in Hadoop Common, which is a very active project:

from apachejira 2016-01-01 2017-05-01
   where status=”Patch Available”  fieldschangedtok=”status” project=”Hadoop Common”
   group by actor
   select distinct(issuekey)

From the graph (as of 2017-05-01), the following insights are available:

  • There are 200 different people who have contributed a patch.
  • Only 50 people have contributed 5 or more patches.
  • On average, each person contributes 5 patches.

The following query shows the average number of hours it takes for a patch to be accepted per project:

from apachejira 2016-01-01 2017-05-01
   where prevstatus="Patch Available" status="Resolved" fieldschangedtok="status"
   group by project
   select timesinceaction\3600/count()

NASA

The following query on hourly counts returns a graph of the full time range of the dataset with the number of queries every hour:

from nasa 1995-06-30 22:00:00 1995-09-02 00:00:00 
  group by time(1h)

From the graph, the following insights are available:

The following query shows that the top 100 pages accessed during this peak hour were limited to shuttle liftoff coverage:

from nasa 1995-07-13 07:00:00 1995-07-13 08:00:00
  group by url[100]

This query lists the most popular non-image URLs on each day in the dataset:

from nasa 1995-07-01 00:00:00 1995-09-01 00:00:00 
  where url !=~ ".\*gif"
  group by time(1d), url[1 by count()]

Wikipedia

The following query returns the names of the most popular Wikipedia articles that start with E from one hour on 9/13/2014:

from wikipedia 2014-09-13 11:00:00 2014-09-13 12:00:00
  where title=~"E.\*"
  group by title[10 by numRequests]
  select numRequests

World Cup 2014

Team Captains

The following query returns the average age of captains and players of all other positions. The query also compares the number of appearances in the World Cup for the two groups of players. Team captains are on average almost 5 years older than other players and have 3 times as many national team appearances.

from worldcup2014 2014-07-01 2014-07-02 
  group by Captain 
  select Age/count(), Selections/count()]

The following query lists the captains, along with their club, country, position, and number of World Cup appearances.

from worldcup2014 2014-07-01 2014-07-02
  where Captain:1 
  group by Player, Country[], Club[], Position[] 
  select Selections

Clubs

The following query returns data for the top 25 clubs: number of players, number of captains, average country ranking of the team’s players, average player age. Barcelona has the most players in the World Cup (16), but Real Madrid/Man U have the most captains (2). Atletico Madrid has the highest average country rank for its players. Manchester City the oldest players, Schalke 4 the youngest.

from worldcup2014 2014-07-01 2014-07-02
  group by Club[25] 
  select count(), Captain, Rank/count(), Age/count()

Countries

The following query returns data by country: average player age and average number of World Cup appearances. Argentina has the oldest team, Ghana the youngest. Spain is the most experienced, Australia the least.

from worldcup2014 2014-07-01 2014-07-02 
  group by Country 
  select Age/count(), Selections/count()

Age versus Experience

The following query compares player age to the number of World Cup appearances. Not surprisingly, the older you are, the more appearances you’ve had, in general.

from worldcup2014 2014-07-01 2014-07-02 
  group by Age
  select Selections/count()

Jersey Numbers

The following query returns the number of players grouped by their jersey number. The query also returns the number of captains for each jersey number. Teams number all players 1-23. However, captains gravitate towards wearing #1, #4, #3, and #10.

from worldcup2014 2014-07-01 2014-07-02 
  group by Jersey 
  select count(), Captain

The following query groups documents by the player’s jersey number and then, for each jersey number group, returns the most common position for that jersey number. Some numbers are typically associated with a position: #1 is always the goalie, defenders are frequently #2 and #3, and #9 is usually a forward.

from worldcup2014 2014-07-01 2014-07-02 
  group by Jersey, Position[1]

Positions

The following query returns the average player age and average number of World Cup appearances by their position. Goalies are older and more frequently the captain. Forwards typically have the most experience.

from worldcup2014 2014-07-01 2014-07-02
  group by Position 
  select count(), 100\*Captain/count(), Age/count(), Selections/count()

Groups

The following query returns data about the World Cup groups: average number of World Cup appearances, average age, and average country rank. Group D and G were rough. Group F and H were easy. Group H was also the youngest and least experienced, while group C was the oldest and most experienced.

from worldcup2014 2014-07-01 2014-07-02 
  group by Group 
  select Selections/count(), Age/count(), Rank/count()