HDFS top users by actions
It can be difficult to identify which activities are stressing your NameNodes.
This article shows how to list the top users per HDFS operation on your cluster by querying the NameNode JMX endpoint.
With a simple command such as curl, you can ask JMX which user is performing which operations on HDFS.
$ export NNURL="http://<your namenode hostname>:50070/"
$ echo "25 minutes top users" &&
  curl --silent "$NNURL"jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState |
  jq .beans[].TopUserOpCounts |
  sed 's/\\//g;s/.$//;s/^.//' |
  jq --compact-output .windows[1].ops[] |
  sort -t":" -n -k2 -r &&
  echo "5 minutes top users" &&
  curl --silent "$NNURL"jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState |
  jq .beans[].TopUserOpCounts |
  sed 's/\\//g;s/.$//;s/^.//' |
  jq --compact-output .windows[0].ops[] |
  sort -t":" -n -k2 -r &&
  echo "1 minute top users" &&
  curl --silent "$NNURL"jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState |
  jq .beans[].TopUserOpCounts |
  sed 's/\\//g;s/.$//;s/^.//' |
  jq --compact-output .windows[2].ops[] |
  sort -t":" -n -k2 -r
This shows the activity on your cluster per HDFS operation, with the top users (topUsers) and the total number of calls (totalCount) for each operation.
It can be helpful when you are looking for the action that is slowing down or hanging your cluster.
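The command above picks each window by its position in the `windows` array, which may differ from one cluster or Hadoop version to another. As a minimal sketch, the window can instead be selected by its length. This assumes that each window object carries a `windowLenMs` field in milliseconds and that `TopUserOpCounts` is a JSON-encoded string, so check the raw payload on your cluster first. The function name `top_ops_for_window` is hypothetical.

```shell
# Sketch: select a top-users window by its length rather than its index.
# Assumption: every entry of .windows has a "windowLenMs" field (milliseconds).

# top_ops_for_window MINUTES
# Reads the NameNode JMX JSON document on stdin and prints one compact JSON
# object per operation for the requested window, highest totalCount first.
top_ops_for_window() {
  jq --compact-output --argjson minutes "$1" '
    .beans[].TopUserOpCounts
    | select(type == "string")      # skip beans without the attribute
    | fromjson                      # decode the JSON-encoded string (replaces the sed step)
    | .windows[]
    | select(.windowLenMs == $minutes * 60000)
    | .ops
    | sort_by(-.totalCount)
    | .[]
  '
}

# Example, with NNURL exported as shown earlier:
# curl --silent "${NNURL}jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState" | top_ops_for_window 5
```

Using `fromjson` avoids stripping backslashes and quotes by hand, and sorting inside `jq` orders the operations by their actual `totalCount`.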
Below is an example of the output from a Hortonworks sandbox, one minute after startup. Try it on your real cluster: you will learn which operations are being run on it and, when something goes wrong, get good evidence of who is responsible.
{ "opType": "*", "topUsers": [ { "user": "hdfs", "count": 332 }, { "user": "spark", "count": 117 }, { "user": "zeppelin", "count": 61 }, { "user": "oozie", "count": 52 }, { "user": "ambari-qa", "count": 20 }, { "user": "hive", "count": 17 }, { "user": "mapred", "count": 12 }, { "user": "yarn", "count": 9 }, { "user": "kafka", "count": 4 }, { "user": "livy", "count": 4 } ], "totalCount": 628 }
{ "opType": "setPermission", "topUsers": [ { "user": "hdfs", "count": 8 } ], "totalCount": 8 }
{ "opType": "setOwner", "topUsers": [ { "user": "hdfs", "count": 203 } ], "totalCount": 203 }
{ "opType": "open", "topUsers": [ { "user": "zeppelin", "count": 22 } ], "totalCount": 22 }
{ "opType": "mkdirs", "topUsers": [ { "user": "hive", "count": 4 }, { "user": "ambari-qa", "count": 4 }, { "user": "kafka", "count": 1 }, { "user": "hdfs", "count": 1 }, { "user": "livy", "count": 1 } ], "totalCount": 11 }
{ "opType": "listStatus", "topUsers": [ { "user": "spark", "count": 29 }, { "user": "oozie", "count": 29 }, { "user": "hdfs", "count": 19 }, { "user": "mapred", "count": 10 }, { "user": "yarn", "count": 6 }, { "user": "zeppelin", "count": 2 }, { "user": "livy", "count": 2 } ], "totalCount": 97 }
{ "opType": "getfileinfo", "topUsers": [ { "user": "hdfs", "count": 87 }, { "user": "zeppelin", "count": 37 }, { "user": "spark", "count": 30 }, { "user": "oozie", "count": 23 }, { "user": "ambari-qa", "count": 14 }, { "user": "hive", "count": 12 }, { "user": "yarn", "count": 3 }, { "user": "kafka", "count": 2 }, { "user": "mapred", "count": 2 }, { "user": "livy", "count": 1 } ], "totalCount": 211 }
{ "opType": "delete", "topUsers": [ { "user": "spark", "count": 29 }, { "user": "hdfs", "count": 3 }, { "user": "ambari-qa", "count": 2 }, { "user": "hive", "count": 1 } ], "totalCount": 35 }
{ "opType": "create", "topUsers": [ { "user": "spark", "count": 29 }, { "user": "hdfs", "count": 4 }, { "user": "kafka", "count": 1 } ], "totalCount": 34 }
{ "opType": "contentSummary", "topUsers": [ { "user": "hdfs", "count": 7 } ], "totalCount": 7 }
The complete output for the 25, 5 and 1 minute windows:
25 minutes top users
{"opType":"*","topUsers":[{"user":"spark","count":84},{"user":"ambari-qa","count":10},{"user":"hive","count":9},{"user":"mapred","count":6},{"user":"yarn","count":5},{"user":"oozie","count":5}],"totalCount":119}
{"opType":"mkdirs","topUsers":[{"user":"ambari-qa","count":2},{"user":"hive","count":2}],"totalCount":4}
{"opType":"listStatus","topUsers":[{"user":"spark","count":21},{"user":"mapred","count":6},{"user":"yarn","count":5},{"user":"oozie","count":4}],"totalCount":36}
{"opType":"getfileinfo","topUsers":[{"user":"spark","count":21},{"user":"ambari-qa","count":7},{"user":"hive","count":6}],"totalCount":34}
{"opType":"delete","topUsers":[{"user":"spark","count":21},{"user":"ambari-qa","count":1},{"user":"hive","count":1}],"totalCount":23}
{"opType":"create","topUsers":[{"user":"spark","count":21}],"totalCount":21}
5 minutes top users
{"opType":"*","topUsers":[{"user":"spark","count":24},{"user":"yarn","count":1},{"user":"mapred","count":1},{"user":"oozie","count":1}],"totalCount":27}
{"opType":"listStatus","topUsers":[{"user":"spark","count":6},{"user":"yarn","count":1},{"user":"mapred","count":1},{"user":"oozie","count":1}],"totalCount":9}
{"opType":"getfileinfo","topUsers":[{"user":"spark","count":6}],"totalCount":6}
{"opType":"delete","topUsers":[{"user":"spark","count":6}],"totalCount":6}
{"opType":"create","topUsers":[{"user":"spark","count":6}],"totalCount":6}
1 minute top users
{"opType":"*","topUsers":[{"user":"hdfs","count":332},{"user":"spark","count":313},{"user":"zeppelin","count":61},{"user":"oozie","count":60},{"user":"ambari-qa","count":40},{"user":"hive","count":35},{"user":"mapred","count":23},{"user":"yarn","count":17},{"user":"kafka","count":4},{"user":"livy","count":4}],"totalCount":889}
{"opType":"setPermission","topUsers":[{"user":"hdfs","count":8}],"totalCount":8}
{"opType":"setOwner","topUsers":[{"user":"hdfs","count":203}],"totalCount":203}
{"opType":"open","topUsers":[{"user":"zeppelin","count":22}],"totalCount":22}
{"opType":"mkdirs","topUsers":[{"user":"hive","count":8},{"user":"ambari-qa","count":8},{"user":"kafka","count":1},{"user":"hdfs","count":1},{"user":"livy","count":1}],"totalCount":19}
{"opType":"listStatus","topUsers":[{"user":"spark","count":78},{"user":"oozie","count":37},{"user":"mapred","count":21},{"user":"hdfs","count":19},{"user":"yarn","count":14},{"user":"zeppelin","count":2},{"user":"livy","count":2}],"totalCount":173}
{"opType":"getfileinfo","topUsers":[{"user":"hdfs","count":87},{"user":"spark","count":79},{"user":"zeppelin","count":37},{"user":"ambari-qa","count":28},{"user":"hive","count":24},{"user":"oozie","count":23},{"user":"yarn","count":3},{"user":"kafka","count":2},{"user":"mapred","count":2},{"user":"livy","count":1}],"totalCount":286}
{"opType":"delete","topUsers":[{"user":"spark","count":78},{"user":"ambari-qa","count":4},{"user":"hdfs","count":3},{"user":"hive","count":3}],"totalCount":88}
{"opType":"create","topUsers":[{"user":"spark","count":78},{"user":"hdfs","count":4},{"user":"kafka","count":1}],"totalCount":83}
{"opType":"contentSummary","topUsers":[{"user":"hdfs","count":7}],"totalCount":7}
Then add a `jq` query on top to make it readable 😉
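One possible pair of such queries is sketched below. Both read the compact per-operation objects shown above, one per line, on stdin. The function names `format_ops` and `user_shares` are illustrative and not part of any Hadoop or jq tooling.

```shell
# format_ops: prints one tab-separated row per operation:
#   opType <TAB> totalCount <TAB> user=count user=count ...
format_ops() {
  jq --raw-output '
    [ .opType,
      .totalCount,
      (.topUsers | map("\(.user)=\(.count)") | join(" "))
    ] | @tsv
  '
}

# user_shares: for the aggregate "*" operation only, prints
#   user <TAB> count <TAB> share of totalCount (rounded down, in percent)
user_shares() {
  jq --raw-output '
    select(.opType == "*")
    | .totalCount as $total
    | .topUsers[]
    | "\(.user)\t\(.count)\t\(.count * 100 / $total | floor)%"
  '
}
```

For example, appending `| format_ops` to one of the pipelines above would turn the `create` line of the sandbox sample into `create`, `34` and `spark=29 hdfs=4 kafka=1` separated by tabs, while `| user_shares` would show `hdfs` at 52% of the 628 calls in the same sample.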