100% CPU on Netapp / N Series

I’m pretty partial to the ol’ N Series (IBM’s version of Netapp storage). We use them as our VMware storage over NFS. Easy to set up and manage.

This problem has been annoying me for ages, and I’m excited to finally have an answer to it. Every now and again, users would complain that performance had dropped through the floor. I’ve gotten pretty used to assuming that bad VM performance = bad storage performance, so I jump onto the N Series / filer straight away.

sysstat -x 1

is a good start when investigating – it’ll show you how many NFS operations are passing through the filer, and the CPU, network and disk utilization levels needed to service the requests. The Disk Util column is interesting too – it isn’t an average or anything, it’s the utilization of the busiest single disk in the filer. And since we’ve got 50-ish disks in the filer, one hot disk doesn’t necessarily mean the whole system is struggling.

This is what a normal sysstat output looks like. (At least for me).

netapp> sysstat -x 1
 CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out    in   out
 64%  3039     0     0    3039 20249  4826  25087  52299     0     0    12s  93% 100%  :f  24%      0     0     0     0     0     0
 55%  2816     0     0    2816 17085 12570  20919  37107     0     0    12s  85% 100%  :f  17%      0     0     0     0     0     0
 44%  2856     0     0    2856 17580  6341  12312  39792     0     0    12s  82% 100%  :f  51%      0     0     0     0     0     0
 28%  3448     0     0    3448 18033  6292   4988   3980     0     0    12s  82%  13%  :   12%      0     0     0     0     0     0

And this is one when all hell is breaking loose:

 CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out    in   out
100%    26     0     0      26   148     9   4584   7308     0     0     7s 100%  50%  :   39%      0     0     0     0     0     0
100%    66     0     0      66   260    20   4728      0     0     0     7s 100%   0%  -   31%      0     0     0     0     0     0
100%    39     0     0      39   275   211   5071     24     0     0     8s 100%   0%  -   29%      0     0     0     0     0     0
100%   143     0     0     143   633    43   4548      8     0     0     8s 100%   0%  -   30%      0     0     0     0     0     0

See the NFS column? The CPU isn’t busy because it’s servicing NFS requests, and it’s not being overtaxed by the VMs – it’s something internal to the filer. Even the Disk Util isn’t very high. What’s going on?
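That signature – CPU pegged while NFS ops are near zero – is easy enough to spot programmatically. Here’s a minimal sketch in Python that flags it from a raw `sysstat -x` data line; the thresholds are my own guesses for this environment, not anything NetApp recommends.

```python
def looks_like_internal_work(sysstat_line, cpu_threshold=95, ops_threshold=500):
    """Heuristic: a pegged CPU with almost no NFS ops suggests the filer
    is busy with internal work (e.g. space reclaim), not client load.
    Thresholds are guesses; tune them for your own filer."""
    fields = sysstat_line.split()
    cpu_pct = int(fields[0].rstrip('%'))   # first column: CPU, e.g. "100%"
    nfs_ops = int(fields[1])               # second column: NFS ops/s
    return cpu_pct >= cpu_threshold and nfs_ops < ops_threshold

# The "all hell breaking loose" lines trip it; the normal lines don't.
print(looks_like_internal_work("100%    26     0     0      26   148"))   # True
print(looks_like_internal_work(" 64%  3039     0     0    3039 20249"))   # False
```

You could feed this the output of `sysstat -x 1` over SSH and alert when several consecutive samples trip the heuristic.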

In our particular environment, our VMs are really disposable, since the build of WCM that goes into them is obsolete the next day. At any time the filer might be half filled with switched-off VMs. Eventually the filer fills up and you have to delete all the obsolete VMs. This is what caused the high CPU: deleting a bunch of VMs. Each is about 25 GB, and I must have deleted around 200 of them. The delete itself is quite quick, but it spiked the CPU for about an hour. An agonizing hour!

At the very least, it’s great to know what was going on. I’ll open a case with IBM support and report back (if I can). In the meantime, I’d better write a delete queue so I don’t tax the filer as badly.
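A delete queue could be as simple as the sketch below: delete one VM directory at a time and pause between deletions so the filer’s background space-reclaim work can catch up. The pause length and the `shutil.rmtree` approach are assumptions on my part, not anything NetApp prescribes.

```python
import shutil
import time
from pathlib import Path

def drain_delete_queue(vm_dirs, pause_seconds=300):
    """Delete VM directories one at a time, sleeping between each so the
    filer can reclaim space without pegging its CPU for an hour straight.
    pause_seconds=300 is an arbitrary starting point; tune it by watching
    sysstat while the queue drains."""
    deleted = []
    for vm_dir in vm_dirs:
        path = Path(vm_dir)
        if path.is_dir():
            shutil.rmtree(path)        # remove the VM's files from the NFS datastore
            deleted.append(str(path))
            time.sleep(pause_seconds)  # give the space-reclaim job time to catch up
    return deleted
```

Point it at the obsolete VM folders on the NFS datastore and keep `sysstat -x 1` running to see whether the CPU stays under control.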


6 Responses to 100% CPU on Netapp / N Series

  1. Jordan Xu says:

    Very interested to know what is happening there. Seeing the same thing myself for a while now but cannot pinpoint the cause. Please let me know if you’ve found out why deletions are pegging the CPU for you 🙂

  2. Jay says:

We’ve had something very similar happen with our filer – not with NFS but with CIFS. We are waiting on a response from NetApp. Did you ever get any indication of what was going on with your filer? Which firmware version are you running? We are running 8.1.1 in C-Mode on V3240s.

  3. Alexandre Derumier says:

    Hi, same here, 8.1.1 C-mode on fas2240-2.

    Seems to be the space reclaim job, which takes a lot of CPU.
    (When you delete a big file, the space isn’t available right after the delete.)
    I’m new to NetApp, so I don’t know if this is the normal behaviour….

  4. Simon C. says:

    Hi

    Same here, FAS2240-2. When I delete a clone of a LUN file (500 GB, thin provisioned), sysstat shows this for several minutes:

    netapp-x> sysstat -m 1
     ANY  AVG CPU0 CPU1 CPU2 CPU3
     92%  29%  10%   7%  15%  85%
     92%  29%  11%   8%  17%  80%
     88%  28%  10%   7%  15%  82%
     91%  28%  10%   7%  15%  82%
     80%  25%   7%   5%  11%  76%
     99%  40%  21%  17%  28%  95%
     87%  29%  14%   9%  11%  81%
     82%  27%  14%   8%  12%  75%
    netapp-x>

    One core sits at 80–90%, and the high latencies kill NFS performance at this time 🙁
    Data ONTAP 8.1.2, 10k SAS disks, no Flash Cache, aggregate at about 30% usage.
    Has anyone solved this problem?

  5. Simon C. says:

    PS: 7-mode.

  6. E-One says:

    This is BUG 90314
