Vision-based human activity recognition can provide rich contextual information, but it has traditionally been computationally prohibitive on mobile devices. We present a characterisation of five convolutional neural networks (DenseNet169, MobileNet, ResNet50, VGG16 and VGG19) implemented with TensorFlow Lite and running on three state-of-the-art Android mobile phones. The networks were trained to recognise eight modes of transportation from camera images using the SHL Locomotion and Transportation dataset. We analyse the effect of thread count and back-end (CPU, GPU or Android Neural Networks API) on the classification of images captured by the phones' rear cameras. For each configuration, we report processing time and classification accuracy.
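The two reported metrics, per-image processing time and classification accuracy, can be gathered with a simple measurement loop. The sketch below is a minimal illustration under stated assumptions, not the measurement procedure used in this work. The `benchmark` function and the toy classifier are hypothetical. With TensorFlow Lite's Python API, `classify` could wrap the `set_tensor`, `invoke` and `get_tensor` calls on a `tf.lite.Interpreter` constructed with a given `num_threads`. The warm-up count is also an assumption.

```python
import time
from statistics import mean
from typing import Callable, Sequence, Tuple


def benchmark(classify: Callable[[object], int],
              images: Sequence[object],
              labels: Sequence[int],
              warmup: int = 1) -> Tuple[float, float]:
    """Hypothetical harness: return (mean latency in ms per image, top-1 accuracy).

    `classify` is any callable mapping one image to a predicted class index;
    for a TensorFlow Lite model it would wrap the interpreter invocation.
    """
    if not images or len(images) != len(labels):
        raise ValueError("images and labels must be non-empty and of equal length")

    # Assumption: the first few calls are excluded from timing, because the
    # first inference often includes one-off setup cost.
    for img in images[:warmup]:
        classify(img)

    latencies_ms = []
    correct = 0
    for img, label in zip(images, labels):
        start = time.perf_counter()
        pred = classify(img)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        correct += int(pred == label)

    return mean(latencies_ms), correct / len(labels)


if __name__ == "__main__":
    # Toy stand-in for a CNN: integer "images" and eight transport classes.
    images = list(range(16))
    labels = [i % 8 for i in images]
    latency, accuracy = benchmark(lambda x: x % 8, images, labels)
    print(f"mean latency: {latency:.4f} ms, accuracy: {accuracy:.2f}")
```

Repeating the same call for each network, thread count and back-end produces one (latency, accuracy) pair per configuration. Those pairs can then be tabulated side by side.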