GPSME Toolkit

The GPSME Toolkit is part of the FP7 project entitled “A General Toolkit for ‘GPUtilisation’ in SME Applications”. The project is developed by a consortium of two research institutions (University of Bedfordshire, UK and Rijksuniversiteit Groningen, the Netherlands) and four SME partners. The goal of the project is to develop a toolkit that helps SMEs improve the quality of their new and existing products and reduce their time-to-market. The GPUtilisation toolkit will automate the conversion of existing sequential CPU code to an optimized GPU implementation.

A lack of computing power is not an inherent limitation of the SMEs themselves; it is mostly imposed by the equipment that the users of their products are expected to have. In other words, the product developed is constrained by the computational resources of the likely user base. Hence, the availability and affordability of the equipment necessary to use the SMEs' products can affect their marketing and become a major obstacle to their future competitiveness. From a technological point of view, through the GPSME toolkit the consortium has developed techniques based on automatic parallelization for modern GPUs. As a foundation for the GPSME toolkit, the consortium has used ROSE, a sophisticated open source compiler platform, and has improved the functions and optimized the algorithms of MINT, an existing C-to-CUDA translator.

In the GPSME software structure, the input to the toolkit is C/C++ source code, which is read and translated into an AST (Abstract Syntax Tree) by the ROSE front-end. At the base of the GPSME toolkit is a set of #pragma user annotations designed to guide the toolkit toward a closer-to-optimal, tailor-made implementation. Using the information the user provides through these annotations, the toolkit can carry out different transformations on the AST. The output of the toolkit is CUDA/OpenCL source code, obtained by unparsing the transformed AST through the ROSE back-end. The toolkit operates by creating optimized GPU kernels out of annotated parallelizable loops.
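As a rough illustration of what an annotated parallelizable loop might look like on the input side, here is a minimal C sketch. The directive name and clause (`#pragma gpsme parallel for`) and the function name `saxpy` are assumptions made for this example; they are not taken from the toolkit's documentation, and the actual annotation syntax may differ.

```c
#include <stddef.h>

/* Sequential SAXPY: y[i] = a * x[i] + y[i].
 * Each iteration reads and writes only element i, so there is no
 * loop-carried dependence and the loop is a candidate for a GPU kernel. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    /* Hypothetical annotation: a placeholder for the kind of #pragma
     * a user would add to mark the loop as parallelizable. A compiler
     * that does not recognize the pragma ignores it, so the file still
     * builds and runs as ordinary sequential C. */
    #pragma gpsme parallel for
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Because the annotation is only a pragma, the same source remains valid CPU code. In a generated kernel, one would expect each loop iteration to map to a GPU thread, with the loop index derived from the thread and block indices.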

The toolkit is available at