In recent years, two platforms have emerged as key players in parallel computing: GPUs and FPGAs. Many studies investigate the benefits of coupling either of them with a general-purpose processor (CPU), but very few, and only very recently, integrate the two in the same computational system. Even fewer focus on direct interaction between the two platforms. This paper presents an implementation of direct GPU-FPGA communication. The transfer is triggered by a central CPU but managed by the FPGA, in a DMA-like manner. An initial framework was developed on a Virtex-5 FPGA with a PCIe Gen1.1 ×1 setup and demonstrates a data rate of 200 MB/s. A new implementation on a Virtex-7 FPGA supports Gen3.0 ×8 and demonstrates a throughput of up to 2.4 GB/s in a Gen2.1 ×8 setup. Performance results for the different hardware setups are presented and compared. The measurements demonstrate data rates close to the theoretical maximum, with some interesting outliers, and a very low interfacing latency.
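
To put the quoted data rates in context, the following is a minimal sketch of how a raw PCIe link bandwidth can be estimated from the generation and lane count. It accounts only for the standard per-lane signalling rates and line encodings (8b/10b for Gen1 and Gen2, 128b/130b for Gen3) and ignores packet headers, flow control and other protocol overhead, so the achievable payload rate is lower than these figures. The function names are hypothetical, 1 MB is assumed to be 10^6 bytes, and the "theoretical maximum" used in the measurements may be defined differently.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def raw_link_bandwidth_mb_s(generation, lanes):
    """Per-direction link bandwidth in MB/s after line encoding only.

    Protocol overhead (TLP headers, flow control, ACKs) is NOT modelled,
    so this is an upper bound on the payload rate, not a prediction.
    """
    gigatransfers, efficiency = PCIE_GENERATIONS[generation]
    # Each transfer carries one bit per lane: GT/s -> Gbit/s -> GB/s -> MB/s.
    return gigatransfers * efficiency / 8 * 1000 * lanes


def link_utilisation(measured_mb_s, generation, lanes):
    """Fraction of the raw link bandwidth reached by a measured data rate."""
    return measured_mb_s / raw_link_bandwidth_mb_s(generation, lanes)


if __name__ == "__main__":
    # Data rates quoted in the abstract, compared with the raw link bandwidth.
    for label, rate, gen, lanes in [
        ("Virtex-5, Gen1 x1", 200.0, 1, 1),
        ("Virtex-7, Gen2 x8", 2400.0, 2, 8),
    ]:
        raw = raw_link_bandwidth_mb_s(gen, lanes)
        share = link_utilisation(rate, gen, lanes)
        print(f"{label}: {rate:.0f} of {raw:.0f} MB/s raw ({share:.0%})")
```

Under these assumptions, the raw bandwidth is 250 MB/s for Gen1 ×1 and 4000 MB/s for Gen2 ×8, so the quoted 200 MB/s and 2.4 GB/s correspond to roughly 80% and 60% of the raw link rate, before any protocol overhead is subtracted.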