A detailed look at PHP FFI -- a new way to write PHP extensions

Posted by Johan Beijar on Mon, 30 Mar 2020 12:07:14 +0200

 

PHP 7.4 ships an extension that I find very useful: PHP FFI (Foreign Function Interface). To quote the PHP FFI RFC:

For PHP, FFI provides a way to write PHP extensions and bindings to C libraries in pure PHP.

More generally, an FFI lets code written in one language call code written in another. For PHP, FFI lets us call libraries written in C conveniently.

In fact, a large number of existing PHP extensions are wrappers around existing C libraries, such as the commonly used mysqli, curl, and gettext. PECL also hosts many similar extensions.

Traditionally, when we want to use an existing C library, we have to write a wrapper in C and package it as an extension, which means learning how to write PHP extensions. There are more convenient routes, such as Zephir, but they still carry a learning cost. With FFI, we can call the functions of a C library directly from a PHP script.

C has accumulated a wealth of excellent libraries over its decades of history, and FFI lets us tap into this huge resource directly.

Back to the point. Today I'll use an example to show how to call libcurl from PHP to fetch the content of a web page. Why libcurl?

Doesn't PHP already have a curl extension? Well, first, I'm familiar with libcurl's API. Second, precisely because the extension exists, we can compare the two: is FFI as easy to use as the traditional extension?

Let's take the article you are reading as an example. Suppose I need to write some code to fetch its content. With PHP's traditional curl extension, we would write something like this:

<?php

$url = "https://www.laruence.com/2020/03/11/5475.html";
$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);

curl_exec($ch);

curl_close($ch);

(My website is served over HTTPS, hence the extra call that sets SSL_VERIFYPEER.)

What about using FFI?

First, you need to enable ext/ffi in PHP 7.4. Note that PHP FFI requires libffi-3 or later.

Then we need to tell PHP FFI the prototypes of the functions we want to call. We can use FFI::cdef for this. Its prototype is:

FFI::cdef([string $cdef = "" [, string $lib = null]]): FFI

In string $cdef we write C function declarations, which FFI parses. string $lib names the library in which FFI should look up those functions. In this example we use four libcurl functions, whose declarations can be found in the libcurl documentation.
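Before turning to libcurl, here is a minimal sketch of FFI::cdef in isolation. It assumes ext/ffi is enabled and that the C library's strlen is already visible in the PHP process (typical on Linux); the choice of strlen and the variable names are only for illustration.

```php
<?php
// Declare a single C function. No $lib is given, so FFI looks the
// symbol up among the libraries already loaded into the process.
$libc = FFI::cdef("size_t strlen(const char *s);");

// PHP strings can be passed where C expects a const char *.
$len = $libc->strlen("hello FFI");

var_dump($len);
```

The size_t return value comes back as an ordinary PHP int, so no conversion is needed on the PHP side.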

For this example, we write a curl.php that contains all the declarations. The code is as follows:

$libcurl = FFI::cdef(<<<CTYPE
void *curl_easy_init();
int curl_easy_setopt(void *curl, int option, ...);
int curl_easy_perform(void *curl);
void curl_easy_cleanup(void *handle);
CTYPE
 , "libcurl.so"
 );

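One thing to note: the documentation gives the return type of curl_easy_init as CURL *, but our example never dereferences the handle and only passes it along, so to save trouble we declare it as void * instead.

There is one more nuisance: the curl extension predefines the CURLOPT_* option constants, but with FFI we have to define them ourselves. Their values can be found in libcurl's header files: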

<?php
const CURLOPT_URL = 10002;
const CURLOPT_SSL_VERIFYPEER = 64;

$libcurl = FFI::cdef(<<<CTYPE
void *curl_easy_init();
int curl_easy_setopt(void *curl, int option, ...);
int curl_easy_perform(void *curl);
void curl_easy_cleanup(void *handle);
CTYPE
 , "libcurl.so"
 );

OK, the declarations are done. Now for the actual logic. The whole script becomes:

<?php
require "curl.php";

$url = "https://www.laruence.com/2020/03/11/5475.html";

$ch = $libcurl->curl_easy_init();
$libcurl->curl_easy_setopt($ch, CURLOPT_URL, $url);
$libcurl->curl_easy_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);

$libcurl->curl_easy_perform($ch);

$libcurl->curl_easy_cleanup($ch);

How does this compare with the curl extension? Isn't it just as concise?

Next, let's make it a little more complicated. Suppose we don't want the result printed directly but returned as a string. With PHP's curl extension, we only need to call curl_setopt to set CURLOPT_RETURNTRANSFER to 1. libcurl itself has no way to return a string directly. Instead it offers the WRITEFUNCTION callback: whenever data arrives, libcurl calls this function. In fact, the PHP curl extension does the same thing internally.

At present, we can't pass a PHP function directly to libcurl as a callback through FFI, so we have two options:

1. Use WRITEDATA. By default libcurl uses fwrite as its write callback, and through WRITEDATA we can hand libcurl a file pointer so that it writes to that file instead of stdout.
2. Write a simple C function ourselves, load it through FFI, and pass it to libcurl.

Let's start with the first option. We need fopen for it. This time we define a C header file to declare the prototypes (file.h):

void *fopen(char *filename, char *mode);
void fclose(void *fp);

As with file.h, we put all the libcurl function declarations in curl.h:

#define FFI_LIB "libcurl.so"

void *curl_easy_init();
int curl_easy_setopt(void *curl, int option, ...);
int curl_easy_perform(void *curl);
void curl_easy_cleanup(void *handle);

Then we can use FFI::load to load the .h file:

static function load(string $filename): FFI;

But how do we tell FFI which library to load? As shown above, we define the FFI_LIB macro to tell FFI that these functions come from libcurl.so. When we load this header with FFI::load, PHP FFI automatically loads libcurl.so.
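The FFI_LIB mechanism can be tried without libcurl. The following is a small sketch that writes a throwaway header and loads it; it assumes a glibc-based Linux system where the C library is available as "libc.so.6", and the temporary file name is made up for the demo.

```php
<?php
// Hypothetical throwaway header; the path is only for this demo.
$header = sys_get_temp_dir() . "/ffi_demo_" . getmypid() . ".h";

// FFI_LIB names the shared library that provides the declared functions.
// "libc.so.6" is an assumption that holds on glibc-based Linux systems.
file_put_contents($header, <<<HEADER
#define FFI_LIB "libc.so.6"

int abs(int x);
HEADER);

// FFI::load parses the declarations and opens the library named by FFI_LIB.
$libc = FFI::load($header);
$result = $libc->abs(-5);

unlink($header);
var_dump($result);
```

On other platforms the library name differs, so FFI_LIB would need to be adjusted accordingly.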

Then why does fopen not need a library to be specified? Because FFI also looks up symbols among those already loaded into the process, and fopen, being a standard library function, is already there.
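As a small sketch of this lookup, the snippet below declares a few stdio functions with FFI::cdef and no library name, then writes a file through them. It assumes ext/ffi is enabled and a Unix-like system whose C library is already loaded into the PHP process; the file name and the use of fputs are illustrative choices.

```php
<?php
// No $lib argument: the symbols are resolved from the running process.
// FILE * is declared as void * because we only pass the handle around.
$libc = FFI::cdef(<<<CDEF
void *fopen(const char *filename, const char *mode);
int fputs(const char *s, void *stream);
int fclose(void *fp);
CDEF);

$path = tempnam(sys_get_temp_dir(), "ffi");

$fp = $libc->fopen($path, "w");
if (FFI::isNull($fp)) {
    throw new RuntimeException("fopen failed for $path");
}
$libc->fputs("written through libc\n", $fp);
$libc->fclose($fp);

// Read the file back with plain PHP to confirm the C calls worked.
$contents = file_get_contents($path);
unlink($path);
var_dump($contents);
```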

OK, now the whole code will be:

<?php
const CURLOPT_URL = 10002;
const CURLOPT_SSL_VERIFYPEER = 64;
const CURLOPT_WRITEDATA = 10001;

$libc = FFI::load("file.h");
$libcurl = FFI::load("curl.h");

$url = "https://www.laruence.com/2020/03/11/5475.html";
$tmpfile = "/tmp/tmpfile.out";

$ch = $libcurl->curl_easy_init();
$fp = $libc->fopen($tmpfile, "a");

$libcurl->curl_easy_setopt($ch, CURLOPT_URL, $url);
$libcurl->curl_easy_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$libcurl->curl_easy_setopt($ch, CURLOPT_WRITEDATA, $fp);
$libcurl->curl_easy_perform($ch);

$libcurl->curl_easy_cleanup($ch);

$libc->fclose($fp);

$ret = file_get_contents($tmpfile);
@unlink($tmpfile);

But this approach goes through a temporary file, which is not very elegant. Let's turn to the second option, for which we need to write a callback function in C and pass it to libcurl:

#include <stdlib.h>
#include <string.h>
#include "write.h"

/* Write callback: appends each chunk delivered by libcurl to d->buf. */
size_t own_writefunc(void *ptr, size_t size, size_t nmember, void *data) {
        own_write_data *d = (own_write_data *)data;
        size_t total = size * nmember;

        if (d->buf == NULL) {
                d->buf = malloc(total);
                if (d->buf == NULL) {
                        return 0;
                }
                d->size = total;
                memcpy(d->buf, ptr, total);
        } else {
                d->buf = realloc(d->buf, d->size + total);
                if (d->buf == NULL) {
                        return 0;
                }
                /* cast to char * so the offset is counted in bytes */
                memcpy((char *)d->buf + d->size, ptr, total);
                d->size += total;
        }

        return total;
}

/* Returns the address of own_writefunc so that PHP can pass it on. */
void *init() {
        return &own_writefunc;
}

Note the init function here. In the current version of PHP FFI (2020-03-11) we can't obtain a function pointer directly, so we define this function to return the address of own_writefunc.

Finally, we define the header file write.h used above:

#define FFI_LIB "write.so"

typedef struct _writedata {
        void *buf;
        size_t size;
} own_write_data;

void *init();

Note that we also define FFI_LIB in the header file, so the same header can be used by both write.c and PHP FFI.

Then we compile the write function into a shared library:

gcc -O2 -fPIC -shared -g write.c -o write.so

Now, the whole code will be:

<?php
const CURLOPT_URL = 10002;
const CURLOPT_SSL_VERIFYPEER = 64;
const CURLOPT_WRITEDATA = 10001;
const CURLOPT_WRITEFUNCTION = 20011;

$libcurl = FFI::load("curl.h");
$write = FFI::load("write.h");

$url = "https://www.laruence.com/2020/03/11/5475.html";

$data = $write->new("own_write_data");

$ch = $libcurl->curl_easy_init();

$libcurl->curl_easy_setopt($ch, CURLOPT_URL, $url);
$libcurl->curl_easy_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$libcurl->curl_easy_setopt($ch, CURLOPT_WRITEDATA, FFI::addr($data));
$libcurl->curl_easy_setopt($ch, CURLOPT_WRITEFUNCTION, $write->init());
$libcurl->curl_easy_perform($ch);

$libcurl->curl_easy_cleanup($ch);

$ret = FFI::string($data->buf, $data->size);

Here, we use FFI::new (called as $write->new) to allocate memory for an own_write_data struct:

function FFI::new(mixed $type [, bool $own = true [, bool $persistent = false]]): FFI\CData

$own indicates whether the memory is managed by PHP. By default, the memory we allocate follows PHP's lifecycle management and does not need to be released explicitly. Sometimes, however, you may want to manage it yourself. In that case, set $own to false and call FFI::free to release the memory at the appropriate time.
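The difference between owned and unowned memory can be sketched as follows. The point struct and the variable names are made up for illustration, and ext/ffi is assumed to be enabled.

```php
<?php
// A hypothetical struct type, declared only for this demo.
$ffi = FFI::cdef("typedef struct { int x; int y; } point;");

// Owned (the default): PHP releases the memory when $p goes away.
$p = $ffi->new("point");
$p->x = 3;
$p->y = 4;

// Unowned ($own = false): PHP will not release this memory for us.
$q = $ffi->new("point", false);
$q->x = 10;

$sum = $p->x + $p->y + $q->x;

// Release the unowned struct explicitly once we are done with it.
FFI::free($q);

var_dump($sum);
```

After FFI::free, $q must not be used again, just as with a freed pointer in C.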

Then we pass $data to libcurl as WRITEDATA, using FFI::addr to get the actual memory address of $data:

static function addr(FFI\CData $cdata): FFI\CData;
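A small sketch of what FFI::addr returns, using a made-up box struct (an assumption for illustration only): the result is a C pointer to the same memory, so a write through the pointer is visible in the original structure.

```php
<?php
// A hypothetical one-field struct for the demo.
$ffi = FFI::cdef("typedef struct { int x; } box;");

$box = $ffi->new("box");
$box->x = 1;

// $ptr has the C type box * and refers to the memory held by $box.
$ptr = FFI::addr($box);

// Field access on a pointer to a struct dereferences it, like -> in C.
$ptr->x = 42;

$value = $box->x;
var_dump($value);
```

The pointer does not own the memory, so $box has to stay alive for as long as $ptr is in use.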

Then we pass own_writefunc to libcurl as WRITEFUNCTION, so that when data comes back, libcurl calls our own_writefunc to handle it and passes our own_write_data struct to the callback as its custom argument.

Finally, we use FFI::string to convert a block of memory into a PHP string:

static function FFI::string(FFI\CData $src [, int $size]): string

When $size is not provided, FFI::string stops at the first null byte.
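The effect of $size can be seen with a buffer that contains an embedded null byte. This is a minimal sketch assuming ext/ffi is enabled; the buffer size and contents are arbitrary.

```php
<?php
// An FFI instance with no declarations, used only to allocate C data.
$ffi = FFI::cdef();

// new() zero-fills the memory, so the 8-byte buffer starts as all NUL bytes.
$buf = $ffi->new("char[8]");

// Copy 5 bytes, including the embedded NUL, into the C buffer.
FFI::memcpy($buf, "ab\0cd", 5);

$untilNul = FFI::string($buf);     // no size: stops at the first NUL byte
$exact    = FFI::string($buf, 5);  // explicit size: copies exactly 5 bytes

var_dump($untilNul, strlen($exact));
```

This is why the libcurl example passes $data->size explicitly: a response body may contain null bytes, and the buffer filled by the callback is not null-terminated.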

All right, shall we run it?

However, loading the .so files in PHP on every request would be a serious performance problem, so we can use preloading instead. In this mode, opcache.preload loads them once when PHP starts:

ffi.enable = 1
opcache.preload = ffi_preload.inc

ffi_preload.inc:

<?php
FFI::load("curl.h");
FFI::load("write.h");

But how do we then reference the FFI instances that were loaded during preloading? For that, we need to modify the two .h header files and add FFI_SCOPE, for example in curl.h:

#define FFI_LIB "libcurl.so"
#define FFI_SCOPE "libcurl"

void *curl_easy_init();
int curl_easy_setopt(void *curl, int option, ...);
int curl_easy_perform(void *curl);
void curl_easy_cleanup(void *handle);

Correspondingly, we add an FFI_SCOPE of "write" to write.h, and our script now looks like this:

<?php
const CURLOPT_URL = 10002;
const CURLOPT_SSL_VERIFYPEER = 64;
const CURLOPT_WRITEDATA = 10001;
const CURLOPT_WRITEFUNCTION = 20011;

$libcurl = FFI::scope("libcurl");
$write = FFI::scope("write");

$url = "https://www.laruence.com/2020/03/11/5475.html";

$data = $write->new("own_write_data");

$ch = $libcurl->curl_easy_init();

$libcurl->curl_easy_setopt($ch, CURLOPT_URL, $url);
$libcurl->curl_easy_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$libcurl->curl_easy_setopt($ch, CURLOPT_WRITEDATA, FFI::addr($data));
$libcurl->curl_easy_setopt($ch, CURLOPT_WRITEFUNCTION, $write->init());
$libcurl->curl_easy_perform($ch);

$libcurl->curl_easy_cleanup($ch);

$ret = FFI::string($data->buf, $data->size);

That is, instead of FFI::load, we now use FFI::scope to reference the corresponding functions:

static function scope(string $name): FFI;

Then there is another issue. Although FFI gives us a great deal of power, calling C library functions directly is still very risky. We should only allow users to call functions that we have vetted, and this is where ffi.enable=preload comes in. When ffi.enable=preload is set, only functions defined in the opcache.preload script can call FFI, while functions written by users cannot call it directly.

Let's rework ffi_preload.inc into ffi_safe_preload.inc:

<?php
class CURLOPT {
     const URL = 10002;
     const SSL_VERIFYHOST = 81;
     const SSL_VERIFYPEER = 64;
     const WRITEDATA = 10001;
     const WRITEFUNCTION = 20011;
}

FFI::load("curl.h");
FFI::load("write.h");

function get_libcurl(): FFI {
     return FFI::scope("libcurl");
}

function get_write_data($write): FFI\CData {
     return $write->new("own_write_data");
}

function get_write(): FFI {
     return FFI::scope("write");
}

function get_data_addr($data): FFI\CData {
     return FFI::addr($data);
}

function paser_libcurl_ret($data): string {
     return FFI::string($data->buf, $data->size);
}

In other words, we define every function that calls the FFI API in the preload script, and our example then becomes (ffi_safety.php):

<?php
$libcurl = get_libcurl();
$write = get_write();
$data = get_write_data($write);

$url = "https://www.laruence.com/2020/03/11/5475.html";

$ch = $libcurl->curl_easy_init();

$libcurl->curl_easy_setopt($ch, CURLOPT::URL, $url);
$libcurl->curl_easy_setopt($ch, CURLOPT::SSL_VERIFYPEER, 0);
$libcurl->curl_easy_setopt($ch, CURLOPT::WRITEDATA, get_data_addr($data));
$libcurl->curl_easy_setopt($ch, CURLOPT::WRITEFUNCTION, $write->init());
$libcurl->curl_easy_perform($ch);

$libcurl->curl_easy_cleanup($ch);

$ret = paser_libcurl_ret($data);

In this way, with ffi.enable=preload, all FFI API calls are confined to the preload script that we control, and users cannot call the FFI API directly. We can therefore put our security checks inside these functions and so guarantee a certain degree of safety.

 


Author: PHP open source community
Link: https://juejin.im/post/5e81a015f265da47c35d69fc
Source: Juejin
The copyright belongs to the author. For commercial reprint, please contact the author for authorization. For non-commercial reprint, please indicate the source.

Topics: Programming PHP curl C