[KinoSearch] Towards a stable C API... via indirect dispatch

Marvin Humphrey marvin at rectangular.com
Sun Oct 28 19:18:49 PDT 2007


On Oct 28, 2007, at 9:05 AM, Aaron Crane wrote:

> Theoretically, object pointers (including void pointers) and function
> pointers are incommensurate according to the C standard -- you get
> undefined behaviour when you cast between them.

Ah, yes, I'd forgotten that.
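
For what it's worth, converting one function pointer type to another
and back again is well defined, so a generic function pointer type can
serve as the element type of a method table. Here's a minimal sketch,
assuming kino_method_t is something like "void (*)(void)" (a guess for
illustration, not necessarily the real definition); all the demo_*
names are made up:

```c
#include <assert.h>

/* Assumption: a generic function pointer type for table slots. */
typedef void (*kino_method_t)(void);

/* Hypothetical method signature, for illustration only. */
typedef int (*demo_add_t)(int a, int b);

static int
demo_add(int a, int b)
{
    return a + b;
}

/* Function pointer to function pointer casts are fine, provided the
 * pointer is cast back to its real type before it is called. */
static kino_method_t demo_methods[] = {
    (kino_method_t)demo_add
};

int
demo_call_add(int a, int b)
{
    demo_add_t add = (demo_add_t)demo_methods[0];
    return add(a, b);
}
```

Calling demo_call_add() from a main() exercises the round trip. What
this does not cover is stuffing data such as a refcount or a class name
into a function pointer slot, which is where the undefined behaviour
comes in.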

Presently, the vtables are actual objects themselves, with a  
'refcount' member and the whole bit.  Having them be objects makes it  
easier to implement dynamic subclassing, a feature which is required  
by both Schema and FieldSpec, and which may come in handy elsewhere  
in the future.

The vtable objects belong to the class  
"KinoSearch::Util::VirtualTable".  Here's the definition for  
KinoSearch::Index::Term's vtable object:

     KINO_TERM_VTABLE KINO_TERM = {
         (KINO_OBJ_VTABLE*)&KINO_VIRTUALTABLE, /* vtable object's vtable */
         1,                                    /* refcount */
         (KINO_OBJ_VTABLE*)&KINO_OBJ,          /* parent */
         "KinoSearch::Index::Term",            /* class name */
         (kino_Obj_clone_t)kino_Term_clone,
         (kino_Obj_destroy_t)kino_Term_destroy,
         (kino_Obj_equals_t)kino_Term_equals,
         (kino_Obj_hash_code_t)kino_Obj_hash_code,
         (kino_Obj_is_a_t)kino_Obj_is_a,
         (kino_Obj_to_string_t)kino_Term_to_string,
         (kino_Obj_serialize_t)kino_Term_serialize,
         (kino_Term_get_field_t)kino_Term_get_field,
         (kino_Term_get_text_t)kino_Term_get_text,
         (kino_Term_copy_t)kino_Term_copy
     };

The first four member variables aren't function pointers, and I'd  
kinda sorta been hoping to sneak them into the array somehow.  ;)  A  
fifth member var will actually be needed as well: 'size' (or  
something like that), describing the size of the vtable either in  
bytes or in array members.

One approach is to keep the vtables as structs, with the last member  
a "flexible array" of function pointers:

     typedef struct kino_VTable {
         KINO_OBJ_VTABLE *_;
         chy_u32_t        refcount;
         KINO_OBJ_VTABLE *parent;
         const char      *class_name;
         size_t           size;
         kino_method_t    methods[];
     } kino_VTable;

Flexible array members are a C99 feature, but you can get the same  
effect under C89 by declaring the array with a length of at least 1:

         kino_method_t    methods[1];

You then take advantage of C's lack of bounds checking to malloc()  
enough memory for however many elements you need. :)   It's a hack,  
but widely portable -- Perl's regex engine depends on it, for example.
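
Roughly, the allocation might look like the sketch below. The
demo_VTable struct is a trimmed-down stand-in for the real thing, and
demo_VTable_new() plus the kino_method_t typedef are assumptions made
up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*kino_method_t)(void);   /* assumed generic slot type */

/* Simplified stand-in for the vtable struct; fields trimmed. */
typedef struct demo_VTable {
    const char    *class_name;
    size_t         size;        /* number of method slots */
    kino_method_t  methods[1];  /* C89-style "flexible" array */
} demo_VTable;

/* Allocate room for num_methods slots past the fixed header. */
demo_VTable*
demo_VTable_new(const char *class_name, size_t num_methods)
{
    size_t bytes = offsetof(demo_VTable, methods)
                 + num_methods * sizeof(kino_method_t);
    demo_VTable *vt;
    size_t i;

    /* Never allocate less than the declared struct size. */
    if (bytes < sizeof(demo_VTable)) { bytes = sizeof(demo_VTable); }

    vt = (demo_VTable*)malloc(bytes);
    if (vt == NULL) { return NULL; }

    vt->class_name = class_name;
    vt->size       = num_methods;
    for (i = 0; i < num_methods; i++) { vt->methods[i] = NULL; }
    return vt;
}
```

The caller owns the returned block and releases it with free().
Computing the size with offsetof() rather than sizeof() avoids counting
the one declared array element twice.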

The downside of having the vtable be a struct rather than an array is  
that it adds an extra addition op to the process of finding the right  
function pointer.

     /* array */
     method_OFFSET * sizeof(kino_method_t)

     /* struct containing the array */
     method_OFFSET * sizeof(kino_method_t) + FIXED_OFFSET
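
In C terms, the two lookups can be sketched as follows. FIXED_OFFSET
corresponds to offsetof(demo_VTable, methods). The demo_* names, the
simplified field types, and the kino_method_t typedef are assumptions
for illustration:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*kino_method_t)(void);   /* assumed generic slot type */

/* Simplified stand-in for the struct layout described above. */
typedef struct demo_VTable {
    void          *vtable;
    unsigned int   refcount;
    void          *parent;
    const char    *class_name;
    size_t         size;
    kino_method_t  methods[1];
} demo_VTable;

/* Placeholder method so there is something to look up. */
void
demo_noop(void)
{
}

/* Array technique: base address + index * element size. */
kino_method_t
demo_lookup_array(kino_method_t *methods, size_t method_offset)
{
    return methods[method_offset];
}

/* Struct technique: the same computation, plus the constant offset
 * of the "methods" member within the struct. */
kino_method_t
demo_lookup_struct(demo_VTable *vtable, size_t method_offset)
{
    return vtable->methods[method_offset];
}
```

Both functions return the same pointer for the same slot; the only
difference is the constant the compiler adds to the address.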

Here's some AT&T assembler, for code implementing the array technique:

     # %eax register holds method_OFFSET
     # %edx register holds address of "methods" array
     movl    (%edx,%eax,4), %eax

Here's assembler for code using a vtable struct containing a  
"methods" array:

     # %eax register holds method_OFFSET
     # %edx register holds vtable struct pointer
     movl    20(%edx,%eax,4), %eax  # <----------- NOTE extra "20"

(To see the whole context, view the attached file "need_meth.s",  
which was generated from the attached file "need_meth.c" using the  
command "gcc -S -Wall -Os need_meth.c" on an x86 Linux box.)

I'm not sure how much of a penalty you pay for the extra addition op  
-- only a benchmark would tell -- but I'm reasonably sure it doesn't  
help matters. :)

It seems to me that the only way to get away with using the array  
rather than the struct containing the array involves some nasty  
casting hacks.  Worth it, y'think?

>> Say we remove the Kino_Term_Destroy method... then this code
>> will crash at run-time, because the kino_Term_destroy_OFFSET
>> symbol cannot be resolved:
>>
>>    destroy_meth = self->_[kino_Term_destroy_OFFSET];
>>
>> Of course a run-time crash would be bad -- but that just means that
>> we can't redact public methods -- which we wouldn't be doing anyway.
>
> More specifically, the failure would be at link-time, right?  Unless
> I'm misunderstanding, code using the new macro will contain a reference
> to the kino_Term_destroy_OFFSET symbol, so the linker should fail when
> trying to resolve uses of that symbol in callers.  Of course, assuming
> that most uses of the KinoSearch code rely on a dynamically loaded
> KinoSearch.so (or local equivalent), that turns out to be roughly the
> same thing as run-time anyway.

Exactly.
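
To make the dependency concrete, here's a single-file sketch of
dispatch through an offset symbol. In the real arrangement the OFFSET
variable would be declared extern in a public header and defined
inside the shared library, so a caller referencing a removed method
fails when the symbol is resolved. Everything named demo_* and the
kino_method_t typedef are assumptions for illustration:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*kino_method_t)(void);   /* assumed generic slot type */

/* Would normally live in the library and be declared "extern" for
 * callers; defined here to keep the sketch in one file. */
size_t demo_Term_destroy_OFFSET = 1;

/* Hypothetical signature for the destroy method. */
typedef void (*demo_destroy_t)(int *destroyed);

static void
demo_Term_clone(void)
{
}

static void
demo_Term_destroy(int *destroyed)
{
    *destroyed = 1;
}

static kino_method_t demo_term_methods[] = {
    (kino_method_t)demo_Term_clone,
    (kino_method_t)demo_Term_destroy
};

/* Caller side: the slot index is read from the exported variable at
 * run time rather than baked in as a compile-time constant. */
int
demo_destroy_via_offset(void)
{
    int destroyed = 0;
    demo_destroy_t destroy_meth
        = (demo_destroy_t)demo_term_methods[demo_Term_destroy_OFFSET];
    destroy_meth(&destroyed);
    return destroyed;
}
```

Because callers only ever see the symbol, the library is free to
reorder or add slots between releases without breaking compiled code.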

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: need_meth.c
Type: application/octet-stream
Size: 1156 bytes
Desc: not available
Url : http://rectangular.com/pipermail/kinosearch/attachments/20071028/2b906aef/attachment-0002.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: need_meth.s
Type: application/octet-stream
Size: 2286 bytes
Desc: not available
Url : http://rectangular.com/pipermail/kinosearch/attachments/20071028/2b906aef/attachment-0003.obj 