Convert-Binary-C


lib/Convert/Binary/C.pm  view on Meta::CPAN

  $data = $c->unpack('foo', $binary);

You can now easily access all of the values:

  print "foo.ary[1] = $data->{ary}[1]\n";

Or you can even more conveniently use
the L<Data::Dumper|Data::Dumper> module:

  use Data::Dumper;
  print Dumper($data);

The output would look something like this:

  $VAR1 = {
    'ary' => [
      42,
      48,
      100
    ],
    'baz' => 5000,
    'bar' => -271
  };

=head2 Preprocessor configuration

Convert::Binary::C uses Thomas Pornin's C<ucpp> as an internal
C preprocessor. It is compliant with ISO C99, so you don't have
to worry about using even unusual preprocessor constructs in
your code.

If your C source contains includes or depends upon preprocessor
defines, you may need to configure the internal preprocessor.
Use the C<Include> and C<Define> configuration options for that:

  $c->configure(Include => ['/usr/include',
                            '/home/mhx/include'],
                Define  => [qw( NDEBUG FOO=42 )]);

If your code uses system includes, you will most likely
need to define the symbols that are usually predefined
by the compiler.

On some operating systems, the system includes require the
preprocessor to predefine a certain set of assertions.
Assertions are supported by C<ucpp>, and you can define them
either in the source code using C<#assert> or as a property
of the Convert::Binary::C object using C<Assert>:

  $c->configure(Assert => ['predicate(answer)']);

Information about defined macros can be retrieved from the
preprocessor as long as its configuration isn't changed. The
preprocessor is implicitly reset if you change one of the
following configuration options:

  Include
  Define
  Assert
  HasCPPComments
  HasMacroVAARGS

=head2 Supported pragma directives

Convert::Binary::C supports the C<pack> pragma to locally override
struct member alignment. The supported syntax is as follows:

=over 4

=item #pragma pack( ALIGN )

Sets the new alignment to ALIGN. If ALIGN is 0, resets the
alignment to its original value.

=item #pragma pack

Resets the alignment to its original value.

=item #pragma pack( push, ALIGN )

Saves the current alignment on a stack and sets the new
alignment to ALIGN. If ALIGN is 0, sets the alignment to
the default alignment.

=item #pragma pack( pop )

Restores the alignment to the last value saved on the
stack.

=back

  /*  Example assumes sizeof( short ) == 2, sizeof( long ) == 4.  */
  
  #pragma pack(1)
  
  struct nopad {
    char a;               /* no padding bytes between 'a' and 'b' */
    long b;
  };
  
  #pragma pack            /* reset to "native" alignment          */
  
  #pragma pack( push, 2 )
  
  struct pad {
    char    a;            /* one padding byte between 'a' and 'b' */
    long    b;
  
  #pragma pack( push, 1 )
  
    struct {
      char  c;            /* no padding between 'c' and 'd'       */
      short d;
    }       e;            /* sizeof( e ) == 3                     */
  
  #pragma pack( pop );    /* back to pack( 2 )                    */
  
    long    f;            /* one padding byte between 'e' and 'f' */
  };
  
  #pragma pack( pop );    /* back to "native"                     */
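The effect of the C<pack> pragma can be checked with the C<sizeof>
and C<offsetof> methods. The following is only a sketch: the
C<Alignment> and C<LongSize> values are assumptions chosen for
illustration, and the struct names C<packed> and C<unpacked> are
made up for this example.

  use Convert::Binary::C;

  # Assumed configuration: 4-byte long, "native" alignment of 4.
  my $c = Convert::Binary::C->new(Alignment => 4, LongSize => 4);

  $c->parse(q{
    #pragma pack(1)
    struct packed   { char a; long b; };

    #pragma pack
    struct unpacked { char a; long b; };
  });

  # Report the size of each struct and the offset of member 'b'.
  for my $type (qw( packed unpacked )) {
    printf "%-8s size = %d, offset of b = %d\n",
           $type, $c->sizeof($type), $c->offsetof($type, 'b');
  }

With these assumed settings, C<packed> should come out at 5 bytes
with C<b> at offset 1, and C<unpacked> at 8 bytes with C<b> at
offset 4.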


=item C<configure> OPTION1 =E<gt> VALUE1, OPTION2 =E<gt> VALUE2, ...

This method can be used to configure an existing Convert::Binary::C
object or to retrieve its current configuration.

To configure the object, the list of options consists of key
and value pairs and must therefore contain an even number of
elements. L<C<configure>|/"configure"> (and also L<C<new>|/"new"> if
used with configuration options) will throw an exception if you
pass an odd number of elements. Configuration will normally look
like this:

  $c->configure(ByteOrder => 'BigEndian', IntSize => 2);

To retrieve the current value of a configuration option,
pass L<C<configure>|/"configure"> a single argument that
holds the name of the option, like this:

  $order = $c->configure('ByteOrder');
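As a small sketch of both call styles (the option values are
arbitrary), the following sets two options, reads one back, and
traps the exception raised for an odd number of arguments:

  use Convert::Binary::C;

  my $c = Convert::Binary::C->new;

  # Set two options, then read one of them back.
  $c->configure(ByteOrder => 'BigEndian', IntSize => 2);
  my $order = $c->configure('ByteOrder');
  print "ByteOrder is $order\n";

  # Three arguments are neither a single option name nor a list
  # of key/value pairs, so this call is expected to die.
  my $ok = eval { $c->configure(ByteOrder => 'BigEndian', 'IntSize'); 1 };
  print "configure failed: $@" unless $ok;

Wrapping the call in C<eval> is just one way to handle the
exception; the exact wording of the error message is not
something this sketch relies on.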

If you want to get the values of all configuration options at
once, you can call L<C<configure>|/"configure"> without any
arguments and it will return a reference to a hash table that
holds the whole object configuration. This can be conveniently
used with the L<Data::Dumper|Data::Dumper> module, for example:

  use Convert::Binary::C;
  use Data::Dumper;
  
  $c = Convert::Binary::C->new(Define  => ['DEBUGGING', 'FOO=123'],
                               Include => ['/usr/include']);
  
  print Dumper($c->configure);

This will print something like:

  $VAR1 = {
    'DisabledKeywords' => [],
    'HasCPPComments' => 1,
    'UnsignedChars' => 0,
    'LongDoubleSize' => 16,
    'OrderMembers' => 1,
    'CompoundAlignment' => 1,
    'UnsignedBitfields' => 0,
    'DoubleSize' => 8,
    'Assert' => [],
    'PointerSize' => 8,
    'ByteOrder' => 'LittleEndian',
    'Warnings' => 0,
    'LongSize' => 8,
    'Include' => [
      '/usr/include'
    ],
    'EnumType' => 'Integer',
    'EnumSize' => 4,
    'ShortSize' => 2,
    'IntSize' => 4,
    'StdCVersion' => 199901,
    'HostedC' => 1,
    'Alignment' => 1,
    'HasMacroVAARGS' => 1,
    'KeywordMap' => {},
    'Define' => [
      'DEBUGGING',
      'FOO=123'
    ],
    'LongLongSize' => 8,
    'CharSize' => 1,
    'FloatSize' => 4,
    'Bitfields' => {
      'Engine' => 'Generic'
    }
  };

Since you may not always want to write a L<C<configure>|/"configure"> call
when you only want to change a single configuration item, you can
use any configuration option name as a method name, like:

  $c->ByteOrder('LittleEndian') if $c->IntSize < 4;

(Yes, the example doesn't make very much sense... ;-)

However, you should keep in mind that configuration methods
that can take lists (namely C<Include>, C<Define> and C<Assert>,
but not C<DisabledKeywords>) may behave slightly differently from
their L<C<configure>|/"configure"> equivalent.
If you pass these methods a single argument that is an array
reference, the current list will be B<replaced> by the new one,
which is exactly the behaviour of the
corresponding L<C<configure>|/"configure"> call.
So the following are equivalent:

  $c->configure(Define => ['foo', 'bar=123']);
  $c->Define(['foo', 'bar=123']);

But if you pass a list of strings instead of an array reference
(which cannot be done when using L<C<configure>|/"configure">),
the new list items are B<appended> to the current list, so

  $c = Convert::Binary::C->new(Include => ['/include']);
  $c->Include('/usr/include', '/usr/local/include');
  print Dumper($c->Include);
  
  $c->Include(['/usr/local/include']);
  print Dumper($c->Include);

will first print all three include paths, but finally
only C</usr/local/include> will be configured:

  $VAR1 = [
    '/include',
    '/usr/include',
    '/usr/local/include'
  ];
  $VAR1 = [
    '/usr/local/include'
  ];

Furthermore, configuration methods can be chained together,
as they return a reference to their object if called as a
set method. So, if you like, you can configure your object


together. You could, for example, disable the C<signed> keyword,
but still have C<__signed__> mapped to the original C<signed> token:

  $c->configure(DisabledKeywords => [ 'signed' ],
                KeywordMap       => { __signed__  => 'signed' });

This would allow you to define

  typedef __signed__ long signed;

which would normally be a syntax error because C<signed> cannot
be used as an identifier.

=item C<UnsignedChars> =E<gt> 0 | 1

Use this boolean option if you want characters
to be unsigned if specified without an
explicit C<signed> or C<unsigned> type specifier.
By default, characters are signed.
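A minimal sketch of the difference, assuming a one-byte C<char>
and a made-up struct named C<byte>:

  use Convert::Binary::C;

  my $code = "struct byte { char value; };\n";

  # One object with the default setting, one with unsigned chars.
  my $signed   = Convert::Binary::C->new->parse($code);
  my $unsigned = Convert::Binary::C->new(UnsignedChars => 1)->parse($code);

  # The same byte 0xFF is interpreted differently by each object.
  my $s = $signed->unpack('byte', "\xFF")->{value};
  my $u = $unsigned->unpack('byte', "\xFF")->{value};

  print "signed: $s, unsigned: $u\n";

The first value should be C<-1> and the second C<255>.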

=item C<UnsignedBitfields> =E<gt> 0 | 1

Use this boolean option if you want bitfields
to be unsigned if specified without an
explicit C<signed> or C<unsigned> type specifier.
By default, bitfields are signed.

=item C<Warnings> =E<gt> 0 | 1

Use this boolean option if you want warnings to be issued
during the parsing of source code. Currently, warnings
are only reported by the preprocessor, so don't expect
the output to cover everything.

By default, warnings are turned off and only errors will be
reported. However, even these errors are turned off if
you run without the C<-w> flag.

=item C<HasCPPComments> =E<gt> 0 | 1

Use this option to turn C++ comments on or off. By default,
C++ comments are enabled. Disabling C++ comments may be
necessary if your code includes strange things like:

  one = 4 //* <- divide */ 4;
  two = 2;

With C++ comments, the above will be interpreted as

  one = 4
  two = 2;

which will obviously be a syntax error, but without
C++ comments, it will be interpreted as

  one = 4 / 4;
  two = 2;

which is correct.
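The difference can be made visible with a variation of the code
above that parses either way. This is only a sketch: the enum
name C<numbers> is made up, and the comma has been moved to the
start of the next line so that the declaration stays valid even
when the rest of the first line is treated as a comment.

  use Convert::Binary::C;

  my $code = q{
    enum numbers {
      one = 4 //* <- divide */ 4
      , two = 2
    };
  };

  my %one;

  # Parse the same code with C++ comments enabled and disabled.
  for my $cpp (1, 0) {
    my $c = Convert::Binary::C->new(HasCPPComments => $cpp)->parse($code);
    $one{$cpp} = $c->enum('numbers')->{enumerators}{one};
    print "HasCPPComments => $cpp: one = $one{$cpp}\n";
  }

Following the interpretation described above, C<one> should be
C<4> with C++ comments enabled and C<1> with them disabled.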

=item C<HasMacroVAARGS> =E<gt> 0 | 1

Use this option to turn the C<__VA_ARGS__> macro expansion
on or off. If this is enabled (which is the default), you can use
variable-length argument lists in your preprocessor macros.

  #define DEBUG( ... )  fprintf( stderr, __VA_ARGS__ )

There's normally no reason to turn that feature off.

=item C<StdCVersion> =E<gt> undef | INTEGER

Use this option to change the value of the preprocessor's
predefined C<__STDC_VERSION__> macro. When set to C<undef>,
the macro will not be defined.

=item C<HostedC> =E<gt> undef | 0 | 1

Use this option to change the value of the preprocessor's
predefined C<__STDC_HOSTED__> macro. When set to C<undef>,
the macro will not be defined.

=item C<Include> =E<gt> [ INCLUDES ]

Use this option to set the include path for the internal
preprocessor. The option value is a reference to an array
of strings, each string holding a directory that should
be searched for includes.

=item C<Define> =E<gt> [ DEFINES ]

Use this option to define symbols in the preprocessor.
The option value is, again, a reference to an array of
strings. Each string can be either just a symbol or an
assignment to a symbol. This is completely equivalent
to what the C<-D> option does for most preprocessors.

The following will define the symbol C<FOO> and
define C<BAR> to be C<12345>:

  $c->configure(Define => [qw( FOO BAR=12345 )]);
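As a sketch of how such definitions affect parsing, the code
below only declares a type if C<FOO> is defined and uses C<BAR>
as an array dimension. The typedef name C<buffer> is made up for
this example, and the default C<CharSize> of 1 is assumed.

  use Convert::Binary::C;

  my $c = Convert::Binary::C->new(Define => [qw( FOO BAR=12345 )]);

  $c->parse(q{
    #ifdef FOO
    typedef char buffer[BAR];
    #endif
  });

  # The array dimension comes from the BAR definition.
  my $size = $c->sizeof('buffer');
  print "sizeof(buffer) = $size\n";

With the definitions above, the reported size should be C<12345>.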

=item C<Assert> =E<gt> [ ASSERTIONS ]

Use this option to make assertions in the preprocessor.
If you don't know what assertions are, don't be
concerned, since they're deprecated anyway. They
are, however, used in some systems' include files.
The value is an array reference, just like for the
macro definitions. Only the way the assertions are
defined is a bit different and mimics the way they
are defined with the C<#assert> directive:

  $c->configure(Assert => ['foo(bar)']);

=item C<OrderMembers> =E<gt> 0 | 1

When using L<C<unpack>|/"unpack"> on compounds and
iterating over the returned hash, the order of the
compound members is generally not preserved due to
the nature of hash tables. It is not even guaranteed



  not             =>  ''
  ptr             =>  'typedef'
  foo             =>  'struct'
  bar             =>  ''
  xxx             =>  undef
  foo.xxx         =>  'member'
  foo.abc         =>  ''
  xxx.yyy         =>  undef
  quad            =>  'typedef'
  quad[3]         =>  'member'
  quad[5]         =>  'member'
  quad[-3]        =>  'member'
  short[1]        =>  undef
  unsigned long   =>  'basic'

So, if L<C<def>|/"def"> returns a non-empty string, you can safely use
any other method with that type's name or with that member expression.

Concerning arrays, note that the index into an array doesn't
need to be within the bounds of the array's definition, just
like in C. In the above example, C<quad[5]> and C<quad[-3]> are
valid members of the C<quad> array, even though it is declared to
have only four elements.

In cases where the typedef namespace overlaps with the
namespace of enums/structs/unions, the L<C<def>|/"def"> method
will give preference to the typedef and will thus return
the string C<"typedef">. You could however force interpretation
as an enum, struct or union by putting C<"enum">, C<"struct">
or C<"union"> in front of the type's name.
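A short sketch of such an overlap, using a made-up name C<foo>
that is declared both as a typedef and as a struct tag:

  use Convert::Binary::C;

  my $c = Convert::Binary::C->new->parse(q{
    typedef int foo;
    struct foo { int a; };
  });

  my %kind;

  # Query the plain name and the explicitly qualified name.
  for my $type ('foo', 'struct foo') {
    $kind{$type} = $c->def($type);
    printf "%-12s => %s\n", $type,
           defined $kind{$type} ? "'$kind{$type}'" : 'undef';
  }

The plain name should be reported as C<"typedef">, while the
qualified name should be reported as C<"struct">.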

=back

=head2 defined

=over 8

=item C<defined> MACRO

You can use the L<C<defined>|/"defined"> method to find out if a certain
macro is defined, just like you would use the C<defined> operator of the
preprocessor. For example, the following code

  use Convert::Binary::C;
  
  my $c = Convert::Binary::C->new->parse(<<'ENDC');
  
  #define ADD(a, b) ((a) + (b))
  
  #if 1
  # define DEFINED
  #else
  # define UNDEFINED
  #endif
  
  ENDC
  
  for my $macro (qw( ADD DEFINED UNDEFINED )) {
    my $not = $c->defined($macro) ? '' : ' not';
    print "Macro '$macro' is$not defined.\n";
  }

would print:

  Macro 'ADD' is defined.
  Macro 'DEFINED' is defined.
  Macro 'UNDEFINED' is not defined.

You have to keep in mind that this works only as long as the preprocessor
is not reset. See L<"Preprocessor configuration"> for details.

=back

=head2 pack

=over 8

=item C<pack> TYPE

=item C<pack> TYPE, DATA

=item C<pack> TYPE, DATA, STRING

Use this method to pack a complex data structure into a
binary string according to a type definition that has been
previously parsed. DATA must be a scalar matching the
type definition. C structures and unions are represented
by references to Perl hashes, C arrays by references to
Perl arrays.

  use Convert::Binary::C;
  use Data::Dumper;
  use Data::Hexdumper;
  
  $c = Convert::Binary::C->new( ByteOrder => 'BigEndian'
                              , LongSize  => 4
                              , ShortSize => 2
                              )
                         ->parse(<<'ENDC');
  struct test {
    char    ary[3];
    union {
      short word[2];
      long  quad;
    }       uni;
  };
  ENDC

Hashes don't have to contain a key for each compound member
and arrays may be truncated:

  $binary = $c->pack('test', { ary => [1, 2], uni => { quad => 42 } });

Elements not defined in the Perl data structure will be
set to zero in the packed byte string. If you pass C<undef> as
the second parameter, or simply omit it, the whole string will be
initialized with zero bytes. On success, the packed byte
string is returned.

  print hexdump(data => $binary);

The above code would print:

    0x0000 : 01 02 00 00 00 00 2A                            : ......*

You could also use L<C<unpack>|/"unpack"> and dump the data structure.
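A sketch of that round trip, repeating the setup from above so
that it runs on its own. C<Alignment> is set to 1 explicitly
here, which is an assumption made to match the seven-byte
layout shown in the hexdump.

  use Convert::Binary::C;
  use Data::Dumper;

  my $c = Convert::Binary::C->new(ByteOrder => 'BigEndian',
                                  LongSize  => 4,
                                  ShortSize => 2,
                                  Alignment => 1);

  $c->parse(q{
    struct test {
      char    ary[3];
      union {
        short word[2];
        long  quad;
      }       uni;
    };
  });

  # Pack a partially specified structure, then unpack it again.
  my $binary = $c->pack('test', { ary => [1, 2], uni => { quad => 42 } });
  my $data   = $c->unpack('test', $binary);

  print Dumper($data);

The dump should show the omitted third element of C<ary> as zero,
and both union members decoded from the same four bytes, so that
C<quad> and C<word[1]> are both 42.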


