Data-Random-Structure-UTF8


Changes  view on Meta::CPAN

0.01    Date/time
        First version, released on an unsuspecting world.

0.02	2020-04-16
	Documentation additions.

0.03	2020-04-16
	Changed minimum Perl Version to 5.008

0.04	2020-04-16
	Extended the functionality by adding only_unicode() to
	control what scalar types to output.
	Added check_content_recursively() which checks the type
	of keys, values, elements of a nested data structure.
	Added lots more tests.
	Removed Unicode::UCD and List::Util dependency.
	Changed minimum Perl Version to 5.8.0

0.05	2020-04-16
	Fixed a bug in _check_content_recursively()
	(as reported by CPAN Tester Alexandr Ciornii (CHORNY)).

MANIFEST  view on Meta::CPAN

Changes
lib/Data/Random/Structure/UTF8.pm
Makefile.PL
MANIFEST			This list of files
MANIFEST.SKIP
README
README.md
t/00-load.t
t/01-basic.t
t/02-check-content-recursively.t
t/03-only-unicode.t
t/manifest.t
t/pod-coverage.t
t/pod.t
META.yml                                 Module YAML meta-data (added by MakeMaker)
META.json                                Module JSON meta-data (added by MakeMaker)

META.json  view on Meta::CPAN

{
   "abstract" : "Produce nested data structures with unicode keys, values, elements.",
   "author" : [
      "Andreas Hadjiprocopis <bliako@cpan.org>"
   ],
   "dynamic_config" : 1,
   "generated_by" : "ExtUtils::MakeMaker version 7.44, CPAN::Meta::Converter version 2.150010",
   "license" : [
      "artistic_2"
   ],
   "meta-spec" : {
      "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec",

META.yml  view on Meta::CPAN

---
abstract: 'Produce nested data structures with unicode keys, values, elements.'
author:
  - 'Andreas Hadjiprocopis <bliako@cpan.org>'
build_requires:
  Data::Dump: '0'
  Data::Dumper: '0'
  ExtUtils::MakeMaker: '0'
  Scalar::Util: '0'
  Test::More: '0'
configure_requires:
  ExtUtils::MakeMaker: '0'

README  view on Meta::CPAN

NAME

    Data::Random::Structure::UTF8 - Produce nested data structures with
    unicode keys, values, elements.

VERSION

    Version 0.06

SYNOPSIS

    This module produces random, arbitrarily deep and long, nested Perl
    data structures with unicode content for the keys, values and/or array
    elements. Content can be forced to be exclusively strings and
    exclusively unicode. Or the strings can be unicode. Or anything goes,
    mixed unicode and non-unicode strings as well as integers, floats, etc.

    This is an object-oriented module which inherits from
    Data::Random::Structure and extends its functionality by providing for
    unicode keys and values for hashtables and unicode content for array
    elements or scalars, randomly mixed with the usual repertoire of
    Data::Random::Structure, which is non-unicode strings, numerical,
    boolean values and the assorted entourage to the court of Emperor
    Computer, post-Turing.

    For example, it produces these:

      * unicode scalars: e.g. "αβγ",

      * mixed arrays: e.g. ["αβγ", "123", "xyz"]

      * hashtables with some/all keys and/or values as unicode: e.g.
      {"αβγ" => "123", "xyz" => "αβγ"}

      * exclusive unicode arrays or hashtables: e.g. ["αβγ", "χψζ"]

    This is accomplished by adding an extra type string-UTF8 and the
    respective generator method. Both are invisible to the user, who
    will get the old functionality plus some unicode strings (or maybe
    none, because this is a random process which, at the moment, does
    not eliminate non-unicode strings).

        use Data::Random::Structure::UTF8;
        use Data::Dump qw(pp); # provides pp() used below
    
        my $randomiser = Data::Random::Structure::UTF8->new(
            'max_depth' => 5,
            'max_elements' => 20,
            # all the strings produced (keys, values, elements)
            # will be unicode strings
            'only-unicode' => 1,
            # all the strings produced (keys, values, elements)
            # will be a mixture of unicode and non-unicode
            # this is the default behaviour
            #'only-unicode' => 0,
            # only unicode strings will be produced for (keys, values, elements),
            # there will be no numbers, no bool, only unicode strings
            #'only-unicode' => 2,
        );
        my $perl_var = $randomiser->generate() or die;
        print pp($perl_var);
    
        # which prints the usual escape mess of Dump and Dumper
    [
      "\x{7D5A}\x{4EC1}",
      "\x{E6E2}\x{75A4}",
      329076,
      0.255759160148987,

README  view on Meta::CPAN

        1,
        "_ow|J\@~=6%*N;52?W3Y\$S1",
        {
          "\x{75A4}\x{75A4}" => 123,
          "123" => "\x{7D5A}\x{4EC1}",
          "xyz" => [1, 2, "\x{7D5A}\x{4EC1}"],
        },
      ],
    
        # can control the scalar type (for keys, values, items) on the fly
        # this produces unicode strings in addition to
        # Data::Random::Structure's usual repertoire:
        # non-unicode-string, numbers, bool, integer, float, etc.
        # (see there for the list)
        $randomiser->only_unicode(0); # the default: anything plus unicode strings
        print $randomiser->only_unicode();
    
        # this produces unicode strings in addition to
        # Data::Random::Structure's usual repertoire:
        # numbers, bool, integer, float, etc.
        # (see there for the list)
        # EXCEPT non-unicode-strings, (all strings will be unicode)
        $randomiser->only_unicode(1);
        print $randomiser->only_unicode();
    
        # this produces unicode strings ONLY
        # Data::Random::Structure's usual repertoire does not apply
        # there will be no numbers, no bool, no integer, no float, no nothing
        $randomiser->only_unicode(2);
        print $randomiser->only_unicode();

METHODS

    This is an object oriented module which has exactly the same API as
    Data::Random::Structure.

 new

    Constructor. In addition to the API of Data::Random::Structure's
    new(), it takes the parameter 'only-unicode' with a valid value of
    0, 1 or 2. Default is 0.

      * 0 : keys, values, elements of the produced data structure will be a
      mixture of unicode strings, plus Data::Random::Structure's full
      repertoire which includes non-unicode strings, integers, floats etc.

      * 1 : keys, values, elements of the produced data structure will be a
      mixture of unicode strings, plus Data::Random::Structure's full
      repertoire except non-unicode strings. That is, all strings will be
      unicode. But there will possibly be integers etc.

      * 2 : keys, values, elements of the produced data structure will be
      only unicode strings. Nothing of Data::Random::Structure's repertoire
      applies. Only unicode strings, no integers, no nothing.

    Controlling the scalar data types can also be done on the fly, after
    the object has been created, using the only_unicode() method of
    Data::Random::Structure::UTF8.

    Additionally, the documentation of Data::Random::Structure's new()
    reports that the constructor takes 2 optional arguments, max_depth
    and max_elements. See Data::Random::Structure's new() for
    up-to-date, official information.

 only_unicode

    Controls what scalar types are included in the nested data
    structures generated. With no parameters it returns the current
    setting. Otherwise, valid input parameters and their meanings are
    listed under new() of Data::Random::Structure::UTF8.

 generate

    Generate a nested data structure according to the specification set in
    the constructor. See Data::Random::Structure's generate() for all
    options. This method is not overridden by this module.

    It returns the Perl data structure as a reference.

 generate_scalar

    Generate a scalar which may contain unicode content. See
    Data::Random::Structure::generate_scalar for all options. This method
    is overridden by this module but calls the parent's too.

    It returns a Perl scalar (a string or, depending on the
    'only-unicode' setting, a number or another scalar type).

 generate_array

    Generate an array with random, possibly unicode, content. See
    Data::Random::Structure::generate_array for all options. This method is
    not overridden by this module.

    It returns the Perl array as a reference.

 generate_hash

    Generate a hash with random, possibly unicode, content. See
    Data::Random::Structure::generate_hash for all options. This method is
    not overridden by this module.

    It returns the Perl hash as a reference.

 random_char_UTF8

    Return a random unicode character, guaranteed to be valid. This is a
    very simple method which selects characters from some pre-set code
    pages (Greek, Cyrillic, Cherokee, Ethiopic, Javanese) with equal
    probability. These pages and ranges were selected so that there are no
    "holes" between them which would produce an invalid character.
    Therefore, not all characters from the particular code page will be
    produced.

    Returns a random unicode character guaranteed to be valid.

 random_chars_UTF8

      my $ret = random_chars_UTF8($optional_paramshash)

    Arguments:

      * $optional_paramshash : can contain

	* 'min' sets the minimum length of the random sequence to be
	returned, default is 6

	* 'max' sets the maximum length of the random sequence to be
	returned, default is 32

    Return a random unicode-only string, optionally specifying minimum and
    maximum length. See random_char_UTF8() of
    Data::Random::Structure::UTF8 for the range of characters it returns.
    The returned string is unicode and all its characters are guaranteed
    to be valid.

SUBROUTINES

 check_content_recursively

      my $ret = check_content_recursively($perl_var, $paramshashref)

    Arguments:

README  view on Meta::CPAN

      structure

      * $paramshashref : can contain one or more of the following keys:

	* 'numbers' set it to 1 to look for numbers (possibly among other
	things). If set to 1 and a number 123 or "123" is found, this sub
	returns 1. Set it to 0 to not look for numbers at all (and not
	report if there are no numbers) - don't bother checking for
	numbers, that's what setting this to zero means.

	* 'strings-unicode' set it to 1 to look for unicode strings
	(possibly among other things). The definition of "unicode string"
	is that at least one of its characters is unicode. If set to 1 and a
	"unicode string" is found, this sub returns 1.

	* 'strings-plain' set it to 1 to look for plain strings (possibly
	among other things). The definition of "plain string" is that none
	of its characters is unicode. If set to 1 and a "plain string" is
	found, this sub returns 1.

	* 'strings' set it to 1 to look for plain or unicode strings
	(possibly among other things). If set to 1 and a "plain string" or
	"unicode string" is found, this sub returns 1. Basically, it
	returns 1 when a string is found (as opposed to a "number").

    In general, by setting 'strings-unicode' => 1 you are checking whether
    the input Perl variable contains a unicode string in a key, a value, an
    array element, or a scalar reference.

    But setting 'strings-unicode' => 0 simply means do not look for
    this. It does not mean report if there are NO unicode strings.

    Return value: 1 or 0 depending on whether what was looked for was
    found.

    This is not an object-oriented method. It is called thus:

        # check if ANY scalar (hash key, value, array element or scalar ref)
        # is a single number (integer, float)
        # the decision is made by Scalar::Util::looks_like_number()
        if( Data::Random::Structure::UTF8::check_content_recursively(
            {'abc'=>123, 'xyz'=>[1,2,3]},
            {
                    # look for numbers, are there any?
                    'numbers' => 1,
            }
        ) ){ print "data structure contains numbers\n" }
    
        # do not look for numbers, check only whether it contains unicode strings
        if( Data::Random::Structure::UTF8::check_content_recursively(
            {'abc'=>123, 'xyz'=>[1,2,3]},
            {
                    # don't look for numbers
                    'numbers' => 0,
                    # look for unicode strings, are there any?
                    'strings-unicode' => 1,
            }
        ) ){ print "data structure contains unicode strings\n" }

    CAVEAT: as its name suggests, this is a recursive function. Beware of
    extremely deep data structures. Deep, not long. If you do get "Deep
    recursion..." warnings and you insist on going ahead, this will
    remove the warnings (but are you sure?):

        {
            no warnings 'recursion';

README  view on Meta::CPAN

                {
                    'numbers' => 1,
                }
            ) ){ print "data structure contains numbers\n" }
        }

SEE ALSO

      * The parent class Data::Random::Structure.

      * Data::Roundtrip for stringifying possibly-unicode Perl data
      structures.

AUTHOR

    Andreas Hadjiprocopis, <bliako ta cpan.org / andreashad2 ta gmail.com>

BUGS

    Please report any bugs or feature requests to
    bug-data-random-structure-utf8 at rt.cpan.org, or through the web
    interface at
    https://rt.cpan.org/NoAuth/ReportBug.html?Queue=Data-Random-Structure-UTF8.
    I will be notified, and then you'll automatically be notified of
    progress on your bug as I make changes.

CAVEATS

    There are two issues users should know about.

    The first issue is that the unicode produced can make Data::Dump
    complain with

       Operation "lc" returns its argument for UTF-16 surrogate U+DA4B at /usr/local/share/perl5/Data/Dump.pm line 302.

    This, I have found, can be fixed with the following workaround (from
    github user iafan
    <https://github.com/evernote/serge/commit/865402bbde42101345a5bee4cd0a855b9b76bdd7>,
    thank you):

        # Suppress `Operation "lc" returns its argument for UTF-16 surrogate 0xNNNN` warning

README  view on Meta::CPAN

         ...

    in order to extract the type which cannot be handled and handle it
    ourselves. So whenever the parent class (Data::Random::Structure)
    changes its croak song, we will have to adapt this code accordingly (in
    generate_scalar() of Data::Random::Structure::UTF8). For the moment, I
    have placed a catch-all, fall-back condition to handle this but it will
    be called for all kinds of types and not only the types we have added.

    So, this issue is not going to make the module die but may skew the
    random results in favour of unicode strings (which is the fallback,
    default action when the type cannot be parsed).

SUPPORT

    You can find documentation for this module with the perldoc command.

        perldoc Data::Random::Structure::UTF8

    You can also look for information at:

README.md  view on Meta::CPAN

# NAME

Data::Random::Structure::UTF8 - Produce nested data structures with unicode keys, values, elements.

# VERSION

Version 0.06

# SYNOPSIS

This module produces random, arbitrarily deep and long,
nested Perl data structures  with unicode content for the
keys, values and/or array elements. Content can be forced
to be exclusively strings and exclusively unicode. Or
the strings can be unicode. Or anything goes, mixed
unicode and non-unicode strings as well as integers, floats, etc.

This is an object-oriented module
which inherits from
[Data::Random::Structure](https://metacpan.org/pod/Data%3A%3ARandom%3A%3AStructure) and extends its functionality by
providing for unicode keys and values for hashtables and
unicode content for array elements or scalars, randomly mixed with the
usual repertoire of [Data::Random::Structure](https://metacpan.org/pod/Data%3A%3ARandom%3A%3AStructure), which is
non-unicode strings,
numerical, boolean values and the assorted entourage to the court
of Emperor Computer, post-Turing.

For example, it produces these:

- unicode scalars: e.g. `"αβγ"`,
- mixed arrays: e.g. `["αβγ", "123", "xyz"]`
- hashtables with some/all keys and/or values as unicode: e.g.
`{"αβγ" => "123", "xyz" => "αβγ"}`
- exclusive unicode arrays or hashtables: e.g. `["αβγ", "χψζ"]`

This is accomplished by adding an extra
type `string-UTF8` and the
respective generator method. Both are invisible to the user,
who will get the old functionality plus some unicode strings (or maybe none,
because this is a random process which, at the moment, does not
eliminate non-unicode strings).

      use Data::Random::Structure::UTF8;
      use Data::Dump qw(pp); # provides pp() used below

      my $randomiser = Data::Random::Structure::UTF8->new(
          'max_depth' => 5,
          'max_elements' => 20,
          # all the strings produced (keys, values, elements)
          # will be unicode strings
          'only-unicode' => 1,
          # all the strings produced (keys, values, elements)
          # will be a mixture of unicode and non-unicode
          # this is the default behaviour
          #'only-unicode' => 0,
          # only unicode strings will be produced for (keys, values, elements),
          # there will be no numbers, no bool, only unicode strings
          #'only-unicode' => 2,
      );
      my $perl_var = $randomiser->generate() or die;
      print pp($perl_var);

      # which prints the usual escape mess of Dump and Dumper
  [
    "\x{7D5A}\x{4EC1}",
    "\x{E6E2}\x{75A4}",
    329076,
    0.255759160148987,

README.md  view on Meta::CPAN

      1,
      "_ow|J\@~=6%*N;52?W3Y\$S1",
      {
        "\x{75A4}\x{75A4}" => 123,
        "123" => "\x{7D5A}\x{4EC1}",
        "xyz" => [1, 2, "\x{7D5A}\x{4EC1}"],
      },
    ],

      # can control the scalar type (for keys, values, items) on the fly
      # this produces unicode strings in addition to
      # Data::Random::Structure's usual repertoire:
      # non-unicode-string, numbers, bool, integer, float, etc.
      # (see there for the list)
      $randomiser->only_unicode(0); # the default: anything plus unicode strings
      print $randomiser->only_unicode();

      # this produces unicode strings in addition to
      # Data::Random::Structure's usual repertoire:
      # numbers, bool, integer, float, etc.
      # (see there for the list)
      # EXCEPT non-unicode-strings, (all strings will be unicode)
      $randomiser->only_unicode(1);
      print $randomiser->only_unicode();

      # this produces unicode strings ONLY
      # Data::Random::Structure's usual repertoire does not apply
      # there will be no numbers, no bool, no integer, no float, no nothing
      $randomiser->only_unicode(2);
      print $randomiser->only_unicode();

# METHODS

This is an object oriented module which has exactly the same API as
[Data::Random::Structure](https://metacpan.org/pod/Data%3A%3ARandom%3A%3AStructure).

## `new`

Constructor. In addition to the API of [Data::Random::Structure](https://metacpan.org/pod/Data%3A%3ARandom%3A%3AStructure)'s `new()`,
it takes the parameter `'only-unicode'` with
a valid value of 0, 1 or 2. Default is 0.

- 0 : keys, values, elements of the produced data structure will be
a mixture of unicode strings, plus [Data::Random::Structure](https://metacpan.org/pod/Data%3A%3ARandom%3A%3AStructure)'s full
repertoire which includes non-unicode strings, integers, floats etc.
- 1 : keys, values, elements of the produced data structure will be
a mixture of unicode strings, plus [Data::Random::Structure](https://metacpan.org/pod/Data%3A%3ARandom%3A%3AStructure)'s full
repertoire except non-unicode strings. That is, all strings will be
unicode. But there will possibly be integers etc.
- 2 : keys, values, elements of the produced data structure will be
only unicode strings. Nothing of [Data::Random::Structure](https://metacpan.org/pod/Data%3A%3ARandom%3A%3AStructure)'s
repertoire applies. Only unicode strings, no integers, no nothing.

Controlling the scalar data types can also be done on the fly, after
the object has been created, using the `only_unicode()` method of
[Data::Random::Structure::UTF8](https://metacpan.org/pod/Data%3A%3ARandom%3A%3AStructure%3A%3AUTF8).
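
The effect of the three `'only-unicode'` values on the pool of scalar types can be sketched without the module itself. This is only an illustration: `@parent_types` and `types_for_mode()` are hypothetical names, and the parent's real type list lives inside Data::Random::Structure and may differ from the one assumed here.

```perl
use strict;
use warnings;

# ASSUMPTION: a stand-in for the parent's list of scalar types.
# The real list is internal to Data::Random::Structure and may differ.
my @parent_types = qw(float integer string bool);

# Hypothetical helper: the scalar types available under a given mode.
sub types_for_mode {
    my ($mode) = @_;
    # mode 2: only unicode strings survive
    return ('string-UTF8') if $mode > 1;
    # mode 0: everything the parent offers plus unicode strings
    my @types = (@parent_types, 'string-UTF8');
    # mode 1: drop plain strings, keep every other type
    @types = grep { $_ ne 'string' } @types if $mode == 1;
    return @types;
}

for my $mode (0, 1, 2) {
    print "mode $mode: ", join(', ', types_for_mode($mode)), "\n";
}
```

In the module, the same choice is made through the constructor's `'only-unicode'` parameter or the `only_unicode()` method.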

Additionally, the documentation of [Data::Random::Structure](https://metacpan.org/pod/Data%3A%3ARandom%3A%3AStructure)'s `new()` reports that
the constructor takes 2 optional arguments, `max_depth` and `max_elements`.
See [Data::Random::Structure](https://metacpan.org/pod/Data%3A%3ARandom%3A%3AStructure)'s `new()` for up-to-date, official information.

## `only_unicode`

Controls what scalar types are included in the nested
data structures generated. With no parameters it returns
the current setting. Otherwise, valid input parameters and their
meanings are listed under `new()` of [Data::Random::Structure::UTF8](https://metacpan.org/pod/Data%3A%3ARandom%3A%3AStructure%3A%3AUTF8).

## `generate`

Generate a nested data structure according to the specification
set in the constructor. See [Data::Random::Structure](https://metacpan.org/pod/Data%3A%3ARandom%3A%3AStructure)'s `generate()` for
all options. This method is not overridden by this module.

It returns the Perl data structure as a reference.

## `generate_scalar`

Generate a scalar which may contain unicode content.
See [Data::Random::Structure::generate\_scalar](https://metacpan.org/pod/Data%3A%3ARandom%3A%3AStructure%3A%3Agenerate_scalar) for
all options. This method is overridden by this module but
calls the parent's too.

It returns a Perl scalar (a string or, depending on the
`'only-unicode'` setting, a number or another scalar type).

## `generate_array`

Generate an array with random, possibly unicode, content.
See [Data::Random::Structure::generate\_array](https://metacpan.org/pod/Data%3A%3ARandom%3A%3AStructure%3A%3Agenerate_array) for
all options. This method is not overridden by this module.

It returns the Perl array as a reference.

## `generate_hash`

Generate a hash with random, possibly unicode, content.
See [Data::Random::Structure::generate\_hash](https://metacpan.org/pod/Data%3A%3ARandom%3A%3AStructure%3A%3Agenerate_hash) for
all options. This method is not overridden by this module.

It returns the Perl hash as a reference.

## `random_char_UTF8`

Return a random unicode character, guaranteed to be valid.
This is a very simple method which selects characters
from some pre-set code pages (Greek, Cyrillic, Cherokee,
Ethiopic, Javanese) with equal probability.
These pages and ranges were selected so that there are
no "holes" between them which would produce an invalid
character. Therefore, not all characters from the
particular code page will be produced.

Returns a random unicode character guaranteed to be valid.
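
The "no holes" idea can be sketched as follows: if every code point in a range is assigned, then any offset into that range yields a valid character. The sketch uses the Greek range that appears in the module's source (U+03B0 up to, but not including, U+03F0). The helper name `random_char_in_range()` is hypothetical and is not part of the module's API.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: pick one character uniformly from the half-open
# code point range [$first, $last). The caller must make sure that every
# code point in the range is assigned, otherwise an invalid character
# could be returned.
sub random_char_in_range {
    my ($first, $last) = @_;
    return chr($first + int(rand($last - $first)));
}

# U+03B0 .. U+03EF lies inside the Greek and Coptic block, without holes.
my $char = random_char_in_range(0x03B0, 0x03F0);
printf "U+%04X %s\n", ord($char), $char;
```

Supporting several code pages would then amount to choosing one such range at random before choosing the offset.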

## `random_chars_UTF8`

    my $ret = random_chars_UTF8($optional_paramshash)

Arguments:

- `$optional_paramshash` : can contain
    - `'min'` sets the minimum length of the random sequence to be returned, default is 6
    - `'max'` sets the maximum length of the random sequence to be returned, default is 32

Return a random unicode-only string, optionally specifying
minimum and maximum length. See `random_char_UTF8()` of
[Data::Random::Structure::UTF8](https://metacpan.org/pod/Data%3A%3ARandom%3A%3AStructure%3A%3AUTF8)
for the range of characters it returns. The returned string
is unicode and all its characters are guaranteed to be valid.

# SUBROUTINES

## `check_content_recursively`

    my $ret = check_content_recursively($perl_var, $paramshashref)

Arguments:

- `$perl_var` : a Perl variable containing an arbitrarily nested data structure
- `$paramshashref` : can contain one or more of the following keys:
    - `'numbers'` set it to 1 to look for numbers (possibly among other things).
    If set to 1 and a number `123` or `"123"` is found, this sub returns 1.
    Set it to 0 to not look for numbers at all (and not report if
    there are no numbers) - _don't bother checking for numbers_, that's what
    setting this to zero means.
    - `'strings-unicode'` set it to 1 to look for unicode strings (possibly among other things).
    The definition of "unicode string" is that at least one of its characters is unicode.
    If set to 1 and a "unicode string" is found, this sub returns 1.
    - `'strings-plain'` set it to 1 to look for plain strings (possibly among other things).
    The definition of "plain string" is that none of its characters is unicode.
    If set to 1 and a "plain string" is found, this sub returns 1.
    - `'strings'` set it to 1 to look for plain or unicode strings (possibly among other things).
    If set to 1 and a "plain string" or "unicode string" is found, this sub returns 1. Basically,
    it returns 1 when a string is found (as opposed to a "number").
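
The three categories above can be illustrated for a single scalar with a small standalone sketch. It assumes that "unicode" means "contains at least one character outside the ASCII range", which is how the module's source tests for it, and that numbers are recognised with `Scalar::Util::looks_like_number()`. The helper name `classify_scalar()` is hypothetical and is not part of the module.

```perl
use strict;
use warnings;
use utf8;    # this source contains UTF-8 string literals
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: classify one scalar as the text describes.
# Numbers are tested first, so "123" counts as a number, not a string.
sub classify_scalar {
    my ($value) = @_;
    return 'number'         if looks_like_number($value);
    # any character outside ASCII makes it a "unicode string"
    return 'string-unicode' if $value =~ /[^\x00-\x7f]/;
    return 'string-plain';
}

binmode STDOUT, ':encoding(UTF-8)';
for my $value (123, "123", "xyz", "αβγ", "abc-χ") {
    printf "%-8s => %s\n", $value, classify_scalar($value);
}
```

`check_content_recursively()` applies this kind of test to every key, value and element of a nested structure and reports whether any of them matches a category that was asked for.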

In general, by setting `'strings-unicode' => 1` you are checking whether
the input Perl variable contains a unicode string in a key, a value,
an array element, or a scalar reference.

But setting `'strings-unicode' => 0` simply means do not look for
this. It does not mean _report if there are NO unicode strings_.

Return value: 1 or 0 depending on whether what
was looked for was found.

This is not an object-oriented method. It is called thus:

    # check if ANY scalar (hash key, value, array element or scalar ref)
    # is a single number (integer, float)
    # the decision is made by Scalar::Util::looks_like_number()
    if( Data::Random::Structure::UTF8::check_content_recursively(
        {'abc'=>123, 'xyz'=>[1,2,3]},
        {
                # look for numbers, are there any?
                'numbers' => 1,
        }
    ) ){ print "data structure contains numbers\n" }

    # do not look for numbers, check only whether it contains unicode strings
    if( Data::Random::Structure::UTF8::check_content_recursively(
        {'abc'=>123, 'xyz'=>[1,2,3]},
        {
                # don't look for numbers
                'numbers' => 0,
                # look for unicode strings, are there any?
                'strings-unicode' => 1,
        }
    ) ){ print "data structure contains unicode strings\n" }

CAVEAT: as its name suggests, this is a recursive function. Beware
of extremely deep data structures. Deep, not long. If you do get
`"Deep recursion..."` warnings and you insist on going ahead,
this will remove the warnings (but are you sure?):

    {
        no warnings 'recursion';

README.md  view on Meta::CPAN

            {'abc'=>123, 'xyz'=>[1,2,3]},
            {
                'numbers' => 1,
            }
        ) ){ print "data structure contains numbers\n" }
    }

# SEE ALSO

- The parent class [Data::Random::Structure](https://metacpan.org/pod/Data%3A%3ARandom%3A%3AStructure).
- [Data::Roundtrip](https://metacpan.org/pod/Data%3A%3ARoundtrip) for stringifying possibly-unicode Perl data structures.

# AUTHOR

Andreas Hadjiprocopis, `<bliako ta cpan.org / andreashad2 ta gmail.com>`

# BUGS

Please report any bugs or feature requests to `bug-data-random-structure-utf8 at rt.cpan.org`, or through
the web interface at [https://rt.cpan.org/NoAuth/ReportBug.html?Queue=Data-Random-Structure-UTF8](https://rt.cpan.org/NoAuth/ReportBug.html?Queue=Data-Random-Structure-UTF8).  I will be notified, and then you'll
automatically be notified of progress on your bug as I make changes.

# CAVEATS

There are two issues users should know about.

The first issue is that the unicode produced can make
[Data::Dump](https://metacpan.org/pod/Data%3A%3ADump) complain with

    Operation "lc" returns its argument for UTF-16 surrogate U+DA4B at /usr/local/share/perl5/Data/Dump.pm line 302.

This, I have found, can be fixed with the following workaround (from [github user iafan](https://github.com/evernote/serge/commit/865402bbde42101345a5bee4cd0a855b9b76bdd7), thank you):

    # Suppress `Operation "lc" returns its argument for UTF-16 surrogate 0xNNNN` warning
    # for the `lc()` call below; use 'utf8' instead of a more appropriate 'surrogate' pragma
    # since the latter is not available until Perl 5.14
    no warnings 'utf8';

README.md  view on Meta::CPAN


in order to extract the `type` which cannot be handled
and handle it ourselves. So whenever the parent class ([Data::Random::Structure](https://metacpan.org/pod/Data%3A%3ARandom%3A%3AStructure))
changes its `croak` song, we will have to adapt this code
accordingly (in `generate_scalar()` of [Data::Random::Structure::UTF8](https://metacpan.org/pod/Data%3A%3ARandom%3A%3AStructure%3A%3AUTF8)).
For the moment, I have placed a catch-all, fall-back condition
to handle this but it will be called for all kinds of types
and not only the types we have added.

So, this issue is not going to make the module die but may
skew the random results in favour of unicode strings (which
is the fallback, default action when the type cannot be parsed).

# SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Data::Random::Structure::UTF8

You can also look for information at:

lib/Data/Random/Structure/UTF8.pm  view on Meta::CPAN


our $VERSION='0.06';

use parent 'Data::Random::Structure';

use Scalar::Util qw( looks_like_number );

sub	new {
	my $class = shift;
	my %options = @_;
	my $only_unicode = 0;
	if( exists $options{'only-unicode'} ){
		if( defined $options{'only-unicode'} ){
			$only_unicode = $options{'only-unicode'}
		}
		# do not pass our options to parent it may get confused and croak
		delete $options{'only-unicode'}
	}
	my $self = $class->SUPER::new(%options);
	# at this point our _init() will be called via parent's
	# constructor. Our _init() will call parent's _init()
	$self->only_unicode($only_unicode);
	return $self
}
sub	_reset {
	my $self = shift;
	# we are interfering with the internals of the parent... not good
	$#{$self->{_types}} = -1;
	$#{$self->{_scalar_types}} = -1;
}
sub	_init {
	my $self = shift;
	$self->_reset();
	$self->SUPER::_init(@_);
	push @{$self->{_scalar_types}}, 'string-UTF8'
}
sub	only_unicode {
	my $self = $_[0];
	my $m = $_[1];
	return $self->{'_only-unicode'} unless defined $m;
	$self->_init();
	$self->{'_only-unicode'} = $m;
	if( $m == 1 ){
		# delete just the 'string' type
		# we will get various types but the strings will
		# be exclusively unicode
		my @idx = grep { $self->{'_scalar_types'}->[$_] eq 'string' }
			reverse 0 .. $#{$self->{_scalar_types}}
		;
		splice(@{$self->{_scalar_types}}, $_, 1) for @idx;
	} elsif( $m > 1 ){
		# delete ALL the _scalar_types and leave just our unicode string
		# we will get only unicode strings no other scalar type
		$#{$self->{_scalar_types}} = -1;
		push @{$self->{_scalar_types}}, 'string-UTF8'
	}
	return $m
}
sub	random_char_UTF8 {
	# the crucial part borrowed from The Effective Perler:
	# https://www.effectiveperlprogramming.com/2018/08/find-the-new-emojis-in-perls-unicode-support/
#	my $achar;
#	for(my $trials=100;$trials-->0;){
#		$achar = chr(int(rand(0x10FFF+1)));
#		return $achar if $achar =~ /\p{Present_In: 8.0}/;
#	}

	# just greek and coptic no holes
	return chr(0x03B0+int(rand(0x03F0-0x03B0)));

	my $arand = rand();

lib/Data/Random/Structure/UTF8.pm  view on Meta::CPAN

			# but don't die
			return scalar(random_chars_UTF8(min=>2,max=>2));
		}
	}
	return $rc
}
sub	check_content_recursively {
	my $looking_for = $_[1]; # a hashref of types to look-for, required
	my $bitparams = 0;
	$bitparams |= 1 if exists($looking_for->{'numbers'}) && ($looking_for->{'numbers'}==1);
	$bitparams |= 2 if exists($looking_for->{'strings-unicode'}) && ($looking_for->{'strings-unicode'}==1);
	$bitparams |= 4 if exists($looking_for->{'strings-plain'}) && ($looking_for->{'strings-plain'}==1);
	$bitparams |= (2+4) if exists($looking_for->{'strings'}) && ($looking_for->{'strings'}==1);
	return _check_content_recursively($_[0], $bitparams);
}
# returns 1 if we are looking for it and it was found
# returns 0 if what we were looking for was not found.
# 'looking_for' can be more than one things.
# it is a bit string, 1st bit if set looks for numbers,
# 2nd bit, if set, looks for unicode strings,
# 3rd bit, if set, looks for non-unicode strings (plain)
# if you set 'numbers'=>0, it simply means "do not check for numbers"
# and so it will not check if it has any numbers
# by giving nothing to check, it return 0, nothing was found
sub	_check_content_recursively {
	my $inp = $_[0];
	# NUMBER,UNICODE_STRING,NON_UNICODE_STRING
	my $looking_for = $_[1];
	my $aref = ref($inp);
	my ($r, $v);
	if( ($aref eq '') || ($aref eq 'SCALAR') ){
		if( $aref eq 'SCALAR' ){ $inp = $$inp }
		if( looks_like_number($inp) ){
			return 1 if $looking_for & 1; # a number
			return 0;
		}
		if( _has_utf8($inp) ){
			return 1 if $looking_for & 2; # unicode string
			return 0;
		}
		return 1 if $looking_for & 4; # plain string
		return 0;
	} elsif( $aref eq 'ARRAY' ){
		for my $v (@$inp){
			$r = _check_content_recursively($v, $looking_for);
			return 1 if $r;
		}
	} elsif( $aref eq 'HASH' ){
		for my $k (sort keys %$inp){
			$r = _check_content_recursively($k, $looking_for);
			return 1 if $r;
			$r = _check_content_recursively($inp->{$k}, $looking_for);
			return 1 if $r;
		}
	} else { die "don't know how to deal with this ref '$aref'" }
	# nothing we were looking for was found inside this array/hash
	return 0;
}
sub	_has_utf8 { return $_[0] =~ /[^\x00-\x7f]/ }
# this does not work for unicode strings
# from https://www.perlmonks.org/?node_id=958679
# and https://www.perlmonks.org/?node_id=791677
#sub isnum ($) {
#    return 0 if $_[0] eq '';
#    $_[0] & ~$_[0] ? 0 : 1
#}
1;

=pod

=encoding utf8

=head1 NAME

Data::Random::Structure::UTF8 - Produce nested data structures with unicode keys, values, elements.

=head1 VERSION

Version 0.06

=head1 SYNOPSIS

This module produces random, arbitrarily deep and long,
nested Perl data structures with unicode content for the
keys, values and/or array elements. Content can be forced
to be exclusively unicode strings (no numbers etc.). Or all
the strings can be unicode, mixed with numbers etc. Or anything
goes: mixed unicode and non-unicode strings as well as integers, floats, etc.

This is an object-oriented module
which inherits from
L<Data::Random::Structure> and extends its functionality by
providing for unicode keys and values for hashtables and
unicode content for array elements or scalars, randomly mixed with the
usual repertoire of L<Data::Random::Structure>, which is
non-unicode strings,
numerical, boolean values and the assorted entourage to the court
of Emperor Computer, post-Turing.

For example, it produces these:

=over 4

=item * unicode scalars: e.g. C<"αβγ">,

=item * mixed arrays: e.g. C<["αβγ", "123", "xyz"]>

=item * hashtables with some/all keys and/or values as unicode: e.g.
C<< {"αβγ" => "123", "xyz" => "αβγ"} >>

=item * exclusive unicode arrays or hashtables: e.g. C<["αβγ", "χψζ"]>

=back

This is accomplished by adding an extra
type C<string-UTF8> and the respective generator method.
All this is invisible to the user,
who will get the old functionality plus some unicode strings
(or maybe none: with the default settings this is a random process
which does not eliminate non-unicode strings).

    use Data::Random::Structure::UTF8;
    use Data::Dump qw/pp/;

    my $randomiser = Data::Random::Structure::UTF8->new(
        'max_depth' => 5,
        'max_elements' => 20,
        # all the strings produced (keys, values, elements)
        # will be unicode strings
        'only-unicode' => 1,
        # all the strings produced (keys, values, elements)
        # will be a mixture of unicode and non-unicode
        # this is the default behaviour
        #'only-unicode' => 0,
        # only unicode strings will be produced for (keys, values, elements),
        # there will be no numbers, no bool, only unicode strings
        #'only-unicode' => 2,
    );
    my $perl_var = $randomiser->generate() or die;
    print pp($perl_var);

    # which prints the usual escape mess of Dump and Dumper
    [
      "\x{7D5A}\x{4EC1}",
      "\x{E6E2}\x{75A4}",
      329076,
      0.255759160148987,
lib/Data/Random/Structure/UTF8.pm  view on Meta::CPAN

        1,
        "_ow|J\@~=6%*N;52?W3Y\$S1",
        {
          "\x{75A4}\x{75A4}" => 123,
          "123" => "\x{7D5A}\x{4EC1}",
          "xyz" => [1, 2, "\x{7D5A}\x{4EC1}"],
        },
      ],
    ]

    # can control the scalar type (for keys, values, items) on the fly
    # this produces unicode strings in addition to
    # Data::Random::Structure's usual repertoire:
    # non-unicode-string, numbers, bool, integer, float, etc.
    # (see there for the list)
    $randomiser->only_unicode(0); # the default: anything plus unicode strings
    print $randomiser->only_unicode();

    # this produces unicode strings in addition to
    # Data::Random::Structure's usual repertoire:
    # numbers, bool, integer, float, etc.
    # (see there for the list)
    # EXCEPT non-unicode-strings, (all strings will be unicode)
    $randomiser->only_unicode(1);
    print $randomiser->only_unicode();

    # this produces unicode strings ONLY
    # Data::Random::Structure's usual repertoire does not apply
    # there will be no numbers, no bool, no integer, no float, no nothing
    $randomiser->only_unicode(2);
    print $randomiser->only_unicode();

=head1 METHODS

This is an object-oriented module which has the same API as
L<Data::Random::Structure>, plus the additions documented below.

=head2 C<new>

Constructor. In addition to the parameters accepted by
L<Data::Random::Structure>'s C<new()>, it takes the parameter C<'only-unicode'>
whose valid values are 0, 1 or 2. Default is 0.

=over 4

=item * 0 : keys, values, elements of the produced data structure will be
a mixture of unicode strings, plus L<Data::Random::Structure>'s full
repertoire which includes non-unicode strings, integers, floats etc.

=item * 1 : keys, values, elements of the produced data structure will be
a mixture of unicode strings, plus L<Data::Random::Structure>'s full
repertoire except non-unicode strings. That is, all strings will be
unicode. But there will possibly be integers etc.

=item * 2 : keys, values, elements of the produced data structure will be
only unicode strings. Nothing of L<Data::Random::Structure>'s
repertoire applies. Only unicode strings, no integers, no nothing.

=back
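
The three settings can be pictured as a filter over the list of scalar
types. What follows is only a sketch under stated assumptions, not this
module's implementation: C<scalar_types_for_mode> is a hypothetical helper,
and the base type names C<int> and C<float> are made up for illustration.
Only C<string> and C<string-UTF8> are type names used by this module.

```perl

    use strict;
    use warnings;

    # Hypothetical helper: given an 'only-unicode' mode (0, 1 or 2) and a
    # list of base scalar types, return the types that may be generated.
    sub scalar_types_for_mode {
        my ($mode, @base) = @_;
        # mode 2: nothing but unicode strings
        return ('string-UTF8') if $mode == 2;
        my @types = (@base, 'string-UTF8');
        # mode 1: drop the plain (non-unicode) string type
        @types = grep { $_ ne 'string' } @types if $mode == 1;
        # mode 0: everything, plus unicode strings
        return @types;
    }

    for my $mode (0, 1, 2) {
        print "$mode: ", join(',', scalar_types_for_mode($mode, qw/int float string/)), "\n";
    }

```

With the assumed base types this lists all four types for mode 0, drops
C<string> for mode 1 and leaves only C<string-UTF8> for mode 2.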

Controlling the scalar data types can also be done on the fly, after
the object has been created, using the
C<only_unicode()> method.

Additionally, L<Data::Random::Structure>'s C<new()> API reports that
the constructor takes 2 optional arguments, C<max_depth> and C<max_elements>.
See L<Data::Random::Structure>'s C<new()> for up-to-date, official information.

=head2 C<only_unicode>

Controls which scalar types are included in the nested
data structures generated. With no parameters it returns
the current setting. Otherwise, valid input parameters and their
meanings are listed under C<new()> above.

=head2 C<generate>

Generate a nested data structure according to the specification
set in the constructor. See L<Data::Random::Structure>'s C<generate()> for
all options. This method is not overridden by this module.

It returns the Perl data structure as a reference.

=head2 C<generate_scalar>

Generate a scalar which may contain unicode content.
See L<Data::Random::Structure/generate_scalar> for
all options. This method is overridden by this module but
calls the parent's too.

It returns a Perl scalar.

=head2 C<generate_array>

Generate an array with random, possibly unicode, content.
See L<Data::Random::Structure/generate_array> for
all options. This method is not overridden by this module.

It returns the Perl array as a reference.

=head2 C<generate_hash>

Generate a hash with random, possibly unicode, content.
See L<Data::Random::Structure/generate_hash> for
all options. This method is not overridden by this module.

It returns the Perl hash as a reference.

=head2 C<random_char_UTF8>

Return a random unicode character, guaranteed to be valid.
This is a very simple method which selects characters
from some pre-set code pages (Greek, Cyrillic, Cherokee,
Ethiopic, Javanese) with equal probability.
The ranges within these pages were selected so that they contain
no "holes" (unassigned codepoints) which would produce an invalid
character. Therefore, not all characters from a
particular code page will be produced.

Returns a random unicode character guaranteed to be valid.
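
Picking a character from hole-free ranges can be sketched as follows.
This is an illustration under stated assumptions, not this module's code:
C<pick_random_char> and C<@RANGES> are hypothetical names. The Greek range
matches the one visible in this module's source; the Cyrillic range is an
assumption added for illustration.

```perl

    use strict;
    use warnings;

    # Hypothetical table of [first, last] codepoints (inclusive).
    # Each range is taken to contain no unassigned codepoints.
    my @RANGES = (
        [0x03B0, 0x03EF], # Greek and Coptic, as in this module's source
        [0x0410, 0x044F], # Cyrillic capital A to small ya (an assumption)
    );

    sub pick_random_char {
        # choose a range with equal probability ...
        my ($first, $last) = @{ $RANGES[ int(rand(scalar @RANGES)) ] };
        # ... then a codepoint uniformly within it
        return chr( $first + int(rand($last - $first + 1)) );
    }

    my $char = pick_random_char();
    printf "picked U+%04X\n", ord($char);

```

Printing the codepoint number rather than the character itself avoids
"Wide character" warnings on an output handle without an encoding layer.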

=head2 C<random_chars_UTF8>

  my $ret = random_chars_UTF8($optional_paramshash)

Arguments:

=over 4

=item * C<$optional_paramshash> : can contain

lib/Data/Random/Structure/UTF8.pm  view on Meta::CPAN

=over 4

=item * C<'min'> sets the minimum length of the random sequence to be returned, default is 6

=item * C<'max'> sets the maximum length of the random sequence to be returned, default is 32

=back

=back

Return a random unicode-only string, optionally specifying
minimum and maximum length. See C<random_char_UTF8()>
for the range of characters it draws from. The returned string
is unicode and all its characters are guaranteed to be valid.

=head1 SUBROUTINES

=head2 C<check_content_recursively>

  my $ret = check_content_recursively($perl_var, $paramshashref)

Arguments:

=over 4

lib/Data/Random/Structure/UTF8.pm  view on Meta::CPAN

=item * C<$paramshashref> : can contain one or more of the following keys:

=over 4

=item * C<'numbers'> set it to 1 to look for numbers (possibly among other things).
If set to 1 and a number C<123> or C<"123"> is found, this sub returns 1.
Set it to 0 to not look for numbers at all (and not report if
there are no numbers) - I<don't bother checking for numbers>, that's what
setting this to zero means.

=item * C<'strings-unicode'> set it to 1 to look for unicode strings (possibly among other things).
The definition of "unicode string" is that at least one of its characters is unicode (that is, outside the ASCII range 0x00-0x7F).
If set to 1 and a "unicode string" is found, this sub returns 1.

=item * C<'strings-plain'> set it to 1 to look for plain strings (possibly among other things).
The definition of "plain string" is that none of its characters is unicode.
If set to 1 and a "plain string" is found, this sub returns 1.

=item * C<'strings'> set it to 1 to look for plain or unicode strings (possibly among other things).
If set to 1 and a "plain string" or "unicode string" is found, this sub returns 1. Basically,
it returns 1 when a string is found (as opposed to a "number").

=back

=back
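
How a single scalar falls into one of these categories can be sketched
as below. C<classify_scalar> is a hypothetical helper written for this
illustration; it follows the order of checks visible in this module's
source (number first, then non-ASCII content, else plain) but it is not
part of the module's API.

```perl

    use strict;
    use warnings;
    use Scalar::Util qw/looks_like_number/;

    # Hypothetical helper: name the category one scalar belongs to.
    sub classify_scalar {
        my ($value) = @_;
        # anything numeric, 123 or "123", counts as a number
        return 'number'         if looks_like_number($value);
        # at least one character outside ASCII makes it a unicode string
        return 'string-unicode' if $value =~ /[^\x00-\x7f]/;
        # everything else is a plain string
        return 'string-plain';
    }

    print classify_scalar(123), "\n";           # number
    print classify_scalar("123"), "\n";         # number
    print classify_scalar("abc"), "\n";         # string-plain
    print classify_scalar("ab\x{3b1}"), "\n";   # string-unicode

```

Because the numeric check comes first, a string such as C<"123"> is never
reported as a plain string.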

In general, by setting C<< 'strings-unicode'=>1 >> you are checking whether
the input Perl variable contains a unicode string in a key, a value,
an array element, or a scalar reference.

But setting C<< 'strings-unicode'=>0 >> simply means I<do not look for
this>. It does not mean I<report if there are NO unicode strings>.

Return value: 1 or 0 depending on whether what
was looked for was found.

This is a plain subroutine, not an object-oriented method. It is called like this:

    # check if ANY scalar (hash key, value, array element or scalar ref)
    # is a number (integer, float)
    # the decision is made by Scalar::Util::looks_like_number()
    if( Data::Random::Structure::UTF8::check_content_recursively(
        {'abc'=>123, 'xyz'=>[1,2,3]},
        {
            # look for numbers, are there any?
            'numbers' => 1,
        }
    ) ){ print "data structure contains numbers\n" }

    # check if it contains unicode strings, without looking for numbers
    if( Data::Random::Structure::UTF8::check_content_recursively(
        {'abc'=>123, 'xyz'=>[1,2,3]},
        {
            # don't look for numbers
            'numbers' => 0,
            # look for unicode strings, are there any?
            'strings-unicode' => 1,
        }
    ) ){ print "data structure contains unicode strings\n" }

CAVEAT: as its name suggests, this is a recursive function. Beware
of extremely deep data structures. Deep, not long. If you do get
C<"Deep recursion..."> warnings, and you insist on going ahead,
this will remove the warnings (but are you sure?):

    {
        no warnings 'recursion';

lib/Data/Random/Structure/UTF8.pm  view on Meta::CPAN

	    }
        ) ){ print "data structure contains numbers\n" }
    }

=head1 SEE ALSO

=over 4

=item * The parent class L<Data::Random::Structure>.

=item * L<Data::Roundtrip> for stringifying possibly-unicode Perl data structures.

=back

=head1 AUTHOR

Andreas Hadjiprocopis, C<< <bliako ta cpan.org / andreashad2 ta gmail.com> >>

=head1 BUGS

Please report any bugs or feature requests to C<bug-data-random-structure-utf8 at rt.cpan.org>, or through
the web interface at L<https://rt.cpan.org/NoAuth/ReportBug.html?Queue=Data-Random-Structure-UTF8>.  I will be notified, and then you'll
automatically be notified of progress on your bug as I make changes.

=head1 CAVEATS

There are two issues users should know about.

The first issue is that the unicode produced can make
L<Data::Dump> complain with

   Operation "lc" returns its argument for UTF-16 surrogate U+DA4B at /usr/local/share/perl5/Data/Dump.pm line 302.

This, I have found, can be fixed with the following workaround (from L<github user iafan|https://github.com/evernote/serge/commit/865402bbde42101345a5bee4cd0a855b9b76bdd7>, thank you):

    # Suppress `Operation "lc" returns its argument for UTF-16 surrogate 0xNNNN` warning
    # for the `lc()` call below; use 'utf8' instead of a more appropriate 'surrogate' pragma
    # since the latter is not available until Perl 5.14
    no warnings 'utf8';

lib/Data/Random/Structure/UTF8.pm  view on Meta::CPAN


in order to extract the C<type> which cannot be handled
and handle it ourselves. So whenever the parent class (L<Data::Random::Structure>)
changes its C<croak> song, we will have to adapt this code
accordingly (in C<generate_scalar()>).
For the moment, I have placed a catch-all, fall-back condition
to handle this but it will be called for all kinds of types
and not only the types we have added.

So, this issue is not going to make the module die but may
skew the random results in favour of unicode strings (which
is the fallback, default action when the type cannot be parsed).

=head1 SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Data::Random::Structure::UTF8


You can also look for information at:

t/01-basic.t  view on Meta::CPAN

# to avoid wide character in TAP output
# do this before loading Test* modules
use open ':std', ':encoding(utf8)';

use Test::More;
use Data::Random::Structure::UTF8;
use Data::Dump qw/pp/;
use Data::Dumper;

# we are dealing with a random generator
# so give it a chance to produce some unicode
# eventually; it usually does after 1-10 trials
my $MAXTRIALS=100;

############################
#### nothing to change below
my $num_tests = 0;

my ($perl_var, $found, $found1, $found2, $trials, $rc, $alength);

my $randomiser = Data::Random::Structure::UTF8->new(
	max_depth => 5,
	max_elements => 20,
);
ok(defined $randomiser, 'Data::Random::Structure::UTF8->new()'." called."); $num_tests++;

$found = 0;
for($trials=$MAXTRIALS;$trials-->0;){
	$perl_var = $randomiser->generate();
	ok(defined $perl_var, "generate() called."); $num_tests++;
	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'strings-unicode' => 1,
	}) ){ $found=1; last }
}
ok($found==1, "generate() : produced unicode content (after ".($MAXTRIALS-$trials)." trials)."); $num_tests++; 

$found = 0;
for($trials=$MAXTRIALS;$trials-->0;){
	$perl_var = $randomiser->generate_scalar();
	ok(defined $perl_var, "generate_scalar() called."); $num_tests++;
	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'strings-unicode' => 1,
	}) ){ $found=1; last }
}
ok($found==1, "generate_scalar() : produced unicode content (after ".($MAXTRIALS-$trials)." trials)."); $num_tests++; 

$found = 0;
for($trials=$MAXTRIALS;$trials-->0;){
	$perl_var = $randomiser->generate_hash();
	ok(defined $perl_var, "generate_hash() called."); $num_tests++;
	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'strings-unicode' => 1,
	}) ){ $found=1; last }
}
ok($found==1, "generate_hash() : produced unicode content (after ".($MAXTRIALS-$trials)." trials)."); $num_tests++; 

$found = 0;
for($trials=$MAXTRIALS;$trials-->0;){
	$perl_var = $randomiser->generate_array();
	ok(defined $perl_var, "generate_array() called."); $num_tests++;
	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'strings-unicode' => 1,
	}) ){ $found=1; last }
}
ok($found==1, "generate_array() : produced unicode content (after ".($MAXTRIALS-$trials)." trials)."); $num_tests++; 

# check if pp still complains about lc
$found1 = 0; $found2 = 0;
for($trials=$MAXTRIALS;$trials-->0;){
	$perl_var = $randomiser->generate_scalar();
	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'strings-unicode' => 1,
	}) ){
		$rc = eval { my $x=Data::Dump::pp($perl_var); 1 };
		if( $@ || ! $rc ){ $found1=1; last }
		$rc = eval { my $x=Dumper($perl_var); 1 };
		if( $@ || ! $rc ){ $found2=1; last }
	}
}
if( $found1==1 ){ ok(1==1, "good to know, Data::Dump still complains"); $num_tests++; }
else { ok(1==1, "Data::Dump stopped complaining?"); $num_tests++; }
if( $found2==1 ){ ok(1==1, "good to know, Data::Dumper still complains"); $num_tests++; }

t/02-check-content-recursively.t  view on Meta::CPAN

# do this before loading Test* modules
use open ':std', ':encoding(utf8)';

use Test::More;

use Data::Random::Structure::UTF8;

use Scalar::Util qw/looks_like_number/;

# we are dealing with a random generator
# so give it a chance to produce some unicode
# eventually; it usually does after 1-10 trials
my $MAXTRIALS=100;

############################
#### nothing to change below
my $num_tests = 0;

my ($perl_var, $found, $found1, $found2, $trials, $rc, $alength, $randomiser);

# just a scalar unicode
$perl_var = 'αβγ';
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>1,
	'strings'=>0,
}) == 0, "check_content_recursively() : numbers:  no."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings'=>1,
}) == 1, "check_content_recursively() : strings: yes."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings-unicode'=>1,
	'strings-plain'=>0,
}) == 1, "check_content_recursively() : unicode strings: yes."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings-unicode'=>0,
	'strings-plain'=>1,
}) == 0, "check_content_recursively() : non-unicode strings: no."); $num_tests++;

#####
# just a scalar non-unicode
$perl_var = 'abc';
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>1,
	'strings'=>0,
}) == 0, "check_content_recursively() : numbers:  no."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings'=>1,
}) == 1, "check_content_recursively() : strings: yes."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings-unicode'=>1,
	'strings-plain'=>0,
}) == 0, "check_content_recursively() : unicode strings: no."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings-unicode'=>0,
	'strings-plain'=>1,
}) == 1, "check_content_recursively() : non-unicode strings: yes."); $num_tests++;

#####
# just a scalar mixed unicode and non-unicode
$perl_var = 'abcαβγ xyz χυζ aaa';
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>1,
	'strings'=>0,
}) == 0, "check_content_recursively() : numbers:  no."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings'=>1,
}) == 1, "check_content_recursively() : strings: yes."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings-unicode'=>1,
	'strings-plain'=>0,
}) == 1, "check_content_recursively() : unicode strings: yes."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings-unicode'=>0,
	'strings-plain'=>1,
}) == 0, "check_content_recursively() : non-unicode strings: no."); $num_tests++;

#####
# just a scalar number
$perl_var = 123;
ok(looks_like_number($perl_var), "looks like number: yes"); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>1,
	'strings'=>0,
}) == 1, "check_content_recursively() : numbers:  yes."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings'=>1,
}) == 0, "check_content_recursively() : strings: no."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings-unicode'=>1,
	'strings-plain'=>0,
}) == 0, "check_content_recursively() : unicode strings: no."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings-unicode'=>0,
	'strings-plain'=>1,
}) == 0, "check_content_recursively() : non-unicode strings: no."); $num_tests++;

#####
# just a scalar number in a string
$perl_var = '123';
ok(looks_like_number($perl_var), "looks like number: yes"); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>1,
	'strings'=>0,
}) == 1, "check_content_recursively() : numbers:  yes."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings'=>1,
}) == 0, "check_content_recursively() : strings: no."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings-unicode'=>1,
	'strings-plain'=>0,
}) == 0, "check_content_recursively() : unicode strings: no."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings-unicode'=>0,
	'strings-plain'=>1,
}) == 0, "check_content_recursively() : non-unicode strings: no."); $num_tests++;

#####
# just a scalar number in a string mixed with unicode and non-unicode
$perl_var = 'abcαβγ xyz χυζ aaa 123';
ok(!looks_like_number($perl_var), "looks like number: no"); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>1,
	'strings'=>0,
}) == 0, "check_content_recursively() : numbers:  no."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings'=>1,
}) == 1, "check_content_recursively() : strings: yes."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings-unicode'=>1,
	'strings-plain'=>0,
}) == 1, "check_content_recursively() : unicode strings: yes."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings-unicode'=>0,
	'strings-plain'=>1,
}) == 0, "check_content_recursively() : non-unicode strings: no."); $num_tests++;

#####
# complex data structures
$perl_var = {
	'χυζ' => 'abcαβγ xyz χυζ aaa 123',
	'abc' => {
		'123' => 'βγ xyz χυζ aa',
		'786' => ['α', 'β', 'c'],
	},
	'000' => [1,2,3],

t/02-check-content-recursively.t  view on Meta::CPAN

ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>1,
	'strings'=>0,
}) == 1, "check_content_recursively() : numbers:  yes."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings'=>1,
}) == 1, "check_content_recursively() : strings: yes."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings-unicode'=>1,
	'strings-plain'=>0,
}) == 1, "check_content_recursively() : unicode strings: yes."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings-unicode'=>0,
	'strings-plain'=>1,
}) == 1, "check_content_recursively() : non-unicode strings: yes."); $num_tests++;

# check for each type and set the others to zero,
# it means don't bother checking (and not report if it doesn't exist)
$perl_var = {
	'strings-unicode' => 'ναι έχω και από αυτό',
	'strings-plain' => 'sure I have some',
	'numbers' => [1,2,3,123],
};
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>1,
}) == 1, "check_content_recursively() : numbers:  yes."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'strings'=>1,
}) == 1, "check_content_recursively() : strings: yes."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'strings-unicode'=>1,
}) == 1, "check_content_recursively() : unicode strings: yes."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'strings-plain'=>1,
}) == 1, "check_content_recursively() : non-unicode strings: yes."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0, # this means "don't look", not "doesn't exist"
	'strings-plain'=>1,
}) == 1, "check_content_recursively() : non-unicode strings: yes."); $num_tests++;

# check for each type and set the others to zero,
# it means don't bother checking (and not report if it doesn't exist)
$perl_var = {
	# even keys are checked, so use a number for a key too!
	'123' => [1,2,3,123],

t/02-check-content-recursively.t  view on Meta::CPAN

ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
}) == 0, "check_content_recursively() : did not check for anything"); $num_tests++;
# undef params
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, undef
) == 0, "check_content_recursively() : did not check for anything"); $num_tests++;
# no params
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var,
) == 0, "check_content_recursively() : did not check for anything"); $num_tests++;

#####
# complex data structures with only unicode
$perl_var = {
	'χυζ' => 'αβγ',
	'αβγ' => {
		'αβγ' => 'βγ',
		'χυζ' => ['α', 'β'],
	},
	'κιαθ' => ['ά', 'Ά', 'Α', 'Ζ'],
};
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>1,
	'strings'=>0,
}) == 0, "check_content_recursively() : numbers:  no."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings'=>1,
}) == 1, "check_content_recursively() : strings: yes."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings-unicode'=>1,
	'strings-plain'=>0,
}) == 1, "check_content_recursively() : unicode strings: yes."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings-unicode'=>0,
	'strings-plain'=>1,
}) == 0, "check_content_recursively() : non-unicode strings: no."); $num_tests++;

#####
# complex data structures with no unicode
$perl_var = {
	'abc' => 'hdhd',
	'xyz' => {'ahah'=>'ssjs', 'zhahah'=>['a','b','aaaa']},
	'uauau' => ['aaaa','bbbb'],
};
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>1,
	'strings'=>0,
}) == 0, "check_content_recursively() : numbers:  no."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings'=>1,
}) == 1, "check_content_recursively() : strings: yes."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings-unicode'=>1,
	'strings-plain'=>0,
}) == 0, "check_content_recursively() : unicode strings: no."); $num_tests++;
ok(Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
	'numbers'=>0,
	'strings-unicode'=>0,
	'strings-plain'=>1,
}) == 1, "check_content_recursively() : non-unicode strings: yes."); $num_tests++;

done_testing($num_tests);

t/03-only-unicode.t  view on Meta::CPAN

binmode STDIN,  ':encoding(UTF-8)';
# to avoid wide character in TAP output
# do this before loading Test* modules
use open ':std', ':encoding(utf8)';

use Test::More;

use Data::Random::Structure::UTF8;

# we are dealing with a random generator
# so give it a chance to produce some unicode
# eventually; it usually does after 1-10 trials
my $MAXTRIALS=100;

############################
#### nothing to change below
my $num_tests = 0;

my ($perl_var, $found, $found1, $found2, $trials, $rc, $alength, $randomiser);

$randomiser = Data::Random::Structure::UTF8->new(
	max_depth => 5,
	max_elements => 20,
	'only-unicode' => 1, # that can have numbers and unicode strings
);
ok(defined $randomiser, 'Data::Random::Structure::UTF8->new()'." called."); $num_tests++;

is($randomiser->only_unicode(),1, "only unicode set to 1."); $num_tests++;

ok(0==scalar(grep{ $_ eq 'string' } @{$randomiser->{'_scalar_types'}}), "removed the string type."); $num_tests++;

ok(1, "found these types: '".join("','",  @{$randomiser->{'_scalar_types'}})."'."); $num_tests++;
$found = 0;
for($trials=$MAXTRIALS;$trials-->0;){
	$perl_var = $randomiser->generate();
	if( ! defined $perl_var ){ ok(0, "generate() failed.");  $num_tests++; }

	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'numbers'=>1,
		'strings'=>0,
	}) ){ $found |= 1 }
	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'numbers'=>0,
		'strings-unicode'=>1,
	}) ){ $found |= 2 }
	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'numbers'=>0,
		'strings-plain'=>1,
	}) ){ $found |= 4 }
}
ok($found&1, "generate() : has numbers (after $MAXTRIALS trials)."); $num_tests++; 
ok($found&2, "generate() : has unicode strings (after $MAXTRIALS trials)."); $num_tests++; 
ok(!($found&4), "generate() : does not have non-unicode strings (after $MAXTRIALS trials)."); $num_tests++; 

$randomiser->only_unicode(2);
is($randomiser->only_unicode(),2, "only unicode set to 2."); $num_tests++;
ok($randomiser->{'_scalar_types'}->[0] eq 'string-UTF8', "has string-UTF8 set."); $num_tests++;
ok($#{$randomiser->{'_scalar_types'}}==0, "no other type is set."); $num_tests++;
ok(1, "found these types: '".join("','",  @{$randomiser->{'_scalar_types'}})."'."); $num_tests++;
$found = 0;
for($trials=$MAXTRIALS;$trials-->0;){
	$perl_var = $randomiser->generate();
	if( ! defined $perl_var ){ ok(0, "generate() failed.");  $num_tests++; }

	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'numbers'=>1,
		'strings'=>0,
	}) ){ $found |= 1 }
	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'numbers'=>0,
		'strings-unicode'=>1,
	}) ){ $found |= 2 }
	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'numbers'=>0,
		'strings-plain'=>1,
	}) ){ $found |= 4 }
}
ok(!($found&1), "generate() : does not have numbers (after $MAXTRIALS trials)."); $num_tests++; 
ok($found&2, "generate() : has unicode strings (after $MAXTRIALS trials)."); $num_tests++; 
ok(!($found&4), "generate() : does not have non-unicode strings (after $MAXTRIALS trials)."); $num_tests++; 

$randomiser->only_unicode(0);
is($randomiser->only_unicode(),0, "only unicode set to 0."); $num_tests++;
ok(0<scalar(grep{ $_ eq 'string-UTF8' } @{$randomiser->{'_scalar_types'}}), "has string-UTF8 set."); $num_tests++;
ok(0<scalar(grep{ $_ eq 'string' } @{$randomiser->{'_scalar_types'}}), "has string set."); $num_tests++;
ok(1, "found these types: '".join("','",  @{$randomiser->{'_scalar_types'}})."'."); $num_tests++;
$found = 0;
for($trials=$MAXTRIALS;$trials-->0;){
	$perl_var = $randomiser->generate();
	if( ! defined $perl_var ){ ok(0, "generate() failed.");  $num_tests++; }

	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'numbers'=>1,
		'strings'=>0,
	}) ){ $found |= 1 }
	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'numbers'=>0,
		'strings-unicode'=>1,
	}) ){ $found |= 2 }
	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'numbers'=>0,
		'strings-plain'=>1,
	}) ){ $found |= 4 }
}
ok($found&1, "generate() : has numbers (after $MAXTRIALS trials)."); $num_tests++; 
ok($found&2, "generate() : has unicode strings (after $MAXTRIALS trials)."); $num_tests++; 
ok($found&4, "generate() : has non-unicode strings (after $MAXTRIALS trials)."); $num_tests++; 

##### fresh object with only-unicode set to 2 (nothing else, no numbers no nothing)
$randomiser = Data::Random::Structure::UTF8->new(
	max_depth => 5,
	max_elements => 20,
	'only-unicode' => 2, # only unicode strings: no numbers, no plain strings
);
ok(defined $randomiser, 'Data::Random::Structure::UTF8->new()'." called."); $num_tests++;

is($randomiser->only_unicode(),2, "only unicode set to 2."); $num_tests++;

ok(0==scalar(grep{ $_ eq 'string' } @{$randomiser->{'_scalar_types'}}), "removed the string type."); $num_tests++;

ok(1, "found these types: '".join("','",  @{$randomiser->{'_scalar_types'}})."'."); $num_tests++;
is($randomiser->only_unicode(),2, "only unicode set to 2."); $num_tests++;
ok($randomiser->{'_scalar_types'}->[0] eq 'string-UTF8', "has string-UTF8 set."); $num_tests++;
ok($#{$randomiser->{'_scalar_types'}}==0, "no other type is set."); $num_tests++;
ok(1, "found these types: '".join("','",  @{$randomiser->{'_scalar_types'}})."'."); $num_tests++;
$found = 0;
for($trials=$MAXTRIALS;$trials-->0;){
	$perl_var = $randomiser->generate();
	if( ! defined $perl_var ){ ok(0, "generate() failed.");  $num_tests++; }

	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'numbers'=>1,
		'strings'=>0,
	}) ){ $found |= 1 }
	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'numbers'=>0,
		'strings-unicode'=>1,
	}) ){ $found |= 2 }
	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'numbers'=>0,
		'strings-plain'=>1,
	}) ){ $found |= 4 }
}
ok(!($found&1), "generate() : does not have numbers (after $MAXTRIALS trials)."); $num_tests++; 
ok($found&2, "generate() : has unicode strings (after $MAXTRIALS trials)."); $num_tests++; 
ok(!($found&4), "generate() : does not have non-unicode strings (after $MAXTRIALS trials)."); $num_tests++; 

##### fresh object with only-unicode set to zero (the default behaviour)
$randomiser = Data::Random::Structure::UTF8->new(
	max_depth => 5,
	max_elements => 20,
	'only-unicode' => 0, # default: numbers, plain strings and unicode strings
);
ok(defined $randomiser, 'Data::Random::Structure::UTF8->new()'." called."); $num_tests++;
is($randomiser->only_unicode(),0, "only unicode set to 0."); $num_tests++;
ok(0<scalar(grep{ $_ eq 'string-UTF8' } @{$randomiser->{'_scalar_types'}}), "has string-UTF8 set."); $num_tests++;
ok(0<scalar(grep{ $_ eq 'string' } @{$randomiser->{'_scalar_types'}}), "has string set."); $num_tests++;
ok(1, "found these types: '".join("','",  @{$randomiser->{'_scalar_types'}})."'."); $num_tests++;
$found = 0;
for($trials=$MAXTRIALS;$trials-->0;){
	$perl_var = $randomiser->generate();
	if( ! defined $perl_var ){ ok(0, "generate() failed.");  $num_tests++; }

	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'numbers'=>1,
		'strings'=>0,
	}) ){ $found |= 1 }
	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'numbers'=>0,
		'strings-unicode'=>1,
	}) ){ $found |= 2 }
	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'numbers'=>0,
		'strings-plain'=>1,
	}) ){ $found |= 4 }
}
ok($found&1, "generate() : has numbers (after $MAXTRIALS trials)."); $num_tests++; 
ok($found&2, "generate() : has unicode strings (after $MAXTRIALS trials)."); $num_tests++; 
ok($found&4, "generate() : has non-unicode strings (after $MAXTRIALS trials)."); $num_tests++; 

##### fresh object with no setting for unicode
$randomiser = Data::Random::Structure::UTF8->new(
	max_depth => 5,
	max_elements => 20,
);
ok(defined $randomiser, 'Data::Random::Structure::UTF8->new()'." called."); $num_tests++;
is($randomiser->only_unicode(),0, "only unicode defaults to 0."); $num_tests++;
ok(0<scalar(grep{ $_ eq 'string-UTF8' } @{$randomiser->{'_scalar_types'}}), "has string-UTF8 set."); $num_tests++;
ok(0<scalar(grep{ $_ eq 'string' } @{$randomiser->{'_scalar_types'}}), "has string set."); $num_tests++;
ok(1, "found these types: '".join("','",  @{$randomiser->{'_scalar_types'}})."'."); $num_tests++;
$found = 0;
for($trials=$MAXTRIALS;$trials-->0;){
	$perl_var = $randomiser->generate();
	if( ! defined $perl_var ){ ok(0, "generate() failed.");  $num_tests++; }

	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'numbers'=>1,
		'strings'=>0,
	}) ){ $found |= 1 }
	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'numbers'=>0,
		'strings-unicode'=>1,
	}) ){ $found |= 2 }
	if( Data::Random::Structure::UTF8::check_content_recursively($perl_var, {
		'numbers'=>0,
		'strings-plain'=>1,
	}) ){ $found |= 4 }
}
ok($found&1, "generate() : has numbers (after $MAXTRIALS trials)."); $num_tests++; 
ok($found&2, "generate() : has unicode strings (after $MAXTRIALS trials)."); $num_tests++; 
ok($found&4, "generate() : has non-unicode strings (after $MAXTRIALS trials)."); $num_tests++; 

done_testing($num_tests);


