App-Test-Generator

 view release on metacpan or  search on metacpan

lib/App/Test/Generator/SchemaExtractor.pm  view on Meta::CPAN


    die if $mode eq 'secure' && !$key;

Generated schema:

    relationships:
      - type: value_conditional
        if: mode
        equals: secure
        then_required: key
        description: When mode equals 'secure', key is required

=back

=head2 Default Value Extraction

The extractor comprehensively extracts default values from both code and POD documentation:

=head3 Code Pattern Recognition

Extracts defaults from multiple Perl idioms:

=over 4

=item * Logical OR operator: C<$param = $param || 'default'>

=item * Defined-or operator: C<$param //= 'default'>

=item * Ternary operator: C<$param = defined $param ? $param : 'default'>

=item * Unless conditional: C<$param = 'default' unless defined $param>

=item * Chained defaults: C<$param = $param || $self->{_default} || 'fallback'>

=item * Multi-line patterns: C<$param = {} unless $param>

=back

=head3 POD Pattern Recognition

Extracts defaults from documentation:

=over 4

=item * Standard format: C<Default: 'value'>

=item * Alternative format: C<Defaults to: 'value'>

=item * Inline format: C<Optional, default: 'value'>

=item * Parameter lists: C<$param - type, default 'value'>

=back

=head3 Value Processing

Properly handles:

=over 4

=item * String literals with quotes and escape sequences

=item * Numeric values (integers and floats)

=item * Boolean values (true/false converted to 1/0)

=item * Empty data structures ([] and {})

=item * Special values (undef, __PACKAGE__)

=item * Complex expressions (preserved as-is when unevaluatable)

=item * Quote operators (q{}, qq{}, qw{})

=back

=head3 Type Inference

When a parameter has a default value but no explicit type annotation,
the type is automatically inferred from the default:

    $options = {}        # inferred as hashref
    $items = []          # inferred as arrayref
    $count = 42          # inferred as integer
    $ratio = 3.14        # inferred as number
    $enabled = 1         # inferred as boolean

=head2 Context-Aware Return Analysis

The extractor provides comprehensive analysis of method return behavior,
including context sensitivity, error handling conventions, and method chaining patterns.

When a method's POD contains a C<=head4 Output> block in
L<Params::Validate::Strict> schema format, the C<type> declared there is
used as the authoritative output type and takes precedence over all
heuristic code analysis:

    =head4 Output

        {
            type => 'hashref',
        }

This is the recommended way to document methods whose return type would
otherwise be misidentified (e.g. a method that returns C<$self-E<gt>{cache}>
where the cache happens to hold a hashref).

Using parentheses as the outer container emits C<type: array>, indicating a
list-returning method.  L<App::Test::Generator> 0.39+ (with L<Test::Returns>
0.03+) captures these results in list context automatically:

    =head4 Output

        (
            {
                type => 'hashref',
            },
            ...
        )

=head3 List vs Scalar Context Detection

lib/App/Test/Generator/SchemaExtractor.pm  view on Meta::CPAN

	    $code =~ /(?:basename|dirname|fileparse)\s*\(\s*\$$param/) {
		$p->{type} = 'string';
		$p->{semantic} = 'filepath';
		$self->_log("  ADVANCED: $param manipulated as file path");
		return;
	}

	# Path validation patterns
	# Only match a literal path assigned or defaulted to this variable
	if(defined $p->{_default} && $p->{_default} =~ m{^([A-Za-z]:\\|/|\./|\.\./)}) {
		$p->{type} = 'string';
		$p->{semantic} = 'filepath';
		$self->_log("  ADVANCED: $param default looks like a path");
		return;
	}

	# IO::File detection
	if ($code =~ /\$$param\s*->\s*isa\s*\(\s*['"]IO::File['"]\s*\)/ ||
	    $code =~ /IO::File\s*->\s*new\s*\(\s*\$$param/) {
		$p->{type} = 'object';
		$p->{isa} = 'IO::File';
		$p->{semantic} = 'filehandle';
		$self->_log("  ADVANCED: $param is IO::File object");
		return;
	}
}

# --------------------------------------------------
# _detect_coderef_type
#
# Purpose:    Detect coderef and callback parameters
#             by analysing ref() checks, invocation
#             patterns, and parameter naming
#             conventions.
#
# Entry:      $p     - parameter hashref (modified
#                      in place).
#             $param - parameter name string.
#             $code  - method body source string.
#
# Exit:       Returns nothing. Modifies $p in place,
#             setting type and semantic keys.
#             Returns immediately on first match.
#
# Side effects: Logs detections to stdout when
#               verbose is set.
# --------------------------------------------------
sub _detect_coderef_type {
	my ($self, $p, $param, $code) = @_;

	return unless defined $param && $param =~ /^\w+$/;

	# ref() check for CODE
	if ($code =~ /ref\s*\(\s*\$$param\s*\)\s*eq\s*['"]CODE['"]/i) {
		$p->{type} = 'coderef';
		$p->{semantic} = 'callback';
		$self->_log("  ADVANCED: $param is coderef (ref check)");
		return;
	}

	# Invocation as coderef - note the escaped @ in \@_
	if ($code =~ /\$$param\s*->\s*\(/ ||
	    $code =~ /\$$param\s*->\s*\(\s*\@_\s*\)/ ||
	    $code =~ /&\s*\{\s*\$$param\s*\}/) {
		$p->{type} = 'coderef';
		$p->{semantic} = 'callback';
		$self->_log("  ADVANCED: $param invoked as coderef");
		return;
	}

	# Parameter name suggests callback
	if ($param =~ /^(?:callback|cb|handler|sub|code|fn|func|on_\w+)$/i) {
		$p->{type} = 'coderef';
		$p->{semantic} = 'callback';
		$self->_log("  ADVANCED: $param name suggests coderef");
		return;
	}

	# Blessed coderef (unusual but valid)
	if ($code =~ /blessed\s*\(\s*\$$param\s*\)/ &&
	    $code =~ /ref\s*\(\s*\$$param\s*\)\s*eq\s*['"]CODE['"]/i) {
		$p->{type} = 'object';
		$p->{isa} = 'blessed_coderef';
		$p->{semantic} = 'callback';
		$self->_log("  ADVANCED: $param is blessed coderef");
		return;
	}
}

# --------------------------------------------------
# _detect_enum_type
#
# Purpose:    Detect enum-like parameters whose
#             valid values are a fixed set, by
#             analysing validation patterns
#             including regex alternations, hash
#             lookups, grep checks, given/when,
#             if/elsif chains, and smart match.
#
# Entry:      $p     - parameter hashref (modified
#                      in place).
#             $param - parameter name string.
#             $code  - method body source string.
#
# Exit:       Returns nothing. Modifies $p in place,
#             setting type, enum, and semantic keys.
#             Returns immediately on first match.
#
# Side effects: Logs detections to stdout when
#               verbose is set.
#
# Notes:      Requires at least 3 if/elsif branches
#             for pattern 5 to avoid false positives
#             from ordinary conditional code.
# --------------------------------------------------
sub _detect_enum_type {
	my ($self, $p, $param, $code) = @_;

	return unless defined $param && $param =~ /^\w+$/;

	# Pattern 1: die/croak unless value is in list

lib/App/Test/Generator/SchemaExtractor.pm  view on Meta::CPAN

			my $val = defined $2 ? $2 : defined $3 ? $3 : $4;
			$kv{$key} = $val;
		}

		push @examples, {
			style => 'named',
			source => 'pod',
			args => \%kv,
			function => $method,	# TODO: add a sanity check this is what we expect
		} if %kv;
	}

	unless(scalar(@examples)) {
		# Positional calls: func($a, $b)
		while ($synopsis =~ /\b(\w+)\s*\(\s*(.*?)\s*\)/sg) {
			my ($func, $argstr) = ($1, $2);

			# next if $func eq 'new';	# already handled

			my @args = map { s/^\s+|\s+$//gr } split /\s*,\s*/, $argstr;

			next unless @args;

			push @examples, {
				style	=> 'positional',
				source	=> 'pod',
				function => $func,
				args	=> \@args,
			};
		}
	}

	if (scalar(@examples)) {
		$hints->{valid_inputs} ||= [];
		push @{ $hints->{valid_inputs} }, @examples;

		$self->_log("  POD: extracted " . scalar(@examples) . " example call(s)");
	}

	for my $k (qw(boundary_values invalid_inputs valid_inputs equivalence_classes)) {
		$hints->{$k} //= [];
	}

	return $hints;
}

# --------------------------------------------------
# _clean_default_value
#
# Purpose:    Normalise a raw default value string
#             extracted from code or POD into a
#             clean Perl scalar, handling quoted
#             strings, numeric literals, boolean
#             keywords, empty containers, and
#             undef.
#
# Entry:      $value     - raw value string.
#                          May be undef.
#             $from_code - true if the value was
#                          extracted from source
#                          code (affects escape
#                          sequence handling).
#
# Exit:       Returns the cleaned value:
#               undef   for undef or unparseable
#               {}      for empty hashrefs
#               []      for empty arrayrefs
#               integer for whole numbers
#               float   for decimal numbers
#               1 or 0  for boolean keywords
#               string  for everything else
#
# Side effects: None.
# --------------------------------------------------
sub _clean_default_value {
	my ($self, $value, $from_code) = @_;

	return unless defined $value;

	# Remove leading/trailing whitespace
	$value =~ s/^\s+|\s+$//g;

	# Remove parenthetical notes like "(no password)" only if there's content before them
	$value =~ s/(\S+)\s*\([^)]+\)\s*$/$1/;
	$value =~ s/^\s+|\s+$//g;

	# Handle chained || or // operators - extract the rightmost value
	if ($value =~ /\|\||\/{2}/) {
		my @parts = split(/\s*(?:\|\||\/{2})\s*/, $value);
		$value = $parts[-1];
		$value =~ s/^\s+|\s+$//g;
	}

	# Remove trailing semicolon if present
	$value =~ s/;\s*$//;

	# Handle q{}, qq{}, qw{} quotes
	if ($value =~ /^qq?\{(.*?)\}$/s) {
		$value = $1;
	} elsif ($value =~ /^qw\{(.*?)\}$/s) {
		$value = $1;
	} elsif ($value =~ /^q[qwx]?\s*([^a-zA-Z0-9\{\[])(.*?)\1$/s) {
		$value = $2;
	}

	# Handle quoted strings
	if ($value =~ /^(['"])(.*)\1$/s) {
		$value = $2;

		if ($from_code) {
			# In regex captures from source code, escape sequences are doubled
			# \\n in capture needs to become \n for the test
			$value =~ s/\\\\/\\/g;
		}

		# Only unescape the quote characters themselves
		$value =~ s/\\"/"/g;
		$value =~ s/\\'/'/g;

		# If NOT from code (i.e., from POD), interpret escape sequences
		unless ($from_code) {
			$value =~ s/\\n/\n/g;
			$value =~ s/\\r/\r/g;
			$value =~ s/\\t/\t/g;
			$value =~ s/\\\\/\\/g;
		}
	}

	# Sometimes trailing ) is left on
	if($value !~ /^\(/) {
		$value =~ s/\)$//;
	}

	# Handle Perl empty hash (must be before numeric/boolean checks)
	if ($value =~ /^\{\s*\}$/) {
		return {};
	}

	# Handle Perl empty list/array
	if ($value =~ /^\[\s*\]$/) {
		return [];
	}

	# Handle numeric values
	if ($value =~ /^-?\d+(?:\.\d+)?$/) {
		if ($value =~ /\./) {
			return $value + 0;
		} else {
			return int($value);
		}
	}

	# Handle boolean keywords
	if ($value =~ /^(true|false)$/i) {
		return lc($1) eq 'true' ? 1 : 0;
	}

	# Handle Perl boolean constants
	if ($value eq '1') {
		return 1;
	} elsif ($value eq '0') {
		return 0;
	}

	# Handle undef
	if ($value eq 'undef') {
		return undef;
	}

	# Handle __PACKAGE__ and similar constants
	if ($value eq '__PACKAGE__') {
		return '__PACKAGE__';
	}

	# Remove surrounding parentheses
	$value =~ s/^\((.+)\)$/$1/;

	# Handle expressions we can't evaluate
	if ($value =~ /^\$[a-zA-Z_]/ || $value =~ /\(.*\)/) {
		return if($value =~ /^\$|\@|\%/);	# The default is a value, so who knows its type?



( run in 1.480 second using v1.01-cache-2.11-cpan-df04353d9ac )