App-Test-Generator
view release on metacpan or search on metacpan
lib/App/Test/Generator/SchemaExtractor.pm view on Meta::CPAN
die if $mode eq 'secure' && !$key;
Generated schema:
relationships:
- type: value_conditional
if: mode
equals: secure
then_required: key
description: When mode equals 'secure', key is required
=back
=head2 Default Value Extraction
The extractor comprehensively extracts default values from both code and POD documentation:
=head3 Code Pattern Recognition
Extracts defaults from multiple Perl idioms:
=over 4
=item * Logical OR operator: C<$param = $param || 'default'>
=item * Defined-or operator: C<$param //= 'default'>
=item * Ternary operator: C<$param = defined $param ? $param : 'default'>
=item * Unless conditional: C<$param = 'default' unless defined $param>
=item * Chained defaults: C<$param = $param || $self->{_default} || 'fallback'>
=item * Multi-line patterns: C<$param = {} unless $param>
=back
=head3 POD Pattern Recognition
Extracts defaults from documentation:
=over 4
=item * Standard format: C<Default: 'value'>
=item * Alternative format: C<Defaults to: 'value'>
=item * Inline format: C<Optional, default: 'value'>
=item * Parameter lists: C<$param - type, default 'value'>
=back
=head3 Value Processing
Properly handles:
=over 4
=item * String literals with quotes and escape sequences
=item * Numeric values (integers and floats)
=item * Boolean values (true/false converted to 1/0)
=item * Empty data structures ([] and {})
=item * Special values (undef, __PACKAGE__)
=item * Complex expressions (preserved as-is when unevaluatable)
=item * Quote operators (q{}, qq{}, qw{})
=back
=head3 Type Inference
When a parameter has a default value but no explicit type annotation,
the type is automatically inferred from the default:
$options = {} # inferred as hashref
$items = [] # inferred as arrayref
$count = 42 # inferred as integer
$ratio = 3.14 # inferred as number
$enabled = 1 # inferred as boolean
=head2 Context-Aware Return Analysis
The extractor provides comprehensive analysis of method return behavior,
including context sensitivity, error handling conventions, and method chaining patterns.
When a method's POD contains a C<=head4 Output> block in
L<Params::Validate::Strict> schema format, the C<type> declared there is
used as the authoritative output type and takes precedence over all
heuristic code analysis:
=head4 Output
{
type => 'hashref',
}
This is the recommended way to document methods whose return type would
otherwise be misidentified (e.g. a method that returns C<$self-E<gt>{cache}>
where the cache happens to hold a hashref).
Using parentheses as the outer container emits C<type: array>, indicating a
list-returning method. L<App::Test::Generator> 0.39+ (with L<Test::Returns>
0.03+) captures these results in list context automatically:
=head4 Output
(
{
type => 'hashref',
},
...
)
=head3 List vs Scalar Context Detection
lib/App/Test/Generator/SchemaExtractor.pm view on Meta::CPAN
$code =~ /(?:basename|dirname|fileparse)\s*\(\s*\$$param/) {
$p->{type} = 'string';
$p->{semantic} = 'filepath';
$self->_log(" ADVANCED: $param manipulated as file path");
return;
}
# Path validation patterns
# Only match a literal path assigned or defaulted to this variable
if(defined $p->{_default} && $p->{_default} =~ m{^([A-Za-z]:\\|/|\./|\.\./)}) {
$p->{type} = 'string';
$p->{semantic} = 'filepath';
$self->_log(" ADVANCED: $param default looks like a path");
return;
}
# IO::File detection
if ($code =~ /\$$param\s*->\s*isa\s*\(\s*['"]IO::File['"]\s*\)/ ||
$code =~ /IO::File\s*->\s*new\s*\(\s*\$$param/) {
$p->{type} = 'object';
$p->{isa} = 'IO::File';
$p->{semantic} = 'filehandle';
$self->_log(" ADVANCED: $param is IO::File object");
return;
}
}
# --------------------------------------------------
# _detect_coderef_type
#
# Purpose: Detect coderef and callback parameters
# by analysing ref() checks, invocation
# patterns, and parameter naming
# conventions.
#
# Entry: $p - parameter hashref (modified
# in place).
# $param - parameter name string.
# $code - method body source string.
#
# Exit: Returns nothing. Modifies $p in place,
# setting type and semantic keys.
# Returns immediately on first match.
#
# Side effects: Logs detections to stdout when
# verbose is set.
# --------------------------------------------------
sub _detect_coderef_type {
my ($self, $p, $param, $code) = @_;
return unless defined $param && $param =~ /^\w+$/;
# ref() check for CODE
if ($code =~ /ref\s*\(\s*\$$param\s*\)\s*eq\s*['"]CODE['"]/i) {
$p->{type} = 'coderef';
$p->{semantic} = 'callback';
$self->_log(" ADVANCED: $param is coderef (ref check)");
return;
}
# Invocation as coderef - note the escaped @ in \@_
if ($code =~ /\$$param\s*->\s*\(/ ||
$code =~ /\$$param\s*->\s*\(\s*\@_\s*\)/ ||
$code =~ /&\s*\{\s*\$$param\s*\}/) {
$p->{type} = 'coderef';
$p->{semantic} = 'callback';
$self->_log(" ADVANCED: $param invoked as coderef");
return;
}
# Parameter name suggests callback
if ($param =~ /^(?:callback|cb|handler|sub|code|fn|func|on_\w+)$/i) {
$p->{type} = 'coderef';
$p->{semantic} = 'callback';
$self->_log(" ADVANCED: $param name suggests coderef");
return;
}
# Blessed coderef (unusual but valid)
if ($code =~ /blessed\s*\(\s*\$$param\s*\)/ &&
$code =~ /ref\s*\(\s*\$$param\s*\)\s*eq\s*['"]CODE['"]/i) {
$p->{type} = 'object';
$p->{isa} = 'blessed_coderef';
$p->{semantic} = 'callback';
$self->_log(" ADVANCED: $param is blessed coderef");
return;
}
}
# --------------------------------------------------
# _detect_enum_type
#
# Purpose: Detect enum-like parameters whose
# valid values are a fixed set, by
# analysing validation patterns
# including regex alternations, hash
# lookups, grep checks, given/when,
# if/elsif chains, and smart match.
#
# Entry: $p - parameter hashref (modified
# in place).
# $param - parameter name string.
# $code - method body source string.
#
# Exit: Returns nothing. Modifies $p in place,
# setting type, enum, and semantic keys.
# Returns immediately on first match.
#
# Side effects: Logs detections to stdout when
# verbose is set.
#
# Notes: Requires at least 3 if/elsif branches
# for pattern 5 to avoid false positives
# from ordinary conditional code.
# --------------------------------------------------
sub _detect_enum_type {
my ($self, $p, $param, $code) = @_;
return unless defined $param && $param =~ /^\w+$/;
# Pattern 1: die/croak unless value is in list
lib/App/Test/Generator/SchemaExtractor.pm view on Meta::CPAN
my $val = defined $2 ? $2 : defined $3 ? $3 : $4;
$kv{$key} = $val;
}
push @examples, {
style => 'named',
source => 'pod',
args => \%kv,
function => $method, # TODO: add a sanity check this is what we expect
} if %kv;
}
unless(scalar(@examples)) {
# Positional calls: func($a, $b)
while ($synopsis =~ /\b(\w+)\s*\(\s*(.*?)\s*\)/sg) {
my ($func, $argstr) = ($1, $2);
# next if $func eq 'new'; # already handled
my @args = map { s/^\s+|\s+$//gr } split /\s*,\s*/, $argstr;
next unless @args;
push @examples, {
style => 'positional',
source => 'pod',
function => $func,
args => \@args,
};
}
}
if (scalar(@examples)) {
$hints->{valid_inputs} ||= [];
push @{ $hints->{valid_inputs} }, @examples;
$self->_log(" POD: extracted " . scalar(@examples) . " example call(s)");
}
for my $k (qw(boundary_values invalid_inputs valid_inputs equivalence_classes)) {
$hints->{$k} //= [];
}
return $hints;
}
# --------------------------------------------------
# _clean_default_value
#
# Purpose: Normalise a raw default value string
# extracted from code or POD into a
# clean Perl scalar, handling quoted
# strings, numeric literals, boolean
# keywords, empty containers, and
# undef.
#
# Entry: $value - raw value string.
# May be undef.
# $from_code - true if the value was
# extracted from source
# code (affects escape
# sequence handling).
#
# Exit: Returns the cleaned value:
# undef for undef or unparseable
# {} for empty hashrefs
# [] for empty arrayrefs
# integer for whole numbers
# float for decimal numbers
# 1 or 0 for boolean keywords
# string for everything else
#
# Side effects: None.
# --------------------------------------------------
sub _clean_default_value {
my ($self, $value, $from_code) = @_;
return unless defined $value;
# Remove leading/trailing whitespace
$value =~ s/^\s+|\s+$//g;
# Remove parenthetical notes like "(no password)" only if there's content before them
$value =~ s/(\S+)\s*\([^)]+\)\s*$/$1/;
$value =~ s/^\s+|\s+$//g;
# Handle chained || or // operators - extract the rightmost value
if ($value =~ /\|\||\/{2}/) {
my @parts = split(/\s*(?:\|\||\/{2})\s*/, $value);
$value = $parts[-1];
$value =~ s/^\s+|\s+$//g;
}
# Remove trailing semicolon if present
$value =~ s/;\s*$//;
# Handle q{}, qq{}, qw{} quotes
if ($value =~ /^qq?\{(.*?)\}$/s) {
$value = $1;
} elsif ($value =~ /^qw\{(.*?)\}$/s) {
$value = $1;
} elsif ($value =~ /^q[qwx]?\s*([^a-zA-Z0-9\{\[])(.*?)\1$/s) {
$value = $2;
}
# Handle quoted strings
if ($value =~ /^(['"])(.*)\1$/s) {
$value = $2;
if ($from_code) {
# In regex captures from source code, escape sequences are doubled
# \\n in capture needs to become \n for the test
$value =~ s/\\\\/\\/g;
}
# Only unescape the quote characters themselves
$value =~ s/\\"/"/g;
$value =~ s/\\'/'/g;
# If NOT from code (i.e., from POD), interpret escape sequences
unless ($from_code) {
$value =~ s/\\n/\n/g;
$value =~ s/\\r/\r/g;
$value =~ s/\\t/\t/g;
$value =~ s/\\\\/\\/g;
}
}
# Sometimes trailing ) is left on
if($value !~ /^\(/) {
$value =~ s/\)$//;
}
# Handle Perl empty hash (must be before numeric/boolean checks)
if ($value =~ /^\{\s*\}$/) {
return {};
}
# Handle Perl empty list/array
if ($value =~ /^\[\s*\]$/) {
return [];
}
# Handle numeric values
if ($value =~ /^-?\d+(?:\.\d+)?$/) {
if ($value =~ /\./) {
return $value + 0;
} else {
return int($value);
}
}
# Handle boolean keywords
if ($value =~ /^(true|false)$/i) {
return lc($1) eq 'true' ? 1 : 0;
}
# Handle Perl boolean constants
if ($value eq '1') {
return 1;
} elsif ($value eq '0') {
return 0;
}
# Handle undef
if ($value eq 'undef') {
return undef;
}
# Handle __PACKAGE__ and similar constants
if ($value eq '__PACKAGE__') {
return '__PACKAGE__';
}
# Remove surrounding parentheses
$value =~ s/^\((.+)\)$/$1/;
# Handle expressions we can't evaluate
if ($value =~ /^\$[a-zA-Z_]/ || $value =~ /\(.*\)/) {
return if($value =~ /^\$|\@|\%/); # The default is a value, so who knows its type?
( run in 1.480 second using v1.01-cache-2.11-cpan-df04353d9ac )