App-Test-Generator
view release on metacpan or search on metacpan
lib/App/Test/Generator/SchemaExtractor.pm view on Meta::CPAN
=item * B<Complex Validation Logic>
Sophisticated validation involving multiple parameters or external
dependencies may require manual schema refinement.
=item * B<Confidence Heuristics>
Confidence scores are based on heuristics and should be reviewed
by developers familiar with the codebase.
=item * B<Perl Idiom Recognition>
Some Perl-specific idioms may require custom pattern recognition
beyond the built-in detectors.
=item * B<Documentation Dependency>
Analysis quality improves significantly with comprehensive POD
documentation following consistent patterns.
=back
=head2 Best Practices for Optimal Results
=over 4
=item * B<Comprehensive POD Documentation>
Write detailed POD with explicit parameter documentation using
consistent patterns like C<$param - type (constraints), description>.
=item * B<Consistent Coding Patterns>
Use consistent parameter validation patterns and method signatures
throughout your codebase.
=item * B<Schema Review Process>
Review and refine automatically generated schemas, particularly
those with low confidence scores.
=item * B<Descriptive Naming>
Use descriptive method and parameter names that clearly indicate
purpose and expected types.
=item * B<Progressive Enhancement>
Start with automatically generated schemas and progressively
refine them based on test results and code understanding.
=back
The module is particularly valuable for large codebases where manual schema
creation would be prohibitively time-consuming, and for maintaining test
coverage as code evolves through continuous integration pipelines.
=head2 Advanced Type Detection
The schema extractor includes enhanced type detection capabilities that identify specialized Perl types beyond basic strings and integers.
L<DateTime> and L<Time::Piece> objects are detected through isa() checks and method call patterns, while date strings (ISO 8601, YYYY-MM-DD) and UNIX timestamps are recognized through regex validation and numeric range checks.
File handles and file paths are identified via I/O operations and file test operators, coderefs are detected through ref() checks and invocation patterns, and enum-like parameters are extracted from validation code including regex patterns (C</^(a|b|...
These detected types are preserved in the generated YAML schemas with appropriate semantic annotations, enabling test generators to create more accurate and meaningful test cases.
=head3 Example Advanced Type Schema
For a method like:
sub process_event {
my ($self, $timestamp, $status, $callback) = @_;
croak unless $timestamp > 1000000000;
croak unless $status =~ /^(active|pending|complete)$/;
croak unless ref($callback) eq 'CODE';
$callback->($timestamp, $status);
}
The extractor generates:
---
function: process_event
module: MyModule
input:
timestamp:
type: integer
# min: 0
# max: 2147483647
position: 0
_note: Unix timestamp
semantic: unix_timestamp
status:
type: string
enum:
- active
- pending
- complete
position: 1
_note: 'Must be one of: active, pending, complete'
callback:
type: coderef
position: 2
_note: 'CODE reference - provide sub { } in tests'
=head1 RELATIONSHIP DETECTION
The schema extractor detects relationships and dependencies between parameters,
enabling more sophisticated validation and test generation.
=head2 Relationship Types
=over 4
=item * B<mutually_exclusive>
Parameters that cannot be used together.
die if $file && $content; # Can't specify both
Generated schema:
relationships:
- type: mutually_exclusive
lib/App/Test/Generator/SchemaExtractor.pm view on Meta::CPAN
}
# Blessed references
elsif ($code =~ /bless\s+.*\$$param/) {
$p->{type} = 'object';
$self->_log(" CODE: $param is blessed object");
}
# Array/hash operations
if (!$p->{type}) {
if ($code =~ /\@\{\s*\$$param\s*\}/ || $code =~ /push\s*\(\s*\@?\$$param/) {
$p->{type} = 'arrayref';
} elsif ($code =~ /\%\{\s*\$$param\s*\}/ || $code =~ /\$$param\s*->\s*\{/) {
$p->{type} = 'hashref';
}
}
# Infer type from the default value if type is unknown
if (!$p->{type} && exists $p->{_default}) {
my $default = $p->{_default};
if (ref($default) eq 'HASH') {
$p->{type} = 'hashref';
$self->_log(" CODE: $param type inferred as hashref from default");
} elsif (ref($default) eq 'ARRAY') {
$p->{type} = 'arrayref';
$self->_log(" CODE: $param type inferred as arrayref from default");
}
}
# ------------------------------------------------------------
# Heuristic numeric inference (low confidence)
# ------------------------------------------------------------
if (!$p->{type}) {
# Numeric operators: + - * / % **
# Use \/(?!\/) to exclude // (defined-or) from matching as division.
if (
$code =~ /\$$param\s*(?:[\+\-\*\%]|\/(?!\/))/ ||
$code =~ /(?:[\+\-\*\%]|\/(?!\/))\s*\$$param/ ||
$code =~ /\bint\s*\(\s*\$$param\s*\)/ ||
$code =~ /\babs\s*\(\s*\$$param\s*\)/
) {
$p->{type} = 'number';
$p->{_type_confidence} = 'heuristic';
$self->_log(" CODE: $param inferred as number (numeric operator)");
}
# Numeric comparison
elsif (
$code =~ /\$$param\s*(?:==|!=|<=|>=|<|>)/ ||
$code =~ /(?:==|!=|<=|>=|<|>)\s*\$$param/
) {
$p->{type} = 'number';
$p->{_type_confidence} = 'heuristic';
$self->_log(" CODE: $param inferred as number (numeric comparison)");
}
}
}
# --------------------------------------------------
# _analyze_advanced_types
#
# Purpose: Apply enhanced type detection to a
# single parameter, checking for
# DateTime objects, file handles,
# coderefs, and enum-like constraints
# beyond what basic type inference
# can determine.
#
# Entry: $p_ref - reference to the parameter
# hashref (modified in place
# via the referenced hash).
# $param - the parameter name string.
# $code - method body source string.
#
# Exit: Returns nothing. Modifies the
# referenced parameter hashref in place.
#
# Side effects: Logs detections to stdout when
# verbose is set.
#
# Notes: Delegates to four specialised
# detectors: _detect_datetime_type,
# _detect_filehandle_type,
# _detect_coderef_type, and
# _detect_enum_type. Each detector
# returns early on first match so
# detectors are implicitly prioritised
# in that order.
# --------------------------------------------------
sub _analyze_advanced_types {
my ($self, $p_ref, $param, $code) = @_;
# Dereference once to get the hash reference
my $p = $$p_ref;
# Now pass the dereferenced hash to the detection methods
$self->_detect_datetime_type($p, $param, $code);
$self->_detect_filehandle_type($p, $param, $code);
$self->_detect_coderef_type($p, $param, $code);
$self->_detect_enum_type($p, $param, $code);
}
# --------------------------------------------------
# _detect_datetime_type
#
# Purpose: Detect DateTime objects, Time::Piece
# objects, date strings, ISO 8601
# strings, and UNIX timestamps by
# analysing code patterns involving
# the parameter.
#
# Entry: $p - parameter hashref (modified
# in place).
# $param - parameter name string.
# $code - method body source string.
#
# Exit: Returns nothing. Modifies $p in place,
# setting type, isa, semantic, min,
# matches, and/or format keys.
# Returns immediately on first match.
#
# Side effects: Logs detections to stdout when
# verbose is set.
# --------------------------------------------------
sub _detect_datetime_type {
my ($self, $p, $param, $code) = @_;
# Validate param is just a simple word
return unless defined $param && $param =~ /^\w+$/;
# DateTime object detection via isa/UNIVERSAL checks
if ($code =~ /\$$param\s*->\s*isa\s*\(\s*['"]DateTime['"]\s*\)/i) {
$p->{type} = 'object';
$p->{isa} = 'DateTime';
$p->{semantic} = 'datetime_object';
$self->_log(" ADVANCED: $param is DateTime object");
return;
}
# Check for DateTime method calls
if ($code =~ /\$$param\s*->\s*(ymd|dmy|mdy|hms|iso8601|epoch|strftime)/) {
$p->{type} = 'object';
$p->{isa} = 'DateTime';
$p->{semantic} = 'datetime_object';
$self->_log(" ADVANCED: $param uses DateTime methods");
return;
}
# Time::Piece detection
if ($code =~ /\$$param\s*->\s*isa\s*\(\s*['"]Time::Piece['"]\s*\)/i ||
$code =~ /\$$param\s*->\s*(strftime|epoch|year|mon|mday)/) {
$p->{type} = 'object';
$p->{isa} = 'Time::Piece';
$p->{semantic} = 'timepiece_object';
$self->_log(" ADVANCED: $param is Time::Piece object");
return;
}
# String date/time patterns via regex matching
if ($code =~ /\$$param\s*=~\s*\/.*?\\d\{4\}.*?\\d\{2\}.*?\\d\{2\}/) {
$p->{type} = 'string';
$p->{semantic} = 'date_string';
$p->{format} = 'YYYY-MM-DD or similar';
$self->_log(" ADVANCED: $param validated as date string pattern");
return;
}
# ISO 8601 date pattern
if ($code =~ /\$$param\s*=~\s*\/.*?[Tt].*?[Zz].*?\//) {
$p->{type} = 'string';
$p->{semantic} = 'iso8601_string';
$p->{matches} = '/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z?$/';
$self->_log(" ADVANCED: $param validated as ISO 8601 datetime");
return;
}
# UNIX timestamp detection (numeric with specific range)
if ($code =~ /\$$param\s*>\s*\d{9,}/ || # UNIX timestamps are 10+ digits
$code =~ /time\(\s*\)\s*-\s*\$$param/ ||
$code =~ /\$$param\s*-\s*time\(\s*\)/) {
$p->{type} = 'integer';
$p->{semantic} = 'unix_timestamp';
$p->{min} = 0;
$self->_log(" ADVANCED: $param appears to be UNIX timestamp");
return;
}
# Date parsing with strptime or similar
if ($code =~ /strptime\s*\(\s*\$$param/ ||
$code =~ /DateTime::Format::\w+\s*->\s*parse_datetime\s*\(\s*\$$param/) {
$p->{type} = 'string';
$p->{semantic} = 'datetime_parseable';
$self->_log(" ADVANCED: $param is parsed as datetime");
return;
}
}
# --------------------------------------------------
# _detect_filehandle_type
#
# Purpose: Detect file handle parameters and
# file path string parameters by
# analysing I/O operations, file test
# operators, and path manipulation
# patterns involving the parameter.
#
# Entry: $p - parameter hashref (modified
# in place).
# $param - parameter name string.
# $code - method body source string.
#
# Exit: Returns nothing. Modifies $p in place,
# setting type, isa, and semantic keys.
# Returns immediately on first match.
#
# Side effects: Logs detections to stdout when
# verbose is set.
# --------------------------------------------------
sub _detect_filehandle_type {
my ($self, $p, $param, $code) = @_;
return unless defined $param && $param =~ /^\w+$/;
# File handle operations
if ($code =~ /(?:open|close|read|print|say|sysread|syswrite)\s*\(?\s*\$$param/) {
$p->{type} = 'object';
$p->{isa} = 'IO::Handle';
$p->{semantic} = 'filehandle';
$self->_log(" ADVANCED: $param is a file handle");
return;
}
# Filehandle-specific operations
if ($code =~ /\$$param\s*->\s*(readline|getline|print|say|close|flush|autoflush)/) {
$p->{type} = 'object';
$p->{isa} = 'IO::Handle';
$p->{semantic} = 'filehandle';
$self->_log(" ADVANCED: $param uses filehandle methods");
return;
}
# File test operators
if ($code =~ /(?:-[frwxoOeszlpSbctugkTBMAC])\s+\$$param/) {
$p->{type} = 'string';
$p->{semantic} = 'filepath';
$self->_log(" ADVANCED: $param is tested as file path");
return;
}
# File::Spec operations or path manipulation
lib/App/Test/Generator/SchemaExtractor.pm view on Meta::CPAN
push @comments, '# Parameter types detected:';
foreach my $note (@param_notes) {
push @comments, "# - $note";
}
}
}
# Add relationship notes
if ($schema->{relationships} && @{$schema->{relationships}}) {
push @comments, (
'#',
'# Parameter relationships detected:'
);
foreach my $rel (@{$schema->{relationships}}) {
my $desc = $rel->{description} || _format_relationship($rel);
push @comments, "# - $desc";
}
}
# Add general notes
if ($schema->{_notes} && scalar(@{$schema->{_notes}})) {
push @comments, '#';
push @comments, '# Notes:';
foreach my $note (@{$schema->{_notes}}) {
push @comments, "# - $note";
}
}
if($schema->{_analysis}) {
push @comments, (
'#',
'# Analysis:',
'# TODO:',
);
# confidence_factors:
# input:
# - No parameters found
# output:
# - 'Return type defined: object (+30)'
# - 'Total output confidence score: 30'
# - 'Medium confidence: return type defined'
# input_confidence: none
# output_confidence: medium
# overall_confidence: none
}
# Add warnings for complex types
my @warnings;
if ($schema->{input}) {
foreach my $param_name (keys %{$schema->{input}}) {
my $p = $schema->{input}{$param_name};
if ($p->{type} && $p->{type} eq 'coderef') {
push @warnings, "Parameter '$param_name' is a coderef - you'll need to provide a sub {} in tests";
}
if ($p->{semantic} && $p->{semantic} eq 'filehandle') {
push @warnings, "Parameter '$param_name' is a filehandle - consider using IO::String or mock";
}
if ($p->{isa} && $p->{isa} =~ /DateTime/) {
push @warnings, "Parameter '$param_name' requires DateTime - ensure DateTime is loaded";
}
}
}
if (@warnings) {
push @comments, '#';
push @comments, '# WARNINGS - Manual test setup may be required:';
foreach my $warning (@warnings) {
push @comments, "# ! $warning";
}
}
push @comments, '';
return join("\n", @comments);
}
# --------------------------------------------------
# _serialize_parameter_for_yaml
#
# Purpose: Convert a parameter spec hashref into
# a cleaned, YAML-serialisable form
# suitable for App::Test::Generator
# consumption, handling semantic type
# mappings, enum values, and object
# class annotations.
#
# Entry: $param - parameter spec hashref as
# produced by the merge and
# analysis pipeline.
#
# Exit: Returns a new hashref containing only
# the fields App::Test::Generator
# understands, with internal _ keys
# and semantic keys removed or converted.
#
# Side effects: None.
#
# Notes: Semantic types are mapped to
# appropriate base types with additional
# constraint and note fields.
# The original $param hashref is not
# modified.
# --------------------------------------------------
sub _serialize_parameter_for_yaml {
my ($self, $param) = @_;
my %cleaned;
# Copy basic fields that App::Test::Generator expects
foreach my $field (qw(type position optional min max matches default)) {
$cleaned{$field} = $param->{$field} if defined $param->{$field};
}
# Handle advanced type mappings
if(my $semantic = $param->{semantic}) {
if ($semantic eq 'datetime_object') {
# DateTime objects: test generator needs to know how to create them
$cleaned{type} = 'object';
$cleaned{isa} = $param->{isa} || 'DateTime';
$cleaned{_note} = 'Requires DateTime object';
} elsif ($semantic eq 'timepiece_object') {
$cleaned{type} = 'object';
$cleaned{isa} = $param->{isa} || 'Time::Piece';
$cleaned{_note} = 'Requires Time::Piece object';
} elsif ($semantic eq 'date_string') {
# Date strings: provide regex pattern
$cleaned{type} = 'string';
$cleaned{matches} ||= '/^\d{4}-\d{2}-\d{2}$/';
$cleaned{_example} = '2024-12-12';
} elsif ($semantic eq 'iso8601_string') {
$cleaned{type} = 'string';
$cleaned{matches} ||= '/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z?$/';
$cleaned{_example} = '2024-12-12T10:30:00Z';
} elsif ($semantic eq 'unix_timestamp') {
$cleaned{type} = 'integer';
$cleaned{min} ||= 0;
$cleaned{max} ||= $INT32_MAX; # 32-bit max
$cleaned{_note} = 'UNIX timestamp';
} elsif ($semantic eq 'datetime_parseable') {
$cleaned{type} = 'string';
$cleaned{_note} = 'Must be parseable as datetime';
} elsif ($semantic eq 'filehandle') {
# File handles: special handling needed
$cleaned{type} = 'object';
$cleaned{isa} = $param->{isa} || 'IO::Handle';
$cleaned{_note} = 'File handle - may need mock in tests';
} elsif ($semantic eq 'filepath') {
# File paths: string with path pattern
$cleaned{type} = 'string';
$cleaned{matches} ||= '/^[\\w\\/.\\-_]+$/';
$cleaned{_note} = 'File path';
} elsif ($semantic eq 'callback') {
# Coderefs: mark as special type
$cleaned{type} = 'coderef';
$cleaned{_note} = 'CODE reference - provide sub { } in tests';
} elsif ($semantic eq 'enum') {
# Enum: keep as string but add valid values
$cleaned{type} = 'string';
if ($param->{enum} && ref($param->{enum}) eq 'ARRAY') {
$cleaned{enum} = $param->{enum};
$cleaned{_note} = 'Must be one of: ' . join(', ', @{$param->{enum}});
}
}
}
# Handle memberof even if not marked with semantic.
# enum and memberof are mutually exclusive â only set memberof when enum
# is not already being output (avoids the "has both" validation error).
if($param->{enum} && ref($param->{enum}) eq 'ARRAY' && !$cleaned{enum}) {
$cleaned{memberof} = $param->{enum};
}
if($param->{memberof} && ref($param->{memberof}) eq 'ARRAY') {
$cleaned{memberof} = $param->{memberof};
}
# Handle object class
if ($param->{isa} && !$cleaned{isa}) {
$cleaned{isa} = $param->{isa};
}
( run in 1.180 second using v1.01-cache-2.11-cpan-e93a5daba3e )