App-Test-Generator

 view release on metacpan or  search on metacpan

lib/App/Test/Generator/SchemaExtractor.pm  view on Meta::CPAN

=item * B<Complex Validation Logic>

Sophisticated validation involving multiple parameters or external
dependencies may require manual schema refinement.

=item * B<Confidence Heuristics>

Confidence scores are based on heuristics and should be reviewed
by developers familiar with the codebase.

=item * B<Perl Idiom Recognition>

Some Perl-specific idioms may require custom pattern recognition
beyond the built-in detectors.

=item * B<Documentation Dependency>

Analysis quality improves significantly with comprehensive POD
documentation following consistent patterns.

=back

=head2 Best Practices for Optimal Results

=over 4

=item * B<Comprehensive POD Documentation>

Write detailed POD with explicit parameter documentation using
consistent patterns like C<$param - type (constraints), description>.

=item * B<Consistent Coding Patterns>

Use consistent parameter validation patterns and method signatures
throughout your codebase.

=item * B<Schema Review Process>

Review and refine automatically generated schemas, particularly
those with low confidence scores.

=item * B<Descriptive Naming>

Use descriptive method and parameter names that clearly indicate
purpose and expected types.

=item * B<Progressive Enhancement>

Start with automatically generated schemas and progressively
refine them based on test results and code understanding.

=back

The module is particularly valuable for large codebases where manual schema
creation would be prohibitively time-consuming, and for maintaining test
coverage as code evolves through continuous integration pipelines.

=head2 Advanced Type Detection

The schema extractor includes enhanced type detection capabilities that identify specialized Perl types beyond basic strings and integers.
L<DateTime> and L<Time::Piece> objects are detected through isa() checks and method call patterns, while date strings (ISO 8601, YYYY-MM-DD) and UNIX timestamps are recognized through regex validation and numeric range checks.
File handles and file paths are identified via I/O operations and file test operators, coderefs are detected through ref() checks and invocation patterns, and enum-like parameters are extracted from validation code including regex patterns (C</^(a|b|...
These detected types are preserved in the generated YAML schemas with appropriate semantic annotations, enabling test generators to create more accurate and meaningful test cases.

=head3 Example Advanced Type Schema

For a method like:

    sub process_event {
        my ($self, $timestamp, $status, $callback) = @_;
        croak unless $timestamp > 1000000000;
        croak unless $status =~ /^(active|pending|complete)$/;
        croak unless ref($callback) eq 'CODE';
        $callback->($timestamp, $status);
    }

The extractor generates:

    ---
    function: process_event
    module: MyModule
    input:
      timestamp:
        type: integer
        # min: 0
        # max: 2147483647
        position: 0
        _note: Unix timestamp
	semantic: unix_timestamp
      status:
        type: string
        enum:
          - active
          - pending
          - complete
        position: 1
        _note: 'Must be one of: active, pending, complete'
      callback:
        type: coderef
        position: 2
        _note: 'CODE reference - provide sub { } in tests'

=head1 RELATIONSHIP DETECTION

The schema extractor detects relationships and dependencies between parameters,
enabling more sophisticated validation and test generation.

=head2 Relationship Types

=over 4

=item * B<mutually_exclusive>

Parameters that cannot be used together.

    die if $file && $content;  # Can't specify both

Generated schema:

    relationships:
      - type: mutually_exclusive

lib/App/Test/Generator/SchemaExtractor.pm  view on Meta::CPAN

	}
	# Blessed references
	elsif ($code =~ /bless\s+.*\$$param/) {
		$p->{type} = 'object';
		$self->_log("  CODE: $param is blessed object");
	}
	# Array/hash operations
	if (!$p->{type}) {
		if ($code =~ /\@\{\s*\$$param\s*\}/ || $code =~ /push\s*\(\s*\@?\$$param/) {
			$p->{type} = 'arrayref';
		} elsif ($code =~ /\%\{\s*\$$param\s*\}/ || $code =~ /\$$param\s*->\s*\{/) {
			$p->{type} = 'hashref';
		}
	}

	# Infer type from the default value if type is unknown
	if (!$p->{type} && exists $p->{_default}) {
		my $default = $p->{_default};
		if (ref($default) eq 'HASH') {
			$p->{type} = 'hashref';
			$self->_log("  CODE: $param type inferred as hashref from default");
		} elsif (ref($default) eq 'ARRAY') {
			$p->{type} = 'arrayref';
			$self->_log("  CODE: $param type inferred as arrayref from default");
		}
	}

	# ------------------------------------------------------------
	# Heuristic numeric inference (low confidence)
	# ------------------------------------------------------------
	if (!$p->{type}) {
		# Numeric operators: + - * / % **
		# Use \/(?!\/) to exclude // (defined-or) from matching as division.
		if (
			$code =~ /\$$param\s*(?:[\+\-\*\%]|\/(?!\/))/ ||
			$code =~ /(?:[\+\-\*\%]|\/(?!\/))\s*\$$param/ ||
			$code =~ /\bint\s*\(\s*\$$param\s*\)/ ||
			$code =~ /\babs\s*\(\s*\$$param\s*\)/
		) {
			$p->{type} = 'number';
			$p->{_type_confidence} = 'heuristic';
			$self->_log("  CODE: $param inferred as number (numeric operator)");
		}
		# Numeric comparison
		elsif (
			$code =~ /\$$param\s*(?:==|!=|<=|>=|<|>)/ ||
			$code =~ /(?:==|!=|<=|>=|<|>)\s*\$$param/
		) {
			$p->{type} = 'number';
			$p->{_type_confidence} = 'heuristic';
			$self->_log("  CODE: $param inferred as number (numeric comparison)");
		}
	}
}

# --------------------------------------------------
# _analyze_advanced_types
#
# Purpose:    Apply enhanced type detection to a
#             single parameter, checking for
#             DateTime objects, file handles,
#             coderefs, and enum-like constraints
#             beyond what basic type inference
#             can determine.
#
# Entry:      $p_ref - reference to the parameter
#                      hashref (modified in place
#                      via the referenced hash).
#             $param - the parameter name string.
#             $code  - method body source string.
#
# Exit:       Returns nothing. Modifies the
#             referenced parameter hashref in place.
#
# Side effects: Logs detections to stdout when
#               verbose is set.
#
# Notes:      Delegates to four specialised
#             detectors: _detect_datetime_type,
#             _detect_filehandle_type,
#             _detect_coderef_type, and
#             _detect_enum_type. Each detector
#             returns early on first match so
#             detectors are implicitly prioritised
#             in that order.
# --------------------------------------------------
sub _analyze_advanced_types {
	my ($self, $p_ref, $param, $code) = @_;

	# Dereference once to get the hash reference
	my $p = $$p_ref;

	# Now pass the dereferenced hash to the detection methods
	$self->_detect_datetime_type($p, $param, $code);
	$self->_detect_filehandle_type($p, $param, $code);
	$self->_detect_coderef_type($p, $param, $code);
	$self->_detect_enum_type($p, $param, $code);
}

# --------------------------------------------------
# _detect_datetime_type
#
# Purpose:    Detect DateTime objects, Time::Piece
#             objects, date strings, ISO 8601
#             strings, and UNIX timestamps by
#             analysing code patterns involving
#             the parameter.
#
# Entry:      $p     - parameter hashref (modified
#                      in place).
#             $param - parameter name string.
#             $code  - method body source string.
#
# Exit:       Returns nothing. Modifies $p in place,
#             setting type, isa, semantic, min,
#             matches, and/or format keys.
#             Returns immediately on first match.
#
# Side effects: Logs detections to stdout when
#               verbose is set.
# --------------------------------------------------
sub _detect_datetime_type {
	my ($self, $p, $param, $code) = @_;

	# Validate param is just a simple word
	return unless defined $param && $param =~ /^\w+$/;

	# DateTime object detection via isa/UNIVERSAL checks
	if ($code =~ /\$$param\s*->\s*isa\s*\(\s*['"]DateTime['"]\s*\)/i) {
		$p->{type} = 'object';
		$p->{isa} = 'DateTime';
		$p->{semantic} = 'datetime_object';
		$self->_log("  ADVANCED: $param is DateTime object");
		return;
	}

	# Check for DateTime method calls
	if ($code =~ /\$$param\s*->\s*(ymd|dmy|mdy|hms|iso8601|epoch|strftime)/) {
		$p->{type} = 'object';
		$p->{isa} = 'DateTime';
		$p->{semantic} = 'datetime_object';
		$self->_log("  ADVANCED: $param uses DateTime methods");
		return;
	}

	# Time::Piece detection
	if ($code =~ /\$$param\s*->\s*isa\s*\(\s*['"]Time::Piece['"]\s*\)/i ||
	    $code =~ /\$$param\s*->\s*(strftime|epoch|year|mon|mday)/) {
		$p->{type} = 'object';
		$p->{isa} = 'Time::Piece';
		$p->{semantic} = 'timepiece_object';
		$self->_log("  ADVANCED: $param is Time::Piece object");
		return;
	}

	# String date/time patterns via regex matching
	if ($code =~ /\$$param\s*=~\s*\/.*?\\d\{4\}.*?\\d\{2\}.*?\\d\{2\}/) {
		$p->{type} = 'string';
		$p->{semantic} = 'date_string';
		$p->{format} = 'YYYY-MM-DD or similar';
		$self->_log("  ADVANCED: $param validated as date string pattern");
		return;
	}

	# ISO 8601 date pattern
	if ($code =~ /\$$param\s*=~\s*\/.*?[Tt].*?[Zz].*?\//) {
		$p->{type} = 'string';
		$p->{semantic} = 'iso8601_string';
		$p->{matches} = '/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z?$/';
		$self->_log("  ADVANCED: $param validated as ISO 8601 datetime");
		return;
	}

	# UNIX timestamp detection (numeric with specific range)
	if ($code =~ /\$$param\s*>\s*\d{9,}/ || # UNIX timestamps are 10+ digits
	    $code =~ /time\(\s*\)\s*-\s*\$$param/ ||
	    $code =~ /\$$param\s*-\s*time\(\s*\)/) {
		$p->{type} = 'integer';
		$p->{semantic} = 'unix_timestamp';
		$p->{min} = 0;
		$self->_log("  ADVANCED: $param appears to be UNIX timestamp");
		return;
	}

	# Date parsing with strptime or similar
	if ($code =~ /strptime\s*\(\s*\$$param/ ||
	    $code =~ /DateTime::Format::\w+\s*->\s*parse_datetime\s*\(\s*\$$param/) {
		$p->{type} = 'string';
		$p->{semantic} = 'datetime_parseable';
		$self->_log("  ADVANCED: $param is parsed as datetime");
		return;
	}
}

# --------------------------------------------------
# _detect_filehandle_type
#
# Purpose:    Detect file handle parameters and
#             file path string parameters by
#             analysing I/O operations, file test
#             operators, and path manipulation
#             patterns involving the parameter.
#
# Entry:      $p     - parameter hashref (modified
#                      in place).
#             $param - parameter name string.
#             $code  - method body source string.
#
# Exit:       Returns nothing. Modifies $p in place,
#             setting type, isa, and semantic keys.
#             Returns immediately on first match.
#
# Side effects: Logs detections to stdout when
#               verbose is set.
# --------------------------------------------------
sub _detect_filehandle_type {
	my ($self, $p, $param, $code) = @_;

	return unless defined $param && $param =~ /^\w+$/;

	# File handle operations
	if ($code =~ /(?:open|close|read|print|say|sysread|syswrite)\s*\(?\s*\$$param/) {
		$p->{type} = 'object';
		$p->{isa} = 'IO::Handle';
		$p->{semantic} = 'filehandle';
		$self->_log("  ADVANCED: $param is a file handle");
		return;
	}

	# Filehandle-specific operations
	if ($code =~ /\$$param\s*->\s*(readline|getline|print|say|close|flush|autoflush)/) {
		$p->{type} = 'object';
		$p->{isa} = 'IO::Handle';
		$p->{semantic} = 'filehandle';
		$self->_log("  ADVANCED: $param uses filehandle methods");
		return;
	}

	# File test operators
	if ($code =~ /(?:-[frwxoOeszlpSbctugkTBMAC])\s+\$$param/) {
		$p->{type} = 'string';
		$p->{semantic} = 'filepath';
		$self->_log("  ADVANCED: $param is tested as file path");
		return;
	}

	# File::Spec operations or path manipulation

lib/App/Test/Generator/SchemaExtractor.pm  view on Meta::CPAN

			push @comments, '# Parameter types detected:';
			foreach my $note (@param_notes) {
				push @comments, "#   - $note";
			}
		}
	}

	# Add relationship notes
	if ($schema->{relationships} && @{$schema->{relationships}}) {
		push @comments, (
			'#',
			'# Parameter relationships detected:'
		);
		foreach my $rel (@{$schema->{relationships}}) {
			my $desc = $rel->{description} || _format_relationship($rel);
			push @comments, "#   - $desc";
		}
	}

	# Add general notes
	if ($schema->{_notes} && scalar(@{$schema->{_notes}})) {
		push @comments, '#';
		push @comments, '# Notes:';
		foreach my $note (@{$schema->{_notes}}) {
			push @comments, "#   - $note";
		}
	}

	if($schema->{_analysis}) {
		push @comments, (
			'#',
			'# Analysis:',
			'# TODO:',
		);
		# confidence_factors:
		#   input:
		#   - No parameters found
		#   output:
		#   - 'Return type defined: object (+30)'
		#   - 'Total output confidence score: 30'
		#   - 'Medium confidence: return type defined'
		#   input_confidence: none
		#   output_confidence: medium
		#   overall_confidence: none
	}

	# Add warnings for complex types
	my @warnings;
	if ($schema->{input}) {
		foreach my $param_name (keys %{$schema->{input}}) {
			my $p = $schema->{input}{$param_name};

			if ($p->{type} && $p->{type} eq 'coderef') {
				push @warnings, "Parameter '$param_name' is a coderef - you'll need to provide a sub {} in tests";
			}

			if ($p->{semantic} && $p->{semantic} eq 'filehandle') {
				push @warnings, "Parameter '$param_name' is a filehandle - consider using IO::String or mock";
			}

			if ($p->{isa} && $p->{isa} =~ /DateTime/) {
				push @warnings, "Parameter '$param_name' requires DateTime - ensure DateTime is loaded";
			}
		}
	}

	if (@warnings) {
		push @comments, '#';
		push @comments, '# WARNINGS - Manual test setup may be required:';
		foreach my $warning (@warnings) {
			push @comments, "#   ! $warning";
		}
	}

	push @comments, '';

	return join("\n", @comments);
}

# --------------------------------------------------
# _serialize_parameter_for_yaml
#
# Purpose:    Convert a parameter spec hashref into
#             a cleaned, YAML-serialisable form
#             suitable for App::Test::Generator
#             consumption, handling semantic type
#             mappings, enum values, and object
#             class annotations.
#
# Entry:      $param - parameter spec hashref as
#                      produced by the merge and
#                      analysis pipeline.
#
# Exit:       Returns a new hashref containing only
#             the fields App::Test::Generator
#             understands, with internal _ keys
#             and semantic keys removed or converted.
#
# Side effects: None.
#
# Notes:      Semantic types are mapped to
#             appropriate base types with additional
#             constraint and note fields.
#             The original $param hashref is not
#             modified.
# --------------------------------------------------
sub _serialize_parameter_for_yaml {
	my ($self, $param) = @_;

	my %cleaned;

	# Copy basic fields that App::Test::Generator expects
	foreach my $field (qw(type position optional min max matches default)) {
		$cleaned{$field} = $param->{$field} if defined $param->{$field};
	}

	# Handle advanced type mappings
	if(my $semantic = $param->{semantic}) {
		if ($semantic eq 'datetime_object') {
			# DateTime objects: test generator needs to know how to create them
			$cleaned{type} = 'object';
			$cleaned{isa} = $param->{isa} || 'DateTime';
			$cleaned{_note} = 'Requires DateTime object';
		} elsif ($semantic eq 'timepiece_object') {
			$cleaned{type} = 'object';
			$cleaned{isa} = $param->{isa} || 'Time::Piece';
			$cleaned{_note} = 'Requires Time::Piece object';
		} elsif ($semantic eq 'date_string') {
			# Date strings: provide regex pattern
			$cleaned{type} = 'string';
			$cleaned{matches} ||= '/^\d{4}-\d{2}-\d{2}$/';
			$cleaned{_example} = '2024-12-12';
		} elsif ($semantic eq 'iso8601_string') {
			$cleaned{type} = 'string';
			$cleaned{matches} ||= '/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z?$/';
			$cleaned{_example} = '2024-12-12T10:30:00Z';
		} elsif ($semantic eq 'unix_timestamp') {
			$cleaned{type} = 'integer';
			$cleaned{min} ||= 0;
			$cleaned{max} ||= $INT32_MAX;	# 32-bit max
			$cleaned{_note} = 'UNIX timestamp';
		} elsif ($semantic eq 'datetime_parseable') {
			$cleaned{type} = 'string';
			$cleaned{_note} = 'Must be parseable as datetime';
		} elsif ($semantic eq 'filehandle') {
			# File handles: special handling needed
			$cleaned{type} = 'object';
			$cleaned{isa} = $param->{isa} || 'IO::Handle';
			$cleaned{_note} = 'File handle - may need mock in tests';
		} elsif ($semantic eq 'filepath') {
			# File paths: string with path pattern
			$cleaned{type} = 'string';
			$cleaned{matches} ||= '/^[\\w\\/.\\-_]+$/';
			$cleaned{_note} = 'File path';
		} elsif ($semantic eq 'callback') {
			# Coderefs: mark as special type
			$cleaned{type} = 'coderef';
			$cleaned{_note} = 'CODE reference - provide sub { } in tests';
		} elsif ($semantic eq 'enum') {
			# Enum: keep as string but add valid values
			$cleaned{type} = 'string';
			if ($param->{enum} && ref($param->{enum}) eq 'ARRAY') {
				$cleaned{enum} = $param->{enum};
				$cleaned{_note} = 'Must be one of: ' . join(', ', @{$param->{enum}});
			}
		}
	}

	# Handle memberof even if not marked with semantic.
	# enum and memberof are mutually exclusive — only set memberof when enum
	# is not already being output (avoids the "has both" validation error).
	if($param->{enum} && ref($param->{enum}) eq 'ARRAY' && !$cleaned{enum}) {
		$cleaned{memberof} = $param->{enum};
	}
	if($param->{memberof} && ref($param->{memberof}) eq 'ARRAY') {
		$cleaned{memberof} = $param->{memberof};
	}

	# Handle object class
	if ($param->{isa} && !$cleaned{isa}) {
		$cleaned{isa} = $param->{isa};
	}



( run in 1.180 second using v1.01-cache-2.11-cpan-e93a5daba3e )