Chemistry-Mol

0.38  Apr 9 2021

        - Fixed Chemistry::File doc bug (David Westbrook, bug 20067).
        - Fixed unescaped left brace in regex (Jim Keenan, bug 115215).
        - Switched to Dist::Zilla.
        - Adjusted URLs in documentation, now that the source is hosted on
          GitHub.
        - New co-maintainer Andrius Merkys <merkys@cpan.org>.

0.37  May 10 2009
        - Parse non-integer formulas (Daniel Scott).
        - Fixed some typos.
        - Fixed bug in Chemistry::File, where $self->mols wasn't updated
          during the read loop.
        - Fixed bug where $atom->symbol tried to modify the symbol given
          (which crashed when the symbol was a constant!)
        - Added an undocumented "next_id" method (Liliana Felix Avila).
        - Documented the descriptor methods that existed since 0.36 but were
          undocumented.

0.36  Sep 20 2005
        - Sort formulas in Hill order.
        - Fixed calc_implicit_hydrogens for halides.

0.35  May 20 2005
        - Added formal_radical atom property.
        - Compatibility with Storable-2.14, which already takes care of
          weak references.
        - calc_implicit_hydrogens, add_implicit_hydrogens

0.34  May 16 2005
        - Fixed $atom->bonds duplication on $mol->separate (bug 1173237)

Changes view on Meta::CPAN

        - Fixed POD bug.

0.22  May 17 2004
        - Fixed bug in bonds($from)
        - Added add_atom_np, add_bond_np, bonds_neighbors
        - Fixed another memory leak
        - Added Atom::formal_charge

0.21  May 13 2004
        - Fixed bug where $/ was undef'ed in a nonlocal way in File.pm.
        - Added formula parser contributed by Brent Gregersen.
        - Added %j and %% formats to Formula.pm.

0.20  May 06 2004
        - Use Scalar::Util::weaken to avoid strong cyclic references
          and ensure garbage collection.
        - New methods for Chemistry::Mol: delete_atom, delete_bond,
          clone, combine, separate, distance
        - New methods for Chemistry::Atom: angle, dihedral, angle_deg, 
          dihedral_deg, delete
        - New methods for Chemistry::Bond: delete
        - Chemistry::Mol can export read_mol
        - Chemistry::Atom can export distance, angle, dihedral

0.11  Feb 22 2004
        - New methods for Chemistry::Mol: mass, formula, formula_hash
        - New methods for Chemistry::Atom: mass
        - New module: Chemistry::File::Formula

0.10  Nov 03 2003
        - New methods for Chemistry::Atom: distance
        - New methods for Chemistry::Obj: del_attr
        - New subroutines for Chemistry::Mol: read_mol, register_format
        - New class method for Mol, Atom, and Bond: reset_id
        - Incompatible changes:
            - Changed File I/O API. It should be more stable now, but it's

MANIFEST view on Meta::CPAN

t/Formula.t
t/Mol.t
t/Obj.t
t/add_implicit_h.t
t/change_id.t
t/compat.t
t/delete.t
t/descriptor.t
t/empty.mol
t/exception.t
t/formula_tests.txt
t/geom.t
t/graph.t
t/list.txt
t/mem.t
t/mol.pl
t/mol.pl.gz
t/pod.t
t/safe_clone.t
t/zlib.t

lib/Chemistry/File/Formula.pm view on Meta::CPAN

# $Id$

use strict;
use base "Chemistry::File";
use Chemistry::Mol;
use Carp;
use Text::Balanced qw(extract_bracketed);

=head1 NAME

Chemistry::File::Formula - Molecular formula reader/formatter

=head1 SYNOPSIS

    use Chemistry::File::Formula;

    my $mol = Chemistry::Mol->parse("H2O", format => 'formula');
    print $mol->print(format => 'formula');
    print $mol->formula;    # this is a shorthand for the above
    print $mol->print(format => 'formula',
        formula_format => "%s%d{<sub>%d</sub>}");

=cut

Chemistry::Mol->register_format('formula');

=head1 DESCRIPTION

This module converts a molecule object to a string with the formula and back.
It registers the 'formula' format with Chemistry::Mol.  Besides its obvious
use, it is included in the Chemistry::Mol distribution because it is a very
simple example of a Chemistry::File derived I/O module.

=head2 Writing formulas

The output format is a printf-like string, passed as the formula_format
parameter to $mol->print or $mol->write, which may contain the following
control sequences.

=over

=item %s  symbol

=item %D  number of atoms

=item %d  number of atoms, included only when it is greater than one

=item %j{substr}  substr is inserted between the formatted string for each
element. (The 'j' stands for 'joiner'.) The format should have only one joiner,
but its location in the format string doesn't matter.

=item %% a percent sign

=back

If no format is specified, the default is "%s%d". Some examples follow. Let's
assume that the formula is C2H6O, as it would be formatted by default.

=over

=item C<< %s%D >>

Like the default, but include explicit indices for all atoms. 
The formula would be formatted as "C2H6O1"

=item C<< %s%d{E<lt>subE<gt>%dE<lt>/subE<gt>} >>

HTML format. The output would be
"CE<lt>subE<gt>2E<lt>/subE<gt>HE<lt>subE<gt>6E<lt>/subE<gt>O".

=item C<< %D %s%j{, } >>

Use a comma followed by a space as a joiner. The output would be 
"2 C, 6 H, 1 O".

=back
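
The three formats above can be tried directly. This is a minimal sketch that
assumes the molecule is built by parsing "C2H6O" with the 'formula' format; the
strings in the comments are the ones quoted in the examples above.

    use Chemistry::Mol;
    use Chemistry::File::Formula;

    my $mol = Chemistry::Mol->parse("C2H6O", format => "formula");

    print $mol->formula("%s%D"), "\n";                  # C2H6O1
    print $mol->formula("%s%d{<sub>%d</sub>}"), "\n";   # C<sub>2</sub>H<sub>6</sub>O
    print $mol->formula("%D %s%j{, }"), "\n";           # 2 C, 6 H, 1 O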

=head3 Symbol Sort Order

The elements in the formula are sorted by default in the "Hill order", which
means that:

1) if the formula contains carbon, C goes first, followed by H,
and the rest of the symbols in alphabetical order. For example, "CH2BrF".

2) if there is no carbon, all the symbols (including H) are listed
alphabetically.  For example, "BrH".
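
The two rules can be written as a short stand-alone subroutine. This is only an
illustrative sketch: C<hill_order> is a hypothetical name, not part of this
module, and it takes a hash reference that maps element symbols to counts.

    use strict;
    use warnings;

    # Hypothetical helper: return the symbols of a formula hash in Hill order.
    sub hill_order {
        my ($formula_hash) = @_;
        my @symbols = sort keys %$formula_hash;
        return @symbols unless $formula_hash->{C};  # rule 2: plain alphabetical
        my @rest = grep { $_ ne 'C' && $_ ne 'H' } @symbols;
        my @head = ('C', $formula_hash->{H} ? 'H' : ());
        return (@head, @rest);                      # rule 1: C, H, then the rest
    }

    print join("", hill_order({ C => 1, H => 2, Br => 1, F => 1 })), "\n";  # CHBrF
    print join("", hill_order({ Br => 1, H => 1 })), "\n";                  # BrH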

It is possible to supply a custom sorting subroutine with the 'formula_sort'
option. It expects a subroutine reference that takes a hash reference
describing the formula (similar to what is returned by parse_formula, discussed
below), and that returns a list of symbols in the desired order.

For example, this will sort the symbols in reverse asciibetical order:

    my $formula = $mol->print(
        format          => 'formula',
        formula_sort    => sub {
            my $formula_hash = shift;
            return reverse sort keys %$formula_hash;
        }
    );

=head2 Parsing Formulas

Formulas can also be parsed back into Chemistry::Mol objects.
The formula may have parentheses, square brackets, curly braces, or angle
brackets, and it may have the following abbreviations:

    Me => '(CH3)',
    Et => '(CH3CH2)',
    Bu => '(C4H9)',
    Bn => '(C6H5CH2)',
    Cp => '(C5H5)',
    Ph => '(C6H5)',
    Bz => '(C6H5CO)',

The formula may also be preceded by a number, which multiplies the whole
formula. Some examples of valid formulas:

=over

    Formula              Equivalent to
    --------------------------------------------------------------
    CH3(CH2)3CH3         C5H12
    C6H3Me3              C9H12
    2Cu[NH3]4(NO3)2      Cu2H24N12O12
    2C(C[C<C>5]4)3       C152
    2C(C(C(C)5)4)3       C152
    C 1 0 H 2 2          C10H22 (whitespace is completely ignored)

=back
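
The preprocessing behind these examples (bracket normalization, whitespace
removal, abbreviation expansion, and the leading multiplier) can be sketched in
plain Perl. C<normalize_formula> and C<%abbrev> are hypothetical names used
only for illustration, and only two of the abbreviations are included.

    use strict;
    use warnings;

    my %abbrev = (Me => '(CH3)', Ph => '(C6H5)');   # subset, for illustration

    # Hypothetical helper: return the multiplier and the normalized formula.
    sub normalize_formula {
        my ($formula) = @_;
        $formula =~ tr/<>{}[]/()()()/;              # all brackets become parentheses
        $formula =~ s/\s+//g;                       # whitespace is ignored
        $formula =~ s/$_/$abbrev{$_}/g for sort keys %abbrev;
        my $coef = ($formula =~ s/^(\d+\.?\d*)//) ? $1 : 1;
        return ($coef, $formula);
    }

    my ($coef, $rest) = normalize_formula("2 Cu[NH3]4 (NO3)2");
    print "$coef x $rest\n";    # 2 x Cu(NH3)4(NO3)2
    ($coef, $rest) = normalize_formula("C6H3Me3");
    print "$coef x $rest\n";    # 1 x C6H3(CH3)3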

When a formula is parsed, a molecule object is created which consists of
the set of the atoms in the formula (no bonds or coordinates, of course).
The atoms are created in alphabetical order, so the molecule object for C2H5Br
would have the atoms in the following sequence: Br, C, C, H, H, H, H, H. 
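
As a quick check of the atom ordering, the following sketch parses C2H5Br and
prints the symbol of each atom. It assumes that C<< $mol->atoms >> returns the
atoms in the order in which they were created.

    use Chemistry::Mol;
    use Chemistry::File::Formula;

    my $mol = Chemistry::Mol->parse("C2H5Br", format => "formula");
    print join(" ", map { $_->symbol } $mol->atoms), "\n";
    # expected, per the description above: Br C C H H H H H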

If you don't want to create a molecule object, but would rather have a simple
hash with the number of atoms for each element, use the C<parse_formula>
method:

    my %formula = Chemistry::File::Formula->parse_formula("C2H6O");
    use Data::Dumper;
    print Dumper \%formula;

which prints something like

    $VAR1 = {
              'H' => 6,
              'O' => 1,
              'C' => 2
            };

The C<parse_formula> method is called internally by the C<parse_string> method.
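
Because the result is a plain hash, ordinary Perl is enough to work with it. A
small sketch that lists and totals the atoms in one of the formulas from the
table above (List::Util is used here only for convenience):

    use Chemistry::File::Formula;
    use List::Util qw(sum);

    my %formula = Chemistry::File::Formula->parse_formula("2Cu[NH3]4(NO3)2");
    printf "%s: %s\n", $_, $formula{$_} for sort keys %formula;
    print "total atoms: ", sum(values %formula), "\n";   # 2 + 24 + 12 + 12 = 50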

=head3 Non-integer numbers in formulas

The C<parse_formula> method can also accept formulas that contain
floating-point numbers, such as H1.5N0.5. The numbers must be positive, and
numbers smaller than one should include a leading zero (e.g., 0.9, not .9).

When formulas with non-integer numbers of atoms are turned into molecule 
objects as described in the previous section, the number of atoms is always
B<rounded up>. For example, H1.5N0.5 will produce a molecule object with two
hydrogen atoms and one nitrogen atom.
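
The rounding follows from adding atoms one at a time while a counter is below
the (possibly fractional) count. A stand-alone sketch of a loop of that shape,
using a hypothetical C<atoms_created> helper and comparing it with POSIX
C<ceil>:

    use strict;
    use warnings;
    use POSIX qw(ceil);

    # Hypothetical helper: count how many atoms such a loop would create.
    sub atoms_created {
        my ($count) = @_;
        my $created = 0;
        for (my $i = 0; $i < $count; ++$i) { ++$created }
        return $created;
    }

    for my $count (1.5, 0.5, 2) {
        printf "%s -> %d (ceil gives %d)\n",
            $count, atoms_created($count), ceil($count);
    }
    # prints:
    #   1.5 -> 2 (ceil gives 2)
    #   0.5 -> 1 (ceil gives 1)
    #   2 -> 2 (ceil gives 2)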

There is currently no way of I<producing> formulas with non-integer numbers;
perhaps a future version will include an "occupancy" property for atoms that
will result in non-integer formulas.

=cut

sub parse_string {
    my ($self, $string, %opts) = @_;
    my $mol_class = $opts{mol_class} || "Chemistry::Mol";
    my $atom_class = $opts{atom_class} || "Chemistry::Atom";
    my $bond_class = $opts{bond_class} || "Chemistry::Bond";

    my $mol = $mol_class->new;
    my %formula = $self->parse_formula($string);
    for my $sym (sort keys %formula) {
        for (my $i = 0; $i < $formula{$sym}; ++$i) {
            $mol->add_atom($atom_class->new(symbol => $sym));
        }
    }
    return $mol;
}

sub write_string {
    my ($self, $mol, %opts) = @_;
    my @formula_parts;

    my $format = $opts{formula_format} || "%s%d";   # default format
    my $fh = $mol->formula_hash;
    $format =~ s/%%/\\%/g;                          # escape %% with a \
    my $joiner = "";
    $joiner = $1 if $format =~ s/(?<!\\)%j\{(.*?)}//;       # joiner %j{}

    my @symbols;
    if ($opts{formula_sort}) {
        @symbols = $opts{formula_sort}($fh);
    } else {
        @symbols = $self->sort_symbols($fh);
    }

    for my $sym (@symbols) {
        my $s = $format;
        my $n = $fh->{$sym};
        $s =~ s/(?<!\\)%s/$sym/g;                           # %s
        $s =~ s/(?<!\\)%D/$n/g;                             # %D
        $s =~ s/(?<!\\)%d\{(.*?)\}/$n > 1 ? $1 : ''/eg;     # %d{}
        $s =~ s/(?<!\\)%d/$n > 1 ? $n : ''/eg;              # %d
        $s =~ s/\\(.)/$1/g;                                 # other \ escapes
        push @formula_parts, $s;
    }
    return join($joiner, @formula_parts);
}

sub sort_symbols {
    my ($self, $formula_hash) = @_;
    my @symbols = keys %$formula_hash;
    if ($formula_hash->{C}) {
        # C and H first, followed by alphabetical order
        s/^([CH])$/\0$1/ for @symbols;
        @symbols = sort @symbols;
        s/^\0([CH])$/$1/ for @symbols;
        return @symbols;
    } else {
        # simple alphabetical order
        return sort @symbols;
    }
}

sub file_is {
    return 0; # no files are identified automatically as having this format
}

### Code derived from formula.pl by Brent Gregersen follows

my %macros = (
    Me => '(CH3)',
    Et => '(CH3CH2)',
    Bu => '(C4H9)',
    Bn => '(C6H5CH2)',
    Cp => '(C5H5)',
    Ph => '(C6H5)',
    Bz => '(C6H5CO)',
    # Ac is an element
    # Pr is an element
);


sub parse_formula {
    my ($self, $formula) = @_;
    my (%elements);

    #check balancing
    return %elements if (!ParensBalanced($formula));

    # replace other grouping with normal parens
    $formula =~ tr/<>{}[]/()()()/;

    # get rid of any spaces
    $formula =~ s/\s+//g;

    # perform macro expansion
    foreach (keys(%macros)) {
        $formula =~ s/$_/$macros{$_}/g;
    }

    # determine initial compound coefficient
    my $coef = ($formula =~ s/^(\d+\.?\d*)//) ? $1 : 1.0;

    # recursively process rest of formula
    return internal_formula_parser($formula, $coef, %elements);
}

sub internal_formula_parser {
    my ($formula, $coef, %form) = @_;
    my $tmp_coef;

    my ($extract, $remainder, $prefix) =
      extract_bracketed($formula, '()', '[^(]*');

    if (defined($extract) and $extract ne '') {
        $extract =~ s/^\((.*)\)$/$1/;
        if ($remainder =~ s/^(\d+\.?\d*)(.*)$/$2/) {
            $tmp_coef = $1 * $coef;
        } else {
            $tmp_coef = $coef;
        }

        # get formula of prefix ( it has no parens)
        %form = add_formula_strings($prefix, $coef, %form) if ($prefix ne '');

        # check remainder for more parens
        %form = internal_formula_parser($remainder, $coef, %form)
          if ($remainder ne '');

        # check extract for more parens
        %form =
          internal_formula_parser($extract, $tmp_coef, %form);    
          ## we already know this is ne ''
    } else { # get formula of complete string
        %form = add_formula_strings($remainder, $coef, %form)
          if ($remainder ne '');
    }
    return %form;
}

sub add_formula_strings {
    my ($formula, $coef, %elements) = @_;

#  print "Getting Formula of $formula\n";
    $formula =~ /^(?:([A-Z][a-z]*)(\d+\.?\d*)?)+$/o # XXX new
        or croak "Invalid Portion of Formula $formula";
    while ($formula =~ m/([A-Z][a-z]*)(\d+\.?\d*)?/go) { # XXX new
        my ($elm, $count) = ($1, $2);
        $count = 1 unless defined $count;
        if (defined $elements{$elm}) {
            $elements{$elm} += $count * $coef;
        } else {
            $elements{$elm} = $count * $coef;
        }
    }
    return %elements;
}

1;

=head1 SOURCE CODE REPOSITORY

L<https://github.com/perlmol/Chemistry-Mol>

=head1 SEE ALSO

L<Chemistry::Mol>, L<Chemistry::File>

For discussion about Hill order, just search the web for C<formula "hill
order">. The original reference is I<J. Am. Chem. Soc.> B<1900>, I<22>,
478-494.  L<http://dx.doi.org/10.1021/ja02046a005>.

=head1 AUTHOR

Ivan Tubert-Brohman <itub@cpan.org>.

Formula parsing code contributed by Brent Gregersen.

Patch for non-integer formulas by Daniel Scott.

=head1 COPYRIGHT

Copyright (c) 2005 Ivan Tubert-Brohman. All rights reserved. This program is
free software; you can redistribute it and/or modify it under the same terms as
Perl itself.

=cut

lib/Chemistry/Mol.pm view on Meta::CPAN

    for my $b (@{$self->{bonds}}) { $ret .= $b->print(2) }
    $ret;
}

=item $s = $mol->sprintf($format)

Format interesting molecular information in a concise way, as specified by
a printf-like format.

    %n - name
    %f - formula 
    %f{formula with format} - (note: right braces within
        the format should be escaped with a backslash)
    %s - SMILES representation
    %S - canonical SMILES representation
    %m - mass
    %8.3m - mass, formatted as %8.3f with core sprintf
    %q - formal charge
    %a - atom count
    %b - bond count
    %t - type
    %i - id

Note that you have to C<use Chemistry::File::SMILES> before using C<%s> or
C<%S> on C<< $mol->sprintf >>.
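
A short usage sketch, assuming a molecule parsed from a formula (so it has no
bonds) and avoiding C<%s> and C<%S> so that the SMILES module is not needed:

    use Chemistry::Mol;
    use Chemistry::File::Formula;

    my $mol = Chemistry::Mol->parse("CH4O", format => "formula");

    # formula, atom count, and bond count
    print $mol->sprintf("%f has %a atoms and %b bonds"), "\n";

    # formula with an explicit formula format, plus the mass as %8.3f
    print $mol->sprintf("%f{%s%D}: %8.3m"), "\n";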

=cut

sub sprintf {
    my ($mol, $format) = @_;
    no warnings 'uninitialized'; # don't care if some properties are undefined
    $format ||= "%f";
    $format =~ s/%%/\\%/g;              # escape %% with a \
    $format =~ s/(?<!\\)%f\{(.*?)(?<!\\)\}/$mol->formula($1)/eg; # %f{}
    $format =~ s/(?<!\\)%f/$mol->formula/eg;                    # %f
    $format =~ s/(?<!\\)%s/$mol->print(format=>'smiles')/eg;    # %s
    $format =~ s/(?<!\\)%S/$mol->print(format=>'smiles', unique => 1)/eg;    # %S
    $format =~ s/(?<!\\)%n/$mol->name/eg;                       # %n
    $format =~ s/(?<!\\)%(\d*\.?\d*)m/
        $1 ? sprintf "%$1f", $mol->mass : $mol->mass/eg;        # %m
    $format =~ s/(?<!\\)%q/$mol->charge/eg;                     # %q
    $format =~ s/(?<!\\)%a/$mol->atoms/eg;                      # %a
    $format =~ s/(?<!\\)%b/$mol->bonds/eg;                      # %b
    $format =~ s/(?<!\\)%t/$mol->type/eg;                       # %t
    $format =~ s/(?<!\\)%i/$mol->id/eg;                         # %i

lib/Chemistry/Mol.pm view on Meta::CPAN

        $self->{charge} = shift;
        $self;
    } else {
        return $self->{charge} if defined $self->{charge};
        my $charge = 0;
        $charge += $_->formal_charge || 0 for $self->atoms;
        $charge;
    }
}

=item $mol->formula_hash

Returns a hash reference describing the molecular formula. For methane it would
return { C => 1, H => 4 }.
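
For instance, in this sketch methane is built from its formula, which requires
loading L<Chemistry::File::Formula>:

    use Chemistry::Mol;
    use Chemistry::File::Formula;

    my $mol = Chemistry::Mol->parse("CH4", format => "formula");
    my $fh  = $mol->formula_hash;
    print "$_ => $fh->{$_}\n" for sort keys %$fh;   # C => 1, then H => 4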

=cut

sub formula_hash {
    my ($self) = @_;
    my $formula = {};
    for my $atom ($self->atoms) {
        $formula->{$atom->symbol}++;
        $formula->{H} += $atom->hydrogens if $atom->hydrogens;
    }
    $formula;
}

=item $mol->formula($format)

Returns a string with the formula. The format can be specified as a printf-like
string with the control sequences specified in the L<Chemistry::File::Formula>
documentation.

=cut

sub formula {
    my ($self, $format) = @_;
    require Chemistry::File::Formula;
    $self->print(format => "formula", formula_format => $format);
}

=item my $mol2 = $mol->clone;

Makes a copy of a molecule. Note that this is a B<deep> copy; if your molecule
has a pointer to the rest of the universe, the entire universe will be cloned!

By default, clone() uses L<Storable> to copy the Perl data structure. L<Clone>
can be used instead by setting variable C<$Chemistry::Mol::clone_backend> to
C<Clone> (default is C<Storable>). The documentation of Storable claims L<Clone>

t/Bond.t view on Meta::CPAN

use strict;
use warnings;

use Chemistry::Mol;
use Chemistry::File::Formula;
#use Test::More "no_plan";
use Test::More tests => 17;

my $mol  = Chemistry::Mol->parse('CC', format => 'formula');
my $bond = $mol->new_bond(atoms => [ $mol->atoms ] );
my ($a1, $a2) = $mol->atoms;

is ( scalar $mol->atoms,    2,      "mol atom count" );
is ( scalar $mol->bonds,    1,      "mol bond count" );
is ( scalar $bond->atoms,   2,      "bond atom count" );
is ( scalar $a1->bonds,     1,      "atom bond count" );
is ( scalar $a2->bonds,     1,      "atom bond count" );

$bond->delete;

t/Formula.t view on Meta::CPAN

use Test::More;
my @lines;

open my $in, "<", "t/formula_tests.txt"
    or die "couldn't open t/formula_tests.txt: $!";
@lines = <$in>;
close $in;

plan tests => 8 + @lines;
use_ok('Chemistry::File::Formula');

# Constructors
my $mol = Chemistry::Mol->parse("CH4O", format => "formula");
isa_ok($mol, 'Chemistry::Mol', 'parse isa mol');
ok($mol->atoms == 6, "enough atoms");
my $formula = $mol->formula("%s%d{<sub>%d</sub>}");
is($formula, "CH<sub>4</sub>O", "formula format");

$mol = Chemistry::Mol->parse("1[Ph(Me)3]2", format => "formula");
my $fh = $mol->formula_hash;
is_deeply($fh, {C => 18, H => 28}, "formula hash 1[Ph(Me)3]2");

# test various parsing issues
for my $line (@lines) {
    chomp $line;
    my ($test_formula, $expected) = split /\t/, $line;
    my $got = Chemistry::Mol->parse($test_formula, format => "formula")
        ->print(format=>'formula');
    is($got, $expected, "$test_formula = $expected");
}

# parse_formula

my %formula_hash = Chemistry::File::Formula->parse_formula("C2H6O");
is_deeply(\%formula_hash, {H => 6, O => 1, C => 2}, 'parse_formula');


# parse_formula with non-integers

%formula_hash = Chemistry::File::Formula->parse_formula("C2.1H6.5O0.9");
is_deeply(\%formula_hash, {H => 6.5, O => 0.9, C => 2.1}, 
    'parse_formula (non-integer)');


# a formula with custom sort

$mol = Chemistry::Mol->parse("C2H6Br", format => "formula");
$formula = $mol->print(
    format       => 'formula', 
    formula_sort => sub {
        my $f = shift;
        reverse sort keys %$f;
    }
);
is ($formula, 'H6C2Br',     'formula_sort');

t/graph.t view on Meta::CPAN

for my $backend ('Clone', 'Storable') {
    $Chemistry::Mol::clone_backend = $backend;
    my $mol2 = $mol->clone;
    is_deeply( $mol, $mol2, "clone by $backend" );
}

# separate test
my @mols = $mol->separate;
is ( scalar @mols, 2, 'got 2 things' );
is ( scalar (grep $_->isa('Chemistry::Mol'), @mols), 2, 'separate: two mols' );
is ( $mols[0]->formula, 'CH2Cl',    'mol 1 is CH2Cl' );
is ( $mols[1]->formula, 'CHO2',     'mol 2 is CHO2' );
my $a1 = $mol->atoms(2);
my $a2 = $mols[0]->atoms(2);
my $nb_before = $a1->neighbors;
my $nb_after  = $a2->neighbors;
is ( $nb_after, $nb_before, "bond count for $a2 equal to $a1 ($nb_before)?" );

# combine - new
my $comb_new = Chemistry::Mol->combine(@mols);
isa_ok($comb_new, 'Chemistry::Mol');
for my $method (qw(atoms bonds formula)) {
    is ( scalar $comb_new->$method, scalar $mol->$method, "combine-new; $method" );
}

# combine - in place
my $comb_inplace = $mols[0]->combine($mols[1]);
is_deeply ( $comb_inplace, $mol, "combine-in place" );

#use Chemistry::File::SMILES; $mol->printf("%S\n");
#$_->printf("%f\n") for @mols;
