Benchmark-Perl-Formance-Cargo


share/SpamAssassin/easy_ham/00204.5c64400ff51925eb0ca4bc7bdab0bc09

        # hex to dec lookup table
        hextab ["0"] = 0;       hextab ["8"] = 8;
        hextab ["1"] = 1;       hextab ["9"] = 9;
        hextab ["2"] = 2;       hextab ["A"] = 10;
        hextab ["3"] = 3;       hextab ["B"] = 11;
        hextab ["4"] = 4;       hextab ["C"] = 12;
        hextab ["5"] = 5;       hextab ["D"] = 13;
        hextab ["6"] = 6;       hextab ["E"] = 14;
        hextab ["7"] = 7;       hextab ["F"] = 15;

        decoded = "";
        i   = 1;
        len = length (str);
        while ( i <= len ) {
                c = substr (str, i, 1);
                # check for usual start of URI hex encoding chars
                if ( c == "%" || c == "=" ) {
                        if ( i+2 <= len ) {
                                # valid hex encoding?
                                c1 = toupper(substr(str, i+1, 1));
                                c2 = toupper(substr(str, i+2, 1));
                                if ( !(hextab [c1] == "" || hextab [c2] == "") ) {
                                        code = 0 + hextab [c1] * 16 + hextab [c2] + 0
                                        c = sprintf ("%c", code)
                                        i = i + 2
                                }
                        }
                # + is space apparently
                } else if ( c == "+" ) {
                        c = " "
                }
                decoded = decoded c;
                ++i;
        }
        return decoded
}

> Cheers,
> Dave.

> PS the late reply is because the footer on the original mail (If
> you received this mail in error yadda yadda) got caught in my
> spam filter, and ended up in my junkmail directory.

he he...

share/SpamAssassin/easy_ham/01421.e01ad8fa7bcb36e969c838578051d684

> Another area for potentially fruitful study:  it's clear that the
> highest-value indicators usually appear "early" in msgs, and for spam
> there's an actual reason for that:  advertising has to strive
> to get your attention early.  So, for example, if we only bothered to
> tokenize the first 90% of a msg, would results get worse?

[Neil Schemenauer]
> Spammers could exploit this by including a large MIME part at the beginning
> of the message.  In practice that would probably work fine.

Note that timtest.py's current tokenizer only looks at decoded text/* MIME
sections (or raw message text if no MIME exists); spammers could put
megabytes of other crap before that and it wouldn't even be looked at
(except that the email package has to parse non-text/* parts well enough to
skip over them, and tokens for the most interesting parts of Content-{Type,
Disposition, Transfer-Encoding} decorations are generated for all MIME
sections).
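
To make the "only decoded text/* sections" behaviour concrete, here is a
minimal sketch using the standard library's email package. It is not
timtest.py's actual tokenizer: the function name text_parts and the latin-1
fallback charset are assumptions made for illustration only.

```python
from email import message_from_string


def text_parts(raw_message):
    """Yield the decoded body of every text/* part of a raw message.

    Hypothetical helper, not taken from timtest.py. Non-text parts
    (and the multipart containers themselves) are parsed by the email
    package but their bodies are never yielded. A message with no MIME
    structure defaults to text/plain, so its raw body is yielded as is.
    """
    msg = message_from_string(raw_message)
    for part in msg.walk():
        # Skip multipart containers, images, attachments, and so on.
        if part.get_content_maintype() != "text":
            continue
        # decode=True undoes base64 / quoted-printable transfer encoding.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # Assumption: fall back to latin-1 when no usable charset is declared.
        charset = part.get_content_charset() or "latin-1"
        try:
            yield payload.decode(charset, errors="replace")
        except LookupError:
            yield payload.decode("latin-1", errors="replace")
```

With a tokenizer fed only by something like text_parts, a large
application/* part placed first, or filler text in the preamble, contributes
no body tokens at all, which is the point made above.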

Schemes that remain ignorant of MIME are vulnerable to spammers putting
arbitrary amounts of "nice text" in the preamble area (after the headers and
before the first MIME section), which most mail readers don't display, but


