Dancer-SearchApp
come from a local C<.searchapp> file or better be stored per-URL / per-document
in an SQLite database for easy index reconstruction.
This needs close correlation with synonyms, which also could be (filesystem-)
local for a (shared) folder or (user-)global in an SQLite database.
=head2 Crawl queue(s)
We want queues that store the URLs to be crawled, so that
new items can be submitted asynchronously. Queues also let us
rate-limit crawling and restart it after an interruption.
This could be an SQLite database, or just a flat text
file if we have a way to store the last position within that text
file.
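A minimal sketch of the flat-file variant, written in Python purely for illustration. The function names (C<submit>, C<read_pending>, C<save_position>) and the idea of a separate position file are assumptions, not existing parts of the application. The position is stored as a byte offset, so a restarted crawler continues where the last run stopped.

```python
import os
import tempfile

def submit(queue_path, url):
    """Append one URL to the queue file (asynchronous submission)."""
    with open(queue_path, "a", encoding="utf-8") as fh:
        fh.write(url + "\n")

def read_pending(queue_path, pos_path, limit=10):
    """Return up to `limit` unread URLs and the byte offset after them."""
    offset = 0
    if os.path.exists(pos_path):
        with open(pos_path) as fh:
            offset = int(fh.read().strip() or 0)
    urls = []
    with open(queue_path, "rb") as fh:
        fh.seek(offset)
        while len(urls) < limit:
            line = fh.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a partially written last line
            offset = fh.tell()
            url = line.decode("utf-8").strip()
            if url:  # skip blank lines
                urls.append(url)
    return urls, offset

def save_position(pos_path, offset):
    """Persist the offset via a temporary file and an atomic rename."""
    tmp = pos_path + ".tmp"
    with open(tmp, "w") as fh:
        fh.write(str(offset))
    os.replace(tmp, pos_path)
```

The caller saves the position only after a batch has been crawled, so a crash repeats at most one batch. The C<limit> parameter is a crude hook for rate limiting: fetch a small batch, then wait before fetching the next.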
=head2 SQL-index into filesystem
Is there any use in reviving FFRIndex?
=head1 System integration
Automatically (re)scan resources by using one of the following
notification methods to learn about new or changed
resources.
=head2 Resource modification
=head3 Filesystem watchers
=head3 RSS scanner
=head3 Google Sitemap scanner
=head3 Hibiscus importer
This would immediately make all money transactions from Hibiscus
available for searching.
Can Hibiscus directly show a single transaction from the outside?
=head2 Interesting additional datasets
Open movie database L<http://omdbapi.com/> - has dumps available
Discogs data dumps - L<http://data.discogs.com/>
=head2 Automatic search
Automatic search should be triggered for incoming phone calls. This
allows relevant emails to be shown automatically if the caller has
sent emails that contain their phone number.
Also, the automatic search should be easily triggered by a command
line program. This likely needs something like L<HTTP::ServerEvent>
to keep a channel open so the server can push new information.
=head1 Data portability
Data portability is very important, not least because it enables
seamless index upgrades/rollbacks/backups.
=head2 Export
=head3 Export index to DBI
=head3 Update indices from database
=head2 Share indices
Sharing indices would also be nice, in the sense of websites or people
offering datasets.
=head2 DBI connectivity
How can we get L<DBI> and L<Promises> to work nicely together?
=head3 Schema migration/update via DBI
=head3 DBI import queue
New items to be imported into Elasticsearch could be stored in and read
from a DBI table. This would allow a more widely distributed set of
crawlers to feed Elasticsearch through DBI.
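One way such a table could look, sketched in Python with the standard C<sqlite3> module standing in for DBI. The table name C<import_queue>, its columns, and all function names are hypothetical. The actual indexing into Elasticsearch is left out; the sketch only covers how several workers could share one queue without importing an item twice.

```python
import sqlite3

def open_queue(path=":memory:"):
    """Open the database and create the (hypothetical) queue table."""
    db = sqlite3.connect(path)
    db.execute("""
        CREATE TABLE IF NOT EXISTS import_queue (
            id    INTEGER PRIMARY KEY AUTOINCREMENT,
            url   TEXT NOT NULL UNIQUE,
            state TEXT NOT NULL DEFAULT 'pending'
        )""")
    db.commit()
    return db

def enqueue(db, url):
    """Add a URL; submitting the same URL twice is harmless."""
    with db:
        db.execute("INSERT OR IGNORE INTO import_queue (url) VALUES (?)", (url,))

def claim_next(db):
    """Claim the oldest pending item, or return None if the queue is empty."""
    while True:
        row = db.execute(
            "SELECT id, url FROM import_queue "
            "WHERE state = 'pending' ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return None
        with db:
            # The state check makes the claim safe if another worker got there first.
            cur = db.execute(
                "UPDATE import_queue SET state = 'claimed' "
                "WHERE id = ? AND state = 'pending'", (row[0],))
        if cur.rowcount == 1:
            return row

def mark_done(db, item_id):
    """Record that an item has been imported."""
    with db:
        db.execute("UPDATE import_queue SET state = 'done' WHERE id = ?", (item_id,))
```

A worker would call C<claim_next>, import the document, and then call C<mark_done>. Items that stay in the C<claimed> state point to a crashed worker and could be reset to C<pending>.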
=head1 Index/query quality maintenance
To improve search results, a log of "failed" queries
should be kept and the user should be offered manual correction
of the failed queries.
=head2 Top 10 failed queries
If a query had no results at all, the user should/could suggest
some synonyms or even documents to use instead.
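A minimal sketch of such a log, in Python for illustration only. The class C<QueryLog>, the helper C<normalize>, and the in-memory counter are assumptions; a real log would presumably be persisted. Queries are normalized before counting so that trivially different spellings of the same query are counted together.

```python
from collections import Counter

def normalize(query):
    """Lowercase the query and collapse runs of whitespace."""
    return " ".join(query.lower().split())

class QueryLog:
    def __init__(self):
        self.no_results = Counter()  # normalized query -> number of failures

    def record(self, query, hit_count):
        """Remember a query only if it returned no results at all."""
        if hit_count == 0:
            self.no_results[normalize(query)] += 1

    def top_failed(self, n=10):
        """Return the n most frequent failed queries as (query, count) pairs."""
        return self.no_results.most_common(n)
```

The list returned by C<top_failed> is what would be shown to the user for manual correction, for example next to an input field for synonyms.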
=head2 Top 10 low-score queries
If a query had only low-score results/documents, the query is also
a candidate for manual improvement. How can we determine a low score?
=head2 Top 10 abandoned queries
How will we determine if a query/word was abandoned?
=head2 Keep track of clickthrough
We should keep (server-side) track of click-throughs
to actually find out which files/documents are viewed and
rank those higher.
Also, we should have an "unrank this" link to give the user
a way to make the engine easily forget misclicked "ranked" items
directly from the results.
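The following Python sketch shows one possible shape for this, under several assumptions: the class C<ClickTracker>, its methods, and the multiplicative boost with a C<weight> of 0.1 are illustrative choices, not a decided design. Search hits are assumed to arrive as pairs of document id and score.

```python
class ClickTracker:
    def __init__(self):
        self.clicks = {}  # doc_id -> number of recorded click-throughs

    def record_click(self, doc_id):
        """Count one click-through for a document."""
        self.clicks[doc_id] = self.clicks.get(doc_id, 0) + 1

    def unrank(self, doc_id):
        """Forget all clicks for a document (the "unrank this" link)."""
        self.clicks.pop(doc_id, None)

    def rerank(self, hits, weight=0.1):
        """Sort (doc_id, score) pairs by score boosted per recorded click."""
        def boosted(hit):
            doc_id, score = hit
            return score * (1 + weight * self.clicks.get(doc_id, 0))
        return sorted(hits, key=boosted, reverse=True)
```

With hits C<[("a", 1.0), ("b", 0.95)]>, one recorded click on C<b> lifts it above C<a>, and calling C<unrank("b")> restores the original order.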
=cut