App-Dochazka-REST
lib/App/Dochazka/REST/Guide.pm view on Meta::CPAN
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# 3. Neither the name of SUSE LLC nor the names of its contributors may be
# used to endorse or promote products derived from this software without
# specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
# *************************************************************************
package App::Dochazka::REST::Guide;
use 5.012;
use strict;
use warnings;
=head1 NAME
App::Dochazka::REST::Guide - Dochazka REST server guide
=head1 SYNOPSIS
This POD-only module describes the Dochazka REST server (API) in more detail.
Dochazka as a whole aims to be a convenient, open-source Attendance and Time
Tracking (ATT) solution.
=head1 ARCHITECTURE
Dochazka consists of three main components:
=over
=item * REST server (this module)
The REST server listens for and processes incoming HTTP requests. Processing includes
authentication and authorization. The server attempts to map the request URI to a
Dochazka resource. The resource handler takes action on the request, depending on the
HTTP method (GET, PUT, POST, DELETE). Typically, this action will culminate in
one or more SQL statements which are sent to the PostgreSQL database for
execution. The results are sent back to the client in the HTTP response.
=item * PostgreSQL database
The PostgreSQL database is configured to listen for incoming SQL statements
from the REST server. Based on these statements, it creates, retrieves,
updates, and deletes (CRUD) employee attendance records and related data in the
Dochazka database.
=item * Dochazka clients
Dochazka clients, such as L<App::Dochazka::WWW>, L<App::Dochazka::CLI>, and
perhaps others, present a user interface (UI) to employees. Through this UI,
the client tries to divine the employee's intent and express it in terms of
HTTP requests to the REST server.
The HTTP protocol is used in all communication between client and server. In
Dochazka, the term "client" should be understood in a broad sense to mean
anything that communicates with the server using the HTTP protocol. This
encompasses stand-alone report generators, specialized administration
utilities, cronjobs, web browsers, etc., in addition to the purpose-built
clients or just plain C<curl>.
=back
=head1 INSTALLATION
Installation is the process of creating (setting up, bootstrapping) a new
Dochazka instance, or "site" in Dochazka terminology.
It entails the following steps.
=head2 Server preparation
Dochazka REST needs hardware (either physical or virtualized) to run on.
The hardware will need to have a network connection, etc. Obviously, this
step is entirely beyond the scope of this document.
=head2 Software installation
Once the hardware is ready, the Dochazka REST software and all its
dependencies are installed on it. This could be accomplished by
downloading and unpacking the tarball (or running C<git clone>) and
following the installation instructions, or, more expediently, by
installing a packaged version of Dochazka REST if one is available
(see
L<https://build.opensuse.org/package/show/home:smithfarm/perl-App-Dochazka-REST>).
=head2 PostgreSQL setup
One of Dochazka REST's principal dependencies is PostgreSQL server (version
9.2 or higher). This needs to be installed (should happen automatically
when using the packaged version of L<App::Dochazka::REST>). Steps to enable
it:
bash# systemctl enable postgresql.service
/usr/lib/perl5/site_perl/5.18.2/auto/share/dist/App-Dochazka-REST
Local site configuration directory is /etc/dochazka-rest
Loading configuration parameters from /etc/dochazka-rest
Setting up logging
Logging to /home/smithfarm/mrest.log
Calling App::Dochazka::REST::Dispatch::init()
Starting server
HTTP::Server::PSGI: Accepting connections at http://0:5000/
Note that the development web server L<HTTP::Server::PSGI> is used. To use
L<Starman> instead, use the following command:
$ dochazka-rest -- --server Starman
=head2 Take it for a spin
Point your browser to L<http://localhost:5000/>
=head1 BASIC PARAMETERS
=head2 UTF-8
The server assumes all incoming requests are encoded in UTF-8, and it encodes
all of its responses in UTF-8 as well.
=head2 HTTP(S)
In order to protect user passwords from network sniffing and other nefarious
activities, it is recommended that the server be set up to accept HTTPS
requests only.
=head2 Self-documenting
Another implication of REST is that the server provides "resources" and that
those resources are, to some extent at least, self-documenting.
=head1 EXPLORING THE SERVER
=head2 With a web browser
Only some resources (those that use the GET method) are accessible using a web
browser. That said, if we are only interested in displaying information
from the database, GET requests are all we need and using a web browser can
be convenient.
To start exploring, fire up a standard web browser and point it to the base URI
of your L<App::Dochazka::REST> installation:
http://dochazka.site
and enter your credentials in the Basic Authentication dialog.
=head2 With a command-line HTTP client
To access all the resources, you will need a client that is capable of
generating POST, PUT, and DELETE requests as well as GET requests. Also, since
some of the information L<App::Dochazka::REST> provides is in the response
headers, the client needs to be capable of displaying those as well.
One such client is Daniel Stenberg's B<curl>.
In the HTTP request, the client may provide an C<Accept:> header specifying
either HTML (C<text/html>) or JSON (C<application/json>). For the convenience
of those using a web browser, HTML is the default.
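The same kind of request can also be assembled from Perl. What follows is a
minimal sketch, assuming only the core module L<MIME::Base64>; the helper name
C<build_get> and the C<demo> credentials are illustrative and not part of
L<App::Dochazka::REST>. The request is only constructed here, not sent.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: assemble the arguments for HTTP::Tiny->request.
# $accept is either 'application/json' or 'text/html'.
sub build_get {
    my ( $base, $resource, $user, $pass, $accept ) = @_;
    my $url     = $base . '/' . $resource;
    my %headers = (
        'Accept' => $accept,

        # Basic Authentication: base64 of "user:password", no trailing newline
        'Authorization' => 'Basic ' . encode_base64( "$user:$pass", '' ),
    );
    return ( 'GET', $url, { headers => \%headers } );
}

my @request = build_get( 'http://dochazka.site', 'employee', 'demo', 'demo',
    'application/json' );
print "$request[0] $request[1]\n";

# To actually send it (requires a running server):
#   use HTTP::Tiny;
#   my $response = HTTP::Tiny->new->request(@request);
#   print "$response->{status} $response->{reason}\n";
```

Leaving out the C<Accept> header (or setting it to C<text/html>) would, per the
paragraph above, yield an HTML response instead.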
Here are some examples of how to use B<curl> (or a web browser) to explore
resources. These examples assume a vanilla installation of
L<App::Dochazka::REST> with the default root password. The same commands can be
used with a production server, but keep in mind that the resources you will see
may be limited by your privilege level.
=over
=item * GET resources
The GET method is used to search for and display information. The top-level
GET resources are listed at the top-level URI, which can be requested either
with a web browser or with B<curl>:
$ curl -v -H 'Accept: application/json' http://demo:demo@dochazka.site/
Similarly, to display a list of sub-resources under the 'privhistory' top-level
resource, enter the command:
$ curl http://demo:demo@dochazka.site/privhistory -H 'Accept: application/json'
Oops - no resources are displayed because the 'demo' user has only passerby
privileges, but all the privhistory resources require at least 'active'. To
see all the available resources, we can authenticate as 'root':
$ curl http://root:immutable@dochazka.site/privhistory -H 'Accept: application/json'
=item * POST resources
With the GET method, we could only access resources for finding and displaying
information: we could not add, change, or delete information. For that we will
need to turn to some other client than the web browser -- a client like B<curl>
that is capable of generating HTTP requests with methods like POST (as well as
PUT and DELETE).
Here is an example of how we would use B<curl> to display the top-level POST
resources:
$ curl -v http://root:immutable@dochazka.site -X POST -H "Content-Type: application/json"
The "Content-Type: application/json" header is necessary because the server
only accepts JSON in the POST request body -- even though in this case we
did not send a request body, most POST requests will have one. For best
results, the request body should be a legal JSON string represented as a
sequence of bytes encoded in UTF-8.
=item * PUT resources
The PUT method is used to add new resources and update existing ones. Since
the resources are derived from the underlying database, this implies executing
INSERT and UPDATE statements on tables in the database.
PUT resources can be explored using a B<curl> command analogous to the one
given for the POST method.
=item * DELETE resources
Any time we need to delete information -- i.e., completely wipe it from
the database -- we will need to use the DELETE method.
DELETE resources can be explored using a B<curl> command analogous to the one
given for the POST method.
Keep in mind that the data integrity constraints in the underlying PostgreSQL
database may make it difficult to delete a resource if any other resources
are linked to it. For example, an employee cannot be deleted until all
intervals, privhistory records, schedhistory records, locks, etc. linked to
that employee have been deleted. Intervals, on the other hand, can be
deleted as long as they are not subject to a lock.
=back
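The requirement that a POST or PUT request body be legal JSON encoded in UTF-8
can be met from Perl with the core module L<JSON::PP>. The following is a
sketch only: the payload and its field names are made up for illustration, and
nothing is sent to a server.

```perl
use strict;
use warnings;
use JSON::PP qw(encode_json decode_json);

# Hypothetical payload; the field names are for illustration only.
# The non-ASCII letter is written as an escape to keep this file pure ASCII.
my $payload = { nick => 'simona', fullname => "Simona Nov\x{e1}kov\x{e1}" };

# encode_json returns a byte string that is already UTF-8 encoded,
# so it can be used directly as the request body.
my $body = encode_json($payload);

# Options in the shape expected by HTTP::Tiny->request.
my %options = (
    headers => { 'Content-Type' => 'application/json' },
    content => $body,
);

# The body holds bytes, not characters.
printf "%d bytes, utf8 flag %s\n", length($body),
    ( utf8::is_utf8($body) ? 'on' : 'off' );
```

Encoding once, at the point where the body is built, avoids the double-encoding
problems that arise when character strings and byte strings get mixed.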
=head1 DOCUMENTATION OF REST RESOURCES
The definition of each resource includes an HTML string containing the
resource's documentation. This string can be accessed via POST request for
the C<docu> resource (provide the resource name in double quotes in the
request body).
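A request body consisting of a resource name in double quotes is simply a JSON
string. As a small sketch, assuming the core module L<JSON::PP>, such a body
could be produced as follows; the resource name used is just an example.

```perl
use strict;
use warnings;
use JSON::PP;

# allow_nonref lets the encoder emit a bare JSON string rather than
# requiring an array or object at the top level.
my $json = JSON::PP->new->utf8->allow_nonref;

my $resource_name = 'employee/nick/:nick';
my $body          = $json->encode($resource_name);

print "$body\n";    # the resource name wrapped in double quotes
```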
In order to be "self-documenting", the definition of each REST resource
contains a "short" description and a "long" POD string. From time to time, the
entire resource tree is walked to generate a module,
L<App::Dochazka::REST::Docs::Resources>, containing all the resource
documentation.
=head1 REQUEST-RESPONSE CYCLE
Incoming HTTP requests are handled by L<App::Dochazka::REST::Resource>,
which inherits from L<Web::Machine::Resource>. The latter uses L<Plack> to
implement a PSGI-compliant stack.
L<Web::Machine> takes a "state-machine" approach to implementing the HTTP 1.1
standard. Requests are processed by running them through a state
machine, each "cog" of which is a L<Web::Machine::Resource> method that can
be overridden by a child module. In our case, this module is
L<App::Dochazka::REST::Resource>.
The behavior of the resulting web server can be characterized as follows:
=over
=item * B<Allowed methods test>
One of the first things the server looks at, when it receives a request, is
the method. Only certain HTTP methods, such as 'GET' and 'POST', are accepted.
If this test fails, a "405 Method Not Allowed" response is sent.
=item * B<Internal and external authentication, session management>
This takes place when L<Web::Machine> calls the C<is_authorized> method,
our implementation of which is in L<App::Dochazka::REST::Auth>.
Though the method is called C<is_authorized>, what it really does is
authenticate the request - i.e., validate the user's credentials to
determine his or her identity. B<Authorization> - determination of whether the
user has sufficient privileges to make the request - takes place one step
further on. (The HTTP standard uses the term "authorized" to mean
"authenticated"; the name of this method is a nod to that usage.)
In C<is_authorized>, the user's credentials are authenticated
against an external database (LDAP), an internal database (PostgreSQL
'employees' table), or both. Session management techniques are utilized
to minimize external authentication queries, which impose latency. The
authentication and session management algorithms are described in
L<"AUTHENTICATION AND SESSION MANAGEMENT">. If authentication fails, a "401
Unauthorized" response is sent.
Since this is the first time that the PostgreSQL database is needed, this
is also where the L<DBIx::Connector> object is attached to the request
context. (The request context is a hashref that accompanies the request
as it undergoes processing.) For details, see
L<App::Dochazka::REST::Auth/"is_authorized">.
In a web browser, repeated failed authentication attempts are typically
associated with repeated display of the credentials dialog (and no other
indication of what is wrong, which can be confusing to users but is probably a
good idea, because any error messages could be abused by attackers).
=item * B<Authorization/ACL check>
After the request is authenticated (associated with a known employee), the
server examines the ACL profile of the resource being requested and compares it
with the employee's privilege level. If the privilege level is too low for the
requested operation, a "403 Forbidden" response is sent.
The ACL profile is part of the resource definition. It can be specified either
as a single value for all HTTP methods, or as a hash, e.g.:
{
GET => 'passerby',
PUT => 'admin',
DELETE => 'admin',
}
In certain operations (i.e., combinations of HTTP method and resource), the
full range of functionality may be available only to administrators. These
operations are special cases. Their ACL profile is either 'inactive' or
'active', but a non-administrator employee may still get a "403 Forbidden"
error on the operation if they are trying to do something that is reserved
for administrators, such as updating an interval belonging to a different
employee.
=item * B<Test for resource existence>
The next test a request undergoes on its quest to become a response is the
test of resource existence. If the request is asking for a non-existent resource,
e.g. L<http://dochazka.site/employee/curent>, it cannot be fulfilled and a "404
Not Found" response will be sent.
For GET requests, this is ordinarily the last cog in the state machine: if the
test passes, a "200 OK" response is typically sent, along with a response body.
(There are exceptions to this rule, however - see L<the AUTHORIZATION
chapter|"AUTHORIZATION">.) Requests using other methods (POST, PUT, DELETE) are
subject to further processing as described below.
=back
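The sequence of tests described in this list can be sketched as a single Perl
function that returns the status code a request would end with. This is an
illustration, not the actual L<Web::Machine> state machine: the resource
table, the function names, and the ordering of privilege levels (passerby,
inactive, active, admin, from lowest to highest) are assumptions made for the
sake of the example.

```perl
use strict;
use warnings;

# Assumed ordering of privilege levels, lowest to highest.
my %PRIV_RANK = ( passerby => 0, inactive => 1, active => 2, admin => 3 );

# Methods the server accepts at all.
my %ACCEPTED = map { $_ => 1 } qw( GET PUT POST DELETE );

# Hypothetical resource definitions. The ACL profile is either a single
# level for all methods or a per-method hash.
my %RESOURCES = (
    'employee/count' => { acl => 'admin' },
    'activity/aid/1' => {
        acl => { GET => 'passerby', PUT => 'admin', DELETE => 'admin' },
    },
);

sub required_priv {
    my ( $acl, $method ) = @_;
    return ref($acl) eq 'HASH' ? $acl->{$method} : $acl;
}

# Returns the status code the request would end with.
sub status_for {
    my (%req) = @_;

    # 1. Allowed methods test
    return 405 unless $ACCEPTED{ $req{method} };

    # 2. Authentication, reduced here to "is a known privilege level attached?"
    return 401 unless defined $req{priv} and exists $PRIV_RANK{ $req{priv} };

    # 3. Authorization/ACL check (only possible if the resource is defined)
    my $def = $RESOURCES{ $req{resource} };
    if ($def) {
        my $need = required_priv( $def->{acl}, $req{method} );

        # Assumption: a method missing from the profile is treated as forbidden.
        return 403
            unless defined $need
            and $PRIV_RANK{ $req{priv} } >= $PRIV_RANK{$need};
    }

    # 4. Test for resource existence
    return 404 unless $def;

    return 200;
}

my $status = status_for(
    method   => 'PUT',
    resource => 'activity/aid/1',
    priv     => 'active',
);
print "$status\n";
```

Keeping the ACL profile inside the resource definition, as the text describes,
means the check needs nothing but the definition, the method, and the
employee's privilege level.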
=head2 Additional processing (POST and PUT)
Because they are expected to have a request body, incoming POST and PUT
requests are subject to the following additional tests:
=over
=item * B<malformed_request>
This test examines the request body. If it is non-existent, the test
passes. If the body exists and is valid JSON, the test passes. Otherwise,
it fails and a "400 Bad Request" response is sent.
=item * B<known_content_type>
Test the request for the 'Content-Type' header. POST and PUT requests
should have a header that says:
Content-Type: application/json
If this header is not present, a "415 Unsupported Media Type" response is
sent.
=back
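These two tests can be approximated in a few lines of Perl. The sketch below
assumes the core module L<JSON::PP>; the function name C<check_body> is
hypothetical, and the use of status 400 for a malformed body is an assumption
based on how L<Web::Machine> usually treats a failed C<malformed_request>
test.

```perl
use strict;
use warnings;
use JSON::PP;

my $json = JSON::PP->new->utf8->allow_nonref;

# Returns an HTTP status code if the request should be rejected,
# or undef if both tests pass.
sub check_body {
    my ( $content_type, $body ) = @_;

    # malformed_request: an absent body passes; otherwise it must parse as JSON.
    if ( defined $body and length $body ) {
        my $ok = eval { $json->decode($body); 1 };
        return 400 unless $ok;
    }

    # known_content_type: the header must announce JSON.
    return 415
        unless defined $content_type
        and $content_type =~ m{^application/json\b}i;

    return;
}

my $verdict = check_body( 'application/json', '{"nick":"simona"}' );
print defined $verdict ? "rejected with $verdict\n" : "accepted\n";
```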
=head2 Additional processing (POST)
=over
#=item * B<post_is_create>
#
#This test examines the POST request and places it into one of two
#categories: (1) generic request for processing, (2) a request that creates
#or otherwise manipulates a resource.
=back
=head1 DATA MODEL
This section describes the C<App::Dochazka::REST> data model. Conceptually,
Dochazka data can be seen to exist in the following classes of objects:
=over
##=item * Policy (parameters set when database is first created)
##
=item * Employee (an individual employee)
=item * Privhistory (history of changes in an employee's privilege level)
=item * Schedule (a schedule)
=item * Schedhistory (history of changes in an employee's schedule)
=item * Activities (what kinds of work are recognized)
=item * Intervals (the "work", or "attendance", itself)
=item * Locks (determining whether a reporting period is locked or not)
=item * Components (Mason components, i.e. report templates)
=back
The "state" of each object is stored in a PostgreSQL database (see
L<"DATABASE"> for details).
These classes are described in the following sections.
=for comment
##=head2 Policy
##
##Dochazka is configurable in a number of ways. Some configuration parameters
##are set once at installation time and, once set, can never be changed --
##these are referred to as "site policy" parameters. Others, referred to as
must be encrypted. This can be accomplished using an SSL-capable HTTP
server or transparent proxy such as L<nginx|http://nginx.org/en/>.
If the C<DOCHAZKA_LDAP> site parameter is set to a true value, the
C<_authenticate> routine of L<App::Dochazka::REST::Resource> will attempt to
authenticate the request against an external resource using the LDAP protocol.
LDAP authentication takes place in two phases:
=over
=item * lookup phase
=item * authentication phase
=back
The purpose of the lookup phase is to determine if the user exists in the
LDAP resource and, if it does exist, to get its 'cn' property. In the second
phase, the password entered by the user is compared with the password stored
in the LDAP resource.
If the LDAP lookup phase fails, or if LDAP is disabled, L<App::Dochazka::REST>
falls back to "internal authentication", which means that the credentials are
compared against the C<nick>, C<passhash>, and C<salt> fields of the
C<employees> table in the database.
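The fallback logic can be summarized in a short Perl sketch. Only the decision
flow is shown: the three callbacks stand in for the real LDAP lookup, LDAP
password check, and database check, and their names are hypothetical. That a
failed LDAP password check ends in rejection, rather than in a second attempt
against the internal database, is an assumption.

```perl
use strict;
use warnings;

# Returns 'ldap' or 'internal' to say which mechanism accepted the
# credentials, or undef if authentication failed.
sub authenticate {
    my (%arg) = @_;
    my ( $nick, $password ) = @arg{qw( nick password )};

    if ( $arg{ldap_enabled} and $arg{ldap_lookup}->($nick) ) {

        # Lookup phase succeeded: the LDAP password check decides.
        return $arg{ldap_auth}->( $nick, $password ) ? 'ldap' : undef;
    }

    # LDAP disabled, or user not found there: internal authentication.
    return $arg{internal_auth}->( $nick, $password ) ? 'internal' : undef;
}

# Stub callbacks for demonstration purposes only.
my %stub = (
    ldap_lookup   => sub { $_[0] eq 'simona' },
    ldap_auth     => sub { $_[1] eq 'ldap-secret' },
    internal_auth => sub { $_[0] eq 'root' and $_[1] eq 'immutable' },
);

my $how = authenticate( %stub, ldap_enabled => 1, nick => 'root',
    password => 'immutable' );
print defined $how ? "authenticated via $how\n" : "rejected\n";
```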
To protect user credentials from snooping, the actual passwords are not stored
in the database. Instead, they are run through a one-way hash function and
only the hash (along with a random "salt" string) is stored. Since some
"one-way" hashing algorithms are subject to brute force attacks, the Blowfish
algorithm was chosen to provide the best known protection.
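The principle of storing a salt and a hash instead of the password can be
shown with a short sketch. Note that it uses SHA-512 from the core module
L<Digest::SHA> as a stand-in so that it runs anywhere; it is B<not> the
Blowfish-based scheme mentioned above, and the function names are
hypothetical.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha512_hex);

# Illustration only: SHA-512 stands in for the real hashing scheme.
sub make_salt {

    # rand() is not a cryptographically secure source; acceptable in a demo only.
    return join '', map { sprintf '%02x', int rand 256 } 1 .. 16;
}

sub hash_password {
    my ( $password, $salt ) = @_;
    return sha512_hex( $salt . $password );
}

# What would be stored: the salt and the hash, never the password.
my $salt   = make_salt();
my %record = (
    nick     => 'demo',
    salt     => $salt,
    passhash => hash_password( 'demo', $salt ),
);

# Verification recomputes the hash from the stored salt.
sub verify_password {
    my ( $record, $candidate ) = @_;
    return hash_password( $candidate, $record->{salt} ) eq $record->{passhash};
}

print verify_password( \%record, 'demo' ) ? "match\n" : "no match\n";
```

Because the salt differs per employee, two employees with the same password
end up with different hashes.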
If the request passes Basic Authentication, a session ID is generated and
stored in a cookie.
=head1 AUTHORIZATION
=head1 CLIENT-SERVER COMMUNICATION
As stated above, communication between the server and its clients takes place
using the HTTP protocol. More abstractly, the communication takes the form of
requests (from client to server) and responses (from server back to client) to
those requests. In other words, communication is never initiated by the server,
but always by the clients.
=head2 HTTP request
An HTTP request has the following basic components:
=over
=item * Method
Dochazka supports GET, PUT, POST, and DELETE
=item * URI
Uniform Resource Identifier specifying a Dochazka resource
=item * Headers
More on these below
=item * Request entity
Data accompanying the request - may or may not be present
=back
=head3 Method
The Dochazka REST server accepts the following HTTP methods:
C<GET>, C<PUT>, C<POST>, and C<DELETE>.
=over
=item C<GET>
A C<GET> request on a resource is a request for information - in other words,
it is "read-only": C<GET> requests never change the underlying data. In
Dochazka, C<GET> requests frequently map to C<SELECT> statements.
=item C<PUT>
C<PUT> requests always refer to a concrete data entity, or chunk of data.
In simple cases, this will be a single record in the underlying database. If
the record already exists, the C<PUT> request is interpreted to mean
modification (or C<UPDATE> in SQL). If the record does not exist, then the
request will map to an C<INSERT> statement to create the resource. In both
cases, upon success the response status will be C<200 OK>.
=item C<POST>
Sometimes, especially for create operations, the exact specification of the
resource is not known beforehand. To address these cases, some resources accept
C<POST> requests. If the request causes a new resource to be created, the HTTP
response status will be C<201 Created> and there will be a C<Location> header
specifying the URI of the newly created resource.
=item C<DELETE>
As their name would suggest, C<DELETE> requests are issued when we want to
dissolve (destroy) a resource. Whether or not this actually happens is
determined by two factors: (1) whether the user issuing the request has the
requisite authorization and (2) whether the underlying data record is
referred to by other records - in which case the C<DELETE> request will
typically fail with a C<500 Internal Server Error> status.
=back
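The update-or-insert behavior described for C<PUT> can be illustrated with an
in-memory hash standing in for a database table. This is a sketch under that
assumption; the function name and the record fields are invented for the
example, and no SQL is executed.

```perl
use strict;
use warnings;

# Stand-in for a database table, keyed by nick.
my %table = ( simona => { nick => 'simona', fullname => 'Simona' } );

# Returns the SQL verb the request would map to, plus the response status.
sub put_employee {
    my ( $nick, $fields ) = @_;
    my $verb = exists $table{$nick} ? 'UPDATE' : 'INSERT';

    # Merge the new fields over the existing record, if there is one.
    $table{$nick} = { %{ $table{$nick} || {} }, %$fields, nick => $nick };
    return ( $verb, 200 );
}

my ( $verb, $status ) = put_employee( 'wanda', { fullname => 'Wanda' } );
print "$verb -> $status\n";
```

Sending the same C<PUT> request a second time would take the C<UPDATE> path
and leave the record unchanged.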
=head3 URI
The purpose of the Uniform Resource Identifier (URI, sometimes also known as
a URL) is to uniquely identify a resource.
URIs consist of several syntactical elements. An exhaustive description can be
found in RFC ..., but for Dochazka purposes we can present them as follows:
=over
=item C<https://>
This part of the URI says that we are using the HTTPS protocol (or SSL-encrypted
HTTP) to communicate. It is separated from the next component by two forward
slashes.
=item C<dochazka.site>
After the protocol, the next URI component is the REST server's domain name.
Obviously, this will differ from site to site. It is separated from the next
component (i.e. the resource specification) by a single forward slash.
=item Dochazka resource
As stated above, the domain name is terminated by a single forward slash.
Everything after that is interpreted as a resource specification.
A single forward slash C<'/'> specifies the root resource.
=back
Of these three components, the first two are site-specific. It is possible,
for example, to run the Dochazka server without SSL encryption, in which case
the protocol would be C<http://> instead of C<https://>.
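As a rough sketch, the three components can be separated with a regular
expression. The function name C<split_uri> is hypothetical, and the pattern is
deliberately simple: it does not treat user information, ports, or query
strings specially (a port, if present, stays attached to the domain name).

```perl
use strict;
use warnings;

# Split a URI into protocol, domain name and resource specification.
# Returns an empty list if the URI is not an http:// or https:// URI.
sub split_uri {
    my ($uri) = @_;
    my ( $scheme, $host, $resource ) =
        $uri =~ m{\A(https?)://([^/]+)(?:/(.*))?\z}
        or return;

    # An empty specification means the root resource '/'.
    $resource = '/' unless defined $resource and length $resource;
    return ( $scheme, $host, $resource );
}

my @parts = split_uri('https://dochazka.site/employee/nick/simona');
print join( ' | ', @parts ), "\n";
```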
Once the application's implementation at a given site has stabilized, these two
URI components will change very seldom, if at all.
Dochazka resources are much more ephemeral. Different resources present
different ways that users can access and modify the data (in this case,
attendance data) in the underlying database.
Some resources, such as C<employee/nick/simona>, refer directly to a unit of
information that may or may not exist in the database. Other
resources, like C<interval/new>, are not linked to a specific database record.
Also, in programming terms the resources are generalized, so we think about,
e.g., C<employee/nick/simona> and C<employee/nick/wanda> as two instances of a
more generalized C<employee/nick/:nick> resource, where C<:nick> is like an
argument to a function call.
And, indeed, internally all resources resolve to function calls. The function
in this case is referred to as the "resource handler".
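The idea of a generalized resource whose C<:nick> part acts like a function
argument can be sketched as follows. This is not how L<App::Dochazka::REST>
actually resolves resources; the handler table and the function names
C<compile_pattern> and C<dispatch> are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical resource table: pattern => resource handler.
my %HANDLERS = (
    'employee/nick/:nick' =>
        sub { my (%p) = @_; return "employee record for $p{nick}" },
    'interval/new' => sub { return 'new interval' },
);

# Turn 'employee/nick/:nick' into a regex with one capture per ':name'.
sub compile_pattern {
    my ($pattern) = @_;
    my ( @names, @parts );
    for my $segment ( split m{/}, $pattern ) {
        if ( $segment =~ /\A:(\w+)\z/ ) {
            push @names, $1;
            push @parts, '([^/]+)';
        }
        else {
            push @parts, quotemeta $segment;
        }
    }
    my $re = join '/', @parts;
    return ( qr{\A$re\z}, \@names );
}

# Find the matching pattern and call its handler with the captured values.
sub dispatch {
    my ($resource) = @_;
    for my $pattern ( sort keys %HANDLERS ) {
        my ( $regex, $names ) = compile_pattern($pattern);
        my @values = $resource =~ $regex or next;
        my %params;
        @params{@$names} = @values if @$names;
        return $HANDLERS{$pattern}->(%params);
    }
    return;    # no such resource
}

print dispatch('employee/nick/simona'), "\n";
print dispatch('employee/nick/wanda'),  "\n";
```

Both calls reach the same handler; only the value bound to C<:nick> differs.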
Some resources accept all four HTTP methods listed above, others accept two or
three, and still others accept only one.
=head3 Headers
HTTP headers are somewhat obscure because they are often hidden by the
client. Nevertheless, they are an important part of the HTTP protocol. The
Dochazka REST server only accepts certain headers in the request.
#FIXME: describe the more common response headers
=head3 Body
C<PUT> and C<POST> requests may take a request body. If a request body is
expected or accepted, it must be a valid JSON string. (JSON is a simple way of
"stringifying" a data structure.)
=head2 HTTP response
The HTTP response returned by the REST server consists of:
=over
=item * Status code (e.g. 200, 400, 404, etc.)
=item * Headers
=item * Content body (or "response entity")
=back
=head3 Status
The HTTP standard stipulates a number of status codes. The server listens for
incoming requests. Under normal operation, the server processes each request.
The result of such processing is a "response", which is sent back to the client
that originated the request. Each response will contain one and only one status
code. The meanings of the various status codes are explained in the HTTP
standard. Some of the more common ones are as follows:
=over
=item C<200> (OK)
The request was accepted and processed. Refer to the response body for the
result.
=item C<204> (No Content)
This code is returned on C<DELETE> requests when either the record was
successfully deleted or the resource did not exist in the first place.
=item C<404> (Not Found)
The resource specification given in the URI could not be associated with a
known resource.
=item C<405> (Method Not Allowed)
The resource was recognized but it is not defined for this method.
=item C<401> (Unauthorized)
A valid method+resource combination was specified, but the user failed to
authenticate herself to the REST server.
=item C<403> (Forbidden)
A valid method+resource combination was specified and the user was successfully
authenticated, but the user is not authorized to perform the operation she is
requesting.
=item C<400> (Bad Request)