The list of all recent modifications to the library.
The Gradle build file, the ANT build file and the README have also been updated to reflect this upgrade, as well as the migration from com.oreilly.servlet to Apache Commons FileUpload.
Add a new logger which forwards log entries to SLF4J.
UWS and TAP implementors are then free to use whatever logging system they want (e.g. log4j, logback, ...). In the configuration file, just set the property logger as follows: logger=slf4j.
It is now possible to enable an automatic fix of input ADQL queries in TAP-Lib, through the configuration file property fix_on_fail.
When enabled, and only if the parsing of the input query fails, TAP-Lib tries a quick fix of this query and parses the fixed query. If this succeeds, it runs the fixed query.
When a query is fixed in this way, TAP-Lib logs it and adds an INFO element to the output VOTable. This INFO element, named QUERY_AFTER_AUTO_FIX, is set to the result of the auto fix, i.e. to the ADQL query actually executed.
In any case, this feature is disabled by default, i.e. when the property is omitted from the configuration file.
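As a rough illustration, a configuration file enabling both features could contain the lines logger=slf4j and fix_on_fail=true. The Java sketch below reads them with java.util.Properties. The class TapConfigSketch is hypothetical and is not TAP-Lib's own parser. It also makes two assumptions: fix_on_fail is treated as a boolean that is false when omitted, and logger falls back to "default" when omitted.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper reading two TAP configuration properties (not TAP-Lib's parser). */
final class TapConfigSketch {

    /** Loads the content of a configuration file, i.e. a Java Properties text. */
    static Properties load(String content) {
        Properties conf = new Properties();
        try {
            conf.load(new StringReader(content));
        } catch (IOException e) { // practically impossible with a StringReader
            throw new UncheckedIOException(e);
        }
        return conf;
    }

    /** Assumption: fix_on_fail is a boolean; the feature stays off when the property is omitted. */
    static boolean isFixOnFailEnabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty("fix_on_fail", "false").trim());
    }

    /** Assumption: "default" is used when the logger property is omitted. */
    static String logger(Properties conf) {
        return conf.getProperty("logger", "default").trim();
    }

    public static void main(String[] args) {
        Properties conf = load("logger=slf4j\nfix_on_fail=true\n");
        System.out.println(logger(conf) + " " + isFixOnFailEnabled(conf)); // slf4j true
    }
}
```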
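This feature fixes the most common issues with ADQL queries:
- reserved SQL words used as identifiers (e.g. public, year, date),
- ADQL function names used as identifiers (e.g. distance, min, avg),
- identifiers starting with an underscore or a digit (e.g. _RAJ2000, 2mass),
- identifiers containing one of the characters ?, !, $, @, #, {, }, [, ], ~, ^ or `.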
For instance, here is an ADQL query that any user might want to run. Its parsing immediately fails for three reasons: the leading _ of _raj2000 and _dej2000, the column name distance (which is reserved for an ADQL function) and public (which is a reserved SQL word):
SELECT id, _raj2000, _dej2000, distance FROM public.myTable
The automatic quick fix will produce the following query:
SELECT id, "_raj2000", "_dej2000", "distance" FROM "public".myTable
This query should now run, provided that the case of the column and schema names is correct. The fix does not check this, so users still have to check it themselves.
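The kind of rewriting performed by the quick fix can be sketched as follows. This is not TAP-Lib's implementation, which relies on the ADQL library. It is a hypothetical, regex-based approximation with a few rules:
- bare words starting with _ or with a digit are double-quoted;
- bare words found in a deliberately tiny reserved list are double-quoted, unless they are followed by "(";
- string literals and already-delimited identifiers are left untouched.

Among other simplifications, it ignores the special characters listed above and would wrongly quote numeric literals such as 1e5.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical approximation of an ADQL "quick fix" (not TAP-Lib's implementation). */
final class AdqlQuickFixSketch {

    // Deliberately tiny illustrative list: a real fixer needs the full lists of
    // reserved SQL words and ADQL function names.
    private static final Set<String> RESERVED =
            Set.of("public", "year", "date", "distance", "min", "avg");

    // A token is a string literal, an already delimited identifier, or a bare word.
    private static final Pattern TOKEN = Pattern.compile("'[^']*'|\"[^\"]*\"|[A-Za-z0-9_]+");

    static String fix(String query) {
        Matcher m = TOKEN.matcher(query);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            String tok = m.group();
            out.append(query, last, m.start()); // copy the text between tokens as is
            out.append(needsQuotes(tok, query, m.end()) ? "\"" + tok + "\"" : tok);
            last = m.end();
        }
        return out.append(query.substring(last)).toString();
    }

    private static boolean needsQuotes(String tok, String query, int end) {
        char first = tok.charAt(0);
        if (first == '\'' || first == '"') return false; // literal or delimited identifier
        if (first == '_') return true;                   // e.g. _RAJ2000
        if (Character.isDigit(first)) {                  // e.g. 2mass, but not 42
            return !tok.chars().allMatch(Character::isDigit);
        }
        if (!RESERVED.contains(tok.toLowerCase(Locale.ROOT))) return false;
        // A reserved name directly followed by '(' is a real function call: keep it.
        int i = end;
        while (i < query.length() && Character.isWhitespace(query.charAt(i))) i++;
        return i == query.length() || query.charAt(i) != '(';
    }
}
```

Applied to the query above, fix(...) returns exactly the fixed query shown, while a genuine function call such as min(mag) is left unchanged.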
Files uploaded by the user when creating/executing a synchronous job were never deleted after the job execution.
The same problem applied to tables already uploaded into the database (in TAP_UPLOAD) when an error occurred before the end of the UPLOAD process.
Now, all uploaded files and their corresponding database tables are deleted at the end of the job, whether an error occurred while uploading one or more files or the job succeeded.
Before this correction, an obscure error message coming from the database was returned to the user in two cases: when two uploaded tables had the same name, or when one uploaded table contained duplicated column names.
Now, duplicated items (tables and columns) are looked for before ingestion into the database. When one is detected, an error is immediately returned to the user and the query is aborted.
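A minimal sketch of such a pre-ingestion check follows. It assumes that names are compared case-insensitively, which the text above does not state. UploadDuplicateCheckSketch is a hypothetical name. The check throws before anything is written to the database.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of a duplicate check run before any upload ingestion. */
final class UploadDuplicateCheckSketch {

    /** Returns the first name seen twice (assumption: ignoring case), or null if none. */
    static String firstDuplicate(List<String> names) {
        Set<String> seen = new HashSet<>();
        for (String name : names) {
            if (!seen.add(name.toLowerCase(Locale.ROOT))) return name;
        }
        return null;
    }

    /** Throws if a table name, or a column name within one table, is duplicated. */
    static void check(List<String> tableNames, List<List<String>> columnsPerTable) {
        String dup = firstDuplicate(tableNames);
        if (dup != null) {
            throw new IllegalArgumentException("Duplicated uploaded table name: " + dup);
        }
        for (int i = 0; i < tableNames.size(); i++) {
            dup = firstDuplicate(columnsPerTable.get(i));
            if (dup != null) {
                throw new IllegalArgumentException("Duplicated column name \"" + dup
                        + "\" in the uploaded table " + tableNames.get(i));
            }
        }
    }
}
```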
upload_default_db_limit and upload_max_file_size deprecated.
The property upload_default_db_limit has been deprecated. Indeed, in the current state of the TAP protocol, it makes no sense: the user cannot change the size limit (in bytes or rows) of uploaded tables.
The property upload_max_file_size has been deprecated. It was actually redundant: upload_max_db_limit, if expressed in bytes, already limits the maximum size of an uploaded table/file.
The property upload_max_request_size has been added. It sets a maximum size for a whole HTTP multipart request. By default, it is set to 250MB.
The default value of upload_max_db_limit is now 1 million rows.
The UPLOAD feature is still disabled by default (i.e. upload_enabled=false).
You should check the documentation of the TAP configuration file for more details.
The end of the description of a UDF was not detected when this UDF was followed by another UDF definition. This was due to an incorrect escaping of double quotes in the regular expression parsing a UDF definition. Because of this incorrect parsing, the TAP service could not start.
TAP_SCHEMA schema/tables/columns.
When a different name was defined in the configuration file for TAP_SCHEMA content, the service implementor was also forced to define the same mapping in the database, with the column dbName.
This is no longer necessary. From now on, the dbName column is ignored for all standard TAP_SCHEMA content, and the name specified in the configuration file (if any) is used instead. This way, the mapping for standard TAP_SCHEMA content is specified only once, in a single place: the configuration file.
text/plain formatting
Before this fix, cancelling a TAP job (async or not) while it was formatting the result in ASCII could fail, especially for large results. This was due to a non-interruptible alignment process. This process now checks whether a cancellation has been requested before formatting each result line/row. If so, the process is immediately stopped and the job can be cleanly declared as aborted.
The previous text formatting process stored the entire table in memory, hence OutOfMemoryErrors with large tables.
Now, this process is done entirely in memory only for tables with fewer than 1000 lines. For larger tables, the content is stored in a temporary file, which is deleted after use or in case of error.
This formatting process has been successfully tested under JVM monitoring (both JConsole and VisualVM) with tables of more than 3,000,000 rows.
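The two-pass alignment described above (measure the column widths, then write padded rows) can be sketched as below, assuming that rows arrive as String arrays. The class TextTableSketch and its details are illustrative assumptions, not the library's code:
- rows are buffered in memory up to 1000 rows, then spilled to a temporary file that is always deleted;
- the thread's interrupt flag is checked before each row, so that a cancellation stops the work promptly;
- cell values are assumed to contain no tab or line break.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of an interruptible, two-pass text table formatter. */
final class TextTableSketch {

    /** Maximum number of rows kept in memory before spilling to a temporary file. */
    static final int IN_MEMORY_LIMIT = 1000;

    static void format(Iterator<String[]> rows, int nbCols, Writer out)
            throws IOException, InterruptedException {
        int[] widths = new int[nbCols];
        List<String[]> memory = new ArrayList<>();
        Path tmp = null;
        BufferedWriter spill = null;
        try {
            // Pass 1: measure column widths and store rows (memory first, then a temp file).
            while (rows.hasNext()) {
                checkCancelled();
                String[] row = rows.next();
                for (int c = 0; c < nbCols; c++) {
                    widths[c] = Math.max(widths[c], row[c].length());
                }
                if (spill == null && memory.size() < IN_MEMORY_LIMIT) {
                    memory.add(row);
                    continue;
                }
                if (spill == null) { // threshold reached: move the buffered rows to disk
                    tmp = Files.createTempFile("text_result_", ".tsv");
                    spill = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8);
                    for (String[] r : memory) writeRaw(spill, r);
                    memory.clear();
                }
                writeRaw(spill, row);
            }
            // Pass 2: write the aligned rows, still checking for cancellation before each row.
            if (spill == null) {
                for (String[] r : memory) {
                    checkCancelled();
                    writeAligned(out, r, widths);
                }
            } else {
                spill.close();
                spill = null;
                try (BufferedReader in = Files.newBufferedReader(tmp, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        checkCancelled();
                        writeAligned(out, line.split("\t", -1), widths);
                    }
                }
            }
        } finally {
            if (spill != null) spill.close();
            if (tmp != null) Files.deleteIfExists(tmp); // removed after use or on error
        }
    }

    private static void checkCancelled() throws InterruptedException {
        if (Thread.interrupted()) throw new InterruptedException("text formatting cancelled");
    }

    // Assumption of this sketch: cell values contain no tab or line break.
    private static void writeRaw(BufferedWriter w, String[] row) throws IOException {
        w.write(String.join("\t", row));
        w.newLine();
    }

    private static void writeAligned(Writer out, String[] row, int[] widths) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int c = 0; c < widths.length; c++) {
            if (c > 0) sb.append(" | ");
            sb.append(row[c]);
            for (int k = row[c].length(); k < widths[c]; k++) sb.append(' ');
        }
        out.write(sb.append('\n').toString());
    }
}
```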
Until now, the generated VOTable file was unreadable, even by STIL/STILTS/TOPCAT. To fix this, the table to format into FITS now has to be stored temporarily, so that STIL can read it at least twice.
The idea is to stop systematically returning a LONG for a BigInteger and a DOUBLE for a BigDecimal. Instead, the output datatype is adapted according to how the column has been declared in TAP_SCHEMA.columns.
Thus, a BigDecimal can be formatted into a LONG, an INTEGER, a SHORT, a FLOAT or a DOUBLE, whereas a BigInteger can be formatted into a LONG, an INTEGER or a SHORT.
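A minimal sketch of this datatype-driven conversion follows. It assumes that the declared TAP_SCHEMA.columns datatypes SMALLINT, INTEGER, BIGINT, REAL and DOUBLE map to the Java types short, int, long, float and double, respectively. The class name is hypothetical. The narrowing conversions silently truncate out-of-range values, which is acceptable only if the declared datatype is correct.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Locale;

/** Hypothetical sketch: converting big numbers according to the declared column datatype. */
final class BigNumberConversionSketch {

    static Object convert(Object value, String declaredDatatype) {
        String type = declaredDatatype.toUpperCase(Locale.ROOT);
        if (value instanceof BigDecimal) {
            BigDecimal d = (BigDecimal) value;
            switch (type) {
                case "SMALLINT": return d.shortValue();
                case "INTEGER":  return d.intValue();
                case "BIGINT":   return d.longValue();
                case "REAL":     return d.floatValue();
                default:         return d.doubleValue(); // DOUBLE or unknown: previous behaviour
            }
        }
        if (value instanceof BigInteger) {
            BigInteger i = (BigInteger) value;
            switch (type) {
                case "SMALLINT": return i.shortValue();
                case "INTEGER":  return i.intValue();
                default:         return i.longValue();   // BIGINT or unknown: previous behaviour
            }
        }
        return value; // other values are not concerned by this conversion
    }
}
```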
VOTable allows declaring coordinate systems that can be referenced by FIELDs. Following the latest debate on how to properly declare them in VOTable, the TAPLibrary writes a COOSYS item for each referenced coordinate system in the RESOURCE of type "results".
For instance: <COOSYS ID="GDR1_ICRS" system="ICRS" epoch="2015.0" />
This new feature requires two things in TAP_SCHEMA:
the new table TAP_SCHEMA.coosys, declared as below in TAP_SCHEMA.tables:
schema_name | table_name | table_type | description | utype |
---|---|---|---|---|
TAP_SCHEMA | TAP_SCHEMA.coosys | table | List of coordinate systems of coordinate columns published in this TAP service. | |
It must have (at least) the following columns declared in TAP_SCHEMA.columns:
table_name | column_name | datatype | description | ucd | std |
---|---|---|---|---|---|
TAP_SCHEMA.coosys | id | VARCHAR | ID of the coordinate system definition as it must be in the VOTable. | meta.id;meta.main | 0 |
TAP_SCHEMA.coosys | system | VARCHAR | The coordinate system among: ICRS, eq_FK5, eq_FK4, ecl_FK4, ecl_FK5, galactic, supergalactic, xy, barycentric, geo_app. | meta.code | 0 |
TAP_SCHEMA.coosys | equinox | VARCHAR | Required to fix the equatorial or ecliptic systems (as e.g. J2000 as the default for eq_FK5 or B1950 as the default for eq_FK4). | time.equinox | 0 |
TAP_SCHEMA.coosys | epoch | VARCHAR | Epoch of the positions (if necessary). | time.epoch | 0 |
the additional column TAP_SCHEMA.columns.coosys_id, referencing items of TAP_SCHEMA.coosys:
table_name | column_name | datatype | description | ucd | std |
---|---|---|---|---|---|
TAP_SCHEMA.columns | coosys_id | VARCHAR | ID of the used coordinate systems (if any). | meta.id | 0 |
Then, all you have to do is fill the column TAP_SCHEMA.columns.coosys_id for every coordinate column to which you want to attach a coordinate system. When such a column is selected in an ADQL query, the corresponding COOSYS item is automatically added to the query result VOTable document.
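A hypothetical sketch of how COOSYS elements could be produced:
- each distinct, non-null coosys_id of the selected columns is kept once, in order of appearance;
- each referenced row of TAP_SCHEMA.coosys (id, system, equinox, epoch) is written as one element, omitting null attributes.

This is not the TAPLibrary's code. It only reproduces the example element given earlier in this section.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: from coosys_id values to VOTable COOSYS elements. */
final class CoosysSketch {

    /** Distinct, non-null coosys_id values of the selected columns, in order of appearance. */
    static Set<String> referencedIds(List<String> coosysIdsOfSelectedColumns) {
        Set<String> ids = new LinkedHashSet<>();
        for (String id : coosysIdsOfSelectedColumns) {
            if (id != null) ids.add(id);
        }
        return ids;
    }

    /** One row of TAP_SCHEMA.coosys as a COOSYS element; null attributes are omitted. */
    static String toCoosys(String id, String system, String equinox, String epoch) {
        StringBuilder sb = new StringBuilder("<COOSYS");
        attribute(sb, "ID", id);
        attribute(sb, "system", system);
        attribute(sb, "equinox", equinox);
        attribute(sb, "epoch", epoch);
        return sb.append(" />").toString();
    }

    private static void attribute(StringBuilder sb, String name, String value) {
        if (value == null) return;
        String escaped = value.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
        sb.append(' ').append(name).append("=\"").append(escaped).append('"');
    }
}
```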
The TAPLibrary is now able to detect whether the RegTAP Data-Model is used.
RegTAP is detected if the schema rr exists (case sensitive) and contains at least the following tables (names also case sensitive):
The table name can be prefixed by rr (case sensitive) or not. For instance, rr.capability and capability are both detected successfully.
All these constraints (including case sensitivity) are based on the requirements of the REC-RegTAP-1.0 standard document. They aim both at not declaring the RegTAP DM by accident AND at providing a first, shallow validation of the RegTAP schema and tables. The validation is shallow because columns (as well as datatypes, utypes, indices and UDFs) are never checked.
Only version 1.0 of RegTAP is supported for the moment.
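The detection rules above can be sketched as follows. The list of required tables is not reproduced here, so the sketch takes it as a parameter. The class and method names are hypothetical. Schema and table names are compared case-sensitively, and an optional rr. prefix is stripped from table names.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the RegTAP detection rules (not the TAPLibrary's code). */
final class RegTapDetectionSketch {

    /**
     * @param tablesBySchema table names declared in TAP_SCHEMA, grouped by schema name
     * @param requiredTables names of the tables required by REC-RegTAP-1.0, supplied by the caller
     */
    static boolean isRegTap(Map<String, List<String>> tablesBySchema, Collection<String> requiredTables) {
        List<String> tables = tablesBySchema.get("rr"); // schema name: case sensitive
        if (tables == null) return false;
        Set<String> found = new HashSet<>();
        for (String t : tables) {
            found.add(t.startsWith("rr.") ? t.substring(3) : t); // the "rr." prefix is optional
        }
        return found.containsAll(requiredTables);       // table names: case sensitive
    }
}
```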
The TAPLibrary is now able to detect whether the ObsCore Data-Model is used.
This automatic detection is actually really simple: the table obscore must be found (case INsensitive) in TAP_SCHEMA.tables within the schema ivoa (case sensitive).
By default, the ObsCore-DM version is set to 1.0. But if ALL the following columns are found, it is set to 1.1: s_xel1, s_xel2, t_xel, em_xel and pol_xel.
See on GitHub e8ef4e4..., 5c7debf... and dba0640....
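A hypothetical sketch of this ObsCore version detection, for one declared table, follows. It makes two assumptions not stated above: the table name may also be qualified as ivoa.obscore, and column names are compared case-insensitively.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the ObsCore detection (not the TAPLibrary's code). */
final class ObsCoreDetectionSketch {

    private static final List<String> V1_1_COLUMNS =
            List.of("s_xel1", "s_xel2", "t_xel", "em_xel", "pol_xel");

    /** Returns "1.1", "1.0", or null when the given table is not the ObsCore table. */
    static String detectVersion(String schemaName, String tableName, Collection<String> columnNames) {
        if (!"ivoa".equals(schemaName)) return null;            // schema: case sensitive
        // Assumption: a qualified table name "ivoa.obscore" is accepted too.
        String simple = tableName.startsWith("ivoa.") ? tableName.substring(5) : tableName;
        if (!simple.equalsIgnoreCase("obscore")) return null;   // table: case INsensitive
        Set<String> columns = new HashSet<>();
        for (String c : columnNames) columns.add(c.toLowerCase(Locale.ROOT)); // assumption
        return columns.containsAll(V1_1_COLUMNS) ? "1.1" : "1.0";
    }
}
```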
/capabilities and /tables
In the configuration file, two new properties have been added: capabilities_stylesheet and tables_stylesheet. They specify an XSLT stylesheet to link in the output of the resources /capabilities and /tables, respectively.
See the documentation of these properties for more details.
schema_index, table_index and column_index
According to PR-TAP-1.1, the new columns TAP_SCHEMA.tables.table_index and TAP_SCHEMA.columns.column_index let a TAP service recommend to TAP clients an order for tables and columns, respectively. Just as a TAP client would, the TAP library reads these indices in order to sort tables and columns when generating the output document of /tables.
Since some TAP implementors would like the same feature for schemas, the TAP library also applies the same process to the column TAP_SCHEMA.schemas.schema_index, if provided. This column will not become part of the upcoming REC-TAP-1.1, though.
See on GitHub 6fc7f8f..., 26cee66... and 83d4a31....
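Sorting by these indices can be sketched with a Comparator. The text above does not say what happens to items without an index, so the sketch assumes they are placed after the indexed ones, in their original order (List.sort is stable). Item is a hypothetical stand-in for a schema, table or column.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of ordering TAP_SCHEMA items by their *_index column. */
final class IndexOrderingSketch {

    /** Stand-in for a schema, table or column: a name and an optional index (null if not provided). */
    static final class Item {
        final String name;
        final Integer index;

        Item(String name, Integer index) {
            this.name = name;
            this.index = index;
        }
    }

    /** Assumption: items without index come after indexed ones, in their original order. */
    static List<String> order(List<Item> items) {
        Comparator<Integer> byIndex = Comparator.nullsLast(Comparator.<Integer>naturalOrder());
        List<Item> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparing((Item it) -> it.index, byIndex)); // stable sort
        List<String> names = new ArrayList<>();
        for (Item it : sorted) names.add(it.name);
        return names;
    }
}
```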
In addition to the /sync, /async, /tables, ... endpoints, DALI-1.0 and this TAP implementation note propose a new one: /examples.
Its purpose is to list examples of ADQL queries that can be run on this TAP service. This resource is written in a special XHTML format: it can be displayed nicely in a browser, and any machine can easily extract the examples from it. The TAP client of TOPCAT is able to detect and parse this resource, so that the user can browse and select these examples.
TAPLib does not provide any special API to format this resource. To make it work, you just have to write the examples document in one of the formats described in the two IVOA documents mentioned above, and give the library a link to that document. Examples (with explanations) of such a document are provided on GitHub.
See also the documentation of the property examples for more details.
TAP_SCHEMA and its tables and columns
This is very helpful if TAP_SCHEMA (or some of its tables and their columns) has a different name in the database (i.e. the schema is not named TAP_SCHEMA in the database).
The mapping can be given to JDBCConnection or specified directly in the configuration file (see the documentation of the property TAP_SCHEMA... for more details).
metadata: wrapping of a TAPMetadata
In addition to the values xml and db, it was already possible to provide your own extension of TAPMetadata. With this last option, the metadata must be declared "manually": the implementor has to fill an instance of the TAPMetadata extension.
This is indeed a nice feature if your metadata do not come from the database schema TAP_SCHEMA (i.e. option db) or from a VOSI representation of the tables (i.e. option xml). But what if you just want to change the behaviour of some functions of TAPMetadata, while still getting the metadata from a VOSI XML document or from TAP_SCHEMA? Then, you have to read/parse them yourself, which is painful and error-prone.
That's why it is now possible to ask TAPLib to fetch the metadata from a VOSI XML document or from TAP_SCHEMA, and then to pass the resulting default TAPMetadata as a parameter to the constructor of your own TAPMetadata extension.
See the documentation of the property metadata for more details about how to make that work.
tap_factory: allow a constructor with the content of the configuration file as parameter
Though it was already possible to provide your own extension of TAPFactory, it was not possible to extend ConfigurableTAPFactory, and thus to keep using all the properties (and, why not, custom ones) listed in the configuration file. Now, if the TAPFactory extension has a constructor with two parameters, (ServiceConnection, Properties), the content of the configuration file is provided as the second parameter.
This can be very useful if you want to define your own properties, or to slightly change how the TAP service is initialized from this configuration file.
See the documentation of the property tap_factory for more details.
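The constructor lookup described above can be sketched with Java reflection: prefer a public (service, Properties) constructor, and fall back to a (service) constructor. DemoService, DemoFactory and LegacyFactory are hypothetical stand-ins for ServiceConnection and TAPFactory extensions. my_custom_property is an invented property name. This is not TAPLib's loader.

```java
import java.lang.reflect.Constructor;
import java.util.Properties;

/** Hypothetical sketch of the factory constructor lookup (not TAPLib's loader). */
final class FactoryLoaderSketch {

    /** Stand-in for ServiceConnection. */
    public static class DemoService { }

    /** Stand-in for a TAPFactory extension that wants the configuration file content. */
    public static class DemoFactory {
        public final String customValue;

        public DemoFactory(DemoService service, Properties conf) {
            this.customValue = conf.getProperty("my_custom_property"); // invented property name
        }
    }

    /** Stand-in for a TAPFactory extension with only a one-argument constructor. */
    public static class LegacyFactory {
        public LegacyFactory(DemoService service) { }
    }

    /** Prefers a public (service, Properties) constructor, else a public (service) one. */
    static <T> T newFactory(Class<T> factoryClass, Class<?> serviceType, Object service, Properties conf)
            throws ReflectiveOperationException {
        try {
            Constructor<T> withConf = factoryClass.getConstructor(serviceType, Properties.class);
            return withConf.newInstance(service, conf);
        } catch (NoSuchMethodException e) {
            return factoryClass.getConstructor(serviceType).newInstance(service);
        }
    }
}
```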
logger: set a custom logger
The new property logger allows the specification of a custom logger in the configuration file. It can take the following values: default, or the classpath of a custom implementation of TAPLog.
Ideally, there should be an implementation of UWSLog and TAPLog working with Log4j, another one for SLF4J (and possibly others for other logging mechanisms). Additionally, an implementation storing log messages in a database would be interesting. All these ideas may be implemented in a future version of UWSLib and TAPLib.
See the documentation of the property logger for more details.
udfs: set a description
It is now possible to give a human-readable description for each User Defined Function declared in the configuration file. This was already possible by setting the attribute FunctionDef.description directly.
This description is then visible in the /capabilities resource of the TAP service.
See the documentation of the property udfs for more details.
Until now, if the result of an ADQL query was formatted in VOTable, the TABLE item of this document did not have any name. Thus, VOTable readers could not easily set a useful name for the table.
Now, the attribute NAME is set to result_ followed by the ID of the job which generated this table.
TAP_SCHEMA
According to TAP-1.0, names reported in TAP_SCHEMA must be ADQL regular identifiers. In practice, however, this is not always possible. For this reason, it has been suggested to double-quote any schema/table/column name which cannot follow this rule, thus turning it into an ADQL delimited identifier. Indeed, these names should be written in TAP_SCHEMA exactly as they must be used in ADQL queries.
Besides, TAP-1.0 states that a table name must be unique in the whole TAP_SCHEMA. This rule must be combined with the rule above (names written as they should be in ADQL queries). To respect both, it is recommended to qualify a table name with its schema name when the table name alone is not enough to identify it uniquely.
When a schema/table/column name is delimited and/or qualified, the library is now able to store it as such, as well as to resolve the non-delimited/non-qualified name.
See on GitHub 19026c1... and 6ba9bff....
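Resolving a delimited and/or qualified name from TAP_SCHEMA amounts to two operations. The name is split on dots that are not inside double quotes, and the delimiting double quotes are removed, a doubled "" inside a delimited name standing for one quote. The hypothetical helper below sketches this. It does not validate its input.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: splitting a delimited and/or qualified ADQL name. */
final class AdqlNameSketch {

    /** Example: "My Schema"."Table.1" gives [My Schema, Table.1]. */
    static List<String> split(String name) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < name.length(); i++) {
            char ch = name.charAt(i);
            if (ch == '"') {
                if (quoted && i + 1 < name.length() && name.charAt(i + 1) == '"') {
                    cur.append('"'); // doubled quote inside a delimited name
                    i++;
                } else {
                    quoted = !quoted; // opening or closing delimiter
                }
            } else if (ch == '.' && !quoted) {
                parts.add(cur.toString()); // qualifier separator
                cur.setLength(0);
            } else {
                cur.append(ch);
            }
        }
        parts.add(cur.toString());
        return parts;
    }

    /** The last part, without qualifier and delimiters (e.g. the bare table name). */
    static String simpleName(String name) {
        List<String> parts = split(name);
        return parts.get(parts.size() - 1);
    }
}
```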
Two comments:
The class ResultSetTableIterator is used by the TAP library each time the result of an ADQL query is returned by the database. It iterates over the content of a ResultSet in order to format this result into the requested output format. Thus, when an unsupported object is returned by the database, ResultSetTableIterator must convert it into something usable by an OutputFormat.
Before this correction, the function fetching a column value from the ResultSet, nextCol(), was also performing the conversion. In order to ease the customization of this conversion process, these two steps have been separated:
The public static variable TAP.VERSION has been added to clearly indicate which version of the TAP protocol is supported by the TAPLibrary in use.
Dealing with several protocol versions at the same time is quite difficult and may significantly alter the TAPLibrary API in an unstable way. That's why only one version of the protocol (i.e. the latest one) is implemented by the TAP library. To use an older version of the protocol, one must use an older version of the library.
In the UWS and TAP configuration files, the executionDuration has to be given in milliseconds, but the UWS parameter MUST be in seconds. So now, UWS keeps this duration in seconds, in its ExecutionDurationController. TAP, on the other hand, keeps it in milliseconds (in order to avoid an unexpected, silent modification of its API). It converts it into seconds for its controller (i.e. TAPExecutionDurationController), for the default home page and for the Capabilities page.
See on GitHub 2463d5f... and 47d36bf....
Here, a bad uploaded file means:
In such cases, the following error message is returned: "The input file is not a valid VOTable document!" A cause with more details (especially the line and column numbers) may be appended.
Cases handled with no error:
Before this correction, when a user asked for the abortion of a TAP query, the request generally did not take effect immediately. What really happened was that the SQL query kept running in the database until its end, while TAPLib waited for it to finish and then just ignored the returned result. So, as long as the query was not finished, the TAP job was either not marked as aborted, or marked as aborted but still running in the background.
Now, when the ABORT request arrives, TAPLib stops the thread executing the query in the database. If this succeeds, the job is marked as aborted. If the result file was being written, this process is stopped and the partial file is deleted.
The incorrect abortion handling of SYNChronous queries has also been fixed. It is now recommended to make DBConnection.executeQuery(ADQLQuery) return NULL if the query has been aborted (indeed, the DBConnection is the only component that can reliably know it). JDBCConnection has been adapted accordingly.
This abortion mechanism now also works during the UPLOAD of a user table.
See on GitHub 714e93f... and 6ecd724....
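The thread-interruption part of this mechanism can be sketched with an ExecutorService. The query runs in a worker thread, and the ABORT request cancels the corresponding Future with interruption. Here Thread.sleep stands in for the database call. With real JDBC, interrupting the thread is generally not enough: the statement itself would also have to be cancelled, e.g. with java.sql.Statement.cancel(). AbortSketch is a hypothetical name.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of aborting a running execution by interrupting its thread. */
final class AbortSketch {

    /** Starts a stand-in for a long query, aborts it, and returns true if it stopped in time. */
    static boolean runAndAbort() throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch stopped = new CountDownLatch(1);
        Future<?> execution = executor.submit(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000); // stand-in for a long-running SQL query
            } catch (InterruptedException e) {
                // Interrupted by the ABORT: clean up here (e.g. delete a partial result file).
                stopped.countDown();
            }
        });
        started.await();
        execution.cancel(true); // the ABORT request: interrupt the executing thread
        boolean stoppedInTime = stopped.await(5, TimeUnit.SECONDS);
        executor.shutdownNow();
        return stoppedInTime && execution.isCancelled();
    }
}
```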
When a query was executed, the opened transaction (only when a fetch size is set) was never closed.
See on GitHub 60eb214..., bd62184... and 5ac8f1f....
Without this information, it was impossible to resolve columns referencing sub-queries of the FROM clause. See the JUnit test case for a concrete example.
foreignKeys in /tables
The order of its children was incorrect according to the XSD schema: the 'fkColumn' must be written before 'description' and 'utype'.
TAP_SCHEMA.key_columns
When TAP_SCHEMA.keys, and especially TAP_SCHEMA.key_columns, were not empty, JDBCConnection failed (with or without an error) to fetch the foreign keys. Thanks to this correction, this should no longer be the case.
Several minor corrections of unsupported or incorrect formatting in VOTable, but also in the other formats:
This was particularly annoying for numeric functions like sqrt(...) (example query: SELECT TOP 1 sqrt(2) as s2 FROM whatever_table).
The JDBC driver sometimes returned an Integer instead of a Short for a PostgreSQL SMALLINT column. Implicit casts make this harmless for most formats, except for FITS, which requires exactly the right type. This commit ensures that values of columns declared as SMALLINT in the metadata are always cast to Short.
If a BOOLEAN database column is encountered, its datatype is considered as SMALLINT (because TAP 1.0 does not support BOOLEAN) and its values are converted into 0 for FALSE and 1 for TRUE. This last part was missing in the TAP library before this correction.
See on GitHub c5cba4b... and 68666cc....
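Both SMALLINT corrections boil down to normalising, to a Short, every value fetched for a column declared as SMALLINT. A minimal sketch, with a hypothetical class name:

```java
/** Hypothetical sketch: normalising values of columns declared as SMALLINT. */
final class SmallintSketch {

    static Short toSmallint(Object value) {
        if (value == null) return null;
        if (value instanceof Boolean) {
            // BOOLEAN column exposed as SMALLINT: FALSE -> 0, TRUE -> 1.
            return (short) (((Boolean) value) ? 1 : 0);
        }
        if (value instanceof Number) {
            // e.g. an Integer returned by the JDBC driver; FITS needs an exact Short.
            return ((Number) value).shortValue();
        }
        throw new IllegalArgumentException("Not a SMALLINT-compatible value: " + value);
    }
}
```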
Now, Date and Time values are converted into strings with the formats yyyy-MM-dd and HH:mm:ss, respectively.
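With the standard java.text.SimpleDateFormat, these two conversions can be sketched as below. The class name is hypothetical. A new formatter is created at each call because SimpleDateFormat is not thread-safe.

```java
import java.sql.Date;
import java.sql.Time;
import java.text.SimpleDateFormat;

/** Hypothetical sketch of the Date/Time to string conversions. */
final class DateTimeFormatSketch {

    static String format(Object value) {
        // A new formatter per call, since SimpleDateFormat is not thread-safe.
        if (value instanceof Date) return new SimpleDateFormat("yyyy-MM-dd").format((Date) value);
        if (value instanceof Time) return new SimpleDateFormat("HH:mm:ss").format((Time) value);
        return String.valueOf(value); // other values are left to other conversions
    }
}
```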
The types of values tagged as UNKNOWN are resolved at the very end of a TAP query, thanks to the ResultSet.
Several corrections to the parsing and interpretation of the configuration file:
init(...) function of TAP and all its resources
The SQL translation of an ADQL query used to be logged only once its execution in the database had finished. However, if the database returned an error, there was no way to debug the SQL query which caused it. For this reason, the SQL query is now logged before its execution.
To make the logs easier to use, the ID of a job is set to the ID of the HTTP request triggering the creation of this job. Thus, you can search the logs for this particular HTTP request using the job ID.
When the log file indicated that a table was uploaded into the database (using the TAP_UPLOAD feature), there was no mention of the size of this table. Now, the total number of uploaded rows is written.
Give the error in the log file, for bug reports, when a synchronous job has been interrupted not by a timeout (cf. executionDuration) but by an IOException.
This second version of the library finally brings the TAP Library to a much more stable state. However, the price of this stability is a rather significant modification of the API. A list of the main steps to follow in order to migrate from v1.0 to v2.0 is available on this page.
Below is a list of the modifications made between the 2 major versions of the library. Given the amount of modifications, only the main ones are listed. You can find all of them on the corresponding GitHub project.
A new sub-version of the ADQL Library fixes some bugs, such as the bad translation of NOT BETWEEN and table.*, but also adds some new features, like a better checking of STC-S expressions and UDFs.
A more complete list is available on the ADQL Library website.
The UWS Library has also undergone a lot of modifications, but most of them are internal, meaning that switching to the new version should not require many modifications of existing code. The UWS Library now fully supports HTTP multipart requests. It is also possible to rotate log files automatically. Besides, XML responses are now fully valid according to W3C and to the schema delivered by the IVOA. Consequently, dates/times are now formatted in ISO-8601.
Those are the main modifications of the library, but you may find a more complete list on the UWS Library website.
Indeed, several classes of the library were generic types. Contrary to the expected effect, this made implementations by users of the library inflexible. This was particularly true for output formats, which always had to be implemented.
Now, all generic types have been removed. Consequently, the output formats provided by the library can be used as they are, and ServiceConnection and TAPFactory no longer need to be typed with ResultSet (or another type, depending on your implementation).
Details and example on the migration page.
This new version of the library provides a new and simple way to create a TAP service: writing a configuration file. This file must be a Java Properties file, in other words an ASCII text file in which each line is a key-value pair. With this method, there is no need to implement ServiceConnection and TAPFactory, nor to write an HTTP servlet.
See this Getting Started page for more details.
In the old version, SAVOT was used to format query results in VOTable. Now, in order to support more output formats without depending on too many libraries, SAVOT has been replaced by STIL. Consequently, more VOTable serializations (e.g. binary, tabledata, fits) can be used, and the FITS format has been added to the list of formats supported by the TAP Library.
Until now, the only way to give the list of schemas, tables and columns to the library was manual: create the corresponding Java objects (i.e. TAPSchema, TAPTable and TAPColumn). This new library version provides 2 new ways:
TAP_SCHEMA while using DBConnection.getTAPSchema().
Knowing the type of columns allows a more precise checking of ADQL queries: some functions work only with strings, whereas others work only with numerics. However, type checking is limited, since only 3 general types are taken into account: numeric, string and geometry (e.g. region, point, polygon). The precise type is never checked, in order to handle type compatibility as well as possible.
Such expressions are now checked according to the STC restriction provided in the IVOA's TAP document (part 6. Use of STC-S in TAP (informative)).
Just as the available coordinate systems can be limited, geometrical functions (e.g. BOX, POINT, COORD1, AREA, ...) may be restricted: some of them may not be implemented by the service. ServiceConnection.getGeometries() lists the allowed geometrical functions.
In v1.0, the only way to define a User Defined Function was to provide a modified implementation of ADQLQueryFactory. This implementation specified how to deal with unknown functions: typically, ignore them, throw an error, or consider them as UDFs and return the corresponding Java representation. Now, an easier way to define UDFs is provided. Basically, all known UDFs must be listed by ServiceConnection.getUDFs(). Then, 2 cases must be taken into account: either the function already exists in the database with the same signature (i.e. same name and same parameters list), or it does not. In the second case, an extension of UserDefinedFunction must be provided, and its function translate(ADQLTranslator) must be implemented with particular care.
Details and example are provided in this Getting Started.
This interface has indeed totally changed. The intent is to ease the interaction between the library and the database. Thus, all the DB mechanisms (such as transaction creation/commit/rollback and SQL queries) are hidden inside implementations of this interface. Besides, this allows dealing with query results in a generic manner, by returning a TableIterator instead of a ResultSet, which is very specific to JDBC.
Following this modification of the interface, its JDBC implementation, JDBCConnection, has been redesigned to cover the most used DBMSs: PostgreSQL, Oracle, MySQL, SQLite and JavaDB (or Derby).
With some databases, it is possible to specify the number of rows to retrieve from the database cursor at a time. If disabled (i.e. fetch size ≤ 0), the database waits until it has all rows before returning the result set. This is the default behavior in some DBMSs like PostgreSQL. However, with large data sets, this means waiting a long time before getting any row. First, there is no indication of the progress of the query execution. Second, cancellation may not be as immediate as expected, since it generally takes effect only once all rows have been collected. This is particularly annoying and can be avoided if result sets are not fetched all at once. Hence this new possibility in v2.0 of the TAP Library: setting the fetch size.
In v1.0, HTTP multipart requests were handled by a kind of patch over the UWS Library. Now, in v2.0, their support is fully integrated in the UWS Library through an interface, RequestParser. The TAP Library implements it properly (TAPRequestParser) to deal with UPLOADs as defined by the IVOA.
As specified by the TAP protocol 1.0, uploaded tables must be formatted in VOTable. The library also uses STIL to read uploaded VOTables.