The list of all modifications done recently on the library.
This new function - ADQLParser.tryQuickFix(...) - fixes the most common issues with ADQL queries:
public
,year
,date
),distance
,
min
,avg
),_RAJ2000
,2mass
).
The last point is far from being perfect but should work at least for
identifiers starting with a digit or an underscore, or an identifier including
one of the following character:?
,!
,$
,@
,#
,{
,}
,[
,]
,
~
,^
and `
.
It should also been noted that double-quoting a column/table name will make it case-sensitive. Then, it is possible that the query does not pass even after the double-quote operation ; the case would have to be checked by the user.
Finally, there is no attempt to fix column and table names (i.e. case sensitivity and/or typos) using tables/columns list/metadata. That could be a possible evolution of this function or an additional feature to implement in the parser.
For instance, here is an ADQL query that any user would want to run but whose the
parsing will immediately fail because of the starting _
,
the distance
(which is reserved to an ADQL function) and
public
(which is a reserved SQL word):
SELECT id, _raj2000, _dej2000, distance FROM public.myTable
Applying the function tryQuickFix(...)
will produce
the following query:
SELECT id, "_raj2000", "_dej2000", "distance" FROM "public".myTable
This query should now run (if the case is correct for the column and schema names...but that is not tested by this function ; the user will still have to check that by himself).
Since commit 3d96c9d908624f09e64eae5cf18e72fafbefaf80 aliases put on a table without double quotes are put in lower case and then double quoted. This modification was not effective for ORDER BY's column identifiers.
See on GitHub
In Oracle and Postgre, the concatenation is performed
with the operator ||
, but MySQL uses
instead the function CONCAT(...)
while
MS-SQLServer uses the operator +
.
See on GitHub
See on GitHub
See on GitHub: 005fc62... and 42cb5bb...
This new translator is named: SQLServerTranslator. Because the management of JOINs is a bit different in SQL Server, the ADQLQueryFactory and the object representation of the INNER and OUTER JOIN have been extended.
See on GitHub for its creation, but further modifications have been done on SQLServerTranslator (look at its history to get all related commits)
In MySQL, a schema is not part of a database but is
a database. For that reason, JDBCConnection
can not check
the existence of every declared schemas ; it will always considered they
exist. If not, an error from the database will be thrown.
Additionally, an appropriate ADQLTranslator
has been
created: MySQLTranslator
.
J2000
as valid STC-S's frame
...
The ADQL parser has been slightly changed in order to provide a more precise location of the REAL wrong part of the query when an error is thrown.
Before this fix, if an ADQL or SQL reserved word (e.g. 'point') was encountered outside of its normal syntax (e.g. 'point' not followed by an opening parenthesis), the next token was highlighted instead of this one. Hence a confusing error message.
For instance, the following ADQL query:
SELECT point FROM aTable
returned the following error message:
Encountered "FROM". Was expecting: "("
Now, it will return the following one:
Encountered "point". Was expecting one of: "*""TOP" [...] (HINT: "point" is a reserved ADQL word. To use it as a column/table/schema name/alias, write it between double quotes.)
This error message highlights exactly the source of the problem and even provide to the user a clear explanation of why the query did not parse and how it could be solved.
See on GitHub: db0dfda... and fe4c3e9...
TokenMgrError
now wrapped inside ParseException
...
A TokenMgrError
was thrown when an unexpected
character was encountered by the parser. This is the default behavior
of the parser generated by JavaCC.
However, as raised by Zarquan
and bourgesl in the GitHub issue
#17, in the Java
language, throwing an Error
should be reserved for very serious errors. That's why, any TokenMgrError
thrown by the parser is immediately caught and wrapped inside a
ParseException
with the appropriate position in the query.
NATURAL JOIN
or JOIN...USING
in first position
...
Some parsing exceptions (especially for functions or JOIN) did not specify the position of the error in the given ADQL query. It was also the case of several ADQL tree objects such as: BETWEEN, IN, EXISTS, operations, ... Now, the position is always given when possible.
See on GitHub: 163ec17... and 118cd72...
NumericConstant
and SearchTableList
...
See on GitHub: c142215... and 1a3bd2b...
UNKNOWN
...
This datatype is set automatically to a column (DBColumn)
or a function's return type (FunctionDef) when the type
can not be determined. For a column or function with this datatype the function
isUnknown() will return true
.
On the contrary, all the following functions will return false
:
isNumeric(),
isBinary(),
isString() and
isGeometry().
true
:
isNumeric,
isString and
isGeometry.
Besides, this class does not have an isUnknown()
function.
See on GitHub: 37e26a1... and 271e03c...
When translating into SQL, if a column was declared with no alias and was not delimited (i.e. between double quotes), the translator set an alias with the column name...but in the same case. Depending on the used database and its configuration, the different case between the alias and its references would throw an error.
For instance, the following ADQL query:
SELECT t.* FROM (SELECT (ra+ra_error) AS x, pmra AS "ProperMotion", (dec+dec_error) AS Y FROM mytable) AS t
triggered the following error with PostgreSQL:
Caused by a org.postgresql.util.PSQLException at [...] ERROR: column t.Y does not exist
because its translation in SQL was:
SELECT "t"."x", "t"."ProperMotion", "t"."Y" FROM (SELECT ("public"."mytable"."ra"+"public"."mytable"."ra_error") AS x, "public"."mytable"."pmra" AS "ProperMotion", ("public"."mytable"."dec"+"public"."mytable"."dec_error") AS Y FROM "public"."mytable") AS "t"
To avoid this error, automatic aliases are always delimited and written in lower-case. Which would now look like below:
SELECT "t"."x" AS "x", "t"."ProperMotion" AS "ProperMotion", "t"."y" AS "y" FROM (SELECT ("public"."table2"."ra"+"public"."table2"."ra_error") AS "x", "public"."table2"."pmra" AS "ProperMotion", ("public"."table2"."dec"+"public"."table2"."dec_error") AS "y" FROM "public"."table2") AS "t"
ORDER
/GROUP
and BY
...
Before this correction, ORDER BY
and GROUP BY
had to be written with
only one space (i.e. ' '). Now, it is perfectly allowed to write more than one space
(and even any space character such as a tab or a new line) between ORDER
/GROUP
and BY
.
The idea is to get rid of special characters such as '-', '+' , '(', ... in the columns name of the output table.
This infinite loop occured only when the replacement object is just a wrapping of the matching object ; after replacement, the new object was inspected for matching objects.
Example: infinite loop if we want to wrap all foo(...) functions with the function ROUND in the following query:
SELECT foo(foo(123)) FROM myTable
Expected result:
SELECT ROUND(foo(ROUND(foo(123)))) FROM myTable
But generated result was:
SELECT ROUND(ROUND(ROUND(......foo(foo(123))))) FROM myTable
Between double quotes an alias could contain any character, but it seems that the dot (.) was interpreted anyway (as table/column separator) before this fix.
CENTROID
returns a geometry not a numeric
...
In addition to this bug correction, a translation of CENTROID
using PgSphere
has been implemented.
See on GitHub: 6106390... and 3306dec...
NATURAL JOIN
s
...
The "normal" JOIN
(e.g. A JOIN B ON A.id = B.id JOIN C ON B.id = C.id
)
is correctly interpreted (e.g. ( (A JOIN B ON A.id = B.id) JOIN C ON B.id = C.id )
).
But with a NATURAL JOIN
, an ADQL instruction like
A NATURAL JOIN B NATURAL JOIN C
gives ( A NATURAL JOIN (B NATURAL JOIN C) )
instead of ( (A NATURAL JOIN B) NATURAL JOIN C )
.
DOUBLE and REAL parameters of mathematical functions must be casted into NUMERIC. Otherwise Postgres rejects the query.
A string constant in ADQL is enclosed between single quotes. But is is also possible to have single quotes being part of the string constant, by doubling these single quotes, as show below:
SELECT 'foo''s bar' FROM myTable
The result of a such query should return the string foo's bar
.
But the library did not do that until this commit.
The same bug but about double quotes is also fixed.
Since the version 1.3, the DEBUG mode was disabled by default, but it was surprisingly impossible to enable it again. Now, the DEBUG mode is still disabled by default as promessed in version 1.3, but it can be enabled using the function setDebug(boolean).
Due to this omission, it was impossible to make any reference to any column returned by this sub-query in the parent query. (see the JUnit test for a concrete example)
AS
for SELECT item aliases is optional
...
Before this correction the alias was optional but not the keyword AS
.
Now, if the alias is provided, it may be specified with OR without AS
.
For instance: if all mathematical functions must be replaced by a dumb UDF named 'foo' in the ADQL query:
SELECT sqrt(abs(81)) FROM myTable
the result should be:
SELECT foo(foo(81)) FROM myTable
but before this correction it was:
SELECT foo(abs(81)) FROM myTable
Some functions like PI()
and
RAND()
were affected by this bug.
LOG()
,
LOG10()
) were not casted into NUMERIC.
Without this CAST, PostgreSQL rejects quite often the query.
ORDER BY
and GROUP BY
...
a. In ORDER BY
, GROUP BY
and USING
only regular and delimited identifiers are accepted, not qualified column names.
For instance:
SELECT table.column_name FROM table ORDER BY table.column_name
is wrong. We should instead write:
SELECT table.column_name FROM table ORDER BY column_name
or
SELECT table.column_name AS mycol FROM table ORDER BY mycol
b. No SELECT item index in GROUP BY
According to the ADQL grammar, a GROUP BY item must be a column name or an alias. Before this correction, the index of a SELECT item could have been provided as it is possible for ORDER BY. This is no longer possible. The ADQL parser will accept only column name or alias in a GROUP BY clause.
c. But a qualified identifier in GROUP BY is allowed
For instance, the following ADQL query is correct:
SELECT * FROM table GROUP BY table.oid
See on GitHub: 7a70c60... and 8e2fa9f...
Before this correction, it was possible to write the following ADQL query:
SELECT * FROM fakeSchema.myTable
...although the table myTable
was defined in the tables metadata (e.g. TAP_SCHEMA)
with no schema. Now, it is no longer possible: if a table is defined no schema as myTable
it MUST NOT be prefixed by any schema name. But as before, if a table is defined in
the tables metadata (e.g. TAP_SCHEMA) with a schema, this latter MAY prefix the table name in an ADQL query.
Functions whose some parameters are another function were not correctly identified: since the inner functions were not yet identified, their type was UNKNOWN and so it makes the identification of the parent function much easier since an UNKNOWN parameter is not checked. But, it was a problem if the parameter occurs to be finally of the wrong type.
Unknown datatypes in the declaration of User Defined Functions were reported with a confusing
error message. Now, the error message specifies such datatype as param<i>
(<i>
being the index of the parameter).
Besides types having space like double precision
were not supported: they throw
an "unknown datatype" error. Now, more database datatypes and in particular double precision
and character varying
are supported.
"aSchema"."aTable.WithADot"
...
This qualified table name was interpreted similarly as the table "aSchema"."WithADot"
, or
in other words, the table WithADot
inside the schema aSchema
.
Now, it will really be interpreted as a table named aTable.WithADot
inside the schema
aSchema
.
This sub-version fixes also few (but quite important) bugs and adds a better support for STC-S expressions and UDFs.
STC-S expressions like a region definition or a coordinate system, as restricted by the TAP protocol (see 6. Use of STC-S in TAP (informative)) are now checked by the library. A syntax mistake will be notified with a ParserException.
DBChecker has been modified in order to be also able to check whether the used
geometrical functions, coordinate systems and User Defined Functions are allowed. For that, a list of allowed items may be provided
in the constructor. For geometries, each item of the list must be a function name (e.g. AREA, COORDSYS, BOX, CIRCLE). For coordinate
system, a kind of regular expression (e.g. ICRS * *
) is expected for each item. But for UDFs, each item must be an
instance of FunctionDef in which a
UserDefinedFunction may be provided in order to
specify how to translate the function.
Warning: The possibility to list coordinate systems directly in the parser has been removed. Now, verification of coordinate systems must be done afterwards using a QueryChecker (see A.3. Checker to see how).
Actually, it is not really about a new translator. JDBCTranslator is an abstract class between the interface ADQLTranslator and the concrete translator PostgreSQLTranslator. The idea is to use this abstract class for any other translator into SQL language. Indeed, the translation into SQL is very similar in all DBMS. So in this abstract translator all common instructions in ADQL and SQL are already translated ; only functions related to geometrical functions and column types are left for implementation. Thus, a huge work can be avoided while willing to support a new SQL-based translator.
This modification has no impact at all if you are already using PostgreSQLTranslator and/or PgSphereTranslator.
NOT BETWEEN
...
Before this fix, NOT BETWEEN
were translated by
PostgreSQLTranslator into BETWEEN
. So, the negation
was forgotten. This grave mistake is now fixed!
SELECT table.*
...
table.*
was translated as provided...which is a grave mistake in case the TAP table content is different
from the database table. This should be translated by replacing this "joker" by the full list of columns as described in the TAP
metadata. It is actually the behavior of the general "joker" *
.
A string like 'foo\'bar\''
contains twice the same escaped character: '
. However, this escaped string
is not interpreted as one by Postgres by default. To specify a string may contain escaped characters, it must be prefixed by
E
(in upper- or lower-case). The library was not doing so before this version 1.3 which is now prefixing string
constants only when at least one \ is detected: E'foo\'bar\''
.
When no alias is specified for a SELECT item, the library is adding one automatically. Thus, the alias of a column is the name of the column and the alias of a function is the function name. For a concatenation or mathematical operations, the sign of the operation is used as alias. However, before this new sub-version, double quotes were not surrounding those aliases. That is a problem for mathematical operations which were then interpreted as a real operation rather than just an alias: a SQL error were then returned by the DBMS.
This bug is now fixed: automatic aliases are always surrounded by double quotes.
The DEBUG mode of the parser was enabled by default...meaning that all grammar rules were listed when parsing a query. This mode is still possible but now disabled by default.
This sub-version is just a correction of several bugs often encountered. Here are the major bugs fixed (the list is not exhaustive):
Parse exceptions about columns inside the clauses ORDER and GROUP BY had no information about the position in the query. This is now fixed!
Brackets in mathematical or logical operations had no effect when translating the query in SQL. The notion of priority was well read and represented in the object tree, but it was not used during the SQL translation.
For instance, the following ADQL query:
SELECT oid AS oid FROM A WHERE oid = 1 AND (colTxtB = 'toto' or colC*(colA-colB) > colD);
...was translated in SQL:
SELECT oid AS oid FROM A WHERE oid = 1 AND colTxtB = 'toto' OR colC*colA-colB > colD
In conformity with the IVOA definition, these two functions must have either 1 or 2 parameters. Before, the library was allowing just one parameter. Now, both are allowed.
The query checker was not allowing subqueries to use columns of the parent/main query.
SELECT oid, colA, colB FROM A WHERE ... AND (SELECT COUNT(*) FROM B WHERE B.oid = A.oid) > 1;
After 3-4 NATURAL joins or joins using the keyword USING, the query checker was lost and returned wrong errors. Columns which were supposed to be joined were not seen like that by the checker. Now they are!
For example: let's suppose that we have 3 tables (A, B and C) and all of them have a column with the same name: oid. The following query:
SELECT oid FROM A NATURAL JOIN B NATURAL JOIN C; -- which more or less equivalent to: SELECT oid FROM A JOIN B USING(oid) JOIN C USING(oid);
With the version 1.1, we get the following error:
((X)) 1 unresolved identifiers ! * [l.1 c.8 - l.1 c.11] - Ambiguous column name "oid" ! It may be (at least) "A.oid" or "B.oid".
When creating your own user defined function with some parameters using DefaultUDF, you may have had an ArrayIndexOutOfBoundsException. This bug is now fixed!
Before this correction, the alias of a sub-query in the clause FROM was not taken into account when checking the query with DBChecker.
For instance, the following query:
SELECT q.flux, q.filter, p.oid, p.ra, p.dec FROM (SELECT flux, filter, oidref FROM flux WHERE flux BETWEEN 0 AND 10) AS q INNER JOIN basic ON q.oidref = p.oid;
In this case, the following error was returned:
((X)) 7 unresolved identifiers ! * [l.1 c.8 - l.1 c.14] - Unknown column "q.flux" ! * [l.1 c.16 - l.1 c.24] - Unknown column "q.filter" ! * [l.1 c.26 - l.1 c.31] - Unknown column "p.oid" ! * [l.1 c.33 - l.1 c.37] - Unknown column "p.ra" ! * [l.1 c.39 - l.1 c.44] - Unknown column "p.dec" ! * [l.2 c.99 - l.2 c.107] - Unknown column "q.oidref" ! * [l.2 c.110 - l.2 c.115] - Unknown column "p.oid" ! adql.db.exception.UnresolvedIdentifiersException: 7 unresolved identifiers ! at adql.db.DBChecker.check(DBChecker.java:170) at adql.parser.ADQLParser.QueryExpression(ADQLParser.java:539) at adql.parser.ADQLParser.Query(ADQLParser.java:473) at adql.parser.ADQLParser.parseQuery(ADQLParser.java:269) at adql.demo.ADQLDemo.main(ADQLDemo.java:125)
The problem has been fixed by setting a default description of the sub-query into the
ADQLTable which describes it.
See the following correction for more details.
This method builds a DefaultDBTable object describing the table returned by the represented sub-query.