The question:
I am new in Android and I’m working on a query in SQLite.
My problem is that when I use accent in strings e.g.
- ÁÁÁ
- ááá
- ÀÀÀ
- ààà
- aaa
- AAA
If I do:
SELECT * FROM TB_MOVIE WHERE MOVIE_NAME LIKE '%a%' ORDER BY MOVIE_NAME;
It’s return:
- AAA
- aaa (It’s ignoring the others)
But if I do:
SELECT * FROM TB_MOVIE WHERE MOVIE_NAME LIKE '%à%' ORDER BY MOVIE_NAME;
It’s return:
- ààà (ignoring the title “ÀÀÀ”)
I want to select strings in a SQLite DB without caring for the accents and the case. Please help.
The Solutions:
Below are the methods you can try. The first solution is probably the best. Try others if the first one doesn’t work. Senior developers aren’t just copying/pasting – they read the methods carefully & apply them wisely to each case.
Method 1
Generally, string comparisons in SQL are controlled by column or expression COLLATE
rules. In Android, only three collation sequences are pre-defined: BINARY (default), LOCALIZED and UNICODE. None of them is ideal for your use case, and the C API for installing new collation functions is unfortunately not exposed in the Java API.
To work around this:
- Add another column to your table, for example
MOVIE_NAME_ASCII
-
Store values into this column with the accent marks removed. You can remove accents by normalizing your strings to Unicode Normal Form D (NFD) and removing non-ASCII code points since NFD represents accented characters roughly as plain ASCII + combining accent markers:
String asciiName = Normalizer.normalize(unicodeName, Normalizer.Form.NFD) .replaceAll("[^\p{ASCII}]", "");
-
Do your text searches on this ASCII-normalized column but display data from the original unicode column.
Method 2
In Android sqlite, LIKE
and GLOB
ignore both COLLATE LOCALIZED
and COLLATE UNICODE
(they only work for ORDER BY
). However, there is a solution without having to add extra columns to your table. As @asat explains in this answer, you can use GLOB
with a pattern that will replace each letter with all the available alternatives of that letter. In Java:
public static String addTildeOptions(String searchText) {
return searchText.toLowerCase()
.replaceAll("[aáàäâã]", "\[aáàäâã\]")
.replaceAll("[eéèëê]", "\[eéèëê\]")
.replaceAll("[iíìî]", "\[iíìî\]")
.replaceAll("[oóòöôõ]", "\[oóòöôõ\]")
.replaceAll("[uúùüû]", "\[uúùüû\]")
.replace("*", "[*]")
.replace("?", "[?]");
}
And then (not literally like this, of course):
SELECT * from table WHERE lower(column) GLOB "*addTildeOptions(searchText)*"
This way, for example in Spanish, a user searching for either mas or más will get the search converted into m[aáàäâã]s, returning both results.
It is important to notice that GLOB
ignores COLLATE NOCASE
, that’s why I converted everything to lower case both in the function and in the query. Notice also that the lower()
function in sqlite doesn’t work on non-ASCII characters – but again those are probably the ones that you are already replacing!
The function also replaces both GLOB
wildcards, *
and ?
, with “escaped” versions.
Method 3
You can use Android NDK to recompile the SQLite source including the desired ICU (International Components for Unicode).
Explained in russian here:
http://habrahabr.ru/post/122408/
The process of compiling the SQLilte with source with ICU explained here:
How to compile sqlite with ICU?
Unfortunately you will end up with different APKs for different CPUs.
Method 4
You need to look at these, not as accented characters, but as entirely different characters. You might as well be looking for a, b, or c. That being said, I would try using a regex for it. It would look something like:
SELECT * from TB_MOVIE WHERE MOVIE_NAME REGEXP '.*[aAàÀ].*' ORDER BY MOVIE_NAME;
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0