Database errors due to UTF-8 filenames


Sebert, Holger.ext
 

Hi,

I've set up Toaster and a MySQL Docker container, all running on Ubuntu 16.04.
I am encountering the following database error when building my Yocto project:

ERROR: (1366, "Incorrect string value: '\\xC5\\x91tan\\xC3...' for column 'path' at row 1")
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.7/dist-packages/django/db/backends/mysql/base.py", line 71, in execute
    return self.cursor.execute(query, args)
  File "/usr/local/lib/python3.7/dist-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/usr/local/lib/python3.7/dist-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/usr/local/lib/python3.7/dist-packages/MySQLdb/connections.py", line 260, in query
    _mysql.connection.query(self, query)
MySQLdb._exceptions.OperationalError: (1366, "Incorrect string value: '\\xC5\\x91tan\\xC3...' for column 'path' at row 1")

The query that raised this error looks as follows:

INSERT INTO `orm_target_file`
(`target_id`, `path`, `size`, `inodetype`, `permission`,
`owner`, `group`, `directory_id`, `sym_target_id`)
VALUES (19,
'/usr/share/ca-certificates/mozilla/NetLock_Arany_=Class_Gold=_F\xc5\x91tan\xc3\xbas\xc3\xadtv\xc3\xa1ny.crt',
1476, 1, 'rw-r--r--', 'root', 'root', NULL, NULL)

The file causing this error has the following UTF-8 encoded filename:

NetLock_Arany_=Class_Gold=_Főtanúsítvány.crt

When looking into the database I found out that the column `path` of table
`orm_target_file` has the following properties:

CHARACTER_SET_NAME: latin1
COLLATION_NAME: latin1_swedish_ci
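
To illustrate why a latin1 column rejects this value, here is a minimal Python sketch. It decodes the bytes from the error message and checks which characters fit into which encoding. The helper `encodable` is made up for this example, and the check against `cp1252` assumes that MySQL's `latin1` is close to Windows-1252.

```python
# The bytes MySQL rejected, copied from the INSERT statement above.
raw = b"F\xc5\x91tan\xc3\xbas\xc3\xadtv\xc3\xa1ny"

# Interpreted as UTF-8, they give the readable filename stem.
name = raw.decode("utf-8")


def encodable(text, codec):
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# Only UTF-8 can hold the whole name.
for codec in ("latin-1", "cp1252", "utf-8"):
    print(codec, encodable(name, codec))

# List the code points that Latin-1 cannot represent.
missing = [ch for ch in name if not encodable(ch, "latin-1")]
print([f"U+{ord(ch):04X}" for ch in missing])  # ['U+0151']
```

The accented vowels ú, í and á exist in Latin-1, so only U+0151 (ő, encoded as `\xC5\x91`) is missing there, which is where the string quoted in the error message begins.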

Apparently, the column `path` cannot store UTF-8 strings. I can fix that
manually by running the following statement in the `mysql` client:

ALTER TABLE orm_target_file
CONVERT TO CHARACTER SET utf8
COLLATE utf8_general_ci;

This change makes the database error disappear.

I would like to fix that directly in Toaster's `orm/models.py`. I found the
following definition in class `Target_File`:

path = models.FilePathField()

It seems like I need to pass some clever options to `FilePathField`, but which?
My own research in that direction has brought up nothing useful so far.

My questions are thus:

* How can I parametrize `FilePathField` to properly handle UTF-8 encoded
filenames in the underlying database?

* What should a corresponding migration file look like in `orm/migrations`?
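
One possible direction for the migration question, offered as a sketch rather than a confirmed answer: as far as I know, `FilePathField` takes no character-set argument, so a migration would probably have to run the `ALTER TABLE` statement as raw SQL. The snippet below only builds the forward and reverse statements. The helper `convert_table_sql`, the choice of `utf8mb4`, and the migration outline in the trailing comment are assumptions, not existing Toaster code.

```python
import re

# Assumed target: utf8mb4 covers all of Unicode, while MySQL's "utf8"
# stores at most three bytes per character.
CHARSET = "utf8mb4"
COLLATION = "utf8mb4_unicode_ci"

_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+$")


def convert_table_sql(table, charset=CHARSET, collation=COLLATION):
    """Build an ALTER TABLE statement that converts a table's text columns."""
    # Identifiers cannot be passed as query parameters, so validate them.
    for ident in (table, charset, collation):
        if not _IDENTIFIER.match(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    return (
        f"ALTER TABLE `{table}` "
        f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
    )


FORWARD_SQL = convert_table_sql("orm_target_file")
# The reverse statement restores the values reported for the column above.
REVERSE_SQL = convert_table_sql("orm_target_file", "latin1", "latin1_swedish_ci")

# In a file under orm/migrations these strings could be used like this
# (the dependency name is a placeholder):
#
#   from django.db import migrations
#
#   class Migration(migrations.Migration):
#       dependencies = [("orm", "<previous migration>")]
#       operations = [migrations.RunSQL(FORWARD_SQL, REVERSE_SQL)]
```

Two caveats: the statement is MySQL-specific, so such a migration would need a guard if the same migrations also run against SQLite, and converting back to latin1 fails or loses data once non-Latin-1 paths have been stored.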

Thanks!

Best,
Holger


Reyna, David
 

Hi Holger,

This is an interesting problem. I will investigate.

We should see if there are any other localization fields that might have to support UTF-8 strings. Certainly all local path names will need to be supported.

I am also curious about how the local time zone support is working for you.

David


Sebert, Holger.ext
 

Hi David,

As far as I can tell, Toaster doesn't set the charset and collation itself, but uses
the server's defaults.

The problem can be solved by passing adequate parameters when starting up
the MySQL server, like so:

docker run -dit --network host --name running-toaster-db toaster-db --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci

If this is the right solution, maybe we can put this somewhere in the documentation?
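
For the Django side, a matching connection setting might look like the sketch below. This is an assumption about how one could configure it, not something taken from Toaster's shipped settings; all names, credentials and host values are placeholders. The `charset` entry under `OPTIONS` is handed to the MySQL driver when connecting, and the `TEST` keys apply only to test databases that Django creates.

```python
# Sketch of a Django DATABASES entry for a MySQL server started with
# utf8mb4 defaults. All connection values are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "toaster_data",   # placeholder database name
        "USER": "toaster",        # placeholder user
        "PASSWORD": "change-me",  # placeholder password
        "HOST": "127.0.0.1",
        "PORT": "3306",
        # Character set used by the client connection.
        "OPTIONS": {"charset": "utf8mb4"},
        # Character set and collation for test databases only.
        "TEST": {
            "CHARSET": "utf8mb4",
            "COLLATION": "utf8mb4_unicode_ci",
        },
    }
}
```

Note that this affects only the connection. Tables that already exist keep their old character set, and newly created tables still pick up the database default, so the server option above remains necessary.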

Best,
Holger