Dataset Errata: Difference between revisions



== Unicode Non Breaking Space ==
The following files start with the Unicode character 'ZERO WIDTH NO-BREAK SPACE' (U+FEFF). This character can also be the Unicode byte order mark (BOM), in which case it should appear only in UTF-8 encoded Unicode files. It is also present in 85 of the 100 files in the ent set, but those all correctly detect as UTF-8.
*NY_01497989.txt
*NY_01497992.txt
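
A minimal sketch of how one might check a file for this leading character, assuming it is stored as the UTF-8 byte sequence EF BB BF. The helper names `starts_with_utf8_bom` and `read_without_bom` are hypothetical and not part of any dataset tooling.

```python
import codecs
from pathlib import Path


def starts_with_utf8_bom(path):
    """Return True if the file's first three bytes are EF BB BF (U+FEFF in UTF-8)."""
    with open(path, "rb") as handle:
        return handle.read(3) == codecs.BOM_UTF8


def read_without_bom(path):
    """Decode the file as UTF-8, dropping one leading U+FEFF if it is present."""
    # The "utf-8-sig" codec skips a leading BOM and otherwise behaves like "utf-8".
    return Path(path).read_text(encoding="utf-8-sig")
```

For example, `starts_with_utf8_bom("NY_01497989.txt")` would be expected to return `True` if the file is as described above; this has not been verified against the actual data.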