Strip the UTF-8 BOM from the beginning of a string (string.strip_utf8_bom)
Declaration
out = string.strip_utf8_bom(text)
Parameters
- text
string. Text that may begin with a UTF-8 BOM.
Returns
- out
string. The text with the leading BOM removed.
Description
The UTF-8 BOM (byte order mark) is a sequence of three invisible bytes at the start of a file: "\xEF\xBB\xBF". In a Lua string literal, "\x" followed by two hexadecimal digits denotes a single byte; for example, "\x58" is 'X'. See the ASCII reference.
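For illustration only, the behavior described above can be approximated in plain Lua. This is a minimal sketch, not XXTouch's actual implementation: the name strip_bom_sketch is hypothetical, and the sketch assumes that text without a leading BOM is returned unchanged.

```lua
-- Hypothetical pure-Lua approximation of string.strip_utf8_bom.
-- Assumption: input without a leading BOM is returned unchanged.
local UTF8_BOM = "\xEF\xBB\xBF" -- the three BOM bytes

local function strip_bom_sketch(text)
  -- Compare only the first three bytes; a BOM elsewhere is left alone.
  if text:sub(1, 3) == UTF8_BOM then
    return text:sub(4) -- drop the 3-byte prefix
  end
  return text
end
```

Only a BOM at the very start of the string is removed; the same three bytes appearing later in the text are treated as ordinary content.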
Example
txt = "\xEF\xBB\xBFXXTouch"
sys.alert(txt..', '..#txt) -- "XXTouch, 10" (3 BOM bytes + 7 characters)
--
txt = string.strip_utf8_bom(txt)
sys.alert(txt..', '..#txt) -- "XXTouch, 7"
Note: This example uses a function from outside this chapter: sys.alert
Tips
UTF-8 does not require a BOM, although the Unicode standard permits one. UTF-8 without a BOM is the norm; adding a BOM to UTF-8 files is mainly a Windows practice and may cause problems on other platforms.