Encoding            package:base            R Documentation(latin1)

_R_e_a_d _o_r _S_e_t _t_h_e _D_e_c_l_a_r_e_d _E_n_c_o_d_i_n_g_s _f_o_r _a _C_h_a_r_a_c_t_e_r _V_e_c_t_o_r

_D_e_s_c_r_i_p_t_i_o_n:

     Read or set the declared encodings for a character vector.

_U_s_a_g_e:

     Encoding(x)

     Encoding(x) <- value

_A_r_g_u_m_e_n_t_s:

       x: A character vector.

   value: A character vector of positive length.

_D_e_t_a_i_l_s:

     Character strings in R can be declared to be in '"latin1"' or
     '"UTF-8"'.  'Encoding' reads these declarations, returning a
     character vector of values '"latin1"', '"UTF-8"' or '"unknown"'.
     The replacement form sets them: 'value' is recycled as needed,
     and any other value is silently treated as '"unknown"'.  ASCII
     strings are never marked with a declared encoding, since their
     representation is the same in all encodings.
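
     As a minimal sketch of the reading and setting behaviour
     described above (the vector contents and the value
     '"no-such-encoding"' are made up for illustration), the following
     shows recycling of 'value', the treatment of an ASCII element,
     and the fallback to '"unknown"':

```r
# Two strings with non-ASCII bytes written as \x escapes, and one pure ASCII string.
x <- c("fa\xE7ile", "abc", "na\xEFve")

# A single value is recycled across all elements of x.
Encoding(x) <- "latin1"
enc_marked <- Encoding(x)
print(enc_marked)   # "latin1" "unknown" "latin1": the ASCII element stays unmarked

# A value that is not a recognised encoding name is silently treated as "unknown".
Encoding(x) <- "no-such-encoding"
enc_reset <- Encoding(x)
print(enc_reset)    # "unknown" "unknown" "unknown"
```

     Setting the declaration does not change the bytes of the string;
     it only changes how R is told to interpret them.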

     There are other ways for character strings to acquire a declared
     encoding apart from explicitly setting it (and these have changed
     as R has evolved).  Functions 'scan', 'read.table', 'readLines',
     and 'parse' have an 'encoding' argument that is used to declare
     encodings, 'iconv' declares encodings from its 'to' argument,
     and console input in suitable locales is also declared. 
     'intToUtf8' declares its output as '"UTF-8"', and output text
     connections are marked if running in a suitable locale.  Under
     some circumstances (see its help page) 'source(encoding=)' will
     mark encodings of character strings it outputs.
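
     Two of these routes can be checked directly.  The sketch below
     (variable names are illustrative) inspects the declaration
     carried by the result of 'iconv' and of 'intToUtf8', and uses
     'charToRaw' to show that 'iconv' re-encodes the bytes while
     'Encoding<-' alone would not:

```r
x <- "fa\xE7ile"
Encoding(x) <- "latin1"             # declare the single byte E7 as latin1

# Convert from latin1 to UTF-8; the result carries a declared encoding.
xx <- iconv(x, from = "latin1", to = "UTF-8")
print(Encoding(xx))                 # "UTF-8"
print(charToRaw(x))                 # 66 61 e7 69 6c 65
print(charToRaw(xx))                # 66 61 c3 a7 69 6c 65

# Build the same string from Unicode code points (231 is c-cedilla).
u <- intToUtf8(c(102, 97, 231, 105, 108, 101))
print(Encoding(u))                  # "UTF-8"
```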

     Most character manipulation functions will set the encoding on
     output strings if it was declared on the corresponding input. 
     These include 'chartr', 'strsplit', 'strtrim', 'tolower' and
     'toupper' as well as 'sub(useBytes = FALSE)' and 'gsub(useBytes =
     FALSE)'.  Note that such functions do not _preserve_ the encoding,
     but if they know the input encoding and that the string has been
     successfully re-encoded to the current encoding, they mark the
     output with the latter (if it is '"latin1"' or '"UTF-8"').

     'substr' does preserve the encoding, and 'chartr', 'tolower' and
     'toupper' preserve UTF-8 encoding on systems with Unicode wide
     characters.  With their 'fixed' and 'perl' options, 'strsplit',
     'sub' and 'gsub' will give a marked UTF-8 result if any of the
     inputs are UTF-8.
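
     A short sketch of the 'substr' and 'fixed = TRUE' cases, assuming
     a reasonably recent version of R (the strings are illustrative).
     Note that a substring containing only ASCII characters comes back
     unmarked, consistent with the rule for ASCII strings:

```r
x <- "fa\xE7ile"
Encoding(x) <- "latin1"
xx <- iconv(x, "latin1", "UTF-8")

# substr keeps the declared encoding of its input.
e_latin1 <- Encoding(substr(x, 1, 3))    # "latin1"
e_utf8   <- Encoding(substr(xx, 1, 3))   # "UTF-8"

# A purely ASCII substring is never marked.
e_ascii  <- Encoding(substr(x, 1, 2))    # "unknown"

# With fixed = TRUE, a UTF-8 input gives a UTF-8 marked result.
e_sub    <- Encoding(sub("i", "I", xx, fixed = TRUE))   # "UTF-8"

print(c(e_latin1, e_utf8, e_ascii, e_sub))
```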

     'paste' and 'sprintf' return a UTF-8 marked element if any of the
     inputs to that element are UTF-8.
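
     The per-element rule can be seen in a small sketch (inputs are
     illustrative): only result elements that had a UTF-8 input are
     marked, and mixing a '"latin1"' input with a '"UTF-8"' input
     yields a UTF-8 marked result.

```r
x <- "fa\xE7ile"
Encoding(x) <- "latin1"
xx <- iconv(x, "latin1", "UTF-8")

# First element is built from ASCII inputs only, second from a UTF-8 input.
p <- paste(c("abc", xx), "def")
print(Encoding(p))        # "unknown" "UTF-8"

# A latin1 input combined with a UTF-8 input.
m <- paste(x, xx)
print(Encoding(m))        # "UTF-8"

# sprintf follows the same rule.
s <- sprintf("%s!", xx)
print(Encoding(s))        # "UTF-8"
```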

_V_a_l_u_e:

     A character vector.

_E_x_a_m_p_l_e_s:

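     ## x is intended to be in latin1
     x <- "fa\xE7ile"
     Encoding(x)              # declaration before any explicit marking
     Encoding(x) <- "latin1"  # declare the bytes to be latin1
     x
     ## re-encode to UTF-8; the result carries its own declaration
     xx <- iconv(x, "latin1", "UTF-8")
     Encoding(c(x, xx))       # one declared encoding per element
     c(x, xx)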

