Text file encoding

Detect the encoding of a text file:

$ file all.txt 
all.txt: ISO-8859 text

Show the MIME type and character set as well:

$ file --mime all.txt 
all.txt: text/plain; charset=iso-8859-1
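
If only the charset is needed (for example, to feed it to another command), `file` can print it on its own with `--brief` and `--mime-encoding`. The following is a minimal sketch; the helper name `detect_charset` and the sample files are assumptions made up for illustration, and the exact label reported can vary between versions of `file`.

```shell
#!/bin/sh
# detect_charset is a hypothetical helper name, not part of file(1).
# It prints only the detected charset, without the file name or MIME type.
detect_charset() {
    file --brief --mime-encoding "$1"
}

# Demo files (made up for this sketch): the word "café" in two encodings.
tmpdir=$(mktemp -d)
printf 'caf\303\251\n' > "$tmpdir/utf8.txt"     # é as the UTF-8 bytes C3 A9
printf 'caf\351\n'     > "$tmpdir/latin1.txt"   # é as the ISO-8859-1 byte E9

detect_charset "$tmpdir/utf8.txt"
detect_charset "$tmpdir/latin1.txt"
```

Keep in mind that `file` guesses from the bytes it sees, so single-byte encodings such as ISO-8859-1 and ISO-8859-2 cannot be reliably told apart this way.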

Change the encoding of a text file:

$ iconv --from-code=UTF-8 --to-code=ISO-8859-2 file.txt > tmp.txt

This latter tip is from here.
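
The same `iconv` call can also be used as a quick validity check: converting a file from UTF-8 to UTF-8 fails if the file contains bytes that are not valid UTF-8. Here is a rough sketch of that idea together with a conversion in the other direction (to UTF-8). The helper names `is_valid_utf8` and `to_utf8` and the sample file are assumptions for illustration only.

```shell
#!/bin/sh
# is_valid_utf8 and to_utf8 are hypothetical helper names for this sketch.

# Succeeds only if the whole file decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv --from-code=UTF-8 --to-code=UTF-8 "$1" > /dev/null 2>&1
}

# Convert file $2 from encoding $1 to UTF-8 and write the result to $3.
# The output must be a different file: redirecting onto the input
# would truncate it before iconv reads it.
to_utf8() {
    iconv --from-code="$1" --to-code=UTF-8 "$2" > "$3"
}

# Demo file (made up for this sketch): "café" encoded as ISO-8859-1.
tmpdir=$(mktemp -d)
printf 'caf\351\n' > "$tmpdir/latin1.txt"

is_valid_utf8 "$tmpdir/latin1.txt" || echo "latin1.txt is not valid UTF-8"
to_utf8 ISO-8859-1 "$tmpdir/latin1.txt" "$tmpdir/utf8.txt"
is_valid_utf8 "$tmpdir/utf8.txt" && echo "utf8.txt is valid UTF-8"
```

Note that the source encoding still has to be known or guessed correctly: `iconv` will happily convert from the wrong single-byte encoding and produce well-formed but wrong text.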

Update (20110918)
You can also use “chardet” to detect the character encoding. Usage:

$ chardet test.txt 
test.txt: utf-8 (confidence: 0.99)

Update (20140106)
vim also has an excellent detector. If you open a file in vim and it displays correctly, check which encoding vim is using:

:set fileencoding
  1. Rockin
    March 24, 2011 at 12:43

    Thank you for this tip !

    It’s not rare to have encoding conflicts when you work on a project with people who haven’t configured their editors in the same way.

    After a few tries, it seems to work correctly :)
    It will probably help me on my future team projects !


  2. October 29, 2015 at 07:46

    Thank you for explanation !
    $chardet input.txt
    >>>input.txt: windows-1251 with confidence 0.343864991477
    works just fine for me.
