Saturday, October 18, 2008

CSV parser

Comma separated text file CSV  is a lightweight alternative to XML/KML and JSON. It is extremely simple to create and maintain by a text editor. All the spreadheets and databases can export CSV.

GPS POI files are widely available around the web in CSV format. No wonder that many people are trying to display those files on Google Maps. The code of my old page is so messy that not many people have managed to use it as a template. I am just finishing a cleaner page but actually the CSV parsing was one the main troubles.

Basic CSV is extremely simple to parse. Split the file into lines by split("\n") and split the lines by split(","). That's it, BUT. In real CSV the strings (that are separated by commas) may also contain commas if the string is quoted by double quotes. How not to split those quoted strings?

Four hours of googling and 100+ articles, discussions and snippets didn't bring a script but an idea how to do it.

First split the line by quote. Then find the odd-indexed splits by if(i%2) and replace its commas temporarily with something like "::::". Do the usual splitting by comma and after that replace the original commas back.

Four hours googling plus one hour of code writing and testing gave results. A CSV parser that seems to work perfect.

/**
 * parseCsv()
 * @return an array of GLatLng() objects
 * @author Esa 2008
 */
String.prototype.parseCsv = function(opt_options){
  var results = [];
  var opts = opt_options||{};
  var iLat = opts.lat||1;
  var iLng = opts.lng||0;
  var lines = this.split("\n");
  for (var i=0; i
    var blocks = lines[i].split('"');
    //finding commas inside quotes. Replace them with '::::'
for(var j=0;j
      if(j%2){
        blocks[j]=blocks[j].replace(/,/g,'::::');
      }
    }  //@author Esa 2008, keep this note.
    lines[i] = blocks.join("");
    var lineArray = lines[i].split(",");
    //skip empty lines
    if(lineArray.length>1){
      var lat = lineArray[iLat]*1;
      var lng = lineArray[iLng]*1;
      var point = new GLatLng(lat,lng);
      //after splitting with commas, we put hidden ones back
 for(var cell in lineArray){
        lineArray[cell] = lineArray[cell].replace(/::::/g,',');
      }
 point.textArray = lineArray;
 results.push(point);
    }
  }
  return results;
}

Note that the array of a line cells is attached as .textArray property to the GLatLng() object.

No comments: