After diving into a little PowerShell scripting, I found myself automating the creation of a bunch of CSV files! (Go me.)
But after merging similar CSV files together, I realized I had no way to identify where a specific row originated from.
Luckily, the CSV filename contained a unique identifier that I could use to note the data source.
So how do we add a new column to an existing CSV file, then populate that column with the name of the CSV file itself?
I learned that it was possible to do this in PowerShell:
Get-ChildItem *.csv | ForEach-Object { $CSV = Import-CSV -Path $_.FullName -Delimiter "," $FileName = $_.Name $CSV | Select-Object *,@{N='Filename';E={$FileName}} | Export-CSV $_.FullName -NTI -Delimiter "," }
However, when you are working with many large CSV files, this could potentially take more time than you are willing to wait.
I am assuming that the ‘Import-CSV’ function requires the entire contents of a CSV file to be loaded before processing and exporting.
So I finally gave up after half a day of researching and decided to write a C# application.
But why?!?
-You
Based on my assumption, and no obvious way around the slowness in PowerShell (please leave a comment to enlighten me), it made more sense to program the work myself and have control over the stream of information in the CSV file. This way, I could process and export a single row without having to wait for the entire file to be loaded into memory.
Here is the C# code…
(Not thoroughly tested, but it will definitely work when given true CSV files as arguments).
static void Main(string[] args) { // Determine if all files are indeed CSVs and we should do work bool doWork = false; if (args.Length > 0) // At least one argument was given { foreach (string arg in args) { // Look at the extension ending of the file and check if it is a CSV string[] type = arg.Split("."); if (type[type.Length - 1].ToLower() == "csv") { // We are still good doWork = true; } else { // This isn't a CSV, bail. Console.WriteLine("Invalid file type in argument."); doWork = false; break; // No need to keep checking the other arguments } } } else // No arguments given { Console.WriteLine("No arguments"); } if (doWork == true) // All files are indeed CSVs { foreach (string arg in args) { string sourceFile = arg; // Save the source file name. string destinationFile = arg + "updated" + ".csv"; // New file will be <file_name>.csvupdated.csv // Using statement to automatically close the reading file stream. using (StreamReader reader = new StreamReader(sourceFile)) { // Using statement to automatically close the writing file stream. using (StreamWriter writer = new StreamWriter(destinationFile)) { string lineIn = ""; // Where we will store the current line read bool isHeader = true; // Flag for knowing if the CSV header is the current line while ((lineIn = reader.ReadLine()) != null) // As long as a line is being read { if (isHeader == true) // If this is the first read / header row { writer.WriteLine(lineIn + "," + "Filename"); // Append the final column header "Filename" to the first row isHeader = false; } else // This is just another row { writer.WriteLine(lineIn + "," + sourceFile); // Append the file name as the last column } } } } } } }
I am always open to feedback, and definitely reach out if you notice a bug.
Skipping past the (probably incomplete) work in the first conditional clause to validate the argument as a CSV file, my approach was to first open the original file with a StreamReader object along with creating the new file with a StreamWriter object. Then for each line (or row) read, append the new data (or column) at the end and write the updated string to the new file. The boolean flag ‘isHeader’ is set to false after the first row is read to account for the new column header which we name “Filename”. I also knew that a comma was the delimiter.
This was a simple C# console application that produced an executable which I could also call from a PowerShell script. This application performed the same work that was first revealed with only PowerShell commands but at a much faster rate.