Jul 29, 2015
I wasn't sure where to put this topic, so I put it here, since I figured the question would apply to virtually any version, even though I am using SQL Server 2005.
We have a vendor that sends us a fixed width text file every day that needs to be imported into 3 different tables in our database. I am trying to import all of the data into a staging table and then plan on merging/inserting select data from the staging table into the 3 tables. The file has 77 columns of data and 20,000+ records. I created an XML format file, a sample of which is below:
<?xml version='1.0'?>
<BCPFORMAT xmlns='http://schemas.microsoft.com/sqlserver/2004/bulkload/format' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
<RECORD>
<FIELD xsi:type='CharFixed' LENGTH='6'/>
[Code] ....
The data file is a fixed width file with no column delimiters or row delimiters that I can tell. When I run the following insert statement I get the error below it.
BULK INSERT myStagingTable
FROM '.........myDataSource.txt'
WITH (
FORMATFILE = '.........myFormatFile.xml',
ERRORFILE = '.........errorlog.log'
);
Here is the error:
Msg 4832, Level 16, State 1, Line 1
Bulk load: An unexpected end of file was encountered in the data file.
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider 'BULK' for linked server '(null)' reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 1
Cannot fetch a row from OLE DB provider 'BULK' for linked server '(null)'.
Does anyone have any experience with using bcp or BULK INSERT to import fixed length files? I have created a format file and tried various combinations of lengths, terminators, and so on, but can't get it to work.
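One possible cause of Msg 4832 with fixed width input is a record length in the format file that does not match the bytes actually in the file, for example when the file does contain CR/LF row terminators that the format file does not describe. The sketch below is a diagnostic aid only, written in Python rather than T-SQL, and is not part of the original post. The function name `inspect_fixed_width` and the 6-byte sample record are hypothetical; `record_length` is assumed to be the sum of the LENGTH values in the format file.

```python
def inspect_fixed_width(data: bytes, record_length: int) -> dict:
    """Report facts that help decide how a format file should describe rows.

    record_length is supplied by the caller (assumed to be the sum of the
    FIELD LENGTH values); it is not read from the format file here.
    """
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf
    return {
        "size": len(data),
        "crlf_terminators": crlf,
        "lf_only_terminators": lf_only,
        # Consistent with back-to-back records and no row terminator at all
        "consistent_without_terminators": len(data) % record_length == 0,
        # Consistent with every record being followed by CR/LF
        "consistent_with_crlf": len(data) % (record_length + 2) == 0,
    }


if __name__ == "__main__":
    # Tiny in-memory stand-in for the vendor file: two 6-byte records + CR/LF.
    sample = b"ABC123\r\nDEF456\r\n"
    print(inspect_fixed_width(sample, record_length=6))
```

To run it against a real file, read the file in binary mode (`open(path, "rb").read()`, with `path` being your own data file) and pass the bytes in. If terminators show up, the last field in the format file likely needs to account for them.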
How to convert a text file to a fixed length file:
Here is my code attempt and the sample text file.
Input:
Desired output:
Note: The set var command does not store the variables.
Help GREATLY appreciated!
4 Answers
This should crack the problem.
The FOR loop assigns the contents of each line to line in turn. The routine :process analyses the line and builds each column, character by character, until it finds a comma. It then adds a large number of spaces to the end of the line. The 'x' marks where those spaces end, since it's a little difficult to show trailing spaces on lines (and some editors throw them away unprompted); it will be lopped off in the next statement.
The CALL SET line uses a parser characteristic. Suppose result is currently abcd and column is 123 ..manyspaces.. x and %1 is 5. The parser processes the line by replacing any %var% by its current value, then CALLs it, so what is CALLed is
without the +s of course - they're to show how the parser sees the code.
So that neatly lops off the unwanted trailing spaces and the x in column; the SHIFT removes the first element from the parameter list provided to :process and the next column is processed until all have been done and there are no parameters left. Write out the result, and we're done...
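The pad-then-truncate idea described above can be sketched outside batch as well. This is a minimal Python illustration, not the answer's batch code: the function name `to_fixed_width`, the `PAD` constant, and the example widths are all hypothetical. Each column gets a long run of spaces appended and is then cut to its width, which mirrors taking the first %1 characters of column.

```python
from typing import Sequence

# Generous padding, playing the role of the long run of spaces in the answer.
PAD = " " * 100


def to_fixed_width(line: str, widths: Sequence[int]) -> str:
    """Pad each comma-separated column, then keep its first `width` characters."""
    result = ""
    for column, width in zip(line.split(","), widths):
        # Short values gain trailing spaces; long values are truncated.
        result += (column + PAD)[:width]
    return result


if __name__ == "__main__":
    # Brackets make the trailing spaces visible in the console.
    print("[" + to_fixed_width("abcd,123", [4, 5]) + "]")
```

With result at abcd, column at 123 and a width of 5, as in the example above, the combined value is abcd123 followed by two spaces.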
Here's a one-liner to solve all your problems. :)
If you're putting this into a batch script, be sure to replace % with %% in that line.
Contents of answer.txt:
A feeble explanation:
Someone more comfortable with PowerShell can probably explain this better than I can, as it took a bunch of trial and error for me to compose that line. But basically, as I understand it, it works as follows:
- Send the output of a PowerShell command to answer.txt
- Read comma3.txt
- The % is shorthand for "for each line"
- Return a formatted line similar to printf '%-10s %-14s %-19s etc.' using line.split(',') for the string arguments
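The printf analogy in the last step can be made concrete with a short Python sketch. This is not the PowerShell one-liner itself; the function name `format_line` is hypothetical, and the three widths are just the ones quoted in the printf string above, with a single space between columns as written there.

```python
def format_line(line: str, widths=(10, 14, 19)) -> str:
    """Left-justify each comma-separated field in its column, printf style."""
    fields = line.rstrip("\r\n").split(",")
    # Builds "%-10s %-14s %-19s" from the widths.
    template = " ".join("%%-%ds" % w for w in widths)
    # Raises TypeError if the field count differs from the width count.
    return template % tuple(fields)


if __name__ == "__main__":
    # Brackets make the trailing spaces visible in the console.
    print("[" + format_line("a,b,c") + "]")
```

Note that %-10s pads short values but does not truncate long ones, so a value wider than its column pushes the rest of the line to the right.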
PowerShell can also write the file itself (the cmdlets are Set-Content and Out-File; there is no Write-Content), but it was easier just to have the command console redirect the output of PowerShell. If you're working with very large CSV files and this command works too slowly, those cmdlets might be worth investigating.
Expanding on Rojo's answer, I created an Excel macro that used column widths on line 1, exported the file as CSV, then called PowerShell to convert it to space delimited like so:
'Change to correct drive and directory:
'remove ' marks that excel may insert
'Loop through cells on first row of spreadsheet to create powershell command with correct column widths
Delete the rows with column widths so that they are not exported:
Find and replace any commas and tabs as these may cause issues:
Save the workbook as CSV:
Pass all the commands to cmd
This may be a convenient solution if you import your csv to excel.
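The same idea of keeping the column widths in the first row can be sketched in Python. This is only an illustration under stated assumptions, not the Excel macro: the function name `convert_with_width_row` is hypothetical, and unlike the macro it assumes the width row is still present as the first line of the exported CSV rather than being deleted before export.

```python
import csv
import io


def convert_with_width_row(csv_text: str) -> str:
    """Treat row 1 as column widths and emit the remaining rows as fixed width."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    widths = [int(cell) for cell in rows[0]]
    out_lines = []
    for row in rows[1:]:
        # ljust pads short values; the slice truncates long ones.
        out_lines.append(
            "".join(value.ljust(w)[:w] for value, w in zip(row, widths))
        )
    return "\n".join(out_lines)


if __name__ == "__main__":
    sample = '5,3\nab,cdef\n"x,y",z\n'
    # Brackets make the trailing spaces visible in the console.
    for out in convert_with_width_row(sample).split("\n"):
        print("[" + out + "]")
```

Because the csv module understands quoted fields, a comma inside a quoted cell stays part of that cell, so the find-and-replace step for commas is not needed in this variant.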