Importing Excel Files in Modeler Without Losing Data

Hi Everyone,

My organization is placing a heavy emphasis on automating and enhancing data analysis processes – and in many cases Modeler is fantastic for this. The issue is that the vast majority of input files we work with are Excel, for better or worse. My question revolves around the issues SPSS Modeler has importing Excel files. I’ve seen this same issue posted about before, but I have yet to see a solution.

The specific issue is this: when I’m working with a large Excel file, there may be instances where some data elements are blank. If I have a field that’s a string in Excel, and the first 100 rows of that are blank (since that’s when Modeler determines the type), but there is valid data after, Modeler will default to importing that entire field as a Real numerical format. The “valid” data farther down is completely lost at that point even if I tried to convert it in another node. There is no way to force it to do anything different without doing extra work outside of Modeler – like adding dummy values in Excel or converting to another format like CSV or Statistics (which isn’t always easy).

This is a huge issue for so many of the projects we work on and I can’t imagine we’re alone. I know there are workarounds outside of SPSS, but they’re a pain when you could be dealing with many files that routinely change and are always stored in Excel.

1) Is there a place to submit a change request? A fantastic solution would be to allow you to change the types at the Excel import node like you can for a CSV. However I’ll settle for “blank” fields to default to “string” instead of a numerical field (or at least provide the option). At least if it’s string you don’t lose the data and can convert later if necessary. There has to be a better way than being okay with losing data (and oftentimes being unaware you did).

2) Are there any possible workarounds using the R or Python integration? I’ve seen some R nodes that other users have created that allow you to import data (using a bit of a hack). Is there potential for that to work here with Excel? Has it been tried?

Any help here would be greatly appreciated!

Related: