Data scientist toolbox

ByMaki Pa 2023-01-062023-01-06

Reading data from excel

Excel format is popular in analysis and data science. Pandas reads the excel data well, but sometimes you would like to specify data types. How can you do that?

Let’s assume your data looks like this

As you can see, pandas read all data as int, despite the first column should be text, the second short and the third digital 10.2. How we can force reading data in such a format.

Pandas data types

We can use command dtype with a dictionary for all or selected columns. But what are the types you can use in data science? Pandas datatypes correspond to Numpy detailed datatypes, but for most cases it is enough to use following datatypes: object, int, float, datetime, bool. As mentioned, you can use Numpy ones: object (str, np.string_, np.unicode_), for int (np.int_, np.int8, np.int16, np.int32, np.int64, np.uint8, np.uint16, np.uint32, np.uint64), and for float (np.float16, … 32, … 64),

pd.read_excel('file_name.xlsx', dtype={'Customer': str, 'Zip code': np.int32, 'Credit limit': float})

Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Customer 4 non-null object
1 Zip code 4 non-null int32
2 Credit limit 4 non-null float64
dtypes: float64(1), int32(1), object(1)
memory usage: 208.0+ bytes

We no only get data in format we want, but also optimased memory usage.

Currency display formatting

The question remains what data type for currency you should use? It is still float for internal representation, and this format displays.

Type “object” means “string”. You need to distinguish between the underlying data (e.g., the integer 1234) and its (string) representation e.g., 1,234. Pandas allows you to define custom formatters on a per-column basis. You should store integer data as integer data and define a custom formatter for it.

pd.options.display.float_format = '{:6.2f}'.format
pd.options.display.float_format = '{:,}'.format
pandas.io.formats.format.IntArrayFormatter

After that you got results

Customer Zip code Credit limit
0 00123     20345    120,000.00
1 00362     21890     10,000.00
2 00467     34020     20,000.00
3 00234     34300     30,000.00

There are 5 basic numerical types representing booleans (bool), integers (int), unsigned integers (uint) floating point (float) and complex. Those with numbers in their name indicate the bitsize of the type (i.e. how many bits are needed to represent a single value in memory). Some types, such as int and intp, have differing bitsizes, dependent on the platforms (e.g. 32-bit vs. 64-bit machines). This should be considered when interfacing with low-level code (such as C or Fortran) where the raw memory is addressed.

And that is it. Happy coding.

Data Science | Python | SQL

Product Grouping in Noisy E-commerce Datasets
ByMaki Pa 2025-12-302025-12-31

Hybrid Approaches to Product Grouping in Noisy E-commerce Datasets: A Comparative Analysis of Set-Theoretic vs. Vector Space Models. In scientific literature, this…

Read More Product Grouping in Noisy E-commerce Datasets
Data Science | SQL

15 Essential SQL Tips You Can’t Live Without
ByMaki Pa 2025-10-112025-10-11

Whether you’re optimizing performance or simplifying your queries, these SQL tips from mietwood.com will help you write cleaner, faster, and more efficient…

Read More 15 Essential SQL Tips You Can’t Live Without
Data Science | SQL

Query optimization with join condition
ByMaki Pa 2025-10-062025-10-06

Microsoft SQL Server (MS SQL Server) is a relational database management system (RDBMS) developed by Microsoft. Transact-SQL (T-SQL) is the specific dialect…

Read More Query optimization with join condition
Data Science | Erasmus

Career Path of a Business Analyst in Business and ICT
ByMaki Pa 2025-06-252025-10-01

Exploring the Career Path of a Business Analyst in Business and ICT In today’s rapidly evolving business landscape, the role of a…

Read More Career Path of a Business Analyst in Business and ICT
Customer Experience Management | Data Science

Student Profiling for your lecture
ByMaki Pa 2025-09-302025-09-30

People profiling problem can be approached as an analysis of co-occurrence, how often lectures are chosen together and correlation, the strength and…

Read More Student Profiling for your lecture
Data Science

Fuzzy-Set Qualitative Comparative Analysis
ByMaki Pa 2025-09-282025-09-28

Qualitative comparative analysis (QCA) is an asymmetric data analysis technique that combines the logic and empirical intensity of qualitative approaches. The symmetric…

Read More Fuzzy-Set Qualitative Comparative Analysis

Reading data from excel

Pandas data types

Currency display formatting

Similar Posts