<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>pandas &#8211; Customer Experience Management</title>
	<atom:link href="https://mietwood.com/tag/pandas/feed" rel="self" type="application/rss+xml" />
	<link>https://mietwood.com</link>
	<description>Customer Experience Can Be Managed</description>
	<lastBuildDate>Fri, 06 Jan 2023 20:27:49 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://mietwood.com/wp-content/uploads/2022/09/cropped-Fav7-32x32.png</url>
	<title>pandas &#8211; Customer Experience Management</title>
	<link>https://mietwood.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Data scientist toolbox</title>
		<link>https://mietwood.com/data-scientist-toolbox-for-data-science</link>
		
		<dc:creator><![CDATA[Maki Pa]]></dc:creator>
		<pubDate>Fri, 06 Jan 2023 20:27:48 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[pandas]]></category>
		<category><![CDATA[python]]></category>
		<guid isPermaLink="false">https://mietwood.com/?p=1541</guid>

					<description><![CDATA[<p>Reading data from excel Excel format is popular in analysis and data science. Pandas reads the excel data well, but sometimes you would like to specify data types. How can you do that? Let&#8217;s assume your data looks like this As you can see, pandas read all data as int, despite the first column should...</p>
<p>The post <a rel="nofollow" href="https://mietwood.com/data-scientist-toolbox-for-data-science">Data scientist toolbox</a> appeared first on <a rel="nofollow" href="https://mietwood.com">Customer Experience Management</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">Reading data from Excel</h2>



<p>The Excel format is popular in analysis and data science. Pandas reads Excel data well, but sometimes you want to specify the data types yourself. How can you do that?</p>


<div class="wp-block-image">
<figure class="alignright size-full is-resized"><img decoding="async" src="https://mietwood.com/wp-content/uploads/2023/01/2023-01-06-20_09_04-Reading-excel-with-data-types.xlsx-Excel.jpg" alt="" class="wp-image-1543" width="274" height="126"/><figcaption class="wp-element-caption">Excel data science</figcaption></figure>
</div>


<p>Let&#8217;s assume your data looks like the sheet in the screenshot.</p>



<p>As you can see, pandas reads every column as int, although the first column should be text, the second a short integer, and the third a decimal (10.2). How can we force pandas to read the data in that format?</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img fetchpriority="high" decoding="async" width="388" height="144" src="https://mietwood.com/wp-content/uploads/2023/01/2023-01-06-20_06_13-C__Users_prac_Documents_Programy_Brazilian-e-commerce-Spyder-Python-3.9.png" alt="" class="wp-image-1544" srcset="https://mietwood.com/wp-content/uploads/2023/01/2023-01-06-20_06_13-C__Users_prac_Documents_Programy_Brazilian-e-commerce-Spyder-Python-3.9.png 388w, https://mietwood.com/wp-content/uploads/2023/01/2023-01-06-20_06_13-C__Users_prac_Documents_Programy_Brazilian-e-commerce-Spyder-Python-3.9-300x111.png 300w" sizes="(max-width: 388px) 100vw, 388px" /><figcaption class="wp-element-caption">Pandas dataset</figcaption></figure>
</div>
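
<p>A minimal sketch of the problem, not the exact file from the screenshot. It assumes the sheet holds the four rows from the result table later in this post, and it reads in-memory CSV text with <code>pd.read_csv</code> instead of an Excel file so that it runs without a workbook or an Excel engine. The name <code>RAW</code> is hypothetical.</p>

```python
import io

import pandas as pd

# Hypothetical sample mirroring the three columns used in this post.
RAW = """Customer,Zip code,Credit limit
00123,20345,120000
00362,21890,10000
00467,34020,20000
00234,34300,30000
"""

# No dtype given: pandas infers an integer type for every column.
df_default = pd.read_csv(io.StringIO(RAW))

print(df_default.dtypes)
print(df_default["Customer"].tolist())  # the leading zeros are gone
```

<p>Once the customer numbers have been parsed as integers, the leading zeros cannot be recovered, which is why the type has to be set at read time.</p>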


<h2 class="wp-block-heading">Pandas data types</h2>



<p>We can pass the <code>dtype</code> argument with a dictionary covering all or only selected columns. But which types can you use in data science? Pandas data types correspond to the detailed NumPy data types, but in most cases the following are enough: object, int, float, datetime, bool. As mentioned, you can also use the <a href="https://numpy.org/doc/stable/user/basics.types.html" target="_blank" aria-label="Numpy ones (opens in a new tab)" rel="noreferrer noopener" class="ek-link">NumPy ones</a>: for object (str, np.string_, np.unicode_; the last two aliases were removed in NumPy 2.0 in favor of np.bytes_ and np.str_), for int (np.int_, np.int8, np.int16, np.int32, np.int64, np.uint8, np.uint16, np.uint32, np.uint64), and for float (np.float16, np.float32, np.float64).</p>
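
<p>When choosing among the integer types listed above, it may help to check their ranges first. The sketch below uses <code>np.iinfo</code> for that. The helper <code>smallest_signed_int</code> is hypothetical and only checks the upper bound, so it assumes non-negative data such as zip codes.</p>

```python
import numpy as np

SIGNED_TYPES = (np.int8, np.int16, np.int32, np.int64)

# Print the range each signed integer type can hold.
for int_type in SIGNED_TYPES:
    info = np.iinfo(int_type)
    print(f"{info.dtype}: {info.min} .. {info.max}")


def smallest_signed_int(max_value):
    """Return the narrowest signed type whose maximum covers max_value."""
    for int_type in SIGNED_TYPES:
        if np.iinfo(int_type).max >= max_value:
            return int_type
    raise OverflowError("value does not fit in int64")


# The largest zip code in this post's sample data is 34300.
chosen = smallest_signed_int(34300)
print(chosen)
```

<p>Since 34300 is above the int16 maximum of 32767, np.int32 is the narrowest signed type that fits the zip codes used here.</p>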



<pre class="wp-block-code"><code>pd.read_excel('file_name.xlsx', dtype={'Customer': str, 'Zip code': np.int32, 'Credit limit': float})</code></pre>



<pre class="wp-block-code"><code>Data columns (total 3 columns):<br># Column Non-Null Count Dtype<br>--- ------ -------------- -----<br>0 Customer 4 non-null object<br>1 Zip code 4 non-null int32<br>2 Credit limit 4 non-null float64<br>dtypes: float64(1), int32(1), object(1)<br>memory usage: 208.0+ bytes</code></pre>



<p>We not only get the data in the format we want, but also reduce memory usage.</p>
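
<p>The following sketch reproduces that result under stated assumptions. It uses <code>pd.read_csv</code> on in-memory text rather than <code>pd.read_excel</code>, because both functions accept a <code>dtype</code> dictionary and the CSV variant needs no file. The sample text <code>RAW</code> is hypothetical and mirrors the values shown in this post.</p>

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample mirroring the three columns used in this post.
RAW = """Customer,Zip code,Credit limit
00123,20345,120000
00362,21890,10000
00467,34020,20000
00234,34300,30000
"""

# Same dictionary as in the read_excel call above.
dtypes = {"Customer": str, "Zip code": np.int32, "Credit limit": float}
df = pd.read_csv(io.StringIO(RAW), dtype=dtypes)

print(df.dtypes)
print(df["Customer"].tolist())  # leading zeros are preserved

# Bytes used by the zip codes as int32 versus the default int64.
bytes_int32 = df["Zip code"].memory_usage(index=False)
bytes_int64 = df["Zip code"].astype(np.int64).memory_usage(index=False)
print(bytes_int32, bytes_int64)
```

<p>With four rows the saving is tiny, but the ratio stays the same: an int32 column takes half the space of an int64 column.</p>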



<h2 class="wp-block-heading">Currency display formatting</h2>



<p>The question remains: which data type should you use for currency? Internally it is still float; only the way the values are displayed changes.</p>



<p>The dtype &#8220;object&#8221; usually means &#8220;string&#8221;. You need to distinguish between the underlying data (e.g., the integer 1234) and its string representation (e.g., 1,234). Pandas allows you to define custom formatters on a per-column basis. You should store numeric data as numbers and define a custom formatter for display.</p>
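
<p>One possible way to apply a per-column formatter is the <code>formatters</code> argument of <code>DataFrame.to_string</code>. The sketch below assumes a small hand-built frame with this post's column names. It is one option among several, not necessarily the approach used for the output shown later.</p>

```python
import pandas as pd

# Hypothetical frame using the column names from this post.
df = pd.DataFrame(
    {
        "Customer": ["00123", "00362"],
        "Zip code": [20345, 21890],
        "Credit limit": [120000.0, 10000.0],
    }
)

# The formatter only affects the rendered text, not the stored values.
text = df.to_string(formatters={"Credit limit": "{:,.2f}".format})
print(text)

# The column itself is still numeric and can be used in calculations.
print(df["Credit limit"].sum())
```

<p>Keeping the values numeric means sums and comparisons still work, while the thousands separators appear only in the printed text.</p>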



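<pre class="wp-block-code"><code># Show floats with two decimals in a field at least 6 characters wide
pd.options.display.float_format = '{:6.2f}'.format
# Or add a thousands separator as well (a later assignment replaces the earlier one)
pd.options.display.float_format = '{:,.2f}'.format
# Integer columns are rendered by pandas.io.formats.format.IntArrayFormatter (internal; the name may differ between pandas versions)</code></pre>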



<p>After that, you get the following result:</p>



<pre class="wp-block-code"><code>Customer Zip code Credit limit
0 00123     20345    120,000.00
1 00362     21890     10,000.00
2 00467     34020     20,000.00
3 00234     34300     30,000.00</code></pre>
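
<p>Setting <code>pd.options.display.float_format</code> changes the display for the whole session. If that is too broad, the same option can be limited to a block with <code>pd.option_context</code>. A minimal sketch, assuming a one-column frame with the credit limits from the table above:</p>

```python
import pandas as pd

# Credit limits from the table above.
df = pd.DataFrame({"Credit limit": [120000.0, 10000.0, 20000.0, 30000.0]})

# option_context restores the previous setting when the block ends.
with pd.option_context("display.float_format", "{:,.2f}".format):
    formatted = str(df)

# Outside the block the default float display applies again.
plain = str(df)

print(formatted)
print(plain)
```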



<p>There are 5 basic numerical types representing booleans (bool), integers (int), unsigned integers (uint), floating point (float) and complex. Those with numbers in their name indicate the bitsize of the type (i.e. how many bits are needed to represent a single value in memory). Some types, such as <code>int</code> and <code>intp</code>, have differing bitsizes, depending on the platform (e.g. 32-bit vs. 64-bit machines). This should be considered when interfacing with low-level code (such as C or Fortran) where the raw memory is addressed.</p>
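
<p>To see the bitsizes on your own machine, you can query <code>itemsize</code> on a NumPy dtype. A short sketch (the variable names are illustrative):</p>

```python
import numpy as np

# Fixed-width types have the same size on every platform.
bits_int32 = np.dtype(np.int32).itemsize * 8
bits_float64 = np.dtype(np.float64).itemsize * 8

# intp is pointer-sized, so its width depends on the platform.
bits_intp = np.dtype(np.intp).itemsize * 8

print(bits_int32, bits_float64, bits_intp)
```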



<p>And that is it. Happy coding.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
