Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


I am a bit new to Dask. I have a large CSV file and a large list, and the number of rows in the CSV equals the length of the list. I am trying to create a new column in the Dask DataFrame from that list. In pandas this is pretty straightforward, but in Dask I am having a hard time creating the new column. I am avoiding pandas because my data is 15 GB+.
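For comparison, a minimal pandas sketch of the "straightforward" case, using the two sample rows shown below. It only works when the whole file fits in memory, which is exactly what rules it out for 15 GB+ of data:

```python
import pandas as pd

# In-memory frame built from the two sample rows (stands in for pd.read_csv).
df = pd.DataFrame(
    {
        "name": ["john", "tim"],
        "text": ["some text here", "some text here too"],
        "address": ["MD", "WA"],
    }
)
ls = ["one", "two"]

# pandas assigns a list positionally, provided its length equals len(df).
df["new"] = ls
print(df)
```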

Please see my attempts below.

CSV data

name,text,address
john,some text here,MD
tim,some text here too,WA

Code I tried

import dask.dataframe as dd
import numpy as np

ls = ['one','two']

ddf = dd.read_csv('../data/test.csv')
ddf.head()

Try #1: 
ddf['new'] = ls # TypeError: Column assignment doesn't support type list

Try #2: What should be passed here for condlist?
ddf['new'] = np.select(choicelist=ls) # TypeError: _select_dispatcher() missing 1 required positional argument: 'condlist'

I am looking for this output:

   name                text address new
0  john      some text here      MD one
1   tim  some text here too      WA two


1 Answer

Try converting the list to a Dask Series with dd.from_array (a 1-D array becomes a Series, not a DataFrame) and then assigning it as a column, like this:

ls = dd.from_array(np.array(['one','two']))  # 1-D NumPy array -> Dask Series
ddf['new'] = ls  # assign the Series as the new column
