How do I randomly sample a PySpark DataFrame? For example, picking 10% of the rows at random without replacement, or picking 1000 rows.
1 Answer
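The syntax is similar to random sampling on a pandas DataFrame.
# withReplacement, fraction, and seed are the parameters; fraction is a number between 0 and 1.
# The number of rows returned is only approximately fraction * df.count(), not exact.
# For 10% of the rows without replacement: df.sample(False, 0.1)
df_sample = df.sample(withReplacement, fraction, seed=None)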